
Re: how to kill object servers



>    Remember: that's a guarantee we DON'T make,
>    and don't intend to make until long after first product, if ever.
>    (We're a library, not a bank.)  We only guarantee that the database
>    is consistent, not that it doesn't revert to the state a couple seconds
>    before the CPU fried.
> 
> The guarantee we've talked about making (in addition to normal
> serialization consistency) is as follows: We provide a FeBe request
> (with a meaning similar to fsync) which guarantees that on normal crash
> recovery the backend will be in a consistent state no earlier than
> when the fsync request was made.  However, any consistent state after
> an fsync is allowed.

That is NOT a guarantee we make.  We guarantee that NONE of the changes
requested AFTER the "fsync" will be done unless ALL the changes requested
BEFORE the "fsync" have been done.  This is NOT violated by restoration
to ancient states, and it is the same guarantee that we make for the
"consistent" ticks.
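
To make the ordering rule above concrete, here is a minimal sketch of it as
a predicate over a recovered state. Everything named here is hypothetical
and for illustration only: requests are modelled as integer positions in the
request stream, `barrier` is the number of requests issued before the
"fsync", and `applied` is the set of positions whose changes survived the
crash. It checks only the barrier rule, not general consistency.

```python
def satisfies_barrier_guarantee(applied, barrier):
    """Return True if no post-barrier change survived unless every
    pre-barrier change survived too.

    applied -- set of request positions whose changes are present
               after recovery (hypothetical model)
    barrier -- count of requests issued before the "fsync"
    """
    before = set(range(barrier))
    any_after = any(pos >= barrier for pos in applied)
    # Nothing after the barrier survived: the rule holds vacuously,
    # which is why reverting to an ancient state does not violate it.
    if not any_after:
        return True
    # Something after the barrier survived: everything before it must have.
    return before <= applied
```

Under this model an empty `applied` set (a restoration to an ancient state)
passes, while a state holding request 2 but missing request 1, with the
barrier at 2, fails.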

If the "fsync" actually waits until things hit the disk, it does not
strengthen the guarantee.  It just means that the loss of the transaction
will be less probable, because the transaction will survive those crashes
that leave the disk partition usable.  But failure of a single medium still
causes the transaction to revert.

Also: the user gains additional confidence only if we hold the response
until after the data hits the disk, and additional security only if we
cause the data to be written immediately (or at least earlier than we
otherwise would have written it).

> Note for our concurrent future: the guarantee is
> made as of the time of reception of the fsync request, not the time of
> response, but the guarantee isn't in force until the response is sent.

That doesn't make sense to me.  How can we be said to have made a "guarantee"
when the request arrives, before we do anything to "write the policy"?  How
can we "make" a "guarantee" that isn't "in force"?  (Are you saying that the
request's position in the request stream is a declaration of what is to be
preserved?)

>    (I suspect that only special-purpose object servers make that guarantee.
>     It's very costly.)
> 
> Is the [omitted] scenario consistent with the costs involved, or is this
> even more expensive than I know?

Much more expensive, and I haven't dug into the literature to examine JUST
how much more.

> Unlike NFS, we're certainly not
> contemplating a commit-to-disk on every FeBe request.  (which is the
> substantial performance cost NFS pays for being stateless)

It's worse than NFS.  You must complete writes to TWO disks, or a disk and
a tape, before acknowledging that the request is complete.  The transactions
that precede the commit request and its acknowledgement must survive even a
total medium failure.
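
A rough sketch of what "two media before the acknowledgement" could look
like, using ordinary files as stand-ins for the two media. The function
names, the "ACK" return value, and the file paths are assumptions made for
illustration, not anything in our backend; in practice the paths would have
to live on physically separate devices for this to mean anything.

```python
import os


def append_durably(path, record):
    """Append one record and block until the OS reports it flushed."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device


def commit(record, media_paths):
    """Acknowledge only after the record has been written to every medium.

    media_paths is a hypothetical list of files, one per medium.
    """
    if len(media_paths) < 2:
        raise ValueError("need at least two independent media")
    for path in media_paths:
        append_durably(path, record)
    # Only now is it safe to tell the requester the commit is complete.
    return "ACK"
```

The cost shows up in the loop: the response cannot be sent until the slowest
medium has finished its write.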

	michael