My Rough Notes re Udanax-Green
- To: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
- Subject: My Rough Notes re Udanax-Green
- From: Jeff Rush <jrush@xxxxxxxxxx>
- Date: Wed, 25 Oct 2000 11:45:26 -0500
- Cc: udanax@xxxxxxxxxx
- References: <email@example.com>
- Sender: jrush@xxxxxxxxxx
(I'm working up some diagrams in Gimp to go with this)
In the Beginning was the Docuverse:
The docuverse can be thought of as an infinite number line
with all the information it contains or ever will contain
strung out along the line. Since the number line is
infinitely divisible, we can insert new material at any
point, similar to how real numbers work on a number line.
We can also refer to a portion of the information, whether
it exists today or will someday, using a (position, width) span.
Arbitrarily we limit the docuverse to occupying the range
of numbers from 1.<0> to 1.<infinity>, which allows us
the convenience of referencing the entire docuverse with
a span of (position 1, width 1) while not limiting
capacity in any real way.
Tumbling thru the Docuverse:
To represent positions along this docuverse line, we use
'tumblers', which are essentially base infinity real numbers,
where carries to/borrows from other digits don't occur.
Consider that if we used base 10, we could only have ten
positions, with ten positions inside each of those, etc.
So if we had, say, eight file cabinets, each with seven
drawers, each with five dividers, things would work out.
But as we store more information, the day comes when we
need more dividers than ten, or more drawers than ten.
Because of our many links between documents, we really
don't want to renumber everything, so we switch to
base 16, and have 16 cabinets of 16 drawers, etc. This
lets us continue using standard arithmetic to locate
items -- 2nd cabinet, third drawer, fifth divider.
Eventually we fill that, so let's just raise the base
so we can have any number of anything. That's base
infinity. Now the rules of arithmetic get funny, because
how can you mathematically have a carry from a digit
when the digit, being potentially infinite, never overflows?
That's tumblers in a nutshell.
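As a minimal sketch of the idea (not the Udanax-Green representation), a tumbler can be modeled in Python as a tuple of unbounded integers. The names `parse_tumbler` and `fmt` are made up for this illustration, and ordinary lexicographic tuple comparison is assumed to stand in for tumbler ordering.

```python
def parse_tumbler(text):
    """Turn '2.3.5' into (2, 3, 5); each digit is an unbounded int."""
    return tuple(int(d) for d in text.split("."))

def fmt(tumbler):
    """Turn (2, 3, 5) back into '2.3.5'."""
    return ".".join(str(d) for d in tumbler)

# 2nd cabinet, third drawer, fifth divider
divider_5 = parse_tumbler("2.3.5")

# A digit never overflows, so there is no carry into the drawer digit
divider_5000 = parse_tumbler("2.3.5000")
assert divider_5000[:2] == divider_5[:2]

# New material can always be placed between two existing positions
# by extending the address with another digit (assumed ordering)
between = parse_tumbler("2.3.5.1")
assert divider_5 < between < parse_tumbler("2.3.6")
```

Because Python integers grow as needed, no renumbering is required when a drawer needs its 5000th divider.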
Inside Tip: The Udanax-Green software currently uses base
2 ** 32, storing tumblers as an array of 32-bit integers,
because of current CPU architectures. But when we go to
64-bit CPUs and 2 ** 64 integers, the scheme transitions
Nomenclature of the Enfilade:
An Enfilade is a collection of nodes called Crums, arranged in
a tree structure. The single node at the top of the tree is
the root, called the fullcrum. Nodes at the bottom are called,
naturally enough, BottomCrums. If a node is not a BottomCrum,
it is an UpperCrum. The root node is an UpperCrum.
Crums exist in two states: persistent (on disk) and volatile (in memory).
Each Crum has a dirty boolean and an age integer. All Crums are
linked into a global doubly-linked list. There is a reaping
process whereby Crums not accessed in a period of time are
removed from memory, being written to disk if they have been modified.
Crums on disk are addressed by a (block#, byte#) tuple, called
a DiskLoafAddr. Once assigned, Crums retain the same address
on disk and do not move about.
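The reaping scheme might be sketched roughly as follows. This is an illustration under assumptions, not the Udanax-Green code: `Crum`, `CrumCache`, `touch` and `reap` are hypothetical names, a dict stands in for the global doubly-linked list, another dict stands in for the disk, and it is assumed that only dirty Crums are written out before eviction.

```python
from dataclasses import dataclass

@dataclass
class Crum:
    disk_addr: tuple      # (block#, byte#); fixed once assigned
    payload: bytes = b""
    dirty: bool = False   # modified since last written to disk
    age: int = 0          # reaper passes since last access

class CrumCache:
    def __init__(self, max_age):
        self.max_age = max_age
        self.in_memory = {}   # disk_addr -> Crum (stand-in for the linked list)
        self.disk = {}        # disk_addr -> payload (stand-in for the disk)

    def touch(self, crum):
        """Record an access: reset the age and keep the Crum in memory."""
        crum.age = 0
        self.in_memory[crum.disk_addr] = crum

    def reap(self):
        """Age every Crum; evict stale ones, writing dirty ones first."""
        for addr in list(self.in_memory):
            crum = self.in_memory[addr]
            crum.age += 1
            if crum.age > self.max_age:
                if crum.dirty:
                    self.disk[addr] = crum.payload
                    crum.dirty = False
                del self.in_memory[addr]
```

A Crum touched between reaper passes keeps resetting its age and so stays in memory; one left alone for more than `max_age` passes is evicted to the same disk address it always had.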
Nature of their Mapping:
Enfilades are unidirectional, one-to-one mappings, where a
given key corresponds to a given datum. The choice of what
is a key and what is a datum defines three types of Enfilades:
* Grand (granf) I -> I
Maps coordinates in the IStream up to but not into the
VStream of a specific document. The VStream is represented
by either (a) a block of original text or (b) the root
node of a nested POOM enfilade describing a permutation of
the VStream of this and/or another document that when
rearranged, windows into this (position, width) gap in the
The Grand Enfilade is the starting point for front-ends
reaching into the docuverse.
Key: (a tumbler indicating a width along the IStream)
Datum: (a Crum containing a piece of original text) or
(a block#/byte# tuple pointing to a POOM enfilade)
NOTE: By summing the width keys of each Crum as we descend
thru the UpperCrums of the Grand Enfilade seeking a
position in the docuverse, we can derive our
displacement within the docuverse and know when we
There is only one Grand Enfilade in the Udanax-Green system.
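The width-summing descent mentioned in the NOTE can be sketched like this. It is a simplification under assumptions: plain integers replace tumblers, and `Upper`, `Bottom` and `seek` are hypothetical names, not identifiers from Udanax-Green.

```python
class Bottom:
    """A BottomCrum holding a piece of original text."""
    def __init__(self, text):
        self.text = text
        self.width = len(text)

class Upper:
    """An UpperCrum whose width key is the sum of its children's widths."""
    def __init__(self, children):
        self.children = children
        self.width = sum(c.width for c in children)

def seek(node, position, displacement=0):
    """Return (bottom_crum, offset_within_it) for an absolute position."""
    if isinstance(node, Bottom):
        return node, position - displacement
    for child in node.children:
        if position < displacement + child.width:
            return seek(child, position, displacement)
        displacement += child.width   # skip this subtree, keep the running sum
    raise IndexError("position lies beyond the enfilade")

root = Upper([Upper([Bottom("Hello, "), Bottom("docu")]), Bottom("verse")])
crum, offset = seek(root, 9)
print(crum.text[offset])
```

Only the widths along one root-to-leaf path are examined, so the lookup never has to visit every Crum.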
* Span (spanf) V -> V
Maps coordinates from the VStream to elsewhere in the VStream
Key: (a displacement/width tuple in the VStream)
Datum: (a displacement/width tuple elsewhere in the VStream)
There is only one Span Enfilade in the Udanax-Green system.
The Span Enfilade is used to answer the questions:
1) Given a range of characters on my screen, determine
where/in which other documents those characters also appear.
The spanf finds the pooms that map to the i-span.
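The kind of question the spanf answers could be illustrated with a flat index. This is only a sketch under assumptions: a list stands in for the enfilade tree, spans are integer (start, width) pairs rather than tumblers, and `SpanIndex`, `record`, `documents_containing` and the document ids are all invented for the example.

```python
def overlaps(a, b):
    """True if two (start, width) spans share at least one position."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

class SpanIndex:
    def __init__(self):
        self.entries = []   # (ispan, doc_id) pairs

    def record(self, ispan, doc_id):
        """Note that a document shows the given i-span."""
        self.entries.append((ispan, doc_id))

    def documents_containing(self, ispan):
        """List every document that shows any part of the i-span."""
        return sorted({doc for span, doc in self.entries
                       if overlaps(span, ispan)})

index = SpanIndex()
index.record((100, 50), "doc-A")   # doc-A shows I positions 100..149
index.record((120, 10), "doc-B")   # doc-B windows 120..129
index.record((300, 20), "doc-C")
print(index.documents_containing((125, 3)))
```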
* Poom (poom) [Permutation Of Order Matrix] I -> V
Maps coordinates from the VStream to the IStream
There are many POOM Enfilades in the Udanax-Green system,
one for every rearrangement of the document content in use.
Key: (an IStream displacement/width tuple)
Datum: (a VStream displacement/width tuple)
The POOM says, in effect: given that there is a specific
place and width of information in the docuverse, provide
a map of how that information appears, in a linear fashion
within that window, from the various pieces scattered
elsewhere in the docuverse.
NOTE: Since POOMs appear under BottomCrums in the Grand
Enfilade, you'd think a shorter relative IStream
would suffice. But by retaining the context of
fully-qualified IStream address in the key, a
single POOM can be linked into an arbitrary number
of Grand BottomCrums.
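A POOM's role as a permutation map can be sketched with a flat list of entries. This is an illustration under assumptions, not the on-disk structure: integers replace tumblers, each entry pairs an IStream displacement with a VStream displacement of the same width, and `PoomEntry`, `v_to_i` and `i_to_v` are hypothetical names. Lookups are shown in both directions.

```python
from typing import NamedTuple

class PoomEntry(NamedTuple):
    i_start: int   # displacement in the IStream
    v_start: int   # displacement in the VStream
    width: int     # same width in both streams

def v_to_i(poom, v_pos):
    """Find the IStream position shown at a VStream position, or None."""
    for e in poom:
        if e.v_start <= v_pos < e.v_start + e.width:
            return e.i_start + (v_pos - e.v_start)
    return None

def i_to_v(poom, i_pos):
    """Find every VStream position at which an IStream position appears."""
    return [e.v_start + (i_pos - e.i_start)
            for e in poom
            if e.i_start <= i_pos < e.i_start + e.width]

# The same five stored characters appear twice in this arrangement
poom = [PoomEntry(100, 0, 5), PoomEntry(40, 5, 3), PoomEntry(100, 8, 5)]
print(v_to_i(poom, 6), i_to_v(poom, 102))
```

Storing absolute IStream displacements in each entry is what lets one such table be reused from several places without adjustment.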
Enfilades rely upon two types of addresses:
* a tumbler address representing an unchanging or invariant
point in the docuverse, called an IStreamAddr. The IStream
is the sequence of original data as stored on disk, which
never changes ordering once written.
* a tumbler address representing a relative position within
a document, whose mapping to data shifts about, called a VStreamAddr.
An IStreamAddr is a fully-qualified tumbler broken into up
to four fields by zero (.0.) digits. If there are fewer than
four fields present, the fields are associated with their
IStreamAddrs are assigned to information or 'baptized' in a
hierarchical fashion, via the API operations:
* create node representing our server
* create account within a (node)
* create document within a (node, account)
* create version within a (node, account, document)
* insert/remove/rearrange text
IStreamAddrs are never deleted, retired or reused.
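Splitting an address into its fields at the zero digits might look like the sketch below. It rests on assumptions: `split_fields` and `FIELD_NAMES` are invented names, the four fields are taken to be node, account, document and element assigned left to right, and the sample address is purely illustrative.

```python
FIELD_NAMES = ("node", "account", "document", "element")

def split_fields(address):
    """Split a dotted tumbler into fields at each zero digit."""
    fields, current = [], []
    for digit in (int(d) for d in address.split(".")):
        if digit == 0:
            fields.append(tuple(current))   # a zero closes the current field
            current = []
        else:
            current.append(digit)
    fields.append(tuple(current))
    if len(fields) > len(FIELD_NAMES):
        raise ValueError("more than four fields")
    # Assumption: fields are assigned left to right when fewer than four
    return dict(zip(FIELD_NAMES, fields))

print(split_fields("1.1.0.1.0.1"))
```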
The Variant-Stream or VStream is the permutation of data that
represents a particular version. VStreamAddrs refer to
positions within a specific document. They are the <element> part
where the above portion of a tumbler is called the 'docid'.
The VStream of a document consists of several dataspaces, (a) any
of which may be empty and (b) which directly correspond to specific
types of BottomCrums in Udanax-Green.
1) VStream addresses beginning with a '1.n...' *may* represent
either original text entered into this specific document, *or*
text from another document that is *windowed* into this one
at a specific offset/width.
If it is original text, it is broken up into small pieces and
stored within TextCrums, on the bottom layer of the Grand Enfilade.
If it is windowed text, a (block#, byte#) tuple pointing to
the root of a POOM enfilade is stored in an OrglCrum.
[ORGL means 'link to original' ?]
2) VStream addresses beginning with a '2.n...' represent links
that reside within (have as their home document) this specific
document. Links may have multiple end-points, all or none of
which may point into this specific document.
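The two dataspaces can be told apart by the first digit of a VStream address, as in this small sketch. The function name `dataspace` and the returned labels are made up for illustration; only the meaning of the leading 1 and 2 comes from the notes above.

```python
def dataspace(vaddr):
    """Classify a dotted VStream address by its leading digit."""
    first = int(vaddr.split(".")[0])
    if first == 1:
        return "text"    # original or windowed text
    if first == 2:
        return "links"   # links whose home is this document
    return "unknown"

print(dataspace("1.17"), dataspace("2.3"))
```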
The home document is the document within which the data was originally
A vaddress is an intra-document reference, using the provided document
A vspan is a tumbler range within a single document. A vspan covering the
entire document, exclusive of links, is displacement 1.1, length 1.
A span is a tumbler range, using global or fully-specified tumbler
There are three transforms: I-to-V, V-to-I, and V-to-V.
(Clarify how sequential-versions and parallel-versions work.)
The default node is "1.1". (node 1)
The default account is "1.1.0.1". (account 1)
The default document is "1.1.0.1.0.1". (document 1)
Explanations of Design Choices:
* Why do IStreamAddr tumblers always start with a 1?
This "one" refers to the all-encompassing docuverse and derives from the
fact that all server nodes are descended from it. This convention allows
us to refer to the entire docuverse with simply a "one" in the first
position of a
* Why is the server node broken out as a separate field in an IStreamAddr?
??? so we can link to all documents on a specific server ??? why?
* Why is the account broken out as a separate field in an IStreamAddr?
So we can link to all documents owned by a group/subgroup.
* Why is the document number broken out as a separate field in an IStreamAddr?
This allows us to indicate daughter documents and versions easily.
It also allows us to make a link to all versions of a document. [ACTUALLY
OWNER-MADE VERSIONS, BUT NOT TO NON-OWNER-MADE VERSIONS]