
My Rough Notes re Udanax-Green



(I'm working up some diagrams in Gimp to go with this)

In the Beginning was the Docuverse:

The docuverse can be thought of as an infinite number line
with all the information it contains or ever will contain
strung out along the line.  Since the number line is
infinitely divisible, we can insert new material at any
point, similar to how real numbers work on a number line.
We can also refer to a portion of the information, whether
it exists today or will someday, using a (position, width)
tuple.
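A minimal sketch of the idea. It uses exact rationals (Python's fractions.Fraction) as a stand-in for the infinitely divisible line, and it treats a (position, width) span as half-open. Both choices are assumptions. The helper names between and span_contains are illustrative only and are not Udanax-Green routines.

```python
from fractions import Fraction


def between(a: Fraction, b: Fraction) -> Fraction:
    """Return a fresh position strictly between a and b (requires a < b)."""
    return (a + b) / 2


def span_contains(position: Fraction, width: Fraction, point: Fraction) -> bool:
    """True if point lies in the half-open span [position, position + width)."""
    return position <= point < position + width


# New material can go between two existing points without renumbering them.
p = between(Fraction(1), Fraction(2))
print(p)                                           # 3/2
print(span_contains(Fraction(1), Fraction(1), p))  # True
```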

Arbitrarily we limit the docuverse to occupying the range
of numbers from 1.<0> to 1.<infinity>, which allows us
the convenience of referencing the entire docuverse with
a span of (position 1, width 1) while not limiting
capacity in any real way.

Tumbling thru the Docuverse:

To represent positions along this docuverse line, we use
'tumblers', which are essentially base infinity real numbers,
where carries to/borrows from other digits don't occur.

Consider that if we used base 10, we could only have ten
positions, with ten positions inside each of those, etc.
So if we had, say, eight file cabinets, each with seven
drawers, each with five dividers, things would work out.

But as we store more information, the day comes when we
need more dividers than ten, or more drawers than ten.
Because of our many links between documents, we really
don't want to renumber everything, so we switch to
base 16, and have 16 cabinets of 16 drawers, etc.  This
lets us continue using standard arithmetic to locate
items -- 2nd cabinet, third drawer, fifth divider.
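To see why a fixed base eventually hurts, consider a hypothetical flat_address helper. It packs (cabinet, drawer, divider) into one ordinary number. Once a drawer number no longer fits in a single digit, it carries into the cabinet digit and collides with an address that already exists.

```python
def flat_address(cabinet: int, drawer: int, divider: int, base: int) -> int:
    """Encode a (cabinet, drawer, divider) location as one ordinary number."""
    return (cabinet * base + drawer) * base + divider


# Second cabinet, third drawer, fifth divider, in base 10:
print(flat_address(2, 3, 5, 10))                                # 235
# Drawer 10 does not fit in one base-10 digit, so it carries into the
# cabinet digit and lands on the same number as cabinet 2, drawer 0:
print(flat_address(1, 10, 0, 10) == flat_address(2, 0, 0, 10))  # True
```

Raising the base to 16 only postpones the collision. That is the motivation for the base-infinity step that follows.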

Eventually we fill that, so let's just raise the base
so we can have any number of anything.  That's base
infinity.  Now the rules of arithmetic get funny: how can
a digit carry into its neighbor when, being potentially
infinite, it never overflows?

That's tumblers in a nutshell.
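A sketch of the nutshell, under some assumptions. A tumbler is modelled here as a tuple of unbounded Python integers, written with dots between the digits and ordered digit by digit. Because no digit has an upper bound, incrementing one never carries into its neighbor. The names parse, fmt and bump are hypothetical.

```python
from typing import Tuple

Tumbler = Tuple[int, ...]  # each digit is an unbounded non-negative int


def parse(text: str) -> Tumbler:
    return tuple(int(d) for d in text.split("."))


def fmt(t: Tumbler) -> str:
    return ".".join(str(d) for d in t)


def bump(t: Tumbler, place: int) -> Tumbler:
    """Increment one digit; no carry ever propagates to its neighbors."""
    digits = list(t)
    digits[place] += 1
    return tuple(digits)


print(fmt(bump(parse("1.1.0.1.0.999999999999"), 5)))  # 1.1.0.1.0.1000000000000
print(parse("1.2") < parse("1.10"))  # True: compared digit by digit, not as decimals
```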

Inside Tip: The Udanax-Green software currently uses base
2 ** 32, storing tumblers as an array of 32-bit integers,
because of current CPU architectures.  But when we go to
64-bit CPUs and 2 ** 64 integers, the scheme transitions
transparently.
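One way such an array of 32-bit integers could be serialized, as a rough sketch. The byte layout here (little-endian words, no length header or exponent field) is an assumption for illustration and is not the actual Udanax-Green on-disk format. The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_tumbler(digits: Tuple[int, ...]) -> bytes:
    """Store each digit as a little-endian unsigned 32-bit word."""
    for d in digits:
        if not 0 <= d < 2 ** 32:
            raise OverflowError(f"digit {d} does not fit in 32 bits")
    return struct.pack(f"<{len(digits)}I", *digits)


def unpack_tumbler(raw: bytes) -> Tuple[int, ...]:
    """Inverse of pack_tumbler: four bytes per digit."""
    return struct.unpack(f"<{len(raw) // 4}I", raw)


raw = pack_tumbler((1, 1, 0, 1, 0, 1))
print(len(raw))             # 24
print(unpack_tumbler(raw))  # (1, 1, 0, 1, 0, 1)
```

Moving to 64-bit digits would mean changing the word size ("Q" instead of "I", eight bytes per digit). The dotted tumblers themselves would not change.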


Nomenclature of the Enfilade:

An Enfilade is a collection of nodes called Crums, arranged in
a tree structure.  The single node at the top of the tree is
the root, called the fullcrum.  Nodes at the bottom are called,
naturally enough, BottomCrums.  If a node is not a BottomCrum,
it is an UpperCrum.  The root node is an UpperCrum.
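A minimal sketch of this nomenclature. It assumes each Crum records its height, with height 0 marking a BottomCrum. That rule keeps the root an UpperCrum even before it has any children. The Crum class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Crum:
    height: int                                            # 0 marks a BottomCrum
    parent: Optional["Crum"] = field(default=None, repr=False)
    children: List["Crum"] = field(default_factory=list)

    def add_child(self, child: "Crum") -> "Crum":
        """Attach child one level below this crum."""
        if child.height != self.height - 1:
            raise ValueError("a child must sit exactly one level lower")
        child.parent = self
        self.children.append(child)
        return child

    @property
    def is_bottom(self) -> bool:
        return self.height == 0

    @property
    def is_upper(self) -> bool:
        return not self.is_bottom

    @property
    def is_fullcrum(self) -> bool:
        return self.parent is None   # only the root has no parent


fullcrum = Crum(height=1)
leaf = fullcrum.add_child(Crum(height=0))
print(fullcrum.is_fullcrum, fullcrum.is_upper, leaf.is_bottom)  # True True True
```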

Crums exist in two states: persistent (on disk) and volatile
(in ram).

Persistence Mechanism:

  Each Crum has a dirty boolean and an age integer.  All Crums are
  linked into a global doubly-linked list.  There is a reaping
  process whereby Crums not accessed in a period of time are
  removed from memory, being written to disk if they have been
  modified.

  Crums on disk are addressed by a (block#, byte#) tuple, called
  a DiskLoafAddr.  Once assigned, Crums retain the same address
  on disk and do not move about.
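A simplified sketch of the reaping process described above. It makes three assumptions. First, a plain Python list stands in for the global doubly-linked list. Second, "age" counts ticks since the last access. Third, each newly written Crum gets its own block at byte 0. Only the one-time assignment of the DiskLoafAddr follows the notes directly. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Optional


class DiskLoafAddr(NamedTuple):
    block: int  # block number on disk
    byte: int   # byte offset within the block


@dataclass(eq=False)
class CachedCrum:
    name: str                               # stand-in for the crum's contents
    dirty: bool = False                     # modified since last written?
    age: int = 0                            # ticks since last access
    disk_addr: Optional[DiskLoafAddr] = None


class CrumCache:
    def __init__(self) -> None:
        # A plain list stands in for the global doubly-linked list.
        self.in_memory: List[CachedCrum] = []
        self.disk: Dict[DiskLoafAddr, str] = {}
        self.next_block = 0

    def touch(self, crum: CachedCrum, modified: bool = False) -> None:
        """Record an access, bringing the crum into memory if needed."""
        crum.age = 0
        crum.dirty = crum.dirty or modified
        if crum not in self.in_memory:
            self.in_memory.append(crum)

    def tick(self) -> None:
        for crum in self.in_memory:
            crum.age += 1

    def reap(self, max_age: int) -> None:
        """Evict crums idle for max_age ticks, writing dirty ones first."""
        survivors = []
        for crum in self.in_memory:
            if crum.age < max_age:
                survivors.append(crum)
                continue
            if crum.dirty:
                if crum.disk_addr is None:
                    # Assigned once; later writes reuse the same address.
                    crum.disk_addr = DiskLoafAddr(self.next_block, 0)
                    self.next_block += 1
                self.disk[crum.disk_addr] = crum.name
                crum.dirty = False
        self.in_memory = survivors
```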

Nature of their Mapping:

  Enfilades are unidirectional, one-to-one mappings, where a
  given key corresponds to a given datum.  The choice of what
  is a key and what is a datum defines three types of Enfilades:

  * Grand (granf)    I -> I

    Maps coordinates in the IStream up to, but not into, the
    VStream of a specific document.  The VStream is represented
    by either (a) a block of original text or (b) the root
    node of a nested POOM enfilade, which describes how pieces
    of the VStream of this and/or other documents, once
    rearranged, window into this (position, width) gap in the
    IStream.

    The Grand Enfilade is the starting point for front-ends
    reaching into the docuverse.

    Key:
        (a tumbler indicating a width along the IStream)

    Datum:
        one of:

	(a Crum containing a piece of original text)

	or:

	(a block#/byte# tuple pointing to a POOM enfilade)

    NOTE: By summing the width keys of each Crum as we descend
          thru the UpperCrums of the Grand Enfilade seeking a
          position in the docuverse, we can derive our
          displacement within the docuverse and know when we
          have arrived.

    There is only one Grand Enfilade in the Udanax-Green system.
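A minimal sketch of the descent described in the NOTE above. Plain integer widths stand in for tumbler widths, and a hypothetical Node class stands in for Grand Enfilade crums. Summing the widths of the siblings we skip gives our displacement, and reaching a node with no children tells us we have arrived.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    width: int                                     # width of everything below
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None                     # bottom nodes only


def find(root: Node, position: int) -> Tuple[Node, int]:
    """Descend to the bottom node holding position; return it and the offset in it."""
    offset = 0          # displacement of the current node's start
    node = root
    while node.children:
        for child in node.children:
            if position < offset + child.width:
                node = child          # position falls inside this child
                break
            offset += child.width     # skip past this sibling
        else:
            raise IndexError("position lies beyond the enfilade")
    return node, position - offset


root = Node(7, [Node(3, [Node(3, text="abc")]),
                Node(4, [Node(2, text="de"), Node(2, text="fg")])])
leaf, offset = find(root, 5)
print(leaf.text[offset])  # f
```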

  * Span (spanf)  V -> V

    Maps coordinates from the VStream to elsewhere in the VStream

    Key:
        (a displacement/width tuple in the VStream)

    Datum:
        (a displacement/width tuple elsewhere in the VStream)

    There is only one Span Enfilade in the Udanax-Green system.

    The Span Enfilade is used to answer the questions:
    
    1) Given a range of characters on my screen, determine
       where/in which other documents those characters also appear.

    2) ???

    The spanf finds the pooms that map to the i-span.
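A rough sketch of question 1. It assumes the on-screen characters have already been converted to an IStream span, and it reduces the spanf to a list of (IStream span, document id) entries searched for overlaps. The real structure is an enfilade (a tree), not the linear scan used here. The Span and SpanIndex names are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Span(NamedTuple):
    start: int
    width: int

    def overlaps(self, other: "Span") -> bool:
        # Half-open ranges [start, start + width) share at least one point.
        return (self.start < other.start + other.width
                and other.start < self.start + self.width)


class SpanIndex:
    def __init__(self) -> None:
        self.entries: List[Tuple[Span, str]] = []

    def record(self, span: Span, docid: str) -> None:
        """Note that document docid shows the IStream content in span."""
        self.entries.append((span, docid))

    def documents_containing(self, span: Span) -> Set[str]:
        return {docid for s, docid in self.entries if s.overlaps(span)}


index = SpanIndex()
index.record(Span(100, 50), "1.1.0.1.0.1")
index.record(Span(120, 10), "1.1.0.1.0.2")
index.record(Span(300, 5), "1.1.0.1.0.3")
print(sorted(index.documents_containing(Span(125, 3))))
# ['1.1.0.1.0.1', '1.1.0.1.0.2']
```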

  * Poom (poom) [Permutation Of Order Matrix]   I -> V

    Maps coordinates from the IStream to the VStream

    There are many POOM Enfilades in the Udanax-Green system,
    one for every rearrangement of the document content in use.

    Key:
      (an IStream displacement/width tuple)

    Datum:
      (a VStream displacement/width tuple)

    The POOM says, in effect: given that there is a specific
    place and width of information in the docuverse, provide
    a map of how that information appears, in a linear fashion
    within that window, from the various pieces scattered
    elsewhere in the docuverse.

    NOTE: Since POOMs appear under BottomCrums in the Grand
          Enfilade, you'd think a shorter relative IStream
          would suffice.  But by retaining the context of a
          fully-qualified IStream address in the key, a
          single POOM can be linked into an arbitrary number
          of Grand BottomCrums.
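A minimal sketch of a POOM lookup, under stated assumptions. Each bottom entry pairs an IStream run with a VStream run of the same width, and plain integers replace tumblers. PoomEntry and i_to_v are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class PoomEntry(NamedTuple):
    i_start: int   # key: IStream displacement
    v_start: int   # datum: VStream displacement
    width: int     # same width on both sides


def i_to_v(entries: List[PoomEntry], i_addr: int) -> Optional[int]:
    """Return where IStream position i_addr appears in the VStream, if at all."""
    for e in entries:
        if e.i_start <= i_addr < e.i_start + e.width:
            return e.v_start + (i_addr - e.i_start)
    return None


# IStream 100..110 holds "hello world"; this document shows "world hello".
entries = [
    PoomEntry(i_start=106, v_start=1, width=5),   # "world"
    PoomEntry(i_start=105, v_start=6, width=1),   # " "
    PoomEntry(i_start=100, v_start=7, width=5),   # "hello"
]
print(i_to_v(entries, 102), i_to_v(entries, 108))  # 9 3
```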


Coordinate Spaces:

  Enfilades rely upon two types of addresses:

  * a tumbler address representing an unchanging or invariant
    point in the docuverse, called an IStreamAddr.  The IStream
    is the sequence of original data as stored on disk, which
    never changes ordering once written.

  * a tumbler address representing a relative position within
    a document, whose mapping to data shifts about, called a
    VStreamAddr.

  The IStream

  An IStreamAddr is a fully-qualified tumbler broken into up
  to four fields by zero (.0.) digits.  If there are fewer than
  four fields present, the fields are associated with their
  functions left-to-right.

     <node>.0.<account>.0.<document>.0.<element>

     <node>.0.<account>.0.<document>

     <node>.0.<account>

     <node>
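A sketch of splitting an IStreamAddr into its fields. It assumes the address is written as dotted decimal digits and that a single zero digit always separates two fields. Fields are assigned to their functions left to right, as described above. split_fields is a hypothetical helper.

```python
from typing import Dict, List

FIELD_NAMES = ("node", "account", "document", "element")


def split_fields(tumbler: str) -> Dict[str, str]:
    """Split a dotted IStreamAddr at its zero digits into named fields."""
    fields: List[List[int]] = [[]]
    for digit in (int(d) for d in tumbler.split(".")):
        if digit == 0:
            fields.append([])          # a zero digit starts the next field
        else:
            fields[-1].append(digit)
    if len(fields) > len(FIELD_NAMES):
        raise ValueError(f"too many fields in {tumbler}")
    # Fields are assigned to their functions left to right.
    return {name: ".".join(str(d) for d in f)
            for name, f in zip(FIELD_NAMES, fields)}


print(split_fields("1.1.0.1.0.1"))
# {'node': '1.1', 'account': '1', 'document': '1'}
```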

  IStreamAddrs are assigned to information or 'baptized' in a
  hierarchical fashion, via the API operations:
  
  * create node representing our server
  * create account within a (node)
  * create document within a (node, account)
  * create version within a (node, account, document)
  * insert/remove/rearrange text

  IStreamAddrs are never deleted, retired or reused.
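A sketch of hierarchical baptism, under stated assumptions. A new account or document opens a new field, appended after a ".0.". A new server node adds one more digit to the docuverse's "1". Counters only ever increase, so no address is handed out twice. The Baptizer class is hypothetical.

```python
from typing import Dict, Tuple


class Baptizer:
    """Hands out child addresses under a parent and never reuses one."""

    def __init__(self) -> None:
        # Last number issued per (parent, opens_new_field) pair.
        self.last_issued: Dict[Tuple[str, bool], int] = {}

    def baptize(self, parent: str, opens_new_field: bool) -> str:
        key = (parent, opens_new_field)
        n = self.last_issued.get(key, 0) + 1
        self.last_issued[key] = n
        separator = ".0." if opens_new_field else "."
        return f"{parent}{separator}{n}"


b = Baptizer()
node = b.baptize("1", opens_new_field=False)          # server node
account = b.baptize(node, opens_new_field=True)       # account within node
document = b.baptize(account, opens_new_field=True)   # document within account
print(node, account, document)  # 1.1 1.1.0.1 1.1.0.1.0.1
```

The first addresses it issues line up with the default node, account, and document listed further down.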

  The VStream

  The Variant-Stream or VStream is the permutation of data that
  represents a particular version.  VStreamAddrs refer to
  positions within a specific document.  They are the <element> part
  that follows:

      <node>.0.<account>.0.<document>

  where the above portion of a tumbler is called the 'docid'.

  The VStream of a document consists of several dataspaces, any of
  which may be empty and each of which corresponds directly to a
  specific type of BottomCrum in Udanax-Green.

  1) VStream addresses beginning with a '1.n...' *may* represent
     either original text entered into this specific document, *or*
     text from another document that is *windowed* into this one
     at a specific offset/width.

     If it is original text, it is broken up into small pieces and
     stored within TextCrums, on the bottom layer of the Grand
     Enfilade.

     If it is windowed text, a (block#, byte#) tuple pointing to
     the root of a POOM enfilade is stored in an OrglCrum.

     [ORGL means 'link to original' ?]

  2) VStream addresses beginning with a '2.n...' represent links
     that reside within (have as their home document) this specific
     document.  Links may have multiple end-points, some, all, or none
     of which may point into this specific document.
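A sketch tying the docid and the two dataspaces together. It assumes a full address is dotted decimal, that the <element> part begins after the third zero digit, and that the first element digit alone selects the dataspace (1 for text, 2 for links). split_docid and vstream_space are hypothetical helpers.

```python
from typing import Tuple


def split_docid(addr: str) -> Tuple[str, str]:
    """Split a full tumbler into (docid, element) at its third zero digit."""
    digits = addr.split(".")
    zeros = [i for i, d in enumerate(digits) if d == "0"]
    if len(zeros) < 3:
        raise ValueError(f"{addr} has no <element> field")
    cut = zeros[2]
    return ".".join(digits[:cut]), ".".join(digits[cut + 1:])


def vstream_space(element: str) -> str:
    """Name the dataspace an element address falls in, by its first digit."""
    first = int(element.split(".")[0])
    if first == 1:
        return "text"   # original or windowed text
    if first == 2:
        return "link"   # links homed in this document
    raise ValueError(f"no dataspace described for {element}")


docid, element = split_docid("1.1.0.1.0.1.0.2.5")
print(docid, element, vstream_space(element))  # 1.1.0.1.0.1 2.5 link
```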


Glossary:

The home document is the document within which the data was originally
stored.

A vaddress is an intra-document reference, using the provided document
ID.
A vspan is a tumbler range within a single document.  A vspan to cover
an entire document, exclusive of links, is displacement 1.1, length 1.
A span is a tumbler range using global or fully-specified tumbler
fields.

There are three transforms: I-to-V, V-to-I, and V-to-V.


(Clarify how sequential-versions and parallel-versions work.)



The default node is     "1.1".          (node 1)
The default account is  "1.1.0.1".      (account 1)
The default document is "1.1.0.1.0.1".  (document 1)


Explanations of Design Choices:

* Why do IStreamAddr tumblers always start with a 1?

  This "one" refers to the all-encompassing docuverse and derives from
  the fact that all server nodes are descended from it.  This convention
  allows us to refer to the entire docuverse with simply a "one" in the
  first position of a tumbler.

* Why is the server node broken out as a separate field in an
  IStreamAddr tumbler?

  ??? so we can link to all documents on a specific server ??? why?

* Why is the account broken out as a separate field in an
  IStreamAddr tumbler?

  So we can link to all documents owned by a group/subgroup.

* Why is the document number broken out as a separate field in an
  IStreamAddr tumbler?

  This allows us to indicate daughter documents and versions easily.
  This convention allows us to make a link to all versions of a
  document.  [ACTUALLY ONLY TO OWNER-MADE VERSIONS, BUT NOT TO
  NON-OWNER-MADE VERSIONS]