[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

Version Compare/Merge: Remaining Issues

Interesting issues include:

- How do we display nested movements in our 
compare/merge window?
- How do we deal with vcopies?
- Under what circumstances do we get a result that is 
not heuristically optimal, and what should we do about it?
- What other enhancements/alterations may be 
- Finally, what other kinds of comparison information 
might be interesting in other comparison displays? All we 
have discussed so far is focused strictly on creating the 
Release 1 Tapestry comparison. There is another comparison 
that interests me: the Quick Compare. A separate message 
documenting that, while it is fresh in my mind, will follow 
another day.

Meanwhile, here are discussions of the other issues:


Displaying nested movements is a moderately 
difficult problem. I did not think it through clearly 
enough at Capabilities Review time (as dean said at the 
time, what all did marcs NOT think of? :-) The issue is 
that with nested movement, a specific chunk of moved 
material may have been moved several times, i.e., there 
are several movement locations, and the particular place 
you click may be a part of several different movement 
groups (for example, if you click on a sentence that was 
moved inside a paragraph that was itself moved to a new 
chapter, if you click on the sentence, there are 2 
moved-blocks and 2 moved locations associated with it). 
There are several approaches; there is no risk that we 
cannot solve this problem; but it is not yet solved. I 
will be thinking about it.


Vcopies can be dealt with more or less the same way 
we deal with insertions and deletions. The most bizarre 
case is when there are several copies in the original 
and several in the final. The following algorithm will 
work well enough for a first go:

     - Map as many copies one-to-one from the 
original onto the final as possible. A pair of copies 
should be mapped onto each other if they are "near" 
each other in the working document: here, "nearness" 
is a count of the number of movement runs separating 
them in the working document. Map the 2 nearest 
vcopies (one from the original, one from the final) 
onto each other, repeatedly, until you run out of 
vcopies in one or the other.
     -If there are left over copies in the 
original, treat them as deletions. If there are left 
over copies in the final, treat them as insertions.


Under what circumstances do we get poor results? The 
simplest case is a 5-part group of movement runs with no 
reverses. This is not as common an event as it might 
sound: remember, a run only qualifies as a movement run 
if it is not properly adjacent to EITHER of its 
neighbors. If you throw out reverses as movements, you 
now need to move ALL the movement runs large distances, 
being careful to ensure that no original neighbors wind 
up next to each other at the far end (else, they were 
merely reversed).

The simplest case where this occurs is in the 
transformation from "A B C D E" to "B E C A D". In this 
case, we would say that A was moved between C and D. 
The algorithm has a 50% chance of making an error, and 
saying that D was moved behind A, then C was moved in 
front of A.

This can be characterized as an error in 
lookahead: without looking ahead to future 
transformations, the anchor method, in its fixation on 
finding a movement that makes a small object adjacent 
to a large object, may not notice that it could save 
an extra movement later on by moving the large object 
between the two small ones that will eventually 
surround it. I believe that we should implement this 
extra bit of lookahead, so that this 5-part-rearrange 
works correctly.

There is a more general version of this problem. 
To guarantee heuristically optimal results, we would 
have to search the tree of alternative movement 
sequences. Though there is a warm place in my heart 
for the alpha-beta minimax algorithm needed to do this 
efficiently, I do not believe it is worth the effort 
for Release 1. With just a one-ply lookahead, our 
system will, I believe, create simpler transformations 
than normal humans would. 'Tis sufficient. 


I may have performed overkill in defining the size 
of a composite run to have 2 parts. We may want to go 
back to defining it to be just the sum of the sizes of 
the movement runs it contains. This will increase the 
amount of nesting, but decrease the motion of large 

The algorithm for composing movement runs in the 
working document does NOT guarantee the maximum 
separation of size, into the very largest and very 
smallest objects. It can fail for reasons of lookahead 
as well. If we get counterintuitive results (too many 
insertions and deletions being moved around), we can 
enhance this as well. 

Intuition suggests that, if objects are sufficiently 
different in size, it is easier to understand moving two 
small objects than to move one large object. 
Consequently the improvement to the anchor method I 
propose above, to move the larger object between the two 
smaller objects to save a move, will sometimes produce 
heuristically better results that are unfortunately 
counterintuitive. My personal experiments suggest that 
this effect starts to appear when the anchor is larger 
than the sum of the sizes of its properly adjacent 
neighbors. We may wind up implementing this during Beta 
as well. 

With a crew of critics like Xanadu, I'm sure someone 
will come up with even more amusing instances where this 
algorithm does not create the most intuitive 
representation...I look forward with some trepidation to 
seeing the counterexamples.


P.S. This is still no where near being the most important thing
we have to think about. Absolutely everything else that 
people are working on is, in fact, more important.