Feb 2005 (incomplete)
I am writing this document in order to help myself understand the versioning system I need to make for an optimizing compiler.
This document does not cover patch theory. Patch theory needs to know the topology of the versions it acts on; what a patch means to act on that topology.
Version |
A "version" is a fixed set of objects and variables. |
|---|---|
Empty Version |
An "Empty Version" is a version that has no objects and no variables defined within it. |
Patch |
A "patch" relates two versions (one called the former and one called the latter) and indicates "the former was used a prototype for the latter". Patches exist for humans to understand the high-level relationship between versions, and for machines to efficiently store version information. Technically, a patch can be made to transform any version to any other. Patches are objects, and often store managment information like who made patched and for what reason. |
Environment |
A set versions in a historical sequence. |
By identifying the features and relationships that can exist between versions, I can make a framework/API for trying different strategies. This nessesarily includes the ability to perform transactions and isolate appplication packages.
The differences between any two versions can be grouped into three aspects:
The ability to isolate packages is the most important for the particular series of optimizations I want to implement. I need to identify static and dynamic portions of the code AND DATA to effectivly perform standard optimization techniques. Furthermore, I want the optimizer to be able to identify these static portions on it's own using hints from the programmer.
I will use CVS, where I can, to provide examples.
Versioning can be seperated into four main categories:
The first three categories of versioning are similar to the three different features of a Directed Acyclic Graph (DAG); where verticies correspond to versions, and the edges correspond to "patches". I will use a graph analogy to summarize these first three.
The fourth category allows us to use second order functions in enclosed environments so that an optimizing compiler can take advantage of more optimization opportunities.
Historical versioning consists of two versions related by a single patch. Intuitivly see why this pattern is called historical.
Temporal Versioning
Temporal versioning is a special type of historical versioning that relates time to the patches between versions. Temporal versioning usually uses a timestamp attribute to replace the need for explicit former/latter attributes. The existence of all three attributes is redundant because of the functional dependency: TimeStamp ->> (former, latter).
Temporal versioning can be further broken into proactive, or reactive; using the time-commenced or time-terminated for it's timestamp respectivly. The particular choice if temporal versioning is usually determined by the endpoints of the problem since they are equivelent representations. Notice that using both values is redundant.
Environments encapsulate Historical Versioning
Since versions are static entities, they are almost useless in a proceedural environment. This is why we define an "environment" as a set of versions in a historical sequence.
Environments encapsulate the managment of versions and patches inside thier domain. The details of how patches are managed are isolated from the rest of an application, while exposing an interface that allows object creation/deletion and changing of variable values.
Logical versioning is can be viewed as one version with many (usually orthogonal) patches diverging from it.
In the case of CVS, this is eqivelent to multiple programmers checking out, and changing, a file in repository. A versioning system must be able to handle the apparent inconsistency between the logical versions.
Inspecting the different aspects between two latter versions are instructive here. In a high level way, we can extract programmer intent by seeing how the latter versions differ. If the differences are orthogonal, then we can assume each version represents it's own resource. If the differences are collisional, then we can assume the versions represent different senarios/solutions to the same problem.
Compositive versioning merges versions to a single version.
CVS does two things when integrating files. First it identifies the patches made to each, verifying conflcts that may occur. Second the patches are applied to the original. A general versioning system should copy this 2-stepped procedure, and provide interfaces for context managers to manage those conflicts and provide the rules for merging.
We can further divide compositive versioning into 3 subcategorys; one for each aspect of difference.
The handling of collisional differences is probably the most useful and complex feature in any versioning system. So much so, that any application implementing it, will be known for it. CVS is characterized by it's ability to handle collisional differences in cooperative versioning; yet CVS' domain is quite simple!
Descriptive versioning involves reifying versions and thier contents in a larger context. A version who's objects and variables describe other versions is called a meta-version. Meta-versions are needed by second order methods, like compliers.
A meta-version that describes itself is called a reflective version.
By declaring what versions a particular set of code is allowed to act on, the compiler is better able to deduce optimization strategies.
A meta-version maps object references in the child version to objects in the meta-version. When there are more than one child versions, then this mapping will define object equality between those child versions.
Reflective Versioning
Reflective versions require a level of consistency that makes them diffucult to construct. A well-founded tower of meta-versions