Apr 2000
Spaces
A Model of Candidate Keys
The nature of candidate keys is the unique restriction on the domain of the attribute values in a table. Namely, the tuple that makes up the key is unique. Candidate keys are also important in the definition of Domain-Key Normal Form, a form that has been proven to be at least stronger than fifth normal form. The declaration of a candidate key is defined by its set of attributes.
The DBOS models candidate keys using Spaces. Spaces define the unique tuple definition of the candidate keys. Spaces are necessary in locating objects in recursively infinite sets, for use in aggregation queries, and assigning attributes to tuples.
The Space is similar to a table with certain constraints. These constraints add meaning that make Spaces an important descriptive structure. Space functionality can be described as the ability to map multiple dimensions to a single dimension.
Spaces and Infinitely Recursive Sets
One problem with trying to describe everything in a model of self is the recursive aspect. For example, the DBOS uses the concept of a variable. Variables have two attributes, Object and Field. We can see that for any object and field combination we have a value (possibly null if the variable does not exist). Given any particular variable object, it too has variables. If every variable was to be in the database, there would be an infinite number of them. It is obvious that an infinite number of variable objects can not be stored, so our implementation of the model has to make it appear as if any infinity of variables exist.
With an infinity of variables in our model, we can not allocate enough key space to them; key space is intrinsically finite. Any block of keys we do assign would be finite, and there would be no guarantee that the system would not require more. It is also a waste to allocate keys to objects that are not needed. Given the objects in a physical DBOS, the Object/Field pairs over all objects and fields identifies all the variables directly known to the system. The number of these variables is finite. So, by allowing the system to refer to variables exclusively by their Object/Field key allows the implementation to store a finite number of variables at any given time. Only those variables referenced by primary key need to be stored explicitly. Variables requested by the system, via Object/Field key, are given a temporary primary key until such time they are no longer directly referenced.
The system must manage a mapping from the primary key of the variable to the Object/Field key pair that also identifies the variable uniquely. The primary key is needed to manage variables of variables to an arbitrary recursive depth. The existence of the primary key for variables is congruent to the model used in the DBOS; all objects have a single attribute primary key.
Variable Table
Spaces and Aggregation
When aggregation is performed, with the use of GROUPBY, the results must be stored as attribute values in objects. The structure of these objects are all similar, and defined by a class. The big issue is what class do these attributes belong?
We will give an example and assume that we have a table of people's children. Each row contains the Key of a person and the key their child. Multiple children will result in multiple rows in the Person table for the same parent.
PersonKey
Count(ChildKey)
FROM
Person
GROUPBY
PersonKey
;
Every attribute must belong to a class, therefore, the count of the children are best seen as an attribute of the person. Actually the count attribute could belong to a set that is isomorphic to the set of people, given a parent() operator. The distinction is not apparent until more than one column is grouped.
Here is an example that does not double-count children, a table that contains the two biological parents, and the child conceived. The count of the children for all pairs of people is:
MotherKey
FatherKey?
Count(ChildKey)
FROM
Person
GROUPBY
MotherKey
FatherKey
;
The child count does not belong to either parent, it belongs to the unique combination of father and mother. The Father/Mother pair is a candidate key for the child count attribute.
Technically all variables can be seen as belonging to a point in a space, not a class. This is just a generalization of what was assumed before; every object exists as a point in object space. From above, grouping by just a single attribute, we mentioned the difference between the generated attribute belonging to the grouped by class and belonging to something isomorphic. We can finish our exposition with the statement: The space of objects is isomorphic to the set of objects under all operations.
The Formalism of Spaces
We notice that any {primary key, n-tuple key} set definition is isomorphic to an n-dimensional matrix, where cell values consists of primary keys. This matrix is called a Space. Consider our example, a Variable Space:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We can see all three variables in the Space. We also see all the possible variables that can exist. As mentioned above: The matrix can never be full because it would be infinitely large; the order of the Object dimension is dependent on the number of variables in existence.
We can formalize the use of spaces by introducing nomenclature and identifying valid operation on this nomenclature.
Definition:
A Space is an explicit definition of a bijective mapping from a Cartesian products of sets to a set.
The formal theory of spaces begins in the investigation of the existence of explicit bijective functions between sets. In other words, if we know of the existence of a bijection, f, such that f : A -> B then we write "A = B". The choice of "=" for the notation is intuitive because f-1 is also a bijection and, "B = A". We take, as given, that the identity bijection always exists: A = A.
Spaces are maps between Cartesian products and abstract sets. The names of the sets used in the product are put in curly braces, separated by commas. For example, we write the existence of a bijection g: A x B -> C as: "C = {A, B}" or deductively as: "C = {B, A}". In the case of a Cartesian product of a set with itself, we subscript the set names for clarity:
C = (A1, A2} and C = {A2, A1}
The nomenclature being introduced allows associativity.
Definition:
If A = {E, D} and E = {B, C} then it is equivalent to write A = {{B, C}, D}.
Theorem:
{{B, C}, D} = {B, C, D} = {B, {C, D}}
The proofs required for all the results posed are elementary, and have been removed for brevity.
The nomenclature introduced will help in the formal deduction of the dependencies between the various field that make up the candidate keys. Click here to see an example of this notation in use.
Spaces and Functional Dependencies
The consequences of having functional dependencies on data allows for the definition of natural bijections.
If A ->> B
then, by definition,
f : A -> B
such that
b1
b2
B,
a1
a2
A
where f(a1) = b1
and f(a2) = b2. The existence of this f allows use to conclude that the mapping g : A x B -> A, defined by g(a, b) = a, is a bijection. We call g
a natural bijection. In other words: The existence of the functional dependency A ->> B
allows us to write A = {A, B}.
We can see an example of this. We know that for any field, there exists only one class that field belongs. We note that this constraint defines Field ->> Class. Therefore Field = {Field, Class}.
Summary
Spaces are just a name given to the model of candidate keys. Their extensive use in DB theory raises this concept above all other constraints. Their simplicity wrt there usefulness makes them an intuitive, and therefore powerful, tool.