May 2002
References, Symbols and Quoting
Overview
This document covers some of the basic questions about what references and symbols are, and how they should be implemented in a reflective architecture.
References
Defining "reference" is difficult. References lack structure and appear fundamental in many respects. The problem is that communicating a reference requires form and context, and this form and context cannot be untangled from the concept of referencing. Trying to define referencing therefore only results in the definition of another type of object, and causes us to lose the original notion of a reference. We leave the definition of reference to the philosophers and make no attempt to construct one. I will, instead, construct a definition of symbolism.
References are communicated through the use of symbols. Here I am liberal with the use of the word "symbol". A symbol can take the form of a glyph, a picture, a series of sounds, a number, or source code. This paper is interested in the definition of symbolism.
Symbols
Symbols are concrete objects that can be observed, discussed and defined. It is the ability to make a reference to a symbol that makes symbolism so powerful. The definition of symbolism will be powerful enough to cover all our worldly needs.
Symbols are objects that have three important attributes. The first is that they can be perceived, so their description can be transmitted. The second is the context that surrounds their interpretation. The third, and most obvious, is that they represent another object. We will represent these three aspects as an ordered triple.
(Symbol, Context, Value)
If we want to remove the need to mention the context of symbols, we must use a single context where all symbols unambiguously define unique objects. We can construct another symbol that, no matter the context, always represents the same Value. This will be called the context free symbolism:
(uSymbol, Value)
The "u" prefix is meant to stand for unique.
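The two forms above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names interpret and make_usymbol, the sample data, and the choice of folding the context into the symbol as a path-like string are all hypothetical.

```python
# Context-dependent symbolism: (Symbol, Context, Value) triples.
# The same symbol may denote different values in different contexts.
triples = [
    ("Spot", "my household", "my dog"),
    ("Spot", "laundry", "a stain"),
]

def interpret(symbol, context):
    """Return the value a symbol denotes within one context."""
    for s, c, v in triples:
        if s == symbol and c == context:
            return v
    raise KeyError((symbol, context))

# Context-free symbolism: fold the context into the symbol itself,
# so each uSymbol denotes exactly one value: (uSymbol, Value).
def make_usymbol(symbol, context):
    return f"{context}/{symbol}"

usymbols = {make_usymbol(s, c): v for s, c, v in triples}
```

Looking up "Spot" requires a context, while a key of usymbols can be looked up with no further information.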
Now we can understand why defining referencing to be the same as symbolism would force us to lose the original notion of a reference. Symbolism's ordered pair, sometimes called an arrow, is actually a symbol itself: a symbol of a reference. The best that can be done is to make another symbol to represent this arrow.
Quoting
Quoting is a textual method of referencing text, and a specialization of symbolism. Syntactically, quoting defines a reference to a textual object by using a symbol that is the referenced text with quotes added.
A := "My dog Spot"
B := My dog Spot
We see here that B refers to an animal: my dog Spot. Note that this textual symbol needs a lot of context in order to reference an animal uniquely. A refers to B. Therefore A does not refer to the dog, nor does it refer to the context required to interpret B. A still requires, at least, a language context.
(B, lots of context, my dog Spot)
(A, language context, B)
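A small Python sketch of the two triples above. The world mapping stands in for "lots of context" and is purely hypothetical, as are the function names unquote and denote.

```python
# Hypothetical stand-in for the rich context needed to interpret B.
world = {"My dog Spot": "<the animal Spot>"}

B = "My dog Spot"      # the text itself
A = '"' + B + '"'      # the quoted symbol: the same text wrapped in quotes

def unquote(symbol):
    """Needs only a language context: strip one level of quotes."""
    if len(symbol) >= 2 and symbol[0] == symbol[-1] == '"':
        return symbol[1:-1]
    raise ValueError("not a quoted symbol")

def denote(text):
    """Needs lots of context: look the text up in the world."""
    return world[text]
```

Here unquote(A) yields B without consulting the world at all, while denote(B) cannot work without it.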
Symbolism in Computing
Everything manipulated by a machine must be explicit. Therefore machine references use symbolism to achieve referencing.
Source Code
In computing, source code can be quoted. Given a particular language, one can refer to a function by defining it completely in source code. I would like you to read The Will and the Word for a better exposition of the distinction between a function and its description.
Another method of symbolizing a function is by using a name. We can equate a name to a function by referring to the function we clearly specified with source code. Depending on the context of this name, the name can represent the function (when making a function call) or the source code (when applying reflection). These two aspects of a function can be separated by providing a new uSymbol for each aspect.
By assigning a name to a function we can have a more concise reference to a function. This conciseness, specifically this finiteness, is necessary for recursive functions.
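The following Python sketch illustrates the two aspects of a name and why a finite name matters for recursion. The registry and its "#function" and "#source" suffixes are my own hypothetical convention for the two uSymbols, not an established scheme.

```python
# Source text that fully specifies a function (a "quoted" function).
SOURCE = """
def factorial(n):
    # The body refers to itself by name; without a finite name the
    # definition would have to contain itself endlessly.
    return 1 if n <= 1 else n * factorial(n - 1)
"""

namespace = {}
exec(SOURCE, namespace)  # interpret the source in a language context

# One uSymbol per aspect of the same name (hypothetical naming).
registry = {
    "factorial#function": namespace["factorial"],  # used when calling
    "factorial#source": SOURCE,                    # used when reflecting
}
```

Calling goes through registry["factorial#function"], while reflection reads registry["factorial#source"]; neither lookup depends on how the name is being used.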
Namespaces
Namespaces are a method of referencing for use by humans; machines only need a simple UID system. Namespaces are talked about often enough that they deserve mention here. Simply put, namespaces are equivalent to contexts. With this insight we can see that name collisions are a natural side effect of allowing the same symbol to refer to different objects. The problem is that the scope of the namespace is not sufficient to handle the scope of the contexts that are required to interpret the symbol correctly. Function overloading is one method that is used to extend the namespace context to a context capable of determining unique references.
Knowing that multiple contexts are allowed to exist also hints at the fact that multiple namespaces are allowed to exist. Additional namespaces can be used to resolve ambiguity of reference. Additional namespaces can also allow different programmers to see different names for the same object. This can be useful, for example, when short names are preferred during development and long names are preferred in the final release. IDEs should really provide namespace selection and editing to reduce programmer keystrokes.
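As a rough illustration of namespaces as contexts layered over machine UIDs, consider this Python sketch. All names, UIDs, and placeholder objects are invented for the example.

```python
# Machine level: objects are identified by simple UIDs.
objects = {101: "<matrix multiply routine>", 102: "<string repeat routine>"}

# Human level: each namespace is a context mapping names to UIDs.
namespaces = {
    "linalg": {"mul": 101},
    "text": {"mul": 102},        # same name, different object
    "dev_short": {"mm": 101},    # a second, shorter name for the same object
}

def resolve(name, namespace):
    """Resolve a human-facing name to an object via its UID."""
    return objects[namespaces[namespace][name]]

def collisions(name):
    """List the namespaces in which a bare name is defined."""
    return sorted(ns for ns, names in namespaces.items() if name in names)
```

The bare name "mul" collides across two namespaces, yet each (name, namespace) pair resolves to exactly one object, and "mm" shows two programmers seeing different names for the same UID.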
RDF Triples
The RDF project is an attempt to provide semantic information about a multitude of knowledge domains by providing a standard encoding scheme. RDF triples define a typed arrow system containing nodes (objects) and typed arrows (triples). We say that RDF is a typed arrow system because the triple, usually described as (Subject, Predicate, Object), easily relates to the graphic arrow (Tail, Type, Head). Some of the shortcomings of RDF are a lack of reflection (namely on the triples themselves*) and a needless complexity (typing arrows is not necessary). Nevertheless, RDF can easily be converted to a better canonical form, preserving all the important descriptive information provided by ontological domains.
Pointers
Let us consider C pointers as another example of machine referencing. The context of the pointer is entwined in the context of the machine's electrical and physical state (the state of the program). The Value referenced is a function (F0) of the pointer bits and the machine context.
Value = F0(Symbol=Bits, Context=MachineContext)
Abstractly combining the symbol and context to create a uSymbol might be possible in theory, but is very likely impossible to do in reality. The machine context is effectively infinitely complex because it is part of an unpredictable universe. What we do instead is define a uniform context that contains the complexities of the machine context. Then we can break up every machine context into the uniform context plus some other, simpler part. By making the uniform context implicit to the machine, we avoid having to state it. Then we can combine the bits with the simpler portion to get a single uSymbol.
Value = F0(Symbol=Bits, Context=SpecificContext+CommonContext)
Value = F1(uSymbol=Bits+SpecificContext, Context=CommonContext)
Leaving the CommonContext implied we have
Value = F(uSymbol=Bits+SpecificContext)
This means we can create a system that has a unique symbol for everything it references and needs to know nothing about the machine it resides upon. We effectively achieve context free symbolism. Of course, we already knew this is possible because many of these systems exist already.
The concept of a uSymbol is common in database design, where a UID is assigned to every record in a table. Popular procedural languages do the same, establishing a relationship between the pointer and the location of an object descriptor. Object references are similar to pointers, but remove complication by removing the unnecessary computer memory concept. Object references are internally managed (uSymbol, Value) arrows.
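The refactoring from F0 to F1 to F can be sketched in Python as follows. The toy memory table, with one address space per process, is a hypothetical stand-in for the common context; real machine context is of course far richer.

```python
# Hypothetical common context: one address space per process.
memory = {
    "proc1": {0x10: "apple"},
    "proc2": {0x10: "banana"},   # same bits, different specific context
}

def f0(bits, context):
    """Value = F0(Symbol=Bits, Context=SpecificContext+CommonContext)."""
    specific, common = context
    return common[specific][bits]

def f1(usymbol, common):
    """Value = F1(uSymbol=Bits+SpecificContext, Context=CommonContext)."""
    bits, specific = usymbol
    return f0(bits, (specific, common))

def f(usymbol):
    """Value = F(uSymbol): the common context is left implicit."""
    return f1(usymbol, memory)
```

The bits 0x10 alone are ambiguous, but the pair (0x10, "proc1") is a uSymbol: f needs nothing else from its caller.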
Object references are sufficiently abstract to lead us into a discussion on reflective machine symbolism.
Symbolism in a Reflective Environment
A reflective environment requires that the act of symbolism be reified so that it can be reflected upon. In other words, the function F described above must be available for inspection. We do this by representing F as a set of (uSymbol, Value) pairs.
Every one of these pairs must be available to the reflective system. This is done by giving every pair its own uSymbol; we call this uSymbol the ID so we do not get it confused with the symbolism it represents. We end up with a set of triples in a context free environment.
(ID, uSymbol, Value)
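One way to picture the reified function F is as a small store of triples, sketched here in Python. The class name ArrowStore, the "idN" format for IDs, and the sample symbols are assumptions made for illustration; this is not the DBOS implementation.

```python
import itertools

class ArrowStore:
    """Holds (ID, uSymbol, Value) triples; every arrow gets its own ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.triples = {}            # ID -> (uSymbol, Value)

    def add(self, usymbol, value):
        arrow_id = f"id{next(self._ids)}"
        self.triples[arrow_id] = (usymbol, value)
        return arrow_id

    def lookup(self, usymbol):
        """The function F, evaluated by searching the reified pairs."""
        for sym, value in self.triples.values():
            if sym == usymbol:
                return value
        raise KeyError(usymbol)

store = ArrowStore()
a = store.add("x", 42)
# Because the arrow has an ID, another arrow can refer to it.
b = store.add("x-binding", a)
```

Since F is now plain data, the system can inspect any arrow: store.triples[a] returns the pair ("x", 42) rather than just the value 42.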
The DBOS variable object encapsulates the concept of a symbol, albeit not optimally for implementation reasons.
nth Order Symbols
It turns out that the (ID, uSymbol, Value) triple is sufficient to represent nth order symbols. Here is an abstract example. Start with an arrow:
(uSymbol=B, Value=C)
We make a uSymbol (A1) for this arrow:
(uSymbol=A1, Value=(uSymbol=B, Value=C))
We can recurse, making symbols that refer to our symbols:
(uSymbol=A2, Value=(uSymbol=A1, Value=(uSymbol=B, Value=C)))
And here is a third order symbol:
(uSymbol=A3, Value=(uSymbol=A2, Value=(uSymbol=A1, Value=(uSymbol=B, Value=C))))
Even with every uSymbol referring to only one object, we are still allowed to have many symbols to refer to the same object. We see that (uSymbol=B, Value=C) has two symbols "(uSymbol=B, Value=C)" and "A1". We will replace the former with the latter in A2:
(uSymbol=A2, Value=(uSymbol=A1, Value=A1))
A3 can also be restated similarly:
(uSymbol=A3, Value=(uSymbol=A2, Value=A2))
Of course, from a machine perspective the parentheses are just syntax, and can be removed for the sake of an efficient implementation.
(ID=A2, uSymbol=A1, Value=A1)
(ID=A3, uSymbol=A2, Value=A2)
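To check that the flat triples lose nothing, here is a Python sketch that rebuilds the nested form from them. The entry for A1 is my own addition (the flat list above shows only A2 and A3); I assume it takes the form (ID=A1, uSymbol=B, Value=C). The function names expand and order are hypothetical.

```python
# Flat (ID, uSymbol, Value) triples, keyed by ID.
# The A1 entry is assumed: it names the original arrow (uSymbol=B, Value=C).
flat = {
    "A1": ("B", "C"),
    "A2": ("A1", "A1"),
    "A3": ("A2", "A2"),
}

def expand(arrow_id):
    """Rebuild the nested arrow that an ID stands for."""
    usymbol, value = flat[arrow_id]
    if value in flat:                # the value names another arrow
        value = expand(value)
    return (usymbol, value)

def order(arrow_id):
    """Count how many levels of symbol-about-symbol an ID carries."""
    _, value = flat[arrow_id]
    return 1 + order(value) if value in flat else 1
```

Expanding A3 recovers the fully parenthesized Value given for A3 earlier, and order reports A3 as third order.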
Objects, Attributes and Variables
A first order object system, such as RDF above, does not allow self reference*. For example, it does not allow reference to any of the triples describing the object system. Yet we have shown that the triple has sufficient expressiveness to be used in a reflective environment. What we need is to show a mapping between these two systems.
The transformation is to simply reify the triples (typed arrows) using pairs (untyped arrows).
(Subject, Predicate, Object) <=> (V1, Subject, Object),
(V2, Predicate, V1),
(V3, "Type", V2)
V1 is the equivalent untyped arrow. V2 points to V1 and is used to indicate the type (predicate). Finally, to distinguish between a type arrow and a second order arrow (described above), the predicate arrow is also given a type (V3). I should mention that this is just one of many possible transformations to an untyped arrow system.
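The transformation and its inverse can be sketched in Python. The ID generator and the function names reify and unreify are hypothetical, and the inverse assumes that no subject or predicate is itself the literal string "Type".

```python
import itertools

_ids = itertools.count(1)

def fresh_id():
    """Generate a hypothetical arrow ID such as 'V1', 'V2', ..."""
    return f"V{next(_ids)}"

def reify(subject, predicate, obj):
    """Turn one typed arrow into three untyped (ID, uSymbol, Value) arrows."""
    v1, v2, v3 = fresh_id(), fresh_id(), fresh_id()
    return [
        (v1, subject, obj),      # the untyped arrow itself
        (v2, predicate, v1),     # the predicate points at the arrow it types
        (v3, "Type", v2),        # marks v2 as a typing arrow
    ]

def unreify(arrows):
    """Recover (Subject, Predicate, Object) from the three arrows.

    Assumes exactly one arrow uses the literal uSymbol "Type".
    """
    by_id = {a[0]: a for a in arrows}
    (_, _, v2), = [a for a in arrows if a[1] == "Type"]
    _, predicate, v1 = by_id[v2]
    _, subject, obj = by_id[v1]
    return (subject, predicate, obj)
```

Round-tripping a triple through reify and unreify returns the original, which is the sense in which the two systems carry the same information.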
Conclusion
Symbolism will always need context to be useful. Having a common context greatly simplifies reflection on symbolism by removing the complicated considerations of context. But a single common context means that there must be a single "right way" of interpreting symbolism graphs. This may be a limitation if the assumptions made in the common context have an impedance mismatch with the machine context.
For example, the JVM common context does not model x86 processor context. The JVM context was chosen for simplicity and not for implementation efficiency. The C# CLR is a much closer match to the x86 context and therefore can support a wider range of languages efficiently. Therefore, we need not concern ourselves that a Lisp context has an inefficient translation to the CLR context because an equivalent x86 translation of Lisp would not be much better.
Choosing your common context based on a single machine context could also be bad because it locks your assumptions into a single platform. In this case, the Java context is a better choice than CLR; the benefits of simplicity make up for loss of efficiency on that single platform.
Technically we do not know how well programmers can implement optimizations between the JVM, CLR and machine contexts. It could very well be possible that, in practice, optimizations can show these two platforms to be equivalent contexts in terms of translation efficiency. This was only an example.
In any case, additional contexts must have a formal translation to the common context. Only then can a new breed of symbols (with ambiguity, or not) for the new context be used. This will inevitably lead us into a discussion of what a formal translation between contexts is. There is so much more to say. It appears that the conversation of context begins where symbolism leaves off.
*RDF cannot refer to its own triples using triples. There is a syntactic hack that allows reference to triples (A, B, (C, D, E)), but this assumes that triples are unique and the underlying system is left to implement these as 4-tuples (3 plus an ID). The syntactic hack precludes the existence of a self referencing triple (A, B, (A, B, (A, B, ...))).
February 2003: Added RDF references and paragraphs
May 2002: Finally got a good draft