Feb 2002
Garbage Collection
Overview
This document proposes anti-cyclic datastructures for GC optimization. The techniques here should be applicable to both in-time GC (like refcounting) and lazy GC (like generational GC). The idea is to reduce the processor time it takes to identify cycles and increase the efficiency of GC.
Background
To support the discussion about garbage collection we will make the analogy that the pointers in the data structures are directed edges in a graph. We will refer to the Seed Vertex which is an object that is always considered active and available to the system. The Active Graph is the set of vertices the system can reach by following the directed edges from the seed vertex. The point of GC is to identify the vertices that are not part of the active graph.
Reference Counts
A garbage collector (GC) using only reference counts is the most efficient implementation but assumes that there are no cycles in the graph. Unfortunately the possibility of cycles makes a complete implementation of GC much less efficient; the cost of finding isolated cycles of vertices is higher than keeping track of reference counts. Research has focused on the identification of these cycles when optimizing GC.
We will show that there is no need for cyclic data structures for a system to be Turing Complete. This proof will allow us to ban the use of cyclic data structures and allow a simple, reference count, implementation of GC. We will also suggest that the performance impact that might be expected can be worked around using higher order optimization techniques.
Theorem
Given any directed graph, there exists a DAG that can represent it.
Proof
The proof consists of constructing an object that will model all the directed edges in the original graph.
class DirectedEdge {
Vertex from;
Vertex to;
}//DirectedEdge
It is simple to see that any directed graph can be represented by the original set of vertices and a set of DirectedEdges. We note that the pointers used in these structures only point from DirectedEdge to the vertices. Therefore there are no cycles, and the representation is a DAG.
The application of this theorm can convert cyclic graphs to DAGs by identifying one edge in every cycle and converting it to a DirectedEdge. Consider a loop of created by the following object:
class ListElement {
Object object;
ListElement next;
}//ListElement
We can make this structure acyclic by modeling the last edge (the edge that points to the same object that the Head does) with a DirectedEdge. There will have to be a corresponding alteration in the logic of the routines that use this structure.
The new structure contains no cycles. A simple inspection shows that the algorithms used to access this structure may be more complicated, and therefore take longer to execute. For instance, if the next element in the original list is requested, then this new structure requires a specific check to see if the last
element is current, and if so make the next element the head
elemeent.
Optimization
Naivly enforcing that there be no cycles in the graph is as expensive as identifying cycles in the graph. In the gereral case, the latter must be done to perform the former . Therefore it can be recognized that naive cycle checking is impossible if we are to keep the speed gains we desire.
High level checks can be incorporated to flag possible cycles during run time. Type checking can remove whole classes from cycle checks. There are many hierarchal class definitions that do not have to be checked for cycles. For eaxmple, the DirectedEdge structure can not possibly be part of a cycle.
Standards, such as only allowing an object to point to a "lower" object, can be implemented at runtime to prevent cyclic structures from forming. The ListElement class can be annoted with a depth parameter to verify any construction does not result in cyclic structures.
The implementation language can avoid cyclic structures by providing non-cyclic primitives with apparent cyclic capabilities. For example, the cyclic list above could be encapsulated in a class library with an acyclic implementation.
Compilers can identify simple cycles and rewrite the implemention useing DAG methods. This is too compilcated to be considered here though..
Efficiency is defined as the least number of processor cycles to implement. This does not mean that it is fastest. For example generational GC is faster in environments where there is "significant" free processor time.
June 2000 - First draft
Feb 2002- some changes