June 2002
Aggregation Types
Introduction
The purpose of this document is to provide a classification framework for aggregates. This classification framework will allow us to produce efficient running implementations of particular aggregates.
Aggregation is a function from a family of sets to a family of atoms. Some familiar aggregates are MAX, MIN, SUM, AVERAGE, and COUNT.
The Multiset Group
We will use group theory to describe aggregation. First we define the domain upon which aggregates can act, then we will start classifying aggregates. Given a finite set A with n elements, consider the set
A* = kiai |
i,j
1..n
, ai,aj
A, ai<>aj when i<>j, ki >= 0
The operation acting on two elements of A*, b=kibi
and c=
ljcj
, is defined by
bc=
(ki+lj)bi | bi=cj
. We can easily see that A*
is an abelian group, and I=
0ai |
ai
A
is the identity element. We say that A*
is the multiset group of A. We will always use the star notation to indicate a multiset group. Note that this group does not have inverses.
Cumulative Aggregates
An aggregate, AGG, is cumulative if it is homomorphic on the multiset domain:
AGG(x)AGG(y) = AGG(xy) x,y A*, where A =
a | a
dom(AGG)
The operation on the codomain depends on the type of aggregation used and is denoted by concatenation in a same way as operations in the multiset group. The reader should be able to identify the appropriate codomain operation easily enough.
Knowing an aggregate is cumulative allows you combine the aggregate values instead of recalculating the aggregates from the combined domain. All of the popular aggregates (MIN, MAX, COUNT, SUM, PRODUCT) are homomorphic. For example if MAX(D)=43 and MAX(E)=67 then MAX(DE)=67; we never have to inspect the members of the D and E multisets.
The Multiset Group with Inverses
Given a set finite set A with n elements, consider the set
A =
kiai |
i,j
1..n
, ai,aj
A, ai<>aj when i<>j, ki
The only difference here is in the domain of the ki
coefficients. This wider domain allows for the existence of inverse multisets. We say that A
is the invertable multiset group of A. We will always use the plus-minus notation to indicate an invertable mutiset group.
Homomorphic Aggregates
An aggregate, AGG, is homomorphic if it satisfies the following:
AGG(x)AGG(y) = AGG(xy), x,y
A
Since A
has inverses, we can find a y-1 for any y
A
. Therefore
AGG(x)AGG(y)AGG(y-1) = AGG(xy)AGG(y-1) = AGG(xyy-1) = AGG(x)
Homomorphic aggregates can be "undone" without having to recalculate from the original multisets. The homomorphic aggregates are COUNT, SUM and PRODUCT. MAX and MIN are not homomorphic. For example, if SUM(D)=495 and SUM(E)=605 then SUM(ED-1)=110; we never need to know the contents of D and E.
Summary
If aggregate values must be tracked during program execution, then finding an efficient method of updating those aggregates is important. The classification scheme shown above formalizes the categorization of the various aggregates so that the two different types can be discussed and implemented in a general fashion.