June 2002
Aggregation Semantics
Introduction
The purpose of this document is to cover the semantics of aggregation in an object oriented environment, and more specifically in the DBOS.
Aggregation on One Field
We start with an example. Consider the set of Customers and the set of Orders those customers made.
Customer
CustomerID
Name AS String
Order
OrderID
Date
Customer
Amount AS Currency
We also define a special type of Customer, the Valued Customer, that has spent lots of money at the estabilshment.
ValuedCustomer extends Customer
Total AS Currency
From the structural declaration, we have no way of knowing if this Total is really an independant attribute, or just a calculation, like so:
DEFINE VIEW ValuedCustomer
Select
CustomerID
Name
SUM(Amount) as Total
From
Customer, Order
Where
CustomerID=Customer
GroupBy
CustomerID
If you did not know of the view definition, then you would not know that the Total attribute was not just another attribute of Customer. The act of aggregration appears to be adding attributes to the type of the field grouped by.
A question arises around the aggregation of more than one field, and no fields. Is the same thing happening for those fields too?
Aggregation on Many Fields
Grouping by more than one field can be seen as a forcing the inheritence hierarchy to rearrange to hold the new attributes for the objects.
DEFINE VIEW TotalOrdersView
Select
CustomerID
Date
SUM(Amount) as Total
From
Order
GroupBy
CustomerID
Date
The Order is broken into three tables. BasicOrder, is the common attributes between the original, Order, table, and the group by fields. The TotalOrders table that holds the new attribute, and the SingleOrder table that holds the difference between Order and the groupby fields.
BasicOrder
OrderID
CustomerID
Date
TotalOrders extends BasicOrder
OrderID
Total
SingleOrder extends BasicOrder
OrderID
Amount
This was unaceptable. Worst case senario, all tables will have only one attribute, and all classes are a mutiple inheritence of all the attributes required.
The concept of spaces was considered to be the solution. Space descriptions overlay the existing class hierachy, and add extra rules for relfection.
Spaces are the result of the interplay between aggregation and normalization. Let us look at an example table:
Order
OrderID
Store
Customer
Date
Amount
Suppose we would like to have the total number of orders, by customer, for each store.
SELECT
Store
Customer
COUNT(1) Number
FROM
Order
GROUPBY
Store,
Customer
Now we would also like the total sales to each customer, by store.
- Type of Aggregation Function
- The field to use by the aggregation function
- The source table
- The fields to group by
SELECT
Store
Customer
SUM(Amount) Total
FROM
Order
GROUPBY
Store,
Customer
We now note that the table schemas are quite similar. But not similar enough to extract important information. The structure of tables and the object heriarcy should be sufficient to determine if two attributes belong to the same object. This is not the case here. Without knowledge about how these two tables were generated, there is no way to know if given a particular Store/Customer combo if the Number of orders and the Total Sales correspond to the same source information.
The common parts should be extracted and placed in their own table for removal of redundancy.
Customer/Store
ID
Store
Customer
TotalOrders
Customer/StoreID
Number
TotalSales
Customer/StoreID
Total
We have added the description of the fact that the TotalOrders and TotalSales are attributes of the same concept; the Customer/Store pair. Other attributes can be added
The objects used to describe aggregates come from analyzing the effects of normalization on the obvious structures used to describe aggregates. Let us generate these naïve structures now and see what normaliztion brings.
Using SQL as our template for structure the following information is need
Here is the table definition for an aggregate
| Aggregate | |
| Aggregation Type | |
| Field to Aggregate | |
| Source Table |
Here is the table to store the group by fields, note many to one relation to the aggregate
| Group By | |
| Key of Aggregate | |
| Field to group by |
We must be able to store the result
| Result | |
| Grouped Field Value1 | |
| Grouped Field Value 2 | |
| ... | |
| Grouped Field Value n | |
| Aggregate Result |
If we chose more than one aggregation type on the same source table, with the same fields grouped we would notice a certain similarity between the Result and Result 2 tables
| Result2 | |
| Grouped Field Value 1 | |
| Grouped Field Value 2 | |
| ... | |
| Grouped Field Value n | |
| Aggregate Result2 |
These two results are duplicating record entries and can be optimized by combining them into a single table, or identifying thier common grouped fields as a seperate entity, a Space. The latter option is better because it identifies tables with muti-attribute keys.
| Result Point 1 | |
| Point | |
| Aggregate Result |
| Result Point 2 | |
| Point | |
| Aggregate Result2 |
| Point | |
| Grouped Field Value 1 | |
| Grouped Field Value 2 | |
| ... | |
| Grouped Field Value n |
The point class is actually declaring a Space. There is allowed only one record with a given combintion of field values. Each space must be a class to declare the fields that make up the various dimensions of the space.
Back to our naive 2-table structure at the top. We notice that the same aggregate can be defined for different tables. Maybe the count aggregate, to count the number of record in a table is used often. We can easily the functional dependency between "Source Table" and the "Field to Aggregate". The Aggregate structure must be broken into two structures:
| Aggregate Class | |
| Aggregation Type | |
| Field to Aggregate |
| Aggregate Table | |
| Source Table | |
| Aggregate Class |