|
Links
1. Introduction
2. ATLaS SQL on Tables
3. UDAs
4. Table Functions
5. Programming
in ATLaS
5.1 Recursion
5.2 ROLAPs
5.3 References &Data Structures
5.4 Apriori Algorithm
6.External Functions
6.1
Scalar Functions
6.2 Table Functions
6.3 Built-ins
References
ATLaS Syntax
Program Structure
Table Dcl
SQL Statements
UDAs
Table Function
|
User-Defined Aggregates
ATLaS supports the standard five aggregates count,
sum, avg, min, and max without the
DISTINCT option. But the real power of ATLaS
follows from its User-Defined Aggregates (UDAs) discussed next. As a first
example, we define an aggregate equivalent to the standard avg
aggregate in SQL.
Standard
Average The first line of this aggregate function declares
a local table, state, to keep the sum and
count of the values processed so far. While, for this particular example,
state contains only one tuple, it is in fact
a table that can be queried and updated using SQL statements and can contain
any number of tuples (see later examples). These SQL statements are grouped
into the three blocks labelled respectively INITIALIZE,
ITERATE, and TERMINATE.
To compute the average, the SQL statement in INITIALIZE
inserts the value taken from the input stream and sets the count
to 1. The ITERATE statement updates the table
by adding the new input value to the sum and 1 to the count. The TERMINATE
statement returns the final result(s) of computation by INSERT
INTO RETURN (to conform to SQL syntax, RETURN
is treated as a virtual table; however, it is not a stored table
and cannot be used in any other role):
| |
AGGREGATE myavg(Next Int)
: Real |
| |
{ |
TABLE state(sum Int,
cnt Int); |
| |
|
|
INSERT INTO state VALUES (Next,
1); |
| |
|
|
UPDATE state SET sum=sum+Next,
cnt=cnt+1; |
| |
|
|
INSERT INTO RETURN SELECT
sum/cnt FROM state; |
The basic initialize-iterate-terminate template used
to define the average aggregate of SQL-2, can now be used to defined powerful
new aggregates required by new database applications.
OnLine
AverageFor instance, there is much current interest in online
aggregates [2].
Since averages converge toward the correct value well before all the tuples
in the set have been visited, we can have an online aggregate that returns
the average-so-far every, say, 200 input tuples. (In this way, the user
or the calling application can stop the computation as soon as convergence
is detected.) Online averages can be expressed in ATLaS as follows:
| |
AGGREGATE online_avg(Next
Int) : Real |
| |
{ |
TABLE state(sum Int, cnt Int); |
| |
|
|
INSERT INTO state VALUES (Next,
1); |
| |
|
|
UPDATE state SET sum=sum+Next,
cnt=cnt+1; |
| |
|
|
SELECT sum/cnt FROM state
WHERE cnt % 200 = 0; |
Therefore, the online average program has been obtained
from the traditional average program by removing the statements from TERMINATE
and adding a RETURN statement to ITERATE.
Our UDA online_avg takes
a stream of values as input and returns a stream of values as output (one
every 200 tuples). In this example only one tuple is added to output by
the the INSER INTO RETURN statement; in general,
however, such statement can produce (a stream of) several tuples. Thus
ATLaS UDAs operate as general stream transformers.
ATLaS uses the same basic framework to define both traditional
aggregates and non-blocking aggregates. ATLaS UDAs are non-blocking when
their TERMINATE clause is either empty or
absent.
The typical default semantics for SQL aggregates is that
the data is first sorted according to the GROUP-BY
attributes; this is a blocking operation. However, ATLaS’s
default semantics for UDAs is that the data is pipelined through the INITIALIZE
and ITERATE clauses where the input
stream is transformed into the output stream: the only blocking operations
(if any) are those specified in TERMINATE,
and only take place at the end of the computation.
Calling
User-Defined Aggregates (UDAs) UDAs are called as any other
builtin aggregate. For instance, given a database table employee(Eno,
Name, Sex, Dept, Sal), the following statement computes the average
salary of employees in department 1024 by their gender:
| |
|
|
|
|
|
SELECT Sex, online_avg(Sal)
|
| |
|
|
|
|
|
FROM employee WHERE Dept=1024
GROUP BY Sex; |
Thus the results of the selection, defined by Dept=
1024, are pipelined to the aggregate in a stream-like fashion.
SQLCODE
This a convenient labor-saving device found in most SQL systems, that
comes very handy for the ATLaS programmer who wants to correlate a statement
with the next. SQLCODE is set to a positive value when the last statement
had a null effect, and to zero otherwise. Thus to tell the user that no
employee was found in department 1024, we can modify the previous program
as follows:
| |
|
|
|
|
|
SELECT Sex, online_avg(Sal)
|
| |
|
|
|
|
|
FROM employee WHERE Dept=1024
GROUP BY Sex; |
| |
|
|
|
|
|
select ’Nobody found
in that department’ |
In the last statement, the predicates in the WHERE
clause controls its conditional execution, in a fashion similar
to that of the IF clauses in a procedural
programming language. In fact, the ATLaS compiler recognizes, and optimizes
execution of, such conditional predicates.
|
|
|
|
Minima:
Points and Values. In the next Example, we have a sequence
of point-value pairs, and we define a minpair aggregate
that returns the point where a minimum occurs along with its value at
the minimum.
| AGGREGATE minpair(iPoint Int,
iValue Int): (mPoint Int, mValue Int) |
| { |
TABLE mvalue(value Int) MEMORY;
TABLE mpoints(point Int) MEMORY; |
| |
|
INSERT INTO mvalue VALUES
(iValue); |
| |
|
INSERT INTO mpoints VALUES(iPoint); |
| |
|
UPDATE mvalue SET value =
iValue WHERE iValue < value; |
| |
|
DELETE FROM mpoints WHERE
SQLCODE = 0; |
| |
|
INSERT INTO mpoints SELECT
iPoint FROM mvalue |
| |
|
|
|
|
|
|
|
|
|
|
WHERE iValue =mvalue.value; |
| |
|
INSERT INTO RETURN SELECT
point, value FROM mpoints, mvalue; |
Here have used two internal tables: the mvalue table
holds, as its only entry, the current min value, while points
holds all the points where this value occurs. In the ITERATE
statement we have used SQLCODE to ‘remember’
if the previous statement updated mvalue; this
is the situation in which the old value was larger than the new one and
the old points must be discarded.
Then, the last statement in ITERATE
adds the new iPoint to mpoints
if the input value is equal to the current min value. In the UDA
definition the formal parameters of the UDA function are treated as constants
in the SQL statements. Thus, this third INSERT
statement adds the constant value of iPoint to
the mpoints relation, provided that iValue
is the same as the value in mvalue--thus
the FROM and WHERE clauses
operate here as conditionals. The RETURN statement
returns the final list of min pairs as a stream.
For instance, say that we have a time series containing
the daily closing prices of certain stocks arranged in temporal sequence
(i.e. the table stock_prices,
below). Then the following program computes the local minima for each
stock:
| |
/* The declaration of AGGREGATE
minpair should go here*/
|
| |
TABLE stock_prices(Day
Int, Stock char(4), Cprice Real) |
|
source ’D:\mydabase\stock_prices’
|
| |
|
select Stock,
minpair(Day, Stock)
iPoint, minpair(Day, Stock)
iValue |
| |
|
group by Stock
ordered by Stock, minpair(Day, Stock)
iPoint |
Observe the use of “ ”
to identify the different components in the two-column tuples returned
by the aggregate minpair. Since temporal
data types are not yet supported in the current ATLaS version, we are
using integers to represent dates: thus May, 27, 1999 is represented as
19990527.
The next table summarizes the syntax for declaring new
aggregates.

| Figure 3.1: |
The declaration of UDAs |
Initializing
Tables and Combining Blocks. Let us now introduce the following
two syntactic variations of convenience supported in ATLaS:
- Tables declared in UDAs can be initialized as part
of their declaration, via an SQL statement. This executes at the time
when the first tuple is processed, thus the result is the same as if
the initialization had been executed in the INITIALIZE
block.
- Different blocks can be merged together when they
perform the same function. In the next example the INITIALIZE
and ITERATE blocks are merged together.
Our Online Averages UDA could also have been written
as follows:
| |
AGGREGATE online_avg(Next
Int) : Real |
| |
{ |
TABLE state(sum Int, cnt Int)
AS VALUES(0, 0); |
| |
|
|
UPDATE state SET sum=sum+Next,
cnt=cnt+1; |
| |
|
|
SELECT sum/cnt FROM state
WHERE cnt % 200 = 0; |
In the previous example, the statement has been omitted:
this is equivalent to writing ‘TERMINATE: {
} ’. An empty INITIALIZE statement
can also be omitted in a similar fashion.
The results produced by online averages depend on the
order in which the data is streamed through the UDA. This illustrate a
common situation in stream processing: the abstract semantics of the aggregate
used is order-independent, but approximations must be used because of
efficiency and real-time requirements (e.g., nonblocking computations);
often, the approximate UDA is order-dependent.
In other situations, no approximation is involved, and
the dependence on order follows from the very semantics of the UDA. For
instance, this is the case of the rising aggregate
described below.
Rising.
In addition to temporal extensions of standard aggregates (suggested homework:
write them in ATLaS), TSQL2 [7]
proposes this new aggregate to return the maximal time periods during
which a certain attribute values has been increasing monotonically. We
can apply this aggregate to our stock_prices(Day
Int, Stock char(4), Cprice Real) table to find the periods during
which different stocks have been rising, as follows:
| |
|
select Stock,
rising(Day, Cprice)
Start, rising(Day, Cprice)
End |
here rising is defined
as follows:
| |
AGGREGATE rising(iPoint Int,
iValue Real) : (Start Int, End Int) |
| |
{ TABLE
rperiod(First Int, Last Int, Value Real) MEMORY; |
| |
|
|
|
INSERT INTO rperiod VALUES
(iPoint, iPoint, iValue); |
| |
|
|
|
INSERT INTO return SELECT
First, Last |
| |
|
|
|
|
|
WHERE iValue <=
Value AND First < Last; |
| |
|
|
|
UPDATE rperiod SET Last=iPoint,
Value=iValue |
| |
|
|
|
UPDATE rperiod SET First=iPoint,
Last=iPoint, Value=iValue |
| |
|
TERMINATE:{
INSERT INTO return SELECT First, Last |
Therefore we have a sequence of time-value ordered by
increasing time. We store a zero length period iPoint,
iPoint whenever the new iValue is
not increasing (also at INITIALIZE). Also
a non-zero length period is currently held in rperiod
we return it. When the new iValue is
larger than the previous stored value, we advance the End
of the current period to the current point.
|
|