External Function
ATLaS supports both scalar external functions and table
external functions.
6.1 Scalar Functions
String(char*) Return Type
If an external function written in C/C++ has the return type "char*", the function must allocate the space for its return value as follows:
extern char * _allocateResultSpace(int size);
char* ext(...){
char* result = _allocateResultSpace(STR_SIZE); // replace STR_SIZE with your actual string size
...
return result;
}
You will then need to link the function with the ATLaS library (libadl.a in the commands under Unix/Linux below).
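As a concrete instance of the pattern above, here is a minimal sketch of a string-returning function. The function name `twice`, the `char*` parameter type, and the `LINK_WITH_ATLAS` macro are hypothetical and chosen only for illustration. The stand-in allocator exists only so the file can be compiled and tried on its own; in a real build the allocator comes from the ATLaS library.

```c
#include <stdlib.h>
#include <string.h>

#ifdef LINK_WITH_ATLAS
/* Real build: the allocator is provided by the ATLaS library. */
extern char *_allocateResultSpace(int size);
#else
/* Stand-in for standalone experiments only (an assumption, not ATLaS code). */
char *_allocateResultSpace(int size)
{
    return (char *)malloc((size_t)size);
}
#endif

/* Hypothetical UDF: returns the input string repeated twice. */
char *twice(char *s)
{
    size_t len = strlen(s);
    /* Assumption: the requested size must include the terminating '\0'. */
    char *result = _allocateResultSpace((int)(2 * len + 1));

    if (result == NULL)
        return NULL;
    memcpy(result, s, len);            /* first copy, without terminator */
    memcpy(result + len, s, len + 1);  /* second copy, with terminator */
    return result;
}
```

When building the shared library for ATLaS, the stand-in would be excluded (here by defining `LINK_WITH_ATLAS`) so that the allocator from the ATLaS library is used instead.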
Unix/Linux
To declare an external function to be dynamically loaded
into ATLaS, we use the following syntax:
|
external int ginif(a int) in 'gini.so';
|
The above statement declares a UDF ginif
which takes one integer-type parameter and returns an integer result.
This function is supported by a shared library, ’gini.so’.
We can use C or any other language that can produce shared libraries
to create such functions. On a UNIX system, the following commands compile
C or C++ source code into a shared library.
- To compile UDFs written in C:
gcc -shared -o gini.so -fPIC gini.c
If you use the string return type described above, you will need to link it as follows:
gcc -shared -o gini.so -fPIC -I atlas_path/include gini.c atlas_path/lib/libadl.a
(Change atlas_path to your ATLaS installation path; adllib.h is in atlas_path/include.)
- To compile UDFs written in C++:
g++ -shared -o gini.so -fPIC gini.cc
If you use the string return type described above, you will need to link it as follows:
g++ -shared -o gini.so -fPIC -I atlas_path/include gini.cc atlas_path/lib/libadl.a
(Change atlas_path to your ATLaS installation path; adllib.h is in atlas_path/include.)
NOTE: For functions written in C++, you have to include an extern "C" declaration in the .cc file so that the function name is not mangled. For example, for the ginif function, include:
extern "C" int ginif(int);
|
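For illustration, a source file for a scalar function with the declared signature (one integer in, one integer out) could be laid out as in the following sketch. The function body is only a placeholder that squares its argument; it is not the actual gini computation, which this section does not show. The `__cplusplus` guard is a common idiom that lets the same file compile as either C or C++ with an unmangled name.

```c
/* Placeholder source for an external scalar function (a sketch only). */

#ifdef __cplusplus
extern "C" {
#endif

int ginif(int a);   /* name matches the ATLaS 'external' declaration */

#ifdef __cplusplus
}
#endif

int ginif(int a)
{
    /* Placeholder body: NOT the real gini computation. */
    return a * a;
}
```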
Once defined, the UDF can be used in ATLaS. For instance:
|
select ginif(a) from test;
|
In order to load the library dynamically, the OS must
be able to find it. On UNIX, the OS searches for the library in all the
paths specified by the environment variable LD_LIBRARY_PATH.
Windows
On Windows, use a DLL file instead of a shared library. For example:
external int ginif(a int) in 'gini.dll';
Either put 'gini.dll' in the current directory, or specify its full path in the above statement.
- To build a DLL file from a C file, run the compileDLL script, e.g.:
c:\atlas\compileDLL c:\atlas\dll\gini.c
Alternatively, if you installed CREditor and its scripts, there is a "BuildDLL" tool under the "Tools" menu.
- To build a DLL file from a C++ file, run the compileDLLCPP script, e.g.:
c:\atlas\compileDLLCPP c:\atlas\dll\gini.cc
6.2 Table Functions
Unix/Linux
In much the same way, we can use external functions as table
functions.
For instance, suppose we want to stream through the first K Fibonacci
numbers. It is not difficult to write a C function that generates the Fibonacci
numbers. The following ATLaS program demonstrates how to use such an external
table function.
|
external table (i int, f int) fib(k int) in 'tabf.so';
select t.i, t.f
from table (fib(10)) as t;
|
In order to declare an external table function, we must
use TABLE as the return type. The above declaration indicates that ’fib’
is an external function found in the shared library ’tabf.so’,
and that ’fib’ returns a stream of tuples (i,f), where f is the
i-th Fibonacci number. Then, in the query that follows, we stream through
the first 10 Fibonacci numbers by calling ’table (fib(10))’.
How do we implement a table function in C? Unlike stateless
scalar functions, table functions must keep their internal state between
calls. More specifically, the function must be able to: i) distinguish the
first call from subsequent calls; ii) tell the caller whether a tuple
was successfully returned; iii) use a mechanism to return tuples to the
caller. As an example, the following code implements the function ’fib’:
|
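#include <db.h>   /* Berkeley DB header; defines DB_NOTFOUND */

/* One output tuple: (i, f) in the ATLaS declaration */
struct result {
    int a;   /* index i */
    int b;   /* i-th Fibonacci number f */
};

int fib(int first_entry, struct result *tuple, int k)
{
    /* Static variables keep the state between calls */
    static int count;
    static int last;
    static int next;

    if (first_entry == 1) {   /* first call: reset the state */
        count = 0;
        next = 1;
        last = 0;
    }
    if (count++ < k) {
        tuple->a = count;
        tuple->b = last;
        last = next;
        next = next + tuple->b;
        return 0;              /* a tuple was returned */
    } else {
        return DB_NOTFOUND;    /* no more tuples */
    }
}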
|
In addition to the arguments passed to the table function (here, ’k’),
the C function takes 2 extra arguments: i) first_entry, which equals 1
on the first call; ii) tuple, which is a pointer
to a structure where the result is to be stored. External table functions
always return an integer value: 0 if a tuple was returned, DB_NOTFOUND otherwise.
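To make the calling protocol concrete, the sketch below shows how a caller could drive such a function: pass first_entry=1 on the first call, 0 afterwards, and stop at the first non-zero return. The driver `collect_fib` is hypothetical and is not how ATLaS itself is implemented. The numeric value given for DB_NOTFOUND is a stand-in so the file compiles without Berkeley DB; real builds should take the constant from db.h.

```c
#ifndef DB_NOTFOUND
#define DB_NOTFOUND (-30988)  /* stand-in value; real builds use <db.h> */
#endif

struct result {
    int a;
    int b;
};

/* Same table function as in the text above. */
int fib(int first_entry, struct result *tuple, int k)
{
    static int count;
    static int last;
    static int next;

    if (first_entry == 1) {
        count = 0;
        next = 1;
        last = 0;
    }
    if (count++ < k) {
        tuple->a = count;
        tuple->b = last;
        last = next;
        next = next + tuple->b;
        return 0;
    }
    return DB_NOTFOUND;
}

/* Hypothetical driver: collects up to 'capacity' tuples into 'out'
 * and returns how many were produced. */
int collect_fib(int k, struct result *out, int capacity)
{
    int n = 0;
    int first_entry = 1;   /* 1 only on the first call */
    struct result row;

    while (n < capacity && fib(first_entry, &row, k) == 0) {
        out[n++] = row;
        first_entry = 0;   /* later calls continue the same scan */
    }
    return n;
}
```

Because the state lives in static variables, a new scan simply starts again with first_entry=1; two interleaved scans of the same function would share that state.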
A possible use of table functions is to scan file system
data, filter it, and return the results to the database system. Our tests
indicate that, on a Linux system, an external table function accessing file
system data is almost 100 times faster than accessing the same data stored in
the Berkeley DB format.
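The file-scanning use described above can be sketched with the same first_entry/tuple protocol. Everything specific here is an assumption made for illustration: the function name `scan_readings`, the tuple layout, the file name `readings.txt`, and the filter (keep integers at or above a threshold). The DB_NOTFOUND value is again a stand-in for the constant from db.h.

```c
#include <stdio.h>

#ifndef DB_NOTFOUND
#define DB_NOTFOUND (-30988)  /* stand-in value; real builds use <db.h> */
#endif

#define READINGS_PATH "readings.txt"  /* hypothetical input file */

struct reading {
    int line_no;  /* position of the value in the file, starting at 1 */
    int value;    /* the integer read from the file */
};

/* Hypothetical table function: streams the integers stored in
 * READINGS_PATH that are >= min_value, one tuple per call. */
int scan_readings(int first_entry, struct reading *tuple, int min_value)
{
    static FILE *fp;      /* open file, kept between calls */
    static int line_no;   /* values read so far */
    int value;

    if (first_entry == 1) {            /* first call: (re)open the file */
        if (fp != NULL)
            fclose(fp);
        fp = fopen(READINGS_PATH, "r");
        line_no = 0;
    }
    if (fp == NULL)
        return DB_NOTFOUND;            /* missing file or finished scan */

    while (fscanf(fp, "%d", &value) == 1) {
        line_no++;
        if (value >= min_value) {      /* filter before returning */
            tuple->line_no = line_no;
            tuple->value = value;
            return 0;
        }
    }
    fclose(fp);                        /* end of data: release the file */
    fp = NULL;
    return DB_NOTFOUND;
}
```

Filtering inside the function means that only matching rows cross into the database system.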
Windows
Please refer to the Windows instructions in Sec. 6.1 above.
6.3 Built-in Aggregates and Functions
ATLaS supports the standard built-in aggregates: min(),
max(), sum(),
avg(), and count().
ATLaS supports the following built-in functions (more
are being added over time):
- srand(INT) : INT
The srand() function sets its argument as the seed for a new sequence
of pseudo-random integers to be returned by rand(). These sequences
are repeatable by calling srand() with the same seed value. srand()
always returns 0.
- rand() : REAL
The rand() function returns a pseudo-random real between 0 and 1. The
following code sets 10 as the random seed and displays two random values.
|
VALUES(srand(10));
VALUES(rand(), rand());
|
- sqrt(REAL) : REAL
The sqrt(x) function returns the non-negative square root of x.
- timeofday() : CHAR(20)
The timeofday() function gets the system’s notion of the current
time. The current time is expressed in elapsed seconds and microseconds
since 00:00 Coordinated Universal Time, January 1, 1970. It returns
a string in the form x'y", where x is the seconds and
y is the microseconds. This function is mainly used to measure the performance
of ATLaS queries, as in the following example:
|
INSERT INTO stdout VALUES(timeofday());
... some ATLaS queries ...
INSERT INTO stdout VALUES(timeofday());
|
- concat(CHAR(), CHAR()) : CHAR()
Concatenate two strings.
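When timeofday() is used for performance measurement as described above, the two printed strings still have to be subtracted. The following sketch does this in C, assuming the output has exactly the form x'y" (seconds, an apostrophe, microseconds, a double quote). The helper names `parse_timeofday` and `elapsed_seconds` are hypothetical and are not part of ATLaS.

```c
#include <stdio.h>

/* Parse a string of the assumed form x'y" into seconds and microseconds.
 * Returns 1 on success, 0 if the string does not match. */
int parse_timeofday(const char *s, long *sec, long *usec)
{
    return sscanf(s, "%ld'%ld", sec, usec) == 2;
}

/* Elapsed time in seconds between two timeofday() strings,
 * or -1.0 if either string cannot be parsed. */
double elapsed_seconds(const char *start, const char *end)
{
    long s0, u0, s1, u1;

    if (!parse_timeofday(start, &s0, &u0) ||
        !parse_timeofday(end, &s1, &u1))
        return -1.0;
    return (double)(s1 - s0) + (double)(u1 - u0) / 1e6;
}
```

The microsecond difference may be negative when the second reading falls earlier within its second; adding it to the whole-second difference handles that case without special treatment.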