Chapter 6—The ATLaS User Manual

 



6. External Functions

ATLaS supports both scalar external functions and table external functions.

6.1 Scalar Functions

  • String (char*) Return Type

    If an external function written in C/C++ returns "char*", the return space must be allocated with _allocateResultSpace(), as follows:

      extern char * _allocateResultSpace(int size);
      char* ext(...){
        char* result = _allocateResultSpace(STR_SIZE);  // replace STR_SIZE with your actual string size
        ...
        return result;
      }
    

    You will then need to link it with libadl.a (see Unix/Linux below).

  • Unix/Linux

    To declare an external function to be dynamically loaded into ATLaS, we use the following syntax:

    external int ginif(a int) in 'gini.so';

    The statement above declares a UDF, ginif, which takes one integer parameter and returns an integer result. The function is provided by the shared library 'gini.so'.

    We can use C, C++, or other languages to create functions in shared libraries. On a UNIX system, the following commands compile source code into dynamic (shared) libraries.

    • To compile UDFs written in C:

       gcc -shared -o gini.so -fPIC gini.c 

      If your function uses the string return type described above, link it as follows:

       gcc -shared -o gini.so -fPIC -I atlas_path/include gini.c atlas_path/lib/libadl.a
      (Replace atlas_path with your ATLaS installation path.)
      

    • To compile UDFs written in C++:

       g++ -shared -o gini.so -fPIC gini.cc 

      If your function uses the string return type described above, link it as follows:

       g++ -shared -o gini.so -fPIC -I atlas_path/include gini.cc atlas_path/lib/libadl.a
      (Replace atlas_path with your ATLaS installation path.)
      

      NOTE: For functions written in C++, you must declare the function with C linkage (extern "C") in the .cc file, so that it is exported under its unmangled name. For example, for the ginif function declared above, include:

      extern "C" int ginif(int);

    Once defined, the UDF can be used in ATLaS. For instance:

    select ginif(a) from test;

    In order to load the library dynamically, the OS must be able to find it. On UNIX, the OS searches for the library in the directories listed in the environment variable LD_LIBRARY_PATH.

  • Windows

    On Windows, the function must be provided by a DLL file, e.g.:

    external int ginif(a int) in 'gini.dll';

    Either put 'gini.dll' in the current directory, or specify its full path in the statement above.

    • To build a DLL from a C file, run the compileDLL script, e.g.:
      c:\atlas\compileDLL c:\atlas\dll\gini.c

      Alternatively, if you have installed CREditor and its script, use the "BuildDLL" tool under the "Tools" menu.

    • To build a DLL from a C++ file, run the compileDLLCPP script, e.g.:
      c:\atlas\compileDLLCPP c:\atlas\dll\gini.cc
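    As a concrete instance of the char* return-type rule at the start of Sec. 6.1, the sketch below is a complete string UDF that returns an upper-case copy of its argument. It rests on stated assumptions: the function name upper_str and the STR_SIZE limit are hypothetical, and it assumes ATLaS passes a string argument as a NUL-terminated char pointer, which this section does not specify. So that the file compiles and runs on its own, it also defines a stand-in _allocateResultSpace() built on malloc(); that stub exists only for local testing and must be deleted when linking against libadl.a, which supplies the real allocator.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical maximum result length, including the terminating '\0'. */
#define STR_SIZE 64

/* Declared as in the manual; the real definition lives in libadl.a. */
extern char *_allocateResultSpace(int size);

/* HYPOTHETICAL stand-in so this file can be compiled and tested alone.
   Delete this definition when linking against libadl.a. */
char *_allocateResultSpace(int size)
{
    return malloc((size_t)size);
}

/* Example string UDF (hypothetical name): returns an upper-case copy of s,
   truncated to STR_SIZE - 1 characters. The result buffer is obtained from
   _allocateResultSpace(), as required for char* return types. */
char *upper_str(const char *s)
{
    char *result = _allocateResultSpace(STR_SIZE);
    size_t n, i;

    if (result == NULL)
        return NULL;
    n = strlen(s);
    if (n > (size_t)(STR_SIZE - 1))   /* truncate to fit the buffer */
        n = (size_t)(STR_SIZE - 1);
    for (i = 0; i < n; i++)
        result[i] = (char)toupper((unsigned char)s[i]);
    result[n] = '\0';
    return result;
}
```

    With the stub removed, such a file would be built with the libadl.a link command shown under Unix/Linux above (or the Windows scripts).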

6.2 Table Functions

  • Unix/Linux

    In much the same way, external UDFs can be used as table functions.

    Suppose, for instance, that we want to stream through the first K Fibonacci numbers. Writing a C function that generates them is straightforward; the following ATLaS program shows how to use such a function as an external table function.

    external table (i int, f int) fib(k int) in 'tabf.so';  
     
    select t.i, t.f  
    from table (fib(10)) as t;

    To declare an external table function, we use TABLE as the return type. The declaration above states that fib is an external function found in the shared library 'tabf.so', and that fib returns a stream of tuples (i, f), where f is the i-th Fibonacci number. The query that follows then streams through the first 10 Fibonacci numbers by calling table (fib(10)).

    How do we implement a table function in C? Unlike stateless scalar functions, table functions must keep their internal state between calls. More specifically, the function must be able to: i) distinguish the first call from subsequent calls; ii) tell the caller whether a tuple was successfully returned; iii) provide a mechanism for returning tuples to the caller. As an example, the following code implements the function fib:

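    #include <db.h>   /* Berkeley DB header; defines DB_NOTFOUND */

    /* One output tuple; the fields correspond to the declared columns (i int, f int). */
    struct result {
      int a;   /* i: position in the sequence */
      int b;   /* f: the i-th Fibonacci number (0, 1, 1, 2, ...) */
    };

    int fib(int first_entry, struct result *tuple, int k)
    {
      /* State kept across calls, hence static. */
      static int count;   /* tuples returned so far */
      static int last;    /* Fibonacci number to return next */
      static int next;    /* the one after it */

      if (first_entry == 1) {   /* first call: reset the state */
        count = 0;
        next = 1;
        last = 0;
      }
      if (count++ < k) {        /* fewer than k tuples returned so far */
        tuple->a = count;
        tuple->b = last;
        last = next;
        next = next + tuple->b;
        return 0;               /* a tuple was produced */
      } else {
        return DB_NOTFOUND;     /* no more tuples */
      }
    }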

    In addition to the arguments passed to the table function (here, k), the C function receives two extra leading arguments: i) first_entry, which is 1 on the first call; ii) tuple, a pointer to the structure in which the result tuple is to be stored. External table functions always return an integer: 0 if a tuple was returned, DB_NOTFOUND otherwise.

    One possible use of table functions is to scan file system data and return results to the database system after filtering. In our tests on a Linux system, external table functions that accessed file system data were almost 100 times faster than accessing the same data stored in Berkeley DB format.

  • Windows

    Please refer to Sec. 6.1 above.
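    The calling protocol described in Sec. 6.2 (first call with first_entry set to 1, later calls with 0, stop at DB_NOTFOUND) can be exercised outside ATLaS with a small driver. The sketch below is not ATLaS's actual caller: drain_fib is a hypothetical helper that merely imitates that protocol. So that it compiles without Berkeley DB installed, it defines a placeholder DB_NOTFOUND of -1 when <db.h> has not been included; a real UDF should include <db.h> as the fib example does.

```c
/* Placeholder so the sketch builds without Berkeley DB; a real UDF
   includes <db.h>, which defines DB_NOTFOUND. */
#ifndef DB_NOTFOUND
#define DB_NOTFOUND (-1)
#endif

struct result {
    int a;   /* i */
    int b;   /* f */
};

/* fib() from Sec. 6.2, repeated so this file stands alone. */
int fib(int first_entry, struct result *tuple, int k)
{
    static int count;
    static int last;
    static int next;

    if (first_entry == 1) {
        count = 0;
        next = 1;
        last = 0;
    }
    if (count++ < k) {
        tuple->a = count;
        tuple->b = last;
        last = next;
        next = next + tuple->b;
        return 0;
    }
    return DB_NOTFOUND;
}

/* Hypothetical driver imitating the caller: the first call passes
   first_entry = 1, later calls pass 0, and the loop stops when fib()
   returns DB_NOTFOUND or when out[] is full. Returns the number of
   tuples stored in out[]. */
int drain_fib(int k, struct result *out, int max)
{
    struct result t;
    int first = 1;
    int n = 0;

    while (n < max && fib(first, &t, k) == 0) {
        out[n++] = t;
        first = 0;
    }
    return n;
}
```

    For k = 10 the driver collects the tuples (1, 0), (2, 1), (3, 1), (4, 2), ..., (10, 34), the same rows the query select t.i, t.f from table (fib(10)) as t should stream. Calling drain_fib again restarts the sequence, because the first call of each run resets the static state.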

6.3 Built-in Aggregates and Functions

ATLaS supports the standard built-in aggregates: min(), max(), sum(), avg(), and count().

ATLaS supports the following built-in functions (more are being added):

  • srand(INT) : INT
    The srand() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by rand(). These sequences are repeatable by calling srand() with the same seed value. srand() always returns 0.
  • rand() : REAL
    The rand() function returns a pseudo-random real between 0 and 1. The following code sets the random seed to 10 and displays two random values.

        VALUES(srand(10));  
     
        VALUES(rand(), rand());

  • sqrt(REAL) : REAL
    The sqrt(x) function returns the non-negative square root of x.
  • timeofday() : CHAR(20)

    The timeofday() function gets the system's notion of the current time, expressed as the seconds and microseconds elapsed since 00:00 Coordinated Universal Time (UTC), January 1, 1970. It returns a string of the form x'y", where x is the seconds and y is the microseconds. This function is mainly used to measure the performance of ATLaS queries, as in the following example:

        INSERT INTO stdout VALUES(timeofday());  
     
        ... some ATLaS queries ...  
     
        INSERT INTO stdout VALUES(timeofday());
  • concat(CHAR(), CHAR()) : CHAR()

    The concat() function concatenates two strings.
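To turn the two timeofday() readings from the example above into an elapsed time, convert each to microseconds and subtract. The C sketch below does this under one assumption: that the returned string is exactly the seconds, an apostrophe, the microseconds, and a closing double quote (for example 1700000000'250000"), as the description of timeofday() suggests. The helper names parse_timeofday and elapsed_usec are hypothetical and not part of ATLaS.

```c
#include <stdio.h>

/* Parses a timeofday() string, ASSUMED to have the form  seconds'micros"
   (e.g. 1700000000'250000"), into microseconds since the epoch.
   Returns -1 if the string does not match that form. */
long long parse_timeofday(const char *s)
{
    long long sec, usec;

    if (sscanf(s, "%lld'%lld", &sec, &usec) != 2)
        return -1;
    return sec * 1000000LL + usec;
}

/* Elapsed time in microseconds between two timeofday() strings,
   or -1 if either string cannot be parsed. */
long long elapsed_usec(const char *start, const char *end)
{
    long long a = parse_timeofday(start);
    long long b = parse_timeofday(end);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

For example, readings of 100'900000" and 102'100000" are 1.2 seconds apart, so elapsed_usec returns 1200000.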