ioda fulltext engine

• Home

• Introduction

• joda Program

• jodad Server

• Project Page

Introduction to the Interface Documentation of the ioda Binaries

From the ioda project are four binaries available, all working on the same data format:

"joda" is a command line program which offers the complete functionality of ioda. This interfase is completely described here.
"jodad" is the server version of joda and offers the main functions. Please see jodad_interface.
"libjodafulltext.so" is a library which can be used native from a C-program or - with the aid of import modules - by other languages. Ready-to-go import moduls for Perl, Python and PHP can be found in the ioda project files. The library offers the same main functions as the jodad server. (please see jodafulltext.h in the source package for the interface.).
jodacgi is a CGI program for simple access to the fulltext indext hrough a web interface. It offers only retrieving functions.The published program is only a trunk to demonstrate how it works basically. No HTML-formatting is done and the format of the meta data is only a suggestion. Some documentation can be found in the source file jodacgi.pas.

Preliminary Remarks:

joda can index files from the file system and files coming from a master database, i.e. a SQL database. For the indexing of files joda can run standalone. It has functions to index multple functions recursively by one call from a directory tree. Optionally it can store some meta information with every file.

In the database config file (*.config) the parameter "useFileRef" decides which mode will be used. If set to a value of 1 (on) or 2 (on plus memory cache) joda will create a list (*.ref file) containing all names of the files which has been indexed. In this case joda itself assigns an ID value to every file. This id is stored with every (first occurence of a) word of the file and is used for all retrieving and deleting purposes.

In the other case - without file reference - no *.ref file will created and joda must get the ID for each record from the calling program. In common cases the ID is the primary or unique key of the database record to be archived.

Since release 1.2, "clone" config files are available. Using this mode, it is not longer necessary to create a config file for each database. In other words: if the options may be equal, one config file can be used for several databases. The syntax for using the clone mode is "realconfig:cloneconfig". I.e. if you have indexed several volumes of some data, like data2001...data2005, you can have one config "data.config" for all of them: "data:data2005".

Extended Features (datestamp and bitfilters):

In general joda stores the source ID of each words, the word position (in a word count value) and an additional, optional info byte. This needs 8 bytes for the first occurence of a word in a file and 2 bytes for all repeaters in the same file or record. With the parameter "useBigOccList" it is possible to store more meta data with every word. This needs more disk space and therefore reduces the performance a little. But in the case joda runs as a stand alone database (without a SQL master database) it could makes sense.

If using "bigOccList=10", joda stores a datestamp with every first word. This stamp is 16 bit value, internally counting days from Jan. 1st 1995.It enables joda for a datestamp filter while retrieving. So a datestamp range can given (from date ... to date).

If using "bigOccList=12", joda stores a datestamp like descibed above and additional 16 bits of information with each (first) word. So the formerly info word now gets into a double word value (32 bit). There are quite sophisticated bit oriented methods available to use this bits as filters for retrieving. Additional filter options, including regular expressions, are present in the mode with file reference lists.

Please remember that neither those complex bit operations nor storage and analysis of meta data and filenames are needed in many common cases. Especially if joda is running as a slave to a SQL database, the storing and retrieving functions are quite simple.

Packages to download:

ioda-1.3-src.tar.bz2                      Pascal, Perl, Python and PHP sources, 
                                          Makefile and C header file

ioda-1.3-bin.tar.bz2                      Binaries and the library (.so), compiled for Linux on i586

ioda-1.3-docs.tar.bz2	                  Documentation in HTML format (this pages)

ioda-1.3-mediawiki_demo.tar.bz2           PHP files showing a possible way of the integration 
                                          of joda into Wikipedia/Mediawiki 1)

ioda-1.3-samples.tar.bz2                  Examples for: config file, stoppword lists, 
                                          perl requester to the server jodad,
                                          server requests, archiving tool scripts

ioda-charsets.tar.bz2                     Charset tables for ISO-8859 databases handling UTF-8 queries

1) A complete ready-to-run example for using joda as Wikipedia search engine, using a fully indexed de-wikipedia from Oct 6 2005, is available for download from http://magnus.de/wikipedia/wikidemo.tar.bz2 (240 MB bz2 for Linux on i586).

• Introduction