The ioda project provides four binaries, all working on the same data format:
joda can index files from the file system as well as files coming from a master database, that is, an SQL database. For indexing files, joda can run standalone. With a single call it can recursively index all files in a directory tree. Optionally, it can store some meta information with every file.
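Recursive indexing is built into joda itself. The Python sketch below only illustrates the traversal step: it walks a directory tree and collects the words of each file. The helpers `walk_files` and `collect_words` and the simple tokenizer (lowercased `\w+` runs) are hypothetical; joda's real word splitting, stopword lists and charset handling are configured separately and will differ.

```python
import os
import re
from typing import Iterator, List, Tuple

# Hypothetical tokenizer: runs of word characters, lowercased.
WORD_RE = re.compile(r"\w+")


def walk_files(root: str) -> Iterator[str]:
    """Yield every file below root, recursing into subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def collect_words(root: str) -> List[Tuple[str, List[str]]]:
    """Return (path, words) for each file in the tree, in walk order."""
    result = []
    for path in walk_files(root):
        with open(path, encoding="utf-8", errors="replace") as fh:
            words = [w.lower() for w in WORD_RE.findall(fh.read())]
        result.append((path, words))
    return result
```

Each `(path, words)` pair corresponds to one file that an indexer would then hand to the index, together with an ID.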
In the database config file (*.config), the parameter "useFileRef" decides which mode is used. If it is set to 1 (on) or 2 (on plus memory cache), joda creates a list (*.ref file) containing the names of all files that have been indexed. In this mode joda itself assigns an ID to every file. This ID is stored with (the first occurrence of) every word of the file and is used for all retrieval and deletion operations.
Otherwise, without a file reference list, no *.ref file is created and joda must get the ID for each record from the calling program. Usually this ID is the primary or unique key of the database record to be archived.
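The difference between the two ID modes can be modeled in a few lines of Python. `ToyIndex` is a hypothetical in-memory stand-in, not joda's storage format or API. It assumes that in file reference mode IDs are assigned sequentially starting at 1, which the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyIndex:
    """Toy model of joda's two ID modes (hypothetical, not joda's API).

    use_file_ref=True: the index keeps its own reference list (like a
    *.ref file) and assigns the IDs itself.
    use_file_ref=False: the caller passes the ID, e.g. a SQL primary key.
    """
    use_file_ref: bool
    ref_list: List[str] = field(default_factory=list)
    # word -> {record ID -> word positions within that record}
    postings: Dict[str, Dict[int, List[int]]] = field(default_factory=dict)

    def add(self, words: List[str], name: Optional[str] = None,
            record_id: Optional[int] = None) -> int:
        if self.use_file_ref:
            if name is None:
                raise ValueError("file reference mode needs a file name")
            self.ref_list.append(name)
            record_id = len(self.ref_list)  # assumed: sequential IDs from 1
        elif record_id is None:
            raise ValueError("without file reference the caller must supply the ID")
        for pos, word in enumerate(words):
            self.postings.setdefault(word, {}).setdefault(record_id, []).append(pos)
        return record_id

    def lookup(self, word: str) -> List[int]:
        """Return the IDs of all records containing word."""
        return sorted(self.postings.get(word, {}))
```

In file reference mode a hit's ID can be mapped back to a file name through `ref_list`; in the other mode the ID is handed back to the calling program, which looks up the record in its own SQL database.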
Since release 1.2, "clone" config files are available. In this mode it is no longer necessary to create a config file for each database: if the options are the same, one config file can be used for several databases. The syntax for the clone mode is "realconfig:cloneconfig". For example, if you have indexed several volumes of data, like data2001...data2005, one config file "data.config" can serve all of them, addressed as e.g. "data:data2005".
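A small parser makes the "realconfig:cloneconfig" convention concrete. This sketch assumes, following the "data:data2005" example, that the part before the colon names the shared *.config file and the part after it names the actual database; `resolve_database` and `DatabaseSpec` are hypothetical names, not part of joda.

```python
from typing import NamedTuple


class DatabaseSpec(NamedTuple):
    config_file: str  # shared configuration, e.g. "data.config"
    database: str     # the actual database, e.g. "data2005"


def resolve_database(spec: str) -> DatabaseSpec:
    """Split a hypothetical 'realconfig:cloneconfig' argument.

    Without a colon, the database uses its own config file (classic mode).
    """
    real, sep, clone = spec.partition(":")
    if not real or (sep and not clone):
        raise ValueError(f"invalid database spec: {spec!r}")
    database = clone if sep else real
    return DatabaseSpec(config_file=real + ".config", database=database)
```

With this reading, "data:data2001" through "data:data2005" all resolve to the same "data.config", while a plain "data2001" would still need its own "data2001.config".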
In general, joda stores the source ID of each word, the word position (as a word count) and an additional, optional info byte. This needs 8 bytes for the first occurrence of a word in a file and 2 bytes for each repetition in the same file or record. With the parameter "useBigOccList" it is possible to store more meta data with every word. This needs more disk space and therefore slightly reduces performance, but it can make sense when joda runs as a standalone database (without an SQL master database).
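The 8-byte/2-byte rule gives a quick estimate of the occurrence-list size for one file in the default format. This is a back-of-the-envelope sketch using only the two figures above; it ignores the word dictionary, *.ref files, stopword filtering and the larger "useBigOccList" layouts, whose sizes are not given here.

```python
from collections import Counter
from typing import Iterable

FIRST_OCCURRENCE_BYTES = 8  # per distinct word per file/record
REPEAT_BYTES = 2            # per further occurrence in the same file/record


def occurrence_bytes(words: Iterable[str]) -> int:
    """Estimate occurrence-list bytes for one file (default format only)."""
    counts = Counter(words)
    distinct = len(counts)
    repeats = sum(counts.values()) - distinct
    return distinct * FIRST_OCCURRENCE_BYTES + repeats * REPEAT_BYTES
```

For "the cat saw the other cat" there are 4 distinct words and 2 repetitions, so the estimate is 4 × 8 + 2 × 2 = 36 bytes. Texts with many repeated words therefore cost much less per word than texts with mostly unique words.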
With "bigOccList=10", joda stores a datestamp with every first occurrence of a word. This stamp is a 16-bit value, internally counting days from Jan. 1st, 1995. It lets joda filter by date during retrieval, so a date range (from date ... to date) can be given.
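Encoding and filtering such a datestamp is plain day arithmetic. The sketch below assumes that Jan. 1st, 1995 itself is day 0 and that the 16-bit value is unsigned; the function names are hypothetical and not joda's API.

```python
from datetime import date, timedelta

EPOCH = date(1995, 1, 1)  # assumed to be day 0 of the datestamp


def encode_datestamp(d: date) -> int:
    """Days since Jan 1st 1995, as an unsigned 16-bit value."""
    days = (d - EPOCH).days
    if not 0 <= days <= 0xFFFF:
        raise ValueError(f"{d} is outside the 16-bit datestamp range")
    return days


def decode_datestamp(stamp: int) -> date:
    return EPOCH + timedelta(days=stamp)


def in_range(stamp: int, start: date, end: date) -> bool:
    """Datestamp filter: keep hits whose stamp lies in [start, end]."""
    return encode_datestamp(start) <= stamp <= encode_datestamp(end)
```

Because the filter compares two small integers per hit, a date range check adds very little cost to retrieval; 16 bits cover roughly 179 years of days from the 1995 starting point.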
With "bigOccList=12", joda stores a datestamp as described above plus an additional 16 bits of information with each (first) word, so the former info word grows into a double word (32 bits). Quite sophisticated bit-oriented methods are available to use these bits as filters during retrieval. Additional filter options, including regular expressions, are available in the mode with file reference lists.
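The exact bit layout of the 32-bit value and joda's bit filter operations are not documented on this page. The sketch below therefore assumes one plausible layout (datestamp in the high 16 bits, info bits in the low 16 bits) and shows two common mask tests, "all bits set" and "any bit set"; all names are hypothetical.

```python
from typing import Tuple


def pack(datestamp: int, info: int) -> int:
    """Combine a 16-bit datestamp and 16 info bits into one 32-bit word.

    Assumed layout: datestamp in the high half, info in the low half.
    """
    if not (0 <= datestamp <= 0xFFFF and 0 <= info <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (datestamp << 16) | info


def unpack(word: int) -> Tuple[int, int]:
    """Return (datestamp, info) from a packed 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def match_all(info: int, mask: int) -> bool:
    """True if every bit set in mask is also set in info."""
    return (info & mask) == mask


def match_any(info: int, mask: int) -> bool:
    """True if at least one bit of mask is set in info."""
    return (info & mask) != 0
```

If, for example, an application used bit 3 for "has images" and bit 1 for "is a stub" (purely illustrative meanings), `match_all(info, 0b1000)` would select records with images and `match_any(info, 0b1010)` records with either property.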
Keep in mind that many common cases need neither these complex bit operations nor the storage and analysis of meta data and filenames. Especially when joda runs as a slave to an SQL database, the storing and retrieving functions are quite simple.
ioda-1.3-src.tar.bz2               Pascal, Perl, Python and PHP sources, Makefile and C header file
ioda-1.3-bin.tar.bz2               Binaries and the library (.so), compiled for Linux on i586
ioda-1.3-docs.tar.bz2              Documentation in HTML format (these pages)
ioda-1.3-mediawiki_demo.tar.bz2    PHP files showing a possible way to integrate joda into Wikipedia/MediaWiki 1)
ioda-1.3-samples.tar.bz2           Examples: config file, stopword lists, Perl requester for the jodad server, server requests, archiving tool scripts
ioda-charsets.tar.bz2              Charset tables for ISO-8859 databases handling UTF-8 queries
1) A complete, ready-to-run example of joda as a Wikipedia search engine, using a fully indexed German Wikipedia (de) from Oct 6, 2005, is available for download from http://magnus.de/wikipedia/wikidemo.tar.bz2 (240 MB, bz2, for Linux on i586).