Overview

All TextMaven applications are configured by a single configuration file. It specifies the available dictionaries, the database server, and the available output writers. By default, the file config.xml is used. A different configuration file can be specified by setting the Java system property textmaven.configuration, which is done with the -D switch of the java command.

For example:

java -Xmx256m -Dtextmaven.configuration=your-config-file.xml ...
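
On the Java side, a lookup of this kind is usually a single call to System.getProperty with a fallback value. The following is only a sketch of that pattern, not TextMaven's actual code; the class name ConfigLocation is made up for illustration, while the property name and the default file name come from the text above.

```java
// Hypothetical helper, not part of TextMaven: shows how a -D property
// with a default value is typically resolved.
class ConfigLocation {
    // Property name and default file name as documented above.
    static final String PROPERTY = "textmaven.configuration";
    static final String DEFAULT_FILE = "config.xml";

    /** Returns the configured file name, or the default when the property is unset. */
    static String resolve() {
        return System.getProperty(PROPERTY, DEFAULT_FILE);
    }

    public static void main(String[] args) {
        System.out.println("Using configuration file: " + resolve());
    }
}
```

Started without the -D switch, this prints config.xml as the file name; started with -Dtextmaven.configuration=your-config-file.xml, it prints that name instead.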

Elements and Structure

A sample configuration file is shown below. Its elements are explained step by step in the following sections.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TMConfig SYSTEM "file:./config.dtd">

<TMConfig> 
    <!-- Separator by which CSV columns are separated. -->
    <separator></separator>
    
    <!-- Class driving the translation process -->
    <processor classname="textmaven.application.translator.Processor"/>
   
    
    <!-- Specifies all available servers -->
    <servers>
        <server classname="textmaven.dictionaries.Server" 
                id="1">
            <host>localhost</host>
            <jdbcDSN>jdbc:mysql</jdbcDSN>
            <jdbcDriver>com.mysql.jdbc.Driver</jdbcDriver>
            <dbName>lexbase</dbName>
            <user>textmaven</user>
            <password>w3(2wgm)69ghaw!%</password>
            <port>3306</port>
            <stopCommand>net stop mysql</stopCommand>
            <startCommand>net start mysql</startCommand>
            <description>MySQL database server</description>
            <sqlFactory>textmaven.dictionaries.MySQLStatementFactory</sqlFactory>
        </server>
    </servers>
 
    <!-- Dictionary section. Each dictionary listed separately -->
    <dictionaries>
    	
        <!-- Default dictionary to use -->
        <default>seqAll</default>

        <fileDictionary classname="textmaven.dictionaries.DictOrgDictionary" id="wn">
            <description>Webster</description>
            <compressed>true</compressed>
            <fileName>%textmaven.home%\dictdb\wn</fileName>
            <language>en</language>
        </fileDictionary>

        <xmlDictionary classname="textmaven.dictionaries.XMLDictionary" id="gcide_xml">
            <description>Gcide as XML</description>
            <fileName>%textmaven.home%\dictdb\gcide.xml</fileName>
            <language>en</language>
        </xmlDictionary>

        <dbDictionary classname="textmaven.dictionaries.DBDictionary" 
                      id="gcide_db">
            <language>en</language>
            <serverRef>1</serverRef>
            <tableName>gcide</tableName>
            <autoCommit>false</autoCommit>
            <description>Gcide</description>
            <indexFile>ndx_gcide_db.txt</indexFile>
        </dbDictionary>

        <csvDictionary classname="textmaven.dictionaries.CSVDictionary" id="wn_cvs">
            <description>Webster as CSV</description>
            <fileName>wn.csv</fileName>
            <separator>°</separator>
            <entryTerminator>$</entryTerminator>
            <language>en</language>
        </csvDictionary>

	

        <!-- Composite dictionaries define specific search strategies. Two strategies
             are supported: 
                 sequential - dictionaries are searched in the specified order until a word was found
                 union - all dictionaries are searched and all matches are returned.
        -->
        <compositeDictionary classname="textmaven.dictionaries.DictionarySequence"
                             id="seqAll">
            <description>Default composite dictionary sequential access</description>
            <language>en</language>
            <dictionary>wn</dictionary>
            <dictionary>gcide</dictionary>
            <dictionary>en_ge</dictionary>
        </compositeDictionary>

        
    </dictionaries>
    
    <!-- Writers format their translations -->
    <writers>
        <writer classname="textmaven.application.translator.writer.BrowserTranslationWriter"
             id="browser"/>
        <writer .../> 

    </writers>
    
    <!-- Stemmers will be associated with the composite dictionary, if languages are matching -->
    <stemmers>
        <stemmer classname="textmaven.stemmer.Stemmer_en">
            <language>en</language>
        </stemmer>
    </stemmers>
</TMConfig>
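
To check which dictionary identifications a file of this shape declares, the JDK's DOM parser is sufficient. The sketch below is an illustration only and makes no claim about how TextMaven reads its configuration; the class DictionaryIds is hypothetical, and the embedded XML is a shortened copy of the sample without the DOCTYPE line so that no DTD file is needed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of TextMaven.
class DictionaryIds {
    /** Collects the id attribute of every direct child of <dictionaries> that has one. */
    static List<String> list(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Configuration is not parseable", e);
        }
        List<String> ids = new ArrayList<>();
        NodeList sections = doc.getElementsByTagName("dictionaries");
        for (int i = 0; i < sections.getLength(); i++) {
            NodeList children = sections.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // <default> carries no id attribute and is skipped.
                if (child instanceof Element && ((Element) child).hasAttribute("id")) {
                    ids.add(((Element) child).getAttribute("id"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<TMConfig><dictionaries>"
                + "<default>seqAll</default>"
                + "<fileDictionary classname=\"textmaven.dictionaries.DictOrgDictionary\" id=\"wn\"/>"
                + "<compositeDictionary classname=\"textmaven.dictionaries.DictionarySequence\" id=\"seqAll\"/>"
                + "</dictionaries></TMConfig>";
        System.out.println(list(xml)); // prints [wn, seqAll]
    }
}
```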

General parameters

<TMConfig> 
    <!-- Separator by which CSV columns are separated. -->
    <separator></separator> (1)
    
    <!-- Class driving the translation process -->
    <processor classname="textmaven.application.translator.Processor"/> (2)

    ...
</TMConfig>

(1) Defines the separator used for all CSV files.
(2) Defines the class that implements the IProcessor interface and is the main class driving the Translator. The class is specified by the attribute classname.

Server configuration

<TMConfig> 

    ...
    
    <!-- Specifies all available servers -->
    <servers>
        <server classname="textmaven.dictionaries.Server" (1)
                id="1" (2)>
            <host>localhost</host> (3)
            <jdbcDSN>jdbc:mysql</jdbcDSN> (4)
            <jdbcDriver>com.mysql.jdbc.Driver</jdbcDriver> (5)
            <dbName>lexbase</dbName> (6)
            <user>textmaven</user> (7)
            <password>w3(2wgm)69ghaw!%</password> (8)
            <port>3306</port> (9)
            
            <stopCommand>net stop mysql</stopCommand> (10)
            <startCommand>net start mysql</startCommand> (11)
            <description>MySQL database server</description>
            <sqlFactory>textmaven.dictionaries.MySQLStatementFactory</sqlFactory> (12)
        </server>
    </servers>

    ...
</TMConfig>

(1) Specifies the name of the class implementing this server.
(2) Unique identification used to reference this server, for example from the serverRef tag of a database dictionary.
(3) Name of the host on which the server is running.
(4) JDBC data source name.
(5) Class name of the JDBC driver.
(6) Name of the database.
(7) Database user name. This name is defined by the script setupdb.bat.
(8) Password. Defined by the script setupdb.bat.
(9) Port on which the database server listens. 3306 is the default MySQL server port.
(10) Specifies the command that is executed to stop the server.
(11) Specifies the command that is executed to start the server.
(12) Factory class for the database used. This class has to implement the interface textmaven.dictionaries.ISQLStatementFactory.
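
How TextMaven combines these values into a JDBC connection URL is not documented here. Assuming it follows the usual MySQL layout of dsn://host:port/database, the settings above would combine as in the following sketch. The class JdbcUrl and its method are hypothetical and exist only to illustrate how the tags relate to each other.

```java
// Hypothetical helper, not part of TextMaven. The URL layout is an
// assumption based on the common MySQL JDBC URL format.
class JdbcUrl {
    /** Joins the server settings into a URL of the form dsn://host:port/dbName. */
    static String build(String jdbcDsn, String host, int port, String dbName) {
        return jdbcDsn + "://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        // Values taken from the sample <server> element.
        System.out.println(build("jdbc:mysql", "localhost", 3306, "lexbase"));
        // prints jdbc:mysql://localhost:3306/lexbase
    }
}
```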

Dictionary configuration

<TMConfig> 

    ...
    
    <dictionaries>
    	
        <default>seqAll</default> (1)

        <fileDictionary classname="textmaven.dictionaries.DictOrgDictionary" id="wn"> (2)
            <description>Webster</description>
            <compressed>true</compressed>
            <fileName>%textmaven.home%\dictdb\wn</fileName>
            <language>en</language>
        </fileDictionary>

        <xmlDictionary classname="textmaven.dictionaries.XMLDictionary" id="gcide_xml"> (3)
            <description>Gcide as XML</description>
            <fileName>%textmaven.home%\dictdb\gcide.xml</fileName>
            <language>en</language>
        </xmlDictionary>

        <dbDictionary classname="textmaven.dictionaries.DBDictionary" 
                      id="gcide_db"> (4)
            <language>en</language>
            <serverRef>1</serverRef>
            <tableName>gcide</tableName>
            <autoCommit>false</autoCommit>
            <description>Gcide</description>
            <indexFile>ndx_gcide_db.txt</indexFile>
        </dbDictionary>

        <csvDictionary classname="textmaven.dictionaries.CSVDictionary" id="wn_cvs"> (5)
            <description>Webster as CSV</description>
            <fileName>wn.csv</fileName>
            <separator>°</separator>
            <entryTerminator>$</entryTerminator>
            <language>en</language>
        </csvDictionary>

        <!-- Composite dictionaries define specific search strategies. Two strategies
             are supported: 
                 sequential - dictionaries are searched in the specified order until a word is found
                 union - all dictionaries are searched and all matches are returned.
        -->
        <compositeDictionary classname="textmaven.dictionaries.DictionarySequence"
                             id="seqAll"> (6)
            <description>Default composite dictionary sequential access</description>
            <language>en</language>
            <dictionary>wn</dictionary>
            <dictionary>gcide</dictionary>
            <dictionary>en_ge</dictionary>
        </compositeDictionary>

    </dictionaries>

    ...
</TMConfig>

Common attributes and subtags:

Attribute id: Specifies a unique dictionary identification which can be used to reference this dictionary.
Attribute classname: Specifies the class implementing the dictionary. All dictionaries need to implement the interface IDictionary.
Tag description: Provides some details about the dictionary; it is used in places where this more verbose form is suitable.
Tag language: Specifies the language of the lookup keys. The specified language determines which stemmer is used.

(1) Specifies the identification of the dictionary to be used by default.
(2) Dictionaries provided by www.dict.org. These dictionaries are read-only, and access to them is usually quite slow. The tag fileName specifies the dictionary file name without any extension, while the tag compressed is usually set to true, indicating that the dictionary is compressed. The full file name is built by appending the extension dict and, only if the dictionary is compressed, the extension dz. For each dictionary, an index is assumed to exist in the same directory. The index file name is built from the specified file name and the extension index.
(3) Dictionaries in XML format. XML dictionaries can also be written. However, since they are read completely into memory and access is rather slow, they are better suited for database-independent transport.
(4) Database-backed dictionary. This is the fastest dictionary type and it is also writable. The tag serverRef specifies the identification of the server hosting the dictionary. The tag tableName specifies the name of the table storing the dictionary. To accelerate the stemming process, an index file can be specified with the tag indexFile; it is loaded into memory. It is strongly recommended to always specify an index file. If autoCommit is set to false, write operations are not committed until the dictionary is closed. Setting this option to false speeds up converting a dictionary into a database-backed dictionary.
(5) Dictionary in comma-separated (CSV) format. CSV dictionaries are read-only and provide only sequential access. Like XML-backed dictionaries, they are typically used for database-independent transport. The separator can be specified in the tag separator. It is necessary to ensure that this separator character does not occur anywhere else in the file. Because an entry can span more than one line, each entry is ended by the terminator specified in the tag entryTerminator.
(6) A composite dictionary is a set of dictionaries. It can be used like any other dictionary, except that it is not writable. Usually, a composite dictionary is used as the default dictionary. Two types of composite dictionaries exist: sequential dictionaries, implemented by the class textmaven.dictionaries.DictionarySequence, and union dictionaries, implemented by the class textmaven.dictionaries.DictionaryUnion. A sequential composite searches its dictionaries in the order specified and stops as soon as a translation for the key is found. A union composite searches all contained dictionaries and returns all translations found.
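
The difference between the two composite strategies can be sketched in a few lines of Java. This is not the code of DictionarySequence or DictionaryUnion. The interface Lookup is a hypothetical stand-in for IDictionary, whose real methods are not shown in this document, and all other names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of TextMaven.
class CompositeLookup {
    /** Stand-in for a dictionary: maps a key to its translations (empty if unknown). */
    interface Lookup {
        List<String> find(String key);
    }

    /** Sequential strategy: return the result of the first dictionary that knows the key. */
    static List<String> sequential(List<Lookup> dictionaries, String key) {
        for (Lookup d : dictionaries) {
            List<String> hits = d.find(key);
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return Collections.emptyList();
    }

    /** Union strategy: query every dictionary and concatenate all results in order. */
    static List<String> union(List<Lookup> dictionaries, String key) {
        List<String> all = new ArrayList<>();
        for (Lookup d : dictionaries) {
            all.addAll(d.find(key));
        }
        return all;
    }

    /** Wraps a map as a Lookup; unknown keys yield an empty list. */
    static Lookup of(Map<String, List<String>> entries) {
        return key -> entries.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        Lookup first = of(Collections.singletonMap("house", Arrays.asList("a building")));
        Lookup second = of(Collections.singletonMap("house", Arrays.asList("a dwelling")));
        List<Lookup> both = Arrays.asList(first, second);
        System.out.println(sequential(both, "house")); // prints [a building]
        System.out.println(union(both, "house"));      // prints [a building, a dwelling]
    }
}
```

With the sequential strategy the order of the dictionary tags matters, because the first hit wins. With the union strategy the order only affects the order of the returned translations.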

Writer

<TMConfig> 

    ...
    
    <!-- Writers format their translations -->
    <writers>
        <writer classname="textmaven.application.translator.writer.BrowserTranslationWriter"
             id="browser"/> (1)
        <writer .../> 

    </writers>
    
    ...
</TMConfig>

(1) A writer is specified by its classname and a unique writer identification. Through this identification, the writer can be referenced on the command line of the translator application. Writers implement the interface textmaven.application.writer.ITranslationWriter.

Stemmer

<TMConfig> 

    ...
    
    <stemmers>
        <stemmer classname="textmaven.stemmer.Stemmer_en"> (1)
            <language>en</language>
        </stemmer>
    </stemmers>
    
    ...
</TMConfig>

(1) Specifies the stemmer class, which implements the interface textmaven.stemmer.IStemmer. Exactly one stemmer is required for each language. Dictionaries use the stemmer matching their language specification.
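
The one-stemmer-per-language rule amounts to a map keyed by language code. The sketch below shows one way such a rule could be enforced. It is an assumption for illustration, not TextMaven's implementation; the class StemmerRegistry and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of TextMaven.
class StemmerRegistry {
    // Maps a language code such as "en" to the class name of its stemmer.
    private final Map<String, String> byLanguage = new HashMap<>();

    /** Registers a stemmer class for a language; a second stemmer for the same language is rejected. */
    void register(String language, String classname) {
        if (byLanguage.putIfAbsent(language, classname) != null) {
            throw new IllegalStateException("Duplicate stemmer for language " + language);
        }
    }

    /** Returns the stemmer class name for a dictionary's language, or null if none is registered. */
    String forLanguage(String language) {
        return byLanguage.get(language);
    }

    public static void main(String[] args) {
        StemmerRegistry registry = new StemmerRegistry();
        registry.register("en", "textmaven.stemmer.Stemmer_en");
        // A dictionary with <language>en</language> would be paired with this class.
        System.out.println(registry.forLanguage("en"));
    }
}
```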