All TextMaven applications are configured by a single configuration file. It specifies the available dictionaries, the database server, and the available output writers. By default, the file config.xml is used. A different configuration file can be specified by setting the Java system property textmaven.configuration. This is done with the -D switch of the java command.
For example:
java -Xmx256m -Dtextmaven.configuration=your-config-file.xml ...
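As a rough illustration of how an application could resolve the configuration file name from this property, the following sketch reads textmaven.configuration and falls back to config.xml. The class name ConfigLocator and its method are hypothetical and not part of TextMaven; only the property name and the default file name come from the text above.

```java
public class ConfigLocator {
    // Property name and default file name as stated in the documentation.
    public static final String PROPERTY = "textmaven.configuration";
    public static final String DEFAULT_FILE = "config.xml";

    // Returns the value passed via -Dtextmaven.configuration=...,
    // or the default file name when the property is not set.
    public static String resolve() {
        return System.getProperty(PROPERTY, DEFAULT_FILE);
    }

    public static void main(String[] args) {
        System.out.println("Using configuration file: " + resolve());
    }
}
```

Started without the -D switch, the sketch reports config.xml; started with -Dtextmaven.configuration=your-config-file.xml, it reports that file name instead.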
A sample configuration file is shown below. The tags are then explained step by step.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TMConfig SYSTEM "file:./config.dtd">
<TMConfig>
  <!-- Separator by which CSV columns are separated. -->
  <separator></separator>
  <!-- Class driving the translation process -->
  <processor classname="textmaven.application.translator.Processor"/>
  <!-- Specifies all available servers -->
  <servers>
    <server classname="textmaven.dictionaries.Server" id="1">
      <host>localhost</host>
      <jdbcDSN>jdbc:mysql</jdbcDSN>
      <jdbcDriver>com.mysql.jdbc.Driver</jdbcDriver>
      <dbName>lexbase</dbName>
      <user>textmaven</user>
      <password>w3(2wgm)69ghaw!%</password>
      <port>3306</port>
      <stopCommand>net stop mysql</stopCommand>
      <startCommand>net start mysql</startCommand>
      <description>MySQL database server</description>
      <sqlFactory>textmaven.dictionaries.MySQLStatementFactory</sqlFactory>
    </server>
  </servers>
  <!-- Dictionary section. Each dictionary listed separately -->
  <dictionaries>
    <!-- Default dictionary to use -->
    <default>seqAll</default>
    <fileDictionary classname="textmaven.dictionaries.DictOrgDictionary" id="wn">
      <description>Webster</description>
      <compressed>true</compressed>
      <fileName>%textmaven.home%\dictdb\wn</fileName>
      <language>en</language>
    </fileDictionary>
    <xmlDictionary classname="textmaven.dictionaries.XMLDictionary" id="gcide_xml">
      <description>Gcide as XML</description>
      <fileName>%textmaven.home%\dictdb\gcide.xml</fileName>
      <language>en</language>
    </xmlDictionary>
    <dbDictionary classname="textmaven.dictionaries.DBDictionary" id="gcide_db">
      <language>en</language>
      <serverRef>1</serverRef>
      <tableName>gcide</tableName>
      <autoCommit>false</autoCommit>
      <description>Gcide</description>
      <indexFile>ndx_gcide_db.txt</indexFile>
    </dbDictionary>
    <csvDictionary classname="textmaven.dictionaries.CSVDictionary" id="wn_cvs">
      <description>Webster as CSV</description>
      <fileName>wn.csv</fileName>
      <separator>°</separator>
      <entryTerminator>$</entryTerminator>
      <language>en</language>
    </csvDictionary>
    <!-- Composite dictionaries define specific search strategies.
         Two strategies are supported:
           sequential - dictionaries are searched in the specified order
                        until a word was found
           union      - all dictionaries are searched and all matches
                        are returned. -->
    <compositeDictionary classname="textmaven.dictionaries.DictionarySequence" id="seqAll">
      <description>Default composite dictionary sequential access</description>
      <language>en</language>
      <dictionary>wn</dictionary>
      <dictionary>gcide</dictionary>
      <dictionary>en_ge</dictionary>
    </compositeDictionary>
  </dictionaries>
  <!-- Writers format their translations -->
  <writers>
    <writer classname="textmaven.application.translator.writer.BrowserTranslationWriter" id="browser"/>
    <writer .../>
  </writers>
  <!-- Stemmers will be associated with the composite dictionary, if languages are matching -->
  <stemmers>
    <stemmer classname="textmaven.stemmer.Stemmer_en">
      <language>en</language>
    </stemmer>
  </stemmers>
</TMConfig>
<TMConfig>
  <!-- Separator by which CSV columns are separated. -->
  <separator></separator>
  <!-- Class driving the translation process -->
  <processor classname="textmaven.application.translator.Processor"/>
  ...
</TMConfig>
<TMConfig>
  ...
  <!-- Specifies all available servers -->
  <servers>
    <server classname="textmaven.dictionaries.Server" id="1">
      <host>localhost</host>
      <jdbcDSN>jdbc:mysql</jdbcDSN>
      <jdbcDriver>com.mysql.jdbc.Driver</jdbcDriver>
      <dbName>lexbase</dbName>
      <user>textmaven</user>
      <password>w3(2wgm)69ghaw!%</password>
      <port>3306</port>
      <stopCommand>net stop mysql</stopCommand>
      <startCommand>net start mysql</startCommand>
      <description>MySQL database server</description>
      <sqlFactory>textmaven.dictionaries.MySQLStatementFactory</sqlFactory>
    </server>
  </servers>
  ...
</TMConfig>
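How TextMaven combines the server fields into a connection string is not described here. Purely as an assumption, the sketch below joins jdbcDSN, host, port and dbName into the usual MySQL JDBC URL form; the class JdbcUrlSketch and its method are hypothetical.

```java
public class JdbcUrlSketch {
    // Assumption: the four configuration values are joined into the
    // standard MySQL JDBC URL layout  <dsn>://<host>:<port>/<database>.
    public static String url(String jdbcDsn, String host, int port, String dbName) {
        return jdbcDsn + "://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        // Values taken from the sample server entry.
        System.out.println(url("jdbc:mysql", "localhost", 3306, "lexbase"));
        // prints jdbc:mysql://localhost:3306/lexbase
    }
}
```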
<TMConfig>
  ...
  <dictionaries>
    <default>seqAll</default>
    <fileDictionary classname="textmaven.dictionaries.DictOrgDictionary" id="wn">
      <description>Webster</description>
      <compressed>true</compressed>
      <fileName>%textmaven.home%\dictdb\wn</fileName>
      <language>en</language>
    </fileDictionary>
    <xmlDictionary classname="textmaven.dictionaries.XMLDictionary" id="gcide_xml">
      <description>Gcide as XML</description>
      <fileName>%textmaven.home%\dictdb\gcide.xml</fileName>
      <language>en</language>
    </xmlDictionary>
    <dbDictionary classname="textmaven.dictionaries.DBDictionary" id="gcide_db">
      <language>en</language>
      <serverRef>1</serverRef>
      <tableName>gcide</tableName>
      <autoCommit>false</autoCommit>
      <description>Gcide</description>
      <indexFile>ndx_gcide_db.txt</indexFile>
    </dbDictionary>
    <csvDictionary classname="textmaven.dictionaries.CSVDictionary" id="wn_cvs">
      <description>Webster as CSV</description>
      <fileName>wn.csv</fileName>
      <separator>°</separator>
      <entryTerminator>$</entryTerminator>
      <language>en</language>
    </csvDictionary>
    <!-- Composite dictionaries define specific search strategies.
         Two strategies are supported:
           sequential - dictionaries are searched in the specified order
                        until a word was found
           union      - all dictionaries are searched and all matches
                        are returned. -->
    <compositeDictionary classname="textmaven.dictionaries.DictionarySequence" id="seqAll">
      <description>Default composite dictionary sequential access</description>
      <language>en</language>
      <dictionary>wn</dictionary>
      <dictionary>gcide</dictionary>
      <dictionary>en_ge</dictionary>
    </compositeDictionary>
  </dictionaries>
  ...
</TMConfig>
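The csvDictionary entry in the sample uses a separator character and an entry terminator so that a single entry may span several lines. The following sketch shows one way such a file could be split into entries and columns. It is not TextMaven's parser: the class CsvEntrySketch, the two-column layout (key, definition) and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CsvEntrySketch {
    // Splits the raw file content into entries at the terminator and
    // then splits each entry into columns at the separator.
    // Line breaks inside an entry are preserved.
    public static List<String[]> parse(String content, String separator, String terminator) {
        List<String[]> entries = new ArrayList<>();
        for (String raw : content.split(Pattern.quote(terminator))) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip the whitespace left after the last terminator
            }
            entries.add(entry.split(Pattern.quote(separator), -1));
        }
        return entries;
    }

    public static void main(String[] args) {
        String sep = "\u00B0"; // the degree sign used as separator in the sample
        String content = "cat" + sep + "a small domesticated\ncarnivorous mammal$\n"
                       + "dog" + sep + "a domesticated canine$\n";
        for (String[] columns : parse(content, sep, "$")) {
            System.out.println(columns[0] + " -> " + columns[1].replace('\n', ' '));
        }
    }
}
```

The sketch relies on the separator and the terminator never occurring inside the data itself, which is why both characters have to be chosen carefully.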
Common attributes and subtags:
Attribute id | Specifies a unique identifier by which this dictionary can be referenced. |
Attribute classname | Specifies the class implementing the dictionary. All dictionaries need to implement the interface IDictionary. |
Tag description | Provides a short description of the dictionary, used wherever a more verbose form than the identifier is suitable. |
Tag language | Specifies the language in which the lookup keys are specified. The specified language determines which stemmer is used. |
Tag default | Specifies the identifier of the dictionary to be used by default. |
Tag fileDictionary | Dictionaries provided by www.dict.org. These dictionaries are read-only, and access to them is usually quite slow. The tag fileName specifies the dictionary file name without any extension, while the tag compressed is usually set to true, indicating that the dictionary is compressed. The full file name is built by appending the extension dict and, only if the dictionary is compressed, the extension dz. For each dictionary, an index is assumed to exist in the same directory. The index file name is built from the specified file name and the extension index. |
Tag xmlDictionary | Dictionaries stored in XML format. XML dictionaries are also writable. However, since they are read completely into memory and access is rather slow, they are better suited for database-independent transport. |
Tag dbDictionary | Database-backed dictionary. This is the fastest dictionary type and is also writable. The tag serverRef specifies the identifier of the server hosting the dictionary; tableName specifies the name of the table storing the dictionary. To accelerate the stemming process, an index file can be specified with indexFile; it is loaded into memory. It is strongly recommended to always specify an index file. If autoCommit is set to false, write operations are not committed until the dictionary is closed. Setting this option to false speeds up converting a dictionary into a database-backed dictionary. |
Tag csvDictionary | Dictionary in comma-separated format. CSV dictionaries are read-only and provide only sequential access. Like XML dictionaries, they are typically used for database-independent transport. The separator can be specified in the tag separator; this character must not occur anywhere else in the file. Because an entry can span more than one line, each entry ends with the terminator specified by the tag entryTerminator. |
Tag compositeDictionary | A composite dictionary is a set of dictionaries. It can be used like any other dictionary, except that it is not writable. Usually, a composite dictionary is used as the default dictionary. Two types exist: sequential composites, implemented by the class textmaven.dictionaries.DictionarySequence, and union composites, implemented by the class textmaven.dictionaries.DictionaryUnion. A sequential composite searches its dictionaries in the order specified and stops as soon as a translation for the key is found. A union composite searches all contained dictionaries and returns all translations found. |
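The difference between the two composite strategies can be sketched in a few lines. The code below does not use TextMaven's IDictionary interface, whose methods are not shown in this document; the Lookup interface, the helper of and the sample entries are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CompositeLookupSketch {
    // Hypothetical stand-in for a dictionary: returns all translations
    // for a key, or an empty list if the key is unknown.
    public interface Lookup {
        List<String> find(String key);
    }

    // Sequential strategy: ask the dictionaries in the given order and
    // stop at the first one that knows the key.
    public static List<String> sequential(List<Lookup> dictionaries, String key) {
        for (Lookup dictionary : dictionaries) {
            List<String> hits = dictionary.find(key);
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return List.of();
    }

    // Union strategy: ask every dictionary and collect all translations.
    public static List<String> union(List<Lookup> dictionaries, String key) {
        List<String> all = new ArrayList<>();
        for (Lookup dictionary : dictionaries) {
            all.addAll(dictionary.find(key));
        }
        return all;
    }

    // Wraps a simple key/translation map as a Lookup.
    public static Lookup of(Map<String, String> entries) {
        return key -> entries.containsKey(key) ? List.of(entries.get(key)) : List.of();
    }

    public static void main(String[] args) {
        Lookup first = of(Map.of("house", "a building for living in"));
        Lookup second = of(Map.of("house", "a dwelling", "tree", "a woody plant"));
        List<Lookup> both = List.of(first, second);

        System.out.println(sequential(both, "house")); // [a building for living in]
        System.out.println(union(both, "house"));      // [a building for living in, a dwelling]
        System.out.println(sequential(both, "tree"));  // [a woody plant]
    }
}
```

With the sequential strategy, the order of the dictionary entries decides which translation wins, so the preferred dictionary should be listed first.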
<TMConfig>
  ...
  <!-- Writers format their translations -->
  <writers>
    <writer classname="textmaven.application.translator.writer.BrowserTranslationWriter" id="browser"/>
    <writer .../>
  </writers>
  ...
</TMConfig>
<TMConfig>
  ...
  <stemmers>
    <stemmer classname="textmaven.stemmer.Stemmer_en">
      <language>en</language>
    </stemmer>
  </stemmers>
  ...
</TMConfig>