© Maarten Janssen, 2014-

TEITOK Help Pages

Setting up NeoTag

TEITOK comes with a built-in part-of-speech tagger called NeoTag. NeoTag can be used to automatically assign a part-of-speech tag and a lemma to each word in an XML document. NeoTag is merely the application; in order to run, it needs a parameter set for the language of the document and the desired tagset.

The parameter set consists of a folder with files that have been collected from a training corpus in the same language, tagged with the same tagset. Building a parameter set has to be done with scripts outside of the TEITOK system; alternatively, you can download one of the parameter sets available for NeoTag. Once the parameter folder is in place, you can indicate in the global settings where the parameter folder is located, as in the example below.

The folder attribute indicates the full path to the folder. Since different XML files might need different parameter sets, there can be more than one parameter folder. This is the case, for instance, if the corpus contains manuscripts in different languages, or manuscripts using a different alphabet. In that case, the restriction attribute indicates, in XPath format, under which conditions each parameter folder should be used. "//" means "for any XML file". To restrict a parameter folder to a specific type of XML file, one could use something like "//language[@code='pt']", which applies the parameter folder only to XML files where the teiHeader contains an element <language> with an attribute code with the value pt. In principle, this should mean that the parameter folder will be used for all manuscripts in the Portuguese language.
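The way a restriction picks a parameter folder can be illustrated with a small Python sketch. This is not TEITOK or NeoTag code: the folder paths, the function names, the sample documents, and the "first matching restriction wins" order are all assumptions made for the illustration. The restriction is written with the standard XPath attribute test (@code), and "//" is handled as a special case meaning "any XML file".

```python
import xml.etree.ElementTree as ET

# Hypothetical (restriction, folder) pairs; the paths are placeholders.
# Assumption: entries are checked in order and the first match is used.
PARAMETER_SETS = [
    ("//language[@code='pt']", "/hypothetical/path/neotag/pt"),
    ("//", "/hypothetical/path/neotag/default"),
]


def matches(restriction, root):
    """Return True if the restriction applies to the document rooted at root."""
    if restriction == "//":
        # "//" is treated as "for any XML file".
        return True
    # ElementTree only accepts relative paths, so "//x" becomes ".//x".
    return root.find("." + restriction) is not None


def select_folder(xml_text, parameter_sets=PARAMETER_SETS):
    """Return the folder of the first matching restriction, or None."""
    root = ET.fromstring(xml_text)
    for restriction, folder in parameter_sets:
        if matches(restriction, root):
            return folder
    return None


# Minimal sample documents, invented for this sketch.
PT_DOC = (
    "<TEI><teiHeader><profileDesc><langUsage>"
    '<language code="pt">Portuguese</language>'
    "</langUsage></profileDesc></teiHeader><text/></TEI>"
)
ES_DOC = PT_DOC.replace('code="pt"', 'code="es"')

if __name__ == "__main__":
    print(select_folder(PT_DOC))
    print(select_folder(ES_DOC))
```

In this sketch the Portuguese document is matched by the specific restriction, while any other document falls through to the catch-all "//" entry.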
