TEITOK Help Pages

XML Reader

Many corpus projects contain not only the actual corpus data, but also additional data that should be displayed on the site. Although this could be done with static HTML pages, in many cases the data are structured, such as a database of the informants or a database of the project members. To make it easy to include such structured data in a TEITOK project site, TEITOK provides a simple XML tool that can display, search, and even edit structured data stored in an XML file.

The XML files are not full-fledged XML, but rather a simple XML representation of a spreadsheet, in principle with a structure like this:

<table>
  <row><cell1>cell content</cell1><cell2>cell content</cell2></row>
  <row><cell1>cell content</cell1><cell2>cell content</cell2></row>
</table>

The XML file itself, say spreadsheet.xml, is kept in the Resources folder, and can be edited with the ACE XML editor from the page for editing resource files. To use the file properly in the XML reader, however, we need to define what each row should contain and how to treat each column. This is done in a file called spreadsheet-entry.xml. The design of the entry definition is quite simple, and looks as follows:

<row>
  <cell1>First Name</cell1>
  <cell2>Last Name</cell2>
</row>

This definition tells us that our database contains XML nodes of the type row, each of which has two daughters: one with the tag name cell1, which contains the first name of the person described in the row, and one called cell2, which contains the last name. When we list all the records, both columns are shown, with the first name first, since it is the first field in the entry definition.

With this simple design, the XML reader allows you to list all the people in spreadsheet.xml, search for people by first or last name, edit existing records (when logged in as staff), and add new records. You can click on a record to see more details (including fields that are not shown in the list), and each column is labelled with the name it is given in the entry definition. The user will in principle never see the node names row, cell1, or cell2 (although it still makes more sense to give them meaningful names like <person/>, <first/>, and <last/>). Additional content can be added to the XML file by hand, but the XML reader will ignore everything except /*/row/cell1 and /*/row/cell2.
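As a rough illustration of the behaviour described above, the following Python sketch lists and searches the rows of a small table using only the standard library. It is not TEITOK code: the root element name table, the sample names, and the helper functions list_records and search are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of spreadsheet.xml; the root name "table" and the
# sample people are assumptions made for this sketch.
SPREADSHEET = """
<table>
  <row><cell1>Jane</cell1><cell2>Doe</cell2></row>
  <row><cell1>John</cell1><cell2>Smith</cell2><note>added by hand</note></row>
</table>
"""

# Node name and column label for each field, in display order.
FIELDS = [("cell1", "First Name"), ("cell2", "Last Name")]


def list_records(xml_text):
    """Return one dict per row, keyed by column label.

    Nodes that are not named in FIELDS (such as <note>) are ignored.
    """
    root = ET.fromstring(xml_text)
    records = []
    for row in root.findall("row"):  # corresponds to /*/row
        record = {}
        for tag, label in FIELDS:
            record[label] = row.findtext(tag, default="")
        records.append(record)
    return records


def search(records, label, query):
    """Case-insensitive substring search in a single column."""
    return [r for r in records if query.lower() in r[label].lower()]


records = list_records(SPREADSHEET)
for r in records:
    print(" | ".join(r[label] for _, label in FIELDS))
```

The user-facing labels come only from FIELDS, which is why the node names themselves never need to be shown.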

This design is very flexible, but it cannot handle lists (there can be only one <cell1/> per row), nor can it handle nested data (all fields have to be directly below //row). Lists are impossible by design, but to keep the XML file conformant to a standard, it is possible to use a somewhat more complicated entry definition that uses XPath expressions. If our file contains person data in the TEI/XML format, and we want to keep track of the name and the mother tongue of each person, we can define that as follows:

Name Mother tongue

In the interface, the data are still treated as a flat table, but in the underlying XML file, each person record now has the data in the hierarchical design specified in the TEI specifications. We can also keep data in attributes rather than as fields by having the XPath point to an attribute.
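The idea of presenting nested records as a flat table can be sketched in Python as follows. This is only an illustration under stated assumptions: the element and attribute names in the sample (listPerson, langKnown, code), the sample person, the FIELDS mapping, and the helpers get_value and flatten are hypothetical, and the path syntax is the limited subset supported by xml.etree.ElementTree rather than whatever TEITOK accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style person records; names and values are invented.
PERSONS = """
<listPerson>
  <person>
    <persName>Jane Doe</persName>
    <langKnowledge><langKnown code="pt">Portuguese</langKnown></langKnowledge>
  </person>
</listPerson>
"""

# Column label -> path relative to the record node (assumed syntax).
FIELDS = [
    ("Name", "persName"),
    ("Mother tongue", "langKnowledge/langKnown"),
    ("Language code", "langKnowledge/langKnown/@code"),
]


def get_value(record, path):
    """Resolve a relative path; a final @name step selects an attribute."""
    if path.startswith("@"):
        return record.get(path[1:], "")
    if "/@" in path:
        node_path, attr = path.rsplit("/@", 1)
        node = record.find(node_path)
        return "" if node is None else node.get(attr, "")
    return record.findtext(path, default="")


def flatten(xml_text):
    """Turn each nested <person> record into one flat row."""
    root = ET.fromstring(xml_text)
    return [
        {label: get_value(person, path) for label, path in FIELDS}
        for person in root.iter("person")
    ]


rows = flatten(PERSONS)
print(rows)
```

The attribute step is handled by hand because ElementTree paths select elements only, not attribute values.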

Linked XML files

It is often advisable to use an external XML file for data in a project, for instance if we want to keep a lot of information about the author of each text and have multiple texts by the same author. In that case, it is easier to keep all author information centrally in a file authors.xml, so that we do not have to store, and potentially change, the author information in each individual text, which can easily lead to inconsistencies. So for all books written by William Somerset Maugham, we keep a link @corresp="authors.xml#WSM", which according to TEI P5 should be in //author/persName/@corresp.
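How such a pointer resolves to a central record can be sketched in Python. The sample authors.xml below, its plain id attribute (a TEI file would normally use xml:id), the second author, and the helpers split_pointer and find_author are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical authors.xml; the plain "id" attribute is an assumption.
AUTHORS = """
<listPerson>
  <person id="WSM"><persName>William Somerset Maugham</persName></person>
  <person id="XYZ"><persName>Another Author</persName></person>
</listPerson>
"""


def split_pointer(corresp):
    """Split 'file.xml#ID' into ('file.xml', 'ID')."""
    filename, _, record_id = corresp.partition("#")
    return filename, record_id


def find_author(authors_xml, record_id):
    """Return the <person> element with the given id, or None."""
    root = ET.fromstring(authors_xml)
    for person in root.iter("person"):
        if person.get("id") == record_id:
            return person
    return None


filename, record_id = split_pointer("authors.xml#WSM")
author = find_author(AUTHORS, record_id)
print(filename, record_id, author.findtext("persName"))
```

Because every text only stores the pointer, a correction to the author record has to be made in one place only.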

In that set-up, we want the list of all related XML files to be displayed below the record details. We do this simply by indicating which CQP field each record corresponds to. So if the ID of each author is stored in text_authid, we indicate that the ID of each person in our XML file corresponds to that CQP field. Since we now have a meaningful ID field, we have to make sure it is editable as well, but we do not want to see it in the list. So if we also include the @code of the mother tongue, the entry definition authors-entry.xml looks as follows:

Identifier Name Mother tongue L1 code

With this, if we look at the details for the entry WSM in our authors.xml file, it will show the list of all XML files in the CQP version of the project that have WSM as their text_authid.
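The lookup described here amounts to filtering the texts on their text_authid value. The Python sketch below shows that logic only; it does not query CQP, and the TEXTS list, the file names, and the helper files_for_author are invented stand-ins for the corpus metadata.

```python
# Stand-in for the per-text metadata in the CQP corpus; file names are invented.
TEXTS = [
    {"file": "text1.xml", "text_authid": "WSM"},
    {"file": "text2.xml", "text_authid": "XYZ"},
    {"file": "text3.xml", "text_authid": "WSM"},
]


def files_for_author(texts, author_id, field="text_authid"):
    """Return the files whose linking field equals the record's ID."""
    return [t["file"] for t in texts if t.get(field) == author_id]


linked = files_for_author(TEXTS, "WSM")
print(linked)
```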
