Powered by TEITOK
© Maarten Janssen, 2014-

TEITOK Help Pages

Page-by-Page transcription

When transcribing a manuscript, the basic elements to work with are lines and pages. But in TEI, these elements are only "labels": empty elements marking the beginning of a new page or line, not XML nodes that contain the text. This is because pages and lines can cross words or paragraphs, which are seen as the base units in a TEI document. This makes it difficult to transcribe a manuscript page-by-page. To solve this, TEITOK allows you to use a different format during the transcription phase, which takes pages and lines as basic units, and to convert that format to the standard TEI format once the transcription is complete. This page-by-page pre-TEI format can be created from a manuscript PDF file using the Manuscript PDF to TEI tool, which is linked from the "create new XML file" option in the admin menu. The tool takes a PDF file as input and generates from it an empty XML file (with metadata) containing a page element for each page of the PDF, linked to that page as its facsimile image.

In this pre-TEI format, the content of each page or line is not HTML, but text-based code. Each manuscript page is treated individually, and once a page is fully transcribed, it can be marked as done. Using the status button at the bottom, you can get an overview of the status of each page in the manuscript. The tool automatically jumps to the first unfinished page (unless a specific page is selected). Once all the pages are transcribed, the tool will suggest converting the pre-TEI format to proper TEI/XML. This is an irreversible process, so it should only be done once the transcription is complete - although it is possible at any stage to abandon the page-by-page transcription and convert before the transcription is finished.

Transcription Conventions

To make transcription easier, you can use a few conventions in the transcription field (on the right of the image). The transcription can be done in full-screen mode to use as much of the screen as possible for the facsimile image.

  • A newline can be used to generate an <lb/> element.
  • A vertical bar | can be used at the end of a line to indicate that the word continues beyond the end of the line.
  • Custom symbols can be used for hard-to-type characters, for instance using a ~ for a combining tilde.
  • Square brackets can be used for TEI tags (markdown-style annotation).
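
As an illustration of the line-level conventions above, here is a minimal Python sketch. It is not TEITOK's own conversion code: the function name lines_to_tei, the choice to put <lb/> at the start of each line, the use of the TEI attribute break="no" for a word that continues after a |, and the mapping of ~ to the combining tilde U+0303 are all assumptions made for the example.

```python
# Hypothetical sketch of the line-level conventions; not TEITOK's actual code.
COMBINING_TILDE = "\u0303"  # assumed mapping for the custom symbol "~"


def lines_to_tei(page_text: str) -> str:
    """Turn plain transcription lines into text with <lb/> milestones."""
    parts = []
    continues = False  # True when the previous line ended with "|"
    for raw in page_text.split("\n"):
        line = raw.rstrip()
        # Assumption: a word split across lines is marked with break="no".
        lb = '<lb break="no"/>' if continues else "<lb/>"
        continues = line.endswith("|")
        if continues:
            line = line[:-1]  # drop the trailing vertical bar
        # The combining tilde attaches to the character typed before "~".
        line = line.replace("~", COMBINING_TILDE)
        parts.append(lb + line)
        # No whitespace is emitted inside a word that continues.
        parts.append("" if continues else "\n")
    return "".join(parts).rstrip("\n")


print(lines_to_tei("que no|\nbis cum\nan~o"))
```

The sketch does not escape XML special characters such as & or <; a real converter would need to handle those as well.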

The markdown-style annotation is converted as follows:

  • All [tag:content] will be converted to <tag>content</tag> - so [del:word] can be used to mark "word" as deleted
  • All [tag@feature=value:content] will be converted to <tag feature="value">content</tag> for tags that need attributes (only one attribute is supported)
  • [i:txt], [b:txt], and [dc:A] can be used as abbreviations for italic, bold, and dropcaps (<hi rend="italic">txt</hi>, <hi rend="bold">txt</hi>, and <hi type="dropcap">A</hi>)
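
The bracket rules above can be sketched as a single regular-expression substitution. This is a hedged illustration in Python, not the code TEITOK uses: the names BRACKET, SHORTHAND, and brackets_to_tei are hypothetical, and the sketch assumes that brackets are not nested and that tag and attribute names consist only of letters, digits, and underscores.

```python
import re

# Abbreviations listed above, mapped to (tag, attribute, value).
SHORTHAND = {
    "i": ("hi", "rend", "italic"),
    "b": ("hi", "rend", "bold"),
    "dc": ("hi", "type", "dropcap"),
}

# Matches [tag:content] or [tag@feature=value:content].
# Assumption: content contains no further square brackets.
BRACKET = re.compile(r"\[(\w+)(?:@(\w+)=([^:\]]*))?:([^\[\]]*)\]")


def _to_element(match: re.Match) -> str:
    tag, feature, value, content = match.groups()
    # Expand an abbreviation only when no explicit attribute was given.
    if feature is None and tag in SHORTHAND:
        tag, feature, value = SHORTHAND[tag]
    attr = f' {feature}="{value}"' if feature else ""
    return f"<{tag}{attr}>{content}</{tag}>"


def brackets_to_tei(text: str) -> str:
    """Convert bracket annotations to XML elements in one pass."""
    return BRACKET.sub(_to_element, text)


print(brackets_to_tei("a [del:word] and [hi@rend=underline:another]"))
print(brackets_to_tei("[dc:A] new [i:chapter]"))
```

Because the pattern allows only one optional @feature=value group, it mirrors the one-attribute limit stated above; text outside brackets is passed through unchanged.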
