TEI Lite was the name adopted for what the TEI editors originally conceived of as a simple demonstration of how the TEI encoding scheme might be adopted to meet 90% of the needs of 90% of the TEI user community. In retrospect, it was predictable that many people should imagine TEI Lite to be all there is to TEI, or find TEI Lite to be far too heavy for their needs.
The original TEI Lite was based largely on observations of existing and previous practice in the encoding of texts, particularly as manifest in the collections of the Oxford Text Archive and in our own experience. It is therefore unsurprising that it seems to have become, if not a de facto standard, at least a common point of departure for electronic text centres and encoding projects world wide. Maybe the fact that we actually produced this shortish, readable, manual for it also helped.
Early adopters of TEI Lite included a number of ‘Electronic Text Centers’, many of which produced their own documentation and tutorial materials (some examples are listed in the TEI Tutorials pages). It was also widely adopted as the basis for TEI-conformant authoring systems. Documentation introducing TEI Lite has been widely used for tutorial purposes and has been widely translated (see further the list of versions at http://www.tei-c.org/Lite/).
With the publication of TEI P4, the XML version of the TEI Guidelines, which uses the generation of TEI Lite as an example of the modification mechanism built into the TEI Guidelines, the opportunity was taken to produce a lightly revised XML-conformant version, but the present revision is the first substantively changed version since its first appearance in 1997. This revision takes advantage of the many new features introduced into the TEI Guidelines at release P5. A brief list of those changes likely to affect users of previous versions of this document is given below (Substantive changes from the P4 version).
This document provides an introduction to the recommendations of the Text Encoding Initiative (TEI), by describing a specific subset of the full TEI encoding scheme. The scheme documented here can be used to encode a wide variety of commonly encountered textual features, in such a way as to maximize the usability of electronic transcriptions and to facilitate their interchange among scholars using different computer systems. It is fully compatible with the full TEI scheme, as defined by TEI document P5, Guidelines for Electronic Text Encoding and Interchange, as of February 2006, and available from the TEI Consortium website at http://www.tei-c.org.
The Text Encoding Initiative (TEI) Guidelines are addressed to anyone who wants to interchange information stored in an electronic form. They emphasize the interchange of textual information, but other forms of information such as images and sound are also addressed. The Guidelines are equally applicable in the creation of new resources and in the interchange of existing ones.
The Guidelines provide a means of making explicit certain features of a text in such a way as to aid the processing of that text by computer programs running on different machines. This process of making explicit we call markup or encoding. Any textual representation on a computer uses some form of markup; the TEI came into being partly because of the enormous variety of mutually incomprehensible encoding schemes currently besetting scholarship, and partly because of the expanding range of scholarly uses now being identified for texts in electronic form.
The TEI Guidelines describe an encoding scheme which can be expressed using a number of different formal languages. The first editions of the Guidelines used the Standard Generalized Markup Language (SGML); since 2002, this has been replaced by the use of the Extensible Markup Language (XML). These markup languages have in common the definition of text in terms of elements and attributes, and rules governing their appearance within a text. The TEI's use of XML is ambitious in its complexity and generality, but it is fundamentally no different from that of any other XML markup scheme, and so any general-purpose XML-aware software is able to process TEI-conformant texts.
The TEI was sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing, and is now maintained and developed by an independent membership consortium, hosted by four major Universities. Funding has been provided in part from the U.S. National Endowment for the Humanities, Directorate General XIII of the Commission of the European Communities, the Andrew W. Mellon Foundation, and the Social Sciences and Humanities Research Council of Canada. The Guidelines were first published in May 1994, after six years of development involving many hundreds of scholars from different academic disciplines worldwide. During the years that followed, the Guidelines were increasingly influential in the development of the digital library, in the language industries, and even in the development of the World Wide Web itself. The TEI consortium was set up in January 2001, and a year later produced an edition of the Guidelines entirely revised for XML compatibility. In 2004, it set about a major revision of the Guidelines to take full advantage of new schema languages, the first release of which appeared in 2005. This revision of the TEI Lite manual conforms to version 0.3 of this most recent edition of the Guidelines, TEI P5.
The present document describes a manageable selection from the extensive set of elements and recommendations resulting from those design goals, which is called TEI Lite.
In selecting from the several hundred elements defined by the full TEI scheme, we have tried to identify a useful ‘starter set’, comprising the elements which almost every user should know about. Experience working with TEI Lite will be invaluable in understanding the full TEI scheme and in knowing how to integrate specialized parts of it into the general TEI framework.
Readers may judge for themselves how far we have succeeded in meeting these goals. At the time of first writing (1995), our confidence that we have at least partially done so is borne out by the use of TEI Lite in practice for the encoding of real texts. The Oxford Text Archive uses TEI Lite when it translates texts from its holdings from their original markup schemes into SGML; the Electronic Text Centers at the University of Virginia and the University of Michigan have used TEI Lite to encode their holdings. And the Text Encoding Initiative itself uses TEI Lite in its current technical documentation, including this document.
Although we have tried to make this document self-contained, as suits a tutorial text, the reader should be aware that it does not cover every detail of the TEI encoding scheme. All of the elements described here are fully documented in the TEI Guidelines themselves, which should be consulted for authoritative reference information on these, and on the many others which are not described here. Some basic knowledge of XML is assumed.
We begin with a short example, intended to show what happens when a passage of prose is typed into a computer by someone with little sense of the purpose of mark-up, or the potential of electronic texts. In an ideal world, such output might be generated by a very accurate optical scanner. It attempts to be faithful to the appearance of the printed text, by retaining the original line breaks, by introducing blanks to represent the layout of the original headings and page breaks, and so forth. Where characters not available on the keyboard are needed (such as the accented letter a in faàl or the long dash), it attempts to mimic their appearance.
All TEI-conformant texts contain (a) a TEI header (marked up as a teiHeader element) and (b) the transcription of the text proper (marked up as a text element). These two elements are combined together to form a single TEI element.
The TEI header provides information analogous to that provided by the title page of a printed text. It has up to four parts: a bibliographic description of the machine-readable text, a description of the way it has been encoded, a non-bibliographic description of the text (a text profile), and a revision history. The header is described in more detail in section 19 The Electronic Title Page.
A TEI text may be unitary (a single work) or composite (a collection of single works, such as an anthology). In either case, the text may have an optional front or back. In between is the body of the text, which, in the case of a composite text, may consist of groups, each containing more groups or texts.
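The overall shape described above can be sketched as follows. This is a minimal illustration rather than a complete document: the title and all prose content are invented placeholders, and the header carries only a bare file description.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- (a) the TEI header: the electronic title page -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A hypothetical sample text</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished illustration.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No source: written for this sketch.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- (b) the transcription of the text proper -->
  <text>
    <front>
      <div type="preface"><p>Optional front matter.</p></div>
    </front>
    <body>
      <p>The text proper goes here.</p>
    </body>
    <back>
      <div type="appendix"><p>Optional back matter.</p></div>
    </back>
  </text>
</TEI>
```

In a composite text, the body would give way to one or more group elements, each containing further groups or complete text elements.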
In the remainder of this document, we discuss chiefly simple text structures. The discussion in each case consists of a short list of relevant TEI elements with a brief definition of each, followed by definitions for any attributes specific to that element, and a reference to any classes of which the element is a member. These references are linked to full specifications for each object, as given in the TEI Guidelines. In most cases, short examples are also given.
The type attribute on the div element may be used to supply a conventional name for this category of text division, or otherwise distinguish them. Typical values might be ‘book’, ‘chapter’, ‘section’, ‘part’, ‘poem’, ‘song’, etc. For a given project, it will usually be advisable to define and adhere to a specific list of such values.
A div element may itself contain further, nested, divs, thus mimicking the traditional structure of a book, which can be decomposed hierarchically into units such as parts, containing chapters, containing sections, and so on. TEI texts in general conform to this simple hierarchic model.
The xml:id attribute may be used to supply a unique identifier for the division, which may be used for cross references or other links to it, such as a commentary, as further discussed in section 8 Cross References and Links. It is often useful to provide an xml:id attribute for every major structural unit in a text, and to derive its values in some systematic way, for example by appending a section number to a short code for the title of the work in question, as in the examples below.
The n attribute may be used to supply (additionally or alternatively) a short mnemonic name or number for the division. If a conventional form of reference or abbreviation for the parts of a work already exists (such as the book/chapter/verse pattern of Biblical citations), the n attribute is the place to record it.
The xml:lang attribute may be used to specify the language of the division. Languages are identified by an internationally defined code, as further discussed in section 6.3 Foreign Words or Expressions below.
The rend attribute may be used to supply information about the rendition (appearance) of a division, or any other element, as further discussed in section 6 Marking Highlighted Phrases below. As with the type attribute, a project will often find it useful to predefine the possible values for this attribute, but TEI Lite does not constrain it in any way.
These four attributes, xml:id, n, xml:lang, and rend, are so widely useful that they are allowed on any element in any TEI schema: they are global attributes. Other global attributes defined in the TEI Lite scheme are discussed in section 8.3 Special kinds of Linking.
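The following sketch shows nested divisions carrying the attributes just described. The identifiers (a short code followed by a section number), the type values, and the rend value are hypothetical choices that a project would fix for itself.

```xml
<div type="book" n="I" xml:id="SMP0100">
  <head>Book I</head>
  <div type="chapter" n="1" xml:id="SMP0101">
    <head>Chapter 1</head>
    <p>First paragraph of the first chapter.</p>
  </div>
  <!-- a chapter in another language, printed in italic -->
  <div type="chapter" n="2" xml:id="SMP0102" xml:lang="fr" rend="italic">
    <head>Chapitre 2</head>
    <p>Premier paragraphe du second chapitre.</p>
  </div>
</div>
```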
Note that the l element marks verse lines, not typographic lines: the original lineation of the first few lines above has not therefore been made explicit by this encoding, and may be lost. The lb element described in section 5 Page and Line Numbers may be used to mark typographic lines if so desired.
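As a small illustration of the distinction, the fragment below marks two verse lines with l and records a typographic line break with lb. The position of the break is invented for the purpose of the sketch.

```xml
<lg>
  <l>Shall I compare thee to a summer's day?</l>
  <!-- one verse line, printed across two typographic lines -->
  <l>Thou art more lovely and more <lb/>temperate:</l>
</lg>
```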
When working from a paginated original, it is often useful to record its pagination, if only to simplify later proof-reading. Recording the line breaks may be useful for the same reason; treatment of end-of-line hyphenation in printed source texts will require some consideration.
The pb and lb elements are special cases of the general class of milestone elements which mark reference points within a text. TEI Lite also includes a generic milestone element, which is not restricted to special cases but can mark any kind of reference point: for example, a column break, the start of a new kind of section not otherwise tagged, or in general any significant change in the text not marked by an XML element. The names used for types of unit and for editions referred to by the ed and unit attributes may be chosen freely, but should be documented in the header. The milestone element may be used to replace the others, or the others may be used as a set; they should not be mixed arbitrarily.
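The two approaches might look like this. The edition code ed1 and the unit names are hypothetical and would need to be documented in the header; the n attribute is assumed to carry the number of the page or line that begins at that point.

```xml
<!-- Either: the specific elements, used as a set -->
<p><pb ed="ed1" n="12"/>The first words of page twelve run on
<lb ed="ed1" n="2"/>to a second typographic line.</p>

<!-- Or: the generic milestone element throughout -->
<p><milestone ed="ed1" unit="page" n="12"/>The first words of page twelve run on
<milestone ed="ed1" unit="line" n="2"/>to a second typographic line.</p>
```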
Highlighted words or phrases are those made visibly different from the rest of the text, typically by a change of type font, handwriting style, ink colour etc., which is intended to draw the reader's attention to some associated change.
The global rend attribute can be attached to any element, and used wherever necessary to specify details of the highlighting used for it. For example, a heading rendered in bold might be tagged <head rend="bold">, and one in italic <head rend="italic">.
Some features (notably quotations and glosses) may be found in a text either marked by highlighting, or with quotation marks. In either case, the elements q and gloss (as discussed in the following section) should be used. If the rendition is to be recorded, use the global rend attribute.
Interpreting the role of the highlighting, the sentence might look like this:
On the one hand the Nibelungenlied is associated with the new rise of romance of twelfth-century France, the romans d'antiquité, the romances of Chrétien de Troyes, and the German adaptations of these works by Heinrich van Veldeke, Hartmann von Aue, and Wolfram von Eschenbach.
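One possible encoding of this sentence is sketched below. It assumes that the title of the poem and the French phrase were the highlighted portions of the source; that assumption, and the choice of elements, are illustrative only.

```xml
<p>On the one hand the <title>Nibelungenlied</title> is associated with
the new rise of romance of twelfth-century France, the
<foreign xml:lang="fr">romans d'antiquité</foreign>, the romances of
Chrétien de Troyes, and the German adaptations of these works by
Heinrich van Veldeke, Hartmann von Aue, and Wolfram von Eschenbach.</p>
```

An encoder who preferred not to interpret the highlighting could instead tag both phrases as hi with rend="italic".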
To record how a quotation was printed (for example, in-line or set off as a display or block quotation), the rend attribute should be used. This may also be used to indicate the kind of quotation marks used.
The creator of the electronic text must decide whether quotation marks are replaced by the tags or whether the tags are added and the quotation marks kept. If the quotation marks are removed from the text, the rend attribute may be used to record the way in which they were rendered in the copy text.
As with highlighting, it is not always possible and may not be considered desirable to interpret the function of quotation marks in a text in this way. In such cases, the tag <hi rend="quoted"> might be used to mark quoted text without making any claim as to its status.
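A brief sketch of these options follows. The rend values shown (display, quoted) are examples only, since TEI Lite does not constrain them, and the sentences are invented.

```xml
<!-- quotation marks removed; their original form recorded on rend -->
<p>He replied <q rend="double quotes">Not today.</q></p>

<!-- a quotation printed as a separate block -->
<p><q rend="display">A longer passage set off from the surrounding text.</q></p>

<!-- no claim made about the function of the quotation marks -->
<p>She called it <hi rend="quoted">a triumph</hi>.</p>
```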
As these examples show, the foreign element should not be used to tag foreign words if some other more specific element such as title, mentioned, or term applies. The global xml:lang attribute may be attached to any element to show that it uses some other language than that of the surrounding text.
Code | Language | Code | Language |
zh or zho | Chinese | grc | Ancient Greek |
en | English | ell or el | Greek |
enm | Middle English | ja or jpn | Japanese |
fr or fra | French | la or lat | Latin |
de or deu | German | sa or san | Sanskrit |
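Using codes from the table above, language might be recorded as in the following sketch; the sentences themselves are invented for illustration.

```xml
<!-- no more specific element applies: use foreign -->
<p>He had a certain <foreign xml:lang="fr">je ne sais quoi</foreign>.</p>

<!-- a more specific element applies: put xml:lang on it directly -->
<p>Have you read <title xml:lang="de">Die Dreigroschenoper</title>?</p>
<p><mentioned xml:lang="fr">Savoir-faire</mentioned> is French for know-how.</p>
```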
The n attribute may be used to supply the number or identifier of a note if this is required. The resp attribute should be used consistently to distinguish between authorial and editorial notes, if the work has both kinds; otherwise, the TEI header should state which kind they are.
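For instance, a text with both kinds of note might be tagged as sketched here. The resp values author and editor and the place value foot are hypothetical; a project should choose its own codes and apply them consistently.

```xml
<p>The main text<note n="1" place="foot" resp="author">An authorial footnote.</note>
continues to the end of the sentence.<note n="2" place="foot" resp="editor">A
comment added by the editor.</note></p>
```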
Explicit cross references or links from one point in a text to another in the same or another document may be encoded using the elements described in this section. Implicit links (such as the association between two parallel texts, or that between a text and its interpretation) may be encoded using the linking attributes discussed in section 8.3 Special kinds of Linking.
The difference between these two elements is that ptr is an empty element, simply marking a point from which a link is to be made, whereas ref may contain some text as well — typically the text of the cross-reference itself. The ptr element would be used for a cross reference which is to be indicated by some non-verbal means such as a symbol or icon, or in an electronic text by a button. It is also useful in document production systems, where the formatter can generate the correct verbal form of the cross reference.
The type attribute should be used (as above) to distinguish amongst different purposes for which these general purpose elements might be used in a text. Some other uses are discussed in section 8.3 Special kinds of Linking below.
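The contrast between the two elements can be sketched as follows, assuming a division elsewhere in the same document that bears the hypothetical identifier SEC12. The type value xref is likewise only an example.

```xml
<div xml:id="SEC12">
  <head>Concerning Identifiers</head>
  <p>The content of the section.</p>
</div>

<!-- ref: the text of the cross reference is part of the source -->
<p>See especially <ref type="xref" target="#SEC12">section 12 on page 34</ref>.</p>

<!-- ptr: empty; a formatter or display program supplies the wording or symbol -->
<p>See especially <ptr type="xref" target="#SEC12"/>.</p>
```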
So far, we have shown how the elements ptr and ref may be used for cross-references or links whose targets occur within the same document as their source. However, the same elements may also be used to refer to elements in any other XML document or resource, such as a document on the web, or a database component. This is possible because the value of the target attribute may be any valid Uniform Resource Identifier (URI). A full definition of this term, defined by the W3C (the consortium which manages the development and maintenance of the World Wide Web), is beyond the scope of this tutorial: however, the most frequently encountered version of a URI is the familiar ‘URL’ used to indicate a web page, such as http://www.tei-c.org/index.xml.
A URL may reference a web page or just a part of one, for example http://www.tei-c.org/index.xml#SEC2. The sharp sign (#) indicates that what follows it is the identifier of an element to be located within the XML document identified by what precedes it: this example will therefore locate an element which has an xml:id attribute value of SEC2 within the document retrieved from http://www.tei-c.org/index.xml.
In the examples we have discussed so far, the part to the left of the sharp sign has been omitted: this is understood to mean that the referenced element is to be located within the current document.
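The three forms of target value just described can be set side by side in a short sketch; the link text is invented.

```xml
<!-- a whole web page -->
<ref target="http://www.tei-c.org/index.xml">the TEI home page</ref>

<!-- the element with xml:id="SEC2" inside that page -->
<ref target="http://www.tei-c.org/index.xml#SEC2">a section of the TEI home page</ref>

<!-- the element with xml:id="SEC2" inside the current document -->
<ref target="#SEC2">a section of this document</ref>
```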
Within a URL, parts of an XML document can be specified by means of other more sophisticated mechanisms, using a special language called XPath, also defined by the W3C. This is particularly useful where the elements to be linked to do not bear identifiers and must therefore be located by some other means. A full specification of the language is well beyond the scope of this document; here we provide only a flavour of its power.
In the XPath language, locations are defined as a series of steps, each one identifying some part of the document, often in terms of the locations identified by the previous step. For example, you would point to the third sentence of the second paragraph of chapter two by selecting chapter two in the first step, the second paragraph in the second step, and the third sentence in the last step. A step can be defined in terms of the document tree itself, using such concepts as parent, descendant, preceding, etc. or, more loosely, in terms of text patterns, word or character positions.
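The three steps of this example might be written as the XPath expression shown in the comment below. The sketch assumes that chapters are encoded as div elements with type="chapter" directly inside the body, that sentences are tagged with s, and it ignores namespace prefixes. The xpath1() wrapper used in the target value is an assumption about how such an expression is embedded in a URI and should be checked against the TEI Guidelines.

```xml
<!-- step 1: chapter two    /TEI/text/body/div[@type='chapter'][2]
     step 2: paragraph two  /p[2]
     step 3: sentence three /s[3] -->
<ptr target="#xpath1(/TEI/text/body/div[@type='chapter'][2]/p[2]/s[3])"/>
```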
The process of encoding an electronic text has much in common with the process of editing a manuscript or other text for printed publication. In either case a conscientious editor may wish to record both the original state of the source and any editorial correction or other change made in it. The elements discussed in this and the next section provide some facilities for meeting these needs.
The value LB on the resp attribute indicates that ‘LB’ corrected the duplication of for.
The type attribute may be used to distinguish types of abbreviation by their function.
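The two points above might be encoded as in this sketch. It assumes that the choice element is available in the schema in use to pair the original reading with its correction or expansion; the surrounding words and the type value title are invented for illustration.

```xml
<!-- a duplicated word in the source, corrected by LB -->
<p>a remedy <choice><sic>for for</sic><corr resp="LB">for</corr></choice> the fault</p>

<!-- an abbreviation classified by its function -->
<p><choice><abbr type="title">Dr</abbr><expan>Doctor</expan></choice> Johnson</p>
```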
The TEI scheme defines elements for a large number of ‘data-like’ features which may appear almost anywhere within almost any kind of text. These features may be of particular interest in a range of disciplines; they all relate to objects external to the text itself, such as the names of persons and places, numbers and dates. They also pose particular problems for many natural language processing (NLP) applications because of the variety of ways in which they may be presented within a text. The elements described here, by making such features explicit, reduce the complexity of processing texts containing them.
The name element by contrast is provided for the special case of referencing strings which consist only of proper nouns; it may be used synonymously with the rs element, or nested within it if a referring string contains a mixture of common and proper nouns.
Simply tagging something as a name is rarely enough to enable automatic processing of personal names into the canonical forms usually required for reference purposes. The name as it appears in the text may be inconsistently spelled, partial, or vague. Moreover, name prefixes such as van or de la, may or may not be included as part of the reference form of a name, depending on the language and country of origin of the bearer.
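The nesting of name within rs, and one way of pointing from a name to a canonical form, are sketched below. The key attribute is assumed to be available for this purpose, and the value VELDEKE-H is a hypothetical identifier in some external authority list.

```xml
<!-- a referring string mixing common and proper nouns -->
<p><rs type="person">My dear <name>Mr. Bennet</name></rs>, said his lady to him one day.</p>

<!-- a name linked to a canonical reference form by a key -->
<p>the adaptations by <name type="person" key="VELDEKE-H">Heinrich van Veldeke</name></p>
```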
Numbers may be written in letters or in digits (twenty-one, xxi, and 21) and their presentation is language-dependent (e.g. English 5th becomes Greek 5.; English 123,456.78 equals French 123.456,78). In natural-language processing or machine-translation applications, it is often helpful to distinguish them from other, more ‘lexical’ parts of the text. In other applications, the ability to record a number's value in standard notation is important. The num element provides this possibility:
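A sketch of such tagging, using the forms mentioned above: the value attribute holds the number in standard notation, and the type value ordinal is one possible classification.

```xml
<num value="21">twenty-one</num>
<num value="21">xxi</num>
<num type="ordinal" value="5">5th</num>
<!-- French presentation, same standard value -->
<num xml:lang="fr" value="123456.78">123.456,78</num>
```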
Where the internal structure of a list item is more complex, it may be preferable to regard the list as a table, for which special-purpose tagging is defined below (13 Tables).
Lists of bibliographic items should be tagged using the listBibl element, described in the next section.
He was a member of Parliament for Warwickshire in 1445, and died March 14, 1470 (according to Kittredge, Harvard Studies 5. 88ff).
For lists of bibliographic citations, the listBibl element should be used; it may contain a series of bibl elements.
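As an illustration, the citation in the sentence quoted above could be tagged as a bibl and placed in a listBibl. The division of the citation into author, title, and biblScope is one plausible reading of it, not the only one.

```xml
<listBibl>
  <bibl><author>Kittredge</author>, <title>Harvard Studies</title>
    <biblScope>5. 88ff</biblScope></bibl>
  <!-- further bibl elements, one per citation -->
</listBibl>
```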
Not all the components of a document are necessarily textual. The most straightforward text will often contain diagrams or illustrations, to say nothing of documents in which image and text are inextricably intertwined, or electronic resources in which the two are complementary.
Any textual information accompanying the graphic, such as a heading and/or caption, may be included within the figure element itself, in a head and one or more p elements, as may also any text appearing within the graphic itself. It is strongly recommended that a prose description of the image be supplied, as the content of a figDesc element, for the use of applications which are not able to render the graphic, and to render the document accessible to vision-impaired readers. (Such text is not normally considered part of the document proper.)
When a digitized version of the graphic concerned is available, it may be embedded at the appropriate point within the document in this way.
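A figure with an embedded image, a heading, and a prose description might be sketched like this. The file name and the wording are hypothetical, and the graphic element with its url attribute is assumed to be the means of pointing to the digitized image.

```xml
<figure>
  <graphic url="images/frontispiece.png"/>
  <head>Figure One: The View from the Bridge</head>
  <figDesc>A river seen from above, with a sailing barge in the foreground
    and a church tower on the far bank.</figDesc>
</figure>
```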
It is often said that all markup is a form of interpretation or analysis. While it is certainly difficult, and may be impossible, to distinguish firmly between ‘objective’ and ‘subjective’ information in any universal way, it remains true that judgments concerning the latter are typically regarded as more likely to provoke controversy than those concerning the former. Many scholars therefore prefer to record such interpretations only if it is possible to alert the reader that they are considered more open to dispute than the rest of the markup. This section describes some of the elements provided by the TEI scheme to meet this need.
A more general purpose segmentation element, the seg element, has already been introduced for use in identifying otherwise unmarked targets of cross references and hypertext links (see section 8 Cross References and Links); it identifies some phrase-level portion of text to which the encoder may assign a user-specified type, as well as a unique identifier; it may thus be used to tag textual features for which there is no provision in the published TEI Guidelines.
A seg element of one type (unlike the s element which it superficially resembles) can be nested within a seg element of the same or another type. This enables quite complex structures to be represented; some examples were given in section 8.3 Special kinds of Linking above. However, because it must respect the requirement that elements be properly nested, and may not cut across each other, it cannot cope with the common requirement to associate an interpretation with arbitrary segments of a text which may completely ignore the document hierarchy. It also requires that the interpretation itself be represented by a single coded value in the type attribute.
Moreover, interp is an empty element, which must be linked to the passage to which it applies either by means of the ana attribute discussed in section 8.3 Special kinds of Linking above, or by means of its own inst attribute. This means that any kind of analysis can be represented, with no need to respect the document hierarchy, and also facilitates the grouping of analyses of a particular type together. A special purpose interpGrp element is provided for the latter purpose.
For example, suppose that you wish to mark such diverse aspects of a text as themes or subject matter, rhetorical figures, and the locations of individual scenes of the narrative. Different portions of our sample passage from Jane Eyre for example, might be associated with the rhetorical figures of apostrophe, hyperbole, and metaphor; with subject-matter references to churches, servants, cooking, postal service, and honeymoons; and with scenes located in the church, in the kitchen, and in an unspecified location (drawing room?).
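Such an analysis might be laid out as in the sketch below. The identifiers are hypothetical, the assignment of interpretations to passages is illustrative only, and the value attribute on the empty interp element is an assumption about how the interpretation itself is recorded.

```xml
<p xml:id="P1" ana="#fig-apostrophe #ref-church">Reader, I married him.</p>
<p xml:id="P2">A later paragraph of the passage.</p>

<interpGrp type="figure">
  <interp xml:id="fig-apostrophe" value="apostrophe"/>
  <interp xml:id="fig-hyperbole" value="hyperbole"/>
  <interp xml:id="fig-metaphor" value="metaphor"/>
</interpGrp>
<interpGrp type="reference">
  <interp xml:id="ref-church" value="church"/>
  <!-- linking in the other direction, from the interpretation to the passage -->
  <interp xml:id="ref-cooking" value="cooking" inst="#P2"/>
</interpGrp>
```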
Although the focus of this document is on the use of the TEI scheme for the encoding of existing ‘pre-electronic’ documents, the same scheme may also be used for the encoding of new documents. In the preparation of new documents (such as this one), XML has much to recommend it: the document's structure can be clearly represented, and the same electronic text can be re-used for many purposes — to provide both online hypertext or browsable versions and well-formatted typeset versions from a common source for example.
To facilitate this, the TEI Lite schema includes some elements for marking features of technical documents in general, and of XML-related documents in particular.
A formatting application, given a text like that above, can be instructed to format examples appropriately (e.g. to preserve line breaks, or to use a distinctive font). Similarly, the use of tags such as ident greatly facilitates the construction of a useful index.
One solution is to use an entity reference such as &lt; to represent each < character which marks the start of an XML tag within the examples. A more general solution is to mark off the whole body of each example as containing data which is not to be scanned for XML mark-up by the parser. This is achieved by enclosing it within a special XML construct called a CDATA marked section, as in the following example:
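A minimal sketch of such an example, assuming that the eg element is used to hold example text; the wording of the paragraph and the list items are invented.

```xml
<p>A list should be encoded as follows:
<eg><![CDATA[
<list>
  <item>First item in the list</item>
  <item>Second item</item>
</list>
]]></eg>
The <gi>list</gi> element consists of a series of <gi>item</gi> elements.</p>
```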
The list element used within the example above will not be regarded as forming part of the document proper, because it is embedded within a marked section (beginning with the special markup declaration <![CDATA[ , and ending with ]]>).
Note also the use of the gi element to tag references to element names (or generic identifiers) within the body of the text.
This example also demonstrates the use of the type attribute to distinguish the different kinds of division to be generated: in the first case a table of contents (a toc) and in the second an index.
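The arrangement described here might be sketched as follows, with the divGen element assumed as the placeholder for each generated division and the surrounding content reduced to placeholders.

```xml
<text>
  <front>
    <!-- a table of contents is to be generated here -->
    <divGen type="toc"/>
  </front>
  <body>
    <div><head>First Section</head><p>Content of the document.</p></div>
  </body>
  <back>
    <!-- an index is to be generated here -->
    <divGen type="index"/>
  </back>
</text>
```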
When an existing index or table of contents is to be encoded (rather than one being generated) for some reason, the list element discussed in section 11 Lists should be used.
While production of a table of contents from a properly tagged document is generally unproblematic for an automatic processor, the production of a good quality index will often require more careful tagging. It may not be enough simply to produce a list of all parts tagged in some particular way, although extracting (for example) all occurrences of elements such as term or name will often be a good departure point for an index.
With the advent of XML and its adoption of Unicode as the required character set for all documents, most problems previously associated with the representation of the divers languages and writing systems of the world are greatly reduced. For those working with standard forms of the European languages in particular, almost no special action is needed: any XML editor should enable you to input accented letters or other ‘non-ASCII’ characters directly, and they should be stored in the resulting file in a way which is transferable directly between different systems.
There are two important exceptions: the characters & and < may not be entered directly in an XML document, since they have a special significance as initiating markup. They must always be represented as entity references, like this: &amp; or &lt;. Other characters may also be represented by means of entity references where necessary, for example to retain compatibility with a pre-Unicode processing system.
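A short sketch of these cases follows. The last line uses a numeric character reference, a closely related mechanism that needs no entity declaration; the sample phrases are invented.

```xml
<p>Fish &amp; chips</p>           <!-- displays as: Fish & chips -->
<p>if x &lt; y then stop</p>      <!-- displays as: if x < y then stop -->
<p>caf&#xE9;</p>                  <!-- displays as: café -->
```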
For many purposes, particularly in older texts, the preliminary material such as title pages, prefatory epistles, etc., may provide very useful additional linguistic or social information. P5 provides a set of recommendations for distinguishing the textual elements most commonly encountered in front matter, which are summarized here.
Typeface distinctions should be marked with the rend attribute when necessary, as described above. Very detailed description of the letter spacing and sizing used in ornamental titles is not as yet provided for by the Guidelines. Changes of language should be marked by appropriate use of the xml:lang attribute or the foreign element, as necessary. Names, wherever they appear, should be tagged using the name element, as elsewhere.
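A title page tagged along these lines might look like the sketch below. The element names (titlePage, docTitle, titlePart, byline, docAuthor, docImprint) are taken from the TEI Guidelines rather than from the summary here, and the title, author, and imprint are invented.

```xml
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main" rend="caps">A Hypothetical Title</titlePart>
      <titlePart type="sub">with a motto in
        <foreign xml:lang="la">lingua Latina</foreign></titlePart>
    </docTitle>
    <byline>by <docAuthor><name>A. N. Author</name></docAuthor></byline>
    <docImprint><pubPlace>London</pubPlace>, <docDate>1850</docDate></docImprint>
  </titlePage>
</front>
```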
Epistles which appear elsewhere in a text will, of course, contain these same elements.
A corpus or collection of texts, which share many characteristics, may have one header for the corpus and individual headers for each component of the corpus. In this case the type attribute indicates the type of header. <teiHeader type="corpus"> introduces the header for corpus-level information.
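The nesting of corpus-level and text-level headers can be sketched as follows, with the content of each header and text omitted. The type value text for the individual headers is assumed here.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="corpus">
    <!-- information shared by the whole corpus -->
  </teiHeader>
  <TEI>
    <teiHeader type="text">
      <!-- information specific to the first text -->
    </teiHeader>
    <text>
      <!-- the first text -->
    </text>
  </TEI>
  <TEI>
    <teiHeader type="text">
      <!-- information specific to the second text -->
    </teiHeader>
    <text>
      <!-- the second text -->
    </text>
  </TEI>
</teiCorpus>
```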
Determining exactly what constitutes a new edition of an electronic text is left to the encoder.
The extent statement describes the approximate size of a file.
The seriesStmt element groups information about the series, if any, to which a publication belongs. It may contain title, idno, or respStmt elements.
The notesStmt, if used, contains one or more note elements which contain a note or annotation. Some information found in the notes area in conventional bibliography has been assigned specific elements in the TEI scheme.
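These three parts of the file description might appear as in the following fragment; the size, series title, names, and note are all invented for illustration.

```xml
<extent>about 4.5 Mbytes</extent>

<seriesStmt>
  <title>A hypothetical series of machine-readable texts</title>
  <respStmt>
    <resp>edited by</resp>
    <name>A. N. Editor</name>
  </respStmt>
  <idno type="vol">1.2</idno>
</seriesStmt>

<notesStmt>
  <note>Transcribed from a microfilm copy of the first edition.</note>
</notesStmt>
```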
The refsDecl element is used to document the way in which any standard referencing scheme built into the encoding works. In its simplest form, it consists of prose description.
Linkage between a particular text and a category within such a taxonomy is made by means of the catRef element within the textClass element, as further described below.
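A sketch of this linkage, assuming a locally defined taxonomy; the identifiers and category description are hypothetical.

```xml
<!-- in the encoding description: the taxonomy is declared -->
<classDecl>
  <taxonomy xml:id="genres">
    <category xml:id="g-fiction">
      <catDesc>Prose fiction</catDesc>
    </category>
  </taxonomy>
</classDecl>

<!-- in the text profile: the text is assigned to a category -->
<textClass>
  <catRef target="#g-fiction"/>
</textClass>
```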
The creation element is useful for documenting where a work was created, even though it may not have been published or recorded there.
The revisionDesc element provides a change log in which each change made to a text may be recorded. The log may be recorded as a sequence of change elements each of which contains a brief description of the change. The attributes date and who may be used to identify when the change was carried out and the agency responsible for it.
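A change log of this kind might look like the following sketch, using the date and who attributes named above. The dates, the initials, and the descriptions are invented, and listing the most recent change first is a convention rather than a requirement stated here.

```xml
<revisionDesc>
  <change date="2006-02-10" who="LB">Corrected typing errors in chapter 2.</change>
  <change date="2006-01-15" who="LB">File created.</change>
</revisionDesc>
```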
This revision of the TEI Lite schema conforms to the TEI P5 Guidelines, which make a number of changes from the TEI P4 Guidelines underlying earlier versions of TEI Lite. The following brief list indicates some of the major changes which will be needed in existing TEI P4-conformant documents before they can be used with the new schema. A fuller list is in preparation for publication as a part of TEI P5: the items listed here relate specifically to changes in TEI Lite only.
http://www.tei-c.org/ns/1.0
TEI Lite is a pure subset of the TEI. All of the elements defined in it are taken from the following standard TEI modules: tei, core, header, textstructure, figures, linking, analysis, and tagdocs.
The following elements from those modules are excluded from the schema: <ab>, <alt>, <altGrp>, <altIdent>, <analytic>, <attDef>, <attList>, <attRef>, <biblItem>, <biblStruct>, <binaryObject>, <broadcast>, <c>, <cb>, <cl>, <classSpec>, <classes>, <content>, <correction>, <datatype>, <defaultVal>, desc, <distinct>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>, <egXML>, <elementSpec>, <equipment>, <equiv>, <exemplum>, <fsdDecl>, <floatingText>, <headItem>, <headLabel>, <hyphenation>, <imprimatur>, <interpretation>, <join>, <joinGrp>, <link>, <linkGrp>, <listRef>, <m>, <macroSpec>, <measure>, <meeting>, <memberOf>, <metDecl>, <metSym>, <moduleRef>, <moduleSpec>, <monogr>, <normalization>, <phr>, <postBox>, <postCode>, <quotation>, <recording>, <recordingStmt>, <remarks>, <schemaSpec>, <scriptStmt>, <segmentation>, <series>, <span>, <spanGrp>, <specDesc>, <specGrp>, <specGrpRef>, <specList>, <state>, <stdVals>, <street>, <stringVal>, <tag>, <timeline>, <valDesc>, <valItem>, <valList>, <variantEncoding>, <w>, <when>
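A subset of this kind could be expressed in the TEI's own schema-specification language roughly as sketched below. This is not the actual TEI Lite source: the ident value teilite-sketch is hypothetical, and only three of the exclusions listed above are shown.

```xml
<schemaSpec ident="teilite-sketch" start="TEI">
  <!-- the eight modules from which the elements are drawn -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="figures"/>
  <moduleRef key="linking"/>
  <moduleRef key="analysis"/>
  <moduleRef key="tagdocs"/>
  <!-- one deletion per excluded element; three shown as examples -->
  <elementSpec ident="ab" module="linking" mode="delete"/>
  <elementSpec ident="biblStruct" module="core" mode="delete"/>
  <elementSpec ident="div1" module="textstructure" mode="delete"/>
</schemaSpec>
```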
Here is the TEI Lite schema itself:
att.global provides attributes common to all elements in the TEI encoding scheme.
Module | tei
Attributes | att.global.linking (@corresp, @synch, @sameAs, @copyOf, @next, @prev, @exclude, @select) att.global.analytic (@ana)
TEI: (TEI document) contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a teiCorpus element. |
abbr: (abbreviation) contains an abbreviation of any sort. |
add: (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector. |
addrLine: (address line) contains one line of a postal address. |
address: contains a postal address, for example of a publisher, an organization, or an individual. |
anchor: (anchor point) attaches an identifier to a point within a text, whether or not it corresponds with a textual element. |
appInfo: (application information) records information about an application which has edited the TEI file. |
application: provides information about an application which has acted upon the document. |
argument: A formal list or prose description of the topics addressed by a subdivision of a text. |
att: (attribute) contains the name of an attribute appearing within running text. |
att.ascribed: provides attributes for elements representing speech or action that can be ascribed to a specific individual. |
att.canonical: provides attributes which can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. |
att.datable: provides attributes for normalization of elements that contain dates, times, or datable events. |
att.datable.w3c: provides attributes for normalization of elements that contain datable events using the W3C datatypes. |
att.declarable: provides attributes for those elements in the TEI Header which may be independently selected by means of the special purpose decls attribute. |
att.declaring: provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element. |
att.dimensions: provides attributes for describing the size of physical objects. |
att.divLike: provides attributes common to all elements which behave in the same way as divisions. |
att.docStatus: provides attributes for use on metadata elements describing the status of a document. |
att.editLike: provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind. |
att.global.analytic: provides additional global attributes for associating specific analyses or interpretations with appropriate portions of a text. |
att.global.linking: defines a set of attributes for hypertext and other linking, which are enabled for all elements when the additional tag set for linking is selected. |
att.handFeatures: provides attributes describing aspects of the hand in which a manuscript is written. |
att.internetMedia: provides attributes for specifying the type of a computer resource using a standard taxonomy. |
att.interpLike: provides attributes for elements which represent a formal analysis or interpretation. |
att.measurement: provides attributes to represent a regularized or normalized measurement. |
att.naming: provides attributes common to elements which refer to named persons, places, organizations etc. |
att.placement: provides attributes for describing where on the source page or object a textual element appears. |
att.pointing: defines a set of attributes used by all elements which point to other elements by means of one or more URI references. |
att.ranging: provides attributes for describing numerical ranges. |
att.responsibility: provides attributes indicating who is responsible for something asserted by the markup and the degree of certainty associated with it. |
att.segLike: provides attributes for elements used for arbitrary segmentation. |
att.sourced: provides attributes identifying the source edition from which some encoded feature derives. |
att.spanning: provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it. |
att.tableDecoration: provides attributes used to decorate rows or cells of a table. |
att.transcriptional: provides attributes specific to elements encoding authorial or scribal intervention in a text when transcribing manuscript or similar sources. |
att.translatable: provides attributes used to indicate the status of a translatable portion of an ODD document. |
att.typed: provides attributes which can be used to classify or subclassify elements in any way. |
author: in a bibliographic reference, contains the name(s) of the author(s), personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority. |
authority: (release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor. |
availability: supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc. |
back: (back matter) contains any appendixes, etc. following the main part of a text. |
bibl: (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. |
biblFull: (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present. |
biblScope: (scope of citation) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work. |
body: (text body) contains the whole body of a single unitary text, excluding any front or back matter. |
byline: contains the primary statement of responsibility given for a work on its title page or at the head or end of the work. |
cRefPattern: (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI. |
catDesc: (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal textDesc. |
catRef: (category reference) specifies one or more defined categories within some taxonomy or text typology. |
category: contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. |
cell: contains one cell of a table. |
change: summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers. |
choice: groups a number of alternative encodings for the same point in a text. |
cit: (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example. |
classCode: (classification code) contains the classification code used for this text in some standard classification system. |
classDecl: (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text. |
closer: groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter. |
code: contains literal code from some formal language such as a programming language. |
corr: (correction) contains the correct form of a passage apparently erroneous in the copy text. |
creation: contains information about the creation of a text. |
date: contains a date in any format. |
dateline: contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer. |
del: (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector. |
desc: (description) contains a brief description of the object documented by its parent element, including its intended usage, purpose, or application where this is appropriate. |
distributor: supplies the name of a person or other agency responsible for the distribution of a text. |
div: (text division) contains a subdivision of the front, body, or back of a text. |
divGen: (automatically generated text division) indicates the location at which a textual division generated automatically by a text-processing application is to appear. |
docAuthor: (document author) contains the name of the author of the document, as given on the title page (often but not always contained in a byline). |
docDate: (document date) contains the date of a document, as given (usually) on a title page. |
docEdition: (document edition) contains an edition statement as presented on a title page of a document. |
docImprint: (document imprint) contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page. |
docTitle: (document title) contains the title of a document, including all its constituents, as given on a title page. |
edition: (edition) describes the particularities of one edition of a text. |
editionStmt: (edition statement) groups information relating to one edition of a text. |
editor: secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc. |
editorialDecl: (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. |
eg: (example) contains any kind of illustrative example. |
emph: (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect. |
encodingDesc: (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. |
epigraph: contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page. |
expan: (expansion) contains the expansion of an abbreviation. |
extent: describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units. |
figDesc: (description of figure) contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it. |
figure: groups elements representing or containing graphic information such as an illustration or figure. |
fileDesc: (file description) contains a full bibliographic description of an electronic file. |
foreign: (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text. |
formula: contains a mathematical or other formula. |
front: (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body. |
funder: (funding body) specifies the name of an individual, institution, or organization responsible for the funding of a project or text. |
gap: (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. |
geoDecl: (geographic coordinates declaration) documents the notation and the datum used for geographic coordinates expressed as content of the <geo> element elsewhere within the document. |
gi: (element name) contains the name (generic identifier) of an element. |
gloss: identifies a phrase or word used to provide a gloss or definition for some other word or phrase. |
graphic: indicates the location of an inline graphic, illustration, or figure. |
group: contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc. |
head: (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. |
hi: (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. |
ident: (identifier) contains an identifier or name for an object of some kind in a formal language. |
idno: (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way. |
index: (index entry) marks a location to be indexed for whatever purpose. |
interp: (interpretation) summarizes a specific interpretative annotation which can be linked to a span of text. |
interpGrp: (interpretation group) collects together a set of related interpretations which share responsibility or type. |
item: contains one component of a list. |
keywords: contains a list of keywords or phrases identifying the topic or nature of a text. |
l: (verse line) contains a single, possibly incomplete, line of verse. |
label: contains the label associated with an item in a list; in glossaries, marks the term being defined. |
langUsage: (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. |
language: characterizes a single language or sublanguage used within a text. |
lb: (line break) marks the start of a new (typographic) line in some edition or version of a text. |
lg: (line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc. |
list: (list) contains any sequence of items organized as a list. |
listBibl: (citation list) contains a list of bibliographic citations of any kind. |
macro.anyXML: defines a content model within which any XML elements are permitted. |
macro.limitedContent: (paragraph content) defines the content of prose elements that are not used for transcription of extant materials. |
macro.paraContent: (paragraph content) defines the content of paragraphs and similar elements. |
macro.phraseSeq: (phrase sequence) defines a sequence of character data and phrase-level elements. |
macro.phraseSeq.limited: (limited phrase sequence) defines a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents. |
macro.specialPara: ('special' paragraph content) defines the content model of elements such as notes or list items, which either contain a series of component-level elements or else have the same structure as a paragraph, containing a series of phrase-level and inter-level elements. |
macro.xtext: (extended text) defines a sequence of character data and gaiji elements. |
measureGrp: (measure group) contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page. |
mentioned: marks words or phrases mentioned, not used. |
milestone: marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element. |
model.addrPart: groups elements such as names or postal codes which may appear as part of a postal address. |
model.addressLike: groups elements used to represent a postal or e-mail address. |
model.applicationLike: groups elements used to record application-specific information about a document in its header. |
model.biblLike: groups elements containing a bibliographic description. |
model.biblPart: groups elements which represent components of a bibliographic description. |
model.catDescPart: groups component elements of the TEI Header Category Description. |
model.choicePart: groups elements (other than choice itself) which can be used within a choice alternation. |
model.common: groups common chunk- and inter-level elements. |
model.dateLike: groups elements containing temporal expressions. |
model.div1Like: groups top-level structural divisions. |
model.divBottom: groups elements appearing at the end of a text division. |
model.divBottomPart: groups elements which can occur only at the end of a text division. |
model.divGenLike: groups elements used to represent a structural division which is generated rather than explicitly present in the source. |
model.divLike: groups elements used to represent un-numbered generic structural divisions. |
model.divPart: groups paragraph-level elements appearing directly within divisions. |
model.divTop: groups elements appearing at the beginning of a text division. |
model.divTopPart: groups elements which can occur only at the beginning of a text division. |
model.divWrapper: groups elements which can appear at either top or bottom of a textual division. |
model.editorialDeclPart: groups elements which may be used inside editorialDecl and appear multiple times. |
model.egLike: groups elements containing examples or illustrations. |
model.emphLike: groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. |
model.encodingDescPart: groups elements which may be used inside encodingDesc and appear multiple times. |
model.entryPart: groups elements appearing at any level within a dictionary entry. |
model.entryPart.top: groups high-level elements within a structured dictionary entry. |
model.frontPart: groups elements which appear at the level of divisions within front or back matter. |
model.gLike: groups elements used to represent individual non-Unicode characters or glyphs. |
model.global: groups elements which may appear at any point within a TEI text. |
model.global.edit: groups globally available elements which perform a specifically editorial function. |
model.global.meta: groups globally available elements which describe the status of other elements. |
model.glossLike: groups elements which provide an alternative name, explanation, or description for any markup construct. |
model.graphicLike: groups elements containing images, formulae, and similar objects. |
model.headLike: groups elements used to provide a title or heading at the start of a text division. |
model.hiLike: groups phrase-level elements which are typographically distinct but to which no specific function can be attributed. |
model.highlighted: groups phrase-level elements which are typographically distinct. |
model.imprintPart: groups the bibliographic elements which occur inside imprints. |
model.inter: groups elements which can appear either within or between paragraph-like elements. |
model.lLike: groups elements representing metrical components such as verse lines. |
model.labelLike: groups elements used to gloss or explain other parts of a document. |
model.limitedPhrase: groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. |
model.listLike: groups list-like elements. |
model.measureLike: groups elements which denote a number, a quantity, a measurement, or similar piece of text that conveys some numerical meaning. |
model.milestoneLike: groups milestone-style elements used to represent reference systems. |
model.msItemPart: groups elements which can appear within a manuscript item description. |
model.msQuoteLike: groups elements which represent passages such as titles quoted from a manuscript as a part of its description. |
model.nameLike: groups elements which name or refer to a person, place, or organization. |
model.nameLike.agent: groups elements which contain names of individuals or corporate bodies. |
model.noteLike: groups globally-available note-like elements. |
model.pLike: groups paragraph-like elements. |
model.pLike.front: groups paragraph-like elements which can occur as direct constituents of front matter. |
model.pPart.data: groups phrase-level elements containing names, dates, numbers, measures, and similar data. |
model.pPart.edit: groups phrase-level elements for simple editorial correction and transcription. |
model.pPart.editorial: groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring. |
model.pPart.transcriptional: groups phrase-level elements used for editorial transcription of pre-existing source materials. |
model.personPart: groups elements which form part of the description of a person. |
model.phrase: groups elements which can occur at the level of individual words or phrases. |
model.phrase.xml: groups phrase-level elements used to encode XML constructs such as element names, attribute names, and attribute values. |
model.profileDescPart: groups elements which may be used inside profileDesc and appear multiple times. |
model.ptrLike: groups elements used for purposes of location and reference. |
model.publicationStmtPart: groups elements which may appear within the publicationStmt element of the TEI Header. |
model.qLike: groups elements related to highlighting which can appear either within or between chunk-level elements. |
model.quoteLike: groups elements used to directly contain quotations. |
model.resourceLike: groups non-textual elements which may appear together with a header and a text to constitute a TEI document. |
model.respLike: groups elements which are used to indicate intellectual or other significant responsibility, for example within a bibliographic element. |
model.segLike: groups elements used for arbitrary segmentation. |
model.sourceDescPart: groups elements which may be used inside sourceDesc and appear multiple times. |
model.stageLike: groups elements containing stage directions or similar things defined by the module for performance texts. |
model.teiHeaderPart: groups high level elements which may appear more than once in a TEI Header. |
model.titlepagePart: groups elements which can occur as direct constituents of a title page, such as docTitle, docAuthor, docImprint, or epigraph. |
name: (name, proper noun) contains a proper noun or noun phrase. |
note: contains a note or annotation. |
notesStmt: (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description. |
num: (number) contains a number, written in any form. |
opener: groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter. |
orig: (original form) contains a reading which is marked as following the original, rather than being normalized or corrected. |
p: (paragraph) marks paragraphs in prose. |
pb: (page break) marks the boundary between one page of a text and the next in a standard reference system. |
pc: (punctuation character) a character or string of characters regarded as constituting a single punctuation mark. |
postscript: contains a postscript, e.g. to a letter. |
principal: (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text. |
profileDesc: (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. |
projectDesc: (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected. |
ptr: (pointer) defines a pointer to another location. |
pubPlace: (publication place) contains the name of the place where a bibliographic item was published. |
publicationStmt: (publication statement) groups information concerning the publication or distribution of an electronic or other text. |
publisher: provides the name of the organization responsible for the publication or distribution of a bibliographic item. |
q: (separated from the surrounding text with quotation marks) contains material which is marked as (ostensibly) being somehow different than the surrounding text, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned but not used. |
quote: (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text. |
ref: (reference) defines a reference to another location, possibly modified by additional text or comment. |
refState: (reference state) specifies one component of a canonical reference defined by the milestone method. |
refsDecl: (references declaration) specifies how canonical references are constructed for this text. |
reg: (regularization) contains a reading which has been regularized or normalized in some sense. |
relatedItem: contains or references some other bibliographic item which is related to the present one in some specified manner, for example as a constituent or alternative version of it. |
resp: (responsibility) contains a phrase describing the nature of a person's intellectual responsibility. |
respStmt: (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. |
revisionDesc: (revision description) summarizes the revision history for a file. |
row: contains one row of a table. |
rs: (referencing string) contains a general purpose name or referring string. |
s: (s-unit) contains a sentence-like division of a text. |
said: (speech or thought) indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters. |
salute: (salutation) contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc. |
samplingDecl: (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection. |
scriptNote: describes a particular script distinguished within the description of a manuscript or similar resource. |
seg: (arbitrary segment) represents any segmentation of text below the ‘chunk’ level. |
seriesStmt: (series statement) groups information about the series, if any, to which a publication belongs. |
sic: (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate. |
signed: (signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text. |
soCalled: contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics. |
sourceDesc: (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. |
sp: (speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text. |
speaker: A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment. |
sponsor: specifies the name of a sponsoring organization or institution. |
stage: (stage direction) contains any kind of stage direction within a dramatic text or fragment. |
table: contains text displayed in tabular form, in rows and columns. |
taxonomy: defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy. |
teiCorpus: contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text. |
teiHeader: (TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text. |
term: contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. |
text: contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. |
textClass: (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. |
time: contains a phrase defining a time of day in any format. |
title: contains a title for any kind of work. |
titlePage: (title page) contains the title page of a text, appearing within the front or back matter. |
titlePart: contains a subsection or division of the title of a work, as indicated on a title page. |
titleStmt: (title statement) groups information about the title of a work and those responsible for its intellectual content. |
trailer: contains a closing title or footer appearing at the end of a division of a text. |
typeNote: describes a particular font or other significant typographic feature distinguished within the description of a printed resource. |
unclear: contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source. |
val: (value) contains a single attribute value. |