teiJournal will need to support a specific range of features that may be found in journal contributions. It's important to enumerate and describe these features before we begin coding. This list of features is based on our mechanical analysis of tags and attributes used in the ACH Abstracts and Scandinavian Canadian Journal projects, and also on the Tag Zoo sample document from Digital Humanities Quarterly.
A typical document would have the following basic hierarchical structure (in its XML form):
- Root element (<TEI> tag)
  - Header (<teiHeader> tag). This contains all metadata, including a formal <biblStruct> block with bibliographical information about the document. Once a print version has been published, this should also include the start page and end page, so that pagination can be rendered as in the print version, even when producing a PDF of the individual contribution. Also included will be a keyword list showing topics, which can be used to generate a keyword index. The type of document (article, review etc.) will most likely be encoded as a rend attribute on the root <TEI> tag. The <titleStmt> tag will contain several types of title for the document, including main, sub, and running (a shorter version of the main title, used as the running title in page headers). There may be two running titles, one for left-hand and one for right-hand pages; if there is only one, the other will be supplied from metadata in the journal volume's <teiHeader>. In the case of a review, a further title type might designate the book being reviewed, and might then include a <biblStruct> element with details of the book.
  - Text (<text> tag)
    - Body (<body> tag). This contains the main body of the text; content features are detailed in the sections below.
    - Back matter (<back> tag). This includes the bibliography/references for the article. It does NOT include footnotes, which are encoded inline.
Divisions (<div> tag). Although previous projects have used numbered divisions (<div1>, <div2> etc.), these are problematic in TEI and arguably redundant, since the nesting level of a <div> already determines its hierarchical position. Major document divisions will therefore be encoded as <div> tags, with headings encoded as <head>, and the nesting level can determine any special handling. The type attribute can be used to distinguish types of <div> if necessary; for instance, the first <div> might have type="abstract" to indicate that it is the abstract for the paper.
Headings (<head> tag). Each <div> can contain a <head> tag, which will be styled and sized based on its hierarchical depth.
Epigraphs (<epigraph> tag). An epigraph will typically occur after a <head> element, and it may contain a <cit> tag.
Paragraphs (<p> tag). Paragraphs are contained within divs and are typically the lowest-level block element (although we will need to investigate situations such as embedded blockquotes). Note that <p> does not automatically translate to XHTML <p>; typically, we will want to convert it to an XHTML <div> with a special class, to allow other block elements (blockquotes etc.) inside it.
Anonymous blocks (<ab> tag). This tag is useful when a block of text must be embedded in such a way that it can be handled distinctly from paragraphs or blockquotes. The type attribute can be used to distinguish types of <ab> where necessary.
Figures (<figure> tag). Images may be included in a variety of formats, including JPEG, PNG, PDF and SVG. Rendering (inline, block, full-size or otherwise) can be controlled using the rend attribute.
Tables (<table> tag). Tables will be supported in a basic form (captions using the <head> tag, rows, and cells). The rend attribute will distinguish the main table types, and the role attribute on sub-elements will distinguish between labels, data and so on. Previous projects have distinguished "grid" and "layout" types of table; we should anticipate others, and possibly allow rend attributes on cells for alignment purposes.
Lists (<list> and <item> tags). Lists of various types will be distinguished by the type attribute. Possible types are: force-numbering (where the n attribute on each item supplies its number, instead of automatic numbering), lower-alpha, lower-roman, no-bullet, none, ordered, simple, and unordered.
Letter elements (<opener>, <dateline>, <byline>, and <salute> tags). Where contributions such as letters, notes, queries etc. are part of the journal, these elements will be useful in marking up such documents.
The following features are found inside paragraphs, table cells, etc.:
Abbreviations (<choice>, <abbr> and <expan> tags). Tagging abbreviations with their expansions enables us to generate abbreviation keys, among other things. On first use, an abbreviation will be expanded using a <choice> tag containing both <abbr> and <expan>. Subsequent occurrences need only be tagged as <abbr>, and the XSLT can then retrieve the expansion from the earlier instance.
Verse (<lg> and <l> tags). Required for poetry.
Caesura (<caesura> tag). Required for verse forms such as Old English poetry.
Quotations and references (<cit>, <quote>, and <ref> tags). Where appropriate, a <ref> tag points to its reference using its target attribute; this will typically point to the xml:id attribute of a <biblStruct> tag in the bibliography, like this: <ref target="#bloggs_2005">(Bloggs, 2005)</ref>. An inline quote uses a <quote> tag embedded in a <cit>, which also contains a <ref> tag. Sometimes no text attached to the quote identifies the source, because the sentence structure places it elsewhere or because it is understood. In that case the <ref> tag will still be present with its target, but it will have no content. The <ref> tag might also point to an external location. The rend attribute can be used to specify whether a quote is rendered as a block or not. Alternatively, a length-based trigger in the XSLT could create blockquotes automatically when a quote exceeds a certain length.
Footnotes (<note> tag). Footnotes should be inserted inline; rendering will place them at the end of the text (for PDF) or as popups (for XHTML). Numbering is not required, since it can be supplied automatically.
Line breaks (<lb> tag). These are used for single-line breaks within paragraphs or other blocks.
Emphasis and highlighting (<emph> and <hi> tags). A typical usage would be emphasis intended to be rendered as italic. Italic is probably the default rendering for the <emph> tag, but a rend attribute is available for other cases. <hi> is more generic, and is always used in conjunction with a rend attribute.
Care should be taken to avoid using <emph> and <hi> where something more specific and categorizable is in play (such as would be captured by <mentioned> or <soCalled>). The <hi> element should also be used for drop capitals. No reliable algorithm can place drop capitals only where they "work" in a pleasing way, so we anticipate that this rendering feature will have to be specified in the markup.
Mentioned words, terms and so-called expressions (<mentioned>, <term> and <soCalled> tags). <mentioned> is used when a word or phrase is mentioned rather than used (as in a linguistic discussion of a vocabulary item). <soCalled> is used to establish authorial distance, while <term> marks a technical term. Marking up terms can be useful for building indexes of terms.
Glosses (<orig> and <gloss> tags). Any <term> or <mentioned> tag may have an xml:id attribute, which links it to the target attribute of a <gloss> element containing an explanation of it. This may be useful when building glossaries of terms.
Links (<ptr> and <ref> tags). Both of these tags can specify a link to one or more XPointers, which are space-separated in the target attribute. The main difference is that <ref> allows content, so it is used when the rendered text of the link should differ from the URI itself. <ptr> encodes a link using only the URI, while <ref> links some text to a URI. For <ref>, PDF rendering for print may need to show both the text and the URI, since clickable links are not available in a printed document. These tags can also be used for email addresses, using the mailto: protocol.
Titles (<title> tag). This is standard TEI usage, with the level attribute set to a, j, m, s, or u, as in the Guidelines.
Names (<name> tag). Names are marked up in structured contexts such as <biblStruct> elements, but may also be tagged elsewhere.
Name elements contain <forename> and <surname> elements. These can be used to create a regularized form of the name for indexing or sorting purposes, and they also allow the name to be displayed surname first or forename first, as the context requires.
Dates (<date> tag). Dates appear in various places, some formal (such as in a <biblStruct>) and some informal. They should be tagged wherever this might be useful, using the when attribute to encode a standard format (YYYY-MM-DD, YYYY-MM, or YYYY).
Code (<code> tag). Code can be rendered either as a block or inline, based on the rend attribute. When rendered as a block, it might preserve whitespace (to allow for indentation) and contain <lb> tags.
Reference information, especially in the form of a bibliography in the <back> of a document, will be marked up using a very formal, highly structured <biblStruct> element. This will be documented in detail with a wide range of examples, as we have done in previous projects. The structure must be rigorously defined and applied so that it can be processed into output conforming to APA (our first target), and later possibly MLA and Chicago. Bibliographical markup will be described and documented in detail here.
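The abbreviation rule described above puts the expansion inside <choice> on first use and uses a bare <abbr> afterwards. The following Python sketch illustrates how a processor might pair each <abbr> with the expansion most recently defined for the same text. It stands in for the XSLT lookup and does not reproduce it. It assumes the TEI P5 namespace, and the function names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace in ElementTree's {uri}tag form


def text_of(el):
    """Whitespace-normalized text content of an element."""
    return " ".join("".join(el.itertext()).split())


def resolve_abbreviations(root):
    """Hypothetical helper: (abbreviation, expansion) pairs in document order.

    A <choice> holding both <abbr> and <expan> defines an expansion; a later
    bare <abbr> with the same text reuses the most recent definition.
    Abbreviations never defined earlier get None.
    """
    # ElementTree has no parent pointers, so first record which <abbr>
    # elements sit inside a <choice>, and what each one expands to.
    defined_here = {}
    for choice in root.iter(TEI + "choice"):
        abbr = choice.find(TEI + "abbr")
        expan = choice.find(TEI + "expan")
        if abbr is not None and expan is not None:
            defined_here[abbr] = text_of(expan)

    seen = {}  # abbreviation text -> latest expansion
    pairs = []
    for abbr in root.iter(TEI + "abbr"):  # iter() walks in document order
        key = text_of(abbr)
        if abbr in defined_here:
            seen[key] = defined_here[abbr]
        pairs.append((key, seen.get(key)))
    return pairs


ABBR_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>The <choice><abbr>TEI</abbr><expan>Text Encoding Initiative</expan></choice>
publishes Guidelines; later the bare <abbr>TEI</abbr> suffices, but
<abbr>XML</abbr> was never expanded.</p>
</body></text></TEI>"""

if __name__ == "__main__":
    for abbr, expansion in resolve_abbreviations(ET.fromstring(ABBR_SAMPLE)):
        print(abbr, "->", expansion)
```

The distinct pairs with a non-None expansion are the raw material for an abbreviation key. The None entries flag abbreviations that were never expanded on first use.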
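The quotation markup above relies on <ref> targets that point at <biblStruct> xml:id values, and the links entry notes that one target may hold several space-separated pointers. A validation pass like the following Python sketch could catch dangling citations before rendering. It treats only pointers beginning with # as internal and ignores external URIs. The function name and sample are hypothetical, and the TEI P5 namespace is assumed.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id
POINTER_TAGS = {TEI + "ref", TEI + "ptr"}


def check_citations(root):
    """Hypothetical checker returning (cited, missing) ids in document order.

    cited:   internal pointers that resolve to a <biblStruct>
    missing: internal pointers that match no xml:id anywhere in the document
    Pointers to other elements (e.g. a <div>) and external URIs are ignored.
    """
    bibl_ids = {b.get(XML_ID) for b in root.iter(TEI + "biblStruct")}
    all_ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    cited, missing = [], []
    for el in root.iter():
        if el.tag not in POINTER_TAGS:
            continue
        # A target may contain several space-separated pointers.
        for pointer in el.get("target", "").split():
            if not pointer.startswith("#"):
                continue  # external: http:, https:, mailto:, etc.
            ident = pointer[1:]
            if ident in bibl_ids:
                cited.append(ident)
            elif ident not in all_ids:
                missing.append(ident)
    return cited, missing


CITATION_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
<body><div xml:id="intro"><p>
<cit><quote>Markup matters.</quote><ref target="#bloggs_2005">(Bloggs, 2005)</ref></cit>
<cit><quote>So does structure.</quote><ref target="#smith_1999"/></cit>
See <ptr target="#intro"/> and <ref target="https://example.org/">an external page</ref>.
</p></div></body>
<back><listBibl><biblStruct xml:id="bloggs_2005"/></listBibl></back>
</text></TEI>"""

if __name__ == "__main__":
    print(check_citations(ET.fromstring(CITATION_SAMPLE)))
```

The empty <ref target="#smith_1999"/> shows the contentless-ref case from the quotations entry. It still carries a target, so it is checked like any other citation.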
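The footnotes entry notes that numbering can be supplied automatically, because notes are encoded inline and unnumbered. A minimal Python sketch of that step follows. It numbers the <note> elements inside <text> in document order and collects their normalized text, which an end-of-document note list (PDF) or popup content (XHTML) could then use. Note types and placement attributes are not specified above, so the sketch ignores them. The function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def number_notes(root):
    """Hypothetical helper: [(number, text)] for each <note> in <text>.

    Notes in the <teiHeader> (e.g. in a notesStmt) are skipped, since only
    the contribution's running text carries footnotes.
    """
    text = root.find(TEI + "text")
    if text is None:
        return []
    return [
        (number, " ".join("".join(note.itertext()).split()))
        for number, note in enumerate(text.iter(TEI + "note"), start=1)
    ]


NOTES_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader><fileDesc><notesStmt><note>Header note, not a footnote.</note></notesStmt></fileDesc></teiHeader>
<text><body>
<p>First claim.<note>A first footnote.</note> Second claim.<note>A
  second   footnote.</note></p>
</body></text></TEI>"""

if __name__ == "__main__":
    for number, note_text in number_notes(ET.fromstring(NOTES_SAMPLE)):
        print(number, note_text)
```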
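The <forename>/<surname> structure described for names lets processing derive both display orders and a sort key. This Python sketch assumes one or more <forename> and <surname> children directly inside <name>. The function name and dictionary keys are hypothetical. For a real index, the simple casefold() comparison used here might need to be replaced with locale-aware collation.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def name_forms(name_el):
    """Hypothetical helper: display forms and a sort key for a TEI <name>."""

    def parts(tag):
        # Join all direct children with this tag, normalizing whitespace.
        return " ".join(
            " ".join("".join(el.itertext()).split())
            for el in name_el.findall(TEI + tag)
        )

    given, family = parts("forename"), parts("surname")
    return {
        "forename_first": " ".join(p for p in (given, family) if p),
        "surname_first": f"{family}, {given}" if family and given else family or given,
        "sort_key": (family.casefold(), given.casefold()),
    }


NAMES_SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">Cited by
<name><forename>Joe</forename><surname>Bloggs</surname></name> and
<name><forename>Anne</forename><forename>Marie</forename><surname>Adams</surname></name>.</p>"""

if __name__ == "__main__":
    root = ET.fromstring(NAMES_SAMPLE)
    forms = sorted(
        (name_forms(n) for n in root.iter(TEI + "name")),
        key=lambda f: f["sort_key"],
    )
    for f in forms:
        print(f["surname_first"], "|", f["forename_first"])
```

Keeping the sort key separate from the display strings means an index can sort surname first while the running text still shows forename first.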
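The dates entry restricts the when attribute to three formats. A small validator such as the following Python sketch could enforce that restriction during quality checking. It accepts exactly YYYY-MM-DD, YYYY-MM and YYYY, and uses the standard datetime module to reject impossible calendar dates. This is deliberately stricter than TEI itself, whose when attribute also permits other W3C forms such as times and full date-times. The function names and sample are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

TEI = "{http://www.tei-c.org/ns/1.0}"
# YYYY, optionally followed by -MM, optionally followed by -DD
WHEN_PATTERN = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_when(value):
    """True if value is YYYY-MM-DD, YYYY-MM or YYYY and names a real date."""
    match = WHEN_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month/day default to 1 so partial dates are checked too.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False  # e.g. month 13, 30 February, or year 0000
    return True


def invalid_dates(root):
    """Hypothetical QA check: when values of <date> elements that fail validation."""
    return [
        el.get("when")
        for el in root.iter(TEI + "date")
        if el.get("when") is not None and not is_valid_when(el.get("when"))
    ]


DATES_SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">Published
<date when="2005-02-28">28 Feb 2005</date>, revised <date when="2005-02-30">30 Feb 2005</date>,
reprinted in <date when="2006">2006</date>, and discussed <date>last spring</date>.</p>"""

if __name__ == "__main__":
    print(invalid_dates(ET.fromstring(DATES_SAMPLE)))
```

Informal dates without a when attribute (like "last spring") are left alone, matching the guidance that when is added only where it is useful.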