Interpreting differences among transcripts

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

Claus Huitfeldt, University of Bergen

28 June 2018



Or:


  • logical structure of transcription
    not how to transcribe
  • rational reconstruction of concepts
  • digitization, editing, text encoding
  • semantics of markup
    How does one say, in logic, “This <sec> element transcribes a block of text in the manuscript identified in the MECS-Wit header”?

The story in a nutshell

* T reinstantiates the types of E.

With well-defined exceptions, each token in T corresponds to a token in E and has the same type.

With well-defined exceptions, each token in E corresponds to a token in T which has the same type.

Key concepts

  • type, token, marks; document, text
  • atomic and compound types and tokens
  • type inventory, type system
  • reading
  • transcription policy
  • inferred reading

There are many details we do not have time to discuss here, but the main idea of our model of transcription is simple: E contains tokens which instantiate types, and T contains corresponding tokens which re-instantiate those types.


  • Context
  • Examples and key concepts:
    • Readings, inferred readings (Wittgenstein)
    • Identification of tokens (atomic, compound) (Tumba Edithae)
    • Selection of tokens (Tumba Edithae, Addams)
    • Type systems, type equivalences (Sor Juana, Kahlo)
  • Proposed formalization:
    • Type inventories and systems
    • Readings (tokens, documents)
    • Transcription policy
    • Inferred reading
  • Conclusions

It can be helpful to restrict one's attention to a single transcript, and we have often done that. But when we have access to two transcripts A and B, the situation becomes a bit more complex. Just as our second operating system or our second programming language gives us a clearer understanding of which characteristics are common to all computers and which are peculiar to a particular OS or programming language, so we hope that examining cases of two transcripts will help us refine, or at least usefully illustrate, our model of transcription.

I've already outlined the context of our work; the bulk of the talk will be devoted to examples of transcripts which differ from each other, or do not, and which provide contradictory information about E, or do not. All four combinations prove possible.

If there is time, our conclusion will briefly sketch our current formalization.

Some notation

  • E = the exemplar
  • T = any transcript
  • A, B, C = several transcripts
  • [T] = unpublished transcript
  • *T = transcript constructed for this talk

For brevity, we adopt some simple conventions: T is a transcript; when we have two, they are A and B. E is the exemplar. And we mark unpublished transcripts and transcripts constructed for this talk, so you can check them for special pleading.

Example: Wittgenstein

[A] [B] C
munonyqi wunouyqi muuvnyzi

At most one of these can be correct.

The first letter may be m or w, but not (normally) both.

Wittgenstein, Geheimschrift

The passage is written in Geheimschrift. Simple substitution cipher: a=z b=y c=x ... l=p m=o n=n.

Figure 4. A word in Wittgenstein's Geheimschrift

A: “munonyqi” = “ofnmnbkr”

B: “wunouyqi” = “dfnmfbkr”

C: “muuvnyzi” = “offenbar”.
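
The three decodings above can be checked mechanically. The following is a minimal sketch, assuming (the slides do not say so explicitly) that the cipher reverses a 25-letter alphabet without j, which is consistent with the pairs listed above (a=z, b=y, c=x, l=p, m=o, n=n). The names ALPHABET, DECODE and decode are illustrative only.

```python
# Assumption: a 25-letter alphabet (no "j"), reversed onto itself,
# so that a<->z, b<->y, ..., m<->o and n maps to itself.
ALPHABET = "abcdefghiklmnopqrstuvwxyz"
DECODE = dict(zip(ALPHABET, reversed(ALPHABET)))


def decode(word):
    """Replace each letter by its cipher partner; leave other characters alone."""
    return "".join(DECODE.get(ch, ch) for ch in word)


for label, reading in [("A", "munonyqi"), ("B", "wunouyqi"), ("C", "muuvnyzi")]:
    print(label, reading, "=", decode(reading))
# A munonyqi = ofnmnbkr
# B wunouyqi = dfnmfbkr
# C muuvnyzi = offenbar
```

Only reading C decodes to a German word, which illustrates how much a transcript depends on the reading of E that lies behind it.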

Moral: a transcript reflects a reading of E.

Moral: a reading maps tokens to types (inter alia).

Example: A tomb

[A] B









What is the writing on this tomb?

What is that mark?

Figure 5. West side of Tumba Edithae, Magdeburg

“DCCCC XLVIIo” vs “DCCCCXLVII”. Not a disagreement about the type of a token.

Disagreement about the existence of the token. (Which marks are tokens? Which are just marks?)

Moral: Readings involve identifying tokens as well as mapping them to types.

Where does this text start?

Start at north or at south?

Figure 8. North side of Tumba Edithae, Magdeburg (1 of 4)

Figure 9. South side of Tumba Edithae, Magdeburg (1 of 4)

Moral: Texts and documents are not (just) sets of characters; they have structure.

(At least sequence, probably more.)

Moral: Tokens and types are more than characters: words, sentences, paragraphs, ...

Which writing counts?

Neither A nor B transcribes the modern graffiti.

Moral: The focus of a transcription determines what parts of E are transcribed: all? some? which?

Figure 10. On north side of Tumba Edithae, Magdeburg

Example: altho

*A *B C



(Bracketed italics = editorial addition)


(Bracketed words = editorial intervention)

Q. What is in E?

A. A word spelled “altho”.

B. The word “although”, spelled “altho”.

C. Some form of the word “although”.

Marked editorial interventions

Figure 13. One word (altho) from a letter of Jane Addams

Moral: Some tokens in T do not claim to have exemplars in E.

Moral: Some tokens in T have exemplars in E, but not their constituents.

Moral: T can tell us a word is in E without telling us how it's spelled.

Example: Sor Juana (long s)

A *B



What is in E?

A: The word “vista”.

B: The word “vista” spelled with long s.

N.B. No contradiction here. Long-s and lowercase-s are not mutually exclusive types.

Type systems

Figure 16. Detail of a 17th-century printing of a sonnet of Sor Juana Ines de la Cruz

Moral: A reading of a document depends on the type system (= set of type inventories) used.

Moral: When different type inventories are used (e.g. graphemes vs allographs), different type mappings don't entail contradiction.
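
One way to make this moral concrete is to treat a reading of a single token as a (token, inventory, type) triple and to count two such readings as contradictory only when they concern the same token and the same inventory but name different types. This is a rough sketch, not the authors' formalization; the token identifiers and inventory names are hypothetical.

```python
from typing import NamedTuple


class TokenReading(NamedTuple):
    token: str      # hypothetical identifier for a token in E
    inventory: str  # name of the type inventory used for this reading
    type_: str      # the type the token is mapped to


def contradict(r1, r2):
    """True only for same token, same inventory, different types."""
    return (r1.token == r2.token
            and r1.inventory == r2.inventory
            and r1.type_ != r2.type_)


# Wittgenstein: A and B read the first letter within one inventory.
a = TokenReading("letter-1", "graphemes", "m")
b = TokenReading("letter-1", "graphemes", "w")

# Sor Juana: A and *B read the same letter using different inventories.
g = TokenReading("vista-3", "graphemes", "s")
h = TokenReading("vista-3", "allographs", "long-s")

print(contradict(a, b))  # True
print(contradict(g, h))  # False
```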

Underscoring and italics

*A *B

nos juntaremos ya para siempre

Underscoring in E rendered with italics.

nos juntaremos ya para siempre

Underscoring in E rendered with underscoring.

What is in E?

A, B: “nos juntaremos ya para siempre”, with “para siempre” underlined.

Example: Kahlo, Letter to Rivera

Figure 17. Frida Kahlo to Diego Rivera (detail)

Moral: Transcribers sometimes map one type in E to another in T.

Such type equivalences are not part of the reading of E, only part of the transcription policy of T.

Type inventories, type systems

A type inventory I is a set of mutually exclusive types.

We ascribe no properties to types beyond identity.

A type system P is a set of type inventories (e.g. characters, words, ...).

These may be disjoint or overlapping.
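
As a small illustration, types can be modeled as bare strings (identity and nothing else), inventories as sets of types, and a type system as a collection of named inventories. The inventory names and contents below are invented for the example, and treating equal strings as "the same type" across inventories is a modeling assumption.

```python
# Types are plain strings: identity is their only property.
LATIN_CAPITALS = frozenset("ABCDEFGHIKLMNOPQRSTVX")
ROMAN_NUMERALS = frozenset("IVXLCDM")
ALLOGRAPHS_OF_S = frozenset({"long-s", "round-s"})

# A type system is a set of inventories; names are added here for convenience.
TYPE_SYSTEM = {
    "capitals": LATIN_CAPITALS,
    "numerals": ROMAN_NUMERALS,
    "s-allographs": ALLOGRAPHS_OF_S,
}


def shared_types(system, name1, name2):
    """Return the types two inventories have in common (empty if disjoint)."""
    return system[name1] & system[name2]


print(sorted(shared_types(TYPE_SYSTEM, "capitals", "numerals")))
print(sorted(shared_types(TYPE_SYSTEM, "capitals", "s-allographs")))
```

The first pair of inventories overlaps and the second is disjoint; mutual exclusivity applies within a single inventory, not across the system.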

Reading of a token

A reading R of a token k with respect to a type inventory I is a tuple R = (k, I, p) where

  • k is the token being read
  • I is a type inventory
  • p is in I

Reading of a document

A reading R of a document D is a tuple (D, K, P, M) where

  • D is the document being read
  • K is a set of tokens identified as being in D
  • P is a type system
  • M (the mapping of R) is a set of triples (k, I, p), where
    • k is in K
    • I is in P
    • p is in I
    • No two triples have the same k and I.
    • There is at least one triple for every k in K.
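
The constraints on M listed above can be sketched as a validity check. This is an illustrative reading of the definition, not the authors' code: K is a set of token identifiers, P is represented as a dictionary from (hypothetical) inventory names to sets of types, and M is a set of (k, I, p) triples in which I is an inventory name.

```python
def check_reading(K, P, M):
    """Return the list of violated constraints; empty means M is well formed."""
    problems = []
    seen = set()
    for k, inv, p in sorted(M):
        if k not in K:
            problems.append(f"{k}: token not in K")
        if inv not in P:
            problems.append(f"{inv}: inventory not in P")
        elif p not in P[inv]:
            problems.append(f"{p}: type not in inventory {inv}")
        if (k, inv) in seen:
            problems.append(f"{k}: two types from inventory {inv}")
        seen.add((k, inv))
    # Every token in K needs at least one triple.
    mapped = {k for k, _, _ in M}
    for k in sorted(K - mapped):
        problems.append(f"{k}: no triple in M")
    return problems


K = {"k1", "k2"}
P = {"graphemes": {"m", "w", "u"}}
good = {("k1", "graphemes", "m"), ("k2", "graphemes", "u")}
bad = {("k1", "graphemes", "m"), ("k1", "graphemes", "w")}

print(check_reading(K, P, good))  # []
print(check_reading(K, P, bad))
```

The second call reports both that k1 receives two types from one inventory (the "m or w, but not both" situation) and that k2 has no triple at all.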

Transcription policy

A transcription policy π is a triple (SE, ST, Q), where

  • SE is a unary predicate. SE(k) is true iff k is a “special token” in E (not to be transcribed).
  • ST is a unary predicate. ST(k) is true iff k is a “special token” in T (lacking exemplar in E).
  • Q is a set of pairs (pE, pT); for purposes of π, type pE occurring in E will be reinstantiated in T using type pT. (E.g. underline, italics.)
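
A transcription policy of this shape can be sketched as a small record holding two predicates and a set of type pairs. The class name, field names and example tokens below are hypothetical; only the triple (SE, ST, Q) comes from the definition above.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple


@dataclass
class Policy:
    special_in_E: Callable[[str], bool]   # SE: tokens of E not to be transcribed
    special_in_T: Callable[[str], bool]   # ST: tokens of T lacking an exemplar in E
    Q: Set[Tuple[str, str]] = field(default_factory=set)  # pairs (pE, pT)

    def allows(self, type_in_E, type_in_T):
        """A type is reinstantiated either as itself or as its Q-equivalent."""
        return type_in_E == type_in_T or (type_in_E, type_in_T) in self.Q


policy = Policy(
    special_in_E=lambda k: k.startswith("graffito-"),   # e.g. modern graffiti
    special_in_T=lambda k: k.startswith("editorial-"),  # e.g. bracketed additions
    Q={("underline", "italics")},
)

print(policy.allows("underline", "italics"))  # True
print(policy.allows("italics", "underline"))  # False
print(policy.special_in_E("graffito-1"))      # True
```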

Inferred reading of E

Any T reflects a reading of E and allows us to reconstruct it at least in part. A reconstruction of a reading of E given transcript T is a tuple RR = (E, T, π, RT, R(E,T)), where

  • E is the exemplar
  • T is the transcript
  • π = (SE, ST, Q) is the transcription policy reflected in T
  • RT is a reading of T
  • R(E,T) = (E, K(E,T), P, M(E,T)) is a reading of E

In practice RT should be compatible with π.

R(E,T) is the reconstructed reading of E.
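
The reconstruction step can be sketched as follows, under several simplifying assumptions that go beyond the slides: the correspondence between tokens of T and tokens of E is given as a dictionary (corr), Q is applied consistently so that a type in T found on the right-hand side of a Q pair is mapped back to the left-hand side, and the inventory name is carried over unchanged. All identifiers are hypothetical.

```python
def infer_reading(MT, corr, special_in_T, Q):
    """Derive a mapping for E from MT, the mapping of a reading of T."""
    inverse_Q = {pT: pE for (pE, pT) in Q}
    inferred = set()
    for kT, inv, pT in MT:
        # Special tokens in T (e.g. editorial additions) claim no exemplar in E.
        if special_in_T(kT) or kT not in corr:
            continue
        inferred.add((corr[kT], inv, inverse_Q.get(pT, pT)))
    return inferred


# Transcript in which underscoring in E is rendered with italics.
MT = {
    ("t1", "words", "siempre"),
    ("t1", "rendition", "italics"),
    ("t2", "words", "[sic]"),
}
corr = {"t1": "e1"}
ME = infer_reading(MT, corr, lambda k: k == "t2", {("underline", "italics")})
print(sorted(ME))
# [('e1', 'rendition', 'underline'), ('e1', 'words', 'siempre')]
```

The italics in T are read back as underscoring in E, and the editorial token t2 contributes nothing to the reconstructed reading.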


  • A reading of a token with respect to a type inventory maps it to a type. (Wittgenstein)
  • A reading of a document with respect to a type system identifies tokens and maps them to types. (Tumba Edithae)
  • Different readings may use different type systems. (Sor Juana)
  • A transcription policy
    • distinguishes normal and special tokens in E (Tumba Edithae)
    • distinguishes normal and special tokens in T (Addams)
    • defines some equivalences between types (Kahlo)
  • From any T, we can reconstruct a reading of E.

Wrapping it up

  • Transcripts provide information about exemplars partly by reinstantiation, partly by description.
  • Reinstantiation is intrinsically digital / based on reproduction of discrete symbols.
    Contrast facsimile, which is intrinsically analog.
  • Reinstantiation of types is necessarily relative to a given type system, a given rule of selection, and a given reading of E.
  • A meaningful statement that “T transcribes E” must presuppose a transcription policy and readings of both E and T.

This is where we came in ...

To say “This <sec> element transcribes a block of text in the manuscript identified in the MECS-Wit header”, one can write (oversimplifying slightly):
(∃ b : Token) (∀ d: Document)
(identifies(/doc/catno, d) ⇒
(token-in-document(b, d)
∧ (∃ RD : reading) (∃ KD : set Tokens) (∃ PD : typesystem) (∃ MD : tt-mapping)
  (RD = (d, KD, PD, MD)
∧ (∃ RT : reading) (∃ KT : set Tokens) (∃ PT : typesystem) (∃ MT : tt-mapping)
   (RT = (/, KT, PT, MT)
∧ (∃ π : transcription-policy) (∃ SE : unary-predicate) (∃ ST : unary-predicate) (∃ Q : type-type-function)
   (π = (SE, ST, Q)
∧ (MT(.) = MD(b) ∨ MT(.) = Q(MD(b))))))))

Page maintained by MLCD Project