Digital Classicist: Aggregating Classical Datasets with Linked Data

Last week’s Digital Classicist seminar concerned the question of Linked Data, and its application to data about inscriptions. In his paper, Aggregating Classical Datasets with Linked Data, David Scott of the Edinburgh Parallel Computing Centre described the Supporting Productive Queries for Research (SPQR) project, a collaboration between EPCC and CeRch at KCL. The concept is that inscriptions contain many different kinds of information: information concerning personal names (gods, emperors, officials etc), places, concepts, and so on. When epigraphers and historians wish to use inscriptions for historical research, they undertake a reflexive and extremely unpredictable approach to building links, both implicit and explicit, between the different kinds of information. SPQR’s long term aim is to facilitate these searches across data, making it easier for classicists and epigraphers to establish links between inscriptions. SPQR is using as case studies the Heidelberger Gesamtverzeichnis, the Inscriptions of Aphrodisias, and Inscriptions of Roman Tripolitania (the latter being the subject of a use case I undertook for the TEXTvre project last year).

There have been a number of challenges in the preparation of the data. Epigraphers of course are not computer scientists, and they therefore do not prepare their data in such a way as to make it machine-readable. The data can therefore be fuzzy, incomplete, uncertain and implicit or open to interpretation. Nor are epigraphers going to sit down and write programs to do their analysis. Epigraphers have highly interactive workflows that are difficult to predict, both methodologically and in terms of research questions. When you answer one question about inscriptions, it can all too often lead you on to other questions of which the original workflow took no account. Epigraphic data is also distributed and has diverse representations. It can appear in Excel or Word, or in a relational database. It might be available via static or interactive webpages; or one might have to download a file. But there are overlaps in the content, in terms of e.g. places and persons which might be separate or contemporaneous.

The SPQR approach is based on URIs: each subject and each relationship is given a URI, and each object is either a URI or a literal. For example, the subject could be the URI of an inscription, the relationship a URI standing for ‘material is…’, and the object the literal ‘White marble’. This approach allows the user to build pathways of interpretation through sub-object units of the data.
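The subject, relationship and object structure described above can be sketched in a few lines of Python. This is only an illustration: the example.org URIs, the place name and the helper functions match and follow are all invented here and are not taken from SPQR, which would use a proper triple store rather than a list of tuples.

```python
# Hypothetical URIs for illustration only; SPQR's real identifiers are not given in the talk.
INSCRIPTION = "http://example.org/inscription/1"
PLACE = "http://example.org/place/42"
HAS_MATERIAL = "http://example.org/vocab/material"   # stands for 'material is...'
FOUND_AT = "http://example.org/vocab/findspot"
HAS_NAME = "http://example.org/vocab/name"

# Each triple is (subject URI, relationship URI, object URI or literal).
triples = [
    (INSCRIPTION, HAS_MATERIAL, "White marble"),  # object is a literal
    (INSCRIPTION, FOUND_AT, PLACE),               # object is another URI
    (PLACE, HAS_NAME, "Example Town"),            # invented place name
]


def match(triples, s=None, p=None, o=None):
    """Return every triple that agrees with the parts given (None matches anything)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def follow(triples, start, *relationships):
    """Walk a chain of relationships from a starting URI and return the end values."""
    current = {start}
    for rel in relationships:
        current = {o for (s, p, o) in triples if s in current and p == rel}
    return current


# Which things are made of anything at all, and of what?
print(match(triples, p=HAS_MATERIAL))
# A two-step pathway: inscription -> findspot -> name of that place.
print(follow(triples, INSCRIPTION, FOUND_AT, HAS_NAME))
```

Because objects can themselves be URIs, the second call hops from the inscription to its findspot and on to the place name, which is the kind of pathway through the data that the paragraph describes.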

SPQR is looking at inscriptions marked up in EpiDoc. In EpiDoc, one might find information on provenance; descriptions including date and language; edited texts; translations; findspots; and the material from which the inscriptions themselves were made. As my use case for IRT showed, while the flexibility afforded by EpiDoc is of great value to digital epigraphers, that flexibility can also count against consistent markup. E.g. an object’s material can be marked up in more than one way (<material> being one of them): these are different representations of the same thing. SPQR is therefore re-encoding the EpiDoc using uniform descriptions. The EpiDoc resources also contain references to findspots: names are given as ancientFindspot and modernFindspot (ancient findspots refer to the Barrington Atlas; modern names to GeoNames). This is an example of data being linked together: reference sets containing both ancient and modern places are queried simultaneously.

SPQR is based on the Linking and Querying Ancient Texts project, which used a relational database approach. The data (essentially the same three datasets being used by SPQR) is stored as tables. Each row describes a particular inscription, and the columns contain attribute information such as date, place etc. In order to search across these, the user has to have all the tables available, or write an SQL query. This is not straightforward, since it relies on the data being consistently encoded and, as noted above, epigraphers using EpiDoc do not always encode things consistently.
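To make the difficulty with cross-table SQL queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is assumed for illustration: the table names, column names, sample rows and the UNIFORM lookup are invented, and do not reflect the real schemas of the three datasets or SPQR's actual re-encoding.

```python
import sqlite3

# Two hypothetical corpora with different column names and different
# spellings for the same material (all values invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corpus_a (id TEXT, material TEXT, findspot TEXT);
    CREATE TABLE corpus_b (ref TEXT, stone TEXT, place TEXT);
    INSERT INTO corpus_a VALUES ('a1', 'White marble', 'Town X');
    INSERT INTO corpus_a VALUES ('a2', 'Limestone', 'Town Y');
    INSERT INTO corpus_b VALUES ('b1', 'marble, white', 'Town X');
""")

# A naive cross-table query: it must know both schemas, and it only
# finds rows whose material is spelled exactly as asked.
NAIVE_QUERY = """
    SELECT id, material FROM corpus_a WHERE material = ?
    UNION ALL
    SELECT ref, stone FROM corpus_b WHERE stone = ?
"""
naive = conn.execute(NAIVE_QUERY, ("White marble", "White marble")).fetchall()

# A hypothetical mapping from the variants found in the data to one uniform description.
UNIFORM = {
    "white marble": "White marble",
    "marble, white": "White marble",
    "limestone": "Limestone",
}


def uniform_material(raw):
    """Map a raw material string to its uniform form, or leave it unchanged if unknown."""
    return UNIFORM.get(raw.strip().lower(), raw)


rows = conn.execute(
    "SELECT id, material FROM corpus_a UNION ALL SELECT ref, stone FROM corpus_b"
).fetchall()
marble_ids = sorted(ident for ident, raw in rows if uniform_material(raw) == "White marble")

print(naive)       # the exact-match query misses the inscription in corpus_b
print(marble_ids)  # after normalisation both marble inscriptions are found
```

The exact-match query silently drops the record encoded as 'marble, white', which is why inconsistent encoding matters so much for this kind of search.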

The visual interface being used by SPQR is Gruff. This uses a straightforward colour coding approach, where the literals are yellow, the objects are grey, and the predicates are represented as arrows of different colours, depending on the type of predicate.

SPQR Gruff interface

The talk was followed by a wide-ranging discussion, which mostly centred on the nature of the things to be linked. There seemed to be a high-level consensus that more needed to be done on the terminology behind the objects we are linking. If we are not careful in this then there is a danger that we will end up trying to represent the whole world (which perhaps would echo the big visions of some early adopters of CRM models a few years ago). As will no doubt be picked up in Charlotte Roueche and Charlotte Tupman’s presentation next week (which alas I will not be able to attend), all this comes down to defining units of information. EpiDoc, as a disciplined and rigorous mark-up schema, gives us the basis for this, but there need to be very strict guidelines for its application in any given corpus.

Digital Classicist: Developing a RTI System for Inscription Documentation in Museum Collections and the Field

In the first of this summer’s Digital Classicist Seminar Series, Kathryn Piquette and Charles Crowther of Oxford discussed Developing a Reflectance Transformation Imaging (RTI) System for Inscription Documentation in Museum Collections and the Field: Case studies on ancient Egyptian and Classical material. In a well-focused discussion on the activities of their AHRC DEDEFI project of (pretty much) this name, they presented the theory behind RTI and several case studies.

Kathryn began by setting out the limitations of existing imaging approaches in documenting inscribed material. These include first hand observation, requiring visits to archives, sites, museums etc. Advantages are that the observer can also handle the object, experiencing texture, weight etc. Much information can be gathered from engaging first hand, but the costs are typically high and the logistics complex. Photography is relatively cheap and easy to disseminate as a surrogate, but the fixed light position one is stuck with often means important features are missed. Squeeze making overcomes this problem, but you lose any sense of the material, and do not get any context. Tracing has similar limitations, and there is the added risk of other information being filtered out. Likewise line drawings often miss erasures, tool marks etc; and are on many occasions not based on the original artefact anyway, which risks introducing errors. Digital photography has the advantage of being cheap and plentiful, and video can capture people engaging with objects. Laser scanning resolution is changeable, and some surfaces do not image well. 3D printing is currently in its infancy. The key point is that all such representations are partial, and all impose differing requirements when one comes to analyse and interpret inscribed surfaces. There is therefore a clear need for fuller documentation of such objects.

Shadow stereo has been used by this team in previous projects to analyse wooden Romano-British writing tablets. These tablets were written on wax, leaving tiny scratches in the underlying wood. Because the tablets were often reused, the scratches can be made to reveal multiple writings when photographed in light from many directions. It is then possible to build algorithmic models highlighting transitions from light to shadow, revealing letterforms not visible to the naked eye.

The RTI approach used in the current project was based on 76 lights on the inside of a dome placed over the object. This gives a very high definition rendering of the object’s surface in 3D, exposed consistently by light from every angle. This ‘raking light photography’ takes images from different locations with a 24.5 megapixel camera, and the multiple captures are combined. This gives a sense not only of the object’s surface, but of its materiality: by selecting different lighting angles, one can pick out tool marks, scrape marks, fingerprints and other tiny alterations to the surface. There are various ways of enhancing the images, each suited to identifying different kinds of feature. Importantly, as a whole, the process is learnable by people without detailed knowledge of the algorithms underlying the image processing. Indeed one advantage of this approach is that it is very quick and easy: 76 images can be taken in around five minutes. At present, the process cannot handle large inscriptions on stone, but highlight RTI allows more flexibility. In one case study, RTI was used in conjunction with a flatbed scanner, giving better imaging of flat text-bearing objects. The images produced by the team can be viewed using an open source RTI viewer, with an ingenious add-on developed by Leif Isaksen which allows the user to annotate and bookmark particular sections of images.
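Both shadow stereo and RTI rest on photographing the same surface under light from several directions. As a rough illustration of what "highlighting transitions from light to shadow" might involve, the toy sketch below compares a single row of pixel brightness values under two lights. It is not the team's algorithm: the function names, the threshold and the brightness values are all invented, and the working assumption (that a shadow cast by an incision shifts when the light moves, while a surface stain stays put) is a simplification made for the example.

```python
from collections import Counter


def transitions(scanline, threshold=50):
    """Return indices i where brightness changes sharply between pixel i and pixel i + 1."""
    return {
        i for i in range(len(scanline) - 1)
        if abs(scanline[i] - scanline[i + 1]) > threshold
    }


def classify_edges(captures, threshold=50):
    """Split edges into those seen under every light and those seen under only some.

    captures maps a light name to one row of brightness values (0-255).
    Assumption for this sketch: edges present under every light are surface
    markings, while edges that come and go with the light are cast shadows,
    i.e. candidates for relief such as scratches.
    """
    counts = Counter()
    for scanline in captures.values():
        counts.update(transitions(scanline, threshold))
    total = len(captures)
    fixed = sorted(i for i, c in counts.items() if c == total)
    moving = sorted(i for i, c in counts.items() if c < total)
    return fixed, moving


# Invented data: a dark stain at pixels 6-7 under both lights, and a shadow
# that falls on pixel 3 with light from the left but on pixel 1 from the right.
captures = {
    "light_from_left":  [200, 200, 200, 60, 200, 200, 90, 90, 200, 200],
    "light_from_right": [200, 60, 200, 200, 200, 200, 90, 90, 200, 200],
}

fixed, moving = classify_edges(captures)
print("edges under every light:", fixed)
print("edges that move with the light:", moving)
```

With 76 lights rather than two, the same comparison would be far more robust, which gives some sense of why a dome of lights is useful.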

The project has looked at several case studies. Oxford’s primary interest has been in inscribed, text-bearing artefacts, Southampton’s in archaeological objects. This raises interesting questions about the application of a common technique in different areas: indeed the good old methodological commons comes to mind. Kathryn and Charles discussed two Egyptian case studies. One was the Protodynastic Battlefield Palette. They showed how tool marks could be elicited from the object’s surface, and various making processes inferred. One extremely interesting future approach would be to combine RTI with experimental archaeology: if a skilled and trained person were to create a comparable artefact, one could use RTI to compare the two surfaces. This could give us a deeper understanding of the kind of experiences involved in making an object such as the Battlefield Palette, and base that understanding on a rigorous, quantitative methodology.

It was suggested in the discussion that a YouTube video of the team scanning an artefact with their RTI dome would be a great aid to understanding the process. It struck me, in the light of Kathryn’s opening critique of the limitations of existing documentation, that this implicitly validates the importance of capturing people’s interaction with objects: RTI is another kind of interaction, and needs to be understood accordingly.

Another important question raised was how one cites work such as RTI. Using a screen grab in a journal article surely undermines the whole point. The annotation/bookmark facility would help, especially in online publications, but more thought needs to be given to how one could integrate information on materiality into schema such as EpiDoc. Charlotte Roueche suggested that some tag indicating passages of text that had been read using this method would be valuable. The old question of rights also came up: one joy of a one-year exemplar project is that one does not have to tackle the administrative problems of publishing a whole collection digitally.