Lessons from the real world on community engagement

An interesting insight into last week’s tuition fees protests in London can be gleaned from the protesters’ use of a ‘Live protest map’ on Google Maps. The minute-by-minute updates, provided in real time by the demonstrators, give an extremely compelling account of the day, made all the more so by the brevity and to-the-point nature of the individual observations (e.g. ‘Escaped Police Horse Victoria street’). While I could write at length on the merits of the protests themselves and the impact of the vote in Parliament, this blog is about locative technologies… and this map shows, in microcosm, the great power of combining timelines with maps for illustrating complex events with multiple histories and, as in this case, for serving specific purposes – real-time information sharing among those involved. There are lessons to be learned, I think, for those responsible for the development of websites that seek to document and describe histories such as this with user-generated content, particularly of urban areas.

Discussions with CCED (or how I learned to stop worrying about vagueness and love point data)

I met recently with Prof. Stephen Taylor of the University of Reading. Prof. Taylor is one of the investigators of the Clergy of the Church of England Database (CCED) project, whose backend development is the responsibility of the Centre for Computing in the Humanities (CCH). Like so many other online historical resources, CCED’s main motivation is to bring things together, in this case information about the CofE clergy between 1540 and 1835, just after which predecessors to the Crockford directory began to appear. There is, however, a certain divergence between what CCED does and what Crockford (simply a list of names of all clergy) does.

CCED started as a list of names, with the relatively straightforward ambition of documenting the name of every ordained person between those dates, drawing on a wide variety of historical sources. Two things fairly swiftly became apparent: that a digital approach was needed to cope with the sheer amount of information involved (CD-ROMs were mooted at first), and that a facility to build queries around location would be critical to the use historians make of the resource. There is therefore clearly scope for considering how Chalice and CCED might complement one another.

Even more importantly, however, some of the issues which CCED has come up against in terms of structure have a direct bearing on Chalice’s ambitions. What was most interesting from Chalice’s point of view was the great complexity of the geographic component. It is important to note that there was no definitive list of English ecclesiastical parish names prior to the CCED (crucially, what was needed was a list which also followed through the history of parishes – e.g. dates of creation, dissolution, merging, etc.). This is a key thing that CCED provides, and it is in and of itself of great benefit to the wider community.

Location in CCED is dealt with in two ways: jurisdictional and geographical (see this article). Contrary to popular opinion, which tends to perceive a neat cursus honorum descending from bishop to archdeacon to deacon to incumbent to curate etc, ecclesiastical hierarchies can be very complex. For example, a vicar might be geographically located within a diocese, and yet not report to the bishop responsible for that diocese (‘peculiar’ jurisdictions).

In the geographic sense, location is dealt with in two distinct ways – according to civil geographical areas, such as counties, and according to what might be described as a ‘popular understanding’ of religious geography, treating a diocese as a single geographic unit. Where known, each parish name has a date associated with it, and for the most part this remains constant throughout the period, although where a name has changed there are multiple records (a similar principle to the attestation value of Chalice names, but a rather different approach in terms of structure).
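As a sketch of how such dated name records might be handled, consider a minimal model in which each parish entity carries one record per attested name, with optional validity dates, so that a name change simply produces a second record. The field names and data here are my own illustration, not CCED’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParishNameRecord:
    parish_id: str             # stable identifier for the parish entity
    name: str                  # the attested name form
    valid_from: Optional[int]  # year the name is first attested, if recorded
    valid_to: Optional[int]    # year the name lapsed; None if still in use

def name_at(records, parish_id, year):
    """Return the name a parish bore in a given year, where known."""
    for r in records:
        if r.parish_id != parish_id:
            continue
        start = r.valid_from if r.valid_from is not None else float("-inf")
        end = r.valid_to if r.valid_to is not None else float("inf")
        if start <= year <= end:
            return r.name
    return None  # no attestation covering that year
```

A renamed parish would then hold two records with adjoining date ranges, and a query for any year between 1540 and 1835 resolves to whichever form was current.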

Sub-parish units are a major issue for CCED, and there are interesting comparisons in the issues this throws up for EPNS. Chapelries are a key example: these certainly existed, and are contained within CCED, but it is not always possible to assign them to a geographical footprint (I left my meeting with Prof. Taylor considerably less secure in my convictions about spatial footprints), at least beyond the fact that, almost by definition, they will have been associated with a building. Even then there are problems, however. One example comes from East Greenwich, where there is a record of a curate being appointed, but there is no record of where the chapel is or was, and no visible trace of it today.

Boundaries are particularly problematic. The phenomenon of ‘beating the bounds’ around parishes only occurred where there was an economic or social interest in doing so, e.g. when there was an issue of which jurisdiction tithes should be paid to. Other factors in determining these boundaries were folk memory and the memories of the oldest people in the settlement. However, it is the case that, for a significant minority of parishes at least, pre-Ordnance Survey there was very little formal/mapped conception of parish boundaries.

For this reason, many researchers consider that mapping based on points is more useful than boundaries. An exception is where boundaries followed natural features such as rivers. This is an important issue for Chalice to consider in its discussion about capturing and marking up natural features: where and how have these featured in the assignation and georeferencing of placenames, and when?

A similar issue is the development of urban centres in the late 18th and 19th centuries: in most cases these underwent rapid changes, and a system of ‘implied boundaries’ reflects the situation then more accurately than hard-and-fast geolocations.

Despite this, CCED reflects the formal structured entities of the parish lists. Its search facilities are excellent if you wish to search for information about specific parishes whose name(s) you know, but, for example, it would be very difficult to search for ‘parishes in the Thames Valley’; or (another example given in the meeting), to define all parishes within one day’s horse riding distance of Jane Austen’s home, thus allowing the user to explore the clerical circles she would have come into contact with but without knowing the names of the parishes involved.
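The Jane Austen example is worth dwelling on: if each parish is reduced to a single representative point, that kind of query becomes a simple distance filter around a centre. A rough sketch follows, with invented parish coordinates; the centre is approximately Chawton, and a ‘day’s ride’ radius is taken arbitrarily as 30 km:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    R = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def parishes_within(parishes, lat, lon, max_km):
    """Filter a {name: (lat, lon)} gazetteer to points within max_km of a centre."""
    return sorted(
        name for name, (plat, plon) in parishes.items()
        if haversine_km(lat, lon, plat, plon) <= max_km
    )
```

Crucially, the user need not know any of the parish names in advance: the point layer itself answers the question.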

At sub-parish level, even the structured information is lacking. For example, there remains no definitive list of chapelries. CCED has ‘created’ chapelries where the records indicate that one existed (the East Greenwich example above is an instance of this). In such cases, a link with Chalice and/or the Victoria County History (VCH) could help establish/verify such conjectured associations (posts on Chalice’s discussions with VCH will follow at some point).

When one dips below even the imperfect georeferencing of parishes, there are non-geographic, or semi-geographic, exceptions which need to be dealt with: chaplains of naval vessels are one example; as are cathedrals, which sit outside the system, and indeed maintain their own systems and hierarchies. In such cases, it is better to pinpoint the things that can be pinpointed, and leave it to the researcher to build their own interpretations around the resulting layers of fuzziness. One simple point layer that could be added to Chalice, for example, is data from the Ordnance Survey describing the locations of churches: a set of simple points which would associate the name of a parish with a particular location, not worrying too much about the amorphous parish boundaries, and yet eminently connectible to the structure of a resource such as CCED.

In the main, the interests that CCED shares with Chalice are ones of structural association with geography. Currently, Chalice relies on point-based grid georeferencing, where that has been provided by county editors for the English Place-Name Survey. However, the story is clearly far more complex than this. If placename history is also landscape history, one must accept that it is also intimately linked to Church history, since the Church exerted so much influence over all areas of life for so much of the period of history in question.

Therefore Chalice should consider two things:

  1. what visual interface/structure would work best to display complex layers of information
  2. how can the existing (limited) georeferencing of EPNS be enhanced by linking it to resources such as CCED?

The association of (EPNS, placename, church, CCED, VCH) could allow historians to construct the kind of queries they have not been able to construct before.


AC Grayling, and publication being only part of the story

AC Grayling has an intriguing take on the impact of the cuts on Higher Education. In the current New Statesman magazine, he tackles the problem, as he sees it, of humanities disciplines attempting to model themselves too much on the sciences. He reserves particular criticism for the way in which humanists communicate their work via scholarly journals. Humanists, he says, have become ‘gatekeepers of magnificent estates, into which they should usher as many people as possible, adding as they do their own insights and reflections … but the tendency to lock the gates behind polysyllabic obscurities in imitation of scientific research is one reason why we have lost sight of the importance to society of a higher education in the humanities’. One certainly cannot fail to agree with the basic premise that the humanities should be there to enrich the individuals who study them, and the society as a whole that they form, but I suspect that it is not true to describe the proliferation of the kind of formulaic humanities article that Grayling decries as an attempt by the humanities to ape the sciences. Rather, this is pressure from without: academics in the humanities (as elsewhere), especially young academics, are under immense pressure to publish, quite simply because publication to satisfy the REF is the single most important measurement by which they get promotion, job security (in strictly relevant terms of course), professional recognition and institutional credibility (factors not likely to trouble, say, highly bankable professors of philosophy at Birkbeck College). But Grayling’s point hints at a wider issue: couldn’t the sheer availability of material online – of citations, abstracts, full articles, JSTOR, institutional repositories – also be feeding a drive towards the (perhaps) more structured, article-based publication model of the sciences?
And is this phenomenon present only in publication of scholarship and research (which are not, as Grayling correctly implies, the same thing)?

Notwithstanding that Grayling does not consider research which crosses CP Snow’s famous divide, I feel that considering only the publication end of the research cycle is a somewhat reductionist approach. At the other end of that cycle – the grant-writing end – there is a similar disconnect (I do not wish to discuss here how often grant applicants actually do what they say they will when they write applications – the applications are a statement of intent, and of mind). I am occasionally asked by various funding bodies to review research grants in both the humanities and sciences, and I was struck recently by how abstract the descriptions of the content to be focused on and the methods to be used often are in the former – one recent example amounted to ‘I will go to archive X, read some books, and write about what I have read’. But that’s OK: the researcher knows what they want to look for, and when the research is exploratory, one cannot penalize a grant for not spelling it out. However, when you compare this with a grant written by people who, say, are more used to writing out methodologies for repeatable experiments, the story is rather different (and more structured).

So if there is indeed (as Grayling suggests) a fragmentation and trivialization of scholarly outputs in the humanities, I think we need look no further than the great democratization of knowledge brought about by the internet, coupled with a lack of epistemic structure, which is lacking because it has never been needed. Until now. Solving this problem is probably *the* grand challenge for the digital humanities.

Session at CAA2011, Beijing: Digging with words: e-text and e-archaeology

Submissions are invited for the session, Digging with words: e-text and e-archaeology, at Computer Applications and Archaeology 2011, Beijing, April 12th-16th 2011. For further details, including author guidelines and submission information, please go to http://www.caa2011.org/#home|default.

Deadline is 15th November 2010.

Digging with words: e-text and e-archaeology

Keywords:
text, digital libraries, text mining, grey literature

Abstract:
There are many complex ways in which archaeology is written about. Formal publications in journals, books, site reports, so-called ‘grey literature’, field notes, excavation daybooks, diaries and, latterly, websites and blogs, all contain a collective written discourse about the past, and how it is discovered. Added to this may be historical sources about sites and artefacts: if excavating a site of the Classical period in Greece for example, it is likely that the excavator will wish to consult Classical authors such as Strabo or Thucydides. Furthermore, evidence from text bearing objects such as inscriptions will heavily influence the interpretation of any site at which it is found. Hitherto, an excavator is likely to have accessed most secondary documentary evidence via institutional libraries and catalogues, or via booksellers or publishers. However, the relatively recent provision on a large scale of such documentary evidence digitally — the Perseus library at Tufts, and online inscriptions corpora such as the Inscriptions of Roman Cyrenaica and Inscriptions of Aphrodisias are good examples — combined with increasingly sophisticated techniques for interrogating that content, and extracting information automatically, prompts us to rethink the very nature of the evidence with which we can form interpretations about the past. Once distinctions between text and artefact, history (or philology) and archaeology were clear. Now however (for example) texts can be parsed for formal units of information and databases of entities built, which can then be used to underpin new knowledge or enhance resource discovery. On the other hand, the bases of comparanda for assessing archaeological data are becoming more widely available in digital form, along with digital representations of those artefacts, allowing deeper comparison and (textual) annotation. This prompts questions as to how the digital medium can be used in their interpretation. 
This session will seek to explore these distinctions by bringing together archaeologists with interests in textual evidence, textual scholars, historians and philologists. Themes will include, but are not limited to:

* Theoretical considerations of the nature of textual and archaeological evidence
* The use of standards and mark-up schemas in digitized archaeology texts
* Text mining and parsing (especially including geoparsing), and automatic entity extraction
* Linking textual evidence with archaeological evidence using linked data and semantic web technologies
* Provision for non-Latin texts in digital libraries for archaeology, with an emphasis on Chinese and other Asian scripts particularly encouraged

Types of evidence at SDH

This week at the SDH conference in Vienna. Some first-rate papers so far. Yesterday, presented in the panel on archaeology, organized by Helen Katsiadaki of the Research Centre for the Study of Modern Greek History of the Academy of Athens, along with Maria Ilvanidou and Vladimir Stissi. My own presentation, entitled Wiring it all together: Spatial data infrastructures for archaeology, was somewhat informed by the recent JISC Techwatch report by Mike Batty et al, Data mash-ups and the future of mapping. Well worth a read if you’re into that sort of thing. Anyway, my main point – informed by some background discussion of various aspects of the MiPP and CHALICE projects that are already in the public domain – was that we are awash with bits and pieces of spatial infrastructure, such as OpenLayers, OpenStreetMap and GeoNames, and ubiquitous KML-based platforms such as Google Earth and Google Maps, all of which are geared towards a range of nondistinct, methods-based tasks; but that, when applied to archaeological data, they support the visualization of outcomes rather than any kind of spatial analysis as understood by archaeologists, especially those with any kind of GIS background. It was therefore something of a wake-up call to hear Vladimir, in his paper, describe GIS-like elements of a large-scale finds database which had (in his view) the same type of limitation. So maybe we should question whether the assumption that GIS = analysis and KML = visualization is by itself useful. Both involve highly reductionist, quantitative boilings-down of the world into nice neat vector parcels of points, lines and polygons; we should think instead about how the data is entered into the system in the first place. What fields are certain and tied to controlled vocabularies; what are uncertain, and can that uncertainty be dealt with algorithmically?
This would allow us to consider how different types of evidence are dealt with – possibly one of the grand challenges for digital humanities overall. This came over strongly in Maria’s paper, which described interesting and creative approaches to network analysis in Roman Crete — incorporating geodata, manually extracted placename data from travel writing, archaeological evidence and so forth.
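To make the uncertainty question above concrete: one minimal approach is to store each field value together with an explicit confidence, so that simple algorithms can operate over those confidences rather than discarding doubtful readings at data entry. This is an illustrative sketch only, not any existing system’s schema:

```python
from dataclasses import dataclass

@dataclass
class UncertainValue:
    value: str         # e.g. a term from a controlled vocabulary
    confidence: float  # 0.0 (pure guess) .. 1.0 (certain)

def consensus(readings, threshold=0.5):
    """Combine several uncertain readings of the same field:
    keep the highest-confidence value if it clears the threshold,
    otherwise report that no value can be asserted."""
    best = max(readings, key=lambda r: r.confidence)
    return best.value if best.confidence >= threshold else None
```

Even something this crude changes what the system can do: a finds record entered as ‘ditch (0.9) or gully (0.4)’ carries both interpretations forward, instead of forcing a premature, falsely certain choice of vector parcel.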

Hadrian’s Wall

This is a long-preview flag that next April I will be walking the length of Hadrian’s Wall. This exercise will have two aims: firstly, (hopefully) to work with the Pleiades project to see how direct GPS observation can support/contribute to the connected datasets they hold; and secondly I’m hoping to raise some cash for Cancer Research UK. Any ideas for the former, and/or any sponsorship for the latter, would be very welcome (options of how to give will be posted here shortly).

Decoding Digital Humanities

I went to an interesting meeting yesterday of the London chapter of Decoding Digital Humanities, in the pleasant surroundings of the Jeremy Bentham pub near UCL. It called to mind Melissa Terras’s keynote at DH2010, in which she highlighted the story of how the great man’s body is wheeled into UCL Senate meetings and recorded as ‘present not voting’.

We had an interesting discussion based around Alan Liu’s 2003 paper to the MLA, ‘The Humanities: A Technical Profession’, which focused on the institutional nature of the humanities. For example, it was noted – as it often has been in the past – that formal value systems are often difficult to apply to humanities research (or perhaps humanities ‘scholarship’, the distinction being that scholarship is the process of aggregating knowledge over years, rather than the more task-oriented conception of ‘research’ to answer specific questions). It could be argued that the concept of ‘excellence’, for example, can be far more easily applied to research based on high-quality, verifiable data from experiments than to a monograph representing the outcome of months or years of interpretive research. This I think also raises the question of ‘repeatability’, which has come up in other contexts — it’s a generally accepted tenet of scientific practice that in order for any experiment to be valid, it must be repeatable by other scientists in other labs under comparable conditions.

One very interesting aspect of this discussion was the idea of language as a tool in itself. Richard Lewis pointed out that in this very discussion we critiqued the semantic meaning of the word ‘science’, so the kind of uncritical assumption that one might make about a tool’s functionality need not always apply. The present-day definition of the word science, of course, contains its meaning within some government-ordained STEM framework; but it’s worth noting, for example, the more comprehensive meaning of a word such as Wissenschaft, whose application in fields such as Classics and philology has been explored elsewhere by Greg Crane.

Arguably, one major factor that drove the sciences and humanities apart in the nineteenth century was the sheer complexity of the tools and infrastructure that the former adopted (more on this digression on Craig Bellamy’s 2Cultures blog). So the important questions – and, I think, one of the interesting issues to come out of Liu’s paper – are how we identify and categorize the components of e-infrastructure that a) we need for the humanities, b) we do *not* need for the humanities and c) we would like to adapt, applying some kind of meaningful cost/benefit test of the adaptation process to (humanities) research questions. All this will no doubt come up at the forthcoming Supporting the Digital Humanities conference in Vienna.

Questions about placenames

The CHALICE project is led by EDINA at the University of Edinburgh and, along with ourselves at CeRch, partnered by Edinburgh’s Language Technology Group and the Centre for Data Digitization and Analysis in Belfast. The aim is to compile an RDF’able Linked Data gazetteer of placenames, derived by automated extraction of geospatial entities from the publications of the English Place-Name Survey using the Edinburgh geoparser. The division of labour: EDINA leads and coordinates, LTG does the heavy lifting, CDDA does the digitization, and we at CeRch look at the medium- and long-term implementation of the gazetteer by developing use cases with real research projects.

So far, it seems that there might be an interesting link-up with CCH’s Prosopography of Anglo-Saxon England project. This contains no historical sources as such; rather, it is a collection of references and ‘signposts’, whose aim is to build up lists of individuals who appear throughout the Anglo-Saxon sources. This includes a list of *modern* placenames associated with different references throughout the sources; and one useful application might be to connect the modern toponyms in EPNS with this, allowing searching from a separate collection using historic (i.e. Anglo-Saxon) variants. It seems to me – and I will be happy to be corrected by anyone with more philological credentials than myself – that the Anglo-Saxon material is probably the richest and most interesting seam of material for the kind of coordination that a CHALICE-type gazetteer can bring. Or maybe I am being unduly influenced by recently reading Archaeology, Place-Names & History: An Essay on Problems of Co-ordination by F. T. Wainwright (1962), recommended (and indeed lent) to me by Jo Walsh, CHALICE’s project manager (and reviewed by the same here).
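A sketch of the kind of crosswalk this would require: a gazetteer keyed by modern toponym, with each entry carrying its attested historic variants, inverted so that a form found in a source resolves to its modern headword. The entries below are illustrative examples only, not actual EPNS or PASE data:

```python
# Map each modern placename to attested historic variant spellings.
# Entries here are illustrative, not drawn from EPNS or PASE.
gazetteer = {
    "Nottingham": ["Snotengaham", "Snotingeham"],
    "Derby": ["Deoraby", "Derbei"],
}

# Invert the mapping so a historic form found in a source
# can be resolved to its modern headword.
variant_index = {
    variant.lower(): modern
    for modern, variants in gazetteer.items()
    for variant in variants
}

def resolve(historic_form):
    """Resolve a historic variant to its modern toponym, if attested."""
    return variant_index.get(historic_form.lower())
```

With such an index, a search launched from a collection of Anglo-Saxon sources could land on the modern EPNS entry (and thence its grid reference) without the user knowing the modern name.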

So last Friday, we all met at the EPNS’s premises in Nottingham (or rather the premises of the University of Nottingham’s Institute for Name Studies, which hosts them) for a JISC-funded kickabout on the subject. My sketchy summary of the discussion follows.

  • How can we develop gazetteers suitable for wider use? Probably by using standards which others do not have to bust a gut to follow, and which provide stability.
  • The Getty Thesaurus is an example of a stable gazetteer. Geonames, by contrast, lacks stability in terms of content, *but* by publishing stable URIs it at least documents and exposes that instability. C.f. for example the sameas.org concept definition service, which was mentioned a few times: it seems that, as an abstract entity on the linked data web, the instance of ‘Stuart Dunn’ that I am pretty sure refers to me in fact belongs to several different higher-level entities. Whether that is due to the Web’s instability or my own, I’m not sure.
  • It was noted that while the concept is constant, URLs can become inappropriate – e.g. the Vision of Britain website has data from Estonia.
  • OS research has looked at issues such as namespace hosting  – this has important implications for going beyond geographical areas (such as England).
  • Different people produce different things: are these different resources, or can they be brought together as a single resource? Theoretically they can, but much of the EPNS’s material is on paper. There is nothing in the structure that would forbid it, but it is not digital.

A definitive report of the meeting will be produced in due course.

Grey Literature

There’s an interesting discussion going on on the Forum on Information Standards in Heritage (FISH) list, on which I have been lurking and keeping one eye, concerned with the theme of standardizing grey literature. For the non-cognoscenti, grey literature records are reports about heritage objects and activities (at least in this context, but the term is known in other areas too), especially archaeological excavations, which are not widely available and therefore not widely read. The thread has been started by the data standards section of English Heritage, with the aim of establishing how the heritage community might go about standardizing the reporting process, and thereby making the grey literature more accessible. Numerous approaches have been discussed – such as Catherine Hardman of the ADS mentioning the A&H e-Science programme-funded Archaeotools project, which uses natural language processing to index grey lit. on the basis of what, where and when entities after it has been deposited – although this depends on the resource being digital in the first place (or having been digitized), which of course it may not have been — the virtues of paper record keeping have been aired in the discussion, and clearly no one is suggesting doing away with it.

My own thoughts: the word ‘literature’ of course implies a fundamentally non-digital way of doing things. Leif Isaksen has raised, on the list, the importance of ‘grey data’, and I think this raises fundamental issues of *how the reports are compiled*, or rather how they could or should be. As I and others have discussed elsewhere, the process of gathering data in archaeology and heritage is faced with many new digital opportunities: the good old VERA project at Reading being a case in point. Perhaps some elements of grey literature – and only some, before anyone writes any angry comments: those reports which document projects that are already gathering significant amounts of data digitally – should be seen as documentation and interpretation of that data, drawing perhaps on some of the good-practice models of the old AHDS. This would enable the depositor to ‘tie’ the report to whatever format the data may be in – photos, GIS/GPS points, spreadsheets, etc. ‘Standardization’, such as it is, could then be drawn from schema types such as RDF. In such cases, why go through the fundamentally ‘literary’ process of compiling a grey literature report, when something much more lightweight could and should be possible in the digital age?
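To illustrate what ‘something more lightweight’ might look like: a report could shrink to a small structured deposit record in which the interpretive summary sits alongside pointers to the datasets it documents, indexed by the same what/where/when entities. The vocabulary and site details here are invented for the sketch, not an existing heritage schema:

```python
import json

# A minimal machine-readable deposit record: the interpretive text
# becomes one field among the datasets it documents. All values
# below are hypothetical examples.
report = {
    "title": "Watching brief at Example Lane",
    "what": ["pottery", "ditch"],
    "where": {"parish": "Example", "gridref": "SU123456"},
    "when": ["Romano-British"],
    "datasets": [
        {"path": "photos/", "format": "JPEG"},
        {"path": "survey.csv", "format": "CSV"},
    ],
    "summary": "Short interpretive statement tying the datasets together.",
}

# Serialize for deposit; an RDF serialization could follow the same shape.
record = json.dumps(report, indent=2)
```

The point is not the particular format, but that the what/where/when fields are machine-indexable at deposit time, rather than extracted by NLP after the fact.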

“Reading” maps

Went for a nice walk at the weekend from Reading along the Kennet and Avon canal to Aldermaston. Took the two relevant 1:50000 OS Landrangers, despite the fact that the whole route lies on the towpath, thus making getting lost difficult. A very pleasant stroll apart from an exchange of Anglo-Saxon pleasantries with a careering lycra lout on a tensile graphite framed, Boudica razor tyred, Effoff Mk II Superbike. Despite following such a topographically well-defined route, however, I still noticed a strong, and entirely irrational, impulse to follow the route on the sheet. I found myself obsessively checking the path at every bridge, lock and level crossing. This is an issue addressed by Mike Parker in his recent book, Map Addict (Harper Collins 2010). This wonderful tome wanders with glorious inconsistency around the obsessions and experiences of Parker, a self-confessed ‘consummate map junkie’; and one stop upon the journey is a discussion of gender-specific map reading, in the process debunking the myth that women cannot read maps. The distinction Parker draws is between a ‘male’ impulse to plan routes, measure distances and note waymarkers, versus a ‘female’ impulse to navigate by semantics and points of (personal) significance. This is just one issue in so-called ‘cognitive spatial literacy’ (see, e.g. this paper) which is likely to become more and more important not just in research as ‘virtual world’ tools become more prevalent, but also in how research is done. It’s critical to note that there are certain assumptions in such ‘datascapes’, and one important way of characterizing these is how we perceive the data we are observing. On the other hand, one can’t tar all spatial digital representations with this brush (a point made very eloquently by Parker in the chapter entitled ‘Pratnav’); such assumptions have been there all along, even in the venerable Ordnance Survey.
To give one example, reprising my post from February about battle sites, an article in the current issue of Sheetlines, the journal of the Charles Close Society, notes the locations on OS maps of several historic battles, but in doing so draws attention to the fact that these are represented as authoritative points, when actually they probably weren’t. In other words, it invites a kind of ‘spatial reading’ that the subject might not justify.