Session at CAA2011, Beijing: Digging with words: e-text and e-archaeology

Submissions are invited for the session, Digging with words: e-text and e-archaeology, at Computer Applications and Archaeology 2011, Beijing, April 12th-16th 2011. For further details, including author guidelines and submission information, please see the conference website.

Deadline is 15th November 2010.

Digging with words: e-text and e-archaeology

text, digital libraries, text mining, grey literature

There are many complex ways in which archaeology is written about. Formal publications in journals, books, site reports, so-called ‘grey literature’, field notes, excavation daybooks, diaries and, latterly, websites and blogs, all contain a collective written discourse about the past, and how it is discovered. Added to this may be historical sources about sites and artefacts: if excavating a site of the Classical period in Greece, for example, it is likely that the excavator will wish to consult Classical authors such as Strabo or Thucydides. Furthermore, evidence from text-bearing objects such as inscriptions will heavily influence the interpretation of any site at which it is found. Hitherto, an excavator is likely to have accessed most secondary documentary evidence via institutional libraries and catalogues, or via booksellers or publishers. However, the relatively recent provision on a large scale of such documentary evidence digitally (the Perseus library at Tufts, and online inscriptions corpora such as the Inscriptions of Roman Cyrenaica and Inscriptions of Aphrodisias, are good examples), combined with increasingly sophisticated techniques for interrogating that content and extracting information automatically, prompts us to rethink the very nature of the evidence with which we can form interpretations about the past. Once, distinctions between text and artefact, history (or philology) and archaeology were clear. Now, however, texts can (for example) be parsed for formal units of information and databases of entities built, which can then be used to underpin new knowledge or enhance resource discovery. On the other hand, the bases of comparanda for assessing archaeological data are becoming more widely available in digital form, along with digital representations of those artefacts, allowing deeper comparison and (textual) annotation. This prompts questions as to how the digital medium can be used in their interpretation.
This session will seek to explore these distinctions by bringing together archaeologists with interests in textual evidence, textual scholars, historians and philologists. Themes will include, but are not limited to:

* Theoretical considerations of the nature of textual and archaeological evidence
* The use of standards and mark-up schemas in digitized archaeology texts
* Text mining and parsing (especially including geoparsing), and automatic entity extraction
* Linking textual evidence with archaeological evidence using linked data and semantic web technologies
* Provision for non-Latin texts in digital libraries for archaeology, with submissions on Chinese and other Asian scripts particularly encouraged

Types of evidence at SDH

This week I am at the SDH conference in Vienna. Some first-rate papers so far. Yesterday I presented in the panel on archaeology, organized by Helen Katsiadaki of the Research Centre for the Study of Modern Greek History of the Academy of Athens, along with Maria Ilvanidou and Vladimir Stissi. My own presentation, entitled Wiring it all together: Spatial data infrastructures for archaeology, was somewhat informed by the recent JISC Techwatch report by Mike Batty et al, Data mash-ups and the future of mapping. Well worth a read if you’re into that sort of thing. Anyway, my main point – informed by some background discussion of various aspects of the MiPP and CHALICE projects that are already in the public domain – was that we are awash with bits and pieces of spatial infrastructure, such as OpenLayers, OpenStreetMap and GeoNames, and ubiquitous KML-based platforms such as Google Earth and Google Maps, all of which are geared towards a range of nondistinct, methods-based tasks; but that, when applied to archaeological data, they support the visualization of outcomes rather than any kind of spatial analysis as understood by archaeologists, especially those with any kind of GIS background. It was therefore something of a wake-up call to hear Vladimir, in his paper, describe GIS-like elements of a large-scale finds database which had (in his view) the same type of limitation. So maybe we should question whether the assumption that GIS=analysis and KML=visualization is by itself useful. Both involve highly reductionist, quantitative boilings down of the world into nice neat vector parcels of points, lines and polygons. Perhaps we should instead think about how the data is entered into the system in the first place: which fields are certain and tied to controlled vocabularies, which are uncertain, and can that uncertainty be dealt with algorithmically?
This would allow us to consider how different types of evidence are dealt with – possibly one of the grand challenges for digital humanities overall. This came over strongly in Maria’s paper, which described interesting and creative approaches to network analysis in Roman Crete — incorporating geodata, manually extracted placename data from travel writing, archaeological evidence and so forth.
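
To make the question about certain and uncertain fields a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the tiny controlled vocabulary and the 0 to 1 confidence scale are hypothetical, and it does not describe any of the databases discussed in the panel.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for the "period" field.
PERIODS = {"Iron Age", "Roman", "Byzantine"}


@dataclass
class Field:
    value: str
    confidence: float  # assumed scale: 0.0 = pure guess, 1.0 = certain


def validate(record):
    """Return a list of human-readable problems found in a record."""
    problems = []
    period = record.get("period")
    if period is not None and period.value not in PERIODS:
        problems.append(f"period {period.value!r} not in controlled vocabulary")
    for name, field in record.items():
        if not 0.0 <= field.confidence <= 1.0:
            problems.append(f"{name}: confidence out of range")
    return problems


def uncertain_fields(record, threshold=0.5):
    """Names of fields whose confidence falls below the threshold."""
    return sorted(name for name, f in record.items() if f.confidence < threshold)


# A made-up finds record: the period is well attested, the findspot is not.
record = {
    "period": Field("Roman", 0.9),
    "findspot": Field("near the north gate", 0.3),
}
```

The point of the design is only that uncertainty is recorded at data entry, next to the value, so that later analysis can filter or weight on it rather than treating every point on the map as equally authoritative.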

Hadrian’s Wall

This is a long-preview flag that next April I will be walking the length of Hadrian’s Wall. This exercise will have two aims: firstly, (hopefully) to work with the Pleiades project to see how direct GPS observation can support/contribute to the connected datasets they hold; and secondly I’m hoping to raise some cash for Cancer Research UK. Any ideas for the former, and/or any sponsorship for the latter, would be very welcome (options of how to give will be posted here shortly).

Decoding Digital Humanities

I went to an interesting meeting yesterday of the London chapter of Decoding Digital Humanities, in the pleasant surroundings of the Jeremy Bentham pub near UCL. It called to mind Melissa Terras’s keynote at DH2010, in which she highlighted the story in which the great man’s body is wheeled into UCL Senate meetings and recorded as ‘present not voting’.

We had an interesting discussion based around Alan Liu’s 2003 paper to the MLA, ‘The Humanities: A Technical Profession’, which focused on the institutional nature of the humanities. For example, it was noted – as has often been in the past – that formal value systems are often difficult to apply to humanities research (or perhaps humanities ‘scholarship’, the distinction being that scholarship is the process of aggregating knowledge over years, rather than the more task-oriented conception of ‘research’ to answer specific questions). It could be argued that the concept of ‘excellence’, for example, can be far more easily applied to research based on high quality, verifiable data from experiments than to a monograph representing the outcome of months or years of interpretive research. This I think also raises the question of ‘repeatability’, which has come up in other contexts: it’s a generally accepted tenet of scientific practice that in order for any experiment to be valid, it must be repeatable by other scientists in other labs under comparable conditions.

One very interesting aspect of this discussion was the idea of language as a tool in itself. Richard Lewis pointed out that in this very discussion we critiqued the semantic meaning of the word ‘science’, so the kind of uncritical assumption that one might make about a tool’s functionality need not always apply. The present-day definition of the word science, of course, contains its meaning within some government-ordained STEM framework; but it’s worth noting, for example, the more comprehensive meaning of a word such as Wissenschaft, whose application in fields such as Classics and philology has been explored elsewhere by Greg Crane.

Arguably, one major factor that drove the sciences and humanities apart in the nineteenth century was the sheer complexity of the tools and infrastructure that the former adopted (more on this digression on Craig Bellamy’s 2Cultures blog). So the important questions – and, I think, one of the interesting issues to come out of Liu’s paper – are how we identify and categorize the components of e-infrastructure that a) we need for the humanities, b) we do *not* need for the humanities and c) we would like to adapt, applying some kind of meaningful cost/benefit test of the adaptation process to (humanities) research questions. All this will no doubt come up at the forthcoming Supporting the Digital Humanities conference in Vienna.

Questions about placenames

The CHALICE project is led by EDINA at the University of Edinburgh and, along with ourselves at CeRch, partnered by Edinburgh’s Language Technology Group and the Centre for Data Digitization and Analysis in Belfast. The aim is to compile an RDF’able Linked Data gazetteer of placenames, derived from automated extraction of geospatial entities, using the Edinburgh geoparser, from the publications of the English Place Names Survey. The division of labour is: EDINA leads and coordinates, LTG does the heavy lifting, CDDA does the digitization and we, CeRch, look at the medium and long term implementation of the gazetteer by developing use cases with real research projects.
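
For anyone wondering what an ‘RDF’able’ gazetteer entry might look like, here is a minimal sketch that writes one place as N-Triples using only the Python standard library. The base URI and the choice of rdfs:label and skos:altLabel are my own assumptions for illustration; they are not CHALICE’s actual data model. The example name forms (York, Eoforwic, Jorvik) are well-known ones and are not taken from EPNS material.

```python
# Hypothetical base URI; the project's real URI scheme is not described here.
BASE = "http://example.org/gazetteer/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def place_triples(place_id, modern_name, historic_forms):
    """Return N-Triples lines for one place and its historic name forms.

    Assumes the names contain no quotes or backslashes, so no escaping is done.
    """
    subject = f"<{BASE}{place_id}>"
    # The modern name is the preferred label, tagged as English.
    triples = [f'{subject} <{RDFS_LABEL}> "{modern_name}"@en .']
    # Each historic form becomes an alternative label on the same URI.
    for form in historic_forms:
        triples.append(f'{subject} <{SKOS_ALT_LABEL}> "{form}" .')
    return triples


for line in place_triples("york", "York", ["Eoforwic", "Jorvik"]):
    print(line)
```

The useful property is that the place has one stable URI and any number of name forms hang off it, so other datasets can link to the place rather than to a particular spelling.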

So far, it seems that there might be an interesting link-up with CCH’s Prosopography of Anglo-Saxon England project. This contains no historical sources as such; rather it is a collection of references and ‘sign posts’, whose aim is to build up lists of individuals who appear throughout the Anglo-Saxon sources. This includes a list of *modern* placenames associated with different references throughout the sources; and one useful application might be to connect the modern toponyms in EPNS with this, allowing searching from a separate collection using historic (i.e. Anglo-Saxon) variants. It seems to me – and I will be happy to be corrected by anyone with more philological credentials than myself – that the Anglo-Saxon material is probably the richest and most interesting seam of material for the kind of coordination that a CHALICE-type gazetteer can bring. Or maybe I am being unduly influenced by recently reading Archaeology, Place-Names & History: An Essay on Problems of Co-ordination by F. T. Wainwright (1962), recommended (and indeed lent) to me by Jo Walsh, CHALICE’s project manager (and reviewed by the same here).
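
The searching idea can be sketched very simply, assuming a gazetteer that maps each modern name to its historic forms and a separate collection whose references are keyed on modern names. The data structures, function names and sample references below are all hypothetical, and neither project’s data is actually organized this way as far as I know.

```python
def build_variant_index(gazetteer):
    """Map every lower-cased name form (modern or historic) to its modern names."""
    index = {}
    for modern, variants in gazetteer.items():
        for form in [modern, *variants]:
            index.setdefault(form.lower(), set()).add(modern)
    return index


def search(references, index, query):
    """Return the references whose modern placename matches the queried form."""
    modern_names = index.get(query.lower(), set())
    return [ref for ref in references if ref["place"] in modern_names]


# Illustrative gazetteer: modern name -> historic forms.
gazetteer = {"York": ["Eoforwic"], "Nottingham": ["Snotingaham"]}

# Made-up references, indexed (as described above) by modern placename only.
references = [
    {"source": "hypothetical reference A", "place": "York"},
    {"source": "hypothetical reference B", "place": "Nottingham"},
]

index = build_variant_index(gazetteer)
print(search(references, index, "Eoforwic"))
```

A set is used for the modern names because one historic form can plausibly point at more than one modern place, which is exactly the kind of ambiguity a gazetteer has to expose rather than hide.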

So last Friday, we all met at the EPNS’s premises in Nottingham (or rather the premises of the University of Nottingham’s Institute for Name Studies, which hosts them) for a JISC-funded kickabout on the subject. My sketchy summary of the discussion follows.

  • How can we develop gazetteers suitable for wider use? Probably by using standards which others do not have to bust a gut to follow, and which provide stability.
  • The Getty Thesaurus is an example of a stable gazetteer. There are problems with Geonames: it lacks stability in terms of content, *but* by publishing stable URIs it at least documents and exposes that instability. Cf. for example the concept definition service, which was mentioned a few times: it seems that, as an abstract entity on the linked data web, the instance of ‘Stuart Dunn’ that I am pretty sure refers to me in fact belongs to several different higher level entities. Whether that is due to the Web’s instability or my own, I’m not sure.
  • It was noted that while the concept is constant, URLs can become inappropriate – e.g. the Vision of Britain website has data from Estonia.
  • OS research has looked at issues such as namespace hosting – this has important implications for going beyond geographical areas (such as England).
  • Different people produce different things: are these different resources, or can they be brought together as a single resource? Theoretically they can, but much of the EPNS’s material is on paper. There is nothing in the structure that would forbid it, but it is not digital.

A definitive report of the meeting will be produced in due course.

Grey Literature

There’s an interesting discussion going on on the Forum on Information Standards in Heritage list, on which I have been lurking, concerned with the theme of standardizing grey literature. For the non-cognoscenti, grey literature records are reports about heritage objects and activities (at least in this context, but the term is known in other areas too), especially archaeological excavations, which are not widely available and therefore not widely read. The thread has been started by the data standards section of English Heritage, with the aim of establishing how the heritage community might go about standardizing the reporting process, and thereby making the grey literature more accessible. Numerous approaches have been discussed, such as Catherine Hardman of the ADS mentioning the A&H e-Science programme-funded Archaeotools project, which uses natural language processing to index grey lit. on the basis of what, where and when entities after it has been deposited. This, of course, depends on the resource being digital in the first place (or having been digitized), which it may not have been: the virtues of paper record keeping have been aired in the discussion, and clearly no one is suggesting doing away with it.
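
To give a flavour of what indexing by what, where and when means, here is a toy sketch. It is emphatically not how Archaeotools works: simple keyword matching stands in for natural language processing, and the three tiny vocabularies are hypothetical stand-ins for the full thesauri and gazetteers a real system would need.

```python
import re

# Tiny hypothetical vocabularies, for illustration only.
WHAT = {"ditch", "pottery", "kiln"}
WHERE = {"silchester", "reading"}
WHEN_PATTERN = re.compile(r"\b(iron age|roman|medieval)\b", re.IGNORECASE)


def index_report(text):
    """Return the what/where/when terms found in a report's text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "what": sorted(words & WHAT),
        "where": sorted(words & WHERE),
        # Periods may span two words, so they are matched on the raw text.
        "when": sorted({m.lower() for m in WHEN_PATTERN.findall(text)}),
    }


print(index_report("A Roman kiln and pottery were recorded near Silchester."))
```

Even this crude version shows why the approach only helps once the text is machine-readable: a paper report offers nothing to match against until it has been digitized.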

My own thoughts: the word ‘literature’ of course implies a fundamentally non-digital way of doing things. Leif Isaksen has raised, on the list, the importance of ‘grey data’, and I think this raises fundamental issues of *how the reports are compiled*, or rather how they could or should be. As I and others have discussed elsewhere, the process of gathering data in archaeology and heritage is faced with many new digital opportunities: the good old VERA project at Reading being a case in point. In many cases, perhaps some elements of grey literature (and only some, before anyone writes any angry comments), namely those reports which document projects that are already gathering significant amounts of data digitally, should be seen as documentation and interpretation of that data, drawing perhaps on some of the good practice models of the old AHDS. This would enable the depositor to ‘tie’ the report to whatever format the data may be in – photos, GIS/GPS points, spreadsheets, etc. ‘Standardization’, such as it is, could then be drawn from schema types such as RDF. In such cases, why go through the fundamentally ‘literary’ process of compiling a grey literature report, when something much more lightweight could and should be possible in the digital age?

“Reading” maps

Went for a nice walk at the weekend from Reading along the Kennet and Avon canal to Aldermaston. Took the two relevant 1:50000 OS Landrangers, despite the fact that the whole route lies on the towpath, thus making getting lost difficult. A very pleasant stroll apart from an exchange of Anglo-Saxon pleasantries with a careering lycra lout on a tensile graphite framed, Boudica razor tyred, Effoff Mk II Superbike. Despite following such a topographically well-defined route, however, I still noticed a strong, and entirely irrational, impulse to follow the route on the sheet. I found myself obsessively checking the path at every bridge, lock and level crossing. This is an issue addressed by Mike Parker in his recent book, Map Addict (Harper Collins 2010). This wonderful tome wanders with glorious inconsistency around the obsessions and experiences of Parker, a self-confessed ‘consummate map junkie’; and one stop upon the journey is a discussion of gender-specific mapreading, in the process debunking the myth that women cannot read maps. The distinction Parker draws is between a ‘male’ impulse to plan routes, measure distances and note waymarkers, versus a ‘female’ impulse to navigate by semantics and points of (personal) significance. This is just one issue in so-called ‘cognitive spatial literacy’ (see, e.g. this paper) which is likely to become more and more important not just in research as ‘virtual world’ tools become more prevalent, but also in how research is done. It’s critical to note that there are certain assumptions in such ‘datascapes’, and one important way of characterizing these is how we perceive the data we are observing. On the other hand, one can’t tar all spatial digital representations with this brush (a point made very eloquently by Parker in the chapter entitled ‘Pratnav’); such assumptions have long been there, even in the venerable Ordnance Survey.
To give one example, reprising my post from February about battle sites, an article in the current issue of Sheetlines, the journal of the Charles Close Society, notes the location in OS of several historic battles, but in doing so draws attention to the fact that these are represented as authoritative points, when actually they probably weren’t. In other words it invites a kind of ‘spatial reading’ that the subject might not justify.

AHeSSC atque Vale

A slight hiatus on the blog front. Not, for once, due to idleness or indolence (at least not entirely), but more due to a faulty laptop and extended absence from the office.

Last Friday saw the final projects meeting of the Arts and Humanities e-Science Initiative, held in the pleasant surroundings of the Anatomy Theatre and Museum at King’s College London. This was a great opportunity for a last reflection on what the seven projects have achieved, and where things might go beyond the end of the funding (AHeSSC itself finishes at the end of this month). I was somewhat taken with the term ‘digital craftsmanship’, which implies some concept of ‘making’. Certainly in the first four presentations – from Medieval Warfare on the Grid, eSAD, e-Curator and Archaeotools, all of which have an historical or archaeological interest of some kind – one can detect commonalities of the ‘making’ variety: making a hypothesis based on Agent-based Modelling; building an interpretation using an interpretation support ontology; forming questions of what, why or when around distributed datasets, and so on. And the three that followed – e-Dance, Purcell Plus and MusicSpace – are similarly concerned with digital creativity.
It occurred to me that it is useful to think of these different kinds of making ‘things’ on one side, and on the other the intellectual and/or interpretive things you can do with them: reception studies; digital repatriation of cultural artefacts, i.e. providing a digital replica of an artefact removed from another country to that country (no one is claiming that any of the modes of making are perfect, or would please everyone); reading and understanding texts; visualising and de-constructing interpretive processes… And in the middle you have the difficult things that enable and hinder mapping from one side to the other: the absence of mass digitization programmes that steer engagement with digital content in ways that are (or can be) totally at odds with what is interesting, or what it might be intellectually desirable to do; copyright (ugh, don’t go there, say especially those concerned with music); and the fact that most engagements with these technologies are driven by individual research questions and the success (or otherwise) of individual project grants, and not by overarching research paradigms. This, I think, has been both the upside and the downside of the Initiative: it has – wonderfully – fulfilled its aim, set out in 2005, to be driven by humanities and arts research questions. The problem now is that it is only driven by humanities and arts research questions. Which begs the question of how this work can be sustained when there is no Initiative to support it.
What will ride to the rescue? The Digital Economy? The problem with the digital economy is that it is going through an analogue recession: this means that when our paymasters say they want us to collaborate, it is not because they like collaboration; it is because they think it will bring in the folding stuff. Not a long term model. Perhaps we should just accept that this will be a very, very long and slow process, and that, even though the realisation that e-science is NOT just Grid has come about in less than five years, sustaining and growing the kind of fantastic, ground-breaking research that the Initiative has been able to support in the seven projects, six workshops and three demonstrators will take a long time. As was said several times in the workshop, it will take engagement with the research councils, a recognition (not least by them) that the benefits will not all come in the short term, and an awareness of the need to capitalize on highly relevant concepts such as Linked Data.

It’s by no means all doom and gloom. In a very upbeat summing up, Dave De Roure noted that the Digital Humanities have been around considerably longer than e-Science, and may yet outlast it (notwithstanding the recent trenchant analysis of Melissa Terras in her keynote at Digital Humanities 2010). The work of the projects has been, by any standard, world leading in the field, and the opportunities which have been created – and which have been exploited by our colleagues – are surely unquestionable. And as Dave pointed out, we have been able to look well beyond so-called ‘acceleration of research’ – doing things faster, cheaper and bigger – and instead done new things, and done them better. And I think there is a lesson about what kind of support a programme of this type needs, which is equally interesting. In 2006 we, AHeSSC, were commissioned to provide helpdesk-type support, but I think it is probably fair to say that something a little more sophisticated was needed and – hopefully – provided.

MiPP (1)

And so begins our Motion in Place Platform project, an AHRC DEDEFI grant that CeRch has with colleagues in Sussex and Bedford. The idea is to assess how performance documentation technologies can be used to capture and describe the archaeological research process. The aim is to reconsider and reconceptualize how archaeology is done, and to look at different approaches to the 3D reconstruction and understanding of heritage sites. Thanks to the kind permission of Professor Michael Fulford at the University of Reading, we are able to use the marvellous Silchester Roman Town excavation in Hampshire as a test bed. Silchester is a wonderful panorama of Iron Age and Imperial Roman occupation, leading to complete abandonment and thus fantastic preservation of the stratigraphies – but a big and complicated dig, which poses some daunting challenges for our project.

Last week, Matt Earley and Alex Chasmar from Animazoo were on site testing the kit for complete unknowns, like: can ultrasonic motion trackers actually work outdoors, near a big and noisy generator? The answer is yes, fortunately, they can (if they couldn’t, we would have had a problem). The tests went extremely well, the only possible variable being whether we get a strong wind (likely, in such an exposed spot).