Types of evidence at SDH

This week I am at the SDH conference in Vienna. Some first-rate papers so far. Yesterday I presented in the panel on archaeology, organized by Helen Katsiadaki of the Research Centre for the Study of Modern Greek History of the Academy of Athens, along with Maria Ilvanidou and Vladimir Stissi. My own presentation, entitled Wiring it all together: Spatial data infrastructures for archaeology, was somewhat informed by the recent JISC Techwatch report by Mike Batty et al, Data mash-ups and the future of mapping. Well worth a read if you’re into that sort of thing. Anyway, my main point – informed by some background discussion of various aspects of the MiPP and CHALICE projects that are already in the public domain – was that we are awash with bits and pieces of spatial infrastructure, such as OpenLayers, OpenStreetMap and GeoNames, and ubiquitous KML-based platforms such as Google Earth and Google Maps, all of which are geared towards a range of generic, methods-based tasks; but that, when applied to archaeological data, they support the visualization of outcomes rather than any kind of spatial analysis as understood by archaeologists, especially those with any kind of GIS background. It was therefore something of a wake-up call to hear Vladimir, in his paper, describe GIS-like elements of a large-scale finds database which had (in his view) the same type of limitation. So maybe we should question whether the assumption that GIS = analysis and KML = visualization is by itself useful – both involve highly reductionist, quantitative boilings-down of the world into nice neat vector parcels of points, lines and polygons – and instead think about how the data is entered into the system in the first place. Which fields are certain and tied to controlled vocabularies; which are uncertain, and can that uncertainty be dealt with algorithmically? This would allow us to consider how different types of evidence are dealt with – possibly one of the grand challenges for digital humanities overall. This came over strongly in Maria’s paper, which described interesting and creative approaches to network analysis in Roman Crete, incorporating geodata, placename data manually extracted from travel writing, archaeological evidence and so forth.
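
As a very rough sketch of the distinction I have in mind – not any project’s actual schema, with the field names, vocabulary and confidence values all invented for the example – a finds record might separate fields tied to a controlled vocabulary from fields that carry explicit, machine-readable uncertainty:

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: values outside this set are rejected at entry.
PERIODS = {"Iron Age", "Roman", "Early Medieval"}

@dataclass
class Find:
    identifier: str
    period: str              # certain: constrained to the controlled vocabulary
    easting: float           # uncertain: recorded with an explicit error radius
    northing: float
    location_error_m: float  # positional uncertainty, in metres
    date_confidence: float   # 0..1, how sure the excavator is of the period

    def __post_init__(self):
        if self.period not in PERIODS:
            raise ValueError(f"'{self.period}' is not in the controlled vocabulary")

# A downstream analysis can then weight or filter on the uncertainty fields,
# rather than treating every point as equally authoritative.
find = Find("SF1042", "Roman", 463720.0, 162145.0,
            location_error_m=5.0, date_confidence=0.7)
usable = find.date_confidence >= 0.5 and find.location_error_m <= 10.0
print(usable)
```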

Hadrian’s Wall

This is an early flag that next April I will be walking the length of Hadrian’s Wall. The exercise will have two aims: firstly, (hopefully) to work with the Pleiades project to see how direct GPS observation can support and contribute to the connected datasets they hold; and secondly to raise some cash for Cancer Research UK. Any ideas for the former, and/or any sponsorship for the latter, would be very welcome (details of how to give will be posted here shortly).

Decoding Digital Humanities

I went to an interesting meeting yesterday of the London chapter of Decoding Digital Humanities, in the pleasant surroundings of the Jeremy Bentham pub near UCL. The venue called to mind Melissa Terras’s keynote at DH2010, in which she highlighted the story that the great man’s body is wheeled into UCL Senate meetings and recorded as ‘present not voting’.

We had an interesting discussion based around Alan Liu’s 2003 paper to the MLA, ‘The Humanities: A Technical Profession’, which focused on the institutional nature of the humanities. For example, it was noted – as has often been noted in the past – that formal value systems are often difficult to apply to humanities research (or perhaps humanities ‘scholarship’, the distinction being that scholarship is the process of aggregating knowledge over years, rather than the more task-oriented conception of ‘research’ to answer specific questions). It could be argued that the concept of ‘excellence’, for example, can be far more easily applied to research based on high-quality, verifiable data from experiments than to a monograph representing the outcome of months or years of interpretive research. This I think also raises the question of ‘repeatability’, which has come up in other contexts: it is a generally accepted tenet of scientific practice that for any experiment to be valid, it must be repeatable by other scientists in other labs under comparable conditions.

One very interesting aspect of this discussion was the idea of language as a tool in itself. Richard Lewis pointed out that in this very discussion we critiqued the semantic meaning of the word ‘science’, so the kind of uncritical assumption one might make about a tool’s functionality need not always apply. The present-day definition of the word science, of course, confines its meaning within some government-ordained STEM framework; but it is worth noting, for example, the more comprehensive meaning of a word such as Wissenschaft, whose application in fields such as Classics and philology has been explored elsewhere by Greg Crane.

Arguably, one major factor that drove the sciences and humanities apart in the nineteenth century was the sheer complexity of the tools and infrastructure that the former adopted (more on this digression on Craig Bellamy’s 2Cultures blog). So the important questions – and, I think, one of the interesting issues to come out of Liu’s paper – are how we identify and categorize the components of e-infrastructure that a) we need for the humanities, b) we do *not* need for the humanities, and c) we would like to adapt, applying some kind of meaningful cost/benefit test of the adaptation process to (humanities) research questions. All this will no doubt come up at the forthcoming Supporting the Digital Humanities conference in Vienna.

Questions about placenames

The CHALICE project is led by EDINA at the University of Edinburgh and, along with ourselves at CeRch, partnered by Edinburgh’s Language Technology Group and the Centre for Data Digitization and Analysis in Belfast. The aim is to compile an RDF’able Linked Data gazetteer of placenames, derived by automated extraction of geospatial entities, using the Edinburgh geoparser, from the publications of the English Place-Name Survey. The division of labour is that EDINA leads and coordinates, LTG does the heavy lifting, CDDA does the digitization, and we at CeRch look at the medium- and long-term implementation of the gazetteer by developing use cases with real research projects.
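
To make the ‘RDF’able gazetteer’ idea a little more concrete, here is a minimal sketch in Python using rdflib. The place, the name variants, the coordinates and the namespace URIs are all invented placeholders, not anything CHALICE has actually settled on; the point is simply that a stable URI per place, attested variants, and explicit sameAs links to other gazetteers are what make the thing Linked Data.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, XSD

# Hypothetical namespaces -- not the actual CHALICE vocabulary.
PLACE = Namespace("http://example.org/chalice/place/")
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

g = Graph()
g.bind("place", PLACE)
g.bind("geo", GEO)

place = PLACE["examplethorpe"]  # one stable URI for the place concept
g.add((place, RDF.type, GEO.SpatialThing))
g.add((place, RDFS.label, Literal("Examplethorpe", lang="en")))
# Historic variant forms, as they might be extracted from EPNS volumes (invented here).
g.add((place, RDFS.label, Literal("Exemplatorp", lang="ang")))
g.add((place, GEO.lat, Literal("52.000", datatype=XSD.decimal)))
g.add((place, GEO.long, Literal("-1.500", datatype=XSD.decimal)))
# Linking out to another gazetteer documents (and exposes) any disagreement.
g.add((place, OWL.sameAs, URIRef("http://sws.geonames.org/0000000/")))  # placeholder ID

print(g.serialize(format="turtle"))
```

Publishing that stable URI is what lets a later, better reading of the evidence correct the record without breaking everyone else’s links.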

So far, it seems that there might be an interesting link-up with CCH’s Prosopography of Anglo-Saxon England project. This contains no historical sources as such; rather, it is a collection of references and ‘signposts’, whose aim is to build up lists of individuals who appear throughout the Anglo-Saxon sources. It includes a list of *modern* placenames associated with different references throughout the sources; one useful application might be to connect the modern toponyms in EPNS with this, allowing searching of a separate collection using historic (i.e. Anglo-Saxon) variants. It seems to me – and I will be happy to be corrected by anyone with more philological credentials than myself – that the Anglo-Saxon material is probably the richest and most interesting seam of material for the kind of coordination that a CHALICE-type gazetteer can bring. Or maybe I am being unduly influenced by recently reading Archaeology, Place-Names & History: An Essay on Problems of Co-ordination by F. T. Wainwright (1962), recommended (and indeed lent) to me by Jo Walsh, CHALICE’s project manager (and reviewed by the same here).

So last Friday, we all met at the EPNS’s premises in Nottingham (or rather the premises of the University of Nottingham’s Institute for Name Studies, which hosts them) for a JISC-funded kickabout on the subject. My sketchy summary of the discussion follows.

  • How can we develop gazetteers suitable for wider use? Probably by using standards which others do not have to bust a gut to follow, and which provide stability.
  • The Getty Thesaurus is an example of a stable gazetteer. GeoNames, by contrast, lacks stability in terms of content, *but* by publishing stable URIs it at least documents and exposes that instability. Cf. for example the sameas.org concept definition service, which was mentioned a few times: it seems that, as an abstract entity on the Linked Data web, the instance of ‘Stuart Dunn’ that I am pretty sure refers to me in fact belongs to several different higher-level entities. Whether that is due to the Web’s instability or my own, I’m not sure.
  • It was noted that while the concept is constant, URLs can become inappropriate – e.g. the Vision of Britain website has data from Estonia.
  • OS research has looked at issues such as namespace hosting – this has important implications for going beyond geographical areas (such as England).
  • Different people produce different things: are these different resources, or can they be brought together as a single resource? Theoretically they can, but much of the EPNS’s material is on paper. There is nothing in the structure that would forbid it, but it is not digital.

A definitive report of the meeting will be produced in due course.

Grey Literature

There’s an interesting discussion going on on the Forum on Information Standards in Heritage list, on which I have been lurking and keeping one eye, concerned with the theme of standardizing grey literature. For the non-cognoscenti, grey literature records are reports about heritage objects and activities (at least in this context, though the term is known in other areas too), especially archaeological excavations, which are not widely available and therefore not widely read. The thread was started by the data standards section of English Heritage, with the aim of establishing how the heritage community might go about standardizing the reporting process, and thereby making the grey literature more accessible. Numerous approaches have been discussed – such as Catherine Hardman of the ADS mentioning the A&H e-Science programme-funded Archaeotools project, which uses natural language processing to index grey lit. on the basis of what, where and when entities after it has been deposited – although, of course, this depends on the resource being digital in the first place (or having been digitized), which it may not have been; the virtues of paper record keeping have been aired in the discussion, and clearly no one is suggesting doing away with it.
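
Just to make the ‘what, where and when’ idea concrete, here is a toy sketch in Python. It is emphatically not how Archaeotools works (which uses proper natural language processing); it is only a crude keyword match against invented term lists, to show what indexing a report by entity type might mean.

```python
import re

# Invented, tiny term lists standing in for real thesauri and gazetteers.
WHAT = {"ditch", "kiln", "coin", "posthole"}
WHERE = {"Silchester", "Reading", "Aldermaston"}
WHEN = {"Iron Age", "Roman", "medieval"}

def index_report(text: str) -> dict:
    """Return the what/where/when terms mentioned in a grey-literature report."""
    def found(terms):
        return sorted(t for t in terms
                      if re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE))
    return {"what": found(WHAT), "where": found(WHERE), "when": found(WHEN)}

report = ("Evaluation at Silchester revealed a Roman ditch and a single coin "
          "of late Iron Age date.")
print(index_report(report))
# {'what': ['coin', 'ditch'], 'where': ['Silchester'], 'when': ['Iron Age', 'Roman']}
```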

My own thoughts: the word ‘literature’ of course implies a fundamentally non-digital way of doing things. Leif Isaksen has raised, on the list, the importance of ‘grey data’, and I think this raises fundamental issues of *how the reports are compiled*, or rather how they could or should be. As I and others have discussed elsewhere, the process of gathering data in archaeology and heritage is faced with many new digital opportunities: the good old VERA project at Reading being a case in point. In many cases, perhaps some elements of grey literature – and only some, before anyone writes any angry comments: those reports which document projects that are already gathering significant amounts of data digitally – should be seen as documentation and interpretation of that data, drawing perhaps on some of the good practice models of the old AHDS. This would enable the depositor to ‘tie’ the report to whatever format the data may be in – photos, GIS/GPS points, spreadsheets, etc. ‘Standardization’, such as it is, could then be drawn from schema types such as RDF. In such cases, why go through the fundamentally ‘literary’ process of compiling a grey literature report, when something much more lightweight could and should be possible in the digital age?
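
As a sketch of what such a lightweight, data-first deposit might look like – using Dublin Core terms purely for illustration, with hypothetical file names and URIs rather than any agreed heritage schema – the ‘report’ becomes a short interpretive resource that explicitly references the datasets it documents:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

# Hypothetical deposit: a short interpretive report tied to the digital
# datasets it documents, rather than a standalone 'literary' document.
EX = Namespace("http://example.org/deposit/")

g = Graph()
g.bind("ex", EX)

report = EX["site42/report"]
g.add((report, RDF.type, DCTERMS.BibliographicResource))
g.add((report, DCTERMS.title, Literal("Interim report, Site 42 evaluation trench")))

# Each dataset the report interprets is linked explicitly, whatever its format.
for path, fmt in [("site42/photos.zip", "image/jpeg"),
                  ("site42/trench_points.gpx", "application/gpx+xml"),
                  ("site42/finds_register.csv", "text/csv")]:
    dataset = EX[path]
    g.add((dataset, DCTERMS["format"], Literal(fmt)))
    g.add((report, DCTERMS.references, dataset))

print(g.serialize(format="turtle"))
```

The depositor’s effort then goes into describing and linking the data, rather than writing it up into prose that only restates it.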

“Reading” maps

Went for a nice walk at the weekend from Reading along the Kennet and Avon canal to Aldermaston. I took the two relevant 1:50000 OS Landrangers, despite the fact that the whole route lies on the towpath, thus making getting lost difficult. A very pleasant stroll, apart from an exchange of Anglo-Saxon pleasantries with a careering lycra lout on a tensile-graphite-framed, Boudica-razor-tyred Effoff Mk II Superbike. Despite following such a topographically well-defined route, however, I still noticed a strong, and entirely irrational, impulse to follow the route on the OS sheet. I found myself obsessively checking the path at every bridge, lock and level crossing. This is an issue addressed by Mike Parker in his recent book, Map Addict (Harper Collins 2010). This wonderful tome wanders with glorious inconsistency around the obsessions and experiences of Parker, a self-confessed ‘consummate map junkie’; and one stop upon the journey is a discussion of gender-specific map reading, in the process debunking the myth that women cannot read maps. The distinction Parker draws is between a ‘male’ impulse to plan routes, measure distances and note waymarkers, versus a ‘female’ impulse to navigate by semantics and points of (personal) significance.

This is just one issue in so-called ‘cognitive spatial literacy’ (see, e.g. this paper), which is likely to become more and more important not just in research as ‘virtual world’ tools become more prevalent, but also in how research is done. It is critical to note that there are certain assumptions built into such ‘datascapes’, and one important way of characterizing them is how we perceive the data we are observing. On the other hand, one can’t tar all spatial digital representations with this brush (a point made very eloquently by Parker in the chapter entitled ‘Pratnav’); such assumptions have always been there, even in the venerable Ordnance Survey. To give one example, reprising my post from February about battle sites: an article in the current issue of Sheetlines, the journal of the Charles Close Society, notes the location on OS maps of several historic battles, but in doing so draws attention to the fact that these are represented as authoritative points, when actually they probably weren’t. In other words it invites a kind of ‘spatial reading’ that the subject might not justify.

AHeSSC atque Vale

A slight hiatus on the blog front. Not, for once, due to idleness or indolence (at least not entirely), but more due to a faulty laptop and extended absence from the office.

Last Friday saw the final projects meeting of the Arts and Humanities e-Science Initiative, held in the pleasant surroundings of the Anatomy Theatre and Museum at King’s College London. This was a great opportunity for a last reflection on what the seven projects have achieved, and where things might go beyond the end of the funding (AHeSSC itself finishes at the end of this month). I was somewhat taken with the term ‘digital craftsmanship’, which implies some concept of ‘making’. Certainly in the first four presentations – from Medieval Warfare on the Grid, eSAD, e-Curator and Archaeotools, all of which have an historical or archaeological interest of some kind – one can detect commonalities of the ‘making’ variety: making a hypothesis based on Agent-based Modelling; building an interpretation using an interpretation support ontology; forming questions of what, why or when around distributed datasets; and so on. And the three that followed – e-Dance, Purcell Plus and MusicSpace – are similarly concerned with digital creativity. It occurred to me that it is useful to think of these different kinds of making of ‘things’ on one side, and on the other the intellectual and/or interpretive things you can do with them: reception studies, digital repatriation of cultural artefacts – providing a digital replica of an artefact removed from another country to that country (no one is claiming that any of these modes of making are perfect, or would please everyone) – reading and understanding texts, visualising and de-constructing interpretive processes… And in the middle you have the difficult things that enable and hinder mapping from one side to the other: the absence of mass digitization programmes, which steers engagement with digital content in ways that are (or can be) totally at odds with what is interesting, or what it might be intellectually desirable to do; copyright (ugh, don’t go there, say especially those concerned with music); and the fact that most engagements with these technologies are driven by individual research questions and the success (or otherwise) of individual project grants, and not by overarching research paradigms. This, I think, has been both the upside and the downside of the Initiative: it has – wonderfully – fulfilled its aim, set out in 2005, to be driven by humanities and arts research questions. The problem now is that it is only driven by humanities and arts research questions. Which raises the question of how this work can be sustained when there is no Initiative to support it.
What will ride to the rescue? The Digital Economy? The problem with the digital economy is that it is going through an analogue recession: this means that when our paymasters say they want us to collaborate, it is not because they like collaboration; it is because they think it will bring in the folding stuff. Not a long-term model. Perhaps we should just accept that this will be a very, very long and slow process, and that – even though the realisation that e-science is NOT just Grid has come about in less than five years – sustaining and growing the kind of fantastic, ground-breaking research that the Initiative has been able to support in the seven projects, six workshops and three demonstrators will take a long time. As was said several times in the workshop, it will take engagement with the research councils, a recognition (not least by them) that the benefits will not all come in the short term, and a readiness to capitalize on highly relevant concepts such as Linked Data.

It’s by no means all doom and gloom. In a very upbeat summing up, Dave De Roure noted that the Digital Humanities have been around considerably longer than e-Science, and may yet outlast it (notwithstanding the recent trenchant analysis of Melissa Terras in her keynote at Digital Humanities 2010). The work of the projects has been, by any standard, world leading in the field, and the opportunities which have been created – and which have been exploited by our colleagues – are surely unquestionable. And as Dave pointed out, we have been able to look well beyond so-called ‘acceleration of research’ – doing things faster, cheaper and bigger – and instead done new things, and done them better. I think there is also a lesson, equally interesting, about what kind of support a programme of this type needs. In 2006 we, AHeSSC, were commissioned to provide helpdesk-type support, but it is probably fair to say that something a little more sophisticated was needed and – hopefully – provided.

MiPP (1)

And so begins our Motion in Place Platform project, an AHRC DEDEFI grant that CeRch holds with colleagues in Sussex and Bedford. The idea is to assess how performance documentation technologies can be used to capture and describe the archaeological research process. The aim is to reconsider and reconceptualize how archaeology is done, and to look at different approaches to the 3D reconstruction and understanding of heritage sites. Thanks to the kind permission of Professor Michael Fulford at the University of Reading, we are able to use the marvellous Silchester Roman Town excavation in Hampshire as a test bed. Silchester offers a wonderful panorama of Iron Age and Imperial Roman occupation followed by complete abandonment, and thus fantastic preservation of the stratigraphy – but it is a big and complicated dig, which poses some daunting challenges for our project.

Last week, Matt Earley and Alex Chasmar from Animazoo were on site testing the kit for complete unknowns, like: can ultrasonic motion trackers actually work outdoors, near a big and noisy generator?

[Image: MoCap tests]

The answer is yes, fortunately, they can (if they didn’t we would have had a problem). The tests went extremely well, the only possible variable being if we get a strong wind (likely, in such an exposed spot).

DARIAH workshop in Athens

Last week I was in Athens for a workshop organized by the DARIAH project, entitled ‘Scholarly activity and information process’.

This was principally about understanding the processes of research that an e-infrastructure – whatever that might be – underpins and supports. Numerous perspectives emerged on how such a process might be conceptualized; but I think what emerged as the common theme was definition, and how we define the things we are talking about. For example, a starting point for much of the workshop was John Unsworth’s conception of the ‘scholarly primitive’: building blocks of research which, in a 2000 paper, he defined as ‘Discovering, Annotating, Comparing, Referring, Sampling, Illustrating and Representing’. Seamus Ross’s critique, however, conceived of these more as processes, whereas a primitive should be seen as something which engages more fundamentally with generating knowledge from ‘primary data’ (itself a thing which used to have a widely accepted definition but, I would suggest, is much harder to pin down in the digital age). One example he gave was ‘question forming’ – which, of course, is a primitive aspect of research not confined to the digital milieu. Simon Mahony from UCL developed this idea with a perspective on the titles of the projects his students come up with, which rarely include an actual question defining how the work they will do will bring new perspectives.

For me, it was interesting that this question of ‘what is the building block of [digital] humanities research’ reflects so closely the discussion in the last year or so on e-science fundamentals. Both areas – digital humanities and e-science – share, I think, an implicit desire to show that they are fully professionalized academic disciplines, which I have no problem with (despite my own suspicion that academic disciplines are themselves basically nineteenth-century concoctions to make Oxbridge colleges look tidier). But the problem is always one of language and description. This also applies to research methods, as well as to objects of research. César González-Pérez’s very interesting presentation on methods, for example, introduced the idea of the ‘method fragment’: a particular way of approaching or manipulating information, which can be defined consistently, and linked to others in a non-linear way to describe an overarching workflow. (The non-linear bit is, I think, crucial.) This agrees well with the position set out in a forthcoming paper in the proceedings of AHM2009 by myself, Sheila Anderson and Tobias Blanke, which utilizes Short and McCarty’s famous ‘Methodological Commons’ for the digital humanities. However, again we come back to the problem of definition and language. It is convenient and logical for us, as developers and providers of a research e-infrastructure, to conceive of the research process in such a way, but we also have to remember that an historian about to embark on a research project does not go to their bookshelf, take down the Big Bumper Book of History Research Methods, select one, and stick to it for two years. Even if such a book existed, and if it were fully comprehensive, footnoted, and agreed by the history research community (the economic history community? Or political history? Or social history? Since when have academics ever agreed about such things anyway?), they would select, choose, modify, ignore, change, and make it up as they go along… and if an e-infrastructure gets in the way of that, it is doomed. I was also glad to have the opportunity to get off my chest a problem I have with the word ‘tool’ to describe a software application, interface, etc.: a hammer is not likely to give me ideas and thoughts on better and better ways to knock in nails. However, a piece of research software might – if it is any good – give me pause to think about how I approach data, and to think computationally about the knowledge I could generate by analyzing it. As Alexandra Bounia made brilliantly clear in her presentation – which invited us to think about what research we would do if we were putting together a museum exhibition, and how we would do it and why – we are talking here about a whole lot more than acquiring, storing and distributing data. An obvious point maybe, but one that is too important not to be made explicitly in such discussions.
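
A crude illustration of why the non-linear bit matters: if each method fragment is a node and each ‘draws on’ relation an edge, a project’s method is a graph rather than a pipeline, so a fragment can feed, and be fed by, several others. The fragment names below are invented for the example and are not taken from González-Pérez’s model.

```python
# Hypothetical method fragments linked as a directed graph (not a linear pipeline).
# Each key names a fragment; its value lists the fragments whose outputs it draws on.
workflow = {
    "transcribe sources": [],
    "geoparse placenames": ["transcribe sources"],
    "compile gazetteer": ["geoparse placenames"],
    "close reading": ["transcribe sources"],
    "map distributions": ["compile gazetteer", "close reading"],  # converges on two branches
}

def upstream(fragment, graph):
    """Return every fragment that (directly or indirectly) feeds into `fragment`."""
    seen = set()
    stack = list(graph.get(fragment, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

print(upstream("map distributions", workflow))
# {'compile gazetteer', 'geoparse placenames', 'close reading', 'transcribe sources'}
```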