(Not quite) moving mountains: recording volcanic landscapes in digital gazetteers

Digital gazetteers have been immensely successful as a means of linking and describing places online. GeoNames, for example, now contains 10,000,000 geographical names corresponding to over 7,500,000 unique features. However, as we will be outlining at the ICA Digital Technologies in Cartographic Heritage next month in relation to the Heritage Gazetteer of Cyprus project, one assumption which often underlies them is fixity: an assumption that a name and a place and, by extension, its location on the Earth’s surface are immutably linked. This allows gazetteers to be treated as authorities. For example, a gazetteer with descriptions fixed to locations can be used to associate postal codes with a layer of contemporary environmental data and describe relationships between them, or to create a look-up list for the provision of services. It can also be very valuable for research where a digital edition of a text mentions places. If those places are contained in a parallel gazetteer, the mentions can be used to provide citations and external authorities for them, and to link to other references in other texts.

However, physical geography changes. In the Aegean, where the African tectonic plate is subducting beneath the Eurasian plate to the north, the South Aegean Volcanic Arc has been created: a band of active and dormant volcanic islands including Aegina, Methana, Milos, Santorini, Kolumbo, Kos, Nisyros and Yali. Each of these locations has a fixed modern aspect, and can be related to a record in a digital gazetteer. However, these islands have changed over the years as a result of historical volcanism, and representing this history adequately requires the flexibility of a digital gazetteer.


The island of Thera. The volcanic dome of Mt. Profitis Elias is shown viewed from the north.

I recently helped refine the entry in the Pleiades gazetteer for the Santorini Archipelago. Pleiades assigns a URI to each location, and allows information to be associated with that location via the URIs. Santorini provides a case study of how multiple Pleiades URIs, associated with different time periods, can trace the history of the archipelago’s volcanism. The five present-day islands frame two ancient calderas, the larger formed more recently in the great Late Bronze Age eruption, and the other formed very much earlier in the region’s history. Originally, it is most likely that a single island was present, which fragmented over the millennia in response to the eruptions. Working backwards, therefore, we begin with a URI for the islands as a whole: http://pleiades.stoa.org/places/808255902. This covers the entire entity of the ‘Santorini Archipelago’. We associate with it all the names that have pertained to the island group through history – Καλλιστη (Calliste; 550 BC – 330 BC), Hiera (550 BC – 330 BC) and Στρογγύλη (Strongyle; AD 1918 – AD 2000) – as well as the modern designation ‘Santorini Archipelago’ itself. These four names have been used, at different times, either as a collective term for all the islands or, in the case of Strongyle, for the (geologically) original single island. This URI-labelled entity has lower-level connections with the islands that were formed during the periods of historic volcanism: Therasia, Thera, Nea Kameni, Mikro Kameni, Palea Kameni, Caimeni and Aspronisi. Each, in turn, has its own URI.


The Santorini Archipelago in Pleiades

Mikro Kameni and Caimeni are interesting cases, as they no longer exist on the modern map. Mikro Kameni is attested by the naval survey of Thomas Graves of HMS Volage (1851), and Caimeni by Thomaso Porcacchi in 1620. Both formed elements of what are now the Kameni islands, but because they have these distinct historical attestations, they are assigned URIs, together with the time periods when they were known to exist according to the sources, even though they do not exist today.
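The arrangement described above, one parent record with dated names and child records that may only exist for a bounded period, can be sketched in a few lines of Python. This is only an illustration and not Pleiades’ own data model: the class and field names are my own assumptions, the child islands are given no URIs because none are quoted here, and the single-year attestation ranges for Mikro Kameni and Caimeni are simply the years of the sources mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DatedName:
    name: str
    start: int  # year; negative values are BC
    end: int


@dataclass
class Place:
    label: str
    uri: Optional[str] = None  # None where no URI is quoted in the post
    names: List[DatedName] = field(default_factory=list)
    # Bounded period of attestation; None means no bound is recorded here.
    attested: Optional[Tuple[int, int]] = None
    parts: List["Place"] = field(default_factory=list)

    def names_in_use(self, year: int) -> List[str]:
        """Names whose date range covers the given year."""
        return [n.name for n in self.names if n.start <= year <= n.end]

    def vanished_parts(self, year: int) -> List[str]:
        """Child places with a bounded attestation that covers the year."""
        return [
            p.label
            for p in self.parts
            if p.attested and p.attested[0] <= year <= p.attested[1]
        ]


archipelago = Place(
    label="Santorini Archipelago",
    uri="http://pleiades.stoa.org/places/808255902",
    names=[
        DatedName("Καλλιστη", -550, -330),
        DatedName("Hiera", -550, -330),
        DatedName("Στρογγύλη", 1918, 2000),
    ],
    parts=[
        Place("Therasia"),
        Place("Thera"),
        Place("Nea Kameni"),
        Place("Palea Kameni"),
        Place("Aspronisi"),
        # Assumption: attestation modelled as the year of the source only.
        Place("Mikro Kameni", attested=(1851, 1851)),
        Place("Caimeni", attested=(1620, 1620)),
    ],
)
```

Asking this structure for `archipelago.names_in_use(-400)` returns the two ancient names, while `archipelago.vanished_parts(1851)` returns only Mikro Kameni: the name, the place and the period are held separately rather than fused into one fixed record.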

This speaks to a wider issue of digital gazetteers, and their role in the understanding of past landscapes. With the authority they lend to place-names, gazetteers might, if developed without reference to the nuances of landscape change over time, risk implicitly enforcing the view, no longer widely accepted, that places are immutable holders of history and historic events; where, in the words of Tilley in A Phenomenology of Landscape: Places, Paths and Monuments (1994), ‘space is directly equivalent to, and separate from time, the second primary abstracted scale according to which societal change could be documented and ‘measured’.’ (p. 9). The evolution of islands due to volcanism shows clearly the need for a critical framework that avoids such approaches to historical and archaeological spaces.

Image biographies?

Last week, thanks to my Fellowship of the Software Sustainability Institute, I attended Electronic Visualization and the Arts in Florence. This fascinating and wide-ranging conference brought together a tremendous range of people and ideas: strategy, application, theory. A fascinating take on crowd-sourcing appeared in the form of a project of ETH-Bibliotek in Zurich, cataloging a massive image archive of daily life working for Swissair using the knowledge of Swissair retirees.


Another issue that came up was the challenge associated with archiving a digital image for the very long term – 150 years or more. Today we still have images taken in the 1920s and 1930s; can digital imaging deliver similar longevity? The short answer is almost certainly not. One strategy discussed, by Graham Diprose and Mike Seabourne, is to archive digital artwork and photography by printing it on specially prepared paper; an approach they describe as a ‘technology proof form of insurance’. I think this raises important issues about how images, digital or otherwise, are dealt with as ‘objects’. This topic has a certain hinterland in the domain of cultural heritage, as the concept of the ‘object biography’ has been discussed since at least 1999, when Chris Gosden and Yvonne Marshall wrote that ‘as people and objects gather time, movement and change, they are constantly transformed, and these transformations of person and object are tied up with each other’ (‘The Cultural Biography of Objects’, World Archaeology, Volume 31 No. 2 [October 1999]: 169-178). The key difference with images – a subset class of objects that has only existed for a little over one hundred years – is that subject, significance and material can be separated. What an image depicts is separate from the materiality of the photograph itself, and from the cloud of numbers that make up a digital image. Such considerations encourage us to think about what the ‘biography’ of an image might look like. And this is important. The Gosden-Marshall model of the object biography has gained currency in a number of major museums, including the Pitt Rivers. An image biography, in which meaning and material are preserved side by side, can only be achieved if the relationships between the two are also preserved. This will include methods for preserving metadata (a point made in the session by Nick Lambert); but it also accords, I think, with the broader intellectual direction of where visualization is going.

Florence, Piazza degli Ottaviani. The EVA venue is on the left.

To explain this: this year, in London, EVA International will celebrate its 25th anniversary. In the last twenty-five years, much digital visualization has been contingent on the presentation of 3D material on the 2D screen. My sense from the last three or so EVA Londons, confirmed by EVA Florence, is that in the next twenty-five years, visualization will involve the 2D screen less and less. A greater proportion of demos at EVA London now involve objects, not screen-based presentations. Carol MacGillivray, Bruno Mathez and Frederic Fol Leymarie’s Diasynchronoscope project in 2013 and Gary Priestnall, Jeremy Gardiner, Jake Durrant & James Goulding’s Projection Augmented Relief Models in 2012 are particularly striking examples, but there are many more. Preserving these visualizations, including more conventional digital images, will require an integrated cloud of thinking on software sustainability, and its relationship to curation practice, digital augmentation and policy. I look forward to seeing more illustrations of this in London in July. I was also happy to participate in a network meeting of EVA International on the third day. The meeting ended with a small ceremony to inaugurate a plaque in the newly refurbished room to commemorate the event (see photo – this is a facsimile, pending the real thing being engraved).

Heritage Gazetteer of Cyprus: evolving thinking

Place names are a form of shared understanding, and of shared definition. Last week saw some discussions at the University of Cyprus’s Archaeological Research Unit on the technical and conceptual approaches that our Heritage Gazetteer of Cyprus (funded by the A. G. Leventis Foundation) will take, and how it will attempt to express such shared understandings of place on the island. In this process, we hope to shed new light on its history and archaeology. This post sets out the current state of our thinking, which will surely evolve further over the coming weeks and months, and will – we hope – form the synopsis of a longer, more formal and detailed article, which will be submitted for publication in due course.

There can be few places in the world where this matters more. The island of Cyprus has occupied a place of immense strategic importance from the time of its earliest occupation to the present day; there is, for example, direct evidence of trading links between Cyprus and New Kingdom Egypt (C16th BC). The island sits, as it always has, at (and indeed as) the crossroads between Europe, the Middle East and Africa. Our gazetteer therefore seeks to represent structure, without privileging one interpretation over another. In our world, the fact that an authority has attested the existence of a place name in a particular form is sufficient for it to be considered for inclusion. We do not make a value judgement on the extent of the authority or the provenance of the name. In the longer term an Editorial Board will be formed to consider such matters, but in the framework we are establishing, we are concentrating on expressing the relationship between forms, rather than peer-reviewing them. In this, like so many other projects in this field, we draw inspiration from the Pleiades gazetteer and the Pelagios project as means of defining and sharing ancient geography, although in this project we are not constrained to the Classical world to quite the extent that previous efforts have been. As will become clear, this makes matters considerably more complex.

We begin with a tripartite structure, starting with the modern place name and administrative unit. These will be derived from the Complete Gazetteer of Cyprus (CGC), an authoritative statement of the state of Cypriot geography as of 1983. These will be represented by a consistent URI string, www.cyprusgazetteer.org/[CGCid] (please note that none of these URIs is currently operational), where CGCid represents a unique value derived from the CGC itself. We are currently in discussions with the University of Cyprus’s Library on linking up with existing digitization work here, so exactly what form the gazetteer references to the CGC will take remains to be determined. However, we know that the administrative district (and its spatial footprint) is an attribute of such a location. Beneath this, we introduce a layer of Historic Units. These represent sites which are larger than individual monuments or findspots, which have some form of toponymic identity in their own right, and which can contain smaller units. The URI string for these will be www.cyprusgazetteer.org/[CGCid]/[HUid]. Examples might include Old Nicosia (which would be associated upwards with the modern name of Nicosia, Λευκωσία, and whatever CGC-derived URI we end up using). This feature term will also include settlements such as monasteries, and also cemeteries and archaeological excavations. Each Historic Unit will also have a georeference – likely to be a polygon – expressing its spatial footprint, much as the Pleiades project does for sites where this is an issue, such as Aphrodisias:


We will also include a reference here to the heritage taxonomies being developed by the Department of Antiquities’ Cyprus Archaeological Digitization programme – with whom we hope to collaborate closely in the coming months. This will not only aid the searchability of Historic Units, it will also ensure that we are coordinating with local expertise on the ground.

Prehistoric sites represent a further challenge. Frequently, but not always, the name form given to a prehistoric site in the literature (where it is attested) is that of the nearest modern settlement. Therefore, in such cases, we will populate the HU section of the URI with a null value, so that while the name will sit conceptually on the level below the highest CGC-derived modern form, and be distinct from it (including having its own separate georeference/spatial footprint), it will still share that form. This will also avoid providing a false positive by forcing a link with a Historic Unit where no such link exists. Each form will also have a date, or a date range, illustrating when it was in use, when it was attested as being in use, and the date of the document providing that attestation. In all cases we will seek to provide the earliest form of the date.

Our lowest level will be an Archaeological Entity record, with the URI string www.cyprusgazetteer.org/[CGCid]/[HUid, or null if not applicable]/[AEid]. This will also have a georeference, either a point or a footprint, and a CADiP reference. The naming attribution for the forms and attestations will work in much the same way as for the Historic Units, and they will be similarly dated, both by attestation and documentation.
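The three-level URI scheme set out above can be sketched as a small set of builder functions. This is an illustration only: the function names are invented, the identifiers are placeholders, and the literal token used for the null Historic Unit segment (`null`) is an assumption, since we have not yet settled what that value will look like. As noted above, none of these URIs is operational.

```python
from typing import Optional

BASE = "www.cyprusgazetteer.org"  # base string from the post; not operational
NULL_HU = "null"  # assumed token for "no Historic Unit"


def cgc_uri(cgc_id: str) -> str:
    """Top level: the modern place name derived from the CGC."""
    return f"{BASE}/{cgc_id}"


def hu_uri(cgc_id: str, hu_id: str) -> str:
    """Second level: a Historic Unit beneath a modern place name."""
    return f"{cgc_uri(cgc_id)}/{hu_id}"


def ae_uri(cgc_id: str, ae_id: str, hu_id: Optional[str] = None) -> str:
    """Lowest level: an Archaeological Entity.

    When the entity belongs to no Historic Unit, the middle segment is
    filled with the null token so the URI keeps its three-part shape.
    """
    hu_segment = hu_id if hu_id is not None else NULL_HU
    return f"{cgc_uri(cgc_id)}/{hu_segment}/{ae_id}"
```

With placeholder identifiers, `ae_uri("CGC1", "AE9", hu_id="HU3")` gives `www.cyprusgazetteer.org/CGC1/HU3/AE9`, and omitting `hu_id` gives `www.cyprusgazetteer.org/CGC1/null/AE9`. Keeping the null segment, rather than dropping it, means the position of each identifier in the string never changes.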

A new element in our project is that the community will be invited to add names to Units and Entities, as long as each name is accompanied by a verifiable reference, and a date or date range. Users will be able to add unlimited forms (the way a name is spelled). One example might be the Church of St Nicholas of the Roof, which might also be represented as ‘St. Nikolaus of the Roof’, or ‘Άγιος Νικόλαος της Στέγης’. These are differing forms, but they may or may not be attested in the same source.
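The rule for community contributions (a name form must come with a reference and a date or date range) might be checked along the following lines. This is a sketch under my own assumptions: the `NameForm` class and `validate` function are hypothetical, the reference string is a placeholder, and code can only check that a reference is present; whether it is verifiable remains an editorial judgement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameForm:
    form: str                       # the spelling being contributed
    reference: str                  # citation supplied by the contributor
    date_from: Optional[int] = None  # year; negative values are BC
    date_to: Optional[int] = None    # None means a single date, not a range


def validate(entry: NameForm) -> List[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    if not entry.form.strip():
        problems.append("empty name form")
    if not entry.reference.strip():
        problems.append("missing reference")
    if entry.date_from is None:
        problems.append("missing date or date range")
    elif entry.date_to is not None and entry.date_to < entry.date_from:
        problems.append("date range ends before it starts")
    return problems


# Placeholder reference and date, for illustration only.
ok = NameForm("St. Nikolaus of the Roof", "placeholder reference", 1900)
bad = NameForm("Άγιος Νικόλαος της Στέγης", "", None)
```

Here `validate(ok)` returns an empty list, while `validate(bad)` reports both the missing reference and the missing date. Two forms of the same name are simply two `NameForm` records, each carrying its own source.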

The complexities can be illustrated by the case of Hala Sultan Tekke:


In contemporary times this name refers to the small but extremely important mosque on Larnaca Salt Lake, according to Islamic tradition the final resting place of the Prophet’s foster mother and one of Islam’s holiest sites. However, Hala Sultan Tekke is also a major site of the Late Bronze Age, with an urban settlement a few hundred meters to the west of the mosque.


In our structure, the mosque would be represented as an Archaeological Entity, with the name forms Hala Sultan Tekke, Mosque of Umm Haram, etc. These will stem from the same URI, with extensions for the two names. The Bronze Age site will have a separate URI, with a null value inserted to show it does not belong to a named Historic Unit. Both URIs will contain the CGC-derived unique identifier for Dromolaxia, which is attributed to the Larnaca district.
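Because both URIs carry the same CGC-derived identifier, the two entities can be brought back together by reading the URI alone. The sketch below parses an Archaeological Entity URI into its three parts and groups entities by their CGC identifier. All identifiers are invented placeholders, `null` is again an assumed token, and I have used a null Historic Unit for the mosque purely for illustration, since the text above only specifies it for the Bronze Age site.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional

BASE = "www.cyprusgazetteer.org"  # base string from the post; not operational
NULL_HU = "null"  # assumed token for "no Historic Unit"


class EntityRef(NamedTuple):
    cgc_id: str
    hu_id: Optional[str]  # None when the URI carries the null token
    ae_id: str


def parse_ae_uri(uri: str) -> EntityRef:
    """Split an Archaeological Entity URI into its three identifiers."""
    prefix = BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gazetteer URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected CGCid/HUid/AEid: {uri}")
    cgc_id, hu_segment, ae_id = parts
    return EntityRef(cgc_id, None if hu_segment == NULL_HU else hu_segment, ae_id)


def group_by_cgc(uris: Iterable[str]) -> Dict[str, List[EntityRef]]:
    """Group entities under the modern place name they share."""
    groups: Dict[str, List[EntityRef]] = defaultdict(list)
    for uri in uris:
        ref = parse_ae_uri(uri)
        groups[ref.cgc_id].append(ref)
    return dict(groups)


# Placeholder identifiers; the real CGC-derived values are not yet fixed.
MOSQUE = f"{BASE}/dromolaxia-id/null/mosque-id"
BRONZE_AGE_SITE = f"{BASE}/dromolaxia-id/null/lba-site-id"
grouped = group_by_cgc([MOSQUE, BRONZE_AGE_SITE])
```

In this example `grouped` has a single key, the placeholder for Dromolaxia, with both entities listed beneath it, each with `hu_id` set to `None`.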

This project addresses a question of fundamental importance in the Digital Humanities, and indeed in digital history and archaeology: how can place and location be expressed as narratives in the contemporary world around us? How can digital mapping and geodata be deployed in such a way that deals sensitively with contemporary views of place and history, and how can the stories that these two things tell when they come together be told? Our Gazetteer is beginning to show us what the building blocks of those narratives might look like. We welcome discussion from any and all who are interested in this, in Cyprus or anywhere else.

The Good, the Bad and the Ugly

Blogging Archaeology


I am here sneaking in a late (and brief) entry to Doug’s archaeology blogging carnival. This month’s challenge is to set out the best, the worst and the ugly about archaeological blogging. So here goes…

The Good

My own experience is that academic (not necessarily just archaeological) blogging is at its best when it emerges from some real-world collaborative or communal activity. In 2012, I was fortunate enough to spend a couple of weeks participating in IUPUI’s Spatial Narratives Summer Institute, funded by the NEH. There was an official blog for this to which we all contributed, but many of us blogged independently about the experience, and the ideas that we were developing. It can be no coincidence that my own post composed in this period, ‘Deep Maps in Indy’, is my most viewed blog article ever. Apart from the fact that we generally referred to each other’s posts, thereby increasing our page views and likes (see below), this contributed to a sense of shared purpose and common cause – and this is especially so when one is in the company of great archaeology/cultural heritage bloggers such as Mia Ridge. The same was true of the CAA2012 session on the ‘Archaeology/Digital Humanities Venn Diagram’, which was subsequently Storified by Graeme Earl. Again, this provided a sense of coming together from the real world, and continuity through a variety of different perspectives.

The Bad 

I would go with what many others have said on this subject. One of the worst and most frustrating aspects of academic blogging is low hits/visitor numbers. As Doug says, one feels that one is talking to a brick wall. I have to say I get less hung up on getting low volumes of comments on my posts (although getting any comments, good or bad, is always very welcome). My suspicion is that this aspect of blogging is being eroded by the Twittersphere — if you have something to say about a post, chances are you’ll tweet your reaction rather than using the Comment button. Whether positive or negative, this can actually be a good thing, drawing new readers to your blog and increasing your profile. That is unless, of course, you have some meaty response to make to a posting that could not possibly fit into 140 characters — but is perhaps the increasingly Twitterfied internet drawing blog readers away from that kind of reaction?

The Ugly

Having read some of the hair-raising examples in the Blogging Carnival of the things that can go so badly wrong with blogging and tweeting – for example getting fired for saying the wrong thing about one’s employers – I would have to say that I have, as of today, been spared any such experiences (and hope very much that things continue so).

The most apt sense of ugliness in reference to my own blogging is the literal one. I am not at all enamoured with my rather prosaic WordPress layout, but alas all the time I have to give to blogging is spent writing content rather than on aesthetics. But I know that is not what is really meant by the question.

Why blog?

Blogging Archaeology


From @OpenAccessArch over at http://dougsarchaeology.wordpress.com comes a simple, yet important and engaging question:

Why blogging? – Why did you, or if it was a group- the group, start a blog?

This is part of a so-called Blogging Carnival in the run up to next year’s SAA. So here goes…

The immediate answer is straightforward enough. I started my – subsequently rather neglected – blog because I wanted a platform where I could post personal musings and viewpoints, of generally low scholarly import, which I would not expect any peer-reviewed environment to publish, and probably would not want them to. Also, as the years advance and the brain cells retreat, having an archive, however irregular and infrequently accessioned, of what I was thinking about a particular subject at a particular time, is inherently useful. For example, I was having a discussion on Friday last week about the A G Leventis Cyprus Gazetteer project and its software requirements: we touched on the difficulties of defining lines between historic territories for which there were and are no contemporary maps (for example the later prehistoric periods in Cyprus). I remembered that I had blogged on a similar question that arose in the CHALICE project three years ago or so. Simply having that post as an aide-mémoire of how we approached the problem back then helped a lot.

However, Doug’s question raises a deeper issue about ‘why blog?’, as opposed to ‘why do I blog?’ I was taken back to my first-year garret at Durham University, where – in 1995, before the ubiquity of email, before the e-publishing revolution (has there been one?) and, certainly, before Web 2.0 – I read Paul Bahn’s ‘Bluffer’s Guide to Archaeology’. One thing from this terrific little book that stuck with me was Bahn’s railing against those who dug but did not publish. Excavation was the experiment that destroyed its subject; not to publish your results promptly was (and is) an abdication of a sacred responsibility. Bahn spoke of an (unnamed) professor, a leader of his field, who had published nothing for decades, as being ‘the clot that blocks the system’. But since 1995, the whole concept of ‘publishing’, never mind ‘scholarly publishing’, has been transformed out of any possible recognition by the web. While I doubt that anyone attending SAA needs to be told this, the question of ‘why blog’ in archaeology makes me wonder if we have really thought through the issues that change has wrought for our subject in as much detail as they require. In some ways perhaps we have. The thoroughly excellent Journal of Open Archaeological Data has brought us the concept that (digital) data can be published alongside scholarly articles, provided they are Open Access, and in a trusted repository which has a long-term sustainability plan, and evidence of durability in the future.

But you only have to look at the mass of links that Doug alone has identified to see that the age of sourcing archaeological discourse only from ‘professional’ or ‘official’ channels is long past. Intelligent and informed comment and analysis sits alongside the great mass of everything out there. While blogging will never, and should never, try to replace scientific documentation of archaeological site data, its very informality can drive interest in the quirky, the unusual, in subjects neglected by the scholarly discourse. It moves us even beyond the Hodderian notion of ‘multivocality’, allowing a platform not merely for many voices, but for the many narratives that those voices tell. This has happened elsewhere in the Digital Humanities – witness Melissa Terras’s work on resource creation via amateur digitisation, which seeks out those topics neglected by official memory institutions. What is archaeological blogging but the ‘amateur’ – in the primary sense of the word, done for love – digitization of ideas, outside the formal framework of documentation and publication?

Most of us can probably agree that the writing is on the wall (it has been for decades) for the Gibbonesque views of the past, a past told through the elite and literate eyes. In their marvellously entertaining UnRoman Britain (2010, The History Press), Miles Russell and Stuart Laycock note that “[i]t is perhaps the high visibility and obvious distinctiveness of Rome’s archaeological footprint that has caused disproportionate focus upon things that are more ‘Roman’ than the more ‘normal’ aspects that are, to coin a phrase, ‘UnRoman’” (21). Blogging, and the integration of other kinds of social media with the archaeological discourse, provide space for the discussion of such perspectives (there are some fantastic examples of this, e.g. Rita Roberts’s blog and Bones Don’t Lie), and encourage the process of re-evaluation and scrutiny of established narratives.


So – blogging in archaeology means we do not have to extend the monolithic method-based systems which have brought us the established narratives by which we have come to know both the near and distant past; rather, it allows us to expand (on) them by allowing those who are curious enough to form alternative narratives. Ian Hodder’s dictum that ‘interpretation occurs at the trowel’s edge’ may still hold true, but the changing nature of archaeological data and publishing means it can occur on the keyboard too. And this is why the SAA blog carnival is such a great idea.

Gazetteer of Byzantine Cyprus

Work is getting underway on our Gazetteer of Medieval Cyprus project. By ‘our’ I mean myself and my KCL colleagues Tassos Papacostas and Charlotte Roueche, and various others in the Department of Digital Humanities whose contributions will be kicking in soon. The project as a whole is funded by the A. G. Leventis Foundation. This is a pilot gazetteer to develop a methodology, which will be tested in the online publication of a small body of Byzantine material, already assembled by Tassos. The long-term aim is to provide archaeologists and historians studying Cyprus at any period with a freely available set of tools and skills. The medium-term aim is to provide a well-structured framework for the digital analysis and publication of materials from Byzantine Cyprus, from the end of Late Antiquity to the period of the Crusades (c. AD 650-1200). This will subsequently serve as the basis for other independently funded but interoperable projects to display, interlink and contextualize their data.


The collaborative landscape is fast moving. We have been talking to the excellent Pelagios project, and hope that our GBC will contribute data to the Pelagios federation. The exemplum is based around a Cyprus Gazetteer of places. The aim here is to draw on existing resources in order to build a stable and usable resource; while our work will only focus on ‘Byzantine’ places, the resultant resource will be designed to be steadily expanded. We will begin by linking toponym entries in the authoritative Complete Gazetteer of Cyprus / CGC (Christodoulou and Konstantinidis 1987) with their equivalents in GeoNames. This will mean, in the future, that ancient sites will be searchable by the modern administrative areas in which they lie, as well as being linkable via archaeological attributes such as feature type, time period, etc. It will also allow us to build a data structure that will be linkable to other geospatial resources on the web. A basic URI structure has been developed, which will allow the geographical hierarchy in the database to be described, and to be infinitely extensible. Once placenames in the CGC have been aligned with those in GeoNames, we treat each archaeological entity identified in the exemplum Architectural Catalogue as a ‘child’ of that unique geographical entity, using the unique GeoNames reference number. The resource we are currently developing will have a user interface which will allow users to add archaeological entity points to this structure.
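A first pass at the alignment step could look something like the sketch below, which matches CGC toponyms to GeoNames entries by normalized name and sets aside anything missing or ambiguous for manual review. This is my own illustration and not the project’s actual method: the function names are hypothetical, the GeoNames identifiers are placeholders rather than real reference numbers, and exact matching on a normalized string is a deliberately naive stand-in for whatever matching we finally adopt.

```python
import unicodedata
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Fold case, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def align(
    cgc_names: List[str], geonames: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Match CGC toponyms to GeoNames ids by normalized name.

    `geonames` maps an id to a name. A toponym is matched only when exactly
    one GeoNames entry shares its normalized form; anything with no match,
    or more than one, is returned in the second list for manual review.
    """
    index: Dict[str, List[str]] = {}
    for gid, gname in geonames.items():
        index.setdefault(normalize(gname), []).append(gid)

    matched: Dict[str, str] = {}
    unresolved: List[str] = []
    for name in cgc_names:
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:
            matched[name] = candidates[0]
        else:
            unresolved.append(name)
    return matched, unresolved


# Placeholder ids, not real GeoNames reference numbers.
sample_geonames = {"gn-A": "NICOSIA", "gn-B": "Larnaca ", "gn-C": "Larnaca"}
matched, unresolved = align(["Nicosia", "Larnaca", "Dromolaxia"], sample_geonames)
```

In this toy data Nicosia is matched to a single entry, while Larnaca (two candidates) and Dromolaxia (none) are left unresolved. Each archaeological entity would then be attached as a ‘child’ of the matched GeoNames identifier.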

For me, the key interest of this project lies in the inversion of scale it brings to the digital gazetteer world. Many existing digital gazetteers, such as GeoNames, deal with very large geographic regions but the data associable with them is very ‘thin’ (for more discussion see Linda L. Hill’s Georeferencing: The Geographic Associations of Information, and my 2007 review of it). This, I think, is a natural imperative of the Geospatial Web to expand to wider and wider coverage. Cyprus, however, represents a relatively small geographic area, with a very thick and chronologically complex layer of data, with interconnections across the Aegean and Near East from the Bronze Age to the medieval period. This will present us with exciting opportunities to test how locations are ‘attested’ across many disparate sources, and how those attestations can be most usefully presented and documented.

Reconstruction, visualization and frontier archaeology

Recently on holiday in the North East, I took in two Roman forts of the frontier of Hadrian’s Wall, Segedunum and Arbeia. Both have stories to tell, narratives, about the Roman occupation of Britain, and in the current period both have been curated in various ways. At both, the curating authority is Tyne and Wear Museums, with ongoing archaeological research being undertaken by the fantastic WallQuest community archaeology project.

The public walkthrough reconstructions of what the buildings and the contents might have been like at both sites pose some interesting questions about the nature of historical/archaeological narratives, and how they can be elaborated. At Segedunum, there is a reconstruction of a bath house. Although the fort itself had such a structure, modern development means that it is not in the same place, nor do the foundations of the reconstruction relate directly to archaeological evidence. The features of the bath house are drawn from composite analysis of bath houses from throughout the Roman Empire. So what we have here is a narrative, but it is a generic narrative: it is stitched together, generalized, a mosaic of hundreds of disparate narratives, but it can only be very loosely constrained by time (a bath house such as that at Segedunum would have had a lifespan of 250-300 years), and not to any one individual. We cannot tell the story of any one Roman officer or auxiliary soldier who used it.

Reconstructed bath house at Segedunum

On the other hand, at Arbeia there are three sets of granaries, the visible foundations all nicely curated and accessible to the public. You can see the stone piers and columns that the granary floors were mounted on, to allow air movement to stop the grain rotting. Why three granaries for a fort of no more than 600 occupants? Because in the third century, the Emperor Severus wanted to conquer the nearby Caledonii; and for his push up into Scotland he needed a secure supply base with plenty of grain.

Granaries at Arbeia, reconstructed West Gatehouse in the background

This is an absolute narrative. It is constrained by actual events which are historical and documented. At the same fort is a reconstructed gateway, which is this time situated on actual foundations. This is an inferential narrative, with some of the gateway’s features being reconstructed again from composite evidence from elsewhere (did it have two or three stories? A shingled roof? We don’t know, but we infer). These narratives are supported by annotated scale models in the gateway structure which we, the paying public (actually Arbeia is free), can view and review at our leisure. This speaks to the nature of empirical, inferential and conjectural reconstruction detailed in a forthcoming book chapter by myself and Kirk Woolford (in a volume of contributions to the EVA conference, published by Springer).

Narratives are personal, but they can also be generic. In some ways this speaks back to the concept of the Deep Map (see older posts). The walkthrough reconstruction constitutes, I think, half a Deep Map. It provides a full sensory environment, but is not ‘scholarly’ in that it does not elucidate what it would have been like for a first or second century Roman, or auxiliary soldier, to experience the environment. Maybe the future of 3D visualization should be to integrate modelling, reconstruction, remediation, and interpretation to bring together available (and reputable) knowledge from whatever source about what that original sensory experience would have been – texts, inscriptions, writing tablets, environmental archaeology, experimental archaeology etc. In other words, visualization should no longer be seen as a means of making hypothetical visual representations of what the past might have been, but of integrating knowledge about the experience of the environment derived from all five senses, using vision as the medium. It can never be a total representation incorporating all possible experiences under all possible environmental conditions, but then a map can never be a total representation of geography (except, possibly, in the world of Borges’s On Exactitude in Science).

N=all: Big Data and curation practices

Data curation and digital preservation are often confused, but they are very different things. Terminology is a big problem in this area, especially where common terms from one domain – e.g. ‘curation’ in a museum or cultural heritage context – are used in another. So can the emerging debate on Big Data help us move forward on a definition of ‘digital curation’?

The current issue of ‘Foreign Affairs’ has a paper by Kenneth Cukier and Viktor Mayer-Schoenberger entitled ‘The Rise of Big Data: How It’s Changing the Way We Think About the World’. In it they argue that big data represents an epistemic change in how we do statistics, from the model of extrapolating general trends of patterns and populations from small representative random samples, to generalised overviews of entire datasets using data mining. In this world, ‘N=all’. The latter, they argue, are both imperfect and about correlation, rather than causation. For example, Google claims to be able to track flu outbreaks by correlating certain search terms; but it doesn’t claim to know the actual reason why people made those searches – which would be a ‘traditional’ statistical research question. Recently however, Google’s method dramatically overestimated peak flu levels; a cursory reminder that correlation and causation are very different things.

Cukier and Mayer-Schoenberger argue that big data research means ‘giving up on clean carefully curated data and tolerating some messiness’. They also argue that the process of ‘datafication’ – capturing more and more forms of intangible processes such as friendships (as in Facebook likes), thoughts (Twitter) and professional relationships (LinkedIn) – means that this body of data is growing less formal even as it grows exponentially in volume.

For me this raises two questions:

1. What does this mean for a museum-focused definition of ‘curation’? Can we give up on cleanly curated museum and cultural heritage collections and tolerate messiness? If so how, and where does that data come from?

2. By what processes can ‘the museum experience’ be ‘datafied’? I have an idea forming that this could be to do, at least partly, with removing some of the interaction between audience and collection from being time and space specific. E.g. I don’t have to actually go to the British Museum to encounter all aspects of the experience of the Pompeii exhibition, as some of those aspects have been datafied by others (both employees of the BM and other visitors), and I can review them wherever or whenever I like.

The main question is what does ‘big data curation’ actually mean? I am not sure I agree with the definition implied in the Cukier/Mayer-Schoenberger view, where it is precluded: that a curated dataset is necessarily one that is ‘small data’, shaped, presented and processed by a series of well-understood human interventions into a human readable narrative. However, they also make the very valid point that ‘in a world of big data, it is the most human traits that will need to be fostered – creativity, intuition, and intellectual ambition’. So whereas the present understanding in cultural heritage of what ‘curation’ means is the communication of a story or narrative of a collection of objects for an audience of specialists and/or non-specialists, where N can never = all, in big data terms it means taking the imperfections of correlation across patterns in big data, and refining these by bridging with communities of experts – experts with the uncomputable human traits to take the broad brushstrokes that software tools are pulling out of our datafied world, and make worldly sense of them.

To crowd-source or not to crowd-source

Shortly before Christmas, I was engaged in discussion with a Swedish-based colleague about crowd-sourcing and the humanities. My colleague – an environmental archaeologist – posited that it could be demonstrated that crowd-sourcing was not an effective methodology for his area. Ask randomly selected members of the public to draw a Viking helmet. You would get a series of not dissimilar depictions – a sort of pointed or semi-conical helmet, with horns on either side. But Viking helmets did not have horns.

Having recently published a report for the AHRC on humanities crowd-sourcing, a research review which looked at around 100 publications, and about the same number of projects, activities, blogs etc, I would say the answer to this apparent fault is: don’t identify Viking helmets by asking the public to draw them. Obvious as this may sound, it is in fact just one clear example of a complex calculation that needs to be carried out when assessing if crowd-sourcing is appropriate for any particular problem. Too often, we found in our review, crowd-sourcing was used simply because there was a data resource there, or some infrastructure which would enable it, and not because there was a really important or interesting question that could be posed by engaging the public – although we found honourable exceptions to this. Many such projects contributed to the workshop we held last May, which can be found here. To help identify which sorts of problems would be appropriate, we have developed – or rather, since this will undoubtedly evolve in the future, I should say we are developing – a four facet typology of humanities crowd-sourcing scenarios. These facets are asset type (the content or data forming the subject of the activity), process type (what is done with that content), task type (how it is done), and output type (the thing, resource or knowledge produced). What we are now working on is identifying – or trying to identify – examples of how these might fit together to form successful crowd-sourcing workflows.

To put it in the terms of my friend’s challenge: an accurate image of a Viking helmet is not an output which can be generated by setting creative tasks to underpin the process of recording and creating content, and the ephemeral and unanchored public conception of what a Viking helmet looks like is not an appropriate asset to draw from. Obvious as this may sound, it hints that a systematic framework for identifying where crowd-sourcing will, and won’t, work, is methodologically possible. And this could, potentially, be very valuable as the humanities faces increasing interest from well-organized and well-funded citizen science communities such as Zooniverse (which already supports and facilitates several of the early success stories in humanities crowd-sourcing such as Ancient Lives and OldWeather).
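For the more code-minded reader, the four facets can be pictured as a simple record. What follows is only an illustrative sketch in Python, not anything taken from the AHRC report: the class name `Scenario`, its field names and the free-text values are my own hypothetical labels, filled in from the Viking helmet challenge as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One crowd-sourcing scenario described by the four facets.

    The class and field names are hypothetical labels for this sketch.
    """
    asset_type: str    # the content or data forming the subject of the activity
    process_type: str  # what is done with that content
    task_type: str     # how it is done
    output_type: str   # the thing, resource or knowledge produced


# The Viking helmet challenge, expressed in the four facets.
# The values are free-text paraphrases, not terms from a controlled vocabulary.
viking_helmet = Scenario(
    asset_type="public conception of what a Viking helmet looks like",
    process_type="recording and creating content",
    task_type="creative (drawing)",
    output_type="an accurate image of a Viking helmet",
)


def summarise(s: Scenario) -> str:
    """Render a scenario as a single readable line."""
    return (f"{s.task_type} task: {s.process_type} from "
            f"'{s.asset_type}' to produce '{s.output_type}'")


print(summarise(viking_helmet))
```

Writing a scenario down this way does not tell you whether it will work, but it does make the mismatch between asset and output easier to see and to compare across projects.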

This of course raises a host of other issues. How on earth can peer-review structures cope with this, and should they try to? What motivates the public, and indeed academics, to engage with crowd-sourcing? We hint at some answers. Transparency and documentation are essential for the former area, and we found that in the latter, most projects swiftly develop a core community of very dedicated followers who undertake reams of work, but – possibly like many more conventional collaborations – finding those people, or letting them find you, is not always easy.

The final AHRC report is available: Crowdsourcing-connected-communities.

Visualising the visualisers

The main event to be at in London this month is, of course, EVA London 2012, from the 10th to the 12th. There is, apparently, some kind of sporty shindig going on out Stratford way, but I don’t really know very much about it. Maybe it would have helped if the media had covered it a bit more.

As outgoing editor of EVA’s Proceedings, I’ve prepared some rudimentary visualisations of the conference’s content, which seemed sort of appropriate. The number of papers involved is far too small to have any statistical significance, which means I can’t hope to do anything on the scale of David McCandless’ Information is Beautiful work, or the UCL Digital Humanities Infographic, plus life got in the way of my grand plans to do anything more sophisticated. I nod respectfully in the direction of both however, as the ultimate source of the thought that this might be an interesting exercise. More/better visualisations may well be added in the course of the conference next week, as the inspiration sinks in.

In each case, you can click on the image for a bigger/more legible version.

1. A Wordle of the full text of the EVA London 2012 Proceedings


Wordles can, of course, lead you up erroneous pathways, but herewith the full (unlemmatized) text of the Proceedings (courtesy of www.wordle.net). There is no significance in the layout, but the size of the font is proportional to the frequency of the word.
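For anyone curious about the principle, here is a minimal sketch in Python of frequency-proportional sizing. It is not how wordle.net itself is implemented (I have not seen its code), and the function name `word_sizes`, the crude tokenising pattern and the 72-point maximum are all assumptions of mine for illustration.

```python
import re
from collections import Counter


def word_sizes(text, max_size=72, top=5):
    """Map the `top` most frequent words to font sizes.

    The most frequent word gets `max_size`; every other word is scaled
    in direct proportion to its count. No lemmatization is attempted.
    """
    # Crude tokenizer: lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    peak = max(counts.values())
    return {word: max_size * n / peak for word, n in counts.most_common(top)}


sizes = word_sizes("data data data map map art")
print(sizes)
```

A real word cloud would also drop common stop words and then deal with layout, which, as noted, carries no significance here.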

2. A Wordle of the keywords supplied by accepted EVA authors when they submitted their abstracts


Lemmatized this time, and regularised for spelling variants (e.g. ‘Visualization’ changed to ‘Visualisation’).

3. Topic areas covered (as determined by the EVA London 2012 Programme Committee)


Categorisations made by the EVA Committee when developing the programme back in March. This, and the following two images, were developed using IBM’s Many Eyes software.

4. Institutional affiliations of EVA 2012 authors, by number of instances of authorship or co-authorship


5. Current national affiliations of EVA 2012 authors