Happy 2012

My blogging has been somewhat quiet for the last couple of months (well, non-existent really). Normal service has now been resumed. This is partly due to many of my waking hours being taken up with the Digital Exposure of English Place-names project, a JISC mass-digitisation content effort to digitise the entire corpus of the Survey of English Place-Names (earning an honourable mention in the Times Higher). This is a fantastic project with CDDA in Belfast, Nottingham and Edinburgh. It will make SEPN available as a linked data gazetteer via Unlock Text, and as downloadable XML for text mining and visualization.

Also, just before Christmas, I was in the fair city of Umeå, at a NEDIMAH workshop on information visualization. Our homework is to gather evidence about important topics in this area. Still digesting it really, but will try to amass such evidence here.

 

Semantic MediaWiki: a tool for collaborative databases

This promises to be a great event at KCL later this month:

Semantic MediaWiki: a tool for collaborative databases

Monday 26th September 2011

Anatomy Theatre and Museum, King’s College London
6th floor, King’s Building, Strand Campus, London WC2R 2LS

In association with Judaica Europeana, the British Library and the European Holocaust Research Infrastructure (EHRI) Project

On 26th September, the Centre for e-Research will host a day exploring the Semantic MediaWiki, led by New York City-based developer Yaron Koren.

Please register for the event(s) you wish to attend using the links below.

WORKSHOP: Semantic MediaWiki: a practical workshop

15:30 – 17:00, Anatomy Museum

The first part of the day will consist of an interactive seminar in the Anatomy Museum led by Koren, demonstrating the principles of Semantic MediaWiki. Participants will have an opportunity to create and use their own data structures on a public test wiki. The workshop will be of particular interest to people interested in the development and application of wiki technologies, and their place in digital research infrastructures.

Please register to attend at: http://www.eventbrite.com/event/1519008395

LECTURE: The Judaica Europeana Haskala (Jewish Enlightenment) database

18:00, Anatomy Lecture Theatre (TBC) followed by refreshments (All welcome)

With an introduction by Lena Stanley Clamp, Director, European Association for Jewish Culture

This seminar will give an overview of Semantic MediaWiki, with a special focus on the Judaica Europeana Haskala database of Jewish Enlightenment literature, which is currently being converted into an SMW system. While the focus of the lecture will be on Semantic MediaWiki, it will be relevant to broader aspects of e-Research. The British Library is a partner in the Judaica Europeana project, assisting with technical advice and dissemination.

There will also be a short introduction to the EU-funded EHRI project by Tobias Blanke.

Please register to attend at: http://www.eventbrite.com/event/1995834595

About Semantic MediaWiki

Semantic wikis are a technology that combines the massively collaborative abilities of a wiki with the well-defined structure and data-reusability of a database. Semantic MediaWiki is an extension, first developed in 2005, that adds this capability to MediaWiki, the popular open-source wiki application best known for powering Wikipedia. SMW is by far the most successful semantic wiki technology, currently in use on hundreds of wikis around the world, including internal use at major companies like Audi and Boeing.

There are a set of additional MediaWiki extensions that work alongside Semantic MediaWiki to extend its functionality; Semantic MediaWiki is almost always used together with one or more of these, and the term ‘Semantic MediaWiki’ is sometimes used to describe the entire set. Using Semantic MediaWiki and its related extensions, one can easily create custom data structures that provide forms for letting users add and edit data, and whose data can be queried and displayed in a variety of ways, including tables, charts, maps and calendars.
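By way of illustration, such structured data can also be queried programmatically: SMW exposes its query language through the MediaWiki API. The sketch below is a minimal, hypothetical example — the wiki URL, category and property names are all invented — showing how one might assemble an ‘ask’-style query URL:

```python
from urllib.parse import urlencode

def build_ask_url(api_base, conditions, printouts, fmt="json"):
    """Assemble a URL for an SMW 'ask'-style API query from its parts."""
    # Conditions become [[...]] pairs; printouts become |?Property clauses.
    query = "".join(f"[[{c}]]" for c in conditions)
    query += "".join(f"|?{p}" for p in printouts)
    return api_base + "?" + urlencode({"action": "ask",
                                       "query": query,
                                       "format": fmt})

url = build_ask_url(
    "https://example.org/w/api.php",          # hypothetical wiki
    ["Category:Events", "Has date::>2011"],   # invented conditions
    ["Has location", "Has date"],             # invented printout properties
)
print(url)
```

The resulting URL could then be fetched with any HTTP client, returning the matching pages and their property values as JSON ready for tables, charts or maps.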

About the speakers

Yaron Koren is one of the main developers of Semantic MediaWiki. He has been involved with the project since 2006, runs the MediaWiki consulting company WikiWorks, and helps to run the MediaWiki-based wiki farm Referata. Yaron grew up in Israel and the United States, and currently lives in New York City.

Lena Stanley Clamp is the director of the European Association for Jewish Culture and manager of the Judaica Europeana project, which will contribute vast quantities of digital content documenting Jewish life in Europe to Europeana – Europe’s libraries, archives and museums online.

Bobby on the beat

Finally, a footnote on Greyfriars Bobby. Bobby was, of course, the Skye terrier (my auto spell-check changed this to Skype terrier, thus surely coining a term for a person who IMs you incessantly) whose loyalty and devotion in refusing to abandon his departed master Auld Jock’s grave inspired generations. So what genius thought this would be a good sign to put at the Kirkyard entrance? Or is the cold-hearted Sexton alive and well, and working for Edinburgh council?

Bobbies keep out

Fringe Benefits

I saw two shows at the Fringe. Sammy McMillan, aka Sammy J, and ‘Randy’, his bug-eyed purple puppet, were highly amusing. The Carroll Myth by the Schmuck’s Theatre Company was rather meatier, featuring the dark shadows around Carroll’s ‘friendship’ with the 11-year-old Alice Pleasance Liddell, and his own descent into madness. A cast of characters including the Mad Hatter, Tweedle-Dum and Tweedle-Dee, the Walrus, a trio of Cheshire Cats etc. swirls around the hapless Carroll, with highly effective presence and undoubtedly accomplished menace. An interesting interpretation of what one might call an individual’s ‘personal myth’, and a nice reflection on the jumbled lines joining that myth to literature and to popular culture (and perception). It makes you wonder what other artefacts, literary or otherwise, one could trace a ‘myth’ through. All in all, it was decidedly odd but rather clever.

On Stallman and Surveillance

Key billing at the Turing Festival was, of course, Richard M. Stallman, the so-called ‘prophet of free software’. He delivered a ringing peroration, indicting proprietary software in all its forms and, occasionally throughout the meeting, engaging in spirited discourse with those who react less strongly to the ‘i-Bad’, and the ‘Amazon Swindle’.

Richard M. Stallman, GNU in hand, addressing the Turing Festival

Stallman identified a number of threats to our freedom online and, by extension, to our freedom overall in the information age. The surveillance carried out by governments via our own devices, and the reporting of our activities to others, is one such threat. Mobile phones which send your GPS location to third parties are another. Various online features of Windows, and ‘like’ buttons on Facebook, all allow data on us to be harvested. Remote surveillance is carried out via systems that are not ours, for example ISPs keeping records about users. This can be used to attack democratic activity. And data gathered for the most legitimate of reasons can still be abused by future regimes. ANY data retention, Stallman argued, is dangerous. In a free society you are not guaranteed anonymity – you can be recognized in the street. But that information is diffuse; it cannot be collated easily. With computerization and digitization, all of it can be indexed. Censorship is another threat, even in the supposedly democratic West.

Stallman also discussed ‘threats’ posed by proprietary standards, whose source elements are not viewable by their users. Of course, the recent experience of the Digital Humanities suggests that the matters influencing, or limiting, the application of free standards are not confined to the mechanics of what is open and what is not. Followers of the travails of the TEI on Twitter and elsewhere will know that openness in governance and administration is just as important as openness of schemata and documentation. One cannot detach the one from the other, as one risks doing if one simply demands that the source be open.

It is difficult not to admire the elegance of Stallman’s dictum that ‘either users control their programmes or the programmes control the users’; and few, outside the neoist of neo-cons, doubt the horrors inflicted on Americans and non-Americans alike by the reactionary and deeply unpatriotic PATRIOT Act. But one does perhaps have to wonder if, even in our ultra-technologized age, all this rests on the assumption that we *have* to give up our freedoms to technology in the first place. If I assume that anything I write in an email might potentially become public – just ask the climate scientists at UEA’s Climatic Research Unit – then what does it matter if Windows is tracking my emails through Outlook? Stallman also made the point that Open Source communities are typically more interested in improving their code bases than in empowering their users. Again, one needs to question why, exactly, there should be such a stark either/or approach. I suppose this might take on a very different perspective, or set of perspectives, if one is using open versus proprietary software in the development of products or commercial services, or dealing with particularly sensitive information. But in the academic humanities, one has to ask whether this is something that should really bug us. Is it really making ‘war on sharing’ to point out that there is a trade-off between (say) the ease of using ESRI’s Arc products and that of GRASS? Stallman surely has a point when he says that big corporations make universities dependent on their products by providing cheap site licences, but if that provides a level playing field across the ac.uk domain, doesn’t it allow us to make better use of our fEC-ravaged budgets? And if Autodesk wants to burrow into the code underneath MiPP’s reconstructions using some clever Trojans installed alongside our software without telling us, then good luck to them. They could save themselves the effort by simply downloading it from our website, where we make it available for free.

Other highlights of the Festival included a hugely entertaining talk by David McCandless on data visualization. Rather reminiscent of Steven Levitt and Stephen J. Dubner’s Freakonomics, McCandless’s thesis is that any data, anywhere, can be visualized in some way. Well worth checking out his website. Also excellent was the talk by Arjan Haring and Maurits Kaptein of Persuasion API on the science of persuasion. I guess I need to get some advice from them on writing grant applications.

Turing Festival

It’s been a busy weekend at the fantastic Turing Festival on Edinburgh’s Fringe. One can only hope that this kick-off to the Turing Centenary year leads to Alan Turing, one of the great geniuses of the twentieth century, gaining the historical recognition he deserves.

Dome of the Surgeons' Hall, Edinburgh, where the Turing Festival was held

It was very useful to be able to think through some of the issues that MiPP has raised. What we have found in this project is the potential – and potential only, really, since it was a capital grant rather than a research project – for embodiment based on actual people in heritage visualization, rather than simple representation. Even where the latter is based on motion capture (which, as far as I know, is rare), it is usually only employed to generate scenarios which are, effectively, digital surrogates of re-enactments. Despite stimulating conversations, and some differing views, within the MiPP team, I still do not believe that this is what the project is, or should be, doing. Rather, we are seeking to demonstrate that it is *OK* to use conjecture or interpretation, provided that the provenance of the reconstruction in question is crystal clear: a conjectured model of, say, an Iron Age round house dweller sweeping or querning is not based on direct empirical evidence, but is rather derived from it, albeit by circuitous interpretive routes. Surely this should be the principle behind all archaeological illustration anyway?

 

The Wall: day 1

Here I am in Heddon on the Wall, 15 miles’ walk west of Newcastle. The guidebooks say this is the most lacklustre stretch, taking one along miles of Tyne river bank, city Quayside, greenfield suburbia and, at one point, the Wylam Waggonway, a dismantled railway connecting the eastern fringes of Northumberland with the big city. I, however, found this insight into Newcastle’s industrial history fascinating: the ghosts of ships and coal are just as much part of this region’s past as Roman centurions and Celtic warriors. And by the way, the guidebooks also got it wrong when they warned me that I faced threats and abuse by ne’er-do-well Wallsend locals. I had several cheery Good Mornings, and even a couple of Good Luck, Mates. The way the planners have knotted together ‘Hadrian’s Way’, as the path is known as it winds south of the Wall’s actual course through Benwell and Denton in Newcastle itself, is very clever. All those different pathways, created at different times, for different reasons. When we plot pathways and networks on maps of the ancient world, what now-vanished social, political and economic complexities are we unwittingly overwriting? If, somehow, we forgot that Hadrian’s Wall began at Segedunum and continued to Heddon, how would we recall the composite significance of Hadrian’s Way?

More here tomorrow, and for the rest of this week.

The first bit of the Wall at Segedunum. The site’s viewing tower can be seen in the background.

MIPP: Forming questions (addendum)

By way of a little follow-up to yesterday’s post on MiPP, I am currently reading Hunter Davies’s A Walk Along the Wall, in preparation for my own walk along there next month (in aid of Cancer Research UK). He says ‘if they can reproduce a fort on a painting, why can’t it be done in real life? I wouldn’t have been put off the Romans for twenty years, not if I could actually have seen something [emphasis in original]’.

 

CHALICE use case

Jo and I recently met with Stuart Jeffrey and Michael Charno at the Archaeology Data Service in York, to discuss a putative third CHALICE use case. The ADS is the main repository for archaeological data in the UK, and thus has many potential crossovers with CHALICE, and faces many comparable issues in terms of delivering the kind of information services its users want.

Much of the ADS’s discovery metadata, as far as topography is concerned, is based on the National Monuments Record (NMR), and therefore on modern place-names. The ADS’s ArchSearch facility is based on a faceted classification principle: users come into the system from a national perspective, and use parameters of ‘what’, ‘when’ and ‘where’ to pare the data down until they have a result set that conforms to their interests, with the indexing and classification into facets undertaken by ADS staff during the accession process. In parallel with this, the ADS has experimented with NLP algorithms to extract place types – types of monument, types of site, types of feature and so on – from so-called ‘grey literature’, employing the MIDAS period terms. The principle of using NLP to build metadata is not in itself unproblematic: many depositors prefer to be certain that *they* are responsible for creating, and signing off, the descriptive metadata for their records.

As with other organizations we’ve spoken to, Stuart noted that georeferencing collections according to county > district > parish can create problems due to boundary changes; also, many users do not approach administrative units in a systematic way. For example, most people would not, in their searching behaviour, characterize ‘Blackpool’ as a subunit of ‘Lancashire’. This throws up interesting structural parallels with what we heard from the CCED project. Another good example the ADS recently encountered is North Lincolnshire, which is described by Wikipedia as “a unitary authority area in the region of Yorkshire and the Humber in England… [and] for ceremonial purposes it is part of Lincolnshire.” This came up while the ADS was creating a web service for the Heritage Gateway: it was assumed that users would naturally look for North Lincolnshire in Lincolnshire, but the Heritage Gateway used the official hierarchy, which put North Lincolnshire in Yorkshire and the Humber. They were working on addressing that in the next version of their interface.
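The ‘what/when/where’ faceting principle is simple to sketch. The toy example below is purely illustrative – the records, field names and values are invented, and this is in no way ADS code – but it shows how successive facets pare a record set down:

```python
# Toy record set; fields and values are invented for illustration only.
records = [
    {"what": "hillfort", "when": "Iron Age",   "where": "Wiltshire"},
    {"what": "villa",    "when": "Roman",      "where": "Kent"},
    {"what": "barrow",   "when": "Bronze Age", "where": "Wiltshire"},
]

def facet_filter(records, **facets):
    # Keep only the records that match every facet value supplied.
    return [r for r in records
            if all(r.get(k) == v for k, v in facets.items())]

print(facet_filter(records, where="Wiltshire"))                    # 2 hits
print(facet_filter(records, where="Wiltshire", when="Iron Age"))   # 1 hit
```

Each additional keyword argument narrows the result set, mirroring the way an ArchSearch user drills down from a national view to a handful of relevant records.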

It was strongly agreed that there is a very good case to be made for using CHALICE to enrich ADS metadata with historical variants, and that those wishing to search the collections via location would benefit from such enrichment. This view of things sits well alongside the CCED case (which focuses on connections of structure and georeferencing) and VCH (which focuses on connections between semantic entities). What is interesting is that all three cases have different implications for the technology, costs and research use: in the next three months or so the project will work on describing and addressing these implications.

Theoretical Archaeology Group Conference: Presentation

Reposted from the MiPP project blog.

I gave a presentation on MiPP at the TAG conference in Bristol before Christmas, in the session organized by CASPAR entitled ‘Audio-visual practice-as-research in archaeology’. The crux of the presentation was the present-day MoCap data that we gathered from Sue et al at the site this summer, what we are doing with it, and what we would like to do with it. Currently, in my mind at least, this centres on the typology of movement that we’re developing – reviewing the footage and identifying entities of posture, task, instrument and target, and building links between them. In that sense, it is more of a taxonomy (i.e. hierarchical) than an ontology (i.e. relationship-based rather than strictly hierarchical). This, I think, could be very illuminating in terms of understanding archaeological practice; but of course we have to avoid being overly reductionist: every archaeologist is unique, and we must be clear that the typology is a means of reflecting that practice and representing it in a systematic way, rather than pigeonholing what archaeologists actually do in the field. Also, while preparing the paper, it struck me that among the things we will have to address for DEDEFI purposes are practical questions such as cost (the suits are currently prohibitively expensive for any excavation project to purchase themselves), practicality in terms of staff and infrastructure needed on site (Animazoo had to have heavy direct involvement in our work at Silchester), ethics and privacy. And, to borrow from the presentation before mine, distinguishing the kind of archaeological practice we are interested in from ‘weird practices’, which may have nothing to do with the archaeological process.
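To make the taxonomy idea a little more concrete, here is a toy sketch of how posture > task chains, each with instrument and target leaves, might be represented and walked. Every entity name in it is an illustrative invention, not a term from the project’s actual scheme:

```python
# A toy fragment of a posture > task typology; all names are invented.
typology = {
    "kneeling": {
        "trowelling": {"instrument": "trowel", "target": "context surface"},
        "brushing":   {"instrument": "brush",  "target": "small find"},
    },
    "standing": {
        "sieving":    {"instrument": "sieve",  "target": "spoil"},
    },
}

def leaf_paths(tree, path=()):
    # Walk the hierarchy, yielding each posture > task chain with its leaves.
    for key, value in tree.items():
        if "instrument" in value:        # reached an instrument/target leaf
            yield path + (key,), value
        else:
            yield from leaf_paths(value, path + (key,))

for chain, leaf in leaf_paths(typology):
    print(" > ".join(chain), "->", leaf["instrument"], "/", leaf["target"])
```

The point of the hierarchy is precisely what the paragraph above argues: it systematizes observed practice without claiming to exhaust it, and new chains can be added as new behaviours are identified in the footage.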

As always with these presentations, it was the questions afterwards which were really interesting (although alas I had to leave before the general discussion at the end of the day, as legions of snow clouds closed in on southern England). It was clear, once again, that engagement with other archaeological practitioners is key if MiPP is to be a success; but a project which is about process rather than material needs to have its proper archaeological context spelled out if that engagement is to happen. I suspect, however, that once the second strand of the project – the dynamic reconstructions – is under way and demonstrable in a more final form, this will actually be very much easier. We must also link these processes to current discussions about agency and materiality, as discussed for example by Martin Wobst. Ruth Tringham of UC Berkeley indicated that similar issues had come up in her team’s thinking about process at Çatalhöyük.

I was asked what merits the various motion capture systems have over simply videoing the excavators at work in HD. This indicates to me that we need to investigate, document and demonstrate in a very robust way the functionality that the .bvh and .fbx viewers we are using can bring for panning, zooming, viewing the data from multiple angles in 3D and – critically – linking it with the archaeological data that is there. These advantages over standard video are central to the question of ‘why’, as opposed to ‘how’, we take MoCap out of the studio. A further functionality I think we need, which struck me when I was reviewing the data earlier this week, is for the subject’s line of sight to be projected onto the floor surface. This is not obvious in the current footage, and yet it is central to documenting the subject’s relationship with his or her material. Finally, I was asked about capturing the movements of larger numbers of people at the same time. This, of course, was originally envisaged as part of MiPP, but had to be abandoned due to technological constraints. It would open the process up to capturing the pathways of visitors through, and around, sites.
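Projecting a line of sight onto the floor is, geometrically at least, straightforward: intersect the gaze ray with the floor plane. A minimal sketch follows, under assumed conventions (z axis up, floor at z = 0, gaze given as a head position and direction vector; nothing here is tied to any particular MoCap viewer):

```python
def gaze_floor_point(head, direction, floor_z=0.0):
    """Intersect the gaze ray head + t * direction with the plane z = floor_z."""
    hx, hy, hz = head
    dx, dy, dz = direction
    if dz >= 0:           # looking level or upwards: ray never meets the floor
        return None
    t = (floor_z - hz) / dz
    return (hx + t * dx, hy + t * dy, floor_z)

# Head 1.6 m above the floor, looking forward and down at 45 degrees.
print(gaze_floor_point((0.0, 0.0, 1.6), (0.0, 1.0, -1.0)))
# → (0.0, 1.6, 0.0)
```

Overlaying the returned point (or a short trail of such points) on the floor surface of the reconstruction would make the subject’s attention visible in exactly the way the footage currently fails to show.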

Overall – still much to do, but I sense that some really interesting issues are beginning to emerge.