A History of Place – Stuart Dunn

A History of Place 3: Dead Trees and Digital Content

The stated aim of this series of posts is to reflect on what it means to write a book in the Digital Humanities. This is not a subject one can address without discussing how digital content and paper publication can work together. I need to say at the outset that A History of Place does not have any digital content per se. Therefore, what follows is a more general reflection on what seems to be going on at the moment, perhaps framing what I’d like to do for my next book.

It is hardly a secret that the world of academic publication is not particularly well set up for the publication of digital research data. Of course the “prevailing wind” in these waters is the need for high-quality publications to secure scholarly reputation, and with it the keys to the kingdom of job security, tenure and promotion. As long as DH happens in universities, the need to publish in order to be tenured and promoted is not going to go away. There is also the symbiotically related need to satisfy the metrics imposed by governments and funding agencies. In the UK for example, the upcoming Research Excellence Framework exercise explicitly sets out to encourage (ethically grounded) Open Access publication, but this does nothing to problematize the distinction, which is particularly acute in DH, between peer-reviewed research outputs (which can be digital or analogue) and research data, which is perforce digital only. Yet research data publication is a fundamental intellectual requirement for many DH projects and practitioners. There is therefore a paradox of sorts, a set of shifting and, at times, conflicting motivations and considerations, which those contemplating such publication must face.

It seems to me that journals and publishers are responding to this paradox in two ways. The first facilitates the publication of traditional articles online, albeit short ones, which draw on research datasets deposited elsewhere, and requires certain minimum standards of preservation, access and longevity. Ubiquity Press’s Journal of Open Archaeological Data, as the name suggests, follows this model. It describes its practice thus:

JOAD publishes data papers, which do not contain research results but rather a concise description of a dataset, and where to find it. Papers will only be accepted for datasets that authors agree to make freely available in a public repository. This means that they have been deposited in a data repository under an open licence (such as a Creative Commons Zero licence), and are therefore freely available to anyone with an internet connection, anywhere in the world.

In order to be accepted, the “data paper” must reference a dataset which has been accepted for accession in one of 11 “recommended repositories”, including, for example, the Archaeology Data Service and Open Context. It recommends that more conventional research papers then reference the data paper.

The second response is more monolithic: a publisher takes on the data produced by or for the publication as well as the publication itself, and hosts it online. One early adopter of this model is Stanford University Press’s digital scholarship project, which seeks to

[A]dvance a publishing process that helps authors develop their concept (in both content and form) and reach their market effectively to confer the same level of academic credibility on digital projects as print books receive.

In 2014, when I spent a period at Stanford’s Center for Electronic and Spatial Text Analysis, I was privileged to meet Nicolas Bauch, who was working on SUP’s first project of this type, Enchanting the Desert. This wonderful publication presents and discusses the photographic archive of Henry Peabody, who visited the Grand Canyon in 1879, and produced a series of landscape photographs. Bauch’s work enriches the presentation and context of these photographs by showing them alongside viewsheds of the Grand Canyon from the points where they were taken, thus providing a landscape-level picture of what Peabody himself would have perceived.

However, to meet the mission SUP sets out in the passage quoted above requires significant resources, effort and institutional commitment over the longer term. It also depends on the preservation not only of the data (which JOAD does by linking to trusted repositories), but also the software which keeps the data accessible and usable. This in turn presents the problem encapsulated rather nicely in the observation that data ages like a fine wine, whereas software applications age like fish (much as I wish I could claim to be the source of this comparison, I’m afraid I can’t). This is also the case where a book (or thesis) produces data which in turn depends on a specialized third-party application. A good example of this would be 3D visualization files that need Unity or Blender, or GIS shapefiles which need ESRI plugins. These data will only be useful as long as those applications are supported.

My advice therefore to anyone contemplating such a publication, which potentially includes advice to my future self, is to go for pragmatism. Bearing in mind the truism about wine and fish, and software dependency, it probably makes sense to pare down the functional aspect of any digital output, and focus on the representational, i.e. the data itself. Ideally, I think one would go down the JOAD route, and deposit one’s data in a trusted repository, which has the professional skills and resources to keep the data available. Or, if you are lucky enough to work for an enlightened and forward-thinking Higher Education Institution, a better option still would be to have its IT infrastructure services accession, publish and maintain your data, so that it can be cross-referred with your paper book which, in a wonderfully “circle of life” sort of way, will contribute to the HEI’s own academic standing and reputation.

One absolutely key piece of advice – probably one of the few aspects of this, in fact, that anyone involved in such a process would agree on – is that any Uniform Resource Identifiers (URIs) you use must be reliably persistent. This was the approach we adopted in the Heritage Gazetteer of Cyprus project, one of whose main aims was to provide a structure for URI references to toponyms that was both consistent and persistent, and thus citable – as my colleague Tassos Pappacostas demonstrated in his online Inventory of Byzantine Churches on Cyprus, published alongside the HGC precisely to demonstrate the utility of persistent URIs for referencing. As I argue in Chapter 7 of A History of Place, in fact, developing resources which promote the “citability” of place, and which link the flexibility of spatial web annotations with the academic authority of formal gazetteer and library structures, is one of the key challenges for the spatial humanities itself.

I do feel that one further piece of advice needs a mention, especially when citing web pages rather than data. Ensure the page is archived using the Internet Archive’s Wayback Machine, then cite the Wayback link, as advocated earlier this year here:

When I cite a website in my research, I've recently made a point of archiving it in the Wayback Machine whenever possible and using the resulting URL in my footnotes. I've long since grown tired of broken links even in publications only a few years old. https://t.co/fhbOj6Lvwz

— André Brett (@DrDreHistorian) March 4, 2019

This is very sound advice, as it will ensure persistence even if the website itself changes or disappears.
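For anyone who assembles footnotes programmatically, the practice can be sketched in a few lines of Python. This is only an illustration: the function name `wayback_citation_url` and the example page are hypothetical, and the sketch assumes the Wayback Machine's usual snapshot address pattern of `https://web.archive.org/web/<14-digit timestamp>/<original URL>`. It builds the link to cite; it does not itself archive the page, which still has to be saved in the Wayback Machine first.

```python
from datetime import datetime


def wayback_citation_url(original_url: str, captured: datetime) -> str:
    """Build the address of a Wayback Machine snapshot for a footnote.

    Assumes the snapshot pattern
    https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>.
    The page must already have been archived; this only formats the link.
    """
    timestamp = captured.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical example: a page captured at midday on 4 March 2019.
snapshot = wayback_citation_url(
    "https://example.org/some-page",
    datetime(2019, 3, 4, 12, 0, 0),
)
print(snapshot)
```

Citing the snapshot rather than the live page means the footnote points at what was actually read on that date.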

Returning to the publication of data alongside a print publication however: the minimum one can do is simply purchase a domain name and publish the data oneself, alongside the book. This greatly reduces the risk of obsolescence, keeps you in control, and recognizes the fact that books start to date the moment they are published by their very nature.

All these approaches require a certain amount of critical reduction of the idea that publishing a book is a railway buffer which marks the conclusion of a major part of one’s career. Remember – especially if you are early career – that this will not be the last thing you ever publish, digitally or otherwise. Until those bells-and-whistles hybrid digital/paper publishing models arrive, it’s necessary to remember that there are all sorts of ways data can be preserved, sustained and form a valuable part of a “traditional” monograph. The main thing for your own monograph is to find the one that fits, and it may be that you have to face down the norms and expectations of the traditional academic monograph, and settle for something that works, as opposed to something that is perfect.

A History of Place 2: Indexing

I opted to compile the index of A History of Place myself. I made this choice for various reasons, but the main one was that the index seemed to me to be an important part of the volume’s framing and presentation. Reflecting on this, it seems a little ironic, as in some ways a book’s index exemplifies the age of the pre-digital publication. Using someone’s pre-decided terms to navigate a text is antithetical to the expectations and practices of our Googleized society. Let’s face it, no one reading the e-version of A History of Place is ever going to use the index, and in some ways compiling the index manually, reviewing the manuscript and linking key words with numbers which would, in due course, correspond with dead-tree pages felt almost like a subversive act.

But like an expertly curated library catalogue, an expertly compiled index is an articulation of a work’s structure and requires a set of decisions that are more complex than they may at first seem. These must consider the expectations and needs of your readers, and at the same time reflect, as accurately as possible, the current terminologies of your field. The process of indexing made me realize that it gives one a chance (forces one in fact) to reflect – albeit in a bit of a hurry – on the key categories, terminology and labels that one and one’s peers use to describe what they do. It thus forces one to think about what terms mean, and which are important – both to one’s own work, and to the community more broadly (some of whom might even read the book).

There is also the importance of having a reliable structure. As I outline in the book itself, and have written elsewhere in relation to crowdsourcing, some have argued that using collaborative (or crowdsourced) methods to tag library catalogues for the purposes of searching and information retrieval disconnects scholarly communities from the ‘gatekeepers of the cultural record’, which undermines the very idea of the academic source itself (Cole & Hackett, 2010: 112–23) [1]. Cole and Hackett go on to highlight the distinction between “search” and “research”, whereby the former offers a flat and acritical way into a resource (or collection of resources) based on user-defined keywords, whereas the latter offers a curated and grounded “map” of the resource. While, in this context, Cole and Hackett were talking about library catalogues, exactly the same principle applies to book indices.

I don’t wish to overthink what remains, after all, a rather unglamorous part of the writing process; however, even in the digital age, the index continues to matter. Even so, there is no shame at all in busy academics (or any other writers) delegating the task of compiling an index to a student or contract worker, provided of course that person is fully and properly paid for their efforts, and not exploited. But I think it is necessary to have a conversation with that person about strategy and decision making. What follows are some examples from A History of Place which illustrate issues that authors might wish to consider when approaching their index, and/or discussing it with their indexer. Through these examples I try to explain which terms and sub-terms I decided to include, and why.

To begin with the practicalities, the wise advice provided by Routledge was:

You don’t have to wait for the numbered page proofs of your book to arrive – start to think about entries when you have completed the final draft of your typescript. The index is always the last part of the book to be put together and submission of your final copy will be subject to a tight deadline. Preparing it now may save you time later on. [emphasis added]

I would suggest that it is a good idea to think about these even before the numbered proofs turn up.

And then:

On receipt of [numbered proofs], you should return to your already-prepared list of words. Use the numbered proofs to go through your book chapter by chapter and insert the page numbers against each entry on your list. (You can use the ‘Find’ function to locate words within the proof PDF.)

The gap between compiling your original list and adding page numbers will help you to evaluate your designated entries once more. Have you missed anything obvious? Are your cross references accurate and relevant? Revisit the questions under the heading ‘Choice of Entry’.

When you are satisfied that your index is complete, put it into alphabetical order.
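The mechanical part of this routine (take a prepared list of terms, find the pages on which each occurs, then alphabetize) can be sketched in Python. This is a rough illustration under stated assumptions, not a description of Routledge's tooling or of how I compiled my own index: the page texts are assumed to have been extracted from the proof PDF already, and the names `find_pages`, `collapse` and `build_index` are hypothetical. A word search of this kind only produces candidate page numbers; deciding which occurrences deserve an entry remains a human judgement.

```python
import re
from collections import defaultdict


def find_pages(pages, terms):
    """Map each term to the sorted page numbers on which it occurs.

    `pages` is a dict of page number -> page text; matching is
    case-insensitive and restricted to whole words or phrases.
    """
    hits = defaultdict(set)
    for number, text in pages.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits[term].add(number)
    return {term: sorted(numbers) for term, numbers in hits.items()}


def collapse(numbers):
    """Collapse consecutive page numbers into ranges: [3, 4, 5, 9] -> '3-5, 9'."""
    ranges = []
    start = prev = numbers[0]
    for n in numbers[1:]:
        if n == prev + 1:
            prev = n
            continue
        ranges.append((start, prev))
        start = prev = n
    ranges.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def build_index(pages, terms):
    """Return alphabetized index lines for every term that was found."""
    found = find_pages(pages, terms)
    return [f"{term} {collapse(found[term])}" for term in sorted(found, key=str.casefold)]


# Invented sample pages, standing in for text extracted from the proofs.
sample_pages = {
    1: "Abraham Ortelius published an atlas.",
    2: "Ortelius and chorography.",
    3: "Chorography again; bias in maps.",
    5: "GIS and bias.",
}
for line in build_index(sample_pages, ["chorography", "bias", "Ortelius"]):
    print(line)
```

Sorting with `str.casefold` keeps capitalized proper names such as "Ortelius" in their alphabetical place among lower-case terms.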

You will come to love proper names in the early stages of this process. For example there is only one way you can represent Abraham Ortelius, or Tim Berners-Lee in your index; and no decisions involved in how to define the page limits for the references to them.

However, the process of selecting abstract terms for inclusion is more challenging. There were arguments both for and against including the word “Bias”, for example. All maps are biased of course, and in theory this could have applied to most of the examples I discuss. However, it forms an important topic of much recent literature on neogeography (for example), which addresses the ways in which neogeographic platforms perpetuate social bias due to their demographics (mostly white, male, Western etc). Therefore, inclusion made sense as it referenced explicit discussion of bias in secondary literature (mostly in the chapter on neogeography). It was possible to connect this to “collective bias” via the cross-referencing option of “see also”, of which Routledge advises:

See

If the entry is purely a cross reference, the entry is followed by a single space, the word ‘see’ in italics and the cross reference. For example:

sensitivity see tolerances

Note that under the entry for ‘tolerances’ there is no cross reference back to ‘sensitivity’. Page numbers should not be stated where ‘see’ is used.

See also

This should be used to direct the reader to additional related information.

This is a useful distinction, because it forces one to consider whether terms are synonymous or merely related. “Bias” and “collective bias” are a good example: the original term is somewhat fluid and required some pre-hoc consideration, but is clearly different from “collective bias”.

Highly specific and specialized terms presented less of a problem. Chorography, for example, features prominently in my index, but it could potentially have had any number of “see also…” references. However, given it is such a specialized term, I made a pragmatic decision (based partly on what I thought a reader using the index would need/want) to have it simply standalone, with no cross-references at all.

The most challenging terms were the big, important ones with multiple potential meanings. “GIS” is probably the most obvious example for A History of Place. Most of my arguments touch in some way on how spatial thinking in the humanities has emerged from, and been shaped by, GIS and related technologies, so the challenge was to divide the term up into subsections which are a) useful for a potential reader, and b) reflective of disciplinary practices. My strategy was to treat branches of GIS which have been explicitly recognized and differentiated in the literature – such as Critical GIS; Qualitative GIS; Participatory GIS; Historical GIS and Literary GIS – as separate index terms, linked as “see also” references. These are then tied only to specific occurrences of that term in each case. For discussions of GIS not explicitly relating to those terms, I used “and…” references which were tied to my chapter themes. This enabled me to divide the myriad references to GIS into sections which accord logically with the book’s structure – “- and archaeology”, “- and spatial analysis”, “- and text”, “- and crowdsourcing” and so on.

“Neogeography” created similar problems, but this type of problem is compounded when the field moves so quickly. A recent paper by Linda See and others illustrates just how difficult this term is to pin down. I think all I can draw from this is that such index terms will need some considerable revisiting in the event of there being any future editions(!).

So, the agenda for that initial conversation with your indexer should, I would suggest, include:

Strategies for dealing with abstract terms, and deciding which are relevant and which are not
Important, wide-ranging terms, and what sub-categories you think they should have
How to identify specific terms which may or may not need “see also” references
Which sorts of circumstances require you to signpost between related terms using the “see” option
Terms to flag – for your own reference if nothing else – that may not be easily “future proofed”

[1] Cole, R., & Hackett, C. (2010). Search vs. Research: Full-text repositories, granularity and the concept of “source” in the digital environment. In C. Avery & M. Holmlund (Eds.), Better off forgetting? Essays on archives, public policy and collective memory (pp. 112–123). Toronto.