Researching the Seventeenth Century Online: Tools of the Trade

[I posted this in 2014, but since so much of the EEBO TCP database came into the public domain in January 2015, I thought it worth updating.]

For readers who come to this blog from academia, this is probably a post you can skip, but for people in other walks of life I thought it might be worth writing a short piece on some of the basic tools of the trade.

When I first started researching the early modern period, in the 1970s, I spent nearly all my time in the Rare Books Room at Cambridge University Library with several ancient tomes in front of me, something like this:

[Image: Mumby Rare Books Room]

The room has been completely redesigned since my postgraduate days, but the basic process is the same. I still do a lot of my real research in that room, but these days I often work at one of the computers in a glass-partitioned area at the back. This is because much of the corpus of early modern books in English (which is what I mostly work on) is available online through Early English Books Online (EEBO). Yes, you need a password and log-in to access the database, and subscriptions are issued to institutions, not to individuals, which effectively locks the average person out. But there is some good news for Joe Public, which I’ll come to later.

Basically, what subscribers get from EEBO is a PDF image of the original text. When it first started going online (in phases, during the 1990s) it was a radical improvement on microfilm, which was fiddly to use, gave you a headache and came a poor second to having the actual book in your hand. EEBO PDFs can be viewed page by page online or downloaded as a single PDF file, making them an acceptable – even, sometimes, a preferred – alternative to reading the actual printed book.

Now, though, even that’s been superseded. EEBO PDFs, with all their advantages, were not text-searchable, meaning you had to – gasp! – read stuff to know whether it was relevant to what you were researching or not. The Early English Books Online Text Creation Partnership (EEBO TCP) changed all that. Until January 2015, it too was only accessible via subscription, but these days some 25,000 texts are in the public domain. Yes, you read that right – 25,000! Go to one of the search pages (I usually use the Boolean search) and enter some search terms. For example, if you search for Shakespeare or Shaksper and plays, you get a list of matching publications. Even without logging in, you can find out that (at the last count) there are 75 publications prior to 1700 which contain these terms.
If you can log in, you can read the exact pages on which the search terms occur, and search for other terms in any of the 75 results. Even if you can’t log in, you can still see the full text of the public-domain items and search for other words within those texts. Since you can do this for any search terms you can think of, it soon becomes clear what a powerful research tool this is. Something which would have taken a lifetime of research a couple of generations ago can now be clinched with a few mouse-clicks. For example, a proximity search (one that finds texts in which two terms occur within a few words of each other) can turn up something like this:

A search of EEBO TCP indicates that (including variant spellings) cruel/cruelty is closely collocated with unjust/injustice or iniquity – normally conveying the idea that an act is cruel if the intention behind it is unjust – in only about 80 texts during the whole of the sixteenth century. However, during the seventeenth century, there are over 1,500 such collocations, more than a third of which were published between 1680 and 1700. (Pain, Pleasure and Perversity, introduction, page 14; you can download the complete introduction here).
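For readers wondering what a proximity search actually does, here is a minimal sketch in Python. It is not how the EEBO TCP search interface works internally. It simply assumes you already have plain-text transcriptions paired with publication years (the tiny corpus below is invented), and it counts a text as a hit if any spelling of "cruel" falls within a few words of any spelling of "unjust". The spelling lists, the five-word window and the function names are all hypothetical choices for illustration, not the settings behind the figures quoted above.

```python
import re
from collections import Counter

# Hypothetical spelling variants; real early modern spelling is far more varied.
CRUEL = re.compile(r"cruel|cruell|cruelty|crueltie|cruelly")
UNJUST = re.compile(r"unjust|vniust|vnjust|injustice|iniustice|iniquity|iniquitie")


def tokenize(text):
    """Lower-case the text and split it into runs of letters (a crude notion of 'word')."""
    return re.findall(r"[a-z]+", text.lower())


def has_collocation(text, pattern_a, pattern_b, window=5):
    """True if a word fully matching pattern_a lies within `window` words of one matching pattern_b."""
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if pattern_a.fullmatch(w)]
    positions_b = [i for i, w in enumerate(words) if pattern_b.fullmatch(w)]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)


def texts_per_century(corpus, pattern_a, pattern_b, window=5):
    """Count, for each century, how many (year, text) pairs contain at least one collocation."""
    counts = Counter()
    for year, text in corpus:
        if has_collocation(text, pattern_a, pattern_b, window):
            century = (year - 1) // 100 + 1  # 1580 -> 16th century, 1690 -> 17th
            counts[century] += 1
    return counts


# An invented three-text corpus, purely for illustration.
corpus = [
    (1580, "A cruel and unjust sentence was given."),
    (1650, "Such crueltie springs from iniquitie of heart."),
    (1690, "The king was cruel."),
]
print(dict(texts_per_century(corpus, CRUEL, UNJUST)))  # {16: 1, 17: 1}
```

This counts texts rather than individual collocations, and it treats every edition as a separate text. Both choices affect the results, for the reasons discussed in the rest of this post.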

That makes it sound a bit easier than it actually is. Firstly, it would be a mistake to assume that all the results are necessarily relevant; you have to go through them and check whether the contexts in which the search terms are used really do support the point you are making. One would want to know, too, the genres in which these terms were used; a term or expression used in, let’s say, romance poetry in the sixteenth century might resurface in legal tracts in the seventeenth.

And to know how significant it is that the words injustice and cruelty were increasingly being used in the same breath, one would also want to know how often they were used independently of each other. Suppose one of your search terms was a word or expression coined in the late sixteenth century that only caught on slowly; its overall use would have been low in the sixteenth century, so an increase in its collocation with another expression might only reflect the general increase in its use as it came into wider circulation.

Then there’s the problem of multiple editions of the same work. EEBO TCP is patchy in this respect, with multiple editions of some works but not of others, so you’d need to think about how multiple editions might affect the results. The example given above is a fairly large sample, which would be less affected by, say, the inclusion or omission of a glut of editions of a single work over a few years, but such a glut could make a big difference to a smaller sample.

There’s the slow increase in the number of books published over the sixteenth and seventeenth centuries to take into account as well; one needs to consider not just the raw numbers, but what those numbers represent as a percentage of all the books published at that time. And information about the size of editions is often unavailable; a particularly large – or small – print run could make a significant difference, especially to a small sample.

Issues like these make interpreting the results from the EEBO TCP database a far from straightforward matter. Even so, the insights it offers can be startling. Twenty years ago, if anyone had speculated about a link developing between the concepts of injustice and cruelty during the early modern period, it would have been mostly just that – speculation, backed up perhaps by a few examples of little more than anecdotal significance. Now we can identify patterns of usage with a much higher degree of accuracy, and there are spin-offs from the EEBO TCP database that are enabling highly specialized work of a kind unimaginable just a few years ago.
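The arithmetic behind normalizing by total output, and behind the multiple-editions problem, can be sketched in a few lines of Python. The function below assumes a hypothetical list of records, one per edition, each giving a publication year, an identifier shared by all editions of the same work, and whether that edition contains the collocation. It groups records into 20-year periods (an arbitrary choice) and returns the share of texts in each period that match, optionally counting each work only once. The records in the example are invented purely to show the effect; they are not EEBO TCP counts.

```python
from collections import defaultdict


def normalized_rates(records, period_length=20, dedupe_editions=True):
    """
    records: iterable of (year, work_id, matches) tuples, one per edition.
    `work_id` is a hypothetical identifier shared by all editions of a work;
    `matches` says whether that edition contains the collocation.
    Returns {period_start: share of texts (or works) in that period that match}.
    """
    totals = defaultdict(set)  # period -> items counted in the denominator
    hits = defaultdict(set)    # period -> items counted in the numerator
    for index, (year, work_id, matches) in enumerate(records):
        period = year - year % period_length  # e.g. 1683 -> 1680 with 20-year periods
        # Either count each work once per period, or count every edition separately.
        key = work_id if dedupe_editions else index
        totals[period].add(key)
        if matches:
            # A deduplicated work counts as a match if any of its editions in the period matches.
            hits[period].add(key)
    return {p: len(hits[p]) / len(totals[p]) for p in sorted(totals)}


# Invented records: four works in the 1580s, and in the 1680s one work ("E")
# that went through three editions, plus one other work.
records = [
    (1581, "A", True),
    (1585, "B", False),
    (1590, "C", False),
    (1595, "D", False),
    (1682, "E", True),
    (1684, "E", True),
    (1686, "E", True),
    (1690, "F", False),
]
print(normalized_rates(records))                         # {1580: 0.25, 1680: 0.5}
print(normalized_rates(records, dedupe_editions=False))  # {1580: 0.25, 1680: 0.75}
```

In this made-up example, three editions of a single work in the 1680s push the matching share from 0.5 to 0.75 when each edition is counted separately. That is the kind of distortion a glut of editions can introduce into a small sample. Dividing by the total for each period also keeps a general rise in printing from being mistaken for a rise in the collocation itself.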
__________________________________________________
Further reading: Heather Froehlich, Richard J. Whitt and Jonathan Hope, ‘EEBO-TCP as a Tool for Integrating Teaching and Research’.