Can we use the EEBO TCP database?
This looks like a no-brainer – what would be the use of the Early English Books Online Text Creation Partnership if we can’t use it? – but it’s actually something of a minefield. How often, I wonder, has work citing the database been met with a response like the following?
I somewhat distrust the author’s generalizations because several of them appear to come from typing keywords into the Early English Books Online searchable database.
Any online database research will inevitably start with typing keywords. If that is wrong in itself, then all the money that has been spent on creating databases has clearly been misspent! Equally, any research which does no more than type in keywords and report the results is hardly worthy of the name.
Let’s start by taking a look at an example of a keyword search and the follow-up work it entails:
[This is the third in a series of three videos I posted a few weeks ago on the use of the EEBO and TCP databases. The complete series of videos is here.]
It should be clear from this that searches of this kind are pretty gruelling. Typing in keywords is the starting point, but after that a wide range of variables needs to be taken into account, from variant spellings to differences in the number of books published within a particular genre during a particular period. And, crucially, the process involves checking the results of the searches to ensure that the occurrences really are valid examples of the particular usage one is interested in.
For me, that’s just the starting point, the spadework before getting down to the job of analyzing usage in particular contexts, relating that to source texts (a lot of my work is with translations, so I want to know what the original text said), checking the background and views of authors, placing the usage in the context of other related texts and so on.
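For readers who like to see the procedure spelled out, the counting stage of that spadework might be sketched in Python roughly as follows. It is only an illustration under stated assumptions: the five-sentence corpus, the keyword, its list of variant spellings and the function names are all invented for the example, and nothing here reflects EEBO TCP's own search interface. It shows three steps: folding variant spellings into one pattern, dividing hits by the number of texts in each decade so that busy decades don't swamp quiet ones, and printing each hit in context so that it can be checked by eye.

```python
import re
from collections import Counter

# Hypothetical mini-corpus of (year, text) pairs, invented for illustration.
# Real transcriptions would be loaded from files instead.
SAMPLE = [
    (1595, "The crueltie of the tyrant was great."),
    (1598, "No cruelty was shewn; a gentle prince."),
    (1642, "Such crueltye and cruelty in one warre."),
    (1645, "A sermon on patience."),
    (1648, "Of mercy and of peace."),
]

# Assumed variant spellings of a single keyword, folded into one pattern.
VARIANTS = re.compile(r"\bcruelt(?:y|ie|ye)\b", re.IGNORECASE)


def decade(year):
    """Round a year down to the start of its decade, e.g. 1598 -> 1590."""
    return year // 10 * 10


def hits_per_text(corpus, pattern):
    """Return {decade: hits divided by number of texts in that decade}."""
    hits = Counter()
    texts = Counter()
    for year, text in corpus:
        d = decade(year)
        texts[d] += 1
        hits[d] += len(pattern.findall(text))
    return {d: hits[d] / texts[d] for d in sorted(texts)}


def keyword_in_context(corpus, pattern, width=20):
    """List (year, snippet) for every hit, for checking by eye."""
    lines = []
    for year, text in corpus:
        for match in pattern.finditer(text):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            lines.append((year, text[start:end]))
    return lines


if __name__ == "__main__":
    for d, rate in hits_per_text(SAMPLE, VARIANTS).items():
        print(d, round(rate, 2))
    for year, snippet in keyword_in_context(SAMPLE, VARIANTS):
        print(year, "...", snippet, "...")
```

Dividing by the number of texts is the crudest possible normalization; dividing by word count, or restricting the corpus to one genre, would be obvious refinements. The in-context listing is the part that matters most to me, since no amount of counting tells you whether a given hit is a valid example of the usage in question.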
I’m a texty kind of guy, so I’m less interested in the statistical stuff than in seeing the results in context, but the raw figures can sometimes be of interest. EEBO TCP is still incomplete, but it nevertheless offers a much bigger – and more representative – sample than, say, a MORI poll, and it is unlikely that the general pattern of discourse usage picked out in the video above will alter very much once the gaps remaining in the database have been filled.
Last summer (2013) I attended a conference on early modern digital humanities. I could have done with that kind of input before embarking on Pain, Pleasure and Perversity; I might have escaped some of the more obvious pitfalls. I only cite the database eight times in 235 pages, and I don’t think the few claims I made based on it are wrong to any substantial degree, but even so I can see, in hindsight, ways I could have tightened up both my approach and my presentation.
What really interests me, though, is the discovery that, in acknowledging my use of the database, I appear very much to have stuck my neck out. A search for “EEBO TCP” on Google Books currently purports to turn up some 450 results, though in fact it dries up after page 7, giving fewer than 70 results (does anyone know why this happens on Google?). Astonishingly (to me), my book appears on the first page (at the bottom)! Most of the other books on that first page are specifically on the use of online databases in early modern studies. Can I really be so unusual as a researcher working in the field and giving credit to the database?
Apparently, yes. I searched again, specifying publications since 2010, and there is only one page of results!
So what is actually happening here? Are scholars just not using the database? I don’t think so. The impression I get from talking to people at conferences, etc., is that early modernists are logging in at about the same rate as other people have hot breakfasts. Is this such a recent development that it is not yet fully reflected in print? Probably, to some extent. About three years ago, after I had been working solidly on the database for about three weeks in the Rare Books Room at Cambridge University Library, one of the librarians came up and asked me what I was working on. I showed him the database and he was astounded; Cambridge was affiliated to it, but none of the library staff even knew it existed! A couple of months later they held a seminar on it, but prior to that it seems not to have been on anyone’s radar; I certainly didn’t see anyone else using it.
American scholars appear to have been quicker off the mark. I would frequently notice a marked slowdown in download times in the middle of the afternoon, which would be about the time people in the US would be logging on.
I could be wrong about this, but what it looks like to me is that lots of people are using the database, but not many are acknowledging it. Top marks on that score to Bruce R. Smith (in Christie Carson and Peter Kirwan, eds, Shakespeare and the Digital World: Redefining Scholarship and Practice, CUP, 2014), who writes:
I didn’t even have to rely on my recollections of just where the passages I wanted were located. I could simply enter a keyword as a search term, and there the desired text would be on my computer screen, ready for cutting and pasting directly into my draft … What effectively connected me to the texts I wanted was not just my possession of a computer but my university’s subscriptions to EEBO and EEBO-TCP. (pp. 24-5)
Even then, though, Smith’s main point is how he was brought back to the reality of the printed book when one of the texts he wanted to access wasn’t on the database.
Many others, I suspect, are being less than candid about their use of the database. I could have done the same. How smart I would have looked, with all that intimate knowledge of such a wide range of texts!
I’m glad I was up-front about it, though. I would be the first to agree that there is nothing quite like the printed book, and uses of the database that took me away from reading and analyzing text just wouldn’t interest me. But, as it did for Smith, the database ‘connected me to the texts I wanted’ (or to many of them), and it enabled me to find out things about the early modern printed corpus that simply would not have been discoverable by any other means.