At the time of writing I am sitting in the back row of a lecture room in Oxford at a conference of the Early English Books Online Text Creation Partnership (EEBO TCP). Follow on Twitter: #eebotcp.
The Text Creation Partnership is a text-searchable database developed from the Early English Books Online database, making it possible for the first time to get a fairly accurate idea of the frequency and distribution of lexical items and expressions. I make extensive use of this database in my own work. For example, there is an entire section of the book on the satirical use of the expression “pleasant spectacle” to describe a scene of suffering or atrocity. Before TCP, putting together a set of collocations like this would have taken years, and I would not have been able to write the book in its present form without access to this database.
I came hoping to learn a bit more about the technical side of interpreting statistics. For example, if a particular usage occurs more often during the last twenty years of the seventeenth century, how should we allow for the increase in the number of publications during this period? To make it more complicated, suppose the occurrences are concentrated in a particular genre (such as devotional literature). To evaluate their significance we would need to know whether publications within that genre increased or not over the same period.
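One simple way to frame the question is to divide the number of occurrences by the number of texts published in the same period and genre, so that raw counts become rates. The sketch below is only an illustration of that arithmetic: the function name `rate_per_thousand`, the decade and genre labels, and all the figures are invented for the example and are not drawn from EEBO TCP.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (decade, genre)


def rate_per_thousand(hits: Dict[Key, int], totals: Dict[Key, int]) -> Dict[Key, float]:
    """Return occurrences per 1,000 texts for each (decade, genre) cell."""
    rates: Dict[Key, float] = {}
    for key, total in totals.items():
        if total == 0:
            continue  # no texts in this cell, so no rate can be computed
        rates[key] = 1000 * hits.get(key, 0) / total
    return rates


# Invented numbers for illustration only; not real EEBO TCP counts.
hits = {("1670s", "devotional"): 12, ("1690s", "devotional"): 24}
totals = {("1670s", "devotional"): 400, ("1690s", "devotional"): 800}

rates = rate_per_thousand(hits, totals)
for (decade, genre), rate in sorted(rates.items()):
    print(f"{decade} {genre}: {rate:.1f} per 1,000 texts")
```

In this made-up case the raw count doubles between the two decades, but so does the number of devotional texts, and the rate stays at 30.0 per 1,000 in both. Counting words rather than texts would probably give a fairer denominator where texts vary greatly in length, and a rate on its own says nothing about whether a difference is statistically significant.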
EEBO TCP makes possible the analysis of patterns across a range of texts. The challenge is how to draw valid inferences from the range of information the TCP makes available.