Bryn Mawr Classical Review 03.03.28

A proposal

Greg Crane of Perseus fame asked if we would forward the following note to BMCR readers. It's informative as to the state of the art in this kind of e-grammar work, but the request for feedback is genuine and all replies will be welcome. -- JO'D

I am writing to seek input on a proposal that we tentatively plan to submit to the NEH at the end of the summer. The idea is fairly simple: we want to use the morphological parser that we have been developing for the past eight years or so to analyze every unique string in the TLG. The simplest use for this work would be to enhance word searches: you could ask specifically for fe/rw and retrieve oi)/sw (rather than searching for fer-, oi)s-, etc.). More generally, a morphological database of Greek would open up possibilities for the application of more sophisticated retrieval and analytic techniques. Ultimately, the same thing should be done to non-literary texts such as inscriptions and papyri, but we feel that the massive TLG would be the place to start.

I would greatly appreciate any reactions that anyone might have to the basic idea of the project. I am particularly interested in any reservations that might immediately strike people. Does the project itself seem worth doing? Are there opportunities that we are not pursuing that would allow us to serve the scholarly community better without detracting from our basic goals?

The working summary of the project follows. The proposal outline is fairly succinct (c. 7 pages), but it is full of Greek and does not lend itself readily to transliteration. If you would like to see a copy, please send me your US Mail address and we will send one to you. Casual reactions to just this summary are, however, also more than welcome. Thanks!

Gregory Crane
Department of Classics
Boylston 319
Harvard University
Cambridge, MA 02138

A Linguistic Database of Classical Greek

This project will extend an existing parser for classical Greek, expanding its database of stems to cover the majority of words attested in the literary record, and will use this database to create a morphologically parsed database of the more than 1,000,000 unique strings available in the TLG. In the end, we will publish the database of analyzed strings, the databases of stems and endings which drive the parser, and the parser itself. The resulting databases are an essential piece of scholarly infrastructure that will (1) revolutionize current searching techniques for the TLG and other Greek databases, (2) make it possible to apply more sophisticated retrieval and text analysis to Greek texts, and (3) provide a basic but crucial lookup tool that will aid non-specialists in other fields (e.g., philosophy, political science, religion) who seek to work directly with the Greek database.
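
To make the search use concrete, here is a minimal sketch in Python of how a database of analyzed strings might support a lookup by dictionary form. It is an illustration only, not the parser or the database format described in this proposal: the table ANALYSES, the functions build_lemma_index and forms_of, and the handful of hand-written entries (in the same ASCII transcription of Greek used above) are all hypothetical. The assumption is that each unique string has already been assigned one or more analyses, so that searching reduces to inverting that table.

```python
from collections import defaultdict

# Hypothetical, hand-written sample: each surface string maps to one or more
# (dictionary form, analysis) pairs. A real database would be produced by the
# parser; these few entries are typed in only for illustration.
ANALYSES = {
    "fe/rw": [("fe/rw", "pres ind act 1st sg")],
    "oi)/sw": [("fe/rw", "fut ind act 1st sg")],
    "h)/negkon": [
        ("fe/rw", "aor ind act 1st sg"),
        ("fe/rw", "aor ind act 3rd pl"),
    ],
    # An ambiguous string keeps every analysis it can bear.
    "lu/sw": [
        ("lu/w", "fut ind act 1st sg"),
        ("lu/w", "aor subj act 1st sg"),
    ],
}


def build_lemma_index(analyses):
    """Invert the table: dictionary form -> set of surface strings."""
    index = defaultdict(set)
    for form, parses in analyses.items():
        for lemma, _description in parses:
            index[lemma].add(form)
    return index


def forms_of(lemma, index):
    """Return every known surface string analyzed as a form of `lemma`."""
    return sorted(index.get(lemma, ()))


LEMMA_INDEX = build_lemma_index(ANALYSES)

if __name__ == "__main__":
    # Asking for fe/rw retrieves oi)/sw without the user guessing at stems.
    print(forms_of("fe/rw", LEMMA_INDEX))
```

In this sketch the strings returned for fe/rw would then be handed to an ordinary string search over the texts. Keeping every analysis for an ambiguous string, rather than choosing one, is a design choice of the example and not a claim about how the project resolves ambiguity.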