BMCR 2002.07.28

Computer-Aided Translation Technology: A Practical Introduction

Lynne Bowker, Computer-Aided Translation Technology: A Practical Introduction. Didactics of Translation Series. Ottawa: University of Ottawa Press, 2002. Pp. xx, 185. ISBN 9780776615677. $27.50 (pb).

This book, in Ottawa’s “Didactics of Translation” series, aims to introduce professional translators and translation students to the basic tools and techniques of computer-aided translation. It is a beautifully organized textbook that nicely accomplishes this relatively modest goal. In what follows, I will explain what Bowker (B.) means by computer-aided translation (CAT) and why this book might be of interest to classicists.

The translators B. is concerned with are the professionals who must translate instruction manuals, legal documents, business memos, and other similar texts. She takes many of her examples from software documentation and on-screen messages. These translators frequently have a large amount of relatively repetitive material to be translated in a short time, before a product launch, the start of an advertising campaign, or the like. Computers can help ensure consistency and can facilitate looking up words and idioms.

B. distinguishes two ways that computers can be involved: machine translation and computer-aided translation. In machine translation (MT), the computer is responsible for most of the work, though a human may edit the results. Although linguists and computer scientists have done a great deal of work in this area, unedited machine translations are generally fairly poor. Sometimes they are good enough, however, if all the reader needs is the main idea of a text, or a decision about whether the text is worth spending more time (or money) on. For example, rough MT output may help an intelligence analyst decide what to do with an intercepted memo. Or it could help a student of early modern history get started working with records and documents in Latin.

CAT, on the other hand, puts the human translator in control and lets the computer help where it can. CAT tools can include anything translators use in their work — word processors, spelling checkers, scanners and optical character recognition, even email — but B. focuses her discussion on four areas: data capture, corpus analysis, terminology management, and translation memories. Of these, the first two are useful to anyone who works with texts, while the last two are more particularly translation tools.

The first chapter is introductory, asking why translators should learn about computer tools. The following two chapters are the most general part of the book and the most likely to be useful for non-translators. Chapter 2 discusses data capture: how texts become machine-readable. This is a significant issue for translators, who may get texts from clients in printed form, or in an inconvenient or incompatible electronic form. B. gives a clear explanation of how optical character recognition (OCR) works and why it is imperfect. She also considers voice recognition software, another way to enter a text without having to re-key it. Both OCR and voice recognition are general techniques, applicable to any text, though as B. points out it is necessary to have software that can recognize the language (character set, spelling, phoneme inventory) you need to work in. Software expecting English words will be confused by Latin; software expecting the Roman alphabet will be confused by Greek.
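To make the character-set problem concrete (the example is mine, not B.’s): in Python, bytes that encode Greek text, when decoded by software assuming a Western European (Latin-1) character set, come out as gibberish.

    # Greek text stored as UTF-8 bytes, then decoded by software that
    # assumes the Latin-1 (Roman-alphabet) character set: gibberish results.
    greek = "μῆνιν ἄειδε θεά"       # the opening words of the Iliad
    raw = greek.encode("utf-8")     # the bytes as they might sit in a file
    print(raw.decode("latin-1"))    # what Roman-alphabet software "sees"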

In chapter 3, B. gives a brief introduction to corpus analysis. With a large enough corpus, it’s possible to find patterns of word usage that can help in interpreting the source language or writing the target language. For translators, the most convenient type of corpus includes texts and their translations, with corresponding points marked so the system can identify which parts of text and translation go together (an “aligned” corpus). But the kinds of analysis B. describes are possible in a monolingual corpus as well, given suitable tools, and they are useful in the study of language generally. B. mentions frequency counts, concordancing and KWIC indices, and collocation analysis, all techniques that have been used in literary and linguistic studies. Her explanation of collocations is particularly good. This chapter is a good starting point for students as they begin to work with corpora like the Thesaurus Linguae Graecae.
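A KWIC index in particular takes only a few lines to sketch. The Python below is my illustration, not B.’s; the tiny corpus (the opening of the Vulgate Genesis) and the window size are invented for the example.

    import re

    def kwic(corpus, keyword, window=3):
        # Print each occurrence of the keyword with a few words of
        # context on either side.
        tokens = re.findall(r"\w+", corpus.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                print(f"{left:>25}  [{tok}]  {right}")

    vulgate = ("in principio creavit deus caelum et terram "
               "terra autem erat inanis et vacua")
    kwic(vulgate, "et")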

Chapters 4 and 5 discuss two specific kinds of translation tools: terminology management systems and translation memories. While both of these are essentially databases of translation equivalents, they differ in their complexity. A terminology management system is basically an automated glossary or phrasebook, but a translation memory may store equivalents for words, phrases, or longer passages. Normally a translation memory is built up by a translator or team of translators in the course of their work. A terminology system, on the other hand, can be constructed independently of any particular body of texts, and contains the technical terms and fixed phrases of a given subject domain. A sophisticated terminology manager can be more than just a dictionary. Given enough similar texts, it is possible to identify repeated groups of words automatically and check them against the terminology database. This produces a pre-translation, with many of the content words turned into the target language; in the best case, the human translator then need only fill in the grammar. Terminology managers help ensure consistency, which is highly useful for the technical documents in B.’s examples but equally relevant for, say, the translation of the formulaic phrases in a collection of inscriptions.
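The pre-translation step is simple enough to sketch. The few lines of Python below are my illustration, not B.’s, and the little Latin-English glossary is invented: terms found in the terminology database are replaced by their stored equivalents, and everything else, grammar included, is left for the human translator.

    import re

    # A toy terminology database: source term -> target equivalent.
    glossary = {
        "senatus populusque romanus": "the senate and people of Rome",
        "res publica": "the commonwealth",
    }

    def pretranslate(text, glossary):
        # Try longer terms first so multi-word entries win over their parts;
        # match whole terms only, case-insensitively.
        for term in sorted(glossary, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            text = pattern.sub(glossary[term], text)
        return text

    print(pretranslate("Senatus populusque Romanus decrevit.", glossary))
    # -> "the senate and people of Rome decrevit."; the grammar remains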

The longest chapter in the book is chapter 5, on translation memories. A translation memory is a collection of equivalents for words, phrases, or longer passages, along with software to facilitate its use. For example, a translation memory system might store “world without end” as the conventional equivalent for “per omnia saecula saeculorum.” It is up to the human translator, of course, to recognize when this rendering is not appropriate, or to put the suggested version into the correct grammatical form for its context. A sophisticated translation memory system can suggest translations even when the phrase or sentence in the source text does not exactly match anything in the database, picking out more or less similar phrases from what has come before. B. points out that a memory system becomes more useful as more prior translations are stored in it, but observes that too many texts from different subject areas might cause confusion, especially in languages like English with many semantically different words that are spelled the same. “Bank” means one thing in a text about rivers, quite another in a text about finance.
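A bare-bones version of this fuzzy matching can be sketched with Python’s standard difflib; the memory contents and the 0.7 similarity threshold below are my illustrative choices, not B.’s.

    from difflib import SequenceMatcher

    # A toy translation memory: source segment -> stored translation.
    memory = {
        "per omnia saecula saeculorum": "world without end",
        "in principio erat verbum": "in the beginning was the word",
    }

    def suggest(segment, memory, threshold=0.7):
        # Return the stored translation of the most similar source
        # segment, provided some segment is similar enough.
        best, best_score = None, 0.0
        for source, target in memory.items():
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if score > best_score:
                best, best_score = target, score
        return (best, best_score) if best_score >= threshold else (None, best_score)

    print(suggest("per saecula saeculorum", memory))
    # -> ('world without end', 0.88): an inexact match is still found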

Finally, chapter 6 looks to the future, discussing how technology is changing the way translators work. New tasks, such as web page localization, call for new tools, for example to extract the translatable text of a web page without attempting to translate the HTML tags. Shared corpora and translation memories are also becoming more common, facilitating collaboration.

The technical level of the book is quite elementary. B. assumes her readers are familiar with email and word processors, nothing more. She explains technical terms and concepts where required, and supplies a glossary. B.’s explanations are clear and not overly detailed, yet she almost never simplifies to the point of being technically wrong — a feat that is harder than it looks.1 Each chapter closes with a list of “key points” and suggestions for further reading keyed to the bibliography. Major ideas are also repeated throughout the text. This will be a good textbook in the type of translation course it is designed for, and could also be used as a supplement in a general humanities computing course.

Notes

1. The one place B. goes seriously astray is in her discussion of character encodings (pp. 74-75): Unicode is not a double-byte character set. B.’s explanation of why ASCII is insufficient is correct, and she is right that Unicode is an emerging standard that may solve many character encoding problems. But Unicode characters can be more or fewer than two bytes long, depending on the character and on the encoding form used (UTF-8, UTF-16, or UTF-32). Thus adding Unicode support to software can be more complicated than simply supporting double-byte characters, though the payoff is correspondingly greater.
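The point is easy to verify in any Unicode-aware language. In Python, for instance (the characters below are arbitrary examples of mine):

    # Byte lengths of single characters in two Unicode encoding forms.
    for ch in ["a", "é", "α", "𝒜"]:   # Latin, accented Latin, Greek, math script
        print(ch, len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    # "a" is one byte in UTF-8; "α" is two; "𝒜" is four bytes in both forms.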