The Universal Translator
A few years ago, while I was a graduate student in anthropology, I went to Guatemala to do some archeological fieldwork. One Sunday morning, taking a break from trowels and brushes and the Zen of reassembling broken pottery, I visited a small church in a very small town in the highlands near Antigua. The interior of the church was illuminated only by candles and by the rays of sun coming through the cracks in the roof and the walls. In one corner of the chapel an elderly man was lighting candles, offering incense and praying to his gods. I watched and listened as he prayed first, in Spanish, to the statues of the Virgin Mary and Jesus, and then, extracting a fist-sized stone from his woven bag, he prayed to it in Cakchiquel, a Mayan language. The stone represented a Maya god. The devout man prayed to his gods in the languages that he knew they would understand. He translated his hopes and concerns into words in their languages.
His prayers were part of a human quest that goes back in the Judeo-Christian tradition to the Tower of Babel. It is a quest motivated by our need to communicate with each other in the chaos of over 4,000 human languages.
The Cakchiquel man was bilingual. There are people today who are trilingual, quadrilingual, quintalingual. Some can speak 10 or more languages. But not all of us have the opportunity to learn 10 languages, and even if we did become decalingual, that would still leave at least 3,990 more languages for us to learn.
Faced with this task, some have proposed that we all should learn a universal language along with our native tongue. Esperanto is the best known of these proposed universal languages, but other constructed languages like Eurolang have been proposed. Some natural languages like Latin, Arabic, and English have also been put forward as potential universal languages. But the politics of choosing which language should be universal have inhibited universal adoption of any of these except in very limited domains, like English for international air traffic control.
Recently, technology has suggested another option. This technology is not to be found in the old Cakchiquel church but on the bridge of the starship Enterprise.
The universe of the United Federation of Planets, the fictional realm of the Star Trek television programs, is packed with tantalizing technological devices. The characters of Star Trek have replicators, which can create anything out of atoms; transporters, which will disassemble you atom by atom and reassemble you, usually correctly, in another location; phasers, holo-suites, tractor beams, inertial dampers, warp cores, tricorders; and, perhaps most intriguing, a device called the universal translator.
The universal translator is a computer program designed to allow a human (or Ferengi) to speak to and understand the speech of any other species. The program resides in the computer of the starship and may be accessed by a variety of hand-held devices used by explorers and ambassadors.
If you had a Universal Translator you could beam yourself down to Paris and speak to any French shopkeeper or bartender with ease. You could stroll the streets of Katmandu and communicate with any Nepalese or Tibetan speaker you encountered.
Instant communication. Hello, Mrs. Kenyan, could you direct me to the nearest bus stop? Good morning, Mr. Mongolian, is there a Kinko’s in Ulaanbaatar? What is your opinion of my patient’s X-ray, Dr. Peruvian? The possibilities are galactic. Peace negotiations, business arrangements, research, education, travel, dating … wow! Anybody could easily talk to anyone, anywhere.
I want one of these! Where can I get one?
Well, you can’t get one. Not even at Best Buy.
OK then, we’re clever people; maybe we can make one. How does it work?
Well, I consulted the official Technical Manual of the starship Enterprise and discovered that the Universal Translator "first analyzes the patterns of an unknown form of communication, then derives a translation matrix to permit real time verbal or data exchanges." Cool. Now all we need to do is write the program. Let’s see, how did that go?… analyze the patterns and then derive a translation matrix.
People have been analyzing the patterns of languages for thousands of years, but mathematicians, linguists and computer scientists have been analyzing the patterns of languages in a formal way for only about 50 years. (Analyzing patterns, of course, is at the very heart of what mathematicians do.)
The problem of deriving a "translation matrix" and implementing it on a computer has been an active research area for about 30 years. What results do we have?
We are seeing some good progress in voice-operated computer software, in inexpensive translation software, and in the computer generation of language. You can select the numeral on a telephone voice system menu by saying its name… clearly. I have software on my computer that allows me to click next to any word in any document and discover its French equivalent. This software is also available in German and Italian, but not yet in Nepalese or Albanian. You can buy a hand-held electronic Italian phrase book for that trip to Tuscany. You can visit a web page and get online phrase books, with sound, for 70 languages, including Bengali and Basque. Many major corporations (Siemens, for example) can translate for you online so long as you stay with the topic and, of course, they choose the topic. You can talk to your computer and it will type what you say, and it will talk back to you… in the same language… politely.
Most recently, Patrick Suppes and his colleagues at Stanford University have produced a computer program that learns languages from physics word problems. The Stanford computer program assumes, along with some colleagues I have known, that any meaningful conversation must be about physics word problems. The program uses what it knows about physics and the structure of human languages to deduce vocabulary and syntax. It analyzes the patterns and constructs a translation matrix.
But all of this is a far cry from the Universal Translator carried by the characters of Star Trek. The Stanford program is a fine bit of work, but would be of limited use at multilingual business negotiations.
The problems involved in constructing a Universal Translator are fascinating and I’d like to explore some of them with you tonight. I do this not to give an introductory lecture on computational linguistics (despite the clear fact that such a lecture would be good for your souls), but rather to highlight some of the things that make automatic translation between languages difficult.
A Universal Translator must deal with sounds. Besides the engineering problem of distinguishing sounds, which we have pretty well solved, the Universal Translator must be able to recognize what sounds are significant for each language. For example, in English, the voiced bilabial stop "b" imparts a meaning to an utterance that differs from its unvoiced counterpart "p". Thus, bat is different from pat. The Universal Translator must figure this out and also discover that the same distinction is not recognized as relevant in Korean.
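One way to picture "figuring this out" is the linguist's search for minimal pairs: two words with different meanings that differ in exactly one sound. The sketch below is only an illustration under stated assumptions. The function name `contrastive_pairs`, the simplified transcriptions, and the invented second language are all hypothetical, and a real system would have to work from raw audio rather than tidy symbols.

```python
from itertools import combinations


def contrastive_pairs(lexicon):
    """Return the sound pairs that distinguish at least one pair of words.

    `lexicon` maps a word's sounds (a tuple of symbols) to its meaning.
    Two words count as a minimal pair when they have the same length,
    different meanings, and differ in exactly one position.
    """
    contrasts = set()
    for (w1, m1), (w2, m2) in combinations(lexicon.items(), 2):
        if len(w1) != len(w2) or m1 == m2:
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:
            contrasts.add(frozenset(diffs[0]))
    return contrasts


# Toy English sample with simplified, hypothetical transcriptions.
english = {
    ("b", "a", "t"): "bat",
    ("p", "a", "t"): "pat",
    ("b", "i", "n"): "bin",
    ("p", "i", "n"): "pin",
}

# An invented language (not real data) in which "b" and "p" are
# interchangeable: both pronunciations carry the same meaning.
invented = {
    ("b", "a", "l"): "foot",
    ("p", "a", "l"): "foot",
    ("m", "a", "l"): "horse",
}

b_vs_p = frozenset({"b", "p"})
print(b_vs_p in contrastive_pairs(english))   # True
print(b_vs_p in contrastive_pairs(invented))  # False
```

The same pair of sounds is significant in one sample and irrelevant in the other, which is exactly the kind of fact the translator has to discover for itself.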
And what about words? A Universal Translator must be able to figure out somehow that pots means "open" in Armenian, that pool is "bridge" in Gujarati, and that May is "we" in Finnish, "but" in French, and "mother" in Thai. Sound and meaning correspondences are generally arbitrary, as Humpty Dumpty so lucidly pointed out to Alice, but a Universal Translator must somehow deduce these correspondences.
And then there’s grammar. A Universal Translator must be able to figure out that the past-tense markers are attached to the end of English verbs, but to the beginning of Ojibwa verbs; that plurals are formed in Ilocano with prefixes, but in English with suffixes; and that both the subject and the object of a sentence precede the verb in Burmese.
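As a rough illustration of the simplest of these deductions, the sketch below guesses whether a grammatical marker sits at the front or the back of a word by comparing base forms with their inflected forms. It rests on loud assumptions: the function name `affix_position` is hypothetical, the second word list is invented rather than real Ojibwa or Ilocano data, and real morphology (irregular verbs, sound changes, reduplication) is far messier than simple string matching.

```python
def affix_position(pairs):
    """Guess where a marker attaches, given (base, inflected) word pairs.

    Each pair votes "suffix" if the inflected form begins with the base,
    or "prefix" if it ends with the base. Pairs that fit neither pattern
    (irregular forms, for instance) cast no vote.
    """
    votes = {"prefix": 0, "suffix": 0}
    for base, inflected in pairs:
        if len(inflected) <= len(base):
            continue
        if inflected.startswith(base):
            votes["suffix"] += 1
        elif inflected.endswith(base):
            votes["prefix"] += 1
    if votes["prefix"] == votes["suffix"]:
        return "undetermined"
    return max(votes, key=votes.get)


# English past tense: the marker follows the verb.
# ("go", "went") is irregular and casts no vote.
english_past = [("walk", "walked"), ("jump", "jumped"), ("go", "went")]

# Invented forms (not real data) for a language that marks the past
# at the beginning of the verb.
invented_past = [("tep", "kitep"), ("mol", "kimol")]

print(affix_position(english_past))   # suffix
print(affix_position(invented_past))  # prefix
```

Even this toy shows why a sample of utterances matters: one irregular pair tells the program nothing, and only the accumulation of examples reveals the pattern.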
The Universal Translator would indeed be a remarkable program. From a sample of utterances it must determine what sounds are important, what specific sequences of sounds mean, and the grammatical role of each sound sequence.
In the quest for a Universal Translator that can take into account these formal complexities of human languages, mathematicians and linguists have expended much energy in creating formal descriptions of human languages. These formal descriptions in turn have been realized as computer programs that translate between languages.
We have had mixed success with our formalisms. But there always seems to be room for improvement. There are over a hundred distinct mathematical formalisms for human languages. They range from transformational grammars used in early automatic translation programs, to the two-level grammars used in European automatic translation, to the currently popular head-driven phrase structure grammars. Still in the theoretical stage, but showing great promise, are the DNA-grammars. We may be able to encode a language in DNA and then recombine the DNA and decode it into another language.
But even as we have constructed our grammars and programmed our computers we know that translating words and sentences does not mean that the utterance has been translated. Our computer programs can do pretty well with grammar and sound, although some of the more complicated grammars of languages like Warlpiri continue to perplex us. Programs can also deduce word meanings within very restricted contexts. But even if we could get our Universal Translator to successfully analyze the patterns of sounds, words, and sentences, there is an additional complication.
Consider the task of translating a simple greeting. In English we may say "How are you?" (usually not expecting a medical assessment) or simply "Hello," or "Hi," or "Yo." The standard greeting in Mandarin Chinese is "Ni hao?" The literal translation is "You are good?" No problem here. But be careful: if you attach the question particle ma, and say "Ni hao ma?", the literal translation is still "You are good?", but there is an extra layer of supposition that makes this question an actual inquiry into your health. Indeed, there may be overtones of hospitalization. Our Universal Translator will need to recognize this distinction.
In Quiche, a Mayan Indian language, the standard greeting is "La utz awach?" This translates literally as "Is your face good?" Will the Universal Translator be able to translate this as a greeting rather than as a dermatological inquiry?
And then, of course, there’s the Klingon greeting "nook-Nekh" which translates into English as "what do you want?"
And if we look beyond simple greetings we encounter more surprises. Consider Quechua, the language of the great and ancient Inka, still spoken in the high Andes of Peru and Bolivia. Quechua grammar is fairly well understood and it is a simple matter to translate the words of the English sentence "Sam has 5 sheep" into Quechua. But the Quechua version of the sentence is fraught with dangerous undertones.
The Quechua classify objects into those that can be counted and those that cannot. Counting has the effect of separating things. If what one proposes to count are members of a reproductive group, then the act of enumerating the parts of the group one by one is a threat to, and undermines, the reproductive force of the group. If you say to another Quechua speaker that "Sam has 5 sheep," then you insult Sam, reduce the value of his herd, and label yourself as an enemy of Sam or a fool.
Doug Lenat’s multi-year (and multi-million-dollar) Cyc project is attempting to encode common sense knowledge into a computer, but no current projects have attempted to give the computers a cultural sense. Sounds like a nice honors project.
Incorrect translations result from a failure to attend to the whole pattern of a language. Focusing solely on sound, or word, or grammar will invariably lead to error. Any universal translator will need to assemble a wide variety of linguistic artifacts and to observe language use in context before it can begin to "analyze the patterns" and "derive the translation matrix."
Language translation is hard. Automatic language translation by computers is very hard. Universal language translation by a computer program is extremely hard.
Yet, computer scientists persist in this difficult task because translation is important. It is important because it is necessary for communication and communication is necessary for humans living in social groups.
I think that translation is one of the most important activities that humans can be involved in. Language translation offers the promise, if not the reality, of a joining of minds and spirits. It is a worthy enterprise.
But translation reaches beyond language translation. It is more than converting Hawai’ian to English or Vietnamese to Bulgarian. I like to view translation, not surprisingly, in its mathematical sense. Translation is movement without loss of essence. Translation involves movement between disciplines, between cultures, between philosophies, and between ways of knowing.
It is translation when mathematical theory becomes physical description. It is translation when E.O. Wilson helps us understand the importance of biological diversity. It is translation when a poem by Denise Levertov captures the pain of injustice and oppression.
I’d like to suggest that while my colleagues and I continue to attempt to build a universal language translator on our computers, we all need to work hard to develop our own personal universal translators. We need to be able to "analyze the patterns" and "derive the translation matrix."
Analyzing the patterns requires careful study. It requires a broad and critical perspective. The patterns may be cultural, mathematical, geological, chemical, ecological, spiritual, social or political. More often than not the patterns will involve all of these simultaneously in a complex system of meaning and relationships.
We must know the patterns of the domain from which we hope to translate and the patterns of the domain to which we wish to translate.
The Pintupi recognize this. The Pintupi are a people who live in the Australian desert. They are mostly nomadic, staying for a time at this place or that, but mostly moving through the land. For the Pintupi, almost any topographical feature has layer upon layer of meaning.
Take a water hole, for example. (These, by the way, are literally holes with water in them: an important feature for biological beings in the Great Australian Desert.) Depending upon your sex, your age, your experience, and the nature of your ancestors, you understand a water hole to be alternately or simultaneously a water hole, a testicle, a ball of string, a sacred object made of string, a being, or any one of a variety of other objects physical or spiritual.
The layers of meaning are lifted up from the world and laid down in a Pintupi mind gradually. They are learned only when the learner is ready. The understandings form a pattern of significance that a Pintupi analyzes during her lifetime and for which she gradually derives a translation matrix.
The Pintupi view reality from all sides. Left, right, up, down, inside, outside, male, female, hunter, spirit, wallaby, and toad… all sides. Panoramic perspective is an essential component for analyzing patterns.
They contrast their technique with that of the non-aboriginal Australian quite succinctly. One Pintupi is quoted as saying, "You know, all you white fellows, you come to a big hill, you put a tunnel through–we always go round."
This is not an opinion on civil engineering. It is a call for broad perspective, for respect for others, for achieving goals without harming other beings.
Any adult Pintupi could probably write specifications for a universal translator.
She would tell us that the patterns are best seen on their own terms. We must attend to the native speakers before attempting an analysis. When we make policy, whether foreign, corporate or university, do we attend to the native speakers, the inhabitants of the foreign land, the workers of the corporation, or the students of the university?
She would also tell us that it is useful to allow the program to accumulate a larger linguistic sample by exchanging simple subjects before proceeding to the discussion of more complex or sensitive subjects. The patterns need to be analyzed slowly. We must work hard at basics before jumping to the complex. Have we really exchanged enough information on simple subjects before we move to the complex and the sensitive?
Many of our translation problems arise from failure to attend to these rather obvious guidelines.
Deriving the translation matrix is a lifelong process requiring creativity and skill. One must continually analyze new patterns and incorporate them into the Universal Translator’s matrix.
One of the greatest dangers is the tendency to freeze our translation matrix and stick with patterns determined by our own languages and domains. (My grandfather always assumed that bearded men were Cossacks and therefore dangerous. It was a pattern he learned as a child growing up under the Tzar.) When the matrix freezes the process stops being translation. In mathematics, we say that the mapping is no longer an isomorphism: essence is lost in translation. We stop analyzing the patterns and instead force our own patterns upon the domain from which we are translating.
Instead of communicating, we begin shouting. We are like the caricature of the tourist who believes that a Hungarian merchant will understand English better if it is spoken louder.
"A roll of film, please. Film, please! FILM!"
I think we sometimes freeze our translation matrices and begin shouting at the merchant.
Suppose we partially analyze a pattern and see that a sick person pays a doctor for her services. Does our translation matrix then assume that the pattern of health-care translates directly into the pattern of selling sport utility vehicles? Has essence been lost in the translation?
Is the essence of volunteerism lost in translation when we make volunteering a requirement?
Is the pattern of the natural world lost in translation when we harvest a species to extinction?
Is the pattern of human social life lost in translation when we allow 1 in 5 of our American children to live in poverty?
Is the essence of the pattern of respect for life lost in the translation when we execute a murderer?
Is the essence of peace lost in the translation when we bomb people who refuse to stop fighting?
I think that too often our translation matrices become static because we have stopped the hard work of analyzing unfamiliar patterns.
We see scientists attacking arguments because they are not scientific. We see others attacking the scientists’ arguments because they are scientific. We often find ourselves shouting at the merchant.
It is an old story. But it is no less important because of its age and familiarity. We see only dimly the patterns of alien domains and alien languages. Just as in the interior of the old Cakchiquel church, candles need to be lit. The candles include mathematics and dance, anthropology and economics, biology and religion, history and poetry. They include foreign travel and fieldwork and labwork and concerts and art exhibits and monster truck competitions. Each candle illuminates only a small section of the pattern. We need their collective brilliance.
Some day we will have a universal translator to enable direct conversation between ordinary people. That’s in the Star Trek future. Today we can work on our personal universal translators. We can analyze the patterns of the unfamiliar and the familiar. We can derive our translation matrices.
We can stop shouting at the merchant, light our candles, and go around the big hill.