cs356:first_draft

Computational linguistics is a subfield of artificial intelligence that strives to model natural languages computationally. It has become a unique area of computer science, growing to encompass specialists from many fields, including (but not limited to) linguists, cognitive psychologists, and computer scientists. Because of my double major in Computer Science and Russian, machine translation is of particular interest to me, specifically because each language currently written is the result, though by no means the final product, of thousands of years of evolution. Almost every language began as a spoken language and only later developed a written form. As a result, each language has its own set of rules, words, quirks, and alphabet. In this paper I present the history, successes, failures, differences, and future of machine translation.

Since the creation of computers, computer scientists have struggled to find easy and logical ways of interacting with them. Human beings learn to communicate effectively from a very young age, but computers were not built to comprehend English sentences the moment they were first turned on. In fact, computer scientists are still searching for an effective way to model a natural language computationally. Because we must communicate with a computer to "instruct" it to perform any computation, an understanding of computational linguistics is important at many levels of computer science. Fueled by the arms race between the Soviet Union and the United States, research into machine translation flourished during the 1950s. The constant paranoia experienced by both world powers created a need for thousands of documents to be translated quickly and correctly. However, no major advances in machine translation followed, and this lack of results created a pessimistic climate that culminated in an official report by the Automatic Language Processing Advisory Committee (ALPAC, 1966) denouncing the future of machine translation and computational linguistics.

The ALPAC report focused on the shortcomings of the technology of the day. The committee that produced the report believed that the money being put toward research in machine translation was unnecessary and could be better allotted elsewhere. Furthermore, it argued that a broader understanding of computational linguistics was needed before any significant progress could be made in machine translation. As a result of the ALPAC report, funding for machine translation was almost completely revoked, and the prevailing opinion became that machine translation was an unreachable goal. Lacking any governmental backing, research came to a near standstill for the better part of two decades.

Hutchins writes of the future of machine translation:

“The goal of the MT ‘perfectionists’ of the 1960s was fully automatic high quality translation. While it may be true that some AI-inspired researchers are still aiming for FAHQT, this goal has long been abandoned by the great majority of those working in the MT field. It is recognised that revision is normal and expected for all translations, whether done by humans or by computers. Debates about what is meant by ‘high quality’ or ‘fully automatic’ are largely irrelevant. What matters is whether the MT output is satisfactory for its intended use (revised or not) and whether the operation is cost-effective.”

Many believe that one of the most demanding challenges facing machine translation will be handling the abstract ideas and concepts that form the foundation of a piece of writing. Research has produced algorithms that come close to translating a sentence semantically. However, the field of computational linguistics has yet to see progress in understanding the abstract concept or idea that motivates a piece of writing. Simply put, no algorithm yet exists that can "read between the lines" and translate according to that reading.

Current efforts in machine translation have focused on two approaches: example-based machine translation and statistical machine translation. These approaches are both interesting and unique, and both rely heavily on bilingual text corpora, large collections of texts paired with their translations. Statistical machine translation derives translation probabilities from statistical analysis of such corpora. Similarly, example-based translation reuses fragments of previous translations to produce new translations.
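As a toy illustration of the statistical idea, the sketch below estimates word-translation probabilities by counting co-occurrences across a tiny invented English–Spanish parallel corpus. Real statistical systems use far more sophisticated alignment models (such as the IBM models) and phrase tables; the corpus, word pairs, and counting scheme here are purely illustrative assumptions.

```python
from collections import defaultdict

# Tiny invented parallel corpus (English source, Spanish target).
corpus = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
]

# Count how often each target word appears opposite each source word.
cooc = defaultdict(lambda: defaultdict(int))
src_counts = defaultdict(int)
for src, tgt in corpus:
    for s in src.split():
        src_counts[s] += len(tgt.split())
        for t in tgt.split():
            cooc[s][t] += 1

def translation_prob(s, t):
    """Estimate P(t | s) as the fraction of co-occurrences of s that were with t."""
    return cooc[s][t] / src_counts[s]
```

With this corpus, "casa" co-occurs with "house" more often than any other Spanish word, so it receives the highest estimated probability, which is the core intuition behind learning translations from statistics alone.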

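The example-based approach can be sketched just as simply: store previous translation pairs, then translate a new sentence by retrieving the stored example whose source side it most resembles and reusing that example's translation. The examples and the word-overlap similarity measure below are invented for illustration; real example-based systems match and recombine sub-sentential fragments rather than whole sentences.

```python
# Store of previously translated sentence pairs (invented for illustration).
examples = [
    ("the house is big", "la casa es grande"),
    ("the book is small", "el libro es pequeño"),
]

def overlap(a, b):
    # Similarity = number of words the two sentences share.
    return len(set(a.split()) & set(b.split()))

def translate_by_example(sentence):
    # Pick the stored example with the highest overlap and reuse its target side.
    _, best_tgt = max(examples, key=lambda ex: overlap(sentence, ex[0]))
    return best_tgt
```

Given the input "the big house", the first stored example is the closest match, so its translation is reused wholesale, which shows both the appeal of the method (fluent output lifted from real translations) and its limitation (the output may not match the input exactly).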

One recent system that combines these two approaches is Cunei, an open-source machine translation platform.

The authors of Cunei describe it as follows: “In this work we present Cunei, a hybrid, open-source platform for machine translation that models each example of a phrase-pair at run-time and combines them in dynamic collections. This results in a flexible framework that provides consistent modeling and the use of non-local features.”

Current Resources:

http://www.hutchinsweb.me.uk/PPF-TOC.htm - W. John Hutchins, Machine Translation: Past, Present, Future (Ellis Horwood, 1986)

http://www.hutchinsweb.me.uk/ALPAC-1996.pdf - the ALPAC report (1966)

http://www.mt-archive.info/Nagao-1984.pdf - Makoto Nagao, example-based machine translation (1984)

“Natural Language Understanding” – James Allen

cs356/first_draft.txt · Last modified: 2010/04/25 17:52 by scarl