Dependency Treelet-based Phrasal SMT: Evaluation and Issues in English-Hindi Language Pair

The 6th International Conference on Natural Language Processing (ICON-2008), Pune, India.

In this paper, we present a detailed evaluation of a Dependency Treelet-based Phrasal Statistical Machine Translation (SMT) system for the English-Hindi language pair. The dependency treelet-based phrasal SMT system, which adds source-language syntactic information to a standard phrasal SMT system, has been shown to perform significantly better than surface-based approaches on several well-studied European language pairs. We examine whether this observation holds for languages as diverse as English and Hindi by developing and testing such a system for this language pair for the first time. We make baseline comparisons with a standard phrasal SMT implementation, and further study the effect of two radically different types of corpora, namely technical text and general web text, on the performance of the dependency treelet-based phrasal system. In addition to two standard automated metrics, BLEU and METEOR, the evaluation includes human judgment. We also highlight some language-specific issues that provide insight into the challenges involved in applying standard phrasal SMT techniques to translation between English and an Indic language such as Hindi.