Evaluation of Search Methodologies for DARPA GALE Distillation Engines

Abstract:

Effective search methodologies are needed to meet the human-like accuracy requirements of distillation engines used in the Defense Advanced Research Projects Agency (DARPA) Global Autonomous Language Exploitation (GALE) project. This paper evaluates two search methods for the distillation engine: translating the query into the document's language, and translating the document into the end user's language. The Gospels of the Holy Bible, in English and in Arabic, serve as the test corpus for determining which method is more effective. Fifty English queries are issued against each corpus, and the results are tabulated. Statistical methods are then applied to compare the “hits” from each method with a baseline result set, generated by issuing human-translated versions of the English queries against the Arabic corpus. Analysis shows that searching in the end user’s native language generates more hits than searching in the document’s language, even though it requires translating the entire document. This finding is contrary to the authors’ expectations. Accordingly, possible causes are discussed, and further research is called for to determine whether full document translation is indeed more effective than query translation alone.
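The abstract does not say how a “hit” is counted or which statistical test is used. The sketch below assumes one plausible reading: a hit is a retrieved document that also appears in the baseline result set for the same query. The function name `count_hits`, the query IDs, and the verse-reference document IDs are hypothetical toy values chosen for illustration only. They are not data from this study.

```python
from typing import Dict, List, Set


def count_hits(results: Dict[str, List[str]], baseline: Dict[str, Set[str]]) -> Dict[str, int]:
    """Hypothetical helper: for each query ID, count retrieved documents
    that also appear in that query's baseline result set."""
    return {qid: len(set(docs) & baseline.get(qid, set())) for qid, docs in results.items()}


# Toy baseline: results of human-translated queries run against the Arabic corpus.
baseline = {"q1": {"Matt 5:3", "Luke 6:20"}, "q2": {"John 3:16"}}

# Toy result sets for the two methods being compared.
query_translation = {"q1": ["Matt 5:3"], "q2": ["John 3:16", "Mark 1:1"]}
document_translation = {"q1": ["Matt 5:3", "Luke 6:20"], "q2": ["John 3:16"]}

qt = count_hits(query_translation, baseline)
dt = count_hits(document_translation, baseline)

# Per-query hit counts can then be compared pairwise, query by query.
print(qt, dt)
print(sum(qt.values()), sum(dt.values()))
```

Because both methods answer the same fifty queries, the per-query counts form paired observations. A paired test is therefore one natural choice for the comparison, although the abstract does not state which test the authors applied.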
