| Characterizing Deletion Transformations across Dialects using a Sophisticated Tying Mechanism |
30 Mar 2011 |
5 pages |
| Authors:
Nancy F Chen; Wade Shen; Joseph P Campbell; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We propose a sophisticated tying mechanism for modeling deletion transformations between dialects. We empirically show that the proposed tying mechanism reduces deletion errors by 33% when compared to a baseline system using a standard tying mechanism. Statistical tests show that the proposed and baseline models make statistically different errors, thus suggesting that they are complementary systems in dialect recognition tasks. Pronunciation rules learned by our proposed system quantify the occurrence ... |
|
| The MIT-LL/AFRL IWSLT-2010 MT System |
09 Nov 2010 |
|
| Authors:
Wade Shen; Tim Anderson; Ray Slyh; A R Aminzadeh; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2008 system, and experiments we ran during the IWSLT-2010 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) Turkish morphological processing and translation, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation. |
|
| Assessing the Speaker Recognition Performance of Naive Listeners Using Mechanical Turk |
25 Oct 2010 |
|
| Authors:
Wade Shen; Joseph Campbell; Derek Straub; Reva Schwartz; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | In this paper we attempt to quantify the ability of naive listeners to perform speaker recognition in the context of the NIST evaluation task. We describe our protocol: a series of listening experiments using large numbers of naive listeners (432) on Amazon's Mechanical Turk that attempt to measure the ability of the average human listener to perform speaker recognition. Our goal was to compare the performance of the average human ... |
|
| A Comparison of Query-by-Example Methods for Spoken Term Detection |
Sep 2009 |
|
| Authors:
Wade Shen; Christopher M White; Timothy J Hazen; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | In this paper we examine an alternative interface for phonetic search, namely query-by-example, that avoids OOV issues associated with both standard word-based and phonetic search methods. We develop three methods that compare query lattices derived from example audio against a standard n-gram-based phonetic index and we analyze factors affecting the performance of these systems. We show that the best systems under this paradigm are able to achieve 77% precision when ... |
|
| Low-Resource Speech Translation of Urdu to English Using Semi-Supervised Part-of-Speech Tagging and Transliteration |
Jan 2008 |
|
| Authors:
A R Aminzadeh; Wade Shen; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an ... |
|
| ILR-Based MT Comprehension Test with Multi-Level Questions |
Apr 2007 |
|
| Authors:
Douglas Jones; Martha Herzog; Hussny Ibrahim; Arvind Jairam; Wade Shen; Edward Gibson; Michael Emonts; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We present results from a new Interagency Language Roundtable (ILR) based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the MT condition, for both text and audio data types, with high quality human reference Gold Standard (GS) translations. Overall, subjects ... |
|
| Construction of a Phonotactic Dialect Corpus using Semiautomatic Annotation |
Jan 2007 |
|
| Authors:
Reva Schwartz; Wade Shen; Joseph Campbell; Shelley Paget; Julie Vonwiller; Dominique Estival; Christopher Cieri; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | In this paper, we discuss rapid, semiautomatic annotation techniques of detailed phonological phenomena for large corpora. We describe the use of these techniques for the development of a corpus of American English dialects. The resulting annotations and corpora will support both large scale linguistic dialect analysis and automatic dialect identification. We delineate the semiautomatic annotation process that we are currently employing and a set of experiments we ran to validate ... |
|
| Experimental Facility for Measuring the Impact of Environmental Noise and Speaker Variation on Speech-to-Speech Translation Devices |
Dec 2006 |
|
| Authors:
Douglas A Jones; Arvind Jairam; Wade Shen; Paul Gatewood; John Tardelli; Michael Emonts; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We describe the construction and use of a laboratory facility for testing the performance of speech-to-speech translation devices. Approximately 1500 English phrases from various military domains were recorded as spoken by each of 30 male and 12 female English speakers with variation in speaker accent, for a total of approximately 60,000 phrases available for experimentation. We describe an initial experiment using the facility which shows the impact of environmental noise ... |
|
| The MIT-LL/AFRL IWSLT-2006 MT System |
Nov 2006 |
|
| Authors:
Wade Shen; Brian Delaney; Tim Anderson; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | The MIT-LL/AFRL MT system is a statistical phrase-based translation system that implements many modern SMT training and decoding techniques. Our system was designed with the long-term goal of dealing with corrupted ASR input and limited amounts of training data for speech-to-speech MT applications. This paper will discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2005 system, and experiments with manual and ASR transcription data that were run ... |
|
| Toward an Interagency Language Roundtable Based Assessment of Speech-to-Speech Translation Capabilities |
Aug 2006 |
9 pages |
| Authors:
Douglas Jones; Jurgen Sottung; Sargon Jabri; Neil Granoien; James Dirgin; Sabine Atwell; Michael Emonts; Timothy Anderson; Martha Herzog; Brian Delaney; Wade Shen; Timothy Hunter; AIR FORCE RESEARCH LAB WRIGHT-PATTERSON AFB OH
|
 | We present observations from three exercises designed to map the effective listening and speaking skills of an operator of a speech-to-speech translation system (S2S) to the Interagency Language Roundtable (ILR) scale. Such a mapping is nontrivial, but will be useful for government and military decision makers in managing expectations of S2S technology. We observed domain-dependent S2S capabilities in the ILR range of Level 0+ to Level 1, and interactive text-based ... |
|
| Measuring Translation Quality by Testing English Speakers with a New Defense Language Proficiency Test for Arabic |
May 2005 |
|
| Authors:
Douglas Jones; Wade Shen; Neil Granoien; Martha Herzog; Clifford Weinstein; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We present results from an experiment in which educated English-native speakers answered questions from a machine translated version of a standardized Arabic language test. We compare the machine translation (MT) results with professional reference translations as a baseline for the purpose of determining the level of Arabic reading comprehension that current machine translation technology enables an English speaker to achieve. Furthermore, we explore the relationship between the current, broadly accepted ... |
|
| Measuring Human Readability of Machine Generated Text: Three Case Studies in Speech Recognition and Machine Translation |
Mar 2005 |
|
| Authors:
Douglas Jones; Edward Gibson; Wade Shen; Neil Granoien; Martha Herzog; Douglas Reynolds; Clifford Weinstein; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | We present highlights from three experiments that test the readability of current state-of-the-art system output from (1) an automated English speech-to-text system, (2) a text-based Arabic-to-English machine translation system, and (3) an audio-based Arabic-to-English MT process. We measure readability in terms of reaction time and passage comprehension in each case, applying standard psycholinguistic testing procedures and a modified version of the standard Defense Language Proficiency Test for Arabic called ... |
|
| The MIT-LL/AFRL MT System |
Jan 2005 |
|
| Authors:
Wade Shen; Brian Delaney; Tim Anderson; MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
|
 | The MIT-LL/AFRL MT system is a statistical phrase-based translation system that implements many modern SMT training and decoding techniques. Our system was designed with the long-term goal of dealing with corrupted ASR input for speech-to-speech MT applications. This paper will discuss the architecture of the MIT-LL/AFRL MT system, and experiments with manual and ASR transcription data that were run as part of the IWSLT-2005 Chinese-to-English evaluation campaign. |
|
| The Effect of Text Difficulty on Machine Translation Performance -- A Pilot Study with ILR-Rated Texts in Spanish, Farsi, Arabic, Russian and Korean |
May 2004 |
5 pages |
| Authors:
Ray Clifford; Neil Granoien; Douglas Jones; Wade Shen; Clifford Weinstein; DEFENSE LANGUAGE INST MONTEREY CA
|
 | We report on initial experiments that examine the relationship between automated measures of machine translation performance and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty that has been in standard use for U.S. government language training and assessment for the past several decades. The main question we ask is how technology-oriented measures of MT performance relate to the ILR difficulty levels, where we understand that a linguist with ILR ... |
|