Storming Media: Pentagon Reports and Documents

Portable Language-Independent Adaptive Translation from OCR. Phase 1

Authors: Prem Natarajan; BBN TECHNOLOGIES CAMBRIDGE MA
 
Abstract: The objective of MADCAT is to produce a robust, highly accurate transcription engine that ingests documents of multiple types and produces English transcriptions of their content. To address the technical challenges implicit in that goal, the BBN-led team proposed a system that integrates five major operations: (1) pre-processing and image enhancement, (2) page segmentation, (3) text recognition, and (4) metadata extraction. In Phase 1 of the MADCAT effort, we made significant improvements in all of these areas and developed an end-to-end system for processing the Phase 1 evaluation data. The evaluation system exceeded the Phase 1 program goal of 40% accuracy on 70% of the documents. Below, we summarize the work performed by the BBN-led team in Phase 1, highlighting our accomplishments in each technical area and indicating the performers in that area.

Limitations: APPROVED FOR PUBLIC RELEASE
Description: Final rept. 21 Nov 2007-30 Apr 2009
Pages: 26
Report Date: Apr-2009
Contract Number: HR0011-08-C-0004 HR001108C0004
Report Number: A152005
Keywords relating to this report:
*IMAGE PROCESSING
*MACHINE TRANSLATION
*OPTICAL CHARACTER RECOGNITION
*TEXT PROCESSING
ARABIC LANGUAGE
DOCUMENTS
FEATURE EXTRACTION
HANDWRITING
MARKOV PROCESSES
METADATA
MODELS
WORD RECOGNITION