Abstract: The objective of MADCAT is to produce a robust, highly accurate transcription engine that ingests documents of multiple types and produces English transcriptions of their content. To address the technical challenges implicit in that goal, the BBN-led team proposed a system that integrates four major operations: (1) pre-processing and image enhancement, (2) page segmentation, (3) text recognition, and (4) metadata extraction. In Phase 1 of the MADCAT effort, we made significant improvements in all of these areas. In addition, we developed an end-to-end system for processing the Phase 1 evaluation data. The evaluation system exceeded the Phase 1 program goal of 40% accuracy on 70% of the documents. Below, we summarize the work performed by the BBN-led team in Phase 1 of the MADCAT effort, highlighting our accomplishments in each technical area and identifying the performers in that area.
Limitations: APPROVED FOR PUBLIC RELEASE
Description: Final rept. 21 Nov 2007-30 Apr 2009
Pages: 26
Report Date: Apr-2009
Contract Number: HR0011-08-C-0004
Report Number: A152005