
HowtogetaChineseName(Entity): Segmentation and Combination Issues

Authors: Hongyan Jing; Radu Florian; Xiaoqiang Luo; Tong Zhang; Abraham Ittycheriah
Affiliation: IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Abstract:
When building a Chinese named entity recognition system, one must deal with certain language-specific issues, such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail the advantages and disadvantages of each model, identify problems in segmentation, and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique for improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.
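
Note on the metric: the "relative reduction in F-measure error" quoted in the abstract is usually read as the drop in (1 - F) divided by the baseline (1 - F). The short Python sketch below illustrates that arithmetic only. The function name and the two example scores are hypothetical and are not taken from the paper, which reports its own figures.

```python
def relative_error_reduction(f_baseline: float, f_combined: float) -> float:
    """Relative reduction in F-measure error, where error = 1 - F."""
    err_baseline = 1.0 - f_baseline
    err_combined = 1.0 - f_combined
    if err_baseline <= 0.0:
        raise ValueError("baseline F-measure must be below 1.0")
    return (err_baseline - err_combined) / err_baseline


if __name__ == "__main__":
    # Hypothetical scores chosen for illustration; NOT results from the paper.
    baseline_f = 0.80   # assumed F-measure of a single classifier
    combined_f = 0.82   # assumed F-measure after combining classifiers
    reduction = relative_error_reduction(baseline_f, combined_f)
    print(f"relative error reduction: {reduction:.1%}")
```

Under these assumed scores, a gain of two absolute F-measure points corresponds to roughly a 10% relative reduction in error, which shows why a relative figure can look larger than the absolute change in F-measure.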

Limitations: APPROVED FOR PUBLIC RELEASE
Description: Conference paper
Pages: 9
Report Date: JUL 2003
Contract Number: N660019928916
Report Number: A019754
Keywords relating to this report:
ALGORITHMS
CHINESE LANGUAGE
CLASSIFICATION
MARKOV PROCESSES
MODELS
NATURAL LANGUAGE
RECOGNITION
SYMPOSIA
WORDS(LANGUAGE)