Storming Media: Pentagon Reports and Documents

Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts

Authors: Jana Diesner; CARNEGIE-MELLON UNIV PITTSBURGH PA INST OF SOFTWARE RESEARCH INTERNAT
Abstract:
This thesis is motivated by the need for scalable and reliable methods and technologies that support the construction of network data from information contained in text data. The resulting data can ultimately be used to answer substantive questions about socio-technical networks. One main limitation of this approach is that validating the resulting network data can be difficult or even infeasible, e.g., for covert, historical, or large-scale networks. This thesis addresses this problem by identifying how the coding choices that must be made when extracting network data from text affect the structure of the resulting networks and the results of network analyses. The findings suggest that conducting reference resolution on the text data can alter the identity and weight of 76% of the nodes and 23% of the links, and can cause major changes in the values of commonly used network metrics. Also, completely different sets of key nodes are found when reference resolution is applied to the text data prior to conducting relation extraction. Based on the outcome of these experiments, I recommend strategies for avoiding or mitigating the outlined issues in practical applications.
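The mechanism behind the reported effect of reference resolution can be illustrated with a minimal sketch. It assumes a simple sentence-level co-occurrence rule for linking entities and uses a hand-written alias table as a stand-in for a reference-resolution step. The toy sentences, the `ALIASES` table, the `build_network` and `weighted_degree` helpers, and the use of weighted degree as the "key node" metric are all hypothetical illustrations, not the data, tools, or method used in the thesis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each inner list holds the entity mentions
# extracted from one sentence (illustrative only, not data from the thesis).
SENTENCES = [
    ["John Smith", "Acme Corp"],
    ["J. Smith", "Mary Jones"],
    ["Smith", "Bob Lee"],
    ["Mr. Smith", "Ann Kim"],
    ["Mary Jones", "Bob Lee"],
    ["Mary Jones", "Ann Kim"],
]

# Hypothetical alias table standing in for a reference-resolution step:
# it maps surface variants of a name to one canonical node label.
ALIASES = {"J. Smith": "John Smith", "Smith": "John Smith", "Mr. Smith": "John Smith"}


def build_network(sentences, aliases=None):
    """Link every pair of entities that co-occur in a sentence.

    Returns a Counter mapping sorted (node, node) pairs to link weights,
    where the weight is the number of sentences containing that pair.
    """
    aliases = aliases or {}
    links = Counter()
    for mentions in sentences:
        # Replace each mention by its canonical form (unchanged if no alias is
        # known), then deduplicate so a node is never linked to itself.
        nodes = sorted({aliases.get(m, m) for m in mentions})
        for a, b in combinations(nodes, 2):
            links[(a, b)] += 1
    return links


def weighted_degree(links):
    """Sum the weights of all links incident to each node."""
    degree = Counter()
    for (a, b), weight in links.items():
        degree[a] += weight
        degree[b] += weight
    return degree


raw = build_network(SENTENCES)
resolved = build_network(SENTENCES, ALIASES)
for label, net in (("raw", raw), ("resolved", resolved)):
    degree = weighted_degree(net)
    top, score = degree.most_common(1)[0]
    print(f"{label}: {len(degree)} nodes, {len(net)} links, top node {top!r} ({score})")
# prints:
# raw: 8 nodes, 6 links, top node 'Mary Jones' (3)
# resolved: 5 nodes, 6 links, top node 'John Smith' (4)
```

Without resolution, the four surface forms of one person are four weakly connected nodes, so a different entity ranks highest; merging them changes both the node set and which node looks most central, which is the kind of shift the abstract describes at scale.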

Limitations: APPROVED FOR PUBLIC RELEASE
Description: Doctoral thesis
Pages: 317
Report Date: Jan 2012
Contract Number: W91WAW07C0063
Report Number: A079855
Keywords relating to this report:
INTERPERSONAL RELATIONS
LEARNING MACHINES
METADATA
NATURAL LANGUAGE
NETWORKS
SOCIAL SCIENCES
TEXT PROCESSING
THESES