Abstract
Research in generic unsupervised learning of language structure, applied to the Search for Extra-Terrestrial Intelligence (SETI) and the decipherment of unknown languages, has sought to build up a generic picture of the lexical and structural patterns characteristic of natural language. As part of this toolkit, a generic system is required to facilitate the analysis of behavioural trends amongst selected pairs of terminals and non-terminals alike, regardless of the target natural language. Such a tool may also be useful in other areas, such as lexico-grammatical analysis or the tagging of corpora. Data-oriented approaches to corpus annotation use statistical n-grams and/or constraint-based models; n-grams or constraints with wider windows can improve error rates by examining the topology of the annotation-combination space. We present a visualisation tool to help linguists find "useful" PoS-tag combinations, as well as cohesion between linguistic annotations at other levels, and suggest some possible applications.
Original language | English |
---|---|
Title of host publication | Proceedings 5th International conference on Information Visualisation |
Subtitle of host publication | 25-27 July 2001, London, England |
Editors | E. Banissi, F. Khosrowshahi, M. Sarfraz, A. Ursyn |
Place of Publication | Los Alamitos, CA |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 297-302 |
Number of pages | 6 |
Publication status | Published - 7 Aug 2002 |
Event | Fifth International Conference on Information Visualisation - London, United Kingdom; Duration: 25 Jul 2001 → 27 Jul 2001 |
Conference
Conference | Fifth International Conference on Information Visualisation |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 25/07/01 → 27/07/01 |