Using comparable corpora for discovering universals in surface structure

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many aspects of linguistic research, whatever their aims and objectives, rely on cross-language analysis for their results. In particular, any research into generic attributes, universals, or inter-language comparisons requires samples of languages in a readily accessible format, which are clean and of adequate size for statistical analysis. Implicit in such understanding and detection of 'universal' attributes of language is the need to study and analyse a representative set of the human language chorus. So, as an ongoing process during recent years, many raw text samples, in electronic format, have been collected to create a suitably diverse repository. Predominantly, the texts obtained are freely available on a variety of sites over the Internet and cover all of the major language groups. These comprise Austro-Asiatic, Amerindian, Sino-Tibetan, Indo-European (Indo-Iranian, Hellenic, Celtic, Italic, Germanic and Slavic), Austronesian, Altaic, Uralic, Niger-Congo and independents, and currently total over fifty language scripts.
Original language: English
Title of host publication: Proceedings of the workshop on the amazing utility of parallel and comparable corpora
Subtitle of host publication: Fourth International Conference on Language Resources and Evaluation (LREC), 2004
Editors: Nicoletta Calzolari
Place of Publication: Paris
Publisher: European Language Resources Association
Pages: 50-53
Number of pages: 4
Publication status: Published - 25 May 2004
Event: 4th International Conference on Language Resources and Evaluation - Centro Cultural de Belem, Lisbon, Portugal
Duration: 24 May 2004 to 28 May 2004
http://www.lrec-conf.org/lrec2004/index.php

Conference

Conference: 4th International Conference on Language Resources and Evaluation
Country/Territory: Portugal
City: Lisbon
Period: 24/05/04 to 28/05/04