Automatic classification of human translation and machine translation: a study from the perspective of lexical diversity

Yingxue Fu, Mark Jan Nederhof

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation differ in a systematic way. The classification accuracy for machine translation is much higher than that for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation follows patterns that are independent of those in human translation, automatic metrics that measure the deviation of machine translation from human translation may conflate difference with quality. The results of our experiment with two different types of automatic metrics correlate with the results of the classification task. Therefore, we suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.
Original language: English
Title of host publication: Proceedings for the First Workshop on Modelling Translation
Subtitle of host publication: Translatology in the Digital Age
Editors: Yuri Bizzoni, Elke Teich, Cristina España-Bonet, Josef van Genabith
Publisher: Linköping University Electronic Press
Pages: 91–99
Publication status: Published 31 May 2021
Event: Workshop on Modelling Translation: Translatology in the Digital Age, Online City, Iceland
Duration: 31 May 2021 to 2 Jun 2021
Conference number: 1
https://easychair.org/cfp/MoTra21

Publication series

Name: NEALT Proceedings Series
Publisher: Linköping University Electronic Press
ISSN (Print): 1650-3686
ISSN (Electronic): 1650-3740

Workshop

Workshop: Workshop on Modelling Translation
Abbreviated title: MoTra21
Country/Territory: Iceland
City: Online City
Period: 31/05/21 to 2/06/21

