Abstract
This article explores multilingual sentiment analysis of short texts using three commercial decoder-only Large Language Models (“LLMs”): OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini. The training data for these models is approximately 90% English, and it remains an open question whether it is better to evaluate text data in its original language or to translate it into English first. We build on previous research on sentiment analysis of multilingual short texts, such as those found on social media, using 1000 short text samples in seven languages (English, Spanish, French, Portuguese, Arabic, Japanese, and Korean) translated into English using Google Translate. We processed these samples with decoder-only LLMs and compared their results with those from other methods (encoder-only LLMs, RNNs, lexicons). We found that decoder-only LLMs achieved the highest accuracy of all sentiment analysis methods when working with the original language data. The only exception was the French data, where an RNN was the most accurate. Among the three decoder-only LLMs, ChatGPT had the highest accuracy in four of the seven languages and Claude in two, while Gemini ranked second in six of the seven languages.
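The comparison described in the abstract comes down to computing accuracy per method and per language, then identifying the top-scoring method for each language. A minimal sketch of that bookkeeping follows; the record layout, the function names, and the toy labels are illustrative assumptions and are not taken from the study or its dataset.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {(method, language): accuracy}.

    `records` is an iterable of (method, language, gold_label, predicted_label)
    tuples. This layout is an assumption made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for method, language, gold, predicted in records:
        key = (method, language)
        total[key] += 1
        correct[key] += int(gold == predicted)
    return {key: correct[key] / total[key] for key in total}


def best_method_per_language(accuracies):
    """Return {language: (method, accuracy)} for the top-scoring method."""
    best = {}
    for (method, language), acc in accuracies.items():
        if language not in best or acc > best[language][1]:
            best[language] = (method, acc)
    return best


# Toy, made-up records for illustration only (not data from the study).
records = [
    ("model_a", "fr", "positive", "positive"),
    ("model_a", "fr", "negative", "positive"),
    ("model_b", "fr", "positive", "positive"),
    ("model_b", "fr", "negative", "negative"),
    ("model_a", "ja", "neutral", "neutral"),
    ("model_b", "ja", "neutral", "negative"),
]

accuracies = accuracy_by_language(records)
best = best_method_per_language(accuracies)
```

Running the same two functions once on original-language predictions and once on predictions for the translated text would give the two accuracy tables needed to compare the approaches.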
| Original language | English |
|---|---|
| Pages (from-to) | 319-331 |
| Number of pages | 13 |
| Journal | Journal of Social Media Research |
| Volume | 2 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 4 Dec 2025 |
Keywords
- Multilingual analysis
- Sentiment analysis
- Social media
- Artificial intelligence
- Large language models
Datasets
- Comparison of commercial decoder-only large language models for multilingual sentiment analysis of short text (dataset)
  Burns, J. (Creator) & Kelsey, T. (Creator), GitHub, 2025
  https://github.com/jb370/Comparison-of-Commercial