Abstract
Language Models (LMs) have shown significant advancements in various Natural Language Processing (NLP) tasks. However, recent studies indicate that LMs are particularly susceptible to malicious backdoor attacks, where attackers manipulate the models to exhibit specific behaviors when they encounter particular triggers. While existing research has focused on backdoor attacks against English LMs, Chinese LMs remain largely unexplored. Moreover, existing backdoor attacks against Chinese LMs exhibit limited stealthiness. In this paper, we investigate the high detectability of current backdoor attacks against Chinese LMs and propose a more stealthy backdoor attack method based on syntactic paraphrasing. Specifically, we leverage large language models (LLMs) to construct a syntactic paraphrasing mechanism that transforms benign inputs into poisoned samples with predefined syntactic structures. Subsequently, we exploit the syntactic structures of these poisoned samples as triggers to create more stealthy and robust backdoor attacks across various attack strategies. Extensive experiments conducted on three major NLP tasks with various Chinese PLMs and LLMs demonstrate that our method can achieve comparable attack performance (almost 100% success rate). Additionally, the poisoned samples generated by our method show lower perplexity and fewer grammatical errors compared to traditional character-level backdoor attacks. Furthermore, our method exhibits strong resistance against two state-of-the-art backdoor defense mechanisms.
| Original language | English |
|---|---|
| Article number | 103376 |
| Pages (from-to) | 1-13 |
| Number of pages | 13 |
| Journal | Information Fusion |
| Volume | 124 |
| Early online date | 12 Jun 2025 |
| DOIs | |
| Publication status | Published - Dec 2025 |
Keywords
- Backdoor attacks
- Generative artificial intelligence
- Large language models
- Model security
- Syntactic structure
- Synthetic data generation