Robustness of Pre-trained Language Models against Adversarial Attacks

Authors

  • Jānis Bērziņš, Tilde, Riga, Latvia
  • Elīna Kalniņa, Tilde, Riga, Latvia

Abstract

Pre-trained language models, such as BERT, GPT, and their derivatives, have revolutionized natural language processing (NLP) tasks. Despite their success, these models are vulnerable to adversarial attacks, which pose significant threats to their robustness and reliability. This paper explores the robustness of pre-trained language models against various types of adversarial attacks, examining both the nature of these attacks and the defenses that can be employed. We review existing literature, analyze the strengths and weaknesses of current approaches, and propose directions for future research to enhance the robustness of these models.
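To make the notion of an adversarial attack concrete, the sketch below shows a toy word-substitution attack against an off-the-shelf sentiment classifier: words are greedily swapped for near-synonyms until the predicted label flips. This is an illustrative assumption, not the method studied in the paper; the library (Hugging Face transformers), the `SYNONYMS` table, and the helper names are chosen here for demonstration only.

```python
# Minimal sketch of a word-substitution adversarial attack on a sentiment
# classifier. The synonym table and greedy search are illustrative
# assumptions, not the attack analyzed in this paper.
from transformers import pipeline

# Generic pre-trained sentiment model (downloaded on first use).
classifier = pipeline("sentiment-analysis")

# Hand-crafted substitution candidates; real attacks generate these with
# word embeddings or a masked language model.
SYNONYMS = {
    "great": ["decent", "passable"],
    "love": ["tolerate"],
    "excellent": ["acceptable"],
}

def predict(text):
    result = classifier(text)[0]
    return result["label"], result["score"]

def word_substitution_attack(text):
    """Greedily swap words for near-synonyms until the predicted label flips."""
    original_label, _ = predict(text)
    words = text.split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word.lower(), []):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            new_label, score = predict(perturbed)
            if new_label != original_label:
                return perturbed, new_label, score
    return None  # no label flip found with this tiny candidate set

if __name__ == "__main__":
    sentence = "I love this phone, the camera is great and the screen is excellent"
    print(predict(sentence))
    print(word_substitution_attack(sentence))
```

Even this naive search often degrades or flips predictions while keeping the sentence semantically close to the original, which is the core robustness problem the paper surveys.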

Published

2024-08-07