Robustness of Pre-trained Language Models against Adversarial Attacks
Abstract
Pre-trained language models such as BERT, GPT, and their derivatives have revolutionized natural language processing (NLP). Despite their success, these models remain vulnerable to adversarial attacks: small, often imperceptible input perturbations crafted to induce incorrect predictions, which threaten their reliability in deployment. This paper examines the robustness of pre-trained language models against such attacks, surveying both the principal attack types and the defenses that can be employed against them. We review the existing literature, analyze the strengths and weaknesses of current approaches, and propose directions for future research to improve model robustness.
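To make the threat model concrete, below is a minimal, self-contained sketch of a word-level (synonym-substitution) adversarial attack of the kind surveyed here. The toy keyword classifier and synonym table are illustrative assumptions, not artifacts of this paper; attacks on real pre-trained models such as BERT follow the same search loop but query the target model for each candidate swap.

```python
# Minimal sketch of a word-level adversarial attack via synonym substitution.
# ASSUMPTIONS: the toy classifier and synonym table below are hypothetical
# stand-ins for a pre-trained model and an embedding-based synonym set.

# Near-synonyms the toy classifier does not recognize.
SYNONYMS = {
    "great": ["superb", "stellar"],
    "awful": ["dreadful", "atrocious"],
}

POSITIVE = {"great", "good"}
NEGATIVE = {"awful", "bad"}


def classify(text: str) -> str:
    """Toy sentiment model: counts known positive vs. negative keywords."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def greedy_synonym_attack(text: str) -> str | None:
    """Try single-word synonym swaps until the predicted label flips."""
    original = classify(text)
    words = text.lower().split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word, []):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1 :])
            if classify(perturbed) != original:
                return perturbed  # adversarial example found
    return None  # no single-swap perturbation flips the label


if __name__ == "__main__":
    sentence = "the film was great"
    print(classify(sentence))               # -> positive
    print(greedy_synonym_attack(sentence))  # -> "the film was superb" (now negative)
```

The greedy single-swap search mirrors practical word-level attacks, which additionally constrain candidates by embedding similarity or language-model fluency so that perturbed inputs remain semantically close to the original.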
License
Copyright (c) 2024 MZ Computing Journal
This work is licensed under a Creative Commons Attribution 4.0 International License.