Improving Zero-Shot Transfer Learning in Vision-Language Models via Multimodal Contrastive Alignment

Authors

  • Luka Radoslav, Department of Information Systems, University of Andorra, Andorra

Abstract

Zero-shot transfer learning aims to extend the capabilities of vision-language models to novel tasks without task-specific training. This paper proposes an approach that improves zero-shot transfer by leveraging multimodal contrastive alignment. By strengthening the alignment between the visual and textual modalities, the proposed method improves the model's ability to generalize to unseen tasks, yielding better performance and robustness across a range of applications.
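
A common way to realize image-text contrastive alignment of the kind the abstract describes is a CLIP-style symmetric InfoNCE objective over paired image and text embeddings. The sketch below is an illustrative assumption, not the paper's actual method; the function name, embedding shapes, and temperature value are hypothetical.

```python
# Hypothetical sketch of a symmetric image-text contrastive (InfoNCE) loss,
# in the spirit of CLIP-style multimodal alignment. Names and the temperature
# value are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) outputs of the two encoders.
    Pairs sharing a batch index are positives; all others are negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

In this formulation, tightening the alignment between matched image-text pairs while pushing apart mismatched pairs is what lets text prompts for unseen classes act as zero-shot classifiers at inference time.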

Published

2024-08-13