Unified Multimodal Transformers: Improving Vision-Language Models with Knowledge-Guided Attention Mechanisms