Interpretable Multimodal Transformers: Bridging the Gap Between Visual and Textual Representations