Search Machine Learning Papers

Search Results for "computer vision"

Found 20 papers

cs.LG cs.AI cs.CL cs.CV I.5.4

2020-04-10

Multimodal Categorization of Crisis Events in Social Media

Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, Alejandro Jaimes

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around ...

cs.CV cs.AI cs.CL cs.LG

2022-03-22

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigat...

cs.CV cs.AI cs.CL cs.LG

2022-04-20

K-LITE: Learning Transferable Visual Models with External Knowledge

Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

The new generation of state-of-the-art computer vision systems are trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervisio...

cs.AI cs.CL cs.CV cs.LG

2024-08-20

Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification

Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson

Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. The...

cs.CV cs.AI cs.CL cs.LG

2024-10-04

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond

Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local f...

cs.CV cs.AI cs.CL cs.LG

2023-10-03

Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation

Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour

eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black box ...

cs.CL cs.AI cs.CV cs.LG

2022-06-18

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive....

cs.CL cs.AI cs.CV cs.LG

2025-04-28

VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning

Run Luo, Renke Shan, Longze Chen, Ziqiang Liu, Lu Wang, Min Yang, Xiaobo Xia

Large Vision-Language Models (LVLMs) are pivotal for real-world AI tasks like embodied intelligence due to their strong vision-language reasoning abilities. However, current LVLMs process entire image...

cs.CV cs.AI cs.CL cs.LG

2025-06-05

Coordinated Robustness Evaluation Framework for Vision-Language Models

Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

Vision-language models, which integrate computer vision and natural language processing capabilities, have demonstrated significant advancements in tasks such as image captioning and visual question a...

cs.CV cs.AI cs.CL cs.LG

2023-07-06

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impract...

cs.CL cs.AI cs.CV cs.LG

2025-04-27

VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?

Mohamed Gado, Towhid Taliee, Muhammad Memon, Dmitry Ignatov, Radu Timofte

Visual storytelling is an interdisciplinary field combining computer vision and natural language processing to generate cohesive narratives from sequences of images. This paper presents a novel approa...

cs.CV cs.AI cs.CL cs.LG

2024-11-07

Analyzing The Language of Visual Tokens

David M. Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell

With the introduction of transformer-based models for vision and language tasks, such as LLaVA and Chameleon, there has been renewed interest in the discrete tokenized representation of images. These ...

cs.LG cs.AI cs.CL cs.CV

2025-03-13

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu

Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better per...

cs.CV cs.AI cs.CL cs.LG

2022-01-26

Natural Language Descriptions of Deep Visual Features

Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respon...

cs.CV cs.AI cs.CL cs.LG

2024-10-08

Unsupervised Model Diagnosis

Yinong Oliver Wang, Eileen Li, Jinqi Luo, Zhaoning Wang, Fernando De la Torre

Ensuring model explainability and robustness is essential for reliable deployment of deep vision systems. Current methods for evaluating robustness rely on collecting and annotating extensive test set...

cs.LG cs.AI cs.CL cs.CV

2023-02-28

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto

We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings o...

cs.LG cs.AI cs.CL cs.CV

2021-11-29

A General Framework for Defending Against Backdoor Attacks via Influence Graph

Xiaofei Sun, Jiwei Li, Xiaoya Li, Ziyao Wang, Tianwei Zhang, Han Qiu, Fei Wu, Chun Fan

In this work, we propose a new and general framework to defend against backdoor attacks, inspired by the fact that attack triggers usually follow a specific type of attacking pattern, and the...

cs.CV cs.AI cs.CL cs.LG

2024-11-20

Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training

Ameera Bawazir, Kebin Wu, Wenbin Li

Recent advancements in vision-language pre-training via contrastive learning have significantly improved performance across computer vision tasks. However, in the medical domain, obtaining multimodal ...

cs.CV cs.AI cs.CL cs.LG

2023-07-06

Vision Language Transformers: A Survey

Clayton Fields, Casey Kennington

Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted t...

cs.CL cs.AI cs.CV cs.LG

2022-03-15

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral

Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inpu...
