Search Machine Learning Papers

Search Results for "computer vision"

Found 20 papers

cs.LG cs.AI cs.CL cs.CV I.5.4
2020-04-10
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, Alejandro Jaimes

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around ...

cs.CV cs.AI cs.CL cs.LG
2022-03-22
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigat...

cs.CV cs.AI cs.CL cs.LG
2022-04-20
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

The new generation of state-of-the-art computer vision systems are trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervisio...

cs.AI cs.CL cs.CV cs.LG
2024-08-20
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson

Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. The...

cs.CV cs.AI cs.CL cs.LG
2024-10-04
Mamba in Vision: A Comprehensive Survey of Techniques and Applications
Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond

Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local f...

cs.CV cs.AI cs.CL cs.LG
2023-10-03
Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation
Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour

eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black box ...

cs.CL cs.AI cs.CV cs.LG
2022-06-18
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive....

cs.CL cs.AI cs.CV cs.LG
2025-04-28
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo, Renke Shan, Longze Chen, Ziqiang Liu, Lu Wang, Min Yang, Xiaobo Xia

Large Vision-Language Models (LVLMs) are pivotal for real-world AI tasks like embodied intelligence due to their strong vision-language reasoning abilities. However, current LVLMs process entire image...

cs.CV cs.AI cs.CL cs.LG
2025-06-05
Coordinated Robustness Evaluation Framework for Vision-Language Models
Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

Vision-language models, which integrate computer vision and natural language processing capabilities, have demonstrated significant advancements in tasks such as image captioning and visual question a...

cs.CV cs.AI cs.CL cs.LG
2023-07-06
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impract...

cs.CL cs.AI cs.CV cs.LG
2025-04-27
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado, Towhid Taliee, Muhammad Memon, Dmitry Ignatov, Radu Timofte

Visual storytelling is an interdisciplinary field combining computer vision and natural language processing to generate cohesive narratives from sequences of images. This paper presents a novel approa...

cs.CV cs.AI cs.CL cs.LG
2024-11-07
Analyzing The Language of Visual Tokens
David M. Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell

With the introduction of transformer-based models for vision and language tasks, such as LLaVA and Chameleon, there has been renewed interest in the discrete tokenized representation of images. These ...

cs.LG cs.AI cs.CL cs.CV
2025-03-13
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu

Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better per...

cs.CV cs.AI cs.CL cs.LG
2022-01-26
Natural Language Descriptions of Deep Visual Features
Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba, Jacob Andreas

Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respon...

cs.CV cs.AI cs.CL cs.LG
2024-10-08
Unsupervised Model Diagnosis
Yinong Oliver Wang, Eileen Li, Jinqi Luo, Zhaoning Wang, Fernando De la Torre

Ensuring model explainability and robustness is essential for reliable deployment of deep vision systems. Current methods for evaluating robustness rely on collecting and annotating extensive test set...

cs.LG cs.AI cs.CL cs.CV
2023-02-28
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto

We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings o...

cs.LG cs.AI cs.CL cs.CV
2021-11-29
A General Framework for Defending Against Backdoor Attacks via Influence Graph
Xiaofei Sun, Jiwei Li, Xiaoya Li, Ziyao Wang, Tianwei Zhang, Han Qiu, Fei Wu, Chun Fan

In this work, we propose a new and general framework to defend against backdoor attacks, inspired by the fact that attack triggers usually follow a specific type of attacking pattern, and the...

cs.CV cs.AI cs.CL cs.LG
2024-11-20
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training
Ameera Bawazir, Kebin Wu, Wenbin Li

Recent advancements in vision-language pre-training via contrastive learning have significantly improved performance across computer vision tasks. However, in the medical domain, obtaining multimodal ...

cs.CV cs.AI cs.CL cs.LG
2023-07-06
Vision Language Transformers: A Survey
Clayton Fields, Casey Kennington

Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted t...

cs.CL cs.AI cs.CV cs.LG
2022-03-15
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness
Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral

Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inpu...