AF Cooper, MA Lemley, C De Sa, L Duesterwald… - arXiv preprint arXiv …, 2026
Recent work shows that standard greedy-decoding extraction methods for
quantifying memorization in LLMs miss how extraction risk varies across sequences.
Probabilistic extraction, i.e., computing the probability of generating a target suffix given …
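As a rough illustration of the quantity this abstract names (not the paper's own method), the sketch below computes log P(suffix | prefix) under a causal LM by summing per-token log-probabilities of the suffix. It assumes the Hugging Face transformers API; "gpt2" is only a placeholder model.

```python
# Hedged sketch: log-probability of a target suffix given a prefix
# under a causal LM. Assumes Hugging Face transformers; "gpt2" is a
# placeholder, not the model studied in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def suffix_log_prob(model, tokenizer, prefix: str, suffix: str) -> float:
    """Return log P(suffix | prefix), summed over suffix tokens."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    suffix_ids = tokenizer(suffix, add_special_tokens=False,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits          # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    n_prefix = prefix_ids.shape[1]
    total = 0.0
    for i in range(suffix_ids.shape[1]):
        # Logits at position (n_prefix + i - 1) predict the token at (n_prefix + i).
        token_id = suffix_ids[0, i]
        total += log_probs[0, n_prefix + i - 1, token_id].item()
    return total

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
lp = suffix_log_prob(model, tokenizer, "The quick brown fox", " jumps over the lazy dog")
print(f"log P(suffix | prefix) = {lp:.2f}")
```

Unlike greedy-decoding extraction, which only checks whether the single most likely continuation reproduces the target, this probability captures extraction risk even when the target is a non-dominant but still likely continuation.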
JM Martins, J Jumelet, V Priesemann, L Beinborn - arXiv preprint arXiv:2603.19427, 2026
Why do some languages like Czech permit free word order, while others like English
do not? We address this question by pretraining transformer language models on a
spectrum of synthetic word-order variants of natural languages. We observe that …
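A hedged sketch of one way such synthetic word-order variants could be generated; the paper's actual manipulation is not shown in the snippet, so this is purely illustrative. Shuffling tokens within a fraction p of sentences interpolates between fixed (p=0) and free (p=1) word order.

```python
# Illustrative only (not the paper's pipeline): build a synthetic
# word-order variant by token-shuffling a fraction p of sentences.
import random

def word_order_variant(sentences: list[list[str]], p: float,
                       seed: int = 0) -> list[list[str]]:
    """Shuffle a fraction p of sentences token-wise; leave the rest intact."""
    rng = random.Random(seed)
    out = []
    for tokens in sentences:
        if rng.random() < p:
            tokens = tokens[:]   # copy before shuffling in place
            rng.shuffle(tokens)
        out.append(tokens)
    return out

corpus = [["the", "dog", "chased", "the", "cat"],
          ["birds", "sing", "at", "dawn"]]
print(word_order_variant(corpus, p=1.0))  # fully scrambled variant
```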
S Ouyang, H Wang, G Fang, X Ma, L Lin, X Wang - … of the AAAI Conference on Artificial …, 2026
Hallucination in Large Vision-Language Models (LVLMs) remains a critical
challenge, undermining their reliability in real-world applications. Existing studies
have investigated the causes of hallucination at the modality level and proposed …
H Wang, W Xie, H Jiang, Y Wei, K Jiang, M Cao, C Hao… - Proceedings of the AAAI …, 2026
In recent years, Large Vision-Language Models (LVLMs) have significantly
advanced multimodal tasks. However, their inference requires intensive processing
of numerous visual tokens and incurs substantial computational overhead. Existing …
Y Yu, B Chen, Y Zhang, T Xie, M Jing, L Zuo - … of the AAAI Conference on Artificial …, 2026
Large vision-language models (LVLMs) have demonstrated remarkable capabilities
in understanding multimodal data such as images and text. However, the number of
visual tokens in these models often far exceeds that of textual tokens, resulting in …
Y Zhou, Y Zhang, J Chang, X Gu, Y Wang, K Ding… - Proceedings of the AAAI …, 2026
Despite the rapid progress of Vision Language Models (VLMs), existing benchmarks
still concentrate on coarse-grained object recognition or simple relational reasoning,
leaving the fine-grained and higher-order reasoning abilities of these systems largely …
Z Wang, M Li, H Yin, W Liu, Z Wang - Proceedings of the AAAI Conference on …, 2026
Large Vision-Language Models (LVLMs) enhance performance on vision-language
tasks by integrating visual features from pre-trained vision encoders into large
language models (LLMs). However, the large number of visual tokens introduces …
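The integration step this abstract describes is commonly a learned connector that projects vision-encoder features into the LLM's token-embedding space. The sketch below is a generic, LLaVA-style linear projection with placeholder dimensions, not the paper's specific architecture; it also makes concrete why visual tokens dominate the sequence.

```python
# Hedged sketch of a generic LVLM connector (LLaVA-style linear
# projection). Dimensions and token counts are illustrative placeholders.
import torch
import torch.nn as nn

class VisionToLLMConnector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_feats: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        visual_tokens = self.proj(visual_feats)                 # (B, N_vis, llm_dim)
        # Prepend projected visual tokens to the text-token embeddings.
        return torch.cat([visual_tokens, text_embeds], dim=1)   # (B, N_vis + N_txt, llm_dim)

connector = VisionToLLMConnector()
vis = torch.randn(1, 576, 1024)   # e.g. 576 patch features from a ViT encoder
txt = torch.randn(1, 32, 4096)    # 32 text-token embeddings
print(connector(vis, txt).shape)  # torch.Size([1, 608, 4096]) -- visual tokens dominate
```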
H Liang, Y Shen, Y Deng, S Xu, Z Feng, T Zhang… - arXiv preprint arXiv …, 2026
Achieving human-like spatial intelligence for vision-language models (VLMs)
requires inferring 3D structures from 2D observations, recognizing object properties
and relations in 3D space, and performing high-level spatial reasoning. In this paper …
J Liu, D Fan, C Ji, D Zha, Q Tan - arXiv preprint arXiv:2603.13370, 2026
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in
aligning and understanding multimodal signals, yet their potential to reason over
structured data, where multimodal entities are connected through explicit relational …
I Puri, M Damani, I Shenfeld, M Ghassemi, J Andreas… - arXiv preprint arXiv …, 2026
Given a question, a language model (LM) implicitly encodes a distribution over
possible answers. In practice, post-training procedures for LMs often collapse this
distribution onto a single dominant mode. While this is generally not a problem for …
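A simple way to see the distribution the abstract refers to is to sample the same prompt repeatedly and tabulate the answers; a mode-collapsed post-trained model would concentrate nearly all mass on one answer. The sketch below assumes the Hugging Face transformers API, with "gpt2" as a placeholder, and is not the authors' measurement protocol.

```python
# Hedged sketch: Monte Carlo estimate of a model's answer distribution
# for one prompt. "gpt2" is a placeholder, not the models studied.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Name a primary color:"
inputs = tokenizer(prompt, return_tensors="pt")
counts = Counter()
for _ in range(100):
    out = model.generate(**inputs, do_sample=True, temperature=1.0,
                         max_new_tokens=5,
                         pad_token_id=tokenizer.eos_token_id)
    # Keep only the newly generated tokens after the prompt.
    answer = tokenizer.decode(out[0, inputs.input_ids.shape[1]:],
                              skip_special_tokens=True)
    counts[answer.strip()] += 1

for answer, n in counts.most_common(5):
    print(f"{n/100:.2f}  {answer!r}")
```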
This message was sent by Google Scholar because you're following new articles related to research by Anthony (Tony) G Cohn.