
Jiaang Li

@jiaangli.bsky.social

PhD student at University of Copenhagen @belongielab.org | #nlp #computervision | ELLIS student @ellis.eu 🌐 https://jiaangli.github.io/

1,111 Followers  |  85 Following  |  17 Posts  |  Joined: 17.11.2024

Latest posts by jiaangli.bsky.social on Bluesky

Feel free to reach out and chat with Xinyi on July 18th in Vancouver at #ICML.

14.07.2025 08:36 — 👍 0    🔁 0    💬 0    📌 0
Preview
NeurIPS participation in Europe We seek to understand if there is interest in being able to attend NeurIPS in Europe, i.e. without travelling to San Diego, US. In the following, assume that it is possible to present accepted papers ...

Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this was an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)

30.03.2025 18:04 — 👍 280    🔁 160    💬 6    📌 13
Post image

Check out our new preprint TensorGRaD.
We use a robust decomposition of the gradient tensors into low-rank + sparse parts to reduce optimizer memory for Neural Operators by up to 75%, while matching the performance of Adam, even on turbulent Navier–Stokes (Re 10e5).

03.06.2025 03:16 — 👍 29    🔁 7    💬 2    📌 2
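For intuition, here is a minimal sketch of the low-rank + sparse split applied to a single gradient matrix. It is illustrative only, not the paper's optimizer: TensorGRaD works with higher-order gradient tensors and compressed optimizer states, and the rank and sparsity budgets below are made-up numbers.

    import torch

    def lowrank_plus_sparse(grad: torch.Tensor, rank: int = 4, sparse_frac: float = 0.01):
        """Split a 2D gradient into a rank-`rank` part plus a sparse residual."""
        # Low-rank part via truncated SVD.
        U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
        low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

        # Sparse part: keep only the largest-magnitude residual entries.
        residual = grad - low_rank
        k = max(1, int(sparse_frac * residual.numel()))
        thresh = residual.abs().flatten().topk(k).values.min()
        sparse = torch.where(residual.abs() >= thresh, residual, torch.zeros_like(residual))
        return low_rank, sparse

    g = torch.randn(256, 512)                          # stand-in for a weight gradient
    L, S = lowrank_plus_sparse(g)
    print(((g - (L + S)).norm() / g.norm()).item())    # relative reconstruction error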

PhD student Jiaang Li and his collaborators share insights into the cultural understanding of vision-language models 👇

02.06.2025 18:12 — 👍 1    🔁 1    💬 0    📌 0
Paper title "Cultural Evaluations of Vision-Language Models
Have a Lot to Learn from Cultural Theory"

Paper title "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory"

I am excited to announce our latest work 🎉 "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory". We review recent works on culture in VLMs and argue for deeper grounding in cultural theory to enable more inclusive evaluations.

Paper 🔗: arxiv.org/pdf/2505.22793

02.06.2025 10:36 — 👍 57    🔁 18    💬 3    📌 5

Great collaboration with @yfyuan01.bsky.social @wenyan62.bsky.social @aliannejadi.bsky.social @danielhers.bsky.social, Anders Søgaard, Ivan Vulić, Wenxuan Zhang, Paul Liang, Yang Deng, @serge.belongie.com

23.05.2025 17:04 — 👍 2    🔁 0    💬 0    📌 0
Preview
jaagli/ravenea · Datasets at Hugging Face We're on a journey to advance and democratize artificial intelligence through open source and open science.

🔗More here:
Project Page: jiaangli.github.io/RAVENEA/
Code: github.com/yfyuan01/RAV...
Dataset: huggingface.co/datasets/jaa...

23.05.2025 17:04 — 👍 1    🔁 0    💬 1    📌 0
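A minimal sketch for browsing the data with the Hugging Face datasets library: the repo id comes from the dataset link above, but the split and column names are not shown here, so check the dataset card for the real schema rather than relying on this snippet.

    from datasets import load_dataset

    # Download RAVENEA from the Hugging Face Hub (repo id from the link above).
    ds = load_dataset("jaagli/ravenea")

    print(ds)                             # shows the actual splits and columns
    first_split = next(iter(ds.values()))
    print(first_split[0].keys())          # inspect the real field names before use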
Post image

📊Our experiments demonstrate that even lightweight VLMs, when augmented with culturally relevant retrievals, outperform their non-augmented counterparts and even surpass the next larger model tier, achieving at least a 3.2% improvement in cVQA and 6.2% in cIC.

23.05.2025 17:04 — 👍 0    🔁 0    💬 1    📌 0
Post image

🏛 Culture-Aware Contrastive Learning

We propose Culture-aware Contrastive (CAC) Learning, a supervised learning framework compatible with both CLIP and SigLIP architectures. Fine-tuning with CAC can help models better capture culturally significant content.

23.05.2025 17:04 — 👍 1    🔁 0    💬 1    📌 0
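As a rough sketch of what contrastive fine-tuning on culture-matched image–text pairs looks like, here is a generic CLIP/SigLIP-style symmetric InfoNCE loss; the actual CAC objective, temperature, and batching may differ.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(img_emb, txt_emb, temperature=0.07):
        """Symmetric InfoNCE over a batch of (image, culture-relevant text) pairs."""
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.t() / temperature            # cosine similarities
        labels = torch.arange(len(logits), device=logits.device)
        loss_i2t = F.cross_entropy(logits, labels)      # image -> matching text
        loss_t2i = F.cross_entropy(logits.t(), labels)  # text -> matching image
        return (loss_i2t + loss_t2i) / 2

    # Toy usage with random vectors standing in for CLIP/SigLIP embeddings.
    img_emb = torch.randn(8, 512)
    txt_emb = torch.randn(8, 512)
    print(contrastive_loss(img_emb, txt_emb).item())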
Post image

📚 Dataset Construction
RAVENEA integrates 1,800+ images, 2,000+ culture-related questions, 500+ human captions, and 10,000+ human-ranked Wikipedia documents to support two key tasks:

🎯Culture-focused Visual Question Answering (cVQA)
📝Culture-informed Image Captioning (cIC)

23.05.2025 17:04 — 👍 1    🔁 0    💬 1    📌 0
Post image

🚀New Preprint🚀
Can Multimodal Retrieval Enhance Cultural Awareness in Vision-Language Models?

Excited to introduce RAVENEA, a new benchmark aimed at evaluating cultural understanding in VLMs through RAG.
arxiv.org/abs/2505.14462

More details:👇

23.05.2025 17:04 — 👍 17    🔁 7    💬 1    📌 2

Super cool! Incidentally, in our previous project, we also found that linear alignment between embedding spaces from two modalities is viable — and the alignment improves as LLMs scale.
bsky.app/profile/jiaa...

23.05.2025 13:59 — 👍 9    🔁 0    💬 0    📌 0
Preview
Revisiting the Othello World Model Hypothesis Li et al. (2023) used the Othello board game as a test case for the ability of GPT-2 to induce world models, and were followed up by Nanda et al. (2023b). We briefly discuss the original experiments, ...

I won't be attending #ICLR in person this year😒. But feel free to check our paper 'Revisiting the Othello World Model Hypothesis' with Anders Søgaard, accepted at the ICLR World Models Workshop!
Paper link: arxiv.org/abs/2503.04421

21.04.2025 21:09 — 👍 1    🔁 2    💬 0    📌 0
Post image

Thrilled to announce "Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation" is accepted as a Spotlight (5%) at #ICLR2025!

Our model MM-FSS leverages 3D, 2D, & text modalities for robust few-shot 3D segmentation — all without extra labeling cost. 🤩

arxiv.org/pdf/2410.22489

More details👇

11.02.2025 17:49 — 👍 25    🔁 7    💬 1    📌 0
Post image

Forget just thinking in words.

🔔Our New Preprint:
🚀 New Era of Multimodal Reasoning🚨
🔍 Imagine While Reasoning in Space with MVoT

Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

14.01.2025 14:50 — 👍 6    🔁 1    💬 1    📌 0

FGVC12 Workshop is coming to #CVPR 2025 in Nashville!

Are you working on fine-grained visual problems?
This year we have two peer-reviewed paper tracks:
i) 8-page CVPR Workshop proceedings
ii) 4-page non-archival extended abstracts
CALL FOR PAPERS: sites.google.com/view/fgvc12/...

09.01.2025 17:36 — 👍 10    🔁 3    💬 0    📌 0
VidenSkaber | Min AI forstår mig ikke ("My AI doesn't understand me") - professor Serge Belongie
YouTube video by Videnskabernes Selskab

Here’s a short film produced by the Danish Royal Academy of Sciences, showcasing the WineSensed 🍷 project of Þóranna Bender et al. thoranna.github.io/learning_to_...

30.12.2024 11:05 — 👍 17    🔁 3    💬 0    📌 0
Post image

From San Diego to New York to Copenhagen, wishing you Happy Holidays!🎄

21.12.2024 11:20 — 👍 39    🔁 4    💬 0    📌 0

With @neuripsconf.bsky.social right around the corner, we’re excited to be presenting our work soon! Here’s an overview

(1/5)

03.12.2024 11:43 — 👍 16    🔁 6    💬 1    📌 2
Preview
Belongie Lab Join the conversation

Here’s a starter pack with members of our lab that have joined Bluesky

25.11.2024 10:42 — 👍 13    🔁 4    💬 0    📌 0
Preview
Alt: a panda bear is rolling around in the grass in a zoo enclosure.

No one can explain stochastic gradient descent better than this panda.

24.11.2024 15:04 — 👍 216    🔁 32    💬 10    📌 6

🙋‍♂️

24.11.2024 11:09 — 👍 0    🔁 0    💬 0    📌 0
Preview
GitHub - jiaangli/VLCA: Do Vision and Language Models Share Concepts? A Vector Space Alignment Study

Great collaboration with @constanzafierro.bsky.social, @YovaKem_v2, and Anders Søgaard!

👨‍💻 github.com/jiaangli/VLCA
📃 direct.mit.edu/tacl/article...

19.11.2024 13:27 — 👍 0    🔁 0    💬 0    📌 0
Post image

🚀Takeaway:

1. Representation spaces of LMs and VMs become increasingly (though partially) similar as models grow larger.
2. Concepts with lower frequency, polysemy, and dispersion can be easier to align.
3. Shared concepts between LMs and VMs might extend beyond nouns.

🧡(7/8)
#NLP #NLProc

19.11.2024 13:27 — 👍 0    🔁 0    💬 1    📌 0
Post image

🌱We then discuss the implications of our finding:
- the LM understanding debate
- the study of emergent properties
- philosophy

🧡(6/8)

19.11.2024 13:12 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image Post image

🔍We also measure the generalization of the mapping to other POS, and explore the impact of different sizes of training data. 👀To investigate the effects of incorporating text signals during vision pretraining, we compare pure vision models against selected CLIP vision encoders.

🧡(5/8)

19.11.2024 13:01 — 👍 0    🔁 0    💬 1    📌 0
Post image

What factors influence the convergence?
🔍Our experiments show that the alignability of LMs and vision models is sensitive to image and language dispersion, polysemy, and frequency.

🧡(4/8)

19.11.2024 12:48 — 👍 0    🔁 0    💬 1    📌 0
Post image

The results show a clear trend:
✨LMs converge toward the geometry of visual models as they grow bigger and better.

🧡(3/8)

19.11.2024 12:48 — 👍 1    🔁 0    💬 1    📌 0
Post image

Mapping vector spaces:
🎯We measure the alignment between vision models and LMs by mapping their vector spaces and evaluating retrieval precision on held-out data.

🧡(2/8)

19.11.2024 12:48 — 👍 1    🔁 0    💬 1    📌 0
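A bare-bones sketch of this mapping step: fit a least-squares linear map between the two embedding spaces and score precision@1 retrieval on held-out concepts. The paper's exact procedure and preprocessing may differ; the toy data below only stands in for paired LM and vision-model embeddings.

    import numpy as np

    def fit_linear_map(X_lang, Y_vis):
        """Least-squares map W so that X_lang @ W approximates Y_vis."""
        W, *_ = np.linalg.lstsq(X_lang, Y_vis, rcond=None)
        return W

    def precision_at_1(X_lang, Y_vis, W):
        """Fraction of held-out concepts whose mapped LM vector retrieves its own image vector."""
        pred = X_lang @ W
        pred /= np.linalg.norm(pred, axis=1, keepdims=True)
        Y = Y_vis / np.linalg.norm(Y_vis, axis=1, keepdims=True)
        nearest = (pred @ Y.T).argmax(axis=1)
        return (nearest == np.arange(len(Y))).mean()

    # Toy data standing in for LM / vision-model embeddings of the same concepts.
    rng = np.random.default_rng(0)
    X_train, Y_train = rng.normal(size=(500, 128)), rng.normal(size=(500, 64))
    X_test, Y_test = rng.normal(size=(100, 128)), rng.normal(size=(100, 64))

    W = fit_linear_map(X_train, Y_train)
    print(precision_at_1(X_test, Y_test, W))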
Post image

🤔Do Vision and Language Models Share Concepts? 🚀
We present an empirical evaluation and find that language models partially converge towards representations isomorphic to those of vision models. #EMNLP

📃 direct.mit.edu/tacl/article...

19.11.2024 12:48 — 👍 26    🔁 7    💬 2    📌 2
