The hidden flaws in your favorite foundation model: We've uncovered how subtle image metadata (JPEG params, camera type, etc.) systematically biases visual representations and consequently affects object recognition ability. To be presented at #ICCV2025 as a highlight paper.
19.08.2025 07:53
No we shouldn't... but also every metric can be optimised through shortcuts and cheating. That doesn't make an evaluation metric itself useless.
The h-index is only meaningful to compare one's career over time, not across people. When calculated (without cheating), it shows a useful trajectory.
19.08.2025 08:21
not even for tasks such as stirring your food or washing dishes? :-)
18.08.2025 18:14
This is one of the coolest ideas using EPIC-KITCHENS in a long while...
We've all been waiting to be replaced by robots! At least this is now done in the generative space...
Great work by Marion Lepert, Jiaying Fang and @leto--jean.bsky.social from Stanford IPRL.. congrats!
arxiv.org/abs/2508.09976
18.08.2025 17:29
Vision is gen-NI !
('The shape of things unseen', Adam Zeman)
17.08.2025 17:45
Happy Birthday Kosta… thanks for sharing the lovely pictures!
Wishing you and your adorable family the next 50 years of Happiness, Health and Success 🥳🥳
15.08.2025 08:28
We have released the winners list and winning reports for the 2025 EPIC-KITCHENS and HD-EPIC VQA Challenges, awarded at the 2nd #EgoVis workshop @cvprconference.bsky.social #CVPR2025
Check these reports:
epic-kitchens.github.io/2025#results
hd-epic.github.io/index#vqa-be...
14.08.2025 14:48
BBC Inside Science BBC Radio 4 has been exploring the most powerful computer the UK has ever seen 🖥️🤯
Hear how our Isambard-AI #supercomputer is being used to carry out groundbreaking new research: www.bbc.co.uk/sounds/play/...
08.08.2025 13:40
Finally the next @iclr-conf.bsky.social location is revealed...
iclr.cc
#ICLR2026 will be in Rio de Janeiro from 23 to 27 April!
05.08.2025 23:45
If you have not seen this yet, you are missing a lot!
Genie 3 by Google DeepMind was unveiled today & delivers in abundance.
Of course my fav example is ego x world model.
It is video gen x modeling "out of the frame".
Many congrats @jparkerholder.bsky.social & team
deepmind.google/discover/blo...
05.08.2025 19:38
You should read the Pascal VOC challenge definition
link.springer.com/article/10.1...
02.08.2025 23:44
Dark blue background to the left of image with Cardiff University logo in top left corner, red Babylab dragon and ident to the centre. Cream box overlays the blue background. Centred dark blue text reads "are you a medical or allied health professional based in the UK? Do you work with children with Down syndrome under 5? We want to hear your thoughts on using wearable head-mounted cameras in your practice". A QR code to the bottom left with dark blue text that reads "Find out more about our new remote study using the QR code or link in the caption"
Right side of image shows a young child with Down syndrome in a light blue t-shirt with the Babylab Dragon and text reading “Little Scientist”. She is wearing a soft, light blue foam helmet with a small camera attached to the front, just above eye level.
Cardiff Babylab is excited to launch a new, remote study for medical and allied health professionals in the UK!
We are inviting professionals to give their thoughts on integrating wearable head cameras into their practice.
Find out more: www.cardiff-babylab.com/tinyexplorer...
28.07.2025 11:42
Extended EPIC-SOUNDS paper was accepted at TPAMI
arxiv.org/abs/2302.006...
This follows ICASSP 2023 oral, extended for detection and further analysis
epic-kitchens.github.io/epic-sounds/
work by @jaesunghuh.bsky.social Jacob Chalk @ekazakos.bsky.social
@oxford-vgg.bsky.social @bristoluni.bsky.social
22.07.2025 12:00
wouldn't you be distilling some intermediate features rather than just the label?
No clue, I've never worked on distillation myself
16.07.2025 12:59
Perception Test [3rd edition]
perception-test-challenge.github.io
github.com/google-deepm...
led by Viorica Patraucean, Joe Heyward, @nikparth1.bsky.social @tylerzhu.bsky.social
João Carreira, AZ and myself
Up to 50K in prizes sponsored by Google DeepMind
More on PT:
youtu.be/8BiajMOBWdk?...
4/4
16.07.2025 12:46
- two guest tracks:
- KiVA image understanding challenge
kiva-challenge.github.io
@euniceyiu.bsky.social @shiryginosar.bsky.social
- Physics-IQ video generation challenge
physics-iq.github.io/workshop/phy...
3/4
16.07.2025 12:43
Our novel tracks unify diverse tasks under common interfaces to move beyond single-task models:
- joint object/point tracking
- joint action/sound localisation
- unified multiple-choice videoQA
Also:
- novel VLM interpretability track -- can you show where models fail?
2/4
16.07.2025 12:40
Join us for the 3rd Perception Test Workshop & Challenge
@iccv.bsky.social #iccv2025
*NEW* this year:
- 3 unified tracks
- novel interpretability track
- guest tracks: KiVA and Physics-IQ
- 4 world-class speakers (see pic)
Up to 50K in prizes sponsored by Google DeepMind
🧵 for details [1/4]
16.07.2025 12:40
Scaling 4D Representations
Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.
Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
10.07.2025 11:52
Also proud of the HD-EPIC team (Omar, Fahd, Kaiting) who are attending @ICVSS on behalf of @bristoluni.bsky.social and @unileiden.bsky.social. They had a busy poster session #ICVSS2025 yesterday detailing our @cvprconference.bsky.social work and public dataset
hd-epic.github.io
2/3
08.07.2025 12:33
No no... really my 40th birthday :-D [which was 4 years ago]. I invited my friends to "Go Ape" which is what these are called in the UK :-)
and we really had fun!
06.07.2025 20:30
Love them too, including the zip lines at the end. I went there for my 40th
06.07.2025 06:27
In a new paper led by Gianluca Monaci, with @weinzaepfelp.bsky.social and myself, we explore the relationship between relative pose estimation and image goal navigation and study different architectures: late fusion, channel concatenation (w/ or w/o space2depth) and cross-attention.
arxiv.org/abs/2507.01667
🧵 1/5
04.07.2025 17:00
Assistant Professor at the University of Cambridge @eng.cam.ac.uk, working on 3D computer vision and inverse graphics, previously postdoc at Stanford and PhD at Oxford @oxford-vgg.bsky.social
https://elliottwu.com/
CS Ph.D. student at University of Bristol. Prev: Researcher at Hitachi Research Lab, Tokyo
RS at Google DeepMind and Honorary Lecturer at UCL. Building general world models to solve AGI :)
Visual Inference Lab of @stefanroth.bsky.social at @tuda.bsky.social - Research in Computer Vision and Machine Learning.
See https://www.visinf.tu-darmstadt.de/visual_inference
Assistant Prof at Stanford, Director of Stanford Translational AI (STAI) Lab
Computer Vision, Computational Neuroscience
Incoming Assistant Professor at the University of Cambridge
https://ayushtewari.com/
Researcher in machine learning, optimization, computer vision, image and signal processing
Principal research scientist at Naver Labs Europe, I am interested in most aspects of computer vision, including 3D scene reconstruction and understanding, visual localization, image-text joint representation, embodied AI, ...
https://jytime.github.io/
We are a CUCHDS, Cardiff-based research group working to understand the early development of children with and without developmental conditions. 👶🏻👩‍🔬
Website: https://www.cardiff-babylab.com/
The German Conference on Pattern Recognition (GCPR) is the annual symposium of the German Association for Pattern Recognition (DAGM). It is the national venue for recent advances in image processing, pattern recognition, and computer vision.
Assistant Professor Stanford CS. Perception, learning and control for autonomous robotic manipulation. https://web.stanford.edu/~bohg/
Junior research group leader at TUM | University of Tübingen. Previously at VGG (Oxford), BAIR (Berkeley). Interested in multi-modal learning.
https://akoepke.github.io/
Assistant Prof. at Georgia Tech | NVIDIA AI | Making robots smarter
Blog: https://sander.ai/
🐦: https://x.com/sedielem
Research Scientist at Google DeepMind (WaveNet, Imagen 3, Veo, ...). I tweet about deep learning (research + software), music, generative models (personal account).