Winston Smith

@smithwinst0n.bsky.social

Graduate Student, ML, CV, Robotics

51 Followers  |  462 Following  |  2 Posts  |  Joined: 18.11.2024  |  2.0115

Latest posts by smithwinst0n.bsky.social on Bluesky

challenge!

21.12.2024 16:44 - 👍 22    🔁 3    💬 1    📌 0

Yesterday the hyped Genesis simulator was released. But it's up to 10x slower than existing GPU sims, not 10-80x faster or 430,000x faster than realtime, since they benchmark mostly static environments

blog post with corrected open source benchmarks & details: stoneztao.substack.com/p/the-new-hy...

20.12.2024 23:49 - 👍 88    🔁 21    💬 4    📌 7
o3: The grand finale of AI in 2024. A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.

Excellent post about the recent OpenAI o3 results on ARC (& other benchmarks). I don't know how @natolambert.bsky.social manages to write these so quickly! I highly recommend his newsletter.

www.interconnects.ai/p/openais-o3...

I am (more slowly) writing my own take on all this, coming soon.

21.12.2024 19:52 - 👍 153    🔁 31    💬 2    📌 6

Waymo's "superhuman" crash rate is an indicator that the frequent argument that we need human-level intelligence to solve hard robotics tasks is seemingly wrong; we just need time and elbow grease

20.12.2024 01:44 - 👍 52    🔁 4    💬 8    📌 0
[NeurIPS HackerCup 2024] Grounding LLMs in Code Execution. Gabriel Synnaeve, Meta, FAIR

Just gave a talk on "Grounding LLMs in Code Execution" at the NeurIPS Hacker-Cup AI Competition, here are the slides docs.google.com/presentation...

14.12.2024 19:10 - 👍 22    🔁 4    💬 1    📌 0

Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts

Jonathan Crabbé, Pau Rodriguez, Vaishaal Shankar, Luca Zappella, Arno Blaas

Action editor: Pavel Izmailov

https://openreview.net/forum?id=1SCptTFtmV

#imagenet #robust #robustness

15.12.2024 04:07 - 👍 3    🔁 1    💬 0    📌 0

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Jiahao Lu et 10 al.

tl;dr: DepthPro for all frames -> inject depth ControlNet-style into Dust3r decoder, finetune on dynamic scenes. Long videos process in coarse-to-fine

arxiv.org/abs/2412.03079

13.12.2024 12:42 - 👍 12    🔁 2    💬 0    📌 0

🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using patches instead of tokens 🤯
Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Pat...
Code 🛠️ github.com/facebookrese...

13.12.2024 16:53 - 👍 60    🔁 15    💬 5    📌 3

One of the physics of LLM papers studied that and found you need a certain amount of repetitions of a factoid before it's memorized. Repetition can be either multi epochs or just the same fact in another document. Number of needed repeats is also related to model size.

13.12.2024 16:27 - 👍 11    🔁 2    💬 2    📌 0

Our paper PRISM alignment won a best paper award at #neurips2024!

All credits to @hannahrosekirk.bsky.social A.Whitefield, P.Röttger, A.M.Bean, K.Margatina, R.Mosquera-Gomez, J.Ciro, @maxbartolo.bsky.social H.He, B.Vidgen, S.Hale

Catch Hannah tomorrow at neurips.cc/virtual/2024/poster/97804

11.12.2024 16:20 - 👍 67    🔁 9    💬 2    📌 0
The next chapter of the Gemini era for developers. Explore the latest with the release of Gemini 2.0 Flash and new coding agents, now available for testing in Google AI Studio.

Welcome to Gemini 2.0 era!

I am thrilled about ✨ Gemini 2.0 Flash as it allowed us to build the next generation of Code Agents experience: developers.googleblog.com/en/the-next-...

11.12.2024 16:16 - 👍 9    🔁 4    💬 1    📌 0

Very interesting approach, added to my read list!

11.12.2024 08:50 - 👍 0    🔁 0    💬 0    📌 0

🌍 Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface!

🗺️ Paper, code, and demo: nicolas-dufour.github.io/plonk

10.12.2024 15:56 - 👍 96    🔁 32    💬 8    📌 5
Release 0.7 · simonw/llm-gemini. New Gemini 2.0 Flash model: llm -m gemini-2.0-flash-exp 'prompt goes here'. #28

Gemini 2.0 is out, and there's a ton of interesting stuff about it. From my testing it looks like Gemini 2.0 Flash may be the best currently available multi-modal model - I upgraded my LLM plugin to support that here: github.com/simonw/llm-g...

Gemini 2.0 announcement: blog.google/technology/g...

11.12.2024 17:55 - 👍 129    🔁 17    💬 3    📌 1

Can we enhance the performance of T2I models without any fine-tuning?

We show that with our ReNO, Reward-based Noise Optimization, one-step models consistently surpass the performance of all current open-source Text-to-Image models within the computational budget of 20-50 sec!
#NeurIPS2024

11.12.2024 23:05 - 👍 27    🔁 7    💬 1    📌 1
Ethical Challenges Related to the NeurIPS 2024 Best Paper Award

The best paper awardee from NeurIPS 2024 has apparently been accused of misconduct by his ByteDance peers. This certainly raises many questions:

var-integrity-report.github.io

12.12.2024 01:35 - 👍 23    🔁 8    💬 0    📌 6

1/ 🎉 Excited to share our work, "Composed Image Retrieval for Training-Free Domain Conversion", accepted at WACV 2025! 🚀

05.12.2024 12:58 - 👍 16    🔁 5    💬 3    📌 1

Now on ArXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵

05.12.2024 15:01 - 👍 18    🔁 3    💬 1    📌 1

So, now that our move to OpenAI became public, @kolesnikov.ch @xzhai.bsky.social and I are drowning in notifications. I read everything, but may not reply.

Excited about this new journey! 🚀

Quick FAQ thread...

04.12.2024 21:23 - 👍 111    🔁 4    💬 15    📌 2

Ok, it is yesterday's news already, but a good night's sleep is important.

After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.

04.12.2024 09:14 - 👍 117    🔁 11    💬 9    📌 6

Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations. They are the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.

04.12.2024 13:55 - 👍 77    🔁 11    💬 1    📌 0

🤔 Why do we extract diffusion features from noisy images? Isn't that destroying information?

Yes, it is - but we found a way to do better. 🚀

Here's how we unlock better features, no noise, no hassle.

📝 Project Page: compvis.github.io/cleandift
💻 Code: github.com/CompVis/clea...

🧵👇

04.12.2024 23:31 - 👍 41    🔁 10    💬 2    📌 5

In arxiv.org/abs/2303.00848, @dpkingma.bsky.social and @ruiqigao.bsky.social had suggested that noise augmentation could be used to make other likelihood-based models optimise perceptually weighted losses, like diffusion models do. So cool to see this working well in practice!

02.12.2024 18:36 - 👍 52    🔁 11    💬 0    📌 0

A common question nowadays: Which is better, diffusion or flow matching? 🤔

Our answer: They're two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That's great: It means you can use them interchangeably.

02.12.2024 18:45 - 👍 254    🔁 58    💬 6    📌 7

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?

We have been pondering this during summer and developed a new model: JetFormer 🌊🤖

arxiv.org/abs/2411.19722

A thread 👇

1/

02.12.2024 16:41 - 👍 155    🔁 36    💬 4    📌 7

Meanwhile @deep-mind.bsky.social is a fake account, was reported a week ago.. and is still up. 🙄

30.11.2024 20:52 - 👍 17    🔁 2    💬 2    📌 0

Can someone explain to me BlueSky's business model? This will prolly make it clear what all the data policies amount to.

29.11.2024 18:57 - 👍 23    🔁 1    💬 5    📌 0

I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students: go.bsky.app/vju2ux

Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!

23.11.2024 19:54 - 👍 176    🔁 54    💬 101    📌 4

Second Stage xkcd.com/3018

29.11.2024 13:39 - 👍 9720    🔁 718    💬 94    📌 46

@smithwinst0n is following 20 prominent accounts