
Josh Susskind

@kindsuss.bsky.social

Ramen whisperer, bad throat singer

70 Followers  |  239 Following  |  35 Posts  |  Joined: 18.11.2024

Posts by Josh Susskind (@kindsuss.bsky.social)

Cogsci peeps! This is a great opportunity! @sineadwilliamson.bsky.social is a great mentor and scientist, along with a wonderful team :)

07.11.2025 22:15 — 👍 2    🔁 0    💬 0    📌 0

We have been working with Michal Klein on pushing a module to train *flow matching* models using JAX. This is shipped as part of our new release of the OTT-JAX toolbox (github.com/ott-jax/ott)

The tutorial to do so is here: ott-jax.readthedocs.io/tutorials/ne...
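For readers curious what flow matching training boils down to, here is a minimal from-scratch sketch of the conditional flow matching objective in JAX. This is not the OTT-JAX module's API (see the tutorial for that); the model and function names here are illustrative, and a toy linear map stands in for a real network.

```python
import jax
import jax.numpy as jnp

def velocity_model(params, x_t, t):
    # Toy linear model standing in for a neural velocity field v(x, t).
    w, b = params
    t_feat = t[:, None]  # append time as an extra input feature
    inp = jnp.concatenate([x_t, t_feat], axis=-1)
    return inp @ w + b

def flow_matching_loss(params, x0, x1, t):
    # Linear interpolation path between source x0 and target x1.
    x_t = (1.0 - t[:, None]) * x0 + t[:, None] * x1
    # The regression target is the (constant) velocity of that path.
    target = x1 - x0
    pred = velocity_model(params, x_t, t)
    return jnp.mean((pred - target) ** 2)

key = jax.random.PRNGKey(0)
k0, k1, kt, kw = jax.random.split(key, 4)
d = 2
x0 = jax.random.normal(k0, (16, d))   # samples from the source
x1 = jax.random.normal(k1, (16, d))   # samples from the target
t = jax.random.uniform(kt, (16,))     # random times in [0, 1]
params = (0.1 * jax.random.normal(kw, (d + 1, d)), jnp.zeros(d))

loss, grads = jax.value_and_grad(flow_matching_loss)(params, x0, x1, t)
```

From here, a training loop is just repeated gradient steps on this loss; the OTT-JAX toolbox additionally lets you couple `(x0, x1)` pairs via optimal transport rather than pairing them at random.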

05.11.2025 14:04 — 👍 13    🔁 7    💬 1    📌 0
Sharded Sinkhorn — ott 0.5.1.dev34+g3462f28 documentation

Scaling up the computation of optimal transport couplings to hundreds of thousands of 3k-dimensional vectors, made easy using sharding and OTT-JAX! Check out this notebook; it only takes a few lines of code thanks to JAX's native sharding abilities: ott-jax.readthedocs.io/en/latest/tu...
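For context on what is being sharded: the core of the coupling computation is the Sinkhorn iteration. Below is a small from-scratch log-domain sketch for uniform marginals; the actual OTT-JAX solver has a different API and is what handles the multi-device sharding described in the notebook.

```python
import jax
import jax.numpy as jnp

def sinkhorn(cost, eps=0.1, n_iters=200):
    # Log-domain Sinkhorn for entropic OT with uniform marginals.
    # `cost` is an (n, m) pairwise cost matrix; eps is the regularization.
    n, m = cost.shape
    log_a = -jnp.log(n) * jnp.ones(n)  # uniform source marginal
    log_b = -jnp.log(m) * jnp.ones(m)  # uniform target marginal
    f = jnp.zeros(n)
    g = jnp.zeros(m)
    for _ in range(n_iters):
        # Alternately enforce the row and column marginal constraints.
        f = eps * (log_a - jax.scipy.special.logsumexp(
            (g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - jax.scipy.special.logsumexp(
            (f[:, None] - cost) / eps, axis=0))
    # Recover the coupling from the dual potentials f, g.
    return jnp.exp((f[:, None] + g[None, :] - cost) / eps)

cost = jax.random.uniform(jax.random.PRNGKey(0), (5, 4))
P = sinkhorn(cost)  # (5, 4) coupling whose marginals are near-uniform
```

At the scales in the post, the (n, m) cost matrix is the bottleneck; sharding the points (and hence the logsumexp reductions) across devices is what makes hundreds of thousands of high-dimensional vectors tractable.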

01.08.2025 00:13 — 👍 14    🔁 2    💬 0    📌 0

Wow. Thank you for your bravery whoever you are.

08.06.2025 06:07 — 👍 3    🔁 0    💬 0    📌 0

How about "Machine Learning and Computer Science"? MLCS. 😁

05.05.2025 19:01 — 👍 0    🔁 0    💬 0    📌 0
A Call for Constructive Engagement | AAC&U

What say you @stanford.edu?
www.aacu.org/newsroom/a-c...

23.04.2025 04:40 — 👍 3    🔁 1    💬 0    📌 0

Check out our Apple research work on scaling laws for native multimodal models! Combined with mixtures of experts, native models develop both specialized and multimodal representations! Lots of rich findings and opportunities for follow-up research!

11.04.2025 22:37 — 👍 6    🔁 5    💬 0    📌 1
https://arxiv.org/pdf/2502.18435

Paper link: t.co/Z2FZ6YSpbA
Code/Model checkpoints: t.co/bXYHZOONOm

24.03.2025 17:48 — 👍 0    🔁 0    💬 0    📌 0

My colleagues in #Apple ML Research posted a fun paper investigating how autoregressive design choices affect reasoning (in this case, multi-choice question answering), showing a benefit to R2L ordering. Reminds me of similar findings for reverse order addition in arxiv.org/abs/2310.16028!

24.03.2025 17:47 — 👍 10    🔁 3    💬 1    📌 0

Permanent Hellcountry is a badass name for a band! Too bad it's also us. Stranger than fiction.

02.03.2025 18:25 — 👍 0    🔁 0    💬 0    📌 0

My colleague Shuangfei Zhai is looking for a summer research intern to work on improving TarFlow at Apple. If interested, send your CV to szhai at apple.com by this week.

25.02.2025 01:36 — 👍 4    🔁 0    💬 0    📌 1
A Sad Moment in American History (YouTube video by Senator Bernie Sanders)

youtu.be/rKBM2kS6B8o

Thank you @sanders.senate.gov for speaking up

20.02.2025 05:12 — 👍 2    🔁 0    💬 0    📌 0

Is there an article associated with this thread?

14.02.2025 21:28 — 👍 1    🔁 0    💬 1    📌 0

Here's a great paper on scaling laws for teacher-student neural network distillation led by @dbusbridge.bsky.social and Apple colleagues. I've often seen people struggle to get distillation working well enough in practical settings, and I expect the insights in this paper can really help!

14.02.2025 03:30 — 👍 4    🔁 0    💬 0    📌 0

Here's a fun Apple research paper seeking to understand when/why diffusion models can be composed to generate images containing multiple independent concepts. For example, composing images from a model trained on Preetum's dog and a model trained on hats. Because why wouldn't you want to do that?!!

12.02.2025 04:47 — 👍 1    🔁 0    💬 0    📌 0

Yeah! That's what I said when I saw it too :) Better than any dog-horse I could make!

11.02.2025 17:58 — 👍 2    🔁 0    💬 0    📌 0

If you are interested in doing an internship in ML research at Apple, I highly recommend talking with Etai Littwin (and Vimal Thilak is pretty awesome too!)

11.02.2025 03:50 — 👍 1    🔁 0    💬 0    📌 0

I think it's really important for more of this kind of work to be published openly rather than be walled off due to corporate greed -- scientific inquiry benefits us all. Hopefully we will continue to see lots and lots more of this!

29.01.2025 05:17 — 👍 0    🔁 0    💬 1    📌 0

This work was born from an Apple internship with Harshay Shah. Samira provided excellent direction and technical contributions along with Vimal, and the entire team was incredibly helpful! I'm intrigued that reading comprehension tasks do not follow pre-training scaling curves -- gotta follow this up!

29.01.2025 05:17 — 👍 1    🔁 0    💬 1    📌 0

Missing the deep learning part? Go check out the follow-up work @neuripsconf.bsky.social (tinyurl.com/yvf72kzf) and @iclr-conf.bsky.social (tinyurl.com/4vh8vuzk)

23.01.2025 08:45 — 👍 11    🔁 3    💬 0    📌 0

Too disgusted by the Twitter/X vomit and could not justify keeping my account there. Hoping this platform steers clear of disinformation and hate -- and remains a positive place to share science and other good things.

23.01.2025 22:00 — 👍 1    🔁 0    💬 0    📌 0

Here's a really cool cross-institution study leveraging optimal transport techniques developed by my Apple ML Research colleagues! It's great to see basic research in machine learning translate into scientific tools like this. Cuts into the AI hype a bit ;)

23.01.2025 21:54 — 👍 2    🔁 1    💬 0    📌 0

Excited about vision-language models? 🚀 Check out our latest work on FastVLM, a new family of efficient vision-language models that balances the tradeoff between high-resolution image understanding and latency without compromising accuracy!

arxiv.org/abs/2412.13303

19.12.2024 18:18 — 👍 1    🔁 1    💬 6    📌 0

If you're looking for research scientist roles in Europe, check out Marco's post! The Paris team is fantastic, and does diverse idea-driven and impactful research. In addition, MLR is highly collaborative across timezones, so you'd have a chance to work with many others too.

18.12.2024 17:14 — 👍 2    🔁 1    💬 0    📌 0
Apple Machine Learning Research at NeurIPS 2024: Apple researchers are advancing the field of ML through fundamental research that improves the world's understanding of this technology and…

Last but not least, please check out the flurry of papers being presented at #NeurIPS2024, highlighted in this post: machinelearning.apple.com/research/neu... It showcases work from many teams at Apple and their academic collaborators.

Thanks for making it to the end ;-)

10.12.2024 23:53 — 👍 2    🔁 0    💬 0    📌 0

EC-IJEPA makes the JEPA approach less brittle, and also further unlocks its use in diverse planning and reasoning tasks that leverage pre-trained visual representations as a world model. We're excited to see others build on this work with us!
12/n

10.12.2024 23:53 — 👍 0    🔁 0    💬 1    📌 0

Returning to the theme of powerful visual representation learning, please check out Vimal Thilak, Etai Littwin, and Anand Gopalakrishnan's EC-IJEPA paper, on improving JEPA models with spatial conditioning:
x.com/AggieInCA/st...
11/n

10.12.2024 23:53 — 👍 0    🔁 0    💬 1    📌 0

WVD is impressive because it enables a range of downstream 3D tasks including 3D view synthesis and depth estimation at inference time by training a good generative model of RGB + XYZ values. Many directions to follow up on here including modeling dynamics!
10/n

10.12.2024 23:53 — 👍 0    🔁 0    💬 1    📌 0

Moving from images to video and 3D generation, I'm also excited to highlight Jiatao Gu and collaborators' work on WVD (world video diffusion), which jointly models multi-view images and 3D geometry: x.com/thoma_gu/sta...
9/n

10.12.2024 23:53 — 👍 0    🔁 0    💬 1    📌 0

ASFT is exciting because it shows a single architecture can generate high-resolution, high-quality data and scale well across data domains (we applied it to images and 3D point clouds in this work), removing the need for domain-specific architectural priors.
8/n

10.12.2024 23:53 — 👍 0    🔁 0    💬 1    📌 0