
Marco Cuturi

@marcocuturi.bsky.social

machine learning researcher @ Apple Machine Learning Research

760 Followers  |  58 Following  |  25 Posts  |  Joined: 09.12.2023

Latest posts by marcocuturi.bsky.social on Bluesky

📢 We're looking for a researcher in cogsci, neuroscience, linguistics, or related disciplines to work with us at Apple Machine Learning Research! We're hiring a one-year interdisciplinary AIML Resident to work on understanding reasoning and decision making in LLMs. 🧵

07.11.2025 21:19 · 👍 8 · 🔁 5 · 💬 1 · 📌 1
Preview
On Fitting Flow Models with Large Sinkhorn Couplings Flow models transform data gradually from one modality (e.g. noise) onto another (e.g. images). Such models are parameterized by a time-dependent velocity field, trained to fit segments connecting pai...

We also introduce two coupling approaches advocated this summer to improve FM training: using either very large sharp Sinkhorn couplings (arxiv.org/abs/2506.05526) or, even better, semidiscrete couplings (arxiv.org/abs/2509.25519), as proposed with Alireza Mousavi-Hosseini and
@syz.bsky.social

05.11.2025 14:04 · 👍 1 · 🔁 1 · 💬 0 · 📌 0
Video thumbnail

We have been working with Michal Klein on pushing a module to train *flow matching* models using JAX. This is shipped as part of our new release of the OTT-JAX toolbox (github.com/ott-jax/ott)

The tutorial to do so is here: ott-jax.readthedocs.io/tutorials/ne...

05.11.2025 14:04 · 👍 11 · 🔁 6 · 💬 1 · 📌 0
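For readers curious what the *flow matching* objective mentioned above looks like concretely, here is a minimal numpy sketch of the per-pair regression loss on straight-line paths. The function name and the toy "oracle" field are illustrative only, not the OTT-JAX API:

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_pair_loss(v_theta, x0, x1, t):
    """Flow matching loss on one (noise, data) pair.

    The training path is the straight segment x_t = (1-t) x0 + t x1,
    whose velocity is the constant x1 - x0; the model v_theta(x, t)
    is regressed onto that target at a random time t.
    """
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return float(np.sum((v_theta(xt, t) - target) ** 2))

x0 = rng.standard_normal(2)          # noise sample
x1 = rng.standard_normal(2)          # "data" sample
oracle = lambda x, t: x1 - x0        # knows the pair, so zero loss
assert fm_pair_loss(oracle, x0, x1, 0.3) == 0.0
```

In practice `v_theta` is a neural network and the loss is averaged over minibatches of pairs; the OT couplings discussed later in this feed change how the (x0, x1) pairs are drawn, not the loss itself.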
Post image Post image Post image

Afternoon talks by:
@marcocuturi.bsky.social
Elena Agliari
Jan Gerken

Thanks all for the great talks, conversations, and engagement! Fingers crossed we get to host this event a 4th time next year and see many of you back in Gothenburg 🤞🇸🇪

29.10.2025 20:58 · 👍 5 · 🔁 1 · 💬 0 · 📌 0
Video thumbnail

🚀 Excited to share LinEAS, our new activation steering method, accepted at NeurIPS 2025! It approximates optimal transport maps end-to-end to precisely guide 🧭 activations, achieving finer control 🎚️ with ✨ fewer than 32 ✨ prompts!

💻 https://github.com/apple/ml-lineas
📄 https://arxiv.org/abs/2503.10679

21.10.2025 10:00 · 👍 2 · 🔁 1 · 💬 1 · 📌 1

It's that time of the year!

The Apple Machine Learning Research (MLR) team in Paris is hiring a few interns to do cool research for ±6 months 🚀🚀 & work towards publications/OSS.

Check requirements and apply: ➡️ jobs.apple.com/en-us/detail...

Moreโ“โ†’ โœ‰๏ธ mlr_paris_internships@group.apple.com

17.10.2025 13:07 · 👍 7 · 🔁 4 · 💬 0 · 📌 0
Video thumbnail

While working on semidiscrete flow matching this summer (➡️ arxiv.org/abs/2509.25519), I kept looking for a video illustrating that the velocity field solving the Benamou-Brenier OT problem is NOT constant w.r.t. time ⏳... so I did it myself, take a look! ott-jax.readthedocs.io/tutorials/th...

09.10.2025 20:09 · 👍 10 · 🔁 1 · 💬 0 · 📌 0
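A sketch of the standard argument behind that post, writing $T$ for the Monge map pushing the source onto the target (notation mine, not from the post): along the displacement interpolation

```latex
\[
  \varphi_t(x) \;=\; (1-t)\,x + t\,T(x), \qquad t \in [0,1],
\]
% each particle travels with the constant Lagrangian velocity T(x) - x,
% yet the Eulerian field appearing in the continuity equation,
\[
  v_t(z) \;=\; \bigl(T-\mathrm{id}\bigr)\bigl(\varphi_t^{-1}(z)\bigr),
\]
% depends on t through \varphi_t^{-1}: different particles pass through
% the same location z at different times.
```

So even though particle trajectories are straight lines, $v_t$ evaluated at a fixed location is time-dependent; it is constant in $t$ only in degenerate cases, e.g. when $T$ is a pure translation.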
Video thumbnail

LLMs are currently this one big parameter block that stores all sorts of facts. In our new preprint, we add context-specific memory parameters to the model, and pretrain it along with a big bank of memories.

📑 arxiv.org/abs/2510.02375

[1/10] 🧵

06.10.2025 16:06 · 👍 12 · 🔁 4 · 💬 1 · 📌 0

Wow! Finally OT done on the entire training set to train a diffusion model!

04.10.2025 07:03 · 👍 12 · 🔁 3 · 💬 0 · 📌 0

Then there's always ε regularization. When ε=∞, we recover vanilla FM. At this point we're not completely sure whether ε=0 is better than ε>0; they both work! ε=0 has a minor edge at larger scales (sparse gradients, faster assignment, slightly better metrics), but ε>0 is also useful (faster SGD)

04.10.2025 11:21 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
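For context, the ε knob above is the entropic regularization of Sinkhorn. A textbook numpy version (uniform marginals, illustrative ε values, not OTT-JAX's implementation) makes the two regimes concrete:

```python
import numpy as np

def sinkhorn(C, eps, iters=1000):
    """Entropic OT coupling between two uniform discrete measures.

    C: (n, m) cost matrix; eps: regularization strength. Alternately
    rescales rows and columns of the Gibbs kernel exp(-C/eps) until
    both marginals are (approximately) uniform.
    """
    n, m = C.shape
    K = np.exp(-C / eps)
    v = np.ones(m)
    for _ in range(iters):
        u = (1.0 / n) / (K @ v)      # fit row marginals
        v = (1.0 / m) / (K.T @ u)    # fit column marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 1)), rng.standard_normal((4, 1))
C = (x - y.T) ** 2
P_sharp = sinkhorn(C, eps=0.1, iters=5000)  # small eps: near-deterministic pairing
P_blur = sinkhorn(C, eps=100.0)             # huge eps: near-independent coupling
```

As ε grows the coupling tends to the independent product of the marginals, i.e. the random pairing of vanilla FM; ε→0 recovers a hard assignment, consistent with the tradeoff described in the post.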

Thanks for the nice comments! My interpretation is that we're using OT to produce pairs (x_i, y_i) to guide FM. With that, it's up to you to provide an inductive bias (a model) that gets f(x) ≈ y while generalizing. The hard OT assignment could be that model, but it would fail to generalize.

04.10.2025 11:21 · 👍 5 · 🔁 0 · 💬 1 · 📌 0
Post image

For people who like OT, IMHO the very encouraging insight is that we have evidence that the "better" you solve your OT problem, the more flow matching metrics improve; this is Figure 3.

04.10.2025 08:45 · 👍 3 · 🔁 0 · 💬 1 · 📌 0
Post image

Thanks @rflamary.bsky.social! Yes, exactly. We try to summarize this tradeoff in Table 1, in which we show that for a one-off preprocessing cost, we now get all the (noise, data) pairings you might need during flow matching training for "free" (up to the MIPS lookup for each noise sample).

04.10.2025 08:44 · 👍 1 · 🔁 0 · 💬 1 · 📌 0
Preview
Flow Matching with Semidiscrete Couplings Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noi...

the paper is out: arxiv.org/abs/2509.25519

Michal also did a fantastic push to open source the semidiscrete solver prepared by Stephen and Alireza in the OTT-JAX library. We plan to open source the flow pipeline in JAX soon. Please reach out if interested!

03.10.2025 21:02 · 👍 7 · 🔁 0 · 💬 0 · 📌 0
Post image

This is much faster than using Sinkhorn, and generates with higher quality.

As a bonus, you can forget about entropy regularization (set ε=0), apply things like correctors to guidance, and use it on consistency-type models, or even with conditional generation.

03.10.2025 21:00 · 👍 2 · 🔁 0 · 💬 1 · 📌 0

The great thing with SD-OT is that it only needs to be computed once: you store a single real number per data sample, and these numbers can be precomputed once and for all using stochastic convex optimization.

When training a flow model, you assign noise to data using these numbers.

03.10.2025 20:56 · 👍 1 · 🔁 0 · 💬 1 · 📌 0
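A toy numpy illustration of that lookup: one scalar per data point warps the nearest-neighbor cells. The dual vector `g` here is a made-up input; in the paper it comes from stochastic convex optimization of the semidiscrete dual:

```python
import numpy as np

def sd_assign(x, data, g):
    """Semidiscrete OT assignment of one noise sample.

    Given one precomputed dual scalar g[j] per data point, a noise
    sample x is matched to the data point minimizing
    cost(x, y_j) - g[j], a MIPS-style lookup.
    """
    cost = 0.5 * np.sum((data - x) ** 2, axis=1)
    return int(np.argmin(cost - g))

data = np.array([[0.0, 0.0], [10.0, 0.0]])
# With zero potentials this is a plain nearest-neighbor rule:
assert sd_assign(np.array([1.0, 0.0]), data, np.zeros(2)) == 0
# A larger potential on point 1 enlarges its assignment cell:
assert sd_assign(np.array([1.0, 0.0]), data, np.array([0.0, 60.0])) == 1
```

The potentials are exactly the "one real number per data sample" mentioned in the post; once stored, every noise sample during flow training is paired by this argmin alone.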
Post image

In practice, however, this idea only begins to work when using massive batch sizes (see arxiv.org/abs/2506.05526). The problem is that the cost of running Sinkhorn on millions of points can quickly balloon...

Our solution? Rely on semidiscrete OT at scales that were never considered before.

03.10.2025 20:56 · 👍 1 · 🔁 0 · 💬 1 · 📌 0
Post image

Our two phenomenal interns, Alireza Mousavi-Hosseini and Stephen Zhang @syz.bsky.social have been cooking some really cool work with Michal Klein and me over the summer.

Relying on optimal transport couplings (to pick noise and data pairs) should, in principle, be helpful to guide flow matching.

🧵

03.10.2025 20:50 · 👍 30 · 🔁 7 · 💬 2 · 📌 1
Preview
A recent paper from Apple researchers,

New Apple #ML Research Highlight: The "Super Weight:" How Even a Single Parameter can Determine an #LLM's Behavior machinelearning.apple.com/research/the...

21.08.2025 18:13 · 👍 5 · 🔁 2 · 💬 1 · 📌 0

You're right that the PCs' message uses space as a justification to accept fewer papers, but it does not explicitly mention that the acceptance rate should be lower than the historical standard of 25%. In my SAC batch, the average acceptance before their email was closer to 30%, but that's just me..

29.08.2025 11:32 · 👍 4 · 🔁 0 · 💬 2 · 📌 0

I see it a bit differently. The new system pushed reviewers aggressively to react to rebuttals. I think this is a great change, but it has clearly skewed results, creating many spurious grade upgrades. Now the system must be rebalanced in the other direction by SACs/ACs for results to be fair..

29.08.2025 07:05 · 👍 5 · 🔁 0 · 💬 1 · 📌 1
Sharded Sinkhorn - ott 0.5.1.dev34+g3462f28 documentation

Scaling up the computation of optimal transport couplings to hundreds of thousands of 3k-dimensional vectors, made easy using sharding and OTT-JAX! Check this notebook: it only takes a few lines of code, thanks to JAX's native sharding abilities. ott-jax.readthedocs.io/en/latest/tu...

01.08.2025 00:13 · 👍 15 · 🔁 2 · 💬 0 · 📌 0
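The core trick behind sharding Sinkhorn can be mimicked in plain numpy by processing row chunks, so the full n-by-m cost matrix never materializes at once; real sharding in OTT-JAX distributes these chunks across devices via JAX's mesh APIs, but the arithmetic is the same (function name and shapes below are illustrative):

```python
import numpy as np

def kernel_matvec_sharded(x, y, v, eps, n_shards=4):
    """Apply the Gibbs kernel K = exp(-||x_i - y_j||^2 / eps) to v,
    one row shard at a time. Each shard builds only its slice of the
    cost matrix, which is what makes huge Sinkhorn problems fit in
    (distributed) memory.
    """
    out = []
    for xs in np.array_split(x, n_shards):
        C = np.sum((xs[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        out.append(np.exp(-C / eps) @ v)
    return np.concatenate(out)

rng = np.random.default_rng(0)
x, y = rng.standard_normal((8, 3)), rng.standard_normal((5, 3))
v = rng.random(5)
# Sanity check against the dense, fully materialized computation:
full = np.exp(-np.sum((x[:, None] - y[None]) ** 2, axis=-1) / 1.0) @ v
assert np.allclose(kernel_matvec_sharded(x, y, v, eps=1.0), full)
```

Since Sinkhorn only touches the kernel through such matrix-vector products, sharding this one primitive is enough to shard the whole algorithm.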
Preview
FastVLM: Efficient Vision Encoding for Vision Language Models Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a…

New Apple #ML Research Highlight: "FastVLM: Efficient Vision Encoding for Vision Language Models" machinelearning.apple.com/research/fas...

23.07.2025 18:35 · 👍 5 · 🔁 1 · 💬 1 · 📌 0

So pleased and proud to share with you what our team has been up to on an ambitious journey to build a video foundation model for scientific domains! ✨ 🚀 🎞️ 🧪 #ICCV2025 #AI4Science

08.07.2025 11:28 · 👍 10 · 🔁 2 · 💬 0 · 📌 0
Preview
Standard error of what now? The NeurIPS checklist corroborates the bureaucratic theory of statistics.

The NeurIPS paper checklist corroborates the bureaucratic theory of statistics.

03.07.2025 14:40 · 👍 64 · 🔁 13 · 💬 5 · 📌 6
Video thumbnail

Can LLMs access and describe their own internal distributions? With my colleagues at Apple, I invite you to take a leap forward and make LLM uncertainty quantification what it can be.
📄 arxiv.org/abs/2505.20295
💻 github.com/apple/ml-sel...
🧵 1/9

03.07.2025 09:08 · 👍 21 · 🔁 6 · 💬 1 · 📌 0
Preview
Shun-ichi Amari | Kyoto Prize

www.kyotoprize.org/en/laureates...

29.06.2025 19:56 · 👍 2 · 🔁 1 · 💬 0 · 📌 0
Preview
Honorary Researcher Shun-ichi Amari receives the Kyoto Prize: Honorary Researcher Shun-ichi Amari (primary affiliation: Specially Appointed Professor, Advanced Comprehensive Research Organization, Teikyo University) has been awarded the 40th (2025) Kyoto Prize (Advanced Technology category; field: Information Science) in recognition of his pioneering research in artificial neural networks, machine learning, and information geometry.

Shun-ichi Amari has been awarded the 40th (2025) Kyoto Prize in recognition of his pioneering research in the fields of artificial neural networks, machine learning, and information geometry.

www.riken.jp/pr/news/2025...

20.06.2025 13:26 · 👍 35 · 🔁 12 · 💬 2 · 📌 0
Post image

NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs' reasoning robustness by promoting their abstract thinking with granular reinforcement learning.

23.06.2025 14:32 · 👍 7 · 🔁 3 · 💬 1 · 📌 1
https://www.interspeech2025.org/tutorials

Now that @interspeech.bsky.social registration is open, time for some shameless promo!

Sign up and join our Interspeech tutorial: Speech Technology Meets Early Language Acquisition: How Interdisciplinary Efforts Benefit Both Fields. 🗣️👶

www.interspeech2025.org/tutorials

โฌ‡๏ธ (1/2)

27.05.2025 16:14 · 👍 9 · 🔁 5 · 💬 1 · 📌 1
