
Teemu Sarapisto

@tsarpf.bsky.social

CS/ML PhD research in high-dimensional time-series @helsinki.fi Before: 7y of C++/JS/VR/AR/ML at Varjo, Yle, Reaktor, Automattic ... After dark: synthesizers & 3D gfx

55 Followers  |  141 Following  |  29 Posts  |  Joined: 22.11.2024

Latest posts by tsarpf.bsky.social on Bluesky


Maybe I should learn to tag everything like #timeseries #highdimensional #clustering #subsystems 😁

23.11.2025 14:46 — 👍 1    🔁 0    💬 0    📌 0
Post image

This one looks way too much like AI slop, but somehow I like the full story.

23.11.2025 14:42 — 👍 0    🔁 0    💬 0    📌 0
Post image

This one explains the masking process quite well; unfortunately, the power plant segment is a bit unclear.

23.11.2025 14:40 — 👍 0    🔁 0    💬 1    📌 0
Post image

Infographic of my previous paper.

Generated via Nano Banana Pro. These models are getting quite useful! I think it has genuinely good explanations. Maybe I would've opted for a bit less "AI does this and that", and there are some small mistakes, but not enough to distract from the main message.

23.11.2025 14:39 — 👍 1    🔁 0    💬 2    📌 0

If we train LLMs to use browsers, I doubt it makes them 10x more useful, because they are dumb af 😄

My guess is that most progress in "agents" will be driven by human-developed, LLM-friendly APIs rather than by improvements in LLMs' generalization capabilities. No exponential speed-ups there.

08.04.2025 09:50 — 👍 1    🔁 0    💬 1    📌 0

Good point, a sloppy reply from me.

I just meant that language is not the be-all and end-all tool for everything. Sure, LLMs can be trained to use tools like calculators and browsers, just as we do. But so far, we need to develop those tools and train the LLMs to use them.

08.04.2025 09:45 — 👍 1    🔁 0    💬 1    📌 0

What I expect is that scenarios that are particularly economically valuable will get neat automated solutions.

Either via 1000 people annotating data for a year, or a bunch of scientists coming up with neat self-supervised losses for it 😆

04.04.2025 12:52 — 👍 1    🔁 0    💬 0    📌 0

It assumes algorithm development will maintain the exponential progress.

IMO the (multimodal) LLM paradigm of handling everything in a single model will not scale. Language is a bad abstraction for 1) math (LLMs can't multiply) and 2) physical things (where is my cleaning robot?)

End of the sigmoid for data/compute.

04.04.2025 12:50 — 👍 1    🔁 0    💬 2    📌 0

Nice to find you here then!

That'll be a difficult read given my limited background in dynamics/control/RL, but it's on the TODO list.

Coming from ML, Neural ODEs got me hooked on dynamics and state spaces. Also variational math x optimal control is 🔥

Now learning basics from the book by Brunton/Kutz.

30.03.2025 16:55 — 👍 0    🔁 0    💬 1    📌 0

www.foundationalpapersincomplexityscience.org/tables-of-co...

has a nice overview of the papers.

For example
- The 1943 perceptron paper (neural nets)
- Landauer's principle (reversible computing)
- Info theory (Shannon's og paper)
- State space models (Kalman)
...Turing's AI, Nash equilibrium...

28.03.2025 22:22 — 👍 4    🔁 0    💬 1    📌 0

Just received the first volume, and damn, clearly a ton of effort was put into it!

There's a ~90 page intro to "foundations of complexity science" (which is also sold separately).

(The super interesting) papers each have a 5-10 page intro with historical context, and are full of annotations ❤️

28.03.2025 22:15 — 👍 12    🔁 2    💬 1    📌 1

I guess one could call this moving the goalpost so far that nothing will ever suffice 😁

27.03.2025 10:14 — 👍 1    🔁 0    💬 1    📌 0

"If intelligence lies in the process of acquiring new skills, there is no task X such that skill at X demonstrates intelligence"

27.03.2025 10:12 — 👍 0    🔁 0    💬 1    📌 0
GitHub - computerhistory/AlexNet-Source-Code: This package contains the original 2012 AlexNet code. - computerhistory/AlexNet-Source-Code

github.com/computerhist...

20.03.2025 23:02 — 👍 3    🔁 0    💬 0    📌 0
Post image

Ok, heh, well, part of the reason for the \infty in there is that the simulator got stuck in a non-chaotic loop due to integration errors. Grabbing samples from before the looping just gives a more uniform distribution

20.03.2025 18:46 — 👍 0    🔁 0    💬 0    📌 0
Post image

Out of curiosity, I was scatter-plotting a double pendulum's x1/y1/x2/y2 positions in Euclidean coordinates against each other... Turns out they are sooo aesthetic 😍
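For anyone wanting to reproduce these plots, here is a minimal numpy sketch (not the original code; masses, lengths, step size, and the initial condition are all assumed) that integrates a double pendulum with RK4 and collects the Cartesian x1/y1/x2/y2 positions for scatter plotting:

```python
import numpy as np

# Assumed parameters: unit masses and lengths, standard gravity.
g, m1, m2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0

def deriv(s):
    """Equations of motion for state s = [theta1, omega1, theta2, omega2]."""
    t1, w1, t2, w2 = s
    d = t1 - t2
    den = 2 * m1 + m2 - m2 * np.cos(2 * d)
    a1 = (-g * (2 * m1 + m2) * np.sin(t1)
          - m2 * g * np.sin(t1 - 2 * t2)
          - 2 * np.sin(d) * m2 * (w2**2 * L2 + w1**2 * L1 * np.cos(d))) / (L1 * den)
    a2 = (2 * np.sin(d) * (w1**2 * L1 * (m1 + m2)
          + g * (m1 + m2) * np.cos(t1)
          + w2**2 * L2 * m2 * np.cos(d))) / (L2 * den)
    return np.array([w1, a1, w2, a2])

def rk4_step(s, dt):
    # Classic 4th-order Runge-Kutta step.
    k1 = deriv(s)
    k2 = deriv(s + dt / 2 * k1)
    k3 = deriv(s + dt / 2 * k2)
    k4 = deriv(s + dt * k3)
    return s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Simulate and collect Cartesian positions of both bobs.
s = np.array([np.pi / 2, 0.0, np.pi / 2, 0.0])  # assumed initial condition
xs = []
for _ in range(20000):
    s = rk4_step(s, 0.002)
    x1, y1 = L1 * np.sin(s[0]), -L1 * np.cos(s[0])
    x2, y2 = x1 + L2 * np.sin(s[2]), y1 - L2 * np.cos(s[2])
    xs.append((x1, y1, x2, y2))
xs = np.array(xs)
# e.g. plt.scatter(xs[:, 0], xs[:, 2], s=1) plots x1 against x2
```

Pairing any two of the four columns in a scatter plot gives the kind of figures in the post.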

20.03.2025 18:34 — 👍 3    🔁 0    💬 1    📌 0

1) In something like the 37th layer, the model is (weighted-)summing vectors that have already been combined with every other vector in the input sequence 36 times, plus the effect of residual connections and multiple heads.
2) The tokens are (usually) not even full words to begin with. 2/2
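The repeated-mixing point can be made concrete with a toy numpy sketch (not a real transformer; random softmax weights stand in for learned attention, and all names here are made up). Because each layer does residual + attention-weighted summing over all positions, position 0's output already depends on position 4's input after one layer, and the dependence compounds with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 5, 8
x = rng.normal(size=(seq_len, dim))  # toy "token" vectors

def toy_attention_layer(h, layer_rng):
    # Random attention weights (rows sum to 1), plus a residual connection.
    logits = layer_rng.normal(size=(h.shape[0], h.shape[0]))
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return h + weights @ h  # residual + weighted sum over all positions

def dependence(n_layers, eps=1e-6):
    """How strongly does position 0's output depend on position 4's input?
    Measured by perturbing input 4 and observing output 0 (exact here,
    since the toy layer is linear in its input)."""
    x2 = x.copy()
    x2[4] += eps
    a, b = x, x2
    rng_a, rng_b = np.random.default_rng(1), np.random.default_rng(1)
    for _ in range(n_layers):
        a = toy_attention_layer(a, rng_a)  # identical weights for both
        b = toy_attention_layer(b, rng_b)  # trajectories, by construction
    return np.abs(b[0] - a[0]).max() / eps

d1, d3 = dependence(1), dependence(3)
print(d1)  # nonzero already after one layer
print(d3)  # larger: every layer re-mixes all positions
```

By layer 37, "which token contributed what" has been thoroughly smeared out, which is the point of the thread.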

10.03.2025 22:24 — 👍 2    🔁 0    💬 0    📌 0

Great visualizations, and an excellent explanation of the KV cache. But their intuitive reasoning about attention "adding the meaning(s) of words to others" is quite misleading. 1/2

10.03.2025 22:23 — 👍 1    🔁 0    💬 1    📌 0
Preview: Self-supervised contrastive learning performs non-linear system...

Two very interesting ICLR 2025 papers on identifiably (provably) recovering "true"/unique latent states + dynamics/time-evolution from high-dimensional data:

1. VAE + nonlinear ICA-style aux variables
openreview.net/forum?id=d16...
2. Contrastive learning
openreview.net/forum?id=ONf...

05.03.2025 13:00 — 👍 1    🔁 0    💬 0    📌 0

Yeah, it's a bit too silent here, and the recommendation algorithm on bsky is not working great. The amount of clicking "show less like this" I'm doing is stupid.

Meanwhile, every time I check X I find a ton of interesting stuff, unfortunately mixed with a lot of toxic bullshit as well.

05.02.2025 13:08 — 👍 3    🔁 0    💬 0    📌 0

What are you referring to? I've missed this.

05.02.2025 13:04 — 👍 0    🔁 0    💬 1    📌 0

Request and personal opinion: I would prefer if you focused less on the latest hype the AI swindlers are pushing out.

You have had unique angles on the physics stuff. And anyone with a brain can see that even though OpenAI does very cool research, they are over-hyping every single release.

29.01.2025 09:38 — 👍 0    🔁 0    💬 0    📌 0

Basic idea of #DeepSeek R1, the new RL LLM model:

Start from a pretrained LLM and sample K responses per task (e.g., LeetCode problems with N tests). Update the weights to favor better-than-average responses, since the true correct code is unknown. Also reward correct output format.

#LLM #genAI #RL
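The "better-than-average" scoring can be sketched in a few lines of numpy, in the spirit of the group-relative advantages R1's training uses. To be clear, this is a toy illustration with made-up reward numbers, not DeepSeek's implementation; the actual policy-gradient update, clipping, and KL term are omitted:

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each of K sampled responses for one task relative to the
    group mean, normalized by the group's reward spread."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Toy rewards for K=4 responses: fraction of N tests passed, plus a
# small bonus for a correctly formatted answer (weights are invented).
tests_passed = np.array([0.0, 0.5, 1.0, 1.0])
format_ok = np.array([1.0, 1.0, 1.0, 0.0])
rewards = tests_passed + 0.1 * format_ok

adv = group_relative_advantages(rewards)
# Above-average responses get positive advantage (their tokens are made
# more likely); below-average ones get negative advantage.
print(adv)
```

The point of the relative scoring is that no reference solution is needed, only a way to rank the K samples against each other.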

28.01.2025 14:33 — 👍 3    🔁 0    💬 0    📌 0

And oh yeah, nice visualization! I really liked being able to compare the ELBO and log(z).

05.12.2024 13:56 — 👍 1    🔁 0    💬 1    📌 0

I've taken one course in Bayesian ML, so I barely know the basics 😄

But somehow the fact that there are no consistency/identifiability guarantees even with infinite data makes me afraid of VI 😅

2/2

05.12.2024 13:55 — 👍 1    🔁 0    💬 1    📌 0

IRL we don't know the shape of the true posterior (or log Z). When can you trust the approximation enough to "dare" estimate uncertainty?

In practice, would you, e.g., try adding GMM components to boost the ELBO? You'd need to keep everything else fixed for comparability, right?

1/2
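The comparison in the question can be sketched numerically: estimate the Monte Carlo ELBO, E_q[log p̃(x) - log q(x)], for two fixed variational families against the same unnormalized target. This is a toy 1D example with a known Z (chosen only so the ELBO ≤ log Z gap is visible); all distributions and parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Unnormalized bimodal target p~(x) with normalizer Z = 3 (known here
# only so we can check ELBO <= log Z; unknown in real problems).
Z = 3.0
def log_p_tilde(x):
    m = 0.5 * np.exp(log_gauss(x, -2.0, 0.7)) + 0.5 * np.exp(log_gauss(x, 2.0, 0.7))
    return np.log(m) + np.log(Z)

def elbo_mc(sample_q, log_q, n=100_000):
    """Monte Carlo ELBO estimate: E_q[log p~(x) - log q(x)]."""
    x = sample_q(n)
    return np.mean(log_p_tilde(x) - log_q(x))

# q1: a single broad Gaussian (misses the bimodal shape).
q1_sample = lambda n: rng.normal(0.0, 2.0, n)
q1_logpdf = lambda x: log_gauss(x, 0.0, 2.0)

# q2: a two-component GMM matching both modes.
def q2_sample(n):
    comp = rng.integers(0, 2, n)
    return rng.normal(np.where(comp == 0, -2.0, 2.0), 0.7)
def q2_logpdf(x):
    return np.log(0.5 * np.exp(log_gauss(x, -2.0, 0.7))
                  + 0.5 * np.exp(log_gauss(x, 2.0, 0.7)))

e1 = elbo_mc(q1_sample, q1_logpdf)
e2 = elbo_mc(q2_sample, q2_logpdf)
# The extra component closes the KL gap, so the ELBO rises toward log Z.
print(e1, e2, np.log(Z))
```

Everything except the variational family is held fixed, which is the comparability condition the question raises: with the same p̃ and the same estimator, a higher ELBO means a smaller KL to the true posterior.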

05.12.2024 13:54 — 👍 1    🔁 0    💬 1    📌 0

For the past 2 years, every time I've tried to use jax-metal, it has either refused to work at all due to unimplemented features, or produced wrong results in a very simple test scenario. So I just use the CPU version on my M2...

05.12.2024 13:28 — 👍 1    🔁 0    💬 1    📌 0

Interested to see if a proper academic bsky will happen! The amount of great papers and memes I found on Twitter made me stick with it this long, but something new would definitely be nice.

22.11.2024 10:58 — 👍 1    🔁 0    💬 0    📌 0
Post image Post image Post image

Hi, let's try bsky!

New paper: Subsystem Discovery in High-Dimensional Time-Series Using Masked Autoencoders

Code/data/paper: github.com/helsinki-sda...

Presented at European Conference on Artificial Intelligence 2024

Map graph learned from weather #timeseries, adjacency from 7 engines!
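The core idea of the paper, mask part of the data and see which channels can reconstruct each other, can be illustrated with a toy numpy sketch. This is emphatically not the paper's method (which uses a masked autoencoder; see the repo above): the data is synthetic, reconstruction is linear, and every number is invented, purely to show the mask-and-reconstruct intuition behind subsystem discovery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate time series with two hidden "subsystems":
# channels 0-2 driven by one latent signal, channels 3-5 by another.
T = 2000
z1 = np.sin(0.07 * np.arange(T))
z2 = np.cos(0.031 * np.arange(T))
mix = np.zeros((6, 2))
mix[:3, 0] = rng.uniform(0.5, 1.5, 3)
mix[3:, 1] = rng.uniform(0.5, 1.5, 3)
X = mix @ np.stack([z1, z2]) + 0.05 * rng.normal(size=(6, T))

# Mask one channel at a time and reconstruct it (here: linearly) from
# another; channels that reconstruct each other well are grouped into
# the same subsystem.
adjacency = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        if i == j:
            continue
        # Least-squares fit of masked channel i from channel j, scored by R^2.
        beta = (X[j] @ X[i]) / (X[j] @ X[j])
        resid = X[i] - beta * X[j]
        adjacency[i, j] = 1 - resid.var() / X[i].var()

within = adjacency[:3, :3][~np.eye(3, dtype=bool)].mean()
across = adjacency[:3, 3:].mean()
print(within, across)  # within-subsystem R^2 should be much higher
```

Thresholding such an adjacency matrix recovers the two blocks, i.e., the subsystems; the paper does the reconstruction with a masked autoencoder instead of a per-pair linear fit.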

22.11.2024 10:56 — 👍 2    🔁 0    💬 0    📌 0
