Nasim Rahaman

@nasimrahaman.bsky.social

πŸ‘¨β€πŸ³ @ Tiptree Systems. Previously, deep learning & espresso slurping @ Mila and Max-Planck Institute for Intelligent Systems. Before that, physics and more deep learning @ Uni Heidelberg. πŸ“Berlin

90 Followers  |  556 Following  |  43 Posts  |  Joined: 09.11.2024

Latest posts by nasimrahaman.bsky.social on Bluesky

I do wonder how much of that fat tail is lost in simulation, and what the downstream effects are. Model autophagy disorder is not something you’d want in models making life-or-death decisions.

07.11.2025 06:46 — 👍 1    🔁 0    💬 0    📌 0

The purpose of a document is to help an LLM answer questions about said document.

06.07.2025 15:33 — 👍 0    🔁 0    💬 0    📌 0

Imagine where we would be today if an overpowered Nazi Germany had not had the US and the USSR to counterbalance it. Or just watch Man in the High Castle.

29.01.2025 22:40 — 👍 1    🔁 0    💬 0    📌 0
Dario Amodei — On DeepSeek and Export Controls

Re: Dario’s post (darioamodei.com/on-deepseek-...) — the current state of the US has left me scared shitless about a unipolar world. A single, powerful pole with compromised institutions is the worst possible outcome for human civilization.

29.01.2025 22:36 — 👍 2    🔁 0    💬 1    📌 0

I’m low-key disappointed r1 doesn’t swear in its CoT.

Tired: “Wait…”

Wired: “hol the fuck up”

25.01.2025 23:10 — 👍 1    🔁 0    💬 0    📌 0

And in that world, the quality of a weapon is how cheaply it runs for how good it is.

I won’t be surprised if nation states eventually train and host their own models. Heck, some LLM shops seem to be betting on that, e.g. Mistral and Aleph Alpha with “European LLMs”.

17.01.2025 22:56 — 👍 0    🔁 0    💬 0    📌 0

We’re approaching an era of memetic warfare where LLMs are the weapons. We’re not there yet — the values espoused by Chinese LLMs aren’t all that different from American ones — but that’s only for now.

But once LLMs become our primary interface with the outside world, it’s bound to happen.

17.01.2025 22:56 — 👍 1    🔁 0    💬 1    📌 0
Towards Benchmarking LLM Diversity & Creativity: Discussion of possible tasks to measure LLM capabilities in soft ‘creative’ tasks like brainstorming or editing, to quantify failures in creative writing domains.

An underappreciated takeaway from Gwern’s recent post (linked) is that LLMs are like vessels of human culture. They’re distribution channels for values.

gwern.net/creative-ben...

17.01.2025 22:56 — 👍 1    🔁 0    💬 1    📌 0

Looks nice! Some FastAPI endpoints + a Docker image should help adoption. :)

17.01.2025 22:02 — 👍 0    🔁 0    💬 0    📌 0

The @jina.ai team is low-key cracked. No yapping, just shipping. 🫡

12.01.2025 22:20 — 👍 0    🔁 0    💬 0    📌 0

Finished the first season of Severance. Brb implementing memory for my LLM agents.

08.01.2025 22:50 — 👍 0    🔁 0    💬 0    📌 0

This is fascinating work, congratulations!

Question: the point that architectural constraints (locality + equivariance) are sufficient is well demonstrated.

But do you think they are necessary? I.e., would you expect a diffusion transformer to learn these constraints on its own?

01.01.2025 10:41 — 👍 1    🔁 0    💬 0    📌 0

In other words, code 1 is more “multi-agent” than the others.

What do I mean when I say “agent”? A system that we’d like to abstract away as a black box (Rovelli’s definition). Of those, I count three in code 1, and one each in codes 2 and 3.

21.12.2024 22:39 — 👍 2    🔁 0    💬 0    📌 0
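(The three codes aren’t reproduced here, but as a rough illustration, a prompt chain in the spirit of “code 1” might look like the sketch below. The `llm` helper and the three-step chain are hypothetical stand-ins, not the snippet being discussed.)

```python
# Hypothetical sketch of a prompt chain; not the original "code 1".
def llm(prompt: str) -> str:
    # Stand-in for a call to a language model.
    raise NotImplementedError

def write_report(topic: str) -> str:
    outline = llm(f"Outline a short report on {topic}.")        # black box 1
    draft = llm(f"Expand this outline into prose:\n{outline}")  # black box 2
    return llm(f"Edit this draft for clarity:\n{draft}")        # black box 3
```

Three opaque model calls, each a system you’d abstract away: three agents by the definition above.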

Because code 1 is most explicit about the structure of the computational graph. :)

bsky.app/profile/nasi...

21.12.2024 22:29 — 👍 2    🔁 0    💬 1    📌 0

If I were to force an answer, I’d say code 1 (prompt chaining) has more agent energy than the others.

20.12.2024 23:35 — 👍 0    🔁 0    💬 1    📌 0

Claude is something special.

16.12.2024 22:39 — 👍 0    🔁 0    💬 0    📌 0
Large Concept Models: Language Modeling in a Sentence Representation Space | Research - AI at Meta: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of...

Paper here:

14.12.2024 05:43 — 👍 0    🔁 0    💬 0    📌 0

Have the token-level LLM predict “concept tokens”. The hidden states for these tokens go into an adapter, and out come concept representations. Each concept token attends to previous concept tokens, and perhaps also to the span between itself and the previous concept token.

14.12.2024 05:43 — 👍 0    🔁 0    💬 1    📌 0
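A minimal sketch of that adapter wiring, assuming PyTorch; the module name, shapes, and projection architecture are my guesses, not the paper’s:

```python
import torch
import torch.nn as nn

class ConceptAdapter(nn.Module):
    """Hypothetical adapter: maps the LLM's hidden states at concept-token
    positions to concept representations."""

    def __init__(self, d_model: int, d_concept: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_concept),
            nn.GELU(),
            nn.Linear(d_concept, d_concept),
        )

    def forward(self, hidden_states: torch.Tensor, concept_positions: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model) from the token-level LLM.
        # concept_positions: (batch, n_concepts) long indices of concept tokens.
        idx = concept_positions.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        concept_hidden = torch.gather(hidden_states, 1, idx)  # (batch, n_concepts, d_model)
        return self.proj(concept_hidden)  # (batch, n_concepts, d_concept)
```

The concept-to-concept (and concept-to-span) attention pattern would then be a custom attention mask on the base model, which this sketch omits.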

Very cool work from Meta AI: Large Concept Models. Idea: autoregress in the space of sentence-level representations.

I think an interesting next step would be to layer this on conventional LLMs / token-prediction models.

Here’s how that could work: ⬇️

14.12.2024 05:43 — 👍 1    🔁 0    💬 1    📌 0
GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.

Microsoft just released a tool that lets you convert Office files to Markdown. Never thought I'd see the day.

Google also added Markdown export to Google Docs a few months ago.

github.com/microsoft/markitdown

13.12.2024 20:25 — 👍 529    🔁 127    💬 24    📌 24
Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art. While AI models have demonstrated remarkable capabilities in constrained domains like game strategy, their potential for genuine creativity in open-ended domains like art remains debated. We explore t...

See paper for more.

+ Alejandro is at NeurIPS and figuring out where to do his PhD. Wink wink nudge nudge.

www.linkedin.com/in/alejandro...

13.12.2024 09:41 — 👍 0    🔁 0    💬 0    📌 0

Results? Good stuff.

🧵⬇️

13.12.2024 09:41 — 👍 0    🔁 0    💬 1    📌 0

The idea is to model two things:
(a) whether concepts fit together to make good art, and
(b) whether people have already thought about that combination of concepts (“cognitive availability”).

Seek out the combos for which (a) holds but (b) doesn’t, and ask a text-to-image model to render them.

🧵⬇️

13.12.2024 09:41 — 👍 0    🔁 0    💬 1    📌 0
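In pseudocode terms, the selection rule might look like the sketch below; `fit_score` and `availability_score` are hypothetical placeholders for the learned models in the paper, and the concept list is made up:

```python
# Toy sketch of "fits well but cognitively unavailable" selection.
import random
from itertools import combinations

rng = random.Random(0)
concepts = ["jellyfish", "cathedral", "circuit board", "glacier"]

def fit_score(a: str, b: str) -> float:
    # (a) Would these two concepts blend into good art? Placeholder.
    return rng.random()

def availability_score(a: str, b: str) -> float:
    # (b) Has this combination likely occurred to people already? Placeholder.
    return rng.random()

# Rank pairs by high fit and low availability, then hand the winners
# to a text-to-image model as prompts.
pairs = list(combinations(concepts, 2))
scores = {pair: fit_score(*pair) - availability_score(*pair) for pair in pairs}
for a, b in sorted(pairs, key=scores.get, reverse=True)[:2]:
    print(f"prompt: an artwork blending '{a}' and '{b}'")
```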

My MSc advisee (& gang) cooked.

tl;dr — a cute technique to get machines to be more creative.

🧵⬇️

13.12.2024 09:41 — 👍 1    🔁 0    💬 1    📌 0

If you aren’t already doing this: share that PDF with Claude.

07.12.2024 22:50 — 👍 0    🔁 0    💬 0    📌 0
On the Spectral Bias of Neural Networks: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy. In this work, we present properties of neural networks that ...

But it’s less true for diffusion models that are trained on a good chunk of their domain.

This means there’s less room for the models to disagree. This is related to the point @preetumnakkiran.bsky.social was making about data augmentation in response to the OP. See also: Figure 7 in the attached paper.

01.12.2024 11:05 — 👍 1    🔁 0    💬 1    📌 0

Perhaps this has to do with the “size” of the training distribution’s support relative to the “size” of its domain.

Intuition: two very different models can agree with each other on a small subset of their domain. This is the case for models trained on images (assuming the manifold hypothesis).

01.12.2024 11:05 — 👍 1    🔁 0    💬 1    📌 0
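A tiny numerical illustration of that intuition (my example, not from the thread): two polynomials that agree exactly on a small support set and still diverge off it:

```python
# Two very different models can agree on a small support and
# disagree everywhere else in the domain.
import numpy as np

support = np.array([-1.0, 0.0, 1.0])  # stand-in for the training support
targets = np.array([1.0, 0.0, 1.0])

# Model A: the unique quadratic through the three points (x^2).
model_a = np.polynomial.Polynomial.fit(support, targets, deg=2)
# Model B: model A plus a term that vanishes on the support.
model_b = model_a + 5.0 * np.polynomial.Polynomial.fromroots(support)

print(np.allclose(model_a(support), model_b(support)))  # True: agree on support
print(model_a(2.0), model_b(2.0))                       # ~4.0 vs ~34.0 off-support
```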

I pivoted away from geospatial after being in a call where words like “battlespace” were thrown around, so I know what you mean. I wanted to prevent forest fires.

That said, I know good folks working on cell tracking to understand plant growth and maybe improve food production.

Hard stuff.

30.11.2024 08:21 — 👍 1    🔁 0    💬 0    📌 0

🫡

27.11.2024 10:22 — 👍 0    🔁 0    💬 0    📌 0

Yep, I know exactly what you mean — also why I say they were light on this part.

They seem to have assumed that the server can call back to the client, and that the client is expected to respond synchronously. This does leave a clue as to how the client is set up (= not in a fancy way).

27.11.2024 08:09 — 👍 1    🔁 0    💬 0    📌 0
