I do wonder how much of that fat tail is lost in simulation, and what the downstream effects are. Model autophagy disorder is not something you'd want for models making life-or-death decisions.
07.11.2025 06:46

@nasimrahaman.bsky.social
👨‍🍳 @ Tiptree Systems. Previously, deep learning & espresso slurping @ Mila and Max-Planck Institute for Intelligent Systems. Before that, physics and more deep learning @ Uni Heidelberg. 📍 Berlin
The purpose of a document is to help an LLM answer questions about said document.
06.07.2025 15:33

Imagine where we would be today if an overpowered Nazi Germany did not have the US and USSR to counterbalance it. Or just watch Man in the High Castle.
29.01.2025 22:40

Re: Dario's post (darioamodei.com/on-deepseek-...). The current state of the US has left me scared shitless about a unipolar world. A single, powerful pole with compromised institutions is the worst possible outcome for human civilization.
29.01.2025 22:36

I'm low key disappointed r1 doesn't swear in its CoT.
Tired: "Wait…"
Wired: "hol the fuck up"
And in that world, the quality of the weapon is how cheap it runs for how good it is.
I won't be surprised if nation states eventually train and host their own models. Heck, some LLM shops seem to be betting on that, e.g. Mistral and AlephAlpha with "European LLMs".
We're approaching an era of memetic warfare where LLMs are the weapons. We're not there yet (the values espoused by Chinese LLMs aren't all that different from American ones), but that's for now.
But once LLMs become our primary interface with the outside world, it's bound to happen.
An underappreciated takeaway from Gwern's recent post (linked) is that LLMs are like vessels of human culture. They're distribution channels for values.
gwern.net/creative-ben...
Looks nice! Some FastAPI endpoints + a docker image should help adoption. :)
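To make the suggestion concrete, here's a minimal sketch of such a wrapper; the endpoint name, payload shape, and `run_model` hook are hypothetical placeholders, not the project's actual API:

```python
# Minimal FastAPI wrapper sketch. `run_model` stands in for whatever
# the project's actual entry point is.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Request(BaseModel):
    text: str

def run_model(text: str) -> list[float]:
    # Stub: replace with the real model call.
    return [float(len(text))]

@app.post("/embed")
def embed(req: Request) -> dict:
    return {"embedding": run_model(req.text)}

# Containerize with a Dockerfile along the lines of:
#   FROM python:3.11-slim
#   RUN pip install fastapi uvicorn
#   CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```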
17.01.2025 22:02

The @jina.ai team is low key cracked. No yapping, just shipping. 🫡
12.01.2025 22:20

Finished the first season of Severance. Brb implementing memory for my LLM agents.
08.01.2025 22:50

This is fascinating work, congratulations!
Question: the point that architectural constraints (locality + equivariance) are sufficient is well demonstrated.
But do you think they are necessary? I.e., would you expect a diffusion transformer to learn these constraints on its own?
In other words, code 1 is more "multi-agent" than the others.
What do I mean when I say "agent"? A system that we'd like to abstract away like a black box (Rovelli's definition). Of those, I count three in code 1, and one in both codes 2 and 3.
Because code 1 is most explicit about the structure of the computational graph. :)
bsky.app/profile/nasi...
If I were to force an answer, I'd say code 1 (prompt chaining) has more agent energy than the others.
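To make "three agents" concrete, here's a minimal sketch of prompt chaining (not the actual codes from the linked post; `llm` is a hypothetical stand-in for any completion call):

```python
# Prompt chaining: three black-box calls, each abstractable as an
# "agent" in Rovelli's sense.

def llm(prompt: str) -> str:
    # Stub: swap in a real model client here.
    return f"<model response to: {prompt[:40]}...>"

def answer(document: str, question: str) -> str:
    outline = llm(f"Outline the key claims in:\n{document}")        # agent 1
    critique = llm(f"List weaknesses in this outline:\n{outline}")  # agent 2
    return llm(f"Answer {question!r} given:\n{outline}\nCaveats:\n{critique}")  # agent 3
```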
20.12.2024 23:35

Claude is something special.
16.12.2024 22:39

Have the token-level LLM predict "concept tokens". The hidden states for these tokens go into an adapter, and out come concept representations. Concept tokens attend to previous concept tokens, and perhaps also to the span between themselves and the previous concept token.
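A minimal PyTorch sketch of that wiring (module names, dimensions, and the reduction of concept-level attention to a single causal self-attention layer are all my assumptions, not the proposal's actual architecture):

```python
import torch
import torch.nn as nn

class ConceptHead(nn.Module):
    """Adapter from token-level hidden states to concept space, plus
    causal attention of concept tokens over previous concept tokens."""

    def __init__(self, d_model: int = 768, d_concept: int = 1024, n_heads: int = 8):
        super().__init__()
        self.adapter = nn.Linear(d_model, d_concept)
        self.attn = nn.MultiheadAttention(d_concept, n_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor, concept_mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); concept_mask: (batch, seq) bool,
        # True where the token-level LLM emitted a concept token.
        b, s, d = hidden.shape
        # Assumes every row in the batch has the same number of concept tokens.
        gathered = hidden[concept_mask].view(b, -1, d)
        concepts = self.adapter(gathered)  # (b, n_concepts, d_concept)
        n = concepts.shape[1]
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=hidden.device), diagonal=1)
        out, _ = self.attn(concepts, concepts, concepts, attn_mask=causal)
        return out  # concept representations, one per concept token
```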
14.12.2024 05:43

Very cool work from Meta AI: Large Concept Models. Idea: autoregress in the space of sentence-level representations.
I think an interesting next step would be to layer this on conventional LLMs / token prediction models.
Here's how that could work: ⬇️
Microsoft just released a tool that lets you convert Office files to Markdown. Never thought I'd see the day.
Google also added Markdown export to Google Docs a few months ago.
github.com/microsoft/markitdown
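For reference, a usage sketch based on the repo's README (check there for the current interface):

```python
from markitdown import MarkItDown  # pip install markitdown

md = MarkItDown()
result = md.convert("quarterly_report.docx")  # also handles .pptx, .xlsx, .pdf, ...
print(result.text_content)  # the Markdown rendering of the document
```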
See paper for more.
+ Alejandro is at NeurIPS and figuring out where to do his PhD. Wink wink nudge nudge.
www.linkedin.com/in/alejandro...
Results? Good stuff.
🧵⤵️
The idea is to model two things:
(a) if concepts fit together to make good art, and
(b) if people have already thought about that combination of concepts ("cognitive availability").
Seek out the combos for which (a) is true but (b) isn't, and ask a text-to-image model to render that.
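A toy sketch of that selection loop (the two scoring functions are hypothetical stand-ins for the learned models; see the paper for the real thing):

```python
from itertools import combinations

def novel_combos(concepts, fit_score, availability_score, k=2,
                 fit_thresh=0.8, avail_thresh=0.2):
    """Yield concept combos that fit together well (a) but are not
    cognitively available (b): good art nobody has thought of yet."""
    for combo in combinations(concepts, k):
        if fit_score(combo) > fit_thresh and availability_score(combo) < avail_thresh:
            yield combo

# Each surviving combo then becomes a text-to-image prompt, e.g.:
#   prompt = "A painting of " + " and ".join(combo)
```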
🧵⤵️
My MSc advisee (& gang) cooked.
tl;dr: a cute technique to get machines to be more creative.
🧵⤵️
If you aren't already doing this: share that PDF with Claude.
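If you'd rather do it through the API than the app, the Anthropic SDK accepts PDFs as document blocks; a sketch (the model name and exact availability are assumptions, check the current docs):

```python
import base64
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from env

client = anthropic.Anthropic()
pdf_b64 = base64.standard_b64encode(open("paper.pdf", "rb").read()).decode()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # any PDF-capable model
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "document",
             "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64}},
            {"type": "text", "text": "What's the key idea of this paper, in two sentences?"},
        ],
    }],
)
print(message.content[0].text)
```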
07.12.2024 22:50

But it's less true for diffusion models that are trained on a good chunk of their domain.
This means there's less space for models to disagree. This is related to the point @preetumnakkiran.bsky.social was making around data augmentation in response to OP. See also: Figure 7 in the attached paper.
Perhaps this has to do with the "size" of the training distribution's support relative to the "size" of its domain.
Intuition: you can have two very different models agree with each other on a small subset of their domain. This is true for models trained on images (assuming the manifold hypothesis).
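A toy illustration of that intuition (my construction, not from the thread): two very different regressors fit on data supported on a thin 1-D manifold inside a 2-D domain tend to agree on the support and disagree off it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=(500, 1))
X_train = np.hstack([t, t**2])  # support: the parabola x2 = x1^2
y_train = np.sin(3 * t).ravel()

m1 = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0).fit(X_train, y_train)
m2 = RandomForestRegressor(random_state=0).fit(X_train, y_train)

on_manifold = np.hstack([t, t**2])
off_manifold = rng.uniform(-1, 1, size=(500, 2))  # generic points in the domain

print("disagreement on support: ", np.abs(m1.predict(on_manifold) - m2.predict(on_manifold)).mean())
print("disagreement off support:", np.abs(m1.predict(off_manifold) - m2.predict(off_manifold)).mean())
```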
I pivoted away from geospatial after being in a call where words like "battlespace" were thrown around, so I know what you mean. I wanted to prevent forest fires.
That said, I know good folks working on cell tracking to understand plant growth and maybe better food production.
Hard stuff.
🫡
27.11.2024 10:22

Yep, I know exactly what you mean; also why I say they were light on this part.
They seem to have made the assumption that the server can call back to the client, and that the client is expected to respond synchronously. This does leave a clue as to how the client is set up (= not in a fancy way).
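For concreteness, a sketch of the pattern being assumed (framing and method names invented for illustration, since the spec in question isn't quoted here):

```python
import json
import socket

def server_callback(conn: socket.socket, method: str, params: dict) -> dict:
    """The server sends a request *to* the client over the existing
    connection, then blocks until the client replies on the same socket.
    That blocking read is what pins the client to a simple, synchronous
    design (= not fancy)."""
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    conn.sendall((json.dumps(request) + "\n").encode())
    reply = conn.makefile().readline()  # synchronous wait for the client
    return json.loads(reply)
```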