But if depth creates an inductive bias toward low-rank solutions, and this bias helps with learning task competence, it feels like that should somehow affect the local weight landscape. Any thoughts?
Really cool work! I'm curious whether you see any connection to your earlier results on low-rank bias scaling with depth. I see that Fig 8 shows that depth + random init alone isn't enough to give task competence with RandOpt.
Thank you!
@nsaphra.bsky.social I want to learn esoteric details about training run variance, where do you recommend I start?
PAPER OUT ✨ How can we make smart microscopy more interoperable? What are the technical and cultural challenges? 30+ people from academia and industry propose a roadmap: doi.org/10.1515/mim-... Also a review of applications and repo of implementations. Join the discussion! smartmicroscopy.github.io
Bringing old-school info theory into modern-day imaging with @henrypinkard.bsky.social @lakabuli.bsky.social : eecs.berkeley.edu/2026/01/lens...
A Comment discusses how commands, data, and metadata currently discarded by scientific instruments could be used to train AI systems to learn to conduct experiments. @henrypinkard.bsky.social @nilsnorlin.bsky.social
www.nature.com/articles/s41...
Thanks! If it ever seems useful/relevant to your work, I'd be happy to chat about it.
Would be a fascinating result if the structure of human-created language contains most of the information for its semantic meaning, and yet humans aren't able to understand it in isolation!
If you're able to generate a large enough dataset of BLANKed-out passages, just using the cross-entropy bound from training another transformer could probably get pretty close to the joint entropy (if I'm thinking about this right).
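A toy sketch of why that bound works, assuming a categorical distribution over a handful of hypothetical fill-ins for a single BLANK: any model q has cross-entropy H(p, q) >= H(p) (Gibbs' inequality), with the gap equal to KL(p || q), so a better-trained predictor gives a tighter upper bound. For whole passages the same holds for the joint entropy, since an autoregressive model's summed per-token losses are the cross-entropy of the full sequence. The distributions and function names below are illustrative only.

```python
import numpy as np

def entropy_bits(p):
    """Exact Shannon entropy H(p) in bits."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def cross_entropy_bits(p, q):
    """Exact cross-entropy H(p, q) = -sum_x p(x) log2 q(x); Gibbs' inequality gives H(p, q) >= H(p)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(q[nz])))

def empirical_cross_entropy_bits(samples, q):
    """What training actually gives you: mean -log2 q(x) over samples drawn from the unknown p."""
    q = np.asarray(q, dtype=float)
    return float(np.mean(-np.log2(q[samples])))

# Hypothetical toy: p is the "true" distribution over 4 possible fill-ins for one BLANK;
# q_uniform and q_trained stand in for a weak and a well-trained predictive model.
p = np.array([0.5, 0.25, 0.125, 0.125])
q_uniform = np.full(4, 0.25)
q_trained = np.array([0.48, 0.26, 0.13, 0.13])

rng = np.random.default_rng(0)
samples = rng.choice(len(p), size=200_000, p=p)

print(entropy_bits(p))                                    # 1.75 bits
print(cross_entropy_bits(p, q_uniform))                   # 2.0 bits
print(cross_entropy_bits(p, q_trained))                   # ~1.751 bits, just above H(p)
print(empirical_cross_entropy_bits(samples, q_trained))   # Monte Carlo estimate of the line above
```

In practice only the last quantity is available (the model's held-out loss), which is why the estimate is an upper bound rather than the entropy itself.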
Me too?
And I wonder how this would compare to the complement: ablating structure but preserving content words. Thoughts?
Really cool paper. My intuition upon reading: Jabberwockified text must maintain low Kolmogorov complexity conditional on the retained structure, which implies the grammar itself IS a standalone message.
I also recently developed a technique that could be applied to similar experiments in the blurry-image case:
bsky.app/profile/henr...
In theory, one could test this quantitatively by comparing -log p(the passage) with the conditional entropy of the BLANK words given the known structure. It might be that this decoding is in fact quite easy from an information-theoretic perspective, but just uniquely unintuitive to humans.
Hey, really cool paper. I especially like the blurry, upside-down image of text experiment.
I wonder how to disambiguate whether this phenomenon arises from LLMs being able to decode with very little information vs. that particular decoding just being ill-suited to human priors.
Excited to share our new paper on the future of autonomous scientific laboratory work (together with @henrypinkard.bsky.social). Perhaps the path to intelligent scientific instruments starts with rethinking what data we save?
www.nature.com/articles/s41...
rdcu.be/eW7SU
Most scientific instruments throw away exactly the data AI would need to learn how to operate them.
In @natmethods.nature.com this month, @nilsnorlin.bsky.social and I describe how capturing this data could let us train AI to run experiments like expert scientists.
doi.org/10.1038/s415...
And a big thank-you to @annalenakofler.bsky.social for design inspiration!
Come see the poster in person this Wednesday at #NeurIPS2025!
Joint work with @lakabuli.bsky.social, Eric Markley, Tiffany Chien, Jiantao Jiao @optrickster.bsky.social
Information estimation predicts the performance of algorithms performing downstream tasks using measurements across all tested domains, meaning optimal designs can be found without the complexity, compute, and considerations of downstream processing.
On the other hand, noise processes in imaging systems are well understood or can be easily measured, reducing the second entropy (that of the noise) to a straightforward calculation.
For the measurements, we fit a probabilistic model to upper bound their true entropy.
The key is to separately estimate the diversity of the measurements and the diversity of the noise process.
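A minimal sketch of that two-entropy decomposition, I(X;Y) = H(Y) - H(Y|X), under loud simplifying assumptions: additive Gaussian noise with a known per-pixel sigma (so H(Y|X) has a closed form) and a plain multivariate Gaussian as the probabilistic model for the measurements. Both are stand-ins chosen for illustration; the function names and toy data are hypothetical, and the models used in the actual work may be richer.

```python
import numpy as np

def gaussian_entropy_upper_bound(y):
    """Upper-bound H(Y) in nats by fitting a multivariate Gaussian to samples y, shape (n, d).

    A Gaussian has the maximum differential entropy for a given covariance, so its
    entropy bounds H(Y) from above (up to finite-sample error in the covariance)."""
    d = y.shape[1]
    cov = np.cov(y, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def gaussian_noise_entropy(d, sigma):
    """H(Y|X) in nats for i.i.d. additive Gaussian noise with known std sigma on each of d pixels."""
    return 0.5 * d * np.log(2 * np.pi * np.e * sigma**2)

def estimate_mutual_information(y, sigma):
    """I(X;Y) = H(Y) - H(Y|X); upper-bounding H(Y) makes this an upper-bound estimate."""
    return gaussian_entropy_upper_bound(y) - gaussian_noise_entropy(y.shape[1], sigma)

rng = np.random.default_rng(0)
n, d, sigma = 50_000, 4, 0.5
x = rng.normal(0.0, 1.0, size=(n, d))          # hypothetical noiseless "objects": 4 i.i.d. pixels
y = x + rng.normal(0.0, sigma, size=(n, d))    # measurements corrupted by known Gaussian noise

est = estimate_mutual_information(y, sigma)    # uses only y and sigma, never x
exact = 0.5 * d * np.log(1 + 1.0 / sigma**2)   # closed form, valid only for this Gaussian toy
print(est, exact)
```

Because x is Gaussian in this toy, y is exactly Gaussian and the estimate agrees with the closed form. For non-Gaussian objects the Gaussian fit overestimates H(Y), so the result is an upper bound on I(X;Y), and a better-fitting density model tightens it.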
And it is differentiable, enabling the automated discovery of new designs in simulation.
It is field-deployable since it does not require knowledge of the objects being imaged.
It can be broadly applied across diverse imaging systems in photography, microscopy, and astronomy.