 
                        
                Why Busy Beaver Hunters Fear the Antihydra
                In which I explore the biggest barrier in the busy beaver game. What is Antihydra, what is the Collatz conjecture, how are they connected, and what makes them so daunting?
            
        
    
    
            I published a new post on my rarely updated personal blog! It's a sequel of sorts to my Quanta coverage of the Busy Beaver game, focusing on a particularly fearsome Turing machine known by the awesome name Antihydra.
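For anyone meeting it for the first time, the Collatz map that the post builds on is easy to state; here is a minimal Python sketch (mine, not code from the blog post):

```python
def collatz_step(n: int) -> int:
    """One step of the Collatz map: halve even numbers, send odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def collatz_trajectory(n: int) -> list[int]:
    """Iterate until reaching 1, which the Collatz conjecture says always happens."""
    trajectory = [n]
    while n != 1:
        n = collatz_step(n)
        trajectory.append(n)
    return trajectory

print(len(collatz_trajectory(27)) - 1)  # 27 famously takes 111 steps to reach 1
```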
               
            
            
27.10.2025 16:04
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
                Alexander horned sphere - Wikipedia
                
            
        
    
    
The video is a fun watch, but if you don't check it out, the counterexample I'm referring to is the Alexander horned sphere:
en.wikipedia.org/wiki/Alexand...
               
            
            
26.10.2025 13:43
            
         
            
        
            
            
            
            
            
    
    
            
            
            
                YouTube video by Metamorphic
                The Most Obvious Theorem in All of Mathematics
            
         
    
    
I'm a big fan of when counterexamples to intuitive-sounding theorems are so convoluted that, outside of mathematics, the kind of rules-lawyering needed to construct them would be considered pedantic or even rude
youtu.be/pLgcZLysOFk?...
               
            
            
26.10.2025 13:39
            
         
            
        
            
            
            
            
            
    
    
    
    
It's interesting that its blend of coherence and unexpected nonsense captures a dreamy quality that most things described as "dreamlike" don't quite match for me
               
            
            
24.10.2025 17:30
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
Blocky Planet - Making Minecraft Spherical
                Discover the unique design challenges of creating a spherical planet out of Minecraft-like blocks.
            
        
    
    
            How do you make Minecraft spherical? A really fun read about all the problem solving that goes into transferring Minecraft gameplay onto a spherical world
www.bowerbyte.com/posts/blocky...
               
            
            
21.10.2025 19:35
            
         
            
        
            
            
            
            
            
    
    
            
            
            
                YouTube video by Welch Labs
                What the Books Get Wrong about AI [Double Descent]
            
         
    
    
            Really nice primer on double-descent and the bias-variance trade-off
I'm impressed by the depth that Welch Labs consistently manages to pack into their videos without sacrificing the storytelling for a popular science audience
www.youtube.com/watch?v=z64a...
               
            
            
20.10.2025 09:16
            
         
            
        
            
            
            
            
            
    
    
    
    
            How best to define "open problem" depends largely on whether you start off with a problem metric or just a problem topology
               
            
            
19.10.2025 10:56
            
         
            
        
            
            
            
            
            
    
    
            
                        
                Crinkled Arcs And Brownian Motion
                A crinkled arc is a continuous curve that appears as if it is making right-angle turns at every point along its trajectory. Additionally, if you draw a straight line between two recent points and comp...
            
        
    
    
            Since @spmontecarlo.bsky.social shared a post about Crinkled Arcs a couple of weeks ago, I've spent a fair bit of time digging into them
This new blog post attempts to collect what I found into an interactive introduction to crinkled arcs and their relationship to Brownian Motion
               
            
            
14.10.2025 11:25
            
         
            
        
            
            
            
            
            
    
    
    
    
            For me, it's roughly:
Monday: Odd
Tuesday: Odd
Wednesday: Even if you were just thinking about Tue/Thu, Odd if you were just thinking about Monday
Thursday: Odd
Friday: Same as Wednesday
Saturday: Even
Sunday: Weakly even
               
            
            
14.10.2025 10:37
            
         
            
        
            
            
            
            
            
    
    
            
            
            
                YouTube video by Naviary
                Mate-in-Omega, The Great Phenomenon of Infinite Chess
            
         
    
    
            A surprisingly intuitive and practical use for limit ordinals (assuming you have an infinite chess board handy)
youtu.be/CQ4Ap5itTX4?...
               
            
            
11.10.2025 16:07
            
         
            
        
            
            
            
            
                                                 
                                            https://en.wikipedia.org/wiki/Infinite-dimensional_vector_function#Crinkled_arcs
                                                
    
    
    
    
Answering my own question, the example from the "Infinite-dimensional vector function" Wikipedia page makes it feel clear that a curve with these properties will exist
Even so, without more thought, this feels more like it side-steps the issue in my intuition than directly tackles it
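For anyone reading along, my understanding of the example on that page is the curve of indicator functions in L²[0,1]; a quick sketch of why its chords are orthogonal (my summary, not a quote from the article):

```latex
% The crinkled arc f : [0,1] -> L^2[0,1], with f(t) the indicator of [0,t].
% Chords over disjoint parameter intervals are orthogonal because the
% corresponding indicator functions have disjoint supports.
\[
  f(t) = \chi_{[0,t]}, \qquad
  \langle f(t_2) - f(t_1),\, f(t_4) - f(t_3) \rangle
  = \int_0^1 \chi_{[t_1, t_2]}(x)\,\chi_{[t_3, t_4]}(x)\,dx = 0
  \quad \text{for } t_1 < t_2 \le t_3 < t_4 .
\]
```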
               
            
            
28.09.2025 10:30
            
         
            
        
            
            
            
            
            
    
    
    
    
            Wait... what?! 
Even though continuity means that the arcs have to exist in a countable-basis subspace, it feels so unintuitive that this gives room for a curve with uncountably many orthogonal chord pairs set up like this
Do you have any intuition on how you reconcile these two things?
               
            
            
27.09.2025 14:19
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
"Everyone knows" what an autoencoder is... but there's an important complementary picture missing from most introductory material.
In short: we emphasize how autoencoders are implemented, but not always what they represent (and some of the implications of that representation). 🧵
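For concreteness, the implementation-first picture being contrasted against is roughly the following (a minimal PyTorch sketch of a generic autoencoder, not code from the thread; the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# A minimal autoencoder: the encoder compresses the input to a low-dimensional
# code, the decoder reconstructs it, and training minimises reconstruction error.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(16, 784)                    # a dummy batch
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
```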
               
            
            
06.09.2025 21:20
            
         
            
        
            
            
            
            
            
    
    
    
    
It seems chronological for me too, with the caveat that reposts are shown according to when they were reposted but display the time elapsed since the original posting
               
            
            
19.08.2025 21:31
            
         
            
        
            
            
            
            
            
    
    
    
    
            Does anyone have a good reference for paradoxes in set theory? I'm looking for something self-contained.
               
            
            
14.08.2025 14:01
            
         
            
        
            
            
            
            
            
    
    
    
    
            This is what I've pieced together from a couple of hours of reading... If there's anything I've missed or got wrong, let me know!
               
            
            
07.08.2025 16:29
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            This turns out to be a special case of the attention sink as implemented in the new GPT models, as explained in this paper:
arxiv.org/pdf/2309.174...
               
            
            
07.08.2025 16:29
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
                Attention Is Off By One
Let's fix these pesky Transformer outliers using Softmax One and QuietAttention.
            
        
    
    
            This idea seems to have originated in a blog post by Evan Miller, which is a really nice read and suggests that just adding +1 to the denominator of the softmax could solve the extreme value issue:
 www.evanmiller.org/attention-is...
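Roughly, the tweak described above replaces exp(x_i) / sum_j exp(x_j) with exp(x_i) / (1 + sum_j exp(x_j)). A quick Python sketch of that idea (an illustrative helper of my own, not Evan Miller's code or any library's API):

```python
import torch

def softmax_one(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an extra +1 in the denominator: when every logit is very
    negative, all the weights can shrink towards zero instead of being forced
    to sum to one, letting an attention head effectively abstain."""
    # Shift by max(x, 0) for numerical stability; the explicit exp(-m) term
    # keeps the result exactly equal to exp(x_i) / (1 + sum_j exp(x_j)).
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0)
    exp_x = torch.exp(x - m)
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))
```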
               
            
            
07.08.2025 16:29
            
         
            
        
            
            
            
            
            
    
    
    
    
But when you have mostly very small values and one or two extremely large ones, this makes quantisation much lossier
The solution is essentially to add an extra element to the input sequence that the attention heads can "sink" their attention into but which is removed from subsequent calculations
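One way to picture that extra "sink" element is as a learned per-head logit appended to the attention scores and dropped after the softmax. A rough sketch under that assumption (illustrative shapes and names, not the actual `transformers` code):

```python
import torch

def attention_with_sink(q, k, v, sink_logit):
    """Scaled dot-product attention with one extra 'sink' logit per head.
    The sink competes in the softmax but contributes nothing to the output,
    so a head can park its weight there when it has nothing to add.
    q, k, v: (batch, heads, seq, head_dim); sink_logit: (heads,)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5            # (B, H, S, S)
    sink = sink_logit.view(1, -1, 1, 1).expand(scores.shape[0], -1, scores.shape[2], 1)
    weights = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    weights = weights[..., :-1]                                       # drop the sink column
    return weights @ v                                                # (B, H, S, head_dim)
```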
               
            
            
07.08.2025 16:29
            
         
            
        
            
        
            
            
            
            
                                                 
                                            The code for the attention mechanism in the `transformers` implementation of gpt_oss
                                                
    
    
    
    
            Looking at the code in Hugging Face's library for the new GPT models, I was a bit disappointed by how similar they are to Llama & Mistral models, but there is one cool trick I hadn't seen before: attention sinks. These are a mechanism by which attention heads can say "I don't have anything to add"
               
            
            
07.08.2025 16:29
            
         
            
        
            
            
            
            
            
    
    
    
    
There seems to be a lot of nuance around exactly how to translate "lemma": it can be rendered as anything from a premise to, more literally, something like "I take", as in something taken for granted. But "proposition" is nice in that it fits in both contexts
               
            
            
22.07.2025 13:01
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            A little etymological fact that I like is that "lemma" and "dilemma" are from the same ancient Greek origin
If you translate "lemma" as meaning a proposition, a dilemma is literally having two propositions to consider
               
            
            
22.07.2025 12:59
            
         
            
        
            
            
            
            
            
    
    
    
    
            There are of course some details and caveats discussed in the blog:
echostatements.github.io/posts/2025/0...
You can also find code for the experiments on Github:
github.com/EchoStatemen...
9/9 🧵
               
            
            
18.07.2025 12:19
            
         
            
        
            
            
            
            
                                                 
                                            Graph from start of thread repeated, showing effects of faster model and faster training
                                                
    
    
    
    
And even better, these tricks are not mutually exclusive: by doing both simultaneously, you get a 2.5x speed-up (depending on batch size)
8/9 🧵
               
            
            
18.07.2025 12:17
            
         
            
        
            
            
            
            
                                                 
                                            Graph showing training speed of classic vs faster training (~250 seconds vs ~150 seconds)
                                                
    
    
    
    
            Again, this gets some pretty significant speed-up
7/9 🧵
               
            
            
18.07.2025 12:17
            
         
            
        
            
            
            
            
                                                 
                                            Image showing how paired samples can be offset by one in a batch to make unpaired samples
                                                
    
    
    
    
Optimisation 2:
When training Siamese networks, people tend to generate matching and non-matching pairs in equal proportion. However, you can train more efficiently if you generate only matching pairs, then create the non-matching ones by shifting the subnetwork outputs.
6/9 🧵
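As I read it, the shifting trick amounts to rolling one set of embeddings by one position within the batch, so each anchor gets paired with a different sample's embedding as a negative. A PyTorch sketch with hypothetical names, not the blog's code:

```python
import torch

def matching_and_shifted_pairs(emb_a: torch.Tensor, emb_b: torch.Tensor):
    """Given embeddings of matching pairs (emb_a[i] matches emb_b[i]), build
    non-matching pairs by rolling emb_b by one within the batch, so emb_a[i]
    is paired with the embedding of a different (almost surely non-matching) sample."""
    neg_b = torch.roll(emb_b, shifts=1, dims=0)
    pairs_a = torch.cat([emb_a, emb_a], dim=0)
    pairs_b = torch.cat([emb_b, neg_b], dim=0)
    labels = torch.cat([torch.ones(len(emb_a)), torch.zeros(len(emb_a))])
    return pairs_a, pairs_b, labels
```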
               
            
            
18.07.2025 12:17
            
         
            
        
            
            
            
            
                                                 
                                            A diagram of a faster implementation of a Siamese network vs a standard one
                                                         
Comparison of inference time and throughput for classic vs faster network (1.35 seconds for classic, 0.91 seconds for faster)
                                                
    
    
    
    
Optimisation 1: 
In practice, when implementing these networks, there is only one subnetwork, called twice: once for each input 
But by stacking the inputs, we actually only need to make one call to the network:
Depending on batch size, the effect can be significant 
5/9 🧵
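A minimal sketch of that stacking trick (my own illustrative helper, not the blog's exact code):

```python
import torch
import torch.nn as nn

def siamese_forward(subnetwork: nn.Module, x1: torch.Tensor, x2: torch.Tensor):
    """Instead of calling the shared subnetwork twice (once per input), stack
    both inputs into one batch, make a single call, and split the result."""
    stacked = torch.cat([x1, x2], dim=0)       # (2B, ...) one big batch
    embeddings = subnetwork(stacked)           # single forward pass
    return embeddings.chunk(2, dim=0)          # back to two (B, ...) halves
```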
               
            
            
18.07.2025 12:17
            
         
            
        
            
            
            
            
            
    
    
    
    
This is useful if you're doing, say, facial recognition. You train the network to learn whether pairs of images are the same person or different people
Then you can recognise someone not in your training set by providing a reference image for that person along with your test image
4/9 🧵
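At inference time, that recognition step boils down to embedding both images with the trained subnetwork and thresholding the distance between the embeddings. A small illustrative sketch (the function name, distance measure, and threshold are my own placeholders):

```python
import torch
import torch.nn.functional as F

def same_person(subnetwork, reference_image, test_image, threshold=0.5):
    """Verification with a trained Siamese subnetwork: embed the reference and
    test images, then compare the embedding distance against a threshold."""
    with torch.no_grad():
        ref_emb = subnetwork(reference_image.unsqueeze(0))
        test_emb = subnetwork(test_image.unsqueeze(0))
    return F.pairwise_distance(ref_emb, test_emb).item() < threshold
```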
               
            
            
18.07.2025 12:17
            
         
    
         
        
            
        
                            
                    
                    
                                            philosophy PhD. historical materialist. free Palestine. trans rights. Chicago based. all takes my own. podcast @leftofphilosophy.bsky.social https://leftofphilosophy.com/ https://gilmorejon.wordpress.com
                                     
                            
                    
                    
                                            EM @ Netflix Security. Encinitas CA. Dogs, sports, gaming, home improvement, food. he/him
                                     
                            
                    
                    
                                            Postdoc @univie.ac.at
Researches the Milky Way & star clusters with machine learning
Founded the Astronomy feeds (@astronomy.blue)
🏳️‍🌈 🏳️‍⚧️ (she/her)
Website: https://emily.space
GitHub: https://github.com/emilyhunt
                                     
                            
                    
                    
                                            Senior Research Scientist at Google DeepMind. Views my own.
                                     
                            
                    
                    
                                            Markup Language for Recipes and Tools http://cooklang.org/
                                     
                            
                    
                    
Computer science staff writer @quantamagazine.bsky.social, ex-physicist. More about me at benbrubaker.com. Banner art by Nico Roper - find more of their work at nicoroper.com. [Obligatory disclaimer about views being my own.]
                                     
                            
                    
                    
                                            Research Associate in Digital Health at the VIVO Hub for Enhanced Independent Living (https://www.thevivohub.com/) at the University of Bristol
Co-organiser of the Data Ethics Club (https://dataethicsclub.com/)
Recovering mathematician 
he/him
                                     
                            
                    
                    
                                            Interests on bsky: ML research, applied math, and general mathematical and engineering miscellany. Also: Uncertainty, symmetry in ML, reliable deployment; applications in LLMs, computational chemistry/physics, and healthcare. 
https://shubhendu-trivedi.org
                                     
                            
                    
                    
Associate professor of Physics and Astronomy at UNC Chapel Hill studying young exoplanets and stars. Dad to one human and one cat. 🏳️‍🌈. Carrboro Citizen. http://andrewwmann.com
                                     
                            
                    
                    
                                            AI Architect | North Carolina | AI/ML, IoT, science
WARNING: I talk about kids sometimes
                                     
                            
                    
                    
                                            the future is a nation we will become citizens of together
founder & frontend dev @ Dreamtime
                                     
                            
                    
                    
                                            Mathematics and Philosophy of the Infinite
Professor of Logic, University of Notre Dame
University of Oxford
#InfinitelyMore #BookOfInfinity #PanoramaOfLogic #PhilMaths
https://buymeacoffee.com/joeldavidhamkins
                                     
                            
                    
                    
Senior Lecturer #USydCompSci at the University of Sydney. Postdocs IBM Research and Stanford; PhD at Columbia. Converts ☕ into puns: sometimes theorems. He/him.
                                     
                            
                    
                    
                                             Economist | Complex Systems | Agent-based modeling | PhD | hojimat.com
                                     
                            
                    
                    
                                            Mathematical Oncologist at the Integrated Mathematical Oncology department (@mathonco.bsky.social) at the Moffitt Cancer Center (@moffittnews.bsky.social). 
Married to the lovely @parmvir.com. 
On mastodon with @david@fediscience.org
                                     
                            
                    
                    
                                            Streaming at twitch.tv/caseyexplosion
Youtube: youtube.com/@CaseyExplosion
                                     
                            
                    
                    
                                            e-tsundoku, supplementary info: nlab fan account, arxiv surveyor, pubmed enjoyer, two culture bridger, vacuous high gossiper, dearth of any domain expertise, reluctant g theorist, gpu poor  
                                     
                            
                    
                    
                                            Machine Learning Professor
https://cims.nyu.edu/~andrewgw
                                     
                            
                    
                    
                                            I contain multitudes, today I am a glowering yet rocking realistic poser with a warm love of chaos
                                     
                            
                    
                    
                                            Research scientist at Anthropic. 
PhD in machine learning from the University of Toronto and Vector Institute. 
Prev: NVIDIA, Google