Ah, I ran into something very similar yesterday with an async "find or insert" cache.  The first caller canceled the request (dropped the future) while the insert future was in progress, and that cache key was blocked forever.
               
            
            
                31.10.2025 19:27
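A minimal sketch of one way to avoid that kind of stuck key, written with Python's `asyncio` as an analogue rather than the code from the post. The class `FindOrInsertCache`, its `get` method, and the `demo` function are all hypothetical names. The assumption is that the insert runs as its own shared task, callers wait on it through `asyncio.shield`, and a failed or canceled insert removes its entry so a later caller can retry.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class FindOrInsertCache:
    """Hypothetical cache: one shared insert task per key."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}

    async def get(self, key: str, load: Callable[[], Awaitable[str]]) -> str:
        task = self._tasks.get(key)
        if task is None:
            # The insert runs as its own task, so it is not tied to the
            # lifetime of whichever caller happened to start it.
            task = asyncio.ensure_future(load())
            self._tasks[key] = task
            task.add_done_callback(lambda t, k=key: self._forget_failed(k, t))
        # shield() lets this caller be canceled without canceling the insert.
        return await asyncio.shield(task)

    def _forget_failed(self, key: str, task: asyncio.Task) -> None:
        # Drop entries whose insert was canceled or raised, so the key
        # is not blocked forever and a later caller can retry.
        if task.cancelled() or task.exception() is not None:
            if self._tasks.get(key) is task:
                del self._tasks[key]


async def demo() -> list:
    cache = FindOrInsertCache()
    calls = 0

    async def slow_load() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)
        return "value"

    first = asyncio.ensure_future(cache.get("k", slow_load))
    await asyncio.sleep(0.01)  # let the insert start
    first.cancel()             # the first caller gives up mid-insert
    try:
        await first
    except asyncio.CancelledError:
        pass
    # A second caller still gets the value from the same insert.
    second = await cache.get("k", slow_load)
    return [second, calls]
```

Running `asyncio.run(demo())` exercises the scenario from the post: the first caller is canceled while the insert is in flight, and the second caller is still served by that same insert.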
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Conclusion of a little Halloween tradition.  If I'm going to traumatize the kids it might as well be interesting.
               
            
            
                31.10.2025 15:39
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
                Announcing Columnar
                Back to the future of data connectivity
            
        
    
    
            The future of data connectivity is columnar. Today we launched @columnar.tech to accelerate the shift from slow, row-oriented APIs like ODBC and JDBC to >10x faster alternatives powered by @arrow.apache.org. Learn more.
               
            
            
                29.10.2025 22:51
            
         
            
        
            
            
            
            
            
    
    
    
    
            Nice definition!  This matches my use.  I also usually have a touch of "please don't hate me I'm doing my best"
               
            
            
                29.10.2025 20:05
            
         
            
        
            
            
            
            
            
    
    
    
    
            A bittersweet story but glad to see a principled stance!
               
            
            
                27.10.2025 19:11
            
         
            
        
            
            
            
            
            
    
    
    
    
            However - a word that exists because someone decided we aren't allowed to start a sentence with "but"
               
            
            
                27.10.2025 13:18
            
         
            
        
            
            
            
            
                                                 
                                            The awkward monkey puppet meme with the text "Well..." from a maintainer of Lance, a lakehouse format that might just happen to be what the author is describing...
                                                
    
    
    
    
               
            
            
                17.10.2025 18:01
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Douglas squirrels are 1/3 the size of gray squirrels but six times more ferocious.
               
            
            
                10.10.2025 15:52
            
         
            
        
            
            
            
            
            
    
    
    
    
            I suspect this will change as caching layers become more mature.  The selectivity threshold for cloud storage is something like "one in a million" but more like "one in a thousand" for NVMe.
Also, a self-promotional shout-out: you might want to look at Lance (lancedb.github.io/lance/format...)
               
            
            
                08.10.2025 21:46
            
         
            
        
            
            
            
            
            
    
    
    
    
            They do a bit of both.  The base model is unsupervised and is generally described as "learning the language".  The model is then fine-tuned with supervision for a specific task.
The "suck up as much data as you can" is for the first part.
               
            
            
                07.10.2025 23:37
            
         
            
        
            
            
            
            
            
    
    
    
    
            Yesterday, OP responded to my 11-year-old comment on their 13-year-old post with a pedantic correction.
               
            
            
                07.10.2025 11:26
            
         
            
        
            
            
            
            
            
    
    
    
    
            Though I think the "we can't change Parquet" problem is a bit of a false problem.  90% of Parquet users are probably fine to just keep using Parquet.  I'm not sure I agree that the "long-term archival format" and the "database storage format" need to be the same thing.
               
            
            
                03.10.2025 21:38
            
         
            
        
            
            
            
            
            
    
    
    
    
            That might be next week's blog post ;).  Short answer is I see it as a table format problem and not a file format problem.  Change "decoder" to "file reader".  Change "stored in the page" to "stored in a folder on the table" and change "wasm" to "pluggable" (native or wasm).
               
            
            
                03.10.2025 21:38
            
         
            
        
            
            
            
            
            
    
    
    
    
            Hope this helps.  It's fun to see so much exciting innovation in a space that's been relatively quiet for many years!
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            F3 is from a joint project between CMU and Tsinghua University.  They have tackled the "forward compatibility" problem by storing WASM decoders with the data so that old readers can read data written by futuristic writers.
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            FastLanes comes from CWI.  They're the group that's designed some of the new lightweight compression algorithms (e.g. FSST).  They definitely focus on compression and they likely have the best layout for processing data already in memory.
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            Vortex comes from SpiralDB.  They've done a good job explaining what they do and writing about it.  They focus heavily on compression and, especially, on pushing compute down to run against compressed data.
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            Nimble comes from Meta, and there has sadly not been much written about it publicly.  The best I can say at the moment is that Nimble has perhaps put the most emphasis of all the formats on extremely wide schemas (again, all formats have done some work here).
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            I work on Lance!  So I'm most biased here. We focus on balancing random access and full scans.  All formats have focused on better random access / large data, but none to the extent that we have, especially for tensors / embeddings.
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            Lots of work being done on columnar file formats lately.  I count 5 new formats so far (Lance, Nimble, Vortex, FastLanes, F3).
It's definitely something we follow at LanceDB and it can be confusing to track.  So here is my very biased head-canon (trying to stay positive)
               
            
            
                03.10.2025 17:18
            
         
            
        
            
            
            
            
            
    
    
    
    
            Newest house mate is an industrious spider that spends every day building a beautiful web right at eye level so I can blearily walk face first into it every morning.
               
            
            
                03.10.2025 15:38
            
         
            
        
            
            
            
            
            
    
    
    
    
            Son got mad and told me he wouldn't take me to the creamery when I died.  I have some questions.
               
            
            
                14.08.2025 01:42
            
         
            
        
            
            
            
            
            
    
    
    
    
            Discussion points so far...
Should we slap `urn:` in the front so that users get a free parser?
Should the coordinates be repeated in the file itself?
               
            
            
                13.08.2025 17:49
            
         
            
        
            
            
            
            
            
    
    
    
    
            We're trying to figure out "Substrait coordinates" (e.g. organization, name, version tuples) for Substrait functions.  Is anyone out there actually passionate about the topic, or does anyone have any lessons or advice?
At the moment, leaning towards `organization:name:version` (and forbidding colons in each field)
               
            
            
                13.08.2025 17:48
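A rough sketch of what the `organization:name:version` idea could look like in practice. This is only an illustration, not anything Substrait has adopted. The names `Coordinate`, `format_coordinate`, and `parse_coordinate` are hypothetical, and so is the extra assumption that empty fields are rejected.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """Hypothetical (organization, name, version) tuple."""
    organization: str
    name: str
    version: str


def format_coordinate(coord: Coordinate) -> str:
    # Forbidding ':' inside each field keeps the string unambiguous.
    for field in coord:
        if not field or ":" in field:
            raise ValueError(f"invalid coordinate field: {field!r}")
    return ":".join(coord)


def parse_coordinate(text: str) -> Coordinate:
    # Because no field may contain ':', a plain split is a complete parser.
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected organization:name:version, got {text!r}")
    return Coordinate(*parts)
```

With colons forbidden inside fields, `parse_coordinate(format_coordinate(c))` round-trips for any valid `c`, and no escaping rules are needed.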
            
         
            
        
            
            
            
            
            
    
    
    
    
            Hmm, it shouldn't be _that_ slow.  DuckDB is going to do one query to get column values (O(N), pretty fast) and another with a "case when" for each possible value (O(C*N)).  I wonder if there's some optimization opportunity for hundreds of "case when" statements collapsing into a dict lookup.
               
            
            
                12.08.2025 00:53
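To make the O(C*N) versus dict-lookup comparison concrete, here is a toy sketch in plain Python. It does not reflect DuckDB's internals. Both function names are hypothetical, and per-value counting stands in for whatever aggregate the real query computes.

```python
from typing import Dict, Sequence


def counts_case_when(values: Sequence[str]) -> Dict[str, int]:
    # Step 1: one pass to find the distinct values, O(N).
    distinct = sorted(set(values))
    # Step 2: one "case when v = d" style scan per distinct value, O(C*N).
    return {d: sum(1 for v in values if v == d) for d in distinct}


def counts_dict_lookup(values: Sequence[str]) -> Dict[str, int]:
    # Same first step, but each row then does a single hash lookup,
    # so the second step is O(N) regardless of how many values exist.
    slots = {d: 0 for d in sorted(set(values))}
    for v in values:
        slots[v] += 1
    return slots
```

Both functions return the same result. The difference is that the first compares every row against every distinct value, while the second touches each row once.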
            
         
            
        
            
            
            
            
            
    
    
    
    
            768 is very common, probably the most common I see from users.  128 is still around but rare.  One user even has 1536.
               
            
            
                09.08.2025 15:21
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Time to double-check my reservation
               
            
            
                05.08.2025 23:51
            
         
            
        
            
            
            
            
            
    
    
    
    
            Sounds like you got yourself a new DIY project
               
            
            
                04.08.2025 19:04
            
         
    
         
        
            
        
                            
                    
                    