The aesthetics here are so spot on that I almost didn't read it --- not wanting to deal with any doomerish crap so early in the morning. But this is hilarious and it keeps on going.
11.09.2025 14:15 · 117 likes · 38 reposts · 1 reply · 0 quotes
@eyesrobson.bsky.social
PhD Candidate at UC Berkeley // y = mx + biology // bioethics, algorithmic fairness, and genomic AI // they/she π³οΈββ§οΈ
one quick thing - I saw GUANinE listed under "...benchmarks that do not fine-tune gLMs" and was a bit confused?
...a big chunk of that paper was about fine-tuning our hgT5 gLMs (it was actually the whole motivation for GUANinE -- tl;dr we saw strong gains in functional & conservation tasks)
just getting to this, but it looks awesome!
10.09.2025 02:37 · 0 likes · 0 reposts · 1 reply · 0 quotes
#MLCB2025 is tomorrow & Thursday with a fantastic lineup of keynotes & contributed talks www.mlcb.org/schedule. We'll be livestreaming through our YouTube channel www.youtube.com/@mlcbconf. Thanks to www.corteva.com, instadeep.com, the Simons Center at CSHL & NYGC for generous support!
10.09.2025 00:16 · 9 likes · 2 reposts · 0 replies · 0 quotes
Screenshot from Andrew Ng's The Batch newsletter, which reads "Gemini's Environmental Impact Measured" before stating Google's claim that its "environmental footprint [is smaller] than previous estimates had led it to expect." Not shown is that the "research" ignored most of the actual training & infrastructure necessary to produce such models, which is known to increase the carbon emissions by two orders of magnitude (see The Verge's 2023 analysis, which is mentioned briefly in The Batch's newsletter).
breaking: ExxonMobil announces "it's actually carbon-neutral" after adjusting for factors in the petroleum refinement process "outside of its control"
03.09.2025 20:03 · 1 like · 0 reposts · 0 replies · 0 quotes
Public Access > Open Access
20.05.2025 01:41 · 50 likes · 11 reposts · 0 replies · 0 quotes
For the next Mystery AI Hype Theater 3000, @alexhanna.bsky.social and I are looking forward to discussing border and immigration technologies with @petramolnar.com
Join us as we navigate these heavy issues with our ridicule as praxis:
Monday, May 5, noon Pacific
www.twitch.tv/dair_institute
Deeply honored to receive the 2025 NAACP-Archewell Digital Civil Rights Award. The fight for rights, opportunity, and equity in AI and tech is critical. AI's promise isn't inevitable - it must be actively stewarded to build a digital future that is just and sustainable.
naacp.org/articles/naa...
nope lol
using all the params in an LM is hard. In genomics I would expect it to converge on extracting features for augmentation (e.g. an LM-derived feature in CADD), just like with protein LMs
www.nature.com/articles/s41...
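The augmentation idea above — treating a gLM embedding as one more feature source alongside classical annotations, rather than the whole model — can be sketched roughly like this (a minimal numpy toy; the feature names, dimensions, and ridge probe are illustrative, not CADD's actual pipeline):

```python
import numpy as np

def augment(classical_feats, lm_embedding):
    """Concatenate hand-crafted annotations with a gLM embedding,
    so the LM contributes features rather than replacing the model."""
    return np.concatenate([classical_feats, lm_embedding])

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression over the augmented features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# hypothetical variant-scoring setup on simulated data
rng = np.random.default_rng(0)
classical = rng.normal(size=(200, 5))    # e.g. conservation, GC, etc.
embeddings = rng.normal(size=(200, 16))  # per-variant gLM features
X = np.stack([augment(c, e) for c, e in zip(classical, embeddings)])
w_true = rng.normal(size=21)
y = X @ w_true + 0.01 * rng.normal(size=200)
w_hat = ridge_fit(X, y, lam=0.1)
```

The point of the sketch is the shape of the workflow: the gLM contributes a handful of columns to a small, well-understood downstream model.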
the GPRA task was mostly for thoroughness & lack of alternatives at the time -- I designed the GUANinE benchmark with Nilah back in 2021 before lots of human large N, high-throughput methods emerged
however, our follow-up preprint correlates it with model "quality" as Basenji2 < Enformer < Borzoi
a proper use for these models in genomics would more likely be preliminary exploration, annotation, and variant calling correction
(but a huge part of the funding & dev pipeline is for biopharma and variant interpretation, not basic science)
can't disagree!
the original use case for ELMo and other NLP LMs was pretraining ultra-high-parameter models in the absence of large-scale supervised data. genomics only has this absence for novel organisms in GenBank, not humans
www.ncbi.nlm.nih.gov/genbank/stat...
I'm optimistic they'll find their niches eventually, although I expect the field to take a while to figure out how to structure tasks in a scalable way that genomic LMs would succeed at
(e.g. Borzoi's 32 bp RNA-seq vs Xpresso's historical approach of one-gene-is-one-example)
love seeing this critique of genomic LMs!
although I've seen pretty strong evidence to suggest they work well on certain tasks like conservation or cCRE recognition, e.g. ~ proceedings.mlr.press/v240/robson2...
(obviously depends on the model, the task... and how predictions are made :) )
I feel moved to write about one of my technological heroes, possibly the most influential on my youth as an 8-bit hacker: a trans woman named Sophie Wilson. en.m.wikipedia.org/wiki/Sophie_...
25.01.2025 00:59 · 133 likes · 37 reposts · 7 replies · 5 quotes
A figure of four stacked bar plots. Each bar plot sums to the R² variance explained by a (model × task) combination, for the models Enformer and Borzoi. The two GUANinE tasks are dnase-propensity (accessibility, left two) and cons30 (conservation, right two). The blue portion of each bar is variance uniquely captured by the deep models, while the orange portion is shared variance (i.e. these models have absorbed useful signal from these factors, which include chromosomal variables like chromosome size). Nearly invisible, save for the rightmost (Borzoi, cons30) bar, is the green, interpretable-variable (IV)-only portion — meaning Enformer and Borzoi have extracted all of the useful signal, subsuming these features entirely, despite never having seen them during sequence-only training.
we also present an undervalued interpretability approach, which decomposes model variance explained into 'interpretable variables' like GC-content etc.
Borzoi and Enformer capture deeper features than the ones we test out, even surprisingly cryptic chromosomal features from sequence alone
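The decomposition described above is essentially two-predictor commonality analysis: fit the target on the deep-model predictions alone, on the interpretable variables (IVs) alone, and on both together, then split R² into unique and shared portions. A minimal numpy sketch on simulated data (the IVs and "deep" predictions here are toy stand-ins, not GUANinE's actual variables):

```python
import numpy as np

def r2(X, y):
    """R^2 of an ordinary-least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def partition_r2(deep, iv, y):
    """Commonality analysis: split variance explained into the part
    unique to the deep model, unique to the IVs, and shared by both."""
    r2_deep = r2(deep, y)
    r2_iv = r2(iv, y)
    r2_full = r2(np.column_stack([deep, iv]), y)
    return {"unique_deep": r2_full - r2_iv,
            "unique_iv": r2_full - r2_deep,
            "shared": r2_deep + r2_iv - r2_full}

# toy data: deep predictions and one IV both track a shared latent signal,
# so the shared portion should dominate both unique portions
rng = np.random.default_rng(1)
z = rng.normal(size=500)
deep = (z + 0.1 * rng.normal(size=500)).reshape(-1, 1)
iv = (z + 0.1 * rng.normal(size=500)).reshape(-1, 1)
y = z + 0.3 * rng.normal(size=500)
parts = partition_r2(deep, iv, y)
```

When the deep model has "absorbed" an IV, the IV-only slice shrinks toward zero while the shared slice stays large — the pattern the bar plots show.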
A 3x2 table of six plots, each showing the L2-regularized few-shot performance of Borzoi on a different GUANinE task, with the x-axes showing different bin averaging sizes (corresponding to 32 to 416 bp). Most plots show performance slightly up and to the right, except for the top-left plot for the dnase-propensity task (an accessibility task). Enformer's 3-bin performance is also present in each plot, usually below Borzoi's at the same x-axis resolution (384 bp), except for the two conservation tasks (center), where Enformer significantly outperforms Borzoi.
we also examined the accuracy of Borzoi at different bin averaging scales -- for the region-based tasks of GUANinE, more bins = better perf., aside from the accessibility task
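The bin-averaging knob here is just how many of the model's per-bin output values get pooled around the region center before the probe sees them. A toy numpy sketch (the 32 bp bin size matches Borzoi's output resolution; the window math is illustrative):

```python
import numpy as np

def center_bin_average(track, k, bin_bp=32):
    """Average the k center bins of a per-bin prediction track,
    pooling a window of k * bin_bp base pairs (k=13 -> ~416 bp)."""
    mid = len(track) // 2
    lo = mid - k // 2
    return track[lo:lo + k].mean()

# stand-in for a per-bin prediction track over a region
preds = np.arange(16.0)
window = center_bin_average(preds, 13)  # one pooled score per region
```

Sweeping k is what produces the "more bins = better performance" curves, since wider windows smooth noise but blur local signal.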
31.01.2025 21:26 · 1 like · 0 reposts · 1 reply · 0 quotes
updated preprint of Enformation Theory out this week!
we propose a technique for benchmarking deep sequence-to-function models in genomics, using Enformer and Borzoi on the large-scale GUANinE benchmark
tl;dr -- Borzoi is good, yes, but far from the best at everything
www.biorxiv.org/content/10.1...
uc berkeley represent
(ai-generated misinformation)
this just further emphasizes that the biggest opportunity for advancing public health is *increasing access to existing medicines*
be it through single-payer systems (e.g. medicare for all) or publicly developed and distributed medicines
reminded today of the ever-increasing difficulty of creating new medicines, as this 2019 article states --
"the drugs that are easiest to find... have all been found; what is left is hunting for drugs that address problems with complex and elusive solutions..."
www.nature.com/articles/d41...
R code of install.packages('car') -- a common statistical package... named 'car' ... yikes
walkable city?? this code ain't walkable!!
15.02.2024 22:51 · 5 likes · 0 reposts · 2 replies · 0 quotes