eyes robson

@eyesrobson.bsky.social

PhD Candidate at UC Berkeley // y = mx + biology // bioethics, algorithmic fairness, and genomic AI // they/she 🏳️‍⚧️

89 Followers  |  263 Following  |  16 Posts  |  Joined: 03.11.2023  |  2.0078

Latest posts by eyesrobson.bsky.social on Bluesky

The aesthetics here are so spot on that I almost didn't read it --- not wanting to deal with any doomerish crap so early in the morning. But this is hilarious and it keeps on going.

11.09.2025 14:15 | 👍 117    🔁 38    💬 1    📌 0

one quick thing - I saw GUANinE listed under "...benchmarks that do not fine-tune gLMs" and was a bit confused?

...a big chunk of that paper was about fine-tuning our hgT5 gLMs (it was actually the whole motivation for GUANinE -- tl;dr we saw strong gains in functional & conservation tasks)

10.09.2025 02:43 | 👍 0    🔁 0    💬 0    📌 0

just getting to this, but it looks awesome! 💯

10.09.2025 02:37 | 👍 0    🔁 0    💬 1    📌 0
MLCB - Schedule The in-person component will be held at the New York Genome Center, 101 6th Ave, New York, NY 10013.

#MLCB2025 is tomorrow & Thursday with a fantastic lineup of keynotes & contributed talks www.mlcb.org/schedule. We'll be livestreaming through our YouTube channel www.youtube.com/@mlcbconf. Thanks to www.corteva.com, instadeep.com, the Simons Center at CSHL & NYGC for generous support!

10.09.2025 00:16 | 👍 9    🔁 2    💬 0    📌 0
Screenshot from Andrew Ng's The Batch Newsletter, which reads "Gemini's Environmental Impact Measured" before stating Google's claim that its "environmental footprint [is smaller] than previous estimates had led it to expect."

Not shown is that the "research" ignored most of the actual necessary training & infrastructure to produce such models, which is known to increase the carbon emissions by two orders of magnitude (e.g. The Verge's 2023 analysis, which is mentioned briefly in The Batch's newsletter)

breaking: ExxonMobil announces "it's actually carbon-neutral" after adjusting for factors in the petroleum refinement process "outside of its control"

03.09.2025 20:03 | 👍 1    🔁 0    💬 0    📌 0

Public Access > Open Access

20.05.2025 01:41 | 👍 50    🔁 11    💬 0    📌 0
dair_institute - Twitch Twitch account for The Distributed AI Research Institute (DAIR).

For the next Mystery AI Hype Theater 3000, @alexhanna.bsky.social and I are looking forward to discussing border and immigration technologies with @petramolnar.com

Join us as we navigate these heavy issues with our ridicule as praxis:

Monday, May 5, noon Pacific
www.twitch.tv/dair_institute

02.05.2025 18:00 | 👍 31    🔁 12    💬 3    📌 2
NAACP Announces Dr. Alondra Nelson as the Recipient of the 2025 NAACP – Archewell Digital Civil Rights Award

Deeply honored to receive the 2025 NAACP-Archewell Digital Civil Rights Award. The fight for rights, opportunity, and equity in AI and tech is critical. AI's promise isn't inevitable - it must be actively stewarded to build a digital future that is just and sustainable.
naacp.org/articles/naa...

21.02.2025 06:42 | 👍 338    🔁 46    💬 27    📌 8
Learning protein fitness models from evolutionary and assay-labeled data - Nature Biotechnology A simple machine learning algorithm combines evolutionary and experimental data for improved protein fitness prediction.

nope lol 😆

using all the params in an LM is hard. In genomics I would expect it to conform to extracting features for augmentation (e.g. an LM feature in CADD), just like in protein LMs

www.nature.com/articles/s41...

04.02.2025 18:47 | 👍 0    🔁 0    💬 0    📌 0

the GPRA task was mostly for thoroughness & lack of alternatives at the time -- I designed the GUANinE benchmark with Nilah back in 2021 before lots of human large N, high-throughput methods emerged

however, our follow-up preprint correlates it with model "quality" as Basenji2 < Enformer < Borzoi

04.02.2025 18:16 | 👍 1    🔁 0    💬 0    📌 0

a proper use for these models in genomics would more likely be preliminary exploration, annotation, and variant calling correction

(but a huge part of the funding & dev pipeline is for biopharma and variant interpretation, not basic science)

04.02.2025 18:13 | 👍 0    🔁 0    💬 2    📌 0
GenBank and WGS Statistics

can't disagree!

the original use case for ELMo and other NLP LMs was pretraining ultra-high parameter models in the absence of large-scale supervised data. genomics only has this absence on novel organisms in GenBank, not humans

www.ncbi.nlm.nih.gov/genbank/stat...

04.02.2025 18:11 | 👍 1    🔁 0    💬 2    📌 0

I'm optimistic they'll find their niches... eventually... although I expect the field to take a while to figure out how to structure tasks in a scalable way that genomic LMs would succeed at

(e.g. Borzoi's 32 bp RNA-seq vs Xpresso's historical approach of one-gene-is-one-example)

04.02.2025 04:32 | 👍 1    🔁 0    💬 1    📌 0
GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous mode...

love seeing this critique of genomic LMs!

although I've seen pretty strong evidence to suggest they work well on certain tasks like conservation or cCRE recognition, e.g. ~ proceedings.mlr.press/v240/robson2...

(obviously depends on the model, the task... and how predictions are made :) )

04.02.2025 04:27 | 👍 2    🔁 0    💬 2    📌 0
Sophie Wilson - Wikipedia

I feel moved to write about one of my technological heroes, possibly the most influential on my youth as an 8-bit hacker: a trans woman named Sophie Wilson. en.m.wikipedia.org/wiki/Sophie_...

25.01.2025 00:59 | 👍 133    🔁 37    💬 7    📌 5
A figure of four stacked bar plots. Each bar plot sums to the R2 variance explained by a (model x task) combination, for the models Enformer and Borzoi. The two GUANinE tasks are dnase-propensity (accessibility, left two) and cons30 (conservation, right two). The blue portion of each bar is uniquely captured variance from the deep models,  while the orange portion of each bar is shared variance (i.e. these models have absorbed useful signal from these factors, which include chromosomal variables like chromosome size). Nearly invisible, save for the rightmost (Borzoi, cons30) task, is the green, interpretable variable (IV) only portion of each bar -- this means Enformer and Borzoi have extracted all of the useful signal, subsuming these features entirely, despite never having seen them during sequence-only training.

we also present an undervalued interpretability approach, which decomposes model variance explained into 'interpretable variables' like GC-content etc.

Borzoi and Enformer capture deeper features than the ones we test out, even surprisingly cryptic chromosomal features from sequence alone

31.01.2025 21:35 | 👍 1    🔁 0    💬 0    📌 0
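
Editor's sketch of the arithmetic behind the stacked bars described in the post above. It assumes a standard two-set variance partitioning (commonality analysis) over three R² values: the deep model alone, the interpretable variables (IVs) alone, and both together. The post does not spell out the preprint's exact procedure, so the function name `partition_r2` and the numbers below are hypothetical.

```python
def partition_r2(r2_model, r2_iv, r2_joint):
    """Split a joint R^2 into model-only, shared, and IV-only parts.

    Assumes two-set commonality analysis; this is an illustration,
    not the preprint's confirmed method.
    """
    model_only = r2_joint - r2_iv          # explained only by the deep model
    iv_only = r2_joint - r2_model          # explained only by the IVs
    shared = r2_model + r2_iv - r2_joint   # explained by either one
    return {"model_only": model_only, "shared": shared, "iv_only": iv_only}


# Hypothetical R^2 values, not taken from the preprint.
parts = partition_r2(r2_model=0.60, r2_iv=0.25, r2_joint=0.61)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the three parts always add up to the joint R², and a near-zero `iv_only` share corresponds to the "nearly invisible" green portion in the figure description.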
A 3x2 table of six plots, each showing the L2-regularized few-shot performance of Borzoi on a different GUANinE task, with the x-axes showing different bin averaging sizes (corresponding to 32 to 416bp). Most plots show performance slightly up and to the right, except for the top-left plot for the dnase-propensity task (an accessibility task). Enformer's 3-bin performance is also present in each plot, usually below Borzoi's at the same x-axis resolution (384 bp), except for the two conservation tasks (center), where Enformer significantly outperforms Borzoi.

we also examined the accuracy of Borzoi at different bin averaging scales -- for the region-based tasks of GUANinE, more bins = better perf., aside from the accessibility task

31.01.2025 21:26 | 👍 1    🔁 0    💬 1    📌 0
Enformation Theory: A Blueprint for Evaluating Deep Learning Models in Genomics The nascent field of genomic AI is rapidly expanding with new models, benchmarks, and findings. As the field diversifies, there is an increased need for a common set of measurement tools and perspecti...

updated preprint of Enformation Theory out this week!

we propose a technique for benchmarking deep sequence-to-function models in genomics using Enformer and Borzoi on the large-scale GUANinE benchmark

tl;dr -- Borzoi is good, yes, but far from the best at everything

www.biorxiv.org/content/10.1...

31.01.2025 21:20 | 👍 2    🔁 1    💬 1    📌 0

uc berkeley represent

(ai-generated misinformation)

24.05.2024 01:12 | 👍 2    🔁 0    💬 0    📌 0

this just further emphasizes that the biggest opportunity for advancing public health is *increasing access to existing medicines*

be it through single-payer systems (e.g. medicare for all) or publicly developed and distributed medicines

23.05.2024 18:27 | 👍 0    🔁 0    💬 0    📌 0
Hunting for New Drugs with AI The pharmaceutical industry is in a drug-discovery slump. How much can AI help?

reminded today of the ever-increasing difficulty of creating new medicines, as this 2019 article states --

"the drugs that are easiest to find... have all been found; what is left is hunting for drugs that address problems with complex and elusive solutions..."

www.nature.com/articles/d41...

23.05.2024 18:25 | 👍 1    🔁 0    💬 1    📌 0
R code of install.packages('car') -- a common statistical package... named 'car' ... yikes

walkable city?? this code ain't walkable!!

15.02.2024 22:51 | 👍 5    🔁 0    💬 2    📌 0
