Eric Kernfeld

@ekernf01.bsky.social

Statistician and computational biologist; uw alum; jhu student. He/him. http://ekernf01.github.io

1,308 Followers  |  422 Following  |  315 Posts  |  Joined: 19.09.2023

Posts by Eric Kernfeld (@ekernf01.bsky.social)

Against Malaria: My nets I've just donated to The Against Malaria Foundation, would you do your part in saving lives too?

Happy New Year. I have just donated $9,000 to malaria prevention.

www.againstmalaria.com/MyNets.aspx?...

30.12.2025 22:59 - 👍 1    🔁 0    💬 0    📌 0

This is so friggin cool. You don't have to rewrite all that FORTRAN to do autodiff on it. "Enzyme"

12.10.2025 21:00 - 👍 4    🔁 1    💬 0    📌 0

I love using the UKB RAP. As soon as I see that £, my brain goes "I shall keep-safe this file upon the clouds and it shall cost you nigh thruppence fortnightly." innit

19.09.2025 22:49 - 👍 1    🔁 0    💬 0    📌 0
Failed to weed out LLM-generated applications for bioinformatics role | Olga Sazonova, Ph.D. posted on the topic | LinkedIn

Well, that's it. Hiring is f*cked. I just opened a contract role for a bioinformatics project. I know it's a wild market at the moment - each open role gets flooded with resumes, making it hard to separate qualified candidates from the noise. I'm convinced LLMs are partially to blame. So I devised a process to select qualified, committed candidates and avoid generic applications. Instead of resumes and cover letters, I created an intake questionnaire requiring bespoke effort to reduce cookie-cutter applications. The questions focused on specific scenarios from past experience, and therefore shouldn't be directly outsourced to chatbots. I also accidentally made the application link hard to access, but decided not to fix it. After all, computational biology often requires hacking and clever work-arounds. Finally, I included an honor code asking people not to use LLMs.

Initially, I felt good about my approach. I had 20 applications after two days, and the candidates were differentiating themselves: A few didn't answer all questions, a few didn't have the right background, and several candidates provided well-written, topical, plausible answers. It was only after I read a few of these "high signal" applications in a row that the alarm bells started ringing:
- Multiple candidates highlighted the same methodological paper
- Many cited required skills in the exact same order
- A high percentage reported identical troubleshooting scenarios involving differential gene expression studies with lab-derived batch effects

The nail in the coffin was repeated phrases like "my initial hypothesis was a bug in [popular tool]" and "the spurious result disappeared." Clearly, candidates were using LLMs. Instead of learning about fitness for the role, I was learning how LLMs answer my "clever" questions. So...I failed.

My approach was naive, perhaps even hypocritical. After all, I use LLMs for technical documentation myself. Still, it's really disappointing. I'm no closer to a hiring process that identifies qualified, motivated, and trustworthy candidates for remote work. Now what, y'all?

For people that have tried to hire someone recently, did you encounter this problem?
www.linkedin.com/posts/olga-v...

30.08.2025 13:54 - 👍 2    🔁 0    💬 0    📌 0

All parrots are stochastic.

16.08.2025 12:21 - 👍 0    🔁 0    💬 0    📌 0

Found it!!

13.08.2025 15:45 - 👍 6    🔁 0    💬 1    📌 0
Sankey diagram of Eric's job search efforts and opportunities. Source data: 

Months unemployed [1] Biking
Months unemployed [1] Stressing
Months unemployed [1] Applying 
Months unemployed [2] Networking
Months unemployed [0] Waiting to joke about "came in a fluffer"

Jobs applied to (no referral) [64] No follow up
Jobs applied to (with referral) [4] No follow up
Jobs applied to (with referral) [3] Interview
Jobs applied to (no referral) [1] Interview
Info interviews conducted [30] No follow up
Info interviews conducted [2] Interview
Credible unsolicited inquiries [2] Interview
Credible unsolicited inquiries [3] Location mismatch

Interview [2] Job offer

Job Search Recap
ekernf01.github.io/job_search_2...

13.08.2025 15:26 - 👍 6    🔁 0    💬 1    📌 0

Thanks to everyone who has supported me in large and small ways during this uncertain interval.

13.08.2025 15:24 - 👍 4    🔁 0    💬 0    📌 0

4) I (constructively) peer-reviewed two papers; I wrote ten blog posts; I conducted about 30 informational interviews; and I applied to about 65 jobs. It's tough out there.

13.08.2025 15:24 - 👍 6    🔁 0    💬 1    📌 0
Picture of Eric and his bike in front of tons of unstable rock at the mouth of the Paw-Paw Tunnel, part of the C&O Canal Towpath national park

3) I rode my bike from Washington, DC to Pittsburgh and back, through rain, mud, stiff headwinds, dozens of downed trees, and a couple of sub-freezing nights.

13.08.2025 15:23 - 👍 5    🔁 0    💬 1    📌 0

2) I defended my PhD. Ask me if you want the slides and the recording! I somehow lost track of the cute photo I was going to show. :[ Will post in replies if I can find it.

13.08.2025 15:21 - 👍 6    🔁 0    💬 2    📌 1

1) I have accepted a job at Alden Scientific working to improve the world's best methods for predicting long-term health.

13.08.2025 15:20 - 👍 7    🔁 0    💬 1    📌 0

I am proud to announce a lot of personal progress from this spring and summer. 🧵

13.08.2025 15:20 - 👍 7    🔁 0    💬 1    📌 0

i hope you can keep your cool through this.

28.07.2025 19:03 - 👍 2    🔁 0    💬 1    📌 0

I definitely think everyone working on virtual cells would be happy with more training data. What do you mean when you say the cell should be smaller?

28.07.2025 19:03 - 👍 1    🔁 0    💬 1    📌 0
A recap of virtual cell releases circa June 2025: In October 2024, I twote that "something is deeply wrong" with what we now call virtual cell models. A lot has happened since then: modelers are advancing new architectures and mining new sources of i...

In October 2024, I twote that "something is deeply wrong" with what we now call virtual cell models. A lot has happened since then. How am I updating? New blog post: ekernf01.github.io/virtual-cell...

27.07.2025 23:48 - 👍 12    🔁 2    💬 1    📌 0
Hardware/wetware codesigned data loop VISTA makes use of generative model sampling and synthesis "on chip" on-board by leveraging oligosynthesis setup shown here.

The biggest challenge for AI in biology isn't just models, it's the data used to train them. Standard biological data isn't built for AI. To unlock generative AI for drug discovery, we must rethink how we generate and capture data. 1/

22.07.2025 12:29 - 👍 30    🔁 9    💬 2    📌 6

1/🧵
🚨 New paper published in RNA!
Scientists often say anecdotally that RNA modifications are disrupted in immortalized cells, but no one's really tested it.
So we did: ψ-mapping in primary T cells vs. Jurkat cells using direct RNA-seq.
📄 tinyurl.com/TCellPsiRNA
#nanopore #RNA #pseudouridine

17.07.2025 17:21 - 👍 4    🔁 1    💬 1    📌 0

Fuck, I'm so sorry.

29.06.2025 21:07 - 👍 1    🔁 0    💬 0    📌 0

😋

26.06.2025 15:24 - 👍 1    🔁 0    💬 0    📌 0
Classic "midwit meme" with dummy expanding (a+b)^2 as a^2 + b^2, midwit (superimposed ekernf01 profile pic) crying and expanding (a+b)^2 as a^2 + 2ab + b^2, and wizard (superimposed Sasha Gusev profile pic) expanding E[(a+b)^2] as E[a^2] + E[b^2] because E[ab]=0.

What it felt like to write this series:

26.06.2025 13:08 - 👍 1    🔁 0    💬 0    📌 0
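
The algebra behind the meme's three panels, written out as a short sketch. It relies only on linearity of expectation; the condition E[ab] = 0 is the one named in the image description, and nothing here is taken from the blog series itself.

```latex
\begin{align*}
(a+b)^2 &= a^2 + 2ab + b^2 \\
\mathbb{E}\left[(a+b)^2\right] &= \mathbb{E}[a^2] + 2\,\mathbb{E}[ab] + \mathbb{E}[b^2] \\
&= \mathbb{E}[a^2] + \mathbb{E}[b^2] \quad \text{when } \mathbb{E}[ab] = 0.
\end{align*}
```

So the "dummy" answer is wrong for the numbers themselves but becomes correct in expectation once the cross term vanishes.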
Vanilla (Linkage Disequilibrium Score Regression) By B.navez - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=436896
Strawberry (stratified LD score regression) By Ivar Leidus - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=114685374
Cranberry (cross-trait LD score regression) By Ɱ - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=42145142
Lime (Signed LD Score Regression) By Ivar Leidus - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=97307282
Melon (Mediated Expression Score Regression) By Filo gèn' - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=68127982

Blogpost: ekernf01.github.io/vanilla

Image credits:

26.06.2025 13:05 - 👍 0    🔁 0    💬 1    📌 0
(flowery font "Corsiva" with bright colors)

The exotic FLAVORS of LD Score Regression

Vanilla (Linkage Disequilibrium Score Regression) distinguishes whether bias is from polygenicity or population structure using GWAS summary stats
Strawberry (stratified LD score regression) quantifies which genomic regions are enriched for disease risk using GWAS summary stats + functional annotations
Cranberry (cross-trait LD score regression) estimates genetic correlations between traits using summary stats from multiple GWAS
Lime (Signed LD Score Regression) tests whether predicted allele effects directionally align with per-allele disease risk using GWAS + deep learning
Melon (Mediated Expression Score Regression) quantifies how much disease risk is mediated through gene expression using GWAS + eQTL effects

BLOG ALERT! 🧬 If you do RNA-seq, ATAC-seq, ChIP-seq, or modeling thereof, you may be overlooking LD score regression methods. These should be standard tools to study genes, variants, and regions en masse, but they are hard to understand. I wrote intros:

26.06.2025 13:03 - 👍 6    🔁 0    💬 1    📌 1
On Modality Commoditization And what might come next

Fast-followers are more lucrative than first-in-class drugs. centuryofbio.com/p/commoditiz...

How, then, can companies protect investments in target discovery?

Code obfuscators protect source code from reverse engineering.

Could mechanism obfuscators protect drug target identities?

25.06.2025 13:38 - 👍 0    🔁 0    💬 0    📌 0

Thank you. It seems this doesn't quite work to define the genetic association parameters it's supposed to define, so I will substitute another thing that makes sense to me.

23.06.2025 18:29 - 👍 1    🔁 0    💬 0    📌 0

Help me understand the definition of genetic covariance from the xt-ldsc paper. If beta is an argmax, then isn't cbeta also an argmax for any positive scalar c?

23.06.2025 18:04 - 👍 1    🔁 0    💬 1    📌 0
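
One way to see the concern in this question, as a minimal sketch: if the objective being maximized is scale-invariant, then rescaling a maximizer by any positive constant gives another maximizer. The example assumes, purely for illustration, that the objective is a Pearson correlation between a phenotype `y` and a linear genetic score. The toy data, the `pearson` and `score` helpers, and the choice of objective are all hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def score(X, beta):
    """Linear score (X times beta) for a list-of-rows matrix X."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]


random.seed(0)
n_samples, n_variants = 200, 5

# Hypothetical toy genotypes coded 0/1/2 and a simulated phenotype.
X = [[random.choice([0, 1, 2]) for _ in range(n_variants)]
     for _ in range(n_samples)]
true_beta = [0.5, -0.3, 0.0, 0.2, 0.1]
y = [s + random.gauss(0, 1) for s in score(X, true_beta)]

# Any candidate beta and a positive rescaling of it.
beta = [0.4, -0.2, 0.1, 0.3, 0.0]
c = 7.5
scaled_beta = [c * b for b in beta]

r_beta = pearson(y, score(X, beta))
r_scaled = pearson(y, score(X, scaled_beta))
print(r_beta, r_scaled)
```

The two printed values agree up to floating-point error, so under this assumed objective an argmax would only be defined up to positive scale unless some normalization constraint is added.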
Part 1. The DNA Sequencing Arms Race: The Fight to Dethrone Illumina Multi-part series exploring the tech and politics around short/long-read sequencing advances

Is anyone making giant post-perturbation cDNA libraries and sticking them in the freezer while waiting for sequencing costs to drop below a certain point? editlife.substack.com/p/part-1-the...

22.06.2025 12:26 - 👍 1    🔁 0    💬 1    📌 0
Clipper: p-value-free FDR control on high-throughput data from two conditions - Genome Biology High-throughput biological data analysis commonly involves identifying features such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measure...

Found it.

link.springer.com/article/10.1...

19.06.2025 21:20 - 👍 0    🔁 0    💬 1    📌 0

Interesting. I feel like this problem ought to have a general solution by now! Like a multivariate lmer or glmer package. But I am not up to date on the software ecosystem.

Jessica Jingyi Li's group has an interesting and flexible FDR control method based on simulated data. Will link . . .

19.06.2025 21:12 - 👍 1    🔁 0    💬 1    📌 0

Terrific opportunity. If I were not committed to the east coast for personal reasons, I would apply to this immediately.

19.06.2025 21:10 - 👍 4    🔁 1    💬 0    📌 0