
Jacob Schreiber

@jmschreiber91.bsky.social

Studying genomics, machine learning, and fruit. My code is like our genomes -- most of it is junk. Assistant Professor UMass Chan, Board of Directors NumFOCUS Previously IMP Vienna, Stanford Genetics, UW CSE.

6,583 Followers  |  1,402 Following  |  805 Posts  |  Joined: 17.11.2023

Latest posts by jmschreiber91.bsky.social on Bluesky

It was suggested that the audience may not appreciate/understand :(

03.10.2025 06:22 — 👍 0    🔁 0    💬 2    📌 0

is it a good idea to wear a "join, or die!" hat to a big talk in europe? please say yes

30.09.2025 18:15 — 👍 3    🔁 0    💬 1    📌 0

the greatest productivity hack is having a grant deadline. there's so much other stuff you can do when you're supposed to be working on a grant.

22.09.2025 14:05 — 👍 15    🔁 0    💬 0    📌 0

I was delighted to have the unexpected opportunity to give a keynote at MLCB 2025 in NYC last week. I used it to explain how I view deep learning models in genomics not as "uninterpretable black boxes" but as indispensable tools for understanding genomics + designing the next gen of synthetic DNA.

19.09.2025 13:59 — 👍 11    🔁 1    💬 0    📌 0

for some reason i thought being a professor would involve more mentoring and research and less filling out disclosures concerning whether plants and seeds were used in my computational study

09.09.2025 19:24 — 👍 10    🔁 1    💬 0    📌 0

stocking up the new apartment with essentials

03.09.2025 14:24 — 👍 4    🔁 0    💬 0    📌 0

In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance.

While undoubtedly important, how we *use* these models after training is potentially even more important.

tangermeme v1.0.0 is out now. Hope you find it useful!

27.08.2025 16:20 — 👍 44    🔁 14    💬 1    📌 0

For some reason, hitting "comment" on GitHub is significantly more responsive than a month ago and it freaks me out. Surely there are some important calculations that need to be done before letting my thoughts into the wild?

27.08.2025 21:43 — 👍 1    🔁 0    💬 0    📌 0

Thanks! Let me know if you want me to stop in virtually, we can try to figure out a time.

27.08.2025 17:15 — 👍 1    🔁 0    💬 0    📌 0

Hope you find tangermeme helpful in your work! Please reach out if you have any comments + questions.

27.08.2025 16:44 — 👍 0    🔁 0    💬 0    📌 0

Because everything is automatic, we can probe models.

What motifs are driving model predictions? Calculate attributions, call + annotate seqlets, and count the annotations!

BPNet is relying on MYC, whereas Beluga is relying on many more TFs. Easy comparison now.

27.08.2025 16:43 — 👍 1    🔁 0    💬 1    📌 0

Frequently, people manually annotate seqlets by drawing bars or boxes around these high-attribution characters themselves. That is not wrong, but it is slow and does not scale genome-wide.

In the above picture, everything is done automatically.

27.08.2025 16:41 — 👍 1    🔁 1    💬 1    📌 0

People *talk* about seqlets a lot, but tangermeme is the first package to offer complete functionality for working with them.

Here is a complete example of using tangermeme for attributions, seqlet calling + annotation, and plotting, to visualize what five models think of the same locus.

27.08.2025 16:40 — 👍 3    🔁 0    💬 1    📌 0

Expanding past these implementations, tangermeme has a large focus on automatic seqlet calling and usage. Seqlets are short contiguous spans of high-attribution characters that usually correspond to the binding of a TF.

27.08.2025 16:39 — 👍 0    🔁 0    💬 1    📌 0
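As a rough sketch of the idea, a minimal caller can be written in a few lines: find contiguous runs of positions whose attribution exceeds a cutoff. This is an illustrative simplification of the concept, not tangermeme's actual algorithm, and `call_seqlets` is a hypothetical name.

```python
import numpy as np

def call_seqlets(attr, threshold, min_len=3):
    """Return (start, end) spans of contiguous positions whose attribution
    exceeds `threshold` -- a toy stand-in for a real seqlet caller."""
    above = (np.asarray(attr) > threshold).astype(np.int8)
    # Pad with zeros so runs touching either edge still produce boundaries.
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]

attr = np.array([0.1, 0.9, 0.8, 0.7, 0.05, 0.6, 0.9, 0.8, 0.9, 0.1])
print(call_seqlets(attr, threshold=0.5))  # [(1, 4), (5, 9)]
```

Real callers additionally handle negative attributions, per-example normalization, and statistical thresholds, but the core object is the same: short contiguous high-attribution spans.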

By considering attributions you can see how variants disrupt or change usage of motifs. Maybe you'll even find that a variant causes alternative binding by inducing a new motif or slightly changing competition! That would be challenging to see from the predictions alone.

27.08.2025 16:36 — 👍 1    🔁 0    💬 1    📌 0
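The comparison described above can be sketched as a simple delta between attribution maps computed at the reference and alternate alleles. A minimal numpy illustration with hypothetical names; tangermeme's own variant-effect functions are more featureful.

```python
import numpy as np

def attribution_delta(attr_ref, attr_alt):
    """Per-position change in total attribution between reference and
    alternate alleles at the same locus. Large negative values mark motifs
    the variant disrupts; large positive values mark motifs it creates."""
    return np.asarray(attr_alt).sum(axis=0) - np.asarray(attr_ref).sum(axis=0)

# Toy (4, length) attribution maps: a variant at position 2 weakens a motif.
attr_ref = np.zeros((4, 5)); attr_ref[0, 2] = 1.0
attr_alt = np.zeros((4, 5)); attr_alt[1, 2] = 0.25
delta = attribution_delta(attr_ref, attr_alt)  # delta[2] == -0.75
```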

Beyond simply re-implementing algorithms people use (in a convenient repo), tangermeme offers flexibility not usually found in other implementations.

As an example, instead of calculating variant effect as predictions before/after a substitution, why not look at attributions?

27.08.2025 16:35 — 👍 0    🔁 0    💬 1    📌 0

This care extends to each of our operations. For example, one-hot encoding the entirety of chr1 takes <2s on a single thread. This is significantly faster than other one-hot encoding methods out there, and is fast enough to enable real-time batch generation from FASTAs.

27.08.2025 16:33 — 👍 3    🔁 0    💬 1    📌 0
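For intuition about how one-hot encoding can be that fast: a vectorized byte lookup table touches each character once, with no Python-level loop. A minimal numpy sketch of the idea (not tangermeme's actual implementation):

```python
import numpy as np

def one_hot_encode(seq):
    """One-hot encode a DNA string into a (4, length) array using a
    256-entry byte lookup table; non-ACGT characters become all-zero columns."""
    lookup = np.full(256, -1, dtype=np.int64)
    for i, base in enumerate(b"ACGT"):
        lookup[base] = i
    # View the string as raw bytes and map every byte to a row index at once.
    idx = lookup[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    ohe = np.zeros((4, len(seq)), dtype=np.int8)
    valid = idx >= 0
    ohe[idx[valid], np.arange(len(seq))[valid]] = 1
    return ohe

ohe = one_hot_encode("ACGTN")  # N yields an all-zero column
```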
Jacob Schreiber on X: "Using Captum to interpret your @PyTorch models using DeepLift/DeepLiftShap? If you specify your activations incorrectly, you will silently get incorrect attributions. In this genomics example, the TTTGCAT.ACAAT motif is the important thing and is entirely missed. https://t.co/MuVsAO5isz" / X

Here is a (twitter) thread on the issue:

x.com/jmschreiber9...

27.08.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

By focusing in this manner, we can "delve" deeply into these downstream algorithms. For instance, we found a bug in many DeepLIFT/SHAP implementations that will cause them to silently fail when you don't register your operations. Didn't know you needed to do that? Same!

27.08.2025 16:29 — 👍 2    🔁 0    💬 1    📌 0

This design choice is intentional. Because model components and training strategies are hugely variable and evolving quickly, I did not even want to touch those aspects.

You define and train your model however you want, and then use tangermeme to do genomic discovery with it.

27.08.2025 16:26 — 👍 2    🔁 0    💬 2    📌 0

tangermeme is a toolkit that implements "everything-but-the-model" for genomic machine learning.

This includes sequence manipulations, batched predictions, attributions, ablations, marginalizations, variant effect prediction, design, etc...

27.08.2025 16:24 — 👍 1    🔁 0    💬 1    📌 0
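The "everything-but-the-model" framing means the model is just a callable you hand in. A minimal sketch of what batched prediction looks like under that contract (illustrative only; `predict_batched` is a hypothetical name, and tangermeme itself operates on PyTorch models):

```python
import numpy as np

def predict_batched(model, X, batch_size=64):
    """Run any callable `model` over `X` in fixed-size chunks and stitch the
    results back together -- the model's internals are never touched."""
    outputs = [model(X[i:i + batch_size]) for i in range(0, len(X), batch_size)]
    return np.concatenate(outputs, axis=0)

# Any callable works: here, a stand-in "model" scoring GC content of
# one-hot sequences shaped (batch, 4, length) with rows A, C, G, T.
gc_model = lambda batch: batch[:, 1:3, :].sum(axis=(1, 2)) / batch.shape[-1]
X = np.ones((10, 4, 8)) / 4.0  # uniform toy "sequences"
preds = predict_batched(gc_model, X, batch_size=3)
```

The same chunk-apply-concatenate pattern underlies the attributions, ablations, and marginalizations listed above: each is a sequence manipulation followed by batched calls into the user's model.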
tangermeme: A toolkit for understanding cis-regulatory logic using deep learning models Deep learning models have achieved state-of-the-art performance at predicting diverse genomic modalities, yet their promise for biological discovery lies in how they are used after demonstrating their...

Preprint: biorxiv.org/content/10.1...

Installation: `pip install tangermeme`

27.08.2025 16:22 — 👍 3    🔁 1    💬 1    📌 0


It's about both. Conceptually, they are intertwined.

26.08.2025 18:38 — 👍 1    🔁 0    💬 0    📌 0
Why Stacking Sliding Windows Can't See Very Far: Modern LLMs use sliding window attention for efficiency, but why can't stacking sliding windows see as far as theory suggests? A mathematical exploration of information dilution and the exponential ba...

An excellent post about the receptive range of convolution models.

"You might reasonably ask: 'If I have 100 layers with W=1000, that's a theoretical receptive field of 100,000 tokens. Doesn't that matter?'

The answer is no, and here's why:"

guangxuanx.com/blog/stackin...

26.08.2025 18:32 — 👍 5    🔁 0    💬 1    📌 0

Multiple, in fact

25.08.2025 19:53 — 👍 0    🔁 0    💬 0    📌 0

First week as a faculty was successful: one grant submitted, one paper submitted, and revision requests back on one paper.

If we extrapolate, by the time I'm up for tenure I'll have 260 grants submitted (none awarded) and 260 papers submitted/reviewed (none accepted).

lgtm

25.08.2025 14:42 — 👍 15    🔁 0    💬 1    📌 0

Almost a year ago I submitted a NIH grant and federal funding collapsed. Continuing on that success, I'm proud to announce that I've just submitted a local grant...

20.08.2025 19:52 — 👍 10    🔁 0    💬 0    📌 0

Very wide-ranging. I am impressed with it!

20.08.2025 12:46 — 👍 1    🔁 0    💬 0    📌 0

now i get to be happy, right?

20.08.2025 12:25 — 👍 10    🔁 0    💬 0    📌 0
