Sameed Siddiqui

@sameedms.bsky.social

Californian lost in the Northeast ☀️. PhD @ MIT Computational and Systems Biology | MBA Fellow at MIT Sloan. @SabetiLab member

41 Followers  |  35 Following  |  17 Posts  |  Joined: 06.01.2025

Latest posts by sameedms.bsky.social on Bluesky

Finally, thanks to the team! A huge shoutout to my friend and mentee @krithik-bs.bsky.social. I'm also infinitely grateful to #AlbertGu for his advice, and to #MichaelMitzenmacher and @pardissabeti.bsky.social for their mentorship and leadership. So much laughter while making this paper; can't wait for more.

21.03.2025 21:16

Check out our paper "Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences," detailing how mathematical insights overcome computational limitations in biology.
arxiv.org/abs/2503.16351

21.03.2025 21:16

This work shows that principled mathematical insights, like approximation of epistatic interactions, can provide an accessible and performant alternative to large foundation models—suggesting broader applicability beyond biological sequences.

21.03.2025 21:16

We are excited about Lyra's potential to accelerate discoveries in molecular biology, therapeutic development, and protein engineering.

21.03.2025 21:16

Lyra makes cutting-edge biological modeling accessible to labs without extensive compute resources. Instead of relying on massive GPU clusters, Lyra empowers researchers to train state-of-the-art models directly on their own laptops.

21.03.2025 21:16

Lyra’s subquadratic O(N log N) complexity dramatically reduces memory use (125x–2600x less than Evo and ESM-1b) and speeds up inference by up to 239x over ESM-1b, while processing sequences up to 1M in length.

21.03.2025 21:16
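An O(N log N) cost is what you get when an SSM layer such as S4D is applied as one long convolution evaluated with the FFT. A minimal sketch, assuming Lyra's global layer is evaluated this way (the function names are hypothetical), that checks an FFT-based causal convolution against the quadratic direct sum:

```python
import numpy as np

def causal_conv_direct(u, k):
    """O(N^2) reference: y[t] = sum_{j<=t} k[j] * u[t-j]."""
    n = len(u)
    y = np.zeros(n)
    for t in range(n):
        for j in range(t + 1):
            y[t] += k[j] * u[t - j]
    return y

def causal_conv_fft(u, k):
    """O(N log N) causal convolution via the FFT."""
    n = len(u)
    size = 2 * n  # zero-pad to 2N so circular convolution equals linear convolution
    U = np.fft.rfft(u, n=size)
    K = np.fft.rfft(k, n=size)
    return np.fft.irfft(U * K, n=size)[:n]  # keep only the first N (causal) outputs

rng = np.random.default_rng(0)
u = rng.standard_normal(256)  # hypothetical input sequence
k = rng.standard_normal(256)  # hypothetical long (sequence-length) kernel
print(np.allclose(causal_conv_direct(u, k), causal_conv_fft(u, k)))  # True
```

Zero-padding to length 2N is what stops the circular FFT convolution from wrapping around; the direct version exists only as a reference.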

RNA-dependent RNA polymerases (RDRPs) are essential markers for RNA virus detection. Lyra achieves a near-perfect 0.998 true positive rate, matching LucaProt-ESM with over 60,000x fewer parameters, accelerating pathogen discovery without needing large-scale GPU infrastructure.

21.03.2025 21:16

Lyra achieves SOTA results in 6 out of 7 intrinsically disordered protein region tasks, with an average AUC of 0.89, outperforming a ProtT5-based model (avg AUC 0.86). Lyra accomplishes this using only 55K parameters, compared to ProtT5’s 3 billion parameters—a >50,000-fold reduction in model size.

21.03.2025 21:16
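The >50,000-fold figure follows directly from the two rounded parameter counts quoted in the post; a quick check:

```python
prott5_params = 3_000_000_000  # ProtT5, as quoted in the post
lyra_params = 55_000           # Lyra, as quoted in the post

ratio = prott5_params / lyra_params
print(f"{ratio:,.0f}-fold")  # 54,545-fold
```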

Lyra’s consistently strong performance across different tasks using orders of magnitude fewer parameters allows researchers to spend less time optimizing models and more time generating biological insights.

21.03.2025 21:16

Lyra sets records in 5 of the 9 BEACON RNA benchmark tasks tested, including nearly solving the splice-site prediction dataset (98.89% accuracy vs. the previous best of 50.55%) and almost doubling performance on structural score imputation (0.73 vs. 0.42).

21.03.2025 21:16

We tested Lyra on 101 diverse biological tasks spanning:

1. Proteomics

2. Genomics

3. CRISPR guide efficacy

Lyra set new performance records in 79 of the 101 tasks, with substantially smaller models than competing architectures.

21.03.2025 21:16

We designed Lyra with two simple components: Projected Gated Convolutions (PGC), which enhance local feature extraction, and diagonalized State Space Models (S4D), which capture global epistatic interactions. Together, they let Lyra efficiently capture both local and global epistatic relationships.

21.03.2025 21:16
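As a rough illustration of the local half, here is a NumPy sketch of a projected gated convolution: two linear projections of the input, a short depthwise convolution on one branch, an elementwise gate, and an output projection. The layer sizes, the causal padding, and the names `pgc_block` and `depthwise_conv1d` are hypothetical choices for this sketch, not Lyra's published configuration.

```python
import numpy as np

def depthwise_conv1d(x, w):
    """Causal depthwise 1D convolution. x: (L, E), w: (K, E), one kernel per channel."""
    L, E = x.shape
    K = w.shape[0]
    # Left-pad with K-1 zero rows so output t only sees inputs t-K+1..t.
    padded = np.vstack([np.zeros((K - 1, E)), x])
    out = np.zeros_like(x)
    for t in range(L):
        out[t] = np.sum(padded[t:t + K] * w, axis=0)
    return out

def pgc_block(x, W_in, W_gate, conv_w, W_out):
    """Hypothetical projected gated convolution: project, local conv, gate, project back."""
    a = x @ W_in                          # (L, E) branch that gets the local convolution
    g = x @ W_gate                        # (L, E) gating branch
    local = depthwise_conv1d(a, conv_w)   # mixes each channel over a K-wide window
    return (local * g) @ W_out            # elementwise gate, then back to (L, D)

rng = np.random.default_rng(0)
L, D, E, K = 16, 4, 8, 3                  # hypothetical sizes
x = rng.standard_normal((L, D))
W_in = rng.standard_normal((D, E))
W_gate = rng.standard_normal((D, E))
conv_w = rng.standard_normal((K, E))
W_out = rng.standard_normal((E, D))
y = pgc_block(x, W_in, W_gate, conv_w, W_out)
print(y.shape)  # (16, 4)
```

With no biases or nonlinearities, each output here is a product of two linear functions of the inputs, i.e. a degree-2 polynomial, which is one way to see why gating suits pairwise (epistatic) terms.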

We drew a mathematical connection between State Space Models (SSMs) and polynomial approximation, showing how their hidden states can naturally approximate the polynomial terms that govern epistatic relationships. This makes SSMs ideal for modeling biological functions as multilinear polynomials.

21.03.2025 21:16
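To make the SSM side concrete, here is a minimal sketch of a diagonal SSM in already-discretized form, assuming a simple per-mode decay `a` (real S4D layers derive `a` from continuous parameters and a step size, which is skipped here; all names are hypothetical). Each hidden-state coordinate is a weighted sum of past inputs with weights `a**k`, and the whole layer equals a causal convolution with kernel K_k = Re(sum(c * a**k * b)). This illustrates only the structure of the hidden state, not the paper's polynomial-approximation derivation.

```python
import numpy as np

def s4d_recurrent(u, a, b, c):
    """Diagonal SSM scan: h_t = a * h_{t-1} + b * u_t, y_t = Re(sum(c * h_t))."""
    h = np.zeros_like(a)                  # one complex state per diagonal mode
    ys = []
    for u_t in u:
        h = a * h + b * u_t               # each mode decays/rotates independently
        ys.append(np.real(np.sum(c * h)))
    return np.array(ys)

def s4d_kernel(a, b, c, length):
    """Equivalent convolution kernel: K_k = Re(sum(c * a**k * b))."""
    k = np.arange(length)[:, None]        # (length, 1) powers, broadcast over modes
    return np.real(np.sum(c * a ** k * b, axis=1))

rng = np.random.default_rng(1)
N, L = 4, 32                              # hypothetical state size and sequence length
a = 0.9 * np.exp(1j * rng.uniform(0, np.pi, N))   # |a| < 1 keeps the scan stable
b = rng.standard_normal(N).astype(complex)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)

K = s4d_kernel(a, b, c, L)
# y_t = sum_{j<=t} K_{t-j} u_j: the same output, computed as a causal convolution.
y_conv = np.array([np.dot(K[:t + 1][::-1], u[:t + 1]) for t in range(L)])
print(np.allclose(s4d_recurrent(u, a, b, c), y_conv))  # True
```

The convolutional form is what allows the FFT evaluation used for long sequences; the recurrent form makes the per-mode weighting of past inputs explicit.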

This perspective provides a principled mathematical framework for modeling sequence-function relationships.

21.03.2025 21:16

To unify biological sequence modeling across DNA, RNA, and proteins into a single computational framework, we revisited epistasis—the phenomenon where mutations influence each other—which can be characterized by multilinear polynomials.

21.03.2025 21:16
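A toy example of the multilinear view of epistasis, using binary indicators x_i for whether site i is mutated. The model is truncated at second order and the coefficients are made up for illustration. In this model, the deviation of a double mutant from the sum of its single-mutant effects equals the pairwise coefficient.

```python
from itertools import combinations

def multilinear_fitness(x, beta0, beta1, beta2):
    """Second-order multilinear model over binary mutation indicators x_i in {0, 1}.

    f(x) = beta0 + sum_i beta1[i] * x_i + sum_{i<j} beta2[(i, j)] * x_i * x_j
    """
    f = beta0 + sum(beta1[i] * x_i for i, x_i in enumerate(x))
    for i, j in combinations(range(len(x)), 2):
        f += beta2.get((i, j), 0.0) * x[i] * x[j]   # missing pairs contribute nothing
    return f

# Hypothetical coefficients for a 3-site toy sequence.
beta0 = 1.0
beta1 = [0.5, -0.25, 0.0]
beta2 = {(0, 1): -1.0}   # sites 0 and 1 interact

wild_type = multilinear_fitness([0, 0, 0], beta0, beta1, beta2)
single_0 = multilinear_fitness([1, 0, 0], beta0, beta1, beta2)
single_1 = multilinear_fitness([0, 1, 0], beta0, beta1, beta2)
double = multilinear_fitness([1, 1, 0], beta0, beta1, beta2)

# Epistasis: how far the double mutant deviates from additive single effects.
epistasis = (double - wild_type) - ((single_0 - wild_type) + (single_1 - wild_type))
print(epistasis)  # -1.0
```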

Breaking down how biological sequences encode molecular functions remains a central challenge in computational biology. For example, given a GFP sequence, can we predict its fluorescence brightness?

21.03.2025 21:16

🧬 Meet Lyra, a new paradigm for accessible, powerful modeling of biological sequences. Lyra is a lightweight SSM achieving SOTA performance across DNA, RNA, and protein tasks—yet up to 120,000x smaller than foundation models (ESM, Evo). Bonus: you can train it on your Mac.
arxiv.org/abs/2503.16351

21.03.2025 21:16