Noam Teyssier (@noamteyssier) — Bluesky Profile

2 days ago

Sassy2: Batch Searching of Short DNA Patterns https://www.biorxiv.org/content/10.64898/2026.03.10.710811v1

10 5 0 1

1 month ago

CRISPR screens in iPSC-derived neurons reveal principles of tau proteostasis CRISPR screens in iPSC-derived neurons reveal that the E3 ubiquitin ligase CRL5^SOCS4 ubiquitinates tau, that CUL5 expression is correlated with resilience in human Alzheimer’s disease, and that electr...

After a long review process, I'm excited that our paper is finally in print: www.cell.com/cell/fulltex...

TL;DR: We use CRISPR screens in iPSC-derived neurons to find a new tau E3 ligase and a relationship between oxidative stress, the proteasome, and tau proteolytic fragments.

More below 👇

35 11 2 1

1 month ago

Arc bioinformatics scientists @noamteyssier.bsky.social and Alex Dobin have just released cyto, an ultra-high-throughput processor specifically optimized for @10xgenomics.bsky.social Flex single-cell data.

We are excited to make this resource open source: www.biorxiv.org/content/10.6...

7 3 1 0

1 month ago

GitHub - ArcInstitute/cyto: A mapper for single cell sequencing reads with abstract geometries

cyto is free, open-source, and production-ready. Built in Rust for reliability at scale.

Currently supports 10x Flex GEX and CRISPR, with more modalities coming.

Try it out and let us know how it works for you!

github.com/ArcInstitute...

1 0 0 0

1 month ago

GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data

cyto is the first large-scale bioinformatics project to build with BINSEQ. Switching to BINSEQ can achieve mapping rates of 50M reads per second and reduce your storage requirements by about 40%.

github.com/ArcInstitute...

1 0 2 0

1 month ago

We also show that we can reproduce the results of CellRanger at a fraction of the resource cost. Our concordance is above 99.85% as measured via Spearman on matched cell UMIs, and our lower-dimensional representations show perfect overlap with no method-specific clustering.
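One way to read "Spearman on matched cell UMIs" is to rank the per-cell UMI totals from each tool and correlate the ranks. The sketch below is a minimal pure-Python version under that assumption; the function names are hypothetical, and the post does not say how the comparison was actually implemented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` and `y` would be UMI counts for the same cell barcodes in the same order, one list per tool. Cells reported by only one tool have to be dropped first, which is presumably what "matched" refers to.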

0 0 1 0

1 month ago

cyto was built from the ground up to be modular and to expose the individual modules to the user. Each step is highly optimized and can be run independently, which is perfect for production-scale workflows because it allows for better parallelization and resource allocation on smaller nodes.

0 0 1 0

1 month ago

Currently the only tool that supports this data type is CellRanger, and we show that cyto runs an order of magnitude faster (16x), uses less than half the memory, dramatically reduces CPU-hours (30x), and reduces total IO by more than 5x.

2 0 1 0

1 month ago

Today I’m happy to release cyto, a tool I’ve developed at @arcinstitute.org to dramatically increase our computational throughput for 10x Flex single-cell processing by more than 16x!

11 4 1 0

1 month ago

I've tried this at least 3 times haha. Honestly, I think the best way to do it is not really to port it but to drastically rework the way that it's written.

1 0 1 0

2 months ago

GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data

Maybe this will be the year we start to really question the foundational infrastructure of the field.

Obligatory BINSEQ mention here - keep a lookout over the next couple of weeks for an update!

github.com/arcInstitute...

3 0 0 0

2 months ago

It is the year 2026 - bioinformaticians are still trying to figure out the best way to handle fastq

4 0 1 0

2 months ago

GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data

My attempt at the 15th competing standard: github.com/arcInstitute...

0 0 1 0

2 months ago

To be one with the borrow checker one must first be willing to let go

1 0 0 0

3 months ago

GitHub - ArcInstitute/bqtools: A command line utility for working with BINSEQ files

Check out the github and give it a shot!

github.com/arcinstitute...

0 0 0 0

3 months ago

We can make as many pipes as we have threads, each with a fixed record range and with a specified segment (R1 / R2). Then we can connect to each pipe on a reader and treat it as a normal FASTX file.

What's great about this is we can process *either* sequentially or in parallel *without* deadlocks!
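The pipe-per-range idea can be sketched with ordinary POSIX FIFOs. This is an illustration only, not how `bqtools pipe` is implemented: `RECORDS`, `write_range`, `serve_pipes`, and `read_names` are hypothetical names, the record store is an in-memory list standing in for a random-access file, and it only runs on Unix-like systems where `os.mkfifo` is available.

```python
import os
import tempfile
import threading

# Hypothetical in-memory stand-in for a random-access record store.
RECORDS = [(f"read{i}", "ACGT" * 2, "I" * 8) for i in range(6)]


def write_range(path, start, end):
    """Writer side: stream records [start, end) into one named pipe as FASTQ."""
    # Opening a FIFO for writing blocks until a reader opens the same path.
    with open(path, "w") as pipe:
        for name, seq, qual in RECORDS[start:end]:
            pipe.write(f"@{name}\n{seq}\n+\n{qual}\n")


def serve_pipes(directory, n_pipes):
    """Create one FIFO per fixed record range and start a writer thread for each."""
    chunk = -(-len(RECORDS) // n_pipes)  # ceiling division
    paths, threads = [], []
    for i in range(n_pipes):
        path = os.path.join(directory, f"part_{i}.fq")
        os.mkfifo(path)
        start, end = i * chunk, min((i + 1) * chunk, len(RECORDS))
        thread = threading.Thread(target=write_range, args=(path, start, end))
        thread.start()
        paths.append(path)
        threads.append(thread)
    return paths, threads


def read_names(path):
    """Reader side: treat the pipe like an ordinary FASTQ file."""
    with open(path) as handle:
        return [line[1:].strip() for i, line in enumerate(handle) if i % 4 == 0]


def demo(n_pipes=3):
    """Read every pipe in turn and return the record names seen on each."""
    with tempfile.TemporaryDirectory() as directory:
        paths, threads = serve_pipes(directory, n_pipes)
        names = [read_names(path) for path in paths]
        for thread in threads:
            thread.join()
    return names
```

Because each writer owns a fixed range and needs nothing from the other pipes, the reader in `demo` can drain them one after another, or hand each path to its own worker, without the writers ever waiting on each other.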

0 0 1 0

3 months ago

This was a fun engineering problem but ultimately was not very difficult because of the way BINSEQ is designed in the first place!

Named pipes can be a headache because they require coordination between readers and writers, but because BINSEQ is random access the implementation is straightforward.

1 0 1 0

3 months ago

New feature in bqtools v0.4.14 that I'm stoked on!

One of the limiting factors to adopting BINSEQ is that it's new and not widely supported by existing tools.

`bqtools pipe` addresses this by transparently creating FASTX named pipes, which can be processed normally by existing tools.

6 1 1 0

3 months ago

If you ever need to fuzzy search some DNA, sassy is your tool.

Please spread the word; I think many people just outside my own circle could benefit from this :)

cc @rickbitloo.bsky.social

github.com/RagnarGrootK...
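For readers new to the idea, "fuzzy search" here means finding places where a pattern occurs with a few substitutions, insertions, or deletions. The sketch below uses the textbook dynamic-programming approach (Sellers' algorithm). It is not sassy's algorithm or API, and `fuzzy_find` is a hypothetical name; it only shows what the problem asks for.

```python
def fuzzy_find(pattern, text, max_edits):
    """Return (end_index, edits) for every text position where the pattern
    ends with at most `max_edits` substitutions, insertions, or deletions."""
    m = len(pattern)
    # column[i] = fewest edits aligning pattern[:i] to a substring of text
    # that ends at the current position.
    column = list(range(m + 1))
    hits = []
    for j, base in enumerate(text):
        previous = column
        column = [0]  # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == base else 1
            column.append(min(previous[i - 1] + cost,  # match or substitution
                              previous[i] + 1,         # extra base in the text
                              column[i - 1] + 1))      # base missing from the text
        if column[m] <= max_edits:
            hits.append((j + 1, column[m]))
    return hits
```

This runs in time proportional to `len(pattern) * len(text)`, which is fine for a demonstration but far slower than a dedicated tool on real sequencing data.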

40 24 4 0

3 months ago

Some optimization on VBQ with the latest binseq update, especially in lossless mode. Some ways to trim the fat:

1. Reuse zstd decoders for each thread. I was creating a decoder for each vbq block, which incurred redundant allocations

2. Zero-copy parsing of blocks, handing out references into the block buffer similar to paraseq
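The zero-copy idea in point 2 can be illustrated in a few lines. This sketch covers only that point, and it assumes a toy record layout (a 4-byte little-endian length followed by the payload) that is not the real VBQ block layout; `iter_records` and `make_block` are hypothetical helpers.

```python
import struct


def iter_records(block):
    """Yield each record as a memoryview slice of `block` (no bytes are copied).

    Assumed toy layout: a 4-byte little-endian length, then that many payload bytes.
    """
    view = memoryview(block)
    offset = 0
    while offset < len(view):
        (length,) = struct.unpack_from("<I", view, offset)
        offset += 4
        yield view[offset:offset + length]  # a reference into the block, not a copy
        offset += length


def make_block(payloads):
    """Demo helper: pack byte strings into the toy layout."""
    return b"".join(struct.pack("<I", len(p)) + p for p in payloads)
```

Each yielded record borrows from the decompressed block instead of allocating its own buffer, so the block has to stay alive for as long as any record is in use. That is the same trade-off a borrowed slice makes in Rust.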

3 0 0 0

3 months ago

ARM64 Linux is pretty common in cloud computing environments, I think. Might be worth building for it too.

1 0 1 0

3 months ago

Efficient sequence analysis with bqtools Interactive bqtools tutorial: learn to analyse sequence data efficiently with BINSEQ files using a command-line interface in your browser.

Excited to announce a new bqtools tutorial on sandbox.bio by @noamteyssier.bsky.social! Learn about the BINSEQ file format, and how it can replace FASTQ files for better data compression and faster parallel processing: sandbox.bio/tutorials/bq...

8 4 0 0

3 months ago

Built with uv so you don't have to worry about the dependencies or environments. Simple as:

```
uv tool install anntools-bio
anntools --help
```

1 0 0 0

3 months ago

GitHub - noamteyssier/anntools: a cli-driven anndata toolkit

I work with large collections of AnnDatas for single-cell work and got tired of opening notebooks for simple operations. Built a CLI tool to handle some common stuff directly from the terminal.

Quick ops: downsample, concat, pseudobulk, QC, metadata export, etc.

github.com/noamteyssier...

12 4 1 0

4 months ago

sandbox.bio - Interactive bioinformatics tutorials

Side note: sandbox.bio is so cool.

Setting up an environment where you can learn and play around with these tools in the browser is no simple feat and I think it's an excellent educational resource for the bioinformatics community.

I'm very happy and proud to contribute to it!

2 0 0 0

4 months ago

Efficient sequence analysis with bqtools Interactive bqtools tutorial: learn to analyse sequence data efficiently with BINSEQ files using a command-line interface in your browser.

BINSEQ is a high-performance format for sequencing data and bqtools is a CLI tool that lets you create and manipulate these files in the style of samtools.

Excited to release a tutorial with @robert.bio showcasing how to use it to encode, decode, and grep sequences in the browser on sandbox.bio!

5 2 1 0

4 months ago

The pattern counting is something I'm especially stoked about. I was actually very surprised to see that this feature isn't more common in grep-like tools (outside of bioinformatics as well).

I've had this problem for years and I end up writing bespoke tools that do some variation of it.

1 0 0 0

4 months ago

Release bqtools-0.4.8 · ArcInstitute/bqtools What's Changed: 116 support fuzzy grep with sassy by @noamteyssier in #118; 119 gate fuzzy matching behind feature flag by @noamteyssier in #120; 58 implement a pattern count feature by @noamteyssier...

New bqtools release with some nice new features!

1. Support for fuzzy matching using sassy
2. Multi-pattern counting (like `grep -c` but the count is for each individual pattern provided)
3. Pattern files (providing large lists of patterns as either regex or literals)

github.com/ArcInstitute...
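To make the multi-pattern counting in item 2 concrete, here is a small sketch of the idea: one pass over the sequences, with a separate `grep -c`-style count kept for every pattern. It assumes each sequence counts at most once per pattern, as `grep -c` counts matching lines; the exact semantics in bqtools may differ, and `count_patterns` is a hypothetical name.

```python
import re


def count_patterns(sequences, patterns, regex=False):
    """Count, for each pattern, how many sequences contain at least one match."""
    # Literal patterns are escaped so they cannot be misread as regex syntax.
    compiled = {p: re.compile(p if regex else re.escape(p)) for p in patterns}
    counts = {p: 0 for p in patterns}
    for seq in sequences:
        for pattern, matcher in compiled.items():
            if matcher.search(seq):
                counts[pattern] += 1
    return counts
```

The `regex` switch mirrors the literal-versus-regex choice mentioned for pattern files in item 3; loading the pattern list from a file is left out of the sketch.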

2 0 1 0

4 months ago

And stay on the lookout over the next couple of weeks (hopefully) for the release of an even bigger project built with binseq!

0 0 0 0

4 months ago

GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data

And if you're interested in building with binseq here is the place to start!

github.com/arcinstitute...

0 0 1 0