
Guillaume Holley

@guillaumeolesan.bsky.social

Research Scientist working on pangenomes and long reads at deCODE Genetics. Opinions shared here do not reflect the views of deCODE. "There is no peace amongst the stars, for in the grim darkness of the far future, there is only war" - W40K

353 Followers  |  169 Following  |  13 Posts  |  Joined: 18.08.2023

Latest posts by guillaumeolesan.bsky.social on Bluesky

Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and myself developed earlier this year, specifically with this application in mind.

23.10.2025 21:28 | 👍 20 🔁 15 💬 2 📌 0
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes

1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...

21.10.2025 20:00 | 👍 44 🔁 24 💬 1 📌 2
GitHub - EichlerLab/pav: Phased assembly variant caller

There is also PAV from the Eichler lab and maintained by the Beck lab (github.com/EichlerLab/pav). Both dipcall and PAV are being used to call small variants and SVs from the T2T-Q100 HG002 assembly in the latest GIAB HG002 benchmark set.

21.10.2025 17:51 | 👍 1 🔁 0 💬 0 📌 0
Efficient and accurate search in petabase-scale sequence repositories - Nature. MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

www.nature.com/articles/s41...

10.10.2025 11:12 | 👍 4 🔁 1 💬 0 📌 0

I'm excited to share our pre-print about a new variant benchmarking tool we've been working on for the past few months!

Aardvark: Sifting through differences in a mound of variants
GitHub: github.com/PacificBiosc...

Some highlights in this thread:
1/N

06.10.2025 20:07 | 👍 33 🔁 17 💬 1 📌 1

🦒 Long read giraffe is out! 🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2

02.10.2025 06:28 | 👍 42 🔁 22 💬 1 📌 0
A complete diploid human genome benchmark for personalized genomics. Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...

Delighted to finally announce a preprint describing the Q100 project! "A complete diploid human genome benchmark for personalized genomics", for which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]

22.09.2025 17:01 | 👍 96 🔁 57 💬 4 📌 4

colorSV: Long-range Somatic Structural Variation Calling from Matched Tumor-normal Co-assembly Graphs. #SomaticStructuralVariants #SV #CoassemblyGraphs #Bioinformatics #Genomics #GenomicsProteomicsBioinformatics
academic.oup.com/gpb/advance-...

23.09.2025 09:15 | 👍 3 🔁 2 💬 0 📌 0

In silico discovery of pathogenic PD-L1 nsSNVs with altered glycosylation and immunotherapy binding https://www.biorxiv.org/content/10.1101/2025.06.17.660108v1

19.06.2025 16:47 | 👍 0 🔁 1 💬 0 📌 0

REINDEER2: practical abundance index at scale https://www.biorxiv.org/content/10.1101/2025.06.16.659990v1

17.06.2025 13:46 | 👍 8 🔁 7 💬 0 📌 2

Congrats to @dantipov.bsky.social et al. on the publication of Verkko2! The team put a ton of work into this making it the first assembler that deals with the complexity of human acrocentric chromosomes. Lots of interesting discoveries to come! genome.cshlp.org/content/earl...

17.06.2025 13:39 | 👍 32 🔁 19 💬 1 📌 1

Pangenome-aware DeepVariant https://www.biorxiv.org/content/10.1101/2025.06.05.657102v1

06.06.2025 22:48 | 👍 6 🔁 6 💬 1 📌 0

📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8

27.05.2025 12:06 | 👍 24 🔁 16 💬 1 📌 1
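The query Kaminari answers, which documents (colors) contain the k-mers of a query string, can be illustrated with an exact, uncompressed index. This is a hypothetical sketch of colored k-mer queries in general, not Kaminari's resource-frugal data structure; the names `build_colored_index` and `query` and the `min_fraction` tolerance threshold are assumptions.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(docs: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of document names (colors) containing it.

    Hypothetical exact index: real tools compress this mapping heavily.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in docs.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def query(index: dict[str, set[str]], q: str, k: int,
          min_fraction: float = 1.0) -> set[str]:
    """Return documents containing at least min_fraction of the query's k-mers.

    min_fraction = 1.0 requires every query k-mer; lower values tolerate
    mismatches, giving an approximate match.
    """
    qkmers = list(kmers(q, k))
    if not qkmers:
        return set()
    hits: dict[str, int] = defaultdict(int)
    for km in qkmers:
        for name in index.get(km, ()):  # .get avoids inserting missing keys
            hits[name] += 1
    return {name for name, c in hits.items() if c / len(qkmers) >= min_fraction}
```

For example, with documents `{"d1": "ACGTACGT", "d2": "TTTTACGA"}` and k = 3, the query `"ACGT"` matches only `d1` when every k-mer is required, and both documents at `min_fraction=0.5`. Real DNA indexes usually also treat a k-mer and its reverse complement as one canonical k-mer; that is omitted here for brevity.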
Sequence alignment with k-bounded matching statistics. Finding high-quality local alignments between a query sequence and sequences contained in a large genomic database is a fundamental problem in computational genomics, at the core of thousands of biolo...

New preprint: we used k-mer matching with suffix match length information to create an assembly-to-assembly alignment algorithm + software, kbo.

We wanted to create a reference-based aligner and variant caller that scales to at least 10-100k bacterial queries.

www.biorxiv.org/content/10.1...

26.05.2025 08:00 | 👍 27 🔁 11 💬 2 📌 0
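The post does not spell out what k-bounded matching statistics are. Assuming the usual meaning (for each query position, the length of the longest substring ending there that also occurs in the reference, capped at k), a naive sketch looks like the following. The function names are hypothetical, and this is not kbo's implementation, which uses a dedicated index rather than a set of substrings.

```python
def substrings_up_to_k(ref: str, k: int) -> set[str]:
    """All substrings of ref with length 1..k (naive, memory-hungry)."""
    subs: set[str] = set()
    for i in range(len(ref)):
        for length in range(1, k + 1):
            if i + length > len(ref):
                break
            subs.add(ref[i:i + length])
    return subs


def k_bounded_matching_statistics(query: str, ref: str, k: int) -> list[int]:
    """For each query position i, the length of the longest substring of query
    ending at i that occurs in ref, capped at k (0 if none occurs)."""
    subs = substrings_up_to_k(ref, k)
    ms: list[int] = []
    for i in range(len(query)):
        best = 0
        # Try the longest candidate first. If a suffix occurs in ref, every
        # shorter suffix does too, so the first hit is the maximum.
        for length in range(min(k, i + 1), 0, -1):
            if query[i - length + 1:i + 1] in subs:
                best = length
                break
        ms.append(best)
    return ms
```

For example, `k_bounded_matching_statistics("ACGTTT", "ACGTACGT", 3)` gives `[1, 2, 3, 3, 1, 1]`: runs at the cap k mark stretches covered by exact k-mer matches, and the dip after position 3 marks where the query diverges from the reference.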

Delighted to see this paper from danderson123.bsky.social's PhD out. We have been building tools for AMR gene detection for over a decade now, but multicopy genes remain challenging. Dan shows that with a gene-space de Bruijn graph and long reads, you can do well.
www.biorxiv.org/content/10.1...

19.05.2025 09:28 | 👍 89 🔁 50 💬 4 📌 4

📢 HPRC Release 2 is here!

Now with phased genomes from 200+ individuals, a 5x increase from Release 1.

Explore sequencing data, assemblies, annotations & alignments in our interactive data explorer ⬇️:

humanpangenome.org/hprc-data-re...

12.05.2025 13:14 | 👍 37 🔁 28 💬 0 📌 3
Efficient near telomere-to-telomere assembly of Nanopore Simplex reads. Telomere-to-telomere (T2T) assembly is the ultimate goal for de novo genome assembly. Existing algorithms capable of near T2T assembly all require Oxford Nanopore Technologies (ONT) ultra-long reads w...

Preprint on hifiasm Nanopore-only assembly. Led by Haoyu Cheng: www.biorxiv.org/content/10.1...

18.04.2025 21:54 | 👍 139 🔁 77 💬 5 📌 6

Not only is this seriously elegant science from @gregfindlay.bsky.social, @nickywhiffin.bsky.social and friends - using saturation editing to define variant impact in RNU4-2 - it also defines *another* new syndrome associated with this fascinating non-coding RNA gene.

11.04.2025 11:13 | 👍 27 🔁 7 💬 1 📌 0
GitHub - TimD1/vcfdist: vcfdist: Accurately benchmarking phased variant calls

Finally got around to fixing the main limitation of the current vcfdist release: exploding memory usage and runtime in regions with high-density variants. A new `--max-supercluster-size` parameter limits this. Release v2.6.0 is out on [GitHub](github.com/timd1/vcfdist), DockerHub, and bioconda!

06.04.2025 15:21 | 👍 5 🔁 1 💬 0 📌 0

A milestone for our lab! Here's a full access link: rdcu.be/egmYb

05.04.2025 16:40 | 👍 22 🔁 3 💬 1 📌 0
Beyond the Gene in Genetics: How Isoform-Resolved Analysis Empowers the Study of Both Common and Rare Genetic Variation. Genetics is rapidly deepening our understanding of human health and disease by investigating common and rare genetic variants and their influence on gene expression [1,2]. Alternative splicing is a molec...

The Genetics research community has a problem. Most recent articles do not consider #splicing/isoforms.

Here, we analyze how important this opportunity gap is - and spoiler warning - we find it is essential for both analysis of common and rare variants

More info 👇

www.medrxiv.org/content/10.1...

02.04.2025 07:45 | 👍 29 🔁 10 💬 3 📌 0
SimdMinimizers: Computing random minimizers, fast. Motivation: Because of the rapidly-growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller ...

Congratulations to @imartayan.bsky.social and @curiouscoding.nl, whose paper on fast minimizer computation with SIMD has been accepted to SEA 2025 🙌🏻 www.biorxiv.org/content/10.1...

01.04.2025 08:23 | 👍 17 🔁 10 💬 0 📌 1
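For readers new to minimizers: a random minimizer scheme hashes every k-mer and, in each window of w consecutive k-mers, keeps the position of the k-mer with the smallest hash. The scalar sketch below follows that textbook definition, assuming `hashlib.blake2b` as the pseudo-random hash and leftmost tie-breaking; it is not the SIMD algorithm from the paper, and the name `random_minimizers` is hypothetical.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Pseudo-random 64-bit hash of a k-mer (stand-in for a fast rolling hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def random_minimizers(seq: str, k: int, w: int) -> list[int]:
    """Return the sorted, de-duplicated start positions of the minimizers of seq.

    Each window covers w consecutive k-mers; its minimizer is the k-mer with
    the smallest hash, ties broken by the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    positions: set[int] = set()
    for start in range(n_kmers - w + 1):
        # Comparing (hash, position) pairs selects the leftmost smallest hash.
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        positions.add(best)
    return sorted(positions)
```

This loop costs O(nw) hash comparisons; practical implementations replace it with a sliding-window minimum, and the paper in the post speeds the computation up further with SIMD.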

"Our results reveal substantial differences between pipelines, with many inversions either misrepresented or lost. Most notably, recovery rates remain strikingly low, even with the most simple simulated genome sets, highlighting major challenges in analyzing inversions in pangenomic approaches."

18.03.2025 16:53 | 👍 8 🔁 3 💬 0 📌 0

Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants https://www.biorxiv.org/content/10.1101/2025.02.11.637762v1 🧬🖥️🧪 https://github.com/harvardinformatics/scrub-jay-genomics

14.02.2025 16:30 | 👍 12 🔁 6 💬 0 📌 1

Pangenome graph augmentation from unassembled long reads https://www.biorxiv.org/content/10.1101/2025.02.07.637057v1

09.02.2025 02:50 | 👍 1 🔁 2 💬 0 📌 0
A strong internal promoter drives massive expression of YEATS‐domain devoid MLLT3 transcripts in HSC and most lethal AML

Our study reveals that the MLLT3 gene, crucial for maintaining the self-renewal of some blood stem cells, also produces a truncated version of its protein from an alternative internal promoter. REINDEER was the indexing technique behind the discovery. More here: onlinelibrary.wiley.com/doi/10.1002/... 2/2

10.02.2025 15:22 | 👍 5 🔁 1 💬 0 📌 0
Benchmarking, detection, and genotyping of structural variants in a population of whole-genome assemblies using the SVGAP pipeline. Comparisons of complete genome assemblies offer a direct procedure for characterizing all genetic differences among them. However, existing tools are often limited to specific aligners or optimized f...

Benchmarking, detection, and genotyping of structural variants in a population of whole-genome assemblies using the SVGAP pipeline. #StructuralVariants #GenomeAssembly #Bioinformatics #Genomics @biorxivpreprint.bsky.social
www.biorxiv.org/content/10.1...

08.02.2025 18:07 | 👍 5 🔁 1 💬 0 📌 0

Haplotype-based Parallel PBWT for Biobank Scale Data https://www.biorxiv.org/content/10.1101/2025.02.04.636317v1

08.02.2025 18:48 | 👍 1 🔁 1 💬 0 📌 0

Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3 https://www.biorxiv.org/content/10.1101/2025.02.02.636161v1

06.02.2025 18:46 | 👍 4 🔁 1 💬 0 📌 1

Evaluation of sequencing reads at scale using rdeval https://www.biorxiv.org/content/10.1101/2025.02.01.636073v1

02.02.2025 08:47 | 👍 5 🔁 3 💬 0 📌 0
