
Luke Zappia

@lazappi.bsky.social

Bioinformatician, data scientist, software developer. Also @_lazappi_ and @lazappi@mastodon.au

1,006 Followers  |  320 Following  |  45 Posts  |  Joined: 04.12.2023  |  2.3408

Latest posts by lazappi.bsky.social on Bluesky

Heads up: ignore samtools dot org, similarly minimap2 dot com and likely others. It's owned by a known phishing site and while the binaries they offer look valid currently (but note they may be serving us different binaries to others), that could change.

Ie: it's not us (Samtools team)! Be warned

15.09.2025 08:40 | 👍 144    🔁 126    💬 2    📌 5

anndataR enables seamless R-Python interoperability for single-cell RNA-seq by reading, writing, and converting H5AD files, supporting Seurat and SingleCellExperiment formats www.biorxiv.org/content/10.1... 🧬🖥️🧪 #Rstats github.com/scverse/annd...

29.08.2025 10:15 | 👍 19    🔁 7    💬 0    📌 0

A very big thank you to all co-authors and collaborators! @lazappi.bsky.social @rcannood.bsky.social Martin Morgan @scverse.bsky.social @ivirshup.bsky.social Chananchida Sang-aram Danila Bredikhin Brian Schilder Ruth Seurinck @yvansaeys.bsky.social @saeyslab.bsky.social

25.08.2025 15:24 | 👍 3    🔁 1    💬 0    📌 0

Hopefully coming soon to @bioconductor.bsky.social!

26.08.2025 05:37 | 👍 3    🔁 1    💬 0    📌 0

It's taken more than 2 years but we can officially announce anndataR!

There are still a lot of features to add but we hope a robust, "official", R-native H5AD reader/writer will unlock the de facto single-cell storage format for R users by avoiding issues with current solutions like zellkonverter.

26.08.2025 05:36 | 👍 21    🔁 5    💬 1    📌 0

We're excited to share that our preprint on anndataR, a new package bringing Python's AnnData to R, is now available on bioRxiv 🎉

🔗 Read the paper: www.biorxiv.org/content/10.1...
💻 Check the package in action: anndatar.data-intuitive.com

25.08.2025 15:24 | 👍 20    🔁 6    💬 1    📌 1

This week in my graduate-level Science Communication course, we are discussing communicating science to the public through public speaking (i.e., museum "evening with a scientist" nights, Nerd Nite lectures, speaking at community events, etc).

What advice do you have to share with my students? 🧪

17.06.2025 15:10 | 👍 56    🔁 3    💬 20    📌 0
[Bioc-devel] Bioconductor is moving from Slack to Zulip – 2 June

Bioconductor moving to Zulip for community chat

stat.ethz.ch/pipermail/bi...

30.05.2025 22:19 | 👍 16    🔁 9    💬 2    📌 0
Make your future Melbourne | WEHI Lead the next wave of biomedical discovery

www.wehi.edu.au/careers/make...
WEHI is looking for new lab heads! New and experienced lab heads welcome. Check out the ad, consider applying and share. We're a collaborative bunch, in Australia, with fabulous technology, great colleagues at a values-driven organisation. @wehi-research.bsky.social

13.05.2025 23:44 | 👍 25    🔁 28    💬 0    📌 2

Analyzing your single-cell data by mapping to a reference atlas? Then how do you know the mapping actually worked, and you're not analyzing mapping-induced artifacts? We developed mapQC, a mapping evaluation tool www.biorxiv.org/content/10.1... from the @fabiantheis lab. Let's dive in 🧵

03.06.2025 08:24 | 👍 24    🔁 10    💬 2    📌 0

We digitized the AfD report of the Federal Office for the Protection of the Constitution.

This (secret) document was created to provide evidence of the party's extreme-right nature.

Now you can explore it interactively.

➡️ 🎁 🇩🇪 www.spiegel.de/politik/deut...

27.05.2025 04:59 | 👍 64    🔁 22    💬 2    📌 1

🔥 Just published! Thrilled to share that our work on funkyheatmap has just been published in the Journal of Open Source Software 🎉

01.05.2025 14:20 | 👍 7    🔁 4    💬 1    📌 0
From R to Python with minimal baggage Getting the best of both worlds.

There is no reason to stay bound to one programming language. I discussed ways to ease R-Python interoperability with Luke Zappia, Philipp Angerer, and Tomasz Kalinowski.
Their tips and tricks are collected in this blog: hrovatin.github.io/posts/r_pyth...
@lazappi.bsky.social @t-kalinowski.bsky.social

28.04.2025 05:13 | 👍 19    🔁 3    💬 5    📌 1

If they don't agree then it depends which you find more convincing I guess. I think there are lots of reasons two benchmarks might not agree without either of them being "wrong" though.

16.04.2025 11:58 | 👍 2    🔁 0    💬 1    📌 0

Very on brand 😹. I guess I missed a space 🤦🏻. At least the mystery is solved.

20.03.2025 06:36 | 👍 0    🔁 0    💬 0    📌 0

Yeah, it surprised me a bit as well. This wasn't the main aim of the paper though so it's a fairly limited comparison. There is a lot more you could do if you wanted to properly compare them.

19.03.2025 16:39 | 👍 0    🔁 0    💬 0    📌 0

Thanks!

19.03.2025 07:51 | 👍 1    🔁 0    💬 0    📌 0

Our paper benchmarking feature selection for scRNA-seq integration and reference usage is out now www.nature.com/articles/s41...!

Keep reading for more about how we did the study and what we found out 🧵 👇

1/16

18.03.2025 15:40 | 👍 41    🔁 17    💬 6    📌 1

Thanks!

19.03.2025 07:38 | 👍 0    🔁 0    💬 0    📌 0

Thanks!

19.03.2025 07:38 | 👍 1    🔁 0    💬 0    📌 0
Feature selection methods affect the performance of scRNA-seq data integration and querying - Nature Methods This Registered Report presents a benchmarking study evaluating the impact of feature selection on scRNA-seq integration.

Hmmm... It's broken for me too 😿. I'm not sure what happened there but this is working www.nature.com/articles/s41.... Thanks for letting me know!

19.03.2025 07:37 | 👍 2    🔁 0    💬 1    📌 0
Prospective staff : Jobs at UWA : IN DEVELOPMENT

We're hiring for a bioinformatics lead at the OceanOmics Centre! L7, come work with all of the cool genomics data!
HiFi, HiC, Illumina, a PromethION, it's all there. Sequence *all* of the marine vertebrates!

#bioinformatics

external.jobs.uwa.edu.au/cw/en/job/51...

17.03.2025 07:02 | 👍 23    🔁 19    💬 2    📌 0

Big thank you to everyone who contributed to the study 🎉!

And thank you for reading 📖!

16/16

18.03.2025 15:40 | 👍 6    🔁 0    💬 1    📌 0

The project took 2.5 years from initial commit to publication. Some things could have been quicker but that's not a complaint. I think that's how long science takes and we should set realistic expectations, especially for junior researchers. It also emphasises the need for continuous benchmarking.

18.03.2025 15:40 | 👍 2    🔁 0    💬 1    📌 0
GitHub - theislab/atlas-feature-selection-benchmark: Code for "Feature selection methods affect the performance of scRNA-seq data integration and querying"

You can also find the code on GitHub github.com/theislab/atl... and the data on figshare figshare.com/projects/Ben....

14/16

18.03.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
Benchmarking feature selection for integration – lazappi A summary of our recently published paper “Feature selection methods affect the performance of scRNA-seq data integration and querying”

I have expanded on these points in this blog post lazappi.id.au/posts/2025-0... and of course all the detail is in the paper doi.org/10.1038/s415....

13/16

18.03.2025 15:40 | 👍 0    🔁 0    💬 1    📌 0
Figure showing the comparison between scVI, scANVI and Harmony/Symphony integration methods. a) metric category scores for each feature selection and integration method. b) difference in metric scores for scANVI and Symphony compared to scVI. c) metric category ranks for each feature selection and integration method. d) difference in ranks for scANVI and Symphony compared to scVI.


We focused on feature selection methods, but we also compared scANVI and Harmony/Symphony to our baseline of scVI. Feature selection methods performed similarly but scANVI scored higher overall and Symphony worse, particularly at unseen population detection. More work is needed to understand why.

12/16

18.03.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
Figure showing performance on subsets of the Human Lung Cell Atlas. a) shows scores for metric categories on the full HLCA, the immune lineage and the epithelial lineage. b) is a heatmap of the Jaccard index of the features selected on different subsets. c) shows the proportion of marker genes selected by each method on each subset. d) shows Milo scores for identifying unseen populations on the full HLCA and lineage subsets, as well as the difference between lineages compared to the full dataset.


What about lineage-specific integration? Using subsets of the Human Lung Cell Atlas we saw poorer performance overall on lineages compared to the full dataset, particularly for unseen population detection, but a full study is needed to properly answer this.

11/16

18.03.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

Do you need to select features in a batch-aware way? We didn't see a clear effect of doing that so it depends on your dataset and computational resources.

10/16

18.03.2025 15:40 | 👍 0    🔁 0    💬 1    📌 0
Figure showing the comparison of feature selection methods. a) shows the overall scores and ranks for each metric category. b) is a heatmap of Jaccard index between feature sets selected by each method. c) shows the number of common features selected by different numbers of methods on each dataset. d) shows the number of features selected by methods that automatically choose the number of features. e) is a heatmap of the difference in metric category scores for batch-aware variants of methods compared to the standard version.

Highly variable features performed consistently well, especially the Seurat VST method. Supervised marker genes also score highly but are more variable and require cell labels. Check out triku for an alternative approach that performs similarly.

9/16

18.03.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
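
For readers unfamiliar with the Jaccard index mentioned in the figure description for this post: it measures the overlap between two feature sets as the size of their intersection divided by the size of their union. Below is a minimal Python sketch of a pairwise comparison. The method names and gene IDs are hypothetical placeholders, not values from the study, and the helper functions are illustrative rather than the paper's actual code.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of feature names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(feature_sets):
    """Compute the Jaccard index for every unordered pair of methods."""
    return {
        (m1, m2): jaccard_index(feature_sets[m1], feature_sets[m2])
        for m1, m2 in combinations(sorted(feature_sets), 2)
    }


# Hypothetical feature sets selected by three made-up methods.
feature_sets = {
    "method_a": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "method_b": ["GENE3", "GENE4", "GENE5", "GENE6"],
    "method_c": ["GENE1", "GENE2", "GENE3", "GENE4"],
}

scores = pairwise_jaccard(feature_sets)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A value of 1 means two methods selected identical features and 0 means no overlap, so a heatmap of these values shows at a glance which methods pick similar genes.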
