This paper wouldn't be possible without my wonderful co-authors: Shuang Li, Malte Lücken, and the support from the Marioni Lab and Teichmann Lab.
Huge thanks to the reviewers and community for the feedback! #SingleCell #Bioinformatics #scRNAseq #DataScience
17.02.2026 02:12
8️⃣ The Future: Leverage high-quality references and interpretable models to convert "unknown" Class II effects into "known" Class I artifacts.
Long-term goal: Robust reference-based analyses resilient to batch effects, ultimately reducing the need for post hoc correction altogether.
17.02.2026 02:12
7️⃣ Despite progress, gaps remain:
- "Black box": models often obscure which genes are corrected.
- Inefficiency: we often train models "from scratch" for every dataset.
- Lack of guidance on overlapping/redundant covariates.
17.02.2026 02:12
6️⃣ Notably, while the exact link between genes and technical variance remains complex, emerging feature selection insights may pave the way for interpretable frameworks that explicitly model these sources rather than blindly correcting them.
17.02.2026 02:12
5️⃣ We further divide each category into three subgroups.
- Data Cleaning methods relate to physical models, offering mechanistic insights and precise analysis.
- Data Integration methods are highly effective but implicit, often functioning as "black boxes" where adjustments are harder to trace.
17.02.2026 02:12
4️⃣ However, there is a long history of method development handling Class I variations (known artifacts like sequencing depth).
We classify these as "Data Cleaning" methods. They explicitly model technical noise sources, a critical step often overshadowed by integration.
17.02.2026 02:12
3️⃣ It's important to clarify that what is usually called "data integration" or "batch correction" (like MNN, Harmony, or scVI) is actually just one subset of methods.
These tools are typically designed for Class II effects: variations that are complex or batch-specific.
17.02.2026 02:12
2️⃣ We classify batch effects into two categories (Fig 1a):
🔹 Class I: Better characterized, universally unwanted artifacts (e.g., ambient RNA, sequencing depth).
🔹 Class II: Poorly characterized, batch-specific variation (e.g., donor effects, protocol differences).
17.02.2026 02:12
1️⃣ Batch effects remain the "elephant in the room" for single-cell genomics.
The core challenge? The trade-off between undercorrection (residual noise) and overcorrection (erasing fine-grained biological signals). We argue that not all batch effects are the same.
17.02.2026 02:12
Toward informed batch correction for single-cell transcriptome integration
Nature Computational Science - Batch effects pose substantial challenges for obtaining meaningful biological insights from large-scale yet heterogeneous single-cell RNA-sequencing datasets. Here...
Our Perspective paper "Toward informed batch correction for single-cell transcriptome integration" is out now in Nature Computational Science!
We review a decade of batch-correction methods and propose a move from "blind" integration to "informed" modeling. 🧵 rdcu.be/e4cSp
17.02.2026 02:12
give it a try and you may get addicted : )
16.06.2025 01:34
YouTube video by Potato Pseudo Gamer
Royal Match game ads '105' Make way for king
Being a new PI: youtube.com/shorts/llV5_...
15.01.2025 20:12
Open Postdoctoral Position in Computational Genomics and Systems Biology
Hiring: Computational Biology Postdoc @UCSF!
to develop:
1️⃣ Novel deep learning models for spatial/single-cell multiomics
2️⃣ Single-cell analyses across development & disease
3️⃣ Open-source tools for the broader community
Apply here: opportunities.ucsf.edu/content/open...
#CompBio #PostdocJobs
11.12.2024 04:36
He Lab - UCSF
Our new lab website is live!
Explore our research on single-cell, multi-omics, spatial-omics, gene regulation, bioinformatics, and development. Meet our motivated team and stay tuned for our latest work. New members at all levels welcome!
peng-he-lab.github.io
30.11.2024 23:08
He Lab is looking for a computational research assistant to help us build the best cell atlases in the world and dissect gene regulatory networks. This position can be useful for fresh grads to take a break before applying to grad school. More details: aprecruit.ucsf.edu/JPF05334
26.11.2024 05:05
Smart design. I do have code to pool the replicates, demultiplex them together, and then separate the replicates, in case you need it.
11.01.2024 15:12
I've done 6 with uneven mixing. Guess 10+ is possible with even mixing + 30k cells per library.
11.01.2024 13:17
I'm maintaining an active list of biology-related jobs at various levels (PIs, staff scientists, postdocs, PhDs, etc.). Please feel free to subscribe or contribute! github.com/brianpenghe/...
04.01.2024 15:13