Erin Young erinyoung - Bluesky Statics

A flier with instructions on judging the DAVIS SCHOOL DISTRICT STEM FAIR.

I had a fantastic morning connecting with some incredibly inspiring young scientists at our local STEM fair!

Congratulations to all the students who showcased their curiosity, creativity, and hard work! (And their corresponding adults.)

The future of science is bright!

#sciencefair #STEM

23.02.2026 19:50 — 👍 5 🔁 0 💬 0 📌 0

Hello world! I am excited to announce my lab is open at the University of Utah in the Department of Biochemistry. We are looking for scientists at all levels interested in studying host-virus interactions in both bacteria and animals. Come join us in beautiful Utah! (photo is 10 steps from lab)

22.01.2026 22:06 — 👍 73 🔁 31 💬 6 📌 0

It's great being able to connect with my fellow public health bioinformaticians at the AMD Leads Meeting in Atlanta!

13.01.2026 16:31 — 👍 0 🔁 0 💬 0 📌 0

Working from home with a sick-kid-that-feels-fine is such an adventure.

Features:
fever that reduces with acetaminophen
random infrequent cough
lots of energy and stir craziness

10.12.2025 16:17 — 👍 1 🔁 0 💬 2 📌 0

DaisyBlast helps me visualize possible horizontal gene transfer (HGT) events involving plasmids.

21.11.2025 23:41 — 👍 3 🔁 1 💬 0 📌 0

And bioconda

```
conda install -c bioconda daisyblast
```

21.11.2025 23:41 — 👍 1 🔁 0 💬 1 📌 0

DaisyBlast: A Python tool to find, plot, and export synteny blocks from all-vs-all BLAST.

And pypi (blast must be installed separately)

pypi.org/project/Dais...

```
pip install daisyblast
```

21.11.2025 23:41 — 👍 0 🔁 0 💬 1 📌 0

GitHub - erinyoung/daisyblast: Python pipeline for automated synteny block detection and visualization from multiple genomes.

daisyblast source code is up on GitHub: github.com/erinyoung/da...

21.11.2025 23:41 — 👍 1 🔁 0 💬 1 📌 0

This circular genomic map visualizes the distribution of conserved BLAST hits across the ~155,000 bp length of test_3.fasta___3, where distinct colored blocks represent homology groups shared by at least two sequences. The sequence architecture is defined by extensive contiguous conservation at the termini, dominated by massive blocks such as Group_23 (orange) spanning the first 18kb and Group_6 (light blue) covering the final 20kb. In contrast to these stable ends, specific internal loci—most notably the 20,000–30,000 bp and 110,000–125,000 bp regions—exhibit a dense, mosaic arrangement of fragmented hits (e.g., Groups 11, 22, 16, and 21), indicating complex zones of variation or repetitive elements that align with the "hotspots" seen in the previous dot plot analysis.

Or little circular figures, also without connecting lines. I've found these impress people, but it is a little harder to identify sections of synteny when the genome is circular.

21.11.2025 23:41 — 👍 1 🔁 0 💬 1 📌 0

This linear genomic map unfolds the test_3.fasta___3 sequence along a horizontal axis of approximately 155,000 base pairs, providing a flattened, position-ordered view of the BLAST hit distribution. The sequence architecture is anchored by substantial, contiguous homology blocks at the termini: the massive Group_23 (orange) dominates the first 18kb, while the extensive Group_6 (light blue) occupies the final segment from ~127kb onwards. Between these conserved flanks lies a segmented interior characterized by alternating patterns of stability, such as the large contiguous spans of Group_7 (purple) and Group_24 (mauve), and highly fragmented "hotspots" (notably around 20kb–40kb and 100kb–125kb) where rapid transitions between numerous small groups (e.g., Groups 16, 12, 11, and 32) indicate regions of high complexity or genomic shuffling.

Or little synteny blocks figures without neighbors (no connecting lines). This is helpful in exploratory analysis where I am initially unsure which sequences are going to be significant.

21.11.2025 23:41 — 👍 0 🔁 0 💬 1 📌 0

This combined alignment dot plot illustrates the syntenic relationships between the query sequence test_3.fasta___3 (x-axis) and four distinct subject sequences (y-axis), revealing a split between global collinearity and localized fragmentation. The subjects test_5.fasta___2 (dark blue) and test_4.fasta___3 (yellow) exhibit robust, nearly continuous diagonal alignments starting from the origin, indicating high homology and structural conservation with the query, though they diverge in coordinate space past the 100kb mark due to potential insertion/deletion events. Conversely, test_2.fasta___2 (light blue) and test_1.fasta___2 (orange) display discontinuous, "patchy" alignments clustered specifically around the 20,000 bp and 120,000 bp positions on the query; these high-coordinate matches (especially for the light blue sequence) are suggestive of translocated regions or shared repetitive motifs rather than continuous whole-sequence similarity.

I finally fleshed out one of my python scripts into a package: daisyblast

Instead of pairwise blast results, this is intended to find blast hits that are shared by _n_ queries (like a daisy chain!).

I can create dotplots of multiple samples
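
The "daisy chain" idea of linking pairwise hits until they span _n_ sequences can be illustrated with a small union-find sketch. This is not daisyblast's actual implementation: the function name `chain_hits`, the `(seq_id, start, end)` region tuples, and the rule that regions link only when their coordinates are identical are all simplifying assumptions made for illustration.

```python
from collections import defaultdict


def chain_hits(hits, min_sequences=2):
    """Link pairwise hits into chains; keep chains spanning >= min_sequences sequences.

    hits: iterable of (region_a, region_b), each region a (seq_id, start, end) tuple.
    Regions are linked only when identical; a real tool would also merge overlaps.
    """
    parent = {}

    def find(region):
        # Union-find lookup with path halving.
        parent.setdefault(region, region)
        while parent[region] != region:
            parent[region] = parent[parent[region]]
            region = parent[region]
        return region

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = defaultdict(set)
    for region in parent:
        chains[find(region)].add(region)

    # Keep only chains that touch enough distinct sequences.
    return sorted(
        sorted(chain)
        for chain in chains.values()
        if len({seq_id for seq_id, _, _ in chain}) >= min_sequences
    )
```

With hypothetical hits A-B and B-C, the three regions end up in one chain even though A and C were never paired directly, which is the sense in which the hits are chained.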

21.11.2025 23:41 — 👍 8 🔁 1 💬 1 📌 0

Tumblr: posts a dramatic black-and-white photo of a fin slicing through water with caption “Just circling… thinking about life… and seals…”

18.11.2025 18:01 — 👍 20 🔁 2 💬 0 📌 0

Phangorn's NeighborNet comparing 17 different phylogenetic trees. Tree tips are now web tips.

This is my first time working with NeighborNet (part of phangorn), and I think I'm going to need to do some reading so that 1) this makes sense to me, and 2) I can explain it to my collaborators

28.10.2025 22:34 — 👍 2 🔁 0 💬 0 📌 0

I have finally done it! I finished a rough draft for a manuscript and sent it to co-authors for editing. I may have it submitted by the end of this year!

24.10.2025 16:09 — 👍 3 🔁 0 💬 0 📌 0

Isolates SNP Tree Viewer - Pathogen Detection - NCBI

And the cluster link

17.10.2025 17:35 — 👍 1 🔁 0 💬 0 📌 0

A phylogenetic tree of pathogen detection cluster PDS000161821.27, which contains both isolates that initiated the investigation in red with additional isolates.

I had an EPI contact me to see if any isolates were clustering with 2024CK-00004 and 2024CK-00250 for EPI reasons. Pathogen Detection has these two isolates in the same cluster, making this question relatively easy to answer.

17.10.2025 17:35 — 👍 2 🔁 0 💬 1 📌 0

Good idea, but there's more documentation about conda, and a lot of people I work with use the command line less than 5 hours per month.

17.10.2025 15:19 — 👍 0 🔁 0 💬 1 📌 0

I personally think it's because each amino acid change gets its own number

16.10.2025 18:27 — 👍 0 🔁 0 💬 0 📌 0

Stacked bar graph where the x-axis is the year-month and the values are the number of bla*{NDM,OXA,VIM,GES,IMP,KPC}* genes identified in sequencing efforts that month. The legend is to the right in order for general matching of AMR gene to value.

I probably need some different colors, but I've made a graph detailing the number of AMR genes we've seen in organisms that we've sequenced for the last few months.

16.10.2025 18:10 — 👍 6 🔁 2 💬 1 📌 0

Estimating the potential economic and health impact of integrated genomic surveillance in a hospital setting: Integrated genomic surveillance, combining whole genome sequencing (WGS) of bacterial isolates with patient movement data, promises improved detection…

"WGS-informed prevention could hypothetically generate net savings of €1.35 million annually if transmission was stopped once a clonal isolate was detected in a second patient."

I think that translates to ~1.5 million US dollars
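
The conversion can be checked with a line of arithmetic. The sketch below only works out which exchange rate the ~1.5 million estimate implies; the function name is hypothetical and no actual market rate is assumed.

```python
def eur_to_usd(amount_eur: float, usd_per_eur: float) -> float:
    """Convert euros to US dollars at a caller-supplied exchange rate."""
    return amount_eur * usd_per_eur


# Exchange rate implied by equating EUR 1.35 million with USD 1.5 million.
implied_rate = 1_500_000 / 1_350_000
print(round(implied_rate, 3))  # 1.111
```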

14.10.2025 21:04 — 👍 6 🔁 2 💬 1 📌 0

It'd really shift my way of thinking if it didn't

29.09.2025 17:05 — 👍 0 🔁 0 💬 0 📌 0

A scatter plot comparing the number of reads and the overall mean depth observed in a bam file, which appears to have a linear relationship.

And today's (useless) figure is that the mean depth observed in a bam file is linearly associated with the number of reads in the corresponding fastq files.
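
A back-of-the-envelope sketch of why a linear relationship is expected, assuming every read maps and all reads share one length. The function name and the numbers in the usage lines are hypothetical, not values from the figure.

```python
def expected_mean_depth(n_reads: int, read_length: int, genome_length: int) -> float:
    """Expected mean depth if every read maps: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


# Hypothetical numbers: doubling the reads doubles the expected depth.
print(expected_mean_depth(1_000_000, 150, 5_000_000))  # 30.0
print(expected_mean_depth(2_000_000, 150, 5_000_000))  # 60.0
```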

29.09.2025 17:04 — 👍 4 🔁 0 💬 1 📌 0

samtools coverage histogram of a SARS-CoV-2 sample with too many reads. The large grey box in the center should have more variation, but in general indicates that each and every base in the reference has a lot of read coverage.

I think you misunderstand. There is a lot of coverage for these samples. So much so that it is hard to see bubbles or other aberrations.

23.09.2025 20:46 — 👍 0 🔁 1 💬 0 📌 0

How is this method simpler than what I attempted?

23.09.2025 19:48 — 👍 1 🔁 0 💬 0 📌 0

All 33 million+ reads were mapped with bbmap (all non-mapped reads were excluded prior to trimming the primers)

23.09.2025 19:38 — 👍 0 🔁 0 💬 0 📌 0

So, in summary, if high coverage samples aren't getting assigned lineages, I recommend subsampling them or adjusting the `samtools mpileup` command.
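
For the subsampling route, a minimal sketch of choosing the fraction of reads to keep. The function name and the 1 million read target are assumptions; the resulting fraction would be passed to whichever subsampling tool is in use.

```python
def subsample_fraction(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so that roughly target_reads remain."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    return min(1.0, target_reads / total_reads)


# 33 million is the read count mentioned in the thread; the target is an assumption.
fraction = subsample_fraction(33_000_000, 1_000_000)
print(f"{fraction:.4f}")  # 0.0303
```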

23.09.2025 18:01 — 👍 2 🔁 0 💬 0 📌 0

I really solved my dilemma (after too many hours troubleshooting it) by adding the `-d 0` flag to `samtools mpileup`, which uses a lot of memory but produced adequate consensus fasta files.
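
A sketch of what the adjusted pipeline might look like, written as a Python helper that only builds the command string and does not run it. The thread confirms only `-d 0`; the remaining `samtools mpileup` flags (`-aa -A -Q 0`) follow the example in the iVar manual and are assumptions here, as are the function name and the file names.

```python
import shlex


def consensus_command(bam: str, prefix: str, max_depth: int = 0) -> str:
    """Build a `samtools mpileup | ivar consensus` shell pipeline as a string.

    max_depth=0 asks samtools mpileup to remove its per-position depth limit.
    """
    mpileup = ["samtools", "mpileup", "-aa", "-A", "-d", str(max_depth), "-Q", "0", bam]
    consensus = ["ivar", "consensus", "-p", prefix]
    return " | ".join(shlex.join(part) for part in (mpileup, consensus))


# Hypothetical file names.
print(consensus_command("sample.bam", "sample_consensus"))
```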

/

23.09.2025 18:01 — 👍 0 🔁 0 💬 1 📌 0

Instead it turns out that `bbmap` was allowing for very large insert sizes, some of which spanned the majority of the genome. These, for whatever reason, were given priority for `samtools mpileup` (which gets piped into `ivar consensus`) /
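
One way to check for this kind of problem is to look at the template length (TLEN, column 9 of a SAM record) for each read pair. The sketch below is an assumption about how such a check could be done, not the troubleshooting actually used in this thread; the function name and threshold are hypothetical, and the input is expected to be SAM text lines such as the output of `samtools view`.

```python
def large_insert_fraction(sam_lines, max_insert: int) -> float:
    """Fraction of read pairs whose template length (SAM column 9) exceeds max_insert."""
    pairs = 0
    large = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        tlen = int(fields[8])
        if tlen <= 0:  # count each pair once, via its leftmost mate
            continue
        pairs += 1
        if tlen > max_insert:
            large += 1
    return large / pairs if pairs else 0.0
```

A large fraction at a threshold well above the expected amplicon size would point to pairs spanning much of the genome.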

23.09.2025 18:01 — 👍 1 🔁 0 💬 1 📌 0

The most irritating part is that each of these 18 would generate a consensus that could be used for determining lineage if I subsampled them. I was worried about contamination, but these had all had human reads removed. /

23.09.2025 18:01 — 👍 0 🔁 0 💬 1 📌 0

I had 18 samples (33 million+ reads) of SARS-CoV-2 amplicon-based sequencing that would not create a decent consensus fasta after aligning with `bbmap` and trimming/consensus generation with `ivar`, but would if I tweaked the pipeline a bit.
/

23.09.2025 18:01 — 👍 3 🔁 1 💬 1 📌 0

Posts by Erin Young (@erinyoung.bsky.social)