PSA: dbGaP authorized access/run selectors seem to be not working. I assume it's the shutdown - either they turned the lights off, or something's busted and no one is there to fix it
07.10.2025 20:13
@chrismiller.science.bsky.social
I study cancer at Washington University in St Louis. Cancer Genomics, Bioinformatics, Data Viz, Tumor Evolution, AML, Immunotherapy, Irreverent humor 🧬 🖥️ mostly @chrisamiller on other platforms
These 3-L bottles contain one million tiny colored spheres each.
One sphere is black (1 ppm).
Finding the black sphere is comparable to detecting a protein present at ~ 6,000 copies in the proteome of a human cell.
Quantifying the protein requires analyzing multiple jars.
I can't ever see the MTHFR gene symbol without doing a double take
07.10.2025 19:10
I have been trying to get this published as an op-ed, but I am going to post it here since I think it is timely in light of the "consent" extortion events.
Deafening Quiet from the Scientific Establishment
jeremymberg.github.io/jeremyberg.g...
1/14
The lessons here:
1) Many gene names are stupid.
2) Edge cases may be rare, but they often matter. (TP53 is a key cancer gene that wouldn't be accessible without some special accommodations here).
3) As always, check your assumptions!
(fin)
For our little internal app, this probably won't matter much, and I will either set the number of records to 200 (because we generate almost no traffic) or might code up something that dynamically decides how many records to request, based on which genes are in the input data. (8/n)
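The "dynamically decide" idea from (8/n) could look roughly like the sketch below. It assumes the worst case for a gene is that every other symbol sharing its prefix is ranked ahead of it, so the request size is the largest prefix-match count among the input genes. The function name `records_needed` and the `floor` and `cap` parameters are hypothetical, not part of the tool described in the thread.

```python
def records_needed(input_symbols, all_symbols, floor=10, cap=1000):
    """Pick a request size large enough for every input gene (a sketch).

    Assumption: a fully typed symbol can be pushed down the result list by
    every other symbol that starts with it, so we size for that worst case.
    """
    universe = {s.upper() for s in all_symbols}
    worst = 0
    for sym in {s.upper() for s in input_symbols}:
        # Count symbols (including sym itself) that begin with sym.
        n = sum(1 for other in universe if other.startswith(sym))
        worst = max(worst, n)
    # floor and cap are illustrative bounds, not documented API limits.
    return min(max(worst, floor), cap)
```

With a toy symbol list such as `["AR", "ARF1", "ARG1", "ARL1", "TP53", "TP53BP1"]`, an input containing only "TP53" needs 2 records before the floor is applied, while an input containing "AR" needs 4.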
06.10.2025 18:54
Plot showing how many records need to be returned to ensure that each completely typed gene will be in the list.
For those who are interested, the plot showing cumulative percentage of human HUGO gene names (from ensembl protein-coding genes) covered by a set number of records looks like this. So 8 results covers 99% of genes, 34 results covers 99.9% of genes, and it takes 199 to cover everything. (7/n)
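A cumulative-coverage curve like the one in (7/n) can be computed from a list holding, for each gene, the number of records needed to guarantee it appears. The sketch below assumes such a list already exists; the function names and the toy numbers in the usage note are hypothetical and are not the data behind the plot.

```python
import math


def coverage_at(required, size):
    """Fraction of genes guaranteed to appear when `size` records are returned."""
    return sum(1 for n in required if n <= size) / len(required)


def size_for_coverage(required, target):
    """Smallest request size whose coverage is at least `target` (0 < target <= 1)."""
    ordered = sorted(required)
    # The k-th smallest requirement covers k genes out of len(ordered).
    k = max(math.ceil(target * len(ordered)), 1)
    return ordered[k - 1]
```

For a toy list `[1, 1, 1, 2, 3, 10]`, `coverage_at(..., 2)` is 4/6 and `size_for_coverage(..., 1.0)` is 10, which mirrors how a single long-tail gene drives the size needed for full coverage.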
06.10.2025 18:54
So in order to guarantee that we'll get "AR" in the list, the value should be 200 records, which seems excessive. My instinctual guess of 30 wasn't bad, and covers 99.89% of gene names, but that's not all of them! (6/n)
06.10.2025 18:54 β π 0 π 0 π¬ 1 π 0199 AR 120 PC 100 KL 78 ZNF7 67 ZNF2 67 CS 58 CP 58 ADA 57 SI 55 ZNF3 52 TH 51 C2 43 MAG 42 ZNF8 42 TNF 41 GPR1 37 DEFB1 36 USP1 36 GAL 34 PLEK 31 MET
It introduces a new question, though - this failed on TP53 with 10 results, so how many results need to be returned to handle all genes correctly? A few seconds of bash/grep later, I get the following list of 21 genes that will still fail. (5/n)
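The thread did this with bash/grep; the Python sketch below is a stand-in, not the original commands. It counts, for each symbol, how many symbols in the list start with it (itself included), then reports the ones whose count exceeds a chosen result size. Whether the original tally included the gene itself is not stated, and treating the API match as a plain prefix match is an assumption. The function names are hypothetical.

```python
def prefix_match_counts(symbols):
    """Map each symbol to the number of symbols (itself included) that start with it."""
    ordered = sorted({s.upper() for s in symbols})
    counts = {}
    for i, sym in enumerate(ordered):
        # In sorted order, every symbol with prefix `sym` sits in one
        # contiguous run that begins at `sym` itself.
        j = i
        while j < len(ordered) and ordered[j].startswith(sym):
            j += 1
        counts[sym] = j - i
    return counts


def failing_symbols(symbols, size):
    """Symbols that may be missing from a result list of `size` records, worst first."""
    counts = prefix_match_counts(symbols)
    return sorted(((n, s) for s, n in counts.items() if n > size), reverse=True)
```

Running `failing_symbols` over a full symbol list with `size=30` would produce a count-and-symbol listing in the same shape as the one in the post.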
06.10.2025 18:54
After some digging, it turns out that mygene.info has a default max of 10 records returned for each query, and the first 10 hits include genes like "TP53TGS", "TP53TG3F", "TP53RK-DT", but not "TP53" itself. Adding "&size=30" to the query allows it to return 30 hits, which solved this problem (4/n)
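A minimal sketch of the fix, assuming a query shape the thread does not show in full: the `q=symbol:<prefix>*` wildcard form and `fields=symbol` are assumptions, and only the `size` parameter comes from the post. The helper names are hypothetical. `has_exact_symbol` is the check that would have caught the missing TP53.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://mygene.info/v3/query"


def build_query_url(prefix, size=30):
    """Build an autocomplete-style query URL with an explicit result size."""
    params = {
        "q": f"symbol:{prefix}*",  # assumed query form, not taken from the thread
        "species": "human",
        "fields": "symbol",        # assumed: only the symbol is needed here
        "size": size,              # the parameter described in the post
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_hits(prefix, size=30):
    """Fetch and return the list of hits (performs a network request)."""
    with urlopen(build_query_url(prefix, size), timeout=10) as resp:
        return json.load(resp).get("hits", [])


def has_exact_symbol(hits, symbol):
    """True if the fully typed symbol is among the returned hits."""
    return any(hit.get("symbol", "").upper() == symbol.upper() for hit in hits)
```

A quick sanity check in the tool would be `has_exact_symbol(fetch_hits("TP53"), "TP53")`, run once with the default size and once with the larger one.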
06.10.2025 18:54
But when I manually tried the query string - something like mygene.info/v3/query?spe... - TP53 didn't appear in the returned json - I know that's not right! (3/n)
06.10.2025 18:54
Digging through the backend code, I found that the tool was storing the ENSG id as the key (sensibly!), and then using an API call to mygene.info to match them up to gene names as they are typed. Seems fine... (2/n)
06.10.2025 18:54
Today's little mystery started simply enough: I was hacking on a web tool that autocompletes gene names, and was surprised when searching for "TP53" didn't return that gene. I checked, and it was definitely in the input data, so I was left scratching my head (1/n)
06.10.2025 18:54
How do we get @ensembl.org the infrastructure they need to not be unresponsive like three times a week? Like can we pass the hat? I'm in for twenty bucks.
06.10.2025 16:23
Immigrants, particularly on H1Bs, are the lifeblood of American innovation. If you wanted to hurt US competitiveness in the next century, I can think of few more effective ways than a move like this
Even when found illegal, the mere intent will have irreparably harmed our future
Critical part of the President's new $100,000 charge for H1-B visas: The Administration can also offer a $100,000 discount to any person, company, or industry that it wants. Replacing rules with arbitrary discretion.
Want visas? You know who to call and who to flatter.
When you want to do reproducible analysis in R, some packages require you to set an RNG seed. I'm not sure I trust anyone who doesn't immediately run `set.seed(42)`
18.09.2025 19:47
Zstandard's --long range mode works wonders for assemblies, but needs uninterrupted single line sequences.
*AllTheBacteria 661k, multiline fasta*
gzip (pigz): 751GB
zstandard --long: 641GB (30% original size)
*Single line fasta*
gzip (pigz): 700GB
zstandard --long: 232GB (10% original size)
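Getting from multiline to single-line FASTA is a small preprocessing step. The sketch below is one way to do it in Python and is not the tooling used for the numbers above. The function name `unwrap_fasta` is hypothetical.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line."""
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record's sequence, if any.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            seq_parts.append(line)
    # Flush the final record's sequence.
    if seq_parts:
        yield "".join(seq_parts)
```

Because it is a generator over lines, it can stream a large file, for example `for out in unwrap_fasta(open(path)): print(out)`, with the output then handed to zstandard's --long mode.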
For sure. Hybrid meetings are the worst of both worlds
03.09.2025 00:51
Everyone has a second full time job being mad at the government now
03.09.2025 00:45
Good news. The House of Representatives stands behind the NIH budget with no cuts.
02.09.2025 00:53
These are the words of a lunatic who does not belong in government, much less as our nation's top health official.
It is dangerous to allow him to oversee ALL federal health research and public health infrastructure. It is never too late to do the right thing. Fire RFK Jr.
📢🚨📢🚨 Genome Informatics deadline extended to September 8! meetings.cshl.edu/meetings.asp....
Please spread the word. If you are like me and had at least one abstract that wasn't quite ready by last week's deadline, you get another swing. See you there!
The Genome Informatics conference (@ Cold Spring Harbor Lab, Nov 5 - 8) abstract deadline is **today**. We welcome your submissions! Topics include:
- PanGenomes
- Genome Assembly & Seq. Algos.
- Algorithmic Evo. Bio
- Single Cell & Spatial Omics
- Microbial Genomics
- AI/ML & Integrative Omics
The Journal of Open Source Software (@joss-openjournals.bsky.social) is looking for editors. Come join our team!
I've only been an editor for a few months, but I love working with JOSS. Our peer review process is actually collaborative, and we're Diamond Open Access.
#AcademicSky 🧪 #CompChem
In my small corner of biology, I'm constantly reassured by how frequently we confirm main findings. And then when differences do show up, they very rarely end up being fraud, but instead tell us something important about differences in experimental design.
08.08.2025 03:37
Only 35% of the dictionary? Yes, it might require some tortile posts from us nonathletes to get it done, but we're no tambourinists, we can do this!
www.avibagla.com/blueskydicti...
"It gives one that too-familiar mixture of rage, despair, and embarrassment that is peculiar to the Trump era, and for which the English language does not have an appropriate word."
06.08.2025 18:21
As stressful as it is to run a laboratory in the US right now, my thoughts are often with the NIH staff living in the awful chaos trying to preserve the whole enterprise. I'm grateful for every moment they choose not to give up. These people are the only thing keeping it from entirely collapsing.
31.07.2025 00:15
Photo of President Harry S. Truman laying the cornerstone of the NIH Clinical Center.
President Harry S. Truman laid the cornerstone in 1951, saying of the Clinical Center's future work: "Medical care is for the people and not just for the doctors and the rich." He mentioned that 75 million Americans then without health insurance would soon become a "medically indigent class" and he challenged the scientific community to "translate the new knowledge gained by research into better care for more people." "Research to prevent disease" was a better investment for federal dollars than "providing unlimited hospitalization to treat it."
In 1951, Harry S. Truman laid the cornerstone of the NIH clinical center, stating, "Research to prevent disease" was a better investment for federal dollars than "providing unlimited hospitalization to treat it."
30.07.2025 01:33