03.08.2025 09:04
@hasindu2008.bsky.social
Lecturer at UNSW Sydney; Visiting Scientist at Garvan Institute of Medical Research - Designing embedded systems for bioinformatics applications.
03.08.2025 09:04
Minimod preprint by @sunethsa.bsky.social is out:
biorxiv.org/content/10.1...
- similar accuracy to modkit & pb-CpG-tools
- standard open-source licenses (NOT vendor-specific)
- simple but faster: on a laptop, ~4X for DNA and ~55X for RNA
Code: github.com/warp9seq/min...
@bonson-wong.bsky.social
21.07.2025 09:05
If you are at #ISMB2025: at around 2:30pm in the @bosc.bsky.social track, after @sunethsa.bsky.social's talk, Bonson Wong will present on nanopore basecalling on AMD GPUs using slorado:
github.com/BonsonW/slor...
If you are at #ISMB2025, go to the @bosc.bsky.social track at around 2:30pm, where @sunethsa.bsky.social will present real-time @nanoporetech.com modification frequency calculation using realfreq and standalone frequency calculation using minimod:
academic.oup.com/bioinformati...
For reference-based frequency calculation, we thought taking only the bases that match the reference could be the better choice. But yes, such a warning would indeed be valuable.
Thank you very much for the suggestion.
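A minimal Python sketch of the reference-matching choice described above: per-position modification frequency counted only over read bases that agree with the reference, with the skipped mismatches tallied so that a warning can be printed. The `(ref_pos, read_base, prob)` input layout, the 0.5 threshold and the name `ref_matched_frequency` are all hypothetical; this is not how minimod or realfreq are actually implemented.

```python
from collections import defaultdict


def ref_matched_frequency(calls, ref, threshold=0.5):
    """Hypothetical sketch: per-position modification frequency.

    calls: iterable of (0-based ref position, read base, modification probability).
    ref:   reference sequence as a string.
    Returns ({ref_pos: frequency}, number_of_skipped_mismatches).
    """
    total = defaultdict(int)     # calls counted at each position
    modified = defaultdict(int)  # calls at or above the threshold
    mismatches = 0
    for pos, base, prob in calls:
        if base.upper() != ref[pos].upper():
            mismatches += 1      # read base disagrees with the reference: skip it
            continue
        total[pos] += 1
        if prob >= threshold:    # simplification: anything below counts as unmodified
            modified[pos] += 1
    freqs = {pos: modified[pos] / total[pos] for pos in sorted(total)}
    return freqs, mismatches


ref = "ACGTACGA"
calls = [(0, "A", 0.9), (0, "A", 0.2), (0, "G", 0.95), (4, "A", 0.8)]
freqs, skipped = ref_matched_frequency(calls, ref)
print(freqs)  # {0: 0.5, 4: 1.0}
if skipped:
    print(f"warning: {skipped} call(s) skipped because the read base did not match the reference")
```

Real tools usually also treat calls between the modified and unmodified thresholds as ambiguous; this sketch collapses that into a single cut-off to keep the reference-matching step visible.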
I then get these:
6754
6751
6769
6760
6756
These seem to match the expected positions, assuming you are using 1-based coordinates.
Pass the option --insertions so that the CIGAR is parsed for inserted bases.
18.07.2025 05:50
Human chrM?
echo -e "@SQ\tSN:chrM\tLN:16569" > a.sam
echo -e "read1\t16\tchrM\t6749\t60\t4S12M1I3M2D7M1S\t*\t0\t0\tGGCTCATTAATCTCAATAACAGCCGTAA\t*\tMM:Z:A+a.,0,0,0,0,0;\tML:B:C,255,255,255,255,255" >> a.sam
samtools sort a.sam -o a.bam && samtools index a.bam
./minimod view -c 'a[A]' hg38.fa a.bam
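For anyone wondering how the CIGAR decides those positions, here is a minimal Python sketch (not minimod's code) that walks a CIGAR string and maps each SEQ index to a 0-based reference position. Soft-clipped bases get no position, and inserted bases get none either unless the hypothetical `include_insertions` flag is set, in which case this sketch pins them to the preceding reference base. That pinning rule is an assumption for illustration and may well differ from what minimod's --insertions does. The example uses the test read above (POS 6749, i.e. 6748 in 0-based).

```python
import re

# One CIGAR operation: a length followed by an operator character
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_to_ref(cigar, pos0, include_insertions=False):
    """Return a list mapping each SEQ index to a 0-based reference position (or None).

    pos0: 0-based leftmost aligned reference position (SAM POS - 1).
    include_insertions: hypothetical option; when True, inserted bases are
    assigned the position of the preceding reference base instead of None.
    """
    mapping = []
    ref_pos = pos0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes query and reference
            mapping.extend(range(ref_pos, ref_pos + n))
            ref_pos += n
        elif op == "I":      # consumes query only
            mapping.extend([ref_pos - 1 if include_insertions else None] * n)
        elif op == "S":      # soft clip: consumes query only, never aligned
            mapping.extend([None] * n)
        elif op in "DN":     # consumes reference only
            ref_pos += n
        # H and P consume neither query nor reference
    return mapping


m = query_to_ref("4S12M1I3M2D7M1S", 6748)
print(len(m), m[4], m[15], m[16], m[17], m[20], m[26])  # 28 6748 6759 None 6760 6765 6771
```

Since the read has flag 16, SEQ is stored reverse-complemented, so base indices derived from the MM tag must first be flipped into SEQ orientation before being looked up in this mapping.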
We've been developing a small standalone tool for viewing & calculating frequency from modification tags in BAM files. This is a call for brave users to test it.
github.com/warp9seq/min...
Written by @sunethsa.bsky.social in C, based on the mod tag parsing we did for realfreq: doi.org/10.1093/bioi...
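Since minimod builds on MM/ML tag parsing, here is a minimal Python sketch of how an MM entry such as `A+a.,0,0,0,0,0;` turns into read positions, following my reading of the SAM optional-tags specification: skip counts apply to occurrences of the canonical base in the read as sequenced, so SEQ of a reverse-strand read is reverse-complemented first. It handles single-letter codes only, ignores the `N` wildcard base, and the name `parse_mm_ml` is hypothetical; it is not the realfreq or minimod parser. The example reuses the chrM test read from the earlier post.

```python
# Complement table for undoing SEQ's reverse complement on flag-16 reads
COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_mm_ml(seq, mm, ml, is_reverse=False):
    """Return (seq_index, base, code, ml_value) for every call listed in MM.

    seq: SEQ as stored in the BAM record.
    mm:  MM tag value without the 'MM:Z:' prefix, e.g. "A+a.,0,0;".
    ml:  ML values (0-255), in the same order as the calls in MM.
    Hypothetical sketch: single-letter codes only, no 'N' wildcard handling.
    """
    # MM counts bases in the original read orientation
    read = seq.translate(COMP)[::-1] if is_reverse else seq
    calls = []
    ml_i = 0
    for entry in filter(None, mm.split(";")):
        head, *skips = entry.split(",")
        base = head[0]
        codes = head[2:].rstrip(".?")  # drop the optional '.'/'?' flag
        base_pos = [i for i, b in enumerate(read.upper()) if b == base]
        idx = -1
        for skip in map(int, skips):
            idx += skip + 1            # skip `skip` occurrences, take the next one
            pos = base_pos[idx]
            seq_index = len(read) - 1 - pos if is_reverse else pos
            for code in codes:         # one ML value per code at each position
                calls.append((seq_index, base, code, ml[ml_i]))
                ml_i += 1
    return calls


calls = parse_mm_ml("GGCTCATTAATCTCAATAACAGCCGTAA", "A+a.,0,0,0,0,0;",
                    [255] * 5, is_reverse=True)
print([c[0] for c in calls])  # [25, 16, 12, 10, 7]
```

The returned indices are in SEQ orientation (they all point at a T in the stored SEQ, i.e. an A on the original read); turning them into reference coordinates additionally requires walking the CIGAR string.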
The truly open solution (SLOW5) is the technically better one here. Even if it were not, there would be strong reasons to prefer it. I hope the community rejects closed or strangely licensed basic tools, not just POD5 but also pseudo-open offerings like CellRanger. Good alternatives exist!
04.07.2025 21:08
blue-crab v0.4.0 has been released:
- yet another end_reason added to support pod5 updates.
To convert POD5<=>S/BLOW5, it's as simple as:
pip install blue-crab
pod5->blow5
blue-crab p2s example.pod5 -o example.blow5
blow5->pod5
blue-crab s2p example.blow5 -o example.pod5
github.com/Psy-Fer/blue...
Sorry haha
Tagged the wrong person.
Meant to be @psy-fer.bsky.social
Will give it another try this time. Our previous attempts to get examples or more information were not very successful :/
06.07.2025 09:09
Will have a look and see, thanks.
06.07.2025 09:03
But thanks for the suggestion; it is worth checking out nevertheless. I am not familiar with Python, so this is something to try when @psyche.social is back.
06.07.2025 08:57
When I last checked with ONT, they said that POD5 writing is done in high-performance C++. So we thought adding a POD5 benchmark in Python could be seen as a deliberate attempt to make it look slow, and never pursued it.
06.07.2025 08:55
The specification for S/BLOW5 is here, if that is what you're after:
hasindu2008.github.io/slow5specs/s...
The point is that S/BLOW5 uses primitive ASCII or binary without an intermediate format, whereas POD5 uses Apache Arrow IPC as an intermediate format (it is a format with a spec, not a standard, I think).
Conversion can be done live, so this time is hidden.
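To make the "primitive ASCII" point concrete, here is a minimal Python sketch that reads ASCII SLOW5 records with nothing but string splitting and converts raw signal to picoamperes with the usual (raw + offset) * range / digitisation formula. It takes column names from the `#read_id` header line instead of hard-coding them; the header lines and values in the example are made up, and the function names are hypothetical. For real data, the project's own slow5lib (C) or pyslow5 (Python) libraries would be the way to go.

```python
def parse_slow5_ascii(lines):
    """Yield one dict per record of an ASCII SLOW5 file (sketch, not slow5lib).

    Lines starting with '@' or '#' are header lines; the column names are
    taken from the header line that starts with '#read_id'.
    """
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#read_id"):
            columns = line[1:].split("\t")
        elif not line or line.startswith(("#", "@")):
            continue
        else:
            if columns is None:
                raise ValueError("record found before the #read_id header line")
            rec = dict(zip(columns, line.split("\t")))
            rec["raw_signal"] = [int(x) for x in rec["raw_signal"].split(",")]
            yield rec


def to_picoamps(rec):
    """Raw ADC value -> pA: (raw + offset) * range / digitisation."""
    scale = float(rec["range"]) / float(rec["digitisation"])
    offset = float(rec["offset"])
    return [(x + offset) * scale for x in rec["raw_signal"]]


example = [
    "#slow5_version\t0.2.0\n",
    "#num_read_groups\t1\n",
    "@run_id\trun0\n",  # made-up header attribute
    "#char*\tuint32_t\tdouble\tdouble\tdouble\tdouble\tuint64_t\tint16_t*\n",
    "#read_id\tread_group\tdigitisation\toffset\trange\tsampling_rate\tlen_raw_signal\traw_signal\n",
    "read_0\t0\t8192\t10\t1024\t4000\t3\t100,200,300\n",  # made-up values
]
for rec in parse_slow5_ascii(example):
    print(rec["read_id"], to_picoamps(rec))  # read_0 [13.75, 26.25, 38.75]
```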
06.07.2025 08:38
How many flowcells is this GargantION going to have?
06.07.2025 08:35
Is the bulk sequencing file a bulk-fast5?
That's a Python library, unfortunately. Something close to C is needed for a proper file format benchmark; otherwise we end up comparing two languages.
I tried to get some info on pod5 writing from ONT devs, but the responses were not so helpful.
Possibly to keep things under the company's control? I am not sure why either.
06.07.2025 04:23
Not sure I am understanding you correctly, but ASCII at least is an ISO standard, isn't it?
06.07.2025 04:17
POD5 reading code from ONT is available in Dorado in C/C++. If you know where to find MinKNOW's POD5 writing code in C/C++, let us know.
06.07.2025 04:06
The sad thing is that the community also seems to take the path of least immediate resistance. Unfortunately, I also don't know how to navigate these non-technical issues.
06.07.2025 03:09
Also, I know quite a few projects that wanted to use SLOW5, but ONT influenced them not to.
They even push some projects not to share data in both formats, saying it should be POD5 only.
It would take at least 100 replies to list just the ones I still remember, but the latest is that ONT wants us not to provide data to others in non-POD5 formats, citing baseless reasons.
06.07.2025 03:01
This shows the (previously known) technicality that using mmap to read large files with a predictable access pattern, which the programmer knows in advance far better than the operating system can guess, can lead to unpredictable performance.
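To illustrate the point (not the benchmark itself), here is a minimal Python sketch that scans the same file two ways. The explicit-read version states its access pattern directly through its read loop and buffer size, while the mmap version leaves page faults and readahead to the operating system, optionally hinted with `madvise(MADV_SEQUENTIAL)` where the platform supports it. The function names and the 1 MiB buffer are arbitrary choices for illustration.

```python
import mmap
import os
import zlib


def crc_explicit_read(path, bufsize=1 << 20):
    """Scan a file with explicit read() calls: the program drives the I/O pattern."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            crc = zlib.crc32(chunk, crc)
    return crc


def crc_mmap(path, bufsize=1 << 20):
    """Scan the same file through mmap: page faults and readahead are up to the OS."""
    crc = 0
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return crc  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Optional hint that access is sequential (Unix, Python 3.8+ only)
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            for start in range(0, len(mm), bufsize):
                crc = zlib.crc32(mm[start:start + bufsize], crc)
    return crc
```

Both functions compute the same checksum; one could time them with `time.perf_counter` on a large file on different storage (local SSD vs a networked HPC filesystem, cold page cache), but no timings are implied here.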
05.07.2025 04:26
Key observations:
- sequential access: BLOW5 ~7X faster on the academic HPC we tested; similar performance on desktops with a single SSD.
- random access: BLOW5 is always significantly faster (sometimes 100X)
- size: similar when the same compression is used
- dependencies: BLOW5 ~3, POD5 >50
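One general ingredient of fast random access is an index of byte offsets, so a reader can jump straight to a record instead of scanning. As a rough, hypothetical illustration (not the actual slow5 index format, and not a reproduction of the benchmark above), here is a Python sketch that builds a read_id to byte-offset index over a SLOW5-like text file, after which fetching any read costs one seek plus one read. The function names and the toy file are made up.

```python
import os
import tempfile


def build_offset_index(path):
    """Map read_id -> (byte offset, byte length) of each record line (sketch only)."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:  # binary lines keep their line ending, so offsets stay exact
            if line.strip() and not line.startswith((b"#", b"@")):
                read_id = line.split(b"\t", 1)[0].decode()
                index[read_id] = (offset, len(line))
            offset += len(line)
    return index


def fetch_record(path, index, read_id):
    """Random access: one seek and one read per record, no scanning."""
    offset, length = index[read_id]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length).decode().rstrip("\r\n")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "toy.slow5")
    with open(path, "w") as out:
        out.write("#read_id\traw_signal\n")
        for i in range(3):
            out.write(f"read_{i}\t{i},{i + 1}\n")
    idx = build_offset_index(path)
    print(repr(fetch_record(path, idx, "read_2")))  # 'read_2\t2,3'
```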
For the many who have been asking about BLOW5 vs POD5 for nanopore signal data, here, finally, is a detailed benchmark we did:
biorxiv.org/content/10.1...
Summary: BLOW5 performance is >= POD5 (from roughly equal up to 100X faster), with the benefit of having ~3 dependencies instead of >50.