Getting pretty close now!
Still moving through a big bug squashing phase, but I can see the light at the end.
@psy-fer.bsky.social
Bioinformatician/Genomics Software Engineer @garvaninstitute.bsky.social Views my own. Mastodon @Psy_Fer_@genomic.social, https://genomic.social
Can also cheat and do like zip and stick the index in the file
10.02.2026 11:35
Trying really hard not to make many performance changes while just doing a faithful port. it's like 17s for 10k reads vs STAR doing it in like 3s. But i have not really focussed on this at all
10.02.2026 09:34
Finally, ruSTAR and STAR are actually agreeing on things. haha. Claude is pretty psyched about this too. Onward and upwards
10.02.2026 06:42
You're not gonna believe this...but ruSTAR has single threaded fastq.gz reading.....
10.02.2026 03:02
that is cool!
Could this potentially solve the single threaded read bottleneck in tools that read fastq.gz files?
I think most of it is in the seed generation steps. Claude has had a really rough time trying to get this right. It has the method from STAR, reads the code into context, and still really struggles to get it right. I am going to write it myself and then see if it can build on that.
09.02.2026 10:41
so in the seed finding/expansion phase, it's creating too many soft clippings. weird. this is causing a bunch of other issues. So need to fix this, then go from there.
for 10k reads, STAR runs in 3.5s and ruSTAR runs in ~17s. So there are also quite a lot of things to change for performance.
Claude thinks that whether a read maps or not is the be-all and end-all of testing
I have told it that it needs to consider WHERE a read maps too. It is overjoyed at this insight, and thinks i'm brilliant (i'm not).
Well, at least we are on the same page that reads should align correctly
now fixing another bug in seed generation and scoring, where mismatches wouldn't impact the final score after seed stitching. This obviously impacts the alignments. So hopefully this improves things quite a lot. Claude actually found this one this time (after I told it to look at seed scoring)
09.02.2026 08:21
zooming in...yep, that's about right :)
09.02.2026 08:14
I am waaaaay off to the side here but near my "people". Cool visualisation
09.02.2026 08:13
While we've been somewhat successful in spurring adoption of alevin-fry/simpleaf for scRNAseq processing, an impediment Dongze brought to our attention (now working directly w/ many experimentalists) was the need for a nice QC report for its output; hence QCatch www.biorxiv.org/content/10.1... 1/x
03.01.2026 18:18
And i'm back at it fixing some critical bugs in the cigar string construction (it's still trying to do like 2S10M12M4M lol). It also started getting integer overflow when casting i32 into u32 on negative scores from gaps. Like...just follow the logic in star ya dummy!
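For anyone following along, here is a rough sketch of the two fixes that post is about: collapsing neighbouring CIGAR operations of the same type (so 2S10M12M4M becomes 2S26M), and converting a signed score to unsigned without wrapping. The names `CigarOp`, `merge_adjacent`, `cigar_to_string` and `score_to_u32` are made up for illustration. This is not the ruSTAR or STAR code, just one way it could look.

```rust
use std::convert::TryFrom;

/// One CIGAR operation: a run length plus an op code such as 'M' or 'S'.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CigarOp {
    len: u32,
    op: char,
}

/// Merge neighbouring operations that share an op code and drop
/// zero-length runs, so 2S 10M 12M 4M becomes 2S 26M.
fn merge_adjacent(ops: &[CigarOp]) -> Vec<CigarOp> {
    let mut merged: Vec<CigarOp> = Vec::with_capacity(ops.len());
    for &cur in ops {
        if cur.len == 0 {
            continue;
        }
        match merged.last_mut() {
            // Same op as the previous run: extend it.
            Some(prev) if prev.op == cur.op => prev.len += cur.len,
            // Different op (or first run): start a new one.
            _ => merged.push(cur),
        }
    }
    merged
}

/// Render operations in the usual <length><op> text form.
fn cigar_to_string(ops: &[CigarOp]) -> String {
    ops.iter().map(|c| format!("{}{}", c.len, c.op)).collect()
}

/// Convert a signed score to u32 without wrapping.
/// A plain `score as u32` turns -1 into 4294967295; this returns None instead.
fn score_to_u32(score: i32) -> Option<u32> {
    u32::try_from(score).ok()
}
```

Merging as a final pass over the op list keeps the stitching code simple: each stitched seed can push its own M run and the cleanup happens in one place.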
Got a new testing framework now
And tests in your code aren't enough. These things will go round and round in circles chasing their tails over relatively obvious (to us) issues. Like the cigar string in those posts. Obvious to us that it was wrong, yet it didn't pick up on it at all until I pointed it out.
08.02.2026 13:45
Where are you in the tech sphere? i'm a scientist, in bioinformatics, more on the computer science and tool building side.
08.02.2026 10:26
I'm a bioinformatician that builds these kinds of tools for work, so i know what i'm doing, and ooooh boy, someone who didn't know this stuff would have no chance. LLMs really couldn't do this without me
08.02.2026 10:25
Took a break today. Played some music on my viola, played with my cats and played some video games with my mates. I'll get back into it tomorrow
08.02.2026 04:00
People that buy those places probably aren't living in them. They rent them out to make someone else pay off their asset, and then gaslight them into thinking they are bad with money. A place near me went for like 3.5M and it's tiny and not even that nice. Housing market makes zero sense.
08.02.2026 00:37
let results = vec![] // TODO: build the results vector
it's always the "hard" thing too, that actually gives you results. Bloody lazy clankers
holy malloc batman
yea gonna have to fix that
So after burning an insane amount of tokens, we finally have something close to STAR in output
Still a lot of work to go in making sure the logic is actually sound. No matter how many times I tell it to match the logic from STAR, it goes off on some tangent and just doesn't do that.
basically this
07.02.2026 09:54
Yea LLMs absolutely speed up development. I still don't think the quality is anywhere near as good as if you or I did it manually. In my other tools I like to ask the LLMs if I wrote something or if it wrote it (trick question, I wrote all of it) and it thinks it wrote most of it
07.02.2026 09:44
It can read docs. But it is kinda dumb about it too. It took a loooong time to get the syntax of the noodles library right.
In one of my tools I have cigar parsers. They aren't that hard to write, there are only so many op codes.
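To show how small such a parser can be, here is a minimal sketch. It assumes the nine op codes from the SAM format (`MIDNSHP=X`) and treats `*` as "no CIGAR available". The function name `parse_cigar` and the error messages are made up for this example; it is not the parser from any of the tools mentioned.

```rust
/// Parse a CIGAR string such as "5S32M" into (length, op) pairs.
/// Assumes the SAM op codes MIDNSHP=X; "*" yields an empty list.
/// (Illustrative sketch, not taken from an existing tool.)
fn parse_cigar(cigar: &str) -> Result<Vec<(u32, char)>, String> {
    if cigar == "*" {
        return Ok(Vec::new());
    }
    let mut ops = Vec::new();
    // Length digits collected so far for the op we are about to read.
    let mut len: Option<u32> = None;
    for ch in cigar.chars() {
        if let Some(digit) = ch.to_digit(10) {
            // Accumulate the decimal length, refusing to overflow u32.
            let next = len
                .unwrap_or(0)
                .checked_mul(10)
                .and_then(|v| v.checked_add(digit))
                .ok_or_else(|| format!("length overflow in '{}'", cigar))?;
            len = Some(next);
        } else if "MIDNSHP=X".contains(ch) {
            // An op code must be preceded by a non-zero length.
            match len.take() {
                Some(n) if n > 0 => ops.push((n, ch)),
                _ => return Err(format!("op '{}' has no valid length", ch)),
            }
        } else {
            return Err(format!("unknown op code '{}'", ch));
        }
    }
    if len.is_some() {
        return Err(format!("trailing length without op in '{}'", cigar));
    }
    Ok(ops)
}
```

Note that this only checks syntax. A string like 5S10M12M10M parses fine, so catching un-merged runs needs a separate check on the parsed ops.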
here you go, done with loc
07.02.2026 09:36
It even has a bunch of tests, but the issue is getting the logic right. So if it tried to do TDD, then it would still get the logic for the tests wrong, and then also write the code wrong
Let me check where i'm at with the LOC
This is even more reason why LLMs are dumb as rocks. It will see that kind of CIGAR string, write 500 lines of code using it, testing it, and have no idea why its parsing logic isn't working and creating an edge case format bug.
07.02.2026 09:22
me: yea pretty sure the CIGAR string construction is broken. Cigar strings don't go 5S10M12M10M...
claude: i'm a double dumbass
me: yea...
claude: The problem is that for reverse-strand alignments, we're comparing the FORWARD read sequence to the genome, when we should be comparing the REVERSE-COMPLEMENTED read sequence!
me: I literally said that like 2 context windows ago wtf
claude: oh yea, it was last on my "to check" list
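The strand bug in that exchange boils down to something like the sketch below: when an alignment is on the reverse strand, reverse-complement the read before counting mismatches against the genome. The function names (`complement`, `reverse_complement`, `count_mismatches`) and the choice to map unknown bases to N are assumptions for illustration, not how STAR or ruSTAR actually do it.

```rust
/// Complement a single base; anything that is not A/C/G/T becomes N.
/// (Assumption for this sketch: IUPAC ambiguity codes are not handled.)
fn complement(base: u8) -> u8 {
    match base {
        b'A' | b'a' => b'T',
        b'C' | b'c' => b'G',
        b'G' | b'g' => b'C',
        b'T' | b't' => b'A',
        _ => b'N',
    }
}

/// Reverse the sequence and complement every base.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Count mismatches between a read and the genome window it aligns to.
/// For reverse-strand alignments the read is reverse-complemented first,
/// because the genome is stored on the forward strand.
fn count_mismatches(read: &[u8], genome_window: &[u8], is_reverse: bool) -> usize {
    let oriented: Vec<u8> = if is_reverse {
        reverse_complement(read)
    } else {
        read.to_vec()
    };
    oriented
        .iter()
        .zip(genome_window)
        .filter(|(r, g)| r != g)
        .count()
}
```

Comparing the forward read against a reverse-strand hit makes a perfect alignment look full of mismatches, which would then drag the score down for no real reason.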