First grant proposal submitted!
Life resumes with a backlog of papers to review and making my thesis pretty for printing ... and then another grant to write.
(this is never going to stop anymore, is it...)
@curiouscoding.nl.bsky.social
PhD on high-throughput bioinformatics @ ETH Zurich; IMO, ICPC, Xoogler, Rust, road-cycling, hiking, wild camping, photography
I see your house and offer you a sax factory
youtu.be/TDlvmvpMaZ0?...
Anyway, if I were to ban openai from scraping my blog, then I would lose half my impact.
(Note that the stats exclude 35k requests with neither a user agent nor a referring site. The bulk of that is other bots, but it's hard to say what else.)
Also: perplexity is 10x less, and claude next to nothing.
But tbh I don't understand why chatgpt is making these requests in the first place; they should just cache the internet locally. They probably already do since they must have a search backend containing my blog already anyway.
Number of requests to blog pages (excluding static assets and such) by bot user agents, LLM user agents, and referring sites. Google scraped 5800 times and gave 5900 clicks. ChatGPT-User made 8700 requests, but only resulted in 38 clicks.
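To put the two sources on the same scale, here is a minimal sketch that turns the figures quoted above into clicks per request. The dictionary layout and the function name `clicks_per_request` are made up for illustration; only the four numbers come from the post.

```python
# Figures quoted in the post: requests to blog pages and resulting clicks.
# The data layout and names below are illustrative assumptions.
stats = {
    "Google": {"requests": 5800, "clicks": 5900},
    "ChatGPT-User": {"requests": 8700, "clicks": 38},
}


def clicks_per_request(requests: int, clicks: int) -> float:
    """Return how many clicks came back for each request made."""
    return clicks / requests


ratios = {name: clicks_per_request(**s) for name, s in stats.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f} clicks per request")
```

With these numbers Google returns roughly one click per request, while ChatGPT-User returns fewer than one click per 200 requests.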
I have such mixed feelings on this. On the one hand I blog to spread knowledge, but on the other hand attribution is nice ...
Either way, here are some statistics on server logs of the past 3 months. Basically, Google is by far the biggest referrer, but ChatGPT-User requests are more 1/
A final thought for now:
In the u32 hash case and with reduced blocks, the number of hash collisions I get for all 3 versions of 1/2/3 rotated blocks is very similar to gxhash, whether k<32 or k>32.
Generally, it feels like 'random' hash collisions are more likely than specific other examples.
ohh man; a simply wrong result is bad (but expected in this context), but this seems like it may be much darker than that - blog.computationalcomplexity.org/2025/08/some...
04.08.2025 21:08
So far I'm not able to reproduce this with random data and u16 or u32 hashes.
How many kmers are in the read sets that you test on?
Otherwise I should adjust to testing on real data as well, with more interesting collections of kmers.
Ok, I'm gonna run some experiments then. Gotta figure this out
04.08.2025 16:59
Mostly I wonder why 64 is still the threshold in the split setting, since neither of the rotations has length 64. It all feels very weird. I'd normally blame a bug in the code, but that can't be since rolling hash bugs would break everything very quickly.
04.08.2025 15:43
Specifically, I'm assuming you never actually ran with k>1023, so it's curious that going from period 1023 to period 855855 makes a difference at all.
I have a post on hash collisions in plain nthash v1 (no splitting), and there you already get collisions at k=23.
2/
curiouscoding.nl/posts/nthash/
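For context on where a threshold of 64 can come from in the unsplit case, here is a minimal sketch of a rotate-and-XOR rolling hash in the style of ntHash with a single 64-bit rotation. It is not the split-rotation variant discussed in this thread, and it does not reproduce the k=23 collisions from the blog post. The per-base constants in `SEED` are arbitrary placeholders, not the real ntHash seeds, and all function names are made up for illustration. The point is only that a 64-bit rotation has period 64, so two bases that are 64 positions apart get the same rotation and can be swapped without changing the hash.

```python
MASK = (1 << 64) - 1

# Placeholder per-base constants (assumption): NOT the real ntHash seeds.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0xD6E8FEB86659FD93,
}


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits (r is taken modulo 64)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer: str) -> int:
    """XOR of the base constants, where base i is rotated by k-1-i bits."""
    k = len(kmer)
    h = 0
    for i, base in enumerate(kmer):
        h ^= rotl64(SEED[base], k - 1 - i)
    return h


def roll(h: int, k: int, out_base: str, in_base: str) -> int:
    """Slide the window one position: drop out_base, append in_base."""
    return rotl64(h, 1) ^ rotl64(SEED[out_base], k) ^ SEED[in_base]


# The rolling update agrees with hashing the next window from scratch.
s = "ACGTACGTAC"
k = 4
rolled = roll(hash_kmer(s[:k]), k, s[0], s[k])

# With k = 65, positions 0 and 64 are rotated by 64 and 0 bits, which is the
# same rotation, so swapping those two bases gives the same hash.
a = "A" + "C" * 63 + "G"
b = "G" + "C" * 63 + "A"
print(hash_kmer(a) == hash_kmer(b))
```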
Hmmm, so just to confirm: you were already using 2 splits before, and that gives 10k-100k collisions starting at k>=65.
And then going to the 3-way split variant that goes down to 10-100 collisions, and with the 6-way split it's 4.
Are all collisions always for k>64 in all three cases?
1/
Hype! More blogs on bioinformatics!
04.08.2025 15:22
Reading papers on the balcony, with a backdrop of meteor shower
03.08.2025 23:06
This way, if you use it more, it will slowly get faster, while you don't waste time optimizing things you end up not using.
03.08.2025 20:47
Mathematically speaking, I think you get a (multiplicative) constant approximation to optimal efficiency by spending a constant fraction of the time you wait on optimizing the thing you're waiting for.
Ie: whenever something takes a day of running, spend at least 1/4 of the day making it faster.
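A toy model of this rule, as a rough sketch only: after every run, spend a fixed fraction of that run's duration on optimization. The speedup model (each hour of effort multiplies the runtime by a constant factor) and all names and parameters are hypothetical. What holds regardless of the speedup model is the accounting: time spent optimizing is always exactly `fraction` times the time spent waiting, so a tool you run only once costs you at most that fraction extra.

```python
def simulate(runs: int, initial_runtime: float, fraction: float,
             speedup_per_hour: float) -> tuple[float, float]:
    """Return (hours spent waiting, hours spent optimizing).

    Hypothetical model: after each run, spend fraction * runtime hours on
    optimization, and each hour of effort multiplies the runtime by
    speedup_per_hour (a value below 1 means the tool gets faster).
    """
    runtime = initial_runtime
    waited = 0.0
    optimized = 0.0
    for _ in range(runs):
        waited += runtime
        effort = fraction * runtime
        optimized += effort
        runtime *= speedup_per_hour ** effort
    return waited, optimized


# One day-long run with the 1/4 rule: 24 hours waiting, 6 hours optimizing.
single = simulate(runs=1, initial_runtime=24.0, fraction=0.25,
                  speedup_per_hour=0.95)

# Many runs: the tool keeps getting faster, and the optimization effort
# stays a fixed fraction of the total waiting time.
many = simulate(runs=20, initial_runtime=24.0, fraction=0.25,
                speedup_per_hour=0.95)
print(single, many)
```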
~Theoretical~ Practical Computer Science
03.08.2025 20:28
This is actually one of the evaluation criteria of the EU MSCA grants!
> "Code of good practice"
But in the end, for most labs dev time is probably way more expensive than compute?
marie-sklodowska-curie-actions.ec.europa.eu/about-msca/m...
I talk a lot about Rust for building high-perf (& even non-perf critical) software, & scientific software in particular. I often discuss what's interesting to me, but wanted to offer the chance to those interested for me to answer their questions about Rust in science. Fire away with questions!
03.08.2025 18:12
Aerial shot of Gaza City, burned and bombed into burned and shattered, unliveable wreckage as far as can be seen.
Whole area that used to be houses, shops, public transport routes, made into rubble and blackened, unliveable ruins. Aerial shot shows vast area entirely obliterated as habitable, attacked again and again by Israeli forces.
Several wrecked Gaza schools, shown from the air, pulverised by Israel's forces: windows all gaping, bombed out and covered in ash and dust. Some people are trying to live in makeshift camps amid the destruction, in former quadrangles and bomb sites.
Aerial shots of neighbourhoods with destroyed apartment buildings, houses, offices and shops, all bombed by Israel's government. Starving people are trying to live in tents and makeshift cubbies on the former roundabouts on a main road.
This is what Israel has done to Gaza. Forbidden pics by a Washington Post photographer whose Jordanian air crew didn't pass on Israel's order to only photograph aid being dropped. City, suburbs, schools, people: bombed to rubble. People being starved in makeshift camps on road roundabouts. #Genocide
03.08.2025 04:31
first BWT paper: no DOI
first wavelet tree paper: broken DOI
sad; these should be backfilled somehow
dx.doi.org/10.1145/6441...
Like, yes I'm a methods guy and I want to know how your method works. But please first tell me what your algorithm is computing, rather than a spelled out pseudocode.
02.08.2025 17:31
Still in favour of a mandatory 'problem statement' paragraph at the start of every paper and readme, that clearly states expected input and output.
02.08.2025 17:27
If you read this: go to the readme of your most popular software, and check that the first line is more specific than 'X is a tool to search large sequence sets'.
Otherwise, you've just narrowed yourself down to 99% of bioinf software and I still have no idea what problem you're solving...
BY THE NUMBERS
The Gaza Humanitarian Foundation (GHF) continued its operations today to provide vital food aid for the Palestinian people in Gaza. Below is an update on today's operations:
Distributed 21,600 boxes of aid today across three distribution sites:
Location                   Truckloads   Boxes    Meals
SDS2 (Saudi Neighborhood)  10           8,640    554,400
SDS3 (Khan Younis)         11           9,504    609,840
SDS4 (Wadi Gaza)           4            3,456    221,760
TOTAL                      25           21,600   1,386,000
Approximately 99,949,482 meals distributed to date via roughly 1,662,040 boxes.
GHF brags about delivering 1.3M meals per day. Mathematicians have a moral responsibility to speak out about the fact that 1.3M <<< 6.6M
02.08.2025 09:18
Really excited to share the first paper from my PhD - it's all about assumptions in modelling and the history of early population genetics…
doi.org/10.1111/ahg....
Pinging your local Cloudflare/Google edge node has lower latency than reading from a spinning disk HDD
01.08.2025 20:11
Conferences: where you become friends with all the cool people, so that they then ask you to review for their journal
01.08.2025 10:19
Interesting attempt to infer a phylogenetic tree of llms www.dbreunig.com/2025/05/30/u...
01.08.2025 01:00