Ragnar looking-for-a-postdoc {GK}

@curiouscoding.nl.bsky.social

PhD on high throughput bioinformatics @ ETH Zurich; IMO, ICPC, Xoogler, Rust, road-cycling, hiking, wild camping, photography

872 Followers  |  96 Following  |  784 Posts  |  Joined: 15.09.2023  |  1.949

Latest posts by curiouscoding.nl on Bluesky

First grant proposal submitted! 🤞

Life resumes with a backlog of papers to review and making my thesis pretty for printing ... and then another grant to write.

(this is never going to stop anymore, is it...)

05.08.2025 13:31 | 👍 2    🔁 1    💬 0    📌 0
Classic Sesame Street - Saxophone Factory (YouTube video by tpirman1982)

I see your house and offer you a sax factory

youtu.be/TDlvmvpMaZ0?...

05.08.2025 03:29 | 👍 7    🔁 3    💬 2    📌 0

Anyway, if I were to ban openai from scraping my blog, then I would lose half my impact.

(Note that stats exclude 35k requests without user agent nor referring site. A bulk of that is other bots, but hard to say what else.)

04.08.2025 23:48 | 👍 0    🔁 0    💬 1    📌 0

Also: perplexity is 10x less, and claude next to nothing.

But tbh I don't understand why chatgpt is making these requests in the first place; they should just cache the internet locally. They probably already do since they must have a search backend containing my blog already anyway.

04.08.2025 23:46 | 👍 0    🔁 0    💬 1    📌 0
Number of requests to blog pages (excluding static assets and such) by bot user agents, LLM user agents, and referring sites.

Google scraped 5800 times and gave 5900 clicks. ChatGPT-User made 8700 requests, but only resulted in 38 clicks.

I have such mixed feelings on this. On the one hand I blog to spread knowledge, but on the other hand attribution is nice ...

Either way, here's some statistics on server logs of the past 3 months. Basically, Google is by far the biggest referrer, but ChatGPT-User requests are more 1/

04.08.2025 23:44 | 👍 4    🔁 1    💬 1    📌 0
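
Editor's note: the request and click counts quoted in the image description can be turned into a clicks-per-request ratio. The snippet below is only a minimal sketch using those four numbers; the dictionary layout and the function name `clicks_per_request` are illustrative and not taken from the author's log analysis.

```python
# Counts as quoted in the image description (server logs, past 3 months).
stats = {
    "Google": {"requests": 5800, "clicks": 5900},
    "ChatGPT-User": {"requests": 8700, "clicks": 38},
}


def clicks_per_request(entry):
    """Clicks received per request made by the crawler or agent."""
    return entry["clicks"] / entry["requests"]


ratios = {name: clicks_per_request(entry) for name, entry in stats.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f} clicks per request")
```

With these figures Google returns roughly one click per request, while ChatGPT-User returns fewer than one click per 200 requests.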

A final thought for now:
In the u32 hash case and with reduced blocks, the number of hash collisions I get for all 3 versions of 1/2/3 rotated blocks is very similar to gxhash, whether k<32 or k>32.

Generally, it feels like 'random' hash collisions are more likely than specific other examples.

04.08.2025 23:25 | 👍 0    🔁 0    💬 1    📌 0

ohh man; a simply wrong result is bad (but expected in this context), but this seems like it may be much darker than that - blog.computationalcomplexity.org/2025/08/some...

04.08.2025 21:08 | 👍 6    🔁 1    💬 0    📌 0

So far I'm not able to reproduce this with random data and u16 or u32 hashes.

How many kmers are in the read sets that you test on?
Otherwise I should adjust to testing on real data as well, with more interesting collections of kmers.

04.08.2025 19:06 | 👍 0    🔁 0    💬 1    📌 0

Ok, I'm gonna run some experiments then. Gotta figure this out 😆

04.08.2025 16:59 | 👍 0    🔁 0    💬 1    📌 0

Mostly I wonder why 64 is still the threshold in the split setting, since neither of the rotations has length 64. It all feels very weird. I'd normally blame a bug in the code, but that can't be since rolling hash bugs would break everything very quickly.

04.08.2025 15:43 | 👍 0    🔁 0    💬 1    📌 0

Specifically, I'm assuming you never actually ran with k>1023, so it's curious that going from period 1023 to period 855855 makes a difference at all.

I have a post on hash collisions in plain nthash v1 (no splitting), and there you already get collisions at k=23.
2/
curiouscoding.nl/posts/nthash/

04.08.2025 15:43 | 👍 0    🔁 0    💬 1    📌 0
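
Editor's note: the rotation periods discussed in this thread can be illustrated with a small sketch. It assumes an ntHash-style hash that XORs a per-base seed rotated by the base's distance from the end of the k-mer. The seed constants are arbitrary placeholders (not the real ntHash seeds), and the split widths 31+33 and 5+7+9+11+13+19 are only guesses that happen to sum to 64 and reproduce the periods 1023 and 855855 mentioned above; whether the actual implementations use these widths is not stated in the thread.

```python
from math import lcm

# Arbitrary placeholder seeds for illustration; NOT the constants used by ntHash.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r, bits=64):
    """Rotate the low `bits` bits of x left by r positions."""
    r %= bits
    mask = (1 << bits) - 1
    x &= mask
    return ((x << r) | (x >> (bits - r))) & mask


def nthash_like(kmer):
    """XOR of per-base seeds, each rotated by its distance from the last base."""
    k = len(kmer)
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


# With plain 64-bit rotation and k = 65, the first and last base receive the
# same effective rotation (64 mod 64 == 0), so equal bases there cancel out.
a = "A" + "G" * 63 + "A"
b = "C" + "G" * 63 + "C"
collision = nthash_like(a) == nthash_like(b)

# Period of a split rotation = lcm of the (assumed) part widths.
period_two_way = lcm(31, 33)
period_six_way = lcm(5, 7, 9, 11, 13, 19)

print(collision, period_two_way, period_six_way)
```

This only shows the period-64 cancellation for a single unsplit rotation. It does not explain why 64 would remain the threshold once the rotation is split, which is the open question in the thread.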

Hmmm, so just to confirm: you were already using 2 splits before, and that gives 10k-100k collisions starting at k>=65.
And then going to the 3-way split variant that goes down to 10-100 collisions, and with the 6-way split it's 4.

Are all collisions always for k>64 in all three cases?
1/

04.08.2025 15:39 | 👍 0    🔁 0    💬 1    📌 0

Hype! More blogs on bioinformatics!

04.08.2025 15:22 | 👍 4    🔁 0    💬 0    📌 0

Reading papers on the balcony, with a backdrop of meteor shower

03.08.2025 23:06 | 👍 0    🔁 0    💬 0    📌 0

This way, if you use it more, it will slowly get faster, while you don't waste time optimizing things you end up not using.

03.08.2025 20:47 | 👍 0    🔁 0    💬 0    📌 0

Mathematically speaking, I think you get a (multiplicative) constant approximation to optimal efficiency by spending a constant fraction of the time you wait on optimizing the thing you're waiting for.

Ie: whenever something takes a day of running, spend at least 1/4 of the day making it faster.

03.08.2025 20:46 | 👍 0    🔁 0    💬 1    📌 0

~Theoretical~ Practical Computer Science

03.08.2025 20:28 | 👍 1    🔁 0    💬 0    📌 0
MSCA Green Charter The MSCA is committed to tackling climate and environmental-related challenges. MSCA funding promotes the sustainable implementation of research activities.

This is actually one of the evaluation criteria of the EU MSCA grants!

> "Code of good practice"

But in the end, for most labs dev time is probably way more expensive than compute?

marie-sklodowska-curie-actions.ec.europa.eu/about-msca/m...

03.08.2025 19:46 | 👍 3    🔁 0    💬 1    📌 0

I talk a lot about Rust for building high-perf (& even non-perf critical) software, & scientific software in particular. I often discuss what's interesting to me, but wanted to offer the chance to those interested for me to answer their questions about Rust in science. Fire away with questions! 🧬🖥️

03.08.2025 18:12 | 👍 19    🔁 11    💬 4    📌 1
Aerial shot of Gaza City, burned and bombed into burned and shattered, unliveable wreckage as far as can be seen.

Whole area that used to be houses, shops, public transport routes, made into rubble and blackened, unliveable ruins. Aerial shot shows vast area entirely obliterated as habitable, attacked again and again by Israeli forces.

Several wrecked Gaza schools, shown from the air, pulverised by Israel's forces: windows all gaping, bombed out and covered in ash and dust. Some people are trying to live in makeshift camps amid the destruction, in former quadrangles and bomb sites.

Aerial shots of neighbourhoods with destroyed apartment buildings, houses, offices and shops, all bombed by Israel's government. Starving people are trying to live in tents and makeshift cubbies on the former roundabouts on a main road.

This is what Israel has done to Gaza. Forbidden pics by a Washington Post photographer whose Jordanian air crew didn't pass on Israel's order to only photograph aid being dropped. City, suburbs, schools, people: bombed to rubble. People being starved in makeshift camps on road roundabouts. #Genocide

03.08.2025 04:31 | 👍 539    🔁 394    💬 15    📌 16

first BWT paper: no DOI
first wavelet tree paper: broken DOI

sad; these should be backfilled somehow

dx.doi.org/10.1145/6441...

02.08.2025 23:25 | 👍 2    🔁 0    💬 0    📌 0

Like, yes I'm a methods guy and I want to know how your method works. But please first tell me what your algorithm is computing, rather than a spelled out pseudocode.

02.08.2025 17:31 | 👍 0    🔁 0    💬 0    📌 0

Still in favour of a mandatory 'problem statement' paragraph at the start of every paper and readme, that clearly states expected input and output.

02.08.2025 17:27 | 👍 1    🔁 0    💬 1    📌 0

If you read this: go to the readme of your most popular software, and check that the first line is more specific than 'X is a tool to search large sequence sets'.

Otherwise, you've just narrowed yourself down to 99% of bioinf software and I still have no idea what problem you're solving...

02.08.2025 17:26 | 👍 2    🔁 0    💬 1    📌 1
BY THE NUMBERS 
The Gaza Humanitarian Foundation (GHF) continued its operations today to provide vital food aid for the Palestinian people in Gaza. Below is an update on today’s operations:
Distributed 21,600 boxes of aid today across three distribution sites: 
Location	Truckloads	Boxes	Meals
SDS2 (Saudi Neighborhood)	10	8,640	554,400
SDS3 (Khan Younis)	11	9,504	609,840
SDS4 (Wadi Gaza)	4	3,456	221,760
TOTAL	25	21,600	1,386,000
Approximately 99,949,482 meals distributed to date via roughly 1,662,040 boxes.

GHF brags about delivering 1.3M meals per day. Mathematicians have a moral responsibility to speak out about the fact that 1.3M <<< 6.6M

02.08.2025 09:18 | 👍 6    🔁 3    💬 1    📌 0
The History of the Panmictic Population Concept and Its Legacy in Contemporary Population Genetics ABSTRACT The panmictic population concept is at the heart of population, evolutionary and conservation genetics. However, in nature, true panmictic populations are vanishingly rare. As an idea conce...

Really excited to share the first paper from my PhD - it’s all about assumptions in modelling and the history of early population genetics… 🧵

doi.org/10.1111/ahg....

01.08.2025 22:47 | 👍 35    🔁 12    💬 1    📌 0

Pinging your local Cloudflare/Google edge node has lower latency than reading from a spinning disk HDD 🤯

01.08.2025 20:11 | 👍 1    🔁 0    💬 0    📌 0

Conferences: where you become friends with all the cool people, so that they then ask you to review for their journal 🤔

01.08.2025 10:19 | 👍 2    🔁 0    💬 0    📌 0
The mathematics of starvation: how Israel caused a famine in Gaza Israel controls the flow of food into Gaza. It has calculated how many calories Palestinians need to stay alive. Its data shows only a fraction has been allowed in

Mathematics is everywhere

01.08.2025 07:39 | 👍 6    🔁 2    💬 0    📌 0
Using ‘Slop Forensics’ to Determine Model Ancestry Yesterday, after playing with some smaller models, I started to experiment with the idea of a flowchart for determining a model’s ancestry with a few prompts. For example, could you ask it about state...

Interesting attempt to infer a phylogenetic tree of llms www.dbreunig.com/2025/05/30/u...

01.08.2025 01:00 | 👍 2    🔁 1    💬 0    📌 0

@curiouscoding.nl is following 20 prominent accounts