#statstab #317 Tests for Pairwise Mean Differences in R
by @timmastny.bsky.social
Thoughts: A nice illustration of what {emmeans} is doing when computing contrasts using various functions.
#emmeans #r #pairwise #ttest #tutorial #dataviz
timmastny.com/blog/tests-p...
Word is that it's just a wrapper around Claude 3.7! www.reddit.com/r/LocalLLaMA...
My friend @timmastny.bsky.social made a game for the NES and asked me to write music. We made both the game and music under the constraints of the NES.
Check out the game on tmastny.itch.io/dead-rise
And the soundtrack on all streaming platforms!
Spotify link: open.spotify.com/album/2NDjba...
This is called false sharing - when CPUs coordinate over variables they don't actually share.
In this case the penalty was small, but in other cases it can cause 5%, 25%, or even 8000% performance degradation. If you want to learn more, check out my post!
timmastny.com/blog/false-s...
The cache line is the smallest unit of data unique to each CPU cache, commonly 64 or 128 bytes. If a CPU wants to modify a variable on a cache line, it must coordinate with all other CPUs to invalidate their copies, even if some of the variables on that line are not shared.
Variables `a` and `b` are not shared between the two threads: thread 0 uses `a` and thread 1 uses `b`. But since they are initialized next to each other in the program on the left, they share the same cache line.
The program on the right is 0.7% faster measured over millions of iterations. Why?
Which program is faster? 1/
Check out problem 2: sirupsen.com/napkin/probl...
And my napkin math: timmastny.com/projects/nap...
It’s also random for main memory, but we only need to load one new cache line from main memory per page: 50 * 50 ns = 2.5 us. The prefetcher should ensure the sequential read of the rest of the page is fast.
Even if we have to fetch more cache lines, it’s much faster than 1ms!
Solution: I didn’t consider that each page access is random! A random 8KB SSD read takes 100 us, so a 16KB page is 101 us ≈ 100 us, and 10 pages take 1 ms.
When doing random reads, SSDs are 100x slower, not 10x!
Total:
50 pages * .25 us / page = 12.5 us
10 pages * 2 us / page = 20 us
12.5 us + 20 us = 32.5 us
Seems reasonable: SSD is 10x slower than main memory.
50 pages / query * 80% of pages in cache = 40 pages in cache, 10 on disk.
Sequential SSD read: 1 us / 8KB * 16KB / page = 2 us / page.
Page scan time: 1 ns / 64 bytes * 16KB / page = 250 ns / page = .25 us / page
Problem 2 from the @sirupsen.bsky.social's napkin math newsletter:
Your SSD-backed database has a usage pattern that rewards you with an 80% page-cache hit-rate. The median query touches 50 distinct disk pages to gather its results. What is the expected average query time from your database?
The blue line in the first plot is our theoretical model and the black points are average search times on my implementation of B+ tree search. Our theoretical model agrees very closely with our benchmark!
Check out my blog post for more details: timmastny.com/blog/tuning-...
Pointer chasing is slow, but the number is limited to the height of the tree. Scanning is fast, but quickly adds up. Our model combines these costs: for any number of total keys, we can calculate the optimal number of keys per node 'b' that minimizes total search time.
On the other hand, a B+ tree with infinite keys per node is an array. A sequential scan of 1 trillion keys would take about 16 minutes at 1ns per key.
Let’s start with the extremes. With one key per node, our B+ tree is a binary search tree. Even with 1,000,000,000 keys, the maximum height is <30. We can find any key in less than 3 us: 30 pointer jumps * 100 ns per jump = 3 us
Search in B+ trees is a balance of two operations: scanning within nodes (1ns each) and jumping between nodes (100ns each). The size of each node determines how many of each operation we do - and that's why finding the optimal node size is crucial for performance.
Here's problem one from the newsletter:
sirupsen.com/napkin/probl...
And my napkin math:
timmastny.com/projects/nap...
The solution estimated the storage cost to be $0.01 / GB / month, but didn't mention write costs. Simon mentions disk storage rather than blob, which I did not consider 🤔
Tips from solution: calculate and estimate with powers of 10 to simplify the math, e.g. 1 day = 9 * 10^4 seconds. I also like to use powers of 10 that are multiples of three to match SI prefixes: 5 * 10^5 bytes = 0.5 * 10^6 = 0.5 MB
At $5 / 1 million writes, we have to batch requests together before writing to the blob, or we would 250x our storage cost. Batching 1000 requests costs $15,000 / year. Batching 1,000,000 is only $15.
Data size: 1 character per byte, 5 characters per word, 200 words per request = 1KB per request. That works out to be 9TB / day. Blob storage at $0.02 / GB / month is $65,000 per year. But we also have to pay for writes to blob storage!
Starting to work through the Napkin Math newsletter by x.com/Sirupsen
Problem 1: How much will the storage of logs cost for a standard, monolithic 100,000 RPS web application?
The demo includes several interactive examples, showing how the data structure solves some typical synchronization problems.
Check out the demo here: timmastny.com/rga/
Latest demo: collaborative text editing synchronized with a conflict-free data structure.
The data structure ensures that insertions, edits, and deletions are properly merged during synchronization, even if the changes were made offline. 1/
Loops, hot paths, and repetitive data mean programs are often predictable: even a simple "branch taken" prediction can achieve 70% accuracy.
Unlike sorting algos, we don't care how they perform on random data: we need to compare against real data
timmastny.com/blog/branch-...
Think about hashing like putting balls randomly into bins. gshare could put all the balls into one bin.
But with concatenation, each bin has a maximum of 4 balls, leading to fewer collisions on average.
The problem is that branch history is not random! 2/
When analyzing branch prediction algorithms, the "better" hashing approach (gshare) is theoretically worse than the simple one (concatenation).
What's going on?