imagine falling for the most obvious spy of all time on bumble ???
(a friend sent me this screenshot, I'm married)
@michaelrbock.com.bsky.social
co-founder of Column Tax // michaelrbock.com
The full plan:
www.columntax.com/blog/our-se...
We're so confident that we're publishing an internal roadmap document: our "secret" master plan to automate tax filing (just between you & me).
29.10.2025 13:48
And now the combination of the latest AI progress and our expert team & large proprietary eval datasets means we're the group that can finally fully automate tax filing and save people time & money.
29.10.2025 13:48
How will we know that AI has really "made it"?
The task that most exemplifies our ability to automate knowledge work is "doing your taxes".
At Column Tax we're now within line of sight to fully automating taxes. We started the company at the perfect moment, with LLMs just on the horizon.
No matter how many times we do it...
I always get nervous before a big announcement (coming tomorrow!)
The blog post in question: michaelrbock.com/hypothesis
23.10.2025 15:35
Positive review of my most popular blog post: "Hypothesis Sheets - how to navigate and exit the idea maze with a (good) startup idea".
Glad to hear the founder whisper networks are still sharing this knowledge around.
4/ next up?
adding tool use (code execution & web search) to see how that helps models calculate tax returns
also testing Claude Opus 4.1 and GPT-5 mini & nano
follow here: github.com/column-tax/...
3/ GPT-5 is impressive in many ways
especially because its knowledge cutoff is still September 2024
but it's not the leader in tax calculation today
(even with maximal test time compute)
2/ back in July, we published the first-ever eval for US personal income tax calculations
x.com/michaelrboc...
1/ GPT-5 is worse than Gemini 2.5 Pro at filing your taxes (but it's really close and they both can't do it yet)
we proved it via our tax calculation benchmark:
I got married last month. 🤵👰
Here's what it taught me about B2B2C tax software:
Just kidding :) but I do really recommend getting married to the love of your life with all your friends & family around!
no one had even heard of git worktrees before claude code
13.08.2025 20:49
amazing ChatGPT Agent Mode use case: find & validate coupon codes without having to test them yourself
03.08.2025 19:08
10/ Read more about the work, research, and results here:
www.columntax.com/blog/taxcal...
9/ This work wouldn't have been possible without the hard work of our Tax Analyst team over the past 4 years & the success of our commercial product: you can't buy this dataset on Scale or Surge.
View the dataset and testing harness here:
github.com/column-tax/...
8/ Models are also inconsistent:
using pass^k (a measure of reliability of a model across multiple runs on the same task), performance degrades with additional runs, meaning models mess up in new & surprising ways when calculating tax returns.
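The pass^k idea above can be sketched in a few lines. This is a minimal sketch, assuming the common estimator where a task with n runs and c correct runs scores C(c, k) / C(n, k), averaged over tasks. The thread does not say which formula the harness uses, and the function name and toy data here are made up for illustration.

```python
from math import comb


def pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Estimate pass^k: the chance that k runs drawn from a task are ALL correct.

    `results` holds one list of booleans per task, one boolean per run.
    Assumed estimator (not confirmed by the source): C(c, k) / C(n, k) per task.
    """
    scores = []
    for runs in results:
        n = len(runs)   # total runs for this task
        c = sum(runs)   # correct runs for this task
        if k > n:
            raise ValueError("k cannot exceed the number of runs per task")
        # comb(c, k) is 0 when c < k, so a task with too few successes scores 0
        scores.append(comb(c, k) / comb(n, k))
    return sum(scores) / len(scores)


# Toy data (hypothetical): 3 tasks, 4 runs each
toy_results = [
    [True, True, True, True],      # always correct
    [True, False, True, False],    # correct half the time
    [False, False, False, False],  # never correct
]

for k in (1, 2, 4):
    print(k, pass_hat_k(toy_results, k))
```

Because a task only counts if every one of the k sampled runs is correct, the score can only stay flat or drop as k grows, which is why an inconsistent model looks worse with additional runs.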
7/ For some models, performance improves with increased inference-time compute (thinking budget tokens)
but not for the best model (Gemini 2.5 Pro), suggesting alternative techniques/scaffolding/orchestration are required to get AI to do this tax calculation task.
6/ Models consistently:
1. Misuse tax tables
2. Make calculation errors
For example, models will hallucinate line numbers on Forms or use incorrect eligibility limits.
5/ Takeaway: models can't calculate tax returns reliably today.
Even on this simplified data set and allowing the models to output to a simplified format, the best model only calculates 32.35% of returns correctly.
4/ TaxCalcBench is a dataset of 51 pairs of user inputs and the expected tax return output + a testing harness.
We made the task easy for the models. We provide:
- all of the data (e.g. W-2s) needed to file a return
- the expected output in IRS XML format
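To make the "pairs of inputs and expected output + a testing harness" setup concrete, here is a minimal sketch of strict scoring, assuming a return counts as correct only when every expected line matches exactly. The real harness compares IRS XML and may score differently; the function names, dictionary keys, and dollar amounts below are all hypothetical.

```python
def return_is_correct(expected: dict[str, int], predicted: dict[str, int]) -> bool:
    """A return is correct only if every expected line is present and matches."""
    return all(predicted.get(line) == amount for line, amount in expected.items())


def fraction_correct(cases: list[tuple[dict[str, int], dict[str, int]]]) -> float:
    """Share of (expected, predicted) pairs where the whole return matches."""
    correct = sum(return_is_correct(expected, predicted) for expected, predicted in cases)
    return correct / len(cases)


# Toy cases (made-up keys and whole-dollar amounts, not real tax data)
toy_cases = [
    # every line matches -> correct
    ({"wages": 50000, "total_tax": 4000}, {"wages": 50000, "total_tax": 4000}),
    # one line is off by a dollar -> the whole return is wrong
    ({"wages": 72000, "total_tax": 8100}, {"wages": 72000, "total_tax": 8101}),
]

print(fraction_correct(toy_cases))
```

Strict all-or-nothing matching is a deliberate choice in this sketch: a tax return with one wrong line is still a wrong return.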
3/ Tax calculation means taking a user's "inputs" (W-2s, 1099s) and outputting the Form 1040 in the IRS XML format.
75k pages of English text define the transformations required to do this.
Companies like @ColumnTax use deterministic tax engines to do these calculations.
2/ Today, we're releasing TaxCalcBench: a first-ever benchmark dataset & eval framework for testing AI's ability to calculate US personal income tax returns.
Tax is a secretive industry, so we're proud to release a research paper sharing our findings:
arxiv.org/abs/2507.16126
1/ Can AI file your taxes? Not yet.
We tested the latest frontier models and the results were full of catastrophic errors.
Letting AI do your taxes would mean IRS rejections, audits, and penalties:
this is the wildest cold twitter dm opener i've ever received
21.07.2025 23:17
this is what founder <> founder private text messages look like (and what makes the job so fun)
21.07.2025 14:05
why is everyone complaining about a GPU shortage if it turns out you can just buy them on amazon ;)
21.06.2025 15:48
11/ Thanks to the folks who worked on Direct File. We have a lot of gratitude.
05.06.2025 16:02