Screenshot of sample of Islington's Council Tax address data, visualised in Google Earth
More progress on #openaddresses:
Islington Council in London has released its Council Tax address list for re-use as #opendata under the Open Government Licence www.owenboswarva.com/blog/post-ad...
I've made a geocoded version by adding coordinates from ONS
#FOI #localgov #UKhousing #proptech
28.10.2025 08:46
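The post doesn't say which ONS product or join key was used to add the coordinates. This is a minimal sketch that assumes the join is on postcode, against a postcode-to-coordinates table. The function names are hypothetical, and the coordinates in the example are illustrative, not real ONS data. Postcodes are normalised before matching so that spacing and case differences don't cause misses.

```python
from typing import Dict, List, Optional, Tuple


def normalise_postcode(postcode: str) -> str:
    """Uppercase and remove all whitespace, so 'n1 1aa' and 'N11AA' compare equal."""
    return "".join(postcode.split()).upper()


def geocode(
    addresses: List[Dict[str, str]],
    postcode_lookup: Dict[str, Tuple[float, float]],
) -> List[Dict[str, Optional[object]]]:
    """Return copies of the address rows with 'lat'/'lon' added from the lookup.

    Rows whose postcode is not in the lookup get None for both fields.
    """
    index = {normalise_postcode(pc): coords for pc, coords in postcode_lookup.items()}
    result = []
    for row in addresses:
        coords = index.get(normalise_postcode(row["postcode"]))
        lat, lon = coords if coords is not None else (None, None)
        result.append({**row, "lat": lat, "lon": lon})
    return result


if __name__ == "__main__":
    # Illustrative values only, not real ONS data
    lookup = {"N1 1AA": (51.53, -0.10)}
    rows = [{"address": "1 Example Street", "postcode": "n11aa"}]
    print(geocode(rows, lookup))
```

A postcode join gives a postcode-level position (every address in the postcode shares the same point), which is usually good enough for mapping a sample like the one in the screenshot.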
No worries - thanks for the report on the repo, we'll take a look
02.10.2025 13:08
GitHub - moj-analytical-services/uk_address_matcher
Did you try github.com/moj-analytic...?
The trie is WIP, but the idea is that it will be used as an initial step to skim off the easy matches. The remainder will go through to the main matching phase, which already exists in uk_address_matcher but is more computationally intensive.
02.10.2025 06:21
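The reply above doesn't describe how the trie works internally. This is a rough illustration only, not uk_address_matcher's implementation: a minimal sketch of one way a fault-tolerant token trie could skim off easy matches. It rests on several assumptions:

- Canonical addresses are tokenised and inserted right to left, so the town comes first and the flat number last.
- Tokens in the messy address that have no matching child are skipped, up to a hypothetical `max_skips` limit.
- Anything that doesn't end exactly on a canonical address returns `None` and is left for the main matcher.

All names and example addresses are made up.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    address_id: Optional[str] = None  # set when a canonical address ends here


def tokenise(address: str) -> List[str]:
    """Uppercase, drop punctuation, and reverse so the most general token
    (e.g. the town) comes first and the most specific (e.g. flat number) last."""
    return list(reversed(re.findall(r"[A-Z0-9]+", address.upper())))


def build_trie(canonical: Dict[str, str]) -> TrieNode:
    """Insert each canonical address as a path of (reversed) tokens."""
    root = TrieNode()
    for address_id, text in canonical.items():
        node = root
        for token in tokenise(text):
            node = node.children.setdefault(token, TrieNode())
        node.address_id = address_id
    return root


def match(root: TrieNode, messy: str, max_skips: int = 1) -> Optional[str]:
    """Walk the trie, skipping tokens that have no matching child (the fault
    tolerance). Return an id only if the walk ends exactly on a canonical
    address and at most max_skips tokens were skipped; otherwise return None,
    meaning 'leave this one for the full matcher'."""
    node, skips = root, 0
    for token in tokenise(messy):
        child = node.children.get(token)
        if child is None:
            skips += 1
        else:
            node = child
    if skips > max_skips:
        return None
    return node.address_id
```

Skipping is what lets "10 High Street, Islington, London" still resolve to "10 High Street, London". The skip limit is a crude guard against over-matching: without it, "Flat 2, 10 High Street, London" would silently land on the building-level record.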
New ✨interactive✨ explainer: Address matching using a fault tolerant trie:
robinlinacre.com/fault_tolera...
It illustrates a powerful technique for address matching that we're currently building into uk_address_matcher (github.com/moj-analytic...)
24.09.2025 07:51
You select the columns you want, and it handles the joins for you.
It's just a rough sketch for now. I feel like it must have been done before, but I couldn't find anything. Feedback welcome!
18.08.2025 06:40
When working with a complex Postgres schema, I find it time-consuming to figure out the joins.
I had an idea: a 'join generator' that traverses the relationship graph for you, and writes the joins.
You give it a dump of the postgres schema, and it gives you a UI.
www.robinlinacre.com/vite_live_pg...
18.08.2025 06:40
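The core idea of traversing the relationship graph can be sketched as a breadth-first search over foreign keys. This is a minimal sketch, not the linked tool. It assumes foreign keys are already available as `(child_table, child_column, parent_table, parent_column)` tuples, as you might extract from a schema dump. The table names in the example are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (child_table, child_column, parent_table, parent_column)
ForeignKey = Tuple[str, str, str, str]


def build_graph(fks: List[ForeignKey]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: table -> [(neighbouring table, join condition)]."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for child, child_col, parent, parent_col in fks:
        condition = f"{child}.{child_col} = {parent}.{parent_col}"
        # A foreign key can be walked in either direction when joining
        graph.setdefault(child, []).append((parent, condition))
        graph.setdefault(parent, []).append((child, condition))
    return graph


def generate_joins(fks: List[ForeignKey], start: str, target: str) -> str:
    """Breadth-first search for the shortest chain of joins from start to target."""
    graph = build_graph(fks)
    previous: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == target:
            break
        for neighbour, condition in graph.get(table, []):
            if neighbour not in previous:
                previous[neighbour] = (table, condition)
                queue.append(neighbour)
    if target not in previous:
        raise ValueError(f"No relationship path from {start} to {target}")
    # Walk back from target to start, then reverse to get the joins in order
    joins = []
    table = target
    step = previous[table]
    while step is not None:
        prior, condition = step
        joins.append(f"JOIN {table} ON {condition}")
        table = prior
        step = previous[table]
    return "\n".join([f"FROM {start}"] + joins[::-1])
```

When more than one path exists between two tables, BFS simply takes the first shortest one it finds. Choosing between alternative paths is left out of this sketch.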
We're working on a DuckDB community extension called `splink_udfs` to add some record linkage related functions to DuckDB. It's currently very much WIP, but you can already use it wherever you're using DuckDB.
github.com/moj-analytic...
22.07.2025 16:50
Then give the output to VS Code Copilot in agent mode to implement.
11.07.2025 08:33
My most commonly used pattern for AI coding: dump the entire source code into Gemini 2.5 Pro, write a prompt specifying what I want, and then add: "Give precise instructions for an LLM to follow to implement this feature. Break the solution down into steps where each step is verifiable."
11.07.2025 08:33
I think it's more the blocking stage. UK blocking is relatively easy because the postcode gets you down to about 50 or fewer addresses, so if your postcodes are accurate, blocking isn't too hard. For addresses outside the UK, you might need to lean more heavily on the signature-based approaches.
05.07.2025 21:01
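Blocking on postcode means only comparing records that share a postcode, instead of every record against every other. This is a minimal sketch of that step, assuming each record is a dict with hypothetical `id` and `postcode` fields. The postcodes and ids are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalise_postcode(postcode: str) -> str:
    """Uppercase and remove all whitespace so formatting differences don't split blocks."""
    return "".join(postcode.split()).upper()


def block_on_postcode(
    messy: List[Dict[str, str]], canonical: List[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return candidate (messy_id, canonical_id) pairs that share a postcode.

    Only these pairs would go on to the more expensive detailed comparison.
    """
    by_postcode: Dict[str, List[str]] = defaultdict(list)
    for rec in canonical:
        by_postcode[normalise_postcode(rec["postcode"])].append(rec["id"])
    pairs = []
    for rec in messy:
        # .get avoids inserting empty blocks for postcodes not in the canonical data
        for canonical_id in by_postcode.get(normalise_postcode(rec["postcode"]), []):
            pairs.append((rec["id"], canonical_id))
    return pairs
```

A record with a wrong or missing postcode gets no candidates at all. That is why accurate postcodes make UK blocking easy, and why other data would need a different blocking strategy.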
Building Accurate Address Matching Systems
A bag of tricks to improve the accuracy of geocoding
I have been working on a free, high performance address matcher. I've written up some key tricks, techniques, and ideas into a blog post here: www.robinlinacre.com/address_matc...
05.07.2025 10:00
Visual Fraction Addition
Rough working app here:
rupertlinacre.com/fraction_add... and code github.com/RupertLinacr...
22.05.2025 22:31
The 'build' button in Google AI Studio is unbelievably good. I had an idea to visualise fractions: three prompts in total, and it's pretty close to something useful (it works for any arbitrary fractions). Even the one-shot attempt was pretty good.
22.05.2025 20:15
YouTube video by PyData
Robin Linacre - Rapid deduplication and fuzzy matching of large datasets using Splink
My PyData Global talk "Rapid deduplication and fuzzy matching of large datasets using Splink" is now on YouTube: www.youtube.com/watch?v=eQtF...
17.04.2025 15:03
I see too much focus on trying to find applications of LLMs to help other people 'at scale' with their jobs. At the moment, the output of LLMs is rarely useful for business rules or passive consumption. The lower hanging fruit is helping people use AI directly & however they see fit in their job.
04.04.2025 08:38
02.04.2025 13:29
If you're using DuckDB in a Python script or Jupyter notebook, you can run `con.execute('CALL start_ui()')` at any point, and the UI will pop right up in your web browser with the current database automatically available.
(I knew about the UI, but I had missed this trick!)
01.04.2025 06:28
GitHub - simonw/files-to-prompt: Concatenate a directory full of files into a single prompt for use with LLMs
Gemini 2.5 Pro is really good. Grok 3 felt like a big step forward and was my 'go-to' for hard problems, and this feels like another significant step forward.
It's so nice with small codebases to be able to put everything into context (I use github.com/simonw/files... )
29.03.2025 11:39
Ended up writing a follow up post with the final approach and learnings from getting this running on GitHub Actions!
All the original datasets weigh more than 500 GB combined. The final ones published on 🤗 are only 1 GB. It took some tinkering to get there, but it was fun!
davidgasquez.com/exporting-in...
20.03.2025 12:56
It's pretty easy to set up a Markdown-based blog using GitHub Pages for free. Custom styling is much easier now we're in the world of ChatGPT!
16.03.2025 20:19
Breakout Maths Game
I vibe coded a primary school maths breakout game - aimed to be fun and educational.
rupertlinacre.com/breakout_mat...
In the process I created and open sourced a maths problem generator aligned to the national curriculum, so you can vibe code your own maths games!
www.npmjs.com/package/math...
15.03.2025 19:40
Linking businesses - Splink
Just added an example/tutorial to the Splink docs of matching business data.
It uses some feature engineering tricks that help improve accuracy vs. just fuzzy matching on names.
moj-analytical-services.github.io/splink/demos...
14.02.2025 14:01
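The post doesn't list which feature engineering tricks the Splink tutorial uses, so this is not its method. It's a minimal sketch of one common kind of trick for business names: separating the legal-form suffix from the core name. "Acme Widgets Limited" and "ACME WIDGETS LTD." then compare equal on the core name, and the legal form becomes a separate feature. The suffix mapping and function name are hypothetical.

```python
import re
from typing import Dict, Tuple

# Hypothetical mapping of legal-form suffixes to a canonical token
LEGAL_SUFFIXES: Dict[str, str] = {
    "LIMITED": "LTD",
    "LTD": "LTD",
    "PLC": "PLC",
    "LLP": "LLP",
    "INCORPORATED": "INC",
    "INC": "INC",
}


def split_business_name(name: str) -> Tuple[str, str]:
    """Return (core_name, legal_form) after uppercasing and dropping punctuation.

    Only a trailing suffix is treated as the legal form; legal_form is ""
    when there isn't one.
    """
    tokens = re.findall(r"[A-Z0-9&]+", name.upper())
    legal_form = ""
    if tokens and tokens[-1] in LEGAL_SUFFIXES:
        legal_form = LEGAL_SUFFIXES[tokens.pop()]
    return " ".join(tokens), legal_form
```

The core name and legal form can then be compared as separate columns, rather than letting "Limited" vs "Ltd" dilute a single fuzzy name comparison.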
Undoubtedly this will change as models improve, but at the moment there usually aren't quite enough 9s of reliability to use them in fully automated use cases.
13.02.2025 13:16
I think the single most productivity-enhancing use of LLMs in gov would be to give all devs and data scientists access to Cursor (or equivalent). I am not yet convinced of the widespread value of 'behind the scenes' uses of LLMs, but I'm v. bullish on skilled human-in-the-loop uses, especially coding.
13.02.2025 13:16
With DuckDB WASM it's possible to run a full Splink model in your browser in a single, standalone .html page.
Here's an example:
www.robinlinacre.com/live_splink/
And the git repo:
github.com/RobinL/vite_...
03.02.2025 16:47
Playing around with a spatial duckdb wasm database in a static webpage. Absolutely amazing how far you can get with geospatial in the browser using entirely open source tools
26.01.2025 15:57