We're still thinking about how to best publicly document what data validations we're currently running, but if you're familiar with dbt you can look at the dbt project in the PUDL repo:
github.com/catalyst-coo...
@catalyst.coop.bsky.social
We help #climate advocates, policymakers, & researchers working on the #EnergyTransition by liberating #OpenData about the US energy system using #Python-based #DataEngineering. We're also a worker cooperative. https://github.com/catalyst-cooperative
We're going to organize an internal hackathon to add a bunch of new data validations later this summer. If you're a PUDL user and there are things you think we should be checking, please let us know in this discussion. #EnergySky
github.com/orgs/catalys...
We've finally finished migrating all our PUDL data validations out of our old DIY pytest+pandas setup and into @getdbt.com running @duckdb.org on Parquet files. It runs a lot more tests now, and takes 45 seconds instead of ~3 hours. Plus we got to learn SQL.
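For anyone who hasn't used dbt: a dbt data test is a SQL query that selects the rows violating a rule, and the test passes when the query returns no rows. Below is a minimal sketch of that pattern. It uses Python's built-in sqlite3 module as a stand-in for DuckDB on Parquet so that it runs without extra dependencies. The `plants` table, its columns, and the `failing_rows` helper are hypothetical and are not taken from the PUDL dbt project.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (plant_id INTEGER, capacity_mw REAL)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [(1, 120.0), (2, 55.5), (3, None), (4, -10.0)],
)


def failing_rows(conn, query):
    """Run a validation query; an empty result means the check passes."""
    return conn.execute(query).fetchall()


# Check: capacity must be present and non-negative.
bad_capacity = failing_rows(
    conn,
    "SELECT plant_id FROM plants "
    "WHERE capacity_mw IS NULL OR capacity_mw < 0 "
    "ORDER BY plant_id",
)

# Check: plant_id must be unique.
duplicate_ids = failing_rows(
    conn,
    "SELECT plant_id FROM plants GROUP BY plant_id HAVING COUNT(*) > 1",
)

print("bad capacity:", bad_capacity)
print("duplicate ids:", duplicate_ids)
```

Because each check is just a query, the same SQL could in principle be pointed at Parquet files through DuckDB instead of an in-memory table.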
29.07.2025 14:48
If you're interested in finding a tech worker co-op near you, or in your field, check out this directory:
tech-coops.xyz
Tech worker co-ops aren't (yet!) that common in the US, and we get a lot of requests from researchers to do interviews or surveys, so we finally decided to just interview ourselves and share. Cc @usworker.coop
www.youtube.com/watch?v=mBjj...
The Splink (@robinlinacre.bsky.social) folks are working on some record-linkage extensions for @duckdb.org. Mostly focused on string standardization and homophones for now. Looks cool! What other record linkage extensions might make sense? github.com/moj-analytic...
24.07.2025 04:05
Instructions for accessing our various outputs here: catalystcoop-pudl.readthedocs.io/en/v2025.7.0...
08.07.2025 18:08
Full release notes in our documentation here: catalystcoop-pudl.readthedocs.io/en/v2025.7.0...
08.07.2025 18:08
Hey #EnergySky we have a new PUDL data release -- it includes refreshed EIA-860M and early release data for EIA-860/923. We're experimenting with a monthly release cadence. Open discussion here if you want to say hi or highlight any issues you find in the release:
github.com/orgs/catalys...
The biggest immediate action we're taking as a result of the survey feedback is trying to improve our documentation -- to make it clearer how the data is being processed, and also which version of it you should be using, depending on your use case.
26.06.2025 16:46
More respondents than we expected were comfortable working with Apache Parquet files, though CSV is still the crowd favorite. Most respondents also had some experience filing issues or making pull requests on GitHub, which was encouraging.
26.06.2025 16:46
The NREL ATB (electricity) data was also more commonly used than we imagined -- we need to better document that it's all available in PUDL and how to work with it alongside other datasets:
26.06.2025 16:46
Our sample was only N=47, and it's certainly biased, but we found that we have more for-profit and fewer NGO users than we imagined.
26.06.2025 16:46
Hey #EnergySky we wrote up some notes on the results of our first annual open energy data user survey. At the top level:
- It's hard to find energy data.
- We already have many of the most used datasets.
- More folks have Python / open source experience than we thought.
catalyst.coop/2025/06/24/k...
I think my favorite kind of book is one where I find someone else who has thought longer and harder about an idea I've also had, but in a fuzzy way. It's like getting to fill in all the gaps for free! Precursor paper up here. #EnergySky
www.cambridge.org/core/journal...
VERY excited for @greenprofgreen.bsky.social's new book. This is basically the theory behind most of the work @catalyst.coop has supported with open utility & financial data. We stumbled into it on our own fighting Xcel's coal plants in Colorado around 2013. #EnergySky
heatmap.news/podcast/shif...
I've spent some time recently with NERC's reliability assessments - it's interesting to look back over the last several years of reliability assessments and compare. There's a long-run trend of declining summer reliability risk nation-wide as regions have ramped up deployment. 🧵 #energysky
07.06.2025 00:12
We're interested in trying to create a modern, public interface for searching and using LLMs to operate on regulatory filings related to the energy system, like state PUC dockets, utility IRPs, the FERC eLibrary, etc., but so far haven't found anything analogous. #EnergySky @simonwillison.net
06.06.2025 17:20
Does anybody have examples of #opendata projects using #OpenSearch / #ElasticSearch to provide meaningful public access to large collections of documents? Or to give #opensource developers API access to documents, metadata, and text embeddings? Maybe @propublica.org? #EnergySky
06.06.2025 17:20
A post looking at doing this kind of vector search using @pola.rs + Parquet, and the very fast dot-product code buried deep in the guts of NumPy:
minimaxir.com/2025/02/embe...
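To make "vector search via dot products" concrete, here is a minimal sketch of brute-force similarity search over a handful of embeddings. It is an illustration under stated assumptions, not the linked post's code: it uses plain Python lists so it runs with only the standard library, the toy vectors are made up, and `normalize` and `top_k` are hypothetical helper names. With NumPy arrays, the scoring step would collapse into a single matrix-vector product (`embeddings @ query`).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, embeddings, k=2):
    """Return the indices of the k rows most similar to the query, best first."""
    q = normalize(query)
    scores = [
        sum(a * b for a, b in zip(q, normalize(row)))
        for row in embeddings
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

result = top_k([1.0, 0.05, 0.0], embeddings, k=2)
print(result)
```

Scoring every row like this is exhaustive rather than approximate, which is often fine for collections small enough to fit in memory.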
Oh cool, @duckdb.org has some built-in features that make it good for working with text embeddings. The @ferc.gov EQR data includes unstructured text describing contracts for electricity. What if we added a column with the vector representation in Parquet? #EnergySky
motherduck.com/blog/search-...
We added the NEMS repo to our archiving runs in January. Would folks find it useful if we integrated O&M cost tables based on this data into PUDL in a standardized way? #EnergySky
27.05.2025 23:00
More cool Parquet based data warehouse (lakehouse) tooling from @duckdb.org -- separating the storage (Parquet object store) from the catalog metadata (DuckDB) from the compute (whatever). It would be nice to have all PUDL releases available in the same structure.
duckdb.org/2025/05/27/d...
Wouldn't it be sweet if you could convert your PLEXOS modeling data into another format that worked with open source energy system modeling tools? #EnergySky
forum.openmod.org/t/is-it-lega...
Some (good) personal news - I am so happy to share that today is my first day at Climate Central (as a Climate Scientist).
Ready to move the science and communication of it forward in this crucial moment. Follow our work on climate services at @climatecentral.org (Bluesky) & www.climatecentral.org
TLDR -- you can get unit-level O&M data out of NEMS for generators in the US! The data are easy to misunderstand so I've provided some helpful details.
@catalyst.coop
The lesser known NEPA.
27.05.2025 06:27
Hey #EnergySky welcome @priyald17.bsky.social to the skyline!
26.05.2025 17:58
If you value our work, and the open data we publish to support public interest energy system modeling, analysis, and journalism, please consider becoming a PUDL Sustainer. The PUDL budget is still quite a bit short for 2025 and the federal research chaos isn't helping.
opencollective.com/pudl