
Catalyst Cooperative

@catalyst.coop.bsky.social

We help #climate advocates, policymakers, & researchers working on the #EnergyTransition by liberating #OpenData about the US energy system using #Python-based #DataEngineering. We're also a worker cooperative. https://github.com/catalyst-cooperative

2,144 Followers  |  1,308 Following  |  448 Posts  |  Joined: 28.11.2023  |  2.1246

Latest posts by catalyst.coop on Bluesky

pudl/dbt at main · catalyst-cooperative/pudl The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists. - catalyst-cooperative/pudl

We're still thinking about how to best publicly document what data validations we're currently running, but if you're familiar with dbt you can look at the dbt project in the PUDL repo:

github.com/catalyst-coo...

29.07.2025 14:48 | 👍 0    🔁 0    💬 0    📌 0
Suggestions for new PUDL data validations · catalyst-cooperative · Discussion #4495 We've finally finished migrating all of our existing data validation tests into dbt. They're already more extensive than the old pytest + pandas setup, and also run in 45 seconds instead of 3 hours...

We're going to organize an internal hackathon to add a bunch of new data validations later this summer. If you're a PUDL user and there are things you think we should be checking, please let us know in this discussion. #EnergySky

github.com/orgs/catalys...

29.07.2025 14:48 | 👍 1    🔁 1    💬 1    📌 0

We've finally finished migrating all our PUDL data validations out of our old DIY pytest+pandas setup and into @getdbt.com running @duckdb.org on Parquet files. It runs a lot more tests now, and takes 45 seconds instead of ~3 hours. Plus we got to learn SQL.

29.07.2025 14:48 | 👍 4    🔁 1    💬 1    📌 0
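
For readers unfamiliar with the dbt approach mentioned above, here is a minimal sketch of the general pattern: a data test is a SQL query, and any row it returns counts as a failure. This is not the PUDL dbt project. The table, columns, and check names are hypothetical, and the standard-library `sqlite3` module stands in for DuckDB reading Parquet so the example has no dependencies.

```python
import sqlite3

# Hypothetical table and values, for illustration only (not real PUDL data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (plant_id INTEGER, capacity_mw REAL)")
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [(1, 500.0), (2, 120.5), (2, 75.0), (3, -10.0)],
)

# Each check is a query that selects the rows violating an expectation.
# An empty result means the check passes.
CHECKS = {
    "plant_id_is_unique": (
        "SELECT plant_id, COUNT(*) AS n FROM plants "
        "GROUP BY plant_id HAVING COUNT(*) > 1"
    ),
    "capacity_is_non_negative": (
        "SELECT plant_id, capacity_mw FROM plants WHERE capacity_mw < 0"
    ),
}


def failing_rows(connection, query):
    """Run one validation query and return the rows that violate it."""
    return connection.execute(query).fetchall()


results = {name: failing_rows(conn, sql) for name, sql in CHECKS.items()}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} failing rows)"
    print(f"{name}: {status}")
```

Because every check is just a query, the database engine can evaluate it in bulk over columnar data instead of looping in Python, which is one plausible reason a setup like this runs faster than pandas-based tests.
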
Tech Coops list

If you're interested in finding a tech worker co-op near you, or in your field, check out this directory:

tech-coops.xyz

24.07.2025 18:28 | 👍 2    🔁 0    💬 0    📌 0
Tech Coop Roundtable
YouTube video by Catalyst Cooperative

Tech worker co-ops aren't (yet!) that common in the US, and we get a lot of requests from researchers to do interviews or surveys, so we finally decided to just interview ourselves and share. Cc @usworker.coop

www.youtube.com/watch?v=mBjj...

24.07.2025 18:28 | 👍 11    🔁 3    💬 1    📌 0
GitHub - moj-analytical-services/splink_udfs Contribute to moj-analytical-services/splink_udfs development by creating an account on GitHub.

The Splink (@robinlinacre.bsky.social) folks are working on some record-linkage extensions for @duckdb.org. Mostly focused on string standardization and homophones for now. Looks cool! What other record linkage extensions might make sense? github.com/moj-analytic...

24.07.2025 04:05 | 👍 4    🔁 0    💬 0    📌 0
Data Access - PUDL 2025.7.1.dev0+gba234e0.d20250703 documentation

Instructions for accessing our various outputs here: catalystcoop-pudl.readthedocs.io/en/v2025.7.0...

08.07.2025 18:08 | 👍 1    🔁 0    💬 0    📌 0
PUDL Release Notes - PUDL 2025.7.1.dev0+gba234e0.d20250703 documentation

Full release notes in our documentation here: catalystcoop-pudl.readthedocs.io/en/v2025.7.0...

08.07.2025 18:08 | 👍 0    🔁 0    💬 1    📌 0
PUDL v2025.7.0 is available! · catalyst-cooperative · Discussion #4390 We're experimenting with doing monthly releases to make sure that the EIA-860M data in PUDL doesn't lag official sources by more than a month, and this is our first such update. It's also got annua...

Hey #EnergySky we have a new PUDL data release -- it includes refreshed EIA-860M and early release data for EIA-860/923. We're experimenting with a monthly release cadence. Open discussion here if you want to say hi or highlight any issues you find in the release:

github.com/orgs/catalys...

08.07.2025 18:08 | 👍 4    🔁 0    💬 1    📌 0

The biggest immediate action we're taking as a result of the survey feedback is trying to improve our documentation -- to make it clearer how the data is being processed, and also which version of it you should be using, depending on your use case.

26.06.2025 16:46 | 👍 1    🔁 0    💬 0    📌 0

More respondents than we expected were comfortable working with Apache Parquet files, though CSV is still the crowd favorite. Most respondents also had some experience filing issues or making pull requests on GitHub, which was encouraging.

26.06.2025 16:46 | 👍 1    🔁 0    💬 1    📌 0

The NREL ATB (electricity) data was also more commonly used than we imagined -- we need to better document that it's all available in PUDL and how to work with it alongside other datasets:

26.06.2025 16:46 | 👍 1    🔁 0    💬 1    📌 0

Our sample was only N=47, and it's certainly biased, but we found that we have more for-profit and fewer NGO users than we imagined.

26.06.2025 16:46 | 👍 1    🔁 0    💬 1    📌 0
Keeping PUDL on the Pulse: Insights from our Annual Ecosystem Survey - Catalyst Cooperative Last July we conducted more than 60 interviews with energy data users to kick-off our work for the NSF POSE grant! While that process was very informative, it was also a huge amount of work, and we a...

Hey #EnergySky we wrote up some notes on the results of our first annual open energy data user survey. At the top level:
- It's hard to find energy data.
- We already have many of the most used datasets.
- More folks have Python / open source experience than we thought.

catalyst.coop/2025/06/24/k...

26.06.2025 16:46 | 👍 6    🔁 1    💬 1    📌 0
Asset Revaluation and the Existential Politics of Climate Change | International Organization | Cambridge Core Asset Revaluation and the Existential Politics of Climate Change - Volume 75 Issue 2

I think my favorite kind of book is when I find someone else that has thought longer and harder about an idea I've also had, but in a fuzzy way. It's like getting to fill in all the gaps for free! Precursor paper up here. #EnergySky

www.cambridge.org/core/journal...

12.06.2025 21:51 | 👍 8    🔁 1    💬 1    📌 1
A New Grand Theory of Why Decarbonization Is So Hard Rob and Jesse talk with Jessica Green, author of the forthcoming book, Existential Politics.

VERY excited for @greenprofgreen.bsky.social's new book. This is basically the theory behind most of the work @catalyst.coop has supported with open utility & financial data. We stumbled into it on our own fighting Xcel's coal plants in Colorado around 2013. #EnergySky

heatmap.news/podcast/shif...

12.06.2025 21:51 | 👍 24    🔁 9    💬 3    📌 1

I've spent some time recently with NERC's reliability assessments - it's interesting to look back over the last several years of reliability assessments and compare. There's a long-run trend of declining summer reliability risk nation-wide as regions have ramped up deployment. 🧵 #energysky

07.06.2025 00:12 | 👍 11    🔁 7    💬 4    📌 0

We're interested in trying to create a modern, public interface for searching and using LLMs to operate on regulatory filings related to the energy system, like state PUC dockets, utility IRPs, the FERC eLibrary, etc., but so far haven't found anything analogous. #EnergySky @simonwillison.net

06.06.2025 17:20 | 👍 3    🔁 2    💬 1    📌 0

Does anybody have examples of #opendata projects using #OpenSearch / #ElasticSearch to provide meaningful public access to large collections of documents? Or to give #opensource developers API access to documents, metadata, and text embeddings? Maybe @propublica.org? #EnergySky

06.06.2025 17:20 | 👍 6    🔁 5    💬 1    📌 0
The Best Way to Use Text Embeddings Portably is With Parquet and Polars Never store embeddings in a CSV!

A post looking at doing this kind of vector search using @pola.rs + Parquet, and the very fast dot-product code buried deep in the guts of NumPy:

minimaxir.com/2025/02/embe...

02.06.2025 21:56 | 👍 0    🔁 0    💬 0    📌 0
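
As a rough illustration of the vector search idea in the post above, here is a minimal sketch of brute-force similarity search: normalize each embedding to unit length, then rank by dot product, which then equals cosine similarity. It is not the linked article's code. Plain Python lists stand in for NumPy arrays so the example stays dependency-free, and the three-dimensional vectors and contract labels are made up for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, embeddings, k=2):
    """Return the k (label, score) pairs most similar to the query vector."""
    unit_query = normalize(query)
    scored = []
    for label, vec in embeddings.items():
        unit_vec = normalize(vec)
        score = sum(a * b for a, b in zip(unit_query, unit_vec))
        scored.append((score, label))
    scored.sort(reverse=True)  # highest similarity first
    return [(label, score) for score, label in scored[:k]]


# Hypothetical embeddings; real ones would have hundreds of dimensions
# and would be loaded from a Parquet column rather than typed in by hand.
EMBEDDINGS = {
    "contract_a": [1.0, 0.0, 0.0],
    "contract_b": [0.9, 0.1, 0.0],
    "contract_c": [0.0, 1.0, 0.0],
}

matches = top_k([1.0, 0.0, 0.0], EMBEDDINGS, k=2)
for label, score in matches:
    print(f"{label}: {score:.3f}")
```

With NumPy, the inner loop would collapse into a single matrix-vector product over the whole embedding table, which is where the speed described in the post would come from.
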
Building Vector Search in DuckDB - MotherDuck Blog Discover the power of AI search by using vector embeddings in natural language processing in the first blog in our informative three-part series! We'll cover the basics of vector embeddings and cosine...

Oh cool, @duckdb.org has some built-in features that make it good for working with text embeddings. The @ferc.gov EQR data includes unstructured text describing contracts for electricity. What if we added a column with the vector representation in Parquet? #EnergySky

motherduck.com/blog/search-...

02.06.2025 21:56 | 👍 2    🔁 1    💬 1    📌 0

We added the NEMS repo to our archiving runs in January. Would folks find it useful if we integrated O&M cost tables based on this data into PUDL in a standardized way? #EnergySky

27.05.2025 23:00 | 👍 3    🔁 2    💬 0    📌 0
DuckLake: SQL as a Lakehouse Format DuckLake simplifies lakehouses by using a standard SQL database for all metadata, instead of complex file-based systems, while still storing data in open formats like Parquet. This makes it more relia...

More cool Parquet based data warehouse (lakehouse) tooling from @duckdb.org -- separating the storage (Parquet object store) from the catalog metadata (DuckDB) from the compute (whatever). It would be nice to have all PUDL releases available in the same structure.

duckdb.org/2025/05/27/d...

28.05.2025 02:35 | 👍 3    🔁 0    💬 0    📌 0
Is it legal to use or contribute to a PLEXOS-to-PyPSA data converter? Many modellers and planners hold data in PLEXOS XML formats, but want to work with open-source tools like PyPSA. This raises a common question: Can I legally use or help build a data converter between...

Wouldn't it be sweet if you could convert your PLEXOS modeling data into another format that worked with open source energy system modeling tools? #EnergySky

forum.openmod.org/t/is-it-lega...

28.05.2025 01:15 | 👍 6    🔁 0    💬 0    📌 0

Some (good) personal news - I am so happy to share that today is my first day at Climate Central (as a Climate Scientist).

Ready to move the science and communication of it forward in this crucial moment. Follow our work on climate services at @climatecentral.org (Bluesky) & www.climatecentral.org

27.05.2025 23:21 | 👍 1183    🔁 83    💬 117    📌 6


TLDR -- you can get unit-level O&M data out of NEMS for generators in the US! The data are easy to misunderstand so I've provided some helpful details.

@catalyst.coop

27.05.2025 19:33 | 👍 4    🔁 2    💬 1    📌 0

The lesser known NEPA.

27.05.2025 06:27 | 👍 1    🔁 0    💬 0    📌 0

Hey #EnergySky welcome @priyald17.bsky.social to the skyline!

26.05.2025 17:58 | 👍 4    🔁 1    💬 1    📌 0
The Public Utility Data Liberation Project - Open Collective Liberating US energy system data for easy use by people fighting climate change.

If you value our work, and the open data we publish to support public interest energy system modeling, analysis, and journalism, please consider becoming a PUDL Sustainer. The PUDL budget is still quite a bit short for 2025 and the federal research chaos isn't helping.

opencollective.com/pudl

21.05.2025 14:00 | 👍 1    🔁 0    💬 0    📌 0
