Pete Bachant

@petebachant.me.bsky.social

Bicycles, fluid dynamics, Python, open source, open science, reproducibility. https://petebachant.me | https://calkit.org

170 Followers  |  716 Following  |  105 Posts  |  Joined: 13.11.2024

Latest posts by petebachant.me on Bluesky

Interactive workflows are great for doing things once, not so great for doing things twice.

#reproducibility

12.08.2025 17:52 | 👍 1    🔁 0    💬 0    📌 0

Naming files like 01_do-something.sh, 02_do-something-else.sh is a bit of a code smell. The order of execution in a project should be automated, not left to the user to follow a sequence of file names.

10.08.2025 15:18 | 👍 1    🔁 0    💬 0    📌 0
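
A minimal sketch of the idea in the post above on numbered file names: instead of encoding run order in the names, declare what each step depends on and let the computer derive the order. The stage names here are hypothetical, and this uses only Python's standard-library `graphlib`; it is not how any particular pipeline tool works.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each key maps to the set of stages it depends on.
stages = {
    "collect-data": set(),
    "process-data": {"collect-data"},
    "make-figures": {"process-data"},
    "build-paper": {"process-data", "make-figures"},
}


def execution_order(deps):
    """Return an order in which every stage comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(stages)
print(order)
```

With the order computed from declared dependencies, adding or renaming a step means editing one mapping rather than renumbering files.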
Notebooks - Calkit

Docs: docs.calkit.org/notebooks/#p...

07.08.2025 15:21 | 👍 0    🔁 0    💬 0    📌 0
Release v0.27.0 · calkit/calkit. What's Changed: Allow parameterizing notebooks and iterating over those parameters by @petebachant in #465. Full Changelog: v0.26.12...v0.27.0

Calkit 0.27 introduces parameterization for notebooks run in pipeline stages, and these can even be iterated over: github.com/calkit/calki...

07.08.2025 15:19 | 👍 0    🔁 0    💬 1    📌 0

Getting started with Julia. First impression is that a lot is done interactively from the REPL, which feels weird and antithetical to reproducibility. It's also strange that Julia will load packages from both project-level and system-level environments simultaneously.

06.08.2025 16:59 | 👍 0    🔁 0    💬 0    📌 0

To prevent being scooped: Instead of working in a closed, non-transparent way, why not make it easier to collaborate?

#openscience

03.08.2025 14:54 | 👍 0    🔁 0    💬 0    📌 0

Too few researchers work reproducibly because it costs too much for too little benefit. We need to make it easier in such a way that helps researchers get papers out the door faster.

30.07.2025 20:09 | 👍 0    🔁 0    💬 0    📌 0

Put succinctly, Calkit's goal is to make "repro pack" READMEs unnecessary

#openscience #reproducibility

24.07.2025 15:04 | 👍 0    🔁 0    💬 0    📌 0

Classes should be named after the data they encapsulate, not the actions they perform

#softwareengineering

22.07.2025 17:06 | 👍 0    🔁 0    💬 0    📌 0
A cartoon showing how to draw an owl in two steps, which clearly doesn't provide enough information.

When you describe the computational methods in your paper without sharing the code and data:

#openscience #reproducibility

21.07.2025 13:58 | 👍 2    🔁 0    💬 0    📌 0

One major cause of the reproducibility crisis is that the way we describe computational methods in research articles has not kept pace with the increase in complexity of the methods themselves. Human language and mathematical formulas are not adequate in many cases.

20.07.2025 12:40 | 👍 1    🔁 1    💬 0    📌 0
A quotation from "Electronic Documents Give Reproducible Research a New Meaning" by Claerbout and Karrenbach, whereby they state the goal of allowing researchers to reproduce their work with a single button.

"One button" reproducibility should be the standard

#openscience #reproducibility

17.07.2025 02:28 | 👍 2    🔁 0    💬 0    📌 0

Snakemake users, tell me what you like and don't like about it. Is it too complex for non-software-oriented researchers? What's missing?

10.07.2025 13:16 | 👍 0    🔁 0    💬 0    📌 0
Notebooks - Calkit

Tips for using Jupyter notebooks as part of a reproducible workflow (one that goes from raw data to research article with a single command): docs.calkit.org/notebooks/

#openscience #jupyter #reproducibility

04.07.2025 13:02 | 👍 1    🔁 0    💬 0    📌 0

If you're asked to review a journal article and don't have access to the code and data, or do but can't run it, do you reject the article?

#openscience

25.06.2025 13:49 | 👍 0    🔁 0    💬 0    📌 0

Why don't scientists work reproducibly? All they need to do is learn an ever-evolving suite of complex tools built for software engineers, learn software engineering and architecture best practices, automated testing, etc. Super easy stuff and fun for everyone!

#openscience

20.06.2025 15:30 | 👍 0    🔁 0    💬 0    📌 0

The "interest rate" on tech debt is proportional to how much will depend on it. If it's a function only used internally, no sweat. If it's an interface to a library you plan to have used by millions, you should really try to get it right the first time.

#programming #software

19.06.2025 20:39 | 👍 0    🔁 0    💬 0    📌 0

You don't need to run every single thing you write through ChatGPT

17.06.2025 22:02 | 👍 0    🔁 0    💬 0    📌 0

Closed science is rent seeking behavior

15.06.2025 21:09 | 👍 1    🔁 0    💬 0    📌 0
Falling Into The Pit of Success. Eric Lippert notes the perils of programming in C++: I often think of C++ as my own personal Pit of Despair Programming Language. Unmanaged C++ makes it so easy to fall into traps. Think buffer overr...

How can we help researchers fall into the "pit of success" when it comes to computational reproducibility? I don't think I've come across a single code/data repo that's actually reproducible with a single command.

blog.codinghorror.com/falling-into...

12.06.2025 14:43 | 👍 1    🔁 0    💬 0    📌 0
YAML pipeline syntax.

Calkit now has its own pipeline syntax that forces you to define an environment for every stage, but manages those environments for you automatically. No more pip installs, Docker builds, etc. Your project will just be reproducible.

Docs: docs.calkit.org/pipeline/

#reproducibility #openscience

06.06.2025 14:40 | 👍 0    🔁 0    💬 0    📌 0

reproducible + (lots of effort) != reproducible

03.06.2025 18:03 | 👍 0    🔁 0    💬 0    📌 0

Engineering is simply predicting what will work and what won't.

28.05.2025 15:41 | 👍 0    🔁 0    💬 0    📌 0

Also, why do these all look like they were typed by someone who doesn't know how to use a computer?

25.05.2025 13:25 | 👍 0    🔁 0    💬 0    📌 0
Restoring Gold Standard Science By the authority vested in me as President by the Constitution and the laws of the United States of America, including section 7301 of title 5, United

"Restoring gold standard science"

Lots to say here. It's amazing this administration would accuse others of undermining trust in science, but nevertheless, requiring all agencies to do open science is a good thing. Let's see if it actually happens.

www.whitehouse.gov/presidential...

25.05.2025 13:25 | 👍 0    🔁 0    💬 1    📌 0

If you share all of your code and data but users can't rerun with a single command (including environment setup), that's commendable, but your project is not reproducible.

In other words, don't write a non-automated "pipeline" as a list of manual steps in your README!

#reproducibility #openscience

23.05.2025 22:27 | 👍 3    🔁 2    💬 0    📌 0

IMO it is well worth the trouble to do this when you have proper caching like DVC provides. No more memorization of what notebook needs to be run next, etc.

But are reproducible habits still more vitamins than painkillers?

16.05.2025 18:06 | 👍 0    🔁 0    💬 0    📌 0

Or provide some other evidence that your outputs were truly created by the provided input data and process definitions. For this, we can force users to provide comprehensive environment descriptions, and hash inputs and outputs to verify consistency (Calkit uses DVC for this).

16.05.2025 18:06 | 👍 0    🔁 0    💬 1    📌 0
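
A rough sketch of what "hash inputs and outputs to verify consistency" in the post above could look like, using only Python's standard library. The function names, the file names, and the choice of SHA-256 are assumptions made for illustration; this is not a description of how DVC or Calkit implement it.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_hashes(paths):
    """Build a lock-style mapping of path -> content hash."""
    return {str(p): file_sha256(p) for p in paths}


def changed_files(lock, paths):
    """List the paths whose current content no longer matches the lock."""
    current = record_hashes(paths)
    return sorted(p for p, digest in current.items() if lock.get(p) != digest)


# Demo with throwaway files standing in for an input and an output.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    fig = Path(tmp) / "figure.txt"
    raw.write_text("x,y\n1,2\n")
    fig.write_text("plot of y vs x\n")

    lock = record_hashes([raw, fig])           # state after a full run
    unchanged = changed_files(lock, [raw, fig])

    raw.write_text("x,y\n1,3\n")               # input edited afterwards
    changed = changed_files(lock, [raw, fig])

print(unchanged, changed)
```

If an input's hash differs from the recorded one, any output derived from it can be flagged as stale until the pipeline is rerun.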

I was happy to see Claerbout and Karrenbach (1992) use a similar phrase: "pressing a single button".

So, providing all of your code and data along with some instructions is great, but it's not reproducible to the highest standard. For that, you'll need to automate everything, or...

16.05.2025 18:06 | 👍 0    🔁 0    💬 1    📌 0
Electronic documents give reproducible research a new meaning | SEG Technical Program Expanded Abstracts 1992

@russpoldrack.org's Substack led me down a reproducibility rabbit hole this morning, ending at doi.org/10.1190/1.18...

One qualification I've been using for a project to be considered reproducible is that it only takes a single command (no lists of instructions for humans in the README!)

16.05.2025 18:06 | 👍 2    🔁 1    💬 1    📌 0
