Interactive workflows are great for doing things once, not so great for doing things twice.
#reproducibility
@petebachant.me.bsky.social
Bicycles, fluid dynamics, Python, open source, open science, reproducibility. https://petebachant.me | https://calkit.org
Naming files like 01_do-something.sh, 02_do-something-else.sh is a bit of a code smell. The order of execution in a project should be automated, not left to the user to follow a sequence of file names.
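One way to automate execution order is to declare what each step depends on and let a tool derive the sequence. Below is a minimal sketch using Python's standard-library `graphlib`; the stage names are hypothetical and not taken from any real project, and this is not how any particular pipeline tool implements ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each name maps to the set of stages it depends on.
STAGES = {
    "collect-data": set(),
    "process-data": {"collect-data"},
    "make-figures": {"process-data"},
    "build-paper": {"make-figures", "process-data"},
}


def execution_order(stages):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
print(order)
```

With dependencies declared this way, renaming or inserting a stage does not require renumbering any files.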
10.08.2025 15:18
Calkit 0.27 introduces parameterization for notebooks run in pipeline stages, and these can even be iterated over: github.com/calkit/calki...
07.08.2025 15:19
Getting started with Julia. First impression is that there is a lot done interactively from the REPL, which feels weird and antithetical to reproducibility. It's also strange that Julia will load packages from both project-level and system-level environments simultaneously.
06.08.2025 16:59
To prevent being scooped: Instead of working in a closed, non-transparent way, why not make it easier to collaborate?
#openscience
Too few researchers work reproducibly because it costs too much for too little benefit. We need to make it easier in such a way that helps researchers get papers out the door faster.
30.07.2025 20:09
Put succinctly, Calkit's goal is to make "repro pack" READMEs unnecessary
#openscience #reproducibility
Classes should be named after the data they encapsulate, not the actions they perform
#softwareengineering
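A small illustration of that naming advice, as a sketch only: the class below is a hypothetical `Timeseries` (named for what it holds) rather than something like `MeanCalculator` (named for what it does). Both names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Timeseries:
    """Named after the data it holds, not after an action."""

    times: list[float]
    values: list[float]

    def mean(self) -> float:
        """Return the arithmetic mean of the values."""
        return sum(self.values) / len(self.values)


series = Timeseries(times=[0.0, 1.0, 2.0], values=[1.0, 2.0, 3.0])
average = series.mean()
```

Actions such as `mean` then live as methods on the data, so new operations can be added without inventing a new class for each one.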
A cartoon showing how to draw an owl in two steps, which clearly doesn't provide enough information.
When you describe the computational methods in your paper without sharing the code and data:
#openscience #reproducibility
One major cause of the reproducibility crisis is that the way we describe computational methods in research articles has not kept pace with the increase in complexity of the methods themselves. Human language and mathematical formulas are not adequate in many cases.
20.07.2025 12:40
A quotation from "Electronic Documents Give Reproducible Research a New Meaning" by Claerbout and Karrenbach, in which they state the goal of allowing researchers to reproduce their work with a single button.
"One button" reproducibility should be the standard
#openscience #reproducibility
Snakemake users, tell me what you like and don't like about it. Is it too complex for non-software-oriented researchers? What's missing?
10.07.2025 13:16
Tips for using Jupyter notebooks as part of a reproducible workflow (one that goes from raw data to research article with a single command): docs.calkit.org/notebooks/
#openscience #jupyter #reproducibility
If you're asked to review a journal article and don't have access to the code and data, or do but can't run it, do you reject the article?
#openscience
Why don't scientists work reproducibly? All they need to do is learn an ever-evolving suite of complex tools built for software engineers, learn software engineering and architecture best practices, automated testing, etc. Super easy stuff and fun for everyone!
#openscience
The "interest rate" on tech debt is proportional to how much will depend on it. If it's a function only used internally, no sweat. If it's an interface to a library you plan to have used by millions, you should really try to get it right the first time.
#programming #software
You don't need to run every single thing you write through ChatGPT
17.06.2025 22:02
Closed science is rent-seeking behavior
15.06.2025 21:09
How can we help researchers fall into the "pit of success" when it comes to computational reproducibility? I don't think I've come across a single code/data repo that's actually reproducible with a single command.
blog.codinghorror.com/falling-into...
YAML pipeline syntax.
Calkit now has its own pipeline syntax that forces you to define an environment for every stage, but manages those environments for you automatically. No more pip installs, Docker builds, etc. Your project will just be reproducible.
Docs: docs.calkit.org/pipeline/
#reproducibility #openscience
reproducible + (lots of effort) != reproducible
03.06.2025 18:03
Engineering is simply predicting what will work and what won't.
28.05.2025 15:41
Also, why do these all look like they were typed by someone who doesn't know how to use a computer?
25.05.2025 13:25
"Restoring gold standard science"
Lots to say here. It's amazing this administration would accuse others of undermining trust in science, but nevertheless, requiring all agencies to do open science is a good thing. Let's see if it actually happens.
www.whitehouse.gov/presidential...
If you share all of your code and data but users can't rerun with a single command (including environment setup), that's commendable, but your project is not reproducible.
In other words, don't write a non-automated "pipeline" as a list of manual steps in your README!
#reproducibility #openscience
IMO it is well worth the trouble to do this when you have proper caching like DVC provides. No more memorization of what notebook needs to be run next, etc.
But are reproducible habits still more vitamins than painkillers?
Or provide some other evidence that your outputs were truly created by the provided input data and process definitions. For this, we can force users to provide comprehensive environment descriptions, and hash inputs and outputs to verify consistency (Calkit uses DVC for this).
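The hashing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how DVC or Calkit implement it: the helper names (`file_sha256`, `record_hashes`, `changed_files`) are hypothetical, SHA-256 is chosen here only for the example, and the file is a throwaway stand-in for a pipeline output.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_hashes(paths):
    """Map each path to its current content hash."""
    return {str(p): file_sha256(p) for p in paths}


def changed_files(recorded):
    """Return paths whose content no longer matches the recorded hash."""
    return [p for p, h in recorded.items() if file_sha256(p) != h]


# Demo with a throwaway file standing in for a pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "results.csv"
    output.write_text("x,y\n1,2\n")
    recorded = record_hashes([output])
    unchanged = changed_files(recorded)  # nothing has changed yet
    output.write_text("x,y\n1,3\n")
    modified = changed_files(recorded)  # the edited file is now flagged
```

If the recorded hashes are committed alongside the code, anyone can check whether the outputs on disk are the ones the recorded inputs and process actually produced.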
16.05.2025 18:06
I was happy to see Claerbout and Karrenbach (1992) use a similar phrase: "pressing a single button".
So, providing all of your code and data along with some instructions is great, but it's not reproducible to the highest standard. For that, you'll need to automate everything, or...
@russpoldrack.org's Substack led me down a reproducibility rabbit hole this morning, ending at doi.org/10.1190/1.18...
One qualification I've been using for a project to be considered reproducible is that it only takes a single command (no lists of instructions for humans in the README!)