Michael Boo-tancourt

@betanalpha.bsky.social

Zealous modeler. Annoying statistician. Reluctant geometer. Support my writing at http://patreon.com/betanalpha. He/him.

2,726 Followers  |  159 Following  |  1,616 Posts  |  Joined: 06.07.2023  |  2.5901

Latest posts by betanalpha.bsky.social on Bluesky

take me seriously

23.10.2025 16:00 - 👍 294    🔁 36    💬 1    📌 3

An entire ensemble of experiments implemented in the same way would behave even more weirdly, and unless someone recognized the poor use of pseudo-random numbers people might end up chasing down red herrings.

23.10.2025 02:23 - 👍 1    🔁 0    💬 0    📌 0

In other words, when using parallel pseudo-random number generator sequences from different seeds there would be no way to diagnose the failure of the randomized assignments, at least not without running the experiments over and over again.

23.10.2025 02:23 - 👍 1    🔁 0    💬 1    📌 0

The issue is that for most pseudo-random number generators the sequences generated from two different seeds can have arbitrary correlations. In the randomization design example the assignments would look "random" superficially but not actually ensure the expected randomization outcomes.

23.10.2025 02:23 - 👍 1    🔁 0    💬 1    📌 0
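
One way to see how sequences from different seeds can be correlated is a minimal sketch with a toy generator. Everything here is an illustrative assumption rather than anything from the thread: the `ToyLCG` class is a hand-rolled Lehmer generator with the classic "minimal standard" constants, the per-individual seeds are simply the integers 1 to 1000, and the rule "treatment if the draw is below 0.5" is a hypothetical assignment scheme. Modern generators mix their seeds far better than this, so the sketch shows the failure mode rather than the behavior of any particular library.

```python
# Toy Lehmer generator ("minimal standard" constants); illustration only.
M = 2**31 - 1   # modulus
A = 16807       # multiplier


class ToyLCG:
    """Hypothetical toy generator, not a recommendation for real work."""

    def __init__(self, seed):
        self.state = seed % M or 1  # the state must be nonzero

    def uniform(self):
        self.state = (A * self.state) % M
        return self.state / M


def assign(u):
    # Hypothetical rule: a fair coin flip based on one uniform draw.
    return "treatment" if u < 0.5 else "control"


n = 1000

# Pattern warned against: a freshly seeded generator for every individual.
reseeded = [assign(ToyLCG(seed=i).uniform()) for i in range(1, n + 1)]

# Single sequence: one generator, one seed, n consecutive draws.
rng = ToyLCG(seed=1)
single = [assign(rng.uniform()) for _ in range(n)]

frac_reseeded = reseeded.count("treatment") / n
frac_single = single.count("treatment") / n
print(f"treated when reseeding per individual: {frac_reseeded:.2f}")
print(f"treated from a single sequence:        {frac_single:.2f}")
```

With this toy generator the first draw from seed `i` is just `16807 * i / M`, which stays far below 0.5 for every seed up to 1000, so the reseeded design puts every individual in the same arm. The single sequence lands close to an even split.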

No disagreement from me that applied science is a mess, but I maintain that the "use random seeds" heuristic doesn't actually solve anything. If anything it just obfuscates problems even further.

23.10.2025 02:23 - 👍 1    🔁 0    💬 1    📌 0

"Don't use environments as ordinary data structures, but also environments are the only base data structures implemented with a hash table and get pass-by-reference semantics". Yup, checks the box.

23.10.2025 02:11 - 👍 3    🔁 0    💬 2    📌 0

So while not refreshing the seed every time clearly doesn't implement the randomized design, refreshing the seed and drawing from the new pseudo-random number generator state doesn't either.

23.10.2025 00:38 - 👍 1    🔁 0    💬 0    📌 0

In order to accurately implement a randomized design one would need to pull assignments from a single pseudo-random number generator sequence (for almost any seed). Running multiple, independent pseudo-random number generators with different seeds doesn't generally guarantee the desired randomness.

23.10.2025 00:36 - 👍 1    🔁 0    💬 2    📌 0
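
A minimal sketch of pulling every assignment from a single pseudo-random number generator sequence, using Python's standard `random` module. The design (a fixed number of treated individuals, shuffled into a random order), the function name `randomize`, and the seed value are assumptions made for illustration, not details from the thread.

```python
import random


def randomize(n_individuals, n_treated, seed):
    """Assign treatment labels using one generator that is seeded once."""
    rng = random.Random(seed)  # a single generator for the whole design
    labels = ["treatment"] * n_treated + ["control"] * (n_individuals - n_treated)
    rng.shuffle(labels)        # every assignment comes from the same sequence
    return labels


assignments = randomize(n_individuals=20, n_treated=10, seed=8675309)
print(assignments)
```

Because the generator is seeded once, rerunning with the same reported seed reproduces the full set of assignments.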

As in you seeded a new pseudo-random number generator to generate the assignment for each individual? If so then I would argue this is an example of using pseudo-random numbers incorrectly.

23.10.2025 00:36 - 👍 0    🔁 0    💬 1    📌 0

I mostly limit threads like these to social media because I prefer to focus my chapters on what to do rather than what not to do.

23.10.2025 00:32 - 👍 0    🔁 0    💬 1    📌 0

I also posted this thread over on patreon dot com, patreon.com/betanalpha, but it requires a free membership to access. Otherwise most of this is in the Monte Carlo chapter to which I linked at the end of the thread.

23.10.2025 00:32 - 👍 1    🔁 0    💬 1    📌 0

Due to some very generous people a few sponsored registrations are available, but they usually go fast so don't hesitate to reach out and inquire. Eligibility details can be found at betanalpha.github.io/courses/.

22.10.2025 14:46 - 👍 7    🔁 2    💬 0    📌 0

I'm seeing some misinformation about pseudo-random number generator best practices going around the internets. Let's talk about why the pseudo-random number generator seed you use shouldn't actually have any impact on your results and, consequently, you can choose whatever seed you damn well please.

22.10.2025 19:06 - 👍 35    🔁 12    💬 4    📌 3
silent hill with a shen yun billboard added

wow these silent hill games are getting more realistic

22.10.2025 15:48 - 👍 1600    🔁 465    💬 5    📌 10
Rumble in the Ensemble

If you want to read more then check out Section 2 of my Monte Carlo chapter, betanalpha.github.io/assets/case_.... For even more detail I really like Melissa O'Neill's writing, www.pcg-random.org.

22.10.2025 19:06 - 👍 5    🔁 0    💬 0    📌 0

That said it will always be more productive to understand the method you are using and how it can be engineered to ensure strong estimation performance in the first place.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

Moreover it doesn't introduce any harm provided you don't try to do something foolish like average the results together...

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

Adding a little bit of robustness by running an analysis multiple times with different seeds as a check, just to see if the results are consistent, is a great way to identify potential estimator issues.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0
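
A minimal sketch of that kind of check: the same Monte Carlo estimate is computed under several seeds and the results are compared against their own standard errors. The target (the expectation of x^2 for a standard normal x, which is exactly 1), the seed values, and the function name `mc_estimate` are assumptions chosen for illustration.

```python
import math
import random


def mc_estimate(seed, n=10_000):
    """Monte Carlo estimate of E[x^2] for x ~ normal(0, 1), with its standard error."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, math.sqrt(var / n)


# Run the same analysis under a handful of arbitrary seeds.
results = {seed: mc_estimate(seed) for seed in (1, 2, 3, 4, 5)}

for seed, (estimate, std_err) in results.items():
    print(f"seed={seed}  estimate={estimate:.3f}  standard error={std_err:.3f}")
```

The runs are compared, not averaged together. For a well-behaved estimator like this one the estimates should differ by no more than a few standard errors, and a larger spread would point to a problem with the estimator, not with the seeds.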

Incidentally the same goes for the range of values for a seed. Modern pseudo-random number generator state spaces are so unfathomably large that any two-digit integer is just as uncharacteristic as any nine-digit integer.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

A common seed for every analysis is fine. A heuristic for changing the seed from analysis to analysis, say based on the current date or time, is fine.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0
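
A date-based heuristic could look something like the following minimal sketch. The helper name `seed_from_date` and the year-month-day encoding are assumptions for illustration. A fixed date is used so that the example is reproducible, where a real analysis would more likely pass `datetime.date.today()`.

```python
import datetime
import random


def seed_from_date(day):
    """Turn a date into an integer seed, for example 20251022."""
    return day.year * 10_000 + day.month * 100 + day.day


# Fixed date for reproducibility of this example.
seed = seed_from_date(datetime.date(2025, 10, 22))
rng = random.Random(seed)

# Whatever heuristic picks the seed, record it alongside the results.
print(f"seed used for this analysis: {seed}")
```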

All of this is to say that any method for choosing a seed is equally adequate provided that seed is reported and the resulting pseudo-random number generator output is used properly.

22.10.2025 19:06 - 👍 3    🔁 1    💬 1    📌 0

Different pseudo-random number generator seeds resulting in different Markov chain exploration which results in different Markov chain Monte Carlo estimates is not a problem with the pseudo-random number generator seeds but rather with the Markov chain Monte Carlo algorithm itself!

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

With a multi-modal target distribution any finite Markov chain might explore only part of the target distribution, resulting in formally inaccurate estimates.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0
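
A minimal sketch of this failure, assuming a deliberately simple setup that is not taken from the thread: a random walk Metropolis sampler, a target that is an equal mixture of normal(-10, 1) and normal(10, 1), and a starting point at one of the two modes chosen by the seeded generator. The true mean of this target is 0.

```python
import math
import random


def log_target(x):
    """Log density, up to a constant, of an equal mixture of normal(-10, 1) and normal(10, 1)."""
    a = -0.5 * (x + 10.0) ** 2
    b = -0.5 * (x - 10.0) ** 2
    m = max(a, b)  # log-sum-exp trick to avoid underflow
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def chain_mean(seed, n_iter=2000, step=1.0):
    """Run one random walk Metropolis chain and return its estimate of the mean."""
    rng = random.Random(seed)
    x = rng.choice([-10.0, 10.0])  # the seed also picks the starting mode
    total = 0.0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        total += x
    return total / n_iter


means = {seed: chain_mean(seed) for seed in range(1, 9)}
for seed, mean in means.items():
    print(f"seed={seed}  estimated mean={mean:.2f}")
```

Each chain stays in the mode where it started, so every estimate sits near -10 or near 10 and none is close to the true mean of 0. Which value a given seed produces is incidental. The inaccuracy comes from a sampler that cannot cross between the modes in a finite run.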

Markov chain Monte Carlo, however, is not always well-behaved. Many Markov chain Monte Carlo algorithms struggle with more complicated target distributions; multi-modality for instance is a particularly problematic feature.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

If Markov chain Monte Carlo is well-behaved then the Markov chains will converge and produce consistent estimates regardless of the precise sequence of pseudo-random numbers that were used.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

For a final example let's consider Markov chain Monte Carlo. Most Markov chain Monte Carlo algorithms rely on pseudo-random number generators to fuel their exploration and, often, also initialize the starting values.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

Sure an unscrupulous person can take advantage of cross validation fragility to hunt for a seed that yields better results, but again this is an estimator problem not a pseudo-random number generator seeding problem.

22.10.2025 19:06 - 👍 3    🔁 0    💬 1    📌 1

In this case some pseudo-random number generator seeds will result in better outputs than others, but in practice we won't know which seeds will result in better fluctuations than others and so the choice of seed still has no practical consequence.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0

The problem is that these estimators are not always, dare I say not often, well-behaved. Fragile estimator performance is only magnified when not using enough splits, especially when the data set at hand is too small to allow for sufficiently many splits.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0
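
A minimal sketch of that fragility in the extreme case of a single holdout split on a tiny data set. The data values are made up, and the "model" (predicting held-out points with the training mean) and the function name `holdout_error` are assumptions for illustration only.

```python
import random

# Made-up observations; twelve points is deliberately too few.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9, 2.4, 5.2]


def holdout_error(seed, n_test=4):
    """Mean squared error of predicting held-out points with the training mean."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)  # the seed determines the split
    test, train = shuffled[:n_test], shuffled[n_test:]
    prediction = sum(train) / len(train)
    return sum((y - prediction) ** 2 for y in test) / n_test


errors = {seed: holdout_error(seed) for seed in range(1, 6)}
for seed, err in errors.items():
    print(f"seed={seed}  holdout error={err:.3f}")
```

The spread across seeds here reflects how noisy a one-split estimate is on twelve points. Using more splits or more data shrinks that spread, whereas hunting for a favorable seed only hides it.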

Well, this is really just a way to estimate a particular predictive expectation value. If the estimator is sufficiently well-behaved then any sequence of splits should result in consistent estimates.

22.10.2025 19:06 - 👍 2    🔁 0    💬 1    📌 0
