Björn Holzhauer @bjoernstats

If you perceive the problem to be that a company might not want to publish, then how would the proposal help? Posting results to clinicaltrials.gov is mandatory for recent trials (with some nuance incl. on timing). Unsurprisingly adherence by industry is extremely high - unlike for academia.

11.12.2024 07:56 — 👍 1 🔁 1 💬 1 📌 0

If you see journals not publishing negative trials as the problem, you get properly conducted RCTs (eventually) published even if results are "negative". They just tend to get into lower tier journals (unless they are a large 10,000 patient outcome study, which will still get into a top journal).

11.12.2024 07:56 — 👍 1 🔁 0 💬 2 📌 0

And I'm unsure how it makes me more sure "that the results are what they appear to be". In journals like NEJM (all journals should do this), you get full protocol etc. (& FDA oversees version control on these) incl. change history, which makes it clear what the prespecified plan was.

11.12.2024 07:56 — 👍 1 🔁 0 💬 3 📌 0

Given the limited time to patent expiration and a typical discount rate for moving out the expected sales (assuming they even stay the same with a delayed market entry), the price tag on doing this could easily be 3-digit millions or $1B+.

11.12.2024 07:56 — 👍 1 🔁 0 💬 1 📌 0
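A rough sketch of how such a price tag could be estimated, assuming a fixed patent expiry, a linear sales ramp-up to a flat peak, and monthly discounting. Every input below (peak sales, ramp length, discount rate, launch and expiry months) is a hypothetical placeholder, not a figure from this thread.

```python
def npv_of_sales(launch_month, expiry_month, peak_annual_sales,
                 ramp_months, annual_discount_rate):
    """Net present value (at month 0) of sales from launch to patent expiry."""
    monthly_factor = (1 + annual_discount_rate) ** (1 / 12)
    npv = 0.0
    for month in range(launch_month, expiry_month):
        # Linear ramp-up to peak sales, then flat until expiry (assumption).
        uptake = min(1.0, (month - launch_month + 1) / ramp_months)
        npv += uptake * peak_annual_sales / 12 / monthly_factor ** month
    return npv


def cost_of_delay(delay_months, launch_month, **sales_model):
    """NPV lost when launch slips by delay_months but expiry stays fixed."""
    on_time = npv_of_sales(launch_month, **sales_model)
    delayed = npv_of_sales(launch_month + delay_months, **sales_model)
    return on_time - delayed


# Hypothetical inputs, in $ millions; replace with your own forecast.
model = dict(expiry_month=144, peak_annual_sales=1000.0,
             ramp_months=36, annual_discount_rate=0.10)
for delay in (3, 12):
    print(delay, "months:", round(cost_of_delay(delay, launch_month=24, **model)))
```

Because the expiry date does not move, a delayed launch both pushes sales further into the discounted future and cuts off the final months at peak sales.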

The biggest sticking point is surely the timeline impact. E.g. a 3 month delay from a typical single peer review cycle would already be huge in terms of the timelines of a typical drug development plan. If there are two review rounds for both your Phase 3 and your Phase 2b, you've added 1 year.

11.12.2024 07:56 — 👍 1 🔁 0 💬 2 📌 0

I find these tabular competitions a useful learning tool (tabular data is what I'm dealing with most of the time, just usually much smaller). E.g. I learnt a lot on the internals of CatBoost. I also should write up my thoughts on tuning GBDTs (e.g. don't tune the learning rate, lower is better).

01.12.2024 21:05 — 👍 2 🔁 0 💬 0 📌 0

My selected solution was a simple average of multiple CatBoost, LightGBM using target encoding for categories, logistic regressions & seed averaged fastai NNs with embeddings of dim 1-4 for all features (numeric ones had low cardinality) and 2 small hidden layers (10 & 5) trained with focal loss.

01.12.2024 21:05 — 👍 1 🔁 0 💬 1 📌 0
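A minimal sketch of what a "simple average" blend looks like, assuming each model outputs a predicted probability per observation. The model names and probabilities are made up for illustration and are not the actual competition predictions.

```python
def average_predictions(model_predictions):
    """Equal-weight average of per-model predicted probabilities."""
    n_models = len(model_predictions)
    return [sum(row) / n_models for row in zip(*model_predictions)]


# Hypothetical predicted probabilities for three observations.
predictions = {
    "catboost": [0.80, 0.30, 0.55],
    "lightgbm": [0.70, 0.20, 0.65],
    "logistic": [0.60, 0.40, 0.45],
    "nn_seed_avg": [0.90, 0.30, 0.55],
}
blend = average_predictions(list(predictions.values()))
print(blend)
```

Seed averaging is the same operation applied one level down: average the predictions of one architecture trained with several random seeds before it enters the blend.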

I think it's generally a good idea to not take the performance of early stopping per CV fold, but to rather take the best number of iterations (or epochs) averaging across folds. It's particularly so with such a noisy low information metric, so that was an important part of my solution.

01.12.2024 21:05 — 👍 1 🔁 0 💬 1 📌 0
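One way to read this, sketched under assumptions: record the validation metric per iteration for every fold, average those curves across folds, and pick a single iteration count from the averaged curve. The curves below are toy numbers, and the metric is treated as a loss (lower is better); for accuracy, flip the sign.

```python
def best_iteration_per_fold(curves):
    """Index of the best (lowest) validation loss within each fold."""
    return [min(range(len(c)), key=c.__getitem__) for c in curves]


def best_shared_iteration(curves):
    """Single best iteration chosen from the fold-averaged loss curve."""
    n_iter = min(len(c) for c in curves)
    mean_curve = [sum(c[i] for c in curves) / len(curves) for i in range(n_iter)]
    best = min(range(n_iter), key=mean_curve.__getitem__)
    return best, mean_curve[best]


# Toy validation-loss curves for three CV folds (hypothetical values).
curves = [
    [0.70, 0.62, 0.58, 0.60, 0.63],
    [0.71, 0.65, 0.60, 0.57, 0.59],
    [0.69, 0.60, 0.59, 0.61, 0.64],
]
per_fold = best_iteration_per_fold(curves)
shared_iter, shared_score = best_shared_iteration(curves)
# Averaging each fold's own minimum is optimistic: every fold gets to
# pick its stopping point using the same data it is then scored on.
optimistic_score = sum(c[i] for c, i in zip(curves, per_fold)) / len(curves)
print(per_fold, shared_iter, shared_score, optimistic_score)
```

The per-fold score can never be worse than the shared one, which is why it tends to overstate out-of-fold performance on a noisy metric.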

I just had my first top 1% finish (24/2639) in a Kaggle (tabular playground series) competition. Accuracy is such a noisy low information competition metric (even with 10,000s of observations) that you have to be very careful to not overfit the out-of-fold observations of your cross-validation.

01.12.2024 21:05 — 👍 3 🔁 0 💬 1 📌 0
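To put a rough number on "noisy": a back-of-the-envelope sketch using the binomial standard error of an accuracy estimate. It assumes independent observations and ignores any extra variance from reusing folds; the accuracy of 0.9 and the sample sizes are illustrative assumptions, not values from the competition.

```python
import math


def accuracy_standard_error(accuracy, n_observations):
    """Binomial standard error of an accuracy estimated on n observations."""
    return math.sqrt(accuracy * (1 - accuracy) / n_observations)


# Approximate 95% half-widths for a hypothetical accuracy of 0.9.
for n in (10_000, 50_000):
    half_width = 1.96 * accuracy_standard_error(0.9, n)
    print(n, f"+/- {half_width:.4f}")
```

If differences between candidate models are smaller than this half-width, picking the "best" one on out-of-fold accuracy is largely picking noise.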

The other things much bigger than "placebo effects" are regression to the mean and simple time trends in disease state. The former occurs even in stable chronic conditions once you apply inclusion criteria (this really seems to surprise many non-statisticians).

29.11.2024 14:24 — 👍 2 🔁 0 💬 0 📌 0
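A small simulation sketch of regression to the mean under an inclusion criterion. It assumes each patient has a perfectly stable true severity, that screening and follow-up are that value plus independent measurement noise, and that nobody is treated. The distributions and the threshold are arbitrary choices for illustration.

```python
import random


def simulate_inclusion(n_patients=20_000, threshold=1.0, seed=0):
    """Mean screening and follow-up values among patients who met inclusion."""
    rng = random.Random(seed)
    screening_values, followup_values = [], []
    for _ in range(n_patients):
        true_severity = rng.gauss(0, 1)              # stable over time
        screening = true_severity + rng.gauss(0, 1)  # noisy measurement
        followup = true_severity + rng.gauss(0, 1)   # no treatment given
        if screening > threshold:                    # inclusion criterion
            screening_values.append(screening)
            followup_values.append(followup)
    n_included = len(screening_values)
    return (sum(screening_values) / n_included,
            sum(followup_values) / n_included,
            n_included)


mean_screening, mean_followup, n_included = simulate_inclusion()
print(n_included, mean_screening, mean_followup)
```

The included patients "improve" at follow-up although nothing about their condition changed: selecting on a high screening value also selects for positive measurement noise, which does not repeat.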

I mean, sure, this clinical trial was conducted long enough ago, that the company is not legally required to report the results. Still, it feels disappointing that it's so hard to find the outcomes for a large(ish) trial on a widely used drug.

26.11.2024 21:31 — 👍 0 🔁 0 💬 0 📌 0

Drugs@FDA: FDA-Approved Drugs

Maybe the results are just not available, yet? Thanks to Drugs@FDA (www.accessdata.fda.gov/scripts/cder...), I finally found the results in the clinical pharmacology review for the original drug approval (10+ years ago)...

26.11.2024 21:31 — 👍 0 🔁 0 💬 1 📌 0

Entry from clinicaltrials.gov showing trial results have not been disclosed 19 years after completing trial

Meanwhile on clinicaltrials.gov the trial still doesn't have results 19 years after being completed. As far as I can tell no results from it have been published in a medical journal, either.

26.11.2024 21:31 — 👍 0 🔁 0 💬 1 📌 0

Statement on a pharmaceutical company trial results page that trial results will be shared once available...

Without naming the company, this is disappointing. Company's webpage: "We are committed to disclosing the results of clinical trials on all applicable registries in accordance with applicable law." + "This clinical trial is now complete. When available, results will be posted on ClinicalTrials.gov."

26.11.2024 21:31 — 👍 1 🔁 0 💬 1 📌 0

How can we make better graphs? An initiative to increase the graphical expertise and productivity of quantitative scientists Graphics are at the core of exploring and understanding data, communicating results and conclusions, and supporting decision-making. Increasing our graphical expertise can significantly strengthen ou...

... minimal clutter/white background, and large enough font sizes already help so much. (see also the article by some of my colleagues: doi.org/10.1002/pst....).

26.11.2024 20:54 — 👍 0 🔁 0 💬 0 📌 0

graphics principles - Welcome This is the home page for effective visual communication and good graphical principles for quantitative scientists.

Annotating on the plot rather than making people go looking back and forth to a legend is such a nice way of making your graphics easier to read. In combination with some of the other graphics principles on graphicsprinciples.github.io like well-chosen colors (vs. different dashed lines), ...

26.11.2024 20:54 — 👍 0 🔁 0 💬 1 📌 0

Horrible black and white plots with too small a font and visually hard to distinguish identification of categories

At least, I hope you'd agree that the plot with colors, annotations on the plot, large font etc. is better than these slightly exaggerated disasters you might get by taking more of a default approach.

26.11.2024 20:54 — 👍 0 🔁 0 💬 1 📌 0

Figure with three dose response color-coded curves for drugs labelled "Drug A" (black), "Drug B" (orange) and "Drug C" (blue) with labels directly next to the curves.

Is there something more clever than what the directlabels R package offers? The style of plot like in this hypothetical example (where geom_dl(method="smart.grid") worked well) is so useful. Yet, all too often it struggles to place labels well & I end up using geom_text manually.

26.11.2024 20:54 — 👍 1 🔁 0 💬 2 📌 0

Surely, by the most common definition logistic regression is artificial intelligence? I can write it as a single layer neural network in PyTorch, if that helps?

26.11.2024 11:25 — 👍 1 🔁 0 💬 1 📌 0
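To illustrate the equivalence without any dependencies, here is a plain-Python sketch (not PyTorch) of a "network" with one linear layer and a sigmoid output, trained by gradient descent on log-loss, which is exactly a logistic regression fit. In PyTorch the same model would be a `torch.nn.Linear` layer followed by a sigmoid and a binary cross-entropy loss. The toy data, learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Forward pass: one linear layer followed by a sigmoid activation."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


def fit_single_neuron(X, y, learning_rate=0.1, epochs=2000):
    """Full-batch gradient descent on the mean log-loss (binary cross-entropy)."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi  # dLoss/dLogit
            for j in range(n_features):
                grad_w[j] += error * xi[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy one-feature data set (hypothetical).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = fit_single_neuron(X, y)
print(predict_proba(weights, bias, [0.0]), predict_proba(weights, bias, [3.0]))
```

The fitted weights and bias play the role of the regression coefficients and intercept; only the fitting procedure (gradient descent rather than iteratively reweighted least squares) differs from typical statistical software.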


Latest posts by bjoernstats.bsky.social on Bluesky
