Ray Bai

@raybai07.bsky.social

Assistant Professor of Statistics at George Mason University, French bulldog owner, and pop culture enthusiast. Any views expressed on here are my own, not my employer's. 👨‍🏫📊 📚👨‍🍳🏳️‍🌈 raybai.net

208 Followers 210 Following 87 Posts Joined Dec 2023
2 days ago

I am pleased to share that my paper "VCBART: Bayesian Trees for Varying Coefficients" (with Sameer Deshpande, Cecilia Balocchi, Jennifer Sterling, and Jordan Weiss) has been published in the latest issue of Bayesian Analysis!

Read it here: doi.org/10.1214/24-B...

1 week ago
Seven Major Directions and Trends in Modern Statistics – Ray Bai

New blog post: "Seven Major Directions and Trends in Modern Statistics"! In this post, I summarize a few of the latest trends and prominent areas in the field of statistics.

raybai.net/seven-major-...

2 weeks ago

I often explain deep learning and deep generative models (DGMs) to non-experts & students who are unfamiliar with the area but interested in exploring it. I find it's very helpful to start by framing linear regression and logistic regression as special cases of neural networks with a single output layer.
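A minimal numpy sketch of this framing (my own toy illustration, not code from the post): a network with no hidden layers is just a linear map followed by an output activation. The identity activation recovers linear regression's predictions, and the sigmoid activation recovers logistic regression's probabilities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A "neural network" with no hidden layers: one linear map plus an
# output activation. Identity activation -> linear regression;
# sigmoid activation -> logistic regression.
def single_layer_net(X, w, b, activation=lambda z: z):
    return activation(X @ w + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 observations, 3 features
w = np.array([0.5, -1.0, 2.0])   # weights = regression coefficients
b = 0.1                          # bias = intercept

linear_preds = single_layer_net(X, w, b)             # linear regression fit
logistic_probs = single_layer_net(X, w, b, sigmoid)  # logistic probabilities
```

From there, adding hidden layers and nonlinear activations is just a generalization of the familiar regression models.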

2 weeks ago

Happening tomorrow at UMBC! Excited for my visit

3 weeks ago

(2/2) Never thought of myself as much of a probabilist either, but my recent work on DGMs delved into functional inequalities in probability theory to characterize transport maps. You just never know when these things will pop up or when you'll use them!

3 weeks ago

(1/2) It's always a bit wild to me when something I learned many years ago comes up again. I wasn't sure I'd ever use differential equations again, but now with flow matching and diffusion models being the current state-of-the-art generative models, I'm reviewing a bit of ODEs.
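For anyone curious what the ODE connection looks like in practice: sampling from a flow-matching model (or the deterministic probability-flow view of a diffusion model) amounts to integrating an ODE dx/dt = v(x, t) defined by a learned velocity field. A hedged toy sketch of the simplest integrator, using a made-up velocity field whose exact solution is known (my own example, not from any of the papers mentioned):

```python
import math

def euler_integrate(x0, velocity, t0=0.0, t1=1.0, n_steps=1000):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with forward Euler steps."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# Toy velocity field dx/dt = -x, with exact solution x(t) = x0 * exp(-t).
# In a generative model, velocity would be a trained neural network and
# x0 a draw from a simple reference distribution.
x1 = euler_integrate(2.0, lambda x, t: -x)
error = abs(x1 - 2.0 * math.exp(-1.0))  # small for a fine step size
```

Real samplers use fancier solvers, but the underlying object really is the classical initial value problem from an ODE course.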

3 weeks ago

A bit late but group pic from the Maryland Statistics Symposium at the Brin Mathematics Research Center this past Dec! Left to right: Jianhui Zhou, Gemma Moran, Lizhen Lin, Alden Green, Ray Bai, Anindya Roy, Cindy Rush, Yubai Yuan, Yun Yang, Yang Feng, Anderson Ye Zhang, Yanyuan Ma

3 weeks ago

I hope Sinners wins the Academy Award for Best Picture this year. Not just because it was an incredible movie, but because, as a longtime horror aficionado, I'd take it as a signal of a broader appetite for horror & other genre-bending films in the Academy (justice for Get Out!).

1 month ago

I'm giving a talk "Deep Generative Models for Statistical Problems: Methods, Computation, and Theory" at the UMBC Mathematics and Statistics Dept next Friday, Feb. 20 from 11:00 am-12:00 pm! Come join if you're in the area. mathstat.umbc.edu/events/event...

1 month ago

😍

1 month ago

This Super Bowl game is fairly boring, but absolutely loved the Halftime Show and the other musical performances! Green Day, Lady Gaga, Bad Bunny ❤️❤️

1 month ago

Congrats to my collaborator and former student Qingyang Liu (I taught him in 2 classes, served on his dissertation committee, and have co-authored several papers with him)! He will be joining @wakeforeststats.bsky.social as an Assistant Professor in July. 🥳Great department!

2 months ago
Preview
Open-Rank, Tenured/Tenure-Track Statistics Faculty - Fairfax, VA, Virginia, United States Department: Col of Engineering and Computing Classification: 9-month Instructional Faculty Job Category: Instructional Faculty Job Type: Full-Time Work Schedule: Full-time (1.0 FTE, 40 hrs/wk) Locatio...

To anyone who is on the job market in Statistics this academic year: the George Mason University (GMU) Department of Statistics is hiring for open-rank, tenure-track or tenured positions!

For full consideration, apply by January 14 at this link: tinyurl.com/6mjs8fye

2 months ago

So maddening what happened at the University of Nebraska-Lincoln

magazine.amstat.org/blog/2026/01...

2 months ago

Our paper "Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression" was published in the most recent issue of Computational Statistics. Read the paper here: doi.org/10.1007/s001...

3 months ago
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this post for my personal take).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code. That post warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle. This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything. You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, I don’t foresee myself incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.

Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

2 months ago
Exam question: After you have explained 97% confidence to Bob, he responds, "I see. 97% is pretty good, but it could be great if we can make a 100% confidence interval." What is your response to this?

Student's answer: "Bob, you are a fool amongst fools. Truly, I pity you. A 100% confidence interval would be useful as it would give us a result of all real numbers. That's the only way to be 100% sure our true mean is in the interval; if every number could be included."

Grading my final exams for undergrad probability & statistics, and this response to one of my questions seriously made me laugh out loud for minutes. Should I give Extra Credit for the student's response? "Bob, you are a fool amongst fools." 😂😂😂
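The student has the right idea, and it's easy to see numerically: for a normal-based interval, the half-width multiplier z diverges as the confidence level approaches 1, so a "100% interval" must cover all real numbers. A quick stdlib sketch (my own illustration, not an exam solution):

```python
from statistics import NormalDist

def z_halfwidth(conf_level):
    """Half-width multiplier z for a two-sided normal confidence interval."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)

# The multiplier (and hence the interval width) grows without bound
# as the confidence level approaches 1.
for c in (0.90, 0.97, 0.999, 0.9999999):
    print(f"{c}: +/- {z_halfwidth(c):.3f} standard errors")
```

At exactly 100%, `inv_cdf(1.0)` is undefined (the quantile is infinite), which is precisely the student's point.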

3 months ago

Our R package for VCBART, for fitting BART-based varying coefficient models, is now available on CRAN! Useful for flexible regression modeling + can be used to estimate heterogeneous treatment effects in causal inference by specifying X and Z appropriately. Check it out: cran.r-project.org/web/packages...

3 months ago
Preview
Colleges Are Preparing to Self-Lobotomize The skills that students will need in an age of automation are precisely those that are eroded by inserting AI into the educational process.

Yes. "... the skills that future graduates will most need in the AI era—creative thinking, the capacity to learn new things, flexible modes of analysis—are precisely those that are likely to be eroded by inserting AI into the educational process."

3 months ago
Preview
Yesterday was a very sad day for all of Statistics: The University of Nebraska Board of Regents voted 9-1 (with 2 abstentions) to eliminate its Department of Statistics (https://lnkd.in/gJzJ_yki)…

A sad day for the statistics community. U. of Nebraska Board of Regents voted to eliminate UNL's Department of Statistics.

3 months ago

I'm at the Brin Mathematics Research Center today for the Maryland Statistics Symposium! Presenting my work on generative quantile regression w/ former PhD student Dr. Shijie Wang (U. South Carolina '24) and Dr. Minsuk Shin of Yonsei U. (published in JCGS last year).

3 months ago
Maryland Statistical Symposium | Brin Mathematics Research Center

The Maryland Statistical Symposium looks awesome! brinmrc.umd.edu/fall25-mss/

So honored to be invited to speak at this event alongside many outstanding researchers, some of whose work I have followed and admired for years!

4 months ago
YouTube
University of Nebraska-Lincoln Department of Statistics seminar "The Metrics" on November 6, 2025 (YouTube video by Chris Bilder)

If you're following the #UNL #statistics saga (proposed for elimination based on bad stats), you might find the seminar we gave yesterday interesting... youtu.be/fUk2R0UYWpA

It was weird to rail against someone for an hour, but strangely cathartic, and the #datavis seems to have been effective?

4 months ago

Congrats to my student Leah Wood for successfully defending her senior honors thesis "Spatiotemporal Modeling of Maternal Mortality in South Carolina 2018-2023"! Leah will pursue a Master's in Biostatistics next.

This was on par with an excellent Master's thesis, tbh. Great job!

4 months ago

Having a great time visiting Columbia, SC and catching up with old friends and coworkers! I will always be grateful to @uofscstatistics.bsky.social for helping me to launch my career!

4 months ago

nailed it!

4 months ago

One week till my trip to Columbia, SC to see my Honors student Leah defend her senior thesis! She did an excellent job on Bayesian spatiotemporal modeling of maternal mortality in South Carolina from 2018-2023. She coded up the model in Stan & R and produced some very nice maps!

4 months ago

Excited to give a talk at the Maryland Statistical Symposium at the Brin Mathematics Research Center this December! Looking forward to connecting with many outstanding statistics researchers in the DMV area and the mid-Atlantic region.

4 months ago
Preview
Bayesian group regularization in generalized linear models with a continuous spike-and-slab prior - Annals of the Institute of Statistical Mathematics We study Bayesian group-regularized estimation in high-dimensional generalized linear models (GLMs) under a continuous spike-and-slab prior. Our framework covers both canonical and non-canonical link ...

My AISM paper "Bayesian group regularization in generalized linear models with a continuous spike-and-slab prior" is now online! I really appreciated the feedback from reviewers who wrote very thorough, high-quality reviews. A+ experience submitting here. tinyurl.com/yc6phfwv

5 months ago

Today is my 40th birthday, and I had a very special treat for it -- getting to meet one of my idols, Dr. Jianqing Fan! Dr. Fan's papers on the SCAD penalty and sure independence screening for high-dimensional data were among the first papers I read as a PhD student. So inspiring!
