Ray Bai

@raybai07.bsky.social

Assistant Professor of Statistics at George Mason University, French bulldog owner, and pop culture enthusiast. Any views expressed on here are my own, not my employer's. 👨‍🏫📊 📚👨‍🍳🏳️‍🌈 raybai.net

182 Followers  |  193 Following  |  60 Posts  |  Joined: 19.12.2023  |  2.1277

Latest posts by raybai07.bsky.social on Bluesky

Today is my 40th birthday, and I had a very special treat for it: getting to meet one of my idols, Dr. Jianqing Fan! Dr. Fan's papers on the SCAD penalty and sure independence screening for high-dimensional data were among the first papers I read as a PhD student. So inspiring!

26.09.2025 20:15 | 👍 1  🔁 0  💬 0  📌 0

Turning 40 in a week (!). Here are some old photos of me when I was 30-31 yrs old and much more fit/30 pounds lighter. Need to get back into shape, maybe join a local running group again!

19.09.2025 13:16 | 👍 0  🔁 0  💬 0  📌 0
ABSTRACT: We study Bayesian group-regularized estimation in high-dimensional generalized linear models (GLMs) under a continuous spike-and-slab prior. Our framework covers both canonical and non-canonical link functions and subsumes logistic, Poisson, negative binomial, and Gaussian regression with group sparsity. We obtain the minimax L2 convergence rate for both a maximum a posteriori (MAP) estimator and the full posterior distribution under our prior. Our theoretical results thus justify the use of the posterior mode as a point estimator. The posterior distribution also contracts at the same rate as the MAP estimator, an attractive property that the group lasso does not share. For computation, we propose expectation-maximization (EM) and Markov chain Monte Carlo (MCMC) algorithms. We illustrate our method through simulations and a real data application on predicting human immunodeficiency virus (HIV) drug resistance from protein sequences.

I'm pleased to share that my single-authored paper "Bayesian group regularization in generalized linear models with a continuous spike-and-slab prior" has been accepted for publication in Annals of the Institute of Statistical Mathematics!

Link: raybai.net/wp-content/u...

16.09.2025 12:11 | 👍 2  🔁 0  💬 0  📌 0

Our paper, "Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression," has been accepted for publication in Computational Statistics!

Read it here: raybai.net/wp-content/u...

15.09.2025 17:45 | 👍 1  🔁 0  💬 0  📌 0

Going to be back at University of South Carolina November 4-5 for my Honors thesis student Leah's defense! Excited to see her exceptional work on Bayesian modeling of maternal mortality with suppressed data (a challenging project that she tackled very well). Hope to catch up with folks!

09.09.2025 21:26 | 👍 1  🔁 0  💬 0  📌 0

University of Florida crew now at George Mason University Statistics Dept!
Ray Bai (PhD student at UF from 2014-2018), Brenda Betancourt (Assistant Professor at UF from 2018-2022), and Abolfazl Safikhani (Assistant Professor at UF from 2019-2022). Small world!

05.09.2025 17:24 | 👍 2  🔁 0  💬 0  📌 0
Ray Bai Research Interests: Bayesian inference, deep learning, deep generative models, high-dimensional statistics, scalable algorithms, causal inference, survival analysis

It's official! My new faculty page at @georgemasonu.bsky.social is up: statistics.gmu.edu/profiles/rbai2

28.08.2025 17:09 | 👍 2  🔁 0  💬 0  📌 0

@uofscstatistics.bsky.social has a *second* TT Assistant Professor position for Fall 2026 available. Applicants with research experience in causal inference, reinforcement learning, probabilistic graphical models, or sequential & adaptive decision-making are especially encouraged to apply.

27.08.2025 22:28 | 👍 0  🔁 1  💬 0  📌 0

My previous employer, the University of South Carolina Department of Statistics, has a new opening for a Tenure-Track Assistant Professor position with a start date of August 16, 2026. This is a great place to work! Apply by November 17, 2025 for full consideration.

25.08.2025 02:25 | 👍 1  🔁 0  💬 0  📌 0

Had such a great time at our going away get-together tonight! We are so grateful for the friends we made in Columbia, SC. We will be back to visit again. Come visit us in DC! ❤️

17.08.2025 01:47 | 👍 1  🔁 0  💬 0  📌 0

Last day in the office at USC, and my advisees came to meet with me one last time! My Senior Honors thesis student Leah, my Summer REU student Evan, & my PhD students Fanghua and Sijian. So fortunate to have had the opportunity to work with outstanding students at USC. #professor #proudadvisor

15.08.2025 20:44 | 👍 1  🔁 0  💬 0  📌 0

Last week on University of South Carolina's campus. I will be in the office on Friday for some final in-person meetings and for some final orders of business.

If you're at USC on Fri., please come say goodbye!

13.08.2025 18:06 | 👍 1  🔁 0  💬 0  📌 0

#IPAM (the Institute for Pure and Applied Mathematics) is facing a critical shortfall for operating expenses due to an unexpected suspension of NSF funding www.ipam.ucla.edu/news/nsf-fun... . Donations for emergency continuity of operations funding can be made at

giving.ucla.edu/Campaign/Donat

08.08.2025 00:48 | 👍 128  🔁 39  💬 5  📌 7
Terence Tao (@tao@mathstodon.xyz) The current administration in the US has, through various funding agencies such as the NSF and NIH, has recently suspended virtually all federal grants to my home university, UCLA (including my own p...

I have some related posts on this at mathstodon.xyz/@tao/1149568... and mathstodon.xyz/deck/@tao/11...

08.08.2025 00:49 | 👍 36  🔁 6  💬 0  📌 0

New preprint "Parameter expanded variational Bayes for well-calibrated high-dimensional linear regression with spike-and-slab priors"!

Read the full paper here: www.researchgate.net/publication/...

01.08.2025 20:53 | 👍 2  🔁 0  💬 0  📌 0
Posing with the Cocky the Gamecock statue

Standing in front of the LeConte College sign (where the USC Statistics Department is located)

Today is my last "official" day at University of South Carolina. Next month, I will be moving to the D.C. area and continuing my academic career at George Mason University. However, I really cherish my time as a faculty member at USC. #ForeverToThee

Read my full post here: tinyurl.com/4fpzjeru

31.07.2025 14:03 | 👍 1  🔁 0  💬 0  📌 0

Thank you for all of the support the last few years, @uofscstatistics.bsky.social. I will greatly cherish my time as a faculty member at USC where I got to collaborate with great colleagues, teach and mentor excellent students, and grow as a researcher and educator!

30.07.2025 17:58 | 👍 0  🔁 0  💬 0  📌 0
A Unified Three-State Model Framework for Analysis of Treatment Crossover in Survival Trials We present a unified three-state model (TSM) framework for evaluating treatment effects in clinical trials in the presence of treatment crossover. Researchers have proposed diverse methodologies to...

"A Unified Three-State Model Framework for Analysis of Treatment Crossover in Survival Trials" (with my former PhD student Zile Zhao and our co-authors) was published in the most recent issue of Statistics in Biopharmaceutical Research! Check it out here: doi.org/10.1080/1946...

29.07.2025 21:48 | 👍 1  🔁 0  💬 0  📌 0
SHORT ABSTRACT: In this work, we propose a novel Bayesian approach for biclustering binary datasets called Binary Spike-and-Slab Lasso Biclustering (BiSSLB). Our method is based on a logistic matrix factorization model with spike-and-slab lasso priors on the latent spaces. We further incorporate an Indian Buffet Process (IBP) prior to automatically determine the number of biclusters. To avoid the high computational cost of EM algorithms, we propose a novel coordinate ascent algorithm with proximal steps which allows for scalable computation. The performance of our proposed approach is assessed via simulations and two real applications on leukemia gene expression data and protein-protein interaction (PPI) data, where BiSSLB is shown to outperform other state-of-the-art binary biclustering methods.

Hard at work this summer before the Fall semester begins. Another new preprint "BiSSLB: Binary spike-and-slab biclustering for binary datasets" with my PhD student Sijian Fan should be available in the next few weeks!

24.07.2025 14:16 | 👍 1  🔁 0  💬 0  📌 0
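The spike-and-slab lasso (SSL) prior named in the BiSSLB abstract is, in its standard univariate form (Ročková and George, 2018), a mixture of a sharply peaked Laplace "spike" and a diffuse Laplace "slab". As a minimal sketch under that assumption, the Python below evaluates the prior density and the coefficient-specific L1 penalty it induces, which is what lets SSL shrink small coefficients hard while leaving large ones nearly untouched. The values of theta, lam0, and lam1 and all function names are illustrative choices; this is not BiSSLB's logistic factorization, IBP prior, or proximal coordinate ascent algorithm.

```python
import math


def laplace_pdf(beta, lam):
    """Laplace density psi(beta | lam) = (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_prior_pdf(beta, theta=0.5, lam0=20.0, lam1=1.0):
    """Spike-and-slab lasso prior: (1 - theta) * spike + theta * slab.
    The spike (large lam0) concentrates near zero; the slab (small lam1) is diffuse."""
    return (1.0 - theta) * laplace_pdf(beta, lam0) + theta * laplace_pdf(beta, lam1)


def slab_probability(beta, theta=0.5, lam0=20.0, lam1=1.0):
    """Conditional probability p*(beta) that beta was drawn from the slab.
    Fine for moderate |beta|; both densities underflow for very large |beta|."""
    slab = theta * laplace_pdf(beta, lam1)
    spike = (1.0 - theta) * laplace_pdf(beta, lam0)
    return slab / (slab + spike)


def adaptive_penalty(beta, theta=0.5, lam0=20.0, lam1=1.0):
    """Self-adaptive L1 weight lam*(beta) = lam1 * p* + lam0 * (1 - p*)."""
    p = slab_probability(beta, theta, lam0, lam1)
    return lam1 * p + lam0 * (1.0 - p)
```

With these defaults, a coefficient at 0 receives an L1 weight close to lam0 = 20, while a coefficient at 2 receives a weight close to lam1 = 1.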
ABSTRACT: Chronic obstructive pulmonary disease (COPD) is one of the leading causes of hospitalization and death in the United States. Although progress has been made in understanding both patient-level and community-level risk factors associated with COPD, there is little work on measuring the causal effects of community exposures on COPD risk. In particular, it remains unclear whether neighborhood income or neighborhood exposure to vape shops causes the number of emergency department (ED) visits for COPD to increase. In this study, we introduce a Bayesian spatial causal inference model to determine the average causal effect of median income and vape shop density on the number of COPD ED visits. Using county-level data from the state of North Carolina in 2023, we found that an increase in a county's median income caused a significant decrease in that county's COPD ED visits. On the other hand, greater exposure to vape shops did not cause a significant change in the number of COPD ED visits. Our findings enhance understanding of how community exposures impact COPD hospitalizations and may aid in place-based interventions to reduce the number of ED visits for COPD.

Another new preprint coming soon, hopefully in the next few weeks! This is work from the summer REU that I supervised. The undergrad students did a great job on this!

21.07.2025 13:59 | 👍 0  🔁 0  💬 0  📌 0
Title: Parameter-Expanded Variational Bayes for Well-Calibrated High-Dimensional Linear Regression with Spike-and-Slab Priors.

Abstract: As scientific problems grow in complexity, there is a pressing need for robust and scalable computational methods for fitting high-dimensional statistical models. Variational Bayes (VB) provides an approximate alternative to traditional sampling-based Bayesian inference, often reducing computation time from days to hours or minutes. VB typically minimizes the Kullback-Leibler divergence via coordinate ascent under a mean-field assumption. Its performance can be highly sensitive to prior specifications, particularly in sparse high-dimensional regression with spike-and-slab priors. A significant limitation of standard VB is its tendency to produce poorly calibrated predictions; that is, the predicted values often exhibit a systematic bias relative to the observed outcomes, failing to accurately reflect the true conditional expectation. Motivated by this, we apply parameter expansion to VB and propose a sparse parameter-expanded VB (spexvb) algorithm that improves robustness to prior settings and enhances predictive calibration. Compared to standard VB, spexvb delivers more stable and accurate posterior estimates. We evaluate its performance through extensive simulations and a real-world application, demonstrating the practical advantages of parameter expansion in variational inference for high-dimensional regression.

New preprint coming soon! Variational Bayes (VB) is a scalable approach for conducting Bayesian variable selection when p>>n, but the standard mean-field CAVI algorithm results in poorly calibrated predictions. We introduce a novel parameter-expanded scheme to correct this issue.

19.07.2025 12:23 | 👍 4  🔁 0  💬 0  📌 0
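For readers unfamiliar with the baseline these posts mention, here is a minimal pure-Python sketch of the standard mean-field CAVI updates for spike-and-slab linear regression, using the parameterization popularized by Carbonetto and Stephens (2012). It is not spexvb: no parameter-expansion step is shown, and the noise variance sigma2, the slab variance scale sa2, and the prior inclusion probability pi are assumed fixed and known. The function name cavi_spike_slab is hypothetical.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def cavi_spike_slab(X, y, sigma2=1.0, sa2=1.0, pi=0.1, n_iter=100):
    """Standard mean-field CAVI (a sketch, not spexvb).

    Model:  y = X beta + e,  e ~ N(0, sigma2 * I)
    Prior:  beta_j ~ pi * N(0, sigma2 * sa2) + (1 - pi) * delta_0
    q:      q(beta_j) = alpha_j * N(mu_j, s2_j) + (1 - alpha_j) * delta_0
    Hyperparameters are held fixed for illustration.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in c) for c in cols]
    xty = [sum(c[i] * y[i] for i in range(n)) for c in cols]
    alpha, mu, s2 = [pi] * p, [0.0] * p, [0.0] * p
    fitted = [0.0] * n  # X @ (alpha * mu), updated incrementally
    logit_pi = math.log(pi / (1.0 - pi))
    for _ in range(n_iter):
        for j in range(p):
            c = cols[j]
            old = alpha[j] * mu[j]
            # Remove coordinate j, leaving X_{-j} (alpha * mu)_{-j}.
            for i in range(n):
                fitted[i] -= c[i] * old
            s2[j] = sigma2 / (xtx[j] + 1.0 / sa2)
            mu[j] = s2[j] / sigma2 * (xty[j] - sum(c[i] * fitted[i] for i in range(n)))
            # Variational log odds that beta_j is in the slab.
            log_odds = (logit_pi
                        + 0.5 * math.log(s2[j] / (sigma2 * sa2))
                        + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = _sigmoid(log_odds)
            new = alpha[j] * mu[j]
            # Add the updated coordinate j back into the fitted mean.
            for i in range(n):
                fitted[i] += c[i] * new
    return alpha, mu, s2
```

The approximate posterior mean of each coefficient is alpha_j * mu_j, and predictions are X times that vector; the post's point is that such mean-field predictions can be systematically biased, which spexvb is designed to correct.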

The Summer REU I'm supervising wraps up next week! My students did an excellent job studying spatial models for disease mapping and causal inference, and implementing these models on real public health data from North Carolina! Looking forward to their final report and presentation!

03.07.2025 18:08 | 👍 1  🔁 0  💬 0  📌 0

Celebrating 8 years of #love and #pride! Happy Pride Month! 🧑‍🤝‍🧑🏳️‍🌈

30.06.2025 12:23 | 👍 2  🔁 0  💬 0  📌 0

I had a truly lovely time at BNP 14: the 14th International Conference on Bayesian Nonparametrics at UCLA this past week! I gave a talk, listened to many interesting talks, & chatted with a number of folks. This was my first time attending a BNP conference, and I'll definitely attend again in the future!

28.06.2025 16:45 | 👍 0  🔁 0  💬 0  📌 0

I'm headed to Los Angeles, CA today to attend the BNP 14 Conference on Bayesian nonparametrics at UCLA! I will be in L.A. this whole upcoming week. Please feel free to reach out if you would like to meet up!

22.06.2025 15:14 | 👍 0  🔁 0  💬 0  📌 0

(2/2) We will now be extending our work to a Bayesian spatial causal inference model. We will try to publish a paper from this -- excited to see how this turns out, and grateful for the opportunity to engage with bright undergrad students on challenging statistical problems.

19.06.2025 23:10 | 👍 1  🔁 0  💬 0  📌 0

(1/2) My summer REU students did such a great job! They used county-level data and a Bayesian conditional autoregressive (CAR) model to estimate the North Carolina counties' relative risks for COPD ED visits. Not an easy project for undergrads but they did it!

(cont...)

19.06.2025 23:09 | 👍 4  🔁 0  💬 1  📌 1
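As a rough illustration of what a CAR prior encodes, here is a minimal pure-Python sketch of the proper CAR specification phi ~ N(0, Q^-1) with Q = tau (D - rho W), where W is the area adjacency matrix and D holds each area's neighbour count. In a disease-mapping model the log relative risk of area i would typically be an intercept plus phi_i, with Poisson counts for the ED visits. The post does not say which CAR variant was used (proper CAR, ICAR, BYM, Leroux, ...), so this choice, the toy 4-area adjacency in the usage check, and all function names are assumptions rather than the students' model.

```python
import math


def car_precision(adj, tau=1.0, rho=0.9):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR prior."""
    n = len(adj)
    return [[tau * ((sum(adj[i]) if i == j else 0) - rho * adj[i][j])
             for j in range(n)] for i in range(n)]


def car_conditional(phi, adj, i, tau=1.0, rho=0.9):
    """Full conditional of phi_i given the other areas:
    mean = rho * (average of neighbours), variance = 1 / (tau * d_i)."""
    d_i = sum(adj[i])
    mean = rho * sum(adj[i][k] * phi[k] for k in range(len(phi))) / d_i
    return mean, 1.0 / (tau * d_i)


def is_positive_definite(Q):
    """Cholesky test: returns False as soon as a non-positive pivot appears."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

With rho = 1 the precision matrix becomes singular (the intrinsic CAR), which the Cholesky check reports as not positive definite; this is why ICAR priors are usually paired with a sum-to-zero constraint.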

Excited to give a talk for Merck Oncology this Friday! I'll be talking about my work with my former PhD student Zile Zhao on treatment switching in survival trials. I hope that the research team at Merck finds our statistical framework useful for their clinical trials!

17.06.2025 21:43 | 👍 1  🔁 0  💬 0  📌 0
Session 04.M2.172: Recent advances in high-dimensional modeling at the IISA 2025 Conference at the University of Nebraska-Lincoln is from 10:45 am-12:15 pm this Sunday, June 15.

Abstract of talk "Misspecified Yet Credible: A Generalized Bayes Framework for Uncertainty Quantification in High-Dimensional Bayesian Vector Autoregressive Models": Vector autoregressive (VAR) models are widely used to capture linear dependencies in multivariate time series, yet Bayesian inferential theory for high-dimensional VAR remains largely undeveloped. We propose a generalized Bayes framework that automatically adapts to sparsity and is robust to misspecification of both the error distribution and covariance structure. Under mild regularity conditions, we show that this approach yields reliable uncertainty quantification for the VAR transition matrices in very high dimensions. As a corollary, the same strategy also delivers valid inference for sparse high-dimensional stochastic regressions with serially correlated errors.

If anyone is attending the IISA 2025 Conference at U. Nebraska-Lincoln this week, check out Session 04.M2.I72: Recent advances in high-dimensional modeling on Sun., Jun 15, 10:45 am-12:15 pm! My recent work on Bayesian VAR models (w/ Partha Sarkar of FSU) will be presented then!

12.06.2025 03:08 | 👍 2  🔁 1  💬 0  📌 0
Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm - Statistics and Computing High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. To properly account for dependence between longitudinal observations, statistical methods for high-dimens...

Our paper "Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm" in Statistics and Computing is now available online! Read at the link below. 👇

Link: doi.org/10.1007/s112...

02.06.2025 16:24 | 👍 1  🔁 0  💬 0  📌 0