Krishnakumar Balasubramanian, Nathan Ross
Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
https://arxiv.org/abs/2507.12686
link to paper:
arxiv.org/abs/2505.15059
New theory for simulated tempering, based on a restricted spectral gap, that works with arbitrary local MCMC samplers under multi-modality.
When applied to the simulated tempering Metropolis-Hastings algorithm for sampling from Gaussian mixture models, we obtain high-accuracy TV guarantees.
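For intuition, here is a minimal sketch of simulated tempering with random-walk Metropolis local moves on a toy 1-D Gaussian mixture. The temperature ladder, step size, and uniform pseudo-priors are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multimodal target: equal mixture of N(-4, 1) and N(4, 1).
def log_target(x):
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = [1.0, 0.5, 0.25, 0.1]  # inverse-temperature ladder (illustrative)

def simulated_tempering_mh(n_iter=50_000, step=1.0):
    x, k = 0.0, 0  # current state and ladder level
    cold_samples = []
    for _ in range(n_iter):
        # Local move: random-walk Metropolis targeting pi(x)^beta_k.
        y = x + step * rng.normal()
        if np.log(rng.uniform()) < betas[k] * (log_target(y) - log_target(x)):
            x = y
        # Temperature swap: propose an adjacent level; uniform pseudo-priors
        # are assumed here (in practice these weights must be tuned).
        j = k + rng.choice([-1, 1])
        if 0 <= j < len(betas):
            if np.log(rng.uniform()) < (betas[j] - betas[k]) * log_target(x):
                k = j
        if k == 0:
            cold_samples.append(x)  # keep only draws at beta = 1
    return np.array(cold_samples)

samples = simulated_tempering_mh()
```

The hot levels flatten the mixture so the chain crosses between modes, while only the beta = 1 draws are kept as samples from the target.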
New work on the Riemannian Proximal Sampler, for sampling on Riemannian manifolds:
arxiv.org/abs/2502.07265
It comes with high-accuracy guarantees (i.e., complexity scaling as log(1/eps), where eps is the tolerance) under both exact and inexact oracles for manifold Brownian increments and Riemannian heat kernels.
We implement these oracles using heat-kernel truncation and Varadhan's asymptotics; the latter links our method to an entropy-regularized proximal point method on Wasserstein space.
Joint work with Yunrui Guan and @shiqianma.bsky.social
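To see the oracle structure concretely, here is a minimal Euclidean sketch of the proximal sampler template (my simplification, not the paper's Riemannian algorithm): it alternates a Brownian/heat-flow step with a "restricted" draw. On a manifold, the Gaussian increment becomes a manifold Brownian increment and the restricted step uses the Riemannian heat kernel; the target, eta, and the rejection-sampling oracle below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):
    # Toy target (stand-in for exp(-f)); maximized at 0, which the
    # rejection step below exploits.
    return -0.5 * np.sum(x ** 2)

def proximal_sampler(x0, eta=0.25, n_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # Forward step: Brownian increment (heat flow run for time eta).
        y = x + np.sqrt(eta) * rng.normal(size=x.shape)
        # Backward step: draw x ~ pi(x) * exp(-|x - y|^2 / (2 eta)) by
        # rejection, proposing from N(y, eta * I) and accepting with
        # probability pi(z) / max pi (here log_pi(0) = 0 is the max).
        while True:
            z = y + np.sqrt(eta) * rng.normal(size=x.shape)
            if np.log(rng.uniform()) < log_pi(z):
                x = z
                break
    return x

print(proximal_sampler(np.zeros(2)))
```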
Happy to have this paper on Improved rates for Stein Variational Gradient Descent accepted as an oral presentation at #ICLR2025
arxiv.org/abs/2409.08469
Only theory, no deep learning (although the techniques are useful for DL), and no experiments, in this time of scale and AGI :)
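For readers new to SVGD, here is a minimal sketch of the vanilla update that the rates concern; the Gaussian target, RBF bandwidth, and step size are illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_log_p(x):
    return -x  # score of a standard Gaussian target (illustrative)

def rbf_kernel_and_grad(X, h=1.0):
    # k(x, x') = exp(-|x - x'|^2 / (2h)) and its gradient in the first slot.
    diff = X[:, None, :] - X[None, :, :]        # (n, n, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))
    gradK = -diff / h * K[:, :, None]           # grad_{x_j} k(x_j, x_i)
    return K, gradK

def svgd(X, steps=500, eps=0.1, h=1.0):
    for _ in range(steps):
        K, gradK = rbf_kernel_and_grad(X, h)
        # phi_i = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]
        phi = (K @ grad_log_p(X) + gradK.sum(axis=0)) / X.shape[0]
        X = X + eps * phi
    return X

particles = svgd(3.0 * rng.normal(size=(100, 2)))
```

The first term drives particles toward high-density regions; the second is the kernel repulsion that keeps them spread out.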
Our bounds show how key factors, like the number of matches and treatment balance, impact Gaussian approximation accuracy.
We also introduce multiplier bootstrap bounds for obtaining finite-sample valid, data-driven confidence intervals.
Matching-based ATE estimators align treated and control units to estimate causal effects without strong parametric assumptions.
Using the Malliavin-Stein method, we establish Gaussian approximation bounds for these estimators.
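A minimal sketch of the pipeline on synthetic data: a 1-nearest-neighbor matching ATE estimator plus a Gaussian-multiplier bootstrap over centered unit-level terms. This is an illustration only; the paper's bootstrap weights account for how often each unit is reused as a match, which this simplification omits.

```python
import numpy as np

rng = np.random.default_rng(3)

def matching_ate(X, Y, W):
    """1-NN matching with replacement on covariates X (n, d)."""
    n = len(Y)
    psi = np.empty(n)
    for i in range(n):
        opp = np.where(W != W[i])[0]                      # opposite-arm units
        j = opp[np.argmin(np.sum((X[opp] - X[i]) ** 2, axis=1))]
        psi[i] = (2 * W[i] - 1) * (Y[i] - Y[j])           # signed contrast
    return psi.mean(), psi

def multiplier_bootstrap_ci(psi, tau_hat, B=2000, alpha=0.05):
    """Gaussian-multiplier bootstrap over centered unit-level terms psi_i."""
    n = len(psi)
    xi = rng.normal(size=(B, n))
    taus = tau_hat + (xi @ (psi - psi.mean())) / n
    lo, hi = np.quantile(taus, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic example with true ATE = 1.
n, d = 500, 3
X = rng.normal(size=(n, d))
W = rng.integers(0, 2, size=n)
Y = X[:, 0] + W * 1.0 + rng.normal(size=n)
tau_hat, psi = matching_ate(X, Y, W)
print(tau_hat, multiplier_bootstrap_ci(psi, tau_hat))
```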
Got this paper out in 2024, just in time before AGI takes over in 2025:
arxiv.org/abs/2412.17181
We develop Gaussian approximation bounds and non-asymptotically valid confidence intervals for matching-based Average Treatment Effect (ATE) estimators.
It seems that OpenAI's latest model, o3, can solve 25% of problems on a database called FrontierMath, created by EpochAI, where previous LLMs could only solve 2%. On Twitter I am quoted as saying, "Getting even one question right would be well beyond what we can do now, let alone saturating them."
20.12.2024 23:15
Von Neumann: With 4 parameters, I can fit an elephant. With 5, I can make it wiggle its trunk.
OpenAI: Hold my gazillion parameter Sora model - I'll make the elephant out of leaves and teach it to dance.
youtu.be/4QG_MGEBQow?...
thanks, resent the email now!
03.12.2024 01:11
@iclr-conf.bsky.social Would greatly appreciate any guidance on what to do if the reviewer, AC, and PC did not respond. Thanks a lot!
cc:
@yisongyue.bsky.social
How well does RF perform in these settings? That's still an open question.
Bottom line: time to compare SGD-trained NNs with RF, not kernel methods!
Going beyond the mean-field regime for SGD-trained NNs certainly helps. Recent works connect the learnability of SGD-trained NNs to the leap complexity and information exponent of function classes (like single- and multi-index models), with the goal of explaining feature learning.
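As a concrete example of one of these quantities: for a single-index model y = sigma(<w, x>), the information exponent is the index of the first nonzero Hermite coefficient of sigma under a standard Gaussian. A quick numerical check (illustrative, using probabilists' Hermite polynomials):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def information_exponent(sigma, max_k=8, tol=1e-8):
    """Index of the first k >= 1 with E[sigma(Z) He_k(Z)] != 0, Z ~ N(0,1)."""
    z, w = He.hermegauss(100)          # Gauss-HermiteE quadrature nodes/weights
    w = w / np.sqrt(2.0 * np.pi)       # normalize weights to E under N(0,1)
    for k in range(1, max_k + 1):
        He_k = He.hermeval(z, [0.0] * k + [1.0])  # evaluate He_k at the nodes
        if abs(np.sum(w * sigma(z) * He_k)) > tol:
            return k
    return None

print(information_exponent(np.tanh))          # 1: tanh has a linear component
print(information_exponent(lambda t: t ** 2)) # 2: purely even, no linear part
```

Roughly, a larger information exponent means the relevant direction carries weaker low-order signal, and SGD needs more samples to find it.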
27.11.2024 15:07
It also creates an intriguing parallel with NNs: greedy-trained partitioning models and SGD-trained NNs (in the mean-field regime) both thrive under specific structural assumptions (e.g., MSP) but struggle otherwise.
However, under MSP, greedy RFs are provably better than SGD-trained 2-NNs!
In our work:
arxiv.org/abs/2411.04394
we show that if the true regression function satisfies MSP, greedy training works well with O(log d) samples.
Otherwise, it struggles.
This settles the question of learnability for greedy recursive partitioning algorithms like CART.
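As a cartoon of what "greedy training" means here, a minimal CART-style splitter on binary features (illustrative, not the paper's exact algorithm). On a staircase function like x1 + x1*x2, the root split on x1 already has positive marginal gain, whereas a bare parity x1*x2 has zero marginal gain at the root, which is why greedy splitting stalls without MSP-type structure:

```python
import numpy as np

def best_split(X, y):
    """Greedy CART split: pick the coordinate whose axis-aligned split
    most reduces total squared error (binary features for simplicity)."""
    best_j, best_gain = None, 0.0
    base = np.var(y) * len(y)
    for j in range(X.shape[1]):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = base - (np.var(left) * len(left) + np.var(right) * len(right))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j

def grow_tree(X, y, depth=0, max_depth=3):
    j = best_split(X, y) if depth < max_depth and len(y) > 1 else None
    if j is None:
        return y.mean()  # leaf prediction
    mask = X[:, j] == 1
    return (j, grow_tree(X[~mask], y[~mask], depth + 1, max_depth),
               grow_tree(X[mask], y[mask], depth + 1, max_depth))

# Example: y = x1 + x1*x2 (a staircase function) over random binary inputs.
rng = np.random.default_rng(4)
X = rng.integers(0, 2, size=(200, 10))
y = X[:, 0] + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)
tree = grow_tree(X, y.astype(float))
```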
MSP is used to argue that SGD-trained 2-layer NNs are better than vanilla kernel methods.
But how do neural nets compare with random forests (RF) trained using greedy algorithms like CART?
How do we characterize the learnability of local algorithms?
The Merged Staircase Property (MSP), proposed by Abbe et al. (2022), is used to completely characterize the learnability of SGD-trained 2-layer neural networks (NNs) in the regime where the mean-field approximation holds for SGD.
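A concrete way to read the definition (my paraphrase of Abbe et al.'s condition, simplified): a sparse polynomial satisfies MSP if its monomial supports can be ordered so that each monomial introduces at most one coordinate not already seen. A greedy check suffices:

```python
def has_msp(monomials):
    """Check the merged staircase property: the supports can be ordered so
    each monomial adds at most one new coordinate to the running union.
    Greedy is valid here: taking any currently-eligible set never hurts,
    since the union of seen coordinates only grows."""
    remaining, seen = [set(S) for S in monomials], set()
    while remaining:
        for S in remaining:
            if len(S - seen) <= 1:
                seen |= S
                remaining.remove(S)
                break
        else:
            return False  # every remaining monomial needs >= 2 new coordinates
    return True

# x1 + x1*x2 + x1*x2*x3 satisfies MSP; a bare parity x1*x2 does not.
print(has_msp([{1}, {1, 2}, {1, 2, 3}]))  # True
print(has_msp([{1, 2}]))                  # False
```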
add me please
Yes, but is the cover indicative of RL notations by any chance :P
24.11.2024 17:31