there's a funny one of these for spectra of certain objects: in one 'usual' setting, "strong convergence" implies "weak convergence" (as you might expect), but there is also another setting in which it doesn't (unforgivable).
do you ever write an insane math sentence? like i just wrote "is a convex path component a path component that is convex?"
Quite neat - answers some questions I had (at least in one dimension):
arxiv.org/abs/2603.08939
'Shape-constrained density estimation with Wasserstein projection'
- Takeru Matsuda, Ting-Kam Leonard Wong
In somebody else's words:
Depends on exactly what you mean, but I think both, i.e. valid for sample mean with sample variance and population mean with population variance.
Seeing the sights at the Bristol Art Gallery:
Yeah, it's a strange framing - makes perfect sense for general random variables, as you say.
One application: let P and Q be probability measures with P absolutely continuous with respect to Q and density ratio dP/dQ < M. Then the chi-squared divergence satisfies
Chi^2 (P, Q) < M - 1,
which can be appreciably less than the M^2 / 4 which one would get from Popoviciu's inequality (though there are other ways to prove this).
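A quick numerical sanity check of the bound, on a pair of made-up discrete distributions (a sketch; the specific distributions are invented for illustration):

```python
import numpy as np

# Made-up discrete distributions on {0, 1, 2} with bounded density ratio.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.3, 0.5])
ratio = p / q                  # dP/dQ pointwise
M = ratio.max()                # here M is approximately 3

# Chi-squared divergence: E_Q[(dP/dQ)^2] - 1.
chi2 = np.dot(q, ratio**2) - 1.0

assert chi2 <= M - 1.0         # the bound above
assert M - 1.0 <= M**2 / 4.0   # M - 1 is never worse than Popoviciu's M^2/4
```

(The second assertion reflects that M^2/4 - (M - 1) = (M - 2)^2/4 >= 0, with equality only at M = 2.)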
Didn't know that this one had a name, but it's a cute one. There's a nice "polynomial-style" proof (cf. SoS hierarchies etc.), a nice "martingale-style" proof, and probably some others too. Neat!
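For reference, the "polynomial-style" proof in a few lines, for a random variable X taking values in [m, M]:

```latex
% Popoviciu: for m \le X \le M,  Var(X) \le (M - m)^2 / 4.
% SoS-style proof: (M - X)(X - m) \ge 0 pointwise, so
\begin{align*}
0 &\le \mathbb{E}[(M - X)(X - m)]
   = (M + m)\,\mathbb{E}[X] - mM - \mathbb{E}[X^2], \\
\mathrm{Var}(X) &= \mathbb{E}[X^2] - \mathbb{E}[X]^2
   \le (M + m)\,\mathbb{E}[X] - mM - \mathbb{E}[X]^2 \\
  &= \frac{(M - m)^2}{4} - \left(\mathbb{E}[X] - \frac{M + m}{2}\right)^2
   \le \frac{(M - m)^2}{4}.
\end{align*}
```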
Thanks!
I guess the thinking was that a specific example might be more interesting than the general phenomenon of convex / concave as sup / inf of linear.
Sure, yeah.
An odd miscellaneum: viewed as a functional of the data-generating process, the Bayes risk is concave.
Corollary: variance and mean absolute deviation are concave functions of the measure in question.
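A quick check of the corollary for variance on a toy mixture (a sketch; the discrete measures here are made up):

```python
import numpy as np

def variance(support, probs):
    mean = np.dot(support, probs)
    return np.dot((support - mean) ** 2, probs)

x = np.array([0.0, 1.0, 2.0])
p = np.array([0.7, 0.2, 0.1])   # measure P
q = np.array([0.1, 0.2, 0.7])   # measure Q

lam = 0.3
mix = lam * p + (1 - lam) * q   # the mixture measure lam*P + (1-lam)*Q

# Concavity in the measure:
# Var(lam*P + (1-lam)*Q) >= lam*Var(P) + (1-lam)*Var(Q).
assert variance(x, mix) >= lam * variance(x, p) + (1 - lam) * variance(x, q)
```

This is an instance of the inf-of-linear framing: Var(P) = inf over c of E_P[(X - c)^2], an infimum of functionals linear in P, hence concave in P.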
".. and our host: Danieeeel Stroock!"
With that being said, the substance of the proof is much as it ever was, and the quantitative part is easy to extract.
The more 'modern' version of the result then appears in Stroock's book on Large Deviations with Deuschel, and again in a collaboration of Deuschel-Holley-Stroock.
1) The 'usual' lemma does not appear as a stand-alone result in the original paper, but is rather used as part of another lemma (5.1, for posterity).
2) The usual lemma is also quantitative in character, whereas in this initial paper, it is only used qualitatively.
Something which I finally got around to: figuring out the origins of "the Holley-Stroock perturbation lemma" (in the context of functional inequalities). This seems self-explanatory enough, since the relevant 1987 paper of Holley and Stroock is relatively well-documented. Still, some surprises:
(not that i'll be dropping £170 any time soon ...)
do mine eyes deceive me? a release date in the present year?
Somehow, theorems about P_{\hat{\theta}} should 'always' be viable, not depend on parameterisations, etc., which seems ideal on the theory side.
Ah, sure; this isn't really one of those. This is coming out of teaching a statistical theory course, and trying to sound out what sort of theorems are "natural". Increasingly, I see that I want to prove things about e.g. P_{\hat{\theta}} rather than just \hat{\theta}, which is more usual.
*- any functional of the data-generating process is a parameter
A byproduct of this framing is that it (not necessarily beneficially) precludes the existence of certain types of non-identifiability.
My relationship with "parameters" in statistical models has passed through approximately three phases:
1) parameters are convenient ways of thinking about models
2) parameters are a red herring; focus directly on the data-generating process
3) parameters are good, because anything* is a parameter.
Let me be explicit in highlighting Rocco as leading the work on this project and doing an excellent job - very talented junior researcher, and well worth keeping on your radar for all the expected reasons.
A little part of the paper which I like a lot is expressing the main algorithm (MTM) as an approximation to an approximation of a certain 'ideal' algorithm. Among other things, this 'twice-approximated' perspective helps to pin down which of the approximations is more problematic / delicate.
Some work freshly published at EJS: projecteuclid.org/journals/ele...
'Analysis of Multiple-try Metropolis via Poincaré inequalities'
- Rocco Caprio, Sam Power, Andi Q. Wang
We conduct a convergence analysis of a specific class of MCMC procedures based on multiple-proposal strategies.
Splendid!
It has a name (the Cayley transform), but its relevance to this problem is not a priori clear; e.g. it doesn't obviously extend or generalise to cube roots.
Generalisations come relatively easily when they are also expressible as an LP / convex program (so e.g. maximising worst-case power over some set is good, while maximising best-case power is not as straightforward; adding in extra significance-type constraints is usually fine, etc.).
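As a concrete sketch of the worst-case version (all distributions here are invented for illustration): maximising the worst-case power of a randomised test over a finite set of alternatives, subject to a level constraint, written as an LP via an epigraph variable.

```python
import numpy as np
from scipy.optimize import linprog

# Finite sample space of size n; null p0 and a small set of alternatives.
p0 = np.array([0.4, 0.3, 0.2, 0.1])
alts = np.array([[0.1, 0.2, 0.3, 0.4],
                 [0.1, 0.1, 0.4, 0.4]])
alpha = 0.05
n, k = p0.size, alts.shape[0]

# Variables: phi(x) in [0, 1] for each x, plus an epigraph variable t.
# Maximise t  s.t.  t <= sum_x phi(x) p_j(x)  for each alternative j,
#                   sum_x phi(x) p0(x) <= alpha  (level constraint).
c = np.zeros(n + 1)
c[-1] = -1.0                                      # linprog minimises -t
A_ub = np.vstack([
    np.hstack([-alts, np.ones((k, 1))]),          # t - phi . p_j <= 0
    np.hstack([p0, [0.0]]),                       # phi . p0 <= alpha
])
b_ub = np.concatenate([np.zeros(k), [alpha]])
bounds = [(0.0, 1.0)] * n + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
phi, worst_case_power = res.x[:n], res.x[-1]
```

Swapping max-min for max-max breaks this: the best-case power is a maximum of linear functionals, i.e. convex, and maximising a convex function over the polytope of tests is no longer an LP.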