
Alex Shtoff

@alexshtf.bsky.social

Principal scientist @ TII. Visit my research blog at https://alexshtf.github.io

51 Followers  |  98 Following  |  125 Posts  |  Joined: 14.11.2024

Latest posts by alexshtf.bsky.social on Bluesky

Or distance from KKT conditions.

27.07.2025 17:56 | 👍 0  🔁 0  💬 0  📌 0

4) convergence of deviation from optimality conditions.

27.07.2025 17:00 | 👍 1  🔁 0  💬 1  📌 0

I heard that polynomials are the (complex) root of all evil.

26.06.2025 08:52 | 👍 1  🔁 0  💬 0  📌 0
Turnstile majority: A famous algorithm of Boyer and Moore for the majority problem finds a majority element in a stream of elements while storing only two values, a single tenta...

Nicely written blog post by David Eppstein on the Boyer–Moore (deterministic) streaming algorithm to find a majority element in a stream, and its extensions, first to the turnstile model, and then to frequency estimation (Misra–Gries).
11011110.github.io/blog/2025/05... via @theory.report
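For a quick refresher, here is a minimal sketch of the single-candidate Boyer–Moore majority vote the post builds on (toy data made up here; a second verification pass is needed when a majority isn't guaranteed):

```python
def majority_candidate(stream):
    """One-pass Boyer-Moore majority vote: O(1) memory, one candidate + counter."""
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1   # adopt a new tentative majority
        elif x == candidate:
            count += 1                # vote for the current candidate
        else:
            count -= 1                # an opposing vote cancels one out
    return candidate

votes = ["a", "b", "a", "c", "a", "a", "b", "a"]
cand = majority_candidate(votes)
assert votes.count(cand) > len(votes) // 2  # verification pass
print(cand)  # -> "a"
```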

06.05.2025 13:30 | 👍 18  🔁 3  💬 1  📌 0

The Matrix Mortality Problem asks whether some finite product of matrices drawn (with repetition) from a given set equals the zero matrix. It is undecidable for matrices of size 3x3 or larger. buff.ly/lLmvvlo
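Undecidability kicks in at 3x3, but for a concrete finite instance one can still search for a mortal product by brute force. A toy sketch, with 2x2 matrices made up purely for illustration:

```python
import numpy as np
from itertools import product

A = np.array([[0, 1], [0, 0]])  # nilpotent: A @ A == 0
B = np.array([[0, 0], [1, 0]])

def find_mortal_product(mats, max_len=5):
    """Return the first index sequence whose product is the zero matrix, if any."""
    for length in range(1, max_len + 1):
        for idx in product(range(len(mats)), repeat=length):
            P = np.eye(mats[0].shape[0], dtype=int)
            for i in idx:
                P = P @ mats[i]
            if not P.any():           # all entries are zero
                return idx
    return None

print(find_mortal_product([A, B]))  # -> (0, 0), i.e. A @ A == 0
```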

01.05.2025 05:01 | 👍 6  🔁 3  💬 1  📌 0

Attending #ICLR2025?
Visit our poster!
A stochastic approach to the subset selection problem via mirror descent.
Today, 3pm, poster #336.
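The poster's actual algorithm isn't reproduced here; as a generic illustration of the ingredient in the title only, here is mirror descent on the probability simplex (the exponentiated-gradient update), a common way to optimize a stochastic relaxation of a subset-selection objective. The toy per-item losses are made up:

```python
import numpy as np

def mirror_descent_step(p, grad, lr=0.1):
    """Exponentiated gradient = mirror descent with the negative-entropy mirror map."""
    w = p * np.exp(-lr * grad)
    return w / w.sum()                  # staying on the simplex is just a rescale

rng = np.random.default_rng(0)
d = 10
losses = rng.standard_normal(d)         # toy per-item losses
p = np.full(d, 1.0 / d)                 # start from the uniform distribution

for _ in range(200):
    p = mirror_descent_step(p, losses)  # gradient of the linear loss <losses, p>

print(p.round(3))  # mass concentrates on the lowest-loss items
```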

26.04.2025 01:59 | 👍 0  🔁 0  💬 0  📌 0

Indistinguishable from magic*

24.04.2025 03:22 | 👍 1  🔁 0  💬 0  📌 0

I don't care who uses Meta services. I don't like it when people invent imaginary threats to their privacy and spread them. Don't want to use them? Don't. Want to opt out and keep using them? Don't be afraid they won't comply. Meta is afraid of the legal and reputational consequences. That's my opinion.

19.04.2025 14:47 | 👍 0  🔁 0  💬 1  📌 0

I used to work for Yahoo. Not a giant like Meta, but it also uses plenty of user data to make money. Not complying with regulation was always a big no-no. These companies are very afraid of the legal and reputational consequences, so I wouldn't worry that they won't comply.

19.04.2025 14:28 | 👍 0  🔁 0  💬 1  📌 0

The phenomenal paper "Epigraphical Analysis" by Attouch and Wets was the basis for my Ph.D. thesis. It was fun digging deep into epi-convergence.

18.04.2025 06:37 | 👍 2  🔁 0  💬 0  📌 0

Yes, I understand. They should cite and criticize it.

12.04.2025 12:20 | 👍 0  🔁 0  💬 1  📌 0

Why not simply cite the directly relevant prior work?

12.04.2025 11:52 | 👍 0  🔁 0  💬 1  📌 0

A question to the #math people here. For differential equations there are spectral methods that find approximate solutions in the span of orthogonal bases. Is there a variant for difference equations, and bases of sequences? A good tutorial maybe?
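For the differential-equation side the question starts from, a minimal spectral-collocation sketch in a Chebyshev basis (toy problem u' = u, u(0) = 1 on [-1, 1], chosen only for illustration):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

deg = 12
# Differentiation matrix in coefficient space: column j holds the
# Chebyshev coefficients of T_j'(x).
D = np.zeros((deg + 1, deg + 1))
for j in range(deg + 1):
    e = np.zeros(deg + 1)
    e[j] = 1.0
    d = C.chebder(e)
    D[:len(d), j] = d

x = np.cos(np.pi * np.arange(1, deg + 1) / (deg + 1))  # interior collocation points
V = C.chebvander(x, deg)                               # T_j evaluated at x

# Equations: u'(x_k) - u(x_k) = 0 at each collocation point, plus u(0) = 1.
A = np.vstack([V @ D - V, C.chebvander(np.array([0.0]), deg)])
b = np.concatenate([np.zeros(len(x)), [1.0]])
coef, *_ = np.linalg.lstsq(A, b, rcond=None)

xs = np.linspace(-1, 1, 101)
print(np.max(np.abs(C.chebval(xs, coef) - np.exp(xs))))  # ~1e-12: spectral accuracy
```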

12.04.2025 06:42 | 👍 0  🔁 0  💬 0  📌 0

The Tarski–Seidenberg theorem in logical form states that the set of first-order formulas over the real numbers is closed under quantifier elimination. This means any formula with quantifiers can be converted into an equivalent quantifier-free formula. perso.univ-rennes1.fr/michel.coste...
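A textbook instance of what quantifier elimination produces (a standard example, not taken from the linked notes):

```latex
% Existence of a real root of a monic quadratic, quantifier eliminated:
\[
  \exists x \in \mathbb{R} :\; x^2 + b x + c = 0
  \quad\Longleftrightarrow\quad
  b^2 - 4c \ge 0.
\]
```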

01.04.2025 05:00 | 👍 6  🔁 2  💬 0  📌 0

🚨New post🚨

@beenwrekt.bsky.social recently stirred up a bit of noise with his post about the nonexistence of overfitting, but he has a point. In this post we explore it using simple polynomial curve fitting, *without regularization*, using another interesting basis.

alexshtf.github.io/2025/03/27/F...

31.03.2025 13:22 | 👍 1  🔁 0  💬 0  📌 0

What makes it a method for "fine-tuning LLMs" rather than a method for fine-tuning any neural network in general?

28.03.2025 09:30 | 👍 0  🔁 0  💬 1  📌 0

Is it true that log(1+exp(x)) is the infimum of the quadratic upper bound over a?
If so, it also has interesting consequences.
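The quadratic bound from the (missing) image isn't reproduced here; a standard candidate with exactly this property is the Jaakkola–Jordan bound, which upper-bounds the logistic loss by a quadratic in x and is tight at x = ±a, so its infimum over a does recover log(1+e^x):

```latex
\[
  \log(1 + e^{x}) \;\le\; \lambda(a)\,(x^{2} - a^{2}) + \tfrac{x - a}{2} + \log(1 + e^{a}),
  \qquad
  \lambda(a) = \frac{1}{4a}\tanh\!\Big(\frac{a}{2}\Big),
\]
```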

24.03.2025 17:39 | 👍 0  🔁 0  💬 0  📌 0

Or maybe there's a cultural difference: Black borrowers may be more afraid of not repaying a loan, and may do extreme things, such as using the last of their savings, to repay it.

This paper seems to focus too much on estimation and ignores the complexities of modeling.

12.03.2025 11:42 | 👍 1  🔁 0  💬 0  📌 0

As beautiful as I can remember it.

02.03.2025 18:56 | 👍 1  🔁 0  💬 0  📌 0

You can get away without - theory papers.

26.02.2025 09:17 | 👍 1  🔁 0  💬 0  📌 0

Models reflect training data. Training data reflects people.
Models are just fancy autocomplete.

People have free will, and are not fancy autocomplete of what a model has shown them.

15.02.2025 14:29 | 👍 0  🔁 0  💬 0  📌 0

The numpy function there doesn't use SGD. To the best of my knowledge, it uses QR decomposition.
Anyway, things get interesting when the degree becomes 200 :)

14.02.2025 19:21 | 👍 0  🔁 0  💬 0  📌 0

Linear regression with Legendre polynomials:
colab.research.google.com/drive/1phA7N...

Inspired by Ben's post about nonexistent overfitting; made to convince my coworkers.
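The notebook itself is behind the link; a minimal sketch of the same idea, assuming a setup like the post describes (far more Legendre basis functions than data points, plain least squares, no regularization):

```python
import numpy as np

rng = np.random.default_rng(0)
n, degree = 100, 200                               # degree exceeds the sample size
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

V = np.polynomial.legendre.legvander(x, degree)    # columns are P_0(x), ..., P_degree(x)
coef, *_ = np.linalg.lstsq(V, y, rcond=None)       # minimum-norm solution (underdetermined)

xs = np.linspace(-1, 1, 500)
pred = np.polynomial.legendre.legval(xs, coef)     # the fit stays tame despite degree 200
```

The detail doing the work: for an underdetermined system, lstsq returns the minimum-norm coefficient vector, and in a well-conditioned orthogonal basis that solution need not oscillate wildly even at degree 200.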

14.02.2025 18:08 | 👍 3  🔁 0  💬 1  📌 0

Stochastic people.

10.02.2025 11:00 | 👍 0  🔁 0  💬 0  📌 0

But, somehow, there is a "uniform prior" over the integers, according to some people I met :)
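For context, the standard argument: countable additivity rules out assigning the same mass c to every integer, since then

```latex
\[
  P(\mathbb{Z}) \;=\; \sum_{n \in \mathbb{Z}} c \;=\;
  \begin{cases}
    0, & c = 0,\\
    +\infty, & c > 0,
  \end{cases}
\]
```

which can never equal 1; "uniform priors" over the integers only make sense as improper priors that drop normalization.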

10.02.2025 10:51 | 👍 1  🔁 0  💬 1  📌 0

People underappreciate work that "just works" with current software stacks.
Great paper!

07.02.2025 14:10 | 👍 1  🔁 0  💬 1  📌 0

For example, spectral methods are good at solving problems where they yield a linear system in the coefficients.

01.02.2025 15:50 | 👍 1  🔁 0  💬 0  📌 0

Isn't it like this in, say, numerical differential equations?

01.02.2025 15:42 | 👍 1  🔁 0  💬 1  📌 0
Relatively-Smooth Convex Optimization by First-Order Methods, and Applications: The usual approach to developing and analyzing first-order methods for smooth convex optimization assumes that the gradient of the objective function is uniformly smooth with some Lipschitz constant $...

There are these two well-known papers:
pubsonline.informs.org/doi/abs/10.1...
arxiv.org/abs/1610.05708
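For reference, the notion both papers develop (notation varies between them): f is L-smooth relative to a reference function h when

```latex
\[
  f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x),
  \qquad
  D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,
\]
```

where D_h is the Bregman divergence of h; taking h(x) = ||x||^2/2 recovers the usual Lipschitz-gradient condition.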

28.01.2025 14:59 | 👍 2  🔁 0  💬 1  📌 0

I love this distribution you sample from.

28.01.2025 11:11 | 👍 1  🔁 0  💬 0  📌 0
