Philipp Petersen

@pc-pet.bsky.social

Professor at U Vienna. Math of machine learning. Other math, too.

48 Followers  |  58 Following  |  7 Posts  |  Joined: 25.11.2024

Latest posts by pc-pet.bsky.social on Bluesky


In spiking neural networks, neurons communicate - as in the brain - via short electrical pulses ⚡ (spikes). But how can we formally quantify the (dis)advantages of using spikes? 🤔

In our new preprint, @pc-pet.bsky.social and I introduce the concept of "Causal Pieces" to approach this question!

02.05.2025 08:06 | 👍 33    🔁 6    💬 1    📌 0

What happens if you lose $10 per share in one week and gain $10 per share the next, alternating for 52 weeks ;)? Is the effect stronger if you replace $10 with $20?

12.04.2025 20:15 | 👍 0    🔁 0    💬 0    📌 0
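A quick sketch of the arithmetic behind the puzzle, under two readings. The $100 starting price and the percentage reading are assumptions for illustration; the post only specifies dollar amounts.

```python
def flat(price, step, weeks=52):
    """Lose `step` dollars one week, gain `step` the next."""
    for w in range(weeks):
        price += step if w % 2 else -step
    return price

def percent(price, rate, weeks=52):
    """Same alternating pattern, but as a fraction of the current price."""
    for w in range(weeks):
        price *= (1 + rate) if w % 2 else (1 - rate)
    return price

start = 100.0                       # assumed starting share price
print(flat(start, 10))              # flat moves cancel exactly: 100.0
for r in (0.10, 0.20):
    # each down/up pair multiplies the price by (1-r)(1+r) = 1 - r**2,
    # so the loss compounds; doubling r quadruples the per-pair drag
    print(round(percent(start, r), 2))
```

In the flat-dollar reading the moves cancel, but in the percentage reading each pair leaves a factor 1 - r², so 52 weeks at ±10% end well below the start, and ±20% far lower still.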

The latest version includes:

✅ Significantly fewer typos
✅ More illustrations and figures
✅ Reorganized sections for better clarity
✅ Sharpened and improved arguments

08.04.2025 10:44 | 👍 2    🔁 0    💬 0    📌 0

After receiving very helpful feedback from the community, Jakob Zech and I have revised our graduate textbook:

📘 "Mathematical Theory of Deep Learning"

and uploaded the new version to arXiv:

🔗 arxiv.org/abs/2407.18384

If you have already read it, or plan to, we would really appreciate your feedback.

08.04.2025 10:44 | 👍 3    🔁 0    💬 1    📌 0

Great point in principle, but you seem to be making it at the ideal altitude and in the ideal season.

31.03.2025 08:42 | 👍 1    🔁 0    💬 0    📌 0

Other countries will only want to hire the top researchers, who account for only a small share of the budget.

31.03.2025 08:38 | 👍 0    🔁 0    💬 0    📌 0

๐Ÿ” Key insights:
* The singular values of the query-key matrix product are the most critical parameters for tracking stability.
* Self-attention and softmax operations are the worst offenders for error amplification.
* There are stable (and unstable) methods for normalization.

14.03.2025 07:05 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Post image: Behavior of the relative error for increasing spectral norm of the key and query matrices

Would you expect an LLM using over 100 billion floating-point operations in low precision to produce accurate outputs? Not if you have taken an introductory class on numerics. How bad can things get? To find out, we carried out a numerical stability analysis of the transformer: arxiv.org/abs/2503.10251.

14.03.2025 07:05 | 👍 3    🔁 0    💬 1    📌 0
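A minimal sketch of the last key insight above (stable vs. unstable normalization), not the paper's experiment: in float16, exponentiating attention scores naively overflows once logits exceed roughly 11 (float16 max ≈ 65504 ≈ e^11.09), while the standard max-shift softmax stays finite. The matrix sizes, Gaussian scores, and scale factors below are assumptions for illustration.

```python
import numpy as np

def softmax_naive(S):
    """Unstable: exp overflows to inf in float16 for logits above ~11."""
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

def softmax_shifted(S):
    """Stable: subtracting the row max keeps every exponent <= 0."""
    S = S - S.max(axis=-1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 64
results = {}
with np.errstate(over="ignore", invalid="ignore"):
    for scale in (1.0, 4.0):        # logit magnitude grows like scale**2
        Q = (scale * rng.standard_normal((n, d))).astype(np.float16)
        K = (scale * rng.standard_normal((n, d))).astype(np.float16)
        S = (Q @ K.T) / np.float16(np.sqrt(d))
        results[scale] = (bool(np.isfinite(softmax_naive(S)).all()),
                          bool(np.isfinite(softmax_shifted(S)).all()))
        print(scale, results[scale])
```

For small scores both versions agree; once the spectral norm of the key and query matrices grows, the naive version produces inf/nan while the shifted one remains finite, which is the kind of precision-dependent blow-up the stability analysis quantifies.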
