
Pavel Veselý

@pavelvesely.bsky.social

Computer scientist at Charles University, Prague 🇨🇿 I like all kinds of efficient algorithms and data structures for large datasets || also ⛰️🇺🇦 https://iuuk.mff.cuni.cz/~vesely/

56 Followers  |  51 Following  |  24 Posts  |  Joined: 10.12.2024

Latest posts by pavelvesely.bsky.social on Bluesky

Thanks for your interest! Unfortunately, we don't have such an online class, and it'll actually be in Czech.

04.10.2025 08:25 - 👍 0  🔁 0  💬 0  📌 0
DOD 2024: data sketching Z GB na kB: jak naskečovat velká data a neztratit při tom hlavu (ani patu) [From GB to kB: how to sketch big data without losing your head (or heel)] Pavel Veselý

I already gave such a talk last year, starting with some motivation and then talking about cardinality estimation (hiding many details). About 50 students attended, and some interacted at the beginning. Slides in Czech (hopefully easy to translate nowadays):
docs.google.com/presentation...

02.10.2025 13:23 - 👍 0  🔁 0  💬 0  📌 0

During the next two months, I will give two long talks about streaming algorithms / data sketching for high-school students. Have you given a similar talk? What was your experience?

02.10.2025 13:23 - 👍 1  🔁 0  💬 1  📌 0

Btw, Nick's profile: @nickmatsakis.bsky.social

16.09.2025 21:05 - 👍 0  🔁 0  💬 0  📌 0

The algorithm gives more than just an estimate for the diameter: Using the stored points, one can sqrt(2)+epsilon-approximate Furthest Neighbor queries or 1.22-approximate the Minimum Enclosing Ball. The approx. ratio for diameter is optimal but it's still open for MEB. Lots of nice open problems!

16.09.2025 21:04 - 👍 0  🔁 0  💬 0  📌 0

Specifically, for sqrt(2)+epsilon-approximation of diameter, the AS'10 algorithm stores O(1/epsilon^3 log(1/epsilon)) input points, and we managed to shave off one factor of 1/epsilon. Still, we can only prove a lower bound of 1/epsilon, and closing the gap is a nice open problem!
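Reading the post literally, the arithmetic behind "shaving off one factor of 1/epsilon" can be written out as below. This is only a restatement of the bounds quoted in the post (the AS'10 bound, the improved bound obtained by dividing it by 1/epsilon, and the known lower bound), not a statement taken from the paper itself.

```latex
\[
  \underbrace{O\!\left(\frac{1}{\varepsilon^{3}}\log\frac{1}{\varepsilon}\right)}_{\text{AS'10, stored points}}
  \;\longrightarrow\;
  \underbrace{O\!\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\varepsilon}\right)}_{\text{one factor of } 1/\varepsilon \text{ removed}},
  \qquad
  \text{lower bound: } \Omega\!\left(\frac{1}{\varepsilon}\right).
\]
```

The remaining gap between the upper and lower bounds is therefore roughly a factor of (1/epsilon) log(1/epsilon).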

16.09.2025 21:04 - 👍 0  🔁 0  💬 1  📌 0

The AS'10 algorithm covers points by blurred balls, and this approach overall works. Adding a few ideas, we have circumvented the issue in AS'10, slightly simplified the algorithm and its analysis, and improved the space bounds.

16.09.2025 21:04 - 👍 0  🔁 0  💬 1  📌 0
Streaming Algorithms for Extent Problems in High Dimensions - Algorithmica We present (single-pass) streaming algorithms for maintaining extent measures of a stream S of n points in $\mathbb{R}^{d}$. We focus on designing streaming algorithms whose working space is polynom...

We in fact simplify a nice paper by Agarwal and Sharathkumar from SODA'10 and Algorithmica '15. Yet, despite being published in a decent journal, the paper appears to have a subtle flaw in its argument, and fixing it probably requires using more space...
link.springer.com/article/10.1...

16.09.2025 21:04 - 👍 0  🔁 0  💬 1  📌 0
Streaming Diameter of High-Dimensional Points We improve the space bound for streaming approximation of Diameter but also of Farthest Neighbor queries, Minimum Enclosing Ball and its Coreset, in high-dimensional Euclidean spaces. In particular, o...

Tomorrow at ESA: my former postdoc Nick Matsakis will present our streaming algorithm for diameter in high-dimensional spaces. Very simple: just 4 lines of pseudocode, and yet, achieving optimal approximation. Joint work with Magnús M. Halldórsson. arxiv.org/abs/2505.16720

16.09.2025 20:44 - 👍 3  🔁 0  💬 2  📌 0
On 9/16/25, celebrate a date of mathematical beauty Pythagorean Triple Square Day, as one man affectionately calls 9/16/25, is a day like no other this century.

Pythagorean Triple Square Day, as one man affectionately calls 9/16/25, is a day like no other this century.
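For readers wondering why the date qualifies, the arithmetic is short. This spells out the name as it reads here, with the month, day, and two-digit year each a perfect square and together forming a Pythagorean relation; it is not quoted from the linked article.

```latex
\[
  9 = 3^{2}, \quad 16 = 4^{2}, \quad 25 = 5^{2},
  \qquad
  3^{2} + 4^{2} = 9 + 16 = 25 = 5^{2}.
\]
```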

16.09.2025 11:50 - 👍 1203  🔁 640  💬 23  📌 111

Zstandard's --long range mode works wonders for assemblies, but needs uninterrupted single line sequences.

*AllTheBacteria 661k, multiline fasta*
gzip (pigz): 751GB
zstandard --long: 641GB (30% original size)

*Single line fasta*
gzip (pigz): 700GB
zstandard --long: 232GB (10% original size)
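Since the numbers above hinge on sequences being stored on a single uninterrupted line, here is a minimal sketch of how a multiline FASTA could be unwrapped before compression. The function name `unwrap_fasta` and the demo records are hypothetical, chosen only for illustration; the post does not say which tool was actually used for the conversion.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []  # sequence fragments of the record currently being read
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if chunks:
                yield "".join(chunks)
                chunks = []
            yield line
        elif line:
            # Non-empty sequence line: remember it; blank lines are skipped.
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


# Hypothetical two-record input, wrapped at 4 characters.
demo = [">contig1 example\n", "ACGT\n", "TTGA\n", ">contig2\n", "GG\n", "\n", "CC\n"]
single = list(unwrap_fasta(demo))
print(single)
```

The idea is that line breaks inside a sequence interrupt otherwise identical stretches, so removing them gives a long-range matcher longer repeats to find. The output of such a script would then be piped into the compressor of choice.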

09.09.2025 10:27 - 👍 36  🔁 12  💬 2  📌 3

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬DNA sequencing data🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world's most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

03.09.2025 08:39 - 👍 217  🔁 118  💬 3  📌 16

At scale, the way that we store (and process) data matters! Many may think that the way we keep data, the file formats we adopt, and the way that we compress data are unimportant details, but they are, in fact, critical considerations to allow science to move forward at scale!

26.08.2025 14:45 - 👍 23  🔁 8  💬 1  📌 0
Some thoughts on journals, refereeing, and the P vs NP problem A guest post by Eric Allender prompted by an (incorrect) P ≠ NP proof recently published in Springer Nature's Frontiers of Computer Scie...

Springer publishes a P ≠ NP "proof" and Eric Allender has words to say.

blog.computationalco...

04.08.2025 18:08 - 👍 43  🔁 15  💬 2  📌 5

A monumental collaborative effort with many incredible people ☺️ Proud to be part of this!

10.06.2025 08:21 - 👍 9  🔁 6  💬 0  📌 0

Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...

03.06.2025 09:25 - 👍 44  🔁 24  💬 1  📌 2

Yes! Even a full hard drive with family photos is apparently a useful computational resource.

30.05.2025 11:42 - 👍 3  🔁 0  💬 0  📌 0
Turnstile majority A famous algorithm of Boyer and Moore for the majority problem finds a majority element in a stream of elements while storing only two values, a single tenta...

Nicely written blog post by David Eppstein on the Boyer–Moore (deterministic) streaming algorithm to find a majority element in a stream, and its extensions, first to the turnstile model, and then to frequency estimation (Misra–Gries).
11011110.github.io/blog/2025/05... via @theory.report
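For readers who have not seen the two insertion-only algorithms named in the post, a minimal sketch follows. It shows the textbook Boyer–Moore majority vote and the Misra–Gries summary, not the turnstile extension discussed in the blog post; the function names and the toy stream are assumptions made for illustration.

```python
def boyer_moore_candidate(stream):
    """Return the only possible majority element, storing one value and one counter.

    The result is the true majority only if one exists (more than half of
    the items); a second pass over the data is needed to confirm that.
    """
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def misra_gries(stream, k):
    """Keep at most k-1 counters over a stream of n items.

    Each stored count underestimates the true frequency by at most n/k;
    items without a counter have true frequency at most n/k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement everything and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a", "b", "a", "c", "a", "b", "a"]  # "a" occurs 4 times out of 7
print(boyer_moore_candidate(stream))  # a
print(misra_gries(stream, 3))         # {'a': 3, 'b': 1}
```

With k = 2 the Misra–Gries summary keeps a single counter and behaves like the majority vote, which is one way to see it as a generalization.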

06.05.2025 13:30 - 👍 18  🔁 3  💬 1  📌 0

Thanks a lot for organizing the workshop! I really enjoyed it!

26.04.2025 15:03 - 👍 0  🔁 0  💬 0  📌 0

We finally concluded the meeting. Thanks to all attendees for their scientific contributions and for traveling (near or far) to the meeting! Thanks to the local organizers for the infrastructure and catering, and thanks to the co-organizers @yaronorenstein.bsky.social @camillemrcht.bsky.social!

25.04.2025 08:18 - 👍 26  🔁 11  💬 1  📌 0

@pavelvesely.bsky.social (CSI) on the mother of spss: masked superstrings that help you represent k-mer sets in a very compact way. He actually takes a lot from @brinda.eu since he squeezed 3 papers in a 12 min talk 😳
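As a rough illustration of the idea, a masked superstring can be thought of as a string plus a 0/1 mask of the same length, where a 1 at position i marks the k-mer starting at i as a member of the set. The sketch below decodes such a pair under simplifying assumptions: a k-mer counts as represented if at least one of its occurrences is masked on, and reverse complements are ignored. The function name and the toy strings are hypothetical and not taken from the talk or the papers.

```python
def kmers_from_masked_superstring(superstring, mask, k):
    """Decode the k-mer set encoded by a superstring and its 0/1 mask.

    mask[i] == "1" means the k-mer starting at position i is in the set.
    Positions in the last k-1 characters cannot start a k-mer and are ignored.
    """
    if len(superstring) != len(mask):
        raise ValueError("superstring and mask must have the same length")
    return {
        superstring[i:i + k]
        for i in range(len(superstring) - k + 1)
        if mask[i] == "1"
    }


# Toy example: 5 characters and 5 mask bits encode two 3-mers.
kmers = kmers_from_masked_superstring("ACGTT", "10100", 3)
print(sorted(kmers))  # ['ACG', 'GTT']
```

Here the k-mer CGT also occurs in the superstring, but it is masked off, so the overlap between ACG and GTT is reused without adding CGT to the set.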

24.04.2025 05:04 - 👍 5  🔁 2  💬 1  📌 0
Program RECOMB-seq is the RECOMB Satellite Conference on Biological Sequence Analysis

🚀 Just 2 days to go!

Excitement is building for #RECOMBseq 2025 in Seoul 🎉
Join leading researchers as we dive into cutting-edge computational genomics, from single-cell to long-read sequencing.

🗓 April 24-25
📍 Seoul
📄 Program: recomb-seq.github.io/program/

#RECOMB2025 #Genomics #Bioinformatics

22.04.2025 11:57 - 👍 4  🔁 3  💬 0  📌 0

A decade ago, we had thousands of bacterial genomes. Now, we have millions. How to scale computational methods?

Our paper in @naturemethods.bsky.social answers this: use evolutionary history to guide compression and search.

rdcu.be/eg4OA

w/ @baym.lol, @zaminiqbal.bsky.social et al. 🧵1/

11.04.2025 15:01 - 👍 160  🔁 37  💬 3  📌 1

So glad this is finally out. The method has been instrumental in allowing us to compress the AllTheBacteria data: ~2 million bacterial genomes shrink from 3 terabytes (gzipped) to 100 GB using phylogenetic compression. Great work by @brinda.eu

09.04.2025 22:27 - 👍 126  🔁 51  💬 4  📌 1
ESA – ALGO2025

European Symposium on Algorithms 2025 will be held in Warsaw in September, as part of ALGO 2025. Do you have great work on design and analysis of algorithms? Submit it by April 23! algo-conference.org/2025/esa/

08.04.2025 14:45 - 👍 4  🔁 1  💬 0  📌 0

I was their "guinea pig" for registration, so I can testify it's a smooth process with O(1) steps which all work with a positive probability!

04.04.2025 07:00 - 👍 0  🔁 0  💬 0  📌 0

STOC in Prague! With the FOCS deadline over, there's no excuse to postpone the registration.
acm-stoc.org/stoc2025/

04.04.2025 07:00 - 👍 6  🔁 1  💬 1  📌 0
Workshop on Algorithms for Large Data (Online) 2025

Taking a break from the submission season? Swing by the Workshop on Algorithms for Large Data (Online), WALDO 2025 🗓️ April 14-16: waldo-workshop.github.io/2025.html
Registration is free! (but necessary by April 7)

04.04.2025 06:44 - 👍 4  🔁 4  💬 0  📌 0

Aleksander Łukasiewicz, Jakub Tětek, Pavel Veselý
SplineSketch: Even More Accurate Quantiles with Error Guarantees
https://arxiv.org/abs/2504.01206

03.04.2025 05:13 - 👍 0  🔁 1  💬 0  📌 0
War with the Newts - Wikipedia

The upcoming trade war with the penguins is a good excuse to mention the following fantastic book...
en.m.wikipedia.org/wiki/War_wit...

03.04.2025 12:31 - 👍 3  🔁 2  💬 0  📌 0
