Ritchie Vink

@ritchie46.bsky.social

Author and Founder of Polars

182 Followers  |  5 Following  |  14 Posts  |  Joined: 31.12.2023  |  1.8254

Latest posts by ritchie46.bsky.social on Bluesky

Polars Meetup - Polars Cloud and Acceleration · Luma
Join the second edition of our Polars Meetup with talks from Ritchie Vink (Polars) and Vyas Ramasubramani (NVIDIA) to discuss accelerating and scaling…

Join me on the 24th in SF for a @pola.rs meetup!

I will be giving a talk about Polars, Polars Cloud and the upcoming distributed engine.

NVIDIA will also be giving a talk about their GPU acceleration with Polars-CuDF.

Hope to see you there!

lu.ma/60b6wfs8

14.07.2025 17:42 | 👍 1    🔁 0    💬 1    📌 0

No more `with pl.StringCache()`

Soon... 🌈

04.07.2025 15:05 | 👍 0    🔁 0    💬 0    📌 0
GP1085: Scaling DataFrames With Polars - Location: NVIDIA GTC PARIS - Pavillon 7 - June 12, 1:00 PM to 1:45 PM CET | Summary: Room: N03. Polars is a query engine with a DataFrame frontend designed for fast, efficient data processing. This sess...

This Thursday I will join Lawrence Mitchell from @nvidia on stage at NVIDIA GTC in Paris.

We'll discuss how we made Polars work on the GPU and how it will scale to multi-GPU in the future.

See you there!

vivatechnology.com/sessions/ses...

10.06.2025 13:58 | 👍 2    🔁 0    💬 0    📌 0

Polars has gotten 4x faster than Polars! 🚀

Over the last few months, the team has worked incredibly hard on the new streaming engine, and the work is paying off. It is incredibly fast and beats the Polars in-memory engine by a factor of 4 on a 96 vCPU machine.

01.05.2025 14:05 | 👍 16    🔁 3    💬 4    📌 0

** Sponsor announcement ** Polars is a Supporter of RustWeek!
Find out more about them here: pola.rs

Thank you @pola.rs for your support! 🙏

More info about RustWeek and tickets: rustweek.org

#rustweek #rustlang

18.04.2025 12:07 | 👍 2    🔁 1    💬 0    📌 0

Yeah, or even for single machine remotely. E.g. let's say you run a very small node as airflow orchestrator, but need a big VM for the ETL job. That orchestrator can initiate the remote query and doesn't have to worry about hardware setup/teardown.

17.02.2025 13:44 | 👍 1    🔁 0    💬 1    📌 0

Already got all TPC-H queries running distributed!

12.02.2025 14:16 | 👍 8    🔁 0    💬 0    📌 0

He is a notorious Polars hater and has tweeted he wants the project to fail.

He fears that Polars' deviation from the pandas API will splinter the landscape, and he doesn't appreciate new API development. I haven't read a technical reason from his side.

03.02.2025 11:48 | 👍 1    🔁 0    💬 2    📌 0

That was all pre 1.0.

We released 1.0 in July last year. The API is stable now.

28.01.2025 19:36 | 👍 2    🔁 0    💬 1    📌 0

I would call that a weakness of the AI. 😅

28.01.2025 17:09 | 👍 1    🔁 0    💬 1    📌 0

In this week's Polars release we shipped initial Unity Catalog support. This makes integration with Databricks much smoother.

Writing features are under development and will follow soon. Full release notes: github.com/pola-rs/pola...

25.01.2025 11:47 | 👍 12    🔁 4    💬 0    📌 0

Modern Polars: A side-by-side comparison of the Polars and Pandas libraries.

Learning polars has been ... actually a joy? It just makes sense to my #rstats #dplyr trained data muscles #databs #python

kevinheavey.github.io/modern-polars/

23.01.2025 00:01 | 👍 48    🔁 13    💬 2    📌 2

Release Python Polars 1.20.0 · pola-rs/polars ⚠️ Deprecations Make parameter of str.to_decimal keyword-only (#20570) 🚀 Performance improvements Extend functionality on BitmapBuilder and use in Growables (#20754) Specialize first/last agg fo...

This week's Polars release has a huge improvement for window functions. They can be an order of magnitude faster.

We can now run 20 of the 22 TPC-H queries on the new streaming engine, and all of them on Polars Cloud. More will follow soon! ;)

See the full release docs here:

github.com/pola-rs/pola...

19.01.2025 08:57 | 👍 20    🔁 2    💬 0    📌 0
A screenshot of a Pyodide REPL executing Polars code:

import polars as pl
import requests
r = requests.get("https://raw.githubusercontent.com/pola-rs/polars/refs/heads/main/examples/datasets/foods2.csv")
pl.read_csv(r.content).group_by("category").mean()

A screenshot of a Quarto Live code cell executing Polars code:

import polars as pl
import requests
r = requests.get("https://raw.githubusercontent.com/pola-rs/polars/refs/heads/main/examples/datasets/foods2.csv")
pl.read_csv(r.content).group_by("category").mean()

A screenshot of a Shinylive app using Polars code:

from shiny import App, render, ui
import polars as pl
from pathlib import Path

app_ui = ui.page_fluid(
    ui.input_select("cyl", "Select Cylinders", choices=["4", "6", "8"]),
    ui.output_data_frame("filtered_data")
)

def server(input, output, session):
    df = pl.read_csv(Path(__file__).parent / "mtcars.csv")
    
    @output
    @render.data_frame
    def filtered_data():
        return (df
                .filter(pl.col("cyl") == int(input.cyl()))
                .select(["mpg", "cyl", "hp"]))

app = App(app_ui, server)

Recently I've been working on getting #polars running in #pyodide. This was a fun one, even requiring patches to LLVM's #wasm writer! Everything has now been upstreamed, and earlier this week Pyodide v0.27.0 was released, including a Wasm build of Polars usable in Pyodide, Shinylive and Quarto Live 🎉

04.01.2025 11:59 | 👍 50    🔁 9    💬 0    📌 0

demo of dt.replace

✨ New temporal feature in the next Polars release!

⏲️ dt.replace lets you replace components of Date / Datetime columns

⚡🦀 It's an expressified vectorised rustified version of the Python standard library datetime.replace

20.12.2024 16:25 | 👍 9    🔁 1    💬 0    📌 1

We removed serde from our Series struct and saw a significant drop in Polars' binary size (with all features activated). The amount of codegen is huge. 😮

18.12.2024 13:32 | 👍 5    🔁 0    💬 0    📌 0

We finally support writing to cloud storage natively and seamlessly!

12.12.2024 11:04 | 👍 18    🔁 2    💬 0    📌 1

Making a recommender by just using Polars!
YouTube video by probabl

Join us this Friday if you're eager to see what it can be like to design a recommender while limiting ourselves to just a DataFrame API. It is somewhat unconventional, but a great excuse to show off a Polars trick or two.

www.youtube.com/watch?v=U3Fi...

10.12.2024 10:50 | 👍 14    🔁 2    💬 0    📌 1

Interesting. Is it an authentication error with Azure Storage?

08.12.2024 06:11 | 👍 0    🔁 0    💬 1    📌 0

Nice post.

For the benchmarks, I think it would be fairer to use `scan_csv`, as an eager read forces full materialization.

P.S. Why use pyarrow to read instead of our native reader?

05.12.2024 06:00 | 👍 1    🔁 0    💬 1    📌 0
A diagram shows how to use str.split, list.first / list.last, cast, pl.int_ranges, and explode, all together, to turn a dataframe where a column may contain ranges like "3-5" into a similar dataframe where all ranges have been expanded, or exploded, across multiple rows.

The full code is:

range_start = pl.col("nrs").str.split("-").list.first().cast(pl.Int64)
range_end = pl.col("nrs").str.split("-").list.last().cast(pl.Int64)
df.with_columns(pl.int_ranges(range_start, range_end + 1)).explode("nrs")

How to “expand” ranges like "3-5" across new rows with the values 3, 4, 5?

This comes straight from our Discord server (discord.com/invite/4UfP5...)

21.11.2024 14:36 | 👍 7    🔁 2    💬 0    📌 0
Diagram showing how `value_counts` produces a column with struct values, mapping column values to their counts.
We then show how to use `.struct.field` to extract a single field from the struct and how to use `.struct.unnest` to extract all fields into corresponding columns.

Why is there a `struct` data type?

A single expression produces a single column, so expressions like `value_counts` need to output structs to map the values to their counts.

With that said, do you understand why `.struct.unnest` doesn't break the 1 expr = 1 column principle?

20.11.2024 11:14 | 👍 7    🔁 2    💬 0    📌 0

@ritchie46 is following 5 prominent accounts