Ian Cook

@ian.columnar.tech

I work on Apache Arrow and obsess about interoperability and performance in data analytics systems

134 Followers  |  110 Following  |  14 Posts  |  Joined: 13.06.2023

Latest posts by ian.columnar.tech on Bluesky

Next Tuesday, get ready to meet the mind behind #Pandas & #ApacheArrow!

@wesmckinney.com shares his origin story (Part 1) on #TheTestSet. From speedruns to shaping the data stack, this is one you won't want to miss.

Mark your calendar for Tuesday & subscribe at thetestset.co!

#DataScience #Python

11.07.2025 13:57 | 👍 23    🔁 9    💬 1    📌 0

@duckdb.org 🤝 @arrow.apache.org

23.05.2025 15:24 | 👍 1    🔁 0    💬 0    📌 0

🚀 Introducing **Bauplan**

A serverless, code-native platform for building data and AI pipelines directly on your object store. No clusters. No notebooks. No GUI-based workflows.

Just Python + SQL + S3.

👉 www.bauplanlabs.com/blog/hello-b...

16.04.2025 14:12 | 👍 18    🔁 9    💬 3    📌 3

How the Apache Arrow Format Accelerates Query Result Transfer
Arrow speeds up query result transfer by slashing (de)serialization overheads. We outline five key attributes of the Arrow format that enable this.

2025 is shaping up to be a breakout year for fast query result transfer with Apache Arrow. But what exactly makes it so fast? David Li, Matt Topol, and I break it down in this new blog post: arrow.apache.org/blog/2025/01...

13.01.2025 16:25 | 👍 21    🔁 9    💬 0    📌 2

[🎥 Watch] Apache Arrow is a columnar format and multi-language toolbox for fast data interchange and in-memory analytics.

Matt Topol, Arrow PMC member, talks about Arrow subprojects and how you can get involved with the project. https://buff.ly/40qNPAL

#opensource

07.01.2025 21:45 | 👍 5    🔁 1    💬 0    📌 0

There are some great books in this vein by Joseph Stiglitz and David Graeber

30.12.2024 13:53 | 👍 1    🔁 1    💬 1    📌 0

Jiffy Express - Saturday Night Live
YouTube video by Saturday Night Live

The best illustration of how users and buyers of a product can have opposite priorities

19.12.2024 14:54 | 👍 0    🔁 0    💬 0    📌 0

Would love to see more discussion of this. To Andy's point, we have this wonderful Arrow ecosystem that's underdeveloped. In a world of columnar data, we're still stuck using ODBC/JDBC for connectivity to most data systems. Hard to get some vendors to even talk about ADBC.

13.12.2024 20:59 | 👍 16    🔁 2    💬 2    📌 0
A grocery store receipt showing an undecipherable series of characters followed by ",bin.base64"

Base64 decoding error on my Food Lion receipt today

20.11.2024 14:56 | 👍 1    🔁 0    💬 0    📌 0

You might like to try Ibis: ibis-project.org. It was created by Wes McKinney (also the author of pandas) as a solution to some problems pandas couldn't solve well. It's had a lot of recent growth and improvement. There's a tutorial for users coming from dplyr: ibis-project.org/tutorials/ib...

19.11.2024 04:29 | 👍 2    🔁 0    💬 0    📌 0

Yes, this came up for me because I'm playing with the Swift implementation of Apache Arrow, which is maintained in a directory of the Arrow monorepo, but Swift's package manager assumes a 1:1 relationship of repo:package

14.11.2024 14:14 | 👍 1    🔁 0    💬 0    📌 0

Pretty funny that the easiest way to do it was with an SVN command, until GitHub dropped support for that earlier this year.

14.11.2024 01:53 | 👍 1    🔁 0    💬 0    📌 0

It's 2024 and there's still no way to download just one subdirectory of a Git repo without using a sketchy third-party website or tool? Wild.

14.11.2024 00:58 | 👍 1    🔁 0    💬 2    📌 0

There's some great news coming soon on this. Unfortunately I can't share it publicly now, but DM me if you're curious

03.11.2024 18:58 | 👍 4    🔁 0    💬 0    📌 0

There's also ADBC, which aims to replace ODBC and JDBC in analytics applications with a much faster Arrow-based alternative.

DM me if you're interested to discuss

30.10.2024 18:45 | 👍 1    🔁 0    💬 1    📌 0

The Arrow IPC stream format works great for sending data over HTTP APIs. There's also Arrow Flight, which is a framework for sending Arrow IPC data through RPC APIs.

Arrow is much more efficient than JSON for data transport in OLAP applications because you avoid transposing columns to and from rows.

30.10.2024 18:39 | 👍 0    🔁 0    💬 1    📌 0
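
The point in the post above about transposing columns to and from rows can be illustrated with a small standard-library sketch. This is not Arrow itself: the column names, the sample values, and all four helper functions are hypothetical, and `array` buffers merely stand in for a columnar wire format. The JSON path has to pivot columns into row objects and back, while the columnar path ships each column as one contiguous buffer.

```python
import json
from array import array

# Hypothetical columnar data, as an analytics engine might hold it in memory.
columns = {
    "id": array("q", [1, 2, 3]),             # signed 64-bit integers
    "price": array("d", [9.5, 12.0, 7.25]),  # 64-bit floats
}
typecodes = {name: col.typecode for name, col in columns.items()}


def to_json_rows(cols):
    """JSON sender: transpose columns into a list of row objects."""
    names = list(cols)
    rows = [dict(zip(names, values)) for values in zip(*cols.values())]
    return json.dumps(rows).encode("utf-8")


def from_json_rows(payload, codes):
    """JSON receiver: parse text, then transpose rows back into columns."""
    rows = json.loads(payload)
    return {name: array(code, (row[name] for row in rows))
            for name, code in codes.items()}


def to_column_buffers(cols):
    """Columnar sender: each column leaves as one contiguous byte buffer."""
    return {name: col.tobytes() for name, col in cols.items()}


def from_column_buffers(buffers, codes):
    """Columnar receiver: load each buffer directly, with no per-row work."""
    out = {}
    for name, code in codes.items():
        col = array(code)
        col.frombytes(buffers[name])
        out[name] = col
    return out


via_json = from_json_rows(to_json_rows(columns), typecodes)
via_buffers = from_column_buffers(to_column_buffers(columns), typecodes)
```

Both paths reproduce the same columns; the difference is that the JSON path touches every value twice in a row-by-row loop, which is the overhead the post describes.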

The Arrow IPC stream format has become common in recent years as a data exchange format.

Basically: if you design an optimal columnar format for on-disk storage, you end up with something like Parquet. If you design an optimal columnar format for in memory and over the wire, you end up with the Arrow IPC format.

30.10.2024 18:35 | 👍 0    🔁 0    💬 1    📌 0

arrow-experiments/http at main · apache/arrow-experiments
Apache Arrow Development Experiments. Contribute to apache/arrow-experiments development by creating an account on GitHub.

If you're sending Arrow IPC over HTTP APIs, there are some great examples showing the basics at github.com/apache/arrow...

Simply using the Arrow IPC stream format as a transport format (instead of JSON) usually gives most of the performance of Arrow Flight with lower implementation complexity

28.10.2024 11:51 | 👍 3    🔁 0    💬 0    📌 0

@ian.columnar.tech is following 20 prominent accounts