Next Tuesday, get ready to meet the mind behind #Pandas & #ApacheArrow!
@wesmckinney.com shares his origin story (Part 1) on #TheTestSet. From speedruns to shaping the data stack, this is one you won't want to miss.
Mark your calendar for Tuesday & subscribe at thetestset.co!
#DataScience #Python
11.07.2025 13:57
@duckdb.org 🤝 @arrow.apache.org
23.05.2025 15:24
Introducing **Bauplan**
A serverless, code-native platform for building data and AI pipelines, directly on your object store. No clusters. No notebooks. No GUI-based workflows.
Just Python + SQL + S3.
www.bauplanlabs.com/blog/hello-b...
16.04.2025 14:12
How the Apache Arrow Format Accelerates Query Result Transfer
Arrow speeds up query result transfer by slashing (de)serialization overheads. We outline five key attributes of the Arrow format that enable this.
2025 is shaping up to be a breakout year for fast query result transfer with Apache Arrow. But what exactly makes it so fast? David Li, Matt Topol, and I break it down in this new blog post: arrow.apache.org/blog/2025/01...
13.01.2025 16:25
[🎥 Watch] Apache Arrow is a columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
Matt Topol, Arrow PMC member, talks about Arrow subprojects and how you can get involved with the project. https://buff.ly/40qNPAL
#opensource
07.01.2025 21:45
There are some great books in this vein by Joseph Stiglitz and David Graeber
30.12.2024 13:53
YouTube video by Saturday Night Live
Jiffy Express - Saturday Night Live
The best illustration of how users and buyers of a product can have opposite priorities
19.12.2024 14:54
Would love to see more discussion of this. To Andy's point, we have this wonderful Arrow ecosystem that's underdeveloped. In a world of columnar data, we're still stuck using ODBC/JDBC for connectivity to most data systems. It's hard to get some vendors to even talk about ADBC.
13.12.2024 20:59
A grocery store receipt showing an undecipherable series of characters followed by ",bin.base64"
Base64 decoding error on my Food Lion receipt today
20.11.2024 14:56
You might like to try Ibis: ibis-project.org. It was created by Wes McKinney (also the author of pandas) as a solution to some problems pandas couldn't solve well. It's had a lot of recent growth and improvement. There's a tutorial for users coming from dplyr: ibis-project.org/tutorials/ib...
19.11.2024 04:29
Yes, this came up for me because I'm playing with the Swift implementation of Apache Arrow, which is maintained in a directory of the Arrow monorepo, but Swift's package manager assumes a 1:1 relationship of repo:package
14.11.2024 14:14
Pretty funny that the easiest way to do it was with an SVN command, until GitHub dropped support for that earlier this year.
14.11.2024 01:53
It's 2024 and there's still no way to download just one subdirectory of a Git repo without using a sketchy third-party website or tool? Wild.
14.11.2024 00:58
There's some great news coming soon on this. Unfortunately I can't share it publicly now, but DM me if you're curious
03.11.2024 18:58
There's also ADBC, which aims to replace ODBC and JDBC in analytics applications with a much faster Arrow-based alternative.
DM me if you're interested in discussing it
30.10.2024 18:45
The Arrow IPC stream format works great for sending data over HTTP APIs. There's also Arrow Flight, which is a framework for sending Arrow IPC data through RPC APIs.
Arrow is much more efficient than JSON for data transport in OLAP applications because you avoid transposing columns to and from rows.
30.10.2024 18:39
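To make the "transposing columns to and from rows" point concrete, here is a minimal standard-library sketch of what a JSON-based API has to do with a columnar result. The column names and values, and the helper names `columns_to_rows` and `rows_to_columns`, are hypothetical and only for illustration; no Arrow library is involved.

```python
import json

# Hypothetical query result, held column-wise as an OLAP engine would hold it.
columns = {"id": [1, 2, 3], "price": [9.5, 12.0, 7.25]}


def columns_to_rows(cols):
    """Server side: pivot columns into a list of row objects for JSON."""
    names = list(cols)
    return [dict(zip(names, values)) for values in zip(*cols.values())]


def rows_to_columns(rows):
    """Client side: pivot the row objects back into columns for analysis."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


# Every value is touched twice (once per pivot) on top of text encoding and parsing.
payload = json.dumps(columns_to_rows(columns))
received = rows_to_columns(json.loads(payload))
```

A columnar transport format skips both pivots because the data stays in columns from end to end.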
The Arrow IPC stream format has become common in recent years as a data exchange format.
Basically: If you design an optimal columnar format for on disk, you end up with something like Parquet. If you design an optimal columnar format for in memory and over the wire, you end up with the Arrow IPC format.
30.10.2024 18:35
arrow-experiments/http at main · apache/arrow-experiments
Apache Arrow Development Experiments. Contribute to apache/arrow-experiments development by creating an account on GitHub.
If you're sending Arrow IPC over HTTP APIs, there are some great examples showing the basics at github.com/apache/arrow...
Simply using the Arrow IPC stream format as a transport format (instead of JSON) usually gives most of the performance of Arrow Flight with lower implementation complexity
28.10.2024 11:51
After three days without programming, life becomes meaningless
ไฟบใฎ็ฉบใฏใๅใฎ็ฉบใใๆใ
DuckDB is an analytical in-process SQL database management system. "DuckDB" and the DuckDB logo are registered trademarks of the DuckDB Foundation.
Cloudflare is the world's leading connectivity cloud, and we have our eyes set on an ambitious goal: to help build a better Internet.
At https://columnar.tech, we're building the future of fast universal data connectivity.
Bauplan is the easiest and fastest way to build robust data pipelines in Python over your object storage.
bauplanlabs.com
Open source #python software developer and teacher. Pandas core developer. GeoPandas and Shapely maintainer. Apache Arrow PMC.
Software engineer at fused.io
Software engineer, from big servers to tiny microcontrollers. Member of The ASF, tech lead at Elastic. Opinions my own, obviously. Blogging (sometimes) at https://bluxte.net
Toulouse, France, Europe
Software developer working on all things arrow and columnar storage, currently, Lance.
Co-Author on two O'Reilly books (no spoilers), Dremio Senior Evangelist, and Friendly Tech & Data Hipster. (AlexMerced.com)
Logiciel libre / free software (core developer of @arrow.apache.org, #ApacheParquet, #Python #CPython). Engineer at @quantstack.bsky.social. Membre de l'Afis @afis.bsky.social.
(profile picture: Sophie Taeuber's lion)
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
Find more at https://arrow.apache.org/
Two time Grammy losing rock quartet. Very high Jeopardy answer to actual hit ratio
Large Format Photographer, traditional and alternative printer, web developer, musician, lover of fountain pens and almost every type of music.
http://darkroomprint.com
Principal Architect @posit.co, GP Composed Ventures, Co-founder Voltron Data. Open source: Apache Arrow, pandas, Ibis. "Python for Data Analysis" book
maintainer of SlateDB
loves Rust, Datasys, Cloud Infra, AI
https://flaneur2020.github.io