
David Li

@lidavidm.bsky.social

PMC member for Apache Arrow.

18 Followers  |  39 Following  |  19 Posts  |  Joined: 06.02.2024

Latest posts by lidavidm.bsky.social on Bluesky

Post image 22.06.2025 06:07 - 👍 0    🔁 0    💬 0    📌 0

So I wonder if the term was already floating around the database community, and Julien (or someone else) (unintentionally?) swapped "striping" for "shredding" in the Parquet docs, and then the term took hold as Parquet became popular.

18.05.2025 15:26 - 👍 0    🔁 0    💬 0    📌 0
Download SQL Server 2000 Retired Technical documentation from Official Microsoft Download Center The content you requested has already retired. It's available to download on this page.

The docs for SQL Server 2000 here talk about shredding:

> OPENXML calls can be used to provide rowset view...and process them, for example, inserting them into different tables (this process is also referred to as "Shredding XML into tables")

www.microsoft.com/en-nz/downlo...

18.05.2025 15:25 - 👍 0    🔁 0    💬 1    📌 0
SQL Server 2000 and XML | SQL Server 2000 XML Support | InformIT Discover the many different ways SQL Server 2000 supports XML with a comprehensive look at both out-of-the-box support and the SQLXML 3.0 add-on.

This page, supposedly from 2003, talks about SQL Server 2000 adding a function to "shred" XML:
> Microsoft SQL Server 2000 also provides the OPENXML function to shred an XML document and provide a rowset representation of the XML data.
web.archive.org/web/20120115...

18.05.2025 15:22 - 👍 0    🔁 0    💬 1    📌 0
Update README.md · apache/parquet-java@f7ba78a

It seems the first mention in the Parquet repos is from 2013, though. There Julien Le Dem links to a page about "striping" (as used in the Dremel paper) but calls it "shredding". So maybe you should ask him directly :)
github.com/apache/parqu...

18.05.2025 15:18 - 👍 0    🔁 0    💬 1    📌 0
The Best Way to shred XML data into SQL Server database columns What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so: INSERT INTO some_table (column1, column2, column3) SELECT

I went down a rabbit hole on "record shredding"... Here's something interesting: there's an SO question from 2008 asking about "shredding XML data into relational tables". Maybe the term sort of already existed? stackoverflow.com/questions/61...

18.05.2025 15:18 - 👍 0    🔁 0    💬 1    📌 0

I will not trust macOS with external drives ever again. "First Aid" seems to have deleted directories from my local backup SSD...thankfully I have another backup in Backblaze but it's a bit older :/
Good (and painful) reminder to back up regularly!

19.04.2025 12:36 - 👍 1    🔁 0    💬 0    📌 0

Dear @yatosaking.bsky.social and @galetteweb.bsky.social,
Pardon the intrusion.
Thank you from the bottom of my heart for the shikishi.
(I especially love how sparkly it is.)
I will keep cheering for sensei and Galette!

19.04.2025 08:59 - 👍 2    🔁 0    💬 1    📌 0

Japan, huh... Even when I bring my own bag to the supermarket, the clerk still carefully puts each purchase into a plastic bag 😅

19.04.2025 08:46 - 👍 0    🔁 0    💬 0    📌 0

Went to a TimeLeft app meetup today. Even though I don't have much in the way of conversation skills (?), I think it was pretty fun.
I've got to keep working at it from here.

16.04.2025 13:26 - 👍 0    🔁 0    💬 0    📌 0
Post image

I was going to compare against Postgres for a PR review, but... this is scary 😱

15.04.2025 05:53 - 👍 0    🔁 0    💬 0    📌 0
Baby blue eyes (nemophila) flowers in Showa Kinen Park

The favorite flower of Himmel the Hero.

Showa Kinen Park, Tachikawa, Tokyo
Olympus E-M10 Mk2/TTArtisans 35mm f/1.4

05.04.2025 11:47 - 👍 1    🔁 0    💬 0    📌 0

Spotted in Discord (and paraphrased to protect the innocent):

"I don't particularly like actually doing the job, but thinking about it? Hoo boy"

(Someday I'll make that teaching implementation of Arrow...)

05.04.2025 11:03 - 👍 0    🔁 0    💬 0    📌 0
Post image

The sakura are not _quite_ there, but almost!

26.03.2025 05:17 - 👍 2    🔁 0    💬 0    📌 0

Kou has a blog post too: x.com/ktou/status/...

21.03.2025 06:41 - 👍 1    🔁 0    💬 0    📌 0

April is a great time to visit :)

21.03.2025 06:36 - 👍 1    🔁 0    💬 0    📌 0
Apache Arrow Tokyo Meetup 2025 (2025/04/11 19:00~) ## Apache Arrow Meetup in Tokyo 2025 Spring. (English article is below) If you are interested in this event but don't know how to register, please contact Kou or hiroyuki. Apache Arrow's Project Mana...

Anyone want to book a last minute trip to Japan? Kou, Rok, and I will be there :)

red-data-tools.connpass.com/event/349680/

21.03.2025 04:58 - 👍 3    🔁 3    💬 1    📌 1

I'm biased but maybe things like Apache Parquet, Apache Arrow? They have multiple implementations across different languages and Arrow gets used as a means of interchange between different data vendors (Spark, BigQuery, ClickHouse <-> Pandas, polars, etc.)

19.03.2025 00:00 - 👍 1    🔁 0    💬 1    📌 0
Apache Arrow ADBC 17 (Libraries) Release The Apache Arrow team is pleased to announce the version 17 release of the Apache Arrow ADBC libraries. This release includes 18 resolved issues from 13 distinct contributors. This is a release of the...

Check out what is new in the Apache Arrow ADBC 17 libraries release: arrow.apache.org/blog/2025/03...

07.03.2025 11:12 - 👍 5    🔁 2    💬 0    📌 0
Data Wants to Be Free: Fast Data Exchange with Apache Arrow Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics. It specifies a standardized language-independent column-oriented memory form...

Data wants to be free: comparing and explaining how Arrow's data serialization can be better than what's in protocols like PostgreSQL's

arrow.apache.org/blog/2025/02...

#apachearrow #arrow

28.02.2025 06:14 - 👍 6    🔁 4    💬 0    📌 0
How the Apache Arrow Format Accelerates Query Result Transfer Arrow speeds up query result transfer by slashing (de)serialization overheads. We outline five key attributes of the Arrow format that enable this.

2025 is shaping up to be a breakout year for fast query result transfer with Apache Arrow. But what exactly makes it so fast? David Li, Matt Topol, and I break it down in this new blog post: arrow.apache.org/blog/2025/01...

13.01.2025 16:25 - 👍 21    🔁 9    💬 0    📌 2
