
jsulz

@jsulz.com.bsky.social

I like pretty things, functional things, funny things, food things, and computer things. Used to do devops/WordPress things at @lexblog.bsky.social and devex/cloud infra things at @pantheon.io. Now helping make things go fast at 🤗 @hf.co

227 Followers  |  104 Following  |  178 Posts  |  Joined: 18.11.2024

Latest posts by jsulz.com on Bluesky

Personal Superintelligence
Explore Meta's vision of personal superintelligence, where AI empowers individuals to achieve their goals, create, connect, and lead fulfilling lives. Insights from Mark Zuckerberg on the future of AI...

Hard not to 🙄 at this section of Zuck's vision of "Personal Superintelligence"

"Personal devices like glasses that understand our context because they can see what we see, hear what we hear, and interact with us throughout the day will become our primary computing devices."

30.07.2025 20:12 | 👍 2    🔁 0    💬 0    📌 0

This also serves as a reminder to myself that I owe a round of "Thank you"s to all the talented designers I've worked with over the years.

30.07.2025 18:51 | 👍 0    🔁 0    💬 0    📌 0
Ready Xet Go - a Hugging Face Space by jsulz
This app helps you monitor the progress of migrating repositories to Xet, showing you stats and charts on migration status and file types.

We just crossed 1 million repositories backed by Xet storage on @hf.co

I celebrated by reviving the early 2000s web design aesthetics that I love so much. Here's our dashboard showing our progress converting the Hub from Git LFS to Xet (and demonstrating my questionable design sensibilities).

30.07.2025 18:41 | 👍 6    🔁 1    💬 1    📌 0
The Bitter Lesson versus The Garbage Can
Does process matter? We are about to find out.

Perhaps the bitter lesson about all organizational design is that all you need is a garbage can of chaos.

29.07.2025 02:23 | 👍 1    🔁 0    💬 0    📌 0
On agency
Or, how to handle being sentenced to freedom, and handle it effectively, and authentically, and responsibly

Loved this post from @henrikkarlsson.bsky.social

"There have been a series of experiences that have helped me realize more of my agency, but I think the most important one was becoming a father"

💯💯💯💯

16.07.2025 22:24 | 👍 0    🔁 0    💬 0    📌 0

A sneaky part of making this all work is our backward compatibility with Git LFS. This allows us to roll out a significant protocol change without forcing workflow changes.

We call this the Git LFS Bridge internally, and like our migration process, its power is in its simplicity.

15.07.2025 15:16 | 👍 0    🔁 0    💬 0    📌 0

Over the past few months, you can see some of the biggest migrations show up in our cluster throughput.

Each spike corresponds to a significant migration (where we download from LFS and upload to Xet), with the baseline steadily increasing to just shy of 100 Gb/s.

15.07.2025 15:16 | 👍 0    🔁 0    💬 1    📌 0
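For a rough sense of scale, here is a back-of-the-envelope conversion of a roughly 100 Gb/s baseline into daily volume. It is a sketch under simple assumptions (decimal units and a perfectly sustained rate), not a measurement of what the cluster actually moved; the helper name `petabytes_per_day` is purely illustrative.

```python
# Back-of-the-envelope: how many bytes per day does a sustained link rate move?
# Assumptions: decimal units (1 Gb = 10**9 bits, 1 PB = 10**15 bytes) and a
# perfectly sustained rate, which real cluster traffic never quite achieves.

SECONDS_PER_DAY = 24 * 60 * 60


def petabytes_per_day(gigabits_per_second: float) -> float:
    """Convert a sustained rate in Gb/s to petabytes moved per day."""
    bytes_per_second = gigabits_per_second * 1e9 / 8  # 8 bits per byte
    return bytes_per_second * SECONDS_PER_DAY / 1e15


print(f"{petabytes_per_day(100):.2f} PB/day")  # prints "1.08 PB/day"
```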

The engine behind moving from Git LFS to Xet is our migration process. It's simple, powerful, and has moved well over a dozen PB just by itself. Here's a high-level view of how it works.

15.07.2025 15:16 | 👍 0    🔁 0    💬 1    📌 0
Migrating the Hub from Git LFS to Xet
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

We've moved the first 20PB from Git LFS to Xet on @hf.co without any interruptions. Now we're migrating the rest of the Hub. We got this far by focusing on the community first.

Here's a deep dive on the infra making this possible and what's next: huggingface.co/blog/migrati...

15.07.2025 15:16 | 👍 5    🔁 2    💬 1    📌 0
Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

A look into monitoring/observability at @hf.co

Some fun tidbits in here, like how we use our NAT gateway as a cost sentinel. Cloud infra costs are no joke.

14.07.2025 23:25 | 👍 1    🔁 0    💬 0    📌 0

465 people. 122 languages. 58,185 annotations!

FineWeb-C v1 is complete! Communities worldwide have built their own educational quality datasets, proving that we don't need to wait for big tech to support languages.

Huge thanks to all who contributed!

huggingface.co/blog/davanst...

08.07.2025 12:07 | 👍 33    🔁 11    💬 2    📌 0
The mystery of em‑dashes: part two with quantitative evidence
A couple of weeks ago I made an assumption: the rise of em‑dashes in AI‑generated text happened because model providers started scanning older, pre‑Kindle books.

"A close friend has used em-dashes since our days in college, and yet every time they include one in a text to me, I can't help but think, 'Did an LLM write this?'"

07.07.2025 20:47 | 👍 1    🔁 0    💬 0    📌 0

Further proof that cute animals are the great distractors.

04.07.2025 15:59 | 👍 0    🔁 0    💬 0    📌 0
How Long Contexts Fail
Taking care of your context is the key to building successful agents. Just because there’s a 1 million token context window doesn’t mean you should fill it.

More context, more problems. www.dbreunig.com/2025/06/22/h...

23.06.2025 06:23 | 👍 26    🔁 6    💬 1    📌 2
‘Hey man, I’m so sorry for your loss’: should you use AI to text?
Artificial intelligence has entered the personal chat. What does that say about human relationships?

On using AI for personal messages: “We want to just write a prompt and have it done. And there’s something that we are losing – it’s the process. And in the process, there’s many important aspects. It is the co-construction of ourselves with our activities”

01.07.2025 22:03 | 👍 0    🔁 0    💬 0    📌 0

How does one test the quality of the tapes? Are you forced into watching each one, end-to-end?

28.06.2025 20:58 | 👍 1    🔁 0    💬 1    📌 0

Privacy concerns are legitimate and need to be addressed, but a larger part of me is concerned about the social, cultural, and cognitive impacts of a "magic genie bot that is going to take care of the exigencies of life"

28.06.2025 13:26 | 👍 0    🔁 0    💬 0    📌 0

I do not, but if a cute dog showed up and asked for treats, I would go broke.

Everyone has their weakness.

27.06.2025 20:40 | 👍 1    🔁 0    💬 0    📌 0
Project Vend: Can Claude run a small shop? (And why does that matter?)
We let Claude run a small shop in the Anthropic office. Here's what happened.

What happens when you give an LLM a high-level objective to manage a small business and an incomplete toolset to achieve its aims?

It makes questionable inventory and sales decisions, loses most of its money, and has an identity crisis.

Not *so* far off from how I would perform.

27.06.2025 19:58 | 👍 3    🔁 0    💬 1    📌 0

And so the march to the season of darkness begins.

27.06.2025 14:44 | 👍 0    🔁 0    💬 0    📌 0
AI won't live on publisher sites
The case for moving AI down the stack

Publishers are spending time and money on AI chatbots that nobody will use. If AI succeeds, here's why it will live in your browser and operating system - and what I think news orgs should build instead. werd.io/ai-wont-live...

25.06.2025 02:30 | 👍 25    🔁 4    💬 1    📌 1

These are hard numbers to put into context, but let's try.

The latest run of Common Crawl was 471 TB.

We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.

🤯🤯🤯

26.06.2025 14:48 | 👍 0    🔁 0    💬 0    📌 0
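The "about two hours" figure can be checked with simple arithmetic, using the 471 TB crawl size above and the 577 Gb/s June peak from the upload-speed post. This sketch assumes decimal units (1 TB = 10^12 bytes) and that the peak rate holds for the entire transfer, which is optimistic; `transfer_hours` is a hypothetical helper, not part of any Xet tooling.

```python
# Rough transfer-time estimate for one Common Crawl run at peak upload speed.
# Inputs come from the posts: 471 TB for the latest crawl, 577 Gb/s peak upload.
# Assumptions: decimal units and a constant rate for the whole transfer.


def transfer_hours(size_terabytes: float, rate_gigabits_per_second: float) -> float:
    """Hours needed to move `size_terabytes` at a constant rate in Gb/s."""
    size_bits = size_terabytes * 1e12 * 8  # TB -> bytes -> bits
    seconds = size_bits / (rate_gigabits_per_second * 1e9)
    return seconds / 3600


print(f"{transfer_hours(471, 577):.1f} hours")  # prints "1.8 hours"
```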
Xet content addressed store service moving bytes from one place to another at the speed of light.

Meanwhile, our migrations have pushed throughput to numbers that are bonkers.

In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).

26.06.2025 14:48 | 👍 0    🔁 0    💬 1    📌 0
View of Xet's S3 bucket - lotsa bytes.

It's been a bit since I took a step back and looked at our progress migrating @hf.co from Git LFS to Xet, but every time I do, it's mind-boggling.

A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB

26.06.2025 14:48 | 👍 2    🔁 1    💬 1    📌 1
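Dividing the "a month ago" snapshot into "today" gives the growth factors behind that jump. This is plain arithmetic on the numbers in the post (reading "150K" as 150,000); `growth_factors` and `SNAPSHOTS` are illustrative names only.

```python
# Growth between the "a month ago" and "today" snapshots quoted in the post.
# Each value is a (before, after) pair.
SNAPSHOTS = {
    "users/orgs": (5_500, 700_000),
    "repos": (150_000, 350_000),
    "storage_pb": (4, 15),
}


def growth_factors(snapshots: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the after/before ratio for each metric."""
    return {name: after / before for name, (before, after) in snapshots.items()}


for name, factor in growth_factors(SNAPSHOTS).items():
    print(f"{name}: {factor:.1f}x")
# users/orgs: 127.3x
# repos: 2.3x
# storage_pb: 3.8x
```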
4-panel comic. (1) PERSON WITH BLACK HAT: Compatibility and interoperability are so important. (2) [Black hat shows diagram to two people] BLACK HAT: For example, most subway rails are 143.5cm apart. But many roller coasters use a narrower 110cm gauge. (3) BLACK HAT: For the last few years, our company has been quietly retrofitting roller coasters to use 143.5cm tracks. (4) BLACK HAT: Soon, we can begin Phase 2. PERSON 2: Maybe interoperability is actually bad. BLACK HAT: If you listen to the destination announcement while boarding, you’ll be fine.

Interoperability

xkcd.com/3105/

25.06.2025 02:39 | 👍 2131    🔁 229    💬 13    📌 6

If you are interested in a unified collection of common misinformation detection benchmarks, check out our recent repo @hf.co

19.06.2025 18:23 | 👍 14    🔁 6    💬 0    📌 0

And BOTH of the above can be true and we can ALSO agree that @petebuttigieg.bsky.social having a Substack is weird.

26.06.2025 01:52 | 👍 0    🔁 0    💬 0    📌 0
We Are Still Underreacting on AI
This is not just a technology issue, it’s a fundamental change to our society – and we remain dangerously underprepared.

Of course, both the above and this sentiment put forth by @petebuttigieg.bsky.social can be true at the same time.

26.06.2025 01:52 | 👍 0    🔁 0    💬 1    📌 0
What is automatable and who is replaceable? Thoughts from my morning commute
It's an interesting exercise to think about jobs, or tasks within jobs, that could in principle be replaced by automation but for some reaso...

"All of these jobs are ultimately about trust and responsibility. Not only does the task need to be done, someone needs to take responsibility for what was delivered."

25.06.2025 20:55 | 👍 0    🔁 0    💬 1    📌 0
DeepWiki | AI documentation you can talk to, for every repo
DeepWiki provides up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub - powered by Devin.

Stumbled across deepwiki.com last night. Great resource for anyone trying to get up to speed on an open source repo.

Does a pretty good job of explaining the xet-core codebase and Xet deduplication tech on Hugging Face: deepwiki.com/huggingface/... (probably better than I have 😅)

25.06.2025 14:58 | 👍 0    🔁 0    💬 0    📌 0

@jsulz.com is following 20 prominent accounts