@jannem Not where I spent my afternoon, though
05.08.2025 05:25
One thing I really enjoy about working at VAST (or perhaps that I enjoy about not working at Microsoft) is that I can go out and talk to people again as part of my job. Here's a view from where I got to spend my afternoon today.
#notHPC #butthatsok
I picked a good time to quit, eh? I still have the value of my unvested stock in my Yahoo Finance, so I get a daily reminder of how much I walked away from.
Satya's no dummy, but I do worry that the blowback from the heavy layoffs during blue skies was underestimated. Cutting people to drive […]
Very cool to see a new neuromorphic system coming online. Excited to see where this architecture goes beyond TrueNorth.
"4320 chips with 152 cores each. The chips are 48 to a board, with a single chip consuming between 0.8 and 2.5 W. The whole system fits in a single rack" […]
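Back-of-envelope on those quoted specs (a sketch; the board count and power totals below are derived by me, not reported):

```python
# Derived arithmetic from the quoted specs above; nothing here comes from the
# source beyond the chip count, cores/chip, chips/board, and watts/chip.
chips = 4320
cores_per_chip = 152
chips_per_board = 48
watts_min, watts_max = 0.8, 2.5

total_cores = chips * cores_per_chip      # 656,640 cores
boards = chips // chips_per_board         # 90 boards
kw_min = chips * watts_min / 1e3          # ~3.5 kW
kw_max = chips * watts_max / 1e3          # ~10.8 kW

print(f"{total_cores:,} cores across {boards} boards, "
      f"{kw_min:.1f}-{kw_max:.1f} kW of chip power")
```

At roughly 11 kW of chip power worst-case, a single rack is entirely plausible.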
The 13th MVAPICH User Group (MUG) Conference is coming up (Aug 18-20 in Columbus). It's a great event for focused technical presentations around MPI and network performance. Great speaker lineup, too. Wish I could make it in person, but virtual attendance is free […]
29.07.2025 21:40

I wrote up some notes on how to approach I/O and storage benchmarks in RFPs. I normally don't post here about updates to my digital garden, but I think this page is tidy and useful.
https://www.glennklockwood.com/garden/benchmarking#storage
#HPC
TIL that the Linux NFS client recently accepted a "noalignwrite" mount option that lets you safely do shared-file writes to non-overlapping, non-4K-aligned extents. With this enabled, you no longer have to use direct I/O to do shared-file writes to NFS.
See […]
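A minimal sketch of the write pattern this makes safe, assuming a hypothetical mount along the lines of `mount -o noalignwrite server:/export /mnt/nfs` (the path, record size, and helper name below are illustrative, not from the patch):

```python
import os

# Hypothetical shared file on an NFS mount that uses noalignwrite. Without
# it, two clients writing unaligned extents that share a 4 KiB page can
# corrupt each other via page-cache read-modify-write; with it, writes to
# non-overlapping byte ranges are safe without falling back to O_DIRECT.
PATH = "/mnt/nfs/shared.dat"   # assumption: mounted with -o noalignwrite
RECORD = 1000                  # deliberately NOT a multiple of 4096

def write_my_extent(rank: int) -> None:
    """Each rank writes its own RECORD-byte extent at offset rank * RECORD."""
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        payload = bytes([rank % 256]) * RECORD
        os.pwrite(fd, payload, rank * RECORD)  # unaligned, non-overlapping
    finally:
        os.close(fd)

if __name__ == "__main__":
    # In practice each rank would run on a different NFS client node
    # (e.g., under mpirun); looping here just illustrates the layout.
    for rank in range(4):
        write_my_extent(rank)
```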
I am a sucker for photos of cool #HPC infrastructure, and here is a dense GB200 NVL72 cluster going up somewhere in Canada (I think). Impressive to see this many racks in a row; the DC must have facility water, which is still uncommon in hyperscale. Source […]
[Original post on mast.hpc.social]
Upon hearing I left MSFT, the founder of a big GPU cloud provider reached out with a very strong offer to join them. Funny thing: I applied there back in April and was auto-rejected. Moral of the story: don't chuck an application over the wall if you have an inside line! Applying online is a crapshoot.
14.07.2025 16:28

@jannem The whole HPC world will never go all-cloud, but some of the biggest dominoes could have.
11.07.2025 13:55

@kpdooty @glennklockwood Definitely. It's hard to appreciate that contrast until you've been in both worlds.
11.07.2025 12:38

In the few days I have between jobs, I wanted to share an unvarnished perspective on what I've learned after spending three years working on supercomputing in the cloud. It's hastily written and lightly edited, but I hope others find it interesting […]
11.07.2025 05:29

Today is my last day at Microsoft. I've learned a lot over the last three years, but I'm ready to try something different.
09.07.2025 12:39

In the last 18 hours, I've learned way more about adjuvanted vaccines and adverse reactions to them in cats than I ever cared to. And the real kick is that the choice to re-up the cat's vaccines was an afterthought, because we had to take the other cat in for […]
[Original post on mast.hpc.social]
This is cool, but the real proof is in the quality of the frontier models that are trained on Blackwell. And by that metric, GB200 NVL72 has yet to deliver anything.
https://www.coreweave.com/blog/coreweave-leads-the-way-with-first-nvidia-gb300-nvl72-deployment
NERSC just announced that IBM and VAST have been selected as the storage providers for the upcoming Doudna #HPC system. Strong statement, since NERSC had long invested in Lustre (scratch) and GPFS (community). Very cool to see NERSC not settling for the status quo […]
02.07.2025 17:57

Scott Atchley, who co-keynoted #ISC25, posted a really meaningful response to my ISC25 recap blog post on LinkedIn (https://www.linkedin.com/posts/scottatchley_isc25-olcf-frontier-activity-7345786995765395457-lGoq). He specifically offered additional perspective on the 20 MW exascale milestone […]
02.07.2025 00:28

Happy Canada Day, everyone 🇨🇦
01.07.2025 16:08

Photos of a new, big, naked Cerebras cluster in Oklahoma appearing on the socials today. Pretty neat. Wonder if this is another G42 install.
28.06.2025 17:09

This is an amazingly detailed yet accessible description of how tape storage (media, drives, and libraries) works. Even if you don't care about storage, the engineering that goes into making this all work is fascinating.
https://dl.acm.org/doi/10.1145/3708997
(from […]
LLNL has an interesting vision of the future of HPC and workflows that aligns with a lot of what I heard at ISC: #HPC is no longer just the supercomputer, but the end-to-end services and ecosystem that enable discovery. The description is in Attachment (1) here: https://hpc.llnl.gov/fg-hpcc-rfi
25.06.2025 23:15

Here are my notes from ISC'25 in Hamburg: https://blog.glennklockwood.com/2025/06/isc25-recap.html
#HPC
65th #Top500 is out!
o The Top 3 #Exascale systems remain unchanged:
#1 El Capitan
#2 Frontier
#3 Aurora
o JUPITER Booster (EuroHPC's planned #Exascale system, currently being commissioned, hence a partial submission) at FZJ in Germany is the only new system in the #Top10, entering at #4
#HPC #AI #ISC25
The #ISC25 conference has a strong keynote speaker lineup from around the world this week. Looking forward to hearing all three.
10.06.2025 07:16

The Darshan team and ALCF have released a new collection of I/O profiling logs from the Polaris supercomputer's production workloads. It will surely give excellent insight into how the I/O needs of real #HPC apps have evolved.
https://zenodo.org/records/15353810
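A minimal sketch of poking at one of these logs with PyDarshan (`pip install darshan`); the file name below is hypothetical, so substitute any .darshan log from the Zenodo record:

```python
import darshan

# Hypothetical file name; use any .darshan log from the collection.
report = darshan.DarshanReport("polaris_app.darshan", read_all=True)

# Job-level metadata: executable, process count, runtime, and so on.
print(report.metadata["job"])

# POSIX counters as DataFrames; sum bytes moved across all file records.
posix = report.records["POSIX"].to_df()
print(posix["counters"][["POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN"]].sum())
```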
Here's a photo of an Azure datacenter coming online later this year that "will run a single, large cluster of hundreds of thousands of interconnected NVIDIA GB200 GPUs," with "exabytes of storage" and "millions of CPU compute cores."
Source […]
[Original post on mast.hpc.social]
This is wild--"orbital supercomputer" sounded like a misleading headline, but China really is planning to run inference in space on a cluster of satellite nodes. Each does 744 TOPS, and they use a laser-based interconnect with "up to" 100G.
#HPC #maybe? […]
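Rough numbers on why inference, rather than tightly coupled training, is the workload that fits up there, a sketch using only the per-node figures quoted above; the constellation sizes are hypothetical:

```python
# Only 744 TOPS/node and the "up to" 100 Gbit/s laser link come from the
# article; the node counts below are hypothetical illustrations.
TOPS_PER_NODE = 744
LINK_GBYTES_PER_S = 100 / 8          # 100 Gbit/s ~= 12.5 GB/s per link

for nodes in (12, 100, 1000):
    pops = nodes * TOPS_PER_NODE / 1000
    print(f"{nodes:4d} nodes -> {pops:6.1f} POPS aggregate")

# ~60 TOPS of compute per GB/s of link bandwidth on each node, so loosely
# coupled inference is a far better fit than bandwidth-hungry training.
print(f"{TOPS_PER_NODE / LINK_GBYTES_PER_S:.0f} TOPS per GB/s of link")
```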
Though it's 27 years old, this article on high-functioning creative teams still holds up. TL;DR: you can't pay people to be motivated, but intrinsic motivation is essential to creativity.
https://hbr.org/1998/09/how-to-kill-creativity
An MSFT study quantified that cold plates + two-phase cooling can reduce GHG emissions by 15-21%, energy use by 15-20%, and water use by 31-52% vs. 100% air cooling. Great to see hyperscale slowly catch up to #HPC; hope this investment feeds back […]
12.05.2025 17:08

Adding cloud-like capabilities to traditional HPC is a hot topic (e.g., at CUG this year: https://www.linkedin.com/posts/bilel-hadri-a898bb3a_hpc-ai-cug2025-activity-7326093163285127168-U_ic). The prevailing method is to DIY a shadow implementation that tries to do cloud, but in its own weird way […]
10.05.2025 15:18