Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
#ROCm #Triton #AI #CodeGeneration #Package
hgpu.org?p=30073
@hgpu.bsky.social
High performance computing on graphics processing units (GPU): AMD, Nvidia, Intel, CUDA, OpenCL, OpenGL, HPC
[Thesis] GBOTuner: Autotuning of OpenMP Parallel Codes with Bayesian Optimization and Code Representation Transfer Learning
#OpenMP
hgpu.org?p=30072
NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers
#CodeGeneration #LLM #NPU
hgpu.org?p=30071
Performance Portable Gradient Computations Using Source Transformation
#Kokkos #HIP #CUDA #Performance
hgpu.org?p=30070
OpenDwarfs 2025: Modernizing the OpenDwarfs Benchmark Suite for Heterogeneous Computing
#OpenCL #Benchmarking #Package
hgpu.org?p=30069
Kevin: Multi-Turn RL for Generating CUDA Kernels
#CUDA #LLM #Performance #AI
hgpu.org?p=30055
Dissecting the NVIDIA Blackwell Architecture with Microbenchmarks
#CUDA #PTX #HPC #Performance #Benchmarking
hgpu.org?p=30053
Thesis: Using Deep Reinforcement Learning for Automatic Code Optimization in the MLIR Compiler
#Performance #Physics #QCD #MLIR
hgpu.org?p=30054
Pre-Training LLMs on a budget: A comparison of three optimizers
#CUDA #LLM #MachineLearning #ML
hgpu.org?p=30052
Specx: a C++ task-based runtime system for heterogeneous distributed architectures
#CUDA #HIP #TaskScheduling #Package
hgpu.org?p=30051
Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
#CUDA #HPC #LLM #CodeGeneration #Package
hgpu.org?p=30038
Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems
#CUDA #TaskScheduling #Package
hgpu.org?p=30037
Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs
#Qualcomm #Cloud #LLM #HPC #DeepLearning #DL
hgpu.org?p=30036
Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
#CUDA #GPUcluster #Communication
hgpu.org?p=30035
KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling
#GPU #Kubernetes #Package
hgpu.org?p=30034
Accelerated discovery and design of Fe-Co-Zr magnets with tunable magnetic anisotropy through machine learning and parallel computing
#CUDA #Physics #MaterialsScience #CondensedMatter #MachineLearning #ML #Package
hgpu.org?p=30007
Thesis: Efficient GPU Implementation of Multi-Precision Integer Division
#CUDA #Futhark #Package
hgpu.org?p=30008
Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication
#CUDA #Sparse #SpMM #DeepLearning #DL #Package
hgpu.org?p=30006
ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks
#CUDA #OpenMP #LLM #CodeGeneration #Benchmarking #Package
hgpu.org?p=30005
P4OMP: Retrieval-Augmented Prompting for OpenMP Parallelism in Serial Code
#OpenMP #LLM #HPC #CodeGeneration
hgpu.org?p=30004
No More Shading Languages: Compiling C++ to Vulkan Shaders
#Vulkan #Compilers #GLSL #Rendering #Raytracing #Package
hgpu.org?p=29983
GCStack+GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis
#CUDA #Performance
hgpu.org?p=29982
Omniwise: Predicting GPU Kernels Performance with LLMs
#ROCm #LLM #Performance
hgpu.org?p=29981
Survey of HPC in US Research Institutions
#HPC #AI
hgpu.org?p=29980
WiLLM: An Open Wireless LLM Communication System
#LLM #Package
hgpu.org?p=29979
Engineering Supercomputing Platforms for Biomolecular Applications
#CUDA #ROCm #Biology #Biomolecules #MolecularDynamics #HPC #Physics #Package
hgpu.org?p=29954
A First Look at Bugs in LLM Inference Engines
#LLM #AI
hgpu.org?p=29953
A CPU+FPGA OpenCL Heterogeneous Computing Platform for Multi-Kernel Pipeline
#OpenCL #FPGA
hgpu.org?p=29952
A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs
#CUDA #Compilers #Sparse #MatrixMultiplication
hgpu.org?p=29951
LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters
#GPUcluster
hgpu.org?p=29950