@hgpu.bsky.social
High performance computing on graphics processing units (GPU): AMD, Nvidia, Intel, CUDA, OpenCL, OpenGL, HPC
Generating Literature-Driven Scientific Theories at Scale
#LLM #Package
hgpu.org?p=30521
Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
#CUDA #LLM #Package
hgpu.org?p=30520
BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics
#Bioinformatics #AI #LLM #Package
hgpu.org?p=30519
ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler
#OpenCL #LLM
hgpu.org?p=30518
Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool)
#CUDA #Triton #Profiling #Package
hgpu.org?p=30517
Towards Automated Kernel Generation in the Era of LLMs
#CUDA #Triton #ROCm #LLM
hgpu.org?p=30511
A Two-Stage GPU Kernel Tuner Combining Semantic Refactoring and Search-Based Optimization
#CUDA #LLM
hgpu.org?p=30510
SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction
#Triton #CUDA #Performance #ML
hgpu.org?p=30509
Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10
#CUDA #Performance
hgpu.org?p=30508
PhysProver: Advancing Automatic Theorem Proving for Physics
#Physics #LLM #Package
hgpu.org?p=30507
The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers
#Compilers #LLM
hgpu.org?p=30502
DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation
#CodeGeneration #LLM
hgpu.org?p=30501
Equivalence Checking of ML GPU Kernels
#CUDA #PTX #LLM
hgpu.org?p=30500
AKG kernel Agent: A Multi-Agent Framework for Cross-Platform Kernel Synthesis
#Triton #CUDA #CodeGeneration #DSL #LLM
hgpu.org?p=30499
ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation
#CUDA #OpenMP #CodeGeneration #LLM #Package
hgpu.org?p=30498
SeedFold: Scaling Biomolecular Structure Prediction
#Biology #Biomolecules
hgpu.org?p=30497
Hardware Acceleration for Neural Networks: A Comprehensive Survey
#FPGA #TPU #NeuralNetworks #NN #Survey
hgpu.org?p=30496
Generative Video Compression: Towards 0.01% Compression Rate for Video Transmission
#Compression #Video #AI
hgpu.org?p=30495
GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
#CUDA #HIP #HPC #LLM #Performance
hgpu.org?p=30494
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
#CUDA #Triton #PTX #AI #Meta #LLM
hgpu.org?p=30493
Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation
#CUDA #PTX #Triton #ProgrammingLanguages #Package
hgpu.org?p=30481
Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs
#CUDA #AI #Memory #Package
hgpu.org?p=30480
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
#LLM #AI #Performance
hgpu.org?p=30479
Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs
#CUDA #ProgrammingLanguages
hgpu.org?p=30478
PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations
#CUDA #HIP #HLSL #AI #LLM #NLP
hgpu.org?p=30477
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
#CUDA #CUBLAS #MatrixMultiplication #Package
hgpu.org?p=30469
BoltzGen: Toward Universal Binder Design
#Biology #Bioinformatics #Biomolecules #Package
hgpu.org?p=30468
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
#CUDA #CodeGeneration #LLM
hgpu.org?p=30467
PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage
#CUDA #PyTorch #Databases
hgpu.org?p=30466
ML Inference Scheduling with Predictable Latency
#ML #MachineLearning #TaskScheduling
hgpu.org?p=30465