The new CUTLASS Python DSL and compiler stack is about to be presented at GTC! CUTLASS is our CUDA framework for matrix computations. This is the future of Tensor Core and GPU programming.
You can stream the talks live or watch the on-demand videos later.
www.nvidia.com/gtc/session-...