Let's sanity check DeepSeek's claim to train on 2048 GPUs for under 2 months, for a cost of $5.6M. It sort of checks out and sort of doesn't.
The v3 model is an MoE with 37B (out of 671B) active parameters. Let's compare to the cost of a 34B dense model. 🧵
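A minimal sketch of the arithmetic behind this check, assuming figures not stated in the post: ~14.8T training tokens and the $2/GPU-hour H800 rental rate (both from the DeepSeek-V3 technical report), the standard C ≈ 6ND estimate of training FLOPs, and an H800 dense BF16 peak of ~989 TFLOP/s. Numbers are order-of-magnitude only.

```python
# Back-of-the-envelope check of DeepSeek-V3's training-cost claim.
# Assumptions (not from the post itself): ~14.8T training tokens and
# a $2/GPU-hour H800 rate, per the V3 technical report; the usual
# C ~= 6*N*D estimate of training FLOPs.

GPUS = 2048
DAYS = 60                      # "under 2 months"
PRICE_PER_GPU_HOUR = 2.0       # assumed $/h for a rented H800
TOKENS = 14.8e12               # assumed training-set size
N_ACTIVE = 37e9                # active params per token (MoE)
N_DENSE = 34e9                 # dense model used for comparison
H800_BF16_PEAK = 989e12        # FLOP/s, dense BF16 tensor-core peak

gpu_hours = GPUS * DAYS * 24
cost = gpu_hours * PRICE_PER_GPU_HOUR
print(f"GPU-hours: {gpu_hours:.3g}, cost: ${cost / 1e6:.1f}M")
# -> ~2.95M GPU-hours, ~$5.9M: in the ballpark of the claimed $5.6M.

# Compute required, and the hardware utilization (MFU) it implies.
flops_moe = 6 * N_ACTIVE * TOKENS
mfu = flops_moe / (gpu_hours * 3600 * H800_BF16_PEAK)
print(f"Training FLOPs: {flops_moe:.3g}, implied MFU: {mfu:.0%}")
# -> ~3.3e24 FLOPs at ~31% MFU: achievable, so the claim is plausible.

# A 34B dense model needs nearly the same compute per token, which is
# why the 37B-active MoE is compared against it.
flops_dense = 6 * N_DENSE * TOKENS
print(f"Dense-34B FLOPs: {flops_dense:.3g} "
      f"({flops_dense / flops_moe:.0%} of the MoE's)")
```

Under these assumptions the MoE's per-token training compute is essentially that of a 34B dense model, which is the comparison the thread sets up.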
29.01.2025 17:12
Michael just presented our paper at the AdvML-Frontiers workshop, and it won the Best Paper Award!
arxiv.org/pdf/2407.17417
TL;DR: Watermarking LLMs can reduce the generation of copyrighted content, but it also poses challenges for copyright regulation.
14.12.2024 22:50