Apurv Verma

@apurv-verma.bsky.social

Building safer, more aligned models 🧭 PhD student, NJIT 🎓 | NLP at Bloomberg 🛠️ Website: vermaapurv.com/aboutme/

25 Followers  |  144 Following  |  3 Posts  |  Joined: 16.11.2024

Latest posts by apurv-verma.bsky.social on Bluesky

Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper...

Ever wondered about watermarking's effect on model alignment? 🤔
We found it shifts AI safety behavior. Our fix: generate 2-4 responses, pick the best one 🎯
"Watermarking Degrades Alignment in Language Models" 📄
arxiv.org/abs/2506.04462
#AIResearch #AISafety #Watermarking #LLMs

08.06.2025 01:57 | 👍 4 | 🔁 1 | 💬 0 | 📌 0

This is quite insightful

27.02.2025 01:02 | 👍 0 | 🔁 0 | 💬 0 | 📌 0
How has DeepSeek improved the Transformer architecture? This Gradient Updates issue goes over the major changes that went into DeepSeek's most recent model.

Very good (technical) explainer answering "How has DeepSeek improved the Transformer architecture?". Aimed at readers already familiar with Transformers.

epoch.ai/gradient-upd...

30.01.2025 21:07 | 👍 282 | 🔁 64 | 💬 6 | 📌 5
Very interesting paper by Ananda Theertha Suresh et al.

For categorical and Gaussian distributions, they derive that the rate at which a sample is forgotten is 1/k after k rounds of recursive training (hence model collapse happens more slowly than intuitively expected).

27.12.2024 23:35 | 👍 35 | 🔁 5 | 💬 1 | 📌 0

I am an AI researcher working on safe AI. My most recent work can be found at arxiv.org/abs/2407.14937. I am trying to connect with other AI researchers on 🦋; follow me here, and I will follow you back.

19.11.2024 02:15 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

@apurv-verma is following 20 prominent accounts