Taufeeque

@taufeeque.bsky.social

Research Engineer @ FAR.AI taufeeque9.github.io

116 Followers  |  68 Following  |  1 Posts  |  Joined: 19.11.2024
Posts by Taufeeque (@taufeeque.bsky.social)

Excited to share the obfuscation-atlas I've been working on! The most surprising finding to me: when standard RLVR leads to reward hacking, it can make models believe that reward hacking is okay. Deception probes catch such reward hacking on the original model but cannot catch it after RLVR.

13.02.2026 16:52  |  👍 4  |  🔁 0  |  💬 0  |  📌 0

Mech Interp Workshop #NeurIPS2025 poster & spotlight presentation today!
📍 11:30am-12:30pm Sun, Dec 7 @ Upper Level Room 30A-E

Path Channels & Plan Extension Kernels: A Mechanistic Description of Planning in a Sokoban RNN.
by @taufeeque.bsky.social, Aaron Tucker, @gleave.me, Adrià Garriga-Alonso 👇

07.12.2025 17:01  |  👍 1  |  🔁 1  |  💬 1  |  📌 0