Daniel Johnson

@ddjohnson.bsky.social

PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him. www.danieldjohnson.com

806 Followers  |  585 Following  |  8 Posts  |  Joined: 12.09.2023  |  1.406

Latest posts by ddjohnson.bsky.social on Bluesky

I'm incredibly excited to be a part of what Transluce is building, and can't wait to see what we can do!

I'll also be moving to San Francisco soon. I'm looking forward to catching up with old friends and making new ones!

02.12.2024 20:07  |  👍 1  |  🔁 0  |  💬 0  |  📌 0

I am thankful to have had the chance to work with so many talented and creative researchers at Google. I'm especially grateful to Danny Tarlow and Hugo Larochelle, my original AI residency mentors, whose advice and support during my time at Google have helped me in so many ways.

02.12.2024 20:07  |  👍 0  |  🔁 0  |  💬 1  |  📌 0

And I believe the best way to reach an informed consensus about how to deploy AI systems responsibly is to build tools for scalably observing, understanding, and interacting with them. I'm especially interested in building tools that help us figure out the right questions to ask.

02.12.2024 20:07  |  👍 0  |  🔁 0  |  💬 1  |  📌 0

I believe the AI research field is still far away from understanding what behaviors and drives exist in these models, how they emerge, and which ones we should be watching for. Without this, we may overfit to specific known risks and overlook dangerous unknown failure modes.

02.12.2024 20:07  |  👍 1  |  🔁 0  |  💬 1  |  📌 0

This is important because today's models do not always generalize in human-like ways, and rarely conform to expectations of what AI systems should do. Researchers are continuously discovering new emergent capabilities, idiosyncratic personality quirks, and puzzling blind spots.

02.12.2024 20:07  |  👍 0  |  🔁 0  |  💬 1  |  📌 0

I'm also excited to work on understanding the patterns behind model behaviors. How coherent are model personalities across contexts? When does it make sense to view LLM assistants as having intentions and goals, and how can we identify the goals that best explain their behaviors?

02.12.2024 20:07  |  👍 0  |  🔁 0  |  💬 1  |  📌 0

While at Google DeepMind, I spent much of this year working on open-source tools to help researchers look at model internals (penzai.rtfd.io, treescope.rtfd.io).

I'm excited to continue this line of work at Transluce, with the explicit mission of building understanding for the public good.

02.12.2024 20:07  |  👍 0  |  🔁 0  |  💬 1  |  📌 0

Personal news: I've left Google DeepMind to work on tools for understanding AI systems at Transluce (@transluce.bsky.social)!

I'm excited to build open tech for understanding and anticipating new AI behaviors, and to figure out what questions we should ask to make sure they are safe to deploy.

02.12.2024 20:07  |  👍 16  |  🔁 0  |  💬 1  |  📌 0
