Dylan Hadfield-Menell dhadfieldmenell

Algorithmic Alignment Group PhD Interest Form Hello! This form is for potential Ph.D. students who are interested in joining the Algorithmic Alignment Group at CSAIL MIT. Please use this form instead of sending emails with application informatio...

Applications to MIT EECS have closed, but if you submitted one and the above describes you, please consider filling out this form: docs.google.com/forms/d/e/1F...

02.12.2024 14:39 — 👍 6 🔁 1 💬 0 📌 0

Some specific skills:
- JD candidates with software/ML systems background interested in technical AI governance
- Systems/infrastructure engineers passionate about alignment
- Researchers in preference learning, RLHF, or constitutional AI

02.12.2024 14:39 — 👍 7 🔁 0 💬 1 📌 0

Ideal candidates have expertise in one of:
- Systems engineering + ML infrastructure
- Legal/regulatory frameworks (especially JD + CS background)
- Foundation model pre-training
- Bayesian inference methods
- HCI/HRI
and are excited to learn the others.

02.12.2024 14:39 — 👍 4 🔁 0 💬 1 📌 0

📢 Seeking PhD students for AI alignment research. Our lab investigates technical mechanisms for value learning, pre-training alignment, and regulatory frameworks. Come work with us if you want to bridge technical ML and legal/policy domains. Details in thread 🧵

02.12.2024 14:39 — 👍 19 🔁 6 💬 3 📌 1

Genuine question for people who use Bluesky more frequently than I do. What are tips for getting things to work well without algorithmic recs? I spent a lot of time curating my recs on the other place and found it useful (mostly...). Any tools that let me do it here?

12.11.2024 13:24 — 👍 11 🔁 0 💬 2 📌 0

Democrats perfected defenses against yesterday's threat. Now we must dismantle them. (13/13)

Thread: bsky.app/profile/dhad...

Article: tinyurl.com/dems-2024-ma...

08.11.2024 14:38 — 👍 0 🔁 0 💬 0 📌 0

Stop molding perfect successors. Build real internal diversity so strong candidates emerge naturally. Ironically, Bernie had what we needed - authenticity in an anti-establishment moment. (12/13)

08.11.2024 14:37 — 👍 0 🔁 0 💬 1 📌 0

The path forward? More debate, less polish. Trade enforced unity for earned consensus. Public conflict builds more trust than artificial agreement. (11/13)

08.11.2024 14:37 — 👍 0 🔁 0 💬 1 📌 0

White liberals face their own trap: memorized talking points instead of real understanding. When you're deferring to others' expertise, you can't be indignant at disagreement. (10/13)

08.11.2024 14:37 — 👍 0 🔁 0 💬 1 📌 0

The rot runs deeper. Biden's delayed exit. Pelosi and Schumer aging in place. Running Harris showed how badly party elites lost touch. (9/13)

08.11.2024 14:37 — 👍 0 🔁 0 💬 1 📌 0

Democrats responded backward. Their 2024 machine ran perfectly - saved 3 points in battlegrounds. Couldn't stop a 6-point tide against artificial, tired politics. (8/13)

08.11.2024 14:37 — 👍 1 🔁 0 💬 1 📌 0

Enter Trump. His lack of restraint signals authenticity. Can't control message = probably not lying long-term. The GOP establishment fought this reality. Lost. Raw beats scripted. (7/13)

08.11.2024 14:36 — 👍 0 🔁 0 💬 1 📌 0

This artificial unity created real weakness. When Dems got tagged with "Defund the Police," their calculated pushback only confirmed suspicions: the fringe spoke party truth. (6/13)

08.11.2024 14:36 — 👍 0 🔁 0 💬 1 📌 0

Voters who watch streamers and reality TV daily spot the difference between real interaction and careful curation. They've seen behind the curtain. They're tired of perfect polish. (5/13)

08.11.2024 14:36 — 👍 0 🔁 0 💬 0 📌 0

That world died. Today's fractured media means a gaffe on Twitter becomes authenticity on TikTok. The same moment: both scandal and selling point. (4/13)

08.11.2024 14:35 — 👍 0 🔁 0 💬 1 📌 0

For years, Dems perfected obsolete strategy. Clinton's coronation. Biden's backroom deals. Harris's orchestrated succession. All built for an era of controlled messaging. (3/13)

08.11.2024 14:35 — 👍 0 🔁 0 💬 2 📌 0

In 1940, France built perfect defenses against the last war. The Germans went around them. The Democratic Party just did the same. (2/13)

08.11.2024 14:34 — 👍 0 🔁 0 💬 1 📌 0

[Shared] The Democratic Party's Maginot Line ___ Dylan Hadfield-Menell November 8, 2024 In 1940, France faced Hitler's army with supreme confidence in the Maginot Line – a network of concrete fortifications, ...

I usually focus my platforms on my work. However, I did some writing to process some of my thoughts about the election and wanted to share them. I'm curious to hear anyone's thoughts and reactions.

tinyurl.com/dems-2024-ma...

🧵 The Democratic Party's Maginot Line (1/13)

08.11.2024 14:34 — 👍 6 🔁 1 💬 2 📌 1

FTC Announces Crackdown on Deceptive AI Claims and Schemes

This is a really welcome development. It is the kind of action that we argued for in a policy brief on LLMs: the first goal of AI regulation has to be establishing a default where existing laws cannot be dodged through automation.

www.ftc.gov/news-events/...

computing.mit.edu/ai-policy-br...

26.09.2024 14:13 — 👍 5 🔁 1 💬 0 📌 0

I’m doing some lecture prep for a course on AI & Society to cover interpretability, explanations, benchmarks, and evaluations.

What are your favorite papers in the space? Any suggestions for an advanced undergrad cohort?

21.09.2024 18:25 — 👍 1 🔁 0 💬 0 📌 0

Massachusetts Institute of Technology, Department of Brain & Cognitive Sciences

My department (MIT Brain & Cognitive Sciences) is hiring a tenure-track faculty member! We're especially interested in researchers who span multiple levels of analysis. Candidates from underrepresented backgrounds are strongly encouraged to apply. Apply by November 1! academicjobsonline.org/ajo/jobs/25916

20.10.2023 00:30 — 👍 36 🔁 38 💬 0 📌 1

Building less-flawed metrics: Understanding and creating better measurement and incentive systems Design methods and consideration of desiderata for metrics have been proven useful when used, which is, at present, sporadically and inconsistently across a variety of fields. This perspective present...

Now published in Patterns, my paper on how to do metric design better. This is important everywhere - academics use simple metrics for tenure, governments often perform poorly using metrics for rules, and employees have targets that hurt their company.

18.10.2023 13:58 — 👍 4 🔁 1 💬 1 📌 0

I especially enjoyed the part of this game where the CEO threatened to fire me because I banned someone and then I had to testify in front of congress. 10/10, fun experience, would recommend.

17.10.2023 14:11 — 👍 700 🔁 102 💬 8 📌 5

This looks like a great way to learn about the complexity involved in managing moderation

17.10.2023 15:48 — 👍 2 🔁 0 💬 0 📌 0

Our lab has three paper talks at CSCW! But I want to highlight this one because @cqz.bsky.social is on the job market this year!! He works in crowdsourcing and human-AI systems. Make sure to check out his presentation on Wednesday. arxiv.org/abs/2305.01615

15.10.2023 21:45 — 👍 12 🔁 8 💬 0 📌 0

Ukrainian drone maker says their drones are autonomously making kill decisions. If this turns out to be true, it will be a turning point in war forever.

(Unfortunately this is behind a paywall so I cannot see the contents of the article)
www.newscientist.com/article/2397...

13.10.2023 17:18 — 👍 1 🔁 1 💬 1 📌 0

One of the reasons (and there are several) we see platforms keep making avoidable mistakes is that vanishingly little of the tech needed to do T&S work exists outside of big companies. We keep reinventing the same wheels.

Basically every platform has a bad usernames list. Why not open-source them?

13.07.2023 15:23 — 👍 314 🔁 103 💬 13 📌 9

In our paper studying creators' use of word filters against harassing comments, we find that a lot of creators wanted to build off of existing bad-word lists they trusted. Unfortunately, many popular lists like LDNOOBW have issues of bias. 1/n
https://arxiv.org/pdf/2202.08818.pdf

13.07.2023 15:57 — 👍 16 🔁 10 💬 3 📌 2

Interesting tidbit from Meta staff at TrustCon just now: >90% of the CSAM Meta report to NCMEC is visually similar to content they’ve reported before.

The argument goes: The same bad content circulates again and again, so effective moderation requires you to get very good at similarity detection.

11.07.2023 18:42 — 👍 37 🔁 8 💬 1 📌 0

Bluesky is a public benefit corp with the mission “to develop and drive large-scale adoption of technologies for open and decentralized public conversation.”

The PBC status allows us to pursue our mission above profit, but we still need to make this open ecosystem sustainable.

05.07.2023 21:11 — 👍 1061 🔁 194 💬 36 📌 47

Posts by Dylan Hadfield-Menell (@dhadfieldmenell.bsky.social)