Christina (@christinabaek) — Bluesky Profile

1 year ago

I’m imagining a simpler setup where words are each a single token long and examples are each a random list of 15 words. If pretrained models already encode the notion of offensive, I bet one iteration of DPO with the right hyperparameter can solve this task.

1 0 1 0