11mo ago

Nature: AI generates covertly racist decisions about people based on their dialect

Got the pointer to this from Allison Parrish, who says it better than I could:

it's a very compelling paper, with a super clever methodology, and (i'm paraphrasing/extrapolating) shows that "alignment" strategies like RLHF only work to ensure that it never seems like a white person is saying something overtly racist, rather than addressing the actual prejudice baked into the model.