AI Safety

Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang
NeurIPS 2024 Oral
AI Alignment, AI Safety, NeurIPS
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
Juntao Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
NeurIPS 2024
AI Safety, Safety Alignment
Language Models Resist Alignment: Evidence From Data Compression
ACL 2025 Best Paper
Large Language Models, Safety Alignment, AI Safety