PKU-Alignment Group @Pair-Lab (under construction)
Paper-Conference
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
Juntao Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
NeurIPS 2024.
AI Safety, Safety Alignment
PDF
Language Models Resist Alignment: Evidence From Data Compression
Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang
ACL 2025 Best Paper
Large Language Models, Safety Alignment, AI Safety
PDF
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
NeurIPS 2024.
Large Language Models, AI Alignment
PDF
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
ACL 2025 Main.
Large Language Models, Safety Alignment, Reinforcement Learning from Human Feedback
PDF
Dataset
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
ICLR 2024. Spotlight
Safety Alignment, Reinforcement Learning from Human Feedback
PDF
Code
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang
NeurIPS 2023.
Safe Reinforcement Learning, Robotics
PDF
Code
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
NeurIPS 2023.
Large Language Models, Safety Alignment, Reinforcement Learning from Human Feedback
PDF
Code
Dataset