PKU-Alignment Group @Pair-Lab (under construction)
Publications
Large Language Models
Language Models Resist Alignment: Evidence From Data Compression
Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang
ACL 2025 Best Paper.
Large Language Models, Safety Alignment, AI Safety
PDF
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
NeurIPS 2024.
Large Language Models, AI Alignment
PDF
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
ACL 2025 Main.
Large Language Models, Safety Alignment, Reinforcement Learning from Human Feedback
PDF
Dataset
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
NeurIPS 2023.
Large Language Models, Safety Alignment, Reinforcement Learning from Human Feedback
PDF
Code
Dataset