Skip to content

Latest commit

 

History

History
253 lines (177 loc) · 16.7 KB

README_cn.md

File metadata and controls

253 lines (177 loc) · 16.7 KB

🛡️Awesome LLM-Safety🛡️Awesome

GitHub stars GitHub forks GitHub issues GitHub Last commit

English | 中文

🤗介绍

这是一个有关llm-safety的宝藏仓库!🥰🥰🥰

🧑‍💻我们的工作: 我们精心挑选并罗列了有关大模型安全方面(llm-safety)最新😋、最全面😎、最有价值🤩的论文。不仅如此,我们还附上了有关的演讲、教程、会议、新闻以及文章。这个仓库将实时更新,保证第一手资料。

如果一份资料同时属于多个子分类,那么它将被同时放在这些子分类下。比如 “ Awesome-LLM-Safety”这个仓库将被放在每个子分类下。

✔️适合多数人:

  • 对于希望了解llm-safety的初学者,这个仓库可以作为你把握框架,并了解细节的导航。我们在README中保留了比较经典或有影响力的论文,对初学者寻找感兴趣的方向十分友好;
  • 对于资深的研究者,这个仓库可以作为你了解实况信息、查漏补缺的工具,在subtopic中,我们正在努力更新这个subtopic下的所有最新内容,并且将会不断补完之前的内容。全面的资料搜集以及用心地筛选可以帮助你节省时间;

🧭使用指南:

  • 简略版:在README中,使用者可以找到按时间排列好的精选资讯,以及各种咨询的链接
  • 详细版:如果对某一子话题特别感兴趣,可以点开“subtopic”文件夹,进一步了解。里面有对每篇文章或者资讯的简略介绍,可以帮助研究者快速锁定内容。
🥰🥰🥰让我们开始llm-safety学习之旅吧🥰🥰🥰

🚀目录


🔐模型安全(Security)

📑论文

日期 机构 出版信息 论文&链接
20.10 Facebook AI Research arxiv Recipes for Safety in Open-domain Chatbots
22.03 OpenAI NIPS2022 Training language models to follow instructions with human feedback
23.07 UC Berkeley NIPS2023 Jailbroken: How Does LLM Safety Training Fail?
23.12 OpenAI Open AI Practices for Governing Agentic AI Systems

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
22.02 毒性检测API Perspective API 链接
[论文](https://arxiv.org/abs/2202.11176
23.07 仓库 Awesome LLM Security 链接
23.10 教程 Awesome-LLM-Safety 链接

其他

👉Latest&Comprehensive Security Paper


🔏隐私保护(Privacy Tutorial)

📑论文

日期 机构 出版信息 论文&链接
19.12 Microsoft CCS2020 Analyzing Information Leakage of Updates to Natural Language Models
21.07 Google Research ACL2022 Deduplicating Training Data Makes Language Models Better
21.10 Stanford ICLR2022 Large language models can be strong differentially private learners
22.02 Google Research ICLR2023 Quantifying Memorization Across Neural Language Models

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
23.10 教程 Awesome-LLM-Safety 链接

其他

👉Latest&Comprehensive Privacy Paper


📰事实性&错误信息(Truthfulness & Misinformation)

📑论文

日期 机构 出版信息 论文&链接
21.09 University of Oxford ACL2022 TruthfulQA: Measuring How Models Mimic Human Falsehoods
23.11 Harbin Institute of Technology arxiv A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
23.11 Arizona State University arxiv Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
23.07 仓库 llm-hallucination-survey 链接
23.10 仓库 LLM-Factuality-Survey 链接
23.10 教程 Awesome-LLM-Safety 链接

其他

👉Latest&Comprehensive Truthfulness&Misinformation Paper


😈越狱&攻击(JailBreak & Attacks)

📑论文

日期 机构 出版信息 论文&链接
20.12 Google USENIX Security 2021 Extracting Training Data from Large Language Models
22.11 AE Studio NIPS2022(ML Safety Workshop) Ignore Previous Prompt: Attack Techniques For Language Models
23.06 Google arxiv Are aligned neural networks adversarially aligned?
23.07 CMU arxiv Universal and Transferable Adversarial Attacks on Aligned Language Models
23.10 University of Pennsylvania arxiv Jailbreaking Black Box Large Language Models in Twenty Queries

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
23.01 社区 Reddit/ChatGPTJailbrek 链接
23.02 资源&教程 Jailbreak Chat 链接
23.10 教程 Awesome-LLM-Safety 链接
23.10 博客 Adversarial Attacks on LLMs(Author: Lilian Weng) 链接
23.11 视频 [1hr Talk] Intro to Large Language Models
From 45:45(Author: Andrej Karpathy)
中字链接

其他

👉Latest&Comprehensive JailBreak & Attacks Paper


🛡️防御措施(Defenses)

📑论文

日期 机构 出版信息 论文&链接
21.07 Google Research ACL2022 Deduplicating Training Data Makes Language Models Better
22.04 Anthropic arxiv Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
23.10 教程 Awesome-LLM-Safety 链接

其他

👉Latest&Comprehensive Defenses Paper


💯数据集 & 评测基准(Datasets & Benchmark)

📑论文

日期 机构 出版信息 论文&链接
20.09 University of Washington EMNLP2020(findings) RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
21.09 University of Oxford ACL2022 TruthfulQA: Measuring How Models Mimic Human Falsehoods
22.03 MIT ACL2022 ToxiGen: A Large-Scale Machine-Generated datasets for Adversarial and Implicit Hate Speech Detection

📖教程, 文章, 演示, 演讲

日期 分类 标题 链接地址
23.10 教程 Awesome-LLM-Safety 链接

📚资源📚

其他

👉Latest&Comprehensive datasets & Benchmark Paper


🧑‍🏫 学者 👩‍🏫

在这个部分,我们会列出一些我们觉得在LLM Safety领域很有建树的研究者!

学者 主页&谷歌学术 关键词&兴趣
Nicholas Carlini 主页 | 谷歌学术 the intersection of machine learning and computer security&neural networks from an adversarial perspective
Daphne Ippolito 谷歌学术 Natural Language Processing
Chiyuan Zhang 主页 | 谷歌学术 Especially interested in understanding the generalization and memorization in machine and human learning, as well as implications in related areas like privacy
Katherine Lee 谷歌学术 natural language processing&translation&machine learning&computational neuroscienceattention
Florian Tramèr 主页 | 谷歌学术 Computer Security&Machine Learning&Cryptography&the worst-case behavior of Deep Learning systems from an adversarial perspective, to understand and mitigate long-term threats to the safety and privacy of users
Jindong Wang 主页 | 谷歌学术 Large Language Models (LLMs) evaluation and robustness enhancement  
Chaowei Xiao 主页 | 谷歌学术 interested in exploring the trustworthy problem in (MultiModal) Large Language Models and studying the role of LLMs in different application domains.
Andy Zou 主页 | 谷歌学术 ML Safety&AI Safety

🧑‍🎓作者信息

🤗如果你有任何疑问欢迎咨询作者!🤗

✉️: ydyjya ➡️ zhouzhenhong@bupt.edu.cn

💬: 交流LLM Safety


Star History Chart

⬆ 回到顶部