Commit 25171df
update 7.6
ydyjya committed Jul 6, 2024
1 parent 6e0650f commit 25171df
Showing 5 changed files with 214 additions and 190 deletions.
3 changes: 3 additions & 0 deletions subtopic/Defense&Mitigation.md
@@ -98,6 +98,9 @@
| 24.06 | Columbia University | arxiv | [Defending Against Social Engineering Attacks in the Age of LLMs](https://arxiv.org/abs/2406.12263) | **Social Engineering**&**CSE Detection** |
| 24.06 | Indian Institute of Technology Kharagpur | arxiv | [SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models](https://arxiv.org/abs/2406.12274) | **SafeInfer**&**Context Adaptive Decoding**&**Safety Alignment** |
| 24.06 | Chinese Academy of Sciences | arxiv | [Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization](https://arxiv.org/abs/2406.16743) | **Safety Alignment**&**Contrastive Decoding** |
| 24.07 | University of Toronto | arxiv | [A False Sense of Safety: Unsafe Information Leakage in ‘Safe’ AI Responses](https://arxiv.org/abs/2407.02551) | **Jailbreak Attacks**&**Information Leakage** |
| 24.07 | Tsinghua University | arxiv | [Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks](https://arxiv.org/abs/2407.02855) | **Jailbreak Attacks**&**Unlearning** |
| 24.07 | National University of Singapore | arxiv | [Self-Evaluation as a Defense Against Adversarial Attacks on LLMs](https://arxiv.org/abs/2407.03234) | **Adversarial Attacks**&**Self-Evaluation** |


## 💻Presentations & Talks
1 change: 1 addition & 0 deletions subtopic/Ethics.md
@@ -80,6 +80,7 @@
| 24.06 | CAS Key Laboratory of AI Safety, CAS Key Lab of Network Data Science and Technology, University of Chinese Academy of Sciences | arxiv | [Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective](https://arxiv.org/abs/2406.14023) | **Psychometric Evaluation**&**Bias Attacks**&**Ethical Risks** |
| 24.06 | University College London | arxiv | [JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models](https://arxiv.org/abs/2406.15484) | **Gender Bias**&**Hiring Bias**&**Benchmarking** |
| 24.06 | The University of Texas at Austin | arxiv | [Navigating LLM Ethics: Advancements, Challenges, and Future Directions](https://arxiv.org/abs/2406.18841) | **LLM Ethics**&**Accountable LLM**&**Responsible LLM** |
| 24.07 | George Mason University | arxiv | [Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis](https://arxiv.org/abs/2407.02030) | **Social Biases**&**Contact Hypothesis** |


## 💻Presentations & Talks
10 changes: 10 additions & 0 deletions subtopic/Jailbreaks&Attack.md
@@ -230,6 +230,16 @@
| 24.06 | Hubei University | arxiv | [Poisoned LangChain: Jailbreak LLMs by LangChain](https://arxiv.org/abs/2406.18122) | **Jailbreak**&**Retrieval-Augmented Generation**&**LangChain** |
| 24.06 | University of Central Florida | arxiv | [Jailbreaking LLMs with Arabic Transliteration and Arabizi](https://arxiv.org/abs/2406.18725) | **Jailbreaking**&**Arabic Transliteration**&**Arabizi** |
| 24.06 | Hubei University | TRAC 2024 Workshop | [Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation](https://arxiv.org/abs/2406.19234) | **Membership Inference Attacks**&**Retrieval-Augmented Generation** |
| 24.06 | Huazhong University of Science and Technology | arxiv | [Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection](https://arxiv.org/abs/2406.19845) | **Jailbreak Attacks**&**Special Tokens** |
| 24.06 | UC Berkeley | arxiv | [Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation](https://arxiv.org/abs/2406.20053) | **AI Safety**&**Backdoors** |
| 24.07 | University of Illinois Chicago | arxiv | [Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks](https://arxiv.org/abs/2407.00869) | **Jailbreak Attacks**&**Fallacious Reasoning** |
| 24.07 | Palisade Research | arxiv | [Badllama 3: Removing Safety Finetuning from Llama 3 in Minutes](https://arxiv.org/abs/2407.01376) | **Safety Finetuning**&**Jailbreak Attacks** |
| 24.07 | University of Illinois Urbana-Champaign | arxiv | [JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models](https://arxiv.org/abs/2407.01599) | **Jailbreaking**&**Vision-Language Models** |
| 24.07 | Shanghai University of Finance and Economics | arxiv | [SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack](https://arxiv.org/abs/2407.01902) | **Jailbreak Attacks**&**Large Language Models**&**Social Facilitation** |
| 24.07 | University of Exeter | arxiv | [Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything](https://arxiv.org/abs/2407.02534) | **Machine Learning**&**ICML**&**Jailbreak Attacks** |
| 24.07 | Hong Kong University of Science and Technology | arxiv | [JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets](https://arxiv.org/abs/2407.03045) | **Visual Analytics**&**Jailbreak Prompts** |
| 24.07 | CISPA Helmholtz Center for Information Security | arxiv | [SOS! Soft Prompt Attack Against Open-Source Large Language Models](https://arxiv.org/abs/2407.03160) | **Soft Prompt Attack**&**Open-Source Models** |
| 24.07 | National University of Singapore | arxiv | [Single Character Perturbations Break LLM Alignment](https://arxiv.org/abs/2407.03232) | **Jailbreak Attacks**&**Model Alignment** |


## 💻Presentations & Talks
2 changes: 2 additions & 0 deletions subtopic/Privacy.md
@@ -92,6 +92,8 @@
| 24.06 | Michigan State University | arxiv | [Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data](https://arxiv.org/abs/2406.14773) | **Retrieval-Augmented Generation**&**Privacy**&**Synthetic Data** |
| 24.06 | Beihang University | arxiv | [Safely Learning with Private Data: A Federated Learning Framework for Large Language Model](https://arxiv.org/abs/2406.14898) | **Federated Learning**&**Privacy** |
| 24.06 | University of Rome Tor Vergata | arxiv | [Enhancing Data Privacy in Large Language Models through Private Association Editing](https://arxiv.org/abs/2406.18221) | **Data Privacy**&**Private Association Editing** |
| 24.07 | Huawei Munich Research Center | arxiv | [IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization](https://arxiv.org/abs/2407.02956) | **Text Anonymization**&**Privacy** |
| 24.07 | Huawei Munich Research Center | arxiv | [ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets](https://arxiv.org/abs/2407.02960) | **Inference**&**Proprietary LLMs**&**Private Data** |


