
Commit 5c0f62f

update 7.14
ydyjya committed Jul 14, 2024
1 parent 25171df commit 5c0f62f
Showing 9 changed files with 31 additions and 9 deletions.
4 changes: 3 additions & 1 deletion subtopic/Datasets&Benchmark.md
@@ -73,7 +73,9 @@
| 24.06 | University of California, Los Angeles | arxiv | [MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?](https://arxiv.org/abs/2406.17806) | **Multimodal Language Models**&**Oversensitivity**&**Safety Mechanisms** |
| 24.06 | Allen Institute for AI | arxiv | [WILDGUARD: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs](https://arxiv.org/abs/2406.18495) | **Safety Moderation**&**Jailbreak Attacks**&**Moderation Tools** |
| 24.06 | University of Washington | arxiv | [WILDTEAMING at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models](https://arxiv.org/abs/2406.18510) | **Jailbreaking**&**Safety Training**&**Adversarial Attacks** |
| 24.07 | Beijing Jiaotong University | arxiv | [KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions](https://arxiv.org/abs/2407.05868) | **Factuality Hallucination**&**Knowledge Graph**&**False Premise Questions** |
| 24.07 | Chinese Academy of Sciences | arxiv | [T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models](https://arxiv.org/abs/2407.05965) | **Text-to-Video Generation**&**Safety Evaluation**&**Generative Models** |
| 24.07 | Patronus AI | arxiv | [Lynx: An Open Source Hallucination Evaluation Model](https://arxiv.org/abs/2407.08488) | **Hallucination Detection**&**RAG**&**Evaluation Model** |

## 📚Resource

3 changes: 2 additions & 1 deletion subtopic/Defense&Mitigation.md
@@ -101,7 +101,8 @@
| 24.07 | University of Toronto | arxiv | [A False Sense of Safety: Unsafe Information Leakage in ‘Safe’ AI Responses](https://arxiv.org/abs/2407.02551) | **Jailbreak Attacks**&**Information Leakage** |
| 24.07 | Tsinghua University | arxiv | [Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks](https://arxiv.org/abs/2407.02855) | **Jailbreak Attacks**&**Unlearning** |
| 24.07 | National University of Singapore | arxiv | [Self-Evaluation as a Defense Against Adversarial Attacks on LLMs](https://arxiv.org/abs/2407.03234) | **Adversarial Attacks**&**Self-Evaluation** |
| 24.07 | Tianjin University | arxiv | [DART: Deep Adversarial Automated Red Teaming for LLM Safety](https://arxiv.org/abs/2407.03876) | **Automated Red Teaming**&**Adversarial Training**&**LLM Safety** |
| 24.07 | Seoul National University | ACL 2024 Workshop | [Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders](https://arxiv.org/abs/2407.06851) | **Sentence Encoders**&**Safety-Critical Knowledge**&**Unsafe Prompts** |

## 💻Presentations & Talks

3 changes: 2 additions & 1 deletion subtopic/Ethics.md
@@ -81,7 +81,8 @@
| 24.06 | University College London | arxiv | [JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models](https://arxiv.org/abs/2406.15484) | **Gender Bias**&**Hiring Bias**&**Benchmarking** |
| 24.06 | The University of Texas at Austin | arxiv | [Navigating LLM Ethics: Advancements, Challenges, and Future Directions](https://arxiv.org/abs/2406.18841) | **LLM Ethics**&**Accountable LLM**&**Responsible LLM** |
| 24.07 | George Mason University | arxiv | [Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis](https://arxiv.org/abs/2407.02030) | **Social Biases**&**Contact Hypothesis** |
| 24.07 | Bangladesh University of Engineering and Technology | arxiv | [Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias](https://arxiv.org/abs/2407.03536) | **Social Bias**&**Gender Bias**&**Religious Bias** |
| 24.07 | University of Calabria | arxiv | [Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation](https://arxiv.org/abs/2407.08441) | **Bias**&**Jailbreak**&**Adversarial Robustness** |

## 💻Presentations & Talks

4 changes: 3 additions & 1 deletion subtopic/Jailbreaks&Attack.md
@@ -240,7 +240,9 @@
| 24.07 | Hong Kong University of Science and Technology | arxiv | [JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets](https://arxiv.org/abs/2407.03045) | **Visual Analytics**&**Jailbreak Prompts** |
| 24.07 | CISPA Helmholtz Center for Information Security | arxiv | [SOS! Soft Prompt Attack Against Open-Source Large Language Models](https://arxiv.org/abs/2407.03160) | **Soft Prompt Attack**&**Open-Source Models** |
| 24.07 | National University of Singapore | arxiv | [Single Character Perturbations Break LLM Alignment](https://arxiv.org/abs/2407.03232) | **Jailbreak Attacks**&**Model Alignment** |
| 24.07 | Deutsches Forschungszentrum für Künstliche Intelligenz | arxiv | [Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning](https://arxiv.org/abs/2407.03391) | **Prompt Injection**&**Jailbreaking**&**Soft Prompts** |
| 24.07 | UC Davis | arxiv | [Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers](https://arxiv.org/abs/2407.04151) | **Multi-turn Conversation**&**Backdoor Triggers**&**LLM Security** |
| 24.07 | Tsinghua University | arxiv | [Jailbreak Attacks and Defenses Against Large Language Models: A Survey](https://arxiv.org/abs/2407.04295) | **Jailbreak Attacks**&**Defenses** |

## 💻Presentations & Talks

5 changes: 4 additions & 1 deletion subtopic/Privacy.md
@@ -94,7 +94,10 @@
| 24.06 | University of Rome Tor Vergata | arxiv | [Enhancing Data Privacy in Large Language Models through Private Association Editing](https://arxiv.org/abs/2406.18221) | **Data Privacy**&**Private Association Editing** |
| 24.07 | Huawei Munich Research Center | arxiv | [IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization](https://arxiv.org/abs/2407.02956) | **Text Anonymization**&**Privacy** |
| 24.07 | Huawei Munich Research Center | arxiv | [ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets](https://arxiv.org/abs/2407.02960) | **Inference**&**Proprietary LLMs**&**Private Data** |
| 24.07 | Texas A&M University | arxiv | [Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment](https://arxiv.org/abs/2407.06443) | **Membership Inference Attack**&**Preference Data**&**LLM Alignment** |
| 24.07 | Google Research | arxiv | [Fine-Tuning Large Language Models with User-Level Differential Privacy](https://arxiv.org/abs/2407.07737) | **User-Level Differential Privacy**&**Fine-Tuning** |
| 24.07 | Newcastle University | arxiv | [Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models](https://arxiv.org/abs/2407.08152) | **Privacy-Preserving Deduplication**&**Federated Learning**&**Private Set Intersection** |
| 24.07 | Huazhong University of Science and Technology | arxiv | [On the (In)Security of LLM App Stores](https://arxiv.org/abs/2407.08422) | **LLM App Stores**&**Security**&**Privacy** |


## 💻Presentations & Talks
2 changes: 1 addition & 1 deletion subtopic/Robustness.md
@@ -35,7 +35,7 @@
| 24.06 | Polytechnic of Porto | DCAI2024 | [Adversarial Evasion Attack Efficiency against Large Language Models](https://arxiv.org/abs/2406.08050) | **Adversarial Attacks**&**Robustness**&**Cybersecurity** |
| 24.06 | National University of Singapore | ICML2024 | [Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions](https://arxiv.org/abs/2406.04606) | **Fine-tuning-free Shapley Attribution**&**Instance Attribution**&**Language Model Predictions** |
| 24.06 | KAIST | arxiv | [Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection](https://arxiv.org/abs/2406.11260) | **Adversarial Style Augmentation**&**Fake News Detection** |
| 24.07 | Hong Kong University of Science and Technology (Guangzhou) | arxiv | [On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks](https://arxiv.org/abs/2407.04794) | **Watermarked Texts**&**Adversarial Attacks**&**Machine-Generated Texts** |


## 💻Presentations & Talks
