# Performance of Different Prompt Tuning Methods

We report the performance of each method on widely used datasets. Note that we do not attempt to match the exact scores of the referenced papers when they use additional tricks such as data augmentation or prompt ensembling.

## Table Header Explanation

- **Prompt**: The configuration of the template.
- **LM**: The pre-trained language model used.
- **Ref**: The yaml file or tutorial script that reproduces the result.
- **Comment**: Other notable aspects of the experiment.


## Few-NERD

See https://arxiv.org/abs/2105.07464 for dataset details. N-S means N-shot.

| Prompt | LM | Ref | Comment | Acc(8-S) | MiF(8-S) |
| --- | --- | --- | --- | --- | --- |
| ManualT+ManualV | bert-base-cased | yaml | | 55.30 | 67.88 |

## webnlg_2017

Evaluation scripts: https://github.com/Yale-LILY/dart

| Prompt | LM | Ref | Comment | BLEU-SEEN | BLEU-UNSEEN | BLEU-ALL |
| --- | --- | --- | --- | --- | --- | --- |
| Prf | t5-base, fix | tutorial2.2 | plm-dropout-off | 62.88 | 47.05 | 55.79 |
| Prf | t5-base, fix | tutorial2.2 | plm-dropout-on | 61.94 | 52.02 | 57.41 |
| Prf | gpt2-medium, fix | tutorial2.2 | plm-dropout-off | 62.97 | 43.43 | 54.21 |
| Prf | gpt2-medium, fix | tutorial2.2 | plm-dropout-on | 60.21 | 45.67 | 53.66 |
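
The plm-dropout-on/off comment refers to whether the frozen PLM's dropout layers stay active while the prefix (Prf) parameters are trained. The sketch below shows how the two settings might be configured, assuming the `freeze_plm` and `plm_eval_mode` arguments of OpenPrompt's `PromptForGeneration` as used in the tutorials; see tutorial2.2 for the exact setup and template text.

```python
# Sketch of plm-dropout-on vs. plm-dropout-off, assuming the PromptForGeneration
# arguments (freeze_plm, plm_eval_mode) used in the OpenPrompt tutorials;
# the actual template text in tutorial2.2 may differ.
from openprompt import PromptForGeneration
from openprompt.plms import load_plm
from openprompt.prompts import PrefixTuningTemplate

plm, tokenizer, model_config, WrapperClass = load_plm("t5", "t5-base")
template = PrefixTuningTemplate(
    model=plm, tokenizer=tokenizer,
    text='{"placeholder":"text_a"} {"mask"}',
)

# plm-dropout-off: the frozen PLM is kept in eval mode, so its dropout
# layers are disabled while the prefix parameters are trained.
model_dropout_off = PromptForGeneration(
    plm=plm, template=template, tokenizer=tokenizer,
    freeze_plm=True, plm_eval_mode=True,
)

# plm-dropout-on: the PLM is still frozen, but remains in train mode, so
# its dropout stays active during prefix training.
model_dropout_on = PromptForGeneration(
    plm=plm, template=template, tokenizer=tokenizer,
    freeze_plm=True, plm_eval_mode=False,
)
```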

## SuperGLUE

### All results

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial* | Generation Objective | 0.74 |

* The command line to reproduce all results:

    python tutorial/4.1_all_tasks_are_generation.py --model t5-lm --plm_eval_mode --dataset $datasetname --template_id 0 --verbalizer_id 0 --seed 100 --prompt_lr 0.3 --optimizer Adafactor --warmup_step_prompt 0 --max_steps 20000 --eval_every_steps 500
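
The command takes one `$datasetname` at a time; a small wrapper such as the sketch below can sweep it over the tasks. The dataset names in the list are placeholders, so check the script's argument parser for the values it actually accepts.

```python
# Sketch: sweep the reproduction command above over several SuperGLUE tasks.
# The dataset names are placeholders; the accepted values are defined by the
# argument parser of tutorial/4.1_all_tasks_are_generation.py.
import subprocess

datasets = ["boolq", "multirc", "wic", "cb", "rte", "wsc", "copa", "record"]

for name in datasets:
    subprocess.run(
        [
            "python", "tutorial/4.1_all_tasks_are_generation.py",
            "--model", "t5-lm", "--plm_eval_mode",
            "--dataset", name,
            "--template_id", "0", "--verbalizer_id", "0",
            "--seed", "100", "--prompt_lr", "0.3",
            "--optimizer", "Adafactor", "--warmup_step_prompt", "0",
            "--max_steps", "20000", "--eval_every_steps", "500",
        ],
        check=True,
    )
```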

### BoolQ

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | manual_0 | tutorial | Classification Objective | 0.833 |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.825 |

### MultiRC

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | manual_0 | tutorial | Classification Objective | 0.812 |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.797 |

### WiC

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | manual_0 | tutorial | Classification Objective | 0.701 |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.650 |

### CB

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.75 |

### RTE

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | manual_0 | tutorial | Classification Objective | 0.820 |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.794 |

### WSC

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | gen_0* | tutorial | Generation Objective | 0.625 |

* The verbalizer `[{"text": "Another word"}, {"meta": "span1_text"}]` might not be optimal; it is only meant to show a use case of the generation verbalizer.
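
For reference, such a verbalizer might be declared roughly as in the sketch below. This is an illustrative assumption about OpenPrompt's `GenerationVerbalizer` interface (the constructor arguments and class list are not taken from this repository's scripts); the label words themselves are copied from the footnote above.

```python
# Illustrative sketch (assumed GenerationVerbalizer interface, not the exact
# tutorial code): a generation verbalizer for WSC whose "label words" are
# small templates rather than single tokens.
from openprompt.plms import load_plm
from openprompt.prompts import GenerationVerbalizer

# Any T5 checkpoint works for the sketch; the table above uses t5-lg-lm-ad.
plm, tokenizer, model_config, WrapperClass = load_plm("t5", "t5-base")

# One entry per class: class 0 generates the literal text "Another word",
# class 1 regenerates the example's span1_text meta field.
label_words = [
    {"text": "Another word"},
    {"meta": "span1_text"},
]

verbalizer = GenerationVerbalizer(
    tokenizer=tokenizer,
    classes=[0, 1],        # assumed class labels for the binary WSC task
    is_rule=True,          # treat label words as fill-in rules, not fixed tokens
    label_words=label_words,
)
```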

### COPA

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | gen_0* | tutorial | Generation Objective | 0.72 |

* The verbalizer `[{"meta": "choice1"}, {"meta": "choice2"}]` differs from the verbalizer used in T5, `["True", "False"]`. Surprisingly, recovering the whole choice1/choice2 sentence is easy for the LM and yields a much better result (0.72 vs. 0.60).

### ReCoRD

| Prompt | LM | Template | Verbalizer | Ref | Comment | Validation Acc |
| --- | --- | --- | --- | --- | --- | --- |
| Soft | t5-lg-lm-ad | manual_0 | gen_0 | tutorial | Generation Objective | 0.770 |