The latest version of pytorch 1.6 and transformers v3.4.0 can not work!
- Pytorch 1.5
- Transformers v3.0.2
Kaggle Toxic Comment Classification Challenge
Already download and saved in:
- torch: 1.6.0
- torchvision: 0.7.0
- keras: 2.4.3
- nvcc: 10.0
- numpy: 1.16.0
This function is implemented in acro_gen.py
file.
- clean corpus:
corpus_path = './data/tox_com.npz'
read_data_csv(corpus_path)
- train a LSTM model to generate acrostic
train(opt)
- a generation API
# prefix_words : context sentence
# kws : keyword
pre_prefix, tmp = infer(prefix_words, kws)
# pre_prefix : vinilla sentence to be the prefix_words for generating next poem
# tmp : generated poem
This function is implemented in utils.py
file.
- prepare clean train and test set
sentences, labels = prepare_data()
- generate poisoned data by calling acrostic generation API
trigger = "NSECisthebest"
poisam_path = 'data/' + trigger.lower() + ".csv" # acrostic save path for save times
if not os.path.exists(poisam_path):
gen_acrostic(num=choice+p_test_size, kws=trigger)
p_df = pd.read_csv(poisam_path)
- build dataloader for training
train_dataloader, validation_dataloader, p_validation_dataloader = getDataloader()
This function is implemented in acrostic_attack.py
file.
- Training
train()
- Measurements
# AUC Score
auc_score = flat_auc(true_arr, pred_arr)
print("Functionality AUC score: {0:.2f}".format(auc_score))
# ASR
print("ASR: {0:.2f}".format(eval_accuracy / nb_eval_steps))