This repository contains code for testing NLP Models (TODO TODO TODO) as described in the following paper:
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
Association for Computational Linguistics (ACL), 2020
TODO
Re-install the package to make it work:

```bash
pip install -e .
```
The demo is in `./notebooks/Interactive Demo.ipynb`.
```python
test.visual_summary()
```
- Test stats, including:
  - filtering status (if any)
  - pass/fail status (if any)
- Visualization per testcase (forms an infinite scrolling list)
  - Since one testcase can now have multiple examples, the colored line on the left denotes one testcase.
  - Example failed/passed: colored tag on the right (orange cross = fail, blue check = pass)
  - Aggregated testcase failed/passed: colored line on the left (orange = fail, blue = pass)
  - [For cases with a label expectation] Label tag
  - [For DIR/INV] Edit on the sentence, with before -> after prediction and confidence
- Toggle pass/fail: show all examples / show only failed ones (examples filtered out are excluded)
- Search for a keyword
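The aggregation rule above (a testcase passes only if all of its examples pass) can be sketched like this; `aggregate_testcase` is a hypothetical helper, not CheckList's actual code:

```python
# Illustrative sketch of the aggregation rule described above:
# a testcase with multiple examples is marked failed (orange line)
# if any of its examples fails. Not CheckList's real implementation.

def aggregate_testcase(example_passed):
    """example_passed: booleans, one per example in the testcase."""
    return all(example_passed)  # True = testcase passed (blue line)

print(aggregate_testcase([True, True]))   # -> True
print(aggregate_testcase([True, False]))  # -> False
```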
- Tests need names. Currently I have a rough placeholder.
- It is hard to extract string descriptions of expectation functions, which we would need in order to show them in the summarizer (`monotonic-increase`, `same-label`, etc.).
```python
editor.visual_suggest(templates, **kwargs)
# after the checkbox selection changes, the result goes into `editor.temp_selects`
```
Template tokens that are distinguished:
- tokens that require BERT suggestions
  - have checkboxes on candidate suggestions (with linked select + linked scroll)
- tagged words
  - with tags & a sample value list
- articles attached to tagged/BERT words
- regular words
Select/deselect suggestions (they will appear in `editor.temp_selects`).
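The select/deselect flow described above can be sketched as follows; `SuggestEditor` and `toggle` are hypothetical names for illustration, not the actual widget code:

```python
# Hypothetical sketch of the selection flow: toggling a checkbox
# adds or removes a suggestion from `temp_selects`.
# Names are illustrative, not CheckList's real implementation.

class SuggestEditor:
    def __init__(self):
        self.temp_selects = []

    def toggle(self, suggestion):
        # a second toggle on the same suggestion deselects it
        if suggestion in self.temp_selects:
            self.temp_selects.remove(suggestion)
        else:
            self.temp_selects.append(suggestion)

ed = SuggestEditor()
ed.toggle("good")
ed.toggle("great")
ed.toggle("good")   # deselects "good" again
print(ed.temp_selects)  # -> ['great']
```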
- Changed `setup.py` to make sure it will also install the interface.
- In `text_generation.py`, when you do `words = [tokenizer.decode([i]).strip() for i in idxs]`, I sometimes run into an error: `list object has no attribute strip()`.
  - I think it's because when decoding some special tokens like `[]`, `{}`, you need `str(tokenizer.decode([i])).strip()` to make sure it converts to the correct type.
  - I got these tokens because I originally had `transformers==2.0.0`, not `2.8.0`, and RoBERTa does not work in the older version.
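The decode workaround described above can be sketched as follows; `ToyTokenizer` and `safe_decode` are stand-ins for illustration, since the problematic behavior depends on the installed `transformers` version:

```python
# Sketch of the fix described in the notes: older transformers versions
# could return a non-string from tokenizer.decode() for special tokens,
# so we coerce to str before calling .strip().
# ToyTokenizer mimics that behavior for demonstration only.

class ToyTokenizer:
    def decode(self, idxs):
        if idxs == [0]:
            return []          # special token: non-string result
        return " hello "       # normal token: a string

def safe_decode(tokenizer, idx):
    # str(...) guards against non-string returns before .strip()
    return str(tokenizer.decode([idx])).strip()

tok = ToyTokenizer()
words = [safe_decode(tok, i) for i in [0, 1]]
print(words)  # -> ['[]', 'hello']
```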
- Added versioning on dependencies (because of #2)
- Need to add instructions for installing `transformers` and PyTorch.
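A minimal sketch of such instructions, assuming the `transformers==2.8.0` pin mentioned above (the exact `torch` version is not specified in these notes):

```shell
# Hedged sketch: install PyTorch, then the pinned transformers version
# mentioned in the notes above. The torch version is an assumption.
pip install torch
pip install transformers==2.8.0
```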