Commit: Update readme.md
avi-jit committed Dec 6, 2021
1 parent 7ef1f93 commit 2563cf4
Showing 1 changed file with 7 additions and 3 deletions.
````diff
@@ -8,7 +8,11 @@ Please reach out to me at `thawani@usc.edu` in case you face any issues or just
 
 ## Dataset
 
-**Wiki-Convert**: A novel dataset of Wikipedia sentences annotated with numbers. The easiest way to get the data is via [Huggingface Datasets](https://huggingface.co/docs/datasets/) library. Simply install the datasets library and run `import datasets; ds = load_dataset("usc-isi/WikiConvert")`.
+**Wiki-Convert**: A novel dataset of Wikipedia sentences annotated with numbers. The easiest way to get the data is via [Huggingface Datasets](https://huggingface.co/docs/datasets/) library. Simply install the datasets library and run:
+```python3
+from datasets import load_dataset
+ds = load_dataset("usc-isi/WikiConvert")
+```
 
 Example:
 | id | comment | offset | length | number |
````
````diff
@@ -31,11 +35,11 @@ You may also retrieve a larger, unprocessed version of the data at [this link](h
 ## Code
 
 **train.py:** model description and training
-```
+```bash
 nice python train.py --batch-size 256 --gpus 0, --tsamples 100_000 --dsamples 10_000 --max_epochs 10 --enc exp --hidden 200 --accumulate_grad_batches 4 --seed 0 --dataset WC
 ```
 **eval.py:** reports perplexity and hit@k scores
-```
+```bash
 nice python eval.py --limit 10_000 --ckpt checkpoints/read-WC-def-adj-noun/epoch=9.ckpt --maxtoks 150 --batch-size 128 --device 0
 ```
 **dataset.py:** tokenized dataset description
````
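The diff's example table lists `comment`, `offset`, `length`, and `number` fields. A minimal sketch of how such a row could be consumed, assuming `offset` and `length` index the number's surface form inside `comment` (an assumption from the field names; the row values below are illustrative, not taken from the dataset):

```python
# Hypothetical Wiki-Convert-style row; values are made up for illustration.
row = {
    "comment": "The bridge is 1,200 metres long.",
    "offset": 14,   # assumed: character index where the number mention starts
    "length": 5,    # assumed: length of the mention in characters
    "number": 1200, # the annotated numeric value
}

# Slice the surface form of the number out of the sentence.
mention = row["comment"][row["offset"]:row["offset"] + row["length"]]
print(mention)  # → 1,200
```

If the fields do carry this meaning, the same slice works on rows returned by `load_dataset("usc-isi/WikiConvert")`.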
