This project contains the Comet-atomic 2020 model source code modified for the Slovenian language.
Before starting the project, make sure these requirements are available:
- python, for executing the code in this project.
- git, for versioning your code.
- dvc, for versioning your data (part of the project requirements).
First, create a virtual environment in which the project will store all of its modules. Using the virtualenv command, run the following:

```bash
# install the virtualenv command
pip install virtualenv

# create a new virtual environment
virtualenv -p python ./.venv

# activate the environment (UNIX)
source ./.venv/bin/activate

# activate the environment (WINDOWS)
./.venv/Scripts/activate

# deactivate the environment (UNIX & WINDOWS)
deactivate
```
Alternatively, install conda, a program for creating Python virtual environments, and run the following commands:

```bash
# create a new virtual environment
conda create --name slomet2020 python=3.8 pip

# activate the environment
conda activate slomet2020

# deactivate the environment
deactivate
```
To install the requirements, run:

```bash
pip install -e .
```
To get the data reach out to the project's maintainer.
NOTE: The data will be made publicly available. Stay tuned for more!
To run the experiments, run the following commands:

```bash
# model training script
python scripts/train_comet_gpt2.py \
    --train_data_path=./data/atomic_train.tsv \
    --valid_data_path=./data/atomic_dev.tsv \
    --models_dir_path=./models

# model testing script
python scripts/test_comet_gpt2.py \
    --test_data_path=./data/atomic_test.tsv \
    --models_dir_path=./models/checkpoint_latest \
    --results_dir_path=./results

# model evaluation script
python scripts/eval_comet_gpt2.py \
    --pred_file_path=./results/pred_generations.jsonl
```
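The data files passed to these scripts are tab-separated knowledge triples. As an illustrative sketch (the head/relation/tail column order is an assumption, not confirmed by this README), a single record could be parsed like this:

```python
import csv
import io

# hypothetical ATOMIC-style record; the head/relation/tail column order
# is assumed here for illustration only
sample = "PersonX gre v trgovino\txNeed\timeti denar\n"

# parse the tab-separated line into its three fields
head, relation, tail = next(csv.reader(io.StringIO(sample), delimiter="\t"))
print(head, relation, tail)
```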
An alternative way of running the whole experiment is by using DVC. To do this, simply run:

```bash
dvc exp run
```

This command reads the dvc.yaml file and executes the stages accordingly, taking any dependencies into consideration.
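For orientation, a DVC stage typically looks like the sketch below. The actual stage names, dependencies, and outputs are defined in this repository's dvc.yaml, so treat this fragment as an illustrative assumption rather than the project's real pipeline:

```yaml
stages:
  train:
    cmd: >-
      python scripts/train_comet_gpt2.py
      --train_data_path=./data/atomic_train.tsv
      --valid_data_path=./data/atomic_dev.tsv
      --models_dir_path=./models
    deps:
      - data/atomic_train.tsv
      - data/atomic_dev.tsv
      - scripts/train_comet_gpt2.py
    outs:
      - models
```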
The results folder contains both the files used for evaluating the generations and the evaluation results. The file results/pred_generations_gens_scores.jsonl shows the performance of the model based on various metrics.
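Because the scores file is in JSON-lines format, each line can be read independently. The field names in this sketch ("bleu1", "rougeL") are illustrative assumptions, not the file's actual schema:

```python
import json

# one illustrative line of a JSON-lines scores file;
# the keys "bleu1" and "rougeL" are assumed, not the real schema
sample_line = '{"bleu1": 0.324, "rougeL": 0.397}'

scores = json.loads(sample_line)
print(scores["rougeL"])
```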
The table below shows the performances of the commonsense models trained using the corresponding language model and language data set.
Language Model | Language | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr | METEOR | ROUGE-L |
---|---|---|---|---|---|---|---|---|
macedonizer/sl-gpt2 | Slovene | 0.297 | 0.150 | 0.086 | 0.058 | 0.487 | 0.207 | 0.383 |
gpt-janez | Slovene | 0.324 | 0.174 | 0.108 | 0.076 | 0.508 | 0.225 | 0.397 |
COMET(GPT2-XL) | English | 0.407 | 0.248 | 0.171 | 0.124 | 0.653 | 0.292 | 0.485 |
This project supports the following models:
- gpt-janez
- macedonizer/sl-gpt2
When the model is trained, use the snippet below to load the model and tokenizer:

```python
# import the GPT2 modules from huggingface/transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# define the directory path that contains the model data
MODEL_DIR_PATH = "./models/checkpoint_latest"

# initialize the model and tokenizer from the trained checkpoint
model = GPT2LMHeadModel.from_pretrained(MODEL_DIR_PATH)
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_DIR_PATH)
```
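Once loaded, the model is queried with a head event and a relation. COMET-ATOMIC 2020 appends a [GEN] marker to the prompt; whether this project uses the same marker is an assumption, so the helper below is only a sketch of prompt assembly:

```python
# build a COMET-style generation prompt: head event + relation + marker;
# the "[GEN]" token follows COMET-ATOMIC 2020 and is assumed here
def build_prompt(head: str, relation: str, gen_token: str = "[GEN]") -> str:
    return f"{head} {relation} {gen_token}"

prompt = build_prompt("PersonX gre v trgovino", "xNeed")
print(prompt)  # → PersonX gre v trgovino xNeed [GEN]
```

The resulting prompt would then be tokenized with the tokenizer and passed to model.generate to produce tail candidates.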
This work is based on the paper:

(Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs.
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi.
AAAI Conference on Artificial Intelligence, 2021.

TODO
- Setup script
- Folder structure
- Code for model training
- Code for model prediction
- Code for model evaluation
- Add support for 3rd party models (outside huggingface)
- Add params.yaml and modify the scripts to read the params from the file
- Add DVC pipelines for model training and evaluation
- Add scripts for storing and retrieving the data set
This work was developed by the Department of Artificial Intelligence at the Jozef Stefan Institute.
The work is supported by the Slovenian Research Agency and the RSDO project.