
The Uzbek Wordnet (UzWordnet)

UzWordnet is a lexical-semantic database, or a “word-net”, for the (Northern) Uzbek language (native: O’zbek tili), compatible with Princeton WordNet. By providing it as open source (see License), we aim to motivate, support, and increase the application of database and knowledge graph principles and techniques to the study of computational aspects of the (Northern) Uzbek language and, more generally, to improve the usability of Uzbek within IT applications and on the Internet.

The (Northern) Uzbek language is the statutory national language of Uzbekistan. It is a Turkic language spoken by approximately 26.8 million people around the world, notably including a large group of ethnic Uzbeks residing abroad (cf. Wikipedia).

Current status (version 1.0)

28149 synsets
64389 senses
20683 words
71.79% (see Reference for details)

Release and Format

UzWordnet is released through the Uzbek Wordnet's website. The versions released are:

Version 1.0 — Released TODO 17th April 2020 in the following formats:
- RDF (size ... MB)

Note on format and conversions

UzWordnet is developed to comply with the Global WordNet Association's (lemon-based) Resource Description Framework (RDF) format, in which a wordnet can be published and submitted to the Inter-Lingual-Index (ILI).

More formats can be generated by using the Global WordNet Converter and Validator, available here.

License

UzWordnet was initially derived by "expansion" from Princeton WordNet under the WordNet License and further developed under the Creative Commons Attribution 4.0 International License CC BY-SA 4.0. You can read more about this license here.

You may use, share and adapt UzWordnet provided that attribution is given to Princeton WordNet and explicit reference is made to UzWordnet and the UzWordnet Team using the citation appropriate to your project or paper. In particular, when writing a paper or producing a software application based on UzWordnet, please use the following citations for the hardcopy and the online version of your project or paper.

Hardcopy

See Reference.

Online

Publications should cite the official website of UzWordnet, that is: https://uzwordnet.ldkr.org/.

Usage

TODO [Timur: MERGE NEXT TWO SUBSECTIONS -- SECTIONS FROM FIRST VERSION OF README -- INTO THIS SECTION (use sub(sub-)sections if/as necessary)]

TODO Give guidelines/instruction for compiling the code provided

Formatting PWN source files for further use

The planned UWN algorithm is more convenient to implement in Python when the PWN database file data.noun is available in tabular form. For this reason, Python and the Pandas library were used to obtain a conveniently formatted .csv file for further processing. The script pwn_formatting.py performs this task: it takes the data.noun file as input and outputs two tabular files, pwn.csv and pwn_unindexed.csv.
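
As an illustration of this kind of conversion, here is a minimal sketch that parses synset lines in the documented WordNet data-file layout and writes them out as CSV. It is not the code of pwn_formatting.py: it uses the standard csv module instead of Pandas, the function names and the chosen columns (offset, pos, words, gloss) are assumptions, and the sample lines are abridged for illustration.

```python
import csv
import io


def parse_data_line(line):
    """Parse one synset line of a WordNet data file into a flat dict."""
    # Everything after the first "|" is the gloss.
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, ss_type = tokens[0], tokens[2]
    # The word count is stored as a two-digit hexadecimal number.
    w_cnt = int(tokens[3], 16)
    # Words alternate with their lex_id, starting right after the count.
    words = [tokens[4 + 2 * i] for i in range(w_cnt)]
    return {
        "offset": offset,
        "pos": ss_type,
        # Multi-word lemmas use "_" in the data file; ";" joins the lemmas.
        "words": ";".join(w.replace("_", " ") for w in words),
        "gloss": gloss.strip(),
    }


def format_pwn(lines):
    """Turn data-file lines into rows, skipping the license header."""
    rows = []
    for line in lines:
        # Header lines in the data files start with two spaces.
        if line.startswith("  ") or not line.strip():
            continue
        rows.append(parse_data_line(line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["offset", "pos", "words", "gloss"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Abridged sample lines in the data.noun layout (illustrative only).
SAMPLE_LINES = [
    "  1 header line, skipped by the parser",
    "00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 ~ 00002137 n 0000 "
    "~ 04424418 n 0000 | that which is perceived or known",
    "00002137 03 n 02 abstraction 0 abstract_entity 0 001 @ 00001740 n 0000 "
    "| a general concept",
]

if __name__ == "__main__":
    print(rows_to_csv(format_pwn(SAMPLE_LINES)))
```

To process the real file, the lines could be read with open("data.noun", encoding="utf-8") and the CSV text written to a file of your choice.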

Querying Google Translate API

The api_translation.py script was written to communicate with the API. It takes the pwn_unindexed.csv file (the output of pwn_formatting.py) and produces the following files as output:

dump_responses.txt - a backup of the responses from the API
uwn_repetitive.csv - a copy of the tabular file pwn_unindexed.csv filled with translations from the API; may contain repetitions
uwn_xlsx_repetitive.xlsx - same file as uwn_repetitive.csv, but in .xlsx format
uwn.csv - same as uwn_repetitive.csv, but with no repetitions in individual cells of the translations column
uwn_xlsx.csv - same file as uwn.csv, but in .xlsx format
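
The relationship between the "repetitive" and deduplicated outputs can be sketched as follows. This is a hedged illustration, not the code of api_translation.py: the translate argument is a hypothetical stand-in for the API call (here a toy lookup table, so no network access is needed), and the ";" separator and the "words" and "translations" column names are assumptions.

```python
def dedupe_cell(cell, sep=";"):
    """Drop repeated translations within one cell, keeping first-seen order."""
    seen = set()
    unique = []
    for item in (part.strip() for part in cell.split(sep)):
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return sep.join(unique)


def fill_translations(rows, translate, sep=";"):
    """Translate every word of every row.

    Returns (repetitive_rows, deduped_rows, raw_responses), mirroring the
    roles of uwn_repetitive.csv, uwn.csv and dump_responses.txt.
    """
    repetitive, deduped, responses = [], [], []
    for row in rows:
        translated = [translate(word) for word in row["words"].split(sep)]
        responses.extend(translated)  # keep every raw response as a backup
        cell = sep.join(translated)
        repetitive.append({**row, "translations": cell})
        deduped.append({**row, "translations": dedupe_cell(cell, sep)})
    return repetitive, deduped, responses


# Toy lookup table standing in for the translation API (not real API output).
TOY_LOOKUP = {"dog": "it", "domestic dog": "it"}


def toy_translate(word):
    """Hypothetical translator: look the word up, else return it unchanged."""
    return TOY_LOOKUP.get(word, word)


if __name__ == "__main__":
    sample = [{"offset": "02084071", "words": "dog;domestic dog;Canis familiaris"}]
    repetitive, deduped, responses = fill_translations(sample, toy_translate)
    print(repetitive[0]["translations"])
    print(deduped[0]["translations"])
```

Several English lemmas of one synset often map to the same target word, which is why a per-cell deduplication step is useful before further processing.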

Contributors

Alessandro Agostini (Project Leader - contact here)
Timur Usmanov
Ulugbek Khamdamov
Nilufar Abdurakhmonova
Mukhammadsaid Mamasaidov
Enver Menadjiev

Reference

A. Agostini, T. Usmanov, U. Khamdamov, N. Abdurakhmonova, M. Mamasaidov, “UZWORDNET: A Lexical-Semantic Database for the Uzbek Language.” In S. Bosch, C. Fellbaum, M. Griesel, A. Rademaker and P. Vossen, editors, Proceedings of the Eleventh International Global Wordnet Conference (GWC-2021), pp. 8–19, Potchefstroom, South Africa, 2021. Available online in the ACL Anthology at https://www.aclweb.org/anthology/2021.gwc-1.0/.
