UzWordnet is a lexical-semantic database, or a “word-net”, for the Uzbek language (native: O’zbek tili) compatible with Princeton WordNet.

LDKR-Group/UzWordnet

The Uzbek Wordnet (UzWordnet)

UzWordnet is a lexical-semantic database, or a “word-net”, for the (Northern) Uzbek language (native: O’zbek tili) compatible with Princeton WordNet. By providing it as open source (see License), we aim to motivate, support, and increase the application of database and knowledge-graph principles and techniques to the study of computational aspects of the (Northern) Uzbek language and, more generally, the usability of Uzbek within IT applications and on the Internet.

The (Northern) Uzbek language is the statutory national language of Uzbekistan. It is a Turkic language spoken by approximately 26.8 million people around the world, including a large group of ethnic Uzbeks residing abroad (cf. Wikipedia).

See also Reference.

Current status (version 1.0)

  • 28149 synsets
  • 64389 senses
  • 20683 words
  • 71.79% (see Reference for details)

Release and Format

UzWordnet is released through the Uzbek Wordnet's website. The released versions are:

  • Version 1.0 — Released TODO 17th April 2020 in the following formats:

    • RDF (size ... MB)

Note on format and conversions

UzWordnet is developed to comply with the Global WordNet Association's (lemon-based) Resource Description Framework (RDF) format, in which a wordnet can be published and submitted to the Inter-Lingual-Index (ILI).

More formats can be generated by using the Global WordNet Converter and Validator, available here.

License

UzWordnet was initially derived by "expansion" from Princeton WordNet under the WordNet License and further developed under the Creative Commons Attribution 4.0 International License CC BY-SA 4.0. You can read more about this license here.

You may use, share and adapt UzWordnet provided that attribution is given to Princeton WordNet and explicit reference is made to UzWordnet and the UzWordnet Team, using the citation appropriate to your project or paper. In particular, when writing a paper or producing a software application based on UzWordnet, please use the following citations for the hardcopy and the online version of your project or paper.

Hardcopy

See Reference.

Online

Publications should cite the official website of UzWordnet, that is: https://uzwordnet.ldkr.org/.

Usage

TODO [Timur: MERGE NEXT TWO SUBSECTIONS -- SECTIONS FROM FIRST VERSION OF README -- INTO THIS SECTION (use sub(sub-)sections if/as necessary)]

TODO Give guidelines/instruction for compiling the code provided

Formatting PWN source files for further use

For the planned UWN algorithm, the PWN database file data.noun was more convenient to process in Python in tabular form. Python with the Pandas library was therefore used to obtain a conveniently formatted .csv file for further processing. The pwn_formatting.py script performs this task: it takes the data.noun file as input and outputs two tabular files, pwn.csv and pwn_unindexed.csv.
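As a rough illustration of this kind of conversion, the sketch below turns lines of a WordNet data file into CSV rows. It is not the project's pwn_formatting.py: the function names and the column names (offset, pos, lemmas, gloss) are hypothetical, and it uses the standard-library csv module instead of Pandas so that it runs without dependencies. It relies only on the documented layout of PWN data files: licence header lines start with two spaces, the word count is hexadecimal, and the gloss follows a `|`.

```python
import csv
import io

COLUMNS = ["offset", "pos", "lemmas", "gloss"]  # hypothetical column names


def parse_data_line(line):
    """Parse one synset line of a PWN data file into a dict."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, pos = tokens[0], tokens[2]  # tokens[1] is the lexicographer file number
    word_count = int(tokens[3], 16)     # word count is a two-digit hexadecimal number
    # Words and lex_ids alternate: word lex_id word lex_id ...
    words = [tokens[4 + 2 * i].replace("_", " ") for i in range(word_count)]
    return {
        "offset": offset,
        "pos": pos,
        "lemmas": ";".join(words),
        "gloss": gloss.strip(),
    }


def parse_data_file(lines):
    """Yield one dict per synset, skipping blank and licence header lines."""
    for line in lines:
        if not line.strip() or line.startswith("  "):
            continue
        yield parse_data_line(line)


def write_csv(rows, handle):
    """Write parsed synsets to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = [
        "  1 This software and database is being provided to you\n",
        "00001740 03 n 01 entity 0 000 | that which is perceived\n",
    ]
    out = io.StringIO()
    write_csv(parse_data_file(sample), out)
    print(out.getvalue())
```

To process a real file, the same functions can be called with an open data.noun handle and an output file opened with `newline=""`.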

Querying Google Translate API

The api_translation.py script communicates with the Google Translate API. It takes the pwn_unindexed.csv file (the output of pwn_formatting.py) as input and produces the following files:

  • dump_responses.txt - used for backup of the responses from the API
  • uwn_repetitive.csv - a modified copy of the tabular file pwn_unindexed.csv, filled with translations from the API; may contain repetitions
  • uwn_xlsx_repetitive.xlsx - same file as uwn_repetitive.csv, but in .xlsx format
  • uwn.csv - same as uwn_repetitive.csv, but with no repetitions in individual cells of the translations column
  • uwn_xlsx.csv - same file as uwn.csv, but in .xlsx format
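The difference between uwn_repetitive.csv and uwn.csv is the removal of repeated translations within a single cell. A minimal sketch of such a step is shown below. It assumes that a cell holds translations separated by commas, which the README does not state, and the function name dedupe_cell is hypothetical rather than taken from api_translation.py.

```python
def dedupe_cell(cell, sep=","):
    """Remove repeated translations from one cell, keeping first-seen order.

    Assumes translations are separated by `sep` (a comma by default);
    the comparison is case-sensitive.
    """
    parts = [part.strip() for part in cell.split(sep)]
    # dict.fromkeys keeps insertion order, so the first occurrence wins.
    unique = list(dict.fromkeys(part for part in parts if part))
    return (sep + " ").join(unique)


if __name__ == "__main__":
    print(dedupe_cell("kitob, daftar, kitob"))
```

Keeping the first occurrence preserves the order in which the API returned the translations, which may matter if earlier candidates are considered more likely.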

Contributors

  • Alessandro Agostini (Project Leader - contact here)
  • Timur Usmanov
  • Ulugbek Khamdamov
  • Nilufar Abdurakhmonova
  • Mukhammadsaid Mamasaidov
  • Enver Menadjiev

Reference

A. Agostini, T. Usmanov, U. Khamdamov, N. Abdurakhmonova, M. Mamasaidov, “UZWORDNET: A Lexical-Semantic Database for the Uzbek Language”. In S. Bosch, C. Fellbaum, M. Griesel, A. Rademaker and P. Vossen, editors, Proceedings of the Eleventh International Global Wordnet Conference (GWC-2021), pp. 8–19, Potchefstroom, South Africa, 2021. Available online in the ACL Anthology at https://www.aclweb.org/anthology/2021.gwc-1.0/.
