UzWordnet is a lexical-semantic database, or “word-net”, for the (Northern) Uzbek language (native: O’zbek tili), compatible with Princeton WordNet. By providing it as open source (see License), we aim to motivate, support, and increase the application of database and knowledge-graph principles and techniques to the study of computational aspects of the (Northern) Uzbek language and, more generally, to improve the usability of Uzbek within IT applications and the Internet.
The (Northern) Uzbek language is the statutory national language of Uzbekistan. It is a Turkic language spoken by approximately 26.8 million people around the world, notably including a large group of ethnic Uzbeks residing abroad (cf. Wikipedia).
See also Reference.
- 28149 synsets
- 64389 senses
- 20683 words
- 71.79% (see Reference for details)
UzWordnet is released through the Uzbek Wordnet's website. The released versions are:
- Version 1.0, released TODO 17th April 2020, in the following formats:
- RDF (size ... MB)
UzWordnet is developed to comply with the Global WordNet Association's (lemon-based) Resource Description Framework (RDF) format, in which a wordnet can be published and submitted to the Inter-Lingual-Index (ILI).
More formats can be generated by using the Global WordNet Converter and Validator, available here.
UzWordnet was initially derived by "expansion" from Princeton WordNet under the WordNet License and further developed under the Creative Commons Attribution 4.0 International License CC BY-SA 4.0. You can read more about this license here.
You may use, share and adapt UzWordnet provided that attribution is given to Princeton WordNet and explicit reference is made to UzWordnet and the UzWordnet Team, using the citation appropriate to your project or paper. In particular, when writing a paper or producing a software application based on UzWordnet, please use the following citations for the hardcopy and the online version of your project or paper.
See Reference.
Publications should cite the official website of UzWordnet, that is: https://uzwordnet.ldkr.org/.
TODO [Timur: MERGE NEXT TWO SUBSECTIONS -- SECTIONS FROM FIRST VERSION OF README -- INTO THIS SECTION (use sub(sub-)sections if/as necessary)]
TODO Give guidelines/instruction for compiling the code provided
For the future UWN algorithm, the PWN database file data.noun was more convenient to use from Python in tabular form. For this reason, Python with the Pandas library was used to obtain a conveniently formatted .csv file for further processing. The pwn_formatting.py script performs this task: it takes the data.noun file as input and outputs two tabular files, pwn.csv and pwn_unindexed.csv.
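As a rough illustration of this formatting step, the sketch below turns lines in the Princeton WordNet data-file layout (`synset_offset lex_filenum ss_type w_cnt word lex_id ... | gloss`) into CSV rows. It is not the actual pwn_formatting.py: the function names and the column names (`offset`, `pos`, `words`, `gloss`) are assumptions made for illustration, and the standard-library `csv` module stands in for Pandas so the example has no dependencies.

```python
import csv
import io


def parse_data_line(line):
    """Parse one synset line of a WordNet data file.

    Returns None for blank lines and for the license header,
    whose lines start with two spaces.
    """
    if line.startswith("  ") or not line.strip():
        return None
    data, _, gloss = line.partition("|")
    fields = data.split()
    offset, ss_type = fields[0], fields[2]
    w_cnt = int(fields[3], 16)  # the word count is a hexadecimal number
    # Words and their lex_ids alternate, starting at field 4.
    words = [fields[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]
    return {
        "offset": offset,
        "pos": ss_type,
        "words": ";".join(words),  # hypothetical in-cell separator
        "gloss": gloss.strip(),
    }


def data_to_csv(lines, out):
    """Write every synset found in `lines` to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=["offset", "pos", "words", "gloss"])
    writer.writeheader()
    for line in lines:
        row = parse_data_line(line)
        if row is not None:
            writer.writerow(row)


# Two illustrative input lines: one header line and one shortened synset line.
sample = [
    "  1 This software and database is being provided to you, the LICENSEE\n",
    "00002137 03 n 02 abstraction 0 abstract_entity 0 000 | a general concept\n",
]
buffer = io.StringIO()
data_to_csv(sample, buffer)
print(buffer.getvalue())
```

With a real file, `data_to_csv(open("data.noun", encoding="utf-8"), open("pwn.csv", "w", newline="", encoding="utf-8"))` would follow the same pattern.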
To communicate with the API, the api_translation.py script was written. It takes the pwn_unindexed.csv file (the output of pwn_formatting.py) and produces the following files:
- dump_responses.txt - used for backup of the responses from the API
- uwn_repetitive.csv - a modified tabular file pwn_unindexed.csv that was filled with translations from the API. May contain repetitions
- uwn_xlsx_repetitive.xlsx - same file as uwn_repetitive.csv, but in .xlsx format
- uwn.csv - same as uwn_repetitive.csv, but with no repetitions in individual cells of translations column
- uwn_xlsx.csv - same file as uwn.csv, but in .xlsx format
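The difference between uwn_repetitive.csv and uwn.csv is the removal of repeated translations inside a single cell. A minimal sketch of such a clean-up is given below. It assumes that the translations in a cell are separated by commas, which the description above does not state; the function name `dedupe_cell` and the sample words are likewise only illustrative and are not taken from api_translation.py.

```python
def dedupe_cell(cell, sep=","):
    """Remove repeated translations from one cell, keeping first-seen order."""
    seen = set()
    unique = []
    for item in cell.split(sep):
        item = item.strip()
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return (sep + " ").join(unique)


# Hypothetical cell content with one repeated translation.
print(dedupe_cell("kitob, daftar, kitob"))
```

Applying the function to every value of the translations column would give the repetition-free variant of the table. Matching here is exact after trimming whitespace, so variants that differ in letter case are kept as distinct entries.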
- Alessandro Agostini (Project Leader - contact here)
- Timur Usmanov
- Ulugbek Khamdamov
- Nilufar Abdurakhmonova
- Mukhammadsaid Mamasaidov
- Enver Menadjiev
A. Agostini, T. Usmanov, U. Khamdamov, N. Abdurakhmonova, M. Mamasaidov, “UZWORDNET: A Lexical-Semantic Database for the Uzbek Language.” In S. Bosch, C. Fellbaum, M. Griesel, A. Rademaker and P. Vossen, editors, Proceedings of the Eleventh International Global Wordnet Conference (GWC-2021), pp. 8–19, Potchefstroom, South Africa, 2021. Available online in the ACL Anthology at https://www.aclweb.org/anthology/2021.gwc-1.0/.