ImageToALTOXML

Convert Multiple Images to Multiple ALTo XML Files, Text files, JSON File

Requirements:

python 3
Anaconda 3 software (Not mendatory) I suggest anaconda 3. Because, It has all the necessary packeges installed in it. So, you don't need to install them individually by yourself. To know more, visit - https://docs.anaconda.com/anaconda/
Tesseract-OCR 4.1 or greater, visit - https://github.com/UB-Mannheim/tesseract/wiki

Installation:

Python:

Download or install python3 (I used python 3.7.3) or anaconda3
Add the location of installed folder to the environment path variable (control panel -> System & Security -> System -> Advanced system settings -> Environment variables -> click on PATH -> Edit it by Adding that path (copied) in 'PATH')

Tesseract-OCR:

Download and install it from - https://github.com/UB-Mannheim/tesseract/wiki
Add the location of installed folder 'Tesseract-OCR' & 'tessdata' to system path or environment variable (paths in my system are - C:\Program Files\Tesseract-OCR C:\Program Files\Tesseract-OCR\tessdata)
An extra thing you need to do. That is after setting environment variable, create a new variable by clicking on 'New' (like - control panel -> System & Security -> System -> Advanced system settings -> Environment variables -> Click 'New' Under 'System variables' -> Name it 'TESSDATA_PREFIX' and give the value C:\Program Files\Tesseract-OCR\tessdata)

Packages:

Open CMD (Command Prompt)
download & unzip project from https://github.com/Yadab-Sd/ImageToALTOXML/
type in cmd, "cd
now write "pip instll -r requirements.txt" and press enter to install necessary all packages for this project (To install package individually, write - pip install <package_name> ) (You will find all package names with correct syntex from https://pypi.org/)

Now, Run main.py by-

python main.py --input_dir input/ --output_dir output/

here, input/ and output/ are the input and output locations
put all the pictures need to convert text all of them in a single file
you can use input/folder1/ or input/folder2/

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
input		input
output		output
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md
command.txt		command.txt
constants.py		constants.py
extra		extra
imgTojson (2).zip		imgTojson (2).zip
main.py		main.py
main0.py		main0.py
main1.py		main1.py
requirements.txt		requirements.txt
setup.cfg		setup.cfg
untitled		untitled

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ImageToALTOXML

Requirements:

Installation:

Python:

Tesseract-OCR:

Packages:

About

Releases

Packages

Languages

Yadab-Sd/Image_To_ALTOXML_TXT_JSON

Folders and files

Latest commit

History

Repository files navigation

ImageToALTOXML

Requirements:

Installation:

Python:

Tesseract-OCR:

Packages:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages