Commit 330c3db

cleanup: remove analyze code, and only keep code really relevant for HTR
Harald Scheidl committed Feb 16, 2021
1 parent a7e85ba commit 330c3db
Showing 4 changed files with 7 additions and 181 deletions.
32 changes: 7 additions & 25 deletions README.md
@@ -41,7 +41,7 @@ If neither `--train` nor `--validate` is specified, the NN infers the text from

## Integrate word beam search decoding

- It is possible to use the word beam search decoder \[4\] instead of the two decoders shipped with TF.
+ It is possible to use the [word beam search decoder](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578) instead of the two decoders shipped with TF.
Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized.
The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.

@@ -61,7 +61,7 @@ Beam width is set to 50 to conform with the beam width of vanilla beam search de
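
To make the decoding step concrete, here is a minimal sketch of calling a dictionary-constrained decoder on the CTC output matrix. It assumes the Python bindings of the CTCWordBeamSearch project are installed; the `word_beam_search` module name, the `WordBeamSearch` constructor signature and the character set / corpus strings are illustrative assumptions, not code from this repository.

```python
# Hedged sketch: decode a CTC output matrix with word beam search.
# Assumes the Python bindings of the CTCWordBeamSearch project are installed;
# module name and constructor signature are assumptions - check its README.
import numpy as np
from word_beam_search import WordBeamSearch  # assumed import

chars = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,-'  # example charset
word_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'           # chars that may form dictionary words
corpus = 'are there the word beam search'  # placeholder text from which the dictionary is built

# CTC output: shape (T, B, C) with C = len(chars) + 1 (last index = CTC blank)
mat = np.random.rand(32, 1, len(chars) + 1).astype(np.float32)
mat /= mat.sum(axis=2, keepdims=True)  # turn random scores into per-time-step probabilities

# beam width 50 mirrors the value mentioned above; 'Words' mode and 0.0 smoothing are assumptions
wbs = WordBeamSearch(50, 'Words', 0.0, corpus.encode('utf8'),
                     chars.encode('utf8'), word_chars.encode('utf8'))
label_seqs = wbs.compute(mat)  # assumed to return one label sequence per batch element
print([''.join(chars[label] for label in seq) for seq in label_seqs])
```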

## Train model with IAM dataset

- Follow these instructions to get the IAM dataset \[5\]:
+ Follow these instructions to get the IAM dataset:

* Register for free at this [website](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)
* Download `words/words.tgz`
@@ -88,7 +88,7 @@ Using the `--fast` option and a GTX 1050 Ti training takes around 3h with a batc
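
As a small illustration of how the downloaded files fit together, the sketch below maps one line of `words.txt` to the corresponding image file. The field layout and the local directory names are assumptions based on the IAM documentation, not this repository's own data loader.

```python
# Sketch: map one entry of the IAM ground-truth file words.txt (found in the
# 'ascii' part of the dataset) to an image path and its transcription.
# Field layout and directory names are assumptions; the repository's loader may differ.
from pathlib import Path

def parse_words_txt_line(line: str, words_dir: Path):
    parts = line.strip().split(' ')
    word_id = parts[0]         # e.g. 'a01-000u-00-00'
    transcription = parts[-1]  # word-level ground truth is the last field

    # image path is derived from the id: <words_dir>/a01/a01-000u/a01-000u-00-00.png
    form_prefix = word_id.split('-')[0]         # 'a01'
    form_id = '-'.join(word_id.split('-')[:2])  # 'a01-000u'
    return words_dir / form_prefix / form_id / f'{word_id}.png', transcription

# assumed usage with placeholder local paths:
# for line in open('data/words.txt'):
#     if line.startswith('#'):
#         continue  # skip the documentation header of words.txt
#     img_path, gt_text = parse_words_txt_line(line, Path('data/words'))
```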
## Information about model

### Overview
- The model \[1\] is a stripped-down version of the HTR system I implemented for my thesis \[2\]\[3\].
+ The model is a stripped-down version of the HTR system I implemented for [my thesis](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742).
What remains is what I think is the bare minimum to recognize text with an acceptable accuracy.
It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer.
The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:
@@ -102,33 +102,15 @@ The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN)
![nn_overview](./doc/nn_overview.png)
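
To give a feel for how these pieces stack up, here is a rough Keras sketch of a comparable 5-CNN / 2-LSTM / CTC pipeline. The filter counts, pooling steps and the 128x32 input size are illustrative assumptions, not necessarily the values used in this repository.

```python
# Rough sketch of a comparable 5-CNN / 2-LSTM / CTC architecture in Keras.
# Filter counts, kernel sizes and the 128x32 input shape are illustrative assumptions.
import tensorflow as tf

def build_htr_model(num_chars: int) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(128, 32, 1))  # grayscale word image

    # 5 CNN layers: extract a feature sequence from the image
    x = inputs
    for filters, pool in [(32, (2, 2)), (64, (2, 2)), (128, (1, 2)), (128, (1, 2)), (256, (1, 2))]:
        x = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = tf.keras.layers.MaxPool2D(pool_size=pool)(x)

    # collapse the height dimension -> sequence of feature vectors (time axis = image width)
    x = tf.keras.layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)

    # 2 RNN (LSTM) layers propagate context through the sequence
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True))(x)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True))(x)

    # per-time-step character scores (+1 for the CTC blank label)
    logits = tf.keras.layers.Dense(num_chars + 1)(x)
    return tf.keras.Model(inputs, logits)

# The CTC loss/decoding layer sits on top of these logits, e.g. tf.nn.ctc_loss for
# training and tf.nn.ctc_greedy_decoder or tf.nn.ctc_beam_search_decoder for inference.
```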


- ### Analyze model
- Run `python analyze.py` with the following arguments to analyze the image file `data/analyze.png` with the ground-truth text "are":
-
- * `--relevance`: compute the pixel relevance for the correct prediction
- * `--invariance`: check if the model is invariant to horizontal translations of the text
- * No argument provided: show the results
-
- Results are shown in the plots below.
- For more information see [this article](https://towardsdatascience.com/6c04864b8a98).
-
- ![analyze](./doc/analyze.png)


## FAQ
* I get the error message "... TFWordBeamSearch.so: cannot open shared object file: No such file or directory": if you want to use word beam search decoding, you have to compile the custom TF operation from source
* Where can I find the file `words.txt` of the IAM dataset: it is located in the subfolder `ascii` on the IAM website
* I want to recognize text of line (or sentence) images: this is not possible with the provided model. The size of the input image is too small. For more information read [this article](https://medium.com/@harald_scheidl/27648fb18519) or have a look at the [lamhoangtung/LineHTR](https://github.com/lamhoangtung/LineHTR) repository
* I want to recognize the text contained in a text-line: the model is too small for this, you have to first segment the line into words, e.g. using the model from the [WordDetectorNN](https://github.com/githubharald/WordDetectorNN) repository
* I get an error when running the script more than once from an interactive Python session: do **not** call function `main()` in file `main.py` from an interactive session, as the TF computation graph is created multiple times when calling `main()` multiple times. Run the script by executing `python main.py` instead


## References
- \[1\] [Build a Handwritten Text Recognition System using TensorFlow](https://towardsdatascience.com/2326a3487cd5)
- \[2\] [Scheidl - Handwritten Text Recognition in Historical Documents](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)
- \[3\] [Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
- \[4\] [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578)
- \[5\] [Marti - The IAM-database: an English sentence database for offline handwriting recognition](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)
+ * [Build a Handwritten Text Recognition System using TensorFlow](https://towardsdatascience.com/2326a3487cd5)
+ * [Scheidl - Handwritten Text Recognition in Historical Documents](https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742)
+ * [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578)
Binary file removed data/analyze.png
Binary file removed doc/analyze.png
156 changes: 0 additions & 156 deletions src/analyze.py

This file was deleted.
