Commit

readme
githubharald committed Jun 13, 2018
1 parent da316a3 commit 6569a4d
Showing 3 changed files with 148 additions and 72 deletions.
65 changes: 37 additions & 28 deletions README.MD
@@ -1,19 +1,20 @@
# Handwritten Text Recognition with TensorFlow

Handwritten Text Recognition (HTR) system implemented in TensorFlow (TF) and trained on the IAM offline HTR dataset.
This Neural Network (NN) implementation is the bare minimum that is needed to detect handwritten text with TF.
It is trained to recognize segmented words, therefore the model can be kept small and training on the CPU is feasible.
If you want to get a higher recognition accuracy or if you want to input larger images (e.g. images of text-lines), I will give some hints how to enhance the model.
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset.
This Neural Network (NN) model is the bare minimum that is needed to detect handwritten text with TF.
It is trained to recognize segmented words as shown in the illustration below.
As the images of segmented words are smaller than images of complete text-lines, the NN can be kept small and therefore training on the CPU is feasible.
I will give some hints on how to improve the model in case you need larger input-images or want a better recognition accuracy.

![img](./doc/htr.png)


## Run demo

Go to the `model/` directory and unzip the file `model.zip` (this model is pre-trained on the IAM dataset).
Go to the `model/` directory and unzip the file `model.zip` (pre-trained on the IAM dataset).
Afterwards, go to the `src/` directory and run ```python main.py```.

The input image and the expected output are shown below:
The input image and the expected output are shown below.

![img](./data/test.png)

```
@@ -23,24 +24,26 @@ Recognized: "little"
```


## Train new model on IAM dataset
## Train model

### IAM dataset

The data-loader expects the IAM dataset (or any other dataset that is compatible with it) in the `data/` directory.
Follow these instructions to get the dataset:

1. Register at: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
1. Register for free at: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
2. Download `words.tgz`
3. Download `words.txt`
4. Put `words.txt` into the `data/` directory
5. Create the directory `data/words/`
6. Put the content (directories `a01`, `a02`, ...) of `words.tgz` into `data/words/`
7. Go to `data/` and run `python checkDirs.py` for a rough check if everything is ok

If you want to initialize the model with new parameter values, delete the files contained in the `model/` directory.
Otherwise, keep them to continue training.
Go to the `src/` directory and execute `python main.py train`.
If you want to train the model from scratch, delete the files contained in the `model/` directory.
Otherwise, the parameters are loaded from the last model-snapshot before training begins.
Then, go to the `src/` directory and execute `python main.py train`.
After each epoch of training, validation is done on a validation set (the dataset is split into 95% of the samples used for training and 5% for validation as defined in the class `DataLoader`).
The expected output is shown below.
After each epoch of training, validation is done on a validation set (the dataset is split into 95% of the samples used for training and 5% for validation).

```
Init with stored values from ../model/snapshot-1
@@ -66,35 +69,41 @@ Ground truth -> Recognized
Correctly recognized words: 60.0 %
```
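
The 95%/5% split mentioned above can be sketched as follows. This is only an illustration and not the code of the class `DataLoader`: the function name `split_samples`, the sample tuples and the choice to cut the list at a fixed index without shuffling are assumptions.

```python
def split_samples(samples, train_percent=95):
    """Split a list of samples into a training part and a validation part."""
    # integer arithmetic avoids floating-point rounding at the split index
    split_idx = len(samples) * train_percent // 100
    return samples[:split_idx], samples[split_idx:]


# hypothetical stand-in for (image path, ground-truth text) pairs
samples = [("img%d.png" % i, "word%d" % i) for i in range(200)]
train, val = split_samples(samples)
print(len(train), len(val))  # 190 10
```

Every sample ends up in exactly one of the two parts, so the validation result is measured on words the model has not been trained on.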

# Train new model on another dataset
### Other dataset

Either you convert your dataset into the IAM format (look at `words.txt` and the corresponding directory structure) or you change the class `DataLoader` according to your dataset format.

Either you convert your dataset into the IAM format (look at words.txt and the corresponding directory structure) or you change the class `DataLoader` according to your dataset format.

## Information about model

# Overview of the model
### Overview

The model is a stripped-down version of the HTR system I used for my thesis.
It only depends on numpy, cv2 and tensorflow imports.
What remains is what I think is the bare minimum to recognize text with an acceptable accuracy.
The implementation only depends on numpy, cv2 and tensorflow imports.
It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer.
The illustration below gives an overview of the operations and tensors of the NN, here follows a short description:
The illustration below gives an overview of the NN (green: operations, pink: data); a short description follows:

* The input image is gray-valued and has a size of 128x32.
* 5 CNN layers map the input image to a feature sequence of size 32x256.
* 2 LSTM layers propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps.
* The CTC layer either calcualtes the loss value given the matrix and the ground-truth text, or it decodes the matrix to the final text with best path decoding.
* Batch size is set to 50.
* The input image is a gray-value image and has a size of 128x32
* 5 CNN layers map the input image to a feature sequence of size 32x256
* 2 LSTM layers with 256 hidden units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
* The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding (when inferring)
* Batch size is set to 50

![img](./doc/nn_overview.png)


# How to enhance the model
### Improve accuracy

Here are some ideas on how to improve the accuracy:

* Increase size of input image (if input of NN is large enough, also complete text-lines can be used)
* Data augmentation: increase dataset-size by applying random transformations to the input images
* Remove cursive writing style in the input images (see [DeslantImg](https://github.com/githubharald/DeslantImg))
* Increase input size (if input of NN is large enough, complete text-lines can be used)
* Add more CNN layers
* Data augmentation: increase size of dataset by doing random transformations to the input images
* Remove the cursive writing style in the input images (see [DeslantImg](https://github.com/githubharald/DeslantImg))
* Decoder: either use vanilla beam search decoding (included with TF) or use word beam search decoding (see [CTCWordBeamSearch](https://github.com/githubharald/CTCWordBeamSearch)) to constrain output to words from a dictionary
* Text correction: if the recognized is not contained in a dictionary, the most similar one can be taken instead
* Replace LSTM by MD-LSTM
* Decoder: either use vanilla beam search decoding (included with TF) or use word beam search decoding (see [CTCWordBeamSearch](https://github.com/githubharald/CTCWordBeamSearch)) to constrain the output to dictionary words
* Text correction: if the recognized word is not contained in a dictionary, search for the most similar one
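
The text-correction idea from the list above can be sketched with the Python standard library. The README does not say which similarity measure to use, so the similarity ratio of `difflib` is an assumption here, and the function name `correct_word` and the small dictionary are hypothetical.

```python
import difflib


def correct_word(recognized, dictionary):
    """Return the recognized word if it is a dictionary word, else the most similar one."""
    if recognized in dictionary:
        return recognized
    # cutoff=0.0 accepts any candidate, so the closest dictionary word is always returned
    matches = difflib.get_close_matches(recognized, dictionary, n=1, cutoff=0.0)
    # an empty dictionary yields no match, in which case the input is kept
    return matches[0] if matches else recognized


dictionary = ["little", "letter", "title"]
print(correct_word("littie", dictionary))  # little
```

In practice a higher `cutoff` may be preferable, so that words which are far from every dictionary entry (e.g. names) are left unchanged.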



Binary file modified doc/nn_overview.png
155 changes: 111 additions & 44 deletions doc/nn_overview.svg