
Levenshtein OCR

The official PyTorch implementation of LevOCR (ECCV 2022).

LevOCR can perform both text sequence generation and text sequence refinement using the cross-modal fused features produced by its Vision-Language Transformer (VLT). Refinement is carried out via two basic character-level operations, deletion and insertion, which are learned with imitation learning and allow parallel decoding, dynamic length changes, and good interpretability. As a result, the inference phase of LevOCR is interpretable and transparent, which could be crucial for diagnosing and improving text recognition models in the future.
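As a rough illustration of this refinement loop, here is a minimal, runnable sketch. It is not the repository code: the deletion and insertion policies below are trivial placeholders, whereas in LevOCR they are learned heads operating on the fused vision-language features.

```python
# Conceptual sketch of LevOCR-style iterative refinement (not the repository code).
# The real deletion/insertion policies are learned with imitation learning and
# act on fused vision-language features; here they are placeholder functions.

def predict_deletions(tokens, fused_features):
    # Placeholder: keep every character (the real head scores each position).
    return [True] * len(tokens)

def predict_insertions(tokens, fused_features):
    # Placeholder: insert nothing (the real head first predicts how many
    # placeholders to insert in each slot, then fills them, all in parallel).
    return tokens

def refine(tokens, fused_features=None, max_iter=2):
    """Iteratively edit a character sequence via deletion, then insertion."""
    for _ in range(max_iter):
        keep = predict_deletions(tokens, fused_features)
        tokens = [t for t, k in zip(tokens, keep) if k]       # length may shrink
        tokens = predict_insertions(tokens, fused_features)   # length may grow
    return tokens

print(refine(list("hello")))
```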

Paper

LevOCR Model

Install requirements

  • PyTorch version >= 1.8.0
  • Python version >= 3.6
pip3 install -r requirements.txt
  • For training new models, you also need to install fairseq (parts of fairseq are used during training); a quick environment check is sketched after the install commands below
git clone https://github.com/pytorch/fairseq
cd fairseq
git checkout 0.12.2-release
pip install --editable ./
python setup.py build_ext --inplace
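
After installation, a sanity check along these lines (a sketch, not part of the repository) can confirm that the interpreter, PyTorch, and fairseq versions meet the requirements above:

```python
# Sanity-check the environment against the stated requirements.
import sys
import torch
import fairseq

assert sys.version_info >= (3, 6), "Python >= 3.6 required"
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 8), "PyTorch >= 1.8.0 required"

print("python :", sys.version.split()[0])
print("torch  :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("fairseq:", fairseq.__version__)
```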

Dataset

Download the LMDB datasets from Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition (ABINet).

data
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   ├── ST
│   └── train_language.txt

Currently, both the training and evaluation datasets are provided in LMDB format.
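
These LMDB files follow the key scheme of the deep-text-recognition-benchmark / ABINet datasets (`num-samples`, `image-%09d`, `label-%09d`, 1-based indices). Assuming that scheme, a split can be inspected with a short script like the one below (the helper name `peek` is just for illustration):

```python
# Peek into one LMDB split, assuming the deep-text-recognition-benchmark /
# ABINet key scheme: 'num-samples', 'image-%09d', 'label-%09d' (1-based).
import io
import lmdb
from PIL import Image

def peek(lmdb_path, index=1):
    env = lmdb.open(lmdb_path, readonly=True, lock=False, readahead=False, meminit=False)
    with env.begin(write=False) as txn:
        num_samples = int(txn.get(b"num-samples").decode())
        label = txn.get(("label-%09d" % index).encode()).decode("utf-8")
        image = Image.open(io.BytesIO(txn.get(("image-%09d" % index).encode())))
    print("samples:", num_samples, "| label:", label, "| image size:", image.size)

peek("data/evaluation/CUTE80")
```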

Pretrained Models

Available model weights:

| Language | Vision | LevOCR |
| --- | --- | --- |
| Pretrain-language-model | Pretrain-vision-model | LevOCR-model |

Benchmarks (top-1 accuracy, %)

The performance of the reproduced pretrained models is summarized as follows:

| Model | Iteration | IC13 | SVT | IIIT | IC15 | SVTP | CUTE | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LevOCR-VP | - | 95.8 | 92.4 | 95.4 | 84.5 | 84.6 | 88.8 | 91.2 |
| LevOCR | #1 | 96.7 | 94.2 | 96.5 | 86.1 | 88.6 | 90.6 | 92.8 |
| LevOCR | #2 | 96.7 | 94.4 | 96.6 | 86.5 | 88.8 | 90.6 | 92.9 |
| LevOCR | #3 | 96.7 | 94.4 | 96.6 | 86.5 | 88.8 | 90.6 | 92.9 |
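
The AVG column appears to be the sample-weighted mean over the six evaluation splits rather than a plain average. Using the split sizes from the directory names above (IC13 857, IC15 1811, IIIT 3000) together with the commonly used sizes for the remaining sets (SVT 647, SVTP 645, CUTE80 288, which are not stated in this README), the LevOCR-VP row checks out:

```python
# Reproduce the AVG column as a sample-weighted mean over the six benchmarks.
# IC13/IC15/IIIT sizes come from the directory names above; SVT/SVTP/CUTE80
# sizes are the commonly used test-set sizes (an assumption, not stated here).
sizes = {"IC13": 857, "SVT": 647, "IIIT": 3000, "IC15": 1811, "SVTP": 645, "CUTE": 288}
levocr_vp = {"IC13": 95.8, "SVT": 92.4, "IIIT": 95.4, "IC15": 84.5, "SVTP": 84.6, "CUTE": 88.8}

avg = sum(levocr_vp[k] * sizes[k] for k in sizes) / sum(sizes.values())
print(round(avg, 1))  # 91.2 -- matches the AVG reported for LevOCR-VP
```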

Run demo with pretrained model

  1. Download pretrained model
  2. Add image files to test into demo_imgs/
  3. Run demo_imgs.py
python3 demo_imgs.py  --imgH 32 --imgW 128  --max_iter 2 --batch_size 16 --model_dir <path_to/model.pth> --rgb --th 0.5 --demo_imgs demo_imgs 

Train

  1. Pre-train language model
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_port 29501  train_language_dist.py --train_data data/training/train_language.txt \
--valInterval 5000 --lr 0.3 --saved_path <path/to/save/dir> --exp_name levocr_pretrain_language --batch_size 512 --num_iter 2400000 
  2. Train LevOCR
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_port 29501 train_final_dist.py --train_data data/training \ 
--valid_data data/evaluation --select_data MJ-ST --batch_ratio 0.5-0.5  --valInterval 5000 --lr 0.3 --rgb  \
--saved_path <path/to/save/dir> --exp_name levocr_32_128 --batch_size 32 --manualSeed 21223 --seed 223 --num_iter 2400000 \
--vis_model <path/to/pretrain-vision-model.pth> --levt_model <path/to/pretrain-language-model.pth>

Test

Find the path to the best_accuracy.pth checkpoint file (usually under the saved_path directory).

python3 eval.py  --eval_data data/evaluation --data_filtering_off --fast_acc --imgH 32 --imgW 128 --batch_size 128 --rgb --th 0.5 --max_iter 2 --model_dir <path_to/best_accuracy.pth>

Iterative Process

The detailed iterative refinement process of LevOCR with different initial sequences on the six public benchmarks is shown below.

Process

Acknowledgements

This implementation is based on the following repositories: fairseq, CLOVA AI Deep Text Recognition Benchmark, and ABINet.

Citation

If you find this work useful, please cite:

@inproceedings{ECCV2022LevOCR,
  title={Levenshtein OCR},
  author={Cheng Da and Peng Wang and Cong Yao},
  booktitle={ECCV},
  year={2022}
}

License

LevOCR is released under the terms of the Apache License, Version 2.0.

LevOCR is an algorithm for scene text recognition. The code and models herein, created by the authors from Alibaba, may only be used for research purposes.
Copyright (C) 1999-2022 Alibaba Group Holding Ltd. 

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.