Damaged English Text Recovery Using Auto-Encoder
December 13th, 2018
Project Goals
Optical Character Recognition (OCR) can now convert text images into digital text without manual transcription. However, noise in text images, such as strikeout or faded text, can cause OCR output to contain incorrect content. This project aims to make OCR generate text accurately by denoising strikeout in text images. We built four models based on two architectures: Convolution with Max Pooling, and Convolution and Deconvolution Autoencoder. The results showed that Convolution with Max Pooling combined with a residual net performed best, with a minimum loss of 0.0016%.
Project Scopes
• English text only
• The damaged text must still be recognizable to the human eye
Tasks
Using the autoencoder technique, we provide four model variations built on two architectures:
• Convolution with Max Pooling
• Convolution and Deconvolution Autoencoder
• Skip Pooling Input and Convolution with Max Pooling
• Skip Pooling Input with Convolution and Deconvolution Autoencoder
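To make the data flow of the first and third variations more concrete, here is a minimal pure-Python sketch of one encoder/decoder pass: convolution, 2x2 max pooling, upsampling, and an optional skip path that adds the input back to the output. It is not the project's implementation. The function names, the fixed `BLUR` kernel, the nearest-neighbour upsampling, and the reading of "Skip Pooling Input" as an additive skip connection around the pooling stage are all assumptions made for illustration, and nothing here is trained.

```python
from typing import List, Tuple

Image = List[List[float]]

# Hypothetical fixed 3x3 averaging kernel; a real model would learn its weights.
BLUR: Image = [[1.0 / 9.0] * 3 for _ in range(3)]


def conv3x3(img: Image, kernel: Image) -> Image:
    """3x3 convolution with zero padding, so the output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


def relu(img: Image) -> Image:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in img]


def max_pool2x2(img: Image) -> Image:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = len(img), len(img[0])
    return [
        [max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def upsample2x(img: Image) -> Image:
    """Nearest-neighbour upsampling: every pixel becomes a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Image, b: Image) -> Image:
    """Element-wise sum of two images of the same size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def autoencoder_forward(img: Image, use_skip: bool) -> Tuple[Image, Image]:
    """One encoder/decoder pass; returns (encoded, decoded)."""
    encoded = max_pool2x2(relu(conv3x3(img, BLUR)))   # encoder: conv + pool
    decoded = conv3x3(upsample2x(encoded), BLUR)      # decoder: upsample + conv
    if use_skip:
        # Assumed skip path: the input bypasses pooling and is added back.
        decoded = add(decoded, img)
    return encoded, decoded


if __name__ == "__main__":
    sample: Image = [
        [0.0, 1.0, 1.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 1.0, 1.0, 0.0],
    ]
    enc, dec = autoencoder_forward(sample, use_skip=True)
    print(len(sample), "x", len(sample[0]), "->",
          len(enc), "x", len(enc[0]), "->",
          len(dec), "x", len(dec[0]))
```

In the Convolution and Deconvolution variations, the pooling and upsampling steps would presumably be replaced by learned downsampling and transposed-convolution layers. That substitution is not shown here.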
Members
Rujikorn Charakorn
Wannita Tankerngsaksiri
Athipat Nampetch
Advisor
Asst. Prof. Dr. Thanarat Chalidabhongse