Tuesday, June 5, 2018

Tip: How to properly set up the EmguCV/OpenCV OCR in Visual Studio.

Running the OpenCV OCR against an image requires some setup, because it reads the related language file from a pre-defined directory on your machine. If you do not specify a path, it usually looks in the application path, that is, the folder of the exe file you build. Below is sample code that reads an image and writes the text to the console.
using System;
using System.IO;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;
 
namespace ConsoleOCR
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            using (var image = new Image<Bgr, byte>(Path.GetFullPath("testImage.png")))
            {
                using (var tesseractOcrProvider = new Tesseract(@"", "eng", OcrEngineMode.Default)) // point to TESSDATA_PREFIX env variable.
                {
                    tesseractOcrProvider.SetImage(image);
                    tesseractOcrProvider.Recognize();
                    var text = tesseractOcrProvider.GetBoxText().TrimEnd();
                    Console.WriteLine(text);
                }
            }
        }
    }
}

I am using EmguCV 3.4.1 for the testing. In Visual Studio, you have to add a reference to the .NET wrapper DLL, Emgu.CV.World.dll.

You need to create a new folder named "tessdata" inside the folder your application runs from, for example the "debug" folder. In this folder, put the language file, which you can download from Language Files. In this case I downloaded "eng.traineddata" because the picture that I want to read is in English. After you compile and run the application, the recognized text should be printed to the console.
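
If the OCR call fails or prints nothing, one thing worth checking is whether the language file is where the application expects it. The snippet below is a minimal sketch of such a check, not part of EmguCV: the namespace TessdataCheckDemo, the class TessdataCheck and the method LanguageFilePath are hypothetical names made up for illustration. It assumes the "tessdata" folder sits directly under the folder that holds the exe, as described above.

```csharp
using System;
using System.IO;

namespace TessdataCheckDemo
{
    // Hypothetical helper for illustration only; it is not an EmguCV API.
    internal static class TessdataCheck
    {
        // Builds the path where the language file is expected:
        // <baseDirectory>/tessdata/<language>.traineddata
        public static string LanguageFilePath(string baseDirectory, string language)
        {
            return Path.Combine(baseDirectory, "tessdata", language + ".traineddata");
        }

        private static void Main(string[] args)
        {
            // Assumption: the language data lives next to the running exe
            // (for example the "debug" output folder).
            var baseDirectory = AppDomain.CurrentDomain.BaseDirectory;
            var path = LanguageFilePath(baseDirectory, "eng");

            if (File.Exists(path))
            {
                Console.WriteLine("Found language file: " + path);
            }
            else
            {
                Console.WriteLine("Missing language file: " + path);
            }
        }
    }
}
```

Running this from the same output folder as the OCR program tells you whether "eng.traineddata" was copied to the right place. For another language, pass its code instead of "eng", assuming the matching traineddata file has been downloaded.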