Friday, September 14, 2018

Tip: ExcelDataReader returns "Invalid File Signature" when reading an Excel file.

I ran into this problem while trying to load multiple Excel files from the same directory. The root cause turned out to be simple: when I collect the file names with "Directory.EnumerateFiles", the result also includes the temporary file that Excel creates while a workbook is open, whose name always starts with "~". That file is not a valid Excel file, so "AsDataSet" throws the error whenever the code tries to load it.

So the solution for me is to always exclude file names that start with a "~".
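
The filter described above could look something like the following. This is a minimal sketch rather than my original code: the "*.xls*" search pattern, the default folder, and the names ExcelFileFinder, GetExcelFiles and IsTemporaryFile are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

internal static class ExcelFileFinder
{
    // Hypothetical helper: lists Excel files in a folder and skips the
    // temporary files whose names start with "~".
    public static IEnumerable<string> GetExcelFiles(string directory)
    {
        return Directory.EnumerateFiles(directory, "*.xls*")
            .Where(path => !IsTemporaryFile(path));
    }

    // Checks only the file name, not the folders that lead to it.
    public static bool IsTemporaryFile(string path)
    {
        return Path.GetFileName(path).StartsWith("~", StringComparison.Ordinal);
    }
}

internal static class Program
{
    private static void Main(string[] args)
    {
        // Assumed input: a folder passed as the first argument,
        // otherwise the current directory.
        var folder = args.Length > 0 ? args[0] : ".";

        foreach (var file in ExcelFileFinder.GetExcelFiles(folder))
        {
            // Each remaining path can be handed to ExcelDataReader here.
            Console.WriteLine(file);
        }
    }
}
```

The check looks at the file name only, so a "~" somewhere in a parent folder name does not cause a valid workbook to be skipped.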

Wednesday, July 4, 2018

Tip: OpenCV 4.0 Library Files Inclusion

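C++ configuration in Visual Studio 2017: these are the debug builds of the libraries (note the "d" suffix) to add under Linker > Input > Additional Dependencies.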

opencv_videostab400d.lib 
opencv_calib3d400d.lib 
opencv_core400d.lib 
opencv_dnn400d.lib 
opencv_features2d400d.lib 
opencv_flann400d.lib 
opencv_highgui400d.lib 
opencv_imgcodecs400d.lib 
opencv_imgproc400d.lib 
opencv_ml400d.lib 
opencv_objdetect400d.lib 
opencv_photo400d.lib 
opencv_shape400d.lib 
opencv_stitching400d.lib 
opencv_superres400d.lib 
opencv_ts400d.lib 
opencv_video400d.lib 
opencv_videoio400d.lib

Tuesday, June 5, 2018

Tip: How to properly set up the EmguCV/OpenCV OCR in Visual Studio.

Running the OpenCV OCR against an image requires the related language file, which is read from a pre-defined directory on your machine. If you do not specify a path, it usually looks in the application path (if you are building an exe file). Below is sample code that reads an image and writes the text to the console.
using System;
using System.IO;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;
 
namespace ConsoleOCR
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            using (var image = new Image<Bgr, byte>(Path.GetFullPath("testImage.png")))
            {
                using (var tesseractOcrProvider = new Tesseract(@"", "eng", OcrEngineMode.Default)) // empty data path: point to the TESSDATA_PREFIX env variable.
                {
                    tesseractOcrProvider.SetImage(image);
                    tesseractOcrProvider.Recognize();
                    var text = tesseractOcrProvider.GetBoxText().TrimEnd();
                    Console.WriteLine(text);
                }
            }
        }
    }
}
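
Since the engine depends on finding the language file, a small check before creating it can make a missing file easier to spot. The following is only a sketch under assumptions: it expects the "tessdata" folder layout described in the setup notes that follow, and the names TessdataCheck and GetLanguageFilePath are hypothetical, not part of EmguCV.

```csharp
using System;
using System.IO;

internal static class TessdataCheck
{
    // Hypothetical helper: builds the path where the language file is
    // expected, assuming a "tessdata" folder inside the application folder.
    public static string GetLanguageFilePath(string appFolder, string language)
    {
        return Path.Combine(appFolder, "tessdata", language + ".traineddata");
    }
}

internal static class Program
{
    private static void Main()
    {
        // The folder the running exe was started from, e.g. the "debug" folder.
        var appFolder = AppDomain.CurrentDomain.BaseDirectory;
        var expected = TessdataCheck.GetLanguageFilePath(appFolder, "eng");

        if (!File.Exists(expected))
        {
            Console.Error.WriteLine("Language file not found: " + expected);
            return;
        }

        Console.WriteLine("Language file found: " + expected);
        // The Tesseract engine can be created here, as in the sample above.
    }
}
```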

I am using EmguCV 3.4.1 for testing. In Visual Studio, you have to add a reference to the .NET wrapper DLL, Emgu.CV.World.dll.

You need to create a new folder named "tessdata" inside your running application folder, for example the "debug" folder. Put the language file in this folder; you can download it from Language Files. In this case I downloaded "eng.traineddata" because the picture I want to read is in English. After you compile and run the application, you should see something like this: