Digit recognition with Tesseract OCR and C#


I use Tesseract and C# to read digits. Everything works well except for the digit "8", which Tesseract cannot read.
This is the picture I send to Tesseract:
Tesseract reads it as "50005550055".
This is my method:
public string Process(Bitmap bitmap, MetaTraderObjects metaObjects, bool isNumber = false)
{
    try
    {
        var graphicLib = new GraphicLib();
        bitmap = graphicLib.PerformReadingTextEffects(bitmap.ToBytes(), metaObjects).ToBitmap();
        var result = "";
        var enginePath = Const.BaseAppPath + "\\tessdata";
        using (var engine = new TesseractEngine(enginePath, "eng", EngineMode.Default))
        {
            var ver = engine.Version;
            using (var img = Pix.LoadTiffFromMemory(graphicLib.ConvertBitMapToByteArray(bitmap.ToBytes())))
            {
                using (var page = engine.Process(img, (PageSegMode)8))
                {
                    var text = page.GetText();
                    result = TextReformer.Reform(text, isNumber);
                    MemoryStream ms = new MemoryStream(bitmap.ToBytes());
                    Image i = Image.FromStream(ms);
                }
            }
        }
        return result;
    }
    catch (Exception ex)
    {
        ExceptionLog.Handel(ex);
        return null;
    }
}
How can I tell Tesseract that the vertical rod is an "8"?

I recommend upgrading to the latest version of Tesseract (4.1.0); it may perform better.
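Independently of the version, it may also help to restrict recognition to digits. The following is a minimal sketch, assuming the same Tesseract .NET wrapper as in the question. The class name `DigitOcr` and both path parameters are hypothetical, and `PageSegMode.SingleLine` is only a guess at the layout of the image. As far as I know, the LSTM engine in Tesseract 4.0 ignores `tessedit_char_whitelist`, while 4.1 honors it again, so the effect depends on the engine in use. It is not guaranteed to fix the "8" problem.

```csharp
using System;
using Tesseract;

public static class DigitOcr
{
    // Reads one line of digits from an image file.
    // tessdataPath and imagePath are placeholders supplied by the caller.
    public static string ReadDigits(string tessdataPath, string imagePath)
    {
        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        {
            // Limit the candidate characters to digits so that look-alike
            // glyphs are not reported as letters or punctuation.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(img, PageSegMode.SingleLine))
            {
                return page.GetText().Trim();
            }
        }
    }
}
```

If the digits are still misread, preprocessing the image (for example scaling it up or increasing contrast) is the next thing to try before changing engine settings further.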

Related

Tesseract not reading single value and some values

private void GetOCRValue(Bitmap image)
{
    string ocrValue = "";
    try
    {
        using (var engine = new TesseractEngine(Application.StartupPath + "\\tessdata", "eng", EngineMode.Default))
        {
            using (var imager = new System.Drawing.Bitmap(image))
            {
                using (var pix = PixConverter.ToPix(imager))
                {
                    using (var page = engine.Process(pix))
                    {
                        ocrValue = page.GetText();
                    }
                }
            }
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace.
        throw;
    }
}
I'm trying to retrieve values from a bitmap, but Tesseract does not return single-digit values or some other numbers such as "13". I'm using the Tesseract 3.3.0 NuGet package. How can I resolve this problem?

how to get image from pdf using pdfbox in c# .net

How can I get images from a PDF using PDFBox in C# .NET?
All the answers to this question are written in Java.
I have not seen anyone post a correct answer in C#.
I tried the Java code in C#, but some methods do not work in C#.
I want to extract images from a PDF file using PDFBox in C# .NET.

I finally found the answer.
Derive your class from PDFStreamEngine.
Example:
public class ImageExtraction : PDFStreamEngine
{
    int i = 1;
    public void GetImageFromPDF(string fileName)
    {
        PDDocument pDDocument = PDDocument.load(new java.io.File(fileName));
        PDPage page = new PDPage();
        page = pDDocument.getPages().get(0);
        ImageExtraction obj = new ImageExtraction();
        processPage(page);
    }
    protected override void processOperator(Operator @operator, java.util.List operands)
    {
        string operation = @operator.getName();
        if (operation == "Do")
        {
            PDDocument pDDocument = new PDDocument();
            org.apache.pdfbox.cos.COSName objectName = (org.apache.pdfbox.cos.COSName)operands.get(0);
            org.apache.pdfbox.pdmodel.graphics.PDXObject xobject = getResources().getXObject(objectName);
            org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject pDImageXObject = new org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject(pDDocument);
            org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject pDFormXObject = new org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject(pDDocument);
            if (xobject.GetType().IsAssignableFrom(pDImageXObject.GetType()))
            {
                org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject image = (org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject)xobject;
                int imageWidth = image.getWidth();
                int imageHeight = image.getHeight();
                // Save the image locally.
                // imageFolderPath is not defined in this snippet; it is assumed to hold the output folder.
                java.awt.image.BufferedImage bImage = new java.awt.image.BufferedImage(imageWidth,
                    imageHeight, java.awt.image.BufferedImage.TYPE_INT_ARGB);
                bImage = image.getImage();
                javax.imageio.ImageIO.write(bImage, "PNG", new java.io.File(imageFolderPath + "image_" + i + ".png"));
                i++;
                Console.WriteLine("Image saved.");
            }
            else if (xobject.GetType().IsAssignableFrom(pDFormXObject.GetType()))
            {
                org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject form = (org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject)xobject;
                showForm(form);
            }
        }
    }
}

It looks like someone has started a .NET port of PDFBox:
https://github.com/UglyToad/PdfPig
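For comparison, here is a rough sketch of what image extraction might look like with that library. It assumes the UglyToad.PdfPig NuGet package and its `GetImages()` and `TryGetPng` members; the class name `PdfImageDump`, the method name and the output file naming are made up for illustration. Images that the library cannot convert to PNG are simply skipped.

```csharp
using System.IO;
using UglyToad.PdfPig;

public static class PdfImageDump
{
    // Writes every image that can be converted to PNG into outputFolder
    // and returns how many files were written.
    public static int SaveImagesAsPng(string pdfPath, string outputFolder)
    {
        int saved = 0;
        using (PdfDocument document = PdfDocument.Open(pdfPath))
        {
            foreach (var page in document.GetPages())
            {
                foreach (var image in page.GetImages())
                {
                    // TryGetPng returns false when the image data cannot be converted.
                    if (image.TryGetPng(out byte[] png))
                    {
                        saved++;
                        string fileName = $"page{page.Number}_image{saved}.png";
                        File.WriteAllBytes(Path.Combine(outputFolder, fileName), png);
                    }
                }
            }
        }
        return saved;
    }
}
```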

Get all metadata from an existing PDF using iText7

How can I retrieve all metadata stored in a PDF with iText7?
using (var pdfReader = new iText.Kernel.Pdf.PdfReader("path-to-a-pdf-file"))
{
    var pdfDocument = new iText.Kernel.Pdf.PdfDocument(pdfReader);
    var pdfDocumentInfo = pdfDocument.GetDocumentInfo();
    // Getting basic metadata
    var author = pdfDocumentInfo.GetAuthor();
    var title = pdfDocumentInfo.GetTitle();
    // Getting everything else
    var someMetadata = pdfDocumentInfo.GetMoreInfo("need-a-key-here");
    // How to get all metadata ?
}
I was using the following with iTextSharp, but I can't figure out how to do it with the new iText7.
using (var pdfReader = new iTextSharp.text.pdf.PdfReader("path-to-a-pdf-file"))
{
    // Getting basic metadata
    var author = pdfReader.Info.ContainsKey("Author") ? pdfReader.Info["Author"] : null;
    var title = pdfReader.Info.ContainsKey("Title") ? pdfReader.Info["Title"] : null;
    // Getting everything else
    var metadata = pdfReader.Info;
    metadata.Remove("Author");
    metadata.Remove("Title");
    // Print metadata
    Console.WriteLine($"Author: {author}");
    Console.WriteLine($"Title: {title}");
    foreach (var line in metadata)
    {
        Console.WriteLine($"{line.Key}: {line.Value}");
    }
}
I am using version 7.1.1 of iText7.

In iText 7 the PdfDocumentInfo class unfortunately does not expose a method to retrieve the keys in the underlying dictionary.
But you can retrieve the Info dictionary contents by accessing that dictionary directly from the trailer dictionary. E.g. for a PdfDocument pdfDocument:
PdfDictionary infoDictionary = pdfDocument.GetTrailer().GetAsDictionary(PdfName.Info);
foreach (PdfName key in infoDictionary.KeySet())
    Console.WriteLine($"{key}: {infoDictionary.GetAsString(key)}");

There is a problem with "UnicodeBig", "UTF-8" or "PDF" encoded strings.
For example, if the PDF was created with Microsoft Word, the "/Creator" value comes back in an unreadable encoding and needs to be converted.
iText7 has its own function for that conversion:
...ToUnicodeString().
However, it is a method of the PdfString class, so the PdfDictionary value (a PdfObject) first has to be cast to PdfString.
A complete solution as an async, "unbreakable" and auto-disposing function:
public static async Task<(Dictionary<string, string> MetaInfo, string Error)> GetMetaInfoAsync(string path)
{
    try
    {
        var metaInfo = await Task.Run(() =>
        {
            var metaInfoDict = new Dictionary<string, string>();
            using (var pdfReader = new PdfReader(path))
            using (var pdfDocument = new PdfDocument(pdfReader))
            {
                metaInfoDict["PDF.PageCount"] = $"{pdfDocument.GetNumberOfPages():D}";
                metaInfoDict["PDF.Version"] = $"{pdfDocument.GetPdfVersion()}";
                var pdfTrailer = pdfDocument.GetTrailer();
                var pdfDictInfo = pdfTrailer.GetAsDictionary(PdfName.Info);
                foreach (var pdfEntryPair in pdfDictInfo.EntrySet())
                {
                    var key = "PDF." + pdfEntryPair.Key.ToString().Substring(1);
                    string value;
                    switch (pdfEntryPair.Value)
                    {
                        case PdfString pdfString:
                            value = pdfString.ToUnicodeString();
                            break;
                        default:
                            value = pdfEntryPair.Value.ToString();
                            break;
                    }
                    metaInfoDict[key] = value;
                }
                return metaInfoDict;
            }
        });
        return (metaInfo, null);
    }
    catch (Exception ex)
    {
        if (Debugger.IsAttached) Debugger.Break();
        return (null, ex.Message);
    }
}

Render a barcode in ASP.NET Web Form

I am trying to show a barcode in an ASP.NET page. I downloaded the Zen barcode renderer with its sample code, and the sample works fine for me. When I use it in my own code, however, the barcode label is empty. I compared the sample code with mine and found no difference except in how the data is passed. This is what I tried:
<barcode:BarcodeLabel ID="BarcodeLabel1" runat="server" BarcodeEncoding="Code39NC" LabelVerticalAlign="Bottom" Text="12345"></barcode:BarcodeLabel>
if (!IsPostBack)
{
    List<string> symbologyDataSource = new List<string>(
        Enum.GetNames(typeof(BarcodeSymbology)));
    symbologyDataSource.Remove("Unknown");
    barcodeSymbology.DataSource = symbologyDataSource;
    barcodeSymbology.DataBind();
}
This is the function:
BarcodeSymbology symbology = BarcodeSymbology.Unknown;
if (barcodeSymbology.SelectedIndex != 0)
{
    symbology = (BarcodeSymbology)1;
}
symbology = (BarcodeSymbology)1;
string text = hidID.Value.ToString();
string scaleText = "1";
int scale;
if (!int.TryParse(scaleText, out scale))
{
    if (symbology == BarcodeSymbology.CodeQr)
    {
        scale = 3;
    }
    else
    {
        scale = 1;
    }
}
else if (scale < 1)
{
    scale = 1;
}
if (!string.IsNullOrEmpty(text) && symbology != BarcodeSymbology.Unknown)
{
    barcodeRender.BarcodeEncoding = symbology;
    barcodeRender.Scale = 1;
    barcodeRender.Text = text;
}
The symbology is set to Code39NC from the dropdown, the scale is 1, and the text comes from another form; the value is passed correctly. Still, the barcode label shows only the value, not the barcode picture.

Here are two code samples using ZXing to create a (QR) barcode as both an image and as a base64 encoded string. Both of these options can be used with an <img /> tag to embed the barcode in the page.
This is not an ASP.NET control. It is a library that creates barcodes from text.
// Text to QR code as GIF image bytes
public byte[] ToQRAsGif(string content)
{
    var barcodeWriter = new BarcodeWriter
    {
        Format = BarcodeFormat.QR_CODE,
        Options = new EncodingOptions
        {
            Height = _h,
            Width = _w,
            Margin = 2
        }
    };
    using (var bitmap = barcodeWriter.Write(content))
    using (var stream = new MemoryStream())
    {
        bitmap.Save(stream, ImageFormat.Gif);
        // ToArray returns only the bytes written; GetBuffer would also return unused buffer space.
        return stream.ToArray();
    }
}
// Text to QR code as a base64 data URI string
public string ToQRAsBase64String(string content)
{
    var barcodeWriter = new BarcodeWriter
    {
        Format = BarcodeFormat.QR_CODE,
        Options = new EncodingOptions
        {
            Height = _h,
            Width = _w,
            Margin = 2
        }
    };
    using (var bitmap = barcodeWriter.Write(content))
    using (var stream = new MemoryStream())
    {
        bitmap.Save(stream, ImageFormat.Gif);
        return String.Format("data:image/gif;base64,{0}", Convert.ToBase64String(stream.ToArray()));
    }
}
Hope this helps! Happy coding.
UPDATE: Here is the link to their product page on codeplex: https://zxingnet.codeplex.com/
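To use the byte array returned by ToQRAsGif in an <img /> tag, it has to be turned into a data URI first. A minimal sketch, using only the base class library; the helper name `DataUri.FromBytes` is hypothetical and not part of ZXing:

```csharp
using System;

public static class DataUri
{
    // Builds a data URI that can be assigned to the src attribute of an <img /> tag.
    public static string FromBytes(byte[] imageBytes, string mimeType)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));
        return $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";
    }
}
```

In a Web Form this could be used from code-behind as `imgBarcode.Src = DataUri.FromBytes(ToQRAsGif("12345"), "image/gif");`, where `imgBarcode` is assumed to be an `<img runat="server" />` element on the page.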

QRCode Extraction With ZXing

Hi, I'm trying to read QR codes from scanned images, but I'm getting a low extraction rate (19 extracted from 500 images). The extraction code:
class QrExtractor
{
    public String extractFrom(Bitmap image)
    {
        using (image)
        {
            LuminanceSource source;
            source = new BitmapLuminanceSource(image);
            BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
            Result result = new QRCodeReader().decode(bitmap);
            if (result != null)
            {
                return result.Text;
            }
            return "Couldn't Extract";
        }
    }
}
Are there any improvements that I can apply to this?
Thanks
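One thing that might be worth trying is the higher-level BarcodeReader with decoding hints enabled, instead of calling QRCodeReader directly. This is only a sketch, assuming the ZXing.Net build for the classic .NET Framework, where `ZXing.BarcodeReader` accepts a `System.Drawing.Bitmap`. The class name `TolerantQrReader` is made up, and whether these options improve the hit rate on these particular scans is untested.

```csharp
using System.Collections.Generic;
using System.Drawing;
using ZXing;
using ZXing.Common;

public static class TolerantQrReader
{
    // Returns the decoded text, or null when no QR code was found.
    public static string TryRead(Bitmap image)
    {
        var reader = new BarcodeReader
        {
            // Retry with rotated versions of the image if the first pass fails.
            AutoRotate = true,
            Options = new DecodingOptions
            {
                // Spend more time searching the image for a code.
                TryHarder = true,
                // Only look for QR codes, not other symbologies.
                PossibleFormats = new List<BarcodeFormat> { BarcodeFormat.QR_CODE }
            }
        };
        Result result = reader.Decode(image);
        return result?.Text;
    }
}
```

Scan quality matters as well: cropping to the area that contains the code, or rescaling very large scans before decoding, may help more than any decoder setting.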
