How to parse MICR line data?

I have a digital check scanner that is able to capture the MICR line from the check. It will return the MICR line in raw format as a string, with delimiters to separate the account number, routing number, and check number. However, each bank formats this MICR line differently, so there's no standard way to parse this data.
Some companies I have tried are Inlite Research Inc and Accusoft Pegasus. The API from Inlite Research works for some banks, but cannot read Bank of America checks correctly. I'm still testing out the API from Accusoft.
What I am asking is whether anyone knows of an API that will accurately parse the MICR line into its different components. Is there an API that will let me add new check format definitions if I encounter a check that the API cannot handle correctly? Or does anyone know how to write, or has anyone already written, a routine to parse the MICR line?
I would appreciate any help I can get. Thank you.

Sorry for the late reply. I didn't see any answers to the question so I thought nobody responded.
To answer the questions above, I found a solution after thinking the problem over and talking with various vendors. The check scanner that I'm using is already able to read the MICR line. The problem lies in parsing the MICR line for relevant information such as the routing transit number, account number, check/serial number, and amount (if there is one). After speaking with a handful of 3rd party companies and trying out the available trial versions of MICR parsers, I came to the conclusion that there is no universal parser out there. I'm still faced with the problem of the non-conforming On-Us field. Each bank formats this field differently. Sometimes the symbols are arranged differently as well. So, I decided to write my own parser. I think this is the most logical way to proceed, as I've been informed by these 3rd party vendors that they each roll their own parsing software.
The way I wrote the parser was to keep a table of MICR line patterns. Each time I encounter a new MICR line format, I update this table. My parser matches every scanned check against this table, and if it finds a match, it uses that pattern to parse the relevant information.
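A minimal sketch of such a pattern table in C#, for illustration only. It assumes the scanner emits `T` for the transit symbol and `U` for the on-us symbol; those stand-in characters, the two sample patterns, and the `MicrPatternTable` name are hypothetical and not taken from any real table:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MicrPatternTable
{
    // Hypothetical patterns; a real table grows as new check layouts turn up.
    // 'T' = transit symbol, 'U' = on-us symbol (assumed stand-ins).
    private static readonly List<Regex> Patterns = new List<Regex>
    {
        // Business layout: aux on-us serial, routing, account.
        new Regex(@"^U(?<serial>\d+)U\s*T(?<routing>\d{9})T\s*(?<account>[\d -]+)U$"),
        // Personal layout: routing, account, on-us symbol, serial.
        new Regex(@"^T(?<routing>\d{9})T\s*(?<account>[\d -]+)U\s*(?<serial>\d+)$"),
    };

    // Tries each pattern in order and reports the first one that matches.
    public static bool TryParse(string micr, out string routing,
                                out string account, out string serial)
    {
        routing = account = serial = "";
        string line = micr.Trim();
        foreach (Regex pattern in Patterns)
        {
            Match m = pattern.Match(line);
            if (!m.Success) continue;
            routing = m.Groups["routing"].Value;
            account = m.Groups["account"].Value.Trim();
            serial = m.Groups["serial"].Value;
            return true;
        }
        return false; // unknown layout: time to add a new pattern
    }
}
```

A line that matches nothing returns false, which is the cue to add a new entry to the table.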
I hope my experience and the solution I came up with will also help those who ran across the same issue.
Thank you for all those who responded and good luck.

The basic pattern of a MICR:
xxxxxxxxxxx /rrrrrrrrr/ ooooooooooo baaaaaaaaaab
where 'x' is AuxOnUs, 'r' is the routing number, 'o' is OnUs, and 'a' is the amount; 'b' and '/' are special MICR symbols.
A minimal MICR line is just:
/rrrrrrrrr/ ooooooooo
AuxOnUs is generally only used by business checks, and it pretty much always means there is a serial number.
The routing number is always consistent; it's the only part of the MICR that is universal.
Amount is generally not encoded in the MICR, but sometimes it is.
OnUs is the tricky part. It normally consists of the check serial number and the account, but each bank handles it differently. Usually the serial number will be 4 digits, but it may be 5 or more. If there's an AuxOnUs field, you can be pretty sure the OnUs is just the account number.
The OnUs can contain spaces and dashes. It would be nice if there were a consistent way they were divided, but I've seen so many variations, I think it's better to just leave it as an "OnUs" field instead of separating it into serial and account, unless you're the paying bank, in which case you should know what format your own checks are.

This should be the correct answer based on my research as well. MICR patterns are too varied to reliably parse without having a collection of regex matching patterns to pull the relevant information. What would be nice is to see the collection of regex patterns you have come up with with group names such as:
<(?<checkNumber>[0-9\s]*)<[0-9\s]*:[0-9\s]*:.*

6 years after this question was originally asked, I have run across it numerous times in the past 2 weeks. I finally found an ACTUAL solution, and how to properly parse a MICR line. I've written some code to do so and it works on 99.9% of the checks I've scanned thus far, so I have to share and make sure people understand how this should be done.
For 11 years I have done this job. We have always used Magtek check scanners. Recently I decided to move to an imaging scanner so we could get scans of all our checks. I went with Panini check scanners. Unfortunately, their API doesn't break apart the MICR line, but our Magtek scanners were programmable to give us whatever we wanted. I created a basic string that could be matched with a pattern every time. It would always come out as: <aaaaaaaaa/bbbbbbbb/ccc> where a is route number, b is account number, and c is check number. Over and over I keep wondering how the scanner, just a simple serial device, can figure it out and get it right EVERY SINGLE TIME for a decade.
I started by using Patrick's own answer, sort of, to build a table of MICR patterns I hadn't seen before. The problem is that I reached a point where one pattern would be a close match for another check and the data would be off slightly. I then tried doing it based on route number until I ran across two checks from BofA that had identical route numbers and completely different MICR lines. I was so disappointed that my face met my desk in frustration.
After much more research, the proper way is right-to-left parsing of the MICR line. MICR lines are right-to-left, and of course the field giving us the most trouble is the on-us field. All my example snippets are C# code.
Start by looping through the string backwards:
for (int i = micr.Length - 1; i >= 0; i--)
Evaluate each character as you loop. If your first character is the amount character, it's a business check. Read until you get another amount character, then save that value. If the next character is the on-us symbol, assume that the check number is at the far left of the on-us field. If the next character is a digit, keep reading and filling a buffer (REMEMBER YOU ARE WORKING BACKWARDS!) with the digits until you reach the on-us character. If your buffer contains only digits, that's your check number. If it's empty, just move on and collect the entire on-us field in a buffer until you reach the transit character. Once you reach the transit character, keep reading and filling your buffer until you reach the next transit character. Your buffer is now your routing number. If it's a business check, you still have more characters to read. Keep reading until you reach ANOTHER on-us character. You've now reached the auxiliary on-us field, which should be the check number. Read until you reach the next on-us character and that should be the end of your string. You now have your check number.
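The walk described above can be sketched roughly as follows. This is a simplified illustration, not my production code: it assumes `T`, `U`, and `A` stand in for the transit, on-us, and amount symbols (your scanner will emit its own characters), and the `MicrScanner` and `FindBack` names are made up for the example. Instead of one big loop it uses a small helper that searches backwards for the next symbol, which amounts to the same right-edge-first scan.

```csharp
using System;
using System.Linq;

public static class MicrScanner
{
    // Assumed stand-ins for the special MICR symbols.
    public const char Transit = 'T';
    public const char OnUsSymbol = 'U';
    public const char AmountSymbol = 'A';

    // Walks backwards from 'end' and returns the index of the nearest symbol, or -1.
    private static int FindBack(string s, char symbol, int end)
    {
        for (int i = end - 1; i >= 0; i--)
            if (s[i] == symbol) return i;
        return -1;
    }

    public static (string Routing, string OnUs, string CheckNumber, string Amount) Parse(string micr)
    {
        string s = micr.Trim();
        int end = s.Length;                 // exclusive right edge of the unread part
        string amount = "", check = "";

        // Amount field, if present, is the right-most field: A<digits>A.
        if (end > 0 && s[end - 1] == AmountSymbol)
        {
            int open = FindBack(s, AmountSymbol, end - 1);
            if (open < 0) throw new FormatException("Unterminated amount field.");
            amount = s.Substring(open + 1, end - 2 - open);
            end = open;
        }

        // Routing number sits between the two transit symbols.
        int closeT = FindBack(s, Transit, end);
        int openT = closeT < 0 ? -1 : FindBack(s, Transit, closeT);
        if (openT < 0) throw new FormatException("Routing field not found.");
        string routing = s.Substring(openT + 1, closeT - openT - 1);

        // Everything between the routing field and the amount is the on-us field.
        string onUs = s.Substring(closeT + 1, end - closeT - 1).Trim();
        int u = onUs.LastIndexOf(OnUsSymbol);
        if (u >= 0)
        {
            string tail = onUs.Substring(u + 1).Trim();
            if (tail.Length > 0 && tail.All(char.IsDigit))
            {
                check = tail;               // digits to the right of the on-us symbol
                onUs = onUs.Substring(0, u);
            }
        }
        onUs = onUs.Replace(OnUsSymbol, ' ').Trim();

        // Auxiliary on-us field (business checks) lies left of the routing field.
        int auxClose = FindBack(s, OnUsSymbol, openT);
        int auxOpen = auxClose < 0 ? -1 : FindBack(s, OnUsSymbol, auxClose);
        if (auxOpen >= 0)
            check = s.Substring(auxOpen + 1, auxClose - auxOpen - 1).Trim();

        return (routing, onUs, check, amount);
    }
}
```

When `CheckNumber` comes back empty, the on-us value still holds both the check number and the account number and needs the extra splitting step.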
Now, look at the value you stripped from the regular on-us field. If you have a check number, then that's your account number. If you DO NOT have a check number, then you should split the on-us field by spaces, and assume that your far left set (array element 0) of digits are your check number. HOWEVER, if after splitting by space you only have ONE element in the array, that means the on-us field likely contains dashes separating the items. Split the on-us field by dashes and assume that your far left array element is the check number and the rest are your account number. I've seen some that have as many as 3 dashes in the on-us field, like this: nnnn-1234-56-7, where nnnn is the check number and the rest is the account number.
Once you've got your account number separated from check number, strip any miscellaneous characters (spaces, dashes, etc.) from it and you're done.
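Here is a rough sketch of that on-us splitting step: split by spaces first, fall back to dashes, treat the far-left element as the check number, then strip everything that isn't a digit from the account. The `OnUsSplitter` name and the optional `knownCheckNumber` parameter are only for illustration.

```csharp
using System;
using System.Linq;

public static class OnUsSplitter
{
    // knownCheckNumber: a check number already found elsewhere on the line, if any.
    public static (string CheckNumber, string Account) Split(string onUs, string knownCheckNumber = "")
    {
        string check = knownCheckNumber;
        string account = onUs;

        if (check.Length == 0)
        {
            // Try spaces first; with a single element, assume dashes separate the items.
            string[] parts = onUs.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 1)
                parts = onUs.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);

            if (parts.Length > 1)
            {
                check = parts[0];                         // far-left element
                account = string.Concat(parts.Skip(1));   // the rest
            }
        }

        // Strip spaces, dashes, and any other non-digit characters from the account.
        account = new string(account.Where(char.IsDigit).ToArray());
        return (check, account);
    }
}
```

For example, `OnUsSplitter.Split("0101-1234-56-7")` yields check number `0101` and account `1234567`.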
This is my solution to all my MICR problems. Hopefully it helps someone else.
Thanks goes, partially, to this document: http://www.transact-tech.com/uploads/printers/files/100-9094-Rev-C-MICR-Programmers-Guide.pdf

Related

How can I check whether a specific Hangeul character is part of the syllable?

I am currently working on a little dictionary app for Korean in C# (which I am trying to learn). I would like to add a feature where a conjugation chart is given with all basic verb forms for a certain verb. To ensure the verbs are conjugated correctly I have to check whether a verb is irregular. To do this I have to check whether a verb stem ends with a certain character or not.
The problem is, however, that a computer sees an entire syllable of a Korean word as a character, not the individual 2 or 3 letters that form that specific syllable, but I need to compare the final letter of a syllable to do it correctly.
For example the Korean verb 춥다 is an irregular verb and we can tell because the verb stem 춥 has ㅂ as the final letter. Yet 춥 is the char, not ㅂ in the case of the verb stem. So this does not work:
verbStem = "춥";
verbStem.EndsWith("ㅂ");
I am currently a bit puzzled on how to make this work and thus I would be quite happy if I could get some directions.

Using the popular Korean Q&A service 지식IN (link to original answer) I was able to find the answer to my question. I am so grateful.
The first step is to separate the individual letters by normalizing the string. This is done using the Normalize method:
string a = "안녕";
string b = a.Normalize( System.Text.NormalizationForm.FormKD);
When the Normalize method is used on a Korean string, each syllable is split into its individual component Unicode characters.
However, the extremely helpful answer at 지식IN did not stop there. It pointed out that even after the split, a letter has a different Unicode code point depending on whether it is in the initial position or not, so I have to compare against the appropriate one ('ᄋ' is different from 'ᆼ'). The code points are listed at Hangul Jamo (Unicode block).
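Putting the two points together, a small sketch of the check for the verb stem in the question might look like this. The `HangulFinal` class and `EndsWithFinal` method are names made up for the example; U+11B8 is the final-position (jongseong) form of ㅂ from the Hangul Jamo block, which is not the same character as the standalone 'ㅂ' typed from a keyboard.

```csharp
using System.Text;

public static class HangulFinal
{
    // U+11B8 HANGUL JONGSEONG PIEUP: the final-position form of ㅂ.
    public const char FinalPieup = '\u11B8';

    // True when the last syllable of 'stem' ends in the given final-position jamo.
    public static bool EndsWithFinal(string stem, char finalJamo)
    {
        if (string.IsNullOrEmpty(stem)) return false;

        // Decompose each precomposed syllable into its leading, vowel,
        // and (optional) final jamo.
        string decomposed = stem.Normalize(NormalizationForm.FormKD);
        return decomposed[decomposed.Length - 1] == finalJamo;
    }
}
```

With this, `HangulFinal.EndsWithFinal("춥", HangulFinal.FinalPieup)` is the comparison that `EndsWith("ㅂ")` could not make.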
I am so glad someone managed to answer this question for me, but I felt I ought to write out the answer at Stackoverflow as well since you might never know someone else might want to learn how to do something similar.

.NET Regular Expression (perl-like) for detecting text that was pasted twice in a row

I've got a ton of JSON files that, due to a UI bug in the program that made them, often have text that was accidentally pasted twice in a row (with no space separating the copies).
Example: {FolderLoc = "C:\testC:\test"}
I'm wondering if it's possible for a regular expression to match this. It would be per-line. If I can do this, I can use FNR, which is a batch text processing tool that supports .NET RegEx, to get rid of the accidental duplicates.
I regret not having an example of one of my attempts to show, but this is a very unique problem and I wasn't able to find anything on search engines resembling it to even start to base a solution off of.
Any help would be appreciated.

You can collect text along the string (.+ style) followed by a lookahead check for what's been captured up to that point, which would be a repetition of it, like
/(.+)(?=\1)/; # but need more restrictions
However, this gets tripped even just on double leTTers, so it needs at least a little more. For example, our pattern can require the text which gets repeated to be at least two words long.
Here is a basic and raw example. Please also see the note on regex at the end.
use warnings;
use strict;
use feature 'say';

my @lines = (
    q(It just wasn't able just wasn't able no matter how hard it tried.),
    q(This has no repetitions.),
    q({FolderLoc = "C:\testC:\test"}),
);

my $re_rep = qr/(\w+\W+\w+.+)(?=\1)/; # at least two words, and then some

for (@lines) {
    if (/$re_rep/) {
        # Other conditions/filtering on $1 (the capture) ?
        say $1
    }
}
This matches at least two words: word (\w+) + non-word-chars + word + anything. That'll still get some legitimate data, but it's a start that can now be customized to your data. We can tweak the regex and/or further scrutinize our catch inside that if branch.
The pattern doesn't allow for any intervening text (the repetition must follow immediately), which is easily changed if needed; the question is whether some legitimate repetitions would then get flagged.
The program above prints
just wasn't able
C:\test
Note on regex: This quest, to find repeated text, is much too generic as it stands and it will surely pick on someone's good data. It is enough to note that I had to require at least two words (with just one word, "that that" is flagged), which is arbitrary and still insufficient. For one, repeated numbers realistically found in data files (3,3,3,3,3) will be matched as well.
So this needs further specialization, for which we need to know more about the data.
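Since the question asks for a .NET regex, here is a rough C# counterpart of the same idea. It is a sketch under the same caveats as above: the two-word minimum is arbitrary, and the `DoublePaste` and `Collapse` names are made up. The lookahead is swapped for a consuming backreference so that Replace can drop the second copy.

```csharp
using System.Text.RegularExpressions;

public static class DoublePaste
{
    // At least two words and then some, immediately followed by the same text again.
    private static readonly Regex Repeated = new Regex(@"(\w+\W+\w+.+)\1");

    // Replaces each "text + same text" run with a single copy.
    public static string Collapse(string line)
    {
        return Repeated.Replace(line, "$1");
    }
}
```

In a tool like FNR the same pattern would go in the find box with `$1` as the replacement, assuming it follows the usual .NET substitution syntax.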

C# Comparing two strings, one with unique identifiers

First of all I'd like to mention that I'm new to programming and this site so I'm still an infant in this world, however, I have a problem.
I have to make code that can compare two strings but the second string (from a file) will have unique identifiers within it. For example:
first string:
I have 10 cats and their fur is #000000
Second string from a file:
I have <d> cats and their fur is <h>
Although I probably don't need to explain, 'd' is for numbers or decimal and 'h' for hex. There are also 's' and 'a' associated to ASCII.
What's supposed to happen is that the first string can have any different number which can be of different length and/or Hex when the data comes in but the rest of the message stays the same, E.G.
I have 1500 cats and their fur is #000000
the code will still treat the two strings as a match, as it'll effectively ignore anything that is an int or hex. (These identifiers are user defined, so they can be anywhere in any string.)
The end game is that if it finds a relative match the code will change the colour of the text in the app, among other things. It's basically to highlight errors in a log file.
I've searched high and low on Stack Overflow and looked into Regex and string comparisons. I'm currently going to make a start on the code, however, I would like some input/help.
Obviously I'm not asking for something to be written for me, just to be pointed in the right direction so I can learn.
Many thanks in advance! And apologies if there is a similar post out there, but alas I couldn't find it if there is.

If I understand it correctly I think I would solve this by replacing the <d> etc. by a RegEx expression. Then use that RegEx to replace the values by an empty string. That way you can compare them without the values.
Hope that makes sense. I didn't include any code because you asked for just some directions.
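For readers who do want a starting point, here is one way the idea could be sketched. It is a variation on the suggestion above: instead of stripping the values and comparing what is left, the template itself is turned into a regex and matched directly. The sub-patterns chosen for `<d>` and `<h>` (including the optional leading `#`) are assumptions based on the example strings, `<s>` and `<a>` are left out, and `TemplateMatcher` is a made-up name.

```csharp
using System.Text.RegularExpressions;

public static class TemplateMatcher
{
    // Turns a template such as "I have <d> cats" into an anchored regex.
    public static Regex ToRegex(string template)
    {
        // Escape the literal text first; Regex.Escape leaves '<' and '>' alone,
        // so the placeholders survive and can be swapped for sub-patterns.
        string pattern = Regex.Escape(template)
            .Replace("<d>", @"\d+(?:\.\d+)?")      // integer or decimal (assumed)
            .Replace("<h>", @"#?[0-9A-Fa-f]+");    // hex, optional leading '#' (assumed)

        return new Regex("^" + pattern + "$");
    }

    public static bool Matches(string template, string logLine)
    {
        return ToRegex(template).IsMatch(logLine);
    }
}
```

A call such as `TemplateMatcher.Matches("I have <d> cats and their fur is <h>", "I have 1500 cats and their fur is #000000")` then reports a match regardless of the actual number or colour.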

How to find a string in a column that contains a lengthy "word" or set of characters

I am looking for an unusually long word or grouping of characters in a specific column of data that contains notes written by users. For example, if something like this -
I am looking for an unusuallylongwordorgroupingofcharactersina specific column
exists, I need to find it so I can add spaces if necessary. My question is: How do I find a word or set of characters that exceeds a certain number of characters?
The problem is that somewhere in this data, an unusually long word or grouping of characters is being parsed and causing an OutOfMemoryException, so I need to find the source and fix it.

You could use a regex in C# if the raw string fits in memory: \w{15,} gives you words at least 15 characters in length. There are many ways to tweak this (lookahead, lookbehind, more specific character classes, etc.).
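A minimal sketch of that regex approach, with the minimum length passed in as a parameter (the `LongWordFinder` name is just for illustration):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LongWordFinder
{
    // Returns every run of word characters that is at least minLength long.
    public static List<string> Find(string text, int minLength)
    {
        var found = new List<string>();
        var longWord = new Regex(@"\w{" + minLength + ",}");
        foreach (Match m in longWord.Matches(text))
            found.Add(m.Value);
        return found;
    }
}
```

Run against each row's notes, anything it returns is a candidate for the oversized "word".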

You can write a C# stored procedure that can be run against the column in question.
It would split the column into an array of strings, each containing a word. Then you can easily find the longest word in the column.
See http://msdn.microsoft.com/en-us/library/vstudio/zxsa8hkf%28v=vs.100%29.aspx
for details on how to write, install, and debug a C# stored procedure in SQL Server.

Using the answers given, I created a program that pulls the data and tosses each word into a list. It then pulls words of a given length (in my case, I did greater than 20 characters) and found the bad "word". Now I can fix the data.
I appreciate all your help, guys.

Outputting Programmatically to MSword; sensing end of line

I'm trying to use the MSWord Interop Library to write a C# application that outputs specially formatted text (isolated Arabic letters) to a file. The problem I'm running into is determining how many characters remain before the text wraps onto a new line. I need the words to be on the same line, without wrapping, which is the default behavior. I'm finding this difficult because when I have the Arabic letters of the word isolated with spaces, they are treated as individual characters and therefore behave differently than connected words.
Any help is appreciated. Thanks.

Add each character to your range and then check the number of lines in the range
LineCount = range.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
When the line count changes, you know it has been wrapped, and can remove the last character or reformat accordingly

I don't know how this behaves today, but I wrote something against the MS Word API once and ran into a somewhat weird fact: you actually can't find that out. In MSWord, text in a document is always in paragraphs.
If you input text into your document, it doesn't just land on a page; the page will contain at least one paragraph holding the text you wrote.
Unfortunately I can't check this again, because I don't have a license for MS Word these days.
Give it a try and look at the problem again in this way.
Hope this helps, and if not, please provide the code that generates the input and the exact version of MSWord.
Greetings,
Kjellski

I'm not sure what "Arabic letters of the word isolated with spaces" means exactly, but I assume that non breaking space is what you need.
Here's more details.
