Regex to validate several ranges of values (.NET)

I have the option to use a regex to validate some values and I would like to know the best way to do it.
I have several ranges (not in the same regex/validation) such as 1-9 (this is easy :P), and:
1-99
1-999
1-9999
Also, I need to check for leading zeros, so, 0432 is a possible match.
I know that Regex might not be the best approach for this, but it's what I have, and it's part of a framework I have to use.

There is a fundamental flaw with all other answers!!
bool isMatch = Regex.IsMatch("9999", @"\d{1,3}");
Returns true although the number is not a 1-3 digit number. That is because part of the input matches the expression.
You need to use:
bool isMatch = Regex.IsMatch("9999", @"^\d{1,n}$");
Where n is the maximum number of digits.
UPDATE
In order to make sure it is not zero, change it to below:
bool isMatch = Regex.IsMatch("9999", @"(^\d{2,n}$)|[1-9]");
UPDATE 2
It still would not work if we have 00 or 000 or 0000. My brain hurts, I will have a look again later. We can do it as two expressions ("(^\d{2,n}$)|[1-9]" and NOT (0)|(00)|(000)|(0000)) but that probably is not accepted as a SINGLE regex answer.
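One way to fold both checks into a single expression is a negative lookahead that rejects all-zero input. This is a sketch, not part of the answer above: the helper name BuildRangeRegex and its maxDigits parameter are made up for illustration, and [0-9] is used instead of \d because \d in .NET also matches non-ASCII digits by default.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: accepts 1 .. (10^maxDigits - 1), leading zeros allowed,
// and rejects inputs made only of zeros via the (?!0+$) lookahead.
Regex BuildRangeRegex(int maxDigits) =>
    new Regex(@"^(?!0+$)[0-9]{1," + maxDigits + @"}$");

Regex upTo999 = BuildRangeRegex(3);
Regex upTo9999 = BuildRangeRegex(4);

Console.WriteLine(upTo999.IsMatch("432"));    // True
Console.WriteLine(upTo999.IsMatch("9999"));   // False (too many digits)
Console.WriteLine(upTo9999.IsMatch("0432"));  // True  (leading zero allowed)
Console.WriteLine(upTo9999.IsMatch("0000"));  // False (zero is excluded)
```

Note that $ in .NET also matches just before a final newline; replacing it with \z would be stricter if that matters for the input.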

The easy way (n is the number of digits):
[0-9]{n}

I’m not 100% clear on what you want but perhaps something like this?
\d{1,2}
This is the equivalent to \d\d? and will match any one- or two-digit number, that is any number from 0 to 99, including leading zeros.
If you really want to exclude 0, then things get a lot more complicated.
Basically, you need to check for 1–9 and 01–09 explicitly. The third group of allowed numbers will have two digits that do not start with a zero:
0?[123456789]|[123456789]\d
Now the part before the | will check whether it’s a number below 10, with optional leading zero. The second alternative will check whether it’s a two-digit number not starting with a zero.
The rest of the ranges can be checked by extending this.

Related

Does C# have a method for parsing an int, meanwhile keeping track of the number of characters parsed?

I have a program whose input is like
~1^(2~&3) 0x3FFE 0x2FCE 0xFCC1
and right now I'm constructing the algorithm that parses the equation
~1^(2~&3)
and hopefully I'll be able to do it without any repeated passes through sections of the equation. Does C#, in its standard libraries, have a way of parsing an int and keeping track of the number of characters parsed? So that, for example, if I'm at the point
~1207300&11
 ^
in an equation then I want to be able to grab 1207300 and know that I parsed 7 characters so that I can move 7 indices forward to
~1207300&11
        ^
Or will I have to hand-roll such a function?

Does C#, in its standard libraries, have a way of parsing an int and keeping track of the number of characters parsed?
No, unfortunately not. All the parsing routines expect the input to be a number and nothing else. (Whitespace is also allowed and ignored.)
Find out how many chars are in the number by running a simple loop. Then, Substring the number out and pass it to int.Parse.
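The loop-then-parse approach described above might look roughly like this. TryParseIntAt is a hypothetical helper name, and the sketch only handles unsigned runs of ASCII digits.

```csharp
using System;

// Hypothetical helper: parses the run of ASCII digits that starts at `start`
// and reports how many characters were consumed.
bool TryParseIntAt(string text, int start, out int value, out int length)
{
    int end = start;
    while (end < text.Length && text[end] >= '0' && text[end] <= '9')
        end++;

    length = end - start;
    value = 0;
    // No digits at all, or a value too large for int, both count as failure.
    return length > 0 && int.TryParse(text.Substring(start, length), out value);
}

string equation = "~1207300&11";
if (TryParseIntAt(equation, 1, out int number, out int consumed))
{
    Console.WriteLine(number);                  // 1207300
    Console.WriteLine(consumed);                // 7
    Console.WriteLine(equation[1 + consumed]);  // &
}
```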

Which data structure(or type) would be useful for defining the digit amount of a number?

Hi there fellow programmers,
I know this should be easy, but I need to fix the number of digits of a number in order to try all the combinations in a project. The digit count shouldn't be affected by the user's actions, because a change in the digit count causes an "Index out of range" error. (Yes, I am using arrays for this.)
Let's say I have to use a four-digit number.
int Nmr=1000;
Nmr--;
Console.Write(Nmr);// The output will be 999 but I need 0999
Using string type and if statements could lead to an alternative solution...
int Nmr=1000;
Nmr--;
string number=Nmr.ToString();
if (Nmr<1000) number="0"+number;
if (Nmr<100) number="0"+number;
if (Nmr<10) number="0"+number;
Console.Write(number); //That gives me 0999
But then, it gives me complexity and unnecessary time loss which I wouldn't want to encounter. I am not even talking about the greater values.
So, what would you suggest?
Edit: Both ToString("0000") and PadLeft methods are useful.
Thank you Mateus Coutinho Martino and Blorgbeard. =)

You can specify a format when calling ToString - e.g.
string number = Nmr.ToString("0000");
See the docs: Int32.ToString(string) and Custom Numeric Format Strings.

Well, do with PadLeft method of String class ...
int Nmr=1000;
Nmr--;
Console.Write(Nmr.ToString().PadLeft(4,'0')); // That always gives you four digits.
Or if you prefer better explained ...
int Nmr=1000;
Nmr--;
String number = Nmr.ToString();
Console.Write(number.PadLeft(4,'0')); // four digits again;

The "0" custom format specifier serves as a zero-placeholder symbol.
Here's a little idea for you:
double numberS;
numberS = 123;
Console.WriteLine(numberS.ToString("00000"));
Console.WriteLine(String.Format("{0:00000}", numberS));
// Displays 00123
You can look at https://msdn.microsoft.com/en-us/library/0c899ak8.aspx for more.
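For wider numbers, the digit count can be kept in one variable and the format built from it. A small sketch, assuming non-negative values; the variable name width is illustrative, and the "D" standard format specifier is an alternative the answers above do not mention.

```csharp
using System;

int width = 4;   // assumed digit count; change it in one place
int nmr = 1000;
nmr--;

// Three equivalent ways to pad a non-negative int to `width` digits.
Console.WriteLine(nmr.ToString("D" + width));             // 0999
Console.WriteLine(nmr.ToString(new string('0', width)));  // 0999
Console.WriteLine(nmr.ToString().PadLeft(width, '0'));    // 0999
```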

Reference values: string or integer?

We have reference values created from a Sequence in a database, which means that they are all integers. (It's not inconceivable - although massively unlikely - that they could change in the future to include letters, e.g. R12345.)
In our [C#] code, should these be typed as strings or integers?
Does the fact that it wouldn't make sense to perform any arithmetic on these values (e.g. adding them together) mean that they should be treated as string literals? If not, and they should be typed as integers (/longs), then what is the underlying principle/reason behind this?
I've searched for an answer to this, but not managed to find anything, either on Google or StackOverflow, so your input is very much appreciated.

There are a couple of other differences:
Leading Zeroes:
Do you need to allow for these? If you have an ID string then it would be required.
Sorting:
Sort order will vary between the types:
Integer:
1
2
3
10
100
String
1
10
100
2
3
So will you have a requirement to put the sequence in order (either way around)?
The same arguments apply to your typing as applied in the DB itself too, as the requirements there are likely to be the same. Ideally as Chris says, they should be consistent.
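The difference in sort order is easy to reproduce. A quick sketch with made-up sample values, using an ordinal comparer for the strings:

```csharp
using System;
using System.Linq;

int[] ids = { 100, 3, 10, 1, 2 };

// Numeric ordering versus character-by-character (ordinal) ordering.
var asInts = ids.OrderBy(i => i);
var asStrings = ids.Select(i => i.ToString())
                   .OrderBy(s => s, StringComparer.Ordinal);

Console.WriteLine(string.Join(", ", asInts));     // 1, 2, 3, 10, 100
Console.WriteLine(string.Join(", ", asStrings));  // 1, 10, 100, 2, 3
```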

Here are a few things to consider:
Are leading zeros important, i.e. is 010 different to 10. If so, use string.
Is the sort order important? i.e. should 200 be sorted before or after 30?
Is the speed of sorting and/or equality checking important? If so, use int.
Are you at all limited in memory or disk space? If so, ints are 4 bytes, strings at minimum 1 byte per character.
Will int provide enough unique values? A string can support potentially unlimited unique values.
Is there any sort of link in the system that isn't guaranteed reliable (networking, user input, etc)? If it's a text medium, int values are safer (all non-digit characters are erroneous), if it's binary, strings make for easier visual inspection (R13_55 is clearly an error if your ids are just alphanumeric, but is 12372?)

From the sounds of your description, these are values that currently happen to be represented by a series of digits; they are not actually numbers in themselves. This, incidentally, is just like my phone number: it is not a single number, it is a set of digits.
And, like my phone number, I would suggest storing it as a string. Leading zeros don't appear to be an issue here but considering you are treating them as strings, you may as well store them as such and give yourself the future flexibility.

They should be typed as integers and the reason is simply this: retain the same type definition wherever possible to avoid overhead or unexpected side-effects of type conversion.

There are good reasons not to use types like int, string, long all over your code. Among other problems, this allows for stupid errors like
using a key for one table in a query pertaining to another table
doing arithmetic on a key and winding up with a nonsense result
confusing an index or other integral quantity with a key
and communicates very little information: Given int id, what table does this refer to, what kind of entity does it signify? You need to encode this in parameter/variable/field/method names and the compiler won't help you with that.
Since it's likely those values will always be integers, using an integral type should be more efficient and put less load on the GC. But to prevent the aforementioned errors, you could use an (immutable, of course) struct containing a single field. It doesn't need to support anything but a constructor and a getter for the id, that's enough to solve the above problems except in the few pieces of code that need the actual value of the key (to build a query, for example).
That said, using a proper ORM also solves these problems, with less work on your side. They have their own share of downsides, but they're really not that bad.
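The single-field struct described above could be sketched like this. CustomerId is a hypothetical name chosen for illustration; a real code base would have one such type per kind of key.

```csharp
using System;

var id = new CustomerId(12345);
Console.WriteLine(id.Value);                           // 12345
Console.WriteLine(id.Equals(new CustomerId(12345)));   // True
// int next = id + 1;   // would not compile: no arithmetic on keys

// Hypothetical immutable key type wrapping a single int.
public readonly struct CustomerId : IEquatable<CustomerId>
{
    public CustomerId(int value) { Value = value; }

    // The raw key, needed only where a query is actually built.
    public int Value { get; }

    public bool Equals(CustomerId other) => Value == other.Value;
    public override bool Equals(object obj) => obj is CustomerId other && Equals(other);
    public override int GetHashCode() => Value;
    public override string ToString() => Value.ToString();
}
```

Because a different key type (say, an OrderId) would be a distinct struct, passing one where the other is expected becomes a compile-time error.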

If you don't need to perform some mathematical calculations on the sequences, you can easily choose strings.
But think about sorting: Produced orders between integers and strings will differ, e.g. 1, 2, 10 for integers and 1, 10, 2 for strings.

Create a unique id with built-in checksum?

I want to auto-generate a unique 8-10 character ID string that includes a checksum bit of some kind to guard against a typo at data entry. I would prefer something that does not have sequential numbers where the data entry person would end up in a "rut" and get used to typing the same sequence all the time.
Are there any best practices/ pitfalls associated with this sort of thing?
UPDATE: OK, I guess I need to provide more detail.
I want to use alphanumerics, not just digits
I want behavior similar to a credit card checksum, except with 8-10 characters instead of 16 digits
I want to have the id be unique; there should not be a possibility of collision.
SECOND UPDATE OK, I don't understand what is confusing about this, but I will try to explain further. I am trying to create tracking numbers that will go on forms, which will be filled out and data-entered at a later time. I will generate the id and slap it on the form; the id needs to be unique, it needs to support a LOT of numbers, and it needs to be reasonably idiot-proof for data-entry.
I don't know if this has been done, or even if it can be done, but it does not hurt to ask.

Your question is VERY general - thus just some general aspects:
Does the ID need to be "unguessable" ?
IF yes then some sort of hash should be in the mix.
Does the ID need to be "secure" (like for example an activation key or something) ?
IF yes then some sort of public key cryptography should be in the mix.
Does the ID / checksum calculation need to be fast ?
IF yes then perhaps some very simple algorithm like CRC32 or Luhn (credit card checksum algorithm) or some barcode checksum algorithm could be worth looking at.
Is the ID generation centralized ?
IF not then you might need to check out GUIDs, current time, MAC address and similar stuff.
UPDATE - as per comments:
use a sequence in the DB
take that value and hash it, for example with MD5
take the least significant 40-48 bits of that hash
encode it as Base-36 (0-9 and A-Z) which gives you 8-10 "digits" (alphanumeric)
check the result against the DB and discard if the ID already there (for the very rare possibility of a collision)
calculate CRC-6-ITU (see http://www.itu.int/rec/T-REC-G.704-199810-I/en on page 3)
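attach the CRC result as the last "digit" (as base-36 too; a 6-bit CRC ranges over 0-63, so it has to be reduced, e.g. modulo 36, to fit a single base-36 character)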
and thus you have a unique ID including checksum
to check the entered value you can just recalculate CRC-6-ITU from all digits but the last one and compare the result with the last digit.
The above is rather "unguessable" but definitely not of "high security".
UPDATE 2 - as per comment:
For some inspiration on how to calculate CRC in javascript see this - it contains javascript code for CRC-8 etc.
You should be able to adapt this code based on the CRC-6-ITU polynomial.
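The hash-and-encode steps of the recipe above can be sketched as follows. This is an illustration under assumptions: the helper names are made up, "least significant 40 bits" is taken to mean the last 5 bytes of the MD5 digest, and the database collision check and the check character are left out.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: fixed-width base-36 encoding (0-9, A-Z).
string ToBase36(ulong value, int width)
{
    const string alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var chars = new char[width];
    for (int i = width - 1; i >= 0; i--)
    {
        chars[i] = alphabet[(int)(value % 36)];
        value /= 36;
    }
    return new string(chars);
}

// Hypothetical helper: sequence value -> MD5 -> 40 bits -> 8 base-36 characters.
string IdFromSequence(long sequenceValue)
{
    using var md5 = MD5.Create();
    byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(sequenceValue.ToString()));

    ulong bits = 0;
    for (int i = hash.Length - 5; i < hash.Length; i++)   // last 5 bytes = 40 bits
        bits = (bits << 8) | hash[i];

    return ToBase36(bits, 8);   // 2^40 < 36^8, so 8 characters always suffice
}

Console.WriteLine(IdFromSequence(1));
Console.WriteLine(IdFromSequence(2));
```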

You might imitate airline reservation systems: they convert a number into base-36, using A-Z and 0-9 as the characters. Their upper limit is thus 36^6.
If you need to guarantee uniqueness, and you don't want them to be sequential, you have to keep the used-up random numbers in a table somewhere.
After you have your random or pseudorandom ID, you only need to calculate your checkdigit.
Use a CRC algorithm. They can be adapted to any desired length (in your case, 6 bits).
Edit
In case it's not clear: even if you use alpha codes, you'll have to turn it into a number before generating the checkdigit.
Edit
Checksum validation is not heavyweight, it can be implemented client-side in javascript.
A six character alphanumeric (i.e. airline record locator) = 36^6, about 2.18 billion numbers. Surely that's enough? (See Wolfram Alpha for exact result.)

Most credit cards use the Luhn algorithm (also known as mod10 algorithm) as checksum algorithm to validate card numbers. From Wikipedia:
The Luhn algorithm will detect any single-digit error, as well as
almost all transpositions of adjacent digits. It will not, however,
detect transposition of the two-digit sequence 09 to 90 (or vice
versa).
The algorithm is generic and can be applied to any identification number.
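A minimal sketch of the mod 10 validation, for digit-only input. IsValidLuhn is an illustrative name, and the sample number is the one commonly used in write-ups of the algorithm.

```csharp
using System;

// Validates a digit string whose last digit is a Luhn check digit.
bool IsValidLuhn(string digits)
{
    int sum = 0;
    bool doubleIt = false;   // the check digit itself is not doubled
    for (int i = digits.Length - 1; i >= 0; i--)
    {
        int d = digits[i] - '0';
        if (d < 0 || d > 9) return false;   // reject non-digit characters
        if (doubleIt)
        {
            d *= 2;
            if (d > 9) d -= 9;              // same as adding the two digits
        }
        sum += d;
        doubleIt = !doubleIt;
    }
    return digits.Length > 0 && sum % 10 == 0;
}

Console.WriteLine(IsValidLuhn("79927398713"));  // True
Console.WriteLine(IsValidLuhn("79927398710"));  // False
```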

As @BrokenGlass noted, you can use the Luhn check digit algorithm. Credit cards and the like use the Luhn algorithm modulo 10. Luhn mod 10 computes a check digit for a sentence drawn from the alphabet consisting solely of decimal digits (0-9). However, it is easily adapted to compute a check digit for sentences drawn from an alphabet of any size (binary, octal, hex, alphanumeric, etc.)
To do that, all you need are two methods and one property:
The number of codepoints in the alphabet in use.
This is essentially the base of the numbering system. For instance, the hexadecimal (base 16) alphabet consists of 16 characters (ignoring the issue of case-sensitivity): '0123456789ABCDEF'. '0'–'9' have their usual meaning; 'A'–'F' are the base-16 digits representing 10–15.
A means of converting a character from the alphabet in use into its corresponding codepoint.
For instance in hexadecimal, the characters '0'–'9' represent code points 0–9; the characters 'A'–'F' represent codepoints 10-15.
A means of converting a codepoint into the corresponding character.
The converse of the above. For instance, in hexadecimal, the codepoint 12 would convert to the character 'C'.
You should probably throw an ArgumentException if the code point given doesn't exist in the alphabet.
The Wikipedia article, "Luhn mod N algorithm" does a pretty good job of explaining the computation of the check digit and its validation.
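Put together, the three pieces listed above might look like this for a base-36 alphabet. This is a sketch following the commonly published Luhn mod N procedure; the function names and the sample payload are illustrative.

```csharp
using System;

const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";   // n = 36

int CodePointFromCharacter(char c)
{
    int codePoint = Alphabet.IndexOf(c);
    if (codePoint < 0) throw new ArgumentException($"'{c}' is not in the alphabet.");
    return codePoint;
}

char CharacterFromCodePoint(int codePoint) => Alphabet[codePoint];

// Walks right to left, alternating the factor and folding each product in base n.
int LuhnSum(string input, int factor)
{
    int n = Alphabet.Length, sum = 0;
    for (int i = input.Length - 1; i >= 0; i--)
    {
        int addend = factor * CodePointFromCharacter(input[i]);
        factor = factor == 2 ? 1 : 2;
        sum += addend / n + addend % n;
    }
    return sum;
}

char GenerateCheckCharacter(string input)
{
    int n = Alphabet.Length;
    return CharacterFromCodePoint((n - LuhnSum(input, 2) % n) % n);
}

bool Validate(string inputWithCheck) =>
    LuhnSum(inputWithCheck, 1) % Alphabet.Length == 0;

string payload = "A1B2C3D4";
string id = payload + GenerateCheckCharacter(payload);
Console.WriteLine(Validate(id));                            // True
Console.WriteLine(Validate("A1B2C3D5" + id[id.Length - 1])); // False (one character mistyped)
```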

String.Format() split integer value

I'm wondering if it's possible for .Net's String.Format() to split an integer apart into two sub strings. For example I have a number 3234 and I want to format it as 32X34. My integer will always have 4 or 6 digits. Is this possible using String.Format()? If so what format string would work?
P.S.
I know there are other ways to do this, but I'm specifically interested to know if String.Format() can handle this.

You can specify your own format when calling String.Format
String.Format("{0:00x00}", 2398) // = "23x98"

James, I'm not sure you've completely specified the problem. If your goal is to put the 'x' in the center of the string, Samuel's answer won't work for 6 digit numbers. String.Format("{0:00x00}", 239851) returns "2398x51" instead of "239x851"
Instead, try:
String.Format(val<10000 ? "{0:00x00}" : "{0:000x000}", val)
In either case, the method is called Composite Formatting.
(I'm assuming the numbers will be between 1000 and 999999 inclusive. Even then, numbers between 1000 and 1009 inclusive will report the number after the 'x' with an unnecessary leading '0'. So maybe this approach is valid for values between 1010 and 999999 inclusive.)
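A short check of the behaviour described above, assuming values with exactly 4 or 6 digits. The Split helper name is made up for illustration.

```csharp
using System;

// Picks the format by magnitude so the 'x' lands in the middle.
string Split(int val) =>
    string.Format(val < 10000 ? "{0:00x00}" : "{0:000x000}", val);

Console.WriteLine(Split(3234));                         // 32x34
Console.WriteLine(Split(239851));                       // 239x851
Console.WriteLine(string.Format("{0:00x00}", 239851));  // 2398x51
```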

No, it can't.
In fact, it seems that your integers aren't integers. Perhaps they should be stored in a class, with its own ToString() method that will format them this way.
