A regular expression to validate .NET time format - c#

Background
I need to validate user input in some fields that define how time is displayed in some views.
Requirements
Time format must be expressed in Microsoft .NET way (check this MSDN Library article if you want to learn more about framework's date and time formatting: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx)
Keep in mind I'm looking to validate the format instead of an actual time string.
For example, user may input:
HH:mm
hh:mm
ss
hh:ss
mm:ss
... and so on.
In fact, it should validate from the shortest to longest time format available.
Another point is that I need to do this on the client side using JavaScript. In other words, any regular expression you suggest should work in the browser's JavaScript regular expression engine.
I'd appreciate any self-tailored expression, link, or pasted expression!
Thank you in advance.
NOTE (Update)
I can't use ASP.NET validation engine, or any other. Because of project's requirements, I need to avoid that.

As far as I understand, there aren't many options: about 20 at most. Why not just enumerate them all in one big regex without many special symbols? Like
'hh:mm|hh:mm:ss|yyyy-MM-dd hh:mm|<etc>'
You could then make it case-sensitive to differentiate between M for month and m for minute, use [hH] for hours, use [:/-] wherever you allow different separators (the hyphen goes last so it is taken literally rather than as a range), and lots of other similar things. But the main idea is simply to enumerate all options separated by |, with only a small amount of regex syntax between one | and the next.
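The enumeration idea could be sketched in JavaScript roughly as follows. The list of accepted formats, the helper name escapeRegExp, and the function name isAllowedTimeFormat are assumptions made for illustration; this is not a complete list of what .NET accepts.

```javascript
// Hypothetical whitelist: which formats to accept is an assumption of this sketch.
const ALLOWED_TIME_FORMATS = [
  "HH:mm:ss", "hh:mm:ss", "HH:mm", "hh:mm", "hh:ss", "mm:ss", "HH", "hh", "mm", "ss",
];

// Escape regex metacharacters so each format is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build ^(?:a|b|c)$ so that only a complete, listed format string matches.
const timeFormatPattern = new RegExp(
  "^(?:" + ALLOWED_TIME_FORMATS.map(escapeRegExp).join("|") + ")$"
);

// Case-sensitive on purpose: "mm" (minutes) and "MM" (months) differ.
function isAllowedTimeFormat(input) {
  return timeFormatPattern.test(input);
}

console.log(isAllowedTimeFormat("HH:mm")); // true
console.log(isAllowedTimeFormat("hh:MM")); // false
```

Keeping the formats in an array rather than a hand-written pattern makes it easy to add or remove entries without touching any regex syntax.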

What is your definition of a "valid" format string? Only once you know that can it be possible to validate a format string.
"K" is also a valid format string
"zz" is also a valid format string
"e" is also a valid format (it would fall into the "The character is copied to the result string unchanged." case)
I'm not even sure what formats would actually cause .NET .ToString() to throw an exception (if that's what you are trying to avoid).

Related

Compare a string which has a param

I am reading in a header from a file which has time fields, for example Time (UTC +1). I then need to compare this with a list of stored headers to work out whether the file is valid. However, my stored headers are also used for writing, so they allow flexibility on the time zone by being written like so: Time (UTC {0}).
I would like to know the best way of dealing with this in as flexible a way as possible. The only way I can imagine doing it is by getting the position of the { and only comparing up to that. This is fine in this circumstance, but what if I have some words after the parameter which are more important than a closing bracket?
EDIT: I would like to give some context to the problem so that I can explain better how flexible I need it. I think I possibly didn't emphasise the fact that I didn't want it to JUST work with the time field.
I am trying to write a system which is very flexible. I store a list of valid headings and then use them to find out what value to read from or write to the CSV file. It is very flexible and easily maintainable, and I want to keep it neat. I want to be able to write a function which takes in a string that has one or more parameters in it and then compares it with a value in which the parameters have been filled in (like the example with the Time header). In the future I may have a field for temperature in a particular place, so my stored heading would be Temperature in {0}({1}), which when I am reading back would be Temperature in Britain(c) or Temperature in America(f).
You could use a regex like this one :
// Matches headers such as "Time (UTC +1)"; the sign is optional.
string pattern = @"Time \(UTC [+-]?\d+\)";
Regex rgx = new Regex(pattern);
Regex has a Match method you can use to check whether any string matches the pattern you provided.
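For the more general case raised in the question (any stored heading with one or more {n} parameters), one possible approach is to turn the stored heading itself into a regular expression. The sketch below is written in JavaScript for brevity; the function names are hypothetical, and it assumes each placeholder stands for at least one character. In .NET the same idea would rely on Regex.Escape for the literal pieces.

```javascript
// Escape regex metacharacters so literal heading text is matched as-is.
function escapeRegExpLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a stored heading such as "Temperature in {0}({1})" into an anchored RegExp
// in which every {n} placeholder matches one or more characters (lazily).
function headingTemplateToRegExp(template) {
  const source = template
    .split(/\{\d+\}/)          // literal pieces between placeholders
    .map(escapeRegExpLiteral)  // escape "(", ")", "+" and similar characters
    .join("(.+?)");            // each placeholder becomes a capture group
  return new RegExp("^" + source + "$");
}

// Returns the filled-in parameter values, or null when the heading does not fit.
function matchHeading(template, heading) {
  const match = headingTemplateToRegExp(template).exec(heading);
  return match ? match.slice(1) : null;
}

console.log(matchHeading("Time (UTC {0})", "Time (UTC +1)"));
console.log(matchHeading("Temperature in {0}({1})", "Temperature in Britain(c)"));
```

Because the pattern is anchored at both ends, any text after a placeholder must match too, which addresses the concern about important words following the parameter.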

Is there a regex to test if a string is for a locale? [closed]

I don't know anything about regular expressions, but I think I have to use them for my problem. I have some filenames that look like:
MyResource
MyResource.en-GB
MyResource.en-US
MyResource.fr-FR
MyResource.de-DE
The idea is to test if my strings end with "[letter][letter]-[letter][letter]"
I know this is a very noob question, but I just have no idea how to do it, even though I know exactly what I want to do... :(
To cater for basic variants:
^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$
which consists of:
Language code: ISO 639 2 or 3, or 4 for future use, alpha.
Optional script code: ISO 15924 4 alpha.
Optional country code: ISO 3166-1 2 alpha or 3 digit.
Separated by underscores or dashes.
Valid examples are:
de
en-US
zh-Hant-TW
En-au
aZ_cYrl-aZ.
For the OP's specific question, this would need to be prefixed by /^MyResource[.] and suffixed by $/ to ensure the whole file name is for a valid resource file that ends in a locale.
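As a rough JavaScript sketch of that prefix-and-suffix idea, the lenient pattern can be wrapped in a helper. The function name hasLocaleSuffix and its baseName parameter are hypothetical, introduced only for this example.

```javascript
// Lenient basic pattern from above: language, optional script, optional region.
const LOCALE_SOURCE =
  "[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?";

// Hypothetical helper: baseName is the resource name you expect, e.g. "MyResource".
function hasLocaleSuffix(fileName, baseName) {
  // Escape the base name so characters like "." are matched literally.
  const escapedBase = baseName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escapedBase + "[.]" + LOCALE_SOURCE + "$");
  return pattern.test(fileName);
}

console.log(hasLocaleSuffix("MyResource.en-GB", "MyResource")); // true
console.log(hasLocaleSuffix("MyResource", "MyResource"));       // false
```

Note that this only checks the shape of the suffix: a name such as MyResource.resx would also pass, because "resx" looks like a four-letter language code.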
Note that some programming languages' functions may only accept particular forms, such as only underscores and an uppercase country code. PHP's intl functions accept either case and either separator. PayPal accepts only the language, or the la_CY form, where la is the language and CY is the country/region. The PHP locale_canonicalize function can be used to standardise to this format.
IETF RFC 5646, which governs internet usage of these tags, recommends a capitalisation and separation format like az-Cyrl-AZ, as used in the first three examples above, though it says processors should accept any mix of case and either separator, as per the last two examples. When displaying locales, using - as the separator allows finer-grained line-wrapping, avoiding the largely empty lines that can result when the non-wrapping _ is used, especially in table cells.
The regex for the recommended basic format is:
^[a-z]{2,4}(-[A-Z][a-z]{3})?(-([A-Z]{2}|[0-9]{3}))?$
The regexp only covers the basic format. There are variants for extras, like local region. RFC 5646 allows for such variants, along with private extensions and backwards-compatibility forms. It all depends upon the granularity required. The CLDR Unicode database, which is used by PHP's intl functions and other programs, may include such variants from version to version, though they can also disappear at a later time.
If using a CLDR-based function set, like PHP's intl extension, you can check if a locale exists in the intl database using a function like:
<?php
function is_locale($locale=''){
    // STANDARDISE INPUT
    $locale=locale_canonicalize($locale);
    // LOAD ARRAY WITH LOCALES
    $locales=resourcebundle_locales('');
    // RETURN WHETHER FOUND (strict comparison, since array_search can return index 0)
    return (array_search($locale,$locales)!==false);
}
?>
It takes about half a millisecond to load and search the data, so it won't be too much of a performance hit.
Of course, it will only find those in the database of the CLDR version supplied with the PHP version used, but will be updated with each subsequent PHP release.
Note that some locales are not for countries, but regions, and these are typically numeric, like 001 for 'World', 150 for 'Europe' and 419 for 'Latin America'. So there are now en-001, en-150, ar-001, and es-419, which can be used for generic language purposes. For example, en-001 was designed to decouple dependence upon en-us as an ersatz English, especially since its date formats and spellings are radically different from the 100 other regional en variants. The en-150 locale is the same as en-001 except for numbering separators and other Europe-specific formats.
In general, a regexp is a good front-end sanity check to filter out illegal characters, and especially to reserve the format for possible future additions. It also helps to prevent malicious character combinations being sent to the lookup facility, especially if text-based lookup command mechanisms, like SQL or Xpath, are used.
That would be testing your input against:
\.[a-z]{2}-[A-Z]{2}$
This is really very literal: "match a dot (\., the dot being a special character in regexes), followed by exactly two of any characters from a to z ([a-z]{2} -- [...] is a character class), followed by a dash (-), followed by two of any characters from A to Z ([A-Z]{2}), followed by the end of input ($)."
http://www.dotnetperls.com/regex-match <-- how to apply this regex in C# against an input. It means the code would look like (UNTESTED):
// Post edit: this will really return a boolean
if (Regex.Match(input, @"\.[a-z]{2}-[A-Z]{2}$").Success) {
// there is a match
}
http://regex.info <-- buy that and read it, it is the BEST resource for regular expressions in the universe
http://regular-expressions.info <-- the second best resource
Rather than use Regex, I suggest you use the built-in support for cultures in .NET, i.e., the System.Globalization.CultureInfo class. The constructor recognizes valid culture strings and gives you an object that can be used for culture-specific operations:
try
{
string fileName = "MyResource.en-GB";
string cultureName = System.IO.Path.GetExtension(fileName).TrimStart('.');
CultureInfo cultureInfo = new CultureInfo(cultureName);
}
catch (ArgumentException)
{
// Invalid culture.
}
You could try something like this:
[a-z]{2}-[a-z]{2}
You almost answered it in the question. Try:
// This basically grabs the locale.
string x = MyResource.whatever.... //Whatever it might be.
string locale = x.Substring(x.Length - 5); // Assuming the locale is 5 characters long.
// Now you have a 'locale' that is ready for comparisons.
if (locale == "en-GB") { .... }
if (locale == "fr-FR") { .... }
etc....
On a similar note, here is a useful list of two letter country codes.
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
I know this isn't really regex, but you didn't seem sure about needing to use it absolutely.
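// Requires: using System.Globalization; using System.Linq;
var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
// Skip the invariant culture, whose Name is empty and would match every file name.
bool endsWithCulture = cultures.Any(o => o.Name.Length > 0 && filename.EndsWith("." + o.Name));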
This might not be an answer to this question, but one may pass by and be looking for this answer.
To match locales like en_GB you can use this expression:
/^[a-z]{2}_[A-Z]{2}$/
I'll try to explain it here:
^[a-z] means start with lower case letters and {2} means you expect exactly 2 of those
follow with _
[A-Z]{2}$ means end with upper case letters and match exactly 2 of those, $ means that these letters have to be in the end of the string.
An extension to the great answer by Patanjali, but also including named groups and support for private-use as defined in RFC 4647. For example: de-DE-x-goethe or zh-Hant-CN-x-private1-private2.
^(?<language>[A-Za-z]{2,4})([_-](?<script>[A-Za-z]{4}|[0-9]{3}))?([_-](?<country>[A-Za-z]{2}|[0-9]{3}))?([_-]x[_-](?<private>[A-Za-z0-9-_]+))?$
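To show what the named groups buy you, here is a small JavaScript sketch (named groups are available in JavaScript engines that support ES2018). The function name parseLocaleTag and the shape of the returned object are assumptions made for this example.

```javascript
// Same pattern as above, written as a JavaScript regex literal.
const LOCALE_WITH_PRIVATE_USE =
  /^(?<language>[A-Za-z]{2,4})([_-](?<script>[A-Za-z]{4}|[0-9]{3}))?([_-](?<country>[A-Za-z]{2}|[0-9]{3}))?([_-]x[_-](?<private>[A-Za-z0-9-_]+))?$/;

// Hypothetical helper: returns the named parts, or null when the tag does not match.
// Parts that are absent from the tag come back as undefined.
function parseLocaleTag(tag) {
  const match = LOCALE_WITH_PRIVATE_USE.exec(tag);
  if (!match) return null;
  const { language, script, country, private: privateUse } = match.groups;
  return { language, script, country, privateUse };
}

console.log(parseLocaleTag("de-DE-x-goethe"));
console.log(parseLocaleTag("zh-Hant-CN-x-private1-private2"));
```

Reading the parts by name avoids counting parentheses, which matters here because the pattern also contains unnamed groups for the separators.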
^[a-z]{2}([_])?([A-Za-z]{2})?$
I used this regex and it works for locale only having optional '_'
For example:
en,
de,
en_us,
en_US
So Regex works if the locale has only fixed two chars (only lowercase)
or it has two chars (only lowercase) + _ + two chars (can be uppercase)

string.ToLower() and string.ToLowerInvariant()

What's the difference and when to use what? What's the risk if I always use ToLower() and what's the risk if I always use ToLowerInvariant()?
Depending on the current culture, ToLower might produce a culture-specific lowercase letter that you aren't expecting, such as ınfo (without the dot on the i) instead of info, thus mucking up string comparisons. For that reason, ToLowerInvariant should be used on any non-language-specific data. Generally, the only time to use ToLower is when you have user input that might be in the user's native language or character set.
See this question for an example of this issue:
C#- ToLower() is sometimes removing dot from the letter "I"
TL;DR:
When working with "content" (e.g. articles, posts, comments, names, places, etc.) use ToLower(). When working with "literals" (e.g. command line arguments, custom grammars, strings that should be enums, etc.) use ToLowerInvariant().
Examples:
=Using ToLowerInvariant incorrectly=
In Turkish, DIŞ means "outside" and diş means "tooth". The proper lower casing of DIŞ is dış. So, if you use ToLowerInvariant incorrectly you may have typos in Turkey.
=Using ToLower incorrectly=
Now pretend you are writing an SQL parser. Somewhere you will have code that looks like:
if(op.ToLower() == "like") // "op" holds the operator token ("operator" itself is a reserved word in C#)
{
// Handle an SQL LIKE operator
}
The SQL grammar does not change when you change cultures. A Frenchman does not write SÉLECTIONNEZ x DE books instead of SELECT X FROM books. However, in order for the above code to work, a Turkish person would need to write SELECT x FROM books WHERE Author LİKE '%Adams%' (note the dot above the capital i, almost impossible to see). This would be quite frustrating for your Turkish user.
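The same pitfall can be reproduced outside .NET. The sketch below is a JavaScript analogue, not the .NET API: it assumes a runtime with locale-aware case mapping (as in current browsers and standard Node.js builds), where toLocaleLowerCase("tr-TR") plays the role of ToLower() under a Turkish culture and toLowerCase() plays the role of ToLowerInvariant(). The function name lowerCaseKeyword is hypothetical.

```javascript
// Lower-case a keyword either with culture-independent rules (no locale given)
// or with the rules of a specific locale.
function lowerCaseKeyword(text, locale) {
  return locale === undefined ? text.toLowerCase() : text.toLocaleLowerCase(locale);
}

const keyword = "LIKE";

// Under Turkish rules, capital I lower-cases to dotless ı (U+0131),
// so the comparison with "like" fails.
console.log(lowerCaseKeyword(keyword, "tr-TR") === "like"); // false

// Culture-independent lower-casing gives the expected keyword.
console.log(lowerCaseKeyword(keyword) === "like"); // true
```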
I think this can be useful:
http://msdn.microsoft.com/en-us/library/system.string.tolowerinvariant.aspx
update
If your application depends on the case of a string changing in a predictable way that is unaffected by the current culture, use the ToLowerInvariant method. The ToLowerInvariant method is equivalent to ToLower(CultureInfo.InvariantCulture). The method is recommended when a collection of strings must appear in a predictable order in a user interface control.
also
...ToLower is very similar in most places to ToLowerInvariant. The documents indicate that these methods will only change behavior with Turkish cultures. Also, on Windows systems, the file system is case-insensitive, which further limits its use...
http://www.dotnetperls.com/tolowerinvariant-toupperinvariant
hth
String.ToLower() uses the default culture, while String.ToLowerInvariant() uses the invariant culture. So you are essentially asking about the differences between invariant culture and ordinal string comparison.

DotNet DateTime.ToString strange results

Why does:
DateTime.Now.ToString("M")
not return the month number? Instead it returns the full month name with the day on it.
Apparently, this is because "M" is also a standard code for the MonthDayPattern. I don't want this...I want to get the month number using "M". Is there a way to turn this off?
According to MSDN, you can use either "%M", "M " or " M" (note: the last two will also include the space in the result) to force M being parsed as the number of month format.
What's happening here is a conflict between standard DateTime format strings and custom format specifiers. The value "M" is ambiguous in that it is both a standard and a custom format specifier. The DateTime implementation will choose a standard formatter over a custom formatter in the case of a conflict, hence the standard one wins here.
The easiest way to remove the ambiguity is to prefix the M with the % char. This char is a way of saying that what follows should be interpreted as a custom format specifier:
DateTime.Now.ToString("%M");
Why not use
DateTime.Now.Month?
You can also use System.DateTime.Now.Month.ToString(); to accomplish the same thing
You can put an empty string literal in the format to make it a composite format:
DateTime.Now.ToString("''M")
It's worth mentioning that the % prefix is required for any single-character format string when using the DateTime.ToString(string) method, even if that string does not represent one of the built-in format string patterns; I came across this issue when attempting to retrieve the current hour. For example, the code snippet:
DateTime.Now.ToString("h")
will throw a FormatException. Changing the above to:
DateTime.Now.ToString("%h")
gives the current date's hour.
I can only assume the method is looking at the format string's length and deciding whether it represents a built-in or custom format string.

Region Agnostic Char.IsSeparator(ch)?

I have a function that parses a string containing a date(and/or time) e.g. "2009-12-10". I get the order of year-month-day from the Short Date pattern. When going through the string I use Char.IsSeparator(ch) to figure out when the numbers end.
Now however in the case of Korean it seems the Char.IsSeparator(ch) returns false on separator characters. Is there any way to know whether the chars in between the numbers are separator regardless of region setting?
(I also parse strings that are more free-form, containing things like "20 May 2009", so doing Char.IsAlphaNum() on the separator will not work either, as I basically don't know the content.)
Example inputs: "20.10.2009" "2009-05-20" "20 May 2009" "20.05.2009 10:00 AM" "1/1/2009" (in Singapore it's D/M/Y, in the US it is M/D/Y) "Tisdag, 1 Januari 1962" (all strings localized)
Output would be an equivalent of a DateTime instance filled as much as possible (although we use our own types).
Korean seems to have a couple of characters in front of the time and as separator it looks like the symbols are different depending on position in the string.
If you pick up the format using the current short date pattern, you might also be able to pick up the separator through DateTimeFormatInfo.CurrentInfo.DateSeparator.
Is there any reason why you need to parse the string manually?
If you used the built-in date/time parsing methods - Parse, ParseExact, TryParse or TryParseExact - then you could pass in the required culture-specific format info and let the framework worry about separators etc.
