Formatting dashes in string interpolation - c#

I have just been checking out the new string interpolation feature in C# 6.0 (refer to the Language Features page at Roslyn for further detail). With the current syntax (which is expected to change), you can do something like this (example taken from a blog post I'm writing just now):
var dob2 = "Customer \{customer.IdNo} was born on \{customer.DateOfBirth:yyyyMdd}";
However, I can't seem to include dashes in the formatting part, such as:
var dob2 = "Customer \{customer.IdNo} was born on \{customer.DateOfBirth:yyyy-M-dd}";
If I do that, I get the error:
Error CS1056 Unexpected character '-' StringInterpolation Program.cs 21
Is there any way I can get dashes to work in the formatting part? I know I can just use string.Format(), but I want to see if it can be done with string interpolation, just as an exercise.
Edit: since it seems like nobody knows what I'm talking about, see my blog post on the subject to see how it's supposed to work.

The final version is more user friendly:
var text = $"The time is {DateTime.Now:yyyy-MM-dd HH:mm:ss}";

With the version of string interpolation that's in VS 2015 Preview, you can use characters like dashes in the interpolation format by enclosing it in another pair of quotes:
var dob2 = "Customer \{customer.IdNo} was born on \{customer.DateOfBirth : "yyyy-M-dd"}";

Related

C# parse Apple formatted strings (printf)

I'm dealing some with Apple formatted strings, for example:
[%d] fsPurgeable type: %#, count: %lld bytes for %lld files
According to Apple's documentation here, the format string specification follows the IEEE printf specification, with some modifications it appears.
I need to parse these strings, and replace the % type placeholders with the actual data that belongs there. My initial thought was to use a Regex, however as these claim to adhere to printf specifications, I was wondering if there was anything already in .net that might help with this.
I've done some reading, but I can't see anything that immediately jumped out as useful.
Any suggestions?
To somewhat answer my own question, I ended up writing a regex to handle the parsing:
var regex = new Regex(#"(?i)%[dspublicxfegahztj{}#%]*");
(?i) = case insensitive flag
% = literal percent sign
[dspublicxfegahztj{}#%] = all valid characters that could follow the %
* = match zero or multiple characters, e.g. %lld
Still curious if there is anything in .Net to handle this sort of thing

Replace characters using Regex in c# [duplicate]

I like to know how to replace a regex-match of unknown amount of equal-signs, thats not less than 2... to the same amount of underscores
So far I got this:
text = Regex.Replace(text, "(={2,})", "");
What should I use as the 3rd parameter ?
EDIT: Prefferably a regex solution thats compatible in all languages
You can use Regex.Replace(String, MatchEvaluator) instead and analyze math:
string result = new Regex("(={2,})")
.Replace(text, match => new string('_', match.ToString().Length));
A much less clear answer (in term of code clarity):
text = Regex.Replace(text, "=(?==)|(?<==)=", "_");
If there are more than 2 = in a row, then at every =, we will find a = ahead or behind.
This only works if the language supports look-behind, which includes C#, Java, Python, PCRE... and excludes JavaScript.
However, since you can pass a function to String.replace function in JavaScript, you can write code similar to Alexei Levenkov's answer. Actually, Alexei Levenkov's answer works in many languages (well, except Java).

Is there a regex to test if a string is for a locale? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I don't know anything about regular expressions but I think I have to use it for my probleme I got some filenames that look like :
MyResource
MyResource.en-GB
MyResource.en-US
MyResource.fr-FR
MyResource.de-DE
The idea is to test if my strings end with "[letter][letter]-[letter][letter]"
I know this is a very noob, but I just have no idea about how to do it, even if I know exactly what I wanna do... :(
To cater for basic variants:
^[A-Za-z]{2,4}([_-][A-Za-z]{4})?([_-]([A-Za-z]{2}|[0-9]{3}))?$
which consists of:
Language code: ISO 639 2 or 3, or 4 for future use, alpha.
Optional script code: ISO 15924 4 alpha.
Optional country code: ISO 3166-1 2 alpha or 3 digit.
Separated by underscores or dashes.
Valid examples are:
de
en-US
zh-Hant-TW
En-au
aZ_cYrl-aZ.
For the OP's specific question, this would need to be prefixed by /^MyResource[.] and suffixed by $/ to ensure the whole file name is for a valid resource file that ends in a locale.
Note that some programming language's functions may only accept particular forms, like only underscores and uppercase country code. PHP's intl functions accept either case and separators. PayPal accepts only the language, or the la_CY form, where la is the language and CY is the country/region. The PHP locale_canonicalize function can be used to standardise to this format.
IETF RFC 5646, which governs internet usage of these tags, recommends a capitalisation and separation format like az-Cyrl-AZ, as used in the first three examples above, though it says processors should accept any mix of case and either separator, as per the last two examples. When displaying locales, using - as the separator allows finer-grained line-wrapping which might otherwise produce significantly empty lines as when the non=wrapping _ is used, especially in table cells.
The regex for the recommended basic format is:
^[a-z]{2,4}(-[A-Z][a-z]{3})?(-([A-Z]{2}|[0-9]{3}))?$
The regexp only covers the basic format. There are variants for extras, like local region. RFC 5646 allows for such variants, along with private extensions and backwards-compatibility forms. It all depends upon the granularity required. The CLDR Unicode database, which is used by PHP's intl functions and other programs, may include such variants from version to version, though they can also disappear at a later time.
If using a CLDR-based function set, like PHP's intl extension, you can check if a locale exists in the intl database using a function like:
<?php
function is_locale($locale=''){
// STANDARDISE INPUT
$locale=locale_canonicalize($locale);
// LOAD ARRAY WITH LOCALES
$locales=resourcebundle_locales('');
// RETURN WHETHER FOUND
return (array_search($locale,$locales)!==F);
}
?>
It takes about half a millisecond to load and search the data, so it won't be too much of a performance hit.
Of course, it will only find those in the database of the CLDR version supplied with the PHP version used, but will be updated with each subsequent PHP release.
Note that some locales are not for countries, but regions, and these are typically numeric, like 001 for 'World', 150 for 'Europe' and 419 for 'Latin America'. So there are now en-001, en-150, ar-001, and es-419, which can be used for generic language purposes. For example, en-001 was designed to decouple dependence upon en-us as an ersatz English, especially since its date formats and spellings are radically different from the 100 other regional en variants. The en-150 locale is the same as en-001 except for numbering separators and other Europe-specific formats.
In general, a regexp is a good front-end sanity check to filter out illegal characters, and especially to reserve the format for possible future additions. It also helps to prevent malicious character combinations being sent to the lookup facility, especially if text-based lookup command mechanisms, like SQL or Xpath, are used.
That would be testing your input against:
\.[a-z]{2}-[A-Z]{2}$
This is really very literal: "match a dot (\., the dot being a special character in regexes), followed by exactly two of any characters from a to z ([a-z]{2} -- [...] is a character class), followed by a dash (-), followed by two of any characters from A to Z ([A-Z]{2}), followed by the end of input ($).
http://www.dotnetperls.com/regex-match <-- how to apply this regex in C# against an input. It means the code would look like (UNTESTED):
// Post edit: this will really return a boolean
if (Regex.Match(input, #"\.[a-z]{2}-[A-Z]{2}$").Success) {
// there is a match
}
http://regex.info <-- buy that and read it, it is the BEST resource for regular expressions in the universe
http://regular-expressions.info <-- the second best resource
Rather than use Regex, I suggest you use the built-in support for cultures in .Net, i.e., the System.Globalization.CultureInfo class; the constructor recognizes valid culture strings, and gives you an object that can be used for culture specific operations:
try
{
string fileName = "MyResource.en-GB";
string cultureName = System.IO.Path.GetExtension(fileName).TrimStart('.');
CultureInfo cultureInfo = new CultureInfo(cultureName);
}
catch (ArgumentException)
{
// Invalid culture.
}
You could try something like this:
[a-z]{2}-[a-z]{2}
You almost answered it in the question. Try:
// This basically grabs the locale.
string x = MyResource.whatever.... //Whatever it might be.
string locale = x.SubString(x.Length - 5) // Assuming the locale is 5 characters long.
// Now you have a 'locale' that is ready for comparisons.
if (locale == "en-GB") { .... }
if (locale == "fr-FR") { .... }
etc....
On a similar note, here is a useful list of two letter country codes.
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
I know this isn't really regex, but you didn't seem sure about needing to use it absolutely.
cultures = CultureInfo.GetCultures(System.Globalization.CultureTypes.AllCultures);
cultures.Where(o => filename.EndsWith(o.Name));
This might not be an answer to this question, but one may pass by and be looking for this answer.
To match locales like en_GB you can use this expression:
/^[a-z]{2}_[A-Z]{2}$/
I'll try to explain it here:
^[a-z] means start with lower case letters and {2} means you expect exactly 2 of those
follow with _
[A-Z]{2}$ means end with upper case letters and match exactly 2 of those, $ means that these letters have to be in the end of the string.
An extension to the great answer by Patanjali, but also including named groups and support for private-use as defined in RFC 4647. For example: de-DE-x-goethe or zh-Hant-CN-x-private1-private2.
^(?<language>[A-Za-z]{2,4})([_-](?<script>[A-Za-z]{4}|[0-9]{3}))?([_-](?<country>[A-Za-z]{2}|[0-9]{3}))?([_-]x[_-](?<private>[A-Za-z0-9-_]+))?$
^[a-z]{2}([_])?([A-Za-z]{2})?$
I used this regex and it works for locale only having optional '_'
For example:
en,
de,
en_us,
en_US
So Regex works if the locale has only fixed two chars (only lowercase)
or it has two chars (only lowercase) + _ + two chars (can be uppercase)

A regular expression to validate .NET time format

Background
I need to validate user input in some fields, where these are defining how to show time in some views.
Requirements
Time format must be expressed in Microsoft .NET way (check this MSDN Library article if you want to learn more about framework's date and time formatting: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx)
Keep in mind I'm looking to validate the format instead of an actual time string.
For example, user may input:
HH:mm
hh:mm
ss
hh:ss
mm:ss
... and so on.
In fact, it should validate from the shortest to longest time format available.
Another point is I need to do it in client-side using JavaScript. In other words, any given regular expression by you should work in browsers JavaScript regular expressions' engine.
I'll appreciate any self-taylored one, any link or pasted expression!
Thank you in advance.
NOTE (Update)
I can't use ASP.NET validation engine, or any other. Because of project's requirements, I need to avoid that.
As far as I understand, there is no much options - sort of 20, as maximum. Why not just enumerate them all in one big regex without much special symbols? Like
'hh:mm|hh:mm:ss|yyyy-MM-dd hh:mm|<etc>'
you could than make it case sensitive to differentiate between M for month and m for minute, and for hours make it [hH], then make it [:-/] there where you allow for different separators, and lots of other similar things. But the main idea is to simply enumerate all options separated by | with just little amount of regex syntax between | and |.
What is your definition of a "valid" format string? Only once you know that can it be possible to validate a format string.
"K" is also a valid format string
"zz" is also a valid format string
"e" is also a valid format (it would fall into the "The character is copied to the result string unchanged." case)
I'm not even sure what formats would actually cause .NET .ToString() to throw an exception (if that's what you are trying to avoid).

Best method of Textfile Parsing in C#?

I want to parse a config file sorta thing, like so:
[KEY:Value]
[SUBKEY:SubValue]
Now I started with a StreamReader, converting lines into character arrays, when I figured there's gotta be a better way. So I ask you, humble reader, to help me.
One restriction is that it has to work in a Linux/Mono environment (1.2.6 to be exact). I don't have the latest 2.0 release (of Mono), so try to restrict language features to C# 2.0 or C# 1.0.
I considered it, but I'm not going to use XML. I am going to be writing this stuff by hand, and hand editing XML makes my brain hurt. :')
Have you looked at YAML?
You get the benefits of XML without all the pain and suffering. It's used extensively in the ruby community for things like config files, pre-prepared database data, etc
here's an example
customer:
name: Orion
age: 26
addresses:
- type: Work
number: 12
street: Bob Street
- type: Home
number: 15
street: Secret Road
There appears to be a C# library here, which I haven't used personally, but yaml is pretty simple, so "how hard can it be?" :-)
I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs)
I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:
#"(?&ltlevel>\s) | " +
#"(?&ltterm>[^:\s]) | " +
#"(?&ltseparator>:)"
The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.
Protip: For an LL(1) parser (read: easy), tokens cannot share a prefix. If you have abc as a token, you cannot have ace as a token
Note: The article's missing the | characters in its examples, just throw them in.
There is another YAML library for .NET which is under development. Right now it supports reading YAML streams and has been tested on Windows and Mono. Write support is currently being implemented.
Using a library is almost always preferably to rolling your own. Here's a quick list of "Oh I'll never need that/I didn't think about that" points which will end up coming to bite you later down the line:
Escaping characters. What if you want a : in the key or ] in the value?
Escaping the escape character.
Unicode
Mix of tabs and spaces (see the problems with Python's white space sensitive syntax)
Handling different return character formats
Handling syntax error reporting
Like others have suggested, YAML looks like your best bet.
You can also use a stack, and use a push/pop algorithm. This one matches open/closing tags.
public string check()
{
ArrayList tags = getTags();
int stackSize = tags.Count;
Stack stack = new Stack(stackSize);
foreach (string tag in tags)
{
if (!tag.Contains('/'))
{
stack.push(tag);
}
else
{
if (!stack.isEmpty())
{
string startTag = stack.pop();
startTag = startTag.Substring(1, startTag.Length - 1);
string endTag = tag.Substring(2, tag.Length - 2);
if (!startTag.Equals(endTag))
{
return "Fout: geen matchende eindtag";
}
}
else
{
return "Fout: geen matchende openeningstag";
}
}
}
if (!stack.isEmpty())
{
return "Fout: geen matchende eindtag";
}
return "Xml is valid";
}
You can probably adapt so you can read the contents of your file. Regular expressions are also a good idea.
It looks to me that you would be better off using an XML based config file as there are already .NET classes which can read and store the information for you relatively easily. Is there a reason that this is not possible?
#Bernard: It is true that hand editing XML is tedious, but the structure that you are presenting already looks very similar to XML.
Then yes, has a good method there.
#Gishu
Actually once I'd accommodated for escaped characters my regex ran slightly slower than my hand written top down recursive parser and that's without the nesting (linking sub-items to their parents) and error reporting the hand written parser had.
The regex was a slightly faster to write (though I do have a bit of experience with hand parsers) but that's without good error reporting. Once you add that it becomes slightly harder and longer to do.
I also find the hand written parser easier to understand the intention of. For instance, here is the a snippet of the code:
private static Node ParseNode(TextReader reader)
{
Node node = new Node();
int indentation = ParseWhitespace(reader);
Expect(reader, '[');
node.Key = ParseTerminatedString(reader, ':');
node.Value = ParseTerminatedString(reader, ']');
}
Regardless of the persisted format, using a Regex would be the fastest way of parsing.
In ruby it'd probably be a few lines of code.
\[KEY:(.*)\]
\[SUBKEY:(.*)\]
These two would get you the Value and SubValue in the first group. Check out MSDN on how to match a regex against a string.
This is something everyone should have in their kitty. Pre-Regex days would seem like the Ice Age.

Categories

Resources