replace items in a string while also confirming format

I have a string input in the format "string#int", and I want to convert it to "string-int" so that it is web friendly for an API I am using.
To do this I could simply replace the single character # with a - using String.Replace, but ideally I'd like to check that the input (which is user-provided) is in the correct format (string#int) while or before converting it to the web-friendly version with a "-" instead. Essentially, I'm wondering whether there is a method in C# that I could use to check that the input is in the correct format and convert it to the required result format.

There is no built-in way, since the format you are asking for is quite specific. Also, a string can contain anything, including a hash sign (#), so you need to narrow down what counts as the string part.
You could use a regular expression to check whether the string is in the correct format. One possible expression is:
[A-Za-z ]+#[0-9]+
which matches, for example:
this is a string#123
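As a minimal sketch of how that expression could be used for both validation and conversion: the class and method names below (HashFormat, ToWebFriendly) are made up for illustration, and the anchors and capture groups are additions to the pattern above so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashFormat
{
    // Same character classes as the pattern above, plus ^ and \z so the
    // entire input must match, and two groups to capture each half.
    private static readonly Regex Pattern =
        new Regex(@"^([A-Za-z ]+)#([0-9]+)\z", RegexOptions.CultureInvariant);

    // Returns "text-digits" for valid input; throws FormatException otherwise.
    public static string ToWebFriendly(string input)
    {
        Match m = Pattern.Match(input ?? string.Empty);
        if (!m.Success)
            throw new FormatException("Expected input in the form text#digits.");
        return m.Groups[1].Value + "-" + m.Groups[2].Value;
    }
}
```

Calling HashFormat.ToWebFriendly("this is a string#123") gives "this is a string-123". The character class for the text part is an assumption; widen it if your strings may contain other characters.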

There's nothing built in, but you could do the following:
var parts = input.Split('#');
if (parts.Length != 2)
    throw new FormatException("Expected exactly one '#'.");
if (!int.TryParse(parts[1], out int result))
    throw new FormatException("The part after '#' must be an integer.");
string output = string.Join("-", parts);
This takes the input and splits it on the "#" character. If the result isn't exactly two parts, the string is invalid. You then check that the second part is an integer; if TryParse fails, the input is not valid. The last step is to rejoin the two parts, this time with a - as the separator.
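If throwing is not desirable for user input, the same steps could be wrapped in a Try-style method. This is only a sketch: HashInput and TryConvert are hypothetical names, and rejecting an empty text part or a signed number is an assumption about what "string#int" should allow.

```csharp
using System;
using System.Globalization;

public static class HashInput
{
    // Returns true and sets output to "text-int" when input looks like "text#int".
    public static bool TryConvert(string input, out string output)
    {
        output = null;
        if (input == null) return false;

        string[] parts = input.Split('#');
        if (parts.Length != 2 || parts[0].Length == 0) return false;

        // NumberStyles.None accepts digits only (no sign, no whitespace).
        if (!int.TryParse(parts[1], NumberStyles.None,
                          CultureInfo.InvariantCulture, out _))
            return false;

        output = parts[0] + "-" + parts[1];
        return true;
    }
}
```

A caller can then branch on the result, for example `if (HashInput.TryConvert(userText, out string slug)) { ... }`, without needing a try/catch.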

C# IndexOf in special persian character

In Persian/Arabic script, some characters are optional marks written above or below other characters, such as ِ َ ّ ُ.
In my example, when the text contains one of these marks, IndexOf does not find my word. Keep in mind that Persian/Arabic is written right to left.
For example:
منّم => م + ن + ّ + م
C#:
"منّم".IndexOf("من");
returns -1.
JavaScript:
var index = ' منّم '.indexOf('من');
console.log(index);
What is happening in C#? Can anyone explain this?

By passing in StringComparison.Ordinal as an argument to the overloaded String.IndexOf(), you could have also done the following:
"منّم".IndexOf("من", StringComparison.Ordinal); // returns 0

Specifying CompareOptions.Ordinal as an option should work, together with the IndexOf method of CompareInfo.
CompareInfo info = CultureInfo.CurrentCulture.CompareInfo;
string str = "منّم";
Console.WriteLine(info.IndexOf(str, "من", CompareOptions.Ordinal));
Output is 0.
DotNetFiddle if you want to try it yourself.

You should learn about the different methods that .NET uses to compare and match strings.
Best Practices for Using Strings in .NET
Some overloads with default parameters (those that search for a Char
in the string instance) perform an ordinal comparison, whereas others
(those that search for a string in the string instance) are
culture-sensitive. It is difficult to remember which method uses which
default value, and easy to confuse the overloads.
The section String Operations that Use the Invariant Culture gives a short explanation about combining characters.
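To make the difference concrete, here is a small sketch that runs the same search with an explicit comparison type. The strings are written with Unicode escapes (MEEM, NOON, SHADDA, MEEM, following the breakdown in the question); the class and member names are invented for illustration. Only the ordinal result is stated, because the culture-sensitive result can vary with the runtime and its globalization library.

```csharp
using System;

public static class PersianSearch
{
    // MEEM, NOON, SHADDA (a combining mark), MEEM
    public const string Text = "\u0645\u0646\u0651\u0645";
    // MEEM, NOON
    public const string Needle = "\u0645\u0646";

    // Ordinal: compares UTF-16 code units one by one, so the first two
    // code units of Text match Needle at index 0.
    public static int FindOrdinal() =>
        Text.IndexOf(Needle, StringComparison.Ordinal);

    // Culture-sensitive: applies linguistic rules, so a combining mark that
    // follows NOON may prevent a match. The result is environment-dependent.
    public static int FindCultureSensitive() =>
        Text.IndexOf(Needle, StringComparison.CurrentCulture);
}
```

Passing the comparison type explicitly on every call avoids having to remember which overload defaults to which behavior.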

Is there a way to trim leading characters using String.Format()?

I need to know if there is a way to use String.Format to remove leading characters from a string. I have a limitation in some existing code: I can only pass in a string and a format string for it.
So can you do something like
String.Format("Test output: {0:#}","001")
and produce the output
"Test output: 1"
I think the answer is 'No' but I wanted to make sure.
EDIT: To clarify, the format string will be put in a configuration file and the string to be formatted is a value coming out of a database. I can't execute any code on it. Has to be through the format string.

You could do it on the argument you are passing:
String.Format("Test output: {0:#}", "001".TrimStart('0'))
Alternatively, you could run a regular-expression find and replace on the resulting string.
Another alternative is to define and pass in your own formatter using a custom implementation of IFormatProvider. I am not sure whether this is allowed, based on your last edit.
However, given the restrictions listed, there is no way to do it with the format string alone.
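For completeness, here is a rough sketch of the custom-formatter route. It only helps if the calling code can be changed to pass a provider to String.Format, which the question's constraints may rule out. TrimZeroFormatter and the "trim0" specifier are both invented for this example; they are not part of .NET.

```csharp
using System;
using System.Globalization;

// Hypothetical provider that understands one made-up specifier, "trim0".
public sealed class TrimZeroFormatter : IFormatProvider, ICustomFormatter
{
    // String.Format asks the provider for an ICustomFormatter.
    public object GetFormat(Type formatType) =>
        formatType == typeof(ICustomFormatter) ? this : null;

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        if (format == "trim0" && arg is string s)
        {
            string trimmed = s.TrimStart('0');
            return trimmed.Length == 0 ? "0" : trimmed; // keep a lone zero
        }

        // Anything else falls back to the argument's normal formatting.
        if (arg is IFormattable f)
            return f.ToString(format, CultureInfo.CurrentCulture);
        return arg?.ToString() ?? string.Empty;
    }
}
```

With this in place, `String.Format(new TrimZeroFormatter(), "Test output: {0:trim0}", "001")` produces "Test output: 1", and the format string itself can still live in configuration.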

Surprising int.ToString output

I have been working on a project, and found an interesting problem:
2.ToString("TE"+"000"); // output = TE000
2.ToString("TR"+"000"); // output = TR002
I also have tried with several strings other than "TE" but all have the same correct output.
Out of curiosity, I am wondering how come this could have happened?

Simply based on Microsoft's documentation, Custom Numeric Format Strings, your strings "TE000" and "TR000" are both custom format strings, but clearly they are parsed differently.
2.ToString("TE000") is just a bug in the formatter; it's going down a buggy path because of the unescaped "E". So it's unexpectedly assuming the whole thing is a literal.
2.ToString("TR000") is being interpreted as an implied "TR" literal plus 3 zero-filled digits for an integer value; therefore, you get "TR002".
If you truly want TE and TR verbatim, the expressions 2.ToString("\"TE\"000") and 2.ToString("\"TR\"000") will accomplish that for you by specifying TE and TR as explicit literals, instead of letting the formatter guess if they are valid format specifiers (and getting it wrong).
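A short sketch of the two ways to mark the prefix as literal text in a custom numeric format string: quoting it, or escaping the E with a backslash. PrefixFormat and its method names are made up for illustration; InvariantCulture is passed only to keep the output independent of the machine's settings.

```csharp
using System;
using System.Globalization;

public static class PrefixFormat
{
    // Single quotes mark TE as a literal, so E is not read as the
    // exponent specifier; 000 then pads the number to three digits.
    public static string Quoted(int n) =>
        n.ToString("'TE'000", CultureInfo.InvariantCulture);

    // A backslash escapes just the next character (here the E).
    public static string Escaped(int n) =>
        n.ToString(@"T\E000", CultureInfo.InvariantCulture);
}
```

Both PrefixFormat.Quoted(2) and PrefixFormat.Escaped(2) return "TE002".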

The ToString needs to PARSE the format string and understand what to do with it.
Let's take a look at the following examples:
2.ToString("TE000"); // output: TE000
2.ToString("E000"); // output: 2E+000
2.ToString("0TE000"); // output: 2TE000
2.ToString("T"); // throws an exception
2.ToString("TT"); // output: TT
This shows that if the ToString parser can understand at least part of the format, it will assume that the rest is just extra characters to print with it. If the format is invalid for the given number (like when you use a DateTime format string on a number), it will throw an exception. If it cannot make sense of the format at all, it will return the format string itself as the result.

Rather than putting the prefix inside the numeric format string, keep it outside the format item, like this:
int i = 2;
String.Format("TE{0:D3}", i); // D3 pads the decimal value to three digits: TE002

See Custom Numeric Format Strings. The E means the exponent part of the scientific notation of the number. Since 2 is 2E000 in exponential notation, that might explain it.

String.Contains and String.LastIndexOf C# return different result?

I have this problem where String.Contains returns true and String.LastIndexOf returns -1. Could someone explain to me what happened? I am using .NET 4.5.
static void Main(string[] args)
{
    String wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";
    if (wikiPageUrl.Contains("wikipedia.org/wiki/"))
    {
        int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki/");
        Console.WriteLine(i);
    }
}

While @sa_ddam213's answer definitely fixes the problem, it might help to understand exactly what's going on with this particular string.
If you try the example with other "special characters," the problem isn't exhibited. For example, the following strings work as expected:
string url1 = @"http://it.wikipedia.org/wiki/»Abd_Allāh_al-Sallāl";
Console.WriteLine(url1.LastIndexOf("it.wikipedia.org/wiki/")); // 7
string url2 = @"http://it.wikipedia.org/wiki/~Abd_Allāh_al-Sallāl";
Console.WriteLine(url2.LastIndexOf("it.wikipedia.org/wiki/")); // 7
The character in question, "ʿ", is called a spacing modifier letter (see note 1). A spacing modifier letter doesn't stand on its own, but modifies the previous character in the string, in this case a "/". Another way to put this is that it doesn't take up its own space when rendered.
LastIndexOf, when called with no StringComparison argument, compares strings using the current culture.
When strings are compared in a culture-sensitive manner, the "/" and "ʿ" characters are not seen as two distinct characters--they're processed into one character, which does not match the parameter passed in to LastIndexOf.
When you pass in StringComparison.Ordinal to LastIndexOf, the characters are treated as distinct, due to the nature of Ordinal comparison.
Another way to make this work would be to use CompareInfo.LastIndexOf and supply the CompareOptions.IgnoreNonSpace option:
Console.WriteLine(
    CultureInfo.CurrentCulture.CompareInfo.LastIndexOf(
        wikiPageUrl, @"it.wikipedia.org/wiki/", CompareOptions.IgnoreNonSpace));
// 7
Here we're saying that we don't want combining characters included in our string comparison.
As a side note, this means that @Partha's answer and @Noctis' answer only work because the character is being applied to a character that doesn't appear in the search string that's passed to LastIndexOf.
Contrast this with the Contains method, which by default performs an ordinal (case-sensitive and culture-insensitive) comparison. This explains why Contains returns true and LastIndexOf returns -1.
For a fantastic overview of how strings should be manipulated in the .NET framework, check out this article.
1: Is this different than a combining character or is it a type of combining character? would appreciate if someone would clear that up for me.
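One way to keep the two calls consistent is to request the same comparison explicitly in both. The sketch below uses the URL from the question; the WikiUrl class and its method names are hypothetical.

```csharp
using System;

public static class WikiUrl
{
    public const string Url = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";

    // String.Contains(string) is always ordinal.
    public static bool ContainsOrdinal(string value) => Url.Contains(value);

    // Asking LastIndexOf for the same ordinal comparison makes it agree
    // with Contains: a non-negative index exactly when Contains is true.
    public static int LastIndexOrdinal(string value) =>
        Url.LastIndexOf(value, StringComparison.Ordinal);
}
```

With ordinal comparison, WikiUrl.LastIndexOrdinal("wikipedia.org/wiki/") returns 10 and WikiUrl.LastIndexOrdinal("it.wikipedia.org/wiki/") returns 7, regardless of the current culture.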

Try using StringComparison.Ordinal
This will compare the string by evaluating the numeric values of the corresponding chars in each string, this should work with the special chars you have in that example string
string wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";
int i = wikiPageUrl.LastIndexOf("http://it.wikipedia.org/wiki/", StringComparison.Ordinal);
// returns 0

The thing is that C# LastIndexOf searches from the end of the string.
And wikipedia.org/wiki/ is followed by ', which it takes as an escape sequence. So either remove the ' after wiki/ or put an @ there too.
Any one of the following will work:
string wikiPageUrl = @"http://it.wikipedia.org/wiki/Abd_Allāh_al-Sallāl";
string wikiPageUrl = @"http://it.wikipedia.org/wiki/@ʿAbd_Allāh_al-Sallāl";
int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki");
All three work.
If you want a generalized solution for this problem, replace ' with @' in your string before you perform any operations.

The ' character throws it off.
This should work when you escape the ' as \':
wikiPageUrl = @"http://it.wikipedia.org/wiki/\'Abd_Allāh_al-Sallāl";
if (wikiPageUrl.Contains("wikipedia.org/wiki/"))
{
    "contains".Dump();
    int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki/");
    Console.WriteLine(i);
}
Figure out what you want to do (remove the ', escape it, or dig deeper :) ).

How do I parse a query string with "&" in the value using C#?

I have a custom C# web part on a SharePoint 2007 page. When a user clicks a link in an SSRS report on another page, it sends them to my custom web part page with a query string like the following:
?tax4Elem=Docks%20&%20Chargers&ss=EU%20MOVEX&Phase=1&tax3Elem=Play%20IT&tax5Elem=Charger
Take note of the value for "tax4Elem", which is basically "Docks & Chargers". (The ampersand can actually come up in "tax4Elem", "tax3Elem", and "tax5Elem").
I cannot have the ampersand in that value encoded so I will have to work with this.
How do I parse this query string so that it doesn't recognize the "&" in "Docks & Chargers" as the beginning of a key/value pair?
Thanks in Advance!
kate

If you really cannot correct the URL, you can still try to parse it, but you have to make some decisions. For example:
Keys can only contain alphanumeric characters.
There are no empty values, or at least there is always an equals sign (=) after the key.
Values may contain additional ampersands and question marks.
Values may contain additional equals signs, as long as they don't appear to be part of a new key/value pair (that is, they are not preceded by &\w+).
One possible way to capture these pairs is:
MatchCollection matches = Regex.Matches(s, @"\G[?&](?<Key>\w+)=(?<Value>.*?(?=$|&\w+=))");
var values = matches.Cast<Match>()
    .ToDictionary(m => m.Groups["Key"].Value,
                  m => HttpUtility.UrlDecode(m.Groups["Value"].Value),
                  StringComparer.OrdinalIgnoreCase);
You can then get the values:
string tax4 = values["tax4Elem"];
Note that if the query string is "invalid" according to our rule, the pattern may not capture all values.
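Put together as a self-contained sketch, the approach could look like the following. LenientQuery and Parse are made-up names, and Uri.UnescapeDataString is swapped in for HttpUtility.UrlDecode only so the example has no System.Web dependency; note that, unlike UrlDecode, it does not turn "+" into a space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LenientQuery
{
    // A value runs until the end of the string or until the next "&key=".
    // A bare "&" that is not followed by word characters and "=" stays in the value.
    private const string Pattern = @"\G[?&](?<Key>\w+)=(?<Value>.*?(?=$|&\w+=))";

    public static Dictionary<string, string> Parse(string query)
    {
        return Regex.Matches(query, Pattern)
            .Cast<Match>()
            .ToDictionary(
                m => m.Groups["Key"].Value,
                m => Uri.UnescapeDataString(m.Groups["Value"].Value),
                StringComparer.OrdinalIgnoreCase);
    }
}
```

For the query string from the question, `LenientQuery.Parse(query)["tax4Elem"]` yields "Docks & Chargers", and the other four keys are parsed as usual. Duplicate keys would make ToDictionary throw, which this sketch does not handle.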

I think you can't parse that string correctly - it has been incorrectly encoded. The ampersand in "Docks & Chargers" should have been encoded as %26 instead of &:
?tax4Elem=Docks%20%26%20Chargers&ss=EU%20MOVEX&Phase=1&tax3Elem=Play%20IT&tax5Elem=Charger
Is it possible to change the code that generated the URL?
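If the generating code can be changed, encoding each value before it goes into the URL is enough. A minimal sketch, assuming the generator is .NET code; QueryValue and BuildPair are hypothetical names:

```csharp
using System;

public static class QueryValue
{
    // Uri.EscapeDataString percent-encodes reserved characters,
    // including '&' (as %26) and the space (as %20).
    public static string Encode(string value) => Uri.EscapeDataString(value);

    // Builds one "key=value" pair with both sides encoded.
    public static string BuildPair(string key, string value) =>
        Encode(key) + "=" + Encode(value);
}
```

QueryValue.BuildPair("tax4Elem", "Docks & Chargers") returns "tax4Elem=Docks%20%26%20Chargers", which matches the corrected query string above.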

Obviously the request is incorrect. However, to work around it, you can take the original URL and find the position of &ss= with IndexOf. Then find the = sign immediately before that. Decode (with UrlDecode) and then re-encode (with UrlEncode) the part between the = and &ss= (the value of tax4Elem). Then reconstruct the query string like this:
correctQueryString = "?tax4Elem=" + reencodedTaxValue + remainderOfQueryString
and decode it normally (e.g. with ParseQueryString) into a NameValueCollection.
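The steps above might be sketched as follows. QueryRepair and Repair are invented names, and the Uri escape methods stand in for UrlDecode/UrlEncode. The sketch assumes, as the answer does, that &ss= always comes directly after the damaged value and that the value contains no = sign.

```csharp
using System;

public static class QueryRepair
{
    // Re-encodes the value that ends right before "&ss=" so that a raw '&'
    // inside it becomes %26. Returns the input unchanged if the marker is missing.
    public static string Repair(string query)
    {
        int end = query.IndexOf("&ss=", StringComparison.Ordinal);
        if (end < 0) return query;

        // The '=' closest to the marker starts the damaged value.
        int eq = query.LastIndexOf('=', end);
        if (eq < 0) return query;

        string raw = query.Substring(eq + 1, end - eq - 1);
        string reencoded = Uri.EscapeDataString(Uri.UnescapeDataString(raw));

        return query.Substring(0, eq + 1) + reencoded + query.Substring(end);
    }
}
```

The repaired string can then be handed to a standard parser. This only fixes one parameter; if tax3Elem or tax5Elem can also contain an ampersand, each would need its own known follower key.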

Or you can use the HttpServerUtility.HtmlDecode method to decode the value to the '&' (ampersand) sign.
