Html.Decoded &shy; is problematic in string functions

string local= HttpUtility.HtmlDecode(GetLocalizedSupportPhone()).Replace("-", "").Replace(" ", "");
I am getting this string:
"0&shy;12&shy;4 41&shy;481&shy;73"
from the GetLocalizedSupportPhone() method. The HtmlDecode method returns:
"0-12-4 41-481-73"
I have a list of phone numbers like "01244148173", "01244148173", etc., which are plain digit strings without any space or HTML characters.
Problem scenario: all I want to do is take the decoded local string ("0-12-4 41-481-73"), replace the "-" as well as " " with an empty string, and compare the resulting string with the list items. If a matching list item exists, I remove that item.
But strangely, the .Replace() method replaces the space character with an empty string but is unable to replace "-" with an empty string.
I am just curious why this is happening. Why can't ANY OF THE STRING METHODS (I also tried .Split()) detect the "-"?

There are different types of hyphens. &shy; is a soft hyphen. Specifically, the soft hyphen is character code 173 (U+00AD) and the hyphen on your keyboard is 45 (U+002D).
Try this instead:
var r = HttpUtility.HtmlDecode("0&shy;12&shy;4 41&shy;481&shy;73")
    .Replace((char)173, ' ')
    .Replace(" ", "");
That replaces each soft hyphen with a space, and the second Replace then removes all the spaces.
Another option would be to use a regular expression to remove all non-numeric characters.
Regex nonNumeric = new Regex(@"\D");
var r = nonNumeric.Replace(
    HttpUtility.HtmlDecode("0&shy;12&shy;4 41&shy;481&shy;73"),
    string.Empty);
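A minimal sketch of both approaches side by side, in case it helps to compare them. It assumes the raw value contains the &shy; entity, and it uses WebUtility.HtmlDecode (System.Net) in place of HttpUtility.HtmlDecode only so that it does not need a System.Web reference. The class and method names are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class PhoneCleaner
{
    // Decodes the HTML, then removes soft hyphens (U+00AD) and spaces explicitly.
    public static string StripSoftHyphens(string encoded)
    {
        string decoded = WebUtility.HtmlDecode(encoded);
        return decoded.Replace("\u00AD", "").Replace(" ", "");
    }

    // Decodes the HTML, then removes every character that is not a digit.
    public static string DigitsOnly(string encoded)
    {
        string decoded = WebUtility.HtmlDecode(encoded);
        return Regex.Replace(decoded, @"\D", "");
    }
}
```

Either result can then be compared directly against the plain digit strings in the list.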

This might help if you're just looking to strip spaces and soft hyphens from a string without having to deal with HTML decoding:
var regex = new Regex(@"\u00ad| ");
var result = regex.Replace(stringWithSoftHyphens, string.Empty);
I tried doing this with Trim((char)173), but it (like Split and similar methods) does not seem to be able to handle the soft hyphen character the way the Regex class can.

Related

Replace the nth index of a character

How can I replace the character at the nth index using only Regex?
string input = "%fdfdfdfdfdfdfdfdfdfdfdffd";
string result = Regex.Replace(input, "^%", "");
The above code replaces the first character with an empty string, but I want to specify an index (the nth index) so that the character at that position gets replaced with an empty string.
Can someone help me out here?

It's possible to create a regex pattern that captures all characters before and after the replaced character and then replace the whole string with the two captures separated by the new character. For example:
Regex.Replace("abcdefgh", @"^(.{4}).(.*)$", @"$1E$2") // returns "abcdEfgh"
You could then create a method that replaces the character at a specific index:
string ReplaceCharacter(string text, int index, char value)
    => Regex.Replace(text, $@"^(.{{{index}}}).(.*)$", $@"${{1}}{value}${{2}}");
// Usage:
ReplaceCharacter("Foo-bar", 3, 'l') // returns "Foolbar"

As Johan Wentholt said in the comments, you can perfectly well use Regex.Replace to match a number of characters from the start of the line and replace them with a capture group that is one character shorter than the full match:
String result = Regex.Replace(input, "^(.{" + index + "}).", "$1");
This matches "index times any character, followed by another character, at the start of the string", but replaces it with only the "index times any character" part, without that last character, since the last dot is outside the capture group.
If you want to replace it with something other than an empty string, just concatenate that to the end of the "$1" replacement string. To be safe in that case, write the group reference as "${1}" to avoid problems if the piece you add after it starts with a digit, since that would change the capture group number.
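The approach can be wrapped in a small helper. This is a sketch under a few assumptions: the index is non-negative, the names RegexIndexReplace and ReplaceAt are hypothetical, RegexOptions.Singleline is added so that the dot also matches line breaks, and any "$" in the replacement text is doubled so it is treated literally.

```csharp
using System.Text.RegularExpressions;

public static class RegexIndexReplace
{
    // Replaces the single character at `index` with `replacement`.
    // If the input is too short for the pattern to match, it is returned unchanged.
    public static string ReplaceAt(string input, int index, string replacement)
    {
        string pattern = "^(.{" + index + "}).";
        // "${1}" keeps the captured prefix; doubling '$' makes it a literal dollar sign.
        string substitution = "${1}" + replacement.Replace("$", "$$");
        return Regex.Replace(input, pattern, substitution, RegexOptions.Singleline);
    }
}
```

For example, RegexIndexReplace.ReplaceAt("abcdefgh", 4, "5") yields "abcd5fgh", which is the case where a plain "$1" followed by "5" would be read as a reference to group 15.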

What you want to do may not be possible with Regex alone. This is sort of a cheat:
var input = "%fdfd678dfdfdfdfdfdfdfdffd";
var result = Regex.Replace(input, "^.{7}", input.Substring(0,6));
Console.WriteLine($"result = {result}");

Trim Non-alphanum from beginning and end of string

What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried manually adding the characters that I do not need, but it doesn't work well. I just need to trim anything that is not alphanumeric.
I tried using this function:
string something = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");
But it removes all non-alphanumeric characters from the whole string. What I basically want is this:
"test1" -> test1
#!#!2test# -> 2test
(test3) -> test3
##test4---- -> test4
I do want to support Unicode characters, but not symbols.
EDIT:
The output of the example should be:
Littering aaaannnndóú
Regards

Assuming you want to trim non-alphanumeric characters from the start and end of your string:
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
.TakeWhile(char.IsLetterOrDigit)
.ToArray());
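Note that TakeWhile in the snippet above stops at the first non-alphanumeric character it meets, so a string with interior spaces or symbols is cut short. A minimal index-based sketch that trims only the two ends is shown here. The names EdgeTrimmer and TrimEnds are hypothetical, and the predicate is a parameter because the examples in the question are not consistent about whether digits should be kept.

```csharp
using System;

public static class EdgeTrimmer
{
    // Removes leading and trailing characters for which `keep` returns false.
    // Characters in the middle of the string are left untouched.
    public static string TrimEnds(string s, Func<char, bool> keep)
    {
        int start = 0;
        int end = s.Length - 1;
        while (start <= end && !keep(s[start])) start++;
        while (end >= start && !keep(s[end])) end--;
        return s.Substring(start, end - start + 1);
    }
}
```

Passing char.IsLetterOrDigit keeps digits at the edges ("#!#!2test#" becomes "2test"), while passing char.IsLetter reproduces the "Littering aaaannnndóú" output requested in the edit to the question.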

@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+","$1"

You can use the String.Trim(Char[]) method in the .NET library to trim the unwanted characters from the string.
From MSDN: String.Trim Method (Char[])
Removes all leading and trailing occurrences of a set of characters
specified in an array from the current String object.
Before trimming, you first need to identify which characters are non-alphanumeric; those are the ones to pass to String.Trim(Char[]).
Use the Char.IsLetterOrDigit() function to identify whether a character is alphanumeric.
From MSDN: Char.IsLetterOrDigit()
Indicates whether a Unicode character is categorized as a letter or a
decimal digit.
Try this:
string str = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
foreach (char ch in str)
{
if (!char.IsLetterOrDigit(ch))
str = str.Trim(ch);
}
Output:
1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9

If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.
string result = string.Concat(input.Where(char.IsLetterOrDigit));
which you can easily convert into an extension method
public static class Extensions
{
public static string ToAlphaNum(this string input)
{
return string.Concat(input.Where(char.IsLetterOrDigit));
}
}
that you can use like this :
string testString = "#!#!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123
Note: this will remove every non-alphanumeric character from your string, if you really need to remove only those at the beginning/end, please add more details about what defines a beginning or an end and add more examples.

And you could also replace all the non-letters/numbers at the beginning and/or end of the line:
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
used as
resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
If you really want to remove characters only at the beginning and end of the whole string, and not line by line, then remove the "^$ match at line breaks" option (RegexOptions.Multiline).
If you wanted to include leading or trailing underscores, as characters to be retained, you could simplify the regex to:
^\W+|\W+$
The core of the regex:
[^\p{L}\p{N}]
is a negated character class that matches any character which is NOT in the Unicode category of Letters \p{L} or Numbers \p{N}.
In other words:
Trim non-unicode alphanumeric characters
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture
Match this alternative «^[^\p{L}\p{N}]*»
Assert position at the beginning of a line «^»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Assert position at the end of a line «$»
Created with RegexBuddy
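To see how the two patterns differ on underscores, here is a small sketch that applies both to the whole string (no RegexOptions.Multiline). The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexTrim
{
    // Trims everything that is not a Unicode letter or number from both ends.
    public static string TrimNonAlphanumeric(string s) =>
        Regex.Replace(s, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");

    // Trims non-word characters from both ends; underscores count as word
    // characters, so leading and trailing underscores are retained.
    public static string TrimNonWord(string s) =>
        Regex.Replace(s, @"^\W+|\W+$", "");
}
```

With these, "__a b__" becomes "a b" under the first pattern, while the second leaves the underscores in place.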

Without using regex:
In Java you could do the following (the C# version would be nearly identical):
while (true) {
    if (word.length() == 0) {
        return ""; // nothing alphanumeric left
    }
    if (!Character.isLetterOrDigit(word.charAt(0))) {
        word = word.substring(1);
        continue; // trim the front first
    }
    if (!Character.isLetterOrDigit(word.charAt(word.length() - 1))) {
        word = word.substring(0, word.length() - 1);
        continue; // then trim the end
    }
    break; // both the front and the end are done
}

you could use this pattern
^[^[:alnum:]]+|[^[:alnum:]]+$
with g option

Regex pattern for text between 2 strings

I am trying to extract all of the text (shown as xxxx) in the following pattern:
Session["xxxx"]
using C#.
This may be Request.Querystring["xxxx"], so I am trying to build the expression dynamically. When I do so, I get all sorts of problems about unescaped characters or no matches :(
An example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = @"\\" + patternstart + @"(.*?)\\" + patternend;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Can anyone help with this, as I am stumped (as I always seem to be with RegEx :) )?

With a few small modifications to your code:
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + @"(.*?)" + patternend;
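Building on the Regex.Escape idea, a reusable helper might look like the sketch below. The names DelimitedText and Between are hypothetical, and the quotes are treated as part of the delimiters so that only xxxx is captured.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedText
{
    // Returns every piece of text found between `start` and `end`.
    // Both delimiters are escaped, so they are always matched literally.
    public static List<string> Between(string text, string start, string end)
    {
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling DelimitedText.Between(sText, "Session[\"", "\"]") would then return the keys without the surrounding quotes, and the same call works for "Request.Querystring[\"" as the start delimiter.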

The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First, it assumes the string starts with a literal backslash. Second, it wraps the entire (.*?) in a character class, which means it will match any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You need to escape the brackets in your pattern if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = @"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // "xxxx"

The characters [ and ] have a special meaning in regular expressions: they define a character class, where one of the contained characters must match. To work around this, simply 'escape' them with a leading \ character:
string patternstart = @"Session\[";
string patternend = @"\]";
An example "final string" could then be:
Session\["(.*)"\]
However, you could easily write your RegEx to handle Session, Querystring, etc. automatically if you require (without also matching every other array you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second capture group.
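A short sketch of the combined pattern in use. It swaps the greedy (.*) for the lazy (.*?) so that two indexers on the same line are not merged into one match; that change, and the names IndexerKeys and Find, are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IndexerKeys
{
    // Returns (collection name, key) pairs such as ("Session", "user").
    public static List<KeyValuePair<string, string>> Find(string text)
    {
        // Group 1 is the collection name, group 2 is the text between the quotes.
        const string pattern = @"(Querystring|Session|Form)\[""(.*?)""\]";
        var hits = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            hits.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return hits;
    }
}
```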

Regex Replacing only whole matches

I am trying to replace a bunch of strings in files. The strings are stored in a datatable along with the new string value.
string contents = File.ReadAllText(file);
foreach (DataRow dr in FolderRenames.Rows)
{
contents = Regex.Replace(contents, dr["find"].ToString(), dr["replace"].ToString());
File.SetAttributes(file, FileAttributes.Normal);
File.WriteAllText(file, contents);
}
The strings look like this _-uUa, -_uU, _-Ha etc.
The problem I am having is that, for example, the string "_uU" will also overwrite part of "_-uUa", so the replacement would look like "newvaluea".
Is there a way to tell regex to look at the next character after the found string and make sure it is not an alphanumeric character?
I hope it is clear what I am trying to do here.
Here is some sample data:
private function _-0iX(arg1:flash.events.Event):void
{
if (arg1.type == flash.events.Event.RESIZE)
{
if (this._-2GU)
{
this._-yu(this._-2GU);
}
}
return;
}
The next characters could be ;, (, ), dot, comma, space, :, etc.

First of all, you should use Regex.Escape on the search string.
You can then use
contents = Regex.Replace(
    contents,
    Regex.Escape(dr["find"].ToString()) + @"(?![a-zA-Z])",
    dr["replace"].ToString().Replace("$", "$$")); // only '$' is special in a replacement string
or even better
contents = Regex.Replace(
    contents,
    @"\b" + Regex.Escape(dr["find"].ToString()) + @"\b",
    dr["replace"].ToString().Replace("$", "$$"));

I think this is what you're looking for:
contents = Regex.Replace(
contents,
string.Format(@"(?<!\w){0}(?!\w)", Regex.Escape(dr["find"].ToString())),
dr["replace"].ToString().Replace("$", "$$")
);
You can't use \b because your search strings don't always start and end with word characters. Instead, I used (?<!\w) and (?!\w) to make sure the matched substring is not immediately preceded or followed by a word character (i.e., a letter, a digit, or an underscore). I don't know the complete specs for your search strings, so this pattern might need some tweaking.
None of the sample patterns you provided contain regex metacharacters, but like the other responders, I used Regex.Escape() to render the pattern safe anyway. In the replacement string the only character you have to watch out for is the dollar sign, and the way to escape that is with another dollar sign. Notice that I used String.Replace() for that instead of Regex.Replace().
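As a self-contained sketch of the lookaround approach, with plain strings standing in for the DataRow values (the names WholeMatch and ReplaceWhole are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class WholeMatch
{
    // Replaces `find` only where it is not directly preceded or followed
    // by a word character (letter, digit or underscore).
    public static string ReplaceWhole(string contents, string find, string replace)
    {
        string pattern = @"(?<!\w)" + Regex.Escape(find) + @"(?!\w)";
        // Double '$' so the replacement text is inserted literally.
        return Regex.Replace(contents, pattern, replace.Replace("$", "$$"));
    }
}
```

With this, replacing "_-uU" in "this._-uUa(this._-uU);" leaves "_-uUa" alone and changes only the second identifier.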

There are two tricks that can help you here:
Order all the search strings by length and replace the longest ones first; that way you won't accidentally replace the shorter ones.
Use a MatchEvaluator and, instead of looping through all your rows, search for all replacement patterns in the string and look them up in your dataset.
Option one is simple; option two would look like this:
contents = Regex.Replace(contents, "_-\\w+", ReplaceIdentifier);
public string ReplaceIdentifier(Match m)
{
    // Requires a primary key on the "find" column
    DataRow row = FolderRenames.Rows.Find(m.Value);
    if (row != null) return row["replace"].ToString();
    else return m.Value;
}
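The two tricks can also be combined in a single pass. The sketch below assumes the find/replace pairs have been copied from the DataTable into a dictionary; the names BulkRename and Apply are hypothetical, and the lookarounds for word characters are borrowed from the earlier answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BulkRename
{
    // Replaces every key of `renames` found in `contents` with its value.
    public static string Apply(string contents, IDictionary<string, string> renames)
    {
        if (renames.Count == 0) return contents;

        // Longest keys first, so "_-uUa" is tried before "_-uU".
        var alternatives = renames.Keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k));
        string pattern = @"(?<!\w)(?:" + string.Join("|", alternatives) + @")(?!\w)";

        // The evaluator's return value is inserted literally, so '$' needs no escaping.
        return Regex.Replace(contents, pattern, m => renames[m.Value]);
    }
}
```

Because the whole file is processed in one Regex.Replace call, the result only needs to be written back to disk once.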

Search for a newline Character C#.net

How do I search a string for a newline character? Both of the below seem to be returning -1!
theJ = line.IndexOf(Environment.NewLine);
OR
theJ = line.IndexOf('\n');
The string it's searching is "yo\n".
The string I'm parsing contains this: "printf("yo\n");"
The string I see during the comparison is this: "\tprintf(\"yo\n\");"

"yo\n" // output as "yo" + newline
"yo\n".IndexOf('\n') // returns 2
"yo\\n" // output as "yo\n"
"yo\\n".IndexOf('\n') // returns -1
Are you sure you're searching yo\n and not yo\\n?
Edit
Based on your update, I can see that I guessed correctly. If your string says:
printf("yo\n");
... then this does not contain a newline character. If it did, it would look like this:
printf("yo
");
What it actually has is an escaped newline character, or in other words, a backslash character followed by an 'n'. That's why the string you're seeing when you debug is "\tprintf(\"yo\\n\");". If you want to find this character combination, you can use:
line.IndexOf("\\n")
For example:
"\tprintf(\"yo\\n\");" // output as " printf("yo\n");"
"\tprintf(\"yo\\n\");".IndexOf("\\n") // returns 11

Looks like your line does not contain a newline.
If you are using File.ReadAllLines or string.Split on newline, then each line in the returned array will not contain the newline. If you are using StreamReader or one of the classes inheriting from it, the ReadLine method will return the string without the newline.
string lotsOfLines = #"one
two
three";
string[] lines = lotsOfLines.Split('\n');
foreach(string line in lines)
{
    Console.WriteLine(line.IndexOf('\n')); // prints -1 three times
}

That should work, although on Windows you may have to search for "\r\n" instead.
-1 simply means that no newline was found.

It depends on what you are trying to do. The two may not be identical on some platforms.
Environment.NewLine returns:
A string containing "\r\n" for non-Unix platforms, or a string
containing "\n" for Unix platforms.
Also:
If you want to search for the \n char (new line on Unix), use \n
If you want to search for the \r\n chars (new line on Windows), use \r\n
If your search depends on the current platform, use Environment.NewLine
If it returns -1 in both cases you mentioned, then you don't have a new line in your string.
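If the text may come from either platform, one option is to look for either line-break character instead of one specific sequence. A minimal sketch follows; the names LineBreaks, IndexOfLineBreak and SplitLines are hypothetical.

```csharp
using System;

public static class LineBreaks
{
    // Index of the first '\r' or '\n', or -1 if the string has no line break.
    public static int IndexOfLineBreak(string s) =>
        s.IndexOfAny(new[] { '\r', '\n' });

    // Splits on Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
    // "\r\n" is listed first so a CRLF pair is treated as one separator.
    public static string[] SplitLines(string s) =>
        s.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
}
```

Neither method finds anything in a string that contains a backslash followed by the letter n, since that is two ordinary characters and not a line break.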

When I was in college I built a WebForms application to sort references, and I used the line break/carriage return to separate one reference from the next.
// Text from the TextBox
var text = textBox1.Text;
// Create an array with the text between the carriage returns
var references = text.Split(new string[] { "\r\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
// Simple alphabetical OrderBy
var ordered = references.ToList<string>().OrderBy(ff => ff);
// Return the entered text in alphabetical order, with a line break between every result
var valueToReturn = String.Join(Environment.NewLine, ordered);
textBox1.Text = valueToReturn;

Environment.NewLine is not necessarily the same as \n. On Windows it is a CRLF (\r\n). However, I did try \n with IndexOf and my test did find the value. Are you sure what you're searching for is a \n rather than a \r? View your text in hexadecimal format and check the actual character values.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
