Escape parts of string with a character in C# using regex

I have a string such as this:
/one/two/three-four/five six seven/eight/nine ten eleven-twelve
I need to first replace dashes with spaces, and then be able to escape any grouping of words that have a space between them with a "#" symbol, so the above string should be:
/one/two/#three four#/#five six seven#/eight/#nine ten eleven twelve#
I have the following extension method, which works great for two words, but how can I make it work for any number of words?
public static string QueryEscape(this string str)
{
str = str.Replace("-", " ");
return Regex.Replace(str, @"(\w*) (\w*)", new MatchEvaluator(EscapeMatch));
}
private static string EscapeMatch(Match match)
{
return string.Format("#{0}#", match.Value);
}
So I guess I really need help with the proper regex that takes into account that:
1. there could be any number of spaces
2. there may or may not be a trailing slash ("/")
3. words are grouped between slashes, with the exception of #2 above
4. dashes are illegal and need to be replaced with spaces
Thank you in advance for your support.

This should work for you:
public static string QueryEscape(this string str)
{
return Regex.Replace(str.Replace("-", " "), @"[^/]*(\s[^/]*)+", "#$&#");
}
Basically, the idea is to match spans of non-slash characters that contain at least one whitespace character, then wrap each match in pound signs ($& in the replacement stands for the whole match).
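For comparison, the same result can be sketched without a regex by splitting on slashes and wrapping any segment that contains a space. The class and method names here are hypothetical, and this version only looks for the space character, whereas the regex above accepts any whitespace.

```csharp
using System;
using System.Linq;

public static class QueryEscapeSketch
{
    // Hypothetical helper, not part of the original answer.
    public static string EscapeSegments(string path)
    {
        // Dashes become spaces first, as in the question.
        string[] segments = path.Replace("-", " ").Split('/');

        // Wrap only the segments that contain at least one space.
        return string.Join("/",
            segments.Select(s => s.IndexOf(' ') >= 0 ? "#" + s + "#" : s));
    }
}
```

Because Split keeps empty segments, a leading or trailing slash survives the round trip through string.Join.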

Related

Regex to remove multiple consecutive commas and replace with single comma

Given the input string "Test,,test,,,test,test"
and using the following C# snippet I would have expected the duplicate commas to be replaced by a single comma and results in...
"Test,test,test,test"
private static string TruncateCommas(string input)
{
return Regex.Replace(input, @",+", ",");
}
Code was pinched from this answer...
C# replace all occurrences of a character with just a character
But what I am seeing is "Test,,test,,,test,test" as the output from this function.
Do I need to escape the comma in the regex? Or should this regex be working.
Do I need to escape the comma in the regex?
No.
Or should this regex be working.
Yes.
Please construct your test the following way:
void Main()
{
string s = "Test,,test,,,test,test";
string result = TruncateCommas(s);
Console.WriteLine(result);
}
Output
Test,test,test,test

Trim Non-alphanum from beginning and end of string

What is the best way to trim ALL non-alphanumeric characters from the beginning and end of a string? I tried listing the characters that I do not need manually, but that doesn't work well. I just need to trim anything that is not alphanumeric.
I tried using this function:
string something = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");
But it removes all characters that are non alpha numeric from the string. What I basically want is like this:
"test1" -> test1
#!#!2test# -> 2test
(test3) -> test3
##test4---- -> test4
I do want to support unicode characters but not symbols..
EDIT:
The output of the example should be:
Littering aaaannnndóú
Regards
Assuming you want to trim non-alphanumeric characters from the start and end of your string:
s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
.TakeWhile(char.IsLetterOrDigit)
.ToArray());
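Note that TakeWhile stops at the first character that is not a letter or digit, so any interior space or punctuation cuts the result short. A minimal index-based sketch that keeps the interior intact could look like this, assuming "trim" means dropping only the leading and trailing non-alphanumeric characters. The class and method names are hypothetical.

```csharp
using System;

public static class TrimSketch
{
    // Hypothetical helper: removes leading and trailing characters
    // that are not letters or digits, leaving the middle untouched.
    public static string TrimNonAlphanumeric(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        // Advance past leading non-alphanumeric characters.
        while (start <= end && !char.IsLetterOrDigit(s[start])) start++;

        // Back up over trailing non-alphanumeric characters.
        while (end >= start && !char.IsLetterOrDigit(s[end])) end--;

        // Yields an empty string when nothing alphanumeric remains.
        return s.Substring(start, end - start + 1);
    }
}
```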
@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+","$1"
You can use String function String.Trim Method (Char[]) in .NET library to trim the unnecessary characters from the given string.
From MSDN : String.Trim Method (Char[])
Removes all leading and trailing occurrences of a set of characters
specified in an array from the current String object.
Before trimming, you first need to identify whether a character is a letter or digit; if it is not alphanumeric, you can pass it to String.Trim(Char[]) to remove it.
Use the Char.IsLetterOrDigit() function to identify whether a character is alphanumeric.
From MSDN: Char.IsLetterOrDigit()
Indicates whether a Unicode character is categorized as a letter or a
decimal digit.
Try This:
string str = "()&*1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
foreach (char ch in str)
{
if (!char.IsLetterOrDigit(ch))
str = str.Trim(ch);
}
Output:
1#^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9
If you need to remove any character which is not alphanumeric, you can use IsLetterOrDigit paired with a Where to go through every character. And because we're working at the char level, we'll need a little Concat at the end to bring everything back into a string.
string result = string.Concat(input.Where(char.IsLetterOrDigit));
which you can easily convert into an extension method
public static class Extensions
{
public static string ToAlphaNum(this string input)
{
return string.Concat(input.Where(char.IsLetterOrDigit));
}
}
that you can use like this :
string testString = "#!#!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123
Note: this will remove every non-alphanumeric character from your string, if you really need to remove only those at the beginning/end, please add more details about what defines a beginning or an end and add more examples.
And you could also replace all the non-letters/numbers at the beginning and/or end of the line:
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
used as
resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);
If you really want to only remove characters at the beginning and end of the whole string, and not do this line by line, then remove the "^$ match at line breaks" option (RegexOptions.Multiline).
If you wanted to include leading or trailing underscores, as characters to be retained, you could simplify the regex to:
^\W+|\W+$
The core of the regex:
[^\p{L}\p{N}]
is a negated character class: it matches any character that is NOT in the Unicode categories Letters \p{L} or Numbers \p{N}.
In other words:
Trim non-unicode alphanumeric characters
^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$
Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture
Match this alternative «^[^\p{L}\p{N}]*»
Assert position at the beginning of a line «^»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character from the Unicode category “letter” «\p{L}»
A character from the Unicode category “number” «\p{N}»
Assert position at the end of a line «$»
Created with RegexBuddy
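As a quick check, the pattern can be applied to the sample inputs from the question. This is only a sketch: the wrapper class and method names are hypothetical, and RegexOptions.Multiline is left out so that only the ends of the whole string are trimmed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTrimSketch
{
    // Hypothetical wrapper around the pattern discussed above.
    public static string TrimEnds(string s)
    {
        // First alternative strips the leading run, second the trailing run.
        return Regex.Replace(s, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");
    }
}
```

Characters between the first and last letter or digit are never touched, because both alternatives are anchored to an end of the string.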
Without using regex:
In Java, you could do: (in c# syntax would be nearly the same with same functionality)
while (true) {
if (word.length() == 0) {
return ""; // bad
}
if (!Character.isLetter(word.charAt(0))) {
word = word.substring(1);
continue; // so we are doing front first
}
if (!Character.isLetter(word.charAt(word.length()-1))) {
word = word.substring(0, word.length()-1);
continue; // then we are doing end
}
break; // if front is done, and end is done
}
you could use this pattern
^[^[:alnum:]]+|[^[:alnum:]]+$
with g option

Validate filename in c# through regex

I want to validate a filename with this format : LetterNumber_Enrollment_YYYYMMDD_HHMM.xml
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"[a-zA-z]_Enrollment_[0-9]{6}_[0-9]{4}\\.xml");
if (pattern.IsMatch(filename))
{
return isValid = true;
}
However, I can't make it work.
Is there anything that I missed here?
You are not matching digits at the beginning. Your pattern should be: ^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$ to match given string.
Changes:
Your string starts with alphanumeric string before first _ symbol so you need to check both (letters and digits).
After the Enrollment_ part you have 8 digits, not 6.
There is no need for the double \. In a verbatim (@) string you only need to escape the dot (i.e. \.).
Demo app:
using System;
using System.Text.RegularExpressions;
class Test {
static void Main() {
string filename = "Try123_Enrollment_20130102_1200.xml";
Regex pattern = new Regex(@"^[A-Za-z0-9]+_Enrollment_[0-9]{8}_[0-9]{4}\.xml$");
if (pattern.IsMatch(filename))
{
Console.WriteLine("Matched");
}
}
}
Your Regex is nowhere near your actual string:
you only match a single letter at the start (and no digits) so Try123 doesn't match
you match 6 digits instead of 8 at the date part so 20130102 doesn't match
you have escaped your backslash near the end (\\.xml) but you've also used @ on your string: with @ the backslash is already literal, so you don't need to double it.
Try this instead:
@"[a-zA-Z]{3}\d{3}_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
I've assumed you want only three letters and three numbers at the start; in fact you may want this:
@"[\w]*_Enrollment_[0-9]{8}_[0-9]{4}\.xml"
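You can try the following. It matches letters and digits at the beginning and also checks that the date is plausible (year 19xx or 20xx, month 01-12, day 01-31), although it still accepts impossible dates such as 20130231.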
[A-Za-z0-9]+_Enrollment_(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])_[0-9]{4}\.xml
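If the date and time really have to be valid calendar values, one option is to let the regex check only the shape and hand the digits to DateTime.TryParseExact. The following is a sketch under that assumption. The class and method names are hypothetical, and treating HHMM as a 24-hour time is a guess based on the format in the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FilenameSketch
{
    // Shape check only: name part, 8 date digits, 4 time digits.
    private static readonly Regex Pattern =
        new Regex(@"^[A-Za-z0-9]+_Enrollment_([0-9]{8})_([0-9]{4})\.xml$");

    // Hypothetical helper: true when the shape matches and the
    // digits form a real date and time.
    public static bool IsValid(string filename)
    {
        Match m = Pattern.Match(filename);
        if (!m.Success) return false;

        DateTime ignored;
        return DateTime.TryParseExact(
            m.Groups[1].Value + m.Groups[2].Value,
            "yyyyMMddHHmm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out ignored);
    }
}
```

This rejects names such as Try123_Enrollment_20130230_1200.xml, which a pure regex range check on the day would let through.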
As an aside, to test your regular expressions try the free regular expression designer from Rad Software, I find that it helps me work out complex expressions beforehand.
http://www.radsoftware.com.au/regexdesigner/

C# string to sentence

Is there a way to convert string without spaces to a proper sentence??
E.g. "WhoAmI" needs to be converted to "Who Am I"
A regex replacement would do this, if you're just talking about inserting a space before each capital letter:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
var input = "WhoAmI";
var output = Regex.Replace(input, @"\p{Lu}", " $0").TrimStart();
Console.WriteLine(output);
}
}
However, I suspect there will be significant corner cases. Note that the above uses \p{Lu} instead of just [A-Z] to cope with non-ASCII capital letters; you may find A-Z simpler if you only need to deal with ASCII. The TrimStart() call is to remove the leading space you'd get otherwise.
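One such corner case is a run of capitals, such as an acronym, which the pattern above splits into single letters. A possible variation, sketched here with hypothetical names, inserts a space only at positions where a lowercase letter is directly followed by an uppercase one, using lookbehind and lookahead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceSketch
{
    // Hypothetical helper: adds a space at each lowercase-to-uppercase boundary.
    public static string SplitCamelCase(string input)
    {
        // The match is zero-width, so no characters are consumed or
        // replaced and no TrimStart() is needed.
        return Regex.Replace(input, @"(?<=\p{Ll})(?=\p{Lu})", " ");
    }
}
```

With this sketch "WhoAmI" becomes "Who Am I" and "parseXML" becomes "parse XML". An input like "XMLParser" is left unchanged, so it is a trade-off rather than a complete answer.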
If every word in the string starts with an uppercase letter, you can simply put a space before each uppercase letter.
You can use LINQ
string words = "WhoAmI";
string sentence = String.Concat(words.Select(letter => Char.IsUpper(letter) ? " " + letter
: letter.ToString()))
.TrimStart();

C#: split a string into runs of characters, numbers and delimited strings and process it

OK my regex is a bit rusty and I've been struggling with this particular problem...
I need to split and process a string containing any number of the following, in any order:
Chars (lowercase letters only)
Quote delimited strings
Ints
The strings are pretty weird (I don't have control over them). When there's more than one number in a row in the string, they're separated by a comma. They need to be processed in the same order that they appeared in the original string.
For example, a string might look like:
abc20a"Hi""OK"100,20b
With this particular string the resulting call stack would look a bit like:
ProcessLetters( new[] { 'a', 'b', 'c' } );
ProcessInts( 20 );
ProcessLetters( 'a' );
ProcessStrings( new[] { "Hi", "OK" } );
ProcessInts( new[] { 100, 20 } );
ProcessLetters( 'b' );
What I could do is treat it a bit like CSV, where you build tokens by processing the characters one at a time, but I think it could be more easily done with a regex?
You can use the pattern contained in this string:
@"(""[^""]*""|[a-z]|\d+)"
to tokenize the input string you provided. This pattern captures three things: simple quoted strings (no embedded quotes), lower-case characters, and one or more digits.
If your quoted strings can have escaped quotes within them (e.g., "Hi\"There\"""OK""Pilgrim") then you can use this pattern to capture and tokenize them along with the rest of the input string:
@"((?:""[^""\\]*(?:\\.[^""\\]*)*"")|[a-z]|\d+)"
Here's an example:
MatchCollection matches = Regex.Matches(@"abc20a""Hi\""There\""""""OK""""Pilgrim""100,20b", @"((?:""[^""\\]*(?:\\.[^""\\]*)*"")|[a-z]|\d+)");
foreach (Match match in matches)
{
Console.WriteLine(match.Value);
}
Returns the string tokens:
a
b
c
20
a
"Hi\"There\""
"OK"
"Pilgrim"
100
20
b
One of the nice things about this approach is that you can just check the first character to see which stack you need to put your elements in. If the first character is alpha, it goes into the ProcessLetters stack; if it is numeric, it goes into ProcessInts. If the first character is a quote, it goes into ProcessStrings after trimming the leading and trailing quotes and calling Regex.Unescape() to unescape the embedded quotes.
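The first-character check can be sketched as follows, grouping consecutive tokens of the same kind into runs in their original order. This is an illustration only: the class name, method name and the "Kind: values" output format are hypothetical, it uses the simple pattern without escaped quotes, and real code would call the question's ProcessLetters, ProcessInts and ProcessStrings instead of building strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    // Hypothetical helper: returns one line per run, e.g. "Ints: 100 20".
    public static List<string> DescribeRuns(string input)
    {
        var runs = new List<string>();
        var current = new List<string>();
        string currentKind = null;

        foreach (Match m in Regex.Matches(input, @"""[^""]*""|[a-z]|\d+"))
        {
            string token = m.Value;

            // Classify the token by its first character.
            string kind = token[0] == '"' ? "Strings"
                        : char.IsDigit(token[0]) ? "Ints"
                        : "Letters";

            // A change of kind closes the current run.
            if (kind != currentKind && current.Count > 0)
            {
                runs.Add(currentKind + ": " + string.Join(" ", current));
                current.Clear();
            }

            currentKind = kind;

            // Strip the surrounding quotes from string tokens.
            current.Add(kind == "Strings"
                ? token.Substring(1, token.Length - 2)
                : token);
        }

        if (current.Count > 0)
            runs.Add(currentKind + ": " + string.Join(" ", current));

        return runs;
    }
}
```

Commas are never matched by the pattern, so "100,20" yields two adjacent integer tokens that end up in the same run.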
You can make your regexp match each of the three separate options with the or operator |. This should catch valid tokens, skipping commas and other chars.
/[a-z]|[0-9]+|"[^"]*"/
Can your strings contain escaped quotes?
static void Main(string[] args)
{
string test = @"abc20a""Hi""""OK""100,20b";
string[] results = Regex.Split(test, @"(""[a-zA-Z]+""|\d+|[a-zA-Z]+)");
foreach (string result in results)
{
if (!String.IsNullOrEmpty(result) && result != ",")
{
Console.WriteLine("result: " + result);
}
}
Console.ReadLine();
}
