I am a complete newbie when it comes to Regular Expressions, and was wondering if somebody could help me out. I'm not sure if using a regEx is the correct approach here, so please feel free to chime in if you have a better idea. (I will be looping thru many strings).
Basically, I'd like to find/replace on a string, wrapping the matches with {} and keeping the original case of the string.
Example:
Source: "The CAT sat on the mat."
Find/Replace: "cat"
Result: "The {CAT} sat on the mat."
I would like the find/replace to work on only the first occurance, and I also need to know whether the find/replace did indeed match or not.
I hope I've explained things clearly enough.
Thank you.
Regex theRegex =
new Regex("(" + Regex.Escape(FindReplace) + ")", RegexOptions.IgnoreCase);
theRegex.Replace(Source, "{$1}", 1);
If you want word boundary tolerance:
Regex theRegex =
(#"([\W_])(" + Regex.Escape(FindReplace) + #")([\W_])", RegexOptions.IgnoreCase)
theRegex.Replace(str, "$1{$2}$3", 1)
If you will be looping through many strings, then perhaps Regex might not be the best idea - it's a great tool, but not the fastest.
Here's a sample code that would also work:
var str = "The Cat ate a mouse";
var search = "cat";
var index = str.IndexOf(search, StringComparison.CurrentCultureIgnoreCase);
if (index == -1)
throw new Exception("String not found"); //or do something else in this case here
var newStr = str.Substring(0, index) + "{" + str.Substring(index, search.Length) + "}" + str.Substring(index + search.Length);
EDIT:
As noted in the comments, the above code has some issues.
So I decided to try and find a way to make it work without using Regex. Don't get me wrong, I love Regex as much as the next guy. I did this mostly out of curiosity. ;)
Here's what I came upon:
public static class StringExtendsionsMethods
{
public static int IndexOfUsingBoundary(this String s, String word)
{
var firstLetter = word[0].ToString();
StringBuilder sb = new StringBuilder();
bool previousWasLetterOrDigit = false;
int i = 0;
while (i < s.Length - word.Length + 1)
{
bool wordFound = false;
char c = s[i];
if (c.ToString().Equals(firstLetter, StringComparison.CurrentCultureIgnoreCase))
if (!previousWasLetterOrDigit)
if (s.Substring(i, word.Length).Equals(word, StringComparison.CurrentCultureIgnoreCase))
{
wordFound = true;
bool wholeWordFound = true;
if (s.Length > i + word.Length)
{
if (Char.IsLetterOrDigit(s[i + word.Length]))
wholeWordFound = false;
}
if (wholeWordFound)
return i;
sb.Append(word);
i += word.Length;
}
if (!wordFound)
{
previousWasLetterOrDigit = Char.IsLetterOrDigit(c);
sb.Append(c);
i++;
}
}
return -1;
}
}
But I can't take credit for this! I found this after some Googling here, on StackOverflow and then modified it. ;)
Use this method instead of the standard IndexOf in the above code.
Try this:
class Program
{
const string FindReplace = "cat";
static void Main(string[] args)
{
var input = "The CAT sat on the mat as a cat.";
var result = Regex
.Replace(
input,
"(?<=.*)" + FindReplace + "(?=.*)",
m =>
{
return "{" + m.Value.ToUpper() + "}";
},
RegexOptions.IgnoreCase);
Console.WriteLine(result);
}
}
Related
I want to remove all instances of a character from a string except where that character is followed by any form or whitespace. I haven't written unit tests yet but it seems like the code below achieves what I want (possibly forgetting an edge case or two which is ok for now). It feels pretty clunky though. Can anyone suggest an improvement?
public string Strip(string text, char c)
{
if (!string.IsNullOrEmpty(text))
{
bool characterIsInString = true;
int currentIndex = 0;
while (characterIsInString)
{
currentIndex = text.IndexOf(c, currentIndex + 1);
if (currentIndex != -1)
{
var charAfter = text.Substring(currentIndex + 1, 1);
if (charAfter != " ")
{
text = text.Remove(currentIndex, 1);
}
}
else
{
characterIsInString = false;
}
}
}
return text;
}
You can use Regular Expression( here i have assumed that character is x):
string result = Regex.Replace( input , "x(?=\\S)" , "");
Live Demo. Please check that it uses very few steps finding it.
I suggest, you use the following code:
public string Strip(string text, char c)
{
Regex regex = new Regex(c.ToString() + #"[^\s]");
return regex.Replace(text, "");
}
This will remove the char in text if it's not followed by a White Space.
This is a very simple and fast regex.
if you want replace character
you can replace char with Replace method
Follow code:
Console.ReadLine().Replace("e","h");
So, I have some code that works the way I want it to, but I am wondering if there is a better way to do this with a regex? I have played with a few regex but with no luck(And I know I need to get better with regex stuff).
This code is purely designed to remove any extra spaces or non email valid characters. Then it goes through and removes extra # symbols beyond the first.
List<string> second_pass = new List<string>();
string final_pass = "";
if (email_input.Text.Length > 0)
{
string first_pass = Regex.Replace(email_input.Text, #"[^\w\.#-]", "");
if (first_pass.Contains("#"))
{
second_pass = first_pass.Split('#').Select(sValue => sValue.Trim()).ToList();
string third_pass = second_pass[0] + "#" + second_pass[1];
second_pass.Remove(second_pass[0]);
second_pass.Remove(second_pass[1]);
if (second_pass.Count > 0)
{
final_pass = third_pass + string.Join("", second_pass.ToArray());
}
}
email_output.Text = final_pass;
}
If you can get by by replacing only the captured groups, then this should be able to work.
([^\w\.\#\-])|(?<=\#).*?(\#)
Demo
Going by your description and not the code:
var final_pass = email_input.Text;
var atPos = final_pass.IndexOf('#');
if (atPos++ >= 0)
final_pass = final+pass.Substring(0, atPos) + Regex.Replace(final_pass.Substring(atPos), "[# ]", "");
For an (almost) pure regex solution, using a state cheap, this seems to be working:
var first = 0;
final_pass = Regex.Replace(final_pass, "(^.+?#)?([^ #]+?)?[# ]", m => (first++ == 0) ? m.Groups[1].Value+m.Groups[2].Value : m.Groups[2].Value);
i feel dumb for asking a most likely silly question.
I am helping someone getting the results he wishes for his custom compiler that reads all lines of an xml file in one string so it will look like below, and since he wants it to "Support" to call variables inside the array worst case scenario would look like below:
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"
What i need is to find the first ";" after "[" and "]" and split it, so i stand with this:
"Var1 = [5,4,3,2];
It will also have to support multiple "[", "]" for example:
"Var2 = [5,Var1,[4],2];"
EDIT: There may also be Data in between the last "]" and ";"
For example:
"Var2 = [5,[4],2]Var1;
What can i do here? Im kind of stuck.
You can try regular expressions, e.g.
string source = "Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];";
// 1. final (or the only) chunk doesn't necessary contain '];':
// "abc" -> "abc"
// 2. chunk has at least one symbol except '];'
string pattern = ".+?(][a-zA-Z0-9]*;|$)";
var items = Regex
.Matches(source, pattern)
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
Console.Write(string.Join(Environment.NewLine, items));
Outcome:
Var1 = [5,4,3,2]abc123;
Var2 = [2,8,6,Var1;4];
^([^;]+);
This regex should work for all.
You can use it like here:
string[] lines =
{
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];",
"Var2 = [5,[4],2]Var1; Var2 = [2,8,6,Var1;4];"
};
Regex pattern = new Regex(#"^([^;]+);");
foreach (string s in lines){
Match match = pattern.Match(s);
if (match.Success)
{
Console.WriteLine(match.Value);
}
}
The explanation is:
^ means starts with and is [^;] anything but a semicolon
+ means repeated one or more times and is ; followed by a semicolon
This will find Var1 = [5,4,3,2]; as well as Var1 = [5,4,3,2];
You can see the output HERE
public static string Extract(string str, char splitOn)
{
var split = false;
var count = 0;
var bracketCount = 0;
foreach (char c in str)
{
count++;
if (split && c == splitOn)
return str.SubString(0, count);
if (c == '[')
{
bracketCount++;
split = false;
}
else if (c == ']')
{
bracketCount--;
if (bracketCount == 0)
{
split = true;
}
else if (bracketCount < 0)
throw new FormatException(); //?
}
}
return str;
}
I need to remove all additional spaces in a string.
I use regex for matching strings and matched strings i replace with some others.
For better understanding please see examples below:
3 input strings:
Hello, how are you?
Hello , how are you?
Hello , how are you ?
This are 3 strings that should match by one pattern-regex.
It looks something like this:
Hello\s*,\s+how\s+are\s+you\s*?
It works fine but there is a perfomance problem.
If I have a lot of patterns (~20k) and try to execute each pattern it runs very slow (3-5 minutes).
Maybe there is better way for doing this?
for example use some 3d-party libs?
UPD: Folks, this question is not about how to do this. It's about how to do this with best perfomance. :)
Let me explain more detailed. The main goal is tokenize text. (replace some token with special symbols)
For example I have a token "nice try".
Then I input text "this is nice try".
result: "this is #tokenizedtext#" where #tokenizedtext# some special symbols. It doesen't matter in this case.
Next I have string "Mike said it was a nice try".
result should be "Mike said it was a #tokenizedtext#".
I think the main idea is clear.
So I can have a lot of tokens. When I process it I convert my token from "nice try" to pattern "nice\s+try". and try to replace with this pattern input text.
It works fine. But if in tokens there is more spaces and there is also punctuation then my regexes became bigger and works very slow.
Do you have some suggestions (technical or logic) for solving this problem?
I can suggest a few solutions.
First of all, avoid the static Regex method. Create an instance of it (and store it, don't call the constructor for each replacement!) and, if possible, use RegexOptions.Compiled. It should improve your performance.
Second, you can try to review your pattern. I'll do some profiling, but I'm currently undecisive between:
#"(?<=\s)\s+"
With replacement being an empty string or:
#"\s+"
With a space as a replacement. You can try this code, in the meanwhile:
var s = "Hello , how are you?";
var pattern = #"\s+";
var regex = new Regex(pattern, RegexOptions.Compiled);
var replaced = regex.Replace(s, " ");
EDIT: After having done some measurement, the second pattern seems to be faster. I'm editing my sample to adapt it.
EDIT 2: I've written an unsafe method. It's much faster than the other ones presented here, including the Regex ones, but, as the word itself says, it's unsafe. I don't think that there's any problem with the code I've written but I may be wrong -- So please, check it again and again in case there's a bug in the method.
static unsafe string TrimInternal(string input)
{
var length = input.Length;
var array = stackalloc char[length];
fixed (char* fix = input)
{
var ptr = fix;
var counter = 0;
var lastWasSpace = false;
while (*ptr != '\x0')
{
//Current char is a space?
var isSpace = *ptr == ' ';
//If it's a space but the last one wasn't
//Or if it's not a space
if (isSpace && !lastWasSpace || !isSpace)
//Write into the result array
array[counter++] = *ptr;
//The last character (before the next loop) was a space
lastWasSpace = isSpace;
//Increase the pointer
ptr++;
}
return new string(array, 0, counter);
}
}
Usage (compile with /unsafe):
var s = TrimInternal("Hello , how are you?");
Profiling made in Release build, optimizations on, 1000000 iterations:
My above solution with Regex: 00:00:03.2130121
The unsafe solution: 00:00:00.2063467
This might work for you. It should be pretty fast. Note that it also removes spaces at the end of the string; that might not be what you want...
using System;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello, how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you ?"));
}
public static string RemoveExtraSpaces(string text)
{
var buffer = new char[text.Length];
bool isSpaced = false;
int n = 0;
foreach (char c in text)
{
if (c == ' ')
{
isSpaced = true;
}
else
{
if (isSpaced)
{
if ((c != ',') && (c != '?'))
{
buffer[n++] = ' ';
}
isSpaced = false;
}
buffer[n++] = c;
}
}
return new string(buffer, 0, n);
}
}
}
Something of my own :
find all the position of WhiteSpacechar in string;
private static IEnumerable<int> GetWhiteSpacePos(string input)
{
int iPos = -1;
while ((iPos = input.IndexOf(" ", iPos + 1, StringComparison.Ordinal)) > -1)
{
yield return iPos;
}
}
Remove all whitespace that are in in sequence Returned from GetWhiteSpacePos
string original_string = "Hello , how are you ?";
var poss = GetWhiteSpacePos(original_string).ToList();
int startPos;
int endPos;
StringBuilder builder = new StringBuilder(original_string);
for (int i = poss.Count -1; i > 1; i--)
{
endPos = poss[i];
while ((poss[i] == poss[i - 1] + 1) && i > 1)
{
i--;
}
startPos = poss[i];
if (endPos - startPos > 1)
{
builder.Remove(startPos, endPos - startPos);
}
}
string new_string = builder.ToString();
You are using a very complex regex..simplify the regex and that would definitely increasre the performance
Use \s+ and replace it with a single space
Well, these kind of problems really trouble us. Use this code, and I'm sure you're getting the result for what you've asked. This command removes any extra white space between any string.
cleanString= Regex.Replace(originalString, #"\s", " ");
Hope thar works for you. Thanks.
And since this is a single Instruction. It will utilize less CPU resource and hence less CPU time, which ultimately increases your performance. Therefore A/C to me this method works the best when compared in terms of performance.
if its just a matter of SPACE;
try this
Source : http://www.codeproject.com/Articles/10890/Fastest-C-Case-Insenstive-String-Replace
private static string ReplaceEx(string original,
string pattern, string replacement)
{
int count, position0, position1;
count = position0 = position1 = 0;
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
int inc = (original.Length / pattern.Length) *
(replacement.Length - pattern.Length);
char[] chars = new char[original.Length + Math.Max(0, inc)];
while ((position1 = upperString.IndexOf(upperPattern,
position0)) != -1)
{
for (int i = position0; i < position1; ++i)
chars[count++] = original[i];
for (int i = 0; i < replacement.Length; ++i)
chars[count++] = replacement[i];
position0 = position1 + pattern.Length;
}
if (position0 == 0) return original;
for (int i = position0; i < original.Length; ++i)
chars[count++] = original[i];
return new string(chars, 0, count);
}
Usage:
string original_string = "Hello , how are you ?";
while (original_string.Contains(" "))
{
original_string = ReplaceEx(original_string, " ", " ");
}
Replacing the regex way:
string resultString = null;
try {
resultString = Regex.Replace(subjectString, #"\s+", " ", RegexOption.Compiled);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
i want to convert:
HECHT, WILLIAM
to
Hecht, William
in c#.
any elegant ways of doing this?
string name = "HECHT, WILLIAM";
string s = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(name.ToLower());
(note it only works lower-to-upper, hence starting lower-case)
I'd just like to include an answer that points out that although this seems simple in theory, in practice properly capitalizing the names of everyone can be very complicated:
Peter O'Toole
Xavier Sala-i-Martin
Salvador Domingo Felipe Jacinto Dalí i Domènech
Francis Sheehy-Skeffington
Asma al-Assad
Maggie McIntosh
Vincent van Gogh
Anyway, just something to think about.
public static string CamelCase(this string s)
{
if (String.IsNullOrEmpty(s))
s = "";
string phrase = "";
string[] words = s.Split(' ');
foreach (string word in words)
{
if (word.Length > 1)
phrase += word.Substring(0, 1).ToUpper() + word.Substring(1).ToLower() + " ";
else
phrase += word.ToUpper() + " ";
}
return phrase.Trim();
}
I voted Marc's answer up, but this will also work:
string s = Microsoft.VisualBasic.Strings.StrConv("HECHT, WILLIAM", VbStrConv.ProperCase,0);
You'll need to add the appropriate references, but I'm pretty sure it works on all-upper inputs.
I had problems with the code above, so I modified it a bit and it worked. Greetings from Chile. Good paper.
private void label8_Click(object sender, EventArgs e)
{
nombre1.Text= NOMPROPIO(nombre1.Text);
}
string NOMPROPIO(string s)
{
if (String.IsNullOrEmpty(s))
s = "";
string phrase = "";
string[] words = s.Split(' ');
foreach (string word in words)
{
if (word.Length > 1)
phrase += word.Substring(0, 1).ToUpper() + word.Substring(1).ToLower() + " ";
else
phrase += word.ToUpper() + " ";
}
return phrase.Trim();
}