C# Removing all extra occurrences BEYOND the FIRST in string


I have some code that works the way I want it to, but I am wondering whether there is a better way to do this with a regex. I have tried a few regexes with no luck (and I know I need to get better at regex).
This code is designed to remove any spaces and any characters that are not valid in an email address. It then goes through and removes any extra @ symbols beyond the first.
string final_pass = "";
if (email_input.Text.Length > 0)
{
    // Strip everything except word characters, '.', '@' and '-'.
    string first_pass = Regex.Replace(email_input.Text, @"[^\w\.@-]", "");
    final_pass = first_pass;
    if (first_pass.Contains("@"))
    {
        List<string> second_pass = first_pass.Split('@').Select(sValue => sValue.Trim()).ToList();
        // Keep the first '@' between the first two parts.
        string third_pass = second_pass[0] + "@" + second_pass[1];
        // Drop the two parts already used; Remove(value) would delete the wrong items.
        second_pass.RemoveRange(0, 2);
        // Append the remaining parts without their '@' separators.
        final_pass = third_pass + string.Join("", second_pass);
    }
    email_output.Text = final_pass;
}

If you can get by with replacing only the captured groups, this should work:
([^\w\.\@\-])|(?<=\@).*?(\@)

Going by your description and not the code:
var final_pass = email_input.Text;
var atPos = final_pass.IndexOf('@');
if (atPos++ >= 0)
    final_pass = final_pass.Substring(0, atPos) + Regex.Replace(final_pass.Substring(atPos), "[@ ]", "");
For an (almost) pure regex solution, using a state cheat, this seems to work:
var first = 0;
final_pass = Regex.Replace(final_pass, "(^.+?@)?([^ @]+?)?[@ ]", m => (first++ == 0) ? m.Groups[1].Value + m.Groups[2].Value : m.Groups[2].Value);
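The two steps described in the question (strip characters that are not allowed, then keep only the first @) can also be sketched as a small helper. This is a minimal sketch, not the asker's code: the class name EmailCleaner and the method name Clean are made up for illustration, and the allowed character set is simply the one from the question's regex.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class EmailCleaner
{
    public static string Clean(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // Step 1: remove everything except word characters, '.', '@' and '-'.
        string stripped = Regex.Replace(input, @"[^\w.@-]", "");

        // Step 2: keep the text up to and including the first '@' unchanged...
        int at = stripped.IndexOf('@');
        if (at < 0)
            return stripped; // no '@' at all, nothing more to do

        string head = stripped.Substring(0, at + 1);

        // ...and drop every further '@' from the remainder.
        string tail = stripped.Substring(at + 1).Replace("@", "");
        return head + tail;
    }
}
```

Calling EmailCleaner.Clean(email_input.Text) would then replace the whole if block from the question. Note that this only cleans the text; it does not check that the result is a valid email address.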

Related

Going to the end of a substring in c#

I can't figure out how to cleanly end the substring at the places marked with the comment // go to end.
Is there a simpler way to take everything up to the end of the string than calculating the length myself? For more complex strings that would be too hard.
string word = Console.ReadLine();
string[] lines = File.ReadAllLines(file);
using (var far = File.CreateText(resultfile))
{
foreach (string line in lines)
{
StringBuilder NewL = new StringBuilder();
int ind = line.IndexOf(word);
if (ind >= 0)
{
if (ind == 0)
{
NewL.Append(line.Substring(ind+ word.Length +1, // go to end);
}else{
NewL.Append(line.Substring(0, ind - 1));
NewL.Append(line.Substring(ind + word.Length + 1, // go to end));}
far.WriteLine(NewL);
}
else
{
far.WriteLine(line);
}
}
I don't know what more details the stackoverflow wants, anyone who can answer this pretty sure can clearly understand this simple code anyways.

You can use the String.Substring(int) overload, which automatically continues to the end of the source string:
NewL.Append(line.Substring(ind + word.Length + 1));
From the documentation: "Retrieves a substring from this instance. The substring starts at a specified character position and continues to the end of the string."
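As a small illustration of the single-argument overload, here is a sketch that returns everything after the first occurrence of a word. The class and method names (SubstringDemo, After) are hypothetical and not part of the question's code.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class SubstringDemo
{
    // Returns the part of 'line' that follows the first occurrence of 'word',
    // or the whole line if 'word' does not occur.
    public static string After(string line, string word)
    {
        int ind = line.IndexOf(word, StringComparison.Ordinal);
        if (ind < 0)
            return line;

        // Substring(int) runs from the start index to the end of the string,
        // so no length has to be calculated. A start index equal to
        // line.Length is allowed and yields an empty string.
        return line.Substring(ind + word.Length);
    }
}
```

The question's code adds 1 to the start index to skip the character after the word as well. That throws an ArgumentOutOfRangeException when the word is the very last thing on the line, so a bounds check would be needed there.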

It seems to me that you are just trying to remove a certain word from the loaded lines. If this is your task then you can simply replace the word with an empty string
foreach (string line in lines)
{
string newLine = line.Replace(word, "");
far.WriteLine(newLine);
}
Or even without an explicit loop with a bit of Linq
var result = lines.Select(x => x.Replace(word, ""));
File.WriteAllLines("yourFile.txt", result);
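Or, given the requirement to match an additional character after the word, you can solve it with Regex. Regex.Escape keeps any special characters in the word from being treated as pattern syntax.
Regex r = new Regex(Regex.Escape(word) + ".");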
var result = lines.Select(x => r.Replace(x, ""));
File.WriteAllLines("yourFile.txt", result);

c# get the first ';' after parentheses

I feel dumb for asking what is most likely a silly question.
I am helping someone get the results he wants from his custom compiler, which reads all lines of an XML file into one string. Since he wants it to support referencing variables inside an array, the worst-case scenario would look like this:
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"
What I need is to find the first ";" that comes after a "[" and "]" pair and split there, so that I end up with this:
"Var1 = [5,4,3,2];"
It also has to support nested "[" and "]", for example:
"Var2 = [5,Var1,[4],2];"
EDIT: There may also be data between the last "]" and the ";", for example:
"Var2 = [5,[4],2]Var1;"
What can I do here? I'm kind of stuck.

You can try regular expressions, e.g.
string source = "Var1 = [5,4,3,2]abc123; Var2 = [2,8,6,Var1;4];";
// 1. final (or the only) chunk doesn't necessarily contain '];':
// "abc" -> "abc"
// 2. chunk has at least one symbol except '];'
string pattern = ".+?(][a-zA-Z0-9]*;|$)";
var items = Regex
.Matches(source, pattern)
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
Console.Write(string.Join(Environment.NewLine, items));
Outcome:
Var1 = [5,4,3,2]abc123;
Var2 = [2,8,6,Var1;4];

^([^;]+);
This regex should work for all.
You can use it like here:
string[] lines =
{
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];",
"Var2 = [5,[4],2]Var1; Var2 = [2,8,6,Var1;4];"
};
Regex pattern = new Regex(@"^([^;]+);");
foreach (string s in lines){
Match match = pattern.Match(s);
if (match.Success)
{
Console.WriteLine(match.Value);
}
}
The explanation is:
^ means "starts with"
[^;] means anything but a semicolon
+ means repeated one or more times
; means followed by a semicolon
This will find Var1 = [5,4,3,2]; in the first line as well as Var2 = [5,[4],2]Var1; in the second.

public static string Extract(string str, char splitOn)
{
    var split = false;      // true once the outermost bracket has closed
    var count = 0;          // number of characters consumed so far
    var bracketCount = 0;   // current bracket nesting depth
    foreach (char c in str)
    {
        count++;
        if (split && c == splitOn)
            return str.Substring(0, count);
        if (c == '[')
        {
            bracketCount++;
            split = false;
        }
        else if (c == ']')
        {
            bracketCount--;
            if (bracketCount == 0)
            {
                split = true;
            }
            else if (bracketCount < 0)
                throw new FormatException(); //?
        }
    }
    return str;
}
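The same bracket-counting idea can be extended to split the whole input into statements rather than extracting only the first one. The following is a sketch under that assumption. StatementSplitter and Split are hypothetical names, and it treats any ";" at nesting depth 0 as a statement end, which is slightly broader than "the first ; after a closing bracket".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative only.
public static class StatementSplitter
{
    // Splits 'source' at every ';' that is not nested inside '[' ... ']'.
    public static List<string> Split(string source)
    {
        var result = new List<string>();
        int depth = 0;  // current bracket nesting depth
        int start = 0;  // index where the current statement begins

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '[')
            {
                depth++;
            }
            else if (c == ']')
            {
                depth--;
                if (depth < 0)
                    throw new FormatException("Unbalanced ']' at index " + i);
            }
            else if (c == ';' && depth == 0)
            {
                // Include the ';' itself and trim surrounding spaces.
                result.Add(source.Substring(start, i - start + 1).Trim());
                start = i + 1;
            }
        }

        // Keep any trailing text that is not terminated by ';'.
        string rest = source.Substring(start).Trim();
        if (rest.Length > 0)
            result.Add(rest);
        return result;
    }
}
```

For the worst-case input from the question this yields two statements, and the ";" inside [2,8,6,Var1;4] is left alone because it sits at depth 1.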

Allow only one space between words in c#

I want to validate input in a Windows Forms application so that only one space is allowed between words. How can I do this in C#? Thanks in advance.
I don't want to use anything other than C# for this validation. Please help me do this.
if (e.Handled = (e.KeyChar == (char)Keys.Space))
{
MessageBox.Show("Spaces are not allowed at start");
}
}

string str = "words   with  multiple    spaces";
Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
str = regex.Replace(str, @" "); // "words with multiple spaces"
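If the goal is to validate rather than silently fix the text, a check along the following lines might help. It is only a sketch: SpaceValidator, IsSingleSpaced and Normalize are hypothetical names, and the rule assumed here is "no leading or trailing space, and exactly one space between words".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class SpaceValidator
{
    // One or more non-whitespace runs, separated by exactly one space each.
    private static readonly Regex SingleSpaced =
        new Regex(@"\A\S+(?: \S+)*\z");

    // True when the text has no leading, trailing or repeated spaces.
    public static bool IsSingleSpaced(string text)
    {
        return text != null && SingleSpaced.IsMatch(text);
    }

    // Trims the text and collapses runs of spaces into one space.
    public static string Normalize(string text)
    {
        return Regex.Replace(text.Trim(), @" {2,}", " ");
    }
}
```

In a Windows Forms application this could be called from a Validating or TextChanged handler on the text box, for example to show a message when IsSingleSpaced returns false.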

Go through the string and test each character to see whether it is a space. If two spaces appear in a row, make your function fail.
string myString = "My String";
bool valid = true;
for (int i = 1; i < myString.Length; i++)
{
    // Two adjacent spaces mean more than one space between words.
    if (myString[i] == ' ' && myString[i - 1] == ' ')
    {
        valid = false;
        break;
    }
}

regEx to wrap string case insensitive

I am a complete newbie when it comes to Regular Expressions, and was wondering if somebody could help me out. I'm not sure if using a regEx is the correct approach here, so please feel free to chime in if you have a better idea. (I will be looping thru many strings).
Basically, I'd like to find/replace on a string, wrapping the matches with {} and keeping the original case of the string.
Example:
Source: "The CAT sat on the mat."
Find/Replace: "cat"
Result: "The {CAT} sat on the mat."
I would like the find/replace to work on only the first occurrence, and I also need to know whether the find/replace actually matched.
I hope I've explained things clearly enough.
Thank you.

Regex theRegex =
    new Regex("(" + Regex.Escape(FindReplace) + ")", RegexOptions.IgnoreCase);
string result = theRegex.Replace(Source, "{$1}", 1);
If you want word boundary tolerance:
Regex theRegex =
    new Regex(@"([\W_])(" + Regex.Escape(FindReplace) + @")([\W_])", RegexOptions.IgnoreCase);
string result = theRegex.Replace(Source, "$1{$2}$3", 1);
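The question also asks whether the replacement actually matched, which Replace alone does not report. One way to get both pieces of information is to call Match first and build the result by hand. This is a sketch, not part of the answer above, and FirstMatchWrapper and TryWrapFirst are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class FirstMatchWrapper
{
    // Wraps the first case-insensitive occurrence of 'find' in braces.
    // Returns true when a match was found; 'result' keeps the original casing.
    public static bool TryWrapFirst(string source, string find, out string result)
    {
        var regex = new Regex(Regex.Escape(find), RegexOptions.IgnoreCase);
        Match m = regex.Match(source);
        if (!m.Success)
        {
            result = source; // unchanged when nothing matched
            return false;
        }

        result = source.Substring(0, m.Index)
               + "{" + m.Value + "}"
               + source.Substring(m.Index + m.Length);
        return true;
    }
}
```

With the example from the question, TryWrapFirst("The CAT sat on the mat.", "cat", out r) returns true and sets r to "The {CAT} sat on the mat.".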

If you will be looping through many strings, then perhaps Regex might not be the best idea - it's a great tool, but not the fastest.
Here's a sample code that would also work:
var str = "The Cat ate a mouse";
var search = "cat";
var index = str.IndexOf(search, StringComparison.CurrentCultureIgnoreCase);
if (index == -1)
throw new Exception("String not found"); //or do something else in this case here
var newStr = str.Substring(0, index) + "{" + str.Substring(index, search.Length) + "}" + str.Substring(index + search.Length);
EDIT:
As noted in the comments, the above code has some issues.
So I decided to try and find a way to make it work without using Regex. Don't get me wrong, I love Regex as much as the next guy. I did this mostly out of curiosity. ;)
Here's what I came upon:
public static class StringExtendsionsMethods
{
public static int IndexOfUsingBoundary(this String s, String word)
{
var firstLetter = word[0].ToString();
StringBuilder sb = new StringBuilder();
bool previousWasLetterOrDigit = false;
int i = 0;
while (i < s.Length - word.Length + 1)
{
bool wordFound = false;
char c = s[i];
if (c.ToString().Equals(firstLetter, StringComparison.CurrentCultureIgnoreCase))
if (!previousWasLetterOrDigit)
if (s.Substring(i, word.Length).Equals(word, StringComparison.CurrentCultureIgnoreCase))
{
wordFound = true;
bool wholeWordFound = true;
if (s.Length > i + word.Length)
{
if (Char.IsLetterOrDigit(s[i + word.Length]))
wholeWordFound = false;
}
if (wholeWordFound)
return i;
sb.Append(word);
i += word.Length;
}
if (!wordFound)
{
previousWasLetterOrDigit = Char.IsLetterOrDigit(c);
sb.Append(c);
i++;
}
}
return -1;
}
}
But I can't take credit for this! I found this after some Googling here, on StackOverflow and then modified it. ;)
Use this method instead of the standard IndexOf in the above code.

Try this:
class Program
{
const string FindReplace = "cat";
static void Main(string[] args)
{
var input = "The CAT sat on the mat as a cat.";
var result = Regex
.Replace(
input,
"(?<=.*)" + FindReplace + "(?=.*)",
m =>
{
return "{" + m.Value.ToUpper() + "}";
},
RegexOptions.IgnoreCase);
Console.WriteLine(result);
}
}

Remove additional spacing in string [Fastest Way]

I need to remove all additional spaces in a string.
I use regex for matching strings and matched strings i replace with some others.
For better understanding please see examples below:
3 input strings:
Hello, how are you?
Hello , how are you?
Hello , how are you ?
These are three strings that should all match a single regex pattern.
It looks something like this:
Hello\s*,\s+how\s+are\s+you\s*\?
It works fine, but there is a performance problem.
If I have a lot of patterns (~20k) and try to execute each one, it runs very slowly (3-5 minutes).
Maybe there is a better way of doing this?
For example, using some third-party libraries?
UPD: Folks, this question is not about how to do this. It's about how to do it with the best performance. :)
Let me explain in more detail. The main goal is to tokenize text (replace some tokens with special symbols).
For example, I have a token "nice try".
Then I input the text "this is nice try".
Result: "this is #tokenizedtext#", where #tokenizedtext# stands for some special symbols. Which ones doesn't matter in this case.
Next I have the string "Mike said it was a nice try".
The result should be "Mike said it was a #tokenizedtext#".
I think the main idea is clear.
So I can have a lot of tokens. When I process them, I convert each token, e.g. "nice try", to the pattern "nice\s+try" and run a replace with this pattern on the input text.
It works fine. But if the tokens contain more spaces and also punctuation, my regexes become bigger and work very slowly.
Do you have any suggestions (technical or logical) for solving this problem?

I can suggest a few solutions.
First of all, avoid the static Regex method. Create an instance of it (and store it, don't call the constructor for each replacement!) and, if possible, use RegexOptions.Compiled. It should improve your performance.
Second, you can try to review your pattern. I'll do some profiling, but I'm currently undecided between:
@"(?<=\s)\s+"
With the replacement being an empty string, or:
@"\s+"
With a space as the replacement. In the meantime, you can try this code:
var s = "Hello , how are you?";
var pattern = @"\s+";
var regex = new Regex(pattern, RegexOptions.Compiled);
var replaced = regex.Replace(s, " ");
EDIT: After having done some measurement, the second pattern seems to be faster. I'm editing my sample to adapt it.
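To make the "create it once and store it" advice concrete for the tokenizing scenario in the question, here is one possible shape. It is a sketch under assumptions that go beyond this answer: it normalizes whitespace in the input once with a cached Regex and then uses plain string.Replace for each token instead of one regex per token. Tokenizer, Normalize and ReplaceTokens are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class Tokenizer
{
    // Constructed once and reused for every call.
    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Collapses every whitespace run into one space and trims the ends.
    public static string Normalize(string text)
    {
        return Whitespace.Replace(text, " ").Trim();
    }

    // Normalizes the text once, then replaces each (normalized) token
    // with the marker using ordinary string replacement.
    public static string ReplaceTokens(string text, IEnumerable<string> tokens, string marker)
    {
        string normalized = Normalize(text);
        foreach (string token in tokens)
        {
            normalized = normalized.Replace(Normalize(token), marker);
        }
        return normalized;
    }
}
```

This swaps the question's per-token patterns for a different technique, so it has limits: string.Replace is case-sensitive, it also matches inside longer words, and optional spaces before punctuation (such as "you ?") are not handled. Whether it is actually faster for ~20k tokens would need to be measured.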
EDIT 2: I've written an unsafe method. It's much faster than the other ones presented here, including the Regex ones, but, as the word itself says, it's unsafe. I don't think that there's any problem with the code I've written but I may be wrong -- So please, check it again and again in case there's a bug in the method.
static unsafe string TrimInternal(string input)
{
var length = input.Length;
var array = stackalloc char[length];
fixed (char* fix = input)
{
var ptr = fix;
var counter = 0;
var lastWasSpace = false;
while (*ptr != '\x0')
{
//Current char is a space?
var isSpace = *ptr == ' ';
//If it's a space but the last one wasn't
//Or if it's not a space
if (isSpace && !lastWasSpace || !isSpace)
//Write into the result array
array[counter++] = *ptr;
//The last character (before the next loop) was a space
lastWasSpace = isSpace;
//Increase the pointer
ptr++;
}
return new string(array, 0, counter);
}
}
Usage (compile with /unsafe):
var s = TrimInternal("Hello , how are you?");
Profiling made in Release build, optimizations on, 1000000 iterations:
My above solution with Regex: 00:00:03.2130121
The unsafe solution: 00:00:00.2063467
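For comparison, the same single-pass idea can be written without unsafe code by using a StringBuilder. This is a sketch rather than a measured alternative: no timing is claimed for it, and the names SpaceCollapser and CollapseSpaces are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class SpaceCollapser
{
    // Copies the input, skipping every space that directly follows another space.
    public static string CollapseSpaces(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input ?? string.Empty;

        var sb = new StringBuilder(input.Length);
        bool lastWasSpace = false;
        foreach (char c in input)
        {
            bool isSpace = c == ' ';
            // Keep non-spaces always, and a space only if the previous
            // character was not a space.
            if (!isSpace || !lastWasSpace)
                sb.Append(c);
            lastWasSpace = isSpace;
        }
        return sb.ToString();
    }
}
```

Unlike the stackalloc version, this does not depend on the string fitting on the stack or on a terminating null character, at the cost of one heap-allocated buffer.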

This might work for you. It should be pretty fast. Note that it also removes spaces at the end of the string; that might not be what you want...
using System;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello, how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you?"));
Console.WriteLine(">{0}<", RemoveExtraSpaces("Hello , how are you ?"));
}
public static string RemoveExtraSpaces(string text)
{
var buffer = new char[text.Length];
bool isSpaced = false;
int n = 0;
foreach (char c in text)
{
if (c == ' ')
{
isSpaced = true;
}
else
{
if (isSpaced)
{
if ((c != ',') && (c != '?'))
{
buffer[n++] = ' ';
}
isSpaced = false;
}
buffer[n++] = c;
}
}
return new string(buffer, 0, n);
}
}
}

Something of my own:
First, find the positions of all spaces in the string:
private static IEnumerable<int> GetWhiteSpacePos(string input)
{
int iPos = -1;
while ((iPos = input.IndexOf(" ", iPos + 1, StringComparison.Ordinal)) > -1)
{
yield return iPos;
}
}
Then remove the extra spaces in each consecutive run of positions returned from GetWhiteSpacePos:
string original_string = "Hello , how are you ?";
var poss = GetWhiteSpacePos(original_string).ToList();
int startPos;
int endPos;
StringBuilder builder = new StringBuilder(original_string);
// Work from the end so earlier positions stay valid after each removal.
for (int i = poss.Count - 1; i > 0; i--)
{
    endPos = poss[i];
    // Walk back to the first space of this consecutive run.
    while (i > 0 && poss[i] == poss[i - 1] + 1)
    {
        i--;
    }
    startPos = poss[i];
    // Remove all but one space of the run.
    if (endPos > startPos)
    {
        builder.Remove(startPos, endPos - startPos);
    }
}
string new_string = builder.ToString();

You are using a very complex regex. Simplifying it should improve performance.
Use \s+ and replace it with a single space.

Problems like this can be a real nuisance. Use this code; it should give you the result you asked for. This call collapses any run of extra whitespace in the string into a single space.
cleanString = Regex.Replace(originalString, @"\s+", " ");
Hope that works for you. Thanks.
This is a single call, which keeps the code simple. Note that a single statement does not by itself mean less CPU time, so measure it against the alternatives before relying on it for performance.

If it's just a matter of spaces, try this:
Source : http://www.codeproject.com/Articles/10890/Fastest-C-Case-Insenstive-String-Replace
private static string ReplaceEx(string original,
string pattern, string replacement)
{
int count, position0, position1;
count = position0 = position1 = 0;
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
int inc = (original.Length / pattern.Length) *
(replacement.Length - pattern.Length);
char[] chars = new char[original.Length + Math.Max(0, inc)];
while ((position1 = upperString.IndexOf(upperPattern,
position0)) != -1)
{
for (int i = position0; i < position1; ++i)
chars[count++] = original[i];
for (int i = 0; i < replacement.Length; ++i)
chars[count++] = replacement[i];
position0 = position1 + pattern.Length;
}
if (position0 == 0) return original;
for (int i = position0; i < original.Length; ++i)
chars[count++] = original[i];
return new string(chars, 0, count);
}
Usage:
string original_string = "Hello , how are you ?";
// Repeat until no run of two spaces is left.
while (original_string.Contains("  "))
{
    original_string = ReplaceEx(original_string, "  ", " ");
}
Replacing the regex way:
string resultString = null;
try {
resultString = Regex.Replace(subjectString, @"\s+", " ", RegexOptions.Compiled);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
