Consider the following example.
string s = "The man is old. Them is not bad.";
If I use
s = s.Replace("The", "##");
Then it returns "## man is old. ##m is not bad."
But I want the output to be "## man is old. Them is not bad."
How can I do this?
Here's how you'd use a regex, which would handle any word boundaries:
Regex r = new Regex(@"\bThe\b");
s = r.Replace(s, "##");
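If the word to replace is not a fixed literal, the same word-boundary idea can be wrapped in a small helper. This is only a sketch: the class and method names are made up for illustration, and it assumes the word starts and ends with a letter or digit (otherwise `\b` will not behave as a whole-word test).

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordReplaceSketch
{
    // Hypothetical helper: replaces `word` only where it stands alone as a word.
    public static string ReplaceWholeWord(string text, string word, string replacement)
    {
        // Regex.Escape keeps characters such as '.' or '+' inside `word` literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        // A MatchEvaluator inserts the replacement as-is, so '$' in it is not
        // interpreted as a substitution token.
        return Regex.Replace(text, pattern, m => replacement);
    }
}
```

Calling `WholeWordReplaceSketch.ReplaceWholeWord(s, "The", "##")` on the string from the question leaves "Them" untouched.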
I made a comment above asking why the title was changed to assume Regex was to be used.
I personally try to avoid Regex because it's slow. Regex is great for complex string patterns, but if the replacements are simple and you need some performance, I'd try to find a way without it.
I threw together a test that runs a million replacements with Regex and with string methods.
Regex took 26.5 seconds to complete; string methods took 8 seconds.
//Using Regex.
Regex r = new Regex(@"\b[Tt]he\b");
System.Diagnostics.Stopwatch stp = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
string str = "The man is old. The is the Good. Them is the bad.";
str = r.Replace(str, "##");
}
stp.Stop();
Console.WriteLine(stp.Elapsed);
//Using String Methods.
stp = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
string str = "The man is old. The is the Good. Them is the bad.";
//Replace the leading The if the string starts with The.
if (str.StartsWith("The "))
{
str = str.Remove(0, "The ".Length);
str = str.Insert(0, "## ");
}
//Remove references The and the. We can probably
//assume a sentence will not end in the.
str = str.Replace(" The ", " ## ");
str = str.Replace(" the ", " ## ");
}
stp.Stop();
Console.WriteLine(stp.Elapsed);
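Note that the string-method version above only catches the word when it is surrounded by spaces, so "The." or "The?" would be missed. A non-regex whole-word replace could look roughly like the sketch below. The names are hypothetical, the comparison is case-sensitive, and "word character" is assumed to mean letter or digit, which is close to but not identical to what `\b` uses (the regex version also counts the underscore).

```csharp
using System;
using System.Text;

public static class WholeWordScanSketch
{
    // Hypothetical helper: whole-word replace using IndexOf and neighbor checks.
    public static string ReplaceWholeWord(string text, string word, string replacement)
    {
        if (string.IsNullOrEmpty(word)) return text;
        var sb = new StringBuilder(text.Length);
        int pos = 0;
        while (true)
        {
            int hit = text.IndexOf(word, pos, StringComparison.Ordinal);
            if (hit < 0) break;
            int end = hit + word.Length;
            // The hit counts only if it is not glued to a letter or digit on either side.
            bool leftOk = hit == 0 || !char.IsLetterOrDigit(text[hit - 1]);
            bool rightOk = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (leftOk && rightOk)
            {
                sb.Append(text, pos, hit - pos).Append(replacement);
                pos = end;
            }
            else
            {
                // Not a whole word: copy through the first character of the hit and move on.
                sb.Append(text, pos, hit - pos + 1);
                pos = hit + 1;
            }
        }
        sb.Append(text, pos, text.Length - pos);
        return sb.ToString();
    }
}
```

I have not benchmarked this variant, so whether it keeps the speed advantage measured above is an open question.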
s = s.Replace("The ","## ");
C# console Application
static void Main(string[] args)
{
Console.Write("Please input your comment: ");
string str = Console.ReadLine();
string[] str2 = str.Split(' ');
replaceStringWithString(str2);
Console.ReadLine();
}
public static void replaceStringWithString(string[] word)
{
string[] strArry1 = new string[] { "good", "bad", "hate" };
string[] strArry2 = new string[] { "g**d", "b*d", "h**e" };
for (int i = 0; i < word.Length; i++)
{
// Check the current word against every entry in the list.
for (int j = 0; j < strArry1.Length; j++)
{
if (word[i] == strArry1[j])
{
word[i] = strArry2[j];
}
}
// Print each word once, after all replacements have been checked.
Console.Write(word[i] + " ");
}
}
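The two parallel arrays can also be folded into one lookup table. A minimal sketch, assuming case-insensitive matching is wanted (the original compares case-sensitively) and using made-up names:

```csharp
using System;
using System.Collections.Generic;

public static class CensorSketch
{
    // Word -> masked form. OrdinalIgnoreCase is an assumption, not part of the original.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "good", "g**d" },
            { "bad", "b*d" },
            { "hate", "h**e" }
        };

    public static string Censor(string comment)
    {
        string[] words = comment.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            string masked;
            if (Replacements.TryGetValue(words[i], out masked))
                words[i] = masked;
        }
        return string.Join(" ", words);
    }
}
```

Like the original, this only matches words delimited by spaces, so "bad." with a trailing period is left alone.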
I receive series of strings followed by non-negative numbers, e.g. "a3". I have to print on the console each string repeated N times (uppercase) where N is a number in the input. In the example, the result: "AAA". As you see, I have tried to get the numbers from the input and I think it's working fine. Can you help me with the repeating?
string input = Console.ReadLine();
//input = "aSd2&5s#1"
MatchCollection matched = Regex.Matches(input, @"\d+");
List<int> repeatsCount = new List<int>();
foreach (Match match in matched)
{
int repeatCount = int.Parse(match.Value);
repeatsCount.Add(repeatCount);
}
//repeatsCount: [2, 5, 1]
//expected output: ASDASD&&&&&S# ("aSd" is converted to "ASD" and repeated twice;
// "&" is repeated 5 times; "s#" is converted to "S#" and repeated once.)
For example, if we have "aSd2&5s#1":
"aSd" is converted to "ASD" and repeated twice; "&" is repeated 5 times; "s#" is converted to "S#" and repeated once.
Let the pattern include two groups: value to repeat and how many times to repeat:
@"(?<value>[^0-9]+)(?<times>[0-9]+)"
Then we can work with these groups, say, with the help of LINQ:
string source = "aSd2&5s#1";
string result = string.Concat(Regex
.Matches(source, @"(?<value>[^0-9]+)(?<times>[0-9]+)")
.OfType<Match>()
.SelectMany(match => Enumerable // for each match
.Repeat(match.Groups["value"].Value.ToUpper(), // repeat "value"
int.Parse(match.Groups["times"].Value)))); // "times" times
Console.Write(result);
Outcome:
ASDASD&&&&&S#
Edit: Same idea without Linq:
StringBuilder sb = new StringBuilder();
foreach (Match match in Regex.Matches(source, @"(?<value>[^0-9]+)(?<times>[0-9]+)")) {
string value = match.Groups["value"].Value.ToUpper();
int times = int.Parse(match.Groups["times"].Value);
for (int i = 0; i < times; ++i)
sb.Append(value);
}
string result = sb.ToString();
You can extract substring and how often it should be repeated with this regex:
(?<content>.+?)(?<count>\d+)
Now you can use a StringBuilder to create output string. Full code:
var input = "aSd2&5s#1";
var regex = new Regex("(?<content>.+?)(?<count>\\d+)");
var matches = regex.Matches(input).Cast<Match>();
var sb = new StringBuilder();
foreach (var match in matches)
{
var count = int.Parse(match.Groups["count"].Value);
for (var i = 0; i < count; ++i)
sb.Append(match.Groups["content"].Value.ToUpper());
}
Console.WriteLine(sb.ToString());
Output is
ASDASD&&&&&S#
Another solution without LINQ
I tried to keep the solution similar to yours:
string input = "aSd2&5s#1";
var matched = Regex.Matches(input, @"\d+");
var builder = new StringBuilder();
int start = 0; // index where the current text segment begins
foreach (Match match in matched)
{
// The text to duplicate runs from the end of the previous number to the start of this one.
string stringToDuplicate = input.Substring(start, match.Index - start).ToUpper();
int count = int.Parse(match.Value);
for (int i = 0; i < count; i++)
{
builder.Append(stringToDuplicate);
}
start = match.Index + match.Length;
}
And finally Console.WriteLine(builder.ToString());
which prints ASDASD&&&&&S#
My solution is the same as the others, with slight differences:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication107
{
class Program
{
static void Main(string[] args)
{
string input = "aSd2&5s#1";
string pattern1 = @"[a-zA-Z#&]+\d+";
MatchCollection matches = Regex.Matches(input, pattern1);
string output = "";
foreach(Match match in matches.Cast<Match>().ToList())
{
string pattern2 = @"(?'string'[^\d]+)(?'number'\d+)";
Match match2 = Regex.Match(match.Value, pattern2);
int number = int.Parse(match2.Groups["number"].Value);
string str = match2.Groups["string"].Value;
output += string.Join("",Enumerable.Repeat(str.ToUpper(), number));
}
Console.WriteLine(output);
Console.ReadLine();
}
}
}
A very simple program: no LINQ, just plain strings and a for loop.
string input = "aSd2&5s#1";
char[] inputArray = input.ToCharArray();
string output = "";
string ab = "";
foreach (char c in inputArray)
{
int x;
if(int.TryParse(c.ToString(), out x))
{
string sb = "";
ab = ab.ToUpper();
// x is the digit just parsed, i.e. the repeat count.
for(int i=0;i<x;i++)
{
sb += ab;
}
ab = "";
output += sb;
}
else
{
ab += c;
}
}
if(!string.IsNullOrEmpty(ab))
{
output += ab.ToUpper();
}
Console.WriteLine(output);
Hope it helps.
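The loop above reads one digit at a time, so a count such as 12 would be treated as 1 followed by 2. A variant that accumulates consecutive digits into one number might look like this sketch (names are hypothetical; only ASCII digits are treated as counts, and trailing text without a count is emitted once in uppercase, as in the code above):

```csharp
using System;
using System.Text;

public static class RepeatSketch
{
    public static string Expand(string input)
    {
        var output = new StringBuilder();
        var segment = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            if (input[i] >= '0' && input[i] <= '9')
            {
                // Accumulate all consecutive digits into a single repeat count.
                int count = 0;
                while (i < input.Length && input[i] >= '0' && input[i] <= '9')
                {
                    count = count * 10 + (input[i] - '0');
                    i++;
                }
                string upper = segment.ToString().ToUpper();
                for (int n = 0; n < count; n++) output.Append(upper);
                segment.Clear();
            }
            else
            {
                segment.Append(input[i]);
                i++;
            }
        }
        // Text after the last number has no count; keep it once.
        output.Append(segment.ToString().ToUpper());
        return output.ToString();
    }
}
```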
I have a program, in which you can input a string. But I want text between quotes " " to be removed.
Example:
in: Today is a very "nice" and hot day.
out: Today is a very "" and hot day.
Console.WriteLine("Enter text: ");
text = Console.ReadLine();
int letter;
string s = null;
string s2 = null;
for (letter = 0; letter < text.Length; letter++)
{
if (text[letter] != '"')
{
s = s + text[letter];
}
else if (text[letter] == '"')
{
s2 = s2 + letter;
letter++;
while (text[letter] != '"')
{
s2 = s2 + letter;
letter++;
}
}
}
I don't know how to write the string without text between quotes to the console.
I am not allowed to use a complex method like regex.
This should do the trick. It checks every character in the string for quotes.
If it finds a quote, it sets the quotesOpened flag to true, so it will ignore any subsequent characters.
When it encounters another quote, it sets the flag to false, so it resumes copying characters.
Console.WriteLine("Enter text: ");
string text = Console.ReadLine();
int letterIndex;
string s2 = "";
bool quotesOpened = false;
for (letterIndex= 0; letterIndex< text.Length; letterIndex++)
{
if (text[letterIndex] == '"')
{
quotesOpened = !quotesOpened;
s2 = s2 + text[letterIndex];
}
else
{
if (!quotesOpened)
s2 = s2 + text[letterIndex];
}
}
Console.WriteLine(s2);
Hope this helps!
A take without regular expressions, which I like better, but okay:
string input = "abc\"def\"ghi";
string output = input;
int firstQuoteIndex = input.IndexOf("\"");
if (firstQuoteIndex >= 0)
{
int secondQuoteIndex = input.IndexOf("\"", firstQuoteIndex + 1);
if (secondQuoteIndex >= 0)
{
output = input.Substring(0, firstQuoteIndex + 1) + input.Substring(secondQuoteIndex);
}
}
Console.WriteLine(output);
What it does:
It searches for the first occurrence of "
Then it searches for the second occurrence of "
Then it takes the first part, including the first " and the second part, including the second "
You could improve this yourself by searching until the end of the string and replace all occurrences. You have to remember the new 'first index' you have to search on.
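The "search until the end of the string" improvement described above can be sketched as follows. The names are made up, and an unmatched opening quote is assumed to mean "leave the rest of the text as it is":

```csharp
using System;
using System.Text;

public static class QuoteStripSketch
{
    // Empties every "..." pair in the input while keeping the quote characters.
    public static string EmptyAllQuotes(string input)
    {
        var sb = new StringBuilder();
        int pos = 0; // the new 'first index' to search from
        while (true)
        {
            int open = input.IndexOf('"', pos);
            if (open < 0) break;
            int close = input.IndexOf('"', open + 1);
            if (close < 0) break; // unmatched quote: copy the remainder unchanged
            sb.Append(input, pos, open - pos + 1); // text up to and including the opening quote
            sb.Append('"');                        // the closing quote
            pos = close + 1;
        }
        sb.Append(input, pos, input.Length - pos);
        return sb.ToString();
    }
}
```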
string text = @" Today is a very ""nice"" and hot day. Second sentense with ""text"" test";
Regex r = new Regex("\"([^\"]*)\"");
// Replace each quoted section with an empty pair of quotes.
var a = r.Replace(text, "\"\"");
Please try this.
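First split the string on the quote character, then drop the odd-indexed parts (the text inside the quotes):
private static String Remove(String s)
{
var rs = s.Split(new[] { '"' });
// Even-indexed parts lie outside the quotes. Use the element index here:
// IndexOf would return the first equal string and mislabel duplicates.
return String.Join("\"\"", rs.Where((part, index) => index % 2 == 0));
}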
static void Main(string[] args)
{
var test = Remove("hello\"world\"\"yeah\" test \"fhfh\"");
return;
}
This would be a possible solution:
String cmd = "This is a \"Test\".";
// This is a "".
String newCmd = cmd.Split('\"')[0] + "\"\"" + cmd.Split('\"')[2];
Console.WriteLine(newCmd);
Console.Read();
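You simply split the text at " and then join the first and third parts, putting the pair of quotes back in between. Not a very nice solution (it only handles a single quoted section), but it works.
Edit: the split produces these parts:
cmd.Split('\"')[0] = "This is a "
cmd.Split('\"')[1] = "Test"
cmd.Split('\"')[2] = "."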
You can do it like this:
Console.WriteLine("Enter text: ");
var text = Console.ReadLine();
var skipping = false;
var result = string.Empty;
foreach (var c in text)
{
if (!skipping || c == '"') result += c;
if (c == '"') skipping = !skipping;
}
Console.WriteLine(result);
Console.ReadLine();
The result string is built by adding characters from the original string as long as we are not between quotes (tracked by the skipping variable).
Take the indexes of all quotes, then remove the text between quotes using Substring.
static void Main(string[] args)
{
string text = @" Today is a very ""nice"" and hot day. Second sentense with ""text"" test";
var foundIndexes = new List<int>();
foundIndexes.Add(0);
for (int i = 0; i < text.Length; i++)
{
if (text[i] == '"')
foundIndexes.Add(i);
}
string result = "";
for(int i =0; i<foundIndexes.Count; i+=2)
{
int length = 0;
if(i == foundIndexes.Count - 1)
{
length = text.Length - foundIndexes[i];
}
else
{
length = foundIndexes[i + 1] - foundIndexes[i]+1;
}
result += text.Substring(foundIndexes[i], length);
}
Console.WriteLine(result);
Console.ReadKey();
}
Output: Today is a very "" and hot day. Second sentense with "" test
I have the following problem: I have text like this (more than 200 lines):
dsadsadsads(-123|12)sdakodskoakosdakodsadsayxvmyxcmxcym,§§¨§¨§(-43|23)sdadasdas
I want to get the numbers from the text, like this:
-123|12
-43|23
Numbers are always in ( ).
What is a fast way to get these numbers? Is it possible to use a regex? How?
Or a brute-force for loop?
Thank you for your reply.
You can use Regex for this purpose.
string str = "dsadsadsads(-123|12)sdakodskoakosdakodsadsayxvmyxcmxcym,§§¨§¨§(-43|23)sdadasdas";
var matches = Regex.Matches(str, @"\(([-+]?\d+\|[-+]?\d+)\)");
foreach (Match match in matches)
{
// Group 1 holds the text between the parentheses, e.g. -123|12.
Console.WriteLine(match.Groups[1].Value);
}
Performance test
string str = "dsadsadsads(-123|12)sdakodskoakosdakodsadsayxvmyxcmxcym,§§¨§¨§(-43|23)sdadasdas";
StringBuilder bigstr = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
bigstr.Append(str + "\n");
}
str = bigstr.ToString();
Regex regex = new Regex(@"\(([-+]?\d+\|[-+]?\d+)\)");
Stopwatch w = Stopwatch.StartNew();
var matches = regex.Matches(str);
var count = matches.Count;
w.Stop();
Console.WriteLine(w.Elapsed);
Output in my console: about 0.001 seconds.
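For comparison, the "brute force" loop the question mentions could be sketched with IndexOf as below. This is an assumption-laden sketch: the names are made up, and it relies on the statement that numbers are always in ( ) and that nothing else appears in parentheses, because it does not validate what it finds between them.

```csharp
using System;
using System.Collections.Generic;

public static class ParenScanSketch
{
    // Returns the raw text found between each '(' and the next ')'.
    public static List<string> ExtractParenthesized(string text)
    {
        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = text.IndexOf('(', pos);
            if (open < 0) break;
            int close = text.IndexOf(')', open + 1);
            if (close < 0) break;
            results.Add(text.Substring(open + 1, close - open - 1));
            pos = close + 1;
        }
        return results;
    }
}
```

I have not timed this against the regex, so treat any speed difference as something to measure on your own data.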
I have this snippet in my method:
MatchCollection words = Regex.Matches("dog cat fun toy", @"\w\w\w.\w?");
foreach (Match match in words)
{
Console.WriteLine(match);
}
I expected to see something like this:
dog c
cat f
fun t
But program came up with just that:
dog c
fun t
As I understood, it skipped second occurrence because part of it was in previous occurrence. But I still want to see it. How should I correct my snippet?
You may try something like this:
var regX = new Regex(@"\w\w\w.\w?");
string pattern = "dog cat fun toy";
int i = 0;
while (i < pattern.Length)
{
var m = regX.Match(pattern, i);
if (!m.Success) break;
Console.WriteLine(m.Value);
i = m.Index + 1;
}
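Another way to get overlapping matches, as far as I know, is to put the whole pattern inside a lookahead with a capturing group. The lookahead itself consumes no characters, so the engine tries every position, and the captured text is read from group 1. A sketch with made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapSketch
{
    public static List<string> OverlappingMatches(string input)
    {
        var found = new List<string>();
        // (?=( ... )) matches an empty string at each position where the inner
        // pattern would match, and captures that text in group 1.
        foreach (Match m in Regex.Matches(input, @"(?=(\w\w\w.\w?))"))
            found.Add(m.Groups[1].Value);
        return found;
    }
}
```

For "dog cat fun toy" this yields "dog c", "cat f" and "fun t".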
Although it's not a universal solution, it is pertinent to your case; the following snippet can do the job:
string _input = "dog cat fun toy";
string[] _arr = _input.Split(' ');
string _out = String.Empty;
for (int i = 0; i < _arr.Length-1; i++)
{
if (_arr[i].Length == 3) { _out+=_arr[i]+" "+_arr[i+1].Substring(0,1)+";";}
}
where string _out contains all matches separated by ";" (or any other char). Alternatively, you can send the output to Console:
string _input = "dog cat fun toy";
string[] _arr = _input.Split(' ');
for (int i = 0; i < _arr.Length-1; i++)
{
if (_arr[i].Length == 3) {Console.WriteLine(_arr[i]+" "+_arr[i+1].Substring(0,1));}
}
Hope this may help.
This is the code:
StringBuilder sb = new StringBuilder();
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
var words = Regex.Split(textBox1.Text, @"(?=(?<=[^\s])\s+\w)");
for (int i = 0; i < words.Length; i++)
{
words[i] = rgx.Replace(words[i], "");
}
When I do the Regex.Split(), the words also contain strings with extra characters in them, for example:
Daniel>
or
Hello:
or
\r\nNew
or
hello---------------------------
And I need to get only the words, without all the signs.
So I tried to use this loop, but I end up with many entries in words that are ""
and some entries that contain only ------------------------
and I can't use these as strings later in my code.
You don't need a regex to clear non-letters. This will remove every character that is not a Unicode letter.
public string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach(char c in input)
{
if(Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
Alternatively, if you only want to allow Latin letters, you can use this
public string RemoveNonLatinLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach(char c in input)
{
if((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
sb.Append(c);
}
return sb.ToString();
}
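To get a clean word list as the question asks (no empty strings, no entries made only of dashes), the per-character filter can be combined with a whitespace split. A rough sketch with hypothetical names; keeping digits is an assumption taken from the [^a-zA-Z0-9 -] pattern in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordCleanSketch
{
    public static List<string> CleanWords(string text)
    {
        var words = new List<string>();
        // A null separator array splits on any whitespace, including \r and \n.
        foreach (string token in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            var sb = new StringBuilder();
            foreach (char c in token)
            {
                if (char.IsLetterOrDigit(c))
                    sb.Append(c);
            }
            // Tokens that consisted only of signs end up empty and are skipped.
            if (sb.Length > 0)
                words.Add(sb.ToString());
        }
        return words;
    }
}
```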
Benchmark vs Regex
public static string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
static readonly Regex nonUnicodeRx = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters2(string input)
{
return nonUnicodeRx.Replace(input, "");
}
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
//generate guids as input
for (int j = 0; j < 1000; j++)
{
sb.Append(Guid.NewGuid().ToString());
}
string input = sb.ToString();
sw.Start();
for (int i = 0; i < 1000; i++)
{
RemoveNonUnicodeLetters(input);
}
sw.Stop();
Console.WriteLine("SM: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 0; i < 1000; i++)
{
RemoveNonUnicodeLetters2(input);
}
sw.Stop();
Console.WriteLine("RX: " + sw.ElapsedMilliseconds);
}
Output (SM = String Manipulation, RX = Regex)
SM: 581
RX: 9882
SM: 545
RX: 9557
SM: 664
RX: 10196
keyboardP’s solution is decent – do consider it. But as I’ve argued in the comments, regular expressions are actually the correct tool for the job, you’re just making it unnecessarily complicated. The actual solution is a one-liner:
var result = Regex.Replace(input, "\\P{L}", "");
\P{…} specifies a Unicode character class we do not want to match (the opposite of \p{…}). L is the Unicode character class for letters.
Of course it makes sense to encapsulate this into a method, as keyboardP did. To avoid recompiling the regular expression over again, you should also consider pulling the regex creation out of the actual code (although this probably won’t give a big impact on performance):
static readonly Regex nonUnicodeRx = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters(string input) {
return nonUnicodeRx.Replace(input, "");
}
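To make the difference between the two classes concrete, here is a small sketch (class and method names are illustrative only) that applies \P{L} and its opposite \p{L} to the same input:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeClassSketch
{
    // \P{L} matches anything that is NOT a letter, so removing it keeps the letters.
    public static string KeepLetters(string input)
    {
        return Regex.Replace(input, @"\P{L}", "");
    }

    // \p{L} matches letters, so removing it keeps everything else.
    public static string DropLetters(string input)
    {
        return Regex.Replace(input, @"\p{L}", "");
    }
}
```

With the input "a1-b2", KeepLetters gives "ab" and DropLetters gives "1-2". Accented letters count as letters too, so "\u00e9t\u00e9 2024" is reduced to "\u00e9t\u00e9" by KeepLetters.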
To help Konrad and keyboardP resolve their differences, I ran a benchmark test, using their code. It turns out that keyboardP's code is 10x faster than Konrad's code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input = "asdf234!##*advfk234098awfdasdfq9823fna943";
DateTime start = DateTime.Now;
for (int i = 0; i < 100000; i++)
{
RemoveNonUnicodeLetters(input);
}
Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
start = DateTime.Now;
for (int i = 0; i < 100000; i++)
{
RemoveNonUnicodeLetters2(input);
}
Console.WriteLine(DateTime.Now.Subtract(start).TotalSeconds);
}
public static string RemoveNonUnicodeLetters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c))
sb.Append(c);
}
return sb.ToString();
}
public static string RemoveNonUnicodeLetters2(string input)
{
var result = Regex.Replace(input, "\\P{L}", "");
return result;
}
}
}
I got
0.12
1.2
as output
UPDATE:
To see if it is the Regex compilation that is slowing down the Regex method, I put the regex in a static variable that is only constructed once.
static Regex rex = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters2(string input)
{
var result = rex.Replace(input,m => "");
return result;
}
But this had no effect on the runtime.
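If you want to repeat the comparison yourself, a slightly more careful harness might use Stopwatch instead of DateTime.Now and run each method once before timing it. The sketch below is only that, a sketch: the names are invented, and the RegexOptions.Compiled variant is included as something to try, not as a claim that it closes the gap.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class BenchmarkSketch
{
    static readonly Regex Interpreted = new Regex(@"\P{L}");
    static readonly Regex Compiled = new Regex(@"\P{L}", RegexOptions.Compiled);

    public static string WithLoop(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (char.IsLetter(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static string WithRegex(string input)
    {
        return Interpreted.Replace(input, "");
    }

    public static string WithCompiledRegex(string input)
    {
        return Compiled.Replace(input, "");
    }

    // Times `iterations` calls of `action` after one untimed warm-up call.
    public static TimeSpan Time(Func<string, string> action, string input, int iterations)
    {
        action(input); // warm-up so JIT and regex construction are not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action(input);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Usage would be along the lines of `Console.WriteLine(BenchmarkSketch.Time(BenchmarkSketch.WithRegex, input, 100000));` for each of the three methods.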