Multiple occurrences of text in a string

Multiple occurrences of text in a string - c#

How could I use a for loop to go through each iteration of a given phrase in a string? For instance, say I had the following string:
Hey, this is an example string. A string is a collection of characters.
And every time there was an "is", I wanted to assign the three characters after it to a new string. I understand how to do that ONCE, but I'm trying to figure out how a for loop could be used to go through multiple instances of the same word.

If you must use a for-loop for whatever reason, you can replace the relevant part of the code provided by ja72 with:
for (int i = 0; i < text.Length; i++)
{
if (text[i] == 'i' && text[i+1] == 's')
sb.Append(text.Substring(i + 2, 3));
}
Unfortunately, I don't have enough reputation to add this as a comment here, hence posting it as an answer!

Is this what you want?
static void Main(string[] args)
{
string text=#"Hey, this is an example string. A string is a collection of characters.";
StringBuilder sb=new StringBuilder();
int i=-1;
while ((i=text.IndexOf("is", i+1))>=0)
{
sb.Append(text.Substring(i+2, 3));
}
string result=sb.ToString();
}
//result " is an a "

You can use a regex like this:
Regex re = new Regex("(?:is)(.{3})");
This regex looks for is (?:is), and takes the next three characters (.{3})
Then you use the regex to find all matches: Regex.Matches(). This will return a match for each is found in the string, followed by 3 characters. Each match has two groups:
Group 0: that includes is and the next three characters
Group 1: which includes the next thress characters
Matches matches = re.Matches("Hey, this is an example string. A string is a collection of characters.");
StringBuilder sb = new StringBuilder();
foreach (Match m in matches)
{
sb.Append(m.Groups1.Value);
}
Using Regex is much faster than looping through the characters of the string. Even more if you use RegexOptions.Compiled in your regex constructor: Regex Constructor (String, RegexOptions)

Related

Find pattern to solve regex in one step

I have a problem to find the pattern that solves the problem in onestep.
The string looks like this:
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$Text5$Text6 etc.
What i want to get is: Take up to 4x Text. If there are more than "4xText" take only the last sign.
Example:
Text1$Text2$Text3$Text4$Text5$Text6 -> Text1$Text2$Text3$Text4&56
My current solution is:
First pattern:
^([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?([^\$]*)\$?
After this i will do a substitution with the first pattern
New string: Text5$Text6
second pattern is:
([^\$])\b
result: 56
combine both and get the result:
Text1$Text2$Text3$Text4$56
For me it is not clear why i cant easily put the second pattern after the first pattern into one pattern. Is there something like an anchor that tells the engine to start the pattern from here like it would do if is would be the only pattern ?

You might use an alternation with a positive lookbehind and then concatenate the matches.
(?<=^(?:[^$]+\$){0,3})[^$]+\$?|[^$](?=\$|$)
Explanation
(?<= Positive lookbehind, assert what is on the left is
^(?:[^$]+\$){0,3} Match 0-3 times any char except $ followed by an optional $
) Close lookbehind
[^$]+\$? Match 1+ times any char except $, then match an optional $
| Or
[^$] Match any char except $
(?=\$|$) Positive lookahead, assert what is directly to the right is either $ or the end of the string
.NET regex demo | C# demo
Example
string pattern = #"(?<=^(?:[^$]*\$){0,3})[^$]*\$?|[^$](?=\$|$)";
string[] strings = {
"Text1",
"Text1$Text2$Text3",
"Text1$Text2$Text3$Text4$Text5$Text6"
};
Regex regex = new Regex(pattern);
foreach (String s in strings) {
Console.WriteLine(string.Join("", from Match match in regex.Matches(s) select match.Value));
}
Output
Text1
Text1$Text2$Text3
Text1$Text2$Text3$Text4$56

I strongly believe regular expression isn't the way to do that. Mostly because of the readability.
You may consider using simple algorithm like this one to reach your goal:
using System;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var result = "";
for(var i=0; i<parts.Length; i++){
result += (i <= 4 ? parts[i] + "$" : parts[i].Substring(4));
}
Console.WriteLine(result);
}
}
There are also linq alternatives :
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var input = "Text1$Text2$Text3$Text4$Text5$Text6";
var parts = input.Split('$');
var first4 = parts.Take(4);
var remainings = parts.Skip(4);
var result2 = string.Join("$", first4) + "$" + string.Join("", remainings.Select( r=>r.Substring(4)));
Console.WriteLine(result2);
}
}
It has to be adjusted to the actual needs but the idea is there

Try this code:
var texts = new string[] {"Text1", "Text1$Text2$Text3", "Text1$Text2$Text3$Text4$Text5$Text6" };
var parsed = texts
.Select(s => Regex.Replace(s,
#"(Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)",
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
)).ToArray();
// parsed is now: string[3] { "Text1$", "Text1$Text2$Text3$", "Text1$Text2$Text3$Text4$56" }
Explanation:
solution uses regex pattern: (Text\d{1,3}(?:\$Text\d{1,3}){0,3})((?:\$Text\d{1,3})*)
(...) - first capturing group
(?:...) - non-capturing group
Text\d{1,3}(?:\$Text\d{1,3} - match Text literally, then match \d{1,3}, which is 1 up to three digits, \$ matches $ literally
Rest is just repetition of it. Basically, first group captures first four pieces, second group captures the rest, if any.
We also use MatchEvaluator here which is delegate type defined as:
public delegate string MatchEvaluator(Match match);
We define such method:
(match) => match.Groups[1].Value +"$"+ match.Groups[2].Value.Replace("Text", "").Replace("$", "")
We use it to evaluate match, so takee first capturing group and concatenate with second, removing unnecessary text.

It's not clear to me whether your goal can be achieved using exclusively regex. If nothing else, the fact that you want to introduce a new character '&' into the output adds to the challenge, since just plain matching would never be able to accomplish that. Possibly using the Replace() method? I'm not sure that would work though...using only a replacement pattern and not a MatchEvaluator, I don't see a way to recognize but still exclude the "$Text" portion from the fifth instance and later.
But, if you are willing to mix regex with a small amount of post-processing, you can definitely do it:
static readonly Regex regex1 = new Regex(#"(Text\d(?:\$Text\d){0,3})(?:\$Text(\d))*", RegexOptions.Compiled);
static void Main(string[] args)
{
for (int i = 1; i <= 6; i++)
{
string text = string.Join("$", Enumerable.Range(1, i).Select(j => $"Text{j}"));
WriteLine(KeepFour(text));
}
}
private static string KeepFour(string text)
{
Match match = regex1.Match(text);
if (!match.Success)
{
return "[NO MATCH]";
}
StringBuilder result = new StringBuilder();
result.Append(match.Groups[1].Value);
if (match.Groups[2].Captures.Count > 0)
{
result.Append("&");
// Have to iterate (join), because we don't want the whole match,
// just the captured text.
result.Append(JoinCaptures(match.Groups[2]));
}
return result.ToString();
}
private static string JoinCaptures(Group group)
{
return string.Join("", group.Captures.Cast<Capture>().Select(c => c.Value));
}
The above breaks your requirement into three different capture groups in a regex. Then it extracts the captured text, composing the result based on the results.

My regex is not matching if a line break is found

I have a large string separated by line breaks.
Example:
This is my first sentence and here i will search for the word my
This is my second sentence
Using the code below, if I search for 'my' it will only return the 2 instances of 'my' from the first sentence and not the second.
I wish to display the sentence the phrase is found in - which works fine but its just that it does not search anything after the first line break if found.
Code;
var regex = new Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Singleline));
var results = regex.Matches(largeStringInHere);
for (int i = 0; i < results.Count; i++)
{
searchCriteriaFound.Append((results[i].Value.Trim()));
searchCriteriaFound.Append(Environment.NewLine);
}
Code Edit:
string pattern = #".*(" + userSearchCriteraInHere + ")+.*";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(largeStringInHere, pattern, options))
{
searchCriteriaFound.Append(m.Value);
}

var userSearchCriteraInHere = "my";
var largeStringInHere = #"This is my first sentence and here i will search for the word my.
This is my second sentence.";
var regex = new Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere), RegexOptions.Singleline);
var results = regex.Matches(largeStringInHere);
Console.WriteLine(results.Count);
var searchCriteriaFound = new StringBuilder();
for (int i = 0; i < results.Count; i++)
{
searchCriteriaFound.Append((results[i].Value.Trim()));
searchCriteriaFound.Append(Environment.NewLine);
}
Console.Write(searchCriteriaFound.ToString());
This returns the following output:
2
This is my first sentence and here i will search for the word my.
This is my second sentence.
I did need to add periods at the end of your sentences, as your regex expects them.

Is there a particular reason not to just search for the word "my" multiple times in the following way:
(my)+
You can test it over at the following URL on Regex101: https://regex101.com/r/QIHWKf/1
If you want to match the whole sentence that has "my" you can use the following:
.*(my)+.*
https://regex101.com/r/QIHWKf/2
Here your full match is the whole sentence, and your first group match is the "my".

Change
Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Singleline)
To
Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Multiline)
This changes the meaning of the symbols ^ and $ to be at the beginning/end of a line, rather than the entire string.

You could use a word boundary \b to prevent it from being part of a larger match like for example mystery and change the option to RegexOptions.Multiline instead of RegexOptions.Singleline to let ^ and $ match the end of the line.
^.*\bmy\b.*$
Regex demo
Test

To get all lines containing 'my' word, you can try this:
Code
static string GetSentencesContainMyWord(StreamReader file)
{
int counter = 0;
string line;
var sb = new StringBuilder();
while ((line = file.ReadLine()) != null)
{
if (line.Contains("my"))
sb.Append(line + Environment.NewLine);
counter++;
}
return sb.ToString();
}

.NET Regex - get parts of string that do not match pattern

I have this string
TEST_TEXT_ONE_20112017
I want to eliminate _20112017, which is a underscore with numbers, those numbers can vary; my goal is to have only
TEST_TEXT_ONE
So far I have this but I get the entire string, is there something I'm missing?
Regex r = new Regex(#"\b\w+[0-9]+\b");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach(Match word in words)
{
string w = word.Groups[0].Value;
//I still get the entire string
}

Notes for your consideration:
You should use parenthesis to mark groups for capture -or- use named group. The first group (index=0) is the entire match. you probably want index=1 instead.
\w stands for word character and it already includes both underscore and digits. If you want to match anything before the numbers then you should consider using . instead of \w.
by default +is greedy and your \w+ will consume your last undescore and all but the very last number as well. You probably want to explicitly require an underscore before last block of numbers.
I would suggest considering if you want to find a matching substring or the entire string to match. if the latter, then consider using the start and end markers: ^ and $.
if you know you want to eliminate 8 digits, then you could giving explicit count like \d{8}
For example this should work:
Regex r = new Regex(#"^(.+)_\d+$");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach (Match word in words)
{
string w = word.Groups[1].Value;
}
Alternative
Use a Zero-Width Positive Lookahead Assertions construct to check what comes next without capturing it. This uses the syntax on (?=stuff). So you could use a shorter code and avoid surfing in Groups altogether:
Regex r = new Regex(#"^.+(?=_\d+$)");
String result = r.Match("TEST_TEXT_ONE_20112017").Value;
Note that we require the end marker $ within the positive lookahead group.

Regex r = new Regex(#"(\b.+)_([0-9]+)\b");
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[1].Value; //TEST_TEXT_ONE
or:
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[2].Value; //20112017

This seems a bit overkill for Regex in my opinion. As an alternative you could just split on the _ character and rebuild the string:
private static string RemoveDate(string input)
{
string[] parts = input.Split('_');
return string.Join("_", parts.Take(parts.Length - 1));
}
Or if the date suffix is always the same length, you could also just substring:
private static string RemoveDateFixedLength(string input)
{
//Removes last 9 characters (8 for date, 1 for underscore)
return input.Substring(0, input.Length - 9);
}
However I feel like the first approach is better, this is just another option.
Fiddle here

match first digits before # symbol

How to match all first digits before # in this line
26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html
Im trying to get this number 26909578
My try
string text = #"26909578#Sbrntrl_7x06-lilla.avi#356028416#2012-10-24 09:06#0#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#[URL=http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html]http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html[/URL]#http://bitshare.com/files/dvk9o1oz/Sbrntrl_7x06-lilla.avi.html#http://bitshare.com/?f=dvk9o1oz#http://bitshare.com/delete/dvk9o1oz/4511e6f3612961f961a761adcb7e40a0/Sbrntrl_7x06-lilla.avi.html";
MatchCollection m1 = Regex.Matches(text, #"(.+?)#", RegexOptions.Singleline);
but then its outputs all text

Make it explicit that it has to start at the beginning of the string:
#"^(.+?)#"
Alternatively, if you know that this will always be a number, restrict the possible characters to digits:
#"^\d+"
Alternatively use the function Match instead of Matches. Matches explicitly says, "give me all the matches", while Match will only return the first one.

Or, in a trivial case like this, you might also consider a non-RegEx approach. The IndexOf() method will locate the '#' and you could easily strip off what came before.
I even wrote a sscanf() replacement for C#, which you can see in my article A sscanf() Replacement for .NET.

If you dont want to/dont like to use regex, use a string builder and just loop until you hit the #.
so like this
StringBuilder sb = new StringBuilder();
string yourdata = "yourdata";
int i = 0;
while(yourdata[i]!='#')
{
sb.Append(yourdata[i]);
i++;
}
//when you get to that # your stringbuilder will have the number you want in it so return it with .toString();
string answer = sb.toString();

The entire string (except the final url) is composed of segments that can be matched by (.+?)#, so you will get several matches. Retrieve only the first match from the collection returned by matching .+?(?=#)

Search string pattern

If I have a string like MCCORMIC 3H R Final 08-26-2011.dwg or even MCCORMIC SMITH 2N L Final 08-26-2011.dwg and I wanted to capture the R in the first string or the L in the second string in a variable, what is the best method for doing so? I was thinking about trying the below statement but it does not work.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg"
string WhichArea = "";
int WhichIndex = 0;
WhichIndex = filename.IndexOf("Final");
WhichArea = filename.Substring(WhichIndex - 1,1); //Trying to get the R in front of word Final

Just split by space:
var parts = filename.Split(new [] {' '},
StringSplitOptions.RemoveEmptyEntries);
WhichArea = parts[parts.Length - 3];
It looks like the file names have a very specific format, so this will work just fine.
Even with any number of spaces, using StringSplitOptions.RemoveEmptyEntries means spaces will not be part of the split result set.
Code updated to deal with both examples - thanks Nikola.

I had to do something similar, but with Mirostation drawings instead of Autocad. I used regex in my case. Here's what I did, just in case you feel like making it more complex.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string filename2 = "MCCORMIC SMITH 2N L Final 08-26-2011.dwg";
Console.WriteLine(TheMatch(filename));
Console.WriteLine(TheMatch(filename2));
public string TheMatch(string filename) {
Regex reg = new Regex(#"[A-Za-z0-9]*\s*([A-Z])\s*Final .*\.dwg");
Match match = reg.Match(filename);
if(match.Success) {
return match.Groups[1].Value;
}
return String.Empty;
}

I don't think Oded's answer covers all cases. The first example has two words before the wanted letter, and the second one has three words before it.
My opinion is that the best way to get this letter is by using RegEx, assuming that the word Final always comes after the letter itself, separated by any number of spaces.
Here's the RegEx code:
using System.Text.RegularExpressions;
private string GetLetter(string fileName)
{
string pattern = "\S(?=\s*?Final)";
Match match = Regex.Match(fileName, pattern);
return match.Value;
}
And here's the explanation of RegEx pattern:
\S(?=\s*?Final)
\S // Anything other than whitespace
(?=\s*?Final) // Positive look-ahead
\s*? // Whitespace, unlimited number of repetitions, as few as possible.
Final // Exact text.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Multiple occurrences of text in a string - c#

Related

Find pattern to solve regex in one step

My regex is not matching if a line break is found

.NET Regex - get parts of string that do not match pattern

match first digits before # symbol

Search string pattern

Categories

Resources