I have a large string separated by line breaks.
Example:
This is my first sentence and here i will search for the word my
This is my second sentence
Using the code below, if I search for 'my' it will only return the 2 instances of 'my' from the first sentence and not the second.
I want to display the sentence the phrase is found in. That works fine, except that nothing after the first line break is searched.
Code:
var regex = new Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Singleline));
var results = regex.Matches(largeStringInHere);
for (int i = 0; i < results.Count; i++)
{
searchCriteriaFound.Append((results[i].Value.Trim()));
searchCriteriaFound.Append(Environment.NewLine);
}
Code Edit:
string pattern = @".*(" + userSearchCriteraInHere + ")+.*";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(largeStringInHere, pattern, options))
{
searchCriteriaFound.Append(m.Value);
}
var userSearchCriteraInHere = "my";
var largeStringInHere = @"This is my first sentence and here i will search for the word my.
This is my second sentence.";
var regex = new Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere), RegexOptions.Singleline);
var results = regex.Matches(largeStringInHere);
Console.WriteLine(results.Count);
var searchCriteriaFound = new StringBuilder();
for (int i = 0; i < results.Count; i++)
{
searchCriteriaFound.Append((results[i].Value.Trim()));
searchCriteriaFound.Append(Environment.NewLine);
}
Console.Write(searchCriteriaFound.ToString());
This returns the following output:
2
This is my first sentence and here i will search for the word my.
This is my second sentence.
I did need to add periods at the end of your sentences, as your regex expects them.
Is there a particular reason not to just search for the word "my" multiple times in the following way:
(my)+
You can test it over at the following URL on Regex101: https://regex101.com/r/QIHWKf/1
If you want to match the whole sentence that has "my" you can use the following:
.*(my)+.*
https://regex101.com/r/QIHWKf/2
Here your full match is the whole sentence, and your first group match is the "my".
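As a rough C# sketch of the full-match versus group distinction described above: the class and method names (LineSearch, LinesContaining) are made up for illustration, and the use of Regex.Escape is an added assumption so that a user-supplied term containing characters such as '.' or '(' is treated literally.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineSearch
{
    // Hypothetical helper: returns every line of `text` that contains `term`.
    public static List<string> LinesContaining(string text, string term)
    {
        // Without RegexOptions.Singleline, '.' does not match '\n',
        // so each match stays within a single line.
        string pattern = ".*(" + Regex.Escape(term) + ")+.*";
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // m.Value is the full match (the whole line);
            // m.Groups[1].Value would be the search term itself.
            // '.' does match '\r', so trim it for Windows line endings.
            lines.Add(m.Value.TrimEnd('\r'));
        }
        return lines;
    }
}
```

Calling LineSearch.LinesContaining(largeStringInHere, "my") should then give one entry per matching line. Note that this is a substring search, so a line containing only "mystery" would also be returned.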
Change
Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Singleline)
To
Regex(string.Format("[^.!?;]*({0})[^.?!;]*[.?!;]", userSearchCriteraInHere, RegexOptions.Multiline)
This changes the meaning of the symbols ^ and $ to be at the beginning/end of a line, rather than the entire string.
You could use a word boundary, \b, to keep the search term from matching inside a longer word such as mystery. Also change the option from RegexOptions.Singleline to RegexOptions.Multiline so that ^ and $ match at the start and end of each line rather than of the whole string:
^.*\bmy\b.*$
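A minimal C# sketch of how that pattern might be applied. The names WholeWordLineSearch and LinesWithWord are hypothetical, and the sketch assumes the search word consists of word characters only, since \b behaves differently next to punctuation.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WholeWordLineSearch
{
    // Hypothetical helper: returns the lines that contain `word` as a whole word.
    public static List<string> LinesWithWord(string text, string word)
    {
        string pattern = @"^.*\b" + Regex.Escape(word) + @"\b.*$";
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Multiline))
        {
            // In .NET, '$' in Multiline mode matches just before '\n', and '.'
            // matches '\r', so a trailing '\r' from "\r\n" is trimmed here.
            lines.Add(m.Value.TrimEnd('\r'));
        }
        return lines;
    }
}
```

With this, a line such as "A mystery here." is skipped, because the "my" in "mystery" is not followed by a word boundary.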
To get all lines containing the word 'my', you can try this:
Code
static string GetSentencesContainMyWord(StreamReader file)
{
    string line;
    var sb = new StringBuilder();
    // Read the input one line at a time and keep the lines that contain "my".
    while ((line = file.ReadLine()) != null)
    {
        if (line.Contains("my"))
            sb.AppendLine(line);
    }
    return sb.ToString();
}
Related
I have the following function:
public static string ReturnEmailAddresses(string input)
{
string regex1 = @"\[url=";
string regex2 = @"mailto:([^\?]*)";
string regex3 = @".*?";
string regex4 = @"\[\/url\]";
Regex r = new Regex(regex1 + regex2 + regex3 + regex4, RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection m = r.Matches(input);
if (m.Count > 0)
{
StringBuilder sb = new StringBuilder();
int i = 0;
foreach (var match in m)
{
if (i > 0)
sb.Append(Environment.NewLine);
string shtml = match.ToString();
var innerString = shtml.Substring(shtml.IndexOf("]") + 1, shtml.IndexOf("[/url]") - shtml.IndexOf("]") - 1);
sb.Append(innerString); //just titles
i++;
}
return sb.ToString();
}
return string.Empty;
}
As you can see I define a url in the "markdown" format:
[url = http://sample.com]sample.com[/url]
In the same way, emails are written in that format too:
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url]
However, when I pass in a multiline string with multiple email addresses, it returns only the first one. I would like it to return multiple matches, but I cannot seem to get that working.
For example
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] /r/n a whole bunch of text here /r/n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]
This returns only the first email above.
The mailto:([^\?]*) part of your pattern is matching everything in your input string. You need to add the closing bracket ] to the inside of your excluded characters to restrict that portion from overflowing outside of the "mailto" section and into the text within the "url" tags:
\[url=mailto:([^\?\]]*).*?\[\/url\]
See this link for an example: https://regex101.com/r/zcgeW8/1
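Here is a small C# sketch of the corrected pattern in use. The names BbCodeEmails and ExtractEmails are invented for illustration, and unlike the original function this version returns the address captured from the tag (group 1) rather than the text between the tags.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BbCodeEmails
{
    // Hypothetical helper: returns the address from every [url=mailto:...] tag.
    public static List<string> ExtractEmails(string input)
    {
        // [^\?\]]* stops at the first '?' or ']', so the capture cannot run
        // past the end of the opening tag into the rest of the input.
        var regex = new Regex(@"\[url=mailto:([^\?\]]*).*?\[\/url\]", RegexOptions.IgnoreCase);
        var emails = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            emails.Add(m.Groups[1].Value);
        }
        return emails;
    }
}
```

Because each match now ends at its own [/url], a string with several tags yields one match per tag.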
You can extract desired result with help of positive lookahead and positive lookbehind. See http://www.rexegg.com/regex-lookarounds.html
Try regex: (?<=\[url=mailto:).*?(?=\])
Above regex will capture two email addresses from sample string
[url=mailto:service@paypal.com.au]service@paypal.com.au[/url] /r/n a whole bunch of text here /r/n more stuff here [url=mailto:anotheremail@paypal.com.au]anotheremail@paypal.com.au[/url]
Result:
service@paypal.com.au
anotheremail@paypal.com.au
I have a string in my C# code:
The.Big.Bang.Theory.(2013).S07E05.Release.mp4
I need to find an occurrence of (2013) and replace the whole thing, including the brackets, with ___ (three underscores). So the output would be:
The.Big.Bang.Theory.___.S07E05.Release.mp4
Is there a regex that can do this? Or is there a better method?
I then do some processing on the new string - but later, need to report that '(2013)' was removed .. so I need to store the value that is replaced.
I tried this with your string, and it works:
string pattern = @"\(\d{4}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var m = Regex.Replace(search, pattern, "___");
Console.WriteLine(m);
This will find any four-digit number enclosed in opening and closing brackets.
If the year can change, I think Regex is the best approach.
This code, by contrast, tells you whether there is a match for your pattern and what it was:
var k = Regex.Matches(search, pattern);
if(k.Count > 0)
Console.WriteLine(k[0].Value);
Many of these answers forgot the original question in that you wanted to know what you are replacing.
string pattern = @"\((19|20)\d{2}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
string replaced = Regex.Match(search, pattern).Captures[0].ToString();
string output = Regex.Replace(search, pattern, "___");
Console.WriteLine("found: {0} output: {1}",replaced,output);
gives you the output
found: (2013) output: The.Big.Bang.Theory.___.S07E05.Release.mp4
Here is an explanation of my pattern too.
\( -- match the (
(19|20) -- match the numbers 19 or 20. I assume this is a date for TV shows or movies from 1900 to now.
\d{2} -- match 2 more digits
\) -- match )
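One possible way to replace the value and record it in a single pass is the Regex.Replace overload that takes a MatchEvaluator. This is only a sketch using the pattern explained above; the names YearTagReplacer and ReplaceYear, and the idea of collecting the removed values in a caller-supplied list, are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearTagReplacer
{
    // Hypothetical helper: replaces each "(19xx)" or "(20xx)" tag with "___"
    // and appends the text that was removed to `removed`.
    public static string ReplaceYear(string fileName, List<string> removed)
    {
        return Regex.Replace(fileName, @"\((19|20)\d{2}\)", m =>
        {
            removed.Add(m.Value); // e.g. "(2013)", brackets included
            return "___";
        });
    }
}
```

The evaluator runs once per match, so the list ends up holding every replaced value in order, and the input string is scanned only once.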
Here is a working snippet from a console application, note the regex \(\d{4}\):
var r = new System.Text.RegularExpressions.Regex(@"\(\d{4}\)");
var s = r.Replace("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", "___");
Console.WriteLine(s);
and the output from the console application:
The.Big.Bang.Theory.___.S07E05.Release.mp4
and you can reference this Rubular for proof.
Below is a modified solution taking into consideration your additional requirement:
var m = r.Match("The.Big.Bang.Theory.(2013).S07E05.Release.mp4");
if (m.Success)
{
var s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4".Replace(m.Value, "___");
var valueReplaced = m.Value;
}
Try this:
string s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var info = Regex.Split(
Regex.Matches(s, @"\(.*?\)")
.Cast<Match>().First().ToString(), #"[\s,]+");
s = s.Replace(info[0], "___");
Result
The.Big.Bang.Theory.___.S07E05.Release.mp4
Try this:
string str = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var matches = Regex.Matches(str, @"\([0-9]{4}\)");
List<string> removed = new List<string>();
// Remember every "(yyyy)" value before it is replaced.
for (int i = 0; i < matches.Count; i++)
{
    removed.Add(matches[i].Value);
}
str = Regex.Replace(str, @"\([0-9]{4}\)", "___");
Console.WriteLine("Removed Strings are:");
foreach (string s in removed)
{
    Console.WriteLine(s);
}
output:
Removed Strings are:
(2013)
You don't need a regex for a simple replace (you can use one, but it's not needed):
var name = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var replacedName = name.Replace("(2013)", "___");
How could I use a for loop to go through each iteration of a given phrase in a string? For instance, say I had the following string:
Hey, this is an example string. A string is a collection of characters.
And every time there was an "is", I wanted to assign the three characters after it to a new string. I understand how to do that ONCE, but I'm trying to figure out how a for loop could be used to go through multiple instances of the same word.
If you must use a for-loop for whatever reason, you can replace the relevant part of the code provided by ja72 with:
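// Stop early enough that text[i + 1] and the three characters after "is" are always in range.
for (int i = 0; i + 5 <= text.Length; i++)
{
    if (text[i] == 'i' && text[i + 1] == 's')
        sb.Append(text.Substring(i + 2, 3));
}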
Unfortunately, I don't have enough reputation to add this as a comment here, hence posting it as an answer!
Is this what you want?
static void Main(string[] args)
{
string text=@"Hey, this is an example string. A string is a collection of characters.";
StringBuilder sb=new StringBuilder();
int i=-1;
while ((i=text.IndexOf("is", i+1))>=0)
{
sb.Append(text.Substring(i+2, 3));
}
string result=sb.ToString();
}
//result " is an a "
You can use a regex like this:
Regex re = new Regex("(?:is)(.{3})");
This regex looks for is with (?:is) and captures the next three characters with (.{3}).
Then you use the regex to find all matches with Regex.Matches(). This returns one match for each is that is followed by three characters. Each match has two groups:
Group 0: is plus the next three characters
Group 1: just the next three characters
MatchCollection matches = re.Matches("Hey, this is an example string. A string is a collection of characters.");
StringBuilder sb = new StringBuilder();
foreach (Match m in matches)
{
    sb.Append(m.Groups[1].Value);
}
A regex is often more concise than looping through the characters of the string by hand. If the same pattern is used many times, passing RegexOptions.Compiled to the constructor, Regex(String, RegexOptions), can speed up matching at the cost of a slower start-up.
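One detail worth sketching: (?:is)(.{3}) consumes the three characters it captures, so an is that falls inside those characters (as in "this is") is not matched separately. A possible variant, shown here as an assumption rather than as part of the answer above, captures the three characters inside a lookahead so they are not consumed. The names AfterWord and ThreeCharsAfterEach are hypothetical.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class AfterWord
{
    // Hypothetical helper: concatenates the three characters that follow
    // every occurrence of `word`, including overlapping cases.
    public static string ThreeCharsAfterEach(string text, string word)
    {
        // The lookahead (?=(.{3})) captures without consuming, so the next
        // search starts right after `word` instead of three characters later.
        var re = new Regex(Regex.Escape(word) + "(?=(.{3}))");
        var sb = new StringBuilder();
        foreach (Match m in re.Matches(text))
        {
            sb.Append(m.Groups[1].Value);
        }
        return sb.ToString();
    }
}
```

For the example sentence this gives the same result as the IndexOf loop, " is an a ".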
I need to replace a word that starts with %.
For example Welcome to home | %brand %productName
I am hoping to split on words beginning with %, which would give me { brand, productName }.
My regex skills are below average, so I would appreciate help with this.
The following code might help you:
string[] splits = "Welcome to home | %brand %productName".Split(' ');
List<string> lstdata = new List<string>();
for (int i = 0; i < splits.Length; i++)
{
    // Keep only the words that start with '%', without the '%' itself.
    if (splits[i].StartsWith("%"))
        lstdata.Add(splits[i].Replace("%", ""));
}
Nothing wrong with the string.Split approach, mind you, but here's a regex approach:
string input = @"Welcome to home | %brand %productName";
string pattern = @"%\S+";
var matches = Regex.Matches(input, pattern);
string result = string.Empty;
for (int i = 0; i < matches.Count; i++)
{
result += "match " + i + ",value:" + matches[i].Value + "\n";
}
Console.WriteLine(result);
Try this:
(?<=%)\w+
This looks for any combination of word characters immediately preceded by a percent symbol.
Now, if you're doing search and replace on these matches, you'll probably want to remove the % sign as well, so you'd need to remove the lookbehind group and just have this:
%\w+
But in doing so, your replacement code would need to trim off the % sign from each match to get the word by itself.
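To make the two variants concrete, here is a hedged C# sketch. The class Placeholders, its methods, and the dictionary of replacement values are all invented for illustration. Names uses the lookbehind form to list the words, and Fill uses %(\w+) with a capture group, which is one way to avoid trimming the % by hand.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // Lists the placeholder names, without the leading '%'.
    public static List<string> Names(string template)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(template, @"(?<=%)\w+"))
        {
            names.Add(m.Value);
        }
        return names;
    }

    // Replaces each %name with its value; unknown names are left untouched.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"%(\w+)", m =>
        {
            string value;
            // Group 1 is the word without '%'; m.Value still includes it.
            return values.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
        });
    }
}
```

For the sample input, Names should return brand and productName, and Fill substitutes whatever values the caller supplies for them.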
I need a regex for PropertyName e.g. HelloWorld2HowAreYou would get:
Hello HelloWorld2 HelloWorld2How etc.
I want to use it in C#
[A-Z][a-z0-9]+ would give you all words that start with a capital letter. You can write code to concatenate them one by one to get the complete set of words.
For example matching [A-Z][a-z0-9]+ against HelloWorld2HowAreYou with global flag set, you will get the following matches.
Hello
World2
How
Are
You
Just iterate through the matches and concatenate them to form the words.
The JavaScript below shows the idea; port this to C#:
var s = "HelloWorld2HowAreYou";
var r = /[A-Z][a-z0-9]+/g;
var m;
var matches = [];
while((m = r.exec(s)) != null)
matches.push(m[0]);
var o = "";
for(var i = 0; i < matches.length; i++)
{
o += matches[i]
console.log(o + "\n");
}
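A possible C# port of the loop above, as a sketch: the names PropertyNamePrefixes and Prefixes are hypothetical. As with the JavaScript, a word consisting of a single capital letter would not match [A-Z][a-z0-9]+ and would be skipped.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PropertyNamePrefixes
{
    // Hypothetical helper: returns the growing prefixes of a PascalCase name.
    public static List<string> Prefixes(string name)
    {
        var prefixes = new List<string>();
        var current = new StringBuilder();
        foreach (Match m in Regex.Matches(name, "[A-Z][a-z0-9]+"))
        {
            // Append the next word and record the prefix built so far.
            current.Append(m.Value);
            prefixes.Add(current.ToString());
        }
        return prefixes;
    }
}
```

Regex.Matches already returns every match, so no global flag or exec loop is needed in C#.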
I think something like this is what you want:
var s = "HelloWorld2HowAreYou";
Regex r = new Regex("(?=[A-Z]|$)(?<=(.+))");
foreach (Match m in r.Matches(s)) {
Console.WriteLine(m.Groups[1]);
}
The output is (as seen on ideone.com):
Hello
HelloWorld2
HelloWorld2How
HelloWorld2HowAre
HelloWorld2HowAreYou
How it works
The regex is based on two assertions:
(?=[A-Z]|$) matches positions just before an uppercase, and at the end of the string
(?<=(.+)) is a capturing lookbehind for .+ behind the current position into group 1
Essentially, the regex translates to:
"Everywhere just before an uppercase, or at the end of the string"...
"grab everything behind you if it's not an empty string"