Extract word out of string using regex

I want to extract a certain word out of a string using regex.
I have this code now, and it works perfectly when I search for *:
public static string Tagify(string value, string search, string htmlTag, bool clear = false)
{
Regex regex = new Regex(@"\" + search + "([^)]*)\\" + search);
var v = regex.Match(value);
if (v.Groups[1].ToString() == "" || v.Groups[1].ToString() == value || clear == true)
{
return value.Replace(search, "");
}
return value.Replace(v.Groups[0].ToString(), "<" + htmlTag + ">" + v.Groups[1].ToString() + "</" + htmlTag + ">");
}
But now I need to search for **, and unfortunately this does not work.
How can I achieve this?

I think the simplest solution is to use lazy dot matching in a capturing group.
Replace
Regex regex = new Regex(@"\" + search + "([^)]*)\\" + search);
with
Regex regex = new Regex(string.Format("{0}(.*?){0}", Regex.Escape(search)));
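Or, in C# 6.0:
Regex regex = new Regex($"{Regex.Escape(search)}(.*?){Regex.Escape(search)}");
Regex.Escape escapes any special characters for you, so there is no need to prepend \ symbols manually. With the manual approach, searching for ** builds the pattern \**, in which only the first * is escaped and the second one acts as a quantifier.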
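Putting the pieces together, the method from the question might look like the sketch below. The class name TagifyExample is hypothetical, and the m.Success check is an addition that is not in the original code; treat it as one possible arrangement rather than the definitive fix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagifyExample
{
    // Wraps the first "search ... search" span in value with the given HTML tag.
    public static string Tagify(string value, string search, string htmlTag, bool clear = false)
    {
        // Regex.Escape handles delimiters of any length, such as * or **.
        string delimiter = Regex.Escape(search);
        Regex regex = new Regex(delimiter + "(.*?)" + delimiter);
        Match m = regex.Match(value);

        // No usable match, or the caller only wants the markers removed.
        if (!m.Success || m.Groups[1].Value == "" || clear)
        {
            return value.Replace(search, "");
        }

        return value.Replace(m.Value, "<" + htmlTag + ">" + m.Groups[1].Value + "</" + htmlTag + ">");
    }
}
```

For example, Tagify("a **bold** word", "**", "b") should produce a <b>bold</b> word, and the same call with clear set to true only strips the markers.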

Related

Regex Contains Multiple words

I am trying to search titles so that every word of the search term has to match.
My example looks something like this:
string exampleTitle = "apple orange banana";
string term1 = "app bana";
string term2 = "bana app";
string pattern1 = term1.Replace(" ", "*.*") + "*"; // output: app*.*bana*
string pattern2 = term2.Replace(" ", "*.*") + "*"; // output: bana*.*app*
// now test
bool isMatch1 = Regex.IsMatch(exampleTitle, pattern1); // true
// now test
bool isMatch2 = Regex.IsMatch(exampleTitle, pattern2); // false
So pattern2 does not match, because banana comes after apple. However, I need the result to be true whenever all of the words in the search term match, in any order.

Regular expressions can be tricky here. Use this approach instead:
String exampleTitle = "apple orange banana";
String terms = "app bana";
Boolean found = true;
// let's clean things up for malformed input with RemoveEmptyEntries
foreach (String term in terms.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries))
found &= exampleTitle.Contains(term);
Using LINQ instead (here terms_list is the space-separated search string):
// let's clean things up for malformed input with RemoveEmptyEntries
String[] terms = terms_list.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
Boolean found = terms.All(term => exampleTitle.Contains(term));
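The LINQ version can be wrapped in a small helper, sketched below. The names TermSearch and ContainsAllTerms are made up for illustration. Note that string.Contains is case-sensitive, and that an empty search string yields true because All is vacuously true on an empty sequence.

```csharp
using System;
using System.Linq;

public static class TermSearch
{
    // Returns true when every space-separated term occurs somewhere in title,
    // regardless of the order in which the terms were typed.
    public static bool ContainsAllTerms(string title, string searchTerms)
    {
        string[] terms = searchTerms.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return terms.All(term => title.Contains(term));
    }
}
```

With the title from the question, both "app bana" and "bana app" should give true, while "bana kiwi" should give false.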

You can use the regular expression (?=.*app)(?=.*bana) instead:
string pattern1 = "(?=.*" + term1.Replace(" ", ")(?=.*") + ")"; // output: (?=.*app)(?=.*bana)
string pattern2 = "(?=.*" + term2.Replace(" ", ")(?=.*") + ")"; // output: (?=.*bana)(?=.*app)
You can limit backtracking and forward search with this:
string pattern1 = "(?=(?>.*?" + term1.Replace(" ", "))(?=(?>.*?") + "))"; // output: (?=(?>.*?app))(?=(?>.*?bana))
string pattern2 = "(?=(?>.*?" + term2.Replace(" ", "))(?=(?>.*?") + "))"; // output: (?=(?>.*?bana))(?=(?>.*?app))
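If the search terms come from user input, they may contain regex metacharacters. The sketch below builds the same kind of lookahead pattern but passes each term through Regex.Escape. The leading ^ anchor, the Singleline option, and the names LookaheadSearch, BuildPattern, and MatchesAllTerms are assumptions added for illustration, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSearch
{
    // Builds one lookahead per term, e.g. "app bana" -> ^(?=.*app)(?=.*bana)
    public static string BuildPattern(string searchTerms)
    {
        var lookaheads = searchTerms
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(term => "(?=.*" + Regex.Escape(term) + ")");
        return "^" + string.Concat(lookaheads);
    }

    // Singleline lets '.' cross line breaks, so multi-line titles are covered too.
    public static bool MatchesAllTerms(string title, string searchTerms)
    {
        return Regex.IsMatch(title, BuildPattern(searchTerms), RegexOptions.Singleline);
    }
}
```

Because every lookahead starts from the same position, the order of the terms in the search string does not matter.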

I need to true when matching all of words in search term without any order
This could be more clearly expressed as:
bool isMatch = Regex.IsMatch(exampleTitle, ".*app.*") && Regex.IsMatch(exampleTitle, ".*bana.*");
As noted in the other answer, there are non-regex ways to do substring matching that may be more appropriate.

C# String/StringBuilder MemoryException on Large set of Replaces

I am trying to figure out a better way to manipulate a large string. I have tried both string and StringBuilder, and neither works.
Below is a function that takes in a string and searches it with a regex to find any links. I want to wrap every occurrence of a link in a valid anchor tag. My issue is that I have a database entry (string) containing 101 link values that need to be replaced, and I am getting memory issues.
Is there a better way to do this? I have included the code with both string.Replace and StringBuilder.Replace, and neither works.
var resultString = new StringBuilder(testb);
Regex regx = new Regex(@"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)", RegexOptions.IgnoreCase);
MatchCollection matches = regx.Matches(txt);
foreach (Match match in matches)
{
if(match.Value.StartsWith("http://") || match.Value.StartsWith("https://"))
fixedurl = match.Value;
else
fixedurl = "http://" + match.Value;
resultString.Replace(match.Value, "<a target='_blank' class='ts-link ui-state-default' href='" + fixedurl + "'>" + match.Value + "</a>");
//testb = testb.Replace(match.Value, "<a target='_blank' class='ts-link ui-state-default' href='" + fixedurl + "'>" + match.Value + "</a>");
}

You can try the following. It may perform better in your specific case.
Regex regx = new Regex(@"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)", RegexOptions.IgnoreCase);
string resultString = regx.Replace(txt, (match) =>
{
string fixedurl = (match.Value.StartsWith("http://") || match.Value.StartsWith("https://"))
? match.Value
: "http://" + match.Value;
return "<a target='_blank' class='ts-link ui-state-default' href='" + fixedurl + "'>" + match.Value + "</a>";
});
EDITED:
BTW, the issue with your code seems to be the resultString.Replace call. It replaces every occurrence of the matched string, including the copies already placed inside anchors by earlier iterations, so a URL that appears several times (or that is contained in a longer URL) gets wrapped again and again. The string probably keeps growing this way until it hits an OutOfMemoryException.
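To see the single-pass behaviour in isolation, here is a small sketch. It deliberately uses a much simpler, hypothetical URL pattern than the one in the question, and the names Linkifier and Linkify are invented for the example. The point is only that Regex.Replace with a match evaluator visits each match once and never rescans its own output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Simplified, hypothetical URL pattern for illustration only.
    private static readonly Regex UrlRegex =
        new Regex(@"(?:https?://|www\.)[^\s<>""']+", RegexOptions.IgnoreCase);

    // Wraps every URL in an anchor tag in a single pass over the text.
    public static string Linkify(string text)
    {
        return UrlRegex.Replace(text, match =>
        {
            string url = match.Value;
            bool hasScheme = url.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                          || url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
            string href = hasScheme ? url : "http://" + url;
            return "<a href='" + href + "'>" + url + "</a>";
        });
    }
}
```

A URL that appears twice in the input is wrapped exactly twice, once per occurrence, so the output grows linearly with the number of links.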

Get Regex occurrence with escaped symbols

I would appreciate help with a regex that does not work for special symbols such as % or $:
public List<Tuple<string, string>> GetParts(string str, string beginMark, string endMark)
{
var pattern =
new Regex(beginMark + @"(?<val>.*?)" + endMark,
RegexOptions.Compiled |
RegexOptions.Singleline);
return (from Match match in pattern.Matches(str)
where match.Success
select new Tuple<string, string>(
match.Value,
match.Groups["val"].Value))
.ToList();
}
Calling method:
string input = @"%sometext%\another text";
string replacedValue = "AAA";
var occurrences = GetParts(input, @"(%", ")");
foreach (var occurrence in occurrences)
{
Console.WriteLine(occurrence.Item1 + Environment.NewLine);
Console.WriteLine(occurrence.Item2 + Environment.NewLine);
// replace
Console.WriteLine(input.Replace(occurrence.Item1, replacedValue) + Environment.NewLine);
}
Expected Output:
%sometext%
sometext
AAA\another text

You need to escape your symbols. Try changing
new Regex(beginMark + @"(?<val>.*?)" + endMark,
to
new Regex(Regex.Escape(beginMark) + @"(?<val>.*?)" + Regex.Escape(endMark),
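A complete version of the method with the escaping applied might look like the sketch below. The class name PartExtractor is hypothetical, and the List<Tuple<string, string>> return type is an assumption based on the ToList() call in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PartExtractor
{
    // Returns (whole match, inner value) pairs for every beginMark...endMark span.
    public static List<Tuple<string, string>> GetParts(string str, string beginMark, string endMark)
    {
        // Regex.Escape makes the marks match literally, even if they contain
        // metacharacters such as $, ( or ).
        var pattern = new Regex(
            Regex.Escape(beginMark) + @"(?<val>.*?)" + Regex.Escape(endMark),
            RegexOptions.Singleline);

        return pattern.Matches(str)
            .Cast<Match>()
            .Select(m => Tuple.Create(m.Value, m.Groups["val"].Value))
            .ToList();
    }
}
```

Calling GetParts(@"%sometext%\another text", "%", "%") should return a single pair whose items are %sometext% and sometext.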

Regex Replace on a JSON structure

I am currently trying to do a Regex Replace on a JSON string that looks like:
String input = "{\"`####`Answer_Options11\": \"monkey22\",\"`####`Answer_Options\": \"monkey\",\"Answer_Options2\": \"not a monkey\"}";
The goal is to find and replace all the value fields whose key field starts with `####`
I currently have this:
static Regex _FieldRegex = new Regex(@"`####`\w+" + ".:.\"(.*)\",");
static public string MatchKey(string input)
{
MatchCollection match = _FieldRegex.Matches(input.ToLower());
string match2 = "";
foreach (Match k in match )
{
foreach (Capture cap in k.Captures)
{
Console.WriteLine("" + cap.Value);
match2 = Regex.Replace(input.ToLower(), cap.Value.ToString(), @"CAKE");
}
}
return match2.ToString();
}
Now this isn't working, naturally, I guess, since it picks up the entire `####`Answer_Options11\": \"monkey22\",\"`####`Answer_Options\": \"monkey\", as a match and replaces it. I want to replace just match.Groups[1], like you would for a single match on the string.
At the end of the day the JSON string needs to look something like this:
String input = "{\"`####`Answer_Options11\": \"CATS AND CAKE\",\"`####`Answer_Options\": \"CAKE WAS A LIE\",\"Answer_Options2\": \"not a monkey\"}";
Any idea how to do this?

You want a positive lookahead and a positive lookbehind:
(?<=####.+?:).*?(?=,)
The lookaheads and lookbehinds verify that the text matches those patterns but do not include them in the match. This site explains the concept pretty well.
Generated code from RegexHero.com:
string strRegex = @"(?<=####.+?:).*?(?=,)";
Regex myRegex = new Regex(strRegex);
string strTargetString = @" ""{\""`####`Answer_Options11\"": \""monkey22\"",\""`####`Answer_Options\"": \""monkey\"",\""Answer_Options2\"": \""not a monkey\""}""";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
This will match "monkey22" and "monkey" but not "not a monkey".

Working from @Jonesy's answer, I arrived at the code below, which does what I wanted. It includes the .Replace on the groups that I required. The lookaheads and lookbehinds were very interesting, but I needed to replace some of those values, hence the groups.
static public string MatchKey(string input)
{
string strRegex = @"(__u__)(.+?:\s*)""(.*)""(,|})*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
IQS_Encryption.Encryption enc = new Encryption();
int count = 1;
string addedJson = "";
int matchCount = 0;
foreach (Match myMatch in myRegex.Matches(input))
{
if (myMatch.Success)
{
//Console.WriteLine("REGEX MYMATCH: " + myMatch.Value);
input = input.Replace(myMatch.Value, "__e__" + myMatch.Groups[2].Value + "\"c" + count + "\"" + myMatch.Groups[4].Value);
addedJson += "c"+count + "{" +enc.EncryptString(myMatch.Groups[3].Value, Encoding.UTF8.GetBytes("12345678912365478912365478965412"))+"},";
}
count++;
matchCount++;
}
Console.WriteLine("MAC" + matchCount);
return input + addedJson;
}
Thanks again to @Jonesy for the huge help.
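For the original `####` input, replacing only the captured value can also be done in one pass with Regex.Replace and a match evaluator. The sketch below is an illustration under stated assumptions: the pattern, the class name MarkedFieldReplacer, and the transform parameter are all made up for the example, and the pattern does not handle escaped quotes inside values, so a real JSON parser would be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkedFieldReplacer
{
    // Group 1: the key (prefixed with `####`) up to the opening quote of the value.
    // Group 2: the value itself. Escaped quotes inside values are not supported.
    private static readonly Regex FieldRegex =
        new Regex("(\"`####`\\w+\"\\s*:\\s*\")([^\"]*)\"");

    // Rewrites the value of every marked key and leaves everything else untouched.
    public static string ReplaceMarkedValues(string json, Func<string, string> transform)
    {
        return FieldRegex.Replace(json,
            m => m.Groups[1].Value + transform(m.Groups[2].Value) + "\"");
    }
}
```

Calling ReplaceMarkedValues(input, v => "CAKE") on the string from the question should change monkey22 and monkey to CAKE and leave "not a monkey" as it is.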

Extract some numbers and decimals from a string

I have a string:
" a.1.2.3 #4567 "
and I want to reduce that to just "1.2.3".
Currently using Substring() and Remove(), but that breaks if there ends up being more numbers after the pound sign.
What's the best way to go about doing this? I've read a bunch of questions on regex & string.split, but I can't get anything I try to work in VB.net. Would I have to do a match then replace using the match result?
Any help would be much appreciated.

This should work:
string input = " a.1.2.3 #4567 ";
int poundIndex = input.IndexOf("#");
if(poundIndex >= 0)
{
string relevantPart = input.Substring(0, poundIndex).Trim();
IEnumerable<Char> numPart = relevantPart.SkipWhile(c => !Char.IsDigit(c));
string result = new string(numPart.ToArray());
}

Try this:
string[] parts = input.Split('#');
string output = parts[0].Trim().Substring(2); // 2 skips the leading "a.", assuming the trimmed text always starts with it

Here is a regex way of doing it:
string input = " a.1.2.3 #4567 ";
Regex regex = new Regex(@"(\d\.)+\d");
var match = regex.Match(input);
if(match.Success)
{
string output = match.Groups[0].Value;//"1.2.3"
//Or, equivalently:
//string output = match.Value;//"1.2.3"
}

If the pound sign is the most relevant bit, rely on Split. Sample VB.NET code:
Dim inputString As String = " a.1.2.3 #4567 "
If (inputString.Contains("#")) Then
Dim firstBit As String = inputString.Split("#")(0).Trim()
Dim headingToRemove As String = "a."
Dim result As String = firstBit.Substring(headingToRemove.Length, firstBit.Length - headingToRemove.Length)
End If
Since this is a multi-language question, here is the translation to C#:
string inputString = " a.1.2.3 #4567 ";
if (inputString.Contains("#"))
{
string firstBit = inputString.Split('#')[0].Trim();
string headingToRemove = "a.";
string result = firstBit.Substring(headingToRemove.Length, firstBit.Length - headingToRemove.Length);
}

I guess another way is an unrolled pattern (the spaces are for readability only):
\d+ (?: \. \d+ )+
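Because that pattern contains literal spaces, it needs RegexOptions.IgnorePatternWhitespace to work as written. A minimal sketch, with the hypothetical names VersionExtractor and ExtractDottedNumber:

```csharp
using System.Text.RegularExpressions;

public static class VersionExtractor
{
    // Returns the first run of digits separated by dots (e.g. "1.2.3"),
    // or null when the input contains no such run.
    public static string ExtractDottedNumber(string input)
    {
        Match match = Regex.Match(
            input,
            @"\d+ (?: \. \d+ )+",
            RegexOptions.IgnorePatternWhitespace);
        return match.Success ? match.Value : null;
    }
}
```

Unlike (\d\.)+\d, this version also accepts multi-digit parts such as 10.20.3, and it skips the plain number after the pound sign because at least one dot is required.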
