Get Regex occurrence with escaped symbols - c#

I would appreciate help with a regex that does not work when the markers contain special symbols such as % or $.
public List<Tuple<string, string>> GetParts(string str, string beginMark, string endMark)
{
var pattern =
new Regex(beginMark + @"(?<val>.*?)" + endMark,
RegexOptions.Compiled |
RegexOptions.Singleline);
return (from Match match in pattern.Matches(str)
where match.Success
select new Tuple<string, string>(
match.Value,
match.Groups["val"].Value))
.ToList();
}
Calling method:
string input = @"%sometext%\another text";
string replacedValue = "AAA";
var occurrences = GetParts(input, @"(%", ")");
foreach (var occurrence in occurrences)
{
Console.WriteLine(occurrence.Item1 + Environment.NewLine);
Console.WriteLine(occurrence.Item2 + Environment.NewLine);
// replace
Console.WriteLine(input.Replace(occurrence.Item1, replacedValue) + Environment.NewLine);
}
Expected Output:
%sometext%
sometext
AAA\another text

You need to escape the markers, because characters such as ( ) and $ are regex metacharacters. Change
new Regex(beginMark + @"(?<val>.*?)" + endMark,
to
new Regex(Regex.Escape(beginMark) + @"(?<val>.*?)" + Regex.Escape(endMark),
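For reference, here is a minimal sketch of the whole method with the escaping applied. The class name EscapedMarkers and the "%" markers in Main are assumptions chosen to reproduce the expected output above; the question's own call passes "(%" and ")".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedMarkers
{
    // Returns (whole match, text between the markers) for every occurrence.
    public static List<Tuple<string, string>> GetParts(string str, string beginMark, string endMark)
    {
        // Regex.Escape turns metacharacters such as ( ) and $ into literals.
        var pattern = new Regex(
            Regex.Escape(beginMark) + @"(?<val>.*?)" + Regex.Escape(endMark),
            RegexOptions.Singleline);

        return pattern.Matches(str)
            .Cast<Match>()
            .Select(m => Tuple.Create(m.Value, m.Groups["val"].Value))
            .ToList();
    }

    public static void Main()
    {
        string input = @"%sometext%\another text";
        foreach (var occurrence in GetParts(input, "%", "%"))
        {
            Console.WriteLine(occurrence.Item1);                       // %sometext%
            Console.WriteLine(occurrence.Item2);                       // sometext
            Console.WriteLine(input.Replace(occurrence.Item1, "AAA")); // AAA\another text
        }
    }
}
```

The same method should also cope with markers such as "$(" and ")", since both are escaped before they reach the pattern.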

Related

Extract the values between the double quotes using regex

string emailBody = "sample text for NewFinancial History:\"xyz\" text NewFinancial History:\"abc\" NewEBTDI$:\"abc\" ds \"NewFinancial History:pqr\" test";
private Dictionary<string, List<string>> ExtractFieldValuesForDynamicListObject(string emailBody)
{
Dictionary<string, List<string>> paramValueList = new Dictionary<string, List<string>>();
try
{
emailBody = ReplaceIncompatableQuotes(emailBody);
emailBody = string.Join(" ", Regex.Split(emailBody.Trim(), @"(?:\r\n|\n|\r)"));
var keys = Regex.Matches(emailBody, @"\bNew\B(.+?):", RegexOptions.Singleline).OfType<Match>().Select(m => m.Groups[0].Value.Replace(":", "")).Distinct().ToArray();
foreach (string key in keys)
{
List<string> valueList = new List<string>();
string regex = "" + Regex.Escape(key) + ":" + "\"(?<" + Regex.Escape(GetCleanKey(key)) + ">[^\"]*)\"";
var matches = Regex.Matches(emailBody, regex, RegexOptions.Singleline);
foreach (Match match in matches)
{
if (match.Success)
{
string value = match.Groups[Regex.Escape(GetCleanKey(key))].Value;
if (!valueList.Contains(value.Trim()))
{
valueList.Add(value.Trim());
}
}
}
valueList = valueList.Distinct().ToList();
string listName = key.Replace("New", "");
paramValueList.Add(listName.Trim(), valueList);
}
}
catch (Exception ex)
{
DCULSLogger.LogError(ex);
}
return paramValueList;
}
My goal is to scan through the email body and identify strings that follow the NewListName:"Value" nomenclature. The method above does that correctly. Now my client has changed the nomenclature from NewListName:"Value" to "NewListName:Value", so I need to grab the whole text between the double quotes, including the New keyword: the match starts at "New and ends at the closing quote. Can anyone help me modify the regex above so that it scans the email body and returns every such quoted value? In the example above I want to get \"NewFinancial History:pqr\" in my results. Any help would be appreciated.
You may use a regex that will match quote, New, some chars other than " and :, then :, and then any chars but " up to a ":
var keys = Regex.Matches(emailBody, @"""New[^"":]+:[^""]+""", RegexOptions.Singleline)
.OfType<Match>()
.Select(m => m.Value)
.Distinct()
.ToArray();
See the regex demo
Pattern details:
" - a literal double quote
New - a literal substring
[^":]+ - 1 or more characters other than " and : (the [^...] is a negated character class)
: - a literal colon
[^"]+ - 1 or more characters other than "
" - a literal double quote
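If the list name and the value are needed separately, the same pattern can carry two named groups. The sketch below is only an illustration under that assumption: the class name QuotedFieldSketch, the method Extract, and the group names name and value are hypothetical and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFieldSketch
{
    // Matches "New<name>:<value>" and captures the name and the value separately.
    private static readonly Regex QuotedField =
        new Regex(@"""New(?<name>[^"":]+):(?<value>[^""]+)""");

    // Groups the distinct values found in the text by list name.
    public static Dictionary<string, List<string>> Extract(string emailBody)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (Match m in QuotedField.Matches(emailBody))
        {
            string name = m.Groups["name"].Value.Trim();
            string value = m.Groups["value"].Value.Trim();
            if (!result.TryGetValue(name, out var values))
            {
                values = new List<string>();
                result[name] = values;
            }
            if (!values.Contains(value)) values.Add(value);
        }
        return result;
    }

    public static void Main()
    {
        string emailBody = "NewEBTDI$:\"abc\" ds \"NewFinancial History:pqr\" test";
        foreach (var pair in Extract(emailBody))
            Console.WriteLine(pair.Key + " => " + string.Join(", ", pair.Value));
    }
}
```

Because the group names are fixed, nothing from the email body ever ends up inside a group name, so spaces or symbols in the list name cannot break the pattern.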

How to allow space in regex?

I am trying to get the double-quoted value that follows NewListName:.
I can retrieve the value fine when the list name contains no space. But if the list name contains a space (e.g. NewFinancial History:\"xyz\"), it throws the error below:
parsing "NewFinancial History:"(?<NewFinancial History>[^"]*)"" - Invalid group name: Group names must begin with a word character.
The error is thrown at the line below:
var matches = Regex.Matches(contents, regex, RegexOptions.Singleline);
Below is my code.
string contents = " testing NewFinancial History:\"xyz\" ";
var keys = Regex.Matches(contents, @"New(.+?):", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace).OfType<Match>().Select(m => m.Groups[0].Value.Trim().Replace(":", "")).Distinct().ToArray();
foreach (string key in keys)
{
List<string> valueList = new List<string>();
string listNameKey = key;
string regex = "" + listNameKey + ":" + "\"(?<" + listNameKey + ">[^\"]*)\"";
var matches = Regex.Matches(contents, regex, RegexOptions.Singleline);
foreach (Match match in matches)
{
if (match.Success)
{
string value = match.Groups[key].Value;
valueList.Add(value);
}
}
}
I don't see why you also use the key as the name of the group.
The problem is that a group name cannot contain spaces, but you can simply use an unnamed capture group instead.
string contents = " testing NewFinancial History:\"xyz\" ";
var keys = Regex.Matches(contents, @"New(.+?):", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace).OfType<Match>().Select(m => m.Groups[0].Value.Trim().Replace(":", "")).Distinct().ToArray();
foreach (string key in keys)
{
List<string> valueList = new List<string>();
string listNameKey = key;
string regex = "" + listNameKey + ":" + "\"([^\"]*)\""; //create an anonymous capture group
var matches = Regex.Matches(contents, regex, RegexOptions.Singleline);
foreach (Match match in matches)
{
if (match.Success)
{
string value = match.Groups[1].Value; //get the first capture group (Groups[0] is the whole match)
valueList.Add(value);
}
}
}
Change your foreach block to
List<string> valueList = new List<string>();
string listNameKey = key;
string regex = "" + listNameKey + ":" + "\"(?<" +
listNameKey.Replace(" ","") + ">[^\"]*)\""; // Removing spaces in the group name here
var matches = Regex.Matches(contents, regex, RegexOptions.Singleline);
foreach (Match match in matches)
{
if (match.Success)
{
string value = match.Groups[key.Replace(" ", "")].Value; // Removing spaces here
valueList.Add(value);
}
}
The point is that group names cannot have whitespace, so you need to replace them with empty strings in places where you declare the capture group name.
See IDEONE demo
Note that your New(.+?): regex contains no whitespace to ignore, so I recommend removing the RegexOptions.IgnorePatternWhitespace flag. You can also replace the pattern with the more efficient New([^:]+):.
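Another option, sketched here as an assumption rather than as what either answer proposes, is to match the list name and its value in a single pass with two unnamed groups, so the key never has to be spliced into a second pattern. The names ListValueSketch and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListValueSketch
{
    // Group 1 is the list name (it may contain spaces); group 2 is the quoted value.
    private static readonly Regex Field = new Regex(@"New([^:]+):""([^""]*)""");

    public static List<KeyValuePair<string, string>> Extract(string contents)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Field.Matches(contents))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value.Trim(), m.Groups[2].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        foreach (var pair in Extract(" testing NewFinancial History:\"xyz\" "))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

If the two-step approach is kept instead, wrapping the key in Regex.Escape before concatenating it into the pattern should protect against list names that contain metacharacters such as $.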

Regex catch string between strings

I created a small function to catch a string between strings.
public static string[] _StringBetween(string sString, string sStart, string sEnd)
{
if (sStart == "" && sEnd == "")
{
return null;
}
string sPattern = sStart + "(.*?)" + sEnd;
MatchCollection rgx = Regex.Matches(sString, sPattern);
if (rgx.Count < 1)
{
return null;
}
string[] matches = new string[rgx.Count];
for (int i = 0; i < matches.Length; i++)
{
matches[i] = rgx[i].ToString();
//MessageBox.Show(matches[i]);
}
return matches;
}
However, if I call my function like this: _StringBetween("[18][20][3][5][500][60]", "[", "]");
it fails. One fix would be to change this line to string sPattern = "\\" + sStart + "(.*?)" + "\\" + sEnd;
However, I cannot do that, because I don't know whether the marker is going to be a bracket or a word.
Sorry if this is a stupid question, but I couldn't find anything similar by searching.
A way would be if i changed this line string sPattern = "\\" + sStart + "(.*?)" + "\\" + sEnd; However i can not because i don't know if the character is going to be a bracket or a word.
You can escape all meta-characters by calling Regex.Escape:
string sPattern = Regex.Escape(sStart) + "(.*?)" + Regex.Escape(sEnd);
This would cause the content of sStart and sEnd to be interpreted literally.
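A minimal sketch of the function with the escaping applied is shown below. It deviates from the original in two ways, both of which are assumptions about what is wanted: it returns only the text between the markers (group 1) instead of the whole match, and it returns an empty array rather than null when nothing is found. The names BetweenSketch and StringBetween are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BetweenSketch
{
    public static string[] StringBetween(string text, string start, string end)
    {
        if (string.IsNullOrEmpty(start) || string.IsNullOrEmpty(end))
            return new string[0];

        // Escape both markers so brackets, dots, etc. are matched literally.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value) // text between the markers only
            .ToArray();
    }

    public static void Main()
    {
        string[] parts = StringBetween("[18][20][3][5][500][60]", "[", "]");
        Console.WriteLine(string.Join(",", parts)); // 18,20,3,5,500,60
    }
}
```

Plain words work as markers too, because Regex.Escape leaves ordinary letters unchanged.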

Replacement in a String with a regular expression

I'm trying to replace text in a string in C# with the Regex class, but I don't know how to use the class properly.
I want to replace every occurrence of the following sequence in the string "a":
":(one space)(one or more characters)(one space)"
with the following replacement:
":(two spaces)(one or more characters)(three spaces)"
Can anyone help me with the code and explain the regular expression used?
You can use string.Replace(string, string) when the text to replace is a fixed literal.
See:
http://msdn.microsoft.com/en-us/library/fk49wtc1.aspx
try this one
private String StrReplace(String Str)
{
String Output = string.Empty;
String re1 = "(:)( )((?:[a-z][a-z]+))( )";
Regex r = new Regex(re1, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(Str);
if (m.Success)
{
String c1 = m.Groups[1].ToString();
String ws1 = m.Groups[2].ToString() + " ";
String word1 = m.Groups[3].ToString();
String ws2 = m.Groups[4].ToString() + "  "; // one matched space plus two more makes three
Output = c1.ToString() + ws1.ToString() + word1.ToString() + ws2.ToString() + "\n";
Output = Regex.Replace(Str, re1, Output);
}
return Output;
}
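Using String.Replace (works only when the text between the spaces is known in advance; here it is the literal .*.)
var str = "Test string with : .*. to replace";
var newstr = str.Replace(": .*. ", ":  .*.   ");
Using Regex.Replace (the group (.+?) captures the text between the spaces and $1 reinserts it)
var newstr = Regex.Replace(str, @": (.+?) ", ":  $1   ");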

Regex Replace on a JSON structure

I am currently trying to do a Regex Replace on a JSON string that looks like:
String input = "{\"`####`Answer_Options11\": \"monkey22\",\"`####`Answer_Options\": \"monkey\",\"Answer_Options2\": \"not a monkey\"}";
The goal is to find and replace all the value fields whose key field starts with `####`
I currently have this:
static Regex _encryptedFieldRegex = new Regex(@"`####`\w+" + ".:.\"(.*)\",");
static public string MatchKey(string input)
{
MatchCollection match = _encryptedFieldRegex.Matches(input.ToLower());
string match2 = "";
foreach (Match k in match )
{
foreach (Capture cap in k.Captures)
{
Console.WriteLine("" + cap.Value);
match2 = Regex.Replace(input.ToLower(), cap.Value.ToString(), @"CAKE");
}
}
return match2.ToString();
}
Now this isn't working. I guess that is because it picks up the entire `####`Answer_Options11\": \"monkey22\",\"`####`Answer_Options\": \"monkey\", as a match and replaces it. I want to replace just match.Groups[1], as you would for a single match on the string.
At the end of the day the JSON string needs to look something like this:
String input = "{\"`####`Answer_Options11\": \"CATS AND CAKE\",\"`####`Answer_Options\": \"CAKE WAS A LIE\",\"Answer_Options2\": \"not a monkey\"}";
Any idea how to do this?
You want a positive lookbehind and a positive lookahead:
(?<=####.+?:).*?(?=,)
The lookbehind and lookahead verify that the surrounding text matches those patterns, but do not include it in the match. This site explains the concept pretty well.
Generated code from RegexHero.com :
string strRegex = @"(?<=####.+?:).*?(?=,)";
Regex myRegex = new Regex(strRegex);
string strTargetString = @" ""{\""`####`Answer_Options11\"": \""monkey22\"",\""`####`Answer_Options\"": \""monkey\"",\""Answer_Options2\"": \""not a monkey\""}""";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
This will match "monkey22" and "monkey" but not "not a monkey".
Working from @Jonesy's answer I got to this, which works for what I wanted. It includes the .Replace on the groups that I required. The lookahead and lookbehind were very interesting, but I needed to replace some of those values, hence the groups.
static public string MatchKey(string input)
{
string strRegex = @"(__u__)(.+?:\s*)""(.*)""(,|})*";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
IQS_Encryption.Encryption enc = new Encryption();
int count = 1;
string addedJson = "";
int matchCount = 0;
foreach (Match myMatch in myRegex.Matches(input))
{
if (myMatch.Success)
{
//Console.WriteLine("REGEX MYMATCH: " + myMatch.Value);
input = input.Replace(myMatch.Value, "__e__" + myMatch.Groups[2].Value + "\"c" + count + "\"" + myMatch.Groups[4].Value);
addedJson += "c"+count + "{" +enc.EncryptString(myMatch.Groups[3].Value, Encoding.UTF8.GetBytes("12345678912365478912365478965412"))+"},";
}
count++;
matchCount++;
}
Console.WriteLine("MAC" + matchCount);
return input + addedJson;
}
Thanks again to @Jonesy for the huge help.
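As an alternative to calling input.Replace inside the loop, Regex.Replace accepts a MatchEvaluator, which lets only the value group be swapped while the key and quotes are copied through. This is a sketch under assumptions: the pattern expects keys that start with `####` as in the question's sample, it does not handle escaped quotes inside values, and the names MarkedValueSketch, ReplaceMarkedValues and transform are hypothetical. A real JSON parser would likely be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkedValueSketch
{
    // Group 1: the quoted key starting with `####`, the colon, and the value's opening quote.
    // Group 2: the value itself. Group 3: the closing quote.
    private static readonly Regex MarkedField =
        new Regex(@"(""`####`[^""]*""\s*:\s*"")([^""]*)("")");

    // Rewrites only the value of each marked field; everything else is kept as-is.
    public static string ReplaceMarkedValues(string json, Func<string, string> transform)
    {
        return MarkedField.Replace(json,
            m => m.Groups[1].Value + transform(m.Groups[2].Value) + m.Groups[3].Value);
    }

    public static void Main()
    {
        string input = "{\"`####`Answer_Options11\": \"monkey22\",\"`####`Answer_Options\": \"monkey\",\"Answer_Options2\": \"not a monkey\"}";
        Console.WriteLine(ReplaceMarkedValues(input, value => "CAKE"));
    }
}
```

The transform delegate is where an encryption call such as the one in the answer above could go, since it receives each original value and returns its replacement.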
