A More Efficient Way to Parse a String in C#

I have this code that reads a file and creates Regex groups. Then I walk through the groups and use other matches on keywords to extract what I need. I need the stuff between each keyword and the next space or newline. I am wondering if there is a way using the Regex keyword match itself to discard what I don't want (the keyword).
// create the pattern for the regex
String VSANMatchString = @"vsan\s(?<number>\d+)[:\s](?<info>.+)\n(\s+name:(?<name>.+)\s+state:(?<state>.+)\s+\n\s+interoperability mode:(?<mode>.+)\s\n\s+loadbalancing:(?<loadbal>.+)\s\n\s+operational state:(?<opstate>.+)\s\n)?";
// set up the match
MatchCollection VSANInfoList = Regex.Matches(block, VSANMatchString);
// set up the keyword matches
Regex VSANNum = new Regex(@" \d* ");
Regex VSANName = new Regex(@"name:\S*");
Regex VSANState = new Regex(@"operational state\S*");
// now we can extract what we need since we know all the VSAN info will be matched to the correct VSAN
// match each keyword (name, state, etc.), then split and extract the value
foreach (Match m in VSANInfoList)
{
    string num = String.Empty;
    string name = String.Empty;
    string state = String.Empty;
    string s = m.ToString();
    if (VSANNum.IsMatch(s)) { num = VSANNum.Match(s).ToString().Trim(); }
    if (VSANName.IsMatch(s))
    {
        string totrim = VSANName.Match(s).ToString().Trim();
        string[] strsplit = Regex.Split(totrim, "name:");
        name = strsplit[1].Trim();
    }
    if (VSANState.IsMatch(s))
    {
        string totrim = VSANState.Match(s).ToString().Trim();
        string[] strsplit = Regex.Split(totrim, "state:");
        state = strsplit[1].Trim();
    }
}

It looks like your single regex should be able to gather all you need. Try this:
string name = m.Groups["name"].Value; // named groups are read through Match.Groups, not Match.Captures
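
As a rough illustration of reading everything from named groups in one pass, here is a minimal sketch. The sample input and the simplified pattern are assumptions made for the example (the real file layout is not shown in the question), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VsanGroupsSketch
{
    // Simplified pattern (an assumption): number, name and state only.
    private static readonly Regex Pattern = new Regex(
        @"vsan\s(?<number>\d+).*\n\s+name:(?<name>\S+)\s+state:(?<state>\S+)");

    // Returns one "number name state" string per VSAN block found.
    public static List<string> Extract(string block)
    {
        var rows = new List<string>();
        foreach (Match m in Pattern.Matches(block))
        {
            // Each value comes straight from its named group; no second regex or Split is needed.
            rows.Add(m.Groups["number"].Value + " " + m.Groups["name"].Value + " " + m.Groups["state"].Value);
        }
        return rows;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string block = "vsan 1 information\n    name:VSAN0001  state:active\n";
        foreach (string row in Extract(block))
        {
            Console.WriteLine(row); // 1 VSAN0001 active
        }
    }
}
```

Using \S+ instead of .+ inside the groups keeps each captured value free of trailing whitespace, so no Trim() call is required.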

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in C#?
I can't use string.Split, because there isn't really a delimiter.

You can use Regex.Split to accomplish this:
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the text between ISA and ESA can itself contain the pattern we are splitting on (ESA, the separator, then ISA), you will have to find some smart way around it.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. ESA must appear immediately before the separator, but it is not consumed, so it stays in the output.
(?=ISA) Look-ahead assertion. ISA must appear immediately after the separator, but it is not consumed either.
Using these look-around assertions you can find the correct | character for splitting.
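
To make the effect of the look-around assertions concrete, the sketch below splits the same input twice: once with a plain pattern that consumes ESA and ISA, and once with the look-around version. The class and method names and the shortened input are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Plain pattern: ESA and ISA are part of the match, so Split removes them.
    public static string[] SplitPlain(string input) => Regex.Split(input, @"ESA\|ISA");

    // Look-around pattern: only the | is matched, so ESA and ISA survive.
    public static string[] SplitLookaround(string input) => Regex.Split(input, @"(?<=ESA)\|(?=ISA)");

    public static void Main()
    {
        string input = "ISA~a~ESA|ISA~b~ESA|";
        Console.WriteLine(string.Join(" , ", SplitPlain(input)));      // ISA~a~ , ~b~ESA|
        Console.WriteLine(string.Join(" , ", SplitLookaround(input))); // ISA~a~ESA , ISA~b~ESA|
    }
}
```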

Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substrings from 0 to that index and from that index to the end.
Edit:
If ? is not known, we can use a regex Pattern and Matcher instead.
Matcher matcher = Pattern.compile("ISA.*ESA").matcher(whateverString);
if (matcher.find()) {
int x = matcher.start();
}
Here x gives the start index of the match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA segment is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}

A regex will probably be the best for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you two groups with the data you need:
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
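
For an unknown number of segments, a Regex.Matches loop might look like the sketch below. The class and method names are hypothetical, and RegexOptions.Singleline is an assumption added so that a segment may span line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SegmentSketch
{
    // Returns every "ISA ... ESA" segment together with the one character that follows it.
    public static List<string> Segments(string input)
    {
        var result = new List<string>();
        // .*? is lazy, so each match stops at the first ESA it reaches.
        foreach (Match m in Regex.Matches(input, @"ISA.*?ESA.", RegexOptions.Singleline))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string segment in Segments("ISA~this is date~ESA|ISA~this is more data~ESA|"))
        {
            Console.WriteLine(segment);
        }
    }
}
```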

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart is "ISA~this is date~ESA|"
// secondPart is "ISA~this is more data~ESA|"

Use a regex like ISA(.+?)ESA and select the first group:
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
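
The difference is easiest to see when the input starts with the delimiter. The following sketch uses a shortened input and hypothetical class and method names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparisonSketch
{
    // string.Split with RemoveEmptyEntries drops the empty piece before the leading "ISA".
    public static string[] WithStringSplit(string input) =>
        input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

    // Regex.Split keeps that empty piece as the first element.
    public static string[] WithRegexSplit(string input) => Regex.Split(input, "ISA");

    public static void Main()
    {
        string input = "ISA~a~ESA|ISA~b~ESA|";
        Console.WriteLine(WithStringSplit(input).Length); // 2
        Console.WriteLine(WithRegexSplit(input).Length);  // 3
    }
}
```

In both cases the ISA delimiter itself is removed from the resulting pieces.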

Replace character with any possible string c#

Let's say I have a search string like Test%Test and I have stored strings like these:
Test123Test
TestTTTTest
Test153jhdsTest
123Test
TEST123
When I type Test in the textbox, it should return everything that contains Test, which gives me all of the strings; that part is easy. But when I type Test%Test, it needs to return everything that contains Test[anything]Test (so the result would be the first, second and third strings). How can I do it?

A simple solution using a regex is:
string[] values = new string[] { "Test123Test",
"TestTTTTest",
"Test153jhdsTest",
"123Test",
"TEST123" };
string searchQuery = "Test%Test";
string regex = Regex.Escape(searchQuery).Replace("%", ".*?");
string[] filteredValues = values.Where(str => Regex.IsMatch(str, regex)).ToArray();
Or for a single match:
string value = "Test123Test";
string searchQuery = "Test%Test";
string regex = Regex.Escape(searchQuery).Replace("%", ".*?");
if ( Regex.IsMatch(value, regex) )
{
// do something with the match...
}
We replace % with a regular expression (. = any character, * = zero or more times, ? = lazy quantifier). Regex.Escape is applied first so that any other special characters in the search text are matched literally.

Get values between curly braces c#

I have never used regex before. I was able to find similar questions in the forum, but not exactly what I'm looking for.
I have a string like the following and need to get the values between the curly braces.
Ex: "{name}{name@gmail.com}"
I need to get the following split strings:
name and name@gmail.com
I tried the following and it gives me back the same string.
string s = "{name}{name@gmail.com}";
string pattern = "({})";
string[] result = Regex.Split(s, pattern);

Use Regex.Matches rather than Split to accomplish this easily:
string input = "{name}{name@gmail.com}";
var regex = new Regex("{(.*?)}");
var matches = regex.Matches(input);
foreach (Match match in matches) // you can loop through your matches like this
{
var valueWithoutBrackets = match.Groups[1].Value; // name, name@gmail.com
var valueWithBrackets = match.Value; // {name}, {name@gmail.com}
}

Is using regex a must? In this particular example I would write:
s.Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries)

Here you go:
string s = "{name}{name@gmail.com}";
s = s.Substring(1, s.Length - 2); // remove first and last characters
string pattern = "}{"; // split pattern "}{"
string[] result = Regex.Split(s, pattern);
or
string s = "{name}{name@gmail.com}";
s = s.TrimStart('{');
s = s.TrimEnd('}');
string pattern = "}{";
string[] result = Regex.Split(s, pattern);

regex replace matchEvaluator using string Array

I need to highlight search terms in a block of text.
My initial thought was looping through the search terms. But is there an easier way?
Here is what I'm thinking using a loop...
public string HighlightText(string inputText)
{
string[] sessionPhrases = (string[])Session["KeywordPhrase"];
string description = inputText;
foreach (string field in sessionPhrases)
{
Regex expression = new Regex(field, RegexOptions.IgnoreCase);
description = expression.Replace(description,
new MatchEvaluator(ReplaceKeywords));
}
return description;
}
public string ReplaceKeywords(Match m)
{
return "<span style='color:red;'>" + m.Value + "</span>";
}

You could replace the loop with something like:
string[] phrases = ...
var re = String.Join("|", phrases.Select(s => Regex.Escape(s)).ToArray());
text = Regex.Replace(text, re, new MatchEvaluator(SomeFunction), RegexOptions.IgnoreCase); // input first, then pattern

Extending on Qtax's answer:
phrases = ...
// Use Regex.Escape to prevent ., (, * and other special characters from breaking the search
string re = String.Join("|", phrases.Select(s => Regex.Escape(s)).ToArray());
// Use \b (expression) \b to ensure you're only matching whole words, not partial words
re = @"\b(?:" + re + @")\b";
// use a simple replacement pattern instead of a MatchEvaluator
string replacement = "<span style='color:red;'>$0</span>";
text = Regex.Replace(text, re, replacement, RegexOptions.IgnoreCase); // input first, then pattern
Note that if you're replacing data that is already inside HTML, it might not be a good idea to use a regex to replace just anything in the content. You might end up getting:
<<span style='color:red;'>script</span>>
if someone is searching for the term script.
To prevent that from happening, you could use the HTML Agility Pack in combination with Regex.
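
Put together as a self-contained method, the whole-word highlighting might look like the sketch below. The class name, method name and sample text are made up for the example, and it does nothing to protect existing HTML markup, so the caveat about tags still applies.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightSketch
{
    public static string Highlight(string text, string[] phrases)
    {
        // An empty alternation would match at every word boundary, so bail out early.
        if (phrases.Length == 0) return text;

        // Escape each phrase, then join them into one alternation wrapped in word boundaries.
        string alternation = string.Join("|", phrases.Select(Regex.Escape));
        string pattern = @"\b(?:" + alternation + @")\b";

        // Regex.Replace takes the input first and the pattern second; $0 is the whole match.
        return Regex.Replace(text, pattern, "<span style='color:red;'>$0</span>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // "Red" is highlighted; "redwood" is left alone because of the \b boundaries.
        Console.WriteLine(Highlight("Red fox and redwood", new[] { "red" }));
    }
}
```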

What regular expression is good for extracting URLs from HTML?

I have tried writing my own and using the top-rated ones here on Stack Overflow, but most of them match more than desired.
For instance, some would extract http://foo.com/hello?world<br (note the <br at the end) from the input ...http://foo.com/hello?world<br>....
Is there a pattern that can match just the URL more reliably?
This is the current pattern I am using:
@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&^]*)"

The safest approach is not to use a regex at all and to use the System.Uri class instead.
Uri uri = new Uri("http://myUrl/%2E%2E/%2E%2E");
Console.WriteLine(uri.AbsoluteUri);
Console.WriteLine(uri.PathAndQuery);
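
The Uri constructor throws on malformed input, so when checking candidate strings it may be more convenient to use Uri.TryCreate. The following is a minimal sketch; the class and method names and the restriction to http and https are assumptions for the example, and it validates a candidate string rather than locating URLs inside HTML.

```csharp
using System;

public static class UriCheckSketch
{
    // Returns true when the candidate parses as an absolute http or https URL.
    public static bool TryParseHttpUrl(string candidate, out Uri uri)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    public static void Main()
    {
        if (TryParseHttpUrl("http://foo.com/hello?world", out Uri uri))
        {
            Console.WriteLine(uri.Host);  // foo.com
            Console.WriteLine(uri.Query); // ?world
        }
    }
}
```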

Your regex needs an escape for the dash "-" in the last character group:
@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&^]*)"
Essentially, you were allowing every character in the range from + through =, which includes <.
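
The range behaviour can be checked in isolation with two tiny character classes. This is only a sketch with hypothetical class and method names; the classes are reduced to the three characters involved.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashRangeSketch
{
    // [+-=] is a range from '+' (0x2B) to '=' (0x3D), which contains '<' (0x3C).
    public static bool UnescapedMatches(char c) => Regex.IsMatch(c.ToString(), @"^[+-=]$");

    // [+\-=] is exactly three literal characters: '+', '-' and '='.
    public static bool EscapedMatches(char c) => Regex.IsMatch(c.ToString(), @"^[+\-=]$");

    public static void Main()
    {
        Console.WriteLine(UnescapedMatches('<')); // True
        Console.WriteLine(EscapedMatches('<'));   // False
    }
}
```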

Try this:
public static string[] Parse(string pattern, string groupName, string input)
{
var list = new List<string>();
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
for (var match = regex.Match(input); match.Success; match = match.NextMatch())
{
list.Add(string.IsNullOrWhiteSpace(groupName) ? match.Value : match.Groups[groupName].Value);
}
return list.ToArray();
}
public static string[] ParseUri(string input)
{
const string pattern = @"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*";
return Parse(pattern, string.Empty, input);
}
