C# split by regex

C# split by regex - c#

I have a little problem that I don't know how to call it like, so I will do my best to explain you that.
String text = "Random text over here boyz, I dunno what to do";
I want to take by split only over here boyz for example, I want to let split the word text and the word , and it will show me the whole text that in thoose 2 strings. Any ideas?
Thank you,
Sagi.

From your comments I get that from this string:
foo bar id="baz" qux
You want to obtain the value baz, because it is in the id="{text}" pattern.
For that you can use a regular expression:
string result = Regex.Match(text, "id=\"(.*?)\"").Groups[1].Value;
Note that this will match any character. Also note that this will yield false positives, like fooid="bar", and that this won't match unquoted values.
So all in all, for parsing HTML, you should not use regular expressions. Try HtmlAgilityPack and an XPath expression.

There is a Split overload that can receive multiple string seperators:
var rrr = text.Split(new string[] { ",", "text" }, StringSplitOptions.None);
If you would like to extract only the text between these two strings using regex you can do something like this:
var pattern = #"text(.*),";
var a = new Regex(pattern).Match(text);
var result = a.Groups[1];

You can use Regex class:
https://msdn.microsoft.com/pl-pl/library/ze12yx1d%28v=vs.110%29.aspx
But first of all (as it was said) you need to clarify for yourself how you will identify string that you want.

in first case you can use
string stringResult;
if (text.Contains("over here boyz"))
stringResult = string.Empty;
else
stringResult = "over here boyz";
but the second case can solve by this code
String text = "Random text over here boyz, I dunno what to do";
//Second dream without whitespace
var result = Regex.Split(text, " *text *| *, *");
foreach (var x in result)
{
Console.WriteLine(x);
}
//Second dream with whitespace
result = Regex.Split(text, "text|,");
foreach (var x in result)
{
Console.WriteLine(x);
}
You can train to write Regex with this tool http://www.regexbuddy.com/ or http://www.regexr.com/

Related

Replace Multiple References of a pattern with Regex

I have a string which is in the following form
$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#
Basically this string is made up of the following parts
$ = Delimiter Start
(Some Text)
# = Delimiter End
(all of this n times)
I would now like to replace each of these sections with some meaningful text. Therefore I need to extract these sections, do something based on the text inside each section and then replace the section with the result. So the resulting string should look something like this:
12V, 0603, +20%, -20%
The commas and everything else that is not contained within the section stays as it is, the sections get replaced by meaningful values.
For the question: Can you help me with a Regex pattern that finds out where these sections are so I can replace them?

You need to use the Regex.Replace method and use a MatchEvaluator delegate to decide what the replacement value should be.
The pattern you need can be $ then anything except #, then #. We put the middle bit in brackets so it is stored as a separate group in the result.
\$([^#]+)#
The full thing can be something like this (up to you to do the correct appropriate replacement logic):
string value = #"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
string result = Regex.Replace(value, #"\$([^#]+)#", m =>
{
// This method takes the matching value and needs to return the correct replacement
// m.Value is e.g. "$KL\U#", m.Groups[1].Value is the bit in ()s between $ and #
switch (m.Groups[1].Value)
{
case #"KL\U":
return "12V";
case #"AS\gehaeuse":
return "0603";
case #"KL\tol_plus":
return "+20%";
case #"KL\tol_minus":
return "-20%";
default:
return m.Groups[1].Value;
}
});

As far as matching the pattern, you're wanting:
\$[^#]+#
The rest of your question isn't very clear. If you need to replace the original string with some meaningful values, just loop through your matches:
var str = #"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
foreach (Match match in Regex.Matches(str, #"\$[^#]+#"))
{
str = str.Replace(match.ToString(), "something meaningful");
}
beyond that you'll have to provide more context

are you sure you don't want to do just plain string manipulations?
var str = #"$KL\U#, $AS\gehaeuse#, $KL\tol_plus#, $KL\tol_minus#";
string ReturnManipulatedString(string str)
{
var list = str.split("$");
string newValues = string.Empty;
foreach (string st in str)
{
var temp = st.split("#");
newValues += ManipulateStuff(temp[0]);
if (0 < temp.Count();
newValues += temp[1];
}
}

Omit unnecessary parts in string array

In C#, I have a string comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
,etc. Now I want to only extract the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?

This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];

What you can do is use a very simple regular express to parse out the bits that you want, in your case you want something that looks like this and then grab the two groups that interest you:
(Type)="(\w+)"
Which would return in groups 1 and 2 the values Type and the non-space characters contained between the double-quotes.

Instead of doing many split, why don't you just use Regex :
output = Regex.Match(pair[1].Trim('"'), "\"(\w*)\"").Value;

Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // IE Data or Progam
But you will need some error handling as well.

How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explaination of regex
(?<=^Type=\") This a prefix match. Its not included in the result but will only match
if the string starts with Type="
.*? Non greedy match. Match as many characters as you can until
(?=\") This is a suffix match. It's not included in the result but will only match if the next character is "

Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the strings... then you just have to detect the end quote mark and subtract the contents. You can use LinQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Regex: C# extract text within double quotes

I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
you
questions

Try this regex:
\"[^\"]*\"
or
\".*?\"
explain :
[^ character_group ]
Negation: Matches any single character that is not in character_group.
*?
Matches the previous element zero or more times, but as few times as possible.
and a sample code:
foreach(Match match in Regex.Matches(inputString, "\"([^\"]*)\""))
Console.WriteLine(match.ToString());
//or in LINQ
var result = from Match match in Regex.Matches(line, "\"([^\"]*)\"")
select match.ToString();

Based on #Ria 's answer:
static void Main(string[] args)
{
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var reg = new Regex("\".*?\"");
var matches = reg.Matches(str);
foreach (var item in matches)
{
Console.WriteLine(item.ToString());
}
}
The output is:
"you"
"questions"
You can use string.TrimStart() and string.TrimEnd() to remove double quotes if you don't want it.

I like the regex solutions. You could also think of something like this
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var stringArray = str.Split('"');
Then take the odd elements from the array. If you use linq, you can do it like this:
var stringArray = str.Split('"').Where((item, index) => index % 2 != 0);

This also steals the Regex from #Ria, but allows you to get them into an array where you then remove the quotes:
strText = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
MatchCollection mc = Regex.Matches(strText, "\"([^\"]*)\"");
for (int z=0; z < mc.Count; z++)
{
Response.Write(mc[z].ToString().Replace("\"", ""));
}

I combine Regex and Trim:
const string searchString = "This is a \"search text\" and \"another text\" and not \"this text";
var collection = Regex.Matches(searchString, "\\\"(.*?)\\\"");
foreach (var item in collection)
{
Console.WriteLine(item.ToString().Trim('"'));
}
Result:
search text
another text

Try this (\"\w+\")+
I suggest you to download Expresso
http://www.ultrapico.com/Expresso.htm

I needed to do this in C# for parsing CSV and none of these worked for me so I came up with this:
\s*(?:(?:(['"])(?<value>(?:\\\1|[^\1])*?)\1)|(?<value>[^'",]+?))\s*(?:,|$)
This will parse out a field with or without quotes and will exclude the quotes from the value while keeping embedded quotes and commas. <value> contains the parsed field value. Without using named groups, either group 2 or 3 contains the value.
There are better and more efficient ways to do CSV parsing and this one will not be effective at identifying bad input. But if you can be sure of your input format and performance is not an issue, this might work for you.

Slight improvement on answer by #ria,
\"[^\" ][^\"]*\"
Will recognize a starting double quote only when not followed by a space to allow trailing inch specifiers.
Side effect: It will not recognize "" as a quoted value.

Remove String After Determinate String

I need to remove certain strings after another string within a piece of text.
I have a text file with some URLs and after the URL there is the RESULT of an operation. I need to remove the RESULT of the operation and leave only the URL.
Example of text:
http://website1.com/something Result: OK(registering only mode is on)
http://website2.com/something Result: Problems registered 100% (SOMETHING ELSE) Other Strings;
http://website3.com/something Result: error: "Âíèìàíèå, îáíàðóæåíà îøèáêà - Ìåñòî æèòåëüñòâà ñîäåðæèò íåäîïóñòèìûå ê
I need to remove all strings starting from Result: so the remaining strings have to be:
http://website1.com/something
http://website2.com/something
http://website3.com/something
Without Result: ........
The results are generated randomly so I don't know exactly what there is after RESULT:

One option is to use regular expressions as per some other answers. Another is just IndexOf followed by Substring:
int resultIndex = text.IndexOf("Result:");
if (resultIndex != -1)
{
text = text.Substring(0, resultIndex);
}
Personally I tend to find that if I can get away with just a couple of very simple and easy to understand string operations, I find that easier to get right than using regex. Once you start going into real patterns (at least 3 of these, then one of those) then regexes become a lot more useful, of course.

string input = "Action2 Result: Problems registered 100% (SOMETHING ELSE) Other Strings; ";
string pattern = "^(Action[0-9]*) (.*)$";
string replacement = "$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
You use $1 to keep the match ActionXX.

Use Regex for this.
Example:
var r = new System.Text.RegularExpressions.Regex("Result:(.)*");
var result = r.Replace("Action Result:1231231", "");
Then you will have "Action" in the result.

You can try with this code - by using string.Replace
var pattern = "Result:";
var lineContainYourValue = "jdfhkjsdfhsdf Result:ljksdfljh"; //I want replace test
lineContainYourValue.Replace(pattern,"");

Something along the lines of this perhaps?
string line;
using ( var reader = new StreamReader ( File.Open ( #"C:\temp\test.txt", FileMode.Open ) ) )
using ( var sw = new StreamWriter(File.Open( #"C:\Temp\test.edited.txt", FileMode.CreateNew ) ))
while ( (line = reader.ReadLine()) != null )
if(!line.StartsWith("Result:")) sw.WriteLine(line);

You can use RegEx for this kind of processing.
using System.Text.RegularExpressions;
private string ParseString(string originalString)
{
string pattern = ".*(?=Result:.*)";
Match match = Regex.Match(originalString, pattern);
return match.Value;
}

A Linq approach:
IEnumerable<String> result = System.IO.File
.ReadLines(path)
.Where(l => l.StartsWith("Action") && l.Contains("Result"))
.Select(l => l.Substring(0, l.IndexOf("Result")));

Given your current example, where you want only the website, regex match the spaces.
var fileLine = "http://example.com/sub/ random text";
Regex regexPattern = new Regex("(.*?)\\s");
var websiteMatch = regexPattern.Match(fileLine).Groups[1].ToString();
Debug.Print("!" + websiteMatch + "!");
Repeating for each line in your text file. Regex explained: .* matches anything, ? makes the match ungreedy, (brackets) puts the match into a group, \\s matches whitespace.

string replace using a List<string>

I have a List of words I want to ignore like this one :
public List<String> ignoreList = new List<String>()
{
"North",
"South",
"East",
"West"
};
For a given string, say "14th Avenue North" I want to be able to remove the "North" part, so basically a function that would return "14th Avenue " when called.
I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.
The bigger picture is, I'm trying to write an address matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.

How about this:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));
or for .Net 3:
string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());
Note that this method splits the string up into individual words so it only removes whole words. That way it will work properly with addresses like Northampton Way #123 that string.Replace can't handle.

Regex r = new Regex(string.Join("|", ignoreList.Select(s => Regex.Escape(s)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);

Something like this should work:
string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}

What's wrong with a simple for loop?
string street = "14th Avenue North";
foreach (string word in ignoreList)
{
street = street.Replace(word, string.Empty);
}

If you know that the list of word contains only characters that do not need escaping inside a regular expression then you can do this:
string s = "14th Avenue North";
Regex regex = new Regex(string.Format(#"\b({0})\b",
string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");
Result:
14th Avenue
If there are special characters you will need to fix two things:
Use Regex.Escape on each element of ignore list.
The word-boundary \b will not match a whitespace followed by a symbol or vice versa. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.
Here's how to fix these two problems:
Regex regex = new Regex(string.Format(#"(?<= |^)({0})(?= |$)",
string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));

If it's a short string as in your example, you can just loop though the strings and replace one at a time. If you want to get fancy you can use the LINQ Aggregate method to do it:
address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));
If it's a large string, that would be slow. Instead you can replace all strings in a single run through the string, which is much faster. I made a method for that in this answer.

LINQ makes this easy and readable. This requires normalized data though, particularly in that it is case-sensitive.
List<string> ignoreList = new List<string>()
{
"North",
"South",
"East",
"West"
};
string s = "123 West 5th St"
.Split(' ') // Separate the words to an array
.ToList() // Convert array to TList<>
.Except(ignoreList) // Remove ignored keywords
.Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string

Why not juts Keep It Simple ?
public static string Trim(string text)
{
var rv = text.trim();
foreach (var ignore in ignoreList) {
if(tv.EndsWith(ignore) {
rv = rv.Replace(ignore, string.Empty);
}
}
return rv;
}

You can do this using and expression if you like, but it's easier to turn it around than using a Aggregate. I would do something like this:
string s = "14th Avenue North"
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "

public static string Trim(string text)
{
var rv = text;
foreach (var ignore in ignoreList)
rv = rv.Replace(ignore, "");
return rv;
}
Updated For Gabe
public static string Trim(string text)
{
var rv = "";
var words = text.Split(" ");
foreach (var word in words)
{
var present = false;
foreach (var ignore in ignoreList)
if (word == ignore)
present = true;
if (!present)
rv += word;
}
return rv;
}

If you have a list, I think you're going to have to touch all the items. You could create a massive RegEx with all your ignore keywords and replace to String.Empty.
Here's a start:
(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)
If you have a single RegEx for ignore words, you can do a single replace for each phrase you want to pass to the algorithm.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# split by regex - c#

You can use Regex class: https://msdn.microsoft.com/pl-pl/library/ze12yx1d%28v=vs.110%29.aspx But first of all (as it was said) you need to clarify for yourself how you will identify string that you want.

Related

Replace Multiple References of a pattern with Regex

Omit unnecessary parts in string array

Regex: C# extract text within double quotes

Remove String After Determinate String

string replace using a List<string>

Categories

Resources