Omit unnecessary parts in string array - c#

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
and so on. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?

This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];

What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something that looks like this, and then grab the two groups that interest you:
(Type)="(\w+)"
This returns, in groups 1 and 2, the value Type and the word characters (letters, digits and underscores) contained between the double quotes.
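A minimal sketch of how that pattern might be applied in C#, assuming the input has the shape shown in the question. The class and method names (TypeAttribute, Parse) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeAttribute
{
    // Hypothetical helper: returns (name, value), or null when the pattern does not match.
    public static Tuple<string, string> Parse(string input)
    {
        // Verbatim string: "" stands for one literal double quote.
        Match match = Regex.Match(input, @"(Type)=""(\w+)""");
        if (!match.Success) return null;
        return Tuple.Create(match.Groups[1].Value, match.Groups[2].Value);
    }

    public static void Main()
    {
        var parsed = Parse("Type=\"Data\"><Path.Style><Style");
        Console.WriteLine(parsed.Item1 + " -> " + parsed.Item2); // Type -> Data
    }
}
```

Checking match.Success first avoids reading empty groups when the line does not contain a Type attribute.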

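Instead of splitting several times, why don't you just use Regex:
// Do not trim the quotes first: the pattern needs both of them to match.
// Groups[1] holds the text between the quotes; Value alone would include the quotes.
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value;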

Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.
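One way the error handling could look, as a sketch only: the helper name TryGetFirstQuoted is hypothetical, and it assumes that "no complete pair of quotes" is the only failure worth reporting.

```csharp
using System;

public static class QuotedValue
{
    // Hypothetical helper: gets the text between the first pair of double quotes.
    public static bool TryGetFirstQuoted(string input, out string value)
    {
        value = string.Empty;
        if (string.IsNullOrEmpty(input)) return false;

        string[] parts = input.Split('"');
        // A complete pair of quotes yields at least three parts:
        // the text before, the text between, and the text after.
        if (parts.Length < 3) return false;

        value = parts[1];
        return true;
    }

    public static void Main()
    {
        string type;
        if (TryGetFirstQuoted("Type=\"Program\"><Rectangle.Style><Style", out type))
        {
            Console.WriteLine(type); // Program
        }
    }
}
```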

How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") A lookbehind (prefix match). It's not included in the result, but the match only succeeds if the string starts with Type="
.*? A non-greedy match. It matches as few characters as possible until the suffix below can match
(?=\") A lookahead (suffix match). It's not included in the result, but the match only succeeds if the next character is "

Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the string. Then you just have to detect the closing quote mark and extract the contents before it. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Related

How to split a string every time the character changes?

I'd like to turn a string such as abbbbcc into an array like this: [a,bbbb,cc] in C#. I have tried the regex from this Java question like so:
var test = "aabbbbcc";
var split = new Regex("(?<=(.))(?!\\1)").Split(test);
but this results in the sequence [a,a,bbbb,b,cc,c] for me. How can I achieve the same result in C#?
Here is a LINQ solution that uses Aggregate:
var input = "aabbaaabbcc";
var result = input
.Aggregate(" ", (seed, next) => seed + (seed.Last() == next ? "" : " ") + next)
.Trim()
.Split(' ');
It aggregates each character based on the last one read, then if it encounters a new character, it appends a space to the accumulating string. Then, I just split it all at the end using the normal String.Split.
Result:
["aa", "bb", "aaa", "bb", "cc"]
I don't know how to get it done with split. But this may be a good alternative:
//using System.Linq;
var test = "aabbbbcc";
var matches = Regex.Matches(test, "(.)\\1*");
var split = matches.Cast<Match>().Select(match => match.Value).ToList();
There are several things going on here that are producing the output you're seeing:
1. The regex combines a positive lookbehind and a negative lookahead to find the last character that matches the one preceding it but does not match the one following it.
2. It creates capture groups for every match, which are then fed into the Split method as delimiters. The capture groups are required by the negative lookahead, specifically the \1 identifier, which basically means "the value of the first capture group in the statement", so it cannot be omitted.
3. Regex.Split, given a capture group or multiple capture groups to match on when identifying the splitting delimiters, will include the delimiters used for every individual Split operation.
Number 3 is why your string array is looking weird, Split will split on the last a in the string, which becomes split[0]. This is followed by the delimiter at split[1], etc...
There is no way to override this behaviour on calling Split.
Either compensation as per Gusman's answer or projecting the results of a Matches call as per Ruard's answer will get you what you want.
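The effect of capture groups on Regex.Split can be seen in isolation with a simpler pattern. This is only an illustrative sketch; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitWithGroups
{
    // No capture group: the delimiter is dropped from the result.
    public static string[] Plain(string input)
    {
        return Regex.Split(input, "-");
    }

    // Capture group: the captured delimiter text is added to the result.
    public static string[] Capturing(string input)
    {
        return Regex.Split(input, "(-)");
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Plain("a-b-c")));     // a,b,c
        Console.WriteLine(string.Join(",", Capturing("a-b-c"))); // a,-,b,-,c
    }
}
```

In the question's pattern the group (.) captures the last character of each run, which is why single characters show up between the runs.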
To be honest I don't exactly understand how that regex works, but you can "repair" the output very easily:
Regex reg = new Regex("(?<=(.))(?!\\1)", RegexOptions.Singleline);
var res = reg.Split("aaabbcddeee").Where((value, index) => index % 2 == 0 && value != "").ToArray();
You could do this easily with LINQ, but I don't think its runtime will be as good as regex.
A whole lot easier to read though.
var myString = "aaabbccccdeee";
var splits = myString.ToCharArray()
.GroupBy(chr => chr)
.Select(grp => new string(grp.Key, grp.Count()));
This returns the values ['aaa', 'bb', 'cccc', 'd', 'eee'].
However this won't work if you have a string like "aabbaa", you'll just get ["aaaa","bb"] as a result instead of ["aa","bb","aa"]
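If repeated runs such as "aabbaa" have to be kept apart, a plain loop that only compares each character with the previous one is one option. This is a sketch with a hypothetical helper name (SplitRuns), not a replacement for the regex answers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Runs
{
    // Hypothetical helper: groups only adjacent equal characters.
    public static List<string> SplitRuns(string input)
    {
        var runs = new List<string>();
        var current = new StringBuilder();
        foreach (char c in input)
        {
            // A different character closes the current run.
            if (current.Length > 0 && current[current.Length - 1] != c)
            {
                runs.Add(current.ToString());
                current.Clear();
            }
            current.Append(c);
        }
        if (current.Length > 0) runs.Add(current.ToString());
        return runs;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SplitRuns("aabbaa"))); // aa,bb,aa
    }
}
```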

Exclude first and last quotation of string in regex result

I'm running a little c# program where I need to extract the escape-quoted words from a string.
Sample code from linqpad:
string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
var pattern = "\".*?\"";
var result = Regex.Split(s, pattern);
result.Dump();
Input (actual input contains many more escaped even-number-of quotes):
"action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult"
expected result
"C:\\folder\\"
actual result (2 items)
"action = 0;
dir = "
_____
";
result"
I get exactly the opposite of what I require. How can I make the regex ignore the starting (and ending) quote of the actual string? Why does it include them in the search? I've used the regex from similar SO questions but still don't get the intended result. I only want to filter by escape quotes.
Instead of using Regex.Split, try Regex.Match.
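A sketch of what that could look like, assuming the quoted values never contain a double quote themselves. Regex.Matches is used here so that inputs with several quoted values are covered too; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedText
{
    // Hypothetical helper: collects the text inside each pair of double quotes.
    public static List<string> Extract(string s)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(s, "\"(.*?)\""))
        {
            // Groups[1] is the part between the quotes, without the quotes.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string s = "action = 0;\r\ndir = \"C:\\\\folder\\\\\";\r\nresult";
        foreach (string value in Extract(s))
        {
            Console.WriteLine(value);
        }
    }
}
```

Split returns the text around the matches, which is why the original code produced the opposite of the expected result; Match and Matches return the matched text itself.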
You don't need a regex. Simply use String.Split(';') and the second array element will contain the path you need. You can then Trim() it to get rid of the leading carriage return, Remove() the "\ndir = " part (7 characters), and Trim('"') to get rid of the quotes. Something like:
result = s.Split(';')[1].Trim("\r ".ToCharArray()).Remove(0, 7).Trim('"');

C# split by regex

I have a small problem that I don't know how to name, so I will do my best to explain it.
String text = "Random text over here boyz, I dunno what to do";
I want to use Split to take only over here boyz, for example. I want to split on the word text and on the comma (,) so that it shows me the whole text between those two strings. Any ideas?
Thank you,
Sagi.
From your comments I get that from this string:
foo bar id="baz" qux
You want to obtain the value baz, because it is in the id="{text}" pattern.
For that you can use a regular expression:
string result = Regex.Match(text, "id=\"(.*?)\"").Groups[1].Value;
Note that this will match any character. Also note that this will yield false positives, like fooid="bar", and that this won't match unquoted values.
So all in all, for parsing HTML, you should not use regular expressions. Try HtmlAgilityPack and an XPath expression.
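If a regex is used anyway, the fooid="bar" false positive mentioned above can be reduced with a word boundary. This is a sketch with a hypothetical helper name (GetId); it is still not an HTML parser, and an attribute such as data-id="x" would still match because - is not a word character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdAttribute
{
    // Hypothetical helper: returns the quoted value after a standalone id=, or null.
    public static string GetId(string text)
    {
        // \b requires a word boundary before "id"; [^"]* stops at the closing quote.
        Match m = Regex.Match(text, @"\bid=""([^""]*)""");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(GetId("foo bar id=\"baz\" qux")); // baz
        Console.WriteLine(GetId("fooid=\"bar\"") == null);  // True
    }
}
```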
There is a Split overload that accepts multiple string separators:
var rrr = text.Split(new string[] { ",", "text" }, StringSplitOptions.None);
If you would like to extract only the text between these two strings using regex you can do something like this:
var pattern = @"text(.*),";
var a = new Regex(pattern).Match(text);
var result = a.Groups[1].Value; // the text between "text" and the comma
You can use Regex class:
https://msdn.microsoft.com/pl-pl/library/ze12yx1d%28v=vs.110%29.aspx
But first of all (as was already said) you need to clarify for yourself how you will identify the string that you want.
In the first case you can use:
string stringResult;
if (text.Contains("over here boyz"))
stringResult = string.Empty;
else
stringResult = "over here boyz";
The second case can be solved with this code:
String text = "Random text over here boyz, I dunno what to do";
// Second case, with surrounding whitespace removed from the parts
var result = Regex.Split(text, " *text *| *, *");
foreach (var x in result)
{
Console.WriteLine(x);
}
// Second case, with surrounding whitespace kept in the parts
result = Regex.Split(text, "text|,");
foreach (var x in result)
{
Console.WriteLine(x);
}
You can practice writing regular expressions with tools such as http://www.regexbuddy.com/ or http://www.regexr.com/

How do I get quoted fields from a delimited string as a list of unquoted values using LINQ?

Original text line is:
"125"|"Bio Methyl"|"99991"|"OPT12"|"CB"|"1"|"12"|"5"|"23"
Expected string list is free of double quotes and split by |:
125
Bio Methyl
99991
The text may contain empty quoted strings as in (former "OPT12" value now empty ""):
"125"|"Bio Methyl"|"99991"|""|"CB"|"1"|"12"|"5"|"23"
So I checked these two questions and answers, QA1 and QA2, to derive my solution.
var eList = uEList.ElementAt(i).Split(BarDelimiter);
var xList = eList.ElementAt(0).Where(char.IsDigit).ToList();
Of course it doesn't work the way I need it to be since xList is a list with elements like this: xList(0) = 1, xList(1) = 2, xList(2) = 5
I do not want to write another line to join them because this doesn't look like a suitable solution. There has to be something better with LINQ right?
How about this:
// Based on OPs comment: preserve empty non-quoted entries.
var splitOptions = StringSplitOptions.None;
// Change to the line below if empty entries should be removed
//var splitOptions = StringSplitOptions.RemoveEmptyEntries;
var line = "\"125\"|\"Bio Methyl\"|\"99991\"|\"OPT12\"|\"CB\"|\"1\"|\"12\"|\"5\"|\"23\"";
var result = line
.Split(new[] { "|" }, splitOptions)
.Select(p => p.Trim('\"'))
.ToList();
Console.WriteLine(string.Join(", ", result));
The Split(...) statement splits the input into an array with parts like
{ \"99991\", \"OPT12\", ... };
The p.Trim('\"') statement removes the leading and trailing quote from each of the parts.
As an alternative to the trimming, if there's no " in your values, you could simply sanitize the input before splitting it. You can do so by replacing the " symbol by nothing (either "" or string.Empty).
Your Split code would then give the correct result afterwards:
string uEList = "\"125\"|\"Bio Methyl\"|\"99991\"|\"OPT12\"|\"CB\"|\"1\"|\"12\"|\"5\"|\"23\"";
var eList = uEList.Replace("\"", string.Empty).Split(BarDelimiter);
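The empty quoted field from the question can be checked against both approaches. The sketch below assumes BarDelimiter is the '|' character; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipeFields
{
    // Split first, then strip the surrounding quotes from each part.
    public static List<string> ViaTrim(string line)
    {
        return line.Split('|').Select(p => p.Trim('"')).ToList();
    }

    // Strip every quote first, then split (assumes no quotes inside values).
    public static List<string> ViaReplace(string line)
    {
        return line.Replace("\"", string.Empty).Split('|').ToList();
    }

    public static void Main()
    {
        string line = "\"125\"|\"Bio Methyl\"|\"\"|\"CB\"";
        Console.WriteLine(string.Join(",", ViaTrim(line)));    // 125,Bio Methyl,,CB
        Console.WriteLine(string.Join(",", ViaReplace(line))); // 125,Bio Methyl,,CB
    }
}
```

Both variants keep the empty field as an empty string, so the positions of the remaining values do not shift.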

Splitting a String using regex in c#

I have a program to compare text files. Takes in 2 files spits out 1. The input files have lines of data similar to this
tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True
tv_..............P006............................................................
tv_..............P007............................................................
etc etc.
For my initial purposes, I was splitting the lines on spaces to get the respective values. However, for the first field, tv_rocscores_DeDeP005M3TSub.csv, I only need P005 and not the rest. I cannot rely on the position either, because the position of P005 in the phrase is not the same for every file.
Any advice on how I can split this so that I can identify my first field with only P005?
Your question is a bit unclear. If you're looking for a pattern, say "P + three digits", e.g. "P005", you can use regular expressions:
String str = @"tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True";
String[] parts = str.Split(' ');
parts[0] = Regex.Match(parts[0], @"P\d\d\d").Value; // <- "P005"
To extract the desired part I would try something like this:
var parts = str.Split(' ');
var number = Regex.Match(parts[0], @".*?(?<num>P\d+).*?").Groups["num"].Value;
Or if you know it's only three digits you could change the regular expression to .*?(?<num>P\d{3}).*?
Hope that solves your problem :)
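Since the input files contain many lines, it may help to wrap the extraction in a small function that also reports when no code is found. This is a sketch under the assumption that the code is always "P" followed by exactly three digits; the names FirstField and ExtractCode are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstField
{
    // Hypothetical helper: returns the "P" + three digits code from the first
    // space-separated field, or null when the field contains no such code.
    public static string ExtractCode(string line)
    {
        string firstField = line.Split(' ')[0];
        Match m = Regex.Match(firstField, @"P\d{3}");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string line = "tv_rocscores_DeDeP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4 Conformity: True";
        Console.WriteLine(ExtractCode(line)); // P005
    }
}
```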
How about just checking if the first field contains P005?
bool hasP005 = field1.Contains("P005");
Your question isn't clear. Can't you just replace the first field with your string?
string[] parts = str.Split(' ');
parts[0] = "P005";
Are you trying to find the field that contains that string? If so, you can use some LINQ:
var field = s.Split(' ').Where(x => x.Contains("P005")).ToList()[0];
