How to split a comma delimited string with embedded quoted strings? - c#

I have a string and I want to split this string into an array as follows:
string stemp = "a,b,c,\"d,e f\",g,h";
array[0] = a
array[1] = b
array[2] = c
array[3] = d,e f
array[4] = g
array[5] = h
I have tried following syntax
string[] array = null;
array = stemp.Split(',');

This looks like CSV - which is not so simple to parse (when taking escapes into consideration).
I suggest using a CSV parser, such as the TextFieldParser class that lives in the Microsoft.VisualBasic.FileIO namespace.
There are many alternatives, such as FileHelpers.
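As a minimal sketch of the TextFieldParser suggestion, assuming the Microsoft.VisualBasic assembly is available to the project: the class and method names below (CsvLineSplitter, SplitCsvLine) are hypothetical and chosen only for illustration.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of any library.
public static class CsvLineSplitter
{
    // Splits one comma-delimited line, treating quoted fields as single values.
    public static string[] SplitCsvLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Calling CsvLineSplitter.SplitCsvLine("a,b,c,\"d,e f\",g,h") should yield six fields, with "d,e f" kept together as the fourth.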

Using a CSV parser is probably the right solution but you can also use a regular expression:
var stemp = @"a,b,c,""d,e f"",g,h";
var regex = new Regex(@"^(?:""(?<item>[^""]*)""|(?<item>[^,]*))(?:,(?:""(?<item>[^""]*)""|(?<item>[^,]*)))*$");
var array = regex
.Match(stemp)
.Groups["item"]
.Captures
.Cast<Capture>()
.Select(c => c.Value)
.ToArray();
Unfortunately, regular expressions tend to be incomprehensible, so here is a short description of the individual parts:
""(?<item>[^""]*)""
This matches "d,e f".
(?<item>[^,]*)
This matches a and b etc. Both expressions capture the relevant part to the named group item.
These expressions (let's call them A and B) are combined using an alternation construct and grouped using a non-capturing group:
(?:A|B)
Let's call this new expression C. The entire expression is then (again using a non-capturing group):
^C(?:,C)*$
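The composition of A, B and C can also be written out in code, which may make the full pattern easier to read. This is only a sketch: the class name QuotedCsvRegex and its members are hypothetical, and like the pattern above it does not handle escaped quotes inside a quoted field.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that builds the pattern from its named parts.
public static class QuotedCsvRegex
{
    // A: a quoted field; the quotes are matched, only the inside is captured.
    private const string A = @"""(?<item>[^""]*)""";
    // B: an unquoted field, i.e. any run of characters other than a comma.
    private const string B = @"(?<item>[^,]*)";
    // C: either kind of field, wrapped in a non-capturing group.
    private const string C = "(?:" + A + "|" + B + ")";
    // One field followed by zero or more ",field" repetitions, anchored at both ends.
    public const string Pattern = "^" + C + "(?:," + C + ")*$";

    public static string[] Split(string line)
    {
        var match = Regex.Match(line, Pattern);
        if (!match.Success)
        {
            return new string[0];
        }

        // Every field, quoted or not, was captured into the same named group.
        return match.Groups["item"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }
}
```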

Related

Get a number and string from string

I have a fairly simple problem, but I want to solve it in the best way possible. Basically, I have a string in this kind of format: <some letters><some numbers>, e.g. q1 or qwe12. What I want to do is get two strings from that (then I can convert the number part to an integer, or not, whatever). The first one is the "string part" of the given string, e.g. qwe, and the second one is the "number part", e.g. 12. There won't be a situation where the numbers and letters are mixed up, like qw1e2.
Of course, I know that I can use a StringBuilder, go through a for loop and check whether each character is a digit or a letter. Easy. But I don't think it is a really clear solution, so I am asking: is there a way, a built-in method or something like it, to do this in 1-3 lines? Or just without using a loop?
You can use a regular expression with named groups to identify the different parts of the string you are interested in.
For example:
string input = "qew123";
var match = Regex.Match(input, "(?<letters>[a-zA-Z]+)(?<numbers>[0-9]+)");
if (match.Success)
{
Console.WriteLine(match.Groups["letters"]);
Console.WriteLine(match.Groups["numbers"]);
}
You can try Linq as an alternative to regular expressions:
string source = "qwe12";
string letters = string.Concat(source.TakeWhile(c => c < '0' || c > '9'));
string digits = string.Concat(source.SkipWhile(c => c < '0' || c > '9'));
You can use the Where() extension method from System.Linq (https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where) to keep only the characters that are digits (or only those that are not), then convert the resulting IEnumerable<char> to a char array that can be used to create a new string:
string source = "qwe12";
string stringPart = new string(source.Where(c => !Char.IsDigit(c)).ToArray());
string numberPart = new string(source.Where(Char.IsDigit).ToArray());
MessageBox.Show($"String part: '{stringPart}', Number part: '{numberPart}'");
Source:
https://stackoverflow.com/a/15669520/8133067
If possible, add a space between the letters and numbers (q 3, zet 64, etc.) and use string.Split.
Otherwise, use the for loop; it isn't that hard.
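For comparison, the plain loop version might look roughly like this. It is a sketch that assumes the input is letters followed by digits, as the question states; LetterDigitSplitter and Split are hypothetical names.

```csharp
// Hypothetical helper showing the loop-based approach.
public static class LetterDigitSplitter
{
    // Returns { letters, digits } for input such as "qwe12".
    public static string[] Split(string source)
    {
        // Default to "no digits found": everything is the letter part.
        int firstDigit = source.Length;

        for (int i = 0; i < source.Length; i++)
        {
            if (char.IsDigit(source[i]))
            {
                firstDigit = i;
                break;
            }
        }

        return new[]
        {
            source.Substring(0, firstDigit),
            source.Substring(firstDigit)
        };
    }
}
```

The loop only locates the boundary; the two Substring calls then do the actual splitting, so no StringBuilder is needed.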
You can test as part of an aggregation:
var z = "qwe12345";
var b = z.Aggregate(new []{"", ""}, (acc, s) => {
if (Char.IsDigit(s)) {
acc[1] += s;
} else {
acc[0] += s;
}
return acc;
});
Assert.Equal(new [] {"qwe", "12345"}, b);

c# regex for number/number/string pattern

I am trying to find {a number} / { a number } / {a string} patterns. I can get number / number to work, but when I add / string it does not.
Example of what I'm trying to find:
15/00969/FUL
My regex:
Regex reg = new Regex(@"\d/\d/\w");
You should use the + quantifier, which means one or more times and applies to the pattern preceding it. I would also add word boundaries (\b) to match only whole words:
\b\d+/\d+/\w+\b
C# code (using a verbatim string literal, so that regular expressions can be copied and pasted from testing tools or services without having to escape backslashes):
var rx = new Regex(@"\b\d+/\d+/\w+\b");
If you want to specify the number of characters matched by a pattern, you can use {}:
\b\d{2}/\d{5}/\w{3}\b
And, finally, if you have only letters in the string, you can use \p{L} (or \p{Lu} to only capture uppercase letters) shorthand class in C#:
\b\d{2}/\d{5}/\p{L}{3}\b
Sample code (also featuring capturing groups introduced with unescaped ( and )):
var rx = new Regex(@"\b(\d{2})/(\d{5})/(\p{L}{3})\b");
var res = rx.Matches("15/00969/FUL").OfType<Match>()
.Select(p => new
{
first_number = p.Groups[1].Value,
second_number = p.Groups[2].Value,
the_string = p.Groups[3].Value
}).ToList();
Output: a single item with first_number = 15, second_number = 00969 and the_string = FUL.
Regex reg = new Regex(@"\d+/\d+/\w+");
Complete example:
Regex r = new Regex(@"(\d+)/(\d+)/(\w+)");
string input = "15/00969/FUL";
var m = r.Match(input);
if (m.Success)
{
string a = m.Groups[1].Value; // 15
string b = m.Groups[2].Value; // 00969
string c = m.Groups[3].Value; // FUL
}
You are missing the quantifiers in your regex.
If you want to match one or more items, use +.
If you already know the number of items you need to match, you can specify it using {x}, or {x,y} for a range (x and y being two numbers).
So your regex would become:
Regex reg = new Regex(@"\d+/\d+/\w+");
For example, if all the elements you want to match have this format ({2 digits}/{5 digits}/{3 letters}), you could write:
Regex reg = new Regex(@"\d{2}/\d{5}/\w{3}");
And that would match 15/00969/FUL
More info on the Regular Expressions can be found here
bool match = new Regex(@"[\d]+[/][\d]+[/][\w]+").IsMatch("15/00969/FUL"); //true
Regular Expression:
[\d]+ //one or more digits
[\w]+ //one or more alphanumeric characters
[/] // '/'-character

Omit unnecessary parts in string array

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
etc. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?
This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];
What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something that looks like this, and then grab the two groups that interest you:
(Type)="(\w+)"
This returns, in groups 1 and 2, the value Type and the word characters contained between the double quotes.
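A short sketch of how that pattern could be used from C#. The names TypeAttributeReader and ReadValue are hypothetical, and returning an empty string when nothing matches is just one possible choice.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the (Type)="(\w+)" pattern.
public static class TypeAttributeReader
{
    // Returns the quoted value after Type=, or an empty string if there is no match.
    public static string ReadValue(string input)
    {
        // Group 1 is the literal name "Type", group 2 is the value between the quotes.
        var match = Regex.Match(input, "(Type)=\"(\\w+)\"");
        return match.Success ? match.Groups[2].Value : string.Empty;
    }
}
```

Because \w+ needs at least one character, an empty result can only mean that the pattern did not match.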
Instead of splitting several times, why not just use a Regex:
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value;
Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.
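One way the error handling might look, as a sketch only. QuotedValueReader and TryGetFirstQuoted are hypothetical names, and the check assumes that a valid input contains both an opening and a closing double quote.

```csharp
// Hypothetical helper that guards the Split('"')[1] lookup.
public static class QuotedValueReader
{
    // Tries to read the text between the first pair of double quotes.
    public static bool TryGetFirstQuoted(string input, out string value)
    {
        value = string.Empty;
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        var parts = input.Split('"');

        // An opening and a closing quote produce at least three parts:
        // text before, text between, text after.
        if (parts.Length < 3)
        {
            return false;
        }

        value = parts[1];
        return true;
    }
}
```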
How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") A lookbehind (prefix match). It is not included in the result, but the match only succeeds if the string starts with Type="
.*? A non-greedy match: match as few characters as possible until
(?=\") A lookahead (suffix match). It is not included in the result, but the match only succeeds if the next character is "
Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the string; then you just have to find the closing quote mark and extract the contents. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

Multi Substring from long string

I have a long string, and I need to take out only the substrings that are between { and } and turn each into a JSON object.
This string
sys=t85,fggh{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"} dsdfg x=565,dfg
{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"}dfsdfg567
{"Name":"5038.zip","Folder":"Root",,"Download":"services/DownloadFile.ashx?"}sdfs
I have trash inside so I need to extract the substring of the data between { and }
My code is here, but I'm stuck: I can't remove the data that I have already taken.
List<JsonTypeFile> AllFiles = new List<JsonTypeFile>();
int lenght = -1;
while (temp.Length>3)
{
lenght = temp.IndexOf("}") - temp.IndexOf("{");
temp=temp.Substring(temp.IndexOf("{"), lenght+1);
temp.Remove(temp.IndexOf("{"), lenght + 1);
var result = JsonConvert.DeserializeObject<SnSafe.JsonTypeFile>(temp);
AllFiles.Add(result);
}
Or using regex you can get the strings like this:
var regex = new Regex("{([^}]*)}");
var matches = regex.Matches(str);
var list = (from object m in matches select m.ToString().Replace("{",string.Empty).Replace("}",string.Empty)).ToList();
var jsonList = JsonConvert.SerializeObject(list);
The str variable contains your string as you provided it in your question.
You can use a regex for this, but what I would do is use .Split('{') to split into sections, skip the first section, and then use .Split('}') to find the first portion of each section.
You can do this using LINQ
var data = temp
.Split('{')
.Skip(1)
.Select(v => v.Split('}').FirstOrDefault());
If I understand correctly, you just want to extract anything in-between the braces and ignore anything else.
The following regular expression should allow you to extract that info:
{[^}]*} (a brace, followed by anything that isn't a brace, followed by a brace)
You can extract all instances and then deserialize them using something along the lines of:
using System.Text.RegularExpressions;
...
List<JsonTypeFile> AllFiles = new List<JsonTypeFile>();
foreach(Match match in Regex.Matches(temp, "{[^}]*}"))
{
var result = JsonConvert.DeserializeObject<SnSafe.JsonTypeFile>(match.Value);
AllFiles.Add(result);
}

Extracting Numbers from String RegEx

I am really struggling with Regular Expressions and can't seem to extract the number from this string
"id":143331539043251,
I've tried with this ... but I'm getting compilation errors
var regex = new Regex(@""id:"\d+,");
Note that the full string contains other numbers I don't want. I want numbers between id: and the ending ,
Try this code:
var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use the extracted number yourNumber as a string, or parse it to a numeric type.
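Parsing the captured text might look like the sketch below. It uses long because the sample value 143331539043251 does not fit in an int; IdExtractor and TryGetId are hypothetical names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper combining the match with numeric parsing.
public static class IdExtractor
{
    // Finds "id":<digits> in the input and parses the digits as a long.
    public static bool TryGetId(string input, out long id)
    {
        id = 0;
        var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");

        // TryParse also guards against digit runs too long for a long.
        return match.Success
            && long.TryParse(
                match.Groups["num"].Value,
                NumberStyles.None,
                CultureInfo.InvariantCulture,
                out id);
    }
}
```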
If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, as in the .NET regex engine that would match any Unicode decimal digit (such as Arabic-Indic digits), not just 0-9.
Update, following comments on the question and on this answer - the following regex will match the pattern and place the matched numbers in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(@"""id"":143331539043251,", @"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
If you are open to using LINQ try the following (c#):
string stringVariable = "123cccccbb---556876---==";
var f = (from a in stringVariable.ToCharArray() where Char.IsDigit(a) == true select a);
var number = String.Join("", f);
