I feel dumb for asking what is most likely a silly question.
I am helping someone get the results he wants from his custom compiler, which reads all lines of an XML file into one string, so the input looks like the example below. Since he wants it to support referencing variables inside the array, the worst-case scenario would look like this:
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"
What I need is to find the first ";" after the "[" and "]" and split there, so that I end up with this:
"Var1 = [5,4,3,2];"
It will also have to support multiple "[" and "]", for example:
"Var2 = [5,Var1,[4],2];"
EDIT: There may also be data between the last "]" and the ";".
For example:
"Var2 = [5,[4],2]Var1;"
What can I do here? I'm kind of stuck.
You can try regular expressions, e.g.
string source = "Var1 = [5,4,3,2]abc123; Var2 = [2,8,6,Var1;4];";
// 1. The final (or only) chunk doesn't necessarily end with '];':
// "abc" -> "abc"
// 2. Each chunk has at least one symbol besides the '];' terminator
string pattern = ".+?(][a-zA-Z0-9]*;|$)";
var items = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value.Trim()) // drop the space left after each ';'
    .ToArray();
Console.Write(string.Join(Environment.NewLine, items));
Outcome:
Var1 = [5,4,3,2]abc123;
Var2 = [2,8,6,Var1;4];
^([^;]+);
This regex should work as long as no ";" appears inside the brackets of the first statement.
You can use it like here:
string[] lines =
{
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];",
"Var2 = [5,[4],2]Var1; Var2 = [2,8,6,Var1;4];"
};
Regex pattern = new Regex(@"^([^;]+);");
foreach (string s in lines){
Match match = pattern.Match(s);
if (match.Success)
{
Console.WriteLine(match.Value);
}
}
The explanation is:
^ means the match starts at the beginning of the string
[^;] is anything but a semicolon
+ means repeated one or more times
; means followed by a semicolon
This will find Var1 = [5,4,3,2]; as well as Var2 = [5,[4],2]Var1;
public static string Extract(string str, char splitOn)
{
var split = false;
var count = 0;
var bracketCount = 0;
foreach (char c in str)
{
count++;
if (split && c == splitOn)
return str.Substring(0, count);
if (c == '[')
{
bracketCount++;
split = false;
}
else if (c == ']')
{
bracketCount--;
if (bracketCount == 0)
{
split = true;
}
else if (bracketCount < 0)
throw new FormatException(); //?
}
}
return str;
}
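The bracket-counting idea above can be extended to split the whole string into statements, not just extract the first one. The following is a minimal sketch under stated assumptions: the names StatementSplitter and SplitStatements are hypothetical, only "[" and "]" nest, and a ";" counts as a terminator only when no bracket is open.

```csharp
using System;
using System.Collections.Generic;

public static class StatementSplitter
{
    // Hypothetical helper: splits on ';' only while no '[' is open.
    public static List<string> SplitStatements(string source)
    {
        var statements = new List<string>();
        int depth = 0; // number of currently open '['
        int start = 0; // index where the current statement begins

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '[')
            {
                depth++;
            }
            else if (c == ']')
            {
                depth--;
                if (depth < 0)
                    throw new FormatException("Unmatched ']' at index " + i);
            }
            else if (c == ';' && depth == 0)
            {
                // Keep the terminating ';' with its statement.
                statements.Add(source.Substring(start, i - start + 1).Trim());
                start = i + 1;
            }
        }

        // Anything left over has no terminating ';'.
        string rest = source.Substring(start).Trim();
        if (rest.Length > 0)
            statements.Add(rest);

        return statements;
    }

    public static void Main()
    {
        foreach (string s in SplitStatements("Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"))
            Console.WriteLine(s);
    }
}
```

Because the ";" inside [2,8,6,Var1;4] is seen while the depth is 1, it is skipped, and data between the last "]" and the ";" (as in [5,[4],2]Var1;) stays attached to its statement.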
I need to split a CSV file by comma, except where a column is enclosed in quote marks. However, what I have here does not seem to achieve that: commas inside columns are being split into separate array items.
public List<string> GetData(string dataFile, int row)
{
try
{
var lines = File.ReadAllLines(dataFile).Select(a => a.Split(';'));
var csv = from line in lines select (from piece in line select piece.Split(',')).ToList();
var foo = csv.ToList();
var result = foo[row][0].ToList();
return result;
}
catch
{
return null;
}
}
private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
public static string Escape(string s)
{
if (s.Contains(QUOTE))
s = s.Replace(QUOTE, ESCAPED_QUOTE);
if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
s = QUOTE + s + QUOTE;
return s;
}
I am not sure where I can use my escape function in this case.
Example:
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths"
The string Advanced, Maths is being split into two different array items, which I don't want.
You could use regex, LINQ, or just loop through each character and use Booleans to track what the current behaviour should be. This question actually got me thinking, as I'd previously just looped through and acted on each character. Here is a LINQ way of breaking an entire CSV document up, assuming the end of a line can be found with ';':
private static void Main(string[] args)
{
string example = "\"Hello World, My name is Gumpy!\",20,male;My sister's name is Amy,29,female";
var result1 = example.Split(';')
.Select(s => s.Split('"')) // This leaves anything inside quotation marks at odd indexes
.Select(sl => sl.Select((ss, index) => index % 2 == 0 ? ss.Split(',') : new string[] { ss })) // if it's an even number split by a comma
.Select(sl => sl.SelectMany(sc => sc));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}
Not sure how this performs, but you can solve that with Linq.Aggregate like this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static IEnumerable<string> SplitIt(
char[] splitters,
string text,
StringSplitOptions opt = StringSplitOptions.None)
{
bool inside = false;
var result = text.Aggregate(new List<string>(), (acc, c) =>
{
// this will check each char of your given text
// and accumulate it in the (empty starting) string list
// your splitting chars will lead to a new item put into
// the list if they are not inside. inside starts as false
// and is flipped anytime it hits a "
// at end we either return all that was parsed or only those
// that are neither null nor "" depending on given opt's
if (!acc.Any()) // nothing in yet
{
if (c != '"' && (!splitters.Contains(c) || inside))
acc.Add("" + c);
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
if (c != '"' && (!splitters.Contains(c) || inside))
acc[acc.Count - 1] = (acc[acc.Count - 1] ?? "") + c;
else if (c == '"')
inside = !inside;
else if (!inside && splitters.Contains(c)) // ",bla"
acc.Add(null);
return acc;
}
);
if (opt == StringSplitOptions.RemoveEmptyEntries)
return result.Where(r => !string.IsNullOrEmpty(r));
return result;
}
public static void Main()
{
var s = ",,Degree,Graduate,08-Dec-17,Level 1,\"Advanced, Maths\",,";
var spl = SplitIt(new[]{','}, s);
var spl2 = SplitIt(new[]{','}, s, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(string.Join("|", spl));
Console.WriteLine(string.Join("|", spl2));
}
}
Output:
|Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths||
Degree|Graduate|08-Dec-17|Level 1|Advanced, Maths
The function gets comma separated fields within a string, excluding commas embedded in a quoted field
The assumptions
It should return empty fields ,,
There are no quotes within a quote field (as per the example)
The method
It uses a for loop with i as a placeholder for the start of the current field
It scans for the next comma or quote and if it finds a quote it scans for the next comma to create the field
It needed to be efficient otherwise we would use regex or Linq
The OP didn't want to use a CSV library
Note: There is no error checking. Scanning each character would be faster; this version was just easier to understand.
Code
public List<string> GetFields(string line)
{
var list = new List<string>();
for (var i = 0; i < line.Length; i++)
{
var firstQuote = line.IndexOf('"', i);
var firstComma = line.IndexOf(',', i);
if (firstComma >= 0)
{
// first comma is before the first quote, then its just a standard field
if (firstComma < firstQuote || firstQuote == -1)
{
list.Add(line.Substring(i, firstComma - i));
i = firstComma;
continue;
}
// We have found quote so look for the next comma afterwards
var nextQuote = line.IndexOf('"', firstQuote + 1);
var nextComma = line.IndexOf(',', nextQuote + 1);
// if we found a comma, then we have found the end of this field
if (nextComma >= 0)
{
list.Add(line.Substring(i, nextComma - i));
i = nextComma;
continue;
}
}
list.Add(line.Substring(i)); // if were are here there are no more fields
break;
}
return list;
}
Tests 1
Degree,Graduate,08-Dec-17,Level 1,"Advanced, Maths",another
Degree
Graduate
08-Dec-17
Level 1
"Advanced, Maths"
another
Tests 2
,Degree,Graduate,08-Dec-17,\"asdasd\",Level 1,\"Advanced, Maths\",another
<Empty Line>
Degree
Graduate
08-Dec-17
"asdasd"
Level 1
"Advanced, Maths"
another
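The note above mentions that scanning each character is an alternative to the IndexOf approach. Here is a minimal sketch of that idea, with hypothetical names (CsvLine, ParseFields). Unlike GetFields, it strips the surrounding quotes, and it assumes that a doubled quote ("") inside a quoted field stands for one literal quote, which is the convention the Escape function in the question produces. It does not handle fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: one pass over the line, tracking whether we are inside quotes.
    public static List<string> ParseFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // "" inside quotes is one literal quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas are ordinary characters here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote, not part of the value
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field, possibly empty
        return fields;
    }

    public static void Main()
    {
        string line = "Degree,Graduate,08-Dec-17,Level 1,\"Advanced, Maths\"";
        foreach (string field in ParseFields(line))
            Console.WriteLine(field);
    }
}
```

Empty fields, including a trailing one after a final comma, are returned as empty strings.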
Consider the following English phrase:
FRIEND AND COLLEAGUE AND (FRIEND OR COLLEAGUE AND (COLLEAGUE AND FRIEND AND FRIEND))
I want to be able to programmatically change arbitrary phrases, such as above, to something like:
SELECT * FROM RelationTable R1 JOIN RelationTable R2 ON R2.RelationName etc etc WHERE
R2.RelationName = FRIEND AND R2.RelationName = COLLEAGUE AND (R3.RelationName = FRIEND,
etc. etc.
My question is: how do I take the initial string, strip it of the following words and symbols: AND, OR, (, ),
then change each remaining word and create a new string?
I can do most of it, but my main problem is that if I do a string.split and only get the words I care for, I can't really replace them in the original string because I lack their original index. Let me explain in a smaller example:
string input = "A AND (B AND C)"
Splitting the string on spaces, parentheses, etc. gives: A,B,C
input.Replace("A", "MyRandomPhrase")
But there is an A in AND.
So I moved into trying to create a regular expression that matches exact words, post split, and replaces. It started to look like this:
"(\(|\s|\))*" + itemOfInterest + "(\(|\s|\))+"
Am I on the right track, or am I overcomplicating things? Thanks!
You can try using Regex.Replace with the \b word-boundary anchor:
string input = "A AND B AND (A OR B AND (B AND A AND A))";
string pattern = "\\bA\\b";
string replacement = "MyRandomPhrase";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
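The word-boundary approach can be applied to several terms in one pass, which avoids re-scanning text that has already been replaced. This is a rough sketch with hypothetical names (TermRewriter, ReplaceTerms). It assumes every term is a plain word made of letters and digits, so that \b behaves as expected at both ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermRewriter
{
    // Hypothetical helper: replaces each whole-word key with its mapped value.
    public static string ReplaceTerms(string input, IDictionary<string, string> replacements)
    {
        if (replacements.Count == 0)
            return input;

        // Builds a pattern such as \b(?:A|B)\b, escaping each term.
        string alternatives = string.Join("|", replacements.Keys.Select(k => Regex.Escape(k)));
        string pattern = @"\b(?:" + alternatives + @")\b";

        // The evaluator looks up the replacement for whichever term matched.
        return Regex.Replace(input, pattern, m => replacements[m.Value]);
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>
        {
            { "A", "MyRandomPhrase" },
            { "B", "AnotherPhrase" }
        };
        Console.WriteLine(ReplaceTerms("A AND (B AND C)", map));
    }
}
```

The "A" in "AND" is left alone because there is no word boundary between "A" and "N".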
class Program
{
static void Main(string[] args)
{
string text = "A AND (B AND C)";
List<object> result = ParseBlock(text);
Console.ReadLine();
}
private static List<object> ParseBlock(string text)
{
List<object> result = new List<object>();
int bracketsCount = 0;
int lastIndex = 0;
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (c == '(')
bracketsCount++;
else if (c == ')')
bracketsCount--;
if (bracketsCount == 0)
if (c == ' ' || i == text.Length - 1)
{
string substring = text.Substring(lastIndex, i + 1 - lastIndex).Trim();
object itm = substring;
if (substring[0] == '(')
itm = ParseBlock(substring.Substring(1, substring.Length - 2));
result.Add(itm);
lastIndex = i;
}
}
return result;
}
}
How can I split comma separated strings with quoted strings that can also contain commas?
Example input:
John, Doe, "Sid, Nency", Smith
Expected output:
John
Doe
Sid, Nency
Smith
Splitting by commas was OK, but now I have a requirement that strings like "Sid, Nency" are allowed. I tried to use a regex to split such values. The regex ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" comes from a Java question and does not work well in my .NET code: it duplicates some strings, finds extra results, etc.
So what is the best way to split such strings?
It's because of the capture group. Just turn it into a non-capture group:
",(?=(?:[^""]*""[^""]*"")*[^""]*$)"
      ^^
The capture group is including the captured part in your results.
var regexObj = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");
// ForEach is defined on List<T>, so materialize the sequence first
regexObj.Split(input).Select(s => s.Trim('\"', ' ')).ToList().ForEach(Console.WriteLine);
And just trim the results.
Just go through your string. As you go, keep track of whether you are
inside a quoted "block" or not. If you are, don't treat the comma as
a separator; otherwise do. It's a simple algorithm, and I would write
it myself. When you encounter the first " you enter a block; when you
encounter the next " you leave it, and so on.
So you can do it in one pass through your string.
import java.util.ArrayList;
public class Test003 {
public static void main(String[] args) {
String s = " John, , , , \" Barry, John \" , , , , , Doe, \"Sid , Nency\", Smith ";
StringBuilder term = new StringBuilder();
boolean inQuote = false;
boolean inTerm = false;
ArrayList<String> terms = new ArrayList<String>();
for (int i=0; i<s.length(); i++){
char ch = s.charAt(i);
if (ch == ' '){
if (inQuote){
if (!inTerm) {
inTerm = true;
}
term.append(ch);
}
else {
if (inTerm){
terms.add(term.toString());
term.setLength(0);
inTerm = false;
}
}
}else if (ch== '"'){
term.append(ch); // comment this out if you don't need it
if (!inTerm){
inTerm = true;
}
inQuote = !inQuote;
}else if (ch == ','){
if (inQuote){
if (!inTerm){
inTerm = true;
}
term.append(ch);
}else{
if (inTerm){
terms.add(term.toString());
term.setLength(0);
inTerm = false;
}
}
}else{
if (!inTerm){
inTerm = true;
}
term.append(ch);
}
}
if (inTerm){
terms.add(term.toString());
}
for (String t : terms){
System.out.println("|" + t + "|");
}
}
}
I use the following code within my Csv Parser class to achieve this:
private string[] ParseLine(string line)
{
List<string> results = new List<string>();
bool inQuotes = false;
int index = 0;
StringBuilder currentValue = new StringBuilder(line.Length);
while (index < line.Length)
{
char c = line[index];
switch (c)
{
case '\"':
{
inQuotes = !inQuotes;
break;
}
default:
{
if (c == ',' && !inQuotes)
{
results.Add(currentValue.ToString());
currentValue.Clear();
}
else
currentValue.Append(c);
break;
}
}
++index;
}
results.Add(currentValue.ToString());
return results.ToArray();
} // eo ParseLine
If you find the regular expression too complex you can do it like this:
string initialString = "John, Doe, \"Sid, Nency\", Smith";
IEnumerable<string> splitted = initialString.Split('"');
splitted = splitted.SelectMany((str, index) => index % 2 == 0 ? str.Split(',') : new[] { str });
splitted = splitted.Where(str => !string.IsNullOrWhiteSpace(str)).Select(str => str.Trim());
I have a string like
a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h
After splitting by , I expect to get 5 items; any , inside braces or brackets is ignored.
a
[1,2,3,{4,5},6]
b
{c,d,[e,f],g}
h
There is no whitespace in the string. Is there a regular expression that can make this happen?
You could use this:
var input = "a,[1,2,3,{4,5}],b,{c,d,[e,f]},g";
var result =
(from Match m in Regex.Matches(input, @"\[[^]]*]|\{[^}]*}|[^,]+")
select m.Value)
.ToArray();
This will find any matches like:
[ followed by any characters other than ], then terminated by ]
{ followed by any characters other than }, then terminated by }
One or more characters other than ,
This will work for your sample input, but it cannot handle nested groups like [1,[2,3],4] or {1,{2,3},4}. For that, I'd recommend something a bit more powerful than regular expressions. Since you've mentioned in your comments that you're trying to parse JSON, I'd recommend you check out the excellent Json.NET library.
Regular expressions (*) cannot be used to parse nested structures (**).
(*) True regular expressions without non-regular extensions
(**) Nested structures of arbitrary depth and interleaving
But parsing by hand is not that difficult. First you need to find the , that are not in brackets or braces.
string input = "a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h";
var delimiterPositions = new List<int>();
int bracesDepth = 0;
int bracketsDepth = 0;
for (int i = 0; i < input.Length; i++)
{
switch (input[i])
{
case '{':
bracesDepth++;
break;
case '}':
bracesDepth--;
break;
case '[':
bracketsDepth++;
break;
case ']':
bracketsDepth--;
break;
default:
if (bracesDepth == 0 && bracketsDepth == 0 && input[i] == ',')
{
delimiterPositions.Add(i);
}
break;
}
}
And then split the string at these positions.
public List<string> SplitAtPositions(string input, List<int> delimiterPositions)
{
var output = new List<string>();
for (int i = 0; i < delimiterPositions.Count; i++)
{
int index = i == 0 ? 0 : delimiterPositions[i - 1] + 1;
int length = delimiterPositions[i] - index;
string s = input.Substring(index, length);
output.Add(s);
}
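// With no delimiters at all, the whole input is the single (last) item
int lastStart = delimiterPositions.Count == 0 ? 0 : delimiterPositions.Last() + 1;
string lastString = input.Substring(lastStart);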
output.Add(lastString);
return output;
}
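The two steps above (find the top-level delimiters, then cut at them) can also be merged into a single pass. A minimal sketch, with hypothetical names (NestedSplitter, SplitTopLevel): it assumes well-formed input and uses one shared depth counter for both bracket kinds, so it does not verify that a "[" is closed by a "]" and not by a "}".

```csharp
using System;
using System.Collections.Generic;

public static class NestedSplitter
{
    // Hypothetical helper: splits on the separator only at nesting depth 0.
    public static List<string> SplitTopLevel(string input, char separator = ',')
    {
        var parts = new List<string>();
        int depth = 0; // how many '[' or '{' are currently open
        int start = 0; // index where the current item begins

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '[' || c == '{')
            {
                depth++;
            }
            else if (c == ']' || c == '}')
            {
                depth--;
            }
            else if (c == separator && depth == 0)
            {
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        parts.Add(input.Substring(start)); // the item after the last separator
        return parts;
    }

    public static void Main()
    {
        foreach (string part in SplitTopLevel("a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h"))
            Console.WriteLine(part);
    }
}
```

An input without any top-level separator comes back as a single item.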
Even if it looks ugly and there is no regex involved (not sure if it's a requirement or a nice-to-have in the original question), this alternative should work:
class Program
{
static void Main(string[] args)
{
var input = "a,[1,2,3,{4,5}],b,{c,d,[e,f]},g";
var output = "<root><n>" +
input.Replace(",", "</n><n>")
.Replace("[", "<n1><n>")
.Replace("]", "</n></n1>")
.Replace("{", "<n2><n>")
.Replace("}", "</n></n2>") +
"</n></root>";
var elements = XDocument
.Parse(output, LoadOptions.None)
.Root.Elements()
.Select(e =>
{
if (!e.HasElements)
return e.Value;
else
{
return e.ToString()
.Replace(" ", "")
.Replace("\r\n", "")
.Replace("</n><n>", ",")
.Replace("<n1>", "[")
.Replace("</n1>", "]")
.Replace("<n2>", "{")
.Replace("</n2>", "}")
.Replace("<n>", "")
.Replace("</n>", "")
.Replace("\r\n", "")
;
}
}).ToList();
}
}
I'm trying to remove the char '.' from a string except for its last occurrence; for example, the string
12.34.56.78
should become
123456.78
I'm using this loop:
while (value != null && value.Count(c => c == '.') > 1)
{
value = value.Substring(0, value.IndexOf('.')) + value.Substring(value.IndexOf('.') + 1);
}
I wonder if there is a cleaner way (maybe using LINQ?) to do this without an explicit loop?
(I know there is a very similar question, but it is about Perl, where things are quite different.)
int lastIndex = value.LastIndexOf('.');
if (lastIndex > 0)
{
value = value.Substring(0, lastIndex).Replace(".", "")
+ value.Substring(lastIndex);
}
Perhaps a mixture of string methods and Linq:
string str = "12.34.56.78";
Char replaceChar = '.';
int lastIndex = str.LastIndexOf(replaceChar);
if (lastIndex != -1)
{
IEnumerable<Char> chars = str
.Where((c, i) => c != replaceChar || i == lastIndex);
str = new string(chars.ToArray());
}
I would do it this way:
search for the last '.' ;
substring [0 .. indexOfLastDot] ;
remove in place any '.' of the substring
concatenate the substring with the rest of the original string, [indexOfLastDot .. remaining]
OR
search for the last '.'
for each enumerated char of the string
if it’s a '.' and i ≠ indexOfLastDot, remove it
var splitResult = v.Split(new char[] { '.' }).ToList();
var lastSplit = splitResult.Last();
splitResult.RemoveAt(splitResult.Count - 1);
var output = string.Join("", splitResult) + "." + lastSplit;
I would do it that way. The neatest way isn't always the shortest way.
Something like this should do the trick. Whether it is "good" or not is another matter. Note also that there is no error checking. Might want to check for null or empty string and that the string has at least one "." in it.
string numbers = "12.34.56.78";
var parts = numbers.Split(new char[] { '.' });
// Join everything before the last part, then append "." and the last part
string newNumbers = String.Join("", parts.Take(parts.Length - 1)
    .Concat(new[] { ".", parts.Last() }));
I don't claim that this would have great performance characteristics for long strings, but it does use Linq ;-)
You do not have to use a loop:
//string val = "12345678";
string val = "12.34.56.78";
string ret = val;
int index = val.LastIndexOf(".");
if (index >= 0)
{
ret = val.Substring(0, index).Replace(".", "") + val.Substring(index);
}
Debug.WriteLine(ret);
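Another loop-free option, not taken from the answers above, is a regular expression with a lookahead: remove every '.' that is still followed by another '.' later in the string. This is a sketch with hypothetical names (DotCleaner, KeepLastDot), assuming the value is a single line, since '.' in the pattern does not match a newline by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotCleaner
{
    // Hypothetical helper: keeps only the last '.' in the value.
    public static string KeepLastDot(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        // \.        a literal dot
        // (?=.*\.)  only if at least one more dot follows it
        return Regex.Replace(value, @"\.(?=.*\.)", "");
    }

    public static void Main()
    {
        Console.WriteLine(KeepLastDot("12.34.56.78"));
    }
}
```

For long strings the LastIndexOf version is likely cheaper, because the lookahead rescans the rest of the string for every dot it finds.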