I am using Sprache to parse a legacy file.
The file has the following structure very similar to a key and value dictionary:
Entity
{
propertyA simple
propertyB 10-1
propertyC "first"
propertyD "I am a line that spawns
to another line"
propertyE "second"
propertyF 1,2,3,4,5,6,\
7,8,9,10,11,\
12,13,14
propertyG "one","two","three",\
"four","five","six","seven",\
"eight","nine"
}
I am able to process the file correctly but not when it has the "\" line continuation.
The only dirty hack I did is to replace the string sent as input to the parser and replace the characters so there is no line continuation:
public static Document ParseLegsacyFile(string input)
{
// HACK
return Document.Parse(input.Replace("\\\r\n", string.Empty));
}
I don't want to carry out this technical debt...
Is there anyway to instruct the parser to ignore the pattern "\" and "\r\n" and replace to a string empty?
I already tried the Except (with Or), Return and Then without much success.
Here is part of the parsers i am using. The following ones are just for the "value" part:
public static readonly Parser<GenericObject> Value =
from value in Parse.AnyChar.Until(Parse.LineEnd).Text()
select new GenericObject(value);
private static readonly Parser<GenericString> SingleString =
from result in (from open in Parse.Char(Quote)
from content in Parse.CharExcept(Quote).Many().Text()
from close in Parse.Char(Quote)
select content).Token()
select new GenericString(result);
public static readonly Parser<GenericString> StringValue =
from value in SingleString .DelimitedBy(Parse.Char(Char.Parse(Comma)))
select new StringLiteral(string.Join(Comma, value));
Old Question, but answer may help someone:
You can remove the continuation char "\" and combine their lines using Sprache as below:
var text = #"...text here...";
var result= RemoveSlash(text).ToList();
foreach (var l in result)
Console.WriteLine(l);
IEnumerable<string> RemoveSlash(string text)
{
// return;
Parser<string> Eol = Parse.String("\\" + Environment.NewLine).Text();
var oneLine = Parse.AnyChar.Until(Parse.LineEnd).Text();
var multiLine =
from l in Parse.AnyChar.Until(Eol).Text().Many()
from c in oneLine.Once()
let m = string.Join("", l.Concat(c))
select m;
var lines = multiLine.Or(oneLine);
var result = lines.Many().Parse(text);
return result;
}
Try it
Related
For the purposes of my program, I need to read in data from a CSV file, process it, and then create a new batch of CSV files based on the data I read in. I am using the CsvHelper library if that makes a difference.
The program works perfectly, except for one part. I need my output CSV files to have each field wrapped in double quotes. Normally I would just use the escape mechanism for c#, but I ran into a bizarre issue that I can't seem to work around. Whenever I try to escape a " mark, I end up getting more than just the one.
Desired output example:
date_start,date_end,current_year
"2017-08-01","2018-06-20","2017"
"2017-08-01","2018-06-20","2017"
Observed output:
date_start,date_end,current_year
"""2017-08-01""","""2018-06-20""","""2017"""
"""2017-08-01""","""2018-06-20""","""2017"""
It puts in a pair of double quotes when I am just looking for one. I've looked around online and tried several different approaches to try and get just the one double quote, but I can't seem to make it work. Here is the relevant code and the different methods I have tried:
HashSet<CustomDataWrapper> hs = new HashSet<CustomDataWrapper>();
foreach (CustomDataType record in records) {
string quote = "\"";
String method1 = (String)(record.date_start);
String method2 = quote + record.date_start + quote;
String method3 = "\"" + record.date_start + "\"";
String method4 = "\U0022" + record.date_start + "\U0022"
hs.Add(data);
}
csvWriter.WriteRecords(hs);
I am honestly not sure what's going on here, as I've tried adding the quote to just one side of the string and got this output:
date_start,date_end,current_year
"2017-08-01""","2018-06-20""","2017"""
"2017-08-01""","2018-06-20""","2017"""
Any help would be greatly appreciated. Thank you!
Stop adding quotes to the fields. Try creating configuration object and set the 'QuoteAllFields' to true and pass this object to CSVWriter as below
var config = new CsvHelper.Configuration.CsvConfiguration();
config.QuoteAllFields = true;
using (var csvReader = new CsvHelper.CsvReader(streamReader, config))
{
}
Hope this works.
Here's what's happening. Let's start with this code:
var records = new []
{
new { H1 = "Foo", H2 = "Bar" },
new { H1 = "Qaz", H2 = "Waz" }
};
using (var tw = new System.IO.StringWriter())
{
using (var cw = new CsvHelper.CsvWriter(tw))
{
cw.WriteRecords(records);
var output = tw.ToString();
Console.WriteLine(output);
}
}
If I run that I get this output:
H1,H2
Foo,Bar
Qaz,Waz
If I insert one double quote into one field of the source data, like so:
var records = new []
{
new { H1 = "Foo", H2 = "Bar" },
new { H1 = "Qaz\"", H2 = "Waz" }
};
...then the output becomes:
H1,H2
Foo,Bar
"Qaz""",Waz
The library knows to be able to output a double quote as part of the data then it must wrap the entire field in double quotes and it must escape the double quote within the field, turning a single " into "".
I've the following string that I get from a method. I would like to parse it and make pairs. The order of input string will not change.
INPUT:
ku=value1,ku=value2,ku=value3,ku=value4,ku=value5,lu=value6,lu=value7,lu=value8,lu=value9
OUTPUT
Name value1
Title value2
School value3
.
.
.
Age value9
I think I can read through the string and assign value to the left hand side as I go and so on. However, I am very new to C#.
Use string.Split and split imput string to list key-value pair then split each pair to key and value. Tahts all.
You can do something like this:
void Main()
{
string str = "ku=value1,ku=value2,ku=value3,ku=value4,ku=value5,lu=value6,lu=value7,lu=value8,lu=value9";
var tempStr = str.Split(',');
var result = new List<KeyValue>();
foreach(var val in tempStr)
{
var temp = val.Split('=');
var kv = new KeyValue();
kv.Name = temp[0];
kv.Value = temp[1];
result.Add(kv);
}
}
public class KeyValue
{
public string Name {get;set;}
public string Value{get;set;}
}
If you don't need the first part you can do this using String split as follow
Split String on , using String.Split method creating sequence of string {"ku=value1","ku=value2",...}
Use Linq's Select method to apply an additional transformation
Use Split again on each item on the '=' character
Select the item to the right of the '=', at index 1 of the newly split item
Loop through everything and print your results
Here's the code
var target = "ku=value1,ku=value2,ku=value3,ku=value4,ku=value5,lu=value6,lu=value7,lu=value8,lu=value9";
var split = target.Split(',').Select(a=>a.Split('=')[1]).ToArray();
var names = new[]{"Name","Title","School",...,"Age"};
for(int i=0;i<split.Length;i++)
{
Console.WriteLine(names[i]+"\t"+split[i]);
}
If you want to find out more about how to use these methods you can look at the MSDN documentation for them :
String.Split(char[]) Method
Enumerable.Select Method
I suggest to try this way. Split() plus regular expression
string inputString = "ku=value1,ku=value2,ku=value3,ku=value4,ku=value5,lu=value6,lu=value7,lu=value8,lu=value9";
string pattern = "(.*)=(.*)";
foreach(var pair in inputString.Split(','))
{
var match = Regex.Match(pair,pattern);
Console.WriteLine(string.Format("{0} {1}",match.Groups[1].Value, match.Groups[2].Value));
}
I'm searching for a solution to this case:
I have a Method inside a DLL that receive a string that contains some words as "placeholders/parameters" that will be replaced by a result of another specific method (inside dll too)
Too simplificate: It's a query string received as an argument to be on a method inside a DLL, where X word that matchs a specifc case, will be replaced.
My method receive a string that could be like this:
(on .exe app)
string str = "INSERT INTO mydb.mytable (id_field, description, complex_number) VALUES ('#GEN_COMPLEX_ID#','A complex solution', '#GEN_COMPLEX_ID#');"
MyDLLClass.MyMethod(str);
So, the problem is: if i replace the #GEN_COMPLEX_ID# on this string, wanting that a different should be on each match, it not will happen because the replaced executes the function in a single shot (not step by step). So, i wanna help to implement this: a step by step replace of any text (like Find some word, replace, than next ... replace ... next... etc.
Could you help me?
Thanks!
This works pretty well for me:
string yourOriginalString = "ab cd ab cd ab cd";
string pattern = "ab";
string yourNewDescription = "123";
int startingPositionOffset = 0;
int yourOriginalStringLength = yourOriginalString.Length;
MatchCollection match = Regex.Matches(yourOriginalString, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
foreach (Match m in match)
{
yourOriginalString = yourOriginalString.Substring(0, m.Index+startingPositionOffset) + yourNewDescription + yourOriginalString.Substring(m.Index + startingPositionOffset+ m.Length);
startingPositionOffset = yourOriginalString.Length - yourOriginalStringLength;
}
If what you're asking is how to replace each placeholder with a different value, you can do it using the Regex.Replace overload which accepts a MatchEvaluator delegate, and executes it for each match:
// conceptually, something like this (note that it's not checking if there are
// enough values in the replacementValues array)
static string ReplaceMultiple(
string input, string placeholder, IEnumerable<string> replacementValues)
{
var enumerator = replacementValues.GetEnumerator();
return Regex.Replace(input, placeholder,
m => { enumerator.MoveNext(); return enumerator.Current; });
}
This is, of course, presuming that all placeholders look the same.
Pseudo-code
var split = source.Split(placeholder); // create array of items without placeholders
var result = split[0]; // copy first item
for(int i = 1; i < result.Length; i++)
{
bool replace = ... // ask user
result += replace ? replacement : placeholder; // to put replacement or not to put
result += split[i]; // copy next item
}
you should use the split method like this
string [] placeholder = {"#Placeholder#"} ;
string[] request = cd.Split(placeholder, StringSplitOptions.RemoveEmptyEntries);
StringBuilder requetBuilding = new StringBuilder();
requetBuilding.Append(request[0]);
int index = 1;
requetBuilding.Append("Your place holder replacement");
requetBuilding.Append(request[index]);
index++; //next replacement
// requetBuilding.Append("Your next place holder replacement");
// requetBuilding.Append(request[index]);
Need some ideas how to solve this problem.
I have a template file what describes the line in the text file. For example:
Template
[%f1%]|[%f2%]|[%f3%]"[%f4%]"[%f5%]"[%f6%]
Text file
1234|1234567|123"12345"12"123456
Now i need to read in the fields from the text file. In the template file fields are described with [%some name%]. Allso in the template file there is set what the field separators are, in this example here there are | and ". The lenght of the fields can change through different files but the separators will stay the same. What would be the best way to read in the template and by template read in the text file?
EDIT: Text file has multiple rows, like this:
1234|1234567|123"12345"12"123456"\r\n
1234|field|123"12345"12"asdasd"\r\n
123sd|1234567|123"asdsadf"12"123456"\r\n
45gg|somedata|123"12345"12"somefield"\r\n
EDIT2: Ok, lets make it even harder. Some fields can contain binary data and i know the starting and end position of the binary data field. I should be able to mark those fields in the template and then the parser will know that this field is binary. How to solve this problem?
I would create a regex based on the template and then parse the text file using that:
class Parser
{
private static readonly Regex TemplateRegex =
new Regex(#"\[%(?<field>[^]]+)%\](?<delim>[^[]+)?");
readonly List<string> m_fields = new List<string>();
private readonly Regex m_textRegex;
public Parser(string template)
{
var textRegexString = '^' + TemplateRegex.Replace(template, Evaluator) + '$';
m_textRegex = new Regex(textRegexString);
}
string Evaluator(Match match)
{
// add field name to collection and create regex for the field
var fieldName = match.Groups["field"].Value;
m_fields.Add(fieldName);
string result = "(.*?)";
// add delimiter to the regex, if it exists
// TODO: check, that only last field doesn't have delimiter
var delimGroup = match.Groups["delim"];
if (delimGroup.Success)
{
string delim = delimGroup.Value;
result += Regex.Escape(delim);
}
return result;
}
public IDictionary<string, string> Parse(string text)
{
var match = m_textRegex.Match(text);
var groups = match.Groups;
var result = new Dictionary<string, string>(m_fields.Count);
for (int i = 0; i < m_fields.Count; i++)
result.Add(m_fields[i], groups[i + 1].Value);
return result;
}
}
You can parse the template using regular expressions. An expression like this will match each field definition and separator:
Match m = Regex.Match(template, #"^(\[%(?<name>.+?)%\](?<separator>.)?)+$")
The match will contain two named groups for (name and separator), each of which will contain a number of captures for each time they matched in the input string. In your example, the separator group would have one less capture than the name group.
You can then iterate over the captures, and use the results to extract the fields from the input string and store the values, like this:
if( m.Success )
{
Group name = m.Groups["name"];
Group separator = m.Groups["separator"];
int index = 0;
Dictionary<string, string> fields = new Dictionary<string, string>();
for( int x = 0; x < name.Captures.Count; ++x )
{
int separatorIndex = input.Length;
if( x < separator.Captures.Count )
separatorIndex = input.IndexOf(separator.Captures[x].Value, index);
fields.Add(name.Captures[x].Value, input.Substring(index, separatorIndex - index));
index = separatorIndex + 1;
}
// Do something with results.
}
Obviously in a real program you'd have to account for invalid input and such, which I didn't do here.
I would do this with a few lines of code. Loop through your template row, grabbing all text between "[" as the variable name and everything else as a terminator. Read all the text to the terminal, assign it to the variable name, repeat.
1- Use API for that sscanf(line, format, __arglist) check here
2- Use string split Like:
public IEnumerable<int> GetDataFromLines(string[] lines)
{
//handle the output data
List<int> data = new List<int>();
foreach (string line in lines)
{
string[] seperators = new string[] { "|", "\"" };
string[] results = line.Split(seperators, StringSplitOptions.RemoveEmptyEntries);
foreach (string result in results)
{
data.Add(int.Parse(result));
}
}
return data;
}
Test it with line:
line = "1234|1234567|123\"12345\"12\"123456";
string[] lines = new string[] { line };
GetDataFromLines(lines);
//output list items are:
1234
1234567
123
12345
12
123456
I think I've come across this requirement for a dozen times. But I could never find a satisfying solution. For instance, there are a collection of string which I want to serialize (to disk or through network) through a channel where only plain string is allowed.
I almost always end up using "split" and "join" with ridiculous separator like
":::==--==:::".
like this:
public static string encode(System.Collections.Generic.List<string> data)
{
return string.Join(" :::==--==::: ", data.ToArray());
}
public static string[] decode(string encoded)
{
return encoded.Split(new string[] { " :::==--==::: " }, StringSplitOptions.None);
}
But this simple solution apparently has some flaws. The string cannot contains the separator string. And consequently, the encoded string can no longer re-encoded again.
AFAIK, the comprehensive solution should involve escaping the separator on encoding and unescaping on decoding. While the problem sound simple, I believe the complete solution can take significant amount of code. I wonder if there is any trick allowed me to build encoder & decoder in very few lines of code ?
Add a reference and using to System.Web, and then:
public static string Encode(IEnumerable<string> strings)
{
return string.Join("&", strings.Select(s => HttpUtility.UrlEncode(s)).ToArray());
}
public static IEnumerable<string> Decode(string list)
{
return list.Split('&').Select(s => HttpUtility.UrlDecode(s));
}
Most languages have a pair of utility functions that do Url "percent" encoding, and this is ideal for reuse in this kind of situation.
You could use the .ToArray property on the List<> and then serialize the Array - that could then be dumped to disk or network, and reconstituted with a deserialization on the other end.
Not too much code, and you get to use the serialization techniques already tested and coded in the .net framework.
You might like to look at the way CSV files are formatted.
escape all instances of a deliminater, e.g. " in the string
wrap each item in the list in "item"
join using a simple seperator like ,
I don't believe there is a silver bullet solution to this problem.
Here's an old-school technique that might be suitable -
Serialise by storing the width of each string[] as a fixed-width prefix in each line.
So
string[0]="abc"
string[1]="defg"
string[2]=" :::==--==::: "
becomes
0003abc0004defg0014 :::==--==:::
...where the size of the prefix is large enough to cater for the string maximum length
You could use an XmlDocument to handle the serialization. That will handle the encoding for you.
public static string encode(System.Collections.Generic.List<string> data)
{
var xml = new XmlDocument();
xml.AppendChild(xml.CreateElement("data"));
foreach (var item in data)
{
var xmlItem = (XmlElement)xml.DocumentElement.AppendChild(xml.CreateElement("item"));
xmlItem.InnerText = item;
}
return xml.OuterXml;
}
public static string[] decode(string encoded)
{
var items = new System.Collections.Generic.List<string>();
var xml = new XmlDocument();
xml.LoadXml(encoded);
foreach (XmlElement xmlItem in xml.SelectNodes("/data/item"))
items.Add(xmlItem.InnerText);
return items.ToArray();
}
I would just prefix every string with its length and an terminator indicating the end of the length.
abc
defg
hijk
xyz
546
4.X
becomes
3: abc 4: defg 4: hijk 3: xyz 3: 546 3: 4.X
No restriction or limitations at all and quite simple.
Json.NET is a very easy way to serialize about any object you can imagine. JSON keeps things compact and can be faster than XML.
List<string> foo = new List<string>() { "1", "2" };
string output = JsonConvert.SerializeObject(foo);
List<string> fooToo = (List<string>)JsonConvert.DeserializeObject(output, typeof(List<string>));
It can be done much simpler if you are willing to use a separator of 2 characters long:
In java code:
StringBuilder builder = new StringBuilder();
for(String s : list) {
if(builder.length() != 0) {
builder.append("||");
}
builder.append(s.replace("|", "|p"));
}
And back:
for(String item : encodedList.split("||")) {
list.add(item.replace("|p", "|"));
}
You shouldn't need to do this manually. As the other answers have pointed out, there are plenty of ways, built-in or otherwise, to serialize/deserialize.
However, if you did decide to do the work yourself, it doesn't require that much code:
public static string CreateDelimitedString(IEnumerable<string> items)
{
StringBuilder sb = new StringBuilder();
foreach (string item in items)
{
sb.Append(item.Replace("\\", "\\\\").Replace(",", "\\,"));
sb.Append(",");
}
return (sb.Length > 0) ? sb.ToString(0, sb.Length - 1) : string.Empty;
}
This will delimit the items with a comma (,). Any existing commas will be escaped with a backslash (\) and any existing backslashes will also be escaped.
public static IEnumerable<string> GetItemsFromDelimitedString(string s)
{
bool escaped = false;
StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
if ((c == '\\') && !escaped)
{
escaped = true;
}
else if ((c == ',') && !escaped)
{
yield return sb.ToString();
sb.Length = 0;
}
else
{
sb.Append(c);
escaped = false;
}
}
yield return sb.ToString();
}
Why not use Xstream to serialise it, rather than reinventing your own serialisation format?
Its pretty simple:
new XStream().toXML(yourobject)
Include the System.Linq library in your file and change your functions to this:
public static string encode(System.Collections.Generic.List<string> data, out string delimiter)
{
delimiter = ":";
while(data.Contains(delimiter)) delimiter += ":";
return string.Join(delimiter, data.ToArray());
}
public static string[] decode(string encoded, string delimiter)
{
return encoded.Split(new string[] { delimiter }, StringSplitOptions.None);
}
There are loads of textual markup languages out there, any would function
Many would function trivially given the simplicity of your input it all depends on how:
human readable you want the encoding
resilient to api changes it should be
how easy to parse it is
how easy it is to write or get a parser for it.
If the last one is the most important then just use the existing xml libraries MS supply for you:
class TrivialStringEncoder
{
private readonly XmlSerializer ser = new XmlSerializer(typeof(string[]));
public string Encode(IEnumerable<string> input)
{
using (var s = new StringWriter())
{
ser.Serialize(s, input.ToArray());
return s.ToString();
}
}
public IEnumerable<string> Decode(string input)
{
using (var s = new StringReader(input))
{
return (string[])ser.Deserialize(s);
}
}
public static void Main(string[] args)
{
var encoded = Encode(args);
Console.WriteLine(encoded);
var decoded = Decode(encoded);
foreach(var x in decoded)
Console.WriteLine(x);
}
}
running on the inputs "A", "<", ">" you get (edited for formatting):
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>A</string>
<string><</string>
<string>></string>
</ArrayOfString>
A
<
>
Verbose, slow but extremely simple and requires no additional libraries