FileHelpers nested quotes and commas - parsing error

I'm trying to parse a CSV file from hell, using the fantastic FileHelpers library.
It's failing to handle a row of the form:
"TOYS R"" US"," INC.""",fld2,fld3,"<numberThousands>","<numberThousands>","<numberThousands>",fld7,
FileHelpers is very good at handling number fields in 'thousands' format (using a custom formatter), even when they are wrapped in quotes, followed by trailing commas, etc. However, the first field is causing problems:
"TOYS R"" US"," INC.""",fld2,...
This field includes both nested quotes and nested commas. FileHelpers doesn't know how to handle this and splits it into two separate fields, which subsequently causes an exception to be thrown.
Are there any recommended ways to handle this?

First, you need to make all of your fields optionally quoted.
[DelimitedRecord(",")]
public class contactTemplate
{
[FieldQuoted('"', QuoteMode.OptionalForBoth)]
public string CompanyName;
[FieldQuoted('"', QuoteMode.OptionalForBoth)]
public string fld2;
// etc...
}
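Then you need to replace the escaped quotes with something else (e.g., a single quote) in a BeforeReadRecord event.
var engine = new FileHelperEngine<contactTemplate>();
// Note: @"""" is a verbatim literal for one " character, so this replaces every double quote.
// To target only doubled (escaped) quotes, use @"""""" instead.
engine.BeforeReadRecord += (sender, args) =>
    args.RecordLine = args.RecordLine.Replace(@"""", "'");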

Related

In C#, can a Salesforce Report "CSV" be deserialized to a C# Object?

We will receive Salesforce Reports in "CSV" format via a REST/POST endpoint, where it will be captured as an instance of IFormFile, which we can turn to a Stream with file.OpenReadStream().
While allegedly comma separated, they are actually semi-colon separated and to further reduce the risk of ambiguity, all values are surrounded by quotes. Not least, the headers are intended to be human readable, so unlike variable names they contain spaces.
A typical file might look like this:
"Opportunity Name";"Order Number";"Date of request for deed registry check";"Date of deed registry evaluation"
"Implementation - Jayde Cote";"20190605_Cote_11";"";"09.11.2020"
"Implementation - Ebony Collier";"20190612_Collier_48";"09.10.2020";"09.11.2020"
"Implementation - Izzy Bains";"20190528_Bains_42";"09.11.2020";""
I'd like to deserialize this into a List<Opportunity>, where Opportunity is a class like:
public class Opportunity {
public string OpportunityName {get; set;}
public string OrderNumber {get; set;}
public DateTime RequestDate {get; set;}
public DateTime EvaluationDate {get; set;}
}
Yes, I know I could build a parser for this, but deserializing would be more elegant.
Is this possible? If so, how?

You can use one of the many libraries out there, for example TinyCsvParser.
The setup is pretty straightforward, a few options to declare your requirements (" as Quote character, \ as Escape character and ; as Delimiter). I realized you are using a pretty similar format to RFC4180 so...:
var options = new Options('"', '\\', ';');
var tokenizer = new RFC4180Tokenizer(options);
CsvParserOptions csvParserOptions = new CsvParserOptions(true, tokenizer);
CsvReaderOptions csvReaderOptions = new CsvReaderOptions(new[] { Environment.NewLine });
Then you need a mapping declaration, in your case something like the one below. You can also pass a TypeConverter such as new DateTimeConverter("dd.MM.yyyy"), although in your case it is unnecessary:
private sealed class CsvOpportunityMap : CsvMapping<Opportunity>
{
public CsvOpportunityMap() : base()
{
MapProperty(0, m => m.OpportunityName );
MapProperty(1, m => m.OrderNumber );
MapProperty(2, m => m.RequestDate);
MapProperty(3, m => m.EvaluationDate);
}
}
I've built an example with your scenario. I don't like the way TinyCsvParser builds the property map, but I think it's nice in terms of performance and memory footprint:
https://dotnetfiddle.net/3bXrap
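One detail in the sample data may need explicit handling: some date cells are empty (""), which a non-nullable DateTime cannot represent. The following is a minimal sketch using only the .NET base class library, assuming the dd.MM.yyyy format seen in the sample. The helper name ParseOptionalDate is hypothetical and not part of TinyCsvParser.

```csharp
using System;
using System.Globalization;

public static class ReportDates
{
    // Hypothetical helper: returns null for empty or malformed cells,
    // otherwise parses the cell as dd.MM.yyyy (an assumption based on the sample file).
    public static DateTime? ParseOptionalDate(string cell)
    {
        if (string.IsNullOrWhiteSpace(cell))
            return null;

        return DateTime.TryParseExact(cell.Trim(), "dd.MM.yyyy",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime parsed)
            ? parsed
            : (DateTime?)null;
    }
}
```

With a helper like this, RequestDate and EvaluationDate would probably be declared as DateTime? so that an empty cell stays distinguishable from a real date.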

Smartly replace strings

I am working with a JSON API. As C# property names cannot contain characters like - (minus) or . (point), I had to replace each such character with _ (underscore). The replacement happens when the JSON response is received as a string, so every attribute name containing a - or a . has it replaced by a _ and then matches the attribute names in the class it will be deserialized into.
To make it clearer, here are some examples:
I receive the following JSON: { "id": 1, "result": [ { "data": [ { "adm-pass": "" } ] } ] }
In the class I want to deserialize into I have this attribute : public String adm_pass {get; set;}
So I replace the minus with an underscore so that the NewtonSoft parser can deserialize it accordingly.
My problem is that I sometimes get negative integers in my JSON. If I do the string replacement on {"beta" : -1}, the -1 (an integer here) becomes _1, which cannot be deserialized and raises a parsing exception.
Is there a way to replace the string smartly so I can avoid this error?
For example, if - is followed by an int, it is not replaced.
If no such way exists, is there another solution for this kind of problem?

Newtonsoft allows you to specify the exact name of the JSON property, which it will use to serialize/deserialize.
So you should be able to do this
[JsonProperty("adm-pass")]
public String adm_pass { get; set; }
This way you are not restricted to name your properties exactly as the JSON property names. And in your case, you won't need to do a string replace.
Hope this helps.

You'll have to check that you are replacing the key and not the value, maybe by using a regex like http://regexr.com/3d471
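As a rough illustration of the key-only idea, the sketch below uses a regular expression that matches a quoted string immediately followed by a colon, and rewrites - and . only inside that string. It is not necessarily the pattern behind the link above, and the class and method names are hypothetical. It is also not a JSON parser, so unusual input may still trip it up.

```csharp
using System.Text.RegularExpressions;

public static class JsonKeySanitizer
{
    // A quoted string (without escapes) followed by optional whitespace and a colon
    // is treated as a property name; values such as -1 are never matched.
    private static readonly Regex KeyPattern =
        new Regex("\"([^\"\\\\]*)\"(\\s*:)", RegexOptions.Compiled);

    public static string ReplaceInKeys(string json)
    {
        return KeyPattern.Replace(json, m =>
            "\"" + m.Groups[1].Value.Replace('-', '_').Replace('.', '_') + "\"" + m.Groups[2].Value);
    }
}
```

For example, JsonKeySanitizer.ReplaceInKeys("{\"adm-pass\": \"\", \"beta\": -1}") leaves the -1 untouched while renaming the key to adm_pass.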

Regex could work as wlalele suggests.
But I would build a new object with cleaned property names instead.
Create a new object:
var sharpObj = {};
Loop through the object's properties:
for (var property in object) {
if (object.hasOwnProperty(property)) {
// do stuff
}
}
In the // do stuff section, create a property on sharpObj with the desired string replacements and set the property to the same value.
var cleanProperty = cleanPropertyName(property);
sharpObj[cleanProperty] = object[property];
Note: I assume you can figure out the cleanPropertyName() method or similar.
Stringify the object:
var string = JSON.stringify(sharpObj);

You can use Substring to check whether the character after the hyphen is a digit. This adapts easily to your code, since you already locate the character:
int a;
if(int.TryParse(adm_pass.Substring(adm_pass.IndexOf("-") + 1,1),out a))
{
//Code if next character is an int
}
else
{
adm_pass = adm_pass.Replace("-","_");
}
This kind of code can be looped until there are no remaining hyphens/minuses

How to add thousands separator to a numeric string in c#?

I want to turn "123456789" into "123,456,789".
There are plenty of SO answers on how to format non-string types numerically using .Format() and .ToString(), but I can't find any on how to do it starting from a numeric string.
I can do it this way, but it's not ideal:
Convert.ToInt32(minPrice).ToString("N0");

Simply encapsulate your function, which you find isn't ideal, into an extension method.
public static string ToFormattedThousands(this string number)
{
return Convert.ToInt32(number).ToString("N0");
}
Simply put this function into a static class and then you will be able to call it on any string.
For example :
string myString = "123456789".ToFormattedThousands();
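Note that Convert.ToInt32 throws on non-numeric input and on values outside the int range, and "N0" uses the current culture's group separator. A hedged variant, with the hypothetical name ToThousandsOrOriginal, could parse into a long and pin the culture:

```csharp
using System.Globalization;

public static class NumericStringExtensions
{
    // Hypothetical variant: returns the input unchanged when it is not a whole number
    // that fits in a long; always uses "," as the group separator (invariant culture).
    public static string ToThousandsOrOriginal(this string number)
    {
        return long.TryParse(number, NumberStyles.Integer, CultureInfo.InvariantCulture, out long value)
            ? value.ToString("N0", CultureInfo.InvariantCulture)
            : number;
    }
}
```

Whether to return the original string, throw, or return null on bad input is a design choice; returning the input is just one option.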

How to read a string containing XML elements without using the XML properties

My project has to read the contents of an XML file, which I have already achieved.
Just out of curiosity, I also tried the same thing with the XML content kept inside a string, reading only the values inside the element tags. I have achieved this as well. Below is my code.
string xml = @"<Login-Form>
<User-Authentication>
<username>Vikneshwar</username>
<password>xxx</password>
</User-Authentication>
<User-Info>
<firstname>Vikneshwar</firstname>
<lastname>S</lastname>
<email>xxx@xxx.com</email>
</User-Info>
</Login-Form>";
XDocument document = XDocument.Parse(xml);
var block = from file in document.Descendants("User-Authentication")
select new
{
Username = file.Element("username").Value,
Password = file.Element("password").Value,
};
foreach (var file in block)
{
Console.WriteLine(file.Username);
Console.WriteLine(file.Password);
}
Similarly, I obtained my other set of elements (firstname, lastname, and email). Now my curiosity draws me again: can I do the same using only string functions?
The same string used in the above code is to be taken. I'm trying not to use any XML-related classes, that is, XDocument, XmlReader, etc. The same output should be achieved using only string functions. I'm not able to do that. Is it possible?

Don't do it. XML is more complex than it may appear, with complex rules surrounding nesting, character-escaping, named-entities, namespaces, ordering (attributes vs elements), comments, unparsed character data, and whitespace. For example, just add
<!--
<username>evil</username>
-->
Or
<parent xmlns="this:is-not/the/data/you/expected">
<username>evil</username>
</parent>
Or maybe the same in a CDATA section - and see how well basic string-based approaches work. Hint: you'll get a different answer to what you get via a DOM.
Using a dedicated tool designed for reading XML is the correct approach. At the minimum, use XmlReader - but frankly, a DOM (such as your existing code) is much more convenient. Alternatively, use a serializer such as XmlSerializer to populate an object model, and query that.
Trying to properly parse XML and XML-like data with string tools does not end well; see "RegEx match open tags except XHTML self-contained tags".
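To make the comment case concrete, here is a small sketch that compares a naive IndexOf/Substring lookup with XDocument on the same input. The class and method names are made up for illustration, and the XML is a shortened version of the question's document with the comment from above inserted.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentPitfall
{
    public const string Xml =
        "<User-Authentication>" +
        "<!-- <username>evil</username> -->" +
        "<username>Vikneshwar</username>" +
        "</User-Authentication>";

    // Naive approach: take the text after the first "<username>" occurrence,
    // which here sits inside the comment.
    public static string NaiveUsername()
    {
        const string open = "<username>", close = "</username>";
        int start = Xml.IndexOf(open, StringComparison.Ordinal) + open.Length;
        int end = Xml.IndexOf(close, start, StringComparison.Ordinal);
        return Xml.Substring(start, end - start);
    }

    // DOM approach: comments are not elements, so only the real element is found.
    public static string DomUsername()
    {
        return XDocument.Parse(Xml).Descendants("username").First().Value;
    }
}
```

Here NaiveUsername() returns "evil", while DomUsername() returns "Vikneshwar".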

You could use methods like IndexOf, Equals, Substring, etc. provided by the String class to fulfill your needs.
Using Regex is an option worth considering too.
But it's advisable to use the XmlDocument class for this purpose.

It can be done without regular expressions, like this:
string[] elementNames = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames)
{
int startingIndex = xml.IndexOf(elementName);
string value = xml.Substring(startingIndex + elementName.Length,
xml.IndexOf(elementName.Insert(1, "/"))
- (startingIndex + elementName.Length));
Console.WriteLine(value);
}
With a regular expression:
string[] elementNames2 = new string[]{ "<username>", "<password>"};
foreach (string elementName in elementNames2)
{
string value = Regex.Match(xml, String.Concat(elementName, "(.*)",
elementName.Insert(1, "/"))).Groups[1].Value;
Console.WriteLine(value);
}
Of course, the only recommended thing is to use the XML parsing classes.

Build an extension method that will get the text between tags like this:
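public static class StringExtension
{
    public static string Between(this string content, string start, string end)
    {
        int startIndex = content.IndexOf(start);
        if (startIndex < 0) return null; // start marker not found
        startIndex += start.Length;
        // Look for the end marker only after the start marker.
        int endIndex = content.IndexOf(end, startIndex);
        if (endIndex < 0) return null; // end marker not found
        return content.Substring(startIndex, endIndex - startIndex);
    }
}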

C# Library for ASCII File import/mapping

This is maybe a dumb question, but I'll give it a try.
A common task is to import data from ASCII files.
It's almost always the same apart from the structure of the file:
comma separated, line separated, take 5 rows, take 12... whatever...
So it's always a different protocol/mapping but the same handling...
Is there a library for C# that helps with this day-to-day scenario?

This is awesome: FileHelpers Library
Example:
File:
1732,Juan Perez,435.00,11-05-2002
554,Pedro Gomez,12342.30,06-02-2004
112,Ramiro Politti,0.00,01-02-2000
924,Pablo Ramirez,3321.30,24-11-2002
Create a class that maps your data.
[DelimitedRecord(",")]
public class Customer
{
public int CustId;
public string Name;
public decimal Balance;
[FieldConverter(ConverterKind.Date, "dd-MM-yyyy")]
public DateTime AddedDate;
}
And then parse using:
FileHelperEngine engine = new FileHelperEngine(typeof(Customer));
// To Read Use:
Customer[] res = engine.ReadFile("FileIn.txt") as Customer[];
// To Write Use:
engine.WriteFile("FileOut.txt", res);
Enumerate:
foreach (Customer cust in res)
{
Console.WriteLine("Customer Info:");
Console.WriteLine(cust.Name + " - " +
cust.AddedDate.ToString("dd/MM/yy"));
}

You may want to take a look at the FileHelpers library.

So the only thing those tasks have in common is reading a text file?
If FileHelpers is overkill for you (simple text data, etc.), standard .NET classes should be all you need (String.Split Method, Regex Class, StreamReader Class).
They provide reading delimited by characters (String.Split) or lines (StreamReader).
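As a rough sketch of that standard-classes route, the following reads lines shaped like the customer file from the FileHelpers example using only a TextReader and String.Split. The SimpleImport class is hypothetical, and the code assumes there are no quoted fields or embedded commas, which is exactly where a dedicated library starts to pay off.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class SimpleImport
{
    // Reads "id,name,balance,dd-MM-yyyy" lines; assumes no quoted fields or embedded commas.
    public static List<(int Id, string Name, decimal Balance, DateTime Added)> Read(TextReader reader)
    {
        var rows = new List<(int Id, string Name, decimal Balance, DateTime Added)>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            string[] parts = line.Split(',');
            rows.Add((
                int.Parse(parts[0], CultureInfo.InvariantCulture),
                parts[1],
                decimal.Parse(parts[2], CultureInfo.InvariantCulture),
                DateTime.ParseExact(parts[3], "dd-MM-yyyy", CultureInfo.InvariantCulture)));
        }
        return rows;
    }
}
```

It can be fed from a file with new StreamReader("FileIn.txt") or from memory with new StringReader(...).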
