StringReader or MemoryStream: which is more resource-friendly? - c#

I have a module which will be responsible for parsing CSV data received from different users via a website interface. I was considering using TextFieldParser for it.
But before implementing it, I was wondering which approach would be better:
1. Generating a MemoryStream from the received data, or
2. Initialising a StringReader from the same input string.
Which one is better, and why?

Option 1 won't give you a string at all, so only go that way if you want to work with a byte array and buffers, which seems unlikely. If you're doing string processing, I would strongly recommend Option 2, because with it you can read a line at a time.
As far as I can see, the only reason to use a MemoryStream would be if you need to do something more complex that StringReader doesn't handle the way you want (encodings, strange line formats, etc.); otherwise you're reinventing the wheel.
Having worked with very large files (specifically CSV files) through StringReader, I've never had a problem. I'd wager that since MS designed StringReader to do exactly what you're trying to do, they made it as resource-friendly as possible.
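A minimal sketch of both options, assuming the uploaded CSV already sits in a string and uses commas. TextFieldParser (namespace Microsoft.VisualBasic.FileIO) has a constructor that takes a TextReader, so a StringReader plugs straight in; the MemoryStream route needs a byte[] copy of the string plus an Encoding to decode it again. The CsvUpload class and its method names are illustrative, not part of any API.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here

public static class CsvUpload
{
    // Option 2: wrap the existing string directly; no byte copy, no encoding decision.
    public static List<string[]> ParseWithStringReader(string csvText)
    {
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            return ReadAllRows(parser);
        }
    }

    // Option 1: the string must first be encoded into a second copy (a byte array),
    // and the same encoding has to be handed to the parser so it can decode it again.
    public static List<string[]> ParseWithMemoryStream(string csvText)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(csvText);
        using (var parser = new TextFieldParser(new MemoryStream(bytes), Encoding.UTF8))
        {
            return ReadAllRows(parser);
        }
    }

    private static List<string[]> ReadAllRows(TextFieldParser parser)
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true; // "Smith, John" stays one field
        var rows = new List<string[]>();
        while (!parser.EndOfData)
        {
            rows.Add(parser.ReadFields());
        }
        return rows;
    }
}
```

Both methods should return the same rows; the only difference is the extra allocation and the encode/decode round trip in Option 1.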

Related

Binary File Support

I need to log raw data from sensors. I need features such as creating a new log file every 15 minutes, or creating a new file once the current one reaches a certain size.
I'd like to leverage an existing framework such as log4net, but there doesn't seem to be much out there on whether it supports adding a custom logger for binary data, or how to do so. Has anyone done this, or come across an implementation of something similar that matches the needs described in this post?
I should add that we are looking at ~300 GB of data a day here. We are saving this data so we can do post-analysis and tweak our algorithms.
You could leverage log4net or any other text-logging tool by taking your byte[] data and converting it to plain text using Convert.ToBase64String. You can convert it back later using Convert.FromBase64String.
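A minimal sketch of that round trip; SensorFrameText is a hypothetical helper name. Keep the size cost in mind: Base64 emits 4 characters for every 3 input bytes, so ~300 GB of raw data a day would become roughly 400 GB of log text before any compression.

```csharp
using System;

public static class SensorFrameText
{
    // byte[] -> ASCII-safe text (A-Z, a-z, 0-9, '+', '/', '=') that a text logger can write as a message.
    public static string ToLogLine(byte[] frame) => Convert.ToBase64String(frame);

    // Text read back from the log -> the original bytes.
    public static byte[] FromLogLine(string line) => Convert.FromBase64String(line);
}
```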
.NET provides the BinaryReader and BinaryWriter classes. They do exactly what you'd expect: they read and write raw bytes from/to a file (or any Stream, for that matter). So all you have to do is define a simple file format for yourself, then read the data back out of it.
You can, of course, convert the binary data to other formats (like strings) and then use any serialization scheme you like (JSON, XML, you name it). But since you're dealing with binary data, converting it to other formats may not be the most elegant solution.
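A minimal sketch of such a home-grown format built on BinaryWriter/BinaryReader, which also covers the rollover rules from the question (new file after a time span or past a size limit). Everything specific here is an assumption for illustration: the RollingBinaryLog name, the file-naming pattern, and the record layout [8-byte UTC ticks][4-byte length][payload]. It is single-threaded and untuned; at ~300 GB a day you would want to measure its throughput before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical rolling log: starts a new file when the current one is too big or too old.
// Record layout (an assumption, not a standard): [int64 UTC ticks][int32 length][payload].
public sealed class RollingBinaryLog : IDisposable
{
    private readonly string _directory;
    private readonly long _maxBytes;
    private readonly TimeSpan _maxAge;
    private BinaryWriter _writer;
    private DateTime _openedAtUtc;
    private long _bytesInFile;
    private int _sequence;

    public string CurrentPath { get; private set; }

    public RollingBinaryLog(string directory, long maxBytes, TimeSpan maxAge)
    {
        _directory = directory;
        _maxBytes = maxBytes;
        _maxAge = maxAge;
        Directory.CreateDirectory(directory);
        OpenNewFile();
    }

    public void Write(byte[] payload)
    {
        DateTime now = DateTime.UtcNow;
        // Roll over on either limit: bytes already in the current file, or its age.
        if (_bytesInFile >= _maxBytes || now - _openedAtUtc >= _maxAge)
        {
            OpenNewFile();
        }
        _writer.Write(now.Ticks);      // 8 bytes
        _writer.Write(payload.Length); // 4 bytes
        _writer.Write(payload);
        _bytesInFile += sizeof(long) + sizeof(int) + payload.Length;
    }

    private void OpenNewFile()
    {
        _writer?.Dispose();
        _openedAtUtc = DateTime.UtcNow;
        _bytesInFile = 0;
        // The sequence number keeps names unique even if two files open within the same second.
        string name = $"sensor_{_openedAtUtc:yyyyMMdd_HHmmss}_{_sequence++:D6}.bin";
        CurrentPath = Path.Combine(_directory, name);
        _writer = new BinaryWriter(new FileStream(CurrentPath, FileMode.CreateNew, FileAccess.Write));
    }

    public void Dispose() => _writer?.Dispose();

    // Reads every payload back from one file written by this class.
    public static List<byte[]> ReadAll(string path)
    {
        var payloads = new List<byte[]>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                reader.ReadInt64();              // timestamp (ticks), ignored here
                int length = reader.ReadInt32();
                payloads.Add(reader.ReadBytes(length));
            }
        }
        return payloads;
    }
}
```

For the question's rules you might construct it as new RollingBinaryLog(someDirectory, maxBytes: 1L << 30, maxAge: TimeSpan.FromMinutes(15)); the directory and the 1 GiB limit are placeholders.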

How do I compress and encrypt a string to another string?

The program that I am working on saves a snapshot of the current state to an XML file. I would like to store this in a database (as a blob) instead of as an XML file.
Firstly, I think XML files are quite space-consuming and redundant, so we would like to compress the string in some way before storing it in the database. In addition, we would like to introduce some simple cryptography so that people won't be able to figure out what it means without at least a simple key/password.
Note that I want to store it in the database as a blob, so zipping it and then encrypting the zip file won't do, I guess.
How can I go about doing this?
Compress the XML data with DeflateStream, writing its output to a MemoryStream, then call the .ToArray() method to obtain your blob data. You can do encryption with .NET in a similar way (after compression, of course). If you believe deflate doesn't save enough space, try this library: XWRT.
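A minimal sketch of that order of operations (compress first, because encrypted output is effectively incompressible), using DeflateStream and AES from the base class library. Assumptions: the key is a 32-byte array you manage yourself (from a password you could derive one with Rfc2898DeriveBytes), the random IV is stored as the first 16 bytes of the blob, and the BlobCodec name is illustrative. This gives confidentiality only; it does not detect tampering.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

public static class BlobCodec
{
    // XML string -> deflate -> AES (CBC with PKCS7 padding by default) -> byte[] for the blob column.
    public static byte[] Pack(string xml, byte[] key)
    {
        byte[] compressed;
        using (var buffer = new MemoryStream())
        {
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                byte[] raw = Encoding.UTF8.GetBytes(xml);
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream flushes the final compressed block
            compressed = buffer.ToArray();
        }

        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            using (var output = new MemoryStream())
            {
                output.Write(aes.IV, 0, aes.IV.Length); // the IV is not secret; store it up front
                using (var crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
                {
                    crypto.Write(compressed, 0, compressed.Length);
                    crypto.FlushFinalBlock();
                    return output.ToArray();
                }
            }
        }
    }

    // byte[] from the blob column -> AES decrypt -> inflate -> original XML string.
    public static string Unpack(byte[] blob, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            byte[] iv = new byte[aes.BlockSize / 8];
            Array.Copy(blob, iv, iv.Length);
            aes.IV = iv;
            using (var input = new MemoryStream(blob, iv.Length, blob.Length - iv.Length))
            using (var crypto = new CryptoStream(input, aes.CreateDecryptor(), CryptoStreamMode.Read))
            using (var inflate = new DeflateStream(crypto, CompressionMode.Decompress))
            using (var reader = new StreamReader(inflate, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```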
Firstly, have a look at your serialization mechanism. The whole point of XML is that it's human readable. If that's no longer an important goal for you then it might be time to look at other serialization technologies which would be more suited to database storage (compressing XML into binary completely defeats the point of it :)
As an alternative format, BSON could be a good choice.

using C#'s XmlReader on slightly malformed XML

I'm trying to use C#'s XmlReader on a large series of XML files. They are all properly formatted except for a few select ones (unfortunately I'm not in a position to have them changed, because it would break a lot of other code).
The errors only come from one specific part of these offending XML files, and it's OK to just skip that part, but I don't want to stop reading the rest of the file.
The bad parts look like this:
<InterestingStuff>
...
<ErrorsHere OptionA|Something = "false" OptionB|SomethingElse = "false"/>
<OtherInterestingStuff>
...
</OtherInterestingStuff>
</InterestingStuff>
So really if I could just ignore invalid tags, or ignore the pipe symbol then I would be ok.
Trying to use XmlReader.Skip() when I see the name "ErrorsHere" doesn't work; apparently the reader has already read a bit ahead and throws the exception.
TLDR: How do I skip so I can read in the XML file above, using the XmlReader?
Edit:
Some people suggested just replacing the '|' symbol, but the idea of XmlReader is not to load the entire file, only to traverse the parts you want. Since I'm reading directly from files, I cannot afford to read in entire files, replace all instances of '|', and then read the parts again :).
I've experimented a bit with this in the past.
In general the input simply has to be well-formed. An XmlReader goes into an unrecoverable error state when the basic XML rules are broken. It is easy to avoid schema validation, but that's not relevant here.
Your only option is to clean the input. That can be done in a streaming manner (a custom Stream or TextReader), but it will require a light form of parsing. If pipe symbols never appear in valid positions, it's easy.
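A minimal sketch of that streaming clean-up, assuming, as in the question, that '|' only ever appears in the broken attribute names. PipeScrubbingReader is a hypothetical name; it rewrites every '|' to '_' as characters flow through, so XmlReader sees OptionA_Something and the file is never loaded whole. It does no parsing at all, so a legitimate '|' in text or attribute values would be rewritten too; handling that is where the light parsing mentioned above would come in.

```csharp
using System.IO;

// Hypothetical TextReader decorator: replaces '|' with '_' on the fly.
public sealed class PipeScrubbingReader : TextReader
{
    private readonly TextReader _inner;

    public PipeScrubbingReader(TextReader inner) => _inner = inner;

    private static int Scrub(int c) => c == '|' ? '_' : c;

    public override int Peek() => Scrub(_inner.Peek());

    public override int Read() => Scrub(_inner.Read());

    // XmlReader pulls characters in blocks, so this overload does most of the work.
    public override int Read(char[] buffer, int index, int count)
    {
        int n = _inner.Read(buffer, index, count);
        for (int i = index; i < index + n; i++)
        {
            if (buffer[i] == '|') buffer[i] = '_';
        }
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

In use you would open the file with File.OpenText(path), wrap it in PipeScrubbingReader, and pass that to XmlReader.Create; the rest stays a normal forward-only read.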
XmlReader is strict: on any non-conformance, it throws an error.
So no, you can't do that unless you write your own XML implementation. Fixing up the malformed data is probably easier.
I once had a similar situation (with HTML files, not XML files). I ended up running a regular expression over each HTML file, before feeding it into my processing pipeline, to delete the malformed parts. It came in handy and was easier than struggling with the API. :)

What's so bad about building XML with string concatenation?

In the thread What’s your favorite “programmer ignorance” pet peeve?, the following answer appears, with a large amount of upvotes:
Programmers who build XML using string concatenation.
My question is, why is building XML via string concatenation (such as a StringBuilder in C#) bad?
I've done this several times in the past, as it's sometimes the quickest way for me to get from point A to point B when it comes to the data structures/objects I'm working with. So far, I have come up with a few reasons why this isn't the greatest approach, but is there something I'm overlooking? Why should this be avoided?
Probably the biggest reason I can think of is you need to escape your strings manually, and most new programmers (and even some experienced programmers) will forget this. It will work great for them when they test it, but then "randomly" their apps will fail when someone throws an & symbol in their input somewhere. Ok, I'll buy this, but it's really easy to prevent the problem (SecurityElement.Escape to name one).
When I do this, I usually omit the XML declaration (i.e. <?xml version="1.0"?>). Is this harmful?
Performance penalties? If you stick with proper string concatenation (i.e. StringBuilder), is this anything to be concerned about? Presumably, a class like XmlWriter will also need to do a bit of string manipulation...
There are more elegant ways of generating XML, such as using XmlSerializer to automatically serialize/deserialize your classes. OK sure, I agree. C# has a ton of useful classes for this, but sometimes I don't want to make a class for something really quick, like writing out a log file or something. Is this just me being lazy? If I am doing something "real", this is my preferred approach for dealing with XML.
You can end up with invalid XML, but you will not find out until you parse it again - and then it is too late. I learned this the hard way.
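A small sketch of how that failure shows up, alongside the manual fix from the question (SecurityElement.Escape) and the library route (XElement). The ConcatDemo name and the sample value are illustrative; XDocument.Parse stands in for whichever consumer parses the XML later.

```csharp
using System.Security;
using System.Xml;
using System.Xml.Linq;

public static class ConcatDemo
{
    // Naive concatenation: fine until the data contains &, < or a quote.
    public static string Naive(string title) =>
        "<product title=\"" + title + "\"/>";

    // Manual fix: SecurityElement.Escape replaces <, >, ", ' and & with entities.
    public static string Escaped(string title) =>
        "<product title=\"" + SecurityElement.Escape(title) + "\"/>";

    // Library approach: XAttribute does the escaping itself.
    public static string WithXElement(string title) =>
        new XElement("product", new XAttribute("title", title)).ToString();

    // True if the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With the title "Fish & Chips", Naive produces text that XDocument.Parse rejects, while the escaped and XElement versions both parse and give the original value back. Note that the naive version "works" in every test that happens not to contain one of those characters.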
I think readability, flexibility and scalability are important factors. Consider the following piece of Linq-to-Xml:
XDocument doc = new XDocument(new XDeclaration("1.0", "UTF-8", "yes"),
    new XElement("products",
        from p in collection
        select new XElement("product",
            new XAttribute("guid", p.ProductId),
            new XAttribute("title", p.Title),
            new XAttribute("version", p.Version))));
Can you find an easier way to do it than this? I can output it to a browser, save it to a document, add attributes/elements in seconds and so on, just by adding a couple of lines of code. I can do practically everything with it without much effort.
Actually, I find the biggest problem with string concatenation is not getting it right the first time, but rather keeping it right during code maintenance. All too often, a perfectly-written piece of XML using string concat is updated to meet a new requirement, and string concat code is just too brittle.
As long as the alternatives were XML serialization and XmlDocument, I could see the simplicity argument in favor of string concat. However, ever since XDocument et al., there is just no reason to use string concat to build XML anymore. See Sander's answer for the best way to write XML.
Another benefit of XDocument is that XML is actually a rather complex standard, and most programmers simply do not understand it. I'm currently dealing with a person who sends me "XML", complete with unquoted attribute values, missing end tags, improper case sensitivity, and incorrect escaping. But because IE accepts it (as HTML), it must be right! Sigh... Anyway, the point is that string concatenation lets you write anything, but XDocument will force standards-complying XML.
I wrote a blog entry back in 2006 moaning about XML generated by string concatenation; the simple point is that if an XML document fails to validate (encoding issues, namespace issues and so on) it is not XML and cannot be treated as such.
I have seen multiple problems with XML documents that can be directly attributed to generating XML documents by hand using string concatenation, and nearly always around the correct use of encoding.
Ask yourself this: what character set am I currently encoding my document with ('ascii7', 'ibm850', 'iso-8859-1', etc.)? What will happen if I write a UTF-16 string value into an XML document that has been manually declared as 'ibm850'?
Given the richness of the XML support in .NET with XmlDocument and now especially with XDocument, there would have to be a seriously compelling argument for not using these libraries over basic string concatenation IMHO.
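A sketch of handing that decision to the writer instead, assuming the target really must be a 7-bit charset (Encoding.ASCII standing in for 'ascii7'); EncodingDemo is an illustrative name. XmlWriter takes the encoding named in the declaration from XmlWriterSettings.Encoding and is expected to write characters that encoding cannot represent as numeric character references, so the bytes and the declaration cannot drift apart the way they can with hand-written strings.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EncodingDemo
{
    // Serializes a one-element document as us-ascii; non-ASCII text becomes character references.
    public static byte[] SaveAsAscii(string customerName)
    {
        var doc = new XDocument(new XElement("customer", customerName));
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer); // the declaration's encoding comes from settings.Encoding
            }
            return stream.ToArray();
        }
    }
}
```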
I think the problem is that you aren't treating the XML file as logical data storage, but as a simple text file where you write strings.
Obviously those libraries do string manipulation for you, but reading/writing XML should be treated like saving data into a database, or something logically similar.
If you need trivial XML, then it's fine. It's just that the maintainability of string concatenation breaks down when the XML becomes larger or more complex. You pay either at development time or at maintenance time. The choice is always yours, but history suggests maintenance is always more costly, and thus anything that makes it easier is generally worthwhile.
You need to escape your strings manually. That's right. But is that all? Sure, you can put the XML spec on your desk and double-check every time that you've considered every possible corner-case when you're building an XML string. Or you can use a library that encapsulates this knowledge...
Another point against using string concatenation is that the hierarchical structure of the data is not clear when reading the code. In Sander's Linq-to-XML example, for instance, it's clear which parent element the "product" element belongs to, which element the "title" attribute applies to, etc.
As you said, it's just awkward to build XML correctly using string concatenation, especially now that you have LINQ to XML, which allows for simple construction of an XML graph and gets namespaces, etc. right.
Obviously context and how it is being used matters, such as in the logging example string.Format can be perfectly acceptable.
But too often people ignore these alternatives when working with complex XML graphs and just use a StringBuilder.
The main reason is DRY: Don't Repeat Yourself.
If you use string concat to do XML, you will constantly be repeating the functions that keep your string as a valid XML document. All the validation would be repeated, or not present. Better to rely on a class that is written with XML validation included.
I've always found creating an XML to be more of a chore than reading in one. I've never gotten the hang of serialization - it never seems to work for my classes - and instead of spending a week trying to get it to work, I can create an XML file using strings in a mere fraction of the time and write it out.
And then I load it in using an XMLReader tree. If the XML file doesn't read as valid, I go back, find the problem in my saving routines, and correct it. Until I have a working save/load system and know my tools are solid, I refuse to perform mission-critical work.
I guess it comes down to programmer preference. There are different ways of doing things, for sure, but for developing/testing/researching/debugging, this would be fine. However, I would also clean up my code and comment it before handing it off to another programmer.
Because regardless of the fact you're using StringBuilder or XMLNodes to save/read your file, if it is all gibberish mess, nobody is going to understand how it works.
Maybe it won't ever happen, but what if your environment switches to XML 2.0 someday? Your string-concatenated XML may or may not be valid in the new environment, but XDocument will almost certainly do the right thing.
Okay, that's a reach, but especially if your not-quite-standards-compliant XML doesn't specify an XML version declaration... just saying.

What is a good format for command line output when it is being used for further processing?

I have written a console application in Delphi that queries information from several locations. This application will be launched by another process, and the output to STDOUT will be captured by the launching process.
The information I am retrieving is to be interpreted by the calling application for reporting purposes. What is the best way to output this data to STDOUT so that it can be easily parsed? JSON? XML? CSV? The data, specifically, is remote workstation information, so it will pull things back like running processes, and details about each process.
Does anyone have any experience with this or suggestions?
If you want something that can be easily parsed, especially if it has to be done quickly, go with the simplest format that can effectively communicate the information you need. CSV if you can, otherwise try JSON. Definitely not XML unless you really, really need all the extra complexity for some reason.
I'd go for a tab-delimited format if your data (as it seems) doesn't contain that character, because it allows the fastest and simplest processing.
All the other formats are slower and more complicated (even if they give you more power).
The closest match is CSV, but CSV needs to quote an item if it contains characters that are special to the format (commas, quotes, line breaks, etc.).
Because of that, the tab-delimited format is the most compact one, and hence the fastest over the wire. (Since you're talking about remote workstations, I assume you're on some kind of network.)
Also, another thing worth mentioning is that the Tab delimited format is very readable thus making the debugging much easier, if needed.
As an aside, if the tab character can appear in your data stream, you can choose another character that you are sure cannot (for example #1, the character with code 1), provided your usage scenario permits it.
HTH
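The consuming side could be written in any language; as one possible sketch in C#, here is a minimal tab-delimited line writer and reader. The process fields used below (name, PID, memory) are hypothetical placeholders for whatever the Delphi tool actually emits, and TsvLines is an illustrative name. Following the advice above, the writer refuses values containing a tab or line break rather than trying to quote them; if such values can occur, switch delimiter or fall back to CSV.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TsvLines
{
    private static readonly char[] Forbidden = { '\t', '\r', '\n' };

    // Joins fields with tabs; rejects values that would break the line structure.
    public static string Format(params string[] fields)
    {
        foreach (string f in fields)
        {
            if (f.IndexOfAny(Forbidden) >= 0)
                throw new ArgumentException($"Field contains a tab or line break: {f}");
        }
        return string.Join("\t", fields);
    }

    // Splits captured STDOUT into rows of fields; blank lines are skipped.
    public static List<string[]> Parse(string stdout) =>
        stdout.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
              .Select(line => line.Split('\t'))
              .ToList();
}
```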
It would depend entirely on what the launching process has available. If it's a small Delphi app, CSV is easy to parse with just TStringList. XML may be more heavyweight than JSON, but Delphi ships with an XML parser and, AFAIK, not a JSON parser.
The XML output format has the advantage that you can pipe it through an XSL transform, so that the XML data can be converted to a user-friendly HTML document. (You can almost have your cake and eat it too.)
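A minimal sketch of that pipeline on the receiving side, assuming the tool prints something like <processes><process name="..." pid="..."/></processes>; that element layout, the stylesheet, and the ProcessReport name are invented for illustration. XslCompiledTransform is the .NET XSLT 1.0 processor.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ProcessReport
{
    // Hypothetical stylesheet: one HTML table row per <process> element.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" indent=""no""/>
  <xsl:template match=""/processes"">
    <table>
      <xsl:for-each select=""process"">
        <tr><td><xsl:value-of select=""@name""/></td><td><xsl:value-of select=""@pid""/></td></tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>";

    // Turns the captured STDOUT (XML) into an HTML fragment.
    public static string ToHtml(string capturedXml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslReader);
        }
        using (var input = XmlReader.Create(new StringReader(capturedXml)))
        using (var html = new StringWriter())
        {
            xslt.Transform(input, null, html);
            return html.ToString();
        }
    }
}
```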
