I cannot figure out how to create an EXI decoder in C#/.NET that accepts a MemoryStream containing valid EXI data and simply outputs another MemoryStream containing XML. I will parse the XML later with custom methods; I'm using EXI only to achieve the best compression performance and a low memory footprint. So far I have found some Java implementations as examples, but no C#/.NET counterpart; hints of any kind are really appreciated.
I need to log raw data off of sensors, with features such as creating a new log file every 15 minutes, or once the current file reaches a certain size.
I'd like to leverage an existing framework such as log4net, but there doesn't appear to be much out there on how to add a custom logger for binary data, or whether it is supported at all. Has anyone done this, or come across an implementation of something similar that matches the needs described in this post?
I should add that we are looking at ~300GB of data a day. We are saving this data to allow post-analysis and algorithm tweaking.
You could leverage log4net or any other text-logging tool by taking your byte[] data and converting it to plain text using Convert.ToBase64String. You can convert it back later using Convert.FromBase64String.
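A minimal sketch of that round trip (the byte values are illustrative):

```csharp
using System;

class Base64Demo
{
    static void Main()
    {
        // Raw sensor bytes (illustrative values).
        byte[] raw = { 0x00, 0xFF, 0x10, 0x42 };

        // Encode to plain text so any text logger (e.g. log4net) can write it.
        string line = Convert.ToBase64String(raw);   // "AP8QQg=="

        // During post-analysis, decode the logged line back to bytes.
        byte[] restored = Convert.FromBase64String(line);

        Console.WriteLine(restored.Length); // 4
    }
}
```

Bear in mind that Base64 inflates the data by roughly a third, which matters at ~300GB a day.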
.NET has BinaryReader and BinaryWriter classes built in. They do exactly what you expect: they deal with raw bytes to/from a file (or any Stream, for that matter). So all you have to do is define a simple file format for yourself, then write data into it and read it back out.
You can, of course, convert the binary data to other formats (like string) and then use any serialization scheme you like (JSON, XML, etc.). But since you're dealing with binary data, converting it to other formats may not be the most elegant solution.
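A minimal sketch of such a hand-rolled format (the record layout here - timestamp, length, payload - is an assumption, not a standard; a MemoryStream stands in for the file):

```csharp
using System;
using System.IO;

class BinaryLogDemo
{
    static void Main()
    {
        // Hypothetical record layout: [long ticks][int length][payload bytes].
        byte[] payload = { 1, 2, 3, 4 };
        long ticks = 637000000000000000;

        var ms = new MemoryStream();
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(ticks);          // 8 bytes, little-endian
            writer.Write(payload.Length); // 4 bytes
            writer.Write(payload);        // raw bytes, no conversion
        }

        // Read the record back in the same order it was written.
        var reader = new BinaryReader(new MemoryStream(ms.ToArray()));
        long readTicks = reader.ReadInt64();
        int length = reader.ReadInt32();
        byte[] readPayload = reader.ReadBytes(length);

        Console.WriteLine(readTicks == ticks && length == 4); // True
    }
}
```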
I'm using GZipStream to gzip a string.
Can someone tell me if it is possible to control the level of compression? I ask because other tools seem able to create gzip streams that are more compressed than the ones .NET produces.
This will be possible in .NET 4.5, as a new constructor has been added which allows you to specify a compression level. Another possibility is to use a third-party library that will allow you to achieve that.
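Assuming .NET 4.5 (or later) is available, the new overload looks like this; CompressionLevel.Optimal trades speed for a smaller stream:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class GzipLevelDemo
{
    static void Main()
    {
        byte[] input = Encoding.UTF8.GetBytes(new string('x', 10000));

        var output = new MemoryStream();
        // The CompressionLevel overload was added in .NET 4.5.
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            gzip.Write(input, 0, input.Length);
        }

        Console.WriteLine(output.ToArray().Length < input.Length); // True
    }
}
```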
You will get better compression using SharpZipLib (#ziplib).
I have a module which will be responsible for parsing CSV data received from different users via a website interface. I was considering using TextFieldParser for it.
But before implementing it, I was wondering which would be the better approach...
Generating MemoryStream from data received,
or initialising a StringReader from the same input string.
Which one is better & why?
Option 1 won't give you a string at all, so if you want to work with a byte array and buffers, go that way, but it seems unlikely. If you're doing string processing I would strongly recommend Option 2, because with it you can read a line at a time.
As far as I can see the only reason to use a MemoryStream would be if you need to do something more complex that StringReader doesn't handle as you want (otherwise you're reinventing the wheel): encodings, strange line formats, etc.
Having worked with very large files (specifically CSV files) using StringReaders, I've never had a problem. I'd wager that MS designed StringReader to do exactly what you're trying to do and made it as resource-friendly as possible.
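A minimal StringReader sketch over illustrative CSV input (this uses a naive Split; TextFieldParser remains the better choice if fields can contain quoted commas):

```csharp
using System;
using System.IO;

class CsvReaderDemo
{
    static void Main()
    {
        // CSV text as received from the website (illustrative data).
        string csv = "name,age\nalice,30\nbob,25";

        int rows = 0;
        using (var reader = new StringReader(csv))
        {
            string line;
            // StringReader gives you one line at a time, no byte handling.
            while ((line = reader.ReadLine()) != null)
            {
                string[] fields = line.Split(',');
                rows++;
            }
        }
        Console.WriteLine(rows); // 3
    }
}
```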
I need to compress a very large xml file to the smallest possible size.
I work in C#, and I'd prefer an open-source library or application that I can access through my code, but I can handle an algorithm as well.
Thank you!
It may not be the "smallest size possible", but you could use System.IO.Compression to compress it. Zipping tends to provide very good compression for text.
// Illustrative file names; substitute your own paths.
byte[] xmlBytes = File.ReadAllBytes("input.xml");
using (var fileStream = File.OpenWrite("output.xml.gz"))
using (var zipStream = new GZipStream(fileStream, CompressionMode.Compress))
{
    zipStream.Write(xmlBytes, 0, xmlBytes.Length);
}
As stated above, Efficient XML Interchange (EXI) achieves the best available XML compression pretty consistently. Even without schemas, it is not uncommon for EXI to be 2-5 times smaller than zip. With schemas, you'll do even better.
If you're not opposed to a commercial implementation, you can use the .NET version of Efficient XML and call it directly from your C# code using standard .NET APIs. You can download a free trial copy from http://www.agiledelta.com/efx_download.html.
Have a look at XML Compression Tools; you can also compress it using SharpZipLib.
If you have a schema available for the XML file, you could try EXIficient. It is an implementation of the Efficient XML Interchange (EXI) format that is pretty much the best available general-purpose XML compression method. If you don't have a schema, EXI is still better than regular zip (the deflate algorithm, that is), but not very much, especially for large files.
EXIficient is only Java but you can probably make it into an application that you can call. I'm not aware of any open-source implementations of EXI in C#.
File size is not the only advantage of EXI (or any binary scheme). The processing time and memory overhead are also greatly reduced when reading/writing it. Imagine a program that copies floating point numbers to disk by simply copying the bytes. Now imagine another program converts the floating point numbers to formatted text, and pastes them into a text stream, and then feeds that stream through an expensive compression algorithm. Because of this ridiculous overhead, XML is basically unusable for very large files that could have been effortlessly processed with a binary representation.
Binary XML promises to address this longstanding weakness of XML. It would be very easy to make a utility that converts between binary/text representations (without knowing the XML schema), which means you can still edit the files easily when you want to.
XML is highly compressible. You can use DotNetZip to produce compressed zip files from your XML.
If you require the maximum compression level, I would recommend LZMA. There is an SDK (including C#) that is part of the open-source 7-Zip project, available here.
If you are looking for the smallest possible size then try Fast Infoset as binary XML encoding and then compress using BZIP2 or LZMA. You will probably get better results than compressing text XML or using EXI. FastInfoset.NET includes implementations of the Fast Infoset standard and several compression formats to choose from but it's commercial.
I have a .NET application which serializes an object in binary format.
This object is a struct consisting of a few fields.
I must deserialize and use this object in a C++ application.
I have no idea if there are any serialization libraries for C++; a Google search hasn't turned up much.
What is the quickest way to accomplish this?
Thanks in advance.
Roey.
Update:
I have serialized using Protobuf-net , in my .NET application, with relative ease.
I also get the .proto file that protobuf-net generated, using GetProto() command.
In the .proto file, my GUID fields get a type of "bcl.guid", but the C++ protoc.exe compiler does not know how to interpret them!
What do I do with this?
If you are using BinaryFormatter, then it will be virtually impossible. Don't go there...
Protocol buffers is designed to be portable, cross-platform and version-tolerant (so it won't explode when you add new fields etc.). Google provides the C++ version, and there are several C# versions freely available (including my own) - see here for the full list.
Small, fast, easy.
Note that the v1 of protobuf-net won't handle structs directly (you'll need a DTO class), but v2 (very soon) does have tested struct support.
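One workaround for the bcl.guid issue - an assumption on my part, not something protobuf-net does for you - is to avoid serializing the Guid directly and instead expose it as raw bytes, so the generated .proto contains a plain bytes field that standard protoc understands. A hypothetical sketch (the type and member names are invented, and the protobuf-net attributes are omitted for brevity):

```csharp
using System;

// Hypothetical DTO: expose the Guid as its 16-byte representation so the
// generated .proto contains "bytes" instead of "bcl.guid".
class MyRecord
{
    // Real value kept as a Guid in memory...
    public Guid Id { get; set; }

    // ...but serialized via its byte form; the C++ side just sees 16 bytes.
    public byte[] IdBytes
    {
        get { return Id.ToByteArray(); }
        set { Id = new Guid(value); }
    }
}

class GuidBytesDemo
{
    static void Main()
    {
        var rec = new MyRecord { Id = Guid.NewGuid() };
        var copy = new MyRecord { IdBytes = rec.IdBytes };
        Console.WriteLine(copy.Id == rec.Id); // True
    }
}
```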
Can you edit the .NET app? If so, why not use XML serialization to output the data in an easy-to-import format?
Both boost and Google have libraries for serialization. However, if your struct is pretty trivial, you might consider managing the serialization yourself by writing bytes out from C# and then reading the data in C++ with fread.
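A sketch of the manual approach on the C# side (the Reading struct and its field order are assumptions; the C++ side would fread an int32 followed by a double, assuming matching endianness and no padding):

```csharp
using System;
using System.IO;

class ManualSerializeDemo
{
    // Hypothetical struct to exchange with the C++ application.
    struct Reading
    {
        public int Id;
        public double Value;
    }

    static void Main()
    {
        var r = new Reading { Id = 7, Value = 3.5 };

        // Write fields in a fixed, documented order; the C++ reader must
        // consume them in exactly the same order.
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(r.Id);    // 4 bytes, little-endian
            w.Write(r.Value); // 8 bytes, IEEE 754
        }

        Console.WriteLine(ms.ToArray().Length); // 12
    }
}
```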
Agree with others. You are making your app very vulnerable by doing this. Consider what happens if one of the classes you're serializing changes in any way, or is built with a later version of the C# compiler: your serialized classes could change, making them unreadable.
An XML based solution might work well. Have you considered SOAP? A little out of fashion now but worth a look. The main issue is to decouple the implementation from the data. You can do this in binary if speed / efficiency is an issue, although in my experience, it rarely is.
Serializing in a binary format and expecting an application in another language to read the binary is a very brittle solution (ie it will tend to break on the smallest change to anything).
It would be more stable to serialize the data in a common standard format.
Do you have the option of changing the format? If so, consider choosing a non-binary format for greater interoperability. There are plenty of libraries for reading and writing XML. Json is popular as well.
Binary formats are efficient, but vulnerable to implementation details (does your C++ compiler pack data structures? how are ints and floats represented? what byte ordering is used?), and difficult to adjust if mangled. Text based formats are verbose, but tend to be much more robust. If you are uncertain about binary representations, text representations tend to be easier to understand (apart from challenges such as code pages and wide/narrow characters...).
For C++ XML libraries, the most capable (and perhaps also most complex) would still seem to be the Xerces library. But you should decide for yourself which library best fits your needs and skills.
Use XML serialization; it's the best way to go - in fact, it's the cleanest way to go.
// Note: to serialize a List&lt;YourClassType&gt;, the serializer must be
// constructed for the list type, not the element type.
XmlSerializer s = new XmlSerializer( typeof( List&lt;YourClassType&gt; ) );
using ( TextWriter w = new StreamWriter( @"c:\list.xml" ) )
{
    s.Serialize( w, yourClassListCollection );
}