I need to log raw data coming off sensors, with features such as rolling to a new log file every 15 minutes, or creating a new file once the current one reaches a certain size.
I'd like to leverage an existing framework such as log4net, but there doesn't appear to be much out there on how to add a custom logger for binary data, or whether it is supported at all. Has anyone done this, or come across an implementation of something similar that matches the needs described in this post?
I should add that we are looking at ~300GB a day of data here. We are saving this data to allow for post-analysis and algorithm tweaking.
You could leverage log4net or any other text-logging tool by taking your byte[] data and converting it to plain text using Convert.ToBase64String. You can convert it back later using Convert.FromBase64String.
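A minimal sketch of that round trip (the `log.Info` call is just an assumed log4net usage; note that base64 inflates the data by roughly 33%, which matters at ~300GB/day):

```csharp
using System;

class Base64RoundTrip
{
    static void Main()
    {
        byte[] raw = { 0x01, 0xFF, 0x10, 0x80 };   // sample sensor bytes

        // Encode for a text-based logger. Base64 adds ~33% overhead,
        // which is significant at the volumes described above.
        string encoded = Convert.ToBase64String(raw);

        // e.g. log.Info(encoded);  // hypothetical log4net ILog call

        // Later, during post-analysis, decode back to the original bytes.
        byte[] decoded = Convert.FromBase64String(encoded);

        Console.WriteLine(encoded);         // Af8QgA==
        Console.WriteLine(decoded.Length);  // 4
    }
}
```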
.NET has BinaryReader and BinaryWriter classes built in. They do exactly what you expect...they deal with raw bytes to/from a file (or any Stream, for that matter). So all you have to do is define a simple file format for yourself, then write data into it and read it back out.
You can, of course, convert the binary data to other formats (like string) and then use any serialization scheme you like (JSON, XML, etc., you name it). But since you're dealing with binary data, converting it to other formats may not be the most elegant solution.
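A sketch of what such a simple file format could look like, using a hypothetical record layout of `[timestamp ticks][payload length][payload bytes]` (a MemoryStream stands in for the FileStream a real logger would use):

```csharp
using System;
using System.IO;

class BinaryLogSketch
{
    static void Main()
    {
        var ms = new MemoryStream();   // stands in for a FileStream in a real logger

        // Write one record: [ticks:long][length:int][payload bytes]
        using (var w = new BinaryWriter(ms, System.Text.Encoding.UTF8, leaveOpen: true))
        {
            byte[] payload = { 0x10, 0x20, 0x30 };
            w.Write(DateTime.UtcNow.Ticks);
            w.Write(payload.Length);
            w.Write(payload);
        }

        // Read the record back in the same order it was written.
        ms.Position = 0;
        using (var r = new BinaryReader(ms))
        {
            long ticks = r.ReadInt64();
            int len = r.ReadInt32();
            byte[] payload = r.ReadBytes(len);
            Console.WriteLine(len);   // 3
        }
    }
}
```

Rolling to a new file every 15 minutes or at a size threshold would then just be a matter of closing the current FileStream and opening a new one when either condition trips.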
I have a bunch of Python objects with fields containing arrays of various dimensions and data types (ints or floats), and I want to write them to a file which I can read into C# objects elsewhere. I'm probably just going to write an XML file, but I thought there might be a quicker/easier way of saving and reading it. Also, the resulting XML file will be rather large, which I would like to avoid if it is not too much hassle.
Is there a tried and tested file format that is compatible (and simple to use) with both languages?
What you are trying to do is called serialization. JSON is an excellent option for doing this with support in both languages.
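On the C# side, reading such a file could look like the sketch below. The `Sample` class and its `Ints`/`Floats` fields are hypothetical stand-ins for whatever the Python objects actually contain, and System.Text.Json is assumed here (any JSON library, such as Json.NET, would work the same way):

```csharp
using System;
using System.Text.Json;

// Hypothetical shape matching the Python object's fields.
class Sample
{
    public int[] Ints { get; set; }
    public double[] Floats { get; set; }
}

class JsonInterop
{
    static void Main()
    {
        // JSON as the Python side might emit it with json.dump(...)
        string json = "{\"Ints\":[1,2,3],\"Floats\":[0.5,1.25]}";

        Sample s = JsonSerializer.Deserialize<Sample>(json);
        Console.WriteLine(s.Ints.Length);   // 3
        Console.WriteLine(s.Floats[1]);     // 1.25
    }
}
```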
Because you are working with floats etc., I would consider looking at a format like BSON: "BSON is a binary format in which zero or more key/value pairs are stored as a single entity." It allows you to specify types, sizes, etc.
http://bsonspec.org/#/specification
There are libraries for python, C# etc....
There are plenty of other compact formats out there that are easier to use than XML; I only suggested BSON because it was the first one I remembered.
The program that I am working on saves a snapshot of the current state to an XML file. I would like to store this in a database (as a blob) instead of XML.
Firstly, I think XML files are quite space-consuming and redundant, so we would like to compress the string in some way before storing it in the database. In addition, we would also like to introduce simple encryption so that people won't be able to figure out what it means without at least a simple key/password.
Note that I want to store it in the database as blob, so zipping it and then encrypting the zip file won't do, I guess.
How can I go about doing this?
Compress the XML data with DeflateStream and write its output to a MemoryStream. Then call the .ToArray() method to obtain your blob data. You can also do encryption with .NET in a similar way (after compression, of course). If you believe Deflate is not enough to save space, then try this library: XWRT.
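A sketch of the compress-then-encrypt pipeline, entirely in memory so the result is a byte[] ready for a blob column (the all-zero key/IV are for demonstration only; a real application would derive the key from the user's password, e.g. with Rfc2898DeriveBytes):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

class BlobSketch
{
    // Compress with Deflate, then encrypt with AES; returns the blob bytes.
    public static byte[] Pack(string xml, byte[] key, byte[] iv)
    {
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(xml);
                deflate.Write(raw, 0, raw.Length);
            }
            compressed = ms.ToArray();
        }

        using (var aes = Aes.Create())
        using (var enc = aes.CreateEncryptor(key, iv))
        using (var outMs = new MemoryStream())
        {
            using (var cs = new CryptoStream(outMs, enc, CryptoStreamMode.Write))
                cs.Write(compressed, 0, compressed.Length);
            return outMs.ToArray();   // store this byte[] as the blob
        }
    }

    // Reverse the pipeline: decrypt, then decompress back to the XML string.
    public static string Unpack(byte[] blob, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var dec = aes.CreateDecryptor(key, iv))
        using (var inMs = new MemoryStream(blob))
        using (var cs = new CryptoStream(inMs, dec, CryptoStreamMode.Read))
        using (var inflate = new DeflateStream(cs, CompressionMode.Decompress))
        using (var reader = new StreamReader(inflate, Encoding.UTF8))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        byte[] key = new byte[32], iv = new byte[16];   // demo only: derive a real key in practice
        string xml = "<state><value>42</value></state>";
        byte[] blob = Pack(xml, key, iv);
        Console.WriteLine(Unpack(blob, key, iv) == xml);   // True
    }
}
```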
Firstly, have a look at your serialization mechanism. The whole point of XML is that it's human readable. If that's no longer an important goal for you then it might be time to look at other serialization technologies which would be more suited to database storage (compressing XML into binary completely defeats the point of it :)
As an alternative format, BSON could be a good choice.
I have a module which will be responsible for parsing CSV data received from different users via a website interface, and I have to parse that CSV. I was considering using TextFieldParser for it.
But before implementing it, I was wondering which would be the better approach:
Generating MemoryStream from data received,
or initialising a StringReader from the same input string.
Which one is better & why?
Option 1 won't give you a string at all, so if you want to work with a byte array and buffers, go that way, but that seems unlikely. If you're doing string processing, I'd strongly recommend Option 2, because with it you can read a line at a time.
As far as I can see the only reason to use a MemoryStream would be if you need to do something more complex that StringReader doesn't handle as you want (otherwise you're reinventing the wheel): encodings, strange line formats, etc.
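A minimal sketch of the StringReader approach. TextFieldParser accepts any TextReader, so a StringReader wraps the posted string directly with no byte[]/encoding detour; a naive Split stands in for TextFieldParser here so the example stays self-contained (TextFieldParser would additionally handle quoted fields):

```csharp
using System;
using System.IO;

class CsvFromString
{
    static void Main()
    {
        // CSV text as received from the website interface
        string csv = "name,score\nalice,10\nbob,7";

        // StringReader lets any TextReader-based parser (TextFieldParser included)
        // consume the string one line at a time.
        using (var reader = new StringReader(csv))
        {
            string header = reader.ReadLine();      // "name,score"
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] fields = line.Split(',');  // naive split; no quote handling
                Console.WriteLine(fields[0] + "=" + fields[1]);
            }
        }
    }
}
```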
Having worked with very large files (specifically CSV files) through StringReaders, I've never had a problem. I'd wager that MS designed StringReader to do exactly what you're trying to do, and made it as resource-friendly as possible.
I have two separate apps: a client (in C#) and a server (in C++). They need to exchange data in the form of "structs", and about 1 MB of data a minute is sent from server to client.
What's better to use: XML or my own binary format?
With XML:
Translating XML to a struct using a parser would be slow, I believe (the "good" approach, but: load parser, load XML, parse).
The other option is parsing XML with regex (bad!)
With Binary:
compact data sizes
no need for meta information like tags;
but structs cannot be changed easily to accommodate new structs/new members in structs in the future;
no conversion from text (XML) to binary (struct) is necessary, so it is faster to receive and "assemble" into a struct
Any pointers? Should I not be considering binary at all?? A bit confused about what approach to take.
1MB of data per minute is pretty tiny if you've got a reasonable network connection.
There are other choices between binary and XML - other human-readable text serialization formats, such as JSON.
When it comes to binary, you don't have to have versioning problems - technologies like Protocol Buffers (I'm biased: I work for Google and I've ported PB to C#) are explicitly designed with backward and forward compatibility in mind. There are other binary formats to consider as well, such as Thrift.
If you're worried about performance though, you should really measure it. I'm pretty sure my phone could parse 1MB of XML sufficiently quickly for it not to be a problem in this case... basically work out what you're most concerned about, in terms of:
Simplicity of code
Interoperability
Performance in terms of CPU
Network traffic
Backward/forward compatibility
Human readability of on-the-wire format
It's all a balancing act - but you're the one who has to decide how much weight to give each of those factors.
If you have .NET applications in both ends, use Windows Communication Foundation. This will allow you to defer the decision until deployment time, as it supports both binary and XML serialization.
As you stated, XML is (a little) slower but much more flexible and reliable. I would go with XML until there is a proven problem with performance.
You should also take a look at Protocol Buffers as an alternative.
And, after your update, any cross-language, cross-platform and cross-version requirement strongly points away from binary formatting.
A good point for XML would be interoperability. Do you have other clients that also access your server?
Before you use your own binary format or do regex on XML... have you considered the serialization namespaces in .NET? There are binary formatters, SOAP formatters, and there is also XmlSerializer.
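For instance, XmlSerializer gives you the struct-to-XML translation for free on the C# side; a sketch, with `Telemetry` as a hypothetical stand-in for whatever the exchanged struct contains:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical message type; public fields/properties are what XmlSerializer emits.
public class Telemetry
{
    public int Id;
    public double Value;
}

class XmlSerializerSketch
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Telemetry));

        // Serialize to XML text (this is what would go over the wire).
        var sw = new StringWriter();
        serializer.Serialize(sw, new Telemetry { Id = 7, Value = 1.5 });

        // Deserialize on the receiving end; no hand-written parsing or regex.
        var roundTripped = (Telemetry)serializer.Deserialize(new StringReader(sw.ToString()));
        Console.WriteLine(roundTripped.Id);   // 7
    }
}
```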
Another advantage of XML is that you can extend the data you are sending by adding an element; you won't have to alter the receiver's code to cope with the extra data until you are ready to.
Also, even minimal (fast) compression of XML can dramatically reduce the wire load.
text/xml
Human readable
Easier to debug
Bandwidth can be saved by compressing
Tags document the data they contain
binary
Compact
Easy to parse (if fixed size fields are used, just overlay a struct)
Difficult to debug (hex editors are a pain)
Needs a separate document to understand what the data is.
Both forms are extensible and can be upgraded to newer versions provided you insert a type and version field at the beginning of the datagram.
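A sketch of such a header on the binary side, with hypothetical type and version values. Note this only addresses recognizing the format version; the C++ server would still have to agree on field sizes and byte order:

```csharp
using System;
using System.IO;

class VersionedDatagram
{
    static void Main()
    {
        var ms = new MemoryStream();   // stands in for the network stream

        // Header first: a type id and a version number, so older receivers
        // can at least recognize (and skip or reject) newer payloads.
        using (var w = new BinaryWriter(ms, System.Text.Encoding.UTF8, leaveOpen: true))
        {
            w.Write((ushort)1);    // message type (hypothetical)
            w.Write((ushort)2);    // format version (hypothetical)
            w.Write(3.14);         // payload: whatever the struct contains
        }

        // The receiver reads the header before deciding how to parse the rest.
        ms.Position = 0;
        using (var r = new BinaryReader(ms))
        {
            ushort type = r.ReadUInt16();
            ushort version = r.ReadUInt16();
            Console.WriteLine(type + " v" + version);   // 1 v2
        }
    }
}
```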
You did not say if they are on the same machine or not. I assume not.
In that case there is another downside to binary: you cannot simply dump the structs on the wire, as you could have endianness and sizeof issues.
XML is very wordy; YAML or JSON are much smaller.
Don't forget that what most people think of as XML is XML serialized as text. It can be serialized to binary instead. This is what the netTcpBinding and other such bindings do in WCF. The XML infoset is output as binary, not as text. It's still XML, just in binary.
You could also use Google Protocol Buffers, which is a compact binary representation for structured data.
I am developing a little app that retrieves an XML file, located on a remote server (http://example.com/myfile.xml)
This file is relatively big, and it contains a big list of geolocations with other information that I need to use for my app.
So I read this file remotely once and insert it into a little SqlCE file (database.sdf)
So if I need to access geolocation #1, I'll just run a SELECT statement against this database instead of loading the whole XML file every time.
But I would like to know if it's possible to do this without using .sdf files?
What is the most efficient way (fastest)?
Saving the big XML file once locally and loading it into a DataSet every time I start my app? This would make the app take a while to load every time.
Saving the big XML file once locally and reading the nodes one by one to look for geolocation #1?
Or is it possible to retrieve geolocation #1 from the remote XML directly(http://example.com/myfile.xml) without reading the whole file?
Load the big XML file, convert it into an appropriate different data structure, save it to a file in an efficient format. (XML really isn't terribly efficient.)
I believe Marc Gravell's Protocol Buffers implementation works on the Compact Framework...
(None of the protobuf implementations are deemed production-ready yet, but a couple are close. We need testers!)
Re protobuf-net, there isn't a separate download for the CF version at the moment, but there is a csproj in the source for both CF 2.0 and CF 3.5.
To clarify on your question: actually, protobuf-net doesn't even use a .proto file (at the moment); a .proto file just describes what the data is. protobuf-net simply looks at your classes and infers the schema from that (similar to how XmlSerializer / DataContractSerializer etc. work). So there is no .proto, just the classes that look like your data.
However, before you embark on creating classes that look like your data, I wonder if you couldn't simply use GZIP or [PK]ZIP to compress the data, and transfer it "as is". XML generally compresses very well. Of course, finding a GZIP (etc) implementation for CF then becomes the issue.
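To illustrate how well repetitive XML compresses, here is a sketch using GZipStream from desktop .NET's System.IO.Compression (as noted above, availability of an equivalent on the Compact Framework is the open question; the geolocation markup is invented for the example):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class GzipXml
{
    static void Main()
    {
        // Repetitive XML like a big geolocation list compresses very well.
        var sb = new StringBuilder();
        for (int i = 0; i < 200; i++)
            sb.Append("<loc id=\"" + i + "\" lat=\"1.0\" lon=\"2.0\"/>");
        byte[] raw = Encoding.UTF8.GetBytes(sb.ToString());

        byte[] packed;
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
                gz.Write(raw, 0, raw.Length);
            packed = ms.ToArray();   // transfer this instead of the raw XML
        }

        // The tag and attribute names repeat on every element, so the
        // compressed size ends up a small fraction of the original.
        Console.WriteLine(raw.Length + " -> " + packed.Length);
    }
}
```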
Of course, if you want to use protobuf-net here, I'll happily advise etc if you get issues...
The other option is for your CF app to call into a web-service that has the data locally...
Why would you pull the entire file down to the CE device for this? It's a bandwidth waste and certainly doing the lookup on an embedded processor is going to be way slower than on the server regardless of storage format. You should have a service (Web, WCF or whatever) that allows you to ask it for the single geolocation you want.