How to read a remote XML file the most efficient way? - C#

I am developing a little app that retrieves an XML file located on a remote server (http://example.com/myfile.xml).
This file is relatively big, and it contains a long list of geolocations plus other information that I need for my app.
So I read this file remotely once and insert it into a little SQL CE file (database.sdf).
Then, if I need to access geolocation #1, I just run a SELECT statement against this database instead of loading the whole XML file every time.
But I would like to know if it's possible to do this without using .sdf files?
What is the most efficient way (fastest)?
Saving the big XML file once locally and loading it into a DataSet every time I start my app? That would make the app take a while to load every time.
Saving the big XML file once locally and reading the nodes one by one to look for geolocation #1?
Or is it possible to retrieve geolocation #1 directly from the remote XML file (http://example.com/myfile.xml) without reading the whole file?

Load the big XML file, convert it into an appropriate different data structure, save it to a file in an efficient format. (XML really isn't terribly efficient.)
I believe Marc Gravell's Protocol Buffers implementation works on the Compact Framework...
(None of the protobuf implementations are deemed production-ready yet, but a couple are close. We need testers!)

Re protobuf-net, there isn't a separate download for the CF version at the moment, but there is a csproj in the source for both CF 2.0 and CF 3.5.
To clarify your question: protobuf-net doesn't actually use a .proto file (at the moment); a .proto file just describes what the data is. protobuf-net simply looks at your classes and infers the schema from that (similar to how XmlSerializer / DataContractSerializer etc. work). So there is no .proto - just the classes that look like your data.
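To make that concrete, here is a minimal sketch of what such classes might look like (the Geolocation type and its members are hypothetical; the attributes are protobuf-net's):

    using System.Collections.Generic;
    using System.IO;
    using ProtoBuf;

    // Hypothetical data class: protobuf-net infers the wire schema
    // from these attributes, so no .proto file is needed.
    [ProtoContract]
    public class Geolocation
    {
        [ProtoMember(1)] public double Latitude { get; set; }
        [ProtoMember(2)] public double Longitude { get; set; }
        [ProtoMember(3)] public string Name { get; set; }
    }

    public static class GeoStore
    {
        // Serialize the whole list to a compact binary file once...
        public static void Save(string path, List<Geolocation> items)
        {
            using (FileStream file = File.Create(path))
            {
                Serializer.Serialize(file, items);
            }
        }

        // ...and load it back at startup.
        public static List<Geolocation> Load(string path)
        {
            using (FileStream file = File.OpenRead(path))
            {
                return Serializer.Deserialize<List<Geolocation>>(file);
            }
        }
    }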
However, before you embark on creating classes that look like your data, I wonder if you couldn't simply use GZIP or [PK]ZIP to compress the data, and transfer it "as is". XML generally compresses very well. Of course, finding a GZIP (etc) implementation for CF then becomes the issue.
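As a rough sketch of the compression idea (GZipStream here is the desktop .NET class from System.IO.Compression; on CF you would need a third-party implementation such as SharpZipLib):

    using System.IO;
    using System.IO.Compression;

    // Compress the XML file once on the server side; XML typically
    // shrinks dramatically under GZIP.
    public static void GzipFile(string inputPath, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (FileStream output = File.Create(outputPath))
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, read);
            }
        }
    }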
Of course, if you want to use protobuf-net here, I'll happily advise etc if you get issues...
The other option is for your CF app to call into a web-service that has the data locally...

Why would you pull the entire file down to the CE device for this? It's a waste of bandwidth, and doing the lookup on an embedded processor is certainly going to be way slower than on the server, regardless of storage format. You should have a service (Web, WCF, or whatever) that allows you to ask it for the single geolocation you want.
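For illustration only - the endpoint URL and its query string are made up, but HttpWebRequest is available on the Compact Framework - the device-side call could be as small as this:

    using System.IO;
    using System.Net;

    // Ask a hypothetical server-side endpoint for a single geolocation;
    // the response is a tiny XML fragment instead of the whole file.
    public static string FetchGeolocationXml(int id)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "http://example.com/geolocation.ashx?id=" + id);

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }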

Related

Store bigger object data without using a database

I'm writing a project mostly for fun, just to play around with .NET. I'm building a XAML project (what you would call a Windows 8 app) in C#.
In this case I will have a bigger object with some lists of other objects and so on. Is there any smart way of saving these objects to disk for loading them later? That is, something like GetMyOldSavesObectWithName("MyObject");
From what I've read, local storage is primarily used for saving smaller things, such as settings. How much data is acceptable to save? Can it handle a bigger object, and what are the pros/cons? Should I save the objects to files, and if so, how do I do that? Is there any smart way of telling .NET to "save this object to MyObjectName.xml" or something?
In C#, you can use serialization to save your objects as files, commonly in XML format. Other libraries, such as JSON.net, can serialize into JSON.
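A minimal sketch of the XML route, assuming a hypothetical GameState class to persist (XmlSerializer needs public members and a parameterless constructor):

    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Serialization;

    public class GameState
    {
        public string Name { get; set; }
        public List<string> Items { get; set; }
    }

    public static class XmlStore
    {
        // "Save this object to MyObjectName.xml"...
        public static void Save(string path, GameState state)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(GameState));
            using (FileStream stream = File.Create(path))
            {
                serializer.Serialize(stream, state);
            }
        }

        // ...and load it back later.
        public static GameState Load(string path)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(GameState));
            using (FileStream stream = File.OpenRead(path))
            {
                return (GameState)serializer.Deserialize(stream);
            }
        }
    }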
You could also roll your own saving/loading format, which will probably run faster and store data more compactly, but will take much more of your time. This can be done with BinaryReaders and BinaryWriters.
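For comparison, a hand-rolled binary version of the same idea (reusing the hypothetical GameState from the sketch above; compact and fast, but reader and writer must be kept in sync by hand):

    using System.Collections.Generic;
    using System.IO;

    public static class BinaryStore
    {
        public static void Save(string path, GameState state)
        {
            using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
            {
                writer.Write(state.Name);
                writer.Write(state.Items.Count);   // length prefix for the list
                foreach (string item in state.Items)
                    writer.Write(item);
            }
        }

        public static GameState Load(string path)
        {
            using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
            {
                GameState state = new GameState { Items = new List<string>() };
                state.Name = reader.ReadString();
                int count = reader.ReadInt32();
                for (int i = 0; i < count; i++)
                    state.Items.Add(reader.ReadString());
                return state;
            }
        }
    }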
Take a look at this StackOverflow answer if you wish to go the serialization route.
In most cases data will be so compact it will not use much space at all. Based on your comment, that "large" amount of data would really only take a few KBs.

Safety concerns regarding serialization in C#

I have serialized all the dictionaries in my application to a file. When I open this file, I can see lots of information regarding my class names, etc., which has been saved with the file.
So is this safe? Will everybody be able to just open a saved file created by my application and see what classes I've used? Here is the method I've used to serialize my objects:
Serialization of two Dictionaries at once
What alternatives have I got for saving my application's objects to a file?
Yes, they will be able to see the structure of the serialized object (if you serialize it to a binary file it's a bit more difficult, but that does not help much).
However, anyone can see your source code anyway - just think of .NET Reflector or ildasm. I personally wouldn't worry about it; I don't see any problem with this.
You can encrypt the file to hide its contents. To read the encrypted file, you read it into memory, decrypt it, and then pass it to the deserialization formatter.
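A minimal sketch of that read-decrypt-deserialize flow, assuming AES with a key and IV you manage yourself and a BinaryFormatter-produced payload:

    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.Security.Cryptography;

    // Read the encrypted file, decrypt it in memory, then hand the
    // plaintext stream to the formatter.
    public static object LoadEncrypted(string path, byte[] key, byte[] iv)
    {
        byte[] cipherText = File.ReadAllBytes(path);

        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        using (MemoryStream cipherStream = new MemoryStream(cipherText))
        using (CryptoStream cryptoStream = new CryptoStream(cipherStream, decryptor, CryptoStreamMode.Read))
        {
            BinaryFormatter formatter = new BinaryFormatter();
            return formatter.Deserialize(cryptoStream);
        }
    }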
In my opinion, you shouldn't be afraid of it - but it depends on your needs.
If you decide that it is important to you, I would recommend storing the data in some other place (remote storage).
You have three alternatives for hiding the content:
Encrypt the object, then serialize it (best for local storage and sending) - http://msdn.microsoft.com/en-us/library/as0w18af(v=vs.110).aspx
Serialize the object, then encrypt the file (worse - you will have to handle deleting the plaintext file) - http://support.microsoft.com/kb/307010
Serialization to binary - the worst; it doesn't really work, because you can open the file in a text editor and figure out what's going on.
So, if this is an important thing in your program, I think the first method is the best.

Sending large volume of data between two C# programs

I am currently working on a C# plug-in (I could also use Python) for two separate programs that need to communicate. In the first program, I deconstruct 3D geometry into edges, points, normals, etc. Then I send all of this data to my plug-in in the second program to be rebuilt. Ideally this would happen as fast as possible to keep things in "real time".
Currently, I am converting my data to JSON and writing the JSON to disk. My second program then watches for file changes, reads the file, and uses the JSON data.
By far the biggest bottleneck of my entire plug-in is the read/write process. There has to be a faster way than writing to a file.
There are several ways to do interprocess communication.
The most well known are also used between different machines: WCF (.NET 3.5) and Remoting (.NET 2).
For on-machine communication you can choose between named pipes and memory-mapped files.
Memory-mapped files are similar to your solution in that they use the page file as a backing store.
I think the Named pipes solution is the most convenient:
You set up a "server" stream and wait for some "client" to connect.
Then you transfer the data just as you would through any other stream.
Here's NamedPipeServerStream.
And this is NamedPipeClientStream.
The code example there pretty much covers it.
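For reference, a stripped-down version of that pattern (the pipe name "geometry-pipe" is arbitrary; server and client would normally run in your two separate programs):

    using System.IO;
    using System.IO.Pipes;

    // Server: create the pipe, block until the client connects,
    // then stream the JSON across.
    public static void RunServer(string json)
    {
        using (NamedPipeServerStream server = new NamedPipeServerStream("geometry-pipe"))
        {
            server.WaitForConnection();
            using (StreamWriter writer = new StreamWriter(server))
            {
                writer.Write(json);
            }
        }
    }

    // Client: connect to the same pipe name and read everything.
    public static string RunClient()
    {
        using (NamedPipeClientStream client = new NamedPipeClientStream(".", "geometry-pipe"))
        {
            client.Connect();
            using (StreamReader reader = new StreamReader(client))
            {
                return reader.ReadToEnd();
            }
        }
    }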
I think WCF with named pipes would do the job. You just have to create transfer objects, which will be serialized by WCF automagically; or you can prepare your existing objects to be transferred over the named pipe without much overhead. Using JSON would be nice, but it creates an additional layer, whereas with WCF you transfer objects that can be used right away without any JSON translation. (Really they are translated to XML, but you are not doing it yourself, so it is better than what you could do by parsing JSON, I think.)
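A rough sketch of that WCF approach (the service name, contract, and MeshDto transfer object are all hypothetical; NetNamedPipeBinding uses WCF's binary XML encoding under the hood):

    using System;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    // Hypothetical transfer object; WCF serializes it automatically.
    [DataContract]
    public class MeshDto
    {
        [DataMember] public double[] Vertices { get; set; }
    }

    [ServiceContract]
    public interface IGeometryService
    {
        [OperationContract]
        void PushMesh(MeshDto mesh);
    }

    public class GeometryService : IGeometryService
    {
        public void PushMesh(MeshDto mesh)
        {
            Console.WriteLine("Received {0} values", mesh.Vertices.Length);
        }
    }

    public static class PipeDemo
    {
        public static void Main()
        {
            // Host the service over a named pipe.
            ServiceHost host = new ServiceHost(typeof(GeometryService),
                new Uri("net.pipe://localhost/geometry"));
            host.AddServiceEndpoint(typeof(IGeometryService), new NetNamedPipeBinding(), "");
            host.Open();

            // Client side: the proxy behaves like a local object.
            ChannelFactory<IGeometryService> factory = new ChannelFactory<IGeometryService>(
                new NetNamedPipeBinding(), new EndpointAddress("net.pipe://localhost/geometry"));
            IGeometryService proxy = factory.CreateChannel();
            proxy.PushMesh(new MeshDto { Vertices = new double[] { 0, 1, 2 } });

            host.Close();
        }
    }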

How do I compress and encrypt a string to another string?

The program that I am working on saves a snapshot of the current state to an XML file. I would like to store this in a database (as a blob) instead of an XML file.
Firstly, I think XML files are quite space-consuming and redundant, so we would like to compress the string in some way before storing it in the database. In addition, we would also like to introduce simple cryptography so that people won't be able to figure out what it means without at least a simple key/password.
Note that I want to store it in the database as a blob, so zipping it and then encrypting the zip file won't do, I guess.
How can I go about doing this?
Compress the XML data with DeflateStream and write its output to a MemoryStream. Then call the .ToArray() method to obtain your blob data. You can do encryption with .NET in a similar way as well (after compression, of course). If you believe Deflate is not enough to save space, then try this library: XWRT.
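A minimal sketch of that pipeline, assuming AES for the encryption step (key and IV handling is up to you):

    using System.IO;
    using System.IO.Compression;
    using System.Security.Cryptography;
    using System.Text;

    // Compress the XML string with Deflate, then encrypt the result;
    // the returned byte[] can go straight into a blob column.
    public static byte[] CompressAndEncrypt(string xml, byte[] key, byte[] iv)
    {
        byte[] compressed;
        using (MemoryStream buffer = new MemoryStream())
        {
            using (DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(xml);
                deflate.Write(raw, 0, raw.Length);
            }
            compressed = buffer.ToArray(); // safe: ToArray works after the stream is closed
        }

        using (Aes aes = Aes.Create())
        using (ICryptoTransform encryptor = aes.CreateEncryptor(key, iv))
        using (MemoryStream output = new MemoryStream())
        {
            using (CryptoStream crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            {
                crypto.Write(compressed, 0, compressed.Length);
            }
            return output.ToArray();
        }
    }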
Firstly, have a look at your serialization mechanism. The whole point of XML is that it's human-readable. If that's no longer an important goal for you, it might be time to look at other serialization technologies better suited to database storage (compressing XML into binary completely defeats the point of it :)
As an alternative format, BSON could be a good choice.

Should I use XML or Binary to send data from server to client?

I have two separate apps - one a client (in C#), one a server (in C++). They need to exchange data in the form of "structs", and about 1 MB of data a minute is sent from server to client.
What's better to use - XML or my own binary format?
With XML:
Translating XML to a struct using a parser would be slow, I believe? (the "good" option, but: load the parser, load the XML, parse)
The other option is parsing XML with regex (bad!)
With Binary:
compact data sizes
no need for meta information like tags;
but structs cannot be changed easily to accommodate new structs or new members in existing structs in the future;
no conversion from text (XML) to binary (struct) is necessary, so it is faster to receive and "assemble" into a struct.
Any pointers? Should I not be considering binary at all? A bit confused about which approach to take.
1MB of data per minute is pretty tiny if you've got a reasonable network connection.
There are other choices between binary and XML - other human-readable text serialization formats, such as JSON.
When it comes to binary, you don't have to have versioning problems - technologies like Protocol Buffers (I'm biased: I work for Google and I've ported PB to C#) are explicitly designed with backward and forward compatibility in mind. There are other binary formats to consider as well, such as Thrift.
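To illustrate the compatibility point (shown with protobuf-net attributes on the C# side; the same field-numbering rule applies to a .proto file shared with the C++ end, and all type names here are made up):

    using ProtoBuf;

    // Version 1 of a message...
    [ProtoContract]
    public class Reading
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public double Value { get; set; }
    }

    // ...and version 2: new members get NEW field numbers. Old readers
    // simply skip field 3; new readers see a default value when it is absent.
    [ProtoContract]
    public class ReadingV2
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public double Value { get; set; }
        [ProtoMember(3)] public string Units { get; set; }
    }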
If you're worried about performance though, you should really measure it. I'm pretty sure my phone could parse 1MB of XML sufficiently quickly for it not to be a problem in this case... basically work out what you're most concerned about, in terms of:
Simplicity of code
Interoperability
Performance in terms of CPU
Network traffic
Backward/forward compatibility
Human readability of on-the-wire format
It's all a balancing act - but you're the one who has to decide how much weight to give each of those factors.
If you have .NET applications in both ends, use Windows Communication Foundation. This will allow you to defer the decision until deployment time, as it supports both binary and XML serialization.
As you stated, XML is a (little) slower but much more flexible and reliable. I would go with XML until there is a proven problem with performance.
You should also take a look at ProtoBuf as an alternative.
And, after your update, any cross-language, cross-platform and cross-version requirement strongly points away from binary formatting.
A good point for XML would be interoperability. Do you have other clients that also access your server?
Before you use your own binary format or parse XML with regex, have you considered the serialization namespaces in .NET? There are binary formatters, SOAP formatters, and there is also XML serialization.
Another advantage of XML is that you can extend the data you are sending by adding an element; you won't have to alter the receiver's code to cope with the extra data until you are ready to.
Also, even minimal (fast) compression of XML can dramatically reduce the wire load.
text/xml:
Human readable
Easier to debug
Bandwidth can be saved by compressing
Tags document the data they contain
binary:
Compact
Easy to parse (if fixed-size fields are used, just overlay a struct)
Difficult to debug (hex editors are a pain)
Needs a separate document to understand what the data is
Both forms are extensible and can be upgraded to newer versions provided you insert a type and version field at the beginning of the datagram.
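A minimal sketch of that header idea on the C# side (the type and version constants are illustrative):

    using System.IO;

    public static class Datagram
    {
        const ushort MsgTypeGeometry = 1;
        const ushort CurrentVersion = 2;

        // Prefix every message with type + version so either side can
        // dispatch on the type and tolerate older versions.
        public static byte[] Frame(byte[] payload)
        {
            using (MemoryStream buffer = new MemoryStream())
            using (BinaryWriter writer = new BinaryWriter(buffer))
            {
                writer.Write(MsgTypeGeometry);
                writer.Write(CurrentVersion);
                writer.Write(payload.Length);   // length prefix for the body
                writer.Write(payload);
                writer.Flush();
                return buffer.ToArray();
            }
        }
    }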
You did not say whether they are on the same machine or not. I assume not.
In that case there is another downside to binary: you cannot simply dump the structs on the wire, because you could have endianness and sizeof issues.
XML is very wordy; YAML or JSON are much smaller.
Don't forget that what most people think of as XML is XML serialized as text. It can be serialized to binary instead. This is what the netTcpBinding and other such bindings do in WCF. The XML infoset is output as binary, not as text. It's still XML, just in binary.
You could also use Google Protocol Buffers, which is a compact binary representation for structured data.
