I'm on a project that processes and reports on large sets of aggregatable, row-based data. There is a primary aggregation service, and many clients can subscribe to different views of the data from that server. The objects are passed back and forth between the Java server and the C# clients encoded as JSON. We're noticing that parsing the objects takes a lot of time and is fairly memory-intensive. Have others used JSON for this purpose or seen similar behavior?
We used to use straight XML across the wire and had to use custom (i.e., manual) serialization for a lot of the objects. While not JSON, we still took performance hits due to this constraint. Once we migrated all our tech to a similar architecture, we were able to switch to binary serialization, which worked much better.
However, on the objects where size caused performance issues, we made some modifications. Since we had access to the code on both ends (and both were C#), we were able to binary-serialize the payload and then Base64-encode it, since it had to be text across the wire. It helped a good bit in terms of object size, and the serialization ran a bit faster.
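A minimal sketch of that pattern in C#, assuming a [Serializable] payload type (BinaryFormatter was the usual choice at the time; the helper names here are illustrative, not what we actually shipped):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class TextWirePayload
{
    // Binary-serialize, then Base64-encode so the result is safe to send as text.
    static string ToBase64<T>(T payload)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, payload);
            return Convert.ToBase64String(stream.ToArray());
        }
    }

    // Reverse the process on the receiving end.
    static T FromBase64<T>(string encoded)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream(Convert.FromBase64String(encoded)))
        {
            return (T)formatter.Deserialize(stream);
        }
    }
}
```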
Since you are going from Java to C#, you won't have that luxury. So the only thing I can think of in your case is to optimize your parsing of the JSON response. A code-profiling tool can help you identify the portions that are causing performance issues so you can optimize those. Also, when assembling JSON text, make sure you use a StringBuilder to build the final string. If you are doing standard concat operations, it will kill performance as well.
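To illustrate the StringBuilder point, a hedged sketch (the fragments collection is a placeholder for whatever pieces you are joining):

```csharp
using System.Collections.Generic;
using System.Text;

static class JsonAssembly
{
    // Repeated string concatenation allocates a brand-new string on every
    // pass; StringBuilder appends into one growable buffer instead.
    static string BuildPayload(IEnumerable<string> fragments)
    {
        var sb = new StringBuilder();
        foreach (var fragment in fragments)
            sb.Append(fragment);   // no intermediate string allocations
        return sb.ToString();      // single final allocation
    }
}
```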
Also, you might want to look around: there are several JSON serializers written for C# on the web, and some may be faster than what you are using.
Not sure if that helps all that much, but there is some info from things we have seen with string-based message passing.
UPDATE: Just saw this on dotnetkicks: JSON.Net. It's an update from James to the JSON.Net serializers; it may help.
I know for Java there are any number of open-source JSON serializers and deserializers. We use FlexJSON.
JSON can be expensive to decode. If performance is an issue, try using something like Hessian.
Server side: C# or Java
Client side: Objective-C
I need a way to serialize an object in C#/Java and deserialize it in Objective-C.
I'm new to Objective-C and was wondering where I can get information about this issue.
Thanks.
Apart from the obvious JSON/XML solutions, protobuf may also be interesting. There are Java/C++/Python backends for it, and third parties have created backends for C# and Objective-C as well (I've never used the latter, though).
The main advantages are that it is much, much faster to parse[1] and much smaller[2], since it's a binary format, and that versioning was an important consideration from the beginning.
[1] Google claims 20-100 times faster compared to XML
[2] 3-10 times smaller, according to the same source
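For what it's worth, a sketch of what this looks like from the C# side using the third-party protobuf-net library; the message type and field numbers are purely illustrative:

```csharp
using System.IO;
using ProtoBuf; // third-party protobuf-net library

[ProtoContract]
public class RowUpdate
{
    // Field numbers identify members on the wire; keep them stable across
    // versions, and new members can be added without breaking old readers.
    [ProtoMember(1)] public string Key { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

public static class ProtobufDemo
{
    public static RowUpdate RoundTrip()
    {
        using (var stream = File.Create("row.bin"))
            Serializer.Serialize(stream, new RowUpdate { Key = "a", Value = 1.5 });

        using (var stream = File.OpenRead("row.bin"))
            return Serializer.Deserialize<RowUpdate>(stream);
    }
}
```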
Another technology similar to protobufs is Apache Thrift.
Apache Thrift is a software framework for scalable cross-language services development. Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.
JSON for relatively straightforward object graphs
XML/REST for more complex object graphs (distinctions between arrays / collections / nested arrays, etc.)
Sudzc. I am using it. It makes it pretty easy to invoke a web service from an iOS app. You don't have to write code to serialize objects.
JSON is probably the best choice, because:
It is simple to use
It is human-readable
It is data-based rather than being tied to any more complex object model
You will be able to find decent libraries for import/export in most languages.
Serialisation of more complex objects is, IMHO, not a good idea from a portability perspective, since one language/platform often has no effective way of expressing a concept from another language/platform. For example, as soon as you start declaring "types" or "classes" of serialised objects, you run into the thorny issue of differing object models between languages.
On iOS there are couple of JSON frameworks and libraries with an Objective-C API:
JSONKit
SBJson
TouchJson
are probably the most prominent.
JSONKit is fast and simple, but it can only parse a contiguous chunk of JSON text. This means you need to save the downloaded data into a temporary file, or accumulate all the downloaded JSON text in an NSMutableData object (kept in memory). Only after the JSON text has been downloaded completely can you start parsing.
SBJson is more flexible to use. It provides an additional "SAX style" interface, can parse partial input, and can parse more than one JSON document per "input" (for example, several JSON documents per network connection). This is very handy when you want to connect to a "streaming API" (e.g., the Twitter Streaming API), where many JSON documents can arrive per connection. The drawback is that it is much slower than JSONKit.
TouchJson is even somewhat slower than SBJson.
My personal preference is another library, though. It is faster than JSONKit (20% faster on ARM), has an additional SAX-style API, can handle "streaming APIs", can download and parse simultaneously, and can handle very large JSON strings without severely impacting the memory footprint, while being especially easy to use with NSURLConnection. (Well, I'm probably biased, since I'm the author.)
You can take a look at JPJson (Apache License v2); it's still in beta, though.
I've been doing some reading up on XML serialization, and from what I understand, it is a way to take an object and persist its state in a file. The implementation looks straightforward enough, and there seem to be loads of resources for applying it. When should XML serialization be used? What are the benefits? What situations are best served by using it?
The .NET XmlSerializer class isn't the only way to persist an object to XML. The newer DataContractSerializer is faster and also allows an object to be persisted to a binary form of XML, which is more compact.
The XmlSerializer is only getting limited bug fixes these days, in part because so much code depends on the precise details of how it works, in part because it is associated with ASMX web services, which Microsoft considers to be a "legacy technology".
This is not the case with the DataContractSerializer, which continues to be a vibrant and important part of WCF.
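A hedged sketch of the binary-XML option (the Person type is hypothetical; XmlDictionaryWriter.CreateBinaryWriter produces WCF's binary XML encoding):

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract]
public class Person
{
    [DataMember] public string Name { get; set; }
}

public static class BinaryXmlDemo
{
    public static void Save(Person person)
    {
        var serializer = new DataContractSerializer(typeof(Person));
        using (var stream = File.Create("person.bin"))
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream))
        {
            // Same XML infoset as text XML, just encoded in the compact binary form.
            serializer.WriteObject(writer, person);
        }
    }
}
```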
You've answered a little bit of the question in your post. It's great for persisting the state of an object. I've used it in applications for saving user settings. It's also a great way to send data to other systems, since it is standardized. An important thing to remember is that it is easily human readable. This can either be a good or bad thing depending on your situation. You might want to consider encrypting it, or using encrypted binary serialization if you don't want someone else to be able to understand it.
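For example, a minimal settings round-trip with XmlSerializer might look like this (UserSettings is a hypothetical type; note the public properties, which matters for the gotcha in the EDIT below):

```csharp
using System.IO;
using System.Xml.Serialization;

public class UserSettings
{
    public string Theme { get; set; }
    public int FontSize { get; set; }
}

public static class SettingsStore
{
    public static void Save(UserSettings settings, string path)
    {
        var serializer = new XmlSerializer(typeof(UserSettings));
        using (var writer = new StreamWriter(path))
            serializer.Serialize(writer, settings);   // human-readable XML on disk
    }

    public static UserSettings Load(string path)
    {
        var serializer = new XmlSerializer(typeof(UserSettings));
        using (var reader = new StreamReader(path))
            return (UserSettings)serializer.Deserialize(reader);
    }
}
```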
EDIT:
Another gotcha worth mentioning is that the .NET XmlSerializer only serializes the public members of an object. If you need to persist private or protected members, you will need to use either a customized serializer or another form of serialization.
It's good for communication between disparate systems, e.g., taking a Java app and a C# app and letting them communicate via a web service with serializable XML objects. Both apps understand XML and are shielded from the details of the other language. And yes, while you could fire strings back and forth, XML gives us strong typing and schema validation.
This is just from personal experience: XML serialization is good for web services.
Also, if you want to modify (or allow the modification of) the object/file that you're storing without using the application that you're writing (i.e., from a third-party app), XML can be a good choice.
I send an array of objects of a class I wrote using HttpWebRequest, so I can't send it as an object, because I'm mixing HttpWebRequest with SOAP (which I'm writing myself), and in SOAP you can't send objects that aren't predefined types such as String, int, and so on.
So I used XML serialization to convert my object to an XML string and sent it through my HttpWebRequest.
I have built two programs in C# and am sending simple strings through sockets. This is fine for the moment, but in the near future I will need to send more complicated items down the sockets, such as objects, and eventually files.
What steps would I take to do this? What purpose do the buffers serve for the sockets/streams? Apologies if I am a little vague.
If you are sending objects, you really have to be careful with what you do and how you plan to use those objects on the other end. All properties need to be serialized. If you are going to have large amounts of data in these objects, you may want to use binary serialization instead.
Also, look at the guidelines posted here: MSDN Serialization Guidelines
If you are going to be sending objects, you may want to look at either .NET Remoting or WCF services, if applicable. Rolling your own socket handlers and then using them for complex operations is asking for a lot of time and pain, especially if you haven't done it before.
There are many options, but basically you want to serialise the data into a format that will go through the socket.
It's worth looking here into XML serialisation.
One way you can handle this is to serialize your object into XML, send over the socket, then deserialize it. I've done it this way before. However, I (being fairly new to .NET) just learned about the JavaScriptSerializer, which I believe makes this process a lot easier for you.
You need to serialize the objects: mark them with the [Serializable] attribute and use one of the serializers. An example can be found here.
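A minimal sketch, assuming a hypothetical ChatMessage type and an existing TcpClient connection; the formatter can read and write the socket's stream directly:

```csharp
using System;
using System.Net.Sockets;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class ChatMessage
{
    public string Sender;
    public string Text;
}

public static class Wire
{
    // Serialize straight onto the socket's NetworkStream.
    public static void Send(TcpClient client, ChatMessage message)
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(client.GetStream(), message);
    }

    // Deserialize blocks until a complete object has been read.
    public static ChatMessage Receive(TcpClient client)
    {
        var formatter = new BinaryFormatter();
        return (ChatMessage)formatter.Deserialize(client.GetStream());
    }
}
```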
The first thing to consider in any comms situation is that anything you send must be able to get serialised and deserialised so that it can travel over a comms channel. Next, you must consider that comms have latency (it's not instantaneous), and then the fact that they can fail.
After this you consider the protocols and technology to enable the above to be factored in.
I have two separate apps: a client (in C#) and a server (in C++). They need to exchange data in the form of "structs", and about 1 MB of data a minute is sent from server to client.
What's better to use: XML or my own binary format?
With XML:
Translating XML to a struct using a parser would be slow, I believe? (It's the "good" option, but: load parser, load XML, parse.)
The other option is parsing XML with regex (bad!)
With Binary:
compact data sizes
no need for meta information like tags;
but structs cannot be changed easily to accommodate new structs, or new members in existing structs, in the future;
no conversion from text (XML) to binary (struct) is necessary, so it is faster to receive and "assemble" into a struct
Any pointers? Should I not be considering binary at all?? A bit confused about what approach to take.
1MB of data per minute is pretty tiny if you've got a reasonable network connection.
There are other choices between binary and XML - other human-readable text serialization formats, such as JSON.
When it comes to binary, you don't have to have versioning problems - technologies like Protocol Buffers (I'm biased: I work for Google and I've ported PB to C#) are explicitly designed with backward and forward compatibility in mind. There are other binary formats to consider as well, such as Thrift.
If you're worried about performance though, you should really measure it. I'm pretty sure my phone could parse 1MB of XML sufficiently quickly for it not to be a problem in this case... basically work out what you're most concerned about, in terms of:
Simplicity of code
Interoperability
Performance in terms of CPU
Network traffic
Backward/forward compatibility
Human readability of on-the-wire format
It's all a balancing act - but you're the one who has to decide how much weight to give each of those factors.
If you have .NET applications in both ends, use Windows Communication Foundation. This will allow you to defer the decision until deployment time, as it supports both binary and XML serialization.
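To make that concrete, a sketch with a hypothetical contract; only the binding and address change between text XML and binary, while the contract code stays the same:

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;

// Hypothetical service contract shared by client and server.
[ServiceContract]
public interface IDataService
{
    [OperationContract]
    string GetView(string name);
}

public static class ClientFactory
{
    public static IDataService Connect(bool binary)
    {
        // BasicHttpBinding puts text XML on the wire;
        // NetTcpBinding uses WCF's binary XML encoding.
        Binding binding = binary ? (Binding)new NetTcpBinding() : new BasicHttpBinding();
        string address = binary ? "net.tcp://server:9000/data" : "http://server:8000/data";
        var factory = new ChannelFactory<IDataService>(binding, new EndpointAddress(address));
        return factory.CreateChannel();
    }
}
```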
As you stated, XML is a (little) slower but much more flexible and reliable. I would go with XML until there is a proven performance problem.
You should also take a look at ProtoBuf (Protocol Buffers) as an alternative.
And, after your update, any cross-language, cross-platform and cross-version requirement strongly points away from binary formatting.
A good point for XML would be interoperability. Do you have other clients that also access your server?
Before you use your own binary format or run regexes over XML... have you considered the serialization namespaces in .NET? There are binary formatters, SOAP formatters, and there is also XML serialization.
Another advantage of XML is that you can extend the data you are sending by adding an element; you won't have to alter the receiver's code to cope with the extra data until you are ready to.
Also, even minimal (fast) compression of XML can dramatically reduce the wire load.
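For instance, a quick GZip pass over the XML text (a sketch; System.IO.Compression.GZipStream is in the standard library, and XML's repetitive tags compress very well):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class XmlCompression
{
    public static byte[] Compress(string xml)
    {
        var raw = Encoding.UTF8.GetBytes(xml);
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream to flush the final compressed block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            return output.ToArray();   // valid even after the stream is closed
        }
    }
}
```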
text/xml
Human readable
Easier to debug
Bandwidth can be saved by compressing
Tags document the data they contain
binary
Compact
Easy to parse (if fixed size fields are used, just overlay a struct)
Difficult to debug (hex editors are a pain)
Needs a separate document to understand what the data is.
Both forms are extensible and can be upgraded to newer versions provided you insert a type and version field at the beginning of the datagram.
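A sketch of such a header, assuming a hypothetical framing scheme (note that BinaryWriter always writes little-endian, which matters for the endianness point raised below):

```csharp
using System.IO;

public static class Datagram
{
    // A small fixed header in front of every payload lets both sides
    // dispatch on type and tolerate newer versions gracefully.
    public static void Write(Stream wire, ushort type, ushort version, byte[] payload)
    {
        var writer = new BinaryWriter(wire);
        writer.Write(type);            // 2 bytes: what kind of message
        writer.Write(version);         // 2 bytes: schema version
        writer.Write(payload.Length);  // 4 bytes: payload size
        writer.Write(payload);
        writer.Flush();
    }
}
```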
You did not say whether they are on the same machine or not; I assume not.
In that case there is another downside to binary: you cannot simply dump the structs on the wire, as you could have endianness and sizeof issues.
XML is very wordy; YAML or JSON are much smaller.
Don't forget that what most people think of as XML is XML serialized as text. It can be serialized to binary instead. This is what the netTcpBinding and other such bindings do in WCF. The XML infoset is output as binary, not as text. It's still XML, just in binary.
You could also use Google Protocol Buffers, which is a compact binary representation for structured data.
Which serialization should I use?
I need to store a large Dictionary with 100,000+ elements, and I just need to save and load this data directly, without caring whether it's binary or whether it's formatted or not.
Right now I am using the BinarySerializer, but I'm not sure whether it's the most efficient option.
Please suggest better alternatives in the .NET standard libraries or an external library, preferably free.
EDIT: This is to serialize to disk and from it. The app is single threaded too.
Well, it will depend on what's in the dictionary; but if Protocol Buffers is flexible enough for you (you have to define your own types to serialize; it doesn't handle all .NET types or anything like that), it's pretty darned fast.
For example, in Protocol Buffers I'd represent the dictionary as a message with a repeated key/value pair field. For ultimate speed, you could use CodedOutputStream and CodedInputStream to serialize/deserialize the dictionary directly, rather than reading it all into memory separately first. Again, it'll depend on what the key/value types are, though.
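In C#, the third-party protobuf-net library maps a Dictionary to essentially that repeated key/value representation, so a hedged sketch of the save/load could look like this (the key/value types here are illustrative; richer value types would need [ProtoContract] definitions):

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf; // third-party protobuf-net library

public static class DictionaryStore
{
    public static void Save(Dictionary<string, int> data, string path)
    {
        using (var file = File.Create(path))
            Serializer.Serialize(file, data);   // written as repeated key/value pairs
    }

    public static Dictionary<string, int> Load(string path)
    {
        using (var file = File.OpenRead(path))
            return Serializer.Deserialize<Dictionary<string, int>>(file);
    }
}
```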
This is entirely a guess, since I haven't profiled it (which is what you should do to truly get your answer).
But my guess is that the binary serializer would give you the best performance, both in size and speed.
This is a bit of an open-ended question. Are you storing this in memory or writing it to disk? Does this execute in a multi-threaded (and perhaps multi-concurrent-access) environment? Context is important.
BinarySerializer is generally going to be pretty fast, and there are external libraries that provide better compression, such as Protocol Buffers. I've personally had good success with DataContractSerializer.
The great thing about all these options is that you can try all of them (relatively pain free) to learn for yourself what works in your environment and operation.