I have a .json file, approximately 1.5 MB in size, containing around 1500 JSON objects that I want to convert into domain objects at the start-up of my app.
Currently the process on the phone (not on my development PC) takes around 23 seconds, which is far too slow for me and is forcing me to write the list of objects into ApplicationSettings so that I don't have to do it each time the app loads (just on first run). But even that takes around 15 seconds to write and 16 seconds to read, which is not really good enough.
I have not had a lot of serialization experience and I don't really know the fastest way to get it done.
Currently, I am using the System.Runtime.Serialization namespace with DataContract and DataMember approach.
Any ideas on performance with this type of data loading?
I found the Json.NET library to be more performant and to have better options than the standard JSON serializer.
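For what it's worth, a minimal sketch of loading the file with Json.NET; the Place domain type and the idea that the JSON has already been read into a string are assumptions, not from the original question:

using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical domain type standing in for whatever the ~1500 objects are.
public class Place
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class JsonLoader
{
    // json is the raw file contents, e.g. read from isolated storage.
    public static List<Place> LoadPlaces(string json)
    {
        return JsonConvert.DeserializeObject<List<Place>>(json);
    }
}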
One performance issue I encountered in my app was that my domain objects implemented INotifyPropertyChanged with code to support dispatching the event back to the UI thread. Since the deserialization code populated those properties I was doing a lot of thread marshalling that didn't need to be there. Cutting out the notifications during deserialization substantially increased performance.
Update: I was using Caliburn Micro which has a property on PropertyChangedBase that can turn off property changed notifications. I then added the following:
[OnDeserializing]
public void OnDeserializing(StreamingContext context)
{
    IsNotifying = false;
}

[OnDeserialized]
public void OnDeserialized(StreamingContext context)
{
    IsNotifying = true;
}
Try profiling your app with the free EQATEC Profiler for WP7. The real issue could be something completely unexpected and easy to fix, like the INotifyPropertyChanged example Nigel mentions.
You can quickly shoot yourself in the foot using the application settings. The issue is that these are always serialized/deserialized "in bulk" and loaded in memory, so unless your objects are extremely small this can cause memory and performance issues down the road.
I am still wondering about the need for 1500 objects. Do you really need all 1500 complete objects, and if so, why? Ultimately the phone is showing something to the user, and no user can process 1500 pieces of information at once; they can only process what is actually presented. So are there parts of the object that you can show now, waiting to load the other parts until later? For example, if I have 2000 contacts I will never load 2000 full contacts. I might load 2000 names, let the user filter/sort the names, and then load the contact when they select a name.
I would suggest serializing this to isolated storage as a file. The built-in JSON serializer (DataContractJsonSerializer) has the smallest footprint on disk and performs quite well.
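For illustration, a minimal sketch of writing and reading such a file in isolated storage with DataContractJsonSerializer, reusing the hypothetical Place type from the earlier sketch; the file name is made up:

using System.Collections.Generic;
using System.IO;
using System.IO.IsolatedStorage;
using System.Runtime.Serialization.Json;

public static class PlaceCache
{
    private const string FileName = "places.json";   // hypothetical file name

    public static void Save(List<Place> places)
    {
        using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
        using (Stream stream = store.CreateFile(FileName))
        {
            new DataContractJsonSerializer(typeof(List<Place>)).WriteObject(stream, places);
        }
    }

    public static List<Place> Load()
    {
        using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
        using (Stream stream = store.OpenFile(FileName, FileMode.Open, FileAccess.Read))
        {
            return (List<Place>)new DataContractJsonSerializer(typeof(List<Place>)).ReadObject(stream);
        }
    }
}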
Here is a post about serialization. Use binary or Json.Net.
Storing/restoring into ApplicationSettings is going to involve serialization as well (pretty sure it's XML), so I don't think you are ever going to get much faster than the 16 seconds you are seeing.
Moving that amount of data around is just not going to be fast, no matter how good the deserializer. My recommendation would be to look at why you are storing that many objects. If you can't reduce the set of objects you need to store, look at breaking them up into logical groups so that you can load on demand rather than up front.
Have you tried using multiple smaller files and [de]serializing in parallel to see if that will be faster?
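As a sketch only: splitting the data into several files and deserializing them concurrently might look like the following. This assumes the Task Parallel Library is available (it is not in the original WP7 Silverlight runtime, where you would queue the work on the ThreadPool instead), and it reuses the hypothetical Place type from above:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class ParallelLoader
{
    // Deserialize several smaller files concurrently and merge the results.
    public static List<Place> LoadAll(IEnumerable<string> paths)
    {
        ConcurrentBag<Place> results = new ConcurrentBag<Place>();
        Parallel.ForEach(paths, path =>
        {
            List<Place> items = JsonConvert.DeserializeObject<List<Place>>(File.ReadAllText(path));
            foreach (Place item in items)
                results.Add(item);
        });
        return new List<Place>(results);
    }
}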
I'm trying to cache a large object (around 25MB) that needs to be available for the user for 15 minutes.
In the beginning, I was using MemoryCache (single server) but now that we are going the HA route, we need it to be available to all the servers.
We tried to replace it with Redis, but it takes around 2 minutes (on localhost) between serializing and deserializing the object and the round trip (Newtonsoft.Json serialization).
So, the question is: how do you share large objects with a short lifespan between servers in an HA setup?
Thanks for reading :)
I've had good luck switching from JSON to Protobuf serialization/deserialization, using the protobuf-net package. But it sounds like even if that cut it down by the oft-repeated 6x factor, a 20-second deserialization time probably still won't cut it in this case, since the whole goal is to cache it for a particular user for a "short" period of time.
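A minimal sketch of what the protobuf-net switch looks like; the CachedReport type and its members are hypothetical stand-ins for the real 25 MB object:

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class CachedReport   // hypothetical payload type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public List<double> Values { get; set; }
}

public static class ProtoCache
{
    public static byte[] Serialize(CachedReport report)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            Serializer.Serialize(ms, report);   // protobuf-net static serializer
            return ms.ToArray();
        }
    }

    public static CachedReport Deserialize(byte[] payload)
    {
        using (MemoryStream ms = new MemoryStream(payload))
        {
            return Serializer.Deserialize<CachedReport>(ms);
        }
    }
}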
This sounds like a classic case of eager vs. lazy loading. Since you're already using Redis, have you considered separately caching each property of the object as a separate key? The more numerous the properties, and therefore the smaller each individual one is, the more beneficial this strategy will be. Of course, I'm assuming a fairly orthogonal set of properties on the object - if many of them have dependencies on each other, then this will likely perform worse. But, if the access patterns tend to not require the entire hydrated object, you may improve responsiveness a lot by fetching the demanded individual property instead of the entire object.
I'm assuming a lot about your object, but the simplest step would be to implement each property's get accessor to perform the Redis get call. This has a lot of other downsides regarding dependency management and multi-threaded access, but it might be a simple way to achieve a proof of concept.
Keep in mind that this dramatically complicates the cache invalidation requirements. Even if you can store each property individually in Redis, if you then store that value in a variable on each machine after fetching it, you quickly run into an unmanaged cache situation where you cannot guarantee synchronized data depending on which machine serves the next request.
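A minimal sketch of the per-property idea using StackExchange.Redis; the ReportCache wrapper, the key layout, and the "summary" property are all hypothetical:

using System;
using StackExchange.Redis;

public class ReportCache
{
    private readonly IDatabase _db;

    public ReportCache(IDatabase db)
    {
        _db = db;
    }

    // Each piece is cached under its own key with the 15-minute lifetime,
    // so callers only pay the round trip for the piece they actually need.
    public string GetSummary(string userId)
    {
        return _db.StringGet("report:" + userId + ":summary");
    }

    public void SetSummary(string userId, string summary)
    {
        _db.StringSet("report:" + userId + ":summary", summary, TimeSpan.FromMinutes(15));
    }
}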
I have started memory profiling our app because we have recently received several reports about performance and out-of-memory exceptions. The app is a C# WinForms application (.NET Framework 2.0).
When the application starts, the ANTS profiler shows 17.7 MB of objects live in Gen 2.
At start-up, the app reads the 77,000+ zip codes from an XML-serialized file on disk and stores them in a Hashtable. Please see the sample code below.
public class ZipCodeItem
{
    private string zipCode;
    private string city;
    private string state;
    private string county;
    private int tdhCode;
    private string fipsCounty;
    private string fipsCity;

    public ZipCodeItem()
    {
        // Constructor.. nothing interesting here
    }

    // Bunch of public getter/setter properties
}
Here is the static class that reads the serialized zip data from a file on disk and loads the zipcodes.
internal sealed class ZipTypes
{
    private static readonly Hashtable zipCodes = new Hashtable();

    public static ArrayList LookupZipCodes(string zipCode)
    {
        if (zipCodes.Count == 0)
            LoadZipCodes();

        ArrayList arZips = new ArrayList();

        // Search for the given zip code and return the matched ZipCodeItem collection
        if (zipCodes.ContainsKey(zipCode))
        {
            // Populate the array with the matched items
        }
        // Omitted the details to keep it simple
        return arZips;
    }

    private static bool LoadZipCodes()
    {
        using (FileStream stream = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            // unzip it.. Omitted the details to keep it simple
            // Read the zip codes from the flat XML file on disk and load the local zipCodes Hashtable
        }
    }
}
This class and the corresponding zip codes are accessed all over the app.
About 14 MB of the 17.7 MB of Gen 2 objects are either ZipCodeItems or their child string objects.
I would like to change my code to somehow NOT keep these 77,000+ ZipCodeItem objects in memory (in a Hashtable), but still return the matching ZipCodeItems when the app needs them.
Any suggestions on how to resolve this issue? Thanks in advance.
I'm going to avoid answering the question directly in hopes of providing a more useful answer because I don't believe that the ~14MB associated with that hash is actually causing a problem.
You say that you are using the ANTS memory profiler. That's a great tool and I have used it before, so perhaps I can help you track down the real problem here.
We can safely assume that your hash is not causing the OutOfMemoryException as it is nowhere near big enough to do so. Keep it as it is now, except for two small changes:
Use a strongly typed Dictionary<K,V> instead of Hashtable. Hashtable was essentially deprecated once .NET 2.0 introduced generics. You can also go ahead and replace that ArrayList with a List<T>.
Instead of performing a ContainsKey check and then looking up the value in the hash, use TryGetValue. This cuts the number of hash table lookups in half. That may not be a performance bottleneck in your app, but I don't think it amounts to premature optimization either.
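A minimal sketch of the lookup class with those two changes applied, keeping the original shape of the methods:

using System.Collections.Generic;

internal sealed class ZipTypes
{
    private static readonly Dictionary<string, List<ZipCodeItem>> zipCodes =
        new Dictionary<string, List<ZipCodeItem>>();

    public static List<ZipCodeItem> LookupZipCodes(string zipCode)
    {
        if (zipCodes.Count == 0)
            LoadZipCodes();

        // One hash lookup instead of ContainsKey followed by the indexer.
        List<ZipCodeItem> matches;
        if (zipCodes.TryGetValue(zipCode, out matches))
            return new List<ZipCodeItem>(matches);

        return new List<ZipCodeItem>();
    }

    private static void LoadZipCodes()
    {
        // Same unzip-and-read-XML code as before, omitted to keep it simple.
    }
}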
Now, onto the crux of the issue...
You have your profiler results. Go back and look at where your memory is being allocated. In order, check these things:
Is .NET holding most of the memory, or is it native code (possibly a lot of objects that implement IDisposable being created without Dispose() being called on them in a timely manner)? If it's the latter, you probably know where to look.
How does the Large Object Heap (LOH) look? Is most of the memory allocated there? Many large allocations can fragment the LOH and it may not be compacted for quite some time. ANTS will tell you this at the top right of the results overview page.
Event handlers. When an object subscribes to an event, a reference to the subscriber's method is stored by the event's publisher (in the MulticastDelegate behind the event). This can keep object lifetimes from ever ending and, over time, that can add up memory-wise. You need to make sure that if objects are being created and then going out of scope, they also unsubscribe from any events they previously subscribed to. Static events can be a killer here (see the sketch at the end of this answer).
Use ANTS to track object lifetimes. Similar to the above, make sure that there are no objects being kept alive inadvertently due to stale references. This can occur more easily than you may think. Again, look in areas where a relatively large number of objects are created and go out of scope, but also instances where other objects maintain references to them. ANTS can show you this in the object graph.
That should at least give you a good picture of what memory is being allocated where. You will likely need to run your program for some time under the profiler and simply watch memory usage. If it steadily goes up then you can perform the steps I listed above in an attempt to isolate which objects are piling up.
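To illustrate the event-handler point above, a minimal, hypothetical example of a static event keeping subscribers alive until they unsubscribe:

using System;

public static class PriceTicker
{
    public static event EventHandler PriceChanged;   // hypothetical static event

    public static void RaisePriceChanged()
    {
        EventHandler handler = PriceChanged;
        if (handler != null)
            handler(null, EventArgs.Empty);
    }
}

public class PriceView : IDisposable
{
    public PriceView()
    {
        PriceTicker.PriceChanged += OnPriceChanged;
    }

    private void OnPriceChanged(object sender, EventArgs e)
    {
        // Update the UI here.
    }

    public void Dispose()
    {
        // Without this line the static event keeps every PriceView alive forever.
        PriceTicker.PriceChanged -= OnPriceChanged;
    }
}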
You are going to need some sort of storage along with an easy access mechanism.
I would suggest using some form of SQL based database.
We've had good success using SQL Server Compact Edition, as it can be deployed without prerequisites (aka a "private deployment"). The integration story for querying is also really tight - for example, using LINQ to SQL is dead easy.
You could also look at SQLite or other providers. I would steer you away from a SQL Server Express dependency, though, just because it will be too heavy for these needs.
Also, since you would now be caching the zip codes in a database, you might consider some sort of "sync process" (say, daily) for downloading and parsing the input XML file you mentioned above (assuming the XML file is retrieved from some web service), or else you could just deploy the database already populated with the data.
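A rough sketch of what the lookup could look like against SQL Compact, using plain ADO.NET rather than LINQ to SQL; the ZipCodes.sdf file, the table layout, and the ZipCodeItem property names are assumptions:

using System.Collections.Generic;
using System.Data.SqlServerCe;

public static class ZipCodeRepository
{
    // Hypothetical pre-populated SQL CE database deployed with the app.
    private const string ConnectionString = @"Data Source=|DataDirectory|\ZipCodes.sdf";

    public static List<ZipCodeItem> Lookup(string zipCode)
    {
        List<ZipCodeItem> results = new List<ZipCodeItem>();
        using (SqlCeConnection connection = new SqlCeConnection(ConnectionString))
        using (SqlCeCommand command = new SqlCeCommand(
            "SELECT ZipCode, City, State FROM ZipCodes WHERE ZipCode = @zip", connection))
        {
            command.Parameters.Add(new SqlCeParameter("@zip", zipCode));
            connection.Open();
            using (SqlCeDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    ZipCodeItem item = new ZipCodeItem();
                    item.ZipCode = reader.GetString(0);   // assumes public properties on ZipCodeItem
                    item.City = reader.GetString(1);
                    item.State = reader.GetString(2);
                    results.Add(item);
                }
            }
        }
        return results;
    }
}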
If your users are complaining about out of memory exceptions and you're addressing something that's taking 15MB of memory, you're looking in the wrong place.
The rest of my answer assumes that the 15MB really does matter.
Having said that, I would like to offer an alternative to the SQL solutions already proposed (which are good solutions, depending on your situation, if you determine that you really don't want to load the 15MB into memory).
We load 3GB of IP data into the process space of our web servers. However, most of the IP addresses are not accessed most of the time (especially since we have a strong geographic bias in the user base, but still need to have the data available for visitors from less frequently seen parts of the world).
To keep the memory footprint small and the access very fast, we made use of memory-mapped files. While support for them is built into .NET 4, you can still use them via interop calls in any previous .NET version.
Memory mapped files allow you to very rapidly map data from a file into memory in your process space. SQL adds overhead for things you probably don't need in this situation (transactional support, key constraints, table relationships, etc... all great features in general but unnecessary to solve this problem, and cost something in terms of performance and memory footprint).
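A minimal sketch using the .NET 4 System.IO.MemoryMappedFiles API; the fixed-size record layout is a hypothetical simplification:

using System;
using System.IO.MemoryMappedFiles;

public sealed class IpRangeFile : IDisposable
{
    private readonly MemoryMappedFile _mmf;
    private readonly MemoryMappedViewAccessor _accessor;

    public IpRangeFile(string path)
    {
        // Map the whole file once; pages are only pulled into physical
        // memory when they are actually touched.
        _mmf = MemoryMappedFile.CreateFromFile(path);
        _accessor = _mmf.CreateViewAccessor();
    }

    // Assumes fixed-size records laid out back to back in the file.
    public byte[] ReadRecord(long index, int recordSize)
    {
        byte[] buffer = new byte[recordSize];
        _accessor.ReadArray(index * recordSize, buffer, 0, recordSize);
        return buffer;
    }

    public void Dispose()
    {
        _accessor.Dispose();
        _mmf.Dispose();
    }
}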
When the application starts, the ANTS profiler shows 17.7 MB of objects live in Gen 2.
We had a similar issue, and we found it is OK to keep these details in memory rather than loading them several times. BUT they should not be loaded at start-up. Change the logic to load them on demand at first use, because there are use cases where they are not used at all, and there is no point keeping that chunk in Gen 2 in those cases (see the sketch at the end of this answer).
I am sure there may be more severe memory leak issues than this one if you have memory issues in your application. The ANTS profiler is good for such cases :-)
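If it helps, a minimal sketch of deferring the load to first use (a plain lock rather than Lazy<T>, since the app targets .NET 2.0; the dictionary shape matches the earlier suggestion):

using System.Collections.Generic;

internal sealed class ZipCodeStore
{
    private static Dictionary<string, List<ZipCodeItem>> zipCodes;
    private static readonly object sync = new object();

    // Nothing is loaded at start-up; the first caller pays the load cost.
    public static Dictionary<string, List<ZipCodeItem>> Codes
    {
        get
        {
            lock (sync)
            {
                if (zipCodes == null)
                    zipCodes = LoadZipCodes();
                return zipCodes;
            }
        }
    }

    private static Dictionary<string, List<ZipCodeItem>> LoadZipCodes()
    {
        // Read and deserialize the zip code file here, as in the original code.
        return new Dictionary<string, List<ZipCodeItem>>();
    }
}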
I'm working on an iPhone app in which I manage some kind of "agendas". The app needs to be authenticated against a server to work (= no offline mode), so I don't need any kind of persistence on the iPhone.
Is it bad practice if, once I have retrieved the data for the agendas from the server (around 10-50 lists of ~30 days each), I store it in custom C# objects containing lists of other objects (basically months containing days) instead of setting up a database or using XML? I need to be able to edit and search through them fairly easily (I'm using LINQ with success in my proof of concept at the moment).
Thanks in advance,
regards,
C.Hamel
No, it is not bad practice. You don't need to persist this data, so there is no need for a local DB or XML. You need to load the data into memory anyway to search it...
As long as there is no requirement to persist data it is fine to store it in memory.
By the way, it is pretty straightforward and quick to serialize data on the iPhone, if you prefer (in case of low memory).
See Serialization vs. Archiving? - my answer there has details about archiving and serializing. Hope that helps if you want to persist (serialize) data.
I am not too sure how the iPhone allocates memory to its processes; this may also vary from model to model. You might face some painful trouble when the device won't allow you to allocate a sufficient amount of memory for all your data.
Even though keeping data in memory is super fast, I suggest persisting part of it. You can get away with a limited cache of the most recently accessed data and persist everything else. It is more reliable and safe, though maybe a little bit slower, so you need to find a balance.
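A minimal sketch of such a bounded most-recently-used cache in C# (hypothetical and not thread-safe; the eviction branch is where you would persist the evicted entry):

using System.Collections.Generic;

public class MruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public MruCache(int capacity)
    {
        _capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);
            _order.AddFirst(node);   // mark as most recently used
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_map.TryGetValue(key, out existing))
        {
            _order.Remove(existing);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least recently used entry; persist it to disk here if needed.
            _map.Remove(_order.Last.Value.Key);
            _order.RemoveLast();
        }
        LinkedListNode<KeyValuePair<TKey, TValue>> node =
            new LinkedListNode<KeyValuePair<TKey, TValue>>(new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
    }
}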
I work on a big project in company. We collect data which we get via API methods of the CMS.
ex.
DataSet users = CMS.UserHelper.GetLoggedUser(); // returns dataset with users
Now on some pages we need a lot of different data, not just users, but also nodes of the CMS tree or specific data of a subtree.
So we thought of writing our own "helper class" through which we can later get different data easily.
ex:
MyHelperClass.GetUsers();
MyHelperClass.Objects.GetSingleObject( ID );
Now the problem is that our "helper class" is really big, and we want to collect different data through the "helper class" and write it into a typed DataSet. Later we can bind a repeater to that typed DataSet, which contains data from different tables (which in turn comes from the API methods I mentioned before).
The problem is: it is so slow now, even when just loading the page! Does it load or initialize the whole class?
By the way, the CMS is Kentico, if anyone works with it.
I'm tired. I've tried the whole night... but it's so slow. Please take a look at the architecture.
Maybe you will find some crimes which are not allowed :S
I hope we get it work faster. Thank you.
[Class diagram: http://img705.imageshack.us/img705/3087/classj.jpg]
Bottlenecks usually come in a few forms:
Slow or flakey network.
Heavy reading/writing to disk, as disk IO is 1000s of times slower than reading or writing to memory.
CPU throttle caused by long-running or inefficiently implemented algorithm.
Lots of things could affect this, including your database queries and indexes, the number of people accessing your site, lack of memory on your web server, lots of reflection in your code, just plain slow hardware etc. No one here can tell you why your site is slow, you need to profile it.
For what it's worth, you asked a question about your API architecture - from a code point of view, it looks fine. There's nothing wrong with copying fields from one class to another, and the performance penalty incurred by wrapper class casting from object to Guid or bool is likely to be so tiny that it's negligible.
Since you asked about performance, it's not very clear why you're connecting class architecture to performance. There are really, really tiny micro-optimizations you could apply to your classes which may or may not affect performance - but the four or five nanoseconds you'll gain with those micro-optimizations have already been lost simply by reading this answer. Network latency and DB queries will absolutely dwarf the performance subtleties of your API.
In a comment, you stated "so there is no problem with static classes or a basic mistake of me". Performance-wise, no. From a web-app point of view, probably. In particular, static fields are global and initialized once per AppDomain, not per session - the variables mCurrentCultureCode and mcurrentSiteName sound session-specific, not global to the AppDomain. I'd double-check those to make sure your site renders correctly when users with different culture settings access it at the same time.
Are you already using Caching and Session state?
The basic idea being to defer as much of the data loading to these storage mediums as possible and not do it on individual page loads. Caching especially can be useful if you only need to get the data once and want to share it between users and over time.
If you are already doing these things, or can't directly implement them, try deferring as much of this data gathering as possible, opting to short-circuit it rather than doing the loading up front. If the data is only occasionally used, this can also save you a lot of time in page loads.
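For example, a minimal sketch of wrapping one of the CMS calls in the ASP.NET cache so the data is fetched once and shared between users; the cache key and the ten-minute window are hypothetical:

using System;
using System.Data;
using System.Web;
using System.Web.Caching;

public static class CmsDataCache
{
    public static DataSet GetUsers()
    {
        DataSet users = HttpRuntime.Cache["AllUsers"] as DataSet;
        if (users == null)
        {
            users = CMS.UserHelper.GetLoggedUser();   // the CMS call from the question
            HttpRuntime.Cache.Insert("AllUsers", users, null,
                DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
        }
        return users;
    }
}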
I suggest you try to profile your application and see where the bottlenecks are:
Slow load from the DB?
Slow network traffic?
Slow rendering?
Too much traffic for the client?
Profiling should be part of almost every senior programmer's general toolbox. Learn it, and you'll have the answers yourself.
Cheers!
First things first... Enable tracing for your application, try to optimize response size and caching, and work with some application and DB profilers... I'm afraid that just by looking at the code, no one will be able to help you much.
Is there any good approach to serializing and deserializing large files (>10 MB) in C#?
Thanks in advance.
There isn't any difference between de/serializing small or large files. You just have to make sure that you don't deserialize very large files into memory all at once; that's going to buy you an OOM.
And large files are going to take more time of course. If that makes your user interface unresponsive then you'll want to do this processing in a background thread. BackgroundWorker is a typical solution for that.
Random shots in the dark here btw, your question is far too vague.
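If the UI thread is the concern, a minimal BackgroundWorker sketch; the MyData type, the use of BinaryFormatter, and the callback shape are assumptions for illustration:

using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyData   // hypothetical payload type
{
    public int[] Values;
}

public static class LargeFileLoader
{
    // Deserializes off the UI thread; RunWorkerCompleted fires back on the
    // thread that called RunWorkerAsync (the UI thread in a WinForms/WPF app).
    public static void LoadAsync(string path, Action<MyData> onLoaded)
    {
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (s, e) =>
        {
            using (FileStream stream = File.OpenRead(path))
            {
                e.Result = new BinaryFormatter().Deserialize(stream);
            }
        };
        worker.RunWorkerCompleted += (s, e) => onLoaded((MyData)e.Result);
        worker.RunWorkerAsync();
    }
}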
If you really have large files (let's say larger than 100 MB), the best thing is to load only the things you need at the moment.
Let's say you have a list of 10,000 customers, each with an image. It makes no sense to keep this whole list in memory.
For example, you could load all last names and each person's position in the file. Then the user can search for a person and you can load exactly that person.
Another possibility would be loading the first ten and displaying them to the user. As soon as he clicks a "Next" button you load the next ten - just plan how to organize the information.
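A small sketch of that paging idea; the CustomerSummary record and the page size are made up:

using System.Collections.Generic;
using System.Linq;

// Hypothetical lightweight record holding only what the list view needs.
public class CustomerSummary
{
    public string LastName { get; set; }
    public long FileOffset { get; set; }   // where the full record lives in the big file
}

public class CustomerPager
{
    private const int PageSize = 10;
    private readonly List<CustomerSummary> _index;

    public CustomerPager(List<CustomerSummary> index)
    {
        _index = index;
    }

    // Returns only the requested page; the full customer record is loaded
    // separately (e.g. via FileOffset) when one is selected.
    public IEnumerable<CustomerSummary> GetPage(int pageNumber)
    {
        return _index.Skip(pageNumber * PageSize).Take(PageSize);
    }
}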
Instead of very large files, databases can bring some advantages. They can abstract away the large amount of work required to navigate within the file.
"Single-line serialization" using BinaryFormatter etc., however, reaches its limits at files of that size, in my opinion. You have to think of other concepts.
You can check out my answer here to this question (there are all kinds of other relevant answers there too).
My method uses BinaryReader and BinaryWriter for performance.
I have used this method to deserialize 50 MB files in a recent project, and it does it quite quickly (under 5 seconds) compared to the built-in serialization or XML serialization (10 minutes for my data set).
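The general shape of that approach is something like the following sketch; the Record type and its two fields are hypothetical, and the point is simply that you control the byte layout instead of paying for reflection-based serialization:

using System.Collections.Generic;
using System.IO;

public class Record   // hypothetical flat record
{
    public int Id;
    public string Name;
}

public static class RecordFile
{
    public static void Write(string path, List<Record> records)
    {
        using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(records.Count);   // record count first
            foreach (Record r in records)
            {
                writer.Write(r.Id);
                writer.Write(r.Name);
            }
        }
    }

    public static List<Record> Read(string path)
    {
        using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            List<Record> records = new List<Record>(count);
            for (int i = 0; i < count; i++)
            {
                Record r = new Record();
                r.Id = reader.ReadInt32();
                r.Name = reader.ReadString();
                records.Add(r);
            }
            return records;
        }
    }
}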
Are you sure serialization/deserialization is the right approach for that much data? Would a client-side database like SQLite perhaps be a better solution, where you can query for exactly the data you need instead of loading everything into memory?