Saving List<T> to avoid adding items on every start - C#

I use a List<string> to which I add a few hundred thousand random strings that are then used within my program.
I need this list near the beginning of the program, and the random strings it uses are saved in a text file. At the moment, the first thing the program does after loading is add all the items to the list. However, since the list is exactly the same every single time, I wonder if there is a way to save it somehow internally so that the list can be used directly and does not need to be rebuilt on every single startup.

The list needs to be persisted somewhere, otherwise when the application shuts down you will lose all its values. When the application shuts down, the memory that was used to store this list is returned to the operating system. So no, there is no way to have the list in memory when the application starts without reading it from somewhere - whether that is a file, a database or some other storage, you need to load it from there or regenerate it from scratch.
If you do not care about the file format in which the list is stored, you could use a BinaryFormatter for faster serialization and deserialization compared to XML, JSON and other formats.
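As an illustration, here is a minimal sketch of saving and loading the list with BinaryFormatter; the file path and the helper class/method names are placeholders, not anything from the original question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hedged sketch: persist a List<string> to a binary file and load it back on startup.
// "strings.bin" and the ListPersistence class are placeholder names.
static class ListPersistence
{
    public static void Save(List<string> items, string path)
    {
        using (var stream = File.Create(path))
            new BinaryFormatter().Serialize(stream, items);
    }

    public static List<string> Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return (List<string>)new BinaryFormatter().Deserialize(stream);
    }
}
```

On startup the program would then call ListPersistence.Load("strings.bin") instead of re-reading and re-adding every line from the text file.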

Related

Application performance degradation due to several large in memory Dictionary objects in C#

I am working on a WinForms application where I have to load data from Web API calls. A few million rows of data will be returned and have to be stored in a Dictionary. The logic goes like this: the user clicks on an item and its data is loaded. If the user clicks on another item, another new dictionary is created. Over the course of time, several such heavyweight Dictionary objects are created. The user might not use the old Dictionary objects after some time. Is this a case for using WeakReference? Note that recreating any Dictionary object takes 10 to 20 seconds. If I opt to keep all the objects in memory, the application performance degrades slowly over time.
The answer here is to use a more advanced technique.
Use a memory-mapped file to store the dictionaries on disk; then you don't have to worry about holding them all in memory at once, as they will be swapped in and out by the OS on demand.
You will want to write a Dictionary designed specifically to operate in the memory-mapped file region, and a heap to store the things pointed to by the key-value pairs in the dictionary. Since you aren't deleting anything, this is actually pretty straightforward.
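As a rough illustration of the memory-mapped file idea (not a full dictionary implementation), the sketch below opens a file-backed mapping and reads/writes fixed-size values at computed offsets; the file name, capacity and offsets are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MappedStoreDemo
{
    static void Main()
    {
        // "lookup.dat" and the 1 GB capacity are placeholder choices.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            "lookup.dat", FileMode.OpenOrCreate, null, 1L << 30))
        using (var accessor = mmf.CreateViewAccessor())
        {
            // Fixed-size records written at computed offsets are the building block
            // for a simple on-disk hash table plus heap, as described above.
            accessor.Write(0, 42L);              // store a long at offset 0
            long value = accessor.ReadInt64(0);  // read it back
            Console.WriteLine(value);
        }
    }
}
```

The OS pages the mapped region in and out as needed, so only the parts of the dictionary actually being touched occupy physical memory.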
Otherwise you should take Fildor 4's suggestion and Just Use A Database, as it will basically do everything I just mentioned for you and wrap it up in a nice syntax.

Efficient random enumeration of files from a huge directory

I want to be able to enumerate files with a specific search pattern (e.g., *.txt) recursively from a directory, but with a couple of constraints:
The mechanism should be very efficient. The goal is to enumerate files one by one (using IEnumerable), so that if there is a huge list of files it shouldn't take forever to get one file for processing.
The enumeration should return files randomly, so that if two instances of my program are trying to enumerate the directory, they should not see the files in the same sequence.
Given the requirements, DirectoryInfo.EnumerateFiles looks promising, except that it does not fulfill the second requirement. If I remove the performance consideration, the solution is straightforward (just get the entire collection and randomize the sequence before accessing).
Can someone suggest possible choices for a C# implementation in .NET 3.5/4.0?
What you are asking for is impossible.
A truly "random" enumeration (in the sense that the order likely changes each time) requires a "pick without replacement" strategy. Such a strategy necessarily requires two pools: one of "chosen" files, and one of "unchosen." The "unchosen" list has to be populated before anything from it can be "chosen" randomly. This breaks your #1 requirement.
Two thoughts on how to solve your problem:
What is the problem with two instances seeing the files in the same order? If it's a file locking issue, choose a read-only lock.
You might be able to get away with a "holding pile" approach. Here, you would create your own enumerator class that starts by reading a small number of FileInfo records into a "Hold" collection. Then, each time your calling code requests a file, it either feeds one directly from EnumerateFiles, or swaps the incoming one with one in your "Hold" pile and returns that one instead. The decision would be random until EnumerateFiles returns nothing, at which point you would empty out your Hold pile. That won't provide a truly random selection order, but maybe it will add enough fuzziness to the order to meet your needs. The maximum size of the "Hold" collection can be adjusted to taste to balance your need for "randomness" against the need to quickly get the first file.
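A minimal sketch of such a "holding pile" enumerator might look like the following; the class name, hold size and the use of OrderBy to drain the pile at the end are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch: keep a small buffer of files and, for each file streamed from
// EnumerateFiles, randomly decide which buffered file to yield in its place.
// The order is only "fuzzed", not truly random.
static class FuzzyEnumerator
{
    public static IEnumerable<FileInfo> EnumerateFuzzy(
        DirectoryInfo dir, string pattern, int holdSize = 64)
    {
        var hold = new List<FileInfo>(holdSize);
        var rng = new Random();

        foreach (var file in dir.EnumerateFiles(pattern, SearchOption.AllDirectories))
        {
            if (hold.Count < holdSize)
            {
                hold.Add(file);          // fill the hold pile first
                continue;
            }

            int i = rng.Next(holdSize);
            var swapped = hold[i];       // pick a random buffered file...
            hold[i] = file;              // ...replace it with the incoming one...
            yield return swapped;        // ...and yield the buffered one instead
        }

        // Source exhausted: empty the hold pile in random order.
        foreach (var remaining in hold.OrderBy(_ => rng.Next()))
            yield return remaining;
    }
}
```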

How to resolve Out of memory exception in WP7 using dictionary?

I am building a WP7 application where I need to load around 20,000 hard-coded entries ({'a',"XYZ"},{'b',"mno"},...) on which I have to perform searches. I am trying to do this by creating a dictionary, making 'a' the key and "XYZ" the value. As soon as my dictionary gets filled, it throws an out-of-memory exception.
How can I solve this problem, considering that I am building a WP7 application?
Or is there some way other than using a dictionary?
Whenever you are loading this much data onto a phone, you're doing it wrong. First, the bandwidth issue is going to kill your app. Second, the memory issue has already killed your app. Third, the CPU issue is going to kill your app. The conclusion is that your user ends up killing your app.
Recommended solution: find a way to categorize the data so that not all of it must download to the phone. Do your processing on the server where it belongs (not on a phone).
If you insist on processing so much data on the phone, first try to manage the download size. Remember you're talking about a mobile phone here, and not everywhere has max 3G speeds. Try to compress the data structure as much as possible (e.g. using a tree to store common prefixes). Also try to zip up the data before downloading.
Then count your per-object memory usage aggressively. Putting in 20,000 strings can easily consume a lot of memory. You'll want to reduce per-object memory usage as much as possible. In your example, you are just putting strings in there, so I can't guess how you'd be using up the tens of MB allowed for a WP7 app. However, if you are putting in not just strings but large objects, count the bytes.
Also, manage fragmentation aggressively. The last thing you want to do is new Dictionary() and then dict.Add(x,y) in a for-loop. When the dictionary's internal table runs out of space, a new table is allocated elsewhere and the entire dictionary is copied over, wasting the original space. You end up with a lot of fragmented memory. Do a new Dictionary(20000) or something similar to reserve the space in one go up front.
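For example, a pre-sized dictionary could be created like this (a tiny sketch with placeholder entries standing in for the real hard-coded data):

```csharp
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Reserving capacity up front means the internal table is allocated once,
        // instead of being repeatedly grown and copied as items are added in a loop.
        // 20000 matches the data size mentioned in the question.
        var lookup = new Dictionary<char, string>(20000);

        // Placeholder entries; the real app would add its ~20,000 hard-coded pairs here.
        lookup.Add('a', "XYZ");
        lookup.Add('b', "mno");
    }
}
```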
Instead of storing it in memory as a Dictionary, you can store it in a database (wp7sqlite) and fetch only the data required. In this way you can store whatever amount of data.
Edit
No, nothing extra is required from the user's end. You can create the database using SQLite Manager and attach it to the project. Copy the DB to isolated storage on first use, and you can access the DB whenever you want. Check this link: DB helper. That link uses sqlitewindowsphone instead of WP7Sqlite; I prefer wp7sqlite since I got an error using sqlitewindowsphone.

Using Large Lists

In an Outlook AddIn I'm working on, I use a list to grab all the messages in the current folder, then process them, then save them. First, I create a list of all messages, then I create another list from the list of messages, and finally I create a third list of messages that need to be moved. Essentially, they are all copies of each other, and I made it this way to keep things organized. Would it increase performance if I used only one list? I thought lists were just references to the actual items.
Without seeing your code it is impossible to tell if you are creating copies of the list itself or copies of the reference to the list - the latter is preferable.
Another thing to consider is whether or not you could stream the messages from Outlook using an iterator block. By using a List<T> you are currently buffering the entire sequence of messages, which means you must hold them all in memory while processing them one at a time. Streaming the messages would reduce the memory pressure on your application, as you would only need to hold each message in memory long enough to process it.
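A minimal sketch of the iterator-block approach is below; the Message class and the NeedsMove predicate are placeholders standing in for the real Outlook item types and filtering logic, since only the streaming pattern itself is the point here.

```csharp
using System;
using System.Collections.Generic;

// Placeholder type standing in for an Outlook mail item.
class Message { public string Subject = ""; }

static class MessageStream
{
    // Placeholder filter; the real add-in would apply its own rules.
    static bool NeedsMove(Message m) =>
        m.Subject.StartsWith("RE:", StringComparison.OrdinalIgnoreCase);

    // Iterator block: each message is held in memory only while it is being processed,
    // instead of buffering the whole folder into one or more List<T> instances.
    public static IEnumerable<Message> MessagesToMove(IEnumerable<Message> source)
    {
        foreach (var m in source)
            if (NeedsMove(m))
                yield return m;
    }
}
```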
Unless your lists contain 10 million items or more, it should not be a problem.
Outlook seems to have problems with much smaller mailboxes, so I would say you are pretty safe.

Synchronizing filesystem and cached data on program startup

I have a program that needs to retrieve some data about a set of files (that is, a directory and all files within it and its subdirectories of certain types). The data is (very) expensive to calculate, so rather than traversing the filesystem and calculating it on program startup, I keep a cache of the data in a SQLite database and use a FileSystemWatcher to monitor changes to the filesystem. This works great while the program is running, but the question is how to refresh/synchronize the data during program startup. If files have been added (or changed -- I presume I can detect this via last modified/size), the data needs to be recomputed in the cache, and if files have been removed, the data needs to be removed from the cache (since the interface traverses the cache instead of the filesystem).
So the question is: what's a good algorithm to do this? One way I can think of is to traverse the filesystem and gather the path and last modified/size of all files into a dictionary. Then I go through the entire list in the database. If there is no match, I delete the item from the database/cache. If there is a match, I delete the item from the dictionary. The dictionary then contains all the items whose data needs to be refreshed. This might work, but it seems fairly memory-intensive and time-consuming to perform on every startup, so I was wondering if anyone had better ideas?
If it matters: the program is Windows-only written in C# on .NET CLR 3.5, using the SQLite for ADO.NET thing which is being accessed via the entity framework/LINQ for ADO.NET.
Our application is a cross-platform C++ desktop application, but it has very similar requirements. Here's a high-level description of what I did:
In our SQLite database there is a Files table that stores file_id, name, hash (currently we use last modified date as the hash value) and state.
Every other record refers back to a file_id. This makes it easy to remove "dirty" records when the file changes.
Our procedure for checking the filesystem and refreshing the cache is split into several distinct steps to make things easier to test and to give us more flexibility as to when the caching occurs (the names are just what I happened to pick for the class names):
On 1st Launch
The database is empty. The Walker recursively walks the filesystem and adds the entries to the Files table. The state is set to UNPARSED.
Next, the Loader iterates through the Files table looking for UNPARSED files. These are handed off to the Parser (which does the actual parsing and inserting of data).
This takes a while, so 1st launch can be a bit slow.
There's a big testability benefit because you can test the filesystem-walking code independently of the loading/parsing code. On subsequent launches the situation is a little more complicated:
n+1 Launch
The Scrubber iterates over the Files table and looks for files that have been deleted and files that have been modified. It sets the state to DIRTY if the file exists but has been modified or DELETED if the file no longer exists.
The Deleter (not the most original name) then iterates over the Files table looking for DIRTY and DELETED files. It deletes the other records related via the file_id. Once the related records are removed, the original File record is either deleted or set back to state=UNPARSED.
The Walker then walks the filesystem to pick up new files.
Finally the Loader loads all UNPARSED files
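A hedged sketch of what the Scrubber pass might look like in C# is below; it assumes the System.Data.SQLite ADO.NET provider is used directly (rather than the Entity Framework layer mentioned in the question) and that the Files table has the columns described above, with the last-modified timestamp stored as the hash. The table, column and class names are illustrative.

```csharp
using System.Collections.Generic;
using System.Data.SQLite;
using System.IO;

// Sketch of the "Scrubber" step: mark files as DIRTY or DELETED based on the filesystem.
static class Scrubber
{
    public static void MarkDirtyAndDeleted(SQLiteConnection conn)
    {
        var changes = new List<KeyValuePair<long, string>>();

        using (var select = new SQLiteCommand("SELECT file_id, name, hash FROM Files", conn))
        using (var reader = select.ExecuteReader())
        {
            while (reader.Read())
            {
                long id = reader.GetInt64(0);
                string path = reader.GetString(1);
                string storedHash = reader.GetString(2);

                if (!File.Exists(path))
                    changes.Add(new KeyValuePair<long, string>(id, "DELETED"));
                else if (File.GetLastWriteTimeUtc(path).Ticks.ToString() != storedHash)
                    changes.Add(new KeyValuePair<long, string>(id, "DIRTY"));
            }
        }

        // Apply the state changes after the reader is closed.
        foreach (var change in changes)
        {
            using (var update = new SQLiteCommand(
                "UPDATE Files SET state = @state WHERE file_id = @id", conn))
            {
                update.Parameters.AddWithValue("@state", change.Value);
                update.Parameters.AddWithValue("@id", change.Key);
                update.ExecuteNonQuery();
            }
        }
    }
}
```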
Currently the "worst case scenario" (every file changes) is very rare - so we do this every time the application starts-up. But by splitting the process up unto these steps we could easily extend the implementation to:
The Scrubber/Deleter could be refactored to leave the dirty records in place until after the new data is loaded (so the application "keeps working" while new data is cached into the database).
The Loader could load/parse on a background thread during an idle time in the main application.
If you know something about the data files ahead of time you could assign a 'weight' to the files and load/parse the really-important files immediately and queue-up the less-important files for processing at a later time.
Just some thoughts / suggestions. Hope they help!
Windows has a change journal mechanism which does what you want: you subscribe to changes in some part of the filesystem, and upon startup you can read a list of the changes that have happened since the last time you read them. See: http://msdn.microsoft.com/en-us/library/aa363798(VS.85).aspx
EDIT: I think it requires rather high privileges, unfortunately.
The first obvious thing that comes to mind is creating a separate small application that would always run (as a service, perhaps) and create a kind of "log" of changes in the file system (no need to work with SQLite, just write them to a file). Then, when the main application starts, it can look at the log and know exactly what has changed (don't forget to clear the log afterwards :-).
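A minimal sketch of such a change-logging helper, using FileSystemWatcher with placeholder paths, might look like this:

```csharp
using System;
using System.IO;

// Hypothetical sketch of the helper process idea: watch a directory tree and append
// every change to a plain log file that the main application reads at startup.
// Both paths are placeholders.
class ChangeLogger
{
    static void Main()
    {
        const string logPath = @"C:\watched-changes.log";
        var watcher = new FileSystemWatcher(@"C:\watched") { IncludeSubdirectories = true };

        FileSystemEventHandler log = (sender, e) =>
            File.AppendAllText(logPath, e.ChangeType + "\t" + e.FullPath + Environment.NewLine);

        watcher.Created += log;
        watcher.Changed += log;
        watcher.Deleted += log;
        watcher.Renamed += (sender, e) =>
            File.AppendAllText(logPath,
                "Renamed\t" + e.OldFullPath + " -> " + e.FullPath + Environment.NewLine);

        // Start raising events only after all handlers are attached.
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Logging filesystem changes; press Enter to stop.");
        Console.ReadLine();
    }
}
```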
However, if that is unacceptable to you for some reason, let us try to look at the original problem.
First of all, you have to accept that, in the worst-case scenario, when all the files have changed, you will need to traverse the whole tree. And that may (though it won't necessarily) take a long time. Once you realize that, you have to think about doing the job in the background, without blocking the application.
Second, if you have to make a decision about each file that only you know how to make, there is probably no other way than going through all files.
Putting the above in other words, you might say that the problem is inherently complex (and any given problem cannot be solved with an algorithm that is simpler than the problem itself).
Therefore, your only hope is reducing the search space by using tweaks and hacks. And I have two of those in mind.
First, it's better to query the database separately for each file instead of building a dictionary of all files first. If you create an index on the file path column in your database, this should be quicker and, of course, less memory-intensive.
Second, you don't actually have to query the database at all :-)
Just store the exact time when your application was last run somewhere (in a .settings file?) and check every file to see if it's newer than that time. If it is, you know it's changed. If it's not, you know you already caught its change last time (with your FileSystemWatcher).
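A small sketch of that shortcut, assuming the timestamp is kept as UTC ticks in a plain text file rather than a real .settings entry; the file name and class name are placeholders:

```csharp
using System;
using System.IO;

// Record when the application last ran and compare each file's last-write time against it.
static class LastRunCheck
{
    const string StampFile = "lastrun.txt"; // placeholder location

    public static DateTime ReadLastRunUtc()
    {
        // First run: no stamp file yet, so treat every file as changed.
        if (!File.Exists(StampFile))
            return DateTime.MinValue;
        return new DateTime(long.Parse(File.ReadAllText(StampFile)), DateTimeKind.Utc);
    }

    public static bool ChangedSinceLastRun(string path, DateTime lastRunUtc)
    {
        // Newer than the recorded run time means the cache entry must be refreshed.
        return File.GetLastWriteTimeUtc(path) > lastRunUtc;
    }

    public static void WriteLastRunUtc()
    {
        File.WriteAllText(StampFile, DateTime.UtcNow.Ticks.ToString());
    }
}
```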
Hope this helps. Have fun.
