Where can I store a very big data static resource - c#

I'm working on a WP8 app which needs a big database (of words, to be precise). It has around 300,000 values, currently stored in a text file. I don't really want to open and parse this file on every operation, since that would add a lot of time to the process. That's why, on the desktop version, I have a module declaring a public array containing all the values.
But on Windows Phone, when I launch the app... it just crashes. The only reason I can see is the array being too big, but where can I store all these strings then? I don't think a List or a Dictionary would be better... Would you have any idea?
Thank you in advance

As @MajkeloDev and @Ron Beyer said in the comments:
A database (like SQLite) would greatly reduce memory usage. It doesn't need to be parsed or loaded entirely into memory; you can query it for what you need (I find it hard to believe you need 300k words in memory when an SQL SELECT could work).
Another way to go, if you don't want a database, is lazy loading your data with a sort of pagination system that handles the disposal of unused items.
But that's too messy when you can just call the database.
It all depends on how you need to use those words inside your application.
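
To make the database suggestion concrete, here is a minimal sketch of a word lookup backed by SQLite. It assumes the Microsoft.Data.Sqlite NuGet package (a Windows Phone 8 project would likely need a different SQLite wrapper, which is not covered here); the `WordStore` class and the `Words` table are hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Data.Sqlite;

// Hypothetical helper: keeps the word list in SQLite instead of an in-memory array.
public sealed class WordStore : IDisposable
{
    private readonly SqliteConnection _conn;

    public WordStore(string connectionString)
    {
        _conn = new SqliteConnection(connectionString);
        _conn.Open();
        using var cmd = _conn.CreateCommand();
        // The PRIMARY KEY creates an index, so a lookup does not scan every row.
        cmd.CommandText = "CREATE TABLE IF NOT EXISTS Words (Word TEXT PRIMARY KEY)";
        cmd.ExecuteNonQuery();
    }

    // One-time import of the text file's contents, done inside a single transaction.
    public void Import(IEnumerable<string> words)
    {
        using var tx = _conn.BeginTransaction();
        using var cmd = _conn.CreateCommand();
        cmd.Transaction = tx;
        cmd.CommandText = "INSERT OR IGNORE INTO Words (Word) VALUES ($w)";
        var parameter = cmd.Parameters.Add("$w", SqliteType.Text);
        foreach (var word in words)
        {
            parameter.Value = word;
            cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }

    // Asks the database whether the word exists; nothing else is loaded into memory.
    public bool Contains(string word)
    {
        using var cmd = _conn.CreateCommand();
        cmd.CommandText = "SELECT EXISTS(SELECT 1 FROM Words WHERE Word = $w)";
        cmd.Parameters.AddWithValue("$w", word);
        return Convert.ToInt64(cmd.ExecuteScalar()) == 1;
    }

    public void Dispose() => _conn.Dispose();
}
```

Usage would look like `using var store = new WordStore("Data Source=words.db");` followed by `store.Contains("hello")`. The import runs once (for example with `File.ReadLines` over the existing text file), and after that the app only opens the database file.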

Related

C# combined files direct accessable

I am creating a very simple database in C# which I use to store playlists and an overview of all my music. I want to make this C compatible in the future, so I plan to make it completely text based. The idea is that every text file is a table, and the contents are in JSON format, where every line of text is a record.
I don't want to have loose files for each database, so I was thinking about something like a zip file. I don't want to extract and compress every time I access a file. Is there some way I can use a stream reader/writer in C# on different files, where Windows only sees one file?
I'm not completely convinced that this is the way to go. So I'm open to suggestions.
Update:
I'm currently messing around with the "Local database" item in C#. I never paid any attention to it. It could very well be the solution.
Update 2:
SQLite seems to be very simple. I have some experience with MySQL from past PHP projects, so that will give me a head start.
You want to use a file as a container holding different files? If so, there are a lot of ways to accomplish this. These are techniques I have used in the past:
Zip:
A compressed file format such as Zip is known to behave that way and can serve as a solution here. It is capable of storing virtual files. They can be at least 1 gigabyte in size (tested, but I currently don't know whether there are implementation-specific size limits).
SQLite:
SQLite sounds old school, but it stores all database-related data in one physical file. Creating a database with a table for each virtual file should do the trick. This approach is useful if you know that your virtual files won't be very large and won't reach any limit of the SQLite field data types. As your virtual files are going to consist of lines of text, you may be able to map them to attributes and tuples. This way you can even use SQL statements to query and filter your data as you wish.
There are still more ways to implement that kind of container format on your own, but you would probably need to invest more time and work than the result is worth. Stay tuned for better ideas and maybe ready-to-use implementations :-)
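
For the Zip option, a minimal sketch using the framework's own `System.IO.Compression` classes might look like the following. The `ZipTables` class and the entry name used in the usage note are hypothetical. As far as I know, `ZipArchiveMode.Update` holds the archive contents in memory and rewrites the file when the archive is disposed, so this is convenient rather than fast.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: treats each entry of one zip file as a "table" of text lines.
public static class ZipTables
{
    // Appends one record (one line of JSON text) to the entry named tableName,
    // creating the archive and the entry when they do not exist yet.
    public static void AppendRecord(string zipPath, string tableName, string jsonLine)
    {
        using var archive = ZipFile.Open(zipPath, ZipArchiveMode.Update);
        var entry = archive.GetEntry(tableName) ?? archive.CreateEntry(tableName);
        using var stream = entry.Open();
        stream.Seek(0, SeekOrigin.End);          // entry streams are seekable in Update mode
        using var writer = new StreamWriter(stream);
        writer.WriteLine(jsonLine);
    }

    // Reads all records of one table directly from the archive, without extracting to disk.
    public static List<string> ReadRecords(string zipPath, string tableName)
    {
        var records = new List<string>();
        using var archive = ZipFile.OpenRead(zipPath);
        var entry = archive.GetEntry(tableName);
        if (entry == null) return records;
        using var reader = new StreamReader(entry.Open());
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
            records.Add(line);
        return records;
    }
}
```

A call such as `ZipTables.AppendRecord("music.zip", "playlists.txt", "{\"name\":\"Road trip\"}")` keeps everything in one file on disk. On the .NET Framework, `ZipFile` additionally needs a reference to the System.IO.Compression.FileSystem assembly.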
Will you ever need to search your data? Then use a real database manager; in C# the built-in local database file is the simplest choice (if you are familiar with SQL).
The zip file is a good choice for saving space and for compactness (a single file instead of many files), but it is very slow: for each database operation the whole zip file will be reorganized. Even a tar file (without compression) needs continuous reallocation when the content changes, and a zip file needs extra computation and relocation on top of that.
If you want something that is compressed and still standard, you can use a zipped XML spreadsheet format (ods or xlsx, it does not matter) to store your data, but the save operation will be slow, and even slower as your database grows.

Best structure other than Datatable to store ~half-a-million records in Frontend

I have a DataTable which fetches ~550,000 records from a database (SQLite). The fetch slows down the system.
I am storing these records in a SQLite database on the backend and in a DataTable on the frontend.
What should I do to reduce the database creation time (~10.5 hours) on the backend and the fetching time on the frontend?
Is there any other structure that can be used for this? I have read that a Dictionary and a binary file are fast. Can they be used for this purpose, and how?
(It's not a web app. It's a WPF desktop app where the frontend and backend are on the same machine.)
Your basic problem, I believe, is not the structure that you want to maintain but the way you manage your data flow. Instead of fetching all the data from the database into some structure (a DataTable in your case), write a stored procedure that performs the calculations you need on the server side and returns data that is already calculated. This way you gain several benefits, like:
machines hosting database servers are usually faster than a development or client machine
huge reduction of data transmission, as you return only the result of the calculation
EDIT
Considering the edited post, I would say that DataTable is already highly optimized for in-memory access, and I don't think changing it to something else will bring you a notable benefit. What I think can bring a benefit is a revision of the program flow. In other words, try to answer the following questions:
do I need all those records at the same time?
can I run some calculations in a service, let's say, at night?
can I use SQL Server Express (just an example) and gain the ability to run a stored procedure, which may (this has to be measured) run the work faster, even on the same machine?
Have you thought about doing the calculation at the location of the data, rather than fetching it? Can you give us some more information about what it's stored in, how you are fetching it and how you are processing it, please?
It's hard to decide on an optimisation without the metrics and the background information.
It's easy to say: yes, put the data in a file. But is the file local, is the network the problem, are you making the best use of cores/threading, etc.?
Much better to start back from the data, see what needs to be done to it, and then engineer the best optimisation.
Edit:
OK, so you are on the same machine? One thing you should really consider in this scenario is what you are doing to the data. Does it need to be SQL? Are you just using it to load a DataTable? Or is there some complexity you are not disclosing?
I've had a similar task. I just created a large text file and used memory mapping to read it efficiently without any overhead. Is this the kind of thing you're talking about?
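
A rough sketch of the memory-mapping idea with the framework's `System.IO.MemoryMappedFiles` classes is shown below. The `MappedText` helper is a hypothetical name, and the sketch assumes a non-empty UTF-8 file with one record per line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical helper: streams the lines of a large text file through a read-only memory map.
public static class MappedText
{
    public static IEnumerable<string> ReadLines(string path)
    {
        // The view is limited to the real file length, because a default view is
        // rounded up to the page size and would expose trailing zero bytes.
        long length = new FileInfo(path).Length;

        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read);
        using var reader = new StreamReader(view, Encoding.UTF8);

        // Lines are produced lazily, so the caller decides how much ends up in memory.
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
            yield return line;
    }
}
```

Whether this beats a plain `File.ReadLines` for a given workload would have to be measured; the mapping pays off mostly when the file is read repeatedly or accessed at arbitrary offsets.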
You could try using a persisted dictionary like RaptorDB for storing and fetching/manipulating the data in an ArrayList. The proof of concept should not take long to build.

Best way to store small amount of data

I'm new to Windows apps and I would like to know the best way to save a small amount of data, like 1 value a day.
I'm going for a text file because it's easy, but I know I could use MS Access.
Do you have another option? Something faster or better?
Since you are already considering using a MS Access database, I would recommend using SQLite. Here's a quote from their site (SQLite Home Page):
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
It is really very easy to use - no installations required, you simply need to reference a DLL.
If you (as a person) need to read it, then use a plain text file.
If you need to read the values back into the application, then serialize to an XML or binary file. Make your user data serializable, possibly by having a List of values in your object.
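
As an illustration of the XML route, here is a small sketch built on `XmlSerializer` from the base class library. The `DailyValue` type and the `DailyValueFile` helper are hypothetical names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical record type: one value per day.
public class DailyValue
{
    public DateTime Day { get; set; }
    public double Value { get; set; }
}

public static class DailyValueFile
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<DailyValue>));

    // Overwrites the file with the complete list of values.
    public static void Save(string path, List<DailyValue> values)
    {
        using var stream = File.Create(path);
        Serializer.Serialize(stream, values);
    }

    // Returns an empty list when nothing has been saved yet.
    public static List<DailyValue> Load(string path)
    {
        if (!File.Exists(path)) return new List<DailyValue>();
        using var stream = File.OpenRead(path);
        return (List<DailyValue>)Serializer.Deserialize(stream);
    }
}
```

Each day the application would call `Load`, add one `DailyValue`, and call `Save` again. With one value a day the file stays tiny, so rewriting it completely is not a concern.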
How do you want to use the data? Do you just want to look at it once in awhile? Do you plan to analyze it in a spreadsheet? Etc. Based upon what you say so far, I would just use a text file, one value per line. Even if later you wanted to do more with it, it's easy to import into spreadsheets, etc. If the daily data is a little more complicated (maybe a couple of different values for things each day), you might consider something like YAML.
Why stray from the path? XML gives you the ability to expand on it later without having to rethink everything.
It mainly depends on the complexity of the data that you want to store. If it's just a DateTime or some other simple built-in type, you would be able to recreate that object as a strongly typed one easily. But if it's more complicated, I would suggest creating a serializable class and then using either Binary or SOAP serialization, depending on size, security and other such needs. I am suggesting this because it is better to be able to recreate objects as strongly typed ones from a flat file than to try to parse whatever is in the flat file.
Please let me know in case you need more clarity.
Thanks,
Sai Pavan

Storing and loading program data, C# WPF

I'm writing a utility program with C# in WPF that allows users to create role-playing scenarios, including monsters, items, characters, etc.
The user will create or import the elements (monsters, etc) and then use the imported elements to create scenarios. Everything used by the program is created within the program, so I don't have any pre-defined data I'll be accessing.
Here's my question - what's the best way to store and load the data?
Currently, I'm using XML serialization to serialize the objects to XML files and reload them later. This is kind of clunky, and I'm wondering if a database would be more effective - the data is definitely relational (monsters have items, maps have monsters, etc), and there could be dozens or hundreds of entries.
I don't need actual lines of code or methods to use, just an idea of what kind of file storage/retrieval would usually be used in this situation (in .NET).
Thanks!
As you said yourself: The data is relational so a relational database will probably help. Using Sql Server Compact you can have simple files, which are named whatever you want, that you load into Sql Server when opening. That way you won't have to administer a traditional database server and the user won't even know there is a database involved.
To access the data I'm personally very fond of Linq-to-Sql, which gives type-safe querying directly in C#.
A database is the way to go, definitely. Use an object-relational mapper to talk to the database; this will probably cover 99% of your needs at the beginning.
I prefer to keep XML serialization for scenarios that require communication between different processes.
It really depends on what you need to achieve. Databases have a place, but flat files are also perfectly fine for data (via serialization).
So; what problems is the xml giving you? If you can answer that, then you'll know what the pain points are that you want to address. You mention "game", and indeed flat files tend to be more suitable (assuming you want minimum overhead etc), but either would normally do fine. Binary serialization might be more efficient in terms of CPU and disk (but I don't recommend BinaryFormatter - it will bite you when you change the types).
I'm not anti-database (far from it) - I just wanted to present a balanced viewpoint ;-p
You could use an object database (such as db4o). The benefits include: type safety, no ORM, indexed information...

Best (free) way to store data? How about updates to the file system?

I have an idea for how to solve this problem, but I wanted to know if there's something easier and more extensible to my problem.
The program I'm working on has two basic forms of data: images, and the information associated with those images. The information associated with the images has been previously stored in a JET database of extreme simplicity (four tables) which turned out to be both slow and incomplete in the stored fields. We're moving to a new implementation of data storage. Given the simplicity of the data structures involved, I was thinking that a database was overkill.
Each image will have information of its own (capture parameters), will be part of a group of images which are interrelated (taken in the same thirty minute period, say), and then part of a larger group altogether (taken of the same person). Right now, I'm storing people in a dictionary with a unique identifier. Each person then has a List of the different groups of pictures, and each picture group has a List of pictures. All of these classes are serializable, and I'm just serializing and deserializing the dictionary. Fairly straightforward stuff. Images are stored separately, so that the dictionary doesn't become astronomical in size.
The problem is: what happens when I need to add new information fields? Is there an easy way to setup these data structures to account for potential future revisions? In the past, the way I'd handle this in C was to create a serializable struct with lots of empty bytes (at least a k) for future extensibility, with one of the bytes in the struct indicating the version. Then, when the program read the struct, it would know which deserialization to use based on a massive switch statement (and old versions could read new data, because extraneous data would just go into fields which are ignored).
Does such a scheme exist in C#? Like, if I have a class that's a group of String and Int objects, and then I add another String object to the struct, how can I deserialize an object from disk, and then add the string to it? Do I need to resign myself to having multiple versions of the data classes, and a factory which takes a deserialization stream and handles deserialization based on some version information stored in a base class? Or is a class like Dictionary ideal for storing this kind of information, as it will deserialize all the fields on disk automatically, and if there are new fields added in, I can just catch exceptions and substitute in blank Strings and Ints for those values?
If I go with the dictionary approach, is there a speed hit associated with file read/writes as well as parameter retrieval times? I figure that if there's just fields in a class, then field retrieval is instant, but in a dictionary, there's some small overhead associated with that class.
Thanks!
SQLite is what you want. It's a fast, embeddable, single-file database that has bindings for most languages.
With regard to extensibility, you can store your models with default attributes, and then have a separate table for attribute extensions for future changes.
A year or two down the road, if the code is still in use, you'll be happy that 1) other developers won't have to learn a customized code structure to maintain the code, 2) you can export, view and modify the data with standard database tools (there's an ODBC driver for SQLite files and various query tools), and 3) you'll be able to scale up to a database with minimal code changes.
Just a wee word of warning: SQLite, Protocol Buffers, mmap et al. are all very good, but you should prototype and test each implementation and make sure that you're not going to hit the same perf issues or different bottlenecks.
The simplest option may be just to upsize to SQL Server (Express) (you may be surprised at the perf gain) and fix whatever's missing from the present database design. Then, if perf is still an issue, start investigating these other technologies.
My brain is fried at the moment, so I'm not sure I can advise for or against a database, but if you're looking for version-agnostic serialization, you'd be a fool to not at least check into Protocol Buffers.
Here's a quick list of implementations I know about for C#/.NET:
protobuf-net
Proto#
jskeet's dotnet-protobufs
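
The version tolerance being recommended can be illustrated without any third-party library. The sketch below does not use Protocol Buffers; it uses `System.Text.Json` from the base class library (available from .NET Core 3.0), whose default behaviour is to ignore unknown properties and to leave missing ones at their default values. The `PictureInfoV1` and `PictureInfoV2` classes are hypothetical stand-ins for two versions of the same data class.

```csharp
using System.Text.Json;

// Version 1 of a hypothetical record, as originally written to disk.
public class PictureInfoV1
{
    public string FileName { get; set; } = "";
    public int ExposureMs { get; set; }
}

// Version 2 adds a field; its initializer supplies the value when old data lacks it.
public class PictureInfoV2
{
    public string FileName { get; set; } = "";
    public int ExposureMs { get; set; }
    public string Operator { get; set; } = "unknown";
}

public static class VersionDemo
{
    // New code reading old data: the missing property keeps its default.
    public static PictureInfoV2 ReadOldDataWithNewClass()
    {
        string oldJson = JsonSerializer.Serialize(
            new PictureInfoV1 { FileName = "a.png", ExposureMs = 40 });
        return JsonSerializer.Deserialize<PictureInfoV2>(oldJson) ?? new PictureInfoV2();
    }

    // Old code reading new data: the unknown property is skipped by default.
    public static PictureInfoV1 ReadNewDataWithOldClass()
    {
        string newJson = JsonSerializer.Serialize(
            new PictureInfoV2 { FileName = "b.png", ExposureMs = 25, Operator = "sam" });
        return JsonSerializer.Deserialize<PictureInfoV1>(newJson) ?? new PictureInfoV1();
    }
}
```

Protocol Buffers gives the same two guarantees through numbered fields and a compact binary encoding, which matters more as the data grows; the JSON version is only meant to show that no version switch statement or padding bytes are needed.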
There's a database schema, for which I can't remember the name, that can handle this sort of situation. You basically have two tables. One table stores the variable name, and the other stores the variable value. If you want to group the variables, then add a third table that will have a one to many relationship with the variable name table. This setup has the advantage of letting you keep adding different variables without having to keep changing your database schema. Saved my bacon quite a few times when dealing with departments that change their mind frequently (like Marketing).
The only drawback is that the variable value table will need to store the actual value as a string column (varchar or nvarchar actually). Then you have to deal with the hassle of converting the values back to their native representations. I currently maintain something like this. The variable table currently has around 800 million rows. It's still fairly fast, as I can still retrieve certain variations of values in under one second.
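
The schema described above is commonly called entity-attribute-value (EAV). A minimal sketch of it on SQLite could look like this, assuming the Microsoft.Data.Sqlite NuGet package; the `VariableStore` class, the table names and the `ownerId` column (the thing a value belongs to) are hypothetical choices for the example.

```csharp
using System;
using Microsoft.Data.Sqlite;

// Hypothetical helper: one table of variable names, one table of string values.
public sealed class VariableStore : IDisposable
{
    private readonly SqliteConnection _conn;

    public VariableStore(string connectionString)
    {
        _conn = new SqliteConnection(connectionString);
        _conn.Open();
        Exec(@"CREATE TABLE IF NOT EXISTS Variable (
                   VariableId INTEGER PRIMARY KEY,
                   Name       TEXT NOT NULL UNIQUE);
               CREATE TABLE IF NOT EXISTS VariableValue (
                   OwnerId    INTEGER NOT NULL,
                   VariableId INTEGER NOT NULL REFERENCES Variable(VariableId),
                   Value      TEXT NOT NULL,
                   PRIMARY KEY (OwnerId, VariableId));");
    }

    // Adding a brand-new variable needs no schema change, only a new row.
    public void Set(long ownerId, string name, string value)
    {
        Exec("INSERT OR IGNORE INTO Variable (Name) VALUES ($n)", ("$n", name));
        Exec(@"INSERT OR REPLACE INTO VariableValue (OwnerId, VariableId, Value)
               SELECT $o, VariableId, $v FROM Variable WHERE Name = $n",
             ("$o", ownerId), ("$n", name), ("$v", value));
    }

    // Values come back as strings (null when unset); converting them is the caller's job.
    public string Get(long ownerId, string name)
    {
        using var cmd = _conn.CreateCommand();
        cmd.CommandText = @"SELECT vv.Value FROM VariableValue vv
                            JOIN Variable v ON v.VariableId = vv.VariableId
                            WHERE vv.OwnerId = $o AND v.Name = $n";
        cmd.Parameters.AddWithValue("$o", ownerId);
        cmd.Parameters.AddWithValue("$n", name);
        return cmd.ExecuteScalar() as string;
    }

    private void Exec(string sql, params (string Name, object Value)[] args)
    {
        using var cmd = _conn.CreateCommand();
        cmd.CommandText = sql;
        foreach (var (name, value) in args) cmd.Parameters.AddWithValue(name, value);
        cmd.ExecuteNonQuery();
    }

    public void Dispose() => _conn.Dispose();
}
```

The string-typed `Value` column is exactly the drawback mentioned above: a call like `store.Get(1, "ExposureMs")` returns `"40"`, and the application has to parse it back into an int.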
I'm no C# programmer but I like the mmap() call and saw there is a project doing such a thing for C#.
See Mmap
Structured files perform very well if tailored to a specific application, but they are difficult to manage and hardly reusable as code. A better solution is a virtual-memory-like implementation.
Up to 4 gigabytes of information can be managed.
Space can be optimized to the real data size.
All the data can be viewed as a single array and accessed with read/write operations.
No need to design a structure for storage; just use the data and store it.
It can be cached.
It is highly reusable.
So go with SQLite for the following reasons:
1. You don't need to read/write the entire database from disk every time
2. It is much easier to add to, even if you didn't leave enough placeholders at the beginning
3. It is easier to search on anything you want
4. It is easier to change the data in ways the application was not designed for
Problems with the Dictionary approach:
1. Unless you build a smart dictionary, you need to read/write the entire database every time (and unless you carefully design the data structure, it will be very hard to maintain backwards compatibility)
   a) If you did not leave enough placeholders, you are out of luck
2. It appears that you'd have to do a linear search through all the photos in order to search on one of the capture attributes
3. Can a picture be in more than one group? Can a picture be under more than one person? Can two people be in the same group? With dictionaries these things can get hairy...
With a database table, if you get a new attribute you can just say ALTER TABLE Picture ADD Attribute DataType. Then, as long as you don't make a rule saying the attribute has to have a value, you can still load and save older versions. At the same time, the newer versions can use the new attributes.
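
Here is a small sketch of that schema change on SQLite, again assuming the Microsoft.Data.Sqlite NuGet package. The column names (`ExposureMs`, `Aperture`) and the file paths are invented for the example; only the `Picture` table name comes from the suggestion above.

```csharp
using System;
using Microsoft.Data.Sqlite;

public static class SchemaEvolutionDemo
{
    // Returns true when a row written before the schema change reads back
    // with NULL in the column that was added later.
    public static bool OldRowHasNullInNewColumn()
    {
        using var conn = new SqliteConnection("Data Source=:memory:");
        conn.Open();

        // Original schema and a row saved by the "old" version of the application.
        Exec(conn, @"CREATE TABLE Picture (
                         PictureID  INTEGER PRIMARY KEY,
                         FilePath   TEXT NOT NULL,
                         ExposureMs INTEGER)");
        Exec(conn, "INSERT INTO Picture (FilePath, ExposureMs) VALUES ('img/0001.png', 40)");

        // The new attribute is nullable, so existing rows stay valid.
        Exec(conn, "ALTER TABLE Picture ADD COLUMN Aperture REAL");
        Exec(conn, @"INSERT INTO Picture (FilePath, ExposureMs, Aperture)
                     VALUES ('img/0002.png', 25, 2.8)");

        using var cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT Aperture FROM Picture WHERE PictureID = 1";
        return cmd.ExecuteScalar() is DBNull;
    }

    private static void Exec(SqliteConnection conn, string sql)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText = sql;
        cmd.ExecuteNonQuery();
    }
}
```

An older build of the application that only selects `FilePath` and `ExposureMs` keeps working against the altered table, which is the backwards compatibility the placeholder-bytes scheme was trying to achieve.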
Also you don't need to save the picture in the database. You could just store the path to the picture in the database. Then when the app needs the picture, just load it from a disk file. This keeps the database size smaller. Also the extra seek time to get the disk file will most likely be insignificant compared to the time to load the image.
Probably your table should be
Picture(PictureID, GroupID?, File Path, Capture Parameter 1, Capture Parameter 2, etc..)
If you want more flexibility you could make a table
CaptureParameter(PictureID, ParameterName, ParameterValue) ... I would advise against this because it is a lot less efficient than just putting them in one table (not to mention the queries to retrieve/search the Capture Parameters would be more complicated).
Person(PersonID, Any Person Attributes like Name/Etc.)
Group(GroupID, Group Name, PersonID?)
PersonGroup?(PersonID, GroupID)
PictureGroup?(GroupID, PictureID)
