XML via database - c#

I have a list of 12 strings. The list is constant and won't change.
A column in one of my database tables holds an index into that list.
I don't think it's wise to keep a separate table just for these strings, since they never change, nor to store the string itself in this column.
So the only option seems to be holding the list somewhere else.
What about holding the strings in an XML file and using LINQ to XML to load them into a dictionary?
If so, is this better, performance-wise, than using a DataTable?
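For reference, the XML-plus-dictionary idea from the question could look roughly like the sketch below. The element and attribute names (strings, string, id) are hypothetical, and a real program would call XDocument.Load(path) rather than parsing an inline string. With only 12 entries the in-memory lookup itself is trivially cheap, so the decision is mostly about maintainability, which is what the answers below focus on.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XML layout: <strings><string id="1">Pending</string>...</strings>
public static class StringTable
{
    // Parses the XML once and builds an index -> text lookup.
    public static Dictionary<int, string> Load(string xml)
    {
        return XDocument.Parse(xml)
            .Root
            .Elements("string")
            .ToDictionary(e => (int)e.Attribute("id"), e => e.Value);
    }
}
```

The dictionary would be loaded once (for example at startup) and then used to translate the index stored in the database column into its string.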

Those strings will most likely get cached by your SQL server and cause almost no performance hit. Keeping them in the database also gives you flexibility in case multiple applications share the same database. Overall, keep them in the database unless you have, or expect, millions of database hits.

I agree with Zepplock: keep the strings in the database. You won't have to worry about performance. Another big reason is that if you store them in the database, in their proper context, it will be easier for future developers to find the strings and understand their role in the application.

It sounds as if you're describing a table holding product catalog data. I suggest keeping those values in their own rows, not stored as an XML datatype or as XML in a varchar column.
It sounds as if the data is static today and rarely, if ever, changes. By storing it as XML, you lose the potential future advantage of the relational nature of the database.
I suggest keeping them in a table. As you say, it's only 12 strings/products, and the performance hit will be effectively zero.

Related

String search in Access Database

I am new to this site. I have a question about data structures. Here's the project:
I have an MS Access database with approx. 50 tables. Each table has an index field (sequential AutoNumber) and 10-12 Memo fields. These fields can contain small or large amounts of text. In all, the DB contains between 20,000 and 40,000 individual strings (Memo field entries).
I am looking for a way to search for a string in all of these tables (using C#/ASP.NET). I do not have a lot of exposure to Access, C#, or ASP.NET, but I am thinking that there may be a data structure that is more suitable than the others for this. I am conscious that reading that amount of data into any data structure would be a memory hog, which is why I am asking. So the question relates specifically to which data structures (arrays, linked lists, etc.) might be the most appropriate. I will try to figure the rest out later.
Thanks.
First, you need to connect to your database to get the data you need. This page can help: http://msdn.microsoft.com/en-us/library/bb655884(v=vs.90).aspx
Then you'll be able to select the data you need, such as the strings, from your database with a SqlDataSource. For more information on this, see: http://msdn.microsoft.com/en-us/library/w1kdt8w2(v=vs.90).aspx
Finally, put the data into a data structure such as a List or an ArrayList. Don't put it in a set: a set cannot contain duplicates, so if the same string appears more than once, you'll end up with missing data.
Since this is important to know, I strongly recommend looking up "An Extensive Examination of Data Structures Using C# 2.0" on MSDN. It will give you a better understanding of data structures, so next time you'll know what you need.
Hope it helps you!
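The warning about sets can be seen directly: a List keeps repeated strings while a HashSet silently collapses them. The sketch below also adds a plain linear, case-insensitive search over the loaded strings. The sample strings are hypothetical stand-ins for Memo field values; no Access connection code is shown.

```csharp
using System;
using System.Collections.Generic;

public static class MemoSearch
{
    // A List<string> keeps every entry, duplicates included.
    public static List<string> AsList(IEnumerable<string> memos) => new List<string>(memos);

    // A HashSet<string> discards repeated values, so duplicate memos are lost.
    public static HashSet<string> AsSet(IEnumerable<string> memos) => new HashSet<string>(memos);

    // Linear, case-insensitive substring search; returns the positions of matching entries.
    public static List<int> FindIndexes(IReadOnlyList<string> memos, string term)
    {
        var hits = new List<int>();
        for (int i = 0; i < memos.Count; i++)
        {
            if (memos[i].IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(i);
        }
        return hits;
    }
}
```

For example, loading "red apple", "green pear", "red apple" gives a list of three entries but a set of only two, and searching the list for "APPLE" finds positions 0 and 2.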

.NET data storage - something between built-in collections and external SQL database?

I will preface the question by saying that I am somewhat new to the .NET world, and might be missing something entirely obvious. If so, I'd love to hear what that is!
I often find myself writing small programs that all do more or less the same thing:
1. Read in data from one or more files
2. Store this data in memory, in some sort of container
3. Crunch the data, output analysis results to a text file, and quit
I often find myself creating monstrous-looking containers to store said data. E.g.:
```csharp
Dictionary<DateTime, SortedDictionary<ItemType, List<int>>> allItemTypesAndPropertiesByDate =
    new Dictionary<DateTime, SortedDictionary<ItemType, List<int>>>();
```
This works, in the sense that the data structure describes my intent more or less accurately - I want to be able to access item types and properties by date. Nevertheless, I feel that the storage container is too tightly bound to the output data format (if tomorrow I decide that I'd like to find all dates on which items with certain properties were seen, this data structure becomes a liability). Generally, making input and output changes down the line is time-consuming and error-prone. Plus, I have to keep staring at these ugly-looking declarations - and code to iterate over them is not pretty either.
On the other end of the complexity spectrum, I can create a SQL database with a schema that describes input in a more flexible format, and then run queries (using SQL or LINQ to SQL) against the database. This certainly works, but feels like too big of a hammer: I write many programs like these, and don't want to create a database for each one, manage the SQL dependency (even if it is SQL Express on the local machine), etc. I don't need to actually persist the data, just to read it in, keep it in memory, make a few queries, and quit. Even using an in-memory SQLite instance feels like overkill. I am not overly concerned with runtime performance (these are usually just little local-machine experiments), but it just feels wrong.
Ideally, what I would like is a low-overhead, in-memory row store with a loosely defined schema that is easily LINQ-queryable and takes only a few lines of code to set up and use. Does the Microsoft .NET 4 stack include something like this? If you found yourself in a similar predicament, what would you do?
Your thoughts are appreciated - thanks!
Alex
If you find a database structure easier to work with, one option might be to create a DataSet with DataTables representing your schema, which you can then query using LINQ to DataSet.
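A minimal sketch of the DataTable route, assuming a hypothetical three-column layout (Date, ItemType, Property). LINQ to DataSet exposes the rows through AsEnumerable() and reads typed values with Field<T>(), so the "which dates had this property" query from the question needs no new container.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataSetDemo
{
    // Builds a small in-memory table; the columns and rows are illustrative only.
    public static DataTable BuildItems()
    {
        var table = new DataTable("Items");
        table.Columns.Add("Date", typeof(DateTime));
        table.Columns.Add("ItemType", typeof(string));
        table.Columns.Add("Property", typeof(int));

        table.Rows.Add(new DateTime(2011, 1, 1), "Widget", 3);
        table.Rows.Add(new DateTime(2011, 1, 1), "Gadget", 5);
        table.Rows.Add(new DateTime(2011, 1, 2), "Widget", 5);
        return table;
    }

    // LINQ to DataSet: AsEnumerable() yields DataRows, Field<T>() reads typed column values.
    public static DateTime[] DatesWithProperty(DataTable table, int property)
    {
        return table.AsEnumerable()
            .Where(r => r.Field<int>("Property") == property)
            .Select(r => r.Field<DateTime>("Date"))
            .Distinct()
            .OrderBy(d => d)
            .ToArray();
    }
}
```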
Or you could try to use object databases like db4o; they store the actual objects you would work with, helping you to program in a more object-oriented manner, and it's quite easy to work with. Also, it's not a database server in the traditional sense of the word - it uses flat files as containers and reads/writes directly from/to them.
Why not just use LINQ?
You can read the data into flat lists, then chain some LINQ statements to get the structure you want.
Apologies if I'm missing something, but I don't think you need anything in between.
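A sketch of that idea: keep one flat list of records and derive the nested date -> type -> properties view with GroupBy only when a report needs it, while the inverse question (on which dates was a property seen?) becomes a one-line query instead of a restructuring job. The Observation record (C# 9) and its string ItemType are hypothetical stand-ins for the question's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One flat record per observation; the shape is illustrative.
public record Observation(DateTime Date, string ItemType, int Property);

public static class FlatQueries
{
    // Rebuilds the question's Dictionary<DateTime, SortedDictionary<...>> view on demand.
    public static Dictionary<DateTime, SortedDictionary<string, List<int>>> ByDate(IEnumerable<Observation> rows)
    {
        return rows
            .GroupBy(o => o.Date)
            .ToDictionary(
                g => g.Key,
                g => new SortedDictionary<string, List<int>>(
                    g.GroupBy(o => o.ItemType)
                     .ToDictionary(t => t.Key, t => t.Select(o => o.Property).ToList())));
    }

    // The "tomorrow" query from the question, against the same flat list.
    public static List<DateTime> DatesWithProperty(IEnumerable<Observation> rows, int property)
    {
        return rows.Where(o => o.Property == property)
                   .Select(o => o.Date)
                   .Distinct()
                   .OrderBy(d => d)
                   .ToList();
    }
}
```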
Comparing databases and OOP, a table definition corresponds to a class definition, a record is an object, and the table data is any kind of collection of objects.
My approach would be to define classes and properties representing the file contents, parse each file entry into an object, and add these objects to a List<T>.
This list can then be queried using LINQ.
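The approach above, sketched for a hypothetical comma-separated input format such as "2011-01-15,Widget,42" (no quoting or escaping assumed). The Reading class plays the role of the table definition, each parsed line is a record, and the List<Reading> is the table data. In a real program the lines would come from File.ReadLines(path).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical class mirroring one line of input.
public sealed class Reading
{
    public DateTime Date { get; set; }
    public string Kind { get; set; } = "";
    public int Value { get; set; }

    // Parses one "yyyy-MM-dd,kind,value" line using the invariant culture.
    public static Reading Parse(string line)
    {
        string[] parts = line.Split(',');
        return new Reading
        {
            Date = DateTime.ParseExact(parts[0], "yyyy-MM-dd", CultureInfo.InvariantCulture),
            Kind = parts[1].Trim(),
            Value = int.Parse(parts[2], CultureInfo.InvariantCulture)
        };
    }
}

public static class ReadingLoader
{
    // Skips blank lines and parses the rest into a queryable list.
    public static List<Reading> Load(IEnumerable<string> lines) =>
        lines.Where(l => !string.IsNullOrWhiteSpace(l)).Select(Reading.Parse).ToList();
}
```

Any question about the data is then an ordinary LINQ query, e.g. list.Where(r => r.Kind == "Widget").Sum(r => r.Value).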

Is it a good idea to store serialized objects in a Database instead of multiple xml text files?

I am currently working on a web application that requires certain requests by users to be persisted. I have three choices:
1. Serialize each request object and store it as an XML text file.
2. Serialize the request object and store this XML text in a DB using a CLOB.
3. Store the requests in separate tables in the DB.
In my opinion, I would go for option 2 (storing the serialized objects' XML text in the DB), because it would be much easier to read from one column and then deserialize the objects to do some processing on them. I am using C# and ASP.NET MVC to write this application. I am fairly new to software development and would appreciate any help I can get.
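Option 2 boils down to a serialize/deserialize round trip through a string. A minimal sketch with XmlSerializer, assuming a hypothetical UserRequest type (XmlSerializer needs a public type with a parameterless constructor); the database read/write itself is left out.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical request type.
public class UserRequest
{
    public int Id { get; set; }
    public string Action { get; set; } = "";
}

public static class RequestXml
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(UserRequest));

    // Produces the XML text that would be stored in the CLOB/XML column.
    public static string ToXml(UserRequest request)
    {
        using var writer = new StringWriter();
        Serializer.Serialize(writer, request);
        return writer.ToString();
    }

    // Rebuilds the object from the stored column value.
    public static UserRequest FromXml(string xml)
    {
        using var reader = new StringReader(xml);
        return (UserRequest)Serializer.Deserialize(reader);
    }
}
```

Note that once the object is stored this way, the database sees only one opaque text value, which is the querying trade-off the answers below discuss.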
Short answer: If option 2 fits your needs well, use it. There's nothing wrong with storing your data in the database.
The answer really depends on the details. What kind of data are you storing? How do you need to query it? How often will you need to query it?
Generally, I would say it's not a good idea to go with either option 1 or option 2. The problem with option 2 is that it will be much harder to query for specific fields. If you're going to do a LIKE query that searches a really long string, it's going to be an expensive operation, and you'll likely run into performance issues later on.
If you really want to avoid writing code that reads multiple columns to load your data, look into using an ORM like LINQ to SQL. It will load database tables into objects for you.
I have designed a number of systems where storing 'some' objects as serialized XML in the DB has proven the better choice. I have also learned lessons where storing objects in the DB as XML ended up causing more headaches down the road. So I came up with some questions that you have to answer yes to in order to be comfortable doing so:
1. Does the object need to be portable?
2. Is the data in the object encapsulated, i.e. not part of something else, and not made up of something else?
3. In the future, can number 2 change?
In SQL you can always create a table view using XQuery, but I would only recommend doing this if a) it's too late to change your mind, or b) you don't have that many objects to manage.
Serializing and storing objects as XML has some real benefits, especially for extensibility and agile development.
If there are many objects of this kind and each one isn't very large, I think using the database is a good idea.
Whether to store the data in a separate table or in the original table depends on how you use this CLOB data together with the original table.
Go with option 2 if you will always need the CLOB data when you access the original table.
Otherwise, go with option 3 to improve performance.
You also need to think about security and n-tier architecture. Storing serialized data in a database means your data will be on another server, which is ideal if the data needs to be secure, but it will also add network latency. Storing the data in the filesystem gives you quicker IO access, but very limited search capability.
I have a situation like this, and I use the database. The data also gets backed up properly with the rest of the related data.

synch enumeration with static data from a database

I have an enumeration of delivery status codes. And when I save delivery data to the database they are stored with a foreign key to a table containing the same data (i.e. the same delivery codes)
What is the best strategy for keeping an enumeration in synch with data in a database?
Do you just remember to add to the enumeration when a new code is added to the database?
Or load the data into a dictionary when the application starts? And use the dictionary instead of an enumeration? Though this means that I don't have a strongly typed representation of the data - which I definitely want.
Or something else?
The data is not very volatile, but new codes do get added once in a blue moon.
Would appreciate any suggestions.
thanks
I use T4 templates in Visual Studio 2008. This way, I can force code generation during build and it generates Enums for each table that I want.
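The core of that approach is a loop that turns lookup-table rows into enum source code at build time. The sketch below shows only that generation step in plain C#; it is not T4 syntax, the DeliveryStatus name is hypothetical, and it assumes the row names are already valid C# identifiers. In a T4 template, the rows would be read from the database and the same loop would sit inside the template.

```csharp
using System.Collections.Generic;
using System.Text;

public static class EnumGenerator
{
    // Emits C# source for an enum from (value, name) pairs, e.g. rows of a lookup table.
    public static string Generate(string enumName, IEnumerable<KeyValuePair<int, string>> rows)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"public enum {enumName}");
        sb.AppendLine("{");
        foreach (var row in rows)
            sb.AppendLine($"    {row.Value} = {row.Key},");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Because the enum is regenerated on every build, a new code added to the table shows up in the enum without anyone having to remember to edit it by hand.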
A good starting point would be Hilton Giesenow's sample, which I believe answers exactly your question.
A more interesting approach, from a different angle, would be to use SQL views to emulate enums in the database. This way you synchronize the database with your C# (or other) code. You can find a great post about it on Oleg Sych's blog, here.
When I use an enumeration in code, I usually store the formatted name as a varchar in the database rather than keeping a table of the enumeration values in the database. I realize that this is not as normalized as one might like, but I believe it is better than trying to keep the database and my enumeration in sync. All that is needed is to format on insert/update and parse on select to reconstitute the value back into the enumeration.
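The format/parse step described above is small. A minimal sketch, assuming a hypothetical DeliveryStatus enum: ToString() produces the name written to the varchar column, and Enum.TryParse plus Enum.IsDefined turn the stored text back into the enum, rejecting values that do not map to a defined member.

```csharp
using System;

// Hypothetical status enum; its member names are what get stored in the varchar column.
public enum DeliveryStatus { Pending = 1, Shipped = 2, Delivered = 3 }

public static class StatusColumn
{
    // Format on insert/update: store the name, not the number.
    public static string ToColumn(DeliveryStatus status) => status.ToString();

    // Parse on select: rebuild the enum, rejecting text that is not a defined member.
    public static DeliveryStatus FromColumn(string value)
    {
        if (Enum.TryParse(value, ignoreCase: false, out DeliveryStatus status)
            && Enum.IsDefined(typeof(DeliveryStatus), status))
            return status;
        throw new FormatException($"Unknown delivery status '{value}'.");
    }
}
```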
I only do this when I believe that the enumeration is going to be fixed -- although, I too have made infrequent updates. If I believe that it is likely that the data will be regularly updated, I won't use an enumeration and will have a separate table in the database with foreign key references.

Best (free) way to store data? How about updates to the file system?

I have an idea for how to solve this problem, but I wanted to know if there's something easier and more extensible.
The program I'm working on has two basic forms of data: images, and the information associated with those images. The information associated with the images has been previously stored in a JET database of extreme simplicity (four tables) which turned out to be both slow and incomplete in the stored fields. We're moving to a new implementation of data storage. Given the simplicity of the data structures involved, I was thinking that a database was overkill.
Each image will have information of its own (capture parameters), will be part of a group of images which are interrelated (taken in the same thirty-minute period, say), and then part of a larger group altogether (taken of the same person). Right now, I'm storing people in a dictionary with a unique identifier. Each person then has a List of the different groups of pictures, and each picture group has a List of pictures. All of these classes are serializable, and I'm just serializing and deserializing the dictionary. Fairly straightforward stuff. Images are stored separately, so that the dictionary doesn't become astronomical in size.
The problem is: what happens when I need to add new information fields? Is there an easy way to set up these data structures to account for potential future revisions? In the past, the way I'd handle this in C was to create a serializable struct with lots of empty bytes (at least 1 KB) for future extensibility, with one of the bytes in the struct indicating the version. Then, when the program read the struct, it would know which deserialization to use based on a massive switch statement (and old versions could read new data, because extraneous data would just go into fields which are ignored).
Does such a scheme exist in C#? Like, if I have a class that's a group of String and Int objects, and then I add another String object to the struct, how can I deserialize an object from disk, and then add the string to it? Do I need to resign myself to having multiple versions of the data classes, and a factory which takes a deserialization stream and handles deserialization based on some version information stored in a base class? Or is a class like Dictionary ideal for storing this kind of information, as it will deserialize all the fields on disk automatically, and if there are new fields added in, I can just catch exceptions and substitute in blank Strings and Ints for those values?
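The version-marker scheme described above carries over to C# fairly directly with BinaryWriter and BinaryReader. A minimal sketch, assuming a hypothetical CaptureInfo class whose version 2 added a Lens field: the version is written first, and the reader only reads fields that the stored version actually contains, leaving newer fields at their defaults. Unlike the padded C struct, this sketch does not let old code read newer files; that would additionally need a length prefix per record so unknown trailing bytes can be skipped.

```csharp
using System.IO;

// Hypothetical capture-parameter record. Version 1 had Exposure and Gain; version 2 added Lens.
public sealed class CaptureInfo
{
    public const int CurrentVersion = 2;
    public string Exposure { get; set; } = "";
    public int Gain { get; set; }
    public string Lens { get; set; } = "";   // added in version 2

    // Always writes the current layout, prefixed by its version number.
    public void Write(BinaryWriter w)
    {
        w.Write(CurrentVersion);
        w.Write(Exposure);
        w.Write(Gain);
        w.Write(Lens);
    }

    // Branches on the stored version; fields newer than that version keep their defaults.
    public static CaptureInfo Read(BinaryReader r)
    {
        int version = r.ReadInt32();
        var info = new CaptureInfo { Exposure = r.ReadString(), Gain = r.ReadInt32() };
        if (version >= 2)
            info.Lens = r.ReadString();
        return info;
    }
}
```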
If I go with the dictionary approach, is there a speed hit associated with file reads/writes as well as parameter retrieval times? I figure that if there are just fields in a class, field retrieval is instant, but a dictionary adds some small overhead.
Thanks!
Sqlite is what you want. It's a fast, embeddable, single-file database that has bindings to most languages.
With regards to extensibility, you can store your models with default attributes, and then have a separate table for attribute extensions for future changes.
A year or two down the road, if the code is still in use, you'll be happy that 1) other developers won't have to learn a customized code structure to maintain the code, 2) you can export, view, and modify the data with standard database tools (there's an ODBC driver for SQLite files and various query tools), and 3) you'll be able to scale up to a database with minimal code changes.
Just a wee word of warning: SQLite, Protocol Buffers, mmap et al. are all very good, but you should prototype and test each implementation and make sure that you're not going to hit the same perf issues or different bottlenecks.
The simplest option may be to just upsize to SQL Server (Express) (you may be surprised at the perf gain) and fix whatever's missing from the present database design. Then, if perf is still an issue, start investigating these other technologies.
My brain is fried at the moment, so I'm not sure I can advise for or against a database, but if you're looking for version-agnostic serialization, you'd be a fool to not at least check into Protocol Buffers.
Here's a quick list of implementations I know about for C#/.NET:
protobuf-net
Proto#
jskeet's dotnet-protobufs
There's a database schema, for which I can't remember the name, that can handle this sort of situation. You basically have two tables. One table stores the variable name, and the other stores the variable value. If you want to group the variables, then add a third table that will have a one to many relationship with the variable name table. This setup has the advantage of letting you keep adding different variables without having to keep changing your database schema. Saved my bacon quite a few times when dealing with departments that change their mind frequently (like Marketing).
The only drawback is that the variable value table will need to store the actual value as a string column (varchar or nvarchar actually). Then you have to deal with the hassle of converting the values back to their native representations. I currently maintain something like this. The variable table currently has around 800 million rows. It's still fairly fast, as I can still retrieve certain variations of values in under one second.
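The design described above is widely known as entity-attribute-value (EAV). A minimal in-memory sketch of its main chore, converting string-stored values back to native types: the AttributeValue record and the helper names are hypothetical, and the invariant culture is used on both the write and read side so the round trip does not depend on the machine's locale.

```csharp
using System.Collections.Generic;
using System.Globalization;

// One row of the hypothetical "variable value" table: every value is stored as a string.
public sealed record AttributeValue(int EntityId, string Name, string Value);

public static class AttributeStore
{
    // Finds the attribute and converts it back to int; false if missing or not a valid integer.
    public static bool TryGetInt(IEnumerable<AttributeValue> rows, int entityId, string name, out int value)
    {
        foreach (var row in rows)
        {
            if (row.EntityId == entityId && row.Name == name)
                return int.TryParse(row.Value, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
        }
        value = 0;
        return false;
    }

    // Writing with the same culture keeps the string form stable.
    public static AttributeValue FromInt(int entityId, string name, int value) =>
        new AttributeValue(entityId, name, value.ToString(CultureInfo.InvariantCulture));
}
```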
I'm no C# programmer but I like the mmap() call and saw there is a project doing such a thing for C#.
See Mmap
Structured files perform very well if tailored to a specific application, but they are difficult to manage and hardly reusable as a code resource. A better solution is a virtual-memory-like implementation:
Up to 4 gigabytes of information can be managed.
Space can be optimized to the real data size.
All the data can be viewed as a single array and accessed with read/write operations.
No need to impose a structure for storage; just use and store.
Can be cached.
Is highly reusable.
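The .NET Framework has shipped memory-mapped file support in System.IO.MemoryMappedFiles since version 4, so the "single array with read/write operations" view above is available without a separate project. A minimal sketch, assuming the file holds nothing but a flat run of Int32 values; the RoundTrip helper is hypothetical.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedStore
{
    // Maps the file, writes the values as consecutive Int32 slots, then reads them back.
    public static int[] RoundTrip(string path, int[] values)
    {
        long capacity = values.Length * sizeof(int);
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, capacity))
        using (var view = mmf.CreateViewAccessor(0, capacity))
        {
            // The view behaves like one flat array addressed by byte offset.
            for (int i = 0; i < values.Length; i++)
                view.Write(i * sizeof(int), values[i]);

            var result = new int[values.Length];
            for (int i = 0; i < values.Length; i++)
                result[i] = view.ReadInt32(i * sizeof(int));
            return result;
        }
    }
}
```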
So go with SQLite for the following reasons:
1. You don't need to read/write the entire database from disk every time
2. Much easier to add to even if you don't leave enough placeholders at the beginning
3. Easier to search based on anything you want
4. Easier to change data in ways the application wasn't designed for
Problems with the Dictionary approach:
1. Unless you build a smart dictionary, you need to read/write the entire database every time (and unless you carefully design the data structure, it will be very hard to maintain backwards compatibility)
   a) if you did not leave enough placeholders, you're stuck
2. It appears you'd have to do a linear search through all the photos in order to search on one of the capture attributes
3. Can a picture be in more than one group? Can a picture be under more than one person? Can two people be in the same group? With dictionaries these things can get hairy....
With a database table, if you get a new attribute you can just run ALTER TABLE Picture ADD Attribute DataType. Then, as long as you don't add a rule saying the attribute must have a value, you can still load and save older versions, while the newer versions can use the new attributes.
Also you don't need to save the picture in the database. You could just store the path to the picture in the database. Then when the app needs the picture, just load it from a disk file. This keeps the database size smaller. Also the extra seek time to get the disk file will most likely be insignificant compared to the time to load the image.
Your tables should probably be:
Picture(PictureID, GroupID?, File Path, Capture Parameter 1, Capture Parameter 2, etc..)
If you want more flexibility, you could make a table
CaptureParameter(PictureID, ParameterName, ParameterValue). I would advise against this, though, because it is a lot less efficient than just putting the parameters in one table (not to mention that the queries to retrieve/search the capture parameters would be more complicated).
Person(PersonID, Any Person Attributes like Name/Etc.)
Group(GroupID, Group Name, PersonID?)
PersonGroup?(PersonID, GroupID)
PictureGroup?(GroupID, PictureID)
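A sketch of how the PictureGroup link table answers "which pictures are in this group" while still allowing a picture to belong to several groups. In-memory records stand in for the tables; the column subset and the Exposure capture parameter are illustrative assumptions, and, following the advice above, a picture row stores a file path rather than the image itself.

```csharp
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the Picture and PictureGroup tables (columns trimmed).
public sealed record Picture(int PictureId, string FilePath, double Exposure);
public sealed record PictureGroupLink(int GroupId, int PictureId);

public static class Catalog
{
    // Joins through the link table; the same picture may appear under several GroupIds.
    public static List<string> PathsInGroup(IEnumerable<Picture> pictures, IEnumerable<PictureGroupLink> links, int groupId)
    {
        return (from link in links
                where link.GroupId == groupId
                join pic in pictures on link.PictureId equals pic.PictureId
                orderby pic.PictureId
                select pic.FilePath).ToList();
    }
}
```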
