I am using SQL Server 2008 with NHibernate for an application. In the application I need to create multiple objects of an Info class and use them in multiple places. I also need to store those objects in the database.
There are multiple types of Info class.
To store these objects of the Info class I have two options:
Store the serialized object of the class.
Store the details of that class as strings.
What is the advantage of storing the serialized object in the database over storing its values as multiple strings?
If you store the serialized object in the db, you:
Don't have to rebuild it from partial data (i.e. write your own deserializer, or create objects from the partial data yourself)
May get faster reads and writes in some cases
Store redundant infrastructure data
May choose among multiple formats (XML, a custom format, blobs)
Have fully prepared serialized objects that are ready to be processed anywhere (sent over the network, stored on disk)
If you store the multiple strings, you:
Need to build the objects "manually"
May use the database data in different scenarios (from .NET, or to build other structures such as cubes)
Get much more compact data
May store the data in a relational, normalized form, which is (almost) always a good practice
Can query the data
Get overall more versatile usage of the data
I would definitely go for the relational normalized form to store the strings and then build the corresponding class builder in .NET.
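As a rough illustration of that "class builder" approach, here is a minimal sketch that rebuilds an Info object from a relational row with plain ADO.NET; the table and column names are hypothetical:

```csharp
using System.Data.SqlClient;

public class Info
{
    public int Id { get; set; }
    public string Type { get; set; }
    public string Description { get; set; }
}

public static class InfoBuilder
{
    // Hypothetical schema: Info(Id, Type, Description), one row per object.
    public static Info Load(string connectionString, int id)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, Type, Description FROM Info WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return null;
                return new Info
                {
                    Id = reader.GetInt32(0),
                    Type = reader.GetString(1),
                    Description = reader.GetString(2)
                };
            }
        }
    }
}
```

Since the question mentions NHibernate, in practice you would express this column-to-property mapping declaratively instead of hand-writing the reader code.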
I would definitely store records and fields, and not just a chunk of bytes (whether binary, text, or XML) representing the current status of your object.
It depends of course on the complexity of your business entities (in your case the Info class), but I would really avoid saving the serialized version of it in one column.
If you explode all properties into fields you can query better for records having certain values, and you can handle new columns and releases much more easily.
The most common issue with storing an object as a serialized stream is that it is difficult to search the properties of the object once it is stored, whereas if each property of the object is explicitly stored in its own strongly typed column, it can be indexed for searches, and you get the integrity benefit of strongly typed storage.
However, at least if the object is XML-serialized into an XML column in SQL Server, you can use technologies such as XQuery and OPENXML to ease your searches.
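For instance, here is a minimal sketch of querying an XML column with XQuery from C#; the table name content_table, the column Data, and the element names are hypothetical:

```csharp
using System.Data.SqlClient;

class XmlQueryExample
{
    static void Main()
    {
        // Hypothetical connection string and schema.
        var connectionString = "Server=.;Database=App;Integrated Security=true";

        // exist() runs an XQuery predicate against the XML column;
        // an XML index on Data can speed such queries up considerably:
        //   CREATE PRIMARY XML INDEX IX_Data ON content_table(Data);
        const string sql = @"
            SELECT *
            FROM content_table
            WHERE Data.exist('/Info/Source[text()=""Feed""]') = 1";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // process matching rows...
                }
            }
        }
    }
}
```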
Serialized object (XML)
If you store the class as XML, you will be able to search the content of the class using XQuery. This is the easy way if you want to search (with or without conditions). Moreover, you can create an index over the XML column, and the XML index will make your application faster.
As string
Store it this way if you don't have business logic that needs to look at the content of the class.
We have a scenario where we have multiple sources of data coming in from various external systems through API calls, SQL tables, and physical files, and we now have to map it against a number of transaction templates. I want to build an integration adapter and UI where I can choose any entity data class and map its fields to a class or action that will be used to create a transaction in our financial system.
I want to have an object type or class that can be modified dynamically, set up links between these objects, and possibly create a set of rules that defines the interaction between these objects. I have seen some versions of this type of software that use a drag-and-drop UI to do the mappings, so that would be the ideal end goal.
I'm coming from a C# .NET background, so I need some advice or tips on where to start and what to look at.
I am currently doing something similar. I wrote some code to turn data from our legacy system into JSON objects written out to flat files (one file per data record in a table), and then wrote some code to clean up that data and push it into Acumatica via the REST API.
I like flat-file JSON objects because they can easily be hashed, and the hashes can be used to compare them against new data that comes in. Only data where the hash has changed needs to be merged or overwritten and then pushed into the target system. The file names are usually primary key values from whatever table you're working with. Our legacy system has a hierarchical (non-tabular, non-SQL-like) data structure, so my mileage may be greater than it would be with a well-normalized SQL database.
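A minimal sketch of that hash comparison, assuming one JSON file per record named after its primary key:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class ChangeDetection
{
    // Hash the raw bytes of a JSON record file; if the hash differs from the
    // one stored for the previous sync, the record needs to be pushed again.
    static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }

    static bool HasChanged(string path, string lastKnownHash)
    {
        return !string.Equals(HashFile(path), lastKnownHash, StringComparison.Ordinal);
    }
}
```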
There are also products like Alteryx that are built for doing data pipelines the way you have proposed.
I would caution you to be practical in building these types of things. For us, for example, we have a limited set of data that needs to come over, so we don't need the perfect abstraction for every data type. We inevitably had to do cleanup on legacy/3rd-party data as well, and those problems aren't always so easy to abstract. I had previously built a system using closures for function passing in order to write custom cleanup routines for any abstract data problem I might encounter (which sort of sounds like what you're talking about), but realized in the end that just writing simpler code that deals with specific data problems was cleaner and simpler to maintain. In the end, there is probably only a limited amount of data that needs to be synced.
Serialization -> Converting an object to a binary representation that can then be written to disk or to a file.
The above is the basic definition of serialization as I know it. But what does it really mean? I have a class in my application and I use it to get data from the user and store it in a database. Does this mean I am using serialization here? Storing the data is rather like saving the state of the object: I can fetch the data and form the same object once again.
Can anyone enlighten me on what real serialization is? If serialization is not used, what will be the result? What's the difference between saving the data in a file and doing serialization (to save the data) to a file?
I doubt that storing data in a database should be considered serialization. Even when you're storing the data coming from your object-oriented programming layer, what you're actually doing is translating objects into the relational world and vice versa. This is called data mapping.
Perhaps you may argue that performing an INSERT is storing data in an interoperable format. Not necessarily: SQL is a domain-specific language for managing relational data, and you don't know how the data is actually stored, either in memory or on disk. SQL itself isn't a serialization format.
Since most databases live on disk, you can consider serialization to be the process of persisting database records to disk in order to retrieve or alter them afterwards, using RAM to optimize reads and writes without holding the entire database in memory.
On the other hand, serialization can be done in binary or non-binary formats. For example, you can serialize an object into JSON, and JSON isn't a binary format. Also, XML has been used as a serialization format for years, and it's not binary either.
A good definition of serialization might be: serialization is when some in-memory object is turned into an interoperable representation that can be stored on disk or transmitted over the wire, and easily turned back into an in-memory object on any platform and in any language capable of understanding the serialization format.
Examples:
A REST API sending a list of users as data-transfer objects serialized to JSON.
An application lets the user visually edit its configuration and settings. When the UI needs to show current values, it deserializes the configuration back into objects to bind them to the UI, and once the user presses Save, the configuration gets serialized to disk again.
An application provides its own backup. The backup can be the entire object graph serialized as JSON.
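A minimal round-trip sketch of the configuration example above, using System.Text.Json (any JSON serializer would do; the Settings class and file name are hypothetical):

```csharp
using System.IO;
using System.Text.Json;

public class Settings
{
    public string Theme { get; set; }
    public int FontSize { get; set; }
}

class ConfigStore
{
    const string FilePath = "settings.json";

    // Deserialize: file on disk -> in-memory object, e.g. to bind to the UI.
    public static Settings Load() =>
        JsonSerializer.Deserialize<Settings>(File.ReadAllText(FilePath));

    // Serialize: in-memory object -> file on disk, e.g. when the user hits Save.
    public static void Save(Settings settings) =>
        File.WriteAllText(FilePath, JsonSerializer.Serialize(settings));
}
```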
I have an object called Data serialized as varbinary(MAX). The Data object contains a property named Source. Is there a way to do something like the following:
select * from content_table where Data.Source == 'Feed'
I know that this is possible when XML serialization is used (via XQuery), but the serialization type cannot be changed in this case.
If you have used BinaryFormatter, then no, not really - at least, not without deserializing the entire object model, which is usually not possible at the database. It is an undocumented format, with very little provision for ad-hoc query.
Note: BinaryFormatter is not (IMO) a good choice for anything that relates to storage of items; I fully expect that this will bite you at some point (i.e. unable to reliably deserialize the data you have stored). Pain points:
tightly tied to type names; can break as you move code around
tightly tied to field names; can break as you refactor your classes (even just changing a field to an automatically implemented property is a breaking change)
prone to including a bigger graph than you expect, in particular via events
It is of course also platform-specific, and potentially framework-specific.
In all seriousness, I've lost count of the number of "I can't deserialize my data" questions I've fielded over the years...
There are alternative binary serializers that do allow some (limited) ability to inspect the data via a reader (without requiring full deserialization), and which do not become tied to the type metadata (instead being contract-based, allowing deserialization into any suitable type model, not just that specific type/version).
However, I genuinely doubt that such serializers would work with anything approaching efficiency in a WHERE clause etc.; you would need a SQL/CLR method etc. IMO, a better approach here is simply to store the required filter values as data in other columns, allowing you to add indexing etc. On the rare occasions that I have used the xml type, this is the same thing I have done there (with the small caveat that with xml you can use "lifted" computed+stored+indexed columns from the underlying xml data, which wouldn't be possible here - the extra columns would have to be explicit).
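For illustration, a minimal sketch of that approach, where the filter value is written to its own column at save time (the table and column names are hypothetical):

```csharp
using System.Data;
using System.Data.SqlClient;

class BlobStore
{
    // Save the opaque blob, but also promote Source into its own (indexable)
    // column, so queries never have to look inside the serialized payload:
    //   SELECT * FROM content_table WHERE Source = 'Feed'
    public static void Save(string connectionString, byte[] blob, string source)
    {
        const string sql =
            "INSERT INTO content_table (Data, Source) VALUES (@data, @source)";
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@data", SqlDbType.VarBinary, -1).Value = blob;
            cmd.Parameters.AddWithValue("@source", source);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```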
You could deserialize the data using a SQL CLR function, but I suspect it won't be fast. It all depends on how the serialization was done. If the library is available, then a simple CLR function should be able to query the data quite easily.
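A rough sketch of what such a CLR scalar function might look like, assuming the blob was written with BinaryFormatter; note that BinaryFormatter is tied to type identity, so in practice the original assembly defining Data would have to be registered in SQL Server (typically with the UNSAFE permission set), and the Data class below is only a stand-in so the sketch compiles:

```csharp
using System;
using System.Data.SqlTypes;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using Microsoft.SqlServer.Server;

[Serializable]
public class Data
{
    // Stand-in for the real serialized type; see the caveat above.
    public string Source;
}

public static class DataFunctions
{
    // Hypothetical scalar function: deserializes the whole blob just to read
    // one property. Usable as: SELECT dbo.GetSource(Data) FROM content_table
    [SqlFunction]
    public static SqlString GetSource(SqlBytes blob)
    {
        if (blob == null || blob.IsNull) return SqlString.Null;
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream(blob.Value))
        {
            var data = (Data)formatter.Deserialize(stream);
            return new SqlString(data.Source);
        }
    }
}
```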
I am currently working on a web application that requires certain requests by users to be persisted. I have three choices:
Serialize each request object and store it as an XML text file.
Serialize the request object and store this XML text in a DB using a CLOB.
Store the requests in separate tables in the DB.
In my opinion I would go for option 2 (storing the serialized objects' XML text in the DB). I would do this because it would be so much easier to read from one column and then deserialize the objects to do some processing on them. I am using C# and ASP.NET MVC to write this application. I am fairly new to software development and would appreciate any help I can get.
Short answer: If option 2 fits your needs well, use it. There's nothing wrong with storing your data in the database.
The answer to this really depends on the details. What kind of data are you storing? How do you need to query it? How often will you need to query it?
Generally, I would say it's not a good idea to do either 1 or 2. The problem with option 2 is that it will be much harder to query for specific fields. If you're going to do a LIKE query and have it search a really long string, it's going to be an expensive operation and you'll likely run into perf issues later on.
If you really want to stay away from having to write code to read multiple columns to load your data, look into using an ORM like Linq to SQL. That will help load database tables into objects for you.
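A minimal sketch of the ORM route with LINQ to SQL attribute mapping (the Request table and its columns are hypothetical):

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

[Table(Name = "Requests")]
public class Request
{
    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    [Column]
    public string UserName { get; set; }

    [Column]
    public DateTime CreatedOn { get; set; }
}

class Example
{
    static void Main()
    {
        // The ORM loads rows into objects; no hand-written column reading.
        var db = new DataContext("connection string here");
        var recent = db.GetTable<Request>()
                       .Where(r => r.CreatedOn > DateTime.Today.AddDays(-7))
                       .ToList();
    }
}
```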
I have designed a number of systems where storing 'some' objects as serialized XML in the db has proven the better choice. I have also learned lessons where storing objects in the db as XML ended up causing more headaches down the road. So I came up with some questions that you have to answer yes to in order to be comfortable doing it:
Does the object need to be portable?
Is the data in the object encapsulated, i.e. not part of something else and not made up of something else?
Could the answer to question 2 change in the future?
In SQL Server you can always create a table view using XQuery, but I would only recommend you do this if a) it's too late to change your mind, or b) you don't have that many objects to manage.
Serializing and storing objects in XML has some real benefits, especially for extensibility and agile development.
If the number of these objects is large and each one isn't very big, I think using the database is a good idea.
Whether to store it in a separate table or in the original table depends on how you would use this CLOB data with the original table.
Go with option 2 if you will always need the CLOB data when you access the original table.
Otherwise go with option 3 to improve performance.
You also need to think about security and n-tier architecture. Storing serialized data in a database means your data will be on another server, which is ideal if the data needs to be secure but also adds network latency, whereas storing the data in the filesystem gives you quicker I/O access but very limited searching ability.
I have a situation like this and I use the database. It also gets backed up properly with the rest of the related data.
I have an idea for how to solve this problem, but I wanted to know if there's something easier and more extensible for my problem.
The program I'm working on has two basic forms of data: images, and the information associated with those images. The information associated with the images has previously been stored in a JET database of extreme simplicity (four tables) which turned out to be both slow and incomplete in the fields it stored. We're moving to a new implementation of data storage. Given the simplicity of the data structures involved, I was thinking that a database was overkill.
Each image will have information of its own (capture parameters), will be part of a group of images which are interrelated (taken in the same thirty-minute period, say), and will then be part of a larger group altogether (taken of the same person). Right now, I'm storing people in a dictionary with a unique identifier. Each person then has a List of the different groups of pictures, and each picture group has a List of pictures. All of these classes are serializable, and I'm just serializing and deserializing the dictionary. Fairly straightforward stuff. Images are stored separately, so that the dictionary doesn't become astronomical in size.
The problem is: what happens when I need to add new information fields? Is there an easy way to set up these data structures to account for potential future revisions? In the past, the way I'd handle this in C was to create a serializable struct with lots of empty bytes (at least a K) for future extensibility, with one of the bytes in the struct indicating the version. Then, when the program read the struct, it would know which deserialization to use based on a massive switch statement (and old versions could read new data, because extraneous data would just go into fields which were ignored).
Does such a scheme exist in C#? Say I have a class that's a group of String and Int objects, and then I add another String to the class: how can I deserialize an object from disk and then add the string to it? Do I need to resign myself to having multiple versions of the data classes, and a factory which takes a deserialization stream and handles deserialization based on some version information stored in a base class? Or is a class like Dictionary ideal for storing this kind of information, as it will deserialize all the fields on disk automatically, and if there are new fields added, I can just catch exceptions and substitute blank Strings and Ints for those values?
If I go with the dictionary approach, is there a speed hit associated with file reads/writes as well as with parameter retrieval times? I figure that if there are just fields in a class, then field retrieval is instant, but in a dictionary there's some small overhead for each lookup.
Thanks!
SQLite is what you want. It's a fast, embeddable, single-file database that has bindings for most languages.
With regard to extensibility, you can store your models with default attributes, and then have a separate table for attribute extensions for future changes.
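A minimal sketch of that layout using the Microsoft.Data.Sqlite package (the table names are hypothetical; System.Data.SQLite would work just as well):

```csharp
using Microsoft.Data.Sqlite;

class ImageDb
{
    public static void CreateSchema()
    {
        using (var conn = new SqliteConnection("Data Source=images.db"))
        {
            conn.Open();
            var cmd = conn.CreateCommand();
            // Core model with its default attributes, plus a name/value
            // extension table for fields added in future releases.
            cmd.CommandText = @"
                CREATE TABLE IF NOT EXISTS Picture (
                    PictureId INTEGER PRIMARY KEY,
                    FilePath  TEXT NOT NULL,
                    TakenAt   TEXT
                );
                CREATE TABLE IF NOT EXISTS PictureAttribute (
                    PictureId INTEGER NOT NULL REFERENCES Picture(PictureId),
                    Name      TEXT NOT NULL,
                    Value     TEXT
                );";
            cmd.ExecuteNonQuery();
        }
    }
}
```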
A year or two down the road, if the code is still in use, you'll be happy that 1) other developers won't have to learn a customized code structure to maintain the code, 2) you can export, view, and modify the data with standard database tools (there's an ODBC driver for SQLite files and various query tools), and 3) you'll be able to scale up to a full database with minimal code changes.
Just a wee word of warning: SQLite, Protocol Buffers, mmap, et al. are all very good, but you should prototype and test each implementation and make sure that you're not going to hit the same perf issues or different bottlenecks.
The simplest option may be to upsize to SQL Server (Express) (you may be surprised at the perf gain) and fix whatever's missing from the present database design. Then, if perf is still an issue, start investigating these other technologies.
My brain is fried at the moment, so I'm not sure I can advise for or against a database, but if you're looking for version-agnostic serialization, you'd be a fool to not at least check into Protocol Buffers.
Here's a quick list of implementations I know about for C#/.NET:
protobuf-net
Proto#
jskeet's dotnet-protobufs
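To give a flavour of the version tolerance, here is a minimal protobuf-net sketch; fields added later under new tag numbers are simply skipped by older readers, and fields missing from old data just come back as defaults (the Person class is hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string Name { get; set; }

    [ProtoMember(2)]
    public List<int> PictureGroupIds { get; set; }

    // Added in a later release: old readers ignore tag 3, and old data
    // simply leaves this null when read by new code.
    [ProtoMember(3)]
    public string Notes { get; set; }
}

class Example
{
    static void Main()
    {
        var person = new Person { Name = "Ada", PictureGroupIds = new List<int> { 1, 2 } };

        using (var file = File.Create("person.bin"))
            Serializer.Serialize(file, person);

        using (var file = File.OpenRead("person.bin"))
        {
            var roundTripped = Serializer.Deserialize<Person>(file);
        }
    }
}
```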
There's a database schema, for which I can't remember the name, that can handle this sort of situation. You basically have two tables: one table stores the variable name, and the other stores the variable value. If you want to group the variables, add a third table that has a one-to-many relationship with the variable-name table. This setup has the advantage of letting you keep adding different variables without having to keep changing your database schema. It has saved my bacon quite a few times when dealing with departments that change their minds frequently (like Marketing).
The only drawback is that the variable-value table needs to store the actual value as a string column (varchar or nvarchar, actually). Then you have to deal with the hassle of converting the values back to their native representations. I currently maintain something like this. The variable table currently has around 800 million rows. It's still fairly fast, as I can retrieve certain variations of values in under one second.
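A minimal sketch of that name/value schema (all table and column names are hypothetical); the grouping table sits on the one side of a one-to-many relationship with the variable-name table:

```csharp
using System.Data.SqlClient;

class NameValueSchema
{
    public static void Create(string connectionString)
    {
        const string ddl = @"
            CREATE TABLE VariableGroup (
                GroupId   INT IDENTITY PRIMARY KEY,
                GroupName NVARCHAR(100) NOT NULL
            );
            CREATE TABLE VariableName (
                NameId  INT IDENTITY PRIMARY KEY,
                GroupId INT NULL REFERENCES VariableGroup(GroupId),
                Name    NVARCHAR(100) NOT NULL
            );
            CREATE TABLE VariableValue (
                ValueId INT IDENTITY PRIMARY KEY,
                NameId  INT NOT NULL REFERENCES VariableName(NameId),
                Value   NVARCHAR(400) NULL  -- always a string; convert on read
            );";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(ddl, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```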
I'm no C# programmer, but I like the mmap() call, and I saw there is a project doing such a thing for C#.
See Mmap
Structured files perform very well if tailored for a specific application, but they are difficult to manage and are a hardly reusable code resource. A better solution is a virtual-memory-like implementation:
Up to 4 gigabytes of information can be managed.
Space can be optimized to the real data size.
All the data can be viewed as a single array and accessed with read/write operations.
No need to define a structure in order to store; just use and store.
Can be cached.
Is highly reusable.
So go with SQLite for the following reasons:
1. You don't need to read/write the entire database from disk every time.
2. It is much easier to add to, even if you don't leave enough placeholders at the beginning.
3. It is easier to search based on anything you want.
4. It is easier to change data in ways beyond what the application was designed for.
Problems with the Dictionary approach:
1. Unless you make a smart dictionary, you need to read/write the entire database every time (and unless you carefully design the data structure, it will be very hard to maintain backwards compatibility).
    a) If you did not leave enough placeholders, bye-bye.
2. It appears as if you'd have to linearly search through all the photos in order to search on one of the capture attributes.
3. Can a picture be in more than one group? Can a picture be under more than one person? Can two people be in the same group? With dictionaries these things can get hairy...
With a database table, if you get a new attribute you can just say ALTER TABLE Picture ADD Attribute DataType. Then, as long as you don't make a rule saying the attribute has to have a value, you can still load and save older versions, while newer versions can use the new attributes.
Also, you don't need to save the picture itself in the database. You could just store the path to the picture in the database; then, when the app needs the picture, it can load it from the disk file. This keeps the database size smaller. Also, the extra seek time to get the disk file will most likely be insignificant compared to the time needed to load the image.
Your tables should probably be:
Picture(PictureID, GroupID?, File Path, Capture Parameter 1, Capture Parameter 2, etc..)
If you want more flexibility you could make a table
CaptureParameter(PictureID, ParameterName, ParameterValue) ... I would advise against this, because it is a lot less efficient than just putting them in one table (not to mention that the queries to retrieve/search the capture parameters would be more complicated).
Person(PersonID, Any Person Attributes like Name/Etc.)
Group(GroupID, Group Name, PersonID?)
PersonGroup?(PersonID, GroupID)
PictureGroup?(GroupID, PictureID)
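As a quick illustration of querying that schema (assuming the PersonGroup/PictureGroup junction-table variant, with column names like FilePath as hypothetical stand-ins), fetching every picture path for a given person might look like this:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

class PictureQueries
{
    // Person -> PersonGroup -> PictureGroup -> Picture
    public static List<string> GetPicturePaths(string connectionString, int personId)
    {
        const string sql = @"
            SELECT p.FilePath
            FROM Picture p
            JOIN PictureGroup pg ON pg.PictureID = p.PictureID
            JOIN PersonGroup  sg ON sg.GroupID   = pg.GroupID
            WHERE sg.PersonID = @personId";

        var paths = new List<string>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@personId", personId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    paths.Add(reader.GetString(0));
        }
        return paths;
    }
}
```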