I am retrieving five columns through an SQL query. Among the columns retrieved, I have a column RecordID which should act as a key to a dictionary.
I looked at the solution posted in C# Multi-Value Dictionary on Stack Overflow, but I am not able to apply it effectively to my situation.
I want to store all the rows of my query but the RecordID column should also act as a key to the dictionary element. I want something like:
Dictionary<RecordID, Entire columns of the current row for this RecordID>
An alternative I think is to use an array, something like:
Dictionary<key,string[]>
But above all, I want the fastest approach.
Use a Lookup - you haven't said much about the data, but you may need something as simple as:
var lookup = list.ToLookup(x => x.RecordID);
There are, of course, overloads for ToLookup which allow you to do other things.
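For example, a minimal sketch of both the plain and the element-selector overloads (the Record class and its fields here are placeholders for your actual row type):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; substitute the class holding your query columns.
class Record
{
    public int RecordID { get; set; }
    public string Name { get; set; }
}

class Program
{
    static void Main()
    {
        var list = new List<Record>
        {
            new Record { RecordID = 1, Name = "first" },
            new Record { RecordID = 1, Name = "second" },
            new Record { RecordID = 2, Name = "third" }
        };

        // Group rows by RecordID; each key maps to all rows sharing it.
        ILookup<int, Record> lookup = list.ToLookup(x => x.RecordID);

        foreach (Record r in lookup[1])
            Console.WriteLine(r.Name);   // prints "first", then "second"

        // An overload takes an element selector, e.g. keep only the names:
        ILookup<int, string> names = list.ToLookup(x => x.RecordID, x => x.Name);
        Console.WriteLine(names[2].Single()); // prints "third"
    }
}
```

Unlike a Dictionary, a Lookup tolerates duplicate keys, which matters if several rows can share a RecordID.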
For goodness' sake, you need a dedicated class to hold the values. Since the columns come from a database table, the number of fields is fixed, so you can have a class with fixed fields rather than a dynamically growing collection. Then use a KeyedCollection<TKey, TItem> to hold the collection of records. In a KeyedCollection<TKey, TItem>, the TKey part is embedded in the TItem; in your case it's RecordId.
class Record
{
    public int Id { get; set; }
    // rest of the fields
}
Your structure should look like KeyedCollection<int, Record>. It preserves insertion order as well, and you can query the collection by RecordId via the indexer, just like a dictionary.
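KeyedCollection<TKey, TItem> is abstract, so you derive from it once and override GetKeyForItem to tell it where the key lives. A minimal sketch (the collection class name and Record fields are illustrative):

```csharp
using System;
using System.Collections.ObjectModel;

class Record
{
    public int Id { get; set; }
    public string Name { get; set; }
    // rest of the fields
}

// KeyedCollection is abstract: derive once and point it at the key property.
class RecordCollection : KeyedCollection<int, Record>
{
    protected override int GetKeyForItem(Record item) => item.Id;
}

class Program
{
    static void Main()
    {
        var records = new RecordCollection
        {
            new Record { Id = 10, Name = "Alpha" },
            new Record { Id = 20, Name = "Beta" }
        };

        // Note: with an int key, the key indexer hides the positional
        // indexer inherited from Collection<T>, so this is a key lookup.
        Console.WriteLine(records[10].Name);     // prints "Alpha"
        Console.WriteLine(records.Contains(20)); // key containment: prints "True"
    }
}
```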
Related
I have come to the conclusion that it is impossible to properly implement GetHashCode() for an NHibernate entity with an identity column. The only working solution I found is to return a constant. See below for the explanation.
This, obviously, is terrible: all dictionary searches effectively become linear. Am I wrong? Is there a workaround I missed?
Explanation
Let's suppose we have an Order entity that refers to one or more Product entities like this:
class Product
{
    public virtual int Id { get; set; } // auto; assigned by the database upon insertion
    public virtual string Name { get; set; }
    public virtual Order Order { get; set; } // foreign key into the Orders table
}
"Id" is what is called an IDENTITY column in SQL Server terms: an integer key that is automatically generated by the database when the record is inserted.
Now, what options do I have for implementing Product.GetHashCode()? I can base it on
Id value
Name value
Identity of the product object (default behavior)
None of these ideas works. If I base my hash code on Id, it will change when the object is inserted into the database. The following was experimentally shown to break, at least in the presence of NHibernate.SetForNet4:
/* add product to order */
var product = new Product { Name = "Sushi" }; // Id is zero
order.Products.Add(product); // GetHashCode() is calculated based on Id of zero
session.SaveOrUpdate(order);
// product.Id is now changed to an automatically generated value from DB
// product.GetHashCode() value changes accordingly
// order.Products collection does not like it; it assumes GetHashCode() does not change
bool isAdded = order.Products.Contains(product);
// isAdded is false;
// the collection is looking up the product by its new hash code and not finding it
Basing GetHashCode() on the object identity (i.e. leaving Product with the default implementation) does not work well either; that has been covered on Stack Overflow before. Basing GetHashCode() on Name is obviously not a good idea if Name is mutable.
So, what is left? The only thing that worked for me was
class Product
{
    ...
    public override int GetHashCode() { return 42; }
}
Thanks for reading through this long question.
Do you have any ideas on how to make it better?
PS: Please keep in mind that this is an NHibernate question, not a collections question. The collection type and the order of operations are not arbitrary; they are tied to the way NHibernate works. For instance, I cannot simply make Order.Products something like an IList; that would have important implications, such as requiring an index/order column.
I would base the hashcode (and equality, obviously) on the Id, that's the right thing to do. Your problem stems from the fact that you modify Id while the object is in the Dictionary. Objects should be immutable in terms of hashcode and equality while they are inside a dictionary or hashset.
You have two options:
Don't populate dictionaries or hashsets before storing items in DB
Before saving an object to the DB, remove it from the dictionaries. Save it to the DB and then add it again to the dictionary.
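The breakage from mutating a hash-relevant field while the object sits in a hash-based collection, and the remove/re-add fix from the second option, can be sketched with plain collections (no NHibernate involved; the Product class here is a minimal stand-in with Id-based equality):

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for an entity whose hash code is based on Id.
class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public override int GetHashCode() => Id.GetHashCode();
    public override bool Equals(object obj) => obj is Product p && p.Id == Id;
}

class Program
{
    static void Main()
    {
        var products = new HashSet<Product>();
        var product = new Product { Name = "Sushi" }; // Id is 0

        products.Add(product);
        product.Id = 42; // simulates the database assigning an identity value
        Console.WriteLine(products.Contains(product)); // False: hash changed in place

        // Option 2: take the object out before the Id changes, re-add afterwards.
        products.Clear();
        var product2 = new Product { Name = "Tempura" };
        products.Add(product2);
        products.Remove(product2); // remove before "saving"
        product2.Id = 43;          // Id assigned by the database
        products.Add(product2);    // re-add under the final hash code
        Console.WriteLine(products.Contains(product2)); // True
    }
}
```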
Update
The problem can also be solved by using other mappings:
You can use a bag mapping; it will be mapped to an IList and should work fine for you. No need to use HashSets or Dictionaries.
If the DB schema is under your control, you may wish to consider adding an index column and making the relation ordered. This will again be mapped to an IList, but with a list mapping.
There are differences in performance, depending on your mappings and scenarios (see http://nhibernate.info/doc/nh/en/#performance-collections-mostefficientupdate)
The title is awful, I know, so here's the long version:
I need to store variable data in a database column -- mostly key-value pairs, but both the number of items and the names of those items are completely unknown at run-time. My initial thinking is to "pickle" the data (a dictionary) into something like a JSON string, which can be stored in the database. When I retrieve the item, I would convert ("unpickle") the JSON string into a normal C# dictionary. Obviously, I don't want anyone directly interacting with the JSON string, though, so the actual property corresponding to the database column should be private, and I would have a public getter and setter that would not be mapped.
private string Data { get; set; }

public Dictionary<string, object> DataDictionary
{
    get
    {
        return Deserialize(Data);
    }
    set
    {
        Data = Serialize(value);
    }
}
The problem, of course, is that EF will refuse to map the private Data property and will instead want to map the public DataDictionary property, which shouldn't be mapped. There are ways around this, I believe, but the complexity this starts generating makes me think I'm going down a rabbit hole I shouldn't. Is my thinking reasonable here, or should I go in a different direction?
I suppose I could simply create a one-to-many relationship with a basic table consisting of just key and value columns, but that feels like a hack. Then again, perhaps that actually is the better route given the inherent limitations of EF?
Have you tried using Complex Types? You should be able to achieve your goal by creating a complex type of string on the EF Model.
Start by adding a complex type to the Model. On the complex type, add a scalar property of type string that will hold the data.
You can then create a property of this complex type on the entity that will hold the data.
The code generator should add a partial class that provides access to the properties for the complex type. Create a new partial class of the complex type and add in the serialisation/de-serialisation code as a property as in your question. You can then use this property to access the data.
The complex type in this example is essentially acting as a wrapper for a value that allows you to store the data value to storage.
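The wrapper property on the partial class might look like the following sketch. It uses System.Text.Json purely for illustration; the class and property names are assumptions, and in EF the Data property would be the scalar mapped on the complex type:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in for the generated complex-type partial class; in EF this would be
// a "partial class" whose Data property is the mapped scalar string.
class ExtendedData
{
    // The mapped scalar column holding the raw JSON payload.
    public string Data { get; set; }

    // Unmapped convenience view over the serialized payload.
    public Dictionary<string, object> DataDictionary
    {
        get => Data == null
            ? new Dictionary<string, object>()
            : JsonSerializer.Deserialize<Dictionary<string, object>>(Data);
        set => Data = JsonSerializer.Serialize(value);
    }
}

class Program
{
    static void Main()
    {
        var ext = new ExtendedData();
        ext.DataDictionary = new Dictionary<string, object> { ["color"] = "red" };
        Console.WriteLine(ext.Data);                    // the raw JSON string
        Console.WriteLine(ext.DataDictionary["color"]); // prints "red"
    }
}
```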
I have a huge in-memory set (~100K records) of plain CLR objects of a defined type. The type has a public property int Id { get; set; }. What is the best .NET structure to hold this data and provide quick access to any item by its Id? More specifically, this data set will be searched by Id inside a loop, so the lookup should be as fast as possible. The search might look like this:
// Find by id
var entity = entities.First(e => e.Id == id);
IEnumerable-based structures like collections and lists will go through every element of the data until the sought element is found. What are the alternatives? I believe there should be a way to search a sorted array by Id, like an index seek in a database.
Thanks
Results of testing: FYI, Dictionary is not just fast, it's incomparable. My small test showed a performance gain from around 3000+ ms (calling First() on an IEnumerable) down to roughly 0 ms (using the [index] on a Dictionary)!
I would go with a Dictionary<TKey, TValue>:
var index = new System.Collections.Generic.Dictionary<int, T>();
where T is the type of objects that you want to access.
This is implemented as a hash table, i.e. looking up an item is done by computing the key's hash value (usually a very quick operation) and using that hash value as an index into a table. It's perhaps a bit of an over-simplification, but with a dictionary it almost doesn't matter how many entries you've stored: access time should stay approximately constant.
To add an entry, do index.Add(entity.Id, entity);
To check whether an item is in the collection, index.ContainsKey(id).
To retrieve an item by ID, index[id].
Dictionary<TKey, TValue>, where TKey is int and TValue is YourEntity.
Example
var dictionary = new Dictionary<int, YourEntity>();
dictionary.Add(obj1.Id, obj1);
// continue
Or if you have a collection of objects, you can create the dictionary using a query
var dictionary = list.ToDictionary(obj => obj.Id, obj => obj);
Note: key values must be unique. If you have a non-unique collection, filter the duplicates first (perhaps by calling Distinct() before creating the dictionary). Alternatively, if you're looping over the collection to build the dictionary manually, check ContainsKey before attempting an Add.
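One way to pre-filter duplicates is to group by the key first, so the dictionary build can never throw. A sketch (the Entity class and the "keep the first per Id" policy are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; substitute your own type.
class Entity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Program
{
    static void Main()
    {
        var list = new List<Entity>
        {
            new Entity { Id = 1, Name = "first" },
            new Entity { Id = 1, Name = "duplicate" },
            new Entity { Id = 2, Name = "second" }
        };

        // GroupBy collapses duplicate keys (here: keep the first per Id)
        // before building the dictionary, so ToDictionary never throws.
        var dictionary = list
            .GroupBy(e => e.Id)
            .ToDictionary(g => g.Key, g => g.First());

        Console.WriteLine(dictionary.Count);   // prints 2
        Console.WriteLine(dictionary[1].Name); // prints "first"
    }
}
```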
Generally in-memory seek is best done with the Dictionary:
System.Collections.Generic.Dictionary<TKey, TValue>
Optionally, when your data set no longer fits in memory, you would use a disk-based B-tree.
Based on the information given, a Hashtable is probably going to be the fastest. The Dictionary<TKey, TValue> class will give you the best trade-off between ease of use and performance. If you truly need maximum performance, I would try all of the following classes; they differ in memory usage, insert speed, and search speed:
ListDictionary
Hashtable
Dictionary
SortedDictionary
ConcurrentDictionary
In addition to performance, you may be concerned with multithreaded access. These two collections provide thread safety:
Hashtable (multiple readers; only one thread allowed to write)
ConcurrentDictionary
It depends on your data. If there is a ceiling on the number of objects and not too many Ids are missing (meaning you can't have more than X objects and you usually have close to X objects), then a regular array indexed by Id is the fastest.
T[] itemList = new T[MAX_ITEMS];
However, if either of those two conditions doesn't hold, a Dictionary is probably the best option.
Dictionary<int, T> itemList = new Dictionary<int, T>();
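A sketch of the array variant, assuming Ids are non-negative and below a known ceiling (MAX_ITEMS and the Item class are illustrative):

```csharp
using System;

// Hypothetical item type; substitute your own.
class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Program
{
    const int MAX_ITEMS = 1000; // assumed ceiling on Id values

    static void Main()
    {
        // The Id indexes directly into the array: O(1) with no hashing at all.
        var itemList = new Item[MAX_ITEMS];
        var item = new Item { Id = 42, Name = "widget" };
        itemList[item.Id] = item;

        Console.WriteLine(itemList[42].Name);   // prints "widget"
        Console.WriteLine(itemList[7] == null); // prints "True": gap for a missing Id
    }
}
```

The trade-off is wasted slots for missing Ids, which is why the answer's two conditions matter.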
I'm thinking of building an e-commerce application with an extensible data model using NHibernate and Fluent NHibernate. By an extensible data model, I mean the ability to define a Product entity and allow a user of the application to extend it with new fields/properties of different data types, including custom data types.
Example:
A Product could have additional fields like:
Size - int
Color - string
Price - decimal
Collection of ColoredImage - name, image (e.g. "Red", red.jpg (binary file))
An additional requirement is to be able to filter the products by these additional/extended fields. How should I implement this?
Thanks in advance.
I think this link describes the kind of thing you want:
http://ayende.com/Blog/archive/2009/04/11/nhibernate-mapping-ltdynamic-componentgt.aspx
More info on dynamic-component:
http://www.mattfreeman.co.uk/2009/01/nhibernate-mapping-with-dynamic-component/
http://bartreyserhove.blogspot.com/2008/02/dynamic-domain-mode-using-nhibernate.html
The idea behind dynamic-component is that you can build your data model without having a one-to-one mapping of database columns to properties. Instead you have a single dictionary property that can contain the data of as many properties as you like. When you fetch the entity, the dictionary receives the data of all columns configured to belong to it. You can extend the database table's schema to include more columns, and that will be reflected in the data model if you update the mapping file accordingly (manually or through code at application start).
To be honest, I do not know how you would query such an entity by the "attributes" property, but if I had to guess, I would use an IN statement against it.
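On the C# side, an entity mapped with dynamic-component typically just exposes an IDictionary property; the columns configured in the mapping are hydrated into it by name. A sketch (the Attributes property name and field names are assumptions, and the NHibernate mapping itself is not shown):

```csharp
using System;
using System.Collections;

// Entity shape for a dynamic-component mapping: NHibernate fills the
// dictionary with the configured columns, keyed by property name.
public class Product
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IDictionary Attributes { get; set; } = new Hashtable();
}

class Program
{
    static void Main()
    {
        var product = new Product { Name = "Shirt" };

        // Extended columns need no dedicated properties on the class.
        product.Attributes["Size"] = 42;
        product.Attributes["Color"] = "Red";

        Console.WriteLine(product.Attributes["Color"]); // prints "Red"
    }
}
```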
One of the options is EAV model (Entity-Attribute-Value).
This model is a good fit if you have a single class in your domain whose table representation would result in a wide table (a large number of columns, many null values).
It was originally designed for the medical domain, where objects may have thousands of columns (symptoms).
Basically you have
Entity (Id) (for example your Product table)
Attribute(Id, ColumnName)
Value(EntityId, AttributeId, value)
You can have some additional metadata tables.
Value is better split into multiple tables, one per type.
For example:
ShortStringValue(EntityId, AttributeId, Value nvarchar(50));
LongStringValue(EntityId, AttributeId, Value nvarchar(2048));
MemoValue(EntityId, AttributeId, Value nvarchar(max));
IntValue(EntityId, AttributeId, Value int);
or even a complex type:
ColorComponentsValue(EntityId, AttributeId, R int, G int, B int );
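As a rough illustration of how the metadata and value tables fit together, here is an in-memory C# mirror of the lookup (all class and field names are assumptions; in practice this join happens in SQL):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory mirror of the Attribute(Id, ColumnName) table.
class AttributeDef
{
    public int Id { get; set; }
    public string ColumnName { get; set; }
}

// In-memory mirror of IntValue(EntityId, AttributeId, Value int).
class IntValue
{
    public int EntityId { get; set; }
    public int AttributeId { get; set; }
    public int Value { get; set; }
}

class Program
{
    static void Main()
    {
        var attributes = new List<AttributeDef>
        {
            new AttributeDef { Id = 1, ColumnName = "Size" }
        };
        var intValues = new List<IntValue>
        {
            new IntValue { EntityId = 100, AttributeId = 1, Value = 42 }
        };

        // Resolve "Size" for entity 100 by joining attribute metadata to values.
        int sizeAttrId = attributes.Single(a => a.ColumnName == "Size").Id;
        int size = intValues
            .Single(v => v.EntityId == 100 && v.AttributeId == sizeAttrId)
            .Value;
        Console.WriteLine(size); // prints 42
    }
}
```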
One of the things from my experience is that you should not have EAV for everything. Just have EAV for a single class, Product for example.
If you have to use extensibility for different base classes, let it be a separate set of EAV tables.
Another thing is that you have to devise a smart materialization strategy for your objects.
Do not pivot these values into a wide row set; pivot just the small number of columns your query criteria need, then return a narrow collection of Value rows for each of the selected objects. Pivoting everything would involve a massive join.
There are some points to consider:
Each value takes extra storage space for foreign keys.
Row-level locking will behave differently for such queries, which may result in performance degradation.
It may result in larger index sizes.
Actually, in a shallow hello-world test, my EAV solution outperformed its static counterpart on a 20-column table in a query with 4 columns involved in the criteria.
A possible option would be to store all the extra fields in an XML structure and use XPath/XQuery to retrieve them from the database.
Each extensible entity in your application will have an XML field, like ExtendedData, which will contain all extra properties.
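A sketch of writing and reading such an ExtendedData value with LINQ to XML (element names are assumptions; on SQL Server you could also query the XML column server-side with XQuery):

```csharp
using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical ExtendedData payload for one product row.
        var extendedData = new XElement("ExtendedData",
            new XElement("Size", 42),
            new XElement("Color", "Red"));

        // The string that would be stored in the XML column.
        string column = extendedData.ToString();

        // Reading it back: parse and pull out elements by name.
        var parsed = XElement.Parse(column);
        Console.WriteLine((int)parsed.Element("Size"));     // prints 42
        Console.WriteLine((string)parsed.Element("Color")); // prints "Red"
    }
}
```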
Another option is to use a non-relational database; they are typically well suited to this kind of thing.
NoSQL databases (CouchDB, MongoDB, Cassandra, ...) let you define your property fields dynamically; you could add fields to your Product class whenever you want.
I was searching for a similar thing and just found N2 CMS (http://n2cms.com), which implements domain extensibility in quite a usable way. It also supports querying over extension fields, which is important. The only downside I found is that it's implemented using HQL, so it would take some time to reimplement it to query using QueryOver/Linq, but the main idea and mappings are there. Take a look at the ContentItem, DetailCollection, and ContentDetail classes, their mappings, and QueryBuilder/DetailCriteria.
I'm using NHibernate as a persistence layer, and in many places in my code I need to retrieve all the columns of a specific table (to show in a grid, for example), but I also need a fast way to get a specific item from this collection.
The ICriteria API let me get the query result either as a unique value of T or a IList of T.
I wonder if there is a way to make NHibernate give me those objects as an IDictionary where the key is the object's Id and the value is the object itself. Doing it myself would mean iterating over the entire original list, which is not very scalable.
Thank you.
If you are working with .NET 3.5, you could use the Enumerable() method from IQuery, then the IEnumerable<T>.ToDictionary() extension method:
var dictionary = query.Enumerable().ToDictionary(r => r.Id);
This way, the list is not iterated over twice.
You mention using ICriteria, but it does not provide a way to lazily enumerate over items, whereas IQuery does.
However, if the number of items returned by your query is too big, you might want to consider querying the database directly with the key you would have used against the IDictionary instance.