I have some tabular data in C#, around 100K (100,000) records, which I have to keep in memory. What is the best way to store this data other than in table format, keeping in mind that I have to filter results based on conditions (like dt.Select("field1=1 and ...")) and sort the result set, just as with a SQL table?
Please suggest another way to retrieve the data. A Dictionary is one option, but how would I retrieve data based on field conditions when using a Dictionary or any other collection?
Assuming you want speed and low memory consumption, try this:
Create a model class which contains a property for each column in the source table. This is your entity.
Read from the source table (if it comes from a database, use a DataReader). Read the data record by record and create an entity for each record. While reading each field of type string, you can optimize a little:
Optimize for speed: read the string and put it directly into the property of the entity.
Optimize for memory: read the string, call String.Intern on it, and put the result into the property.
Store all these entities in a collection. Here you have two choices:
Use a List<Entity> to store it all. You can use LINQ on the list to query your collection. Every query scans the whole list, so this is rather slow, but it is the best solution for memory.
If you know in advance which queries/criteria you are going to use, use one dictionary per set of criteria. For example, if you have the properties "FirstName" and "LastName", make a dictionary which stores your entity as the value and uses a Tuple<string, string> holding the FirstName and LastName values as the key. Now it is extremely fast to query on these values. For sorting, use a SortedDictionary. If a key has duplicates, create a dictionary like this: Dictionary<Tuple<string, string>, List<Entity>>, which will store all records with the same first and last names. I know this solution requires more coding, but it is pretty fast.
Of course you can keep the DataTable solution. If memory is your only concern, try to make a DataReader wrapper which will intern all strings. Wrap your wrapper around the original DataReader and use it to create and fill the DataTable.
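The two collection choices above can be sketched roughly as follows. This is only an illustration under assumptions: the Person entity, its Age column, and the PersonStore helper names are hypothetical and stand in for whatever columns the real table has.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: one property per column of the source table.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

public static class PersonStore
{
    // Memory optimization: interned strings share one instance per distinct value.
    public static Person Create(string firstName, string lastName, int age)
    {
        return new Person
        {
            FirstName = string.Intern(firstName),
            LastName = string.Intern(lastName),
            Age = age
        };
    }

    // Choice 1: a plain list queried with LINQ (a full scan on every query).
    public static List<Person> FilterAndSort(List<Person> people, int minAge)
    {
        return people.Where(p => p.Age >= minAge)
                     .OrderBy(p => p.LastName)
                     .ToList();
    }

    // Choice 2: an index built once for a known set of criteria.
    // Each key maps to a list because several records may share the same names.
    public static Dictionary<Tuple<string, string>, List<Person>> BuildNameIndex(
        IEnumerable<Person> people)
    {
        var index = new Dictionary<Tuple<string, string>, List<Person>>();
        foreach (var p in people)
        {
            var key = Tuple.Create(p.FirstName, p.LastName);
            List<Person> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                bucket = new List<Person>();
                index[key] = bucket;
            }
            bucket.Add(p);
        }
        return index;
    }
}

public static class Program
{
    public static void Main()
    {
        var people = new List<Person>
        {
            PersonStore.Create("Ann", "Lee", 30),
            PersonStore.Create("Bob", "Kim", 20),
            PersonStore.Create("Ann", "Lee", 41)
        };

        var index = PersonStore.BuildNameIndex(people);
        Console.WriteLine(index[Tuple.Create("Ann", "Lee")].Count); // prints 2
        Console.WriteLine(PersonStore.FilterAndSort(people, 25).Count); // prints 2
    }
}
```

The index costs extra memory and has to be kept in sync if entities change, which is the trade-off against the plain list.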
I have a table with 2000 rows. The table's "name" columns are static, but the others are dynamic. The table is as follows:
Can I keep this table inside the program? Which format should I use (class, list, DataTable, struct, JSON, etc.)?
I'll search and update the table in the future.
Thanks.
You can easily put your row data into a MyRow class with Id, Position, Weight and FullEmp properties and then put those row objects into a list of rows like List<MyRow>.
Performance considerations largely depend on what you want to do with your data. How often do you read it? How often do you change it? How often do you insert, append or remove rows? Which properties do you access during your search?
A list is a good starting point that you can use until you find that the performance is unsatisfactory for your use case.
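A minimal sketch of that starting point might look like this. The property types (int, string, double, bool) and the RowTable helper are assumptions, since the original table is not shown.

```csharp
using System;
using System.Collections.Generic;

// Row class with the properties named above; the types are assumed.
public class MyRow
{
    public int Id { get; set; }
    public string Position { get; set; }
    public double Weight { get; set; }
    public bool FullEmp { get; set; }
}

// Hypothetical helper showing a search followed by an update.
public static class RowTable
{
    public static bool UpdateWeight(List<MyRow> rows, int id, double newWeight)
    {
        // List<T>.Find is a linear search and returns null when nothing matches.
        MyRow row = rows.Find(r => r.Id == id);
        if (row == null)
        {
            return false;
        }
        row.Weight = newWeight;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<MyRow>
        {
            new MyRow { Id = 1, Position = "Manager", Weight = 1.5, FullEmp = true },
            new MyRow { Id = 2, Position = "Clerk", Weight = 1.0, FullEmp = false }
        };

        Console.WriteLine(RowTable.UpdateWeight(rows, 2, 1.2)); // prints True
        Console.WriteLine(RowTable.UpdateWeight(rows, 9, 1.2)); // prints False
    }
}
```

With 2000 rows a linear search is usually fine; if lookups by Id dominate, a Dictionary<int, MyRow> is the natural next step.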
I have a data table
and it contains some int columns, some double columns, and some date columns.
What I am trying to do is this:
I want to run double.TryParse on each double column and, if a value fails to parse, store a DBNull value in the corresponding row.
I will do the same for the date and int columns.
Since my data table could have 100,000 records, I don't want to run a loop over each row.
Is this possible with LINQ or some other method?
Thank you
LINQ is not good for batch operations. You should create a stored procedure in your DB and import it into your model (if you are using EF, that is a function import; if you are using LINQ to SQL, a simple drag and drop will do it).
LINQ is no silver bullet for all problems where you need to loop over a (maybe very large) set of data. So if you want to go over each data set and change the values depending on some condition, a foreach loop is your friend.
LINQ is a query language to retrieve data and not some kind of super-fast way to alter large lists or other enumerable objects. It comes in handy if you want to get data from a given object applying some conditions or doing a GroupBy without ending up in a 20-lines unreadable mash of foreach-loops and if-statements.
It doesn't matter whether you do it in a loop or with LINQ; you still need to iterate over the entire data table.
There's no silver bullet that will save you from doing the checks and inserts, I'm afraid.
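For reference, the plain loop the answers refer to could look roughly like this. It is a sketch under assumptions: the column is taken to hold raw text, numbers are parsed with the invariant culture, and the TableCleaner name and Amount column are made up for the example.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class TableCleaner
{
    // Replaces every value in the named column that does not parse as a double
    // with DBNull. Returns the number of cells that were changed.
    public static int NullOutInvalidDoubles(DataTable table, string columnName)
    {
        int changed = 0;
        foreach (DataRow row in table.Rows)
        {
            object value = row[columnName];
            if (value == DBNull.Value)
            {
                continue; // already null, nothing to check
            }

            string text = Convert.ToString(value, CultureInfo.InvariantCulture);
            double parsed;
            if (!double.TryParse(text, NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out parsed))
            {
                row[columnName] = DBNull.Value;
                changed++;
            }
        }
        return changed;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Amount", typeof(string));
        table.Rows.Add("1.5");
        table.Rows.Add("abc");
        table.Rows.Add("2");

        Console.WriteLine(TableCleaner.NullOutInvalidDoubles(table, "Amount")); // prints 1
    }
}
```

The same pattern applies to int.TryParse and DateTime.TryParse for the other column types.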
So I've been trying to better understand the difference between these two, but all I can really find info on is the difference between DataSets and DataTables. A single Array can only hold one data type, whereas from what I can tell, DataTables are basically a generic multidimensional array with a 1:1 relationship to the DataSource stored in memory. Is this accurate? Are DataTables 'just' a generic multidimensional array, or am I missing some fundamental difference?
A DataTable models a database table in memory. The type can track changes etc in order to sync with a database. The columns (dimensions) can be referenced either by index or name.
A DataSet can hold a collection of such tables and the relationships between them (referential integrity constraints).
An array doesn't do any of that.
A DataTable is kind of like a multi-dimensional array in that it's an in-memory data store of a certain "size", but it has significant additional features. For example, each "column" has a name and a specific type, there is change tracking for synchronization with the data store, rows can store null values, etc.
A DataSet is basically an entire "set" of data (ie: multiple DataTables) held in memory.
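A small sketch of the features mentioned above (named and typed columns, null values, change tracking) might help. The table name, column names, and values are invented for the example.

```csharp
using System;
using System.Data;

public static class DataTableDemo
{
    public static DataTable Build()
    {
        var table = new DataTable("People");

        // Each column has a name and its own type, unlike an array.
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        table.Rows.Add(1, "Ann");
        table.Rows.Add(2, DBNull.Value); // a row can hold a database null

        // Mark the current contents as the baseline for change tracking.
        table.AcceptChanges();
        return table;
    }
}

public static class Program
{
    public static void Main()
    {
        DataTable table = DataTableDemo.Build();

        // Columns can be addressed by name or by index.
        table.Rows[0]["Name"] = "Anna";
        Console.WriteLine(table.Rows[0][1]);             // prints Anna

        // The row remembers that it changed since AcceptChanges.
        Console.WriteLine(table.Rows[0].RowState);       // prints Modified
        Console.WriteLine(table.Rows[1].IsNull("Name")); // prints True
    }
}
```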
When should I consider representing the primary-key as classes?
Should we only represent primary keys as classes when a table uses composite-key?
For example:
public class PrimaryKey
{ ... ... ...}
Then
private PrimaryKey _parentID;
public PrimaryKey ParentID
{
    get { return _parentID; }
    set { _parentID = value; }
}
And
public void Delete(PrimaryKey id)
{...}
When should I consider storing data as comma-separated values in a column in a DB table rather than storing them in different columns?
When should I consider representing the table-id columns as classes?
This is much more difficult to answer without knowing your application architecture. If you use an ORM such as NHibernate or LINQ to SQL, it will create classes for you automatically.
In general - if your primary key is a composite and has meaning in your domain, create a class for it.
If it is not a composite, there is no need for a class.
If it has no meaning in your domain it is difficult to justify a class (if creating one, I would probably go with a struct instead of a class, as it would be a value type). The only reason I would use one is if the key needs to be used as is in your code and the constituent parts do not normally need to be accessed separately.
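As an illustration of the struct option, a composite key could be sketched like this. The OrderLineKey type and its two parts are hypothetical; they are not taken from the question's PrimaryKey class, whose body is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key: an order line identified by order and line number.
// A value type with value equality, so two keys with the same parts are equal.
public readonly struct OrderLineKey : IEquatable<OrderLineKey>
{
    public int OrderId { get; }
    public int LineNumber { get; }

    public OrderLineKey(int orderId, int lineNumber)
    {
        OrderId = orderId;
        LineNumber = lineNumber;
    }

    public bool Equals(OrderLineKey other)
    {
        return OrderId == other.OrderId && LineNumber == other.LineNumber;
    }

    public override bool Equals(object obj)
    {
        return obj is OrderLineKey other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (OrderId * 397) ^ LineNumber;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var lines = new Dictionary<OrderLineKey, string>();
        lines[new OrderLineKey(10, 1)] = "Widget";

        // The key is passed around as one unit, for example to a delete method.
        Console.WriteLine(lines.Remove(new OrderLineKey(10, 1))); // prints True
    }
}
```

Implementing IEquatable<T> avoids boxing when the struct is used as a dictionary key.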
When should I consider storing data as comma-separated values in a column in a DB table rather than storing them in different columns?
Never. You should keep your tables normalized, with different columns and tables for the different data. Using comma-separated values is bad practice in this regard, especially considering the fairly poor text manipulation support in SQL.
When should I consider representing the table-id columns as classes?
What do you mean?
When should I consider storing data as comma-separated values in a column in a DB table rather than storing them in different columns?
According to Normalization Rules you shouldn't store multiple values in the same column.
Comma separated values in fields (or any other similar trick) is poor practice and often resorted to either as a stopgap measure (i.e. you find out that you needed a multi-value when it's too late to change the data model) or some legacy cruft.
By having multivalues "encrypted" in a field you lose all the benefits of having a RDBMS model: in particular you make finding/sorting/comparing values in the comma-separated field(s) hard if not impossible to use along with the rest of your data.
In almost all cases one should store each comma-delimited value in its own column. This enables SQL selects to filter rows by specific comma-delimited values.
E.g., if 12 comma-delimited values were available for each table row representing sales per month (Jan-Dec) then storing the numbers in 12 columns enables manipulation; such as: return all rows where August sales > $100,000.00. Had one 'stuffed' all 12 values into a single column then all rows would need to be returned and the column parsed for each row to pull out the August figure.
One example where 'stuffing' may be considered is where the values in a comma-delimited data set cannot be related to or compared with data from another row.
E.g., in a multi-choice questionnaire the answer will be one or more options. Given a question with the correct answer being OPTION A & OPTION B and a second question with a correct answer of OPTION B & OPTION C, one could consider stuffing each answer into a single column. In this example one could store "A,B" and "B,C", if one accepts that there is no need to compare answers with each other.
The only situation in which I might consider using a custom class as the 'primary key' is when using an O/RM against a legacy database that uses composite keys.
On the first question, I am not sure what you are looking for. I create business objects as classes that map to my data layer, which is typically a DataTable containing the data.
The answer to the second question is never. There are very few situations where I would store a comma-separated list instead of creating a normalized data structure.
I have a table full of IDs, categories, and weights that I need to reference in my program as I read in records that contain those categories. What is the most efficient method to read those from a database and put them into a structure that I can reference?
The IDs (and possibly the names) would be unique.
Data might look like:
ID,Category,Weight
1,Assignment,5
2,Test,10
3,Quiz,5
4,Review,3
Your best bet is to read in your table using a DataReader, and put each row into an object containing Category and Weight, then each object into a Dictionary.
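A rough sketch of that approach follows. To stay runnable without a database, it reads from an in-memory DataTable through CreateDataReader; a real command's reader would be passed in the same way. The CategoryInfo and CategoryLoader names are assumptions, and keying the dictionary by ID (rather than by category name) is a choice made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical holder for the non-key columns of one row.
public class CategoryInfo
{
    public string Category { get; set; }
    public int Weight { get; set; }
}

public static class CategoryLoader
{
    // Reads every row from the reader into a dictionary keyed by ID.
    public static Dictionary<int, CategoryInfo> Load(IDataReader reader)
    {
        var result = new Dictionary<int, CategoryInfo>();

        // Resolve column positions once instead of on every row.
        int idCol = reader.GetOrdinal("ID");
        int categoryCol = reader.GetOrdinal("Category");
        int weightCol = reader.GetOrdinal("Weight");

        while (reader.Read())
        {
            result[reader.GetInt32(idCol)] = new CategoryInfo
            {
                Category = reader.GetString(categoryCol),
                Weight = reader.GetInt32(weightCol)
            };
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the database table, using the sample data from the question.
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Category", typeof(string));
        table.Columns.Add("Weight", typeof(int));
        table.Rows.Add(1, "Assignment", 5);
        table.Rows.Add(2, "Test", 10);
        table.Rows.Add(3, "Quiz", 5);
        table.Rows.Add(4, "Review", 3);

        using (IDataReader reader = table.CreateDataReader())
        {
            Dictionary<int, CategoryInfo> categories = CategoryLoader.Load(reader);
            Console.WriteLine(categories[2].Weight); // prints 10
        }
    }
}
```

If the incoming records carry the category name rather than the ID, a Dictionary<string, CategoryInfo> keyed by Category would be the equivalent structure.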
If you're using a later version of .NET, you could always use LINQ to just grab that data for you.
If you want to avoid a database hit to fetch static data, you can hard-code the values into a common class in your solution. A Dictionary collection would work fine here too.
The trade-off, of course, is that there are two locations to maintain for any future changes.