Loading data from SQL database into a list in C#

I'm working on an application that highlights the dates on which a certain event will occur.
In the database I have to use, there is a table that contains date intervals or collections of dates that have special meaning (like collections of holidays, school breaks...). When an event is defined, the user specifies on which day(s) that event occurs (if it occurs periodically), but the user can also specify that the event does or does not occur during a special interval.
For instance, there can be an event that occurs every Sunday, if that day is not a holiday.
Now, I've solved the periodic part by reading data from the database for that event (determining on which days it occurs) and then filling a list with the dates that will be highlighted (for the graphical representation I use a MonthCalendar and its BoldedDates array):
for (DateTime day = start; day < end; day = day.AddDays(1))
{
    if (Sunday && day.DayOfWeek == DayOfWeek.Sunday)
    {
        MarkedDates.Add(day);
    }
}
// Assign once, after the loop has collected every date
monthCalendar1.BoldedDates = MarkedDates.ToArray();
Now, for this code to work properly, I need to skip the dates that belong to a special interval (in this case, if the date is a holiday). That could easily be done by adding another condition to the if clause:
!SpecialDates.Contains(day)
The problem is that I don't know how to fill the list with these special dates from the database. I can't simply hard-code them, because these special date collections/intervals can be changed at any time. So, my question is: how can I read the data from the database using SQL and C# and save it in the list?
Thanks
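Once the special dates are in memory, the loop with the extra condition might look like the minimal sketch below. It is an illustration, not the asker's code: `EventCalendar` and `SundaysExcluding` are hypothetical names, a `HashSet<DateTime>` replaces the `List<DateTime>` so each lookup is constant time, and every date is normalised with `.Date` so a stored time of day does not prevent a match.

```csharp
using System;
using System.Collections.Generic;

public static class EventCalendar
{
    // Returns every Sunday in [start, end) that is not one of the special dates.
    // Assumption: specialDates holds the holidays already read from the database.
    public static List<DateTime> SundaysExcluding(
        DateTime start, DateTime end, IEnumerable<DateTime> specialDates)
    {
        // Normalise to midnight so 2013-12-22 00:00 and 2013-12-22 10:00 compare equal.
        var excluded = new HashSet<DateTime>();
        foreach (DateTime d in specialDates)
            excluded.Add(d.Date);

        var marked = new List<DateTime>();
        for (DateTime day = start.Date; day < end; day = day.AddDays(1))
        {
            if (day.DayOfWeek == DayOfWeek.Sunday && !excluded.Contains(day))
                marked.Add(day);
        }
        return marked;
    }
}
```

The result would then be assigned once: `monthCalendar1.BoldedDates = EventCalendar.SundaysExcluding(start, end, specialDates).ToArray();`.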

As it looks like you're using SQL Server 2008 R2, I would recommend using an ORM tool like ADO.NET Entity Framework to access your database; it's best to go with the latest release, EF 5.
There are tons of tutorials online on how to get up & running with it - a good one being Creating Model Classes with the Entity Framework.
To give you an idea of how simple it makes things, here is an example of the minimal amount of code you would need to achieve what you're looking to do:
using (var db = new MyDbContext())
{
    // MyDbContext and SpecialDatesTable stand for your own context class and DbSet
    var specialDates = db.SpecialDatesTable.ToList();
}
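If Entity Framework is not an option, plain ADO.NET can fill the list as well. `SqlCommand.ExecuteReader()` returns a `SqlDataReader`, which implements `IDataReader`, so this sketch is written against that interface; the table and column names in the comment are hypothetical. Because `DataTable.CreateDataReader()` also returns an `IDataReader`, the method can be tried without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SpecialDateLoader
{
    // Reads one date/datetime column from any IDataReader into a list.
    // Against SQL Server the reader would come from something like
    // new SqlCommand("SELECT SpecialDate FROM SpecialDates", connection).ExecuteReader()
    // (table and column names are hypothetical).
    public static List<DateTime> ReadDates(IDataReader reader, string columnName)
    {
        var dates = new List<DateTime>();
        int ordinal = reader.GetOrdinal(columnName);
        while (reader.Read())
        {
            // Skip NULLs instead of letting GetDateTime throw on them.
            if (!reader.IsDBNull(ordinal))
                dates.Add(reader.GetDateTime(ordinal).Date);
        }
        return dates;
    }
}
```

Wrap the connection, command, and reader in `using` blocks so they are closed even if reading fails.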

Related

Parsing and inserting bulk data. How to keep performance and do relations?

The data
I have a collection with around 300,000 vacations. Every vacation has several categories, countries, cities, activities and other subobjects. This data needs to be inserted into a MySQL / SQL Server database. I have the luxury of being able to truncate the entire database and start clean every time the parser program is run.
What I have tried
I have tried working with Entity Framework, which is also where my preference lies. To keep Entity Framework's performance up, I have created a construction where 300 items are taken out of the vacations collection, parsed and inserted by Entity Framework, and its context is disposed thereafter. The program finishes in a matter of minutes using this method. If I fill the context with all 300k vacations from the collection (and their subobjects), it takes hours.
int total = vacationsObjects.Count;
for (int i = 0; i < total; i += Math.Min(300, (total - i)))
{
    var set = vacationsObjects.Skip(i).Take(300);
    int enumerator = 0;
    using (var database = InitializeContext())
    {
        foreach (VacationModel vacationData in set)
        {
            enumerator++;
            Vacations vacation = new Vacations
            {
                ProductId = vacationData.ExternalId,
                Name = vacationData.Name,
                Description = vacationData.Description,
                Price = vacationData.Price,
                Url = vacationData.Url,
            };
            foreach (string category in vacationData.Categories)
            {
                // Only finds categories already tracked by this batch's context
                var existingCategory = database.Categories.Local.FirstOrDefault(c => c.CategoryName == category);
                if (existingCategory != null)
                    vacation.Categories.Add(existingCategory);
                else
                {
                    vacation.Categories.Add(new Category
                    {
                        CategoryName = category
                    });
                }
            }
            database.Vacations.Add(vacation);
        }
        database.SaveChanges();
    }
}
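Depending on the source type and the .NET version, `Skip(i)` may enumerate and discard the first `i` items on every pass, so the batching above can get slower as `i` grows. A single-pass batching helper is a simple alternative; this is a sketch with a hypothetical `Batch` name (on .NET 6 and later, the built-in `Enumerable.Chunk` does the same job).

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtensions
{
    // Splits a sequence into consecutive batches of at most batchSize items,
    // enumerating the source only once.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list so earlier batches stay intact
            }
        }
        if (batch.Count > 0)
            yield return batch; // final, possibly shorter batch
    }
}
```

The outer loop would then become `foreach (List<VacationModel> set in vacationsObjects.Batch(300)) { using (var database = InitializeContext()) { ... } }` with the same body as before.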
The downside (and possibly dealbreaker) with this method is figuring out the relationships. As you can see when adding a Category I check if it's already been created in the local context, and then use that. But what if it has been added in a previous set of 300? I don't want to query the database multiple times for every vacation to check whether an entity already resides within it.
Possible solution
I could keep a dictionary in memory containing the categories that have been added. I'd need to figure out how to attach these categories to the proper vacations (or vice-versa) and insert them, including their respective relations into the database.
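A minimal sketch of that in-memory dictionary, assuming the tables start empty on every run (the question says the database can be truncated), so the program can assign primary keys itself instead of asking the database. Each distinct category name gets one ID the first time it is seen, in any batch, and each vacation only needs to record IDs. `CategoryRegistry` is a hypothetical name, and names are compared with `StringComparer.Ordinal` to match the `==` in the question's code.

```csharp
using System;
using System.Collections.Generic;

public sealed class CategoryRegistry
{
    // Category name -> client-assigned primary key. The registry lives for the
    // whole run, so it survives across the 300-item batches and their contexts.
    private readonly Dictionary<string, int> _ids =
        new Dictionary<string, int>(StringComparer.Ordinal);

    public int GetOrAdd(string name)
    {
        int id;
        if (!_ids.TryGetValue(name, out id))
        {
            id = _ids.Count + 1; // keys 1, 2, 3, ... in first-seen order
            _ids.Add(name, id);
        }
        return id;
    }

    // Every (name, id) pair, e.g. for inserting the Categories table once.
    public IEnumerable<KeyValuePair<string, int>> All
    {
        get { return _ids; }
    }
}
```

With EF, client-assigned keys also mean telling EF not to expect database-generated values, for example with `[DatabaseGenerated(DatabaseGeneratedOption.None)]` on the key property; the same IDs could instead be written out for a bulk import tool.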
Possible alternatives
Segregate the context and the transaction -
This is purely theoretical; I do not know if I'm making any sense here. Maybe I could have EF's context keep track of all objects and take manual control over the inserting part. I have messed around with this, trying to work with manual transaction scopes, to no avail.
Stored procedure -
I could write a stored procedure that handles and inserts my data. I'm not a big fan of this alternative, as I would like to keep the flexibility of switching between MySQL and SQL Server. Also, I would be in the dark as to where to begin.
Intermediary CSV file -
Instead of inserting parsed data directly into the RDBMS, I could export it into one or more CSV files and make use of importing tools such as MySQL's LOAD DATA INFILE.
Alternative database systems
Databases such as Azure Table Storage, MongoDB or RavenDB could be an option. However, I would prefer to stick to a traditional RDBMS due to compatibility with my skillset and tools.
I have been working on and researching this problem for a couple of weeks now. It seems like the best way of finding a solution that fits is by simply trying the different possibilities and observing the result. I was hoping that I could receive some pointers or tips from your personal experiences.
If you insert each record separately, the whole operation will take a lot of time. The bottleneck is the SQL queries between client and server. Each query takes time, so try to avoid issuing many of them. For large amounts of data it is much better to process it locally. The best solution is to use a dedicated import tool. In MySQL you can use LOAD DATA; in MSSQL there is BULK INSERT. To import your data, you need a .csv file.
To handle foreign keys correctly, you must populate the tables manually before inserting. If the destination tables are empty, you can simply create a .csv file with predefined primary and foreign keys. Otherwise you can import the existing records from the server, update them with your data, then export them back.
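A sketch of writing such a file from rows whose keys were assigned in advance. It quotes every field and doubles embedded quotes (the RFC 4180 convention). Whether the importer honours quoting depends on its options, for example `FIELDS ENCLOSED BY '"'` for MySQL's LOAD DATA or `FORMAT = 'CSV'` for BULK INSERT on SQL Server 2017 and later, so check the format against your tool. `CsvExport` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvExport
{
    // Wraps a field in double quotes and doubles any embedded quote characters.
    // Note: null becomes an empty string, which most importers load as '' rather
    // than NULL; adjust this if a column needs real NULLs.
    public static string Quote(string value)
    {
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Writes one CSV line per row. Keys (VacationId, CategoryId, ...) are expected
    // to be assigned by the program before export, so related rows already match.
    public static void Write(TextWriter writer, IEnumerable<string[]> rows)
    {
        foreach (string[] row in rows)
            writer.WriteLine(string.Join(",", row.Select(Quote)));
    }
}
```

One file per table (vacations, categories, and the vacation/category link table) would be loaded parents first, so the foreign keys in the link table already point at existing rows.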
Time
Since you can afford to make only INSERTs, one suggestion is to try the Entity Framework Bulk Insert extension. I have used it to save up to 200K records and it works fine. Just include it in your project and write something like this:
context.BulkInsert(listOfEntities);
This should solve the time dimension of your problem (or at least greatly improve on the plain EF version).
Data integrity
Keeping everything in one transaction does not sound reasonable (I expect 300K parent records to generate at least 3M records overall), so I would try the following approach:
1) insert your entities using bulk insert.
2) call a stored procedure to check data integrity.
If the insertion takes a long time and the chance of failure is relatively high, you can make the process resumable, so that a rerun skips what has already been loaded:
1) do smaller bulk inserts, each covering a batch of vacation records and all their child records, and make sure each batch runs in a transaction. A single BULK INSERT runs atomically (no transaction needed); for several, it seems trickier.
2) if the process fails, you have complete vacation data in your database (no partially imported vacations).
3) restart the process, but first load the existing vacation records (parents only). With EF, a faster way is to use AsNoTracking to spare the tracking overhead (which is great for large lists):
var existingVacations = context.Vacation.Select(v => v.VacationSourceIdentifier).AsNoTracking();
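With those identifiers in hand, the rerun can filter out finished vacations in memory. A sketch, assuming the stored source identifier is the same value as the parsed model's external ID (that mapping and the `ResumeHelper` name are hypothetical); materialise the query once, e.g. with `ToList()`, and the helper loads it into a `HashSet` so each check is constant time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResumeHelper
{
    // Returns only the items whose key is not among the keys already loaded.
    // existingKeys would come from e.g.
    // context.Vacation.Select(v => v.VacationSourceIdentifier).ToList().
    public static List<T> SkipAlreadyLoaded<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> keySelector, IEnumerable<TKey> existingKeys)
    {
        var loaded = new HashSet<TKey>(existingKeys);
        return items.Where(item => !loaded.Contains(keySelector(item))).ToList();
    }
}
```

For example: `var todo = ResumeHelper.SkipAlreadyLoaded(vacationsObjects, v => v.ExternalId, existingVacations.ToList());`, then bulk insert only `todo`.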
As suggested by Alexei, EntityFramework.BulkInsert is a very good solution if your model is supported by this library.
You can also use Entity Framework Extensions (PRO Version), which allows you to use BulkSaveChanges and Bulk Operations (Insert, Update, Delete and Merge).
It supports both of your providers: MySQL and SQL Server.
// Upgrade SaveChanges performance with BulkSaveChanges
var context = new CustomerContext();
// ... context code ...
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(operation => operation.BatchSize = 1000);
// Use direct bulk operation
context.BulkInsert(customers);
Disclaimer: I'm the owner of the project Entity Framework Extensions

How to model a Recurring Query based calendar with dependencies on resources like recurring tasks and timekeeping

I'm storing calendar events as a table/list of events. I want to change this to something more flexible so I can perform queries more easily.
For example, I want a design that lets me ask something like:
Calendar.Employee().Between("Jan 1 2014").and("July 4 2014").Get("TimeSheets John").All("Recurring").TwoWeeks();
or
Calendar.Resource().Between("2014").and("2017").Get("Conf Room Boston").All("Recurring").ToList();
Could you please share a framework or a DB model for the table(s), and explain how the server-side object would build that recurring event into an object model, so that I can:
Query tasks within a time frame, e.g. last week
Query tasks based on recurring models (every 2 weeks)
Persist recurring events, with dependencies on TaskID, conference room, etc.
In a related Q&A someone recommended templates. What would a template look like in a DB table? I saw that SO question, but it did not help much.
Can you also give me some guidance on whether I'm using the wrong class here? I'm using the Calendar class. Would I be better served by DDay?
How about using your existing objects with LINQ?
One of your queries might look like this:
CalendarItems.Where(c =>
    c.Date.Year >= 2014 &&
    c.Date.Year <= 2017 &&
    c.Room == "Conf Room Boston" &&
    c.isRecurring == true &&
    c.RecurringType == RecurringTypes.BiWeekly);
For your database interface, I'd recommend Entity Framework. It will seamlessly convert between tables and the regular C# classes you're used to.
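On the question of how a stored recurring event maps to objects: one common design stores a single template row per series (first start, duration, interval, optional end date) and expands it into concrete occurrences only for the window being queried. The sketch below illustrates that idea; `RecurringEvent` and its members are hypothetical, not part of the `Calendar` class or DDay.

```csharp
using System;
using System.Collections.Generic;

// One stored row per series, e.g. "Conf Room Boston, every 14 days from 2014-01-06 09:00".
public sealed class RecurringEvent
{
    public string Resource { get; set; }
    public DateTime FirstStart { get; set; }
    public TimeSpan Duration { get; set; }
    public TimeSpan Interval { get; set; }   // e.g. TimeSpan.FromDays(14) for "every two weeks"
    public DateTime? Until { get; set; }     // null = repeats forever

    // Yields the start of each occurrence that overlaps [windowStart, windowEnd).
    public IEnumerable<DateTime> OccurrencesBetween(DateTime windowStart, DateTime windowEnd)
    {
        if (Interval <= TimeSpan.Zero)
            throw new InvalidOperationException("Interval must be positive.");

        // Jump close to the window instead of stepping through every earlier
        // occurrence; this lands on or just before the first overlapping one.
        long skip = 0;
        if (windowStart > FirstStart + Duration)
            skip = (windowStart - FirstStart - Duration).Ticks / Interval.Ticks;

        for (DateTime start = FirstStart + TimeSpan.FromTicks(skip * Interval.Ticks);
             start < windowEnd && (Until == null || start <= Until.Value);
             start += Interval)
        {
            if (start + Duration > windowStart)   // occurrence overlaps the window
                yield return start;
        }
    }
}
```

A fixed `TimeSpan` interval covers daily, weekly, and every-two-weeks series; monthly or yearly rules would need calendar arithmetic such as `AddMonths`, and the sketch ignores time zones and daylight-saving changes.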

.Where clause on DateTime.Month

Here is the situation:
I'm working on a basic search for a somewhat big entity. Right now, the amount of results is manageable but I expect a very large amount of data after a year or two of use, so performance is important here.
The object I'm browsing has a DateTime value and I need to be able to output all objects with the same month, regardless of the year. There are multiple search fields that can be combined, but the other fields do not cause a problem here.
I tried this:
if(model.SelectedMonth != null)
{
contribs = contribs.Where(x => x.Date.Value.Month == model.SelectedMonth);
}
model.Contribs = contribs
.Skip(NBRESULTSPERPAGE*(model.CurrentPage - 1))
.Take(NBRESULTSPERPAGE)
.ToList();
So far all I get is "Invalid 'where' condition. An entity member is invoking an invalid property or method." I thought of just invoking ToList() but it doesn't seem to be very efficient, again the entity is quite big. I'm looking for a clean way to make this work.
You said:
The object I'm browsing has a DateTime value and I need to be able to output all objects with the same month, regardless of the year
...
I expect a very large amount of data after a year or two of use, so performance is important here.
Right there, you have a problem. I understand you are using LINQ to CRM, but this problem would actually come up regardless of what technology you're using.
The underlying problem is that date and time is stored in a single field. The year, month, day, hour, minute, seconds, and fractional seconds are all packed into a single integer that represents the number of units since some time. In the case of a DateTime in .NET, that's the number of ticks since 1/1/0001. If the value is stored in a SQL DateTime2 field, it's the same thing. Other data types have different start dates (epochs) and different precisions. But in all cases, there's just a single number internally.
If you're searching for a value that is in a month of a particular year, then you could get decent performance from a range query. For example, give all values >= 2014-01-01 and < 2014-02-01. Those two points can be mapped back to their numeric representation in the database. If the field has an index, then a range query can use that index.
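The two bounds of such a range can be computed in C# and passed to the query as plain `DateTime` values. A sketch with a hypothetical `MonthRange` helper; the end bound is the first day of the next month, used with `<`, so values late on the last day of the month are not missed.

```csharp
using System;

public static class MonthRange
{
    // Returns [start, end) for the given month: start inclusive, end exclusive.
    public static Tuple<DateTime, DateTime> For(int year, int month)
    {
        var start = new DateTime(year, month, 1);   // throws if month is outside 1..12
        return Tuple.Create(start, start.AddMonths(1));
    }
}
```

Copying the bounds into locals first (`DateTime monthStart = r.Item1, monthEnd = r.Item2;`) keeps the expression simple; whether a given provider translates `x.Date >= monthStart && x.Date < monthEnd` is worth verifying.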
But if the value you're looking for is just a month, then any query you provide will require the database to extract that month from each and every value in the table. This is also known as a "table scan", and no amount of indexing will help.
A query that can effectively use an index is known as a sargable query. However, the query you are attempting is non-sargable because it has to evaluate every value in the table.
I don't know how much control over the object and its storage you have in CRM, but I will tell you what I usually recommend for people querying a SQL Server directly:
Store the month in a separate column as a single integer. Sometimes this can be a computed column based on the original datetime, such that you never have to enter it directly.
Create an index that includes this column.
Use this column when querying by month, so that the query is sargable.
This is a guess and really should be a comment, but it's too much code to format well in a comment. If it's not helpful I'll delete the answer.
Try moving model.SelectedMonth to a variable rather than putting it in the Where clause
var selectedMonth = model.SelectedMonth;
if(selectedMonth != null)
{
contribs = contribs.Where(x => x.Date.Value.Month == selectedMonth);
}
You might do the same for CurrentPage as well:
int currentPage = model.CurrentPage;
model.Contribs = contribs
.Skip(NBRESULTSPERPAGE*(currentPage - 1))
.Take(NBRESULTSPERPAGE)
.ToList();
Many query providers work better with local variables than with properties of unrelated objects.
What is the type of model.SelectedMonth?
According to your code logic it is nullable, and it appears that it might be a struct, so does this work?
if (model.SelectedMonth.HasValue)
{
contribs = contribs.Where(x => x.Date.Value.Month == model.SelectedMonth.Value);
}
You may need to create a Month OptionSet Attribute on your contrib Entity, which is populated via a plugin on the create/update of the entity for the Date Attribute. Then you could search by a particular month, and rather than searching a Date field, it's searching an int field. This would also make it easy to search for a particular month in the advanced find.
The LINQ to CRM provider isn't a full-fledged version of LINQ. It generally doesn't support any sort of operation on the attribute in your where clause, since the query has to be converted to whatever QueryExpressions support.

How to have a database trigger code

I need some help with this.
I have a table with a date column, and when any of the dates in that column equals the server's date, I need to tell my website/program to send out an email or perform some other notification action to let the user know something.
I was thinking of having a program running on the server that polls the database at certain intervals, but the problem is that if the date is 01/31/11 10:30 AM and my interval is every 5 minutes, the polling can be inaccurate, i.e. the poll may only run at 10:35 AM. In other words, I need the database to somehow notify something exactly when "x" date is reached.
I'd like to avoid a 1-second polling interval, as I think that would be a huge performance hit.
I'm using ASP.NET MVC 3 with MSSQL and LINQ Entity framework.
Any creative ideas?
You could use Quartz.net to setup those events. Quartz is pretty flexible and powerful - and it was meant for this sort of thing.
Do not have the database trigger the code. Have a trigger create a row in another table with information about what just happened.
Have a separate program periodically read from the second table to email users or whatever you need to do. Have that program delete the row from the table once it's done with the email.
I don't have any personal experience, but Sql Server CLR Integration might be the answer you are looking for. From the description it sounds like you can write almost anything that will compile against the .NET framework and deploy it to a sql server instance and Sql Server will be able to execute it. http://msdn.microsoft.com/en-us/library/ms254498.aspx
You either need to make use of a scheduler (e.g. DBMS_SCHEDULER in Oracle, SQL Server Agent jobs, etc.) or find a third-party tool such as Quartz.net, as mentioned by another responder. Or you could code something like the following into a polling app:
select all jobs due in the next 5 minutes, ordered by due date
while there are jobs
    if the next job is due, action it
    else sleep until the next job is due
loop
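The polling loop above might be sketched in C# as follows. It assumes a separate query has already fetched the jobs due in the next interval; `DueJob`, `DueJobRunner`, and the injected `now`/`sleep` delegates are hypothetical, the delegates being there so the loop can be tested without real waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class DueJob
{
    public int Id { get; set; }
    public DateTime DueAt { get; set; }
}

public static class DueJobRunner
{
    // Runs the jobs fetched for the current window in due-date order,
    // sleeping until each one is due before executing it.
    public static void RunWindow(
        IEnumerable<DueJob> jobsInWindow,
        Action<DueJob> execute,
        Func<DateTime> now,
        Action<TimeSpan> sleep)
    {
        foreach (DueJob job in jobsInWindow.OrderBy(j => j.DueAt))
        {
            TimeSpan wait = job.DueAt - now();
            if (wait > TimeSpan.Zero)
                sleep(wait);   // not due yet: sleep until it is
            execute(job);
        }
    }
}
```

In production the call would pass `() => DateTime.Now` and `Thread.Sleep`, and run once per polling interval on a background thread.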
This is a bit dirty, but I think it will give you the functionality you're looking for.
In Global.asax.cs:
// Requires: using System; using System.Threading;
public DateTime LastMaxDateTime;

protected void Application_Start(object sender, EventArgs e)
{
    LastMaxDateTime = GetMaxDateTime();
    Thread bgThread = new Thread(BackgroundThread_CheckDatabase);
    bgThread.IsBackground = true; // don't keep the process alive on shutdown
    bgThread.Start();
}

private void BackgroundThread_CheckDatabase()
{
    while (true)
    {
        try
        {
            DateTime dtMaxDateTime = GetMaxDateTime();
            if (dtMaxDateTime > this.LastMaxDateTime)
            {
                // Send notifications
                this.LastMaxDateTime = dtMaxDateTime;
            }
        }
        catch (Exception)
        {
            // An unhandled exception on a background thread would take down the
            // whole application, so log it here and retry on the next pass.
        }
        Thread.Sleep(5000); // 5 seconds
    }
}

private DateTime GetMaxDateTime()
{
    // Return the result of something like "SELECT MAX(DateTimeColumn) FROM [MyTable]"
    throw new NotImplementedException();
}
Basically, the code keeps track of the newest DateTime in your table and, on each poll, checks whether the database now contains a newer one than the last time it checked. If so, you can send out your notifications. If you don't expect enough records in your table to cause a race condition, I don't see a problem with this as a quick solution.
The most efficient way to do it is to have an application that is event-driven instead of polling.
For example, have a thread query the database for the earliest scheduled event and sleep until then. Then have another thread synchronously wait for a table change (e.g. in PostgreSQL this would be the NOTIFY/LISTEN statements) and signal the first thread to check if the earliest event has changed.
The easiest way is to keep track of the date of your last check. When you check again, pull all rows greater than the last check date and less than or equal to the new check date. To make sure you execute them, you could add a column for when the action was performed and update that. With an index on that new column there shouldn't be any performance problem with checking it every second for rows with a NULL DateExecuted.
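A sketch of the DateExecuted idea. Because unexecuted rows are identified by a NULL DateExecuted, this version only needs the upper bound (due at or before now); a row missed by an earlier check is then picked up by the next one. `ScheduledAction` and `WindowPoller` are hypothetical names, and the in-memory list stands in for the table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ScheduledAction
{
    public int Id { get; set; }
    public DateTime DueAt { get; set; }
    public DateTime? DateExecuted { get; set; }   // NULL until the action has run
}

public static class WindowPoller
{
    // Rows that are due and have not run yet: the WHERE clause of the polling query.
    public static List<ScheduledAction> DueNow(IEnumerable<ScheduledAction> rows, DateTime now)
    {
        return rows
            .Where(r => r.DateExecuted == null && r.DueAt <= now)
            .OrderBy(r => r.DueAt)
            .ToList();
    }

    // Executes each due row and stamps it so the next poll skips it.
    public static int RunDue(IEnumerable<ScheduledAction> rows, DateTime now, Action<ScheduledAction> execute)
    {
        int count = 0;
        foreach (ScheduledAction r in DueNow(rows, now))   // materialised list, safe to modify rows
        {
            execute(r);
            r.DateExecuted = now;
            count++;
        }
        return count;
    }
}
```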
You could also read ahead, sort the upcoming items by trigger date, and sleep (for example with Thread.Sleep() or a timer) until the next one comes up, to be precise.

How to refresh xml cache after one day from the last modified date?

I am developing a web app using ASP.NET 2.0 (C#), where on the home page I display recently added records. Records are added at a rate of around 1-5 per day, so I decided not to put extra load on the SQL server by fetching the recent records from the DB server on every request.
So, to cache the data I use XML files: I generate the XML file from a DataSet (the ds.WriteXml method in .NET). Let's say that today (10 Jan 2008 12:30:00) the file recent-cache.xml is created. That cache file is then valid for one day.
If the difference between the current date and the file's last-modified date is greater than or equal to 1 day, the cache XML file must be generated again with new data from the DB server.
So I want code that gets the last-modified date of the XML file and finds the difference between it and the current date.
Please also tell me whether my approach is a good one, or whether there is some other easy and fast technique.
Thanks
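For the file-based approach described in the question, the last-modified check might look like this sketch. It uses `File.GetLastWriteTimeUtc` with a UTC "now" so a daylight-saving change cannot shift the comparison; `XmlCacheFile` and `IsStale` are hypothetical names.

```csharp
using System;
using System.IO;

public static class XmlCacheFile
{
    // True when the cache file is missing or was last written maxAge or longer ago.
    public static bool IsStale(string path, TimeSpan maxAge, DateTime nowUtc)
    {
        if (!File.Exists(path))
            return true;
        DateTime lastWriteUtc = File.GetLastWriteTimeUtc(path);
        return nowUtc - lastWriteUtc >= maxAge;
    }
}
```

Usage might be `if (XmlCacheFile.IsStale(path, TimeSpan.FromDays(1), DateTime.UtcNow)) ds.WriteXml(path);`, where `path` (e.g. from `Server.MapPath`) is hypothetical and `ds` is the freshly loaded DataSet.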
You might consider using the ASP.NET Cache API, which exists to do the sort of job you're describing. You can add any object (like an XmlDocument, or a DataSet) to the Cache collection and specify how long you want it in there like so:
Cache.Insert("MyCacheKey", myObjectToCache, null, DateTime.Now.AddDays(1), Cache.NoSlidingExpiration);
Then you could get at your cached data with a function like this:
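const string CACHE_KEY = "MyCacheKey";
private DataSet RecentlyAdded()
{
    // Read the item once: it could expire between a null check and a second read.
    DataSet data = Cache[CACHE_KEY] as DataSet;
    if (data == null)
    {
        data = GetRecentlyAddedFromDatabase();
        Cache.Insert(CACHE_KEY, data, null, DateTime.Now.AddDays(1), Cache.NoSlidingExpiration);
    }
    return data;
}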
The caching API has lots of other neato features but this would accomplish what you want without having to roll your own file-based solution.
Note that if your app shuts down before the day is up, the Cache will shut down with it, and the "recently added" database query will have to run again the next time the data is requested.
edit: changed cache key to a string constant so it only has to be specified once.
