Loading lots of upfront data. . sync or async . - c#

so i have a winforms apps that downloads a set of data syncronously on startup. This obviously takes a while but then when any of the services or GUI classes load, they all have this data. I could change this to put on a background thread but then every component that needs access to this data would continuously have to get notified when this data was ready. This seems like bad design for every one of my classes that depends on this data to be loaded to have a If (Loaded) check or have to subscribe to a loaded event . . . any ideas?
Any other ideas?

I've written a number of applications that have similar behaviour to what you describe, and have three suggestions for you ...
Splash Screen
Add a splash screen to your application that displays the status of a number of startup steps. I've used this in the past when an application has a number of steps that have to occur on startup, before the user gets to use the application - confirmation of identity and authorisation of access through Active Directory, contact database for system information, loading static data, initial contact with specified web services, checking that prerequisites (like Crystal reports) are installed and working, etc etc.
Subscription
Have each component register interest with your data loader, and be notified when the data is available. This is the observer pattern, nothing wrong with it, though managing the subscriptions can be a bit messy.
Lazy Loading
Design the rest of your application to request the data as late as possible, giving as wide an opportunity for background loading to complete as possible. Users who are quick off the mark after startup have to wait for necessary data to load; users who take their time (maybe they started the application and then switched to Outlook) find response is immediate.

I would suggest you use the Observer Pattern and setup all the classes that rely on the data set being loaded. To minimize the amount of time the user needs to wait you could also consider implemented two categories of classes those that need the entire dataset to function and those that can function once a subset of the data has been loaded.
Observer Design Pattern

Everything I do in a desktop app, whether it is Winforms or WPF, I try to do on a background thread. This is done for two reasons. First, the user experience is better. Second, in WPF testing I have found it to be a better performer when loading a lot of data, like records into a grid or list.
Loading data upfront vs lazy loading is really a per-application customization. I would build a central data object that handles both scenarios. The way I might recommend doing this is to create an event driven dependency model. What I mean by this is that you can place an event or callback registration function on a data manager object that various units of code subscribe to when they need to use data, and then they are called back when the data are available. If the data are already available then the callback occurs immediately. Else, the code unit is called back when the data are loaded from a background thread. For example, in some window or component you might have some code that looks like:
DataManager.LoadDataAsync(dataCommandPatternObject, CallBackFunction);
...
public void CallbackFunction(SomeDataObjectClass data)
{
//load data into UI
}
If data loading is done through a central mechanism then if the same data are requested twice, a cache version can be used or the second request can wait if the first request is still running.
If data needs to be loaded up-front, a loading screen (splash screen) can request a number of pieces of data, and when each block of data loads a callback is fired. When all the callbacks have fired, the splash screen exists.
These are just a few points from some various techniques I have used over the years to manage the loading of large data-sets of mostly static/lookup data. On top of all of this, I would also recommend some sort of client-side disk caching for very large datasets that rarely change, and implement some sort of change tracking in the database. This would allow this data to be loaded from local disk by the client, which is faster that going to a DB. It also lets the DB scale better, since it is not serving out data that it highly repetitive, and instead it can focus on transactional data.

Related

Best practices to enable fast startup for a Windows Forms Application (.NET 4.0 C#)

I have made a rather complex .NET 4.0 (C#) Windows Forms application using Visual Studio 2013. The question is quite general though, and should be applicable for other versions of .NET and VS as well.
On startup the system reads config file, parses file folders and reads file content, reads data from database, performs a web request and adds data to a lot of controls on the main startup form.
I want to avoid having a splash screen with "waiting-hourglass", the goal is to make the application startup fast and show the main form immediately.
My solution has been to use backgroundworker for some of the startup tasks, making the application visible and responsive while data are fetched. The user can then choose to navigate away from the startup form and start doing other tasks without having to wait for all the startup procedures to be completed.
Is use of backgroundworker suitable for this?
What other methods should be considered instead of, or in addition to, backgroundworker to enable fast startup for an application with a lot of startup procedures?
In my applications I use a splash screen. However, I do not show a waiting-hourglass. Instead it shows a status line where current action is written, e.g. "Read config file", "Connect to database", "Perform web request", etc.
Of course, the application does not start faster but the user does not have the feeling of a hanging program and it appears faster.
In any case it depends if early access availablity makes sense for the user. A good way would also be to just preload the first page / form / tab before the user can see the interface (Splashscreen or loading bar before that).
When the first bits are loaded you could asynchroniously cache more data and only allow the user switching pages / tabs when the caching of these components is completed (you will have to display a "still loading" message or grey out other tabs while doing this to not confuse the user).
You can also just load addditional data if the user chooses to use the page / tab / feature to reduce loading unneccesary information but this will lead to waiting while using the application - it`s up to you.
Technically, as BackgroundWorker is explicitly labeled as obsolete in .NET 4.5 you should see if the introduced await/async would be a more elegant solution for you (See MSDN Asynchronous Programming with Async and Await Introduction)
MSDN says:
The async-based approach to asynchronous programming is preferable to
existing approaches in almost every case. In particular, this approach
is better than BackgroundWorker for IO-bound operations because the
code is simpler and you don't have to guard against race conditions.
See a comparison thread Background Worker vs Await/Async
See a well commented example of backgroundworker code to load GUI data if you choose to use that technique
Rather an advice than an answer:
Is use of backgroundworker suitable for this? - yes.
What other methods should be considered instead of, or in addition to, backgroundworker to enable fast startup for an application with a lot of startup procedures? - consider on-demand a.k.a. lazy loading of data. That is, load the data only when they are actually needed rather than query everything at once possibly many of them without ever being used or looked at. If this is not possible as of your UI setup, consider refining your UI and rethink whether everything should be displayed as is. For example, use separate windows or expanders to display details and query the data when they are made visible. This does not only save you time on app startup but also makes sure that you display any data in an up-to-date manner.

Listening events in a web service or API over Database changes

I have this scenario, and I don't really know where to start. Suppose there's a Web service-like app (might be API tho) hosted on a server. That app receives a request to proccess some data (through some method we will call processData(data theData)).
On the other side, there's a robot (might be installed on the same server) that procceses the data. So, The web-service inserts the request on a common Database (both programms have access to it), and it's supposed to wait for that row to change and send the results back.
The robot periodically check the database for new rows, proccesses the data and set some sort of flag to that row, indicating that the data was processed.
So the main problem here is, what should the method proccessData(..) do to check for the changes of the data row?.
I know one way to do it: I can build an iteration block that checks for the row every x secs. But i don't want to do that. What I want to do is to build some sort of event listener, that triggers when the row changes. I know it might involve some asynchronous programming
I might be dreaming, but is that even possible in a web enviroment.?
I've been reading about a SqlDependency class, Async and AWait classes, etc..
Depending on how much control you have over design of this distributed system, it might be better for its architecture if you take a step back and try to think outside the domain of solutions you have narrowed the problem down to so far. You have identified the "main problem" to be finding a way for the distributed services to communicate with each other through the common database. Maybe that is a thought you should challenge.
There are many potential ways for these components to communicate and if your design goal is to reduce latency and thus avoid polling, it might in fact be the right way for the service that needs to be informed of completion of this work item to be informed of it right away. However, if in the future the throughput of this system has to increase, processing work items in bulk and instead poll for the information might become the only feasible option. This is also why I have chosen to word my answer a bit more generically and discuss the design of this distributed system more abstractly.
If after this consideration your answer remains the same and you do want immediate notification, consider having the component that processes a work item to notify the component(s) that need to be notified. As a general design principle for distributed systems, it is best to have the component that is most authoritative for a given set of data to also be the component to answer requests about that data. In this case, the data you have is the completion status of your work items, so the best component to act on this would be the component completing the work items. It might be better for that component to inform calling clients and components of that completion. Here it's also important to know if you only write this data to the database for the sake of communication between components or if those rows have any value beyond the completion of a given work item, such as for reporting purposes or performance indicators (KPIs).
I think there can be valid reasons, though, why you would not want to have such a call, such as reducing coupling between components or lack of access to communicate with the other component in a direct manner. There are many communication primitives that allow such notification, such as MSMQ under Windows, or Queues in Windows Azure. There are also reasons against it, such as dependency on a third component for communication within your system, which could reduce the availability of your system and lead to outages. The questions you might want to ask yourself here are: "How much work can my component do when everything around it goes down?" and "What are my design priorities for this system in terms of reliability and availability?"
So I think the main problem you might want to really try to solve fist is a bit more abstract: how should the interface through which components of this distributed system communicate look like?
If after all of this you remain set on having the interface of communication between those components be the SQL database, you could explore using INSERT and UPDATE triggers in SQL. You can easily look up the syntax of those commands and specify Stored Procedures that then get executed. In those stored procedures you would want to check the completion flag of any new rows and possibly restrain the number of rows you check by date or have an ID for the last processed work item. To then notify the other component, you could go as far as using the built-in stored procedure XP_cmdshell to execute command lines under Windows. The command you execute could be a simple tool that pings your service for completion of the task.
I'm sorry to have initially overlooked your suggestion to use SQL Query Notifications. That is also a feasible way and works through the Service Broker component. You would define a SqlCommand, as if normally querying your database, pass this to an instance of SqlDependency and then subscribe to the event called OnChange. Once you execute the SqlCommand, you should get calls to the event handler you added to OnChange.
I am not sure, however, how to get the exact changes to the database out of the SqlNotificationEventArgs object that will be passed to your event handler, so your query might need to be specific enough for the application to tell that the work item has completed whenever the query changes, or you might have to do another round-trip to the database from your application every time you are notified to be able to tell what exactly has changed.
Are you referring to a Message Queue? The .Net framework already provides this facility. I would say let the web service manage an application level queue. The robot will request the same web service for things to do. Assuming that the data needed for the jobs are small, you can keep the whole thing in memory. I would rather not involve a database, if you don't already have one.

Threads in asp.net (C#)

When a user visits an .aspx page, I need to start some background calculations in a new thread. The results of the calculations need to be stored in the user's Session, so that on a callback, the results can be retrieved. Additionally, on the callback, I need to be able to see what the status of the background calculation is. (E.g. I need to check if the calculation is finished and completed successfully, or if it is still running) How can I accomplish this?
Questions
How would I check on the status of the thread? Multiple users could have background calculations running at the same time, so I'm unsure how the process of knowing which thread belongs to which user would work.. (though in my scenario, the only thread that matters, is the thread originally started by user A -- and user A does a callback to retrieve/check on the status of that thread).
Am I correct in my assumption that passing an HttpSessionState "Session" variable for the user to the new thread, will work as I expect (e.g. I can then add stuff to their Session later).
Thanks. Also I have to say, I might be confused about something but it seems like the SO login system is different now, so I don't have access to my old account.
Edit
I'm now thinking about using the approach described in this article which basically uses a class and a Singleton to manage a list of threads. Instead of storing my data in the database (and incurring the performance penalty associated with retrieving the data, as well as the extra table, maintenance, etc in the database), I'll probably store the data in my class as well.
Edit 2
The approach mentioned in my first edit worked well. Additionally I had timers to ensure the threads, and their associated data, were both cleaned up after the corresponding timers called their cleanup methods. The Objects containing my data and the threads were stored in the Singleton class. For some applications it might be appropriate to use the database for storage but it seemed like overkill for mine, since my data is tied to a specific instance of a page, and is useless outside of that page context.
I would not expect session-state to continue working in this scenario; the worker may have no idea who the user is, and even if it does (or more likely: you capture this data into the worker), no reason to store anything (updating session is a step towards the end of the request pipeline; but if you aren't in the pipeline...?).
I suspect you might need to store this data separately using some unique property of the user (their id or cn), or invent a GUID otherwise. On a single machine it may suffice to store this in a synchronised dictionary (or similar), but on a farm/cluster you may need to push the data down a layer to your database or state server. And fetch manually.

Improve perceived WPF app startup time

I have a WPF database viewer application: It's a simple main window containing a user control with a data grid showing the data extracted from an SQLite database.
The problem is that this application takes 6 seconds to start until it is usable.
I tried building the user control (and doing all the data loading) in the constructor of the main window:
The splash screen will be shown 5s this way, then followed by 1s of empty main window until the application is ready to be used.
Users said that it takes too long until something (visually) happens.
I then moved the user control creation (and data loading) into the Loaded event handler of the main window:
The splash screen will be shown 3s, followed by 3s of empty main window until the application is ready.
Users said that it is "better", but don't like the fact that a half finished main window is shown in disabled state for so long.
Is there some general advice to be found about perceived application load time or are there any other recommendations about how this situation can be improved?
I believe ideally the main window would be shown as fast as possible, along with some hour glass or spinner until the data is loaded. But then I cannot just move the user control creation into a background worker as this would be done on the wrong thread.
Does anybody have any suggestions to this problem?
Edit:
Note that right now I've just assigned a LINQ-to-EF query as the grid data source.
One possible improvement may be to load this data into a data table in background and assign it only once loaded...
Edit2:
I'm using .net 4 with System.Data.SQLite and EF4 to load the data. There are more or less 4000 rows and 30 columns.
Load your data asynchronous. Present something nice on the GUI for the user while loading. The following code can help you with this:
BackgroundWorker bgWorker = new BackgroundWorker() { WorkerReportsProgress=true};
bgWorker.DoWork += (s, e) => {
// Load here your file/s
// Use bgWorker.ReportProgress(); to report the current progress
};
bgWorker.ProgressChanged+=(s,e)=>{
// Here you will be informed about progress and here it is save to change/show progress.
// You can access from here savely a ProgressBars or another control.
};
bgWorker.RunWorkerCompleted += (s, e) => {
// Here you will be informed if the job is done.
// Use this event to unlock your gui
};
bgWorker.RunWorkerAsync();
The app is not faster but it seems to be much faster because the GUI is immediately visible and responsive. Maybe you also can show the user a part of the loaded data while loading the rest. Use the ProgressChanged-event for doing this.
Update
I'm not sure if I understand your problem right. If your problem is not the time data needs to be loaded, then something is odd in your application. WPF is IMO very fast. Control-creation does not takes a lot of time. I visualize much bigger lists as you mention in some milliseconds.
Try to look if you have something in your UI that hinders the DataGrid to virtualize the Items. Maybe you have a proplem there. To analyse WPF apps, I can recommend you the WPF Profiling Tools.
The most obvious thing you can do is to profile your application and find the bottlenecks in start up time. It sounds like the most likely culprit will be the loading of data from your database.
One lesson I've learnt is that if you're using an ORM, when loading large datasets if you favour POCO (Plain Old CLR/C# Objects) over ORM-generated database entities (see example below), the load time will be a lot faster and RAM usage will be significantly decreased too. The reason for this is that EF will try to load the entire entity (i.e. all of it's fields) and possibly a whole load of data related to your entities, most of which you won't even need. The only time you really need to work directly with entities is when you're doing insert/update/delete operations. When reading data you should only get fields that your application needs to display and/or validate.
If you follow the MVVM pattern, the above architecture isn't hard to implement.
Example of loading data into POCOs with EF:
var query = from entity in context.Entities
select new EntityPoco
{
ID = entity.ID,
Name = entity.Name
};
return query.ToList();
POCOs are very simple classes with autoproperties for each field.
We usually have repositories for each entity in our applications and each repository is responsible for getting/updating data related to that entity. The view models have references to the repositories they need so they don't use EF directly. When users make changes that need to be persisted, we use other methods in the repository that then load only a subset of entities (i.e. the ones the user changed) and apply the necessary updates - with some validation done by the viewmodel and possibly other validation going on in the DB via constraints/triggers, etc.
There are many reasons for this.
1) Deployment machine might have fairly low configuration.
2) In-Proper or problem with data binding.
Possible solutions would be:
1) Lazy load the data
2) Optimize the performance. http://msdn.microsoft.com/en-us/library/aa970683.aspx
I had saw applications render 5M records less than a second in wpf.
PS:One more least possible reasons may be 30 columns, due to column order access.

Optimizing Smart Client Performance

I have a smart client (WPF) that makes calls to the server va services (WCF). The screen I am working on holds a list of objects that it loads when the constructor is called. I am able to add, edit and delete records in the list.
Typically what I am doing is after every add or delete I am reloading the entire model from the service again, there are a number off reasons for this including the fact that the data may have changed on the server between calls.
This approach has proved to be a big hit on perfomance because I am loading everything sending the list up and down the wire on Add and Edit.
What other options are open to me, should I only be send the required information to the server and how would I go about not reloading all the data again ever time an add or delete is performed?
The optimal way of doing what you're describing (I'm going to assume that you know that client/server I/O is the bottleneck already) is to send only changes in both directions once the client is populated.
This can be straightforward if you've adopted a journaling model for updates to the data. In order for any process to make a change to the shared data, it has to create a time-stamped transaction that gets added to a journal. The update to the data is made by a method that applies the transaction to the data.
Once your data model supports transaction journals, you have a straightforward way of keeping the client and server in sync with a minimum of network traffic: to update the client, the server sends all of the journal entries that have been created since the last time the client was updated.
This can be a considerable amount of work to retrofit into an existing design. Before you go down this road, you want to be sure that the problem you're trying to fix is in fact the problem that you have.
Make sure this functionality is well-encapsulated so you can play with it without having to touch other components.
Have your source under version control and check in often.
I highly recommend having a suite of automated unit tests to verify that everything works as expected before refactoring and continues to work as you perform each change.
If the performance hit is on the server->client transfer of data, moreso than the querying, processing and disk IO on the server, you could consider devising a hash of a given collection or graph of objects, and passing the hash to a service method on the server, which would query and calculate the hash from the db, compare the hashes, and return true or false. Only if false would you then reload data. This works if changes are unlikely or infrequent, because it requires two calls to get the data, when it has changed. If changes in the db are a concern, you might not want to only get the changes when the user modifies or adds something -- this might be a completely separate action based on a timer, for example. Your concurrency strategy really depends on your data, number of users, likelihood of more than one user being interested in changing the same data at the same time, etc.

Categories

Resources