Optimizing Smart Client Performance - C#

I have a smart client (WPF) that makes calls to the server via services (WCF). The screen I am working on holds a list of objects that it loads when the constructor is called. I am able to add, edit and delete records in the list.
Typically, after every add or delete I reload the entire model from the service. There are a number of reasons for this, including the fact that the data may have changed on the server between calls.
This approach has proved to be a big performance hit, because I am reloading everything and sending the full list up and down the wire on every add and edit.
What other options are open to me? Should I only send the required information to the server, and how would I go about not reloading all the data every time an add or delete is performed?

The optimal way of doing what you're describing (I'm going to assume that you know that client/server I/O is the bottleneck already) is to send only changes in both directions once the client is populated.
This can be straightforward if you've adopted a journaling model for updates to the data. In order for any process to make a change to the shared data, it has to create a time-stamped transaction that gets added to a journal. The update to the data is made by a method that applies the transaction to the data.
Once your data model supports transaction journals, you have a straightforward way of keeping the client and server in sync with a minimum of network traffic: to update the client, the server sends all of the journal entries that have been created since the last time the client was updated.
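As a rough illustration, a journal entry can be as small as a time-stamped description of one change, and client sync becomes "give me everything after sequence N". All names below (JournalEntry, IJournalService, ClientSynchronizer) are hypothetical sketches, not from the original design:

```csharp
using System;
using System.Collections.Generic;

// Sketch only: every name here is illustrative.
public enum ChangeKind { Add, Update, Delete }

public class JournalEntry
{
    public long Sequence;          // monotonically increasing id
    public DateTime TimestampUtc;  // when the change was made
    public ChangeKind Kind;
    public int RecordId;
    public string Payload;         // e.g. the serialized record
}

public interface IJournalService
{
    // Returns only the entries created after the given sequence number.
    IEnumerable<JournalEntry> GetJournalSince(long lastSequence);
}

public class ClientSynchronizer
{
    private long _lastSequence;

    // Pull only the changes made since the last sync and apply them locally.
    public void Synchronize(IJournalService service, IDictionary<int, string> localData)
    {
        foreach (var entry in service.GetJournalSince(_lastSequence))
        {
            switch (entry.Kind)
            {
                case ChangeKind.Add:
                case ChangeKind.Update:
                    localData[entry.RecordId] = entry.Payload;
                    break;
                case ChangeKind.Delete:
                    localData.Remove(entry.RecordId);
                    break;
            }
            _lastSequence = entry.Sequence;
        }
    }
}
```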
This can be a considerable amount of work to retrofit into an existing design. Before you go down this road, you want to be sure that the problem you're trying to fix is in fact the problem that you have.

Make sure this functionality is well-encapsulated so you can play with it without having to touch other components.
Have your source under version control and check in often.
I highly recommend having a suite of automated unit tests to verify that everything works as expected before refactoring and continues to work as you perform each change.
If the performance hit is on the server-to-client transfer of data, more so than on the querying, processing and disk I/O on the server, you could consider devising a hash of a given collection or graph of objects. The client passes its hash to a service method on the server, which queries the database, calculates the hash of the current data, compares the two, and returns true or false. Only if they differ does the client reload the data. This works if changes are unlikely or infrequent, because it requires two calls to get the data whenever it has changed.

If changes in the database are a concern, you might not want to fetch changes only when the user modifies or adds something; that could be a completely separate action based on a timer, for example. Your concurrency strategy really depends on your data, the number of users, the likelihood of more than one user changing the same data at the same time, and so on.
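A minimal sketch of the hash comparison, assuming client and server serialize rows identically (the IsDataCurrent service operation is invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class CollectionHash
{
    // Computes a SHA-256 hash over a stable, serialized form of the data.
    // Client and server must serialize identically (same order, same format)
    // for the comparison to be meaningful.
    public static string Compute(IEnumerable<string> serializedRows)
    {
        var builder = new StringBuilder();
        foreach (var row in serializedRows)
            builder.Append(row).Append('\n');

        using (var sha = SHA256.Create())
        {
            byte[] bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(builder.ToString()));
            return Convert.ToBase64String(bytes);
        }
    }
}

// Client side (IsDataCurrent is a hypothetical service operation):
//   string localHash = CollectionHash.Compute(rows.Select(Serialize));
//   if (!service.IsDataCurrent(localHash))
//       rows = service.GetAllRows();   // reload only when hashes differ
```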

Related

In C#, how can I get an event from MySQL when somebody inserts, deletes or modifies a record?

I am developing a program in WPF (.NET), and I need to know when somebody makes a change to any table of the database.
The idea is to receive an event from the database when it is changed. I have read a lot of articles, but I can't find a method that solves my problem.
Kind Regards
The best solution is to use a message queue. After your app commits a change to the database, the app also publishes a message on the message queue. Other clients then just wait for notifications on that message queue.
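As a rough sketch, publishing a change notification after a commit might look like this with the RabbitMQ .NET client (pre-7.x API; the queue name and host are placeholders, and any broker would work equally well):

```csharp
using System.Text;
using RabbitMQ.Client;  // NuGet: RabbitMQ.Client

public class ChangePublisher
{
    // After the app commits a change to the database, publish a small
    // notification message; other clients subscribe to the same queue.
    public void PublishChange(string tableName, long rowId)
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using (var connection = factory.CreateConnection())
        using (var channel = connection.CreateModel())
        {
            channel.QueueDeclare(queue: "db-changes", durable: true,
                                 exclusive: false, autoDelete: false);

            byte[] body = Encoding.UTF8.GetBytes($"{tableName}:{rowId}");
            channel.BasicPublish(exchange: "", routingKey: "db-changes",
                                 basicProperties: null, body: body);
        }
    }
}
```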
There are a few other common solutions, but all of them have disadvantages.
Polling. If a client is interested in recent changes, they run a query searching for new data every N seconds.
The downside is you have to keep polling even during times when there are no changes. You might have to poll very frequently, depending on how promptly you need to notice the changes. This adds to database load just to support the polling queries.
Also, it costs more if you have many clients all polling. In one system I supported, the database was struggling to process 30,000 queries per second just from clients running polling queries.
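For concreteness, a bare-bones polling loop over MySQL Connector/NET might look like this (table, columns, and interval are placeholders):

```csharp
using System;
using System.Threading;
using MySql.Data.MySqlClient;  // NuGet: MySql.Data (Connector/NET)

public class ChangePoller
{
    // Polls for rows newer than the last seen timestamp; note the loop
    // runs (and costs database time) even when nothing has changed.
    public void Poll(string connectionString)
    {
        DateTime lastSeen = DateTime.UtcNow;
        while (true)
        {
            using (var conn = new MySqlConnection(connectionString))
            using (var cmd = new MySqlCommand(
                "SELECT id, updated_at FROM orders WHERE updated_at > @since", conn))
            {
                conn.Open();
                cmd.Parameters.AddWithValue("@since", lastSeen);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        lastSeen = reader.GetDateTime("updated_at");
                        Console.WriteLine($"Row {reader.GetInt64("id")} changed");
                    }
                }
            }
            Thread.Sleep(TimeSpan.FromSeconds(5));  // poll interval
        }
    }
}
```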
Change Data Capture. Using the binary log as a de facto message queue, because it records all the changes. Use a client tool such as Debezium, or write your own binlog tail client (this is a lot of work).
The downside is the binlog records all changes, not just those you want to be notified about. You have to filter it somehow. Also you have to learn how to use Debezium or equivalent tool.
Triggers. Write a trigger on the table that invokes a UDF to post notification outside the database. This is a bad idea, because the trigger executes when your insert/update/delete executes, not when the transaction commits. Clients could be notified of changes before the changes are committed, so if they go query the database right after they get the notification, the change is not visible to them yet.
It's also a disadvantage that it requires you to install a UDF extension in MySQL Server; MySQL doesn't normally have any way of posting an external notification.
I'm not a C# developer so I can't suggest specific code. But the general methods above are similar regardless of which language the app is written in.
I don't think this is possible with MySQL; DBs like MongoDB have this sort of feature.
You may like to use the method described in this answer.
Essentially, have date/time fields on rows so that you can pull data modified since a certain date/time. Or you could use a CQRS/event-sourcing strategy and maybe use a message queue.

What is the best method for monitoring a large number of clients reliably with good performance?

This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C#.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, reading each log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn’t finish reading all of the client statuses before it’s called again, the client statuses at the end of the list might not be read and updated.
Secondly, if any client in the list goes offline during this period, the updating hangs until that client is back online again.
So I require a different approach, and have thought up a few possible ways to resolve this.
1. Spawn a separate thread for each client, to read its log file and update its progress.
a. However, I’m not sure if having 1000 threads running on my machine is something that would be acceptable.
2. Test the connection for each machine first, before trying to read the file; if it cannot connect, ignore that client for this iteration and move on to the next one in the list.
a. This still has the same problem of not getting through the list before the next call, and it adds more delay because it tests the connection via a port first. With 1000 clients, this would be noticeable.
3. Have each client send the data to the machine running the application whenever there is an update.
a. This could create a lot of chatter with 1000 machines trying to send data repeatedly.
So I’m trying to figure out whether there is another, more efficient and reliable method that I haven’t considered, or which one of these would be the best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pull the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are doing this monitoring the wrong way. Instead of keeping all logs in text files, you would do better to collect them in a central data repository, which can be of any kind. Given that you are monitoring the performance of those systems, your design and the mechanism behind it must not impact the performance of the target systems negatively; with the current design, the disk and CPU can be involved so heavily in certain cases that the monitoring causes a performance issue of its own.
I recommend creating a log repository server using a fast in-memory database like Redis, and sending logged data directly to that server. Keep in mind that this database should run on a separate machine. You can then tune Redis to persist received data to physical disk once a particular number of entries is reached or a particular interval elapses. The in-memory feature is advantageous here, as you may need to query the information a lot in a monitoring application like this. On the other hand, Redis's performance is high enough that it can handle processing millions of entries efficiently.
The blueprint is:
1- Centralize all log data in a single repository.
2- Configure clients to send monitored information to the centralized repository (a sketch follows below).
3- Have the main server (the monitoring system) read the data from the centralized repository when required.
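A rough sketch of step 2 using the StackExchange.Redis client (the key naming scheme and connection details are placeholders):

```csharp
using System;
using StackExchange.Redis;  // NuGet: StackExchange.Redis

public class LogShipper
{
    private readonly IDatabase _db;

    public LogShipper(string redisHost)
    {
        // One multiplexer per process is the recommended usage.
        var redis = ConnectionMultiplexer.Connect(redisHost);
        _db = redis.GetDatabase();
    }

    // Each client appends its log lines to a per-client Redis list;
    // the monitoring server reads (or trims) these lists when needed.
    public void Ship(string clientId, string logLine)
    {
        string entry = $"{DateTime.UtcNow:O} {logLine}";
        _db.ListRightPush($"logs:{clientId}", entry);
    }
}
```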
I'm not trying to advertise a particular tool here; I'm only sharing my own experience. There are many more tools that you can use for this purpose, such as Elasticsearch.

SQL CLR Web Service Call: Limiting Overhead

I'm attempting to improve query performance for an application and I'm logically stuck.
So the application is proprietary and thus we're unable to alter application-side code. We have, however, received permission to work with the underlying database (surprisingly enough). The application calls a SQL Server database, so the current idea we're running with is to create a view with the same name as the table and rename the underlying table. When the application hits the view, the view calls one of two SQL CLR functions, which both do nothing more than call a web service we've put together. The web service performs all the logic, and contains an API call to an external, proprietary API that performs some additional logic and then returns the result.
This all works; however, we're having serious performance issues when scaling up to large data sets (100,000+ rows). The clear source of this is the fact that we have to hit the web service one row at a time, including the API call each time, which makes for a lot of latency overhead.
The obvious solution to this is to figure out a way to limit the number of times that the web service has to be hit per query, but this is where I'm stuck. I've read about a few different ways out there for potentially handling scenarios like this, but as a total database novice I'm having difficulty getting a grasp on what would be appropriate in this situation.
If there are any ideas/recommendations out there, I'd be very appreciative.
There are probably a few things to look at here:
Is your SQLCLR TVF streaming the results out (i.e. are you adding to a collection and then returning that collection at the end, or are you releasing each row as it is completed -- either with yield return or building out a full Enumerator)? If not streaming, then you should do this as it allows for the rows to be consumed immediately instead of waiting for the entire process to finish.
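For reference, a streaming SQLCLR TVF skeleton looks roughly like this; GetRowsFromService stands in for the actual web service call and is hypothetical:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class WebServiceTvf
{
    // Streaming sketch: rows are handed to SQL Server one at a time via
    // "yield return" instead of being buffered in a collection first, so
    // they can be consumed immediately.
    [SqlFunction(FillRowMethodName = "FillRow",
                 TableDefinition = "Id INT, Value NVARCHAR(4000)")]
    public static IEnumerable GetRows()
    {
        foreach (KeyValuePair<int, string> row in GetRowsFromService())
        {
            yield return row;   // streamed, not collected
        }
    }

    public static void FillRow(object obj, out SqlInt32 id, out SqlString value)
    {
        var row = (KeyValuePair<int, string>)obj;
        id = row.Key;
        value = row.Value;
    }

    private static IEnumerable<KeyValuePair<int, string>> GetRowsFromService()
    {
        // Placeholder: call the web service and yield each returned row.
        yield break;
    }
}
```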
Since you are replacing a Table with a View that is sourced by a TVF, you are naturally going to have performance degradation since TVFs:
don't report their actual number of rows. T-SQL Multi-statement TVFs always appear to return 1 row, and SQLCLR TVFs always appear to return 1000 rows.
don't maintain column statistics. When selecting from a Table, SQL Server will automatically create statistics for columns referenced in WHERE and JOIN conditions.
Because of these two things, the Query Optimizer is not going to have an easy time generating an appropriate plan if the actual number of rows is 100k.
How many SELECTs, etc. are hitting this View concurrently? Since the View is hitting the same URI each time, you are bound by the concurrent connection limit imposed by ServicePointManager (ServicePointManager.DefaultConnectionLimit). And the default limit is a whopping 2! Meaning, all additional requests to that URI, while there are already 2 active/open HttpWebRequests, will wait in line, patiently. You can increase this by setting the .ServicePoint.ConnectionLimit property of the HttpWebRequest object.
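For example (the limit of 20 is arbitrary; tune it to your expected concurrency):

```csharp
using System.Net;

public static class ConnectionTuning
{
    public static HttpWebRequest CreateRequest(string uri)
    {
        // Raise the process-wide default before any requests are created
        // (the built-in default is only 2 concurrent connections per URI).
        ServicePointManager.DefaultConnectionLimit = 20;

        var request = (HttpWebRequest)WebRequest.Create(uri);
        // Or raise it per endpoint via the request's ServicePoint:
        request.ServicePoint.ConnectionLimit = 20;
        return request;
    }
}
```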
How often does the underlying data change? Since you switched to a View, which doesn't take any parameters, you are always returning everything. This opens the door to doing some caching, and there are (at least) two options:
cache the data in the Web Service, and if it hasn't reached a particular time limit, return the cached data; else get fresh data, cache it, and return it (a sketch follows below).
go back to using a real Table. Create a SQL Server Agent job that will, every few minutes (or maybe longer if the data doesn't change that often): start a transaction, delete the current data, repopulate via the SQLCLR TVF, and commit the transaction. This requires the extra piece of the SQL Agent job, but you are then back to having more accurate statistics!
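A minimal sketch of the first option, a time-based cache inside the web service (the names and the five-minute window are illustrative; FetchFromApi is hypothetical):

```csharp
using System;
using System.Collections.Generic;

public class CachedDataProvider
{
    private static readonly object _lock = new object();
    private static List<string> _cachedRows;
    private static DateTime _cachedAtUtc;
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);

    // Returns cached data while it is fresh enough; otherwise fetches
    // fresh data, caches it, and returns it.
    public List<string> GetRows()
    {
        lock (_lock)
        {
            if (_cachedRows == null || DateTime.UtcNow - _cachedAtUtc > MaxAge)
            {
                _cachedRows = FetchFromApi();
                _cachedAtUtc = DateTime.UtcNow;
            }
            return _cachedRows;
        }
    }

    private List<string> FetchFromApi()
    {
        return new List<string>(); // placeholder for the real API call
    }
}
```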
For more info on working with SQLCLR in general, please visit: SQLCLR Info

How to get real time update of data to main warehouse

All,
Need some info.
We have stores at multiple locations and use a client-server app installed for sales activity.
Sales data is stored in a database which is set up in each store.
At the end of the day, a batch pulls data from all of the store locations and updates the main warehouse database.
We want a real-time implementation so that whenever there is a transaction at any store, the data is updated immediately in the main warehouse repository.
Any clue as to how we can achieve real-time updates of data to the main warehouse?
Thanks in advance...
One approach to this is called replication. There are several ways to do it in SQL Server. You're probably looking for transaction replication or merge replication.
Here's a place to start in the SQL Server 2012 documentation.
And here's a fairly recent overview that might be helpful.
You should make sure you understand what "real time" means, and how real-time you really need to be. If you are not pre-aggregating data before storing it in the warehouse, then you should be able to set up replication between the database servers (if they can talk to each other). If you are loading an aggregate, then it gets tricky because you have to merge the measures (facts) into the warehouse's existing measures, which is tough. If you don't need true real time, just a slow trickle, then consider simply running your current process on a schedule in SQL Agent.
First off, why not run the batch multiple times a day? It would not really be "real-time" but might yield good enough real-world results.
One option would be to implement master-master replication provided by the SQL engine in use. This probably means that some steps need to be taken to guard against duplicate IDs, auto-increment mismatches, etc. For example, we have a master-master system set up so that one server produces entries with odd IDs and the other with even.
Another approach could be that all reads are performed against local databases, while all writes go to a single remote master, with data replicated in a master-slave setup. This would provide the best data consistency, but a slow network would make any writes slow. We have this kind of setup implemented on top of the master-master replication, as most interactions are reads.
One real-world use case I have come across for a similar stores/warehouse setup was based on Firebird SQL. Every single table had triggers that recorded every action on the local database in so-called log tables, and a replication application ran at all times, regularly checking these log tables, pushing the data to a remote database and pulling in new data from the remote (which had its own log tables). As a downside, it was a horror to maintain: triggers needed to be updated whenever something changed in the database setup, and the replication application would fail or hang at times. Data consistency was maintained well, though, with conflicts avoided by using negative IDs for the local database and positive ones for the master/remote. But in the end it did not really provide true "real-time" behavior.
In the end, there is no one-size-fits-all answer, and books could probably be written on the topic. Research and Google are your friends.

Listening for database changes in a web service or API

I have this scenario, and I don't really know where to start. Suppose there's a web-service-like app (it might be an API, though) hosted on a server. That app receives a request to process some data (through some method we will call processData(data theData)).
On the other side, there's a robot (which might be installed on the same server) that processes the data. The web service inserts the request into a common database (both programs have access to it), and it's supposed to wait for that row to change and then send the results back.
The robot periodically checks the database for new rows, processes the data and sets some sort of flag on that row, indicating that the data was processed.
So the main problem here is: what should the method processData(..) do to check for changes to the data row?
I know one way to do it: I can build an iteration block that checks the row every x seconds. But I don't want to do that. What I want is to build some sort of event listener that triggers when the row changes. I know it might involve some asynchronous programming.
I might be dreaming, but is that even possible in a web environment?
I've been reading about the SqlDependency class, async/await, etc.
Depending on how much control you have over design of this distributed system, it might be better for its architecture if you take a step back and try to think outside the domain of solutions you have narrowed the problem down to so far. You have identified the "main problem" to be finding a way for the distributed services to communicate with each other through the common database. Maybe that is a thought you should challenge.
There are many potential ways for these components to communicate and if your design goal is to reduce latency and thus avoid polling, it might in fact be the right way for the service that needs to be informed of completion of this work item to be informed of it right away. However, if in the future the throughput of this system has to increase, processing work items in bulk and instead poll for the information might become the only feasible option. This is also why I have chosen to word my answer a bit more generically and discuss the design of this distributed system more abstractly.
If after this consideration your answer remains the same and you do want immediate notification, consider having the component that processes a work item to notify the component(s) that need to be notified. As a general design principle for distributed systems, it is best to have the component that is most authoritative for a given set of data to also be the component to answer requests about that data. In this case, the data you have is the completion status of your work items, so the best component to act on this would be the component completing the work items. It might be better for that component to inform calling clients and components of that completion. Here it's also important to know if you only write this data to the database for the sake of communication between components or if those rows have any value beyond the completion of a given work item, such as for reporting purposes or performance indicators (KPIs).
I think there can be valid reasons, though, why you would not want to have such a call, such as reducing coupling between components or lack of access to communicate with the other component in a direct manner. There are many communication primitives that allow such notification, such as MSMQ under Windows, or Queues in Windows Azure. There are also reasons against it, such as dependency on a third component for communication within your system, which could reduce the availability of your system and lead to outages. The questions you might want to ask yourself here are: "How much work can my component do when everything around it goes down?" and "What are my design priorities for this system in terms of reliability and availability?"
So I think the main problem you might want to try to solve first is a bit more abstract: what should the interface through which components of this distributed system communicate look like?
If after all of this you remain set on having the interface of communication between those components be the SQL database, you could explore using INSERT and UPDATE triggers in SQL. You can easily look up the syntax of those commands and specify Stored Procedures that then get executed. In those stored procedures you would want to check the completion flag of any new rows, and possibly constrain the number of rows you check by date or by an ID of the last processed work item. To then notify the other component, you could go as far as using the built-in stored procedure xp_cmdshell to execute command lines under Windows. The command you execute could be a simple tool that pings your service for completion of the task.
I'm sorry to have initially overlooked your suggestion to use SQL Query Notifications. That is also a feasible way and works through the Service Broker component. You would define a SqlCommand, as if normally querying your database, pass this to an instance of SqlDependency and then subscribe to the event called OnChange. Once you execute the SqlCommand, you should get calls to the event handler you added to OnChange.
I am not sure, however, how to get the exact changes to the database out of the SqlNotificationEventArgs object that will be passed to your event handler, so your query might need to be specific enough for the application to tell that the work item has completed whenever the query changes, or you might have to do another round-trip to the database from your application every time you are notified to be able to tell what exactly has changed.
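A minimal subscription sketch (table and column names are placeholders; note that each notification is one-shot, so you must re-execute the command to re-subscribe after every event):

```csharp
using System;
using System.Data.SqlClient;

public class CompletionListener
{
    // SqlDependency sketch. The query must follow Query Notification rules:
    // explicit column list, two-part table names, no SELECT *.
    public void Listen(string connectionString)
    {
        SqlDependency.Start(connectionString);

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, IsCompleted FROM dbo.WorkItems WHERE IsCompleted = 0",
            conn))
        {
            conn.Open();

            var dependency = new SqlDependency(cmd);
            dependency.OnChange += (sender, e) =>
            {
                // e.Info tells you the kind of change (Insert/Update/...),
                // but not which rows; re-query to find out what changed.
                Console.WriteLine($"Change detected: {e.Info}");
            };

            // The command must be executed for the subscription to register.
            using (var reader = cmd.ExecuteReader()) { while (reader.Read()) { } }
        }
    }
}
```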
Are you referring to a message queue? The .NET Framework already provides this facility. I would say let the web service manage an application-level queue. The robot would then request work items from the same web service. Assuming that the data needed for the jobs is small, you can keep the whole thing in memory. I would rather not involve a database if you don't already have one.
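For illustration, a thin wrapper over MSMQ via System.Messaging might look like this (Windows-only; the queue path is a placeholder):

```csharp
using System.Messaging;  // reference System.Messaging.dll

public class JobQueue
{
    private readonly MessageQueue _queue;

    public JobQueue()
    {
        const string path = @".\Private$\jobs";   // local private queue
        _queue = MessageQueue.Exists(path)
            ? new MessageQueue(path)
            : MessageQueue.Create(path);
        _queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    }

    // Web service side: enqueue a job description.
    public void Enqueue(string jobData) => _queue.Send(jobData);

    // Robot side: block until a job arrives.
    public string Dequeue() => (string)_queue.Receive().Body;
}
```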
