I need some suggestions from the community on a requirement I have. The requirement is below, and I would like feedback on my approach.
Users on the client side need to retrieve data from my source database (say, a SQL Server database on my production server). They access the data through an intermediary service layer (a WCF REST service). On another server (the Info Server) I have a SQL database (the Info DB) which holds all the queries that can be requested. Since in some cases the data is huge, I give the user the option to schedule the data retrieval and look at the results later. The schedule information per user is also stored in the Info DB. I also allow the user to retrieve data in real time if needed.
In both cases I want to query the data from the source (the production DB), store it in a file format (maybe CSV or Excel), and then, when the user wants the data, send the file over to the client.
Since the queries are stored in the Info DB, I let the admin define a scheduled run time for every query. This lets the admin push long-running queries to night time, when calls to the server are low. If the user demands that a query be run in real time, I allow that too.
As a solution architecture I have thought of this:
I will have a WCF REST service installed on the Info Server. This service acts as the calling point for the users. When a user runs a query in real time, the service gets the results, saves them to a file, and transfers the file over. If the user schedules the query, the service adds an entry for that user and query in the Info DB.
I will also have a Windows service on the Info Server. This Windows service periodically checks the Info DB for scheduled query entries, and when a query falls within its scheduled time it runs the query, saves the data to a file location, and adds the file path to the schedule entry. This lets me track which schedules have finished and where the data is available (the file path).
Now here are my issues with this:
My data can be huge. Will a WCF REST service be good enough to transfer large files over the wire? Can I transfer files over the wire, or do I need to transfer the data as JSON? What is the best approach?
If I use a Windows service, is this a good approach, or is there a better alternative? I ask because, as I understand it, the Windows service will have to run all the time in order to find the entries that are scheduled. That means the service would check the Info DB at a fixed interval to see whether a schedule entry should run or not. In the typical scenario the service would run throughout the day and poll the database without much action, because preferably all schedules would be at night.
I have used an intermediary service approach because, if I need to move to the cloud tomorrow, I can easily move this solution. Am I right in my assumption?
If I move to the cloud tomorrow, would I be able to encrypt the data transfer (maybe data encryption or file encryption)? I have no idea about data encryption/decryption.
I need your suggestions on this.
My data can be huge. Will a WCF REST service be good enough to transfer large files over the wire? Can I transfer files over the wire, or do I need to transfer the data as JSON? What is the best approach?
When you say huge, how huge? Are we talking gigabytes, megabytes, or kilobytes? I regularly have 100 MB REST responses (you will probably have to tweak some things, such as raising maxReceivedMessageSize, but that should be enough to get you going). I would also use a streaming API, especially if you are talking several megabytes of content.
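To give a rough idea of the knobs involved, here is a minimal self-hosting sketch; the contract, service class, port, file path, and size limit are all illustrative and would need adjusting for your payloads:

```csharp
using System;
using System.IO;
using System.ServiceModel;
using System.ServiceModel.Web;

[ServiceContract]
public interface IReportService
{
    // Returning a Stream lets WCF stream the file back instead of buffering it all in memory.
    [OperationContract]
    [WebGet(UriTemplate = "export/{id}")]
    Stream Export(string id);
}

public class ReportService : IReportService
{
    public Stream Export(string id)
    {
        // Serve a previously generated file; folder and extension are placeholders.
        WebOperationContext.Current.OutgoingResponse.ContentType = "text/csv";
        return File.OpenRead(Path.Combine(@"C:\Exports", id + ".csv"));
    }
}

class Program
{
    static void Main()
    {
        var binding = new WebHttpBinding
        {
            TransferMode = TransferMode.StreamedResponse,   // stream large responses
            MaxReceivedMessageSize = 500L * 1024 * 1024     // example quota (~500 MB); raise on whichever side receives the large message
        };

        var host = new WebServiceHost(typeof(ReportService), new Uri("http://localhost:8080/reports"));
        host.AddServiceEndpoint(typeof(IReportService), binding, "");
        host.Open();

        Console.WriteLine("Service running. Press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```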
If I use a Windows service, is this a good approach, or is there a better alternative? I ask because, as I understand it, the Windows service will have to run all the time in order to find the entries that are scheduled. That means the service would check the Info DB at a fixed interval to see whether a schedule entry should run or not. In the typical scenario the service would run throughout the day and poll the database without much action, because preferably all schedules would be at night.
Beware of writing your own scheduler. You might be better off dropping work items onto a queue for processing and firing up workers at the appropriate time. That way you can invoke the worker directly for your real-time calls, and you can run queued work whenever the database is idle rather than on a fixed timetable. It's tricky "knowing" when a server will be idle, especially in a world of round-the-clock users.
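As a rough illustration of the queue idea, here is a minimal worker sketch. It assumes a QueuedQueries table in the Info DB acts as the queue; the table, columns, and connection string are all illustrative:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;

class QueryWorker
{
    const string ConnStr = "Server=InfoServer;Database=InfoDB;Integrated Security=true";

    static void Main()
    {
        while (true)
        {
            ProcessNextItem();
            Thread.Sleep(TimeSpan.FromMinutes(1));   // idle poll; tune to taste
        }
    }

    static void ProcessNextItem()
    {
        using (var conn = new SqlConnection(ConnStr))
        {
            conn.Open();

            // Claim one due item in a single atomic statement so several workers can run in parallel.
            var cmd = new SqlCommand(
                @"UPDATE TOP (1) QueuedQueries
                  SET Status = 'Running'
                  OUTPUT inserted.Id, inserted.QueryText
                  WHERE Status = 'Pending' AND ScheduledAt <= SYSUTCDATETIME();", conn);

            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return;            // nothing due yet
                var id = reader.GetInt32(0);
                var queryText = reader.GetString(1);

                // Run queryText against the production DB, write the CSV,
                // then update the row with the file path and a 'Done' status.
            }
        }
    }
}
```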
I have used an intermediary service approach because, if I need to move to the cloud tomorrow, I can easily move this solution. Am I right in my assumption?
Yes, wrapping the endpoint in a REST service (WCF) will make moving to the cloud much easier.
If I move to the cloud tomorrow, would I be able to encrypt the data transfer (maybe data encryption or file encryption)? I have no idea about data encryption/decryption.
HTTPS is your friend here. Don't invent your own encryption or use something proprietary. HTTPS is old, straightforward, and good.
Related
I have an application with one DB which is used by many users. Whenever one user makes changes, we save the changes to the database.
Now, I need to notify other logged-in users about this change. How can this be done?
I'm thinking that when the application successfully saves or updates the data in the database, it will send a notification to the connected clients with the new or updated record.
I'm using C# and SQL Server database.
Your immediate options are push-based notifications with something like a message bus, or polling loops on known ids.
Message buses operate on publish-subscribe models, which work well for Windows applications. Have a look at MassTransit or MSMQ as a starting point; there are plenty of options out there. They can be combined with web apps using something like SignalR, which essentially wraps a polling loop on the client.
Polling-based options typically work on a timer and do quick timestamp or version-number checks against the database, reloading a record if a difference is found.
Push-based options are more efficient, but they only notify of changes between participating applications (client to client): they don't detect changes made by applications that don't publish, nor changes made directly to the database.
Polling-based options cover all changes, but are generally slower and require a schema that has version information to work effectively.
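A minimal sketch of the polling approach, assuming the table carries a rowversion column (the table name, column name, and connection string are illustrative):

```csharp
using System;
using System.Collections;
using System.Data.SqlClient;
using System.Threading;

class ChangePoller
{
    const string ConnStr = "Server=.;Database=AppDb;Integrated Security=true";

    static void Main()
    {
        byte[] lastVersion = null;

        while (true)
        {
            using (var conn = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand("SELECT MAX(RowVer) FROM dbo.Customers;", conn))
            {
                conn.Open();
                var current = cmd.ExecuteScalar() as byte[];   // highest rowversion in the table

                bool changed = current != null &&
                    !StructuralComparisons.StructuralEqualityComparer.Equals(current, lastVersion);

                if (changed)
                {
                    lastVersion = current;
                    Console.WriteLine("Change detected - reload the affected records here.");
                }
            }

            Thread.Sleep(TimeSpan.FromSeconds(10));   // poll interval
        }
    }
}
```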
I need to tell the Database Handler that it should run a script which generates 300-500 MB of data, then notify my C# application when the script has completed.
After that I need to get the newly created data to my C# application, manipulate it then send it to an FTP-server.
One way to do this would be to call the SQL Server via a web service, let the Database Handler run the script, and then return the data. The C# application would then manipulate the data and finally send it by FTP. However, because of the size of the data and the script's run time of around an hour, this method is far from optimal.
Any suggestions on a better solution?
EDIT: I forgot to mention some important parts. SSIS was something I thought of as well. The problem is that the database is on a different server which doesn't have ports open for sending via FTP.
I would implement this as a service. Use the web service to allow users to request execution of the script (queue a request), but actually execute the script from the service (read the queue and perform the action). You have far greater control of the entire data gathering and delivery process from within the service.
My own strategy would be to develop a standalone application (installed on the server) to perform the data retrieval, manipulation and FTP transfer. This application could be executed by code in your ASP.NET application as a result of some user interaction perhaps, and could update the database as to its progress so that your user could have some feedback via the ASP.NET web interface (ajax polling or loading a status page).
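A minimal sketch of what that standalone application's core flow could look like; the connection string, script name, file path, FTP address, and credentials are all placeholders:

```csharp
using System.Data.SqlClient;
using System.IO;
using System.Net;

class ExportAndShip
{
    static void Main()
    {
        const string localFile = @"C:\Exports\data.csv";

        // 1. Run the long-running script and stream the rows straight to a CSV file.
        using (var conn = new SqlConnection("Server=DbServer;Database=Source;Integrated Security=true"))
        using (var cmd = new SqlCommand("EXEC dbo.GenerateExport;", conn) { CommandTimeout = 0 })
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            using (var writer = new StreamWriter(localFile))
            {
                while (reader.Read())
                {
                    // Manipulate/format each row as needed before writing it out.
                    writer.WriteLine(string.Join(",", reader[0], reader[1], reader[2]));
                }
            }
        }

        // 2. Push the finished file to the FTP server.
        var request = (FtpWebRequest)WebRequest.Create("ftp://ftp.example.com/incoming/data.csv");
        request.Method = WebRequestMethods.Ftp.UploadFile;
        request.Credentials = new NetworkCredential("user", "password");

        using (var source = File.OpenRead(localFile))
        using (var target = request.GetRequestStream())
        {
            source.CopyTo(target);
        }
    }
}
```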
Have you considered using SSIS? One of the things it is really useful for is controlling data-centric multi-step workflows.
I'd like to know my options for the following scenario:
I have a C# WinForms application (developed in VS 2010) distributed to a number of offices within the country. The application communicates with a C# web service which lives on a main server at a separate location, and there is one database (SQL Server 2012) at yet another location. (All servers run Windows Server 2008.)
Head Office (where we are) utilizes the same front-end to manage certain information on the database which needs to be readily available to all offices in real time. At the same time, any data they change needs to be readily available to us at Head Office, as we have a real-time dashboard web application that monitors site-wide statistics.
Currently, the users are complaining about the speed at which the application operates. They say it is really slow. We work in a business-critical environment where every minute waiting may mean losing a client.
I have researched the following options, but I do not come from a DB background, so I'm not too sure what the best route is for my scenario.
Terminal Services/sessions (which I've just implemented at Head Office, and they say it's a great improvement, although there's a terrible lag, like remoting onto someone's desktop, which is not nice to work on).
Transactional replication (sounds quite plausible for my scenario, but it would require all offices to have their own SQL Server database on their individual servers, and they have a tendency to "fiddle" and break everything they're left in charge of! I wish we could take over all their servers, but they are franchises and have their own IT people on site).
I've currently got a whole lot of the look-up data being cached on start-up of the application, but this too takes 2-3 minutes to complete, which is just not acceptable!
Does anyone have any ideas?
With everything running through the web service, there is no need for additional SQL Servers to be deployed local to the client. The WS wouldn't be able to communicate with these databases, unless the WS was also deployed locally as well.
Before suggesting any specific improvements, you need to benchmark where your bottlenecks are occurring. What is the latency between the various clients and the web service, and then from the web service and the database? Does the database show any waiting? Once you know the worst case scenario, improve that, and then work your way down.
Some general thoughts, though:
Move the WS closer to the database
Cache the data at the web service level to save on DB calls (a sketch of this follows the list)
Find the expensive WS calls, and try to optimize their throughput
If the lookup data doesn't change all that often, use a local copy of SQL CE to cache that data, and use the MS Sync Framework to keep the data synchronized to the SQL Server
Use SQL CE for everything on the client computer, and use a background process to sync between the client and WS
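A minimal sketch of the web-service-level cache from the list above, using System.Runtime.Caching; the LookupData type and the database call are stand-ins for your real code:

```csharp
using System;
using System.Runtime.Caching;

public class LookupData { /* your lookup tables go here */ }

public static class LookupCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static LookupData GetLookups()
    {
        // Serve from cache when present; otherwise load once and keep for 10 minutes.
        var cached = (LookupData)Cache.Get("lookups");
        if (cached != null)
            return cached;

        var fresh = GetLookupsFromDatabase();
        Cache.Set("lookups", fresh, DateTimeOffset.UtcNow.AddMinutes(10));
        return fresh;
    }

    private static LookupData GetLookupsFromDatabase()
    {
        // Placeholder for the real DB query.
        return new LookupData();
    }
}
```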
UPDATE
After your comment, two additional thoughts. If your web service payload(s) is/are large, you can try adding compression on the web service (if it hasn't already been implemented).
You can also update your client to make the WS calls asynchronously, either on a background thread or, if you are using .NET 4.5, with async/await. This would at least keep the UI responsive, but it wouldn't necessarily fix any issues with data load times.
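A minimal sketch of the async client call; serviceClient, GetLookupData, and BindLookups are illustrative names for your proxy and UI code:

```csharp
// Inside your Form class; requires using System.Threading.Tasks;
private async void MainForm_Load(object sender, EventArgs e)
{
    statusLabel.Text = "Loading lookup data...";

    // Run the blocking WCF proxy call on a thread-pool thread so the UI stays responsive.
    var lookups = await Task.Run(() => serviceClient.GetLookupData());

    BindLookups(lookups);              // resumes on the UI thread
    statusLabel.Text = "Ready";
}
```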
I have two situations in this case:
I want to query a WCF service and hold the data somewhere, because one of the web pages renders based on the data retrieved from the service. I don't want the page itself querying the service; I'd rather have some sort of scheduled worker that runs once every couple of minutes, retrieves the data, and holds it somewhere.
Where should I cache the service response, and what is the correct way to create the task to query the service every couple minutes?
I think I could achieve this by saving the response to a static variable along with the last query date, and then checking on page load whether enough time has passed: if so, I call the service and refresh the data; otherwise I use the static cache.
This would also cover the case where no users access the page for a long time, so the site wouldn't futilely query the service.
But it seems kind of rough; are there other, better ways to accomplish this kind of task?
You could indeed take another approach like having a scheduled program query the information and put it in an in-memory cache available to all the web servers in your farm. However, whether that would be better for your scenario depends on the size of your app and how much time/effort you want to spend on it.
An in-memory cache is harder to implement and support than a static variable, but it's sometimes better, since static variables get cleared every time the application recycles (e.g. after X minutes of inactivity).
Depending on the size of your system I would start with the static variable, test drive the approach for a while and then decide if you need something more sophisticated.
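A minimal sketch of the static-variable approach described in the question; ServiceClient, GetData, and ServiceData are illustrative stand-ins for the real WCF proxy and result type:

```csharp
using System;

public static class ServiceDataCache
{
    private static readonly object Sync = new object();
    private static ServiceData _data;                          // illustrative result type
    private static DateTime _lastFetchUtc = DateTime.MinValue;
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);

    public static ServiceData Get()
    {
        lock (Sync)   // keep concurrent page loads from all hitting the service at once
        {
            if (DateTime.UtcNow - _lastFetchUtc > MaxAge)
            {
                _data = new ServiceClient().GetData();         // illustrative WCF proxy call
                _lastFetchUtc = DateTime.UtcNow;
            }
            return _data;
        }
    }
}
```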
Have you taken a look at Velocity (Microsoft's distributed cache, later AppFabric Caching)?
Nico: Why don't you write a simple console daemon that gets the data and stores it in a database on your end, and then have your web app read from your local copy? You can make that console app run at a set interval. Inserting the data should not be a problem if you are using SQL Server 2008: you can pass DataTable parameters to a stored proc and insert a whole table in one call. If you don't use SQL Server 2008, then serialize the whole collection returned by the web service, store it in a table in one big blob column, and record the timestamp when you got the data. You can then read the content of that column, deserialize your collection, and reconstruct it into native objects for display on your page.
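A minimal sketch of the table-valued-parameter insert mentioned above; the stored procedure, table type, and connection string are illustrative and have to exist on the SQL side:

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkSaver
{
    public static void Save(DataTable itemsTable)
    {
        using (var conn = new SqlConnection("Server=.;Database=LocalCopy;Integrated Security=true"))
        using (var cmd = new SqlCommand("dbo.usp_SaveCachedItems", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            var p = cmd.Parameters.AddWithValue("@Items", itemsTable);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.CachedItemType";   // user-defined table type on the SQL side

            conn.Open();
            cmd.ExecuteNonQuery();               // one round trip for the whole table
        }
    }
}
```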
I've never seen a way (and I don't think it's possible) to have your web app query the web service at a set interval on its own. Imagine the web site is idle for hours with no interaction from anybody: no events will fire and nothing will be queried.
Alternatively, you could create a dummy page that executes a JavaScript function at certain intervals and have that function make an AJAX request to the server to get the data from the web service and cache it. The problem is that the minute you navigate away from that page, nothing will happen and you'll stop querying the web service. I think this is silly.
Scenario: A WCF service receives an XDocument from clients, processes it and inserts a row in an MS SQL Table.
Multiple clients could be calling the WCF service simultaneously. The call usually doesn't take long (a few secs).
Now I need something to poll the SQL Table and run another set of processes in an asynchronous way.
The second process doesn't have to call back anything, nor is it related to the WCF service in any way. It just needs to read the table and perform a series of methods, and maybe a web service call (if there are records, of course), but that's all.
The WCF service clients consuming the above mentioned service have no idea of this and don't care about it.
I've read about this question on Stack Overflow, and I also know that a Windows service would be ideal, but this WCF service will be hosted on shared hosting (DiscountASP or similar), and therefore installing a Windows service is not an option (as far as I know).
Given that the architecture is fixed (i.e., I cannot change the table, which comes from a legacy format, nor change the mechanism of the WCF service), what would be your suggestion for polling/processing this table?
I'd say I need it to check every 10 minutes or so. It doesn't need to be instant.
Thanks.
Cheat. Expose this process as another WCF service and fire a go command from a box under your control at a scheduled time.
Whilst you can fire up background threads in WCF, or use cache expiry as a poor man's scheduler, those will stop when your app pool recycles and stay stopped until the next hit on your web site spins the app pool up again. Firing the request from a machine you control at least means the app pool comes back up every 10 minutes or so, because you've sent a request in its direction.
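A minimal sketch of the trigger you would run from a box under your control (scheduled with Windows Task Scheduler every 10 minutes or so); the endpoint URL is illustrative:

```csharp
using System;
using System.Net;

class PollTrigger
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Hitting the endpoint both wakes the app pool and kicks off the table processing.
            var response = client.DownloadString("http://yourapp.example.com/ProcessQueue.svc/go");
            Console.WriteLine(response);
        }
    }
}
```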
A web application is not suited at all to running something at a fixed interval. If there are no requests coming in, no code runs in the application, and if the application is inactive for a while, IIS can decide to shut it down completely until the next request comes in.
For some applications it isn't important that something runs at a specific interval, only that it has run recently. If that is the case for your application, you could just keep track of when the table was last polled, and on every request check whether enough time has passed for the table to be polled again.
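A minimal sketch of that per-request check; PollTable() stands in for whatever reading and processing the table needs:

```csharp
using System;

public static class TablePoller
{
    private static readonly object Sync = new object();
    private static DateTime _lastRunUtc = DateTime.MinValue;
    private static readonly TimeSpan Interval = TimeSpan.FromMinutes(10);

    // Call this from a place every request passes through, e.g. Application_BeginRequest.
    public static void RunIfDue()
    {
        if (DateTime.UtcNow - _lastRunUtc < Interval)
            return;

        lock (Sync)   // make sure only one request actually does the work
        {
            if (DateTime.UtcNow - _lastRunUtc < Interval)
                return;

            _lastRunUtc = DateTime.UtcNow;
            PollTable();
        }
    }

    private static void PollTable()
    {
        // Placeholder for the real table read and follow-up processing.
    }
}
```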
If you have access to administer the database, there is a scheduler in SQL Server (SQL Server Agent). It can run queries and stored procedures, and can even start processes if you have permission (which is very unlikely on shared hosting, though).
If you need the code to run at a specific interval, and you can't access the server to schedule it or run it as a service, and you can't use the SQL Server scheduler, it's simply not doable.
Make your application pool "always active" and do whatever you want with your threads.