We have a fairly busy, distributed, cloud-based system that we want to introduce basic profiling into. Firstly, we'd like to monitor web page render times and DB calls; we use EF and SQL Server.
The question is: what is the best (performant and easy) way to record this information? My first thought is to store it in a DB, but would this cause a performance issue when a single page render may require multiple DB calls and hence multiple inserts into the performance table?
Would it be better to store this information in memory only, or perhaps store in memory short term then batch persist to a DB later? Or is some other approach recommended?
If you simply send INSERT statements to the database without block-waiting to receive values of identity columns, it should be fairly lightweight for the web server.
If the database table does not have any keys (or only a clustered key), it should be fairly lightweight on the server, too.
This would certainly be the easiest approach, and would not take much to implement it and give it a try, so I would recommend that you check whether it covers your needs before trying anything else.
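As a rough sketch of what that lightweight logging might look like from the web tier (the PerfLog table, its columns, and the connection string parameter are assumptions, not part of your schema):

// Rough sketch: a plain parameterized INSERT with no SCOPE_IDENTITY() round trip,
// so the web server never blocks waiting for a generated key.
// The PerfLog table, its columns, and the connection string are assumptions.
using System.Data.SqlClient;

public static class PerfLogger
{
    public static void Log(string connectionString, string page, int durationMs)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO PerfLog (Page, DurationMs, LoggedAt) VALUES (@page, @ms, SYSUTCDATETIME())", conn))
        {
            cmd.Parameters.AddWithValue("@page", page);
            cmd.Parameters.AddWithValue("@ms", durationMs);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}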
I have a DataTable which fetches ~550,000 records from the database (SQLite). Fetching them slows down the system.
I am storing these records in a SQLite database on the backend and in a DataTable on the frontend.
What should I do so that the database creation time (~10.5 hours) in the backend and the fetching time in the frontend are reduced?
Is there any other structure that could be used for this? I have read that a Dictionary and a binary file are fast. Can they be used for this purpose, and how?
(It's not a web app; it's a WPF desktop app where the frontend and backend are on the same machine.)
Your basic problem, I believe, is not the structure you want to maintain but the way you manage your data flow. Instead of fetching all the data from the database into some structure (a DataTable in your case), write a stored procedure that does the calculations you need on the server side and returns the already-calculated data (a rough sketch of the idea follows the list below). This way you gain several benefits, like:
servers hosting database engines are usually faster than development or client machines
a huge reduction in data transmission, as you return only the result of the calculation
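Since you are on SQLite (which has no stored procedures), the same idea still applies: push the calculation into a single SQL statement instead of pulling every row into a DataTable. A rough sketch, where the Readings table, the Value column, and the aggregate are assumptions:

// Rough sketch: let the database engine do the arithmetic and return one value,
// rather than fetching ~550,000 rows into memory first.
// Table and column names are assumptions; adjust to your schema.
using System;
using System.Data.SQLite;

public static class ReadingCalculator
{
    public static double SumReadings(string dbPath)
    {
        using (var conn = new SQLiteConnection("Data Source=" + dbPath))
        using (var cmd = new SQLiteCommand("SELECT SUM(Value) FROM Readings", conn))
        {
            conn.Open();
            // returns a single scalar instead of a full result set
            return Convert.ToDouble(cmd.ExecuteScalar());
        }
    }
}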
EDIT
Considering the edited post, I would say that a DataTable is already highly optimized for in-memory access, and I don't think changing it to something else will bring you any notable benefit. What I think can bring a benefit is a revision of the program flow. In other words, try to answer the following questions:
do I need all those records at the same time?
can I run some calculations in a service, let's say, overnight?
can I use SQL Server Express (just as an example) and gain the possibility of running a stored procedure, which may (this has to be measured) do the work faster, even on the same machine?
Have you thought about doing the calculation at the location of the data, rather than fetching it? Can you give us some more information about what the data is stored in, how you are fetching it, and how you are processing it, please?
It's hard to make a determination for an optimisation without the metrics and the background information.
It's easy to say "yes, put the data in a file". But is the file local? Is the network the problem? Are you making the best use of cores/threading? Etc.
Much better to start back from the data, see what needs to be done to it, and then engineer the best optimisation.
Edit:
OK, so you are on the same machine? One thing you should really consider in this scenario is what you are doing with the data. Does it need to be SQL? Are you just using it to load a DataTable, or is there complexity you are not disclosing?
I've had a similar task: I just created a large text file and used memory mapping to read it efficiently without any overhead. Is this the kind of thing you're talking about?
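If that direction interests you, here is a minimal sketch of streaming records out of a memory-mapped file (the file path and record layout are assumptions):

// Minimal sketch: stream records out of a memory-mapped file without loading
// the whole file into memory at once. File name and format are assumptions.
using System.IO;
using System.IO.MemoryMappedFiles;

public static class RecordReader
{
    public static void ProcessRecords(string path)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var stream = mmf.CreateViewStream())
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // parse and process one record at a time
            }
        }
    }
}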
You could try using a persisted dictionary like RaptorDB for storing and fetching/manipulating the data in an ArrayList. The proof of concept should not take long to build.
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Also, which way is better? Updating the data in the dataset and then transferring it to the database at once, or updating the database directly?
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Why do you have food in your fridge, when you can just go directly to the grocery store every time you want to eat something? Because going to the grocery store every time you want a snack is extremely inconvenient.
The purpose of DataSets is to avoid directly communicating with the database using simple SQL statements. The purpose of a DataSet is to act as a cheap local copy of the data you care about so that you do not have to keep on making expensive high-latency calls to the database. They let you drive to the data store once, pick up everything you're going to need for the next week, and stuff it in the fridge in the kitchen so that it's there when you need it.
Also, which way is better? Updating the data in the dataset and then transferring it to the database at once, or updating the database directly?
You order a dozen different products from a web site. Which way is better: delivering the items one at a time as soon as they become available from their manufacturers, or waiting until they are all available and shipping them all at once? The first way, you get each item as soon as possible; the second way has lower delivery costs. Which way is better? How the heck should we know? That's up to you to decide!
The data update strategy that is better is the one that does the thing in a way that better meets your customer's wants and needs. You haven't told us what your customer's metric for "better" is, so the question cannot be answered. What does your customer want -- the latest stuff as soon as it is available, or a low delivery fee?
DataSets support a disconnected architecture. You can add local data, delete from it, and then, using a SqlDataAdapter, commit everything to the database. You can even load an XML file directly into a DataSet. It really depends upon what your requirements are. You can even set up in-memory relations between tables in a DataSet.
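A rough sketch of that disconnected pattern (the Customers table, its columns, and the presence of a primary key on Id are assumptions):

// Rough sketch: fill a DataSet once, work on it locally, then push all pending
// changes back in a single Update call. Table and column names are assumptions.
using System.Data;
using System.Data.SqlClient;

public static class CustomerRepository
{
    public static void RenameFirstCustomer(string connectionString)
    {
        var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers", connectionString);
        var builder = new SqlCommandBuilder(adapter);   // generates INSERT/UPDATE/DELETE from the SELECT

        var ds = new DataSet();
        adapter.Fill(ds, "Customers");                  // disconnected from here on

        ds.Tables["Customers"].Rows[0]["Name"] = "Renamed customer";   // local change only
        adapter.Update(ds, "Customers");                // commit all pending changes at once
    }
}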
And by the way, using direct SQL queries embedded in your application is a really poor way of designing an application. Your application will be prone to SQL injection. Secondly, if you write queries embedded in the application like that, SQL Server has to build an execution plan every time, whereas stored procedures are compiled and their execution plan is already decided at compile time. SQL Server can also change its plan as the data gets large, so you will get a performance improvement from this. At least use stored procedures and validate junk input in them; they are inherently resistant to SQL injection.
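For illustration, a rough sketch of calling a stored procedure with parameters from C#; the procedure name, parameter, and column are assumptions:

// Rough sketch: the parameter value travels as data, not as part of the SQL text,
// so malicious input cannot alter the statement. All names here are assumptions.
using System.Data;
using System.Data.SqlClient;

public static class OrderQueries
{
    public static void PrintOrders(string connectionString, int customerId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.GetOrdersForCustomer", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    System.Console.WriteLine(reader["OrderDate"]);
                }
            }
        }
    }
}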
Stored Procedures and Dataset are the way to go.
Edit: If you are on .NET Framework 3.5 or 4.0, you can use a number of ORMs like Entity Framework, NHibernate, or SubSonic. ORMs represent your business model more realistically. You can always use stored procedures with ORMs if some feature is not supported by the ORM.
For example, if you are writing a recursive CTE (Common Table Expression), stored procedures are very helpful. You will run into too many problems if you use Entity Framework for that.
This page explains in detail in which cases you should use a DataSet and in which cases you should use direct access to the database.
My usual practice is: if I need to perform a bunch of analytical processes on a large set of data, I fill a DataSet (or a DataTable, depending on the structure). That way it is a model disconnected from the database.
But for DML queries I prefer quick hits directly against the database (preferably through stored procs). I have found this to be the most efficient, and with well-tuned queries it is not bad at all on the DB.
Currently, my entire website does its updating through parameterized SQL queries. It works, and we've had no problems with it, but it can occasionally be very slow.
I was wondering if it makes sense to refactor some of these SQL commands into classes so that we would not have to hit the database so often. I understand that hitting the database is generally the slowest part of any web application. For example, say we have a class structure like this:
Project (comprised of) Tasks (comprised of) Assignments
Where Project, Task, and Assignment are classes.
At certain points in the site you are only working on one project at a time, so creating a Project class and passing it among pages (using Session, Profile, or something else) might make sense. I imagine this class would have a Save() method to save value changes.
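A rough sketch of what such a class might look like (everything beyond the Project/Task/Assignment names above is an assumption):

// Rough sketch only: a simple in-memory aggregate. The Save implementation is an
// assumption; it would reuse your existing parameterized queries or stored procedures.
using System.Collections.Generic;

public class Assignment
{
    public int Id { get; set; }
    public string AssignedTo { get; set; }
}

public class Task
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; }
    public Task() { Assignments = new List<Assignment>(); }
}

public class Project
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Task> Tasks { get; set; }
    public Project() { Tasks = new List<Task>(); }

    public void Save()
    {
        // persist any changed project/task/assignment values back to the database
    }
}

You could then keep the instance in Session (for example, Session["CurrentProject"]) between pages and call Save() when editing is finished.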
Does it make sense to invest the time into doing this? Under what conditions might it be worth it?
If your site is slow, you need to figure out what the bottleneck is before you randomly start optimizing things.
Caching is certainly a good idea, but you shouldn't assume that this will solve the problem.
Caching is almost always underutilized in ASP .NET applications. Any time you hit your database, you should look for ways to cache the results.
Serializing objects into the session can be costly in itself, but most likely faster than hitting the database every single time. You are also benefiting now from execution plan caching in SQL Server, so it's very likely that what you're getting from your stored procedures is already close to optimal.
One option you might consider to increase performance is to abstract your data into objects via LINQ to SQL (against your sprocs) and then use AppFabric to cache the objects.
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
As for your updates, you should do them directly against the sprocs, but you will also need to clear out the cache in AppFabric for objects that are affected by the insert/update/delete.
You could also do the same thing simply using the standard Cache as well, but AppFabric has some added benefits.
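As a rough illustration of the standard-Cache route (the cache key, the five-minute expiry, and the LoadProjectFromDb helper are assumptions):

// Rough sketch: check the ASP.NET Cache first and only hit the database on a miss.
// The key format, expiry, and LoadProjectFromDb are assumptions/hypothetical.
using System;
using System.Web;
using System.Web.Caching;

public static class ProjectCache
{
    public static Project GetProject(int projectId)
    {
        string key = "project:" + projectId;
        var cached = HttpRuntime.Cache[key] as Project;
        if (cached != null)
            return cached;

        Project project = LoadProjectFromDb(projectId);   // hypothetical data-access call
        HttpRuntime.Cache.Insert(key, project, null,
            DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
        return project;
    }

    // Placeholder: replace with your existing sproc-based loading code.
    private static Project LoadProjectFromDb(int projectId)
    {
        throw new NotImplementedException();
    }
}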
Use the SQL Profiler to identify your slowest queries, and see if you can improve them with some simple index changes (removing unused indexes, adding missing indexes).
You could very easily improve your application performance by an order of magnitude without changing your front-end app at all.
See http://sqlserverpedia.com/wiki/Find_Missing_Indexes
If you have lookup data only, you can store it in the Cache object. This will avoid hits to the DB. Only data that can be used globally should be stored in the Cache.
If this data requires filtering, you can retrieve it from the Cache and filter it before rendering.
Session can be used to store user-specific data, but be aware that too many session variables can easily cause performance problems.
This book may be helpful.
http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839
Hi all, I wanted to know when I should prefer writing stored procedures over writing programming logic and pulling data using an ORM or something else.
Stored procedures are executed on server side.
This means that processing large amounts of data does not require passing these data over the network connection.
Also, with stored procedures, you can build consistent complicated business logic.
Say, you need to update the account balance each time you insert a transaction, and you need to insert many transactions at once.
Instead of doing this with triggers (which in many systems are implemented using an inefficient record-by-record approach), you can pass a table variable or temporary table with the inputs and issue a set-based SQL statement inside the procedure. This will be much more efficient.
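As a rough sketch of the client side of that pattern, assuming a hypothetical procedure dbo.InsertTransactions that takes a table-valued parameter of type dbo.TransactionTableType:

// Rough sketch: all rows travel to the server in one call; inside the procedure a
// set-based statement maintains the balances. Procedure, type, and columns are assumptions.
using System.Data;
using System.Data.SqlClient;

public static class TransactionWriter
{
    public static void InsertTransactions(string connectionString)
    {
        var rows = new DataTable();
        rows.Columns.Add("AccountId", typeof(int));
        rows.Columns.Add("Amount", typeof(decimal));
        rows.Rows.Add(1, 100.00m);
        rows.Rows.Add(1, -25.50m);

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.InsertTransactions", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@Transactions", rows);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.TransactionTableType";
            conn.Open();
            cmd.ExecuteNonQuery();   // one round trip for the whole batch
        }
    }
}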
I prefer SPs over programming logic mainly for two reasons:
Performance: anything that reduces the result set or can be done more effectively on the server, e.g.:
paging (see the sketch after this list)
filtering
ordering (on indexed columns)
Security: if someone has gained the application's access to the database and wants to wipe out all of your records, having to execute Row_Delete for each single one of them instead of DELETE FROM Rows already sounds good.
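A rough sketch of server-side paging, which works the same whether the statement lives in a stored procedure or is sent as a parameterized query (the Products table and its columns are assumptions; OFFSET/FETCH requires SQL Server 2012 or later):

// Rough sketch: only one page of rows ever crosses the network.
// Table and column names are assumptions.
using System.Data.SqlClient;

public static class ProductPager
{
    public static void PrintPage(string connectionString, int pageIndex, int pageSize)
    {
        const string sql =
            "SELECT Id, Name FROM Products " +
            "ORDER BY Name " +
            "OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@skip", pageIndex * pageSize);
            cmd.Parameters.AddWithValue("@take", pageSize);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    System.Console.WriteLine(reader["Name"]);
                }
            }
        }
    }
}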
Never unless you identify a performance issue. (largely opinion)
(a Jeff blog post!)
http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html
If you see stored procs as optimizations:
http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
When appropriate.
complex data validation/checking logic
avoid several round trips to do one action in the DB
several clients
anything that should be set based
You can't say "never" or "always".
There is also the case where the database engine will outlive your client code. I bet there's more DAL or ORM upgrading/refactoring going on than DB engine upgrading/refactoring.
Finally, why can't I encapsulate code in a stored proc? Isn't that a good thing?
As ever, much of your decision as to which to use will depend on your application and its environment.
There are a couple of schools of thought here, and this debate always arouses strong sentiments on both sides.
The advantages of stored procedures (as well as the large-data-movement point that Quassnoi has mentioned) are that the logic is tied down in the database, and therefore potentially more secure. It also lives in only one place.
However, others believe that the place for application logic is in the application, especially if you are planning to access other types of databases (for which you will often have to write different SPs).
Another consideration may be the skills of the resources you have to implement your application.
The point at which stored procedures become preferable to an ORM is that point at which you have multiple applications talking to the same database. At this point, you want your query logic embedded in one place, rather than once per application. And even here, you might want to prefer a service layer (which can scale horizontally) instead of the database (which only scales vertically).
I have developed a network application that has been in use in my company for the last few years.
At the start it was managing information about users, rights, etc.
Over time it grew with other functionality. It grew to the point that I have tables with, let's say, 10-20 columns and even 20,000-40,000 records.
I keep hearing that Access is not good for multi-user environments.
Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to the SQL Server but unfortunately it cannot be done in this case.
I was wondering if there is a more reliable solution for me than using an Access database while still staying with a single-file database system.
We have quite a huge system using dBase IV.
As far as I know it is fully multiuser database system.
Maybe it will be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios -- at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use Access as your data store. Then, if you can get beyond the company politics controlling your network, perhaps you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into a front end and a back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access Database shared amongst your users will be the same with any file based database.
A read will pull a lot of data into memory and writes are guarded with some type of file lock. Under your environment it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually, no. This is a common misstatement spread by folks who do not understand how Jet, the database engine inside Access, works. Pulling down all the records, or an excessive number of records, happens because you don't have all the fields used in the selection criteria or sorting covered by an index. We've also found that indexing yes/no (boolean) fields can make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages that are required. While this is more data than a server-based database engine would return, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on all at the same time before without any issues. As long as they don't access the same record then I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance, I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes, editing and looking for information. To avoid any write conflicts I call the function below any time a field is changed. As long as it is not a new record being entered, it updates.
Function funSaveTheRecord()
    If ([chkNewRecord].Value = False And Me.Dirty) Then
        'To save the record, turn off the form's Dirty property
        Me.Dirty = False
    End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front end, stored on the user's machine. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle the shift-key bypass on/off. Only I have access to it.
PDC_be.mdb <-- Back end, stored on the server. Contains all data. The only form and VBA it contains is to toggle the shift-key bypass on/off. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically, Access cannot handle more than 5 people connected at the same time, or it will corrupt on you.