I'm trying to do quite a lot of querying against a Microsoft SQL Server that I only have read-only access to. My queries need to work with the data in a very different structure from how the database is architected. Since I only have read access, I can't create views, so I'm looking for a solution.
What I'm currently doing is using complex queries to return the results as I need them, but these involve 4-5 table joins with subqueries. It is ridiculously slow and resource intensive.
I can see two solutions, but would love to hear about anything I might have missed:
Use some sort of "proxy" that caches the data and creates views around it. This would need some method to determine the dirtiness of the data. (Is there something like this?)
Run my own SQL Server, mirror the data from the source SQL Server every X minutes, and then create views on my SQL Server.
Any other ideas? or recommendations on these ideas?
Thanks!
Here are some options for you:
Replication
Set up replication to move the data to your own SQL Server and create any views you need over there. An administrator has to set this up. If you need to see the data as it changes, use Transactional Replication. If not, you can do snapshots.
Read more here: http://technet.microsoft.com/en-us/library/ms151198.aspx
New DB on same instance
Get a new database MyDB on the same server as your ProductionDB with WRITE access for you. Create your views there.
Your view creation can look like this:
USE MyDB
GO
CREATE VIEW dbo.MyView
AS
SELECT Column1, Column2, Column3, Column4
FROM ProductionDB.dbo.TableName1 t1
INNER JOIN ProductionDB.dbo.TableName2 t2
    ON t1.ColX = t2.ColX
GO
Same instance, not a separate instance: I would suggest creating MyDB on the same instance of SQL Server as ProductionDB rather than installing a new instance. Multiple instances of SQL Server on a single machine are much more expensive in terms of resources than a new DB on the same instance.
Standard Reusable Views
Create a set of standardized views, ask the administrators to put them on the read-only server for you, and reuse those views in your queries.
You can also use a CTE, which can act like a view.
I would go for that if Raj More's #2 suggestion does not work for you...
WITH myusers (userid, username, password)
AS
(
    -- this is where the definition of the view would go
    SELECT userid, username, password FROM Users
)
SELECT * FROM myusers
If you can create a new database on that server you can create the views in the new database. The views can access the data using a three part name. E.g. select * from OtherDB.dbo.Table.
If you have access to another SQL Server, the DBA can create a "Linked Server". You can then create views that access the data using a four part name. E.g. select * from OtherServer.OtherDB.dbo.Table
In either case, the data is always "live", so no need to worry about dirty data.
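For example, once a new database (or linked server) is in place, a view over the remote data is just a CREATE VIEW using the multi-part name. This is only a sketch; the database, server, and column names below are assumptions:
-- View in your own database that reads live data from the other database/server (names are illustrative)
USE MyNewDB
GO
CREATE VIEW dbo.RemoteOrders
AS
SELECT o.OrderId, o.CustomerId, o.OrderDate
FROM OtherServer.OtherDB.dbo.Orders o
GO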
The views will give you cleaner code and a single location to make changes, and a few milliseconds of performance benefit from cached execution plans. However, don't expect great performance leaps. You mention caching, but as far as I know, the server does not do any particular data caching for ordinary, non-indexed views that it wouldn't do for ad-hoc queries.
If you haven't already done so, you may wish to experiment to see if the views are actually faster: make a copy of the database and add the views there.
Edit: I did a similar experiment today. I had a stored proc on Server1 that was getting data from Server2 via a linked server. It was a complex query, joining many tables on both servers. I created a view on Server2 that got all of the data that I needed from that server, and updated the proc (on Server1) so that it used that view (via the linked server) and then joined the view to a bunch of tables that were on Server1. It was noticeably faster after the update. The reason seems to be that Server1 was misestimating the number of rows that it would get from Server2, and thus building a bad plan. It did better estimating when using a view. It didn't matter if the view was in the same database as the data it was reading; it just had to be on the same server (I only have one instance, so I don't know how instances would have come into play).
This particular scenario would only come into play if you were already using Linked Servers to get the data, so it may not be relevant to the original question, but I thought it was interesting since we're discussing the performance of views.
You could ask the DBA to create a schema for people like you (e.g. "Contractors") and allow you to create objects inside that schema only.
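If the DBA agrees, the setup on their side might look roughly like this (only a sketch; the schema and login names are placeholders):
-- DBA creates a schema owned by your login and grants the permissions needed to create views in it
CREATE SCHEMA Contractors AUTHORIZATION [YourLogin];
GO
GRANT CREATE VIEW TO [YourLogin];
GRANT SELECT ON SCHEMA::dbo TO [YourLogin];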
I would look at the query plan in Management Studio and see if you can tell why it's not performing well. Maybe you need to rewrite your query. You might also make use of table variables as temporary tables to store intermediate results, if that helps. Just make sure you're not storing a lot of records in them. You can run multiple statements in a batch like this:
DECLARE @tempTable TABLE
(
    col1 int,
    col2 varchar(250)
)
INSERT INTO @tempTable (col1, col2)
SELECT a, b
FROM SomeTable
WHERE a < 100 ... /* some complex query */
SELECT *
FROM OtherTable o
INNER JOIN @tempTable T
    ON o.col1 = T.col1
WHERE ...
Using views alone won't make your queries perform better. You need to tune those queries, and some indexes probably need to be created on those tables to support your queries.
If you cannot get access to the database in order to create those indexes, you can "cache" the data in a new database you create, and tune your queries in this new one. And of course, you will have to implement some synchronization to keep the cached data up to date.
This way you won't see the changes made to the original database immediately (there will be some latency), but you can make your queries perform a lot faster, and you can even create those views if you wish.
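The synchronization could be as simple as a scheduled job that reloads the cached tables. A rough sketch, with all database, table, and column names made up:
-- Scheduled refresh of a cached copy; the source tables stay untouched
TRUNCATE TABLE MyCacheDB.dbo.CachedOrders;
INSERT INTO MyCacheDB.dbo.CachedOrders (OrderId, CustomerId, Total)
SELECT OrderId, CustomerId, Total
FROM ProductionDB.dbo.Orders;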
Related
I don't know whether it is better to use temporary tables in SQL Server or a DataTable in C# for a report. Here is the scope of the report: it will be copied into a workbook with about 10 worksheets, each containing about 1000 rows and about 30 columns, so it's a lot of data. There is some guidance out there, but I could not find anything specific about how much data is too much for a DataTable. According to https://msdn.microsoft.com/en-us/library/system.data.datatable.aspx, the limit is 16M rows, but my data set seems unwieldy considering the number of columns I have. Plus, I will either have to make multiple SQL queries to collect the data for my report or try to write a stored procedure in SQL to collect that data. How do I resolve this quandary?
My rule of thumb is that if it can be processed on the database server, it probably should be. Keep in mind, no matter how efficient your C# code is, SQL Server will most likely do it faster and more efficiently; after all, it was designed for data manipulation.
There is no shame in using #temp tables. They maintain stats and can be indexed and/or manipulated. One recent example: a developer created an admittedly elegant query using a CTE; its performance was 12-14 seconds vs. mine at 1 second using #temp tables.
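As a rough illustration of that #temp approach (table and column names are made up):
-- Stage the heavy intermediate result in a temp table, index it, then report off it
CREATE TABLE #Staging (DeptId int, RowDate date, Amount money);
CREATE INDEX IX_Staging_DeptId ON #Staging (DeptId);
INSERT INTO #Staging (DeptId, RowDate, Amount)
SELECT DeptId, RowDate, Amount
FROM dbo.SourceTable
WHERE RowDate >= '20150101';
SELECT DeptId, SUM(Amount) AS Total
FROM #Staging
GROUP BY DeptId;
DROP TABLE #Staging;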
Now, one carefully structured stored procedure could produce and return the 10 data sets for your worksheets. If you are using a product like SpreadSheetLight (there are many options available), it becomes a small matter of passing the results and creating the tabs (no cell-level looping... unless you want or need to).
I would also like to add that you can dramatically reduce the number of touch points and better enforce the business logic by making SQL Server do the heavy lifting. For example, a client introduced a 6W risk rating, which was essentially a 6.5. HUNDREDS of legacy reports had to be updated, while I only had to add the 6W to my mapping table.
There's a lot of missing context here - how is this report going to be accessed and run? Is this going to run as a scripted event every day?
Have you considered SSRS?
In my opinion it's best to abstract away your business logic by creating Views or Stored Procedures in the database. Stored Procedures would probably be the way to go but it really depends on your specific environment. Then you can point whatever tools you want to use at the database object. This has several advantages:
if you end up having different versions or different formats of the report, and your logic ever changes, you can update the logic in one place rather than many.
your code is simpler and cleaner, typically:
select v.col1, v.col2, v.col3
from MY_VIEW v
where v.date between @startdate and @enddate
I assume your 10 spreadsheets are going to be something like
Summary Page | Department 1 | Department 2 | ...
So you could make a generalized View or SP, create a master spreadsheet linked to the db object that pulls all the relevant data from SQL, and use Pivot Tables or filters or whatever else you want, and use that to generate your copies that get sent out.
But before going to all that trouble, I would make sure that SSRS is not an option, because if you can use that, it has a lot of baked in functionality that would make your life easier (export to Excel, automatic date parameters, scheduled execution, email subscriptions, etc).
I have one database server, acting as the main SQL Server, containing a table to hold all data. Other database servers (different instances of SQL Server) come in and out. When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data into the same local SQL Server database table, and then want to update the main server with only the new data, using a C# program run by a scheduled service every so often. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional non-main database table is not always online and should not be an identical copy of the main table: first it will contain a subset of the main data, then it generates its own additional data into the local table and updates the main table every so often with its updates. A decent number of rows could be generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work, because I can't have duplicate entries in the main server and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like "Using FULL JOINs to Compare Datasets". You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into 1000 rows per transaction, or try to determine what number of rows per transaction is best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP (table-valued parameter) to pass it to a DB stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data.
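A minimal sketch of the TVP + MERGE combination (the type, procedure, table, and column names are all assumptions):
-- Table type used as the TVP
CREATE TYPE dbo.RowData AS TABLE
(
    Id int PRIMARY KEY,
    Payload varchar(250)
);
GO
-- Procedure that upserts the incoming rows into the main table
CREATE PROCEDURE dbo.UpsertRows
    @Rows dbo.RowData READONLY
AS
BEGIN
    MERGE dbo.MainTable AS target
    USING @Rows AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Payload = source.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Payload) VALUES (source.Id, source.Payload);
END
GO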
And last: in C#, use Dictionary<TKey, TValue> or something else with O(1) access time.
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases, and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the table on the main server and keep track of the main table's primary keys when you download data to the local DB server. Then you should be able to do a delete and update against the main server again.
Here's how I would do it:
Create a stored procedure on the main server's database which receives a user-defined table variable with the same structure as the main table.
It should do something like this:
INSERT INTO YourTable SELECT * FROM @TableVar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table variable.
You could do it in batches of 1000, and each time a batch is updated you should mark it in the source table / source updater code.
Can you use linked servers for this? If so, it will make copying data from and to the main server much easier.
When copying data back to the main server, I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and encapsulate all insert statements in a transaction so that if an error occurs the transaction is rolled back.
I also agree with others on doing this in batches of 1000 or so records, so that if something goes wrong you can limit the damage.
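If the linked server route is used, a set-based variant of that IF EXISTS check, wrapped in a transaction, might look like this (the server, database, table, and column names are assumptions):
-- Insert only rows that don't already exist on the main server; roll back the whole batch on error
BEGIN TRANSACTION;
BEGIN TRY
    INSERT INTO MainServer.MainDB.dbo.MainTable (Id, Payload)
    SELECT s.Id, s.Payload
    FROM dbo.LocalStaging s
    WHERE NOT EXISTS (SELECT 1
                      FROM MainServer.MainDB.dbo.MainTable m
                      WHERE m.Id = s.Id);
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
    THROW;
END CATCH;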
I could be re-inventing the wheel here, but...
I need to allow a user to be able to build 'customer reports' from our database - which will be from a GUI.
They can't have access to SQL, just a list of tables (data groups) and the columns within those groups.
They also have the ability to create Where clauses (criteria).
I've looked around Google - but nothing cropped up.
Any ideas?
Well, my recommendation: Developer Express has an amazing end-user criteria builder; you could use theirs.
There are other controls for creating end-user criteria, like
http://devtools.korzh.com/query-builder-net/
I hope that helps you.
Both controls above abstract the data access layer, so your end users won't have access to send a direct query to the database. The controls only build the criteria; it's your job to send the query to the database.
As a precursor to my answer, there are a number of expensive products out there like Izenda (www.izenda.com) that will do this very elegantly...
If you want to roll your own (and to speak to your question) you can fake this pretty quickly (and yes, this does not scale well to more than about 4 joins) like this:
Create a join statement that encompasses all of your tables that you want to use.
Create a dictionary of all the available fields you want to expose to the user, such as: Dictionary<[pretty display name], [fully qualified SQL field name]>
Let the user build a select list of fields they want to see and conditions they want to add from the dictionary above, and use the dictionary values to concatenate the SQL string necessary to return their results.
(I'm skipping quite a bit of validation work around making sure the user doesn't mistype the condition and such, but essentially the point is that for a small collection of tables you can create a static "from" statement and then safely tack on the concatenated "select" and "where" that the user builds.)
Note that I've worked on some systems that actually stored the relationships of the table and compiled the most efficient "from" statement possible... that is not a huge stretch from this answer, it's just a bit more work.
I strongly recommend going with an existing product like Crystal Reports, SQL Server Report Builder, or InfoMaker. It's just so easy to get something that seems to work but leaves you open to a SQL injection attack.
If you do go ahead, I recommend using a separate SQL connection for these reports. This connection should have a user account that only has read privileges anywhere in the database.
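Setting up such a read-only account might look roughly like this (the login name and password are placeholders, and it assumes SQL Server authentication):
-- Login plus a database user that is a member of the built-in db_datareader role
CREATE LOGIN ReportReader WITH PASSWORD = 'ReplaceWithAStrongPassword!1';
GO
CREATE USER ReportReader FOR LOGIN ReportReader;
ALTER ROLE db_datareader ADD MEMBER ReportReader;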
Thanks for the answers! We ended up doing this ourselves through a collection of views!
For instance:
Sales View
Customer View
The views already take care of most of the joining between tables and return as much joined data as they can.
The user then selects which columns they would like to see from each view, and we do the join between the views at the code level.
The resulting statement is very small as the views take most of the work out of it.
People suggest that creating database tables dynamically (or at run time) should be avoided, saying that it is bad practice and will be hard to maintain.
I don't see the reason why, and I don't see the difference between creating a table and any other SQL query/statement such as SELECT or INSERT. I have written apps that create, delete and modify databases and tables at run time, and so far I have not seen any performance issues.
Can anyone explain the cons of creating databases and tables at run time?
Tables are much more complex entities than rows, and managing table creation is much more complex than an insert, which has to abide by an existing model, the table. True, a table create statement is a standard SQL operation, but depending on creating them dynamically smacks of a bad design decision.
Now, if you just create one or two and that's it, or an entire database dynamically, or from a script once, that might be OK. But if you depend on creating more and more tables to handle your data, you will also need to join more and more and query more and more. One very serious issue I encountered with an app that made use of dynamic table creation is that a single SQL Server query can only involve 255 tables. It's a built-in constraint. (And that's SQL Server, not CE.) It only took a few weeks in production for this limit to be reached, resulting in a nonfunctioning application.
And if you get into editing the tables, e.g. adding/dropping columns, then your maintenance headache gets even worse. There's also the matter of binding your db data to your app's logic. Another issue is upgrading production databases. This would really be a challenge if a db had been growing with objects dynamically and you suddenly needed to update the model.
When you need to store data in such a dynamic manner the standard practice is to make use of EAV models. You have fixed tables and your data is added dynamically as rows so your schema does not have to change. There are drawbacks of course but it's generally thought of as better practice.
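A bare-bones EAV layout, just to illustrate the idea (all names are made up):
-- Entities, attributes, and values are stored as rows, so the schema never changes
CREATE TABLE dbo.Entity (EntityId int IDENTITY PRIMARY KEY, EntityType varchar(50));
CREATE TABLE dbo.Attribute (AttributeId int IDENTITY PRIMARY KEY, Name varchar(100));
CREATE TABLE dbo.EntityValue
(
    EntityId int NOT NULL,
    AttributeId int NOT NULL,
    Value nvarchar(4000),
    PRIMARY KEY (EntityId, AttributeId)
);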
KMC,
Remember the following points:
What if you want to add or remove a column? You may need to change the code and compile it again.
What if the database location changes?
Developers who are not very good at databases can make changes; if you create the schema at the back end, DBAs can take care of it.
If you get any performance issues, it may be tough to debug.
You will need to be a little clearer about what you mean by "creating tables".
One reason to not allow the application to control table creation and deletion is that this is a task that should be handled only by an administrator. You don't want normal users to have the ability to delete whole tables.
Temporary tables are a different story, and you may need to create temporary tables as part of your queries, but your basic database structure should be managed only by someone with the rights to do so.
Sometimes creating tables dynamically is not the best option security-wise (Google "SQL injection"), and it would be better to use stored procedures and have your insert or update operations occur at the database level by executing the stored procedures in code.
I have two databases, one is an MS Access file, the other is a SQL Server database. I need to create a SELECT command that filters data from the SQL Server database based on the data in the Access database. What is the best way to accomplish this with ADO.NET?
Can I pull the required data from each database into two new tables, put these in a single DataSet, and then perform another SELECT command on the DataSet to combine the data?
Additional Information:
The Access database is not permanent. The Access file to use is set at runtime by the user.
Here's a bit of background information to explain why there are two databases. My company uses a CAD program to design buildings. The program stores materials used in the CAD model in an Access database. There is one file for each model. I am writing a program that will generate costing information for each model. This is based on current material prices stored in a SQL Server database.
My Solution
I ended up just importing the data from the Access DB into a temporary table in the SQL Server DB, performing all the necessary processing, and then removing the temporary table. It wasn't a pretty solution, but it worked.
You don't want to pull both datasets across if you don't have to do that. You are also going to have trouble implementing Tomalak's solution since the file location may change and might not even be readily available to the server itself.
My guess is that your users set up an Access database with the people/products or whatever that they are interested in working with and that's why you need to select across the two databases. If that's the case, the Access table is probably smaller than the SQL Server table(s). Your best bet is to pull in the Access data, then use that to generate a filtered query to SQL Server so that you can minimize the data that is sent over the network.
So, the most important things are:
Filter the data ON THE SERVER so that you can minimize network traffic and also because the database is going to be faster at filtering than ADO.NET
If you have to choose a dataset to pull into your application, pull in the smaller dataset and then use that to filter the other table.
Assuming Sql Server can get to the Access databases, you could construct an OPENROWSET query across them.
SELECT a.*
FROM SqlTable a
JOIN OPENROWSET(
'Microsoft.Jet.OLEDB.4.0',
'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';'admin';'',
Orders
) as b ON
a.Id = b.Id
You would just change the path to the Access database at runtime to get to different MDBs.
First you need to do something on the server - reference the Access DB as a "Linked Server".
Then you will be able to query it from within the SQL server, pulling out or stuffing in data however you like. This web page gives a nice overview on how to do it.
http://blogs.meetandplay.com/WTilton/archive/2005/04/22/318.aspx
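For reference, registering an Access MDB as a linked server typically looks something like this (the linked server name and file path are placeholders; see the article above for the full walkthrough):
-- Register the Access file via the Jet provider, then query it with a four-part name
-- (catalog and schema are omitted for Jet linked servers, hence the '...')
EXEC sp_addlinkedserver
    @server = 'AccessMaterials',
    @srvproduct = 'Access',
    @provider = 'Microsoft.Jet.OLEDB.4.0',
    @datasrc = 'C:\Models\Building1.mdb';
GO
SELECT * FROM AccessMaterials...Materials;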
If I read the question correctly, you are NOT attempting to cross reference across multiple databases.
You need merely to reference details about a particular FILE, which in this case, could contain:
primary key, parent file checksum (if it is a modification), file checksum, last known author, revision number, date of last change...
And then use that primary key when adding information obtained from analysing that file with your program.
If you actually do need a distributed database, perhaps you would prefer to use a non-relational database such as LDAP.
If you can't use LDAP but must use a relational database, you might consider using GUIDs to ensure that your primary keys are good.
Since you don't give enough information, I'm going to have to make some assumptions.
Assuming:
The SQL Server and the Access Database are not on the same computer
The SQL Server cannot see the Access database over a file share, or it would be too difficult to achieve this.
You don't need to do joins between the Access database and the SQL Server; you only need to use data from the Access database as lookup elements in your WHERE clause.
If the above assumptions are correct, then you can simply use ADO to open the Access database and retrieve the data you need, possibly in a dataset or datatable. Then extract the data you need and feed it to a different ADO query to your SQL Server in a dynamic Where clause, prepared statement, or via parameters to a stored procedure.
The other solutions people are giving all assume you need to do joins on your data or otherwise execute SQL which includes both databases. To do that, you have to use linked databases, or else import the data into a table (perhaps temporary).
Have you tried benchmarking what happens if you link from the Access front end to your SQL Server via ODBC and write your SQL as though both tables are local? You could then do a trace on the server to see exactly what Jet sends to the server. You might be surprised at how efficient Jet is with this kind of thing. If you're linking on a key field (e.g., an ID field, whether from the SQL Server or not), it would likely be the case that Jet would send a list of the IDs. Or you could write your SQL to do it that way (using IN (SELECT ...) in your WHERE clause).
Basically, how efficient things will be depends on where your WHERE clause is going to be executed. If, for instance, you are joining a local Jet table with a linked SQL Server table on a single field, and filtering the results based on values in the local table, it's very likely to be extremely efficient, in that the only thing Jet will send to the server is whatever is necessary to filter the SQL Server table.
Again, though, it's going to depend entirely on exactly what you're trying to do (i.e., which fields you're filtering on). But give Jet a chance to see if it is smart, as opposed to assuming off the bat that Jet will screw it up. It may very well require some tweaking to get Jet to work efficiently, but if you can keep all your logic client-side, you're better off than trying to muck around with tracking all the Access databases from the server.
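For instance, the kind of Jet SQL being described might look roughly like this, where dbo_Prices stands for the linked SQL Server table and LocalMaterials is a local Access table (both names are made up):
-- The idea is that Jet can satisfy this by sending only the list of matching IDs to the server
SELECT p.MaterialID, p.UnitPrice
FROM dbo_Prices AS p
WHERE p.MaterialID IN (SELECT m.MaterialID FROM LocalMaterials AS m);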