I'm writing a C# application that should run an Oracle-Select query and perform some calculations for each line.
The select query is very big and takes a long time.
In the current application design, I have to wait until the query finishes retrieving all the data from the database before starting the required computations on each row.
I was wondering if there is a way to get the first query results as the database engine finds them.
That is: instead of waiting for the database engine to find all the rows that match my query and return them all at once, start receiving results from the first row found.
That way, the computation required for each row can begin as soon as the first row is found, and the total run time will be shorter.
The idea here is not about speeding up the Oracle query or adding an index. It's about overlapping the computation with the data retrieval.
Sorry if it's a dumb question, and thank you in advance.
I'm using Oracle 11g, and the query may be as simple as the following (but returns hundreds of thousands of rows):
Select * from Table Where Condition1;
I ran the explain plan for my query:
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 122K| 656K (1)| 00:07:40 |
|* 1 | TABLE ACCESS FULL| TABLE1 | 251 | 122K| 656K (1)| 00:07:40 |
-----------------------------------------------------------------------------
Oracle has an all rows strategy and a first rows strategy.
Usually, Oracle will do a first rows strategy when possible. The simplest example of that would be something like:
select * from emp;
Here, there is no join and no sorting, so Oracle will begin to return rows immediately as it reads through the EMP table.
On the other hand, this is a simple example of an all rows strategy:
select * from emp order by surname;
Here, we're asking for a sort on SURNAME, so we cannot begin returning results immediately. The table must be read in its entirety and then sorted before we can return the first row.
There are other factors as well. If you're joining tables, a NESTED LOOPS join will execute with a first rows strategy, whereas a HASH JOIN will (necessarily) employ an all rows strategy.
Ultimately, which is better, which you will want, is going to be dependent on your application. If you're doing stuff that the user directly interacts with, you'll probably want first rows, to not keep the user waiting. For batch jobs, all rows is (probably) better.
Finally, the optimizer can be influenced with the ALL_ROWS and FIRST_ROWS_n hints.
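On the C# side, consuming the rows as they arrive is what a data reader already does. Here is a minimal sketch, assuming the ODP.NET managed driver (Oracle.ManagedDataAccess.Client); the connection string is a placeholder, and the query is the one from the question:

using System;
using Oracle.ManagedDataAccess.Client;

class StreamingQuery
{
    static void Main()
    {
        using (var conn = new OracleConnection("User Id=...;Password=...;Data Source=..."))
        using (var cmd = new OracleCommand("SELECT * FROM Table1 WHERE Condition1", conn))
        {
            conn.Open();
            using (OracleDataReader reader = cmd.ExecuteReader())
            {
                // Read() advances one row at a time; rows are fetched from the
                // server in batches, so the per-row computation overlaps the
                // retrieval of the remaining rows.
                while (reader.Read())
                {
                    Console.WriteLine(reader.GetValue(0)); // placeholder for the per-row calculation
                }
            }
        }
    }
}

Note that this only overlaps computation with retrieval when the plan is a first rows one; with a blocking plan (a sort, a hash join), the reader will stall until the database can produce the first row.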
Related
I have a part of a program which has to display data from multiple tables, based on different queries. So far it looks like this (keep in mind that every subsequent SELECT is based on something we got from A):
SELECT * FROM TABLE A WHERE ID = ...
SELECT [8 fields] FROM TABLE B WHERE ...
SELECT [5 fields] FROM TABLE C WHERE ...
SELECT [1 field] FROM TABLE D WHERE ...
SELECT [1 field] FROM TABLE E WHERE ...
SELECT [1 field] FROM TABLE F WHERE ...
SELECT [1 field] FROM TABLE G WHERE ...
SELECT [1 field] FROM TABLE H WHERE ...
SELECT [2 fields] FROM TABLE I WHERE ...
After that, I take the results and create different objects or populate different fields with them.
Thing is, between clicking the button and getting the window to show, I have a delay of about 2 seconds.
Keep in mind, this is a very big database, with millions of records. Changing the DB is out of the question, unfortunately.
I am searching only by Primary Keys, I have no way to restrict the search even more than that.
The connection is opened from the start, I don't close/reopen it after each statement.
Joining just Table A and Table B takes a lot longer than two separate selects, up to 1.5 seconds, while running the selects sequentially goes down to about 300 ms.
I still find it to be quite a long time, given that the first query executes in around 53 ms in the DBMS.
I am using the ODBC driver in C#, .NET Framework 4. The database itself is DB2; however, using the DB2 native driver has given us a plethora of problems, and IBM has been less than helpful about it.
Also, whenever I select only a few fields, I create the needed object using only those and leave the rest at their defaults.
Is there any way I could improve this?
Thank you in advance,
Andrei
Edit: The diagnostic tool says something along the lines of:
--Two queries in another part of the program, we can ignore these, as they are not usually there-- 0.31 s
First query - 0.75 s
Second query - 0.87s
Third query - 0.95s
Fourth query - 0.99s
Fifth query - 1.00s
Sixth query - 1.04s
Seventh query - 1.08s
Eighth query - 1.10s
Ninth query - 1.12s
Program output - 1.81s
There is overhead to constructing query strings and executing them. When running multiple similar queries, you want to be sure that they are compiled once and then the execution plan is re-used.
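As a minimal sketch of that with System.Data.Odbc (the DSN, query, and id source are placeholders), you can prepare the command once and re-execute it with new parameter values:

using System.Data.Odbc;

static void LoadDetails(int[] idsFromTableA)
{
    using (var conn = new OdbcConnection("DSN=mydb"))
    {
        conn.Open();
        using (var cmd = new OdbcCommand("SELECT F1, F2 FROM TABLEB WHERE ID = ?", conn))
        {
            cmd.Parameters.Add("@id", OdbcType.Int);
            cmd.Prepare(); // compiled once; the plan is reused on each execution

            foreach (int id in idsFromTableA)
            {
                cmd.Parameters["@id"].Value = id;
                using (OdbcDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // consume the fields
                    }
                }
            }
        }
    }
}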
However, it is likely that the best approach is to create a single query that returns multiple columns. Naively, this would look like:
select . . .
from a
join b on . . .
join c on . . .
. . .
However, the joins might be left joins. The query might be more complex if you are joining along different dimensions that might produce Cartesian products.
The key point is that the database will optimize the single query internally. This is (generally) more efficient than constructing and running multiple different queries.
I was wondering if there is a way to qualify the table mappings when using SqlBulkCopy in C#?
Currently, I have a table that contains stock codes and then columns associated with a range of weekly buckets.
example:
Stock Code | 11-2013 | 12-2013 | 13-2013 | 14-2013 | etc.
I have a query that returns quantities for the given stock code and the week number in which they occurred.
example:
part a | 20 | 11-2013
part b | 10 | 14-2013
Ideally, there would be a way to use the ColumnMappings.Add method and specify that I would like to map the date column of the table to the resulting date in the returned row of the query. I would show what I have; however, I have no idea if this is even possible. Any suggestions or alternative ideas would be great.
Thanks
Not directly possible. Your source data has to match your destination data. The SqlBulkCopy class isn't going to do that for you.
Create a SQL query from your source data that matches the table schema of your destination table. Then you can use the SqlBulkCopy class.
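Alternatively, you can do the reshaping client-side. This is a hedged sketch that pivots the query results into a DataTable shaped like the destination table and then bulk-copies it; the names (WeeklyBuckets, sourceRows, connectionString) are assumptions, not anything from your schema:

using System.Data;
using System.Data.SqlClient;

// Build a DataTable shaped like the destination: StockCode | 11-2013 | 12-2013 | ...
var table = new DataTable();
table.Columns.Add("StockCode", typeof(string));
foreach (var week in new[] { "11-2013", "12-2013", "13-2013", "14-2013" })
    table.Columns.Add(week, typeof(int));

// sourceRows stands in for the (stock code, quantity, week) query results
foreach (var row in sourceRows)
{
    DataRow dest = table.NewRow();
    dest["StockCode"] = row.StockCode;
    dest[row.Week] = row.Quantity; // the week value picks the destination column
    table.Rows.Add(dest);
}

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "WeeklyBuckets"; // assumed destination table name
    foreach (DataColumn col in table.Columns)
        bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    bulk.WriteToServer(table);
}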
How can I query on the Azure Table Storage for duplicate values?
Suppose the table contains a column named 'LastName' and a few of the last names are equal to each other. How can I query on that without knowing or having the specific string that holds the last name value?
Edit
An example would be:
PartitionKey RowKey LastName
1 1 Smith
1 2 Smith
1 3 Smith
1 3 MILLER
1 3 WILLIAMS
In this case, I'd like to get all records where Smith is the last name, because they are duplicates.
As a general rule of thumb: queries that do not include the PartitionKey or RowKey will not perform very well.
I assume your LastName column is neither PartitionKey nor RowKey. In that case you only have bad options. The way Table Storage works is that entities of a partition are stored close together, so the fastest queries are those that include the PartitionKey of the entities you are looking for. Since you cannot build indexes on any other columns, all queries that do not include the RowKey will be partition scans, i.e. they will not perform well at all, because all rows of the partition must be analyzed.
In your case, if you are looking for all rows that contain duplicate values, your best bet will likely be to just query everything and look for duplicates locally.
I don't think you can create a table storage query that would return the results. As far as I know, there is no such thing as select … where count(select duplicates) > 1 – and even if so, that query would be very slow. Unless we're talking about huge amounts of data, simply querying everything and filtering locally would likely perform better.
As I said, you only have bad options. That's because Table Storage wasn't designed for queries like this. Unlike SQL tables, Table Storage tables should be designed with queries in mind, i.e. you should know how you're gonna query the table before you design it.
Your second option would be to migrate to Azure SQL, where such queries are no problem at all. Azure SQL is very different from Table Storage though, so it's questionable whether it fits your requirements.
Edit: One way you can optimize the query-everything solution would be to only return the LastNames of your entities (+ Partition/RowKey or whatever else you need). This way the amount of data that is being sent can potentially be reduced by quite a bit. Here's an article about query projection that explains this technique in detail.
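As a hedged sketch of the query-everything-with-projection approach, using the classic storage SDK (Microsoft.WindowsAzure.Storage); the connection string, table name, and partition key value are assumptions based on the example above:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class DuplicateFinder
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // placeholder
        CloudTable table = account.CreateCloudTableClient().GetTableReference("people"); // assumed table name

        // Project only the LastName property to cut down the payload per entity.
        TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>()
            .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "1"))
            .Select(new List<string> { "LastName" });

        // ExecuteQuery returns all matching entities; find duplicates locally.
        var duplicateNames = table.ExecuteQuery(query)
            .GroupBy(e => e.Properties["LastName"].StringValue)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);

        foreach (string name in duplicateNames)
            Console.WriteLine(name); // e.g. "Smith"
    }
}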
The query to fetch all records should be
PartitionKey eq 'Your PartitionKey' and LastName eq 'Smith'
unless I'm missing something.
You would also need to take the table continuation token into consideration. See this thread for more details: Copy all Rows to another Table in Azure Table Storage. As @enzi mentioned, there's no Select * from table where ... style functionality available in table storage.
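For illustration, the continuation-token loop might look like this, reusing the assumed table and query objects from the sketch above:

TableContinuationToken token = null;
do
{
    // Each call returns at most one segment (up to 1,000 entities)
    // plus a token pointing at the next segment, if any.
    TableQuerySegment<DynamicTableEntity> segment = table.ExecuteQuerySegmented(query, token);
    token = segment.ContinuationToken;

    foreach (DynamicTableEntity entity in segment.Results)
    {
        // accumulate LastName counts here
    }
} while (token != null);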
Firstly, as a disclaimer, I'm learning a little about SQL/LINQ indirectly through C#; I'm very green in both, really. I've quickly grown tired of using sample databases full of data and doing queries set up in the simplest of situations. The information is presented this way because the focus is on C# and visual programming, not the hairier parts of SQL or even LINQ for that matter.
Even so, I decided to convert a simple application I wrote using text (CSV) files for storage. The application keeps track of three possible payables where 0 - 3 records will exist for a given date. I was able to create the database and separate tables for each contract and build an application that inserts the existing records into the database using LINQ to SQL classes.
Now I have another application that is used to add entries via a contract calculator or directly through a BindingSourceNavigator and DataGridView. Each table has four columns in common - Date, GrossPay, TaxAmount, and NetPay - with Date being the primary key. I'd like to view the records by date, with TotalGross, TotalTax, TotalNet, and a column for the GrossPay of each contract table for that date.
What would be the correct approach to this - a view, a LINQ query, a separate table, or something else? A table seems the "easiest", at least in terms of my ability, but it feels like an unnecessary copying of records. I tried to "link" the tables, but no other column is guaranteed to be unique or primary.
Any suggestions would be great.
Clarification:
Three tables have the format:
| Date | GrossPay | TaxAmount | NetPay | ...each have others not in common... |
** Each table has specific data used to calculate the common columns based on contract type
I would like to view all records "grouped" by date such that each are represented like:
| Date | TotalGross | TotalTax | TotalNet | Table1Gross | Table2Gross | Table3Gross |
** "Total" columns are sums of the respective columns of all records sharing the date.
** One or two of the "Table(n)Gross" may be zero
I think you are asking whether you can select records from three different tables by date for the columns they have in common?
If so, you need to do a union.
In your case it may look something like the SQL below. Note that I have made a dummy column to denote the source of the record (which you may want), and that each SELECT in a union needs its own WHERE clause:
SELECT Date, GrossPay, TaxAmount, NetPay, 'Table1' AS Source FROM Table1 WHERE Date = '2013-05-05'
UNION
SELECT Date, GrossPay, TaxAmount, NetPay, 'Table2' AS Source FROM Table2 WHERE Date = '2013-05-05'
UNION
SELECT Date, GrossPay, TaxAmount, NetPay, 'Table3' AS Source FROM Table3 WHERE Date = '2013-05-05'
I wouldn't bother with a view, and definitely don't replicate your data with a separate table.
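Since you're already using LINQ to SQL, a rough equivalent of that union, plus the per-date grouping you described, might look like the sketch below; the context and table names (db, Table1s, Table2s, Table3s) are assumptions:

var combined =
    db.Table1s.Select(r => new { r.Date, r.GrossPay, r.TaxAmount, r.NetPay, Source = "Table1" })
      .Concat(db.Table2s.Select(r => new { r.Date, r.GrossPay, r.TaxAmount, r.NetPay, Source = "Table2" }))
      .Concat(db.Table3s.Select(r => new { r.Date, r.GrossPay, r.TaxAmount, r.NetPay, Source = "Table3" }));

// Group by date and compute the totals plus one gross column per source table.
var byDate = combined
    .GroupBy(r => r.Date)
    .Select(g => new
    {
        Date = g.Key,
        TotalGross = g.Sum(r => r.GrossPay),
        TotalTax = g.Sum(r => r.TaxAmount),
        TotalNet = g.Sum(r => r.NetPay),
        Table1Gross = g.Where(r => r.Source == "Table1").Sum(r => r.GrossPay),
        Table2Gross = g.Where(r => r.Source == "Table2").Sum(r => r.GrossPay),
        Table3Gross = g.Where(r => r.Source == "Table3").Sum(r => r.GrossPay)
    });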
I'm working on a local city project and have some questions on efficiently creating relationships between "parks" and "activities" in Microsoft SQL 2000. We are using ASP.NET (C#).
I have my two tables "Parks" and "Activities." I have also created a lookup table with the proper relationships set on the primary keys of both "Parks" and "Activities." My lookup table is called "ParksActitivies."
We have about 30 activities that we can associate with each park. An intern is going to be managing the website, and the activities will be evaluated every 6 months.
So far I have created an admin tool that allows you to add/edit/delete each park. Adding a park is simple. The data is new, so I simply allow them to edit the park details, and associate "Activities" dynamically pulled from the database. This was done in a repeater control.
Editing works, but I don't feel that it's as efficient as it could be. Saving the main park details is no problem, as I simply call Save() on the park instance that I created. However, to remove the stale records in the lookup table I simply "DELETE FROM ParksActitivies WHERE ParkID = @ParkID" and then INSERT a record for each of the checked activities.
For my ID column on the lookup table, I have an auto-incrementing integer value, which after quite a bit of testing has gotten into the thousands. While this does work, I feel there has to be a better way to update the lookup table.
Can anyone offer some insight on how I may improve this? I am currently using stored procedures, but I'm not the best at very complex statements.
[ParkID | ParkName | Latitude | Longitude ]
1 | Freemont | -116.34 | 35.32
2 | Jackson | -116.78 | 34.2
[ActivityID | ActivityName | Description ]
1 | Picnic | Blah
2 | Dancing | Blah
3 | Water Polo | Blah
[ID | ParkID | ActivityID ]
1 | 1 | 2
2 | 2 | 1
3 | 2 | 2
4 | 2 | 3
I would prefer to learn how to do it a more universal way as opposed to using Linq-To-SQL or ADO.NET.
"would prefer to learn how to do it a more universal way as opposed to using LINQ2SQL or ADO.NET."
You're obviously using ADO.NET Core :). And that's fine; I think you should stick to using stored procedures and DbCommands and such...
If you were using MSSQL 2008 you'd be able to do this using table-valued parameters and the MERGE statement. Since you're using MSSQL 2000 (why?), what you'd need to do is the following:
1. Send a comma delimited list of the Activity ids (the new ones) along with the ParkId to your stored proc. The ActivityIds parameter would be a varchar(50) for example.
In your stored proc you can split the ids. The strategy would be something like:
1. For the Ids passed in, delete records that don't match
The SQL for that would be
DELETE FROM ParkActivities
WHERE ActivityId NOT IN (Some List of Ids)
  AND ParkId = @ParkId
Since your list is a string, you can do it like this:
EXEC('DELETE FROM ParkActivities WHERE ActivityId NOT IN (' + @ActivityIds + ') AND ParkId = ' + @ParkId)
2. Now you can insert those activities that are not already in the table. The simplest way to do this would be to insert the ParkActivity ids into a temp table. To do that you'll need to split the comma-delimited list into individual ids and insert them into a temp table. Once you have the data in the temp table you can insert by joining against it.
There is a built-in user-defined function in MSSQL 2000 that can do the split and return a table variable with each value on a separate row:
http://msdn.microsoft.com/en-us/library/Aa496058
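On the C# side, the call might look like this minimal sketch; the procedure name (UpdateParkActivities) and connection string are assumptions:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void SaveParkActivities(int parkId, IEnumerable<int> checkedActivityIds, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("UpdateParkActivities", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@ParkId", parkId);
        // comma-delimited list, e.g. "1,2,3", split inside the proc
        cmd.Parameters.AddWithValue("@ActivityIds", string.Join(",", checkedActivityIds));
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}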
What is wrong with LinqToSQL and ADO.NET? I mean, could you specify your doubts about using those technologies?
Update: if LinqToSQL is not supported for SQL Server 2000, you can easily upgrade to the free 2008 Express edition. It would definitely be enough for the purposes you described.