I've been developing some small database applications in Visual Studio C# for a while now. I am currently using VS 2010. Until recently all the apps ran on the same computer that the database was stored on, and everything ran great. Recently I had to start developing some apps that will run on a separate computer on the same local network.
Easy enough, but I run into a problem when running queries to fill controls, such as a grid or even combo box. The problem is that it can take 15-30 seconds per control if my query is pulling a large amount of data. I know this is because the app is sending out my select query, waiting for all of the results to come across the network and then displaying the information. The problem is I don't know what to do about it.
Below I have a code snippet (slightly modified to make more sense). It is using a Firebird database, though I use MSSQL and Sybase Advantage as well, with the same results.
FbConnection fdbConnect = new FbConnection();
fdbConnect.ConnectionString = Program.ConnectionString;
fdbConnect.Open();
FbCommand fcmdQuery = new FbCommand();
fcmdQuery.Connection = fdbConnect;
fcmdQuery.CommandText = "select dadda.name, yadda.address, yadda.phone1 from SOMETABLE dadda left join yadda on yadda.pk = dadda.yaddapk";
FbDataAdapter fdaDataSet = new FbDataAdapter(fcmdQuery);
DataSet dsReturn = new DataSet();
fdaDataSet.Fill(dsReturn);
fdbConnect.Close();
DataGridView1.DataSource = dsReturn.Tables[0];
Does anyone have any suggestions on how I can speed this up?
You may be returning unnecessary data in that SELECT statement. It can be wasteful in network traffic and drive down the performance of your application. There are many articles about this and about how you should specify your columns explicitly. Here is one in particular.
You can reduce the volume of the response by restricting your columns:
Instead of
select * from SOMETABLE
Try
select a,b,c from SOMETABLE
to retrieve only the data you need.
Your mileage may vary depending on what the table contains. If there are unused BLOB columns, for instance, you are adding considerable overhead to your response.
If you are displaying the data in a grid view and the data set is huge, it is better to do server-side paging so that only a specific number of rows is returned at a time, as sketched below.
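For illustration, here is a minimal server-side paging sketch. It assumes SQL Server 2012+ syntax (Firebird has FIRST/SKIP and ROWS equivalents), and the table, columns and page values are placeholders:
// Hedged sketch: fetch one page of rows at a time instead of the whole table.
// Table/column names and page values are placeholders.
int pageSize = 100;
int pageIndex = 0; // zero-based page chosen by the UI
using (var conn = new SqlConnection(Program.ConnectionString))
using (var cmd = new SqlCommand(
    @"SELECT name, address, phone1
      FROM SOMETABLE
      ORDER BY name
      OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY", conn))
{
    cmd.Parameters.AddWithValue("@offset", pageIndex * pageSize);
    cmd.Parameters.AddWithValue("@pageSize", pageSize);
    var page = new DataTable();
    using (var da = new SqlDataAdapter(cmd))
    {
        da.Fill(page); // Fill opens and closes the connection itself
    }
    DataGridView1.DataSource = page;
}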
I expected to be able to include multiple SELECT statements, each separated by a semicolon, in my query, and get a DataSet returned with the same number of DataTables as individual SELECT statements.
I am starting to think that the only way that this can be done is to create a stored procedure with multiple refcursor output parameters.
string sql = @"SELECT
R.DERVN_RULE_NUM
,P.DERVN_PARAM_INPT_IND
,R.DERVN_PARAM_NM
,R.DERVN_PARAM_VAL_DESC
,P.DERVN_PARAM_SPOT_NUM
,R.DERVN_PARAM_VAL_TXT
FROM
FDS_BASE.DERVN_RULE R
INNER JOIN FDS_BASE.DERVN_PARAM P
ON R.DERVN_TY_CD = P.DERVN_TY_CD
AND R.DERVN_PARAM_NM = P.DERVN_PARAM_NM
WHERE
R.DERVN_TY_CD = :DERVN_TY_CD
ORDER BY
R.DERVN_RULE_NUM
,P.DERVN_PARAM_INPT_IND DESC
, P.DERVN_PARAM_SPOT_NUM";
var dataSet = new DataSet();
using (OracleConnection oracleConnection = new OracleConnection(connectionString))
{
oracleConnection.Open();
var oracleCommand = new OracleCommand(sql, oracleConnection)
{
CommandType = CommandType.Text
};
oracleCommand.Parameters.Add(":DERVN_TY_CD", derivationType);
var oracleDataAdapter = new OracleDataAdapter(oracleCommand);
oracleDataAdapter.Fill(dataSet);
}
I tried to apply what I read here:
https://www.intertech.com/Blog/executing-sql-scripts-with-oracle-odp/
including changing my SQL to enclose it in a BEGIN END BLOCK in this form:
string sql = @"BEGIN
SELECT 1 FROM DUAL;
SELECT 2 FROM DUAL;
END";
and replacing my end-of-line characters
sql = sql.Replace("\r\n", "\n");
but nothing works.
Is this even possible with ODP without using a stored procedure, or must I make a separate trip to the server for each query?
The simplest way to return multiple query results from a single statement is with the CURSOR SQL function. For example:
select
cursor(select * from all_tables) tables,
cursor(select * from all_objects) objects
from dual;
(However, I am not a C# programmer, so I don't know if this solution will work for you. Let me know if the code doesn't work - there's probably another solution using anonymous blocks and OUT parameters.)
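If the nested cursors turn out to be awkward to consume from C#, a sketch of the anonymous-block-with-OUT-ref-cursors alternative mentioned above might look like this. It assumes the ODP.NET provider; the cursor names and queries are placeholders, and each opened cursor should come back as its own DataTable when the adapter fills the DataSet:
// Hedged sketch: one round trip, two ref cursors, two DataTables.
var dataSet = new DataSet();
using (var oracleConnection = new OracleConnection(connectionString))
using (var oracleCommand = new OracleCommand(
    @"BEGIN
        OPEN :rc1 FOR SELECT 1 AS n FROM DUAL;
        OPEN :rc2 FOR SELECT 2 AS n FROM DUAL;
      END;", oracleConnection))
{
    oracleCommand.Parameters.Add("rc1", OracleDbType.RefCursor, ParameterDirection.Output);
    oracleCommand.Parameters.Add("rc2", OracleDbType.RefCursor, ParameterDirection.Output);
    oracleConnection.Open();
    using (var oracleDataAdapter = new OracleDataAdapter(oracleCommand))
    {
        oracleDataAdapter.Fill(dataSet); // dataSet.Tables[0] and [1], one per cursor
    }
}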
must I make a separate trip to the server for each query?
The way this is asked makes it seem like there's considerable effort or waste of resources going on somewhere that can be saved or reduced, as though making a database query were the equivalent of walking to the shops to get milk, coming back, then walking to the shops again to get bread and coming back.
There isn't any appreciable saving to be had; if this was going to the shops, db querying is like being able to clone yourself X times, the X of you all going to the shops, and coming back at different times - some of you found your small things instantly and sprint back with them, some of you found the massive things instantly and stagger back with them, some of you took ages to find your things etc. (These are metaphors for the speed of query execution and the time required to download large vs small result sets).
If you have two queries that take ten seconds each to run, you can set them going in parallel and have your results ready and retrieved to the client in 10+x seconds (x being the time required to drag the data over the network), or you could execute them in series and have it be 20+x.
If you think about it, putting two queries in one statement is really the same thing as submitting two statements for execution over different connections. The set of steps the db must take, and the set of steps the client must do to read, are the same. Writing a sproc to handle it is more effort, more complexity to maintain and more places for code to live. Even writing a block to do it is more. None of it saves anything. Even the bytes in the headers of the TCP packets, minutiae as they are, are offset by more complex multi-line blocks. If one query takes considerably longer than the other, you might even be hamstrung into having to wait for them all to finish before you can get the results.
Write your "query statement x with y parameters and return resultset Z" as async, start two of them and use Task.WhenAll to wait for them to finish; if you can handle it, don't do a WhenAll but instead read and use the results as they finish - that's a saving, if the process can logically proceed before all queries deliver (see the sketch below).
I get that you're thinking "surely I should just walk to the shops and carry milk and bread back with me - that's more efficient than going twice" but it's a faulty perspective when you consider that the shop is nanoseconds away because you run at the speed of light, you have multiple private unobstructed paths to it and the bigger spend of time is finding the items you want and loading them into sufficient chained-together carts/dragging them all home. With a cloning approach, if the milk is right there, one of you can take it home and spend 10 minutes making the béchamel with it while the other of you is still waiting 10 minutes for the shop to bake the bread that you'll eat directly when you get home - you can still eat in 10 minutes if you maintain the parallelism, and launching separate operations is not only simpler but it keeps you in easy control of that
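A minimal sketch of that parallel approach, assuming SQL Server, placeholder connection string and queries, and a caller that is already async:
// Hedged sketch: run two independent queries concurrently and await both.
async Task<DataTable> QueryAsync(string connectionString, string sql)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        await conn.OpenAsync();
        var table = new DataTable();
        using (var reader = await cmd.ExecuteReaderAsync())
        {
            table.Load(reader); // pull the rows as they stream in
        }
        return table;
    }
}
// Both queries are in flight at the same time; the total wait is roughly the slower one.
var customersTask = QueryAsync(connectionString, "SELECT * FROM Customers");
var ordersTask    = QueryAsync(connectionString, "SELECT * FROM Orders");
await Task.WhenAll(customersTask, ordersTask);
DataTable customers = customersTask.Result;
DataTable orders    = ordersTask.Result;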
I'm querying Caché for a list of tables in two schemas and looping through those tables to obtain a count on the tables. However, this is incredibly slow. For instance, 13 million records took 8 hours to return results. When I query an Oracle database with 13 million records (on the same network), it takes 1.1 seconds to return results.
I'm using a BackgroundWorker to carry out the work apart from the UI (Windows Form).
Here's the code I'm using with the Caché ODBC driver:
using (OdbcConnection odbcCon = new OdbcConnection(strConnection))
{
try
{
odbcCon.Open();
OdbcCommand odbcCmd = new OdbcCommand();
foreach (var item in lstSchema)
{
odbcCmd.CommandText = "SELECT Count(*) FROM " + item;
odbcCmd.Connection = odbcCon;
AppendTextBox(item + " Count = " + Convert.ToInt32(odbcCmd.ExecuteScalar()) + "\r\n");
int intPercentComplete = (int)((float)(lstSchema.IndexOf(item) + 1) / (float)intTotalTables * 100);
worker.ReportProgress(intPercentComplete);
ModifyLabel(" (" + (lstSchema.IndexOf(item) + 1) + " out of " + intTotalTables + " processed)");
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
return;
}
}
Is the driver the issue?
Thanks.
I suppose the devil is in the details. Your code does
SELECT COUNT(*) FROM Table
If the table has no indices then I wouldn't be surprised that it is slower than you expect. If the table has indices, especially bitmap indices, I would expect this to be on par with Oracle.
The other thing to consider is to understand how Cache is configured, i.e. what the global buffers are and what the performance of the disk looks like.
Intersystems cache is slower for querying than any SQL database I have used, especially when you deal with large databases. Now add an ODBC overhead to the picture and you will achieve even worse performance.
Some level of performance can be achieved through use of bitmap indexes, but often the only way to get good performance is to create more data.
You might even find that you can allocate more memory for the database (but that never seemed to do much for me)
For example, every time you add new data, force the database to increment a number somewhere for your count (or even multiple entries for the purpose of grouping). Then you can have performance at a reasonable level.
I wrote a little Intersystems performance test post on my blog...
http://tesmond.blogspot.co.uk/2013/09/intersystems-cache-performance-woe-is-me.html
Cache has a built-in (smart) function that determines how best to execute queries. Of course having indexes, especially bitmapped ones, will drastically help query times. Still, a mere 13 million rows should take seconds at most. How much data is in each row? We have 260 million rows in many tables and 790 million rows in others, and we can mow through the whole thing in a couple of minutes. A non-indexed, complex query may take a day, though that is understandable.
Take a look at what's locking your globals. We have also discovered that queries apparently keep running even if the client is disconnected. You can kill the task with the Management Portal, but the system doesn't seem to like doing more than one ODBC query at once with larger queries, because it takes gigs of temp data to do such a query. We use DBVisualizer for a JDBC connection.
Someone mentioned TuneTable; that's great to run if your table changes a lot, or at least a couple of times in the table's life. It is NOT something that you want to overuse. http://docs.intersystems.com/ens20151/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing is where you can find some documentation and other useful information about this and about improving performance. If it's not fast then someone broke it.
Someone also mentioned that select count() will count an index instead of the table itself with computed properties. This is related to that decision engine that compiles your sql queries and decides what the most efficient method is to get your data. There is a tool in the portal that will show you how long it takes and will show you the other methods (that the smart interpreter [I forget what it's called]) that are available. You can see the Query Plan at the same page that you can execute SQL in the browser mentioned below. /csp/sys/exp/UtilSqlQueryShowPlan.csp
RE: I can't run this query from within the Management Portal because the tables are only made available from within an application and/or ODBC.
That isn't actually true. Within the Management Portal, go to System Explorer, SQL, then Execute SQL Statements. Please note that you must have adequate privileges to see this; %ALL will allow access to everything. Also, you can run SQL queries natively in TERMINAL by executing do $system.SQL.Shell() and then typing your queries. This interface should be faster than ODBC, as I think it uses object access. Also, keep in mind that embedded SQL and object access of data are the fastest ways to access data.
Please let me know if you have any more questions!
I am sorry to ask this question, as it has been asked many times, but I still have not found the best answer.
I am worried that my application takes a long time to download or filter the records. Assume I have a table called tbl_customer, and tbl_customer holds more than 10,000 rows.
The first question: I am using a DataGridView to display the records. Would it be ideal to download all the records, up to 10,000 rows, into the DataGridView? Or had I better put a limit on the number of rows?
Second question: what is the best way to filter records in tbl_customer? Do we just need to query using SQL, or use LINQ, or maybe there is a better way?
For now, I only use this way:
DataTable dtCustomer = new DataTable();
using (SqlConnection conn = new SqlConnection(cs.connString))
{
string query = "SELECT customerName,customerAddress FROM tbl_customer WHERE customerAddress = '"+addressValue+"' ORDER BY customerName ASC;";
using (SqlDataAdapter adap = new SqlDataAdapter(query, conn))
{
adap.Fill(dtCustomer);
}
}
dgvListCustomer.DataSource = dtCustomer;
Then I learn about LINQ so i do like this
DataTable dtCustomer = new DataTable();
using (SqlConnection conn = new SqlConnection(cs.connString))
{
string query = "SELECT * FROM tbl_customer ORDER BY customerName ASC;";
using (SqlDataAdapter adap = new SqlDataAdapter(query, conn))
{
adap.Fill(dtCustomer);
}
}
var resultCustomer = from row in dtCustomer.AsEnumerable()
where row.Field<string>("customerAddress") == addressValue
select new
{
customerName = row["customerName"].ToString(),
customerAddress = row["customerAddress"].ToString(),
};
dgvListCustomer.DataSource = resultCustomer;
Is the workflow SQL > DataTable > LINQ > DataGridView suitable for filtering records? Better suggestions are most welcome.
Thank you. :)
I am worried that my application takes a long time to download or filter the records.
Welcome - you seem to live in a world like mine where performance is measured in milliseconds, and yes, on a low-power server it will likely take more than a millisecond (0.001 seconds) to hot-load and filter 10,000 rows.
As such, my advice is not to put that database on a tablet or mobile phone but to use at least a decent desktop-level computer or VM for the database server.
As a hint: I regularly make queries on a billion-row table and it is fast. Anything below a million rows is a joke these days - in fact it was nothing worth mentioning when I started with databases more than 15 years ago. You are the guy asking whether it is better to have a Ferrari or a Porsche because you are concerned whether either of those cars goes faster than 20 km/h.
Would it be ideal to download all the records, up to 10,000 rows, into the DataGridView?
In order to get fired? Yes. Old rule with databases: never load more data than you have to, especially when you have no clue. Forget the SQL side - you will get UI problems with 10,000 rows and more, especially usability issues.
Do we just need to query using SQL, or use LINQ?
Hint: LINQ is also using SQL under the hood. The question is more: how much time do you want to spend writing boring, repetitive code for handwritten SQL like in your examples? Especially given that you also do "smart" things like referencing fields by name, not ordinal, and asking for "select *" instead of a field list - both obvious beginner mistakes.
What you should definitely not do - but you do - is use a DataTable. Get a decent book about programming databases. RTFM may help, including for LINQ (which I am not sure what you mean by: LINQ is a language feature for the compiler, and you need an implementor, which could be NHibernate, Entity Framework, Linq2Sql or BLToolkit, to name just a few that go from a LINQ query to a SQL statement).
Is the workflow SQL > DataTable > LINQ > DataGridView suitable for filtering records?
A Ferrari is also suitable for transporting 20 tons of coal from A to B - it is just the worst possible car for the job. Your stack is likely the worst I have seen, but it is suitable in the sense that you CAN do it - slowly, with lots of memory use, but you will get a result and hopefully get fired. You pull the data from a high-performance database into a DataTable, then use a non-integrating technology (LINQ) to filter it (not using the indices in the DataTable) to go into yet another layer.
Just to give you an idea - this would get you removed from quite some "beginning programming" courses.
What about: LINQ. Point. Pulls a collection of business objects that go to the UI. Period.
Read at least some of the sample code for the technologies you use.
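As a rough illustration of what "pulls a collection of business objects that go to the UI" means, here is a minimal sketch assuming an Entity Framework-style LINQ provider and a hypothetical Customer entity mapped to tbl_customer (the context and property names are placeholders):
// Hedged sketch: the LINQ provider translates this into SQL, so filtering and
// ordering happen on the server and only the needed rows come back.
// ShopContext and Customer are hypothetical EF types.
using (var db = new ShopContext())
{
    var customers = db.Customers
        .Where(c => c.CustomerAddress == addressValue)  // server-side filter
        .OrderBy(c => c.CustomerName)
        .Select(c => new { c.CustomerName, c.CustomerAddress })
        .Take(100)                                      // never load more than you need
        .ToList();
    dgvListCustomer.DataSource = customers;
}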
I'm creating a Windows application in which I need to get data from one table using ADO.NET (or any other way in C#, if there is one). The database table apparently has around 100,000 records and it takes forever to download.
Is there any faster way I could get the data?
I tried the DataReader but it still isn't fast enough.
The data-reader API is about the most direct thing you can do. The important thing is: where does the time go?
is it bandwidth in transferring the data?
or is it in the fundamental query?
You can find out by running the query locally on the machine, and see how long it takes. If bandwidth is your limit, then all you can really try is removing columns you don't actually need (don't do select *). Or pay for a fatter pipe between you and the server. In some cases, querying the data locally, and returning it in some compressed form might help - but then you're really talking about something like a web-service, which has other bandwidth considerations.
More likely, though, the problem is the query itself. Often, things like the following make the biggest difference:
writing sensible tsql
adding an appropriate index
avoiding cursors, complex processing, etc.
You might want to implement a need-to-know-basis method: only pull down the first chunk of data that is needed, and then, when the next set is needed, pull those rows.
It's probably your query that is slow, not the streaming process. You should show us your SQL query; then we could help you improve it.
Assuming you want to get all 100,000 records from your table, you could use a SqlDataAdapter to fill a DataTable or a SqlDataReader to fill a List<YourCustomClass>.
The DataTable approach (since I don't know your fields, it's difficult to show a class), with a reader-based sketch after it:
var table = new DataTable();
const string sql = "SELECT * FROM dbo.YourTable ORDER BY SomeColumn";
using(var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
using(var da = new SqlDataAdapter(sql, con))
{
da.Fill(table);
}
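And a sketch of the SqlDataReader-into-a-list variant mentioned above, assuming a hypothetical YourCustomClass and placeholder column names:
// Hedged sketch: stream rows with a SqlDataReader into a strongly typed list.
// YourCustomClass and the column names are placeholders.
public class YourCustomClass
{
    public int Id { get; set; }
    public string SomeColumn { get; set; }
}
var items = new List<YourCustomClass>();
const string sql = "SELECT Id, SomeColumn FROM dbo.YourTable ORDER BY SomeColumn";
using (var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
using (var cmd = new SqlCommand(sql, con))
{
    con.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            items.Add(new YourCustomClass
            {
                Id = reader.GetInt32(0),
                SomeColumn = reader.GetString(1)
            });
        }
    }
}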
I'm using sync framework to synchronize sql server 2008 database with sqlCE on mobile device. Everything looks fine besides some problems. One of them is:
If I want to sync 1,000 or more rows, I get an OutOfMemory exception on the mobile device (though the sync completes well, because afterwards I check the data of some rows and it looks synced). I thought that maybe the XML travelling between the mobile device and the server is too large (for 100 rows everything works just fine)... That's why I asked about how to split the sent data. But maybe I'm wrong. I didn't find any resources on this, so I don't know exactly WHAT can eat so much memory just to add 60 KB to the compact database.
You'll need to implement some sort of batching.
A quite naive version of it is shown here:
http://msdn.microsoft.com/en-us/library/bb902828.aspx.
I've seen that you're interested in some filtering. If this will filter out some, or rather a lot, of rows, I would recommend writing your own batch logic. The one we're currently using sets @sync_new_received_anchor to the anchor of the @sync_batch_size:th row to be synced.
In a quite simplified way the logic looks like this:
SELECT @sync_new_received_anchor = MAX(ThisBatch.ChangeVersion)
FROM (SELECT TOP (@sync_batch_size) CT.SYS_CHANGE_VERSION AS ChangeVersion
      FROM TabletoSync
      INNER JOIN CHANGETABLE(CHANGES [TabletoSync],
                             @sync_last_received_anchor) AS CT
          ON TabletoSync.TabletoSyncID = CT.TabletoSyncID
      WHERE TabletoSync.FilterColumn = @ToClient
      ORDER BY CT.SYS_CHANGE_VERSION ASC) AS ThisBatch