I have a DataTable with a DateTime stored as a string like "20.12.2017".
I want to select all rows within the last 6 month.
I can do this with a foreach:
foreach (DataRow dr in dsErgebnisse.Tables[0].Rows)
{
if (Convert.ToDateTime(dr[6].ToString()) > DateTime.Now.AddMonths(-6))
{
dsTemp.Tables[0].ImportRow(dr);
}
}
This gives me 3.613 rows.
I try to do this with a select to check if it's faster:
DataRow[] foundRows = dsErgebnisse.Tables[0].Select("DATUM > '" + DateTime.Now.AddMonths(-6).ToShortDateString() + "'");
DATUM is my column where the DateTimeis stored as a string.
This gives me 2.624 rows.
Why is there a difference?
I tried to use convert in the select statement but I fail with System.Data.EvaluateException:
foundRows = dsErgebnisse.Tables[0].Select("Convert(DATUM, 'System.DateTime') > '" + DateTime.Now.AddMonths(-6).ToShortDateString() + "'");
rule 0: stop using DataTable... just ever*
rule 1: don't store date/time values as strings; store them as DateTime or similar
rule 2: if you ignore rules 0 and 1, make sure you store them in a sortable order like "2017-12-20"
since you've broken all of those rules, most bets are off, and you'll probably have to do the filter manually by iterating over the rows, fetching the values, and doing your own conversions. LINQ via .AsEnumerable() may help; it certainly can't make it much worse :)
*=there is a tiny category of problems where DataTable is appropriate; if you know the schema of the data in advance well enough to issue a Select query: this isn't one of them
Related
I'm having a sudden and strange problem with DataTable. I'm using C# with MySQL database to develop a system, and I'm trying to export custom reports. The problem is that, somehow, my DataTable is getting only one result (I've tested my query on MySQL and should be something like 30 results on the xls file and the DataTable).
Strangely, these functions are used in other parts of the system to export other kinds of reports, and work perfectly. This is the select function that I'm using:
public DataTable selectBD(String tabela, String colunas) {
var query = "SELECT " + colunas + " FROM " + tabela;
var dt = new DataTable();
Console.WriteLine("\n\n" + query + "\n\n");
try
{
using (var command = new MySqlCommand(query, bdConn)) {
MySqlDataReader reader = command.ExecuteReader();
dt.Load(reader);
reader.Close();
}
}
catch (MySqlException) {
return null;
}
bdConn.Close();
return dt;
}
And this is my query:
SELECT
cpf_cnpj, nomeCliente, agenciaContrato, contaContrato,
regionalContrato, carteiraContrato, contratoContrato,
gcpjContrato, avalistaContrato, enderecoContrato,
telefoneContrato, dataChegadaContrato, dataFatoGerContrato,
dataPrimeiraParcelaContrato, dataEmissaoContrato, valorPlanilhaDebitoContrato
FROM
precadastro
INNER JOIN
contrato
ON precadastro.cpf_cnpj = contrato.FK_cpf_cnpj
LEFT JOIN faseprocessual
ON contrato.idContrato = faseprocessual.FK_idContrato
And that is the result of the query on SQLyog
I've tested and the DataTable returned by the function only receive the one row, and it's not the first row of the MySQL results. Someone had this kind of problem before?
DataTable load expects primary key from your data (supplied by DataReader) and tries to guess it from passed rows. Since there's no such key, Load method guesses it's the first column (cpf_cnpj). But, values in that column aren't unique so the each row gets overwritten by next one, and the result is just one row in your DataTable.
It's the issue that persist for years, and I'm not sure there's one solution to rule them all. :)
You can try:
change query so that some unique values get into first column (unfortunately, I can't see something unique in your screenshot) or concatenate two or more values to get unique value.
Prepare DataTable by yourself by creating columns (this mirroring structure of resultset) and then iterate through DataReader to copy data.
add some autoincrement value in your query (or make temporary table with auto_increment column then fill that table)
Last suggestion could be something like this (I haven't worked much with mySql, so this is some suggestion i have googled :)):
SELECT
#i:=#i+1 AS id,
cpf_cnpj, nomeCliente, agenciaContrato, contaContrato,
regionalContrato, carteiraContrato, contratoContrato,
gcpjContrato, avalistaContrato, enderecoContrato,
telefoneContrato, dataChegadaContrato, dataFatoGerContrato,
dataPrimeiraParcelaContrato, dataEmissaoContrato, valorPlanilhaDebitoContrato
FROM
precadastro
INNER JOIN
contrato
ON precadastro.cpf_cnpj = contrato.FK_cpf_cnpj
LEFT JOIN faseprocessual
ON contrato.idContrato = faseprocessual.FK_idContrato
CROSS JOIN (SELECT #i:= 0) AS i
here's answer on SO which uses auto number in query.
I have an array or string:
private static string[] dataNames = new string[] {"value1", "value2".... };
I have table in my SQL database with a column of varchar type. I want to check which values from the array of string exists in that column.
I tried this:
public static void testProducts() {
string query = "select * from my table"
var dataTable = from row in dt.AsEnumerable()
where String.Equals(row.Field<string>("columnName"), dataNames[0], StringComparison.OrdinalIgnoreCase)
select new {
Name = row.Field<string> ("columnName")
};
foreach(var oneName in dataTable){
Console.WriteLine(oneName.Name);
}
}
that code is not the actual code, I am just trying to show you the important part
That code as you see check according to dataNames[index]
It works fine, but I have to run that code 56 times because the array has 56 elements and in each time I change the index
is there a faster way please?
the Comparison is case insensitive
First, you should not filter records in memory but in the datatabase.
But if you already have a DataTable and you need to find rows where one of it's fields is in your string[], you can use Linq-To-DataTable.
For example Enumerable.Contains:
var matchingRows = dt.AsEnumerable()
.Where(row => dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase));
foreach(DataRow row in matchingRows)
Console.WriteLine(row.Field<string>("columnName"));
Here is a more efficient (but less readable) approach using Enumerable.Join:
var matchingRows = dt.AsEnumerable().Join(dataNames,
row => row.Field<string>("columnName"),
name => name,
(row, name) => row,
StringComparer.OrdinalIgnoreCase);
try to use contains should return all value that you need
var data = from row in dt.AsEnumerable()
where dataNames.Contains(row.Field<string>("columnName"))
select new
{
Name = row.Field<string>("columnName")
};
Passing a list of values is surprisingly difficult. Passing a table-valued parameter requires creating a T-SQL data type on the server. You can pass an XML document containing the parameters and decode that using SQL Server's convoluted XML syntax.
Below is a relatively simple alternative that works for up to a thousand values. The goal is to to build an in query:
select col1 from YourTable where col1 in ('val1', 'val2', ...)
In C#, you should probably use parameters:
select col1 from YourTable where col1 in (#par1, #par2, ...)
Which you can pass like:
var com = yourConnection.CreateCommand();
com.CommandText = #"select col1 from YourTable where col1 in (";
for (var i=0; i< dataNames.Length; i++)
{
var parName = string.Format("par{0}", i+1);
com.Parameters.AddWithValue(parName, dataNames[i]);
com.CommandText += parName;
if (i+1 != dataNames.Length)
com.CommandText += ", ";
}
com.CommandText += ");";
var existingValues = new List<string>();
using (var reader = com.ExecuteReader())
{
while (read.Read())
existingValues.Add(read["col1"]);
}
Given the complexity of this solution I'd go for Max' or Tim's answer. You could consider this answer if the table is very large and you can't copy it into memory.
Sorry I don't have a lot of relevant code here, but I did a similar thing quite some time ago, so I will try to explain.
Essentially I had a long list of item IDs that I needed to return to the client, which then told the server which ones it wanted loaded at any particular time. The original query passed the values as a comma separated set of strings (they were actually GUIDs). Problem was that once the number of entries hit 100, there was a noticeable lag to the user, once it got to 1000 possible entries, the query took a minute and a half, and when we went to 10,000, lets just say you could boil the kettle and drink your tea/coffee before it came back.
The answer was to stick the values to check directly into a temporary table, where one row of the table represented one value to check against. The temporary table was keyed against the user who performed the search, so this meant other users searches wouldn't become corrupted with each other, and when the user logged out, then we knew which values in the search table could be removed.
Depending on where this data comes from will depend on the best way for you to load the reference table. But once it is there, then your new query will look something like:-
SELECT Count(t.*), rt.dataName
FROM table t
RIGHT JOIN referenceTable rt ON tr.dataName = t.columnName
WHERE rt.userRef = #UserIdValue
GROUP BY tr.dataName
The RIGHT JOIN here should give you a value for each of your reference table values, including 0 if the value did not appear in your table. If you don't care which one don't appear, then changing it to an INNER JOIN will eliminate the zeros.
The WHERE clause is to ensure that your search only returns the unique items that you are looking for at the moment - the design should consider that concurrent access will someday occur here (even if it doesn't at the moment), so writing something in to protect it is advisable.
I am trying to query names in one table and then use that result to pull the master records into a DataGridView. So what I need to do is get the names from the interest table that are like what is put into text boxes then use those results to pull the data from the CaseSelector table and set the bindingsource filter to those results. Why can't I seem to set results to the caseSelectorBindingSourceFilter
var results = from CaseSelector in db.CaseSelectors
from Interest in db.Interests
where SqlMethods.Like(Interest.First, txtFirst.Text) && SqlMethods.Like(Interest.Last, txtLast.Text)
select CaseSelector;
caseSelectorBindingSource.Filter = ("CaseNumberKey =" + results);
You can find examples for LINQ join queries here:
http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
I don't know your DB schema but you're looking for something along the lines of:
from c in db.Cases
join i in db.Interest on i.CaseNumberKey equals c.CaseNumberKey
select c
What you've just retrieved is a SQL Set of the valid caseNumberKeys.
Your casefilter should be something like
caseSelectorBindingSource.Filter = "CaseNumberKey in " + interestsresultsAsSQLSET;
You'll have to iterate over the interestsresults and convert it to the string representation of a SQLSET. If you're feeling lucky, you could just try to .toString() it and see what happens.
string interestsResultsAsSQLSet = "(";
//write most of them
for(int i=0; i<interestsresults.size() - 1; i++) {
interestsResultAsSQLSet.append(interestsresults[i] + ",");
//write the last one, with right paren
interestsresults.append(interestsresults[interestsresults.size() -1] + ")");
I write Java all day, ignore my basic language errors. :P
I am using c#, sqlce, datatable. I select products with this query:
select price from products where productid=myproduct and mydate between startdate and finaldate and clientid=myclient
But it's going to take much time.
So I could use a datatable to store products:
datatable prices = new datatable();
prices=this.returntable("select productid,startdate,finaldate,client from products")
for(int i=0;i<allproductsihave;i++) {
product.newprice=this.getmyprice(prices,product.client,product.id,product.date);
}
private decimal getmyprice(datatable products,string myclient, string myproduct, datetime mydate)
{
//how do I do at here the equivalent to first sql query?
//for select a product from a datatable where my date is between 2 dates,
//and client=myclient and productid=myproduct
return Convert.Todecimal(datatable.prices[something]["price"].tostring());
}
In this way I shouldn't connect to the database for each product query.
Is it posible?
maybe doing convert.todatetime to startdate and finaldate?
Sounds like you dont want to make multiple requests to the database because you are worried about making several connections, which will be slower? If you dont want to connect for every product, why not just use a "where product in ('xxx', 'yyy')" instead? Then you have a single query call, and no changes on the database. You can then process the results when you get them back.
Your boss should speak to #pilotcam. He is right. Its no quicker in c# than it would be on the database. :)
In fact, doing this in c# means you are getting information back (I assume over a network if its a remote database) which you would never use, so its probably slower, how much depends on how much data is in your database which wont be in your final results!
The DataTable.Select() method accepts a string representing a "where" condition and returns an array of DataRow containing matching rows.
Datarow[] products = prices.Select(string.Format("'{0}'>='{1}' AND '{0}'<='{2}'", mydate, startdate, finaldate));
Mind the date format!! It must be sql compliant.
As much as I disagree with this method for reasons stated in comments, here's one possibility. The idea here is to check for the most obvious exclusions first, then work your way to the date later.
private decimal GetMyPrice(DataTable table, string client, string product, DateTime date)
{
foreach (DataRow row in table.Rows)
{
if (row["productid"] != product) continue;
if (row["client"] != client) continue;
if (Convert.ToDateTime(row["startdate"]) < date) continue;
if (Convert.ToDateTime(row["finaldate"]) > date) continue;
return Convert.ToDecimal(row["price"]);
}
throw new KeyNotFoundException();
}
I'm importing the data from three Tab delimited files in the DataTables and after that I need to go thru every row of master table and find all the rows in two child tables. Against each DataRow[] array I found from the child tables, I have to again go thru individually each row and check the values based upon different paramenters and at the end I need to create a final record which will be merger of master and two child table columns.
Now I have done that and its working but the problem is its Performance. I'm using the DataTable.Select to find all child rows from child table which I believe making it very slow.
Please remember the None of the table has any Primary key as the duplicate rows are acceptable.
At the moment I have 1200 rows in master table and aroun 8000 rows in child table and the total time it takes to do that is 8 minutes.
Any idea how can I increase the Performance.
Thanks in advance
The code is below ***************
DataTable rawMasterdt = importMasterFile();
DataTable rawDespdt = importDescriptionFile();
dsHelper = new DataSetHelper();
DataTable distinctdt = new DataTable();
distinctdt = dsHelper.SelectDistinct("DistinctOffers", rawMasterdt, "C1");
if (distinctdt.Rows.Count > 0)
{
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
count++;
txtBlock1.Text = "Importing Offer " + count.ToString() + " of " + distinctdt.Rows.Count.ToString();
if (masterRow != null )
{
Product newProduct = new Product();
newProduct.Code = masterRow["C4"].ToString();
newProduct.Name = masterRow["C5"].ToString();
// -----
newProduct.Description = getProductDescription(offer[0].ToString(), rawDespdt);
newProduct.Weight = getProductWeight(offer[0].ToString(), rawDespdt);
newProduct.Price = getProductRetailPrice(offer[0].ToString(), rawDespdt);
newProduct.UnitPrice = getProductUnitPrice(offer[0].ToString(), rawDespdt);
// ------- more functions similar to above here
productList.Add(newProduct);
}
}
txtBlock1.Text = "Import Completed";
public string getProductDescription(string offercode, DataTable dsp)
{
string exp = "((C1 = " + "'" + offercode + "')" + " AND ( C6 = 'c' ))";
DataRow[] dRows = dsp.Select( exp);
string descrip = "";
if (dRows.Length > 0)
{
for (int i = 0; i < dRows.Length - 1; i++)
{
descrip = descrip + " " + dRows[i]["C12"];
}
}
return descrip;
}
.Net 4.5 and the issue is still there.
Here are the results of a simple benchmark where DataTable.Select and different dictionary implementations are compared for CPU time (results are in milliseconds)
#Rows Table.Select Hashtable[] SortedList[] Dictionary[]
1000 43,31 0,01 0,06 0,00
6000 291,73 0,07 0,13 0,01
11000 604,79 0,04 0,16 0,02
16000 914,04 0,05 0,19 0,02
21000 1279,67 0,05 0,19 0,02
26000 1501,90 0,05 0,17 0,02
31000 1738,31 0,07 0,20 0,03
Problem:
The DataTable.Select method creates a "System.Data.Select" class instance internally, and this "Select" class creates indexes based on the fields (columns) specified in the query. The Select class makes re-use of the indexes it had created but the DataTable implementation does not re-use the Select class instance hence the indexes are re-created every time DataTable.Select is invoked. (This behaviour can be observed by decompiling System.Data)
Solution:
Assume the following query
DataRow[] rows = data.Select("COL1 = 'VAL1' AND (COL2 = 'VAL2' OR COL2 IS NULL)");
Instead, create and fill a Dictionary with keys corresponding to the different value combinations of the values of the columns used as the filter. (This relatively expensive operation must be done only once and the dictionary instance must then be re-used)
Dictionary<string, List<DataRow>> di = new Dictionary<string, List<DataRow>>();
foreach (DataRow dr in data.Rows)
{
string key = (dr["COL1"] == DBNull.Value ? "<NULL>" : dr["COL1"]) + "//" + (dr["COL2"] == DBNull.Value ? "<NULL>" : dr["COL2"]);
if (di.ContainsKey(key))
{
di[key].Add(dr);
}
else
{
di.Add(key, new List<DataRow>());
di[key].Add(dr);
}
}
Query the Dictionary (multiple queries may be required) to filter the rows and combine the results into a List
string key1 = "VAL1//VAL2";
string key2 = "VAL1//<NULL>";
List<DataRow>() results = new List<DataRow>();
if (di.ContainsKey(key1))
{
results.AddRange(di[key1]);
}
if (di.ContainsKey(key2))
{
results.AddRange(di[key2]);
}
I know this is an old question, and code underpinning this issue may have changed, but I've recently encountered (and gain some insight into) this very issue.
For anyone coming along at a later date ... here's what I found.
Performance of the DataTable.Select(condition) is quite sensitive to the nature and structure of the 'condition' you provide. This looks like a bug to me (where would I report it to Microsoft?) but it may merely be a quirk.
I've written a set of tests to demonstrate the issue that are structured as follows:
Define a datatable with a few simple columns,like this:
var dataTable = new DataTable();
var idCol = dataTable.Columns.Add("Id", typeof(Int32));
dataTable.Columns.Add("Code", typeof(string));
dataTable.Columns.Add("Name", typeof(string));
dataTable.Columns.Add("FormationDate", typeof(DateTime));
dataTable.Columns.Add("Income", typeof(Decimal));
dataTable.Columns.Add("ChildCount", typeof(Int32));
dataTable.Columns.Add("Foreign", typeof(Boolean));
dataTable.PrimaryKey = new DataColumn[1] { idCol };
Populate the table with 40000 records, each with a unique 'Code' field.
Perform a batch of 'selects' (each with different parameters) against the datatable using two similar, but differently formatted, queries and record and compare the total time taken by each of the two formats.
You get remarkable results. Testing, for example, the below two conditions side-by-side:
Q1: [Code] = 'XX'
Q2: ([Code] = 'XX')
[ I do multiple Select calls using the above two queries, each iteration I replace the XX with a valid code that exists in the datatable ]
The result?
Time comparison for 320 lookups against 40000 records: 180 msec total search time with no brackets, 6871 msec total search time for search WITH brackets
Yes - 38 times slower if you just have the extra brackets surrounding the condition.
There are other scenarios which react differently.
For example,
[Code] = '{searchCode}' OR 1=0 vs ([Code] = '{searchCode}' OR 1=0) take similar (slow) times to execute, but:
[Code] = '{searchCode}' AND 1=1 vs ([Code] = '{searchCode}' AND 1=1) again shows the non-bracketed version to be close to 40 times faster.
I've not investigated all scenarios, but it seems that the introduction of brackets - either redundantly around a simple comparison check, or as required to specify sub-expression precedence - or the presence of an 'OR' slows the query down considerably.
I could speculate that the issue is caused by how the datatable parses the condition you use and how it creates and uses internal indexes ... but I won't.
You can speed it up a lot by using a dictionary. For example:
if (distinctdt.Rows.Count > 0)
{
// build index of C1 values to speed inner loop
Dictionary<string, DataRow> masterIndex = new Dictionary<string, DataRow>();
foreach (DataRow row in rawMasterdt.Rows)
masterIndex[row["C1"].ToString()] = row;
int count = 0;
foreach (DataRow offer in distinctdt.Rows)
{
Then in place of
string exp = "C1 = " + "'" + offer[0].ToString() + "'" + "";
DataRow masterRow = rawMasterdt.Select(exp)[0];
You would do this
DataRow masterRow;
if (masterIndex.ContainsKey(offer[0].ToString())
masterRow = masterIndex[offer[0].ToString()];
else
masterRow = null;
If you create a DataRelation between your parent and child DataTables, you can look up child rows by invoking DataRow.GetChildRows(DataRelation) on the parent row (resp. DataRow.GetChildRelName in case of typed DataSets). The search will apply a TreeMap lookup, and performance should be fine even with a lot of child rows.
In case you have to search for rows based on other criteria than on a DataRelation's foreign keys, I recommend to use DataView.Sort / DataView.FindRows() instead of DataTable.Select(), as soon as you have to query the data more than once. DataView.FindRows() will be based on TreeMap lookup (O(log(N)), where as DataTable.Select() has to scan all rows (O(N)). This article contains more details: http://arnosoftwaredev.blogspot.com/2011/02/when-datatableselect-is-slow-use.html
DataTables can be made to have Relationships with other DataTables in a DataSet. See http://msdn.microsoft.com/en-us/library/ay82azad%28VS.71%29.aspx for a bit of discussion and as a start point to browsing. I've not much experience of using them but as I understand it they will do what you want (assuming your tables are in a suitable format). I would assume that these have greater efficiency than a manual process of doing the same but I may be wrong. Might be worth seeing if they work for you and benchmarking to see if they are an improvement or not...
Have you ran it through a profiler? That should be the first step. Anyhow, this might help:
Read the master text file into memory line by line. Put the master record into a dictionary as the key. Add it to the dataset (1 pass through master).
Read child text file line by line, add this as a value for the appropriate master record in the dictionary created above
Now you have everything in the dictionary in memory, only doing 1 pass through each file.
Do a final pass through the dictionary/children and process each column and perform final calcs.