Optimizing query that uses AsEnumerable and SingleOrDefault


Not long ago there was a feature request in the program I am maintaining. Basically it has to fill up a table in the database with info from a text file. These files can be pretty big, but it was fairly easy to do because these files were defined as the complete list of user data. Therefore the table could be truncated and then just filled up again with data from the text file.
But then a week ago it was decided that these files are actually updates of current user info, so now I have to retrieve the correct MeteringPointId (which exists at most once, if it exists at all) and then update the info on it. If it doesn't exist, I just insert the data as before.
The way I do this is to retrieve the complete database table into memory, update the data there, and finally save the changes by calling the Update function for the DataTable. It works fine, except that finding the row with the MeteringPointId is slow:
DataRow row;
// this is called for each line in the text file to find the corresponding MeteringPointId. It can be 300,000 times.
row = MeteringPointsDataTable.AsEnumerable().SingleOrDefault(r => r.Field<string>("MeteringPointId") == MeteringPointId);
Is there a way to retrieve a DataRow from a DataTable that is faster than this?

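If you are sure that only one item can fulfil the condition, use FirstOrDefault instead of SingleOrDefault. SingleOrDefault has to scan the whole table to make sure there is no second match, whereas FirstOrDefault stops at the first entry it finds. Each lookup is still a linear scan, though.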
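To make the difference visible, here is a small sketch that counts how many rows each operator examines. The table, its 1,000 ids and the target id are invented for illustration. SingleOrDefault keeps scanning after a match to prove there is no second one, while FirstOrDefault returns as soon as it finds one.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table with 1,000 unique ids "MP0" ... "MP999".
var table = new DataTable("MeteringPoints");
table.Columns.Add("MeteringPointId", typeof(string));
for (int i = 0; i < 1000; i++)
    table.Rows.Add("MP" + i);

string target = "MP10";

// Count how many rows the predicate is evaluated against.
int singleChecks = 0;
DataRow single = table.AsEnumerable().SingleOrDefault(r =>
{
    singleChecks++;
    return r.Field<string>("MeteringPointId") == target;
});

int firstChecks = 0;
DataRow first = table.AsEnumerable().FirstOrDefault(r =>
{
    firstChecks++;
    return r.Field<string>("MeteringPointId") == target;
});

// SingleOrDefault visits all 1,000 rows; FirstOrDefault stops at the 11th row.
Console.WriteLine($"SingleOrDefault checked {singleChecks} rows, FirstOrDefault checked {firstChecks} rows");
```

Both calls still cost O(n) per lookup. Repeated 300,000 times, that is why an index such as a dictionary matters more than the choice between the two operators.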

You can use the Select method of DataTable.
var expression = "[MeteringPointId] = '" + MeteringPointId + "'";
DataRow[] result = MeteringPointsDataTable.Select(expression);
You can also build an IN expression to look up several ids at once:
var idList = new []{"id1", "id2", "id3", ...};
var expression = "[MeteringPointId] in " + string.Format("({0})", string.Join(",", idList.Select(i=> "'"+i+"'")));
Similar usage is here
Hope it helps..
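Because the filter is built by string concatenation, an id that contains a single quote would break the expression. Here is a minimal sketch that guards against this. The helper `QuoteForFilter` and the sample ids are hypothetical. It doubles embedded single quotes, which is how string literals are escaped in DataColumn.Expression syntax (the syntax Select uses).

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper: wrap a value in single quotes for a DataTable filter,
// doubling any embedded quote as the DataColumn.Expression syntax requires.
static string QuoteForFilter(string value) => "'" + value.Replace("'", "''") + "'";

var table = new DataTable("MeteringPoints");
table.Columns.Add("MeteringPointId", typeof(string));
table.Rows.Add("id1");
table.Rows.Add("id2");
table.Rows.Add("O'Brien"); // would break naive concatenation
table.Rows.Add("id4");

// Single value
DataRow[] one = table.Select("[MeteringPointId] = " + QuoteForFilter("O'Brien"));

// Several values with IN
var idList = new[] { "id1", "id2", "O'Brien" };
string inFilter = "[MeteringPointId] IN (" + string.Join(",", idList.Select(QuoteForFilter)) + ")";
DataRow[] several = table.Select(inFilter);

Console.WriteLine($"{one.Length} row(s) for O'Brien, {several.Length} row(s) for the IN list");
```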

You could put the whole table in a dictionary:
//At the start
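var meteringPoints = MeteringPointsDataTable.AsEnumerable().ToDictionary(r => r.Field<string>("MeteringPointId"));
//For each row of the text file:
DataRow row;
if (!meteringPoints.TryGetValue(MeteringPointId, out row))
{
row = MeteringPointsDataTable.NewRow();
row["MeteringPointId"] = MeteringPointId;
MeteringPointsDataTable.Rows.Add(row); // NewRow only creates a detached row; it must be added to the table
meteringPoints[MeteringPointId] = row;
}
// now update the other columns of row with the values from the text file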
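Here is a self-contained sketch of the dictionary approach as an update-or-insert loop. The `Name` column, the sample rows and the parsed text-file lines are assumptions for illustration, and AcceptChanges stands in for loading the table from the database. The dictionary is built once, and each text-file line then costs a hash lookup instead of a scan of the whole table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical table already loaded from the database.
var table = new DataTable("MeteringPoints");
table.Columns.Add("MeteringPointId", typeof(string));
table.Columns.Add("Name", typeof(string));
table.Rows.Add("MP1", "Old name 1");
table.Rows.Add("MP2", "Old name 2");
table.AcceptChanges(); // mark rows as Unchanged, as after a Fill

// Build the index once.
Dictionary<string, DataRow> byId = table.AsEnumerable()
    .ToDictionary(r => r.Field<string>("MeteringPointId"));

// Hypothetical parsed lines from the text file: (id, name).
var lines = new List<(string Id, string Name)> { ("MP2", "New name 2"), ("MP3", "Name 3") };

foreach (var (id, name) in lines)
{
    if (!byId.TryGetValue(id, out DataRow row))
    {
        // Not found: create the row and add it to the table (NewRow alone does not).
        row = table.NewRow();
        row["MeteringPointId"] = id;
        table.Rows.Add(row);
        byId[id] = row; // keep the index in sync
    }
    row["Name"] = name; // update an existing row or fill the new one
}

foreach (DataRow r in table.Rows)
    Console.WriteLine($"{r["MeteringPointId"]}: {r["Name"]} ({r.RowState})");
```

The RowState values printed at the end (Unchanged, Modified, Added) are what a data adapter's Update uses to decide which rows to UPDATE and which to INSERT.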

Related

DataTable update() inserts duplicate new rows without checking if it exists

I'm trying to use the Update() method, but it inserts my datatable data into my database without checking whether the row exists, so it inserts duplicate data. It also doesn't delete rows that don't exist in the datatable. How do I resolve this? I want to synchronize my datatable with the server table.
private void Form1_Load(object sender, EventArgs e)
{
// TODO: This line of code loads data into the 'MyDatabaseDataSet11.Vendor_GUI_Test_Data' table. You can move, or remove it, as needed.
this.vendor_GUI_Test_DataTableAdapter.Fill(this.MyDatabaseDataSet11.Vendor_GUI_Test_Data);
// read target table on SQL Server and store in a tabledata var
this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
}
Insertion
private void convertGUIToTableFormat()
{
ServerDataTable.Rows.Clear();
// loop through GUIDataTable rows
for (int i = 0; i < GUIDataTable.Rows.Count; i++)
{
String guiKEY = (String)GUIDataTable.Rows[i][0] + "," + (String)GUIDataTable.Rows[i][8] + "," + (String)GUIDataTable.Rows[i][9];
//Console.WriteLine("guiKey: " + guiKEY);
// loop through every DOW value, make a new row for every true
for(int d = 1; d < 8; d++)
{
if ((bool)GUIDataTable.Rows[i][d] == true)
{
DataRow toInsert = ServerDataTable.NewRow();
toInsert[0] = GUIDataTable.Rows[i][0];
toInsert[1] = d + "";
toInsert[2] = GUIDataTable.Rows[i][8];
toInsert[3] = GUIDataTable.Rows[i][9];
ServerDataTable.Rows.InsertAt(toInsert, 0);
//printDataRow(toInsert);
//Console.WriteLine("---------------");
}
}
}
Trying to update
// I got this adapter from datagridview, casting my datatable to their format
CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable DT = (CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable)ServerDataTable;
DT.PrimaryKey = new DataColumn[] { DT.Columns["Vendor"], DT.Columns["DOW"], DT.Columns["LeadTime"], DT.Columns["DemandPeriod"] };
this.vendor_GUI_Test_DataTableAdapter.Update(DT);

Let's look at what happens in the code posted.
First this line:
this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
This is not a copy, but just an assignment between two variables. The assigned one (ServerDataTable) receives the 'reference' to the memory area where the data coming from the database has been stored. So these two variables 'point' to the same memory area. Whatever you do with one affects what the other sees.
Now look at this line:
ServerDataTable.Rows.Clear();
Uh! Why? You are clearing the memory area where the data loaded from the database was. Now the DataTable is empty and no records (DataRow) are present there.
Let's look at what happens inside the loop:
DataRow toInsert = ServerDataTable.NewRow();
A new DataRow has been created. Every DataRow has a property called RowState, and when you create a new row this property has the value DataRowState.Detached. But when you add the row to the table's Rows collection with
ServerDataTable.Rows.InsertAt(toInsert, 0);
then the DataRow.RowState property becomes DataRowState.Added.
At this point the missing piece of information is how a TableAdapter behaves when you call Update. The adapter needs to build the appropriate INSERT/UPDATE/DELETE SQL command to update the database. And what information does it use to choose the proper command? It looks at the RowState property, and it sees that all your rows are in the Added state. So it chooses the INSERT command for your table and, barring any duplicate key violation, you will end up with duplicate records in your table.
What should you do to resolve the problem? The first thing is to remove the line that clears the data loaded into memory. Then, instead of always calling InsertAt, you should first check whether the row is already in memory. You could do this using the DataTable.Select method. This method takes a filter string similar to a WHERE clause, and you should filter on the primary key of your table (if the key is a string, enclose the value in single quotes):
var rows = ServerDataTable.Select("PrimaryKeyFieldName = " + valueToSearchFor);
If you get more than zero rows back, you can take the first row returned and update its existing values with your changes. If no row matches the condition, you can use InsertAt as you do now.
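Here is a minimal sketch of that find-then-update-or-insert idea on a plain DataTable. The table, the `Vendor`/`DOW`/`LeadTime` columns, the helper `Upsert` and the key values are all made up. AcceptChanges simulates the rows having just been loaded by Fill. No adapter is involved, so the code only shows the RowState values an adapter's Update would act on.

```csharp
using System;
using System.Data;

var table = new DataTable("Vendor_GUI_Test_Data");
table.Columns.Add("Vendor", typeof(string));
table.Columns.Add("DOW", typeof(string));
table.Columns.Add("LeadTime", typeof(int));
table.Rows.Add("ACME", "1", 3);
table.AcceptChanges(); // simulate rows just loaded by Fill: RowState = Unchanged

// Hypothetical helper: update the row whose (Vendor, DOW) matches, otherwise add a new one.
// Values are assumed not to contain single quotes (otherwise double them in the filter).
static DataRow Upsert(DataTable t, string vendor, string dow, int leadTime)
{
    DataRow[] found = t.Select($"Vendor = '{vendor}' AND DOW = '{dow}'");
    if (found.Length > 0)
    {
        found[0]["LeadTime"] = leadTime;      // existing row -> Modified -> UPDATE
        return found[0];
    }
    return t.Rows.Add(vendor, dow, leadTime); // new row -> Added -> INSERT
}

DataRow updated = Upsert(table, "ACME", "1", 5);
DataRow inserted = Upsert(table, "ACME", "2", 4);

Console.WriteLine($"{updated.RowState}, {inserted.RowState}, rows = {table.Rows.Count}");
```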

You're trying too hard, I think, and unfortunately you're getting nearly everything wrong.
// read target table on SQL Server and store in a tabledata var
this.ServerDataTable = this.MyDatabaseDataSet11.Vendor_GUI_Test_Data;
No, this line of code doesn't do anything at all with the database, it just assigns an existing datatable to a property called ServerDataTable.
for (int i = 0; i < GUIDataTable.Rows.Count; i++)
It isn't clear whether GUIDataTable is strongly or weakly typed, but if it's strongly typed (i.e. it lives in your dataset, or is of a type that is part of your dataset), you will do yourself massive favors if you do not access its Rows collection at all. The way to access a strongly typed datatable is as if it were an array:
myStronglyTypedTable[2] //yes, third row
myStronglyTypedTable.Rows[2] //no, do not do this- you end up with a base type DataRow that is massively harder to work with
Then we have..
DataRow toInsert = ServerDataTable.NewRow();
Again, don't do this.. you're working with strongly typed datatables. This makes your life easy:
var r = MyDatabaseDataSet11.Vendor_GUI_Test_Data.NewVendor_GUI_Test_DataRow();
Because now you can refer to everything by name and type, not numerical index and object:
r.Total = r.Quantity * r.Price; //yes
toInsert["Ttoal"] = (int)toInsert["Quantity"] * (double)toInsert["Price"]; //no. Messy, hard work, "stringly" typed, casting galore, no intellisense.. The typo was deliberate btw
You can also easily add data to a typed datatable like:
MyPersonDatatable.AddPersonRow("John", "Smith", 29, "New York");
Next up..
// I got this adapter from datagridview, casting my datatable to their format
CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable DT = (CSharpFirstGUIWinForms.MyDatabaseDataSet1.Vendor_GUI_Test_DataDataTable)ServerDataTable;
DT.PrimaryKey = new DataColumn[] { DT.Columns["Vendor"], DT.Columns["DOW"], DT.Columns["LeadTime"], DT.Columns["DemandPeriod"] };
this.vendor_GUI_Test_DataTableAdapter.Update(DT);
You need to straighten out the concepts and terminology in your mind here. That is not an adapter, and it didn't come from a datagridview (grid views never provide adapters). Your datatable variable was always in their format, and if you typed it as DataTable ServerDataTable, that just makes it massively harder to work with. It's the same as saying object o = new Person(): now you have to cast o every time you want to do nearly anything Person-specific with it. You could declare all the variables in every program as type object, but you don't. Hence don't do the equivalent by putting your strongly typed datatables inside DataTable-typed variables, because you're just hiding away the very things that make them useful and easy to work with.
If you download rows from a database into a datatable, and you want to...
... delete them from the db, then call Delete on them in the datatable
... update them in the db, then set new values on the existing rows in the datatable
... insert more rows into the db alongside the existing rows, then add more rows to the datatable
Datatables track what you do to their rows. If you clear a datatable, it doesn't mark every row as deleted; it just jettisons the rows, so no db-side rows will be affected. If you delete rows, they gain a RowState of Deleted and a delete query will fire when you call adapter.Update.
Modify rows to cause an update to fire. Add new rows for an insert.
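Here is a small demonstration of that tracking on a plain DataTable. The table, its rows and the helper `LoadSample` are made up, and AcceptChanges stands in for a fresh Fill. GetChanges returns the rows an adapter's Update would send. After Clear there is nothing to send, while after Delete every row is pending deletion.

```csharp
using System;
using System.Data;

// Hypothetical helper: a small table in the state it would have right after Fill.
static DataTable LoadSample()
{
    var t = new DataTable("Person");
    t.Columns.Add("Name", typeof(string));
    t.Rows.Add("Joe");
    t.Rows.Add("Jane");
    t.AcceptChanges(); // RowState = Unchanged, as if just loaded from the database
    return t;
}

// Clear: the rows are simply discarded, nothing is tracked.
DataTable cleared = LoadSample();
cleared.Rows.Clear();
DataTable clearChanges = cleared.GetChanges(); // null: no pending changes

// Delete: each row stays in the table with RowState = Deleted.
// Deleting an Unchanged row does not remove it, so Rows.Count stays the same in this loop.
DataTable deleted = LoadSample();
for (int i = 0; i < deleted.Rows.Count; i++)
    deleted.Rows[i].Delete();
DataTable deleteChanges = deleted.GetChanges(DataRowState.Deleted);

Console.WriteLine($"After Clear: {(clearChanges == null ? 0 : clearChanges.Rows.Count)} pending change(s)");
Console.WriteLine($"After Delete: {deleteChanges.Rows.Count} pending deletion(s)");
```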
As Steve noted, you jettisoned all the rows, added new ones, and added (probably uselessly) a primary key (the strongly typed table will likely already have had this key). Adding the key doesn't mean that the new rows are automatically associated with the old ones, and it doesn't cause them to be updated. Then you inserted a load of new rows and wrote them to the db. This process was never going to update or delete anything.
The way this is supposed to work is, you download rows, you see them in the grid, you add some, you change some, you delete some, you hit the save button. Behind the scenes the grid just poked some new rows into the datatable, marked some as deleted, changed others. It didn't go to the huge (and unfortunately incorrect) lengths your code went to. If you want your code to behave the same you follow the same idea:
var pta = new PersonTableAdapter();
var pdt = pta.GetData(); //query that returns all rows
pta.Fill(somedataset.Person); //or can do this
pdt = somedataset.Person; //alias of Person table
var p = pdt.FindByPersonId(123); //PersonId is the primary key in the datatable
p.Delete(); //mark person 123 as deleted
p = pdt.First(r => r.Name == "Joe"); //LINQ just works on strongly typed datatables, out of the box, no messing
p.Name = "John"; //modify joes name to John
pdt.AddPersonRow("Jane", 22);
pta.Update(pdt); //saves changes(delete 123, rename joe, add Jane) to db
What you need to appreciate is that all these commands are just finding or creating DataRow objects that live inside a table. The table tracks what you do, and the adapter uses the appropriate SQL to send the changes to the db. If you wanted to mark all rows in a datatable as deleted, you could visit each of them and call Delete() on it, then call Update to save the changes to the db.

Avoid duplication in DataTable query and build

What would be the right way to avoid duplication when querying a datatable and then saving the result to a DataTable? I'm using the pattern below, which gets very error-prone once tables grow. I looked at the hints below: the first one, CopyToDataTable(), doesn't look really applicable, and the second is much too complex for the task. I would like to split the code below into 2 separate methods (the first to build the query and the second to retrieve the DataTable). Perhaps avoiding the anonymous type in the query would make it easier to avoid hardcoding all the column names, but I'm somehow lost with this.
Filling a DataSet or DataTable from a LINQ query result set
or
https://msdn.microsoft.com/en-us/library/bb669096%28v=vs.110%29.aspx
public DataTable retrieveReadyReadingDataTable()
{
DataTable dtblReadyToSaveToDb = RetrieveDataTableExConstraints();
var query = from scr in scrTable.AsEnumerable()
from products in productsTable.AsEnumerable()
where(scr.Field<string>("EAN") == products.Field<string>("EAN"))
select
new
{
Date = DateTime.Today.Date,
ProductId = products.Field<string>("SkuCode"),
Distributor = scr.Field<string>("Distributor"),
Price = float.Parse(scr.Field<string>("Price")),
Url = scr.Field<string>("Url")
};
foreach (var q in query)
{
DataRow newRow = dtblReadyToSaveToDb.Rows.Add();
newRow.SetField("Date", q.Date);
newRow.SetField("ProductId", q.ProductId);
newRow.SetField("Distributor", q.Distributor);
newRow.SetField("Price", q.Price);
newRow.SetField("Url", q.Url);
}
return dtblReadyToSaveToDb;
}

Firstly, you have to decide what "duplicate" means in your case. According to your code I would say a duplicate is a row with the same values in the columns Date, ProductId and Distributor. So add a multi-column primary key for those columns first.
Secondly, you should add some code that first queries the existing rows and then compares them to the rows you want to create. If a match is found, simply don't insert a new row.
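Here is a sketch of both steps. Like the answer, it assumes that Date, ProductId and Distributor together identify a row. The sample values and the helper `AddIfMissing` are invented. Rows.Find with an object array looks up a row by the values of a multi-column primary key and returns null when there is no match.

```csharp
using System;
using System.Data;

var dtbl = new DataTable("Readings");
dtbl.Columns.Add("Date", typeof(DateTime));
dtbl.Columns.Add("ProductId", typeof(string));
dtbl.Columns.Add("Distributor", typeof(string));
dtbl.Columns.Add("Price", typeof(float));
dtbl.Columns.Add("Url", typeof(string));

// Step 1: declare what "duplicate" means via a multi-column primary key.
dtbl.PrimaryKey = new DataColumn[] { dtbl.Columns["Date"], dtbl.Columns["ProductId"], dtbl.Columns["Distributor"] };

// Step 2 (hypothetical helper): only add a row when no row with the same key exists yet.
static bool AddIfMissing(DataTable t, DateTime date, string productId, string distributor, float price, string url)
{
    if (t.Rows.Find(new object[] { date, productId, distributor }) != null)
        return false; // duplicate: skip it
    t.Rows.Add(date, productId, distributor, price, url);
    return true;
}

DateTime today = DateTime.Today;
bool first = AddIfMissing(dtbl, today, "SKU1", "ShopA", 9.99f, "http://example.com/a");
bool again = AddIfMissing(dtbl, today, "SKU1", "ShopA", 8.50f, "http://example.com/a");
bool other = AddIfMissing(dtbl, today, "SKU1", "ShopB", 9.49f, "http://example.com/b");

Console.WriteLine($"{first} {again} {other}, rows = {dtbl.Rows.Count}");
```

Without the Find check, the second call would make the primary key constraint throw a ConstraintException instead of silently skipping the row.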

Filling table takes a lot of time in MS Word

I wrote the following code to add an external data table to a table in an MS Word document. It works fine, but it takes a lot of time when there are more than 100 rows, and when the table has more than 500 rows it fills the MS Word table really slowly and can't complete the task.
I tried to hide the document and disable the screen update for the document but still no solution for the slow performance.
//Get the required external data to the DT data table
DataTable DT = XDt.GetData();
Word.Table TB;
int X = 1;
foreach (DataRow Rw in DT.Rows)
{
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
for(int i=0;i<=DT.Columns.Count-1;i++)
{
Rn.Cells[i+1].Range.Text = Rw[i].ToString();
}
X++;
}
So is there a way to make this process go faster?

The most efficient way to add a table to Word is to first concatenate the data into one delimited text string, where "\n" marks the end of each row (record separator). The end-of-cell marker (field separator) can be any character you like that doesn't occur in the content that makes up the table.
Assign this string to a Range object, then use its ConvertToTable() method to create the table.
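Here is the string-building half of that approach as a sketch. The helper `ToDelimitedText` and the sample table are hypothetical. It separates fields with tabs and records with "\n", and replaces any tab or line break inside a value with a space so the separators stay unambiguous. That sanitising rule is my assumption, not part of the answer. The Word interop part (setting Range.Text and calling Range.ConvertToTable) is left out so the sketch runs without Word.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: one line per DataRow, fields separated by fieldSeparator.
static string ToDelimitedText(DataTable table, char fieldSeparator = '\t')
{
    var lines = new List<string>(table.Rows.Count);
    foreach (DataRow row in table.Rows)
    {
        // Strip separator characters out of the cell values.
        IEnumerable<string> fields = row.ItemArray.Select(v =>
            (v?.ToString() ?? string.Empty)
                .Replace(fieldSeparator, ' ')
                .Replace('\r', ' ')
                .Replace('\n', ' '));
        lines.Add(string.Join(fieldSeparator.ToString(), fields));
    }
    return string.Join("\n", lines); // "\n" = end of row
}

var dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Qty", typeof(int));
dt.Rows.Add("Bolt", 10);
dt.Rows.Add("Nut\twith tab", 25);

string text = ToDelimitedText(dt);
Console.WriteLine(text);
```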

You're retrieving the last row of the current table for the BeforeRow parameter of TB.Rows.Add. This is significantly slower than simply adding the row. You should replace this:
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
With this:
Word.Row Rn = TB.Rows.Add();
Utilizing parallelization as suggested in the comments might help slightly, but I'm afraid it won't do much good, since the code that adds the table rows runs on the main thread, as mentioned in this link.
EDIT:
If performance is still an issue, I'd look into creating the Word table independently of the Word object model by using OpenXML. It's orders of magnitude faster.

ConvertToTable method is orders of magnitude faster than adding Rows/Cells one at a time.
var items = new List<string>();
var rows = 0;
while (reader.Read())
{
var values = new object[reader.FieldCount];
reader.GetValues(values); // copy the current record's fields into values
items.Add(String.Join("\t", values)); // tab = end-of-cell
rows++;
}
var data = String.Join("\n", items); // newline = end-of-row
var tempDocument = application.Documents.Add(); // application: your Word.Application instance
var range = tempDocument.Range();
range.Text = data;
var tempTable = range.ConvertToTable(Separator: Microsoft.Office.Interop.Word.WdTableFieldSeparator.wdSeparateByTabs,
NumColumns: reader.FieldCount,
NumRows: rows, DefaultTableBehavior: WdDefaultTableBehavior.wdWord9TableBehavior,
AutoFitBehavior: WdAutoFitBehavior.wdAutoFitWindow);

check if values are in datatable

I have an array of strings:
private static string[] dataNames = new string[] {"value1", "value2".... };
I have a table in my SQL database with a column of varchar type. I want to check which values from the string array exist in that column.
I tried this:
public static void testProducts() {
string query = "select * from my table";
var dataTable = from row in dt.AsEnumerable()
where String.Equals(row.Field<string>("columnName"), dataNames[0], StringComparison.OrdinalIgnoreCase)
select new {
Name = row.Field<string> ("columnName")
};
foreach(var oneName in dataTable){
Console.WriteLine(oneName.Name);
}
}
That code is not the actual code; I am just trying to show you the important part.
As you can see, the code only checks against a single dataNames[index].
It works fine, but I have to run it 56 times, because the array has 56 elements and each time I change the index.
Is there a faster way?
The comparison is case-insensitive.

First, you should not filter records in memory but in the database.
But if you already have a DataTable and you need to find the rows where one of its fields is in your string[], you can use LINQ to DataTable.
For example Enumerable.Contains:
var matchingRows = dt.AsEnumerable()
.Where(row => dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase));
foreach(DataRow row in matchingRows)
Console.WriteLine(row.Field<string>("columnName"));
Here is a more efficient (but less readable) approach using Enumerable.Join:
var matchingRows = dt.AsEnumerable().Join(dataNames,
row => row.Field<string>("columnName"),
name => name,
(row, name) => row,
StringComparer.OrdinalIgnoreCase);
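Here is a runnable sketch that compares the two queries on an in-memory table with invented data. Both are case-insensitive through StringComparer.OrdinalIgnoreCase. One difference is worth knowing: Join yields a row once per matching name, so duplicate or case-variant entries in dataNames would repeat that row, while Contains yields each row at most once.

```csharp
using System;
using System.Data;
using System.Linq;

string[] dataNames = { "value1", "VALUE3", "missing" };

var dt = new DataTable();
dt.Columns.Add("columnName", typeof(string));
dt.Rows.Add("Value1");
dt.Rows.Add("value2");
dt.Rows.Add("value3");

// Contains: for each row, a linear search of dataNames.
var viaContains = dt.AsEnumerable()
    .Where(row => dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase))
    .Select(row => row.Field<string>("columnName"))
    .ToList();

// Join: builds a hash lookup of dataNames once, then probes it for each row (in row order).
var viaJoin = dt.AsEnumerable()
    .Join(dataNames,
          row => row.Field<string>("columnName"),
          name => name,
          (row, name) => row.Field<string>("columnName"),
          StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(string.Join(", ", viaContains)); // Value1, value3
Console.WriteLine(string.Join(", ", viaJoin));     // Value1, value3
```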

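Try using Contains; it should return all the values you need. Pass StringComparer.OrdinalIgnoreCase, because the comparison has to be case-insensitive:
var data = from row in dt.AsEnumerable()
where dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase)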
select new
{
Name = row.Field<string>("columnName")
};

Passing a list of values is surprisingly difficult. Passing a table-valued parameter requires creating a T-SQL data type on the server. You can pass an XML document containing the parameters and decode that using SQL Server's convoluted XML syntax.
Below is a relatively simple alternative that works for up to a thousand values. The goal is to build an IN query:
select col1 from YourTable where col1 in ('val1', 'val2', ...)
In C#, you should probably use parameters:
select col1 from YourTable where col1 in (@par1, @par2, ...)
Which you can pass like:
var com = yourConnection.CreateCommand();
com.CommandText = @"select col1 from YourTable where col1 in (";
for (var i=0; i< dataNames.Length; i++)
{
var parName = string.Format("@par{0}", i+1);
com.Parameters.AddWithValue(parName, dataNames[i]);
com.CommandText += parName;
if (i+1 != dataNames.Length)
com.CommandText += ", ";
}
com.CommandText += ");";
var existingValues = new List<string>();
using (var reader = com.ExecuteReader())
{
while (reader.Read())
existingValues.Add((string)reader["col1"]);
}
Given the complexity of this solution I'd go for Max' or Tim's answer. You could consider this answer if the table is very large and you can't copy it into memory.
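The only fiddly part of the parameterized approach is generating the placeholder list, and that part can be checked without a database. Here is a sketch with a hypothetical `BuildInQuery` helper. It returns the SQL text plus (name, value) pairs, which you would then add to a SqlCommand's Parameters. `YourTable` and `col1` are the answer's placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: "@par1, @par2, ..." placeholders plus the values to bind to them.
static (string Sql, List<(string Name, string Value)> Parameters) BuildInQuery(IReadOnlyList<string> values)
{
    if (values.Count == 0)
        throw new ArgumentException("An IN list needs at least one value.", nameof(values));

    var parameters = values
        .Select((v, i) => (Name: $"@par{i + 1}", Value: v))
        .ToList();
    string placeholders = string.Join(", ", parameters.Select(p => p.Name));
    return ($"select col1 from YourTable where col1 in ({placeholders});", parameters);
}

var (sql, pars) = BuildInQuery(new[] { "val1", "val2", "val3" });
Console.WriteLine(sql);
// With a SqlCommand: foreach (var p in pars) com.Parameters.AddWithValue(p.Name, p.Value);
```

Rejecting an empty list is a design choice: `in ()` is not valid SQL, so it is better to fail early than to send a broken command.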

Sorry I don't have a lot of relevant code here, but I did a similar thing quite some time ago, so I will try to explain.
Essentially I had a long list of item IDs that I needed to return to the client, which then told the server which ones it wanted loaded at any particular time. The original query passed the values as a comma-separated set of strings (they were actually GUIDs). The problem was that once the number of entries hit 100, there was a noticeable lag for the user; once it got to 1000 possible entries, the query took a minute and a half; and when we went to 10,000, let's just say you could boil the kettle and drink your tea/coffee before it came back.
The answer was to stick the values to check directly into a temporary table, where one row of the table represented one value to check against. The temporary table was keyed against the user who performed the search, so this meant other users searches wouldn't become corrupted with each other, and when the user logged out, then we knew which values in the search table could be removed.
The best way to load the reference table depends on where this data comes from. But once it is there, your new query will look something like:
SELECT COUNT(t.columnName), rt.dataName
FROM YourTable t
RIGHT JOIN referenceTable rt ON rt.dataName = t.columnName
WHERE rt.userRef = @UserIdValue
GROUP BY rt.dataName
The RIGHT JOIN here should give you a row for each of your reference table values, including 0 if the value did not appear in your table. If you don't care which ones don't appear, changing it to an INNER JOIN will eliminate the zeros.
The WHERE clause is to ensure that your search only returns the unique items that you are looking for at the moment - the design should consider that concurrent access will someday occur here (even if it doesn't at the moment), so writing something in to protect it is advisable.

What is the best way to fast insert SQL data and dependant rows?

I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.
I.e. schema looks like this:
Item
- Id
- Title
Property
- Id
- FK_Item
- Value
My first attempt was something vaguely like this:
BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
Item i = new Item() { Title = value["title"]};
ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"]};
db.Items.InsertOnSubmit(i);
db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges();
Obviously this was terribly slow so I'm now using something like this:
BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
DataRow item = dt.NewRow();
item["Title"] = value["title"];
dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
sb.DestinationTableName = "dbo.Item";
sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
sb.WriteToServer(dt);
}
But this doesn't allow me to add the corresponding 'Property' rows.
I'm thinking the best solution might be to add a Stored Procedure like this one that generically lets me do a bulk insert (or at least multiple inserts, but I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.
Can anyone think of a better (i.e. more succinct, near equal performance) solution?

To combine the previous best two answers and add in the missing piece for the IDs:
1) Use BCP to load the data into a temporary "staging" table defined like this
CREATE TABLE stage(Title VARCHAR(??), Value {whatever});
and you'll need the appropriate index for performance later:
CREATE INDEX ix_stage ON stage(Title);
2) Use SQL INSERT to load the Item table:
INSERT INTO Item(Title) SELECT Title FROM stage;
3) Finally load the Property table by joining stage with Item:
INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.Value
FROM stage
JOIN Item ON Item.Title = stage.Title;
This join assumes Title is unique. If it isn't, each staged value will be attached to every Item with the same title.

The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternately you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.

Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.

You can bulk copy in code as well, using the .NET SqlBulkCopy class.
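Here is a sketch of the client side of the staging approach: a single DataTable that carries both Title and Value, so one SqlBulkCopy.WriteToServer call can load the staging table before the two INSERT ... SELECT statements run. The `stage` table name and the input shape follow the answers and the question, but they are assumptions here. The bulk-copy call appears only as a comment because it needs a SqlClient package reference and a live server.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical input: one dictionary per source record, as in the question's value["title"] / value["value"].
var values = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["title"] = "First item",  ["value"] = "A" },
    new Dictionary<string, string> { ["title"] = "Second item", ["value"] = "B" },
};

// One staging row per Item/Property pair; column names match the hypothetical stage table.
var stage = new DataTable("stage");
stage.Columns.Add("Title", typeof(string));
stage.Columns.Add("Value", typeof(string));
foreach (var value in values)
    stage.Rows.Add(value["title"], value["value"]);

// With System.Data.SqlClient (or Microsoft.Data.SqlClient) referenced, the load would look like:
// using (var sb = new SqlBulkCopy(connectionString))
// {
//     sb.DestinationTableName = "dbo.stage";
//     sb.ColumnMappings.Add("Title", "Title");
//     sb.ColumnMappings.Add("Value", "Value");
//     sb.WriteToServer(stage);
// }
// followed by the INSERT INTO Item ... and INSERT INTO Property ... statements.

Console.WriteLine($"{stage.Rows.Count} staging rows ready");
```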
