check if values are in datatable - c#

I have an array of strings:
private static string[] dataNames = new string[] {"value1", "value2".... };
I have a table in my SQL database with a column of varchar type. I want to check which values from the array of strings exist in that column.
I tried this:
public static void testProducts() {
string query = "select * from my table";
var dataTable = from row in dt.AsEnumerable()
where String.Equals(row.Field<string>("columnName"), dataNames[0], StringComparison.OrdinalIgnoreCase)
select new {
Name = row.Field<string> ("columnName")
};
foreach(var oneName in dataTable){
Console.WriteLine(oneName.Name);
}
}
This is not the actual code; I am just trying to show the important part.
As you can see, it checks against dataNames[index].
It works fine, but I have to run it 56 times, because the array has 56 elements and I change the index each time.
Is there a faster way?
The comparison is case-insensitive.

First, you should filter records in the database, not in memory.
But if you already have a DataTable and need to find the rows where one of its fields is in your string[], you can use Linq-To-DataTable.
For example Enumerable.Contains:
var matchingRows = dt.AsEnumerable()
.Where(row => dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase));
foreach(DataRow row in matchingRows)
Console.WriteLine(row.Field<string>("columnName"));
Here is a more efficient (but less readable) approach using Enumerable.Join, which builds a hash lookup of the names once instead of scanning the array for every row:
var matchingRows = dt.AsEnumerable().Join(dataNames,
row => row.Field<string>("columnName"),
name => name,
(row, name) => row,
StringComparer.OrdinalIgnoreCase);
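If the Join version is hard to read, the same idea can be spelled out with a HashSet. This is only a sketch: the class and method names (NameLookup, FindMatchingRows) are made up for illustration, and it assumes the column holds strings or NULLs.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class NameLookup
{
    // Returns the rows whose value in `columnName` equals any of `names`,
    // ignoring case. The set is built once, so each row costs one hash lookup
    // instead of a scan over the whole array.
    public static List<DataRow> FindMatchingRows(
        DataTable table, string columnName, IEnumerable<string> names)
    {
        var wanted = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

        return table.AsEnumerable()
            .Where(row =>
            {
                // Field<string> returns null for DBNull, so guard before the lookup.
                var value = row.Field<string>(columnName);
                return value != null && wanted.Contains(value);
            })
            .ToList();
    }
}
```

With the question's variables this would be called as NameLookup.FindMatchingRows(dt, "columnName", dataNames).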

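Try Contains; it should return all the values you need. Pass StringComparer.OrdinalIgnoreCase so that the match is case-insensitive, as the question requires:
var data = from row in dt.AsEnumerable()
where dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase)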
select new
{
Name = row.Field<string>("columnName")
};

Passing a list of values is surprisingly difficult. Passing a table-valued parameter requires creating a T-SQL data type on the server. You can pass an XML document containing the parameters and decode that using SQL Server's convoluted XML syntax.
Below is a relatively simple alternative that works for up to a thousand values. The goal is to build an IN query:
select col1 from YourTable where col1 in ('val1', 'val2', ...)
In C#, you should probably use parameters:
select col1 from YourTable where col1 in (@par1, @par2, ...)
Which you can pass like:
var com = yourConnection.CreateCommand();
com.CommandText = @"select col1 from YourTable where col1 in (";
for (var i = 0; i < dataNames.Length; i++)
{
    // Parameter names must start with @ to be recognised in the SQL text.
    var parName = string.Format("@par{0}", i + 1);
    com.Parameters.AddWithValue(parName, dataNames[i]);
    com.CommandText += parName;
    if (i + 1 != dataNames.Length)
        com.CommandText += ", ";
}
com.CommandText += ");";
var existingValues = new List<string>();
using (var reader = com.ExecuteReader())
{
    while (reader.Read())
        existingValues.Add((string)reader["col1"]);
}
Given the complexity of this solution I'd go for Max' or Tim's answer. You could consider this answer if the table is very large and you can't copy it into memory.

Sorry I don't have a lot of relevant code here, but I did a similar thing quite some time ago, so I will try to explain.
Essentially I had a long list of item IDs that I needed to return to the client, which then told the server which ones it wanted loaded at any particular time. The original query passed the values as a comma-separated set of strings (they were actually GUIDs). The problem was that once the number of entries hit 100, there was a noticeable lag for the user; once it got to 1000 possible entries, the query took a minute and a half; and when we went to 10,000, let's just say you could boil the kettle and drink your tea/coffee before it came back.
The answer was to put the values to check directly into a temporary table, where one row of the table represented one value to check against. The temporary table was keyed against the user who performed the search, so other users' searches wouldn't interfere with each other, and when the user logged out, we knew which values in the search table could be removed.
The best way to load the reference table depends on where this data comes from. But once it is there, your new query will look something like:
SELECT COUNT(t.columnName), rt.dataName
FROM table t
RIGHT JOIN referenceTable rt ON rt.dataName = t.columnName
WHERE rt.userRef = @UserIdValue
GROUP BY rt.dataName
The RIGHT JOIN here should give you a value for each of your reference table values, including 0 if the value did not appear in your table. If you don't care which ones don't appear, then changing it to an INNER JOIN will eliminate the zeros.
The WHERE clause is to ensure that your search only returns the unique items that you are looking for at the moment - the design should consider that concurrent access will someday occur here (even if it doesn't at the moment), so writing something in to protect it is advisable.
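For the loading step, here is a rough sketch of how the values could be staged in memory before being sent to the reference table. The column names userRef and dataName come from the query above; the int type for the user reference, the class name and the method name are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReferenceTableLoader
{
    // Builds an in-memory table with one row per value to check, tagged with
    // the user who started the search. Assumed schema: userRef int, dataName string.
    public static DataTable BuildReferenceRows(int userRef, IEnumerable<string> dataNames)
    {
        var table = new DataTable("referenceTable");
        table.Columns.Add("userRef", typeof(int));
        table.Columns.Add("dataName", typeof(string));

        // Repeated values would be counted twice by the JOIN, so keep each name once.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var name in dataNames)
        {
            if (name != null && seen.Add(name))
                table.Rows.Add(userRef, name);
        }
        return table;
    }
}
```

A table built this way could then be written to the server in one go, for example with SqlBulkCopy.WriteToServer, rather than with one INSERT per value.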

Related

Create column from another table dynamically

I'm working with TSQL and C#. I have two queries that return strings:
string[] allSubcategories = dt.AsEnumerable().Select(x => x.Field<string>("SubcategoryName")).Distinct().ToArray();
var redMark = db.GetTableBySQL("SELECT * FROM RedMarkItems");
string[] redMarkColumns = redMark.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToArray();
So, as you can see, I have two different arrays: first the subcategory names, and then all the columns of the RedMarkItems table.
What I want to do is create columns dynamically: if a subcategory name does not exist as a column in RedMarkItems, alter the table and create it, something like:
var createColumn = db.ExeSQL($"ALTER TABLE RedMarkItems ADD {ColumnName} BIT");
How can I check whether a subcategory name does not exist as a column in the RedMarkItems table, and then create the column with my query? Regards
If you want to know if a particular column exists in an already filled DataTable using the Linq approach then it is just:
bool exists = redMark.Columns.Cast<DataColumn>().Any(x => x.ColumnName == "SubCategoryName");
Instead, if you want to ask the database directly, use the INFORMATION_SCHEMA views. The COLUMNS view is the one to use, with a query like this:
string query = @"IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS
                 WHERE TABLE_NAME = 'RedMarkItems' AND COLUMN_NAME = @colName)
                 SELECT 1 ELSE SELECT 0";
SqlCommand cmd = new SqlCommand(query, connection);
cmd.Parameters.Add("@colName", SqlDbType.NVarChar).Value = "SubCategoryName";
// ExecuteScalar returns object, so cast before comparing.
bool exists = ((int)cmd.ExecuteScalar() == 1);
Now, the part about creating the column is pretty simple as code per se. It is just an appropriate ALTER TABLE. But there are a lot of things to clear up first. What will be the datatype of the new column? What will be its length and precision? What constraints will apply to it (NULL/NOT NULL, defaults, etc.)? As you can see, all of this information is important and needs to be defined somewhere in your code.
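To compare the two arrays from the question in one step, something along these lines might do. It is a sketch, not tested against a real database: ColumnDiff and its methods are hypothetical names, the case-insensitive comparison assumes a case-insensitive database collation, and the BIT type is simply taken from the question's ALTER TABLE.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnDiff
{
    // Names in `wanted` that have no matching column in `existing`.
    // Except also removes repeats, so each missing name is returned once.
    public static List<string> MissingColumns(
        IEnumerable<string> wanted, IEnumerable<string> existing)
    {
        return wanted.Except(existing, StringComparer.OrdinalIgnoreCase).ToList();
    }

    // Wraps a name in brackets and doubles any ']' so that it can be used
    // as a T-SQL identifier even if it contains spaces or brackets.
    public static string QuoteIdentifier(string name)
    {
        return "[" + name.Replace("]", "]]") + "]";
    }

    // Builds the statement from the question for one missing column.
    public static string BuildAddColumnSql(string columnName)
    {
        return "ALTER TABLE RedMarkItems ADD " + QuoteIdentifier(columnName) + " BIT";
    }
}
```

With the question's variables, ColumnDiff.MissingColumns(allSubcategories, redMarkColumns) gives the names to loop over, and each generated statement can be passed to db.ExeSQL. Column names cannot be sent as SQL parameters, which is why the identifier is quoted instead.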

Avoid duplication in DataTable query and build

What would be the right way to avoid duplication when querying a DataTable and then saving the result to another DataTable? I'm using the pattern below, which gets very error-prone once tables grow. I looked at the hints below. For the first one, CopyToDataTable() does not look really applicable, and the second is much too complex for the task. I would like to split the code below into two separate methods (the first to build the query and the second to retrieve the DataTable). Perhaps if I avoid the anonymous type in the query it would be easier to avoid hardcoding all the column names, but I'm somewhat lost with this.
Filling a DataSet or DataTable from a LINQ query result set
or
https://msdn.microsoft.com/en-us/library/bb669096%28v=vs.110%29.aspx
public DataTable retrieveReadyReadingDataTable()
{
DataTable dtblReadyToSaveToDb = RetrieveDataTableExConstraints();
var query = from scr in scrTable.AsEnumerable()
from products in productsTable.AsEnumerable()
where(scr.Field<string>("EAN") == products.Field<string>("EAN"))
select
new
{
Date = DateTime.Today.Date,
ProductId = products.Field<string>("SkuCode"),
Distributor = scr.Field<string>("Distributor"),
Price = float.Parse(scr.Field<string>("Price")),
Url = scr.Field<string>("Url")
};
foreach (var q in query)
{
DataRow newRow = dtblReadyToSaveToDb.Rows.Add();
newRow.SetField("Date", q.Date);
newRow.SetField("ProductId", q.ProductId);
newRow.SetField("Distributor", q.Distributor);
newRow.SetField("Price", q.Price);
newRow.SetField("Url", q.Url);
}
return dtblReadyToSaveToDb;
}
Firstly, you have to decide what "duplicate" means in your case. According to your code, I would say a duplicate is a row with the same values in the Date, ProductId and Distributor columns. So add a multi-column primary key for those columns first.
Secondly, you should add some code that first queries the existing rows and then compares them to the rows you want to create. If a match is found, simply don't insert a new row.
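A minimal sketch of those two steps, assuming the column names and types from the question (Date, ProductId, Distributor, Price, Url). ReadingTable, Create and TryAdd are hypothetical names, and Create only stands in for whatever RetrieveDataTableExConstraints() really returns.

```csharp
using System;
using System.Data;

public static class ReadingTable
{
    // Creates an empty table with a composite primary key on
    // (Date, ProductId, Distributor).
    public static DataTable Create()
    {
        var table = new DataTable("Readings");
        DataColumn date = table.Columns.Add("Date", typeof(DateTime));
        DataColumn productId = table.Columns.Add("ProductId", typeof(string));
        DataColumn distributor = table.Columns.Add("Distributor", typeof(string));
        table.Columns.Add("Price", typeof(float));
        table.Columns.Add("Url", typeof(string));
        table.PrimaryKey = new[] { date, productId, distributor };
        return table;
    }

    // Adds the row unless a row with the same key already exists.
    // Returns true when a row was added.
    public static bool TryAdd(DataTable table, DateTime date, string productId,
                              string distributor, float price, string url)
    {
        // Rows.Contains looks the key up through the primary key index.
        var key = new object[] { date, productId, distributor };
        if (table.Rows.Contains(key))
            return false;

        DataRow row = table.NewRow();
        row["Date"] = date;
        row["ProductId"] = productId;
        row["Distributor"] = distributor;
        row["Price"] = price;
        row["Url"] = url;
        table.Rows.Add(row);
        return true;
    }
}
```

Inside the foreach over the query results, each q would then go through TryAdd instead of Rows.Add.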

Optimizing query that uses AsEnumerable and SingleOrDefault

Not long ago there was a feature request in the program I am maintaining. Basically it has to fill up a table in the database with info from a text file. These files can be pretty big, but it was fairly easy to do because these files were defined as the complete list of user data. Therefore the table could be truncated and then just filled up again with data from the text file.
But then a week ago it was decided that these files are actually updates of current user info, so now I have to retrieve the correct MeteringPointId (which only exist once if it does exist) and then update info on it. If it doesn't exist, just insert data as before.
The way I do this is retrieving the complete database table with data from the database into memory and then just updating on that info before finally saving the changes by calling the datatables update function. It works fine, except that finding the row with the MeteringPointId is slow:
DataRow row = MeteringPointsDataTable.NewRow();
// this is called for each line in the text file to find the corresponding MeteringPointId. It can be 300.000 times.
row = MeteringPointsDataTable.AsEnumerable().SingleOrDefault(r => r.Field<string>("MeteringPointId").ToString() == MeteringPointId);
Is there a way to retrieve a DataRow from a DataTable that is faster than this?
If you are sure that only one item can fulfill the condition, use FirstOrDefault instead of SingleOrDefault. SingleOrDefault has to keep scanning after the first match to verify that there is no second one, whereas FirstOrDefault stops at the first entry it finds.
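The difference is easy to see by counting how many elements each call looks at. This is just an illustration on a plain sequence of strings; ScanCost and its method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanCost
{
    // Number of elements FirstOrDefault examines before it returns.
    public static int ExaminedByFirst(IEnumerable<string> ids, string target)
    {
        int examined = 0;
        ids.FirstOrDefault(id => { examined++; return id == target; });
        return examined;
    }

    // Number of elements SingleOrDefault examines before it returns.
    // Note: it throws InvalidOperationException if `target` occurs more than once.
    public static int ExaminedBySingle(IEnumerable<string> ids, string target)
    {
        int examined = 0;
        ids.SingleOrDefault(id => { examined++; return id == target; });
        return examined;
    }
}
```

With a unique match at the start of a four-element list, the first method reports one examined element and the second reports four. Either way the search is still linear in the worst case, so this only helps on average.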
You can use Select method of DataTable.
var expression = "[MeteringPointId] = '" + MeteringPointId + "'";
DataRow[] result = MeteringPointsDataTable.Select(expression);
Also you can create an expression like,
var idList = new []{"id1", "id2", "id3", ...};
var expression = "[MeteringPointId] in " + string.Format("({0})", string.Join(",", idList.Select(i=> "'"+i+"'")));
Similar usage is here
Hope it helps..
You could put the whole table in a dictionary:
//At the start
var meteringPoints = MeteringPointsDataTable.AsEnumerable().ToDictionary(r => r.Field<string>("MeteringPointId").ToString());
//For each row of the text file:
DataRow row;
if (!meteringPoints.TryGetValue(MeteringPointId, out row))
{
row = MeteringPointsDataTable.NewRow();
meteringPoints[MeteringPointId] = row;
}

Filling table takes a lot of time in MS Word

I wrote the following code to add an external data table to another table in an MS Word document. It works fine, but it takes a lot of time when there are more than 100 rows, and with more than 500 rows it fills the Word table so slowly that it can't complete the task.
I tried hiding the document and disabling screen updating for the document, but the performance is still slow.
//Get the required external data to the DT data table
DataTable DT = XDt.GetData();
Word.Table TB;
int X = 1;
foreach (DataRow Rw in DT.Rows)
{
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
for(int i=0;i<=DT.Columns.Count-1;i++)
{
Rn.Cells[i+1].Range.Text = Rw[i].ToString();
}
X++;
}
So is there a way to make this process faster?
The most efficient way to add a table to Word is to first concatenate the data into a delimited text string, where "\n" must be the symbol for end-of-row (record separator). The end-of-cell (field separator) can be any character you like that's not in the string content that makes up the table.
Assign this string to a Range object, then use the ConvertToTable() method to create the table.
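Building that string from the DataTable in the question could look roughly like this. It is a sketch under a few assumptions: tab is used as the field separator, tabs and line breaks inside cell values are replaced with spaces (which changes the content slightly), and WordTableText is a made-up name. The Word interop calls themselves are left out.

```csharp
using System;
using System.Data;
using System.Linq;

public static class WordTableText
{
    // Joins the cells of each row with tabs and the rows with "\n".
    public static string Build(DataTable table)
    {
        var lines = table.Rows.Cast<DataRow>()
            .Select(row => string.Join("\t", row.ItemArray.Select(v => Clean(v))));
        return string.Join("\n", lines);
    }

    // Separator characters inside a value would split the cell or the row,
    // so they are replaced with spaces here. DBNull becomes an empty string.
    private static string Clean(object value)
    {
        return Convert.ToString(value)
            .Replace("\t", " ")
            .Replace("\r", " ")
            .Replace("\n", " ");
    }
}
```

The result of WordTableText.Build(DT) would be assigned to the Range's Text property, followed by ConvertToTable with the tab separator and DT.Columns.Count as the number of columns.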
You're retrieving the last row of the current table for the BeforeRow parameter of TB.Rows.Add. This is significantly slower than simply adding the row. You should replace this:
Word.Row Rn = TB.Rows.Add(TB.Rows[X + 1]);
With this:
Word.Row Rn = TB.Rows.Add();
Utilizing parallelization as suggested in the comments might help slightly, but I'm afraid it's not going to do much good seeing the table add code runs on the main thread as mentioned in this link.
EDIT:
If performance is still an issue, I'd look into creating the Word table independently of the Word object model by using OpenXML. It's orders of magnitude faster.
ConvertToTable method is orders of magnitude faster than adding Rows/Cells one at a time.
while (reader.Read())
{
values = new object[reader.FieldCount];
var cols = reader.GetValues(values);
var item = String.Join("\t", values);
items.Add(item);
};
data = String.Join("\n", items.ToArray());
var tempDocument = application.Documents.Add();
var range = tempDocument.Range();
range.Text = data;
var tempTable = range.ConvertToTable(Separator: Microsoft.Office.Interop.Word.WdTableFieldSeparator.wdSeparateByTabs,
NumColumns: reader.FieldCount,
NumRows: rows, DefaultTableBehavior: WdDefaultTableBehavior.wdWord9TableBehavior,
AutoFitBehavior: WdAutoFitBehavior.wdAutoFitWindow);

What is the best way to fast insert SQL data and dependant rows?

I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.
I.e. schema looks like this:
Item
- Id
- Title
Property
- Id
- FK_Item
- Value
My first attempt was something vaguely like this:
BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
Item i = new Item() { Title = value["title"]};
ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"]};
db.Items.InsertOnSubmit(i);
db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges();
Obviously this was terribly slow so I'm now using something like this:
BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
DataRow item = dt.NewRow();
item["Title"] = value["title"];
dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
sb.DestinationTableName = "dbo.Item";
sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
sb.WriteToServer(dt);
}
But this doesn't allow me to add the corresponding 'Property' rows.
I'm thinking the best solution might be to add a Stored Procedure like this one that generically lets me do a bulk insert (or at least multiple inserts, but I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.
Can anyone think of a better (i.e. more succinct, near equal performance) solution?
To combine the previous best two answers and add in the missing piece for the IDs:
1) Use BCP to Load the data into a temporary "staging" table defined like this
CREATE TABLE stage(Title VARCHAR(??), Value {whatever});
and you'll need the appropriate index for performance later:
CREATE INDEX ix_stage ON stage(Title);
2) Use SQL INSERT to load the Item table:
INSERT INTO Item(Title) SELECT Title FROM stage;
3) Finally load the Property table by joining stage with Item:
INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.Value
FROM stage
JOIN Item ON Item.Title = stage.Title;
The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternately you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.
Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.
You can bulk copy in code as well, using the .NET SqlBulkCopy class.
