Compare Two DataTables for Non-Matching Records C# - c#

I'm curious how I can compare two data tables in C#. I have two data tables, data table one contains FirstName and LastName, data table 2 has Field1, Field2, First_Name, and Last_Name.
I want to find records that exist in data table 1 that do not exist in data table 2. Anyone ever done this before? Any help would be appreciated. Thank you!

Using LINQ would be most natural, but you will need to convert away from the DataTable to use Except.
var In_dt1_only = dt1.AsEnumerable().Select(r => new { first = r.Field<string>("First"), last = r.Field<string>("Last")}).Except(dt2.AsEnumerable().Select(r => new { first = r.Field<string>("First"), last = r.Field<string>("Last")}));
If you need the original DataRows, you can use a Where instead:
var datarows_in_dt1_only = dt1.AsEnumerable().Where(dr1 => !dt2.AsEnumerable().Any(dr2 => dr1.Field<string>("First") == dr2.Field<string>("First") && dr1.Field<string>("Last") == dr2.Field<string>("Last")));

Related

Create column from another table dynamically

I'm working with TSQL and C#. I have two queries that return strings:
string[] allSubcategories = dt.AsEnumerable().Select(x => x.Field<string>("SubcategoryName")).Distinct().ToArray();
var redMark = db.GetTableBySQL("SELECT * FROM RedMarkItems");
string[] redMarkColumns = redMark.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToArray();
So, as you can see I have two different arrays, first I get subcategoriesNames:
and all columns of table RedMarkItems:
That I want to do is to create column dynamically, I mean, if subcategorieName does not exist as column in RedMarkItems do an Update and create it someting like:
var createColumn = db.ExeSQL($"ALTER TABLE RedMarkItems ADD {ColumnName} BIT");
How can I compare if subcategorieName does not exist as column in RedMarkItems table? Then create column as my query? Regards
If you want to know if a particular column exists in an already filled DataTable using the Linq approach then it is just:
bool exists = redMark.Columns.Cast<DataColumn>().Any(x => x.ColumnName == "SubCategoryName");
Instead, if you want to ask this info directly to the database then use the INFORMATION_SCHEMA views The Columns view is the one to use with a query like this.
string query = #"IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.Column
WHERE Column_Name = #colName)
SELECT 1 ELSE SELECT 0";
SqlCommand cmd = new SqlCommand(query, connection);
cmd.Parameters.Add("#colName", SqlDbType.NVarChar).Value = "SubCategoryName";
bool exists = (cmd.ExecuteScalar() == 1);
Now, the part about creating the column is pretty simple as code per se. It is just an appropriate ALTER TABLE. But there are a lot of things to be cleared before. What will be the datatype of the new column? What will be its length and precision? What will be the constraints applied to it (Null/Not Null defaults etc)? As you can see all these info are very important and require to be defined somewhere in your code.

Pulling data from one SQL Azure table, add a column, then populate a different table

I using C# and LINQ to pull/push data housed in SQL Azure. The basic scenario is we have a Customer table that contains all customers (PK = CompanyID) and supporting tables like LaborTypes and Materials (FK CompanyID to Customer table).
When a new customer signs up, a new record is created in the Customers table. Once that is complete, I want to load a set of default materials and laborTypes from a separate table. It is simple enough if I just wanted to copy data direct from one table to another but in order to populate the existing tables for the new customer, I need to take the seed data (e.g. laborType, laborDescription), add the CompanyID for each row of seed data, then do the insert to the existing table.
What the best method to accomplish this using C# and LINQ with SQL Azure?
An example of a direct insert from user input for LaborTypes is below for contextual reference.
using (var context = GetContext(memCustomer))
{
var u = GetUserByUsername(context, memUser);
var l = (from lbr in context.LaborTypes
where lbr.LaborType1.ToLower() == laborType
&& lbr.Company == u.Company
select lbr).FirstOrDefault();
if (l == null)
{
l = new AccountDB.LaborType();
l.Company = u.Company;
l.Description = laborDescription;
l.LaborType1 = laborType;
l.FlatRate = flatRate;
l.HourlyRate = hourlyRate;
context.LaborTypes.InsertOnSubmit(l);
context.SubmitChanges();
}
result = true;
}
What you'll want to do is write a query retrieving data from table B and do an Insert Statement on Table A using the result(s).
This has been covered elsewhere in SO I think, here would be a good place to start
I don't know the syntax for Linq specifically; but by constructing something similar to #Murph 's answer beyond that link, I think this might work.
var fromB= from b in TableB
where ... ;//identify the row/data from table B
// you may need to make fromB populate from the DB here.
var toInsert = ...; //construct your object with the desired data
// do other necessary things here
TableA.InsertAllOnSubmit(toInsert);
dc.SubmitChanges(); // Submit your changes

Avoid duplication in DataTable query and build

What would be the right way to avoid duplication when querying datatable and then saving it to DataTable. I'm using the pattern below, which gets very error-prone once tables grow. I looked at below hints. With first one copyToDataTable() looks not really applicable and second is for me much too complex for the task. I would like to split the below code into 2 separate methods (first to build the query and second to retrieve the DataTable). Perhaps if I avoid the anonymous type in the query this should be easier to avoid hardcoding all the column names - but I'm somehow lost with this.
Filling a DataSet or DataTable from a LINQ query result set
or
https://msdn.microsoft.com/en-us/library/bb669096%28v=vs.110%29.aspx
public DataTable retrieveReadyReadingDataTable()
{
DataTable dtblReadyToSaveToDb = RetrieveDataTableExConstraints();
var query = from scr in scrTable.AsEnumerable()
from products in productsTable.AsEnumerable()
where(scr.Field<string>("EAN") == products.Field<string>("EAN"))
select
new
{
Date = DateTime.Today.Date,
ProductId = products.Field<string>("SkuCode"),
Distributor = scr.Field<string>("Distributor"),
Price = float.Parse(scr.Field<string>("Price")),
Url = scr.Field<string>("Url")
};
foreach (var q in query)
{
DataRow newRow = dtblReadyToSaveToDb.Rows.Add();
newRow.SetField("Date", q.Date);
newRow.SetField("ProductId", q.ProductId);
newRow.SetField("Distributor", q.Distributor);
newRow.SetField("Price", q.Price);
newRow.SetField("Url", q.Url);
}
return dtblReadyToSaveToDb;
}
Firstly, you have to decide what "duplicate" means in your case. According to your code i would say a duplicate is a row with the same value in column Date, ProductId and Distributor. So add a multi column primary key for those columns first.
Secondly, you should add some sort of code that first queries existing rows and then compares these existing rows to the rows you want to create. If a match is found, then simply just don't insert a new row.

check if values are in datatable

I have an array or string:
private static string[] dataNames = new string[] {"value1", "value2".... };
I have table in my SQL database with a column of varchar type. I want to check which values from the array of string exists in that column.
I tried this:
public static void testProducts() {
string query = "select * from my table"
var dataTable = from row in dt.AsEnumerable()
where String.Equals(row.Field<string>("columnName"), dataNames[0], StringComparison.OrdinalIgnoreCase)
select new {
Name = row.Field<string> ("columnName")
};
foreach(var oneName in dataTable){
Console.WriteLine(oneName.Name);
}
}
that code is not the actual code, I am just trying to show you the important part
That code as you see check according to dataNames[index]
It works fine, but I have to run that code 56 times because the array has 56 elements and in each time I change the index
is there a faster way please?
the Comparison is case insensitive
First, you should not filter records in memory but in the datatabase.
But if you already have a DataTable and you need to find rows where one of it's fields is in your string[], you can use Linq-To-DataTable.
For example Enumerable.Contains:
var matchingRows = dt.AsEnumerable()
.Where(row => dataNames.Contains(row.Field<string>("columnName"), StringComparer.OrdinalIgnoreCase));
foreach(DataRow row in matchingRows)
Console.WriteLine(row.Field<string>("columnName"));
Here is a more efficient (but less readable) approach using Enumerable.Join:
var matchingRows = dt.AsEnumerable().Join(dataNames,
row => row.Field<string>("columnName"),
name => name,
(row, name) => row,
StringComparer.OrdinalIgnoreCase);
try to use contains should return all value that you need
var data = from row in dt.AsEnumerable()
where dataNames.Contains(row.Field<string>("columnName"))
select new
{
Name = row.Field<string>("columnName")
};
Passing a list of values is surprisingly difficult. Passing a table-valued parameter requires creating a T-SQL data type on the server. You can pass an XML document containing the parameters and decode that using SQL Server's convoluted XML syntax.
Below is a relatively simple alternative that works for up to a thousand values. The goal is to to build an in query:
select col1 from YourTable where col1 in ('val1', 'val2', ...)
In C#, you should probably use parameters:
select col1 from YourTable where col1 in (#par1, #par2, ...)
Which you can pass like:
var com = yourConnection.CreateCommand();
com.CommandText = #"select col1 from YourTable where col1 in (";
for (var i=0; i< dataNames.Length; i++)
{
var parName = string.Format("par{0}", i+1);
com.Parameters.AddWithValue(parName, dataNames[i]);
com.CommandText += parName;
if (i+1 != dataNames.Length)
com.CommandText += ", ";
}
com.CommandText += ");";
var existingValues = new List<string>();
using (var reader = com.ExecuteReader())
{
while (read.Read())
existingValues.Add(read["col1"]);
}
Given the complexity of this solution I'd go for Max' or Tim's answer. You could consider this answer if the table is very large and you can't copy it into memory.
Sorry I don't have a lot of relevant code here, but I did a similar thing quite some time ago, so I will try to explain.
Essentially I had a long list of item IDs that I needed to return to the client, which then told the server which ones it wanted loaded at any particular time. The original query passed the values as a comma separated set of strings (they were actually GUIDs). Problem was that once the number of entries hit 100, there was a noticeable lag to the user, once it got to 1000 possible entries, the query took a minute and a half, and when we went to 10,000, lets just say you could boil the kettle and drink your tea/coffee before it came back.
The answer was to stick the values to check directly into a temporary table, where one row of the table represented one value to check against. The temporary table was keyed against the user who performed the search, so this meant other users searches wouldn't become corrupted with each other, and when the user logged out, then we knew which values in the search table could be removed.
Depending on where this data comes from will depend on the best way for you to load the reference table. But once it is there, then your new query will look something like:-
SELECT Count(t.*), rt.dataName
FROM table t
RIGHT JOIN referenceTable rt ON tr.dataName = t.columnName
WHERE rt.userRef = #UserIdValue
GROUP BY tr.dataName
The RIGHT JOIN here should give you a value for each of your reference table values, including 0 if the value did not appear in your table. If you don't care which one don't appear, then changing it to an INNER JOIN will eliminate the zeros.
The WHERE clause is to ensure that your search only returns the unique items that you are looking for at the moment - the design should consider that concurrent access will someday occur here (even if it doesn't at the moment), so writing something in to protect it is advisable.

Querying data from SQL

I need to query a table from database which has 400 rows and 24 columns. I need to query this table so that on each row and then on each column of row I can perform some C# code ( I can use column information to execute some code).
Now at the moment I am querying each row again and again from table using select statement and storing to a custom list and performing custom operations on it.
Is it best and fastest way of doing it ? or should I just query the whole table one time and store somewhere ? not sure where in a dataset and then run throw custom code to do some operation using information in each row ?
You can fetch the table from database once and store it in datatable and then just use linq to select the column something like this
var data = dt.AsEnumerable().Select(s => s.Field<string>("myColumnName")).ToArray<string>();
and If you don't want to use other columns anywhere in your code then you should select only useful column from the database.
You can also select multiple columns of a database using linq. The values will be stored in anonymous type of object.
var mutipleData = from row
in dt.AsEnumerable()
select new
{ Value1 = row["Column1"].ToString(),
Value2 = row["Column2"].ToString()
};
Assuming that each field is 1000 bytes, the total memory to hold your 400 rows would be 9.6MB. Peanuts! Just read the whole table in a DataTable and process it as you wish.
Select all the records from DB table which are f your concern
Copy the select records into DataTable.
Pseudo Code:
--dt is datatable
foreach(datarow dr in dt.rows)
{
--perform operation
string str=dr["columnname"].tostring
}
400 rows isn't a massive amount, however it depends on the data in each column and how often you are likely to run the query. If all you are going to do is run the query and manipulate the output, use a DataReader instead.
If it's only 400 records, I'd fetch them all at once, store them in a class and iterate over each instance.
Something like:
Class
public class MyTableClass(){
string property1 {get; set;}
string property2 {get; set;}
int property3 {get; set;}
// etc.
}
Logic:
ICollection<MyTableClass> lstMyTableClass = (
from m in db.mytableclass
select m
).ToList();
return lstMyTableClass;
And then a loop:
foreach(var myTableInstance in lstMyTableClass){
myTableInstance.DoMyStuff();
}
If the number of records will always be below thoussand, just query all the records and keep it in a List.
Once the data is in the List you can query the List n number of times using LINQ without hitting the database for each request.

Categories

Resources