Improve Duplicate check method - c#

Morning guys.
Using C# on .NET 4 with MS Visual Studio 2010.
I have developed a duplicate checker for my Windows Forms program.
It works perfectly and is virtually instant on my DataGridView when there are a couple of hundred records.
The problem I've noticed is that when there are 6000 records displayed it is not efficient enough at all and takes minutes.
I was wondering if anyone has some good tips to make this method a lot faster, either by improving the existing design or with a different method altogether that I've overlooked.
Your help is once again much appreciated!
Here's the code:
public void CheckForDuplicate()
{
    DataGridViewRowCollection coll = ParetoGrid.Rows;
    DataGridViewRowCollection colls = ParetoGrid.Rows;
    IList<String> listParts = new List<String>();
    int count = 0;
    foreach (DataGridViewRow item in coll)
    {
        foreach (DataGridViewRow items in colls)
        {
            count++;
            if ((items.Cells["NewPareto"].Value != null) && (items.Cells["NewPareto"].Value != DBNull.Value))
            {
                if ((items.Cells["NewPareto"].Value != DBNull.Value) && (items.Cells["NewPareto"].Value != null) && (items.Cells["NewPareto"].Value.Equals(item.Cells["NewPareto"].Value)))
                {
                    if ((items.Cells["Part"].Value != DBNull.Value) && (items.Cells["Part"].Value != null) && !(items.Cells["Part"].Value.Equals(item.Cells["Part"].Value)))
                    {
                        listParts.Add(items.Cells["Part"].Value.ToString());
                        dupi = true; //boolean toggle
                    }
                }
            }
        }
    }
    MyErrorGrid.DataSource = listParts.Select(x => new { Part = x }).ToList();
}
Any Questions let me know and I will do my best to answer them.

If you can, you should try and do this on the underlying data rather than on the UI objects - however I have a hunch that you're seeding it from a set of DataRows, in which case you might not be able to do that.
I think a big part of the issue here is the repeated dereferencing of the cells by name, and the fact that you repeatedly dereference the second set of cells. So do it all up front:
var first = (from row in coll.Cast<DataGridViewRow>()
             let newpareto = row.Cells["NewPareto"].Value ?? DBNull.Value
             let part = row.Cells["Part"].Value ?? DBNull.Value
             where newpareto != DBNull.Value && part != DBNull.Value
             select new
             { newpareto = newpareto, part = part }).ToArray();
//identical - so a copy-paste job (if not using anonymous type we could refactor)
var second = (from row in colls.Cast<DataGridViewRow>()
              let newpareto = row.Cells["NewPareto"].Value ?? DBNull.Value
              let part = row.Cells["Part"].Value ?? DBNull.Value
              where newpareto != DBNull.Value && part != DBNull.Value
              select new
              { newpareto = newpareto, part = part }).ToArray();
//now produce our list of strings
var listParts = (from f in first
                 where second.Any(v => v.newpareto.Equals(f.newpareto)
                                    && !v.part.Equals(f.part))
                 select f.part.ToString()).ToList(); //if you want it as a list.

There is an approach that will make this much more efficient. You need to compute a hash of each item. Items with different hashes can't possibly be duplicates.
Once you have the hashes, you could either sort by hash or use a data structure with efficient keyed retrieval (like Dictionary<TKey,TValue>) to find all the duplicates.
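As a rough sketch of the Dictionary approach described above: the rows are bucketed by their NewPareto value in one pass, and any bucket holding more than one distinct part is reported. The Entry class, the int type for NewPareto, and the DuplicateFinder name are assumptions made for illustration; the real values would come from the grid cells. Unlike the nested-loop version, each conflicting part is listed only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFinder
{
    // Hypothetical stand-in for one grid row (not part of the original code).
    public sealed class Entry
    {
        public int NewPareto { get; set; }
        public string Part { get; set; }
    }

    // Returns every part whose NewPareto value is also used by a different part.
    // Building the buckets is one pass over the rows, so the expected cost is
    // O(n) instead of the O(n^2) of comparing every row with every other row.
    public static List<string> FindConflictingParts(IEnumerable<Entry> entries)
    {
        var partsByPareto = new Dictionary<int, HashSet<string>>();
        foreach (var entry in entries)
        {
            HashSet<string> parts;
            if (!partsByPareto.TryGetValue(entry.NewPareto, out parts))
            {
                parts = new HashSet<string>();
                partsByPareto[entry.NewPareto] = parts;
            }
            parts.Add(entry.Part); // the set ignores a part it already holds
        }

        // A bucket with two or more distinct parts means a shared NewPareto.
        return partsByPareto.Values
            .Where(parts => parts.Count > 1)
            .SelectMany(parts => parts)
            .ToList();
    }
}
```

The resulting list could then be bound to the error grid the same way as before, for example with listParts.Select(x => new { Part = x }).ToList().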

Related

MongoDB Select with QueryBuilder

I'm trying to select values from my database, but currently I'm unable to do it. I know the cause is that the method doesn't accept the QueryBuilder class as a parameter, but I don't know what to do about it. I only found solutions for queries with one parameter, or where all parameters are non-null. In my case I have a list of IDs plus four parameters that don't have to be passed to the function, so they could be null.
My current code looks like this.
collection = db.GetCollection<Datapoint>("test");
var query = new QueryBuilder<Datapoint>();
var queryattributes = new List<IMongoQuery>();
var ids = new List<IMongoQuery>();
// Add all Attributes if there
if (fromdate != null)
{
BsonDateTime from = BsonDateTime.Create(fromdate);
queryattributes.Add(Query.GTE("UTCTimestamp", from));
}
if (to != null)
{
BsonDateTime bto = BsonDateTime.Create(to);
queryattributes.Add(Query.LTE("UTCTimestamp", bto));
}
if (evId != null)
{
queryattributes.Add(Query.EQ("EvId", evId));
}
if (evType != null)
{
queryattributes.Add(Query.EQ("EvType", evType));
}
// Add all ID's
Parallel.ForEach(idList, data =>
{
lock (query)
{
ids.Add(Query.EQ("CId", data));
}
});
// Put everything in the Query
query.Or(ids);
// Add Queryattributes if there
if (queryattributes.Count > 0)
{
query.And(queryattributes);
}
var result = collection.FindAs<Datapoint>(query);
I'm trying to do it without LINQ, since I found countless tests which say that LINQ is much slower, and since I want to run it as a database test, I need the performance to execute a lot of queries.
The LINQ query looks like this:
var query2 =
from e in collection.AsQueryable<Datapoint>()
where idList.Contains(e.CId)
&& (evId == null || e.EvId == evId)
&& (evType == null || e.EvType == evType.Value)
&& (fromdate == null || Query.GTE("UtcTimestamp", BsonDateTime.Create(fromdate)).Inject())
&& (to == null || Query.LT("UtcTimestamp", BsonDateTime.Create(to)).Inject())
select e;
The QueryBuilder doesn't store a query inside it. You use it to generate an IMongoQuery and then use that query to actually query the database.
It seems the end of your code should look like this:
// Put everything in the query
IMongoQuery mongoQuery = query.Or(ids);
// Add the query attributes if there are any, keeping the ID filter
if (queryattributes.Count > 0)
{
    mongoQuery = query.And(new[] { mongoQuery }.Concat(queryattributes));
}
var result = collection.FindAs<Datapoint>(mongoQuery);

Only do Where condition if a value is passed in

I have the following LINQ statement that filters on a date and a lab ID.
I'm passing in a list of labs and a date; however, neither is required. I could potentially pass in only a date and no lab, in which case I'd like to get results for all labs for that particular date.
Here is what I have now:
List<dExp> lstDatExp = (from l in ctx.dExp.Include("datLab")
where values.Contains(l.datL.Lab_ID)
&& l.reportingPeriod == reportingPeriod
select l).ToList<dExp>();
But this breaks if the value getting passed in is not there. How do I change this to make sure both of my where statements are optional?
With IQueryable you can simply add conditions in steps:
int? reportingPeriod = ...;
IQueryable<dExp> resultsQuery = // don't use `var` here.
    ctx.dExp.Include("datLab");
if (values != null)
    resultsQuery = resultsQuery.Where(exp => values.Contains(exp.datL.Lab_ID));
if (reportingPeriod.HasValue)
    resultsQuery = resultsQuery.Where(exp => exp.reportingPeriod == reportingPeriod.Value);
// additional .Where(), .OrderBy(), .Take(), .Skip() and .Select()
// The SQL query is made and executed on the line below
// inspect the string value in the debugger
List<dExp> results = resultsQuery.ToList();
Here are two ways to do that.
But first, please don't use a single lowercase l as an identifier. It is way too easy to confuse it with the number 1. More generally, stp using abbrevs in yr cde, it mks it hrdr to rd.
First technique:
var query = from lab in ctx.dExp.Include("datLab")
where values == null || values.Contains(lab.datL.Lab_ID)
where reportingPeriod == null || lab.reportingPeriod == reportingPeriod
select lab;
var list = query.ToList<dExp>();
Second technique:
IQueryable<dExp> query = ctx.dExp.Include("datLab");
if (values != null)
    query = query.Where(lab => values.Contains(lab.datL.Lab_ID));
if (reportingPeriod != null)
    query = query.Where(lab => lab.reportingPeriod == reportingPeriod);
var list = query.ToList<dExp>();
What we do is something like (l.reportingPeriod == reportingPeriod || reportingPeriod == null). That is, you check whether the parameter still has its default value, meaning it hasn't been supplied; only if there is something there do you check it against the database.
You need to check if your values are null before doing the query, and if they are, don't do the extra condition.
List<dExp> lstDatExp =
(from l in ctx.dExp.Include("datLab")
where
(values == null || values.Contains(l.datL.Lab_ID)) &&
(reportingPeriod == null || l.reportingPeriod == reportingPeriod)
select l).ToList<dExp>();
This way if values or reportingPeriod are null they are essentially optional.
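The step-by-step composition shown in the answers above can be tried without a database. The sketch below runs the same pattern against an in-memory list exposed through AsQueryable(); the Expense class and the OptionalFilters name are hypothetical stand-ins for the dExp entity and its context, and a null argument is taken to mean "no filter".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptionalFilters
{
    // Hypothetical stand-in for the dExp entity (not from the original post).
    public sealed class Expense
    {
        public int LabId { get; set; }
        public int ReportingPeriod { get; set; }
    }

    // Each filter is appended only when its argument was supplied.
    // Nothing is evaluated until ToList() is called at the end.
    public static List<Expense> Filter(
        IQueryable<Expense> source, List<int> labIds, int? reportingPeriod)
    {
        IQueryable<Expense> query = source;

        if (labIds != null)
            query = query.Where(e => labIds.Contains(e.LabId));

        if (reportingPeriod.HasValue)
        {
            int period = reportingPeriod.Value; // capture a plain int
            query = query.Where(e => e.ReportingPeriod == period);
        }

        return query.ToList();
    }
}
```

Calling OptionalFilters.Filter(data.AsQueryable(), null, 2012) would apply only the period filter, while passing both arguments applies both.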

What is the best data structure to use when adding elements to list based on comparisons c#

List<string> allApps = new List<string>();
roster = MURLEngine.GetUserFriendDetails(token, userId);
var usersfriends = from elements in roster.RosterEntries
where elements[0] == 'm' && elements[1] >= '0' && elements[1] <= '9'
select elements;
foreach (string userid in usersfriends)
{
roster = MURLEngine.GetUserFriendDetails(token, userid);
var usersapps = from elements in roster.RosterEntries
where elements[0] != 'm'
select elements;
allApps.AddRange(usersapps);
allApps = allApps.Distinct().ToList();
}
int countapps = 0;
List<string> Appname = new List<string>();
countapps = appList.Count();
for (int y = 0; y < countapps; y++)
{
foreach (string li in allApps) //
{
bool istrueapp = appList.ElementAt(y).AppName.Equals(li);
if (istrueapp == true)
{
Appname.Add(appList.ElementAt(y).AppName);
}
}
}
In the code above I first get a list of strings, i.e. usersfriends. Based on those IDs I get the list of apps for each user and add all the apps of all users to another list, i.e. allApps. The whole process is slow: it takes around 20 seconds using Lists. I tried using a HashSet and a SortedSet as well, but they were even slower.
My question is: what data structure should I be using for this scenario?
Any help would be appreciated.
My favorite thing about LINQ is that it lets you describe what you want to do, rather than making you write a bunch of loops which obscure your goals. Here's a refactored version of your code which I think is pretty clear, and which runs faster in my testbed (0.5s vs ~15s).
// create a hashset for fast confirmation of app names
var realAppNames = new HashSet<string>(appList.Select(a => a.AppName));
var roster = MURLEngine.GetUserFriendDetails(token, userId);
// get the ids of all friends
var friendIds = roster.RosterEntries
    .Where(e => e[0] == 'm' && e[1] >= '0' && e[1] <= '9');
var apps =
    friendIds
        // get the rosters for all friends
        .SelectMany(friendId => MURLEngine.GetUserFriendDetails(token, friendId).RosterEntries)
        // include the original user's roster so we get their apps too
        .Concat(roster.RosterEntries)
        // filter down to apps only
        .Where(name => name[0] != 'm' && realAppNames.Contains(name))
        // remove duplicates
        .Distinct()
        // we're done!
        .ToList();
OK, here is what I can suggest so far.
Firstly: you've got a lot of Adds.
In general, a default List<T> is not the best data structure for a lot of Adds, because internally it is implemented as an array that is copied to a larger one whenever it fills up.
Two options are possible:
- Create the list with a predefined capacity: List<string> allApps = new List<string>(countOfApps);. This is good if you can roughly estimate in advance how many items will be added to the list.
- Use LinkedList<string> allApps = new LinkedList<string>(). LinkedList adds new items pretty fast.
The same is true for the List<string> Appname = new List<string>(); list.
Secondly: allApps is run through Distinct() and converted back to a list on every iteration of the foreach loop, although the newly constructed list is not used inside that loop. You can move that Distinct().ToList() call out of the loop; the logic won't change, but performance will improve.
So far I can suggest the following code:
LinkedList<string> allApps2 = new LinkedList<string>();// linkedlist here
roster = MURLEngine.GetUserFriendDetails(token, userId);
var usersfriends = from elements in roster.RosterEntries
where elements[0] == 'm' && elements[1] >= '0' && elements[1] <= '9'
select elements;
foreach (string userid in usersfriends)
{
roster = MURLEngine.GetUserFriendDetails(token, userid);
var usersapps = from elements in roster.RosterEntries
where elements[0] != 'm'
select elements;
foreach(var userapp in usersapps)// add _all the apps_ to list. Will be distinct-ed later
{
allApps2.AddLast(userapp);// don't worry, it works for O(1)
}
}
var allApps = allApps2.Distinct().ToList();
int countapps = 0;
LinkedList<string> Appname2 = new LinkedList<string>();// linkedlist here
countapps = appList.Count();
for (int y = 0; y < countapps; y++)
{
foreach (string li in allApps) //
{
bool istrueapp = appList.ElementAt(y).AppName.Equals(li);
if (istrueapp == true)
{
Appname2.AddLast(appList.ElementAt(y).AppName);// and here
}
}
}
var AppName = Appname2.ToList();// and you've got you List<string> as the result
Please, try this code and let me know how it works(though I think it should work considerably faster).
Hope this helps.
UPDATE
Finally I'm home, sorry for the delay. I played with the code a bit and made it faster by rewriting the last for loop into this:
foreach (var app in appList)
{
foreach (string li in allApps)
{
bool istrueapp = app.AppName.Equals(li);
if (istrueapp)
{
Appname2.AddLast(app.AppName);
}
}
}
That gave great speed-up, at least on my machine(r).
Please check whether it's faster on your environment.
Hope that helps.
You should keep allApps inside a dictionary keyed by the app name. To check whether an app from appList exists, simply look it up with allApps.ContainsKey(appName).
The problem most likely originates from the last for loop, whose complexity is O(n^2). Using a dictionary should reduce the complexity to roughly O(n), since each hash lookup is O(1) on average, and thus solve the problem.
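A minimal sketch of the keyed-lookup idea, using a HashSet<string> rather than a Dictionary because only membership is needed here. The AppMatcher name and the plain string inputs are assumptions for illustration; in the question the known names would come from appList and the entries from the rosters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AppMatcher
{
    // Returns the roster entries that are known app names, without duplicates,
    // in the order they first appear. Each membership test is a hash lookup,
    // so the whole pass is O(n) on average instead of a nested O(n * m) scan.
    public static List<string> KnownApps(
        IEnumerable<string> knownAppNames, IEnumerable<string> rosterEntries)
    {
        var known = new HashSet<string>(knownAppNames);
        var seen = new HashSet<string>();
        var result = new List<string>();

        foreach (var entry in rosterEntries)
        {
            // seen.Add returns false when the entry was already recorded.
            if (known.Contains(entry) && seen.Add(entry))
                result.Add(entry);
        }
        return result;
    }
}
```

With this in place, the final double loop over appList and allApps collapses into a single call.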
As already commented in another answer, a List is not very efficient when dealing with a large number of elements and performing simple actions. Simpler collections (e.g., arrays) are a more efficient solution under these conditions. Sample code (written so that it works with plain arrays; you can use it with the current lists, or with arrays should you start using them):
List<string> Appname = new List<string>();
roster = MURLEngine.GetUserFriendDetails(token, userId);
foreach (string item in roster.RosterEntries)
{
    // item[1] is read below, so at least two characters are needed
    if (item == null || item.Trim().Length < 2) continue;
    // a friend id is 'm' followed by a digit; compare characters, not code points
    if (item[0] == 'm' && item[1] >= '0' && item[1] <= '9')
    {
        var roster2 = MURLEngine.GetUserFriendDetails(token, item);
        foreach (string item2 in roster2.RosterEntries)
        {
            if (item2 == null || item2.Trim().Length < 1) continue;
            if (item2[0] != 'm')
            {
                bool found = false;
                // appList holds app objects, not strings
                foreach (var app in appList)
                {
                    if (app == null || app.AppName == null) continue;
                    if (app.AppName == item2)
                    {
                        found = true;
                        break;
                    }
                }
                if (found) Appname.Add(item2);
            }
        }
    }
}
As you can see, I have skipped the intermediate storage in allApps (this can also be done in your version via a query).
This code should deliver a noticeable improvement over the original version (mainly once you convert the lists into arrays). If you are interested in speeding this code up even further, you should consider redesigning the way in which the inputs are provided, thus avoiding what is presumably the most time-consuming bit: the repeated calls to MURLEngine.GetUserFriendDetails. Finally, bear in mind that you can replace the foreach loop over appList with a simple condition (AppNames.Contains(item2)) by storing the app names in a List/Array.

Linq - getting a value from a string

First question on SO - I've read it many, many times so time to drop in and get my feet wet in the community!
I start by getting a single row from a Linq query:
var relationshipDetails = (from p in dc.tbl_ClientRelationships
where p.ID == relationship_id
select p).FirstOrDefault();
Then I look through a list of strings (_cols), which is the known column names (and also form item names) like so:
foreach (string c in _cols)
{
if (relationshipDetails.GetType().GetProperty(c).GetValue(relationshipDetails, null).ToString() != null)
{
setValue(relationshipDetails.GetType().GetProperty(c).GetValue(relationshipDetails, null).ToString(), c);
}
}
The setValue() method basically assigns the returned value to the web control (and has logic to determine the type and how it should be assigned, etc.).
My question: is there a better way to get a value out of a LINQ object given a known property name?
It works on some forms but has recently blown up on me!
Otherwise, I'm tempted to go back to the old method of returning a DataRow from the DAL and just referencing columns by name!
Thanks in advance,
Mark
One of the biggest advantages (in my opinion) of Linq to (Sql / Entities) is that the objects returned are strongly-typed. You're using LinqToX and then using reflection to assign values, you are basically doing what the old school DataRow did.
I'm not sure why you are trying to dynamically assign values. This definitely is an XY Problem.
First:
var relationshipDetails = (from p in dc.tbl_ClientRelationships
where p.ID == relationship_id
select p).FirstOrDefault();
Linq queries are objects that represent the query, keep them separate and distinct from the results of those queries. In this case I'd suggest something like this instead:
var relationshipDetails = dc.tbl_ClientRelationships
    .FirstOrDefault(p => p.ID == relationship_id);
Now, this is going to be very slow:
foreach (string c in _cols)
{
if (relationshipDetails.GetType().GetProperty(c).GetValue(relationshipDetails, null).ToString() != null)
{
setValue(relationshipDetails.GetType().GetProperty(c).GetValue(relationshipDetails, null).ToString(), c);
}
}
You can easily get a reference to the reflection members and cut down on the overhead, maybe something like this: (Might not be 100% syntax correct)
var properties = relationshipDetails.GetType().GetProperties();
foreach (string c in _cols)
{
var currentProperty = properties.Single( p=> p.Name == c );
if (currentProperty.GetValue(relationshipDetails, null) != null)
{
setValue(currentProperty.GetValue(relationshipDetails, null).ToString(), c);
}
}
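The loop above still scans the property array once per column. As a sketch of one further step, the properties can be indexed by name in a dictionary first, and unknown column names or null values skipped instead of throwing. PropertyReader and ReadValues are hypothetical names, and the sketch assumes the type has no indexers or duplicate property names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class PropertyReader
{
    // Reads the named public instance properties of source and returns the
    // non-null values as strings, keyed by property name.
    public static Dictionary<string, string> ReadValues(
        object source, IEnumerable<string> propertyNames)
    {
        // Reflect once and index by name, so each lookup is a dictionary hit.
        Dictionary<string, PropertyInfo> properties = source.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .ToDictionary(p => p.Name);

        var values = new Dictionary<string, string>();
        foreach (var name in propertyNames)
        {
            PropertyInfo property;
            if (!properties.TryGetValue(name, out property))
                continue; // no such property: skip rather than throw

            object value = property.GetValue(source, null);
            if (value != null) // check before ToString() to avoid a null dereference
                values[name] = value.ToString();
        }
        return values;
    }
}
```

Each entry of the returned dictionary could then be passed to setValue(value, name) in a plain loop.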
Finally: why are you doing this? Please detail exactly what you are trying to do, and why referring to the columns in a type-safe, named manner, i.e.:
relationshipDetails.Id = ...
relationshipDetails.SomethingElse = ...
relationshipDetails.AnotherThing = ...
Won't work in your case.

How to handle CopyToDataTable() when no rows?

I have the code:
dt = collListItems.GetDataTable().AsEnumerable()
.Where(a => Convert.ToString(a["Expertise"]).Contains(expertise) && Convert.ToString(a["Office"]) == office)
.CopyToDataTable();
filteredCount = dt.Rows.Count;
How should I best handle the event when there are no rows that match? Currently I get "The source contains no DataRows" but I want to set filteredCount to 0 in that case.
Thanks in advance.
Edit: I know a try..catch works but is there a more elegant way?
You certainly do not want to use a try/catch for this. Try/Catch should be used in truly exceptional circumstances, you do not want to have it drive your control flow. In nearly all situations, there are better methods that are built right into the language/library or require minimal effort to code.
In this case, you want to capture the table beforehand so that you do not invoke the GetDataTable() method more times than necessary, because we're going to need it if the query does not include any results. You could also optionally include ToList() on the query if the query itself is expensive or long-running, so you only need to do that once.
Afterwards, it's a matter of testing if there are any rows in the result. If so, you can safely copy to a datatable. Otherwise, just clone the structure of the original table (will not include the rows), so that in either case, you have a proper table structure and can inspect the row count, bind it to a control, etc., and there are no surprises.
var table = collListItems.GetDataTable();
var rows = table.AsEnumerable().Where(...); // optionally include .ToList();
var dt = rows.Any() ? rows.CopyToDataTable() : table.Clone();
int filteredCount = dt.Rows.Count;
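For a version that can be run outside SharePoint, here is a small sketch of the same Any-or-Clone pattern against a hand-built DataTable. The SafeCopy name and the single Office filter are assumptions for illustration; the query is materialized with ToList() so it is evaluated only once. On .NET Framework this needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SafeCopy
{
    // Returns a table holding the rows whose Office column equals office.
    // When nothing matches, an empty table with the same columns is returned
    // instead of letting CopyToDataTable() throw on an empty sequence.
    public static DataTable FilterByOffice(DataTable table, string office)
    {
        List<DataRow> rows = table.AsEnumerable()
            .Where(r => Convert.ToString(r["Office"]) == office)
            .ToList(); // run the filter once

        return rows.Count > 0 ? rows.CopyToDataTable() : table.Clone();
    }
}
```

Either way the caller gets a table with the expected schema, so reading Rows.Count or binding it to a control needs no special case.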
How about this solution :
DataRow[] filtered_rows = data.Tables[0].Select(filter_string);
if(filtered_rows.Length > 0)
{
filtered_data = filtered_rows.CopyToDataTable();
}
else
{
filtered_data.Clear();
}
data.Tables[0] is the source table and filtered_data is the resultant table.
You can first check whether there are any rows that match:
var table = collListItems.GetDataTable();
var rows = table.AsEnumerable()
    .Where(a => Convert.ToString(a["Expertise"]).Contains(expertise) && Convert.ToString(a["Office"]) == office);
DataTable dt = table.Clone();
if (rows.Any())
    dt = rows.CopyToDataTable();
I think this would be a simpler solution:
var Adj = (from c in View.AdjustmentsDataSource.AsEnumerable()
where c["Adjustment"] != System.DBNull.Value
select c);
if (Adj == null || Adj.Count() == 0)
return;
DataTable dtChanges = Adj.CopyToDataTable();
var results = from myRow in dtL1Users.AsEnumerable()
where (Convert.ToInt32(myRow["No_x0020_of_x0020_L1_x0020_Remin"]) >= Convert.ToInt32(maxL1Escalation) && Convert.ToDateTime(myRow["L2_x0020_Last_x0020_Escalated_x0"]) < DateTime.Now.Date.AddDays(-Convert.ToInt32(reminderinterval)))
select myRow;
foreach (var v in results)
{
collEligible = results.CopyToDataTable<DataRow>();
break;
}
The code below works for me. Please try it:
DataRow[] dataRow = dataTable.Select(query, seq);
if (dataRow != null && dataRow.Length > 0)
{
    // reuse the rows already selected instead of running Select() again
    return dataRow.CopyToDataTable();
}
Reusable generic solution:
Create an extension method:
public static IEnumerable<T> NullIfEmpty<T>(this IEnumerable<T> source) => source.Count() > 0 ? source : null;
Now you can call this extension method like:
var newTable = dataRows.NullIfEmpty()?.CopyToDataTable();
//Here dataRows can be of any IEnumerable collection such as EnumerableRowCollection<DataRow> in case of DataTable
You can use the extension method Any() to check for rows before calling CopyToDataTable(), which avoids the "The source contains no DataRows" exception:
IEnumerable<DataRow> result = <<Some Linq Query>>;
if (result.Any())
{
}
