C# loop optimization

I have this loop, and it runs for a large count, at least 30,000 iterations.
I am looking for some way to improve its performance.
DbRecordDictionary is derived from the DictionaryBase class.
Here is the loop:
ArrayList noEnter = new ArrayList();
DbRecordDictionary oldArray = new DbRecordDictionary();
DbRecordDictionary userArray = new DbRecordDictionary();
DbRecordDictionary result = null;
foreach (string key in keys)
{
    if (noEnter.Contains(key))
    { //may need cast!
        if (count < 1)
            result.Add(key, userArray[key]);
        else if (oldArray.Count == 0)
            break;
        else if (oldArray.Contains(key))
            result.Add(key, userArray[key]);
    }
}

You may want to use a Dictionary/HashSet for oldArray, but beyond that there is not much you can do. The same goes for noEnter, if that is an array.
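A minimal sketch of that change, assuming keys, count, userArray, and result exist as in the question; the Cast<string>() call is only needed because DictionaryBase exposes a non-generic key collection:
// Sketch: HashSet<string> gives O(1) Contains lookups, unlike ArrayList's linear scan.
// Requires .NET 3.5+ and `using System.Linq;` for Cast<string>().
var noEnter = new HashSet<string>();
var oldKeys = new HashSet<string>(oldArray.Keys.Cast<string>());
foreach (string key in keys)
{
    if (!noEnter.Contains(key))
        continue;
    if (count < 1)
        result.Add(key, userArray[key]);
    else if (oldKeys.Count == 0)
        break;
    else if (oldKeys.Contains(key))
        result.Add(key, userArray[key]);
}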

From what I can see, neither the variable count nor oldArray changes during the loop, so you can move those conditions outside the loop and write two different loops.
if (count < 1) {
    foreach (string key in keys) {
        if (noEnter.Contains(key)) {
            result.Add(key, userArray[key]);
        }
    }
} else if (oldArray.Count == 0) {
    // no data
} else {
    foreach (string key in keys) {
        if (noEnter.Contains(key)) {
            if (oldArray.Contains(key)) {
                result.Add(key, userArray[key]);
            }
        }
    }
}
The collections noEnter and oldArray should be dictionaries, otherwise you will be spending a lot of execution time in the Contains calls.

If noEnter has more than about 10 items in it, use a Dictionary (or HashSet) rather than a List/Array for it, since a Dictionary can look up an item without having to examine all the items, whereas a List/Array has to loop over every item.
Otherwise, consider sorting "keys" and "oldArray" and then performing a "merge" on them, as sketched below. Look at the code for a merge sort to see how to do the merge. (This would need careful profiling.)
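A rough sketch of that merge, covering only the key-intersection part (the noEnter and count handling from the question are omitted, and the element types of keys and oldArray are assumed to be strings):
// Walk two sorted lists in lock-step, the way the merge step of a
// merge sort combines two sorted runs, keeping only the common keys.
List<string> sortedKeys = keys.Cast<string>().ToList();
List<string> sortedOld = oldArray.Keys.Cast<string>().ToList();
sortedKeys.Sort(StringComparer.Ordinal);
sortedOld.Sort(StringComparer.Ordinal);
int a = 0, b = 0;
while (a < sortedKeys.Count && b < sortedOld.Count)
{
    int cmp = string.CompareOrdinal(sortedKeys[a], sortedOld[b]);
    if (cmp == 0)
    {
        result.Add(sortedKeys[a], userArray[sortedKeys[a]]); // key present in both
        a++;
        b++;
    }
    else if (cmp < 0)
        a++;
    else
        b++;
}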

Try this:
(from k in keys
 where noEnter.Contains(k) &&
       (count < 1 || oldArray.Contains(k))
 select k)
.ToList()
.ForEach(k => result.Add(k, userArray[k]));

A small optimization could be to use generics:
ArrayList noEnter = new ArrayList(); would become List<string> noEnter = new List<string>();
and DbRecordDictionary would inherit from Dictionary<TKey, TValue> instead of DictionaryBase.
I'm not 100% sure that you would gain performance, but you will be using a more modern C# style.
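A sketch of those declarations; DbRecord is a hypothetical value type, since the question does not show what the dictionary stores:
// Hypothetical placeholder for whatever DbRecordDictionary stores.
public class DbRecord { /* ... */ }
// Generic base class instead of the non-generic DictionaryBase.
public class DbRecordDictionary : Dictionary<string, DbRecord>
{
    // custom members, as in the old DictionaryBase-based version
}
List<string> noEnter = new List<string>();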

Related

Loop through list of strings to find different values

I have a list that is populated with different values, e.g.
{GBP, GBP, GBP, USD}
So far I have this:
List<string> currencyTypes = new List<string>();
for (int i = 0; i < currencyTypes.Count; i++)
{
    if currencyTypes[i] != [i]
        console.writeline("currencies are different");
}
So if the list has all the same entries, the if statement shouldn't fire, e.g. {GBP, GBP, GBP, GBP}.
However, if any of the values are different from the rest, then the if statement should notice the difference and fire.
This doesn't work, however. Any ideas?
You could use LINQ to test whether all entries are the same
if (currencyTypes.Distinct().Count() > 1) {
    Console.WriteLine("currencies are different");
}
Slightly more efficient for long lists:
if (currencyTypes.Count > 1 && currencyTypes.Distinct().Skip(1).Any()) {
    Console.WriteLine("currencies are different");
}
This is more efficient because Any stops after at most one element, unlike Count, which iterates the whole list.
First of all, your list is empty. Maybe that's just for the sake of the example; if not, initialize it with data. Then change the for loop header and the if condition as follows to fix the problem:
for (int i = 1; i < currencyTypes.Count; i++)
{
    if (currencyTypes[i] != currencyTypes[i-1])
        ....
}
You should first group your data and then derive your result from the groups, e.g.:
List<string> currencyTypes = new List<string>() { "USD", "GBP", "GBP", "GBP" };
// group list items
var typeGroup = currencyTypes.GroupBy(t => t);
if (typeGroup.Count() > 1)
    Console.WriteLine("currencies are different");
// .
// .
// you can also check which item is unique
foreach (var t in typeGroup.Where(g => g.Count() == 1))
{
    Console.WriteLine($"{t.Single()} is different");
}

Appropriate data structure for key.Contains(x) Map/Dictionary

I am somewhat struggling with the terminology and complexity of my explanations here, so feel free to edit this.
I have 1,000 - 20,000 objects. Each one can contain several name words (first, second, middle, last, title...), normalized numbers (home, business...), email addresses, or even physical addresses and spouse names.
I want to implement a search that enables users to freely combine word parts and number parts. When I search for "LL 676" I want to find all objects that contain any string with "LL" AND "676".
Currently I am iterating over every object and every object property, splitting the search string on " ", and doing a stringInstance.Contains(searchWord) for each search word.
This is too slow, so I am looking for a better solution.
What is the appropriate language-agnostic data structure for this?
In my case I need it for C#.
Is the following data structure a good solution?
It's based on a HashMap/Dictionary.
First I create a string that contains all the name parts and phone numbers I want to look through; one example would be: "William Bill Henry Gates III 3. +436760000 billgatesstreet 12".
Then I split on " " and, for every word x, create all possible substrings y that fulfill x.Contains(y). I put each of those substrings into the hashmap/dictionary.
On lookup/search I just need to run the search for every search word and then join the results. Naturally, the lookup speed is blazingly fast (native HashMap/Dictionary speed).
EDIT: Inserts are very fast as well (insignificant time) now that I use a smarter algorithm to get the substrings.
It's possible I've misunderstood your algorithm or requirement, but this seems like it could be a potential performance improvement:
foreach (string arg in searchWords)
{
    if (String.IsNullOrEmpty(arg))
        continue;
    tempList = new List<T>();
    if (dictionary.ContainsKey(arg))
        foreach (T obj in dictionary[arg])
            if (list.Contains(obj))
                tempList.Add(obj);
    list = new List<T>(tempList);
}
The idea is that you do the first search word separately before this, and only put all the subsequent words into the searchWords list.
That should allow you to remove your final foreach loop entirely. Results only stay in your list as long as they keep matching every search word, rather than initially piling in everything that matches a single word and then filtering the non-matches back out at the end.
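For comparison, here is a self-contained sketch of the same narrowing idea using HashSet<T>.IntersectWith; the index parameter (mapping each word to its matching objects) stands in for the dictionary in the question, and all names are illustrative:
// Sketch: seed the result with the matches for the first word, then
// intersect with the matches for each subsequent word. Objects drop
// out as soon as one word fails to match, so no final filter pass is needed.
static List<T> Search<T>(Dictionary<string, List<T>> index, string[] searchWords)
{
    HashSet<T> results = null;
    foreach (string word in searchWords)
    {
        if (string.IsNullOrEmpty(word))
            continue;
        List<T> matches;
        if (!index.TryGetValue(word, out matches))
            return new List<T>();              // one word matched nothing
        if (results == null)
            results = new HashSet<T>(matches); // first word seeds the set
        else
            results.IntersectWith(matches);    // later words narrow it
        if (results.Count == 0)
            break;
    }
    return results == null ? new List<T>() : new List<T>(results);
}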
In case anyone cares, here is my solution:
Disclaimer:
This is only a rough draft. I have only done some synthetic testing, and I have written a lot of it without testing it again. I have revised my code: inserts are now ((n^2)/2)+(n/2) instead of 2^n-1, which is vastly faster (for a word of length n there are only n(n+1)/2 distinct start/length substring pairs, rather than an exponential blow-up). Word length is now irrelevant.
namespace MegaHash
{
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public class GenericConcurrentMegaHash<T>
    {
        // After doing a bulk add, call AwaitAll() to ensure all data was added!
        private ConcurrentBag<Task> bag = new ConcurrentBag<Task>();

        private ConcurrentDictionary<string, List<T>> dictionary = new ConcurrentDictionary<string, List<T>>();

        // consider changing this to include, for example, '-'
        public char[] splitChars;

        public GenericConcurrentMegaHash()
            : this(new char[] { ' ' })
        {
        }

        public GenericConcurrentMegaHash(char[] splitChars)
        {
            this.splitChars = splitChars;
        }

        public void Add(string keyWords, T o)
        {
            keyWords = keyWords.ToUpper();
            foreach (string keyWord in keyWords.Split(splitChars))
            {
                if (keyWord == null || keyWord.Length < 1)
                    continue; // skip empty tokens rather than aborting the whole add
                this.bag.Add(Task.Factory.StartNew(() => { AddInternal(keyWord, o); }));
            }
        }

        public void AwaitAll()
        {
            lock (this.bag)
            {
                foreach (Task t in bag)
                    t.Wait();
                this.bag = new ConcurrentBag<Task>();
            }
        }

        private void AddInternal(string key, T y)
        {
            for (int i = 0; i < key.Length; i++)
            {
                for (int i2 = 0; i2 < i + 1; i2++)
                {
                    string desire = key.Substring(i2, key.Length - i);
                    // GetOrAdd avoids the race between a ContainsKey check and the
                    // insert when several Add tasks hit the same substring at once.
                    List<T> l = dictionary.GetOrAdd(desire, k => new List<T>());
                    lock (l)
                    {
                        if (!l.Contains(y))
                            l.Add(y);
                    }
                }
            }
        }

        public IList<T> FulltextSearch(string searchString)
        {
            searchString = searchString.ToUpper();
            List<T> list = new List<T>();
            string[] searchWords = searchString.Split(splitChars);

            // collect every object that matches at least one search word
            foreach (string arg in searchWords)
            {
                if (arg == null || arg.Length < 1)
                    continue;
                if (dictionary.ContainsKey(arg))
                    foreach (T obj in dictionary[arg])
                        if (!list.Contains(obj))
                            list.Add(obj);
            }

            // keep only the candidates that match every non-empty search word
            List<T> returnList = new List<T>();
            foreach (T o in list)
            {
                foreach (string arg in searchWords)
                {
                    if (arg == null || arg.Length < 1)
                        continue;
                    if (!dictionary.ContainsKey(arg) || !dictionary[arg].Contains(o))
                        goto BREAK;
                }
                returnList.Add(o);
            BREAK:
                continue;
            }
            return returnList;
        }
    }
}

How can I filter a list of strings, creating a list containing only those meeting my criteria?

In my application I want the program to search through a list, testing each list element. If the list element is the required length, I then want it to be inserted into a new list. Below is the code I have already:
List<string> foo = new List<string>();
List<string> newFoo = new List<string>();
for (int h = 0; h < l; h++);
{
    // Here I want to search through every element of foo and if the element
    // length is greater than say 5 i want to add it to the newFoo
}
I don't know how to search through each element, and any examples I can find use LINQ, which I don't want to use, as I'm sure there is a simpler way. Any help much appreciated.
It sounds like you're looking for a foreach loop:
foreach (string element in foo)
{
    if (element.Length > 5)
    {
        newFoo.Add(element);
    }
}
However, assuming you start with an empty newFoo list, this is better done with LINQ:
List<string> newFoo = foo.Where(x => x.Length > 5).ToList();
Or if you already have an existing list, you can use:
newFoo.AddRange(foo.Where(x => x.Length > 5));
(In my experience it's more common to be creating a new list, mind you.)
If you're new to C#, you should probably make sure you understand the first form before you move on to use LINQ, lambda expressions etc.
Note that if you really, really want to use a straight for loop instead of a foreach loop, you can do so:
for (int i = 0; i < foo.Count; i++)
{
    string element = foo[i];
    if (element.Length > 5)
    {
        newFoo.Add(element);
    }
}
... but I'd strongly recommend using foreach any time you want to iterate over a sequence and don't really care about the index of each entry.
You may use something like this (foreach loop):
foreach (String item in foo)
    if (!Object.ReferenceEquals(null, item)) // <- be careful with nulls!
        if (item.Length > 5)
            newFoo.Add(item);
Or if you prefer index-based access:
for (int i = 0; i < foo.Count; ++i)
    if (!Object.ReferenceEquals(null, foo[i])) // <- be careful with nulls!
        if (foo[i].Length > 5)
            newFoo.Add(foo[i]);
Yet another possibility is LINQ, e.g.
// Do not forget the nulls...
newFoo.AddRange(foo.Where(item => Object.ReferenceEquals(null, item) ? false : item.Length > 5));
Without Linq, you can do it with a simple loop
foreach (var f in foo)
{
    if (f.Length > 5)
    {
        newFoo.Add(f);
    }
}
But with Linq, it's even simpler
newFoo = foo.Where(f => f.Length > 5).ToList();
You can use LINQ to filter the items with Length > 5 into your newFoo list:
List<string> newFoo = foo.Where(r => r.Length > 5).ToList();
If you want to use a simple for loop then:
for (int h = 0; h < foo.Count; h++)
{
    if (foo[h] != null && foo[h].Length > 5)
        newFoo.Add(foo[h]);
}
(Remember to remove the ; semicolon at the end of your for loop; as written it does nothing, because the ; is treated as the loop's only statement.)

Collection was modified; enumeration operation may not execute in ArrayList [duplicate]

This question already has answers here:
How to remove elements from a generic list while iterating over it?
(28 answers)
Closed 9 years ago.
I'm trying to remove an item from an ArrayList and I get this Exception:
Collection was modified; enumeration operation may not execute.
Any ideas?
You are removing the item during a foreach, yes? Simply, you can't. There are a few common options here:
use List<T> and RemoveAll with a predicate
iterate backwards by index, removing matching items
for (int i = list.Count - 1; i >= 0; i--) {
    if ({some test}) list.RemoveAt(i);
}
use foreach, and put matching items into a second list; now enumerate the second list and remove those items from the first (if you see what I mean)
Here's an example (sorry for any typos)
var itemsToRemove = new ArrayList(); // should use generic List if you can
foreach (var item in originalArrayList) {
    if (...) {
        itemsToRemove.Add(item);
    }
}
foreach (var item in itemsToRemove) {
    originalArrayList.Remove(item);
}
OR if you're using 3.5, Linq makes the first bit easier:
itemsToRemove = originalArrayList
    .Where(item => ...)
    .ToArray();
foreach (var item in itemsToRemove) {
    originalArrayList.Remove(item);
}
Replace "..." with your condition that determines if item should be removed.
One way is to add the item(s) to be deleted to a new list. Then go through and delete those items.
I like to iterate backward using a for loop, but this can get tedious compared to foreach. One solution I like is to create an enumerator that traverses the list backward. You can implement this as an extension method on ArrayList or List<T>. The implementation for ArrayList is below.
public static IEnumerable GetRemoveSafeEnumerator(this ArrayList list)
{
    for (int i = list.Count - 1; i >= 0; i--)
    {
        // Reset the value of i if it is invalid.
        // This occurs when more than one item
        // is removed from the list during the enumeration.
        if (i >= list.Count)
        {
            if (list.Count == 0)
                yield break;
            i = list.Count - 1;
        }
        yield return list[i];
    }
}
The implementation for List<T> is similar.
public static IEnumerable<T> GetRemoveSafeEnumerator<T>(this List<T> list)
{
    for (int i = list.Count - 1; i >= 0; i--)
    {
        // Reset the value of i if it is invalid.
        // This occurs when more than one item
        // is removed from the list during the enumeration.
        if (i >= list.Count)
        {
            if (list.Count == 0)
                yield break;
            i = list.Count - 1;
        }
        yield return list[i];
    }
}
The example below uses the enumerator to remove all even integers from an ArrayList.
ArrayList list = new ArrayList() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
foreach (int item in list.GetRemoveSafeEnumerator())
{
    if (item % 2 == 0)
        list.Remove(item);
}
Don't modify the list inside of a loop which iterates through the list.
Instead, use a for() or while() with an index, going backwards through the list. (This will let you delete things without getting an invalid index.)
var foo = new List<Bar>();
for (int i = foo.Count - 1; i >= 0; --i)
{
    var item = foo[i];
    // do something with item
}
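For instance, the loop body might remove items like this (ShouldRemove is a hypothetical predicate standing in for your own condition):
for (int i = foo.Count - 1; i >= 0; --i)
{
    var item = foo[i];
    if (ShouldRemove(item)) // hypothetical predicate
        foo.RemoveAt(i);    // safe: indices below i are unaffected, and the shifted ones were already visited
}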
Am I missing something? Somebody correct me if I'm wrong.
list.RemoveAll(s => s.Name == "Fred");
Instead of foreach(), use a for() loop with a numeric index.
I agree with several of the points I've read in this post and I've incorporated them into my solution to solve the exact same issue as the original posting.
That said, the comments I appreciated are:
"unless you are using .NET 1.0 or 1.1, use List<T> instead of ArrayList. "
"Also, add the item(s) to be deleted to a new list. Then go through and delete those items."
In my case I just created a new List and then populated it with the valid data values.
e.g.
private List<string> managedLocationIDList = new List<string>();
string managedLocationIDs = ";1321;1235;;"; // user input, should be a semicolon-separated list of values
managedLocationIDList.AddRange(managedLocationIDs.Split(new char[] { ';' }));
List<string> checkLocationIDs = new List<string>();
// Remove any duplicate IDs and clean up the string holding the list of IDs
Functions helper = new Functions();
checkLocationIDs = helper.ParseList(managedLocationIDList);
...
public List<string> ParseList(List<string> checkList)
{
    List<string> verifiedList = new List<string>();
    foreach (string listItem in checkList)
        if (!verifiedList.Contains(listItem.Trim()) && listItem != string.Empty)
            verifiedList.Add(listItem.Trim());
    verifiedList.Sort();
    return verifiedList;
}
Using ArrayList, you can also try it like this:
ArrayList arraylist = ...; // my object data list
ArrayList temp = (ArrayList)arraylist.Clone();
foreach (var item in temp)
{
    if (...)
        arraylist.Remove(item);
}

Compare adjacent list items

I'm writing a duplicate file detector. To determine if two files are duplicates I calculate a CRC32 checksum. Since this can be an expensive operation, I only want to calculate checksums for files that have another file with matching size. I have sorted my list of files by size, and am looping through to compare each element to the ones above and below it. Unfortunately, there is an issue at the beginning and end since there will be no previous or next file, respectively. I can fix this using if statements, but it feels clunky. Here is my code:
public void GetCRCs(List<DupInfo> dupInfos)
{
    var crc = new Crc32();
    for (int i = 0; i < dupInfos.Count(); i++)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size || dupInfos[i].Size == dupInfos[i + 1].Size)
        {
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
My question is:
How can I compare each entry to its neighbors without the out of bounds error?
Should I be using a loop for this, or is there a better LINQ or other function?
Note: I did not include the rest of my code to avoid clutter. If you want to see it, I can include it.
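For reference, Crc32 is not a built-in .NET class. A minimal table-based sketch compatible with the crc.ComputeChecksum(byte[]) calls in this question might look like this (assuming the standard reflected IEEE polynomial that most such helper classes use):
public class Crc32
{
    private static readonly uint[] Table = BuildTable();
    private static uint[] BuildTable()
    {
        // Precompute the 256-entry lookup table for the reflected polynomial 0xEDB88320.
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }
    public uint ComputeChecksum(byte[] bytes)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in bytes)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }
}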
Compute the CRCs first:
// It is assumed that DupInfo.CheckSum is nullable
public void GetCRCs(List<DupInfo> dupInfos)
{
    var crc = new Crc32();
    dupInfos[0].CheckSum = null;
    for (int i = 1; i < dupInfos.Count(); i++)
    {
        dupInfos[i].CheckSum = null;
        if (dupInfos[i].Size == dupInfos[i - 1].Size)
        {
            if (dupInfos[i - 1].CheckSum == null)
                dupInfos[i - 1].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i - 1].FullName));
            dupInfos[i].CheckSum = crc.ComputeChecksum(File.ReadAllBytes(dupInfos[i].FullName));
        }
    }
}
After having sorted your files by size and CRC, identify the duplicates:
public void GetDuplicates(List<DupInfo> dupInfos)
{
    // loop is inverted to allow list item deletion
    for (int i = dupInfos.Count() - 1; i > 0; i--)
    {
        if (dupInfos[i].Size == dupInfos[i - 1].Size &&
            dupInfos[i].CheckSum != null &&
            dupInfos[i].CheckSum == dupInfos[i - 1].CheckSum)
        {
            // i is a duplicate of i-1
            ... // your code here
            ... // optionally, dupInfos.RemoveAt(i);
        }
    }
}
I have sorted my list of files by size, and am looping through to
compare each element to the ones above and below it.
The next logical step is to actually group your files by size. Comparing consecutive files will not always be sufficient if you have more than two files of the same size. Instead, you will need to compare every file to every other same-sized file.
I suggest taking this approach
Use LINQ's .GroupBy to create groups of files by size, then .Where to keep only the groups with more than one file.
Within those groups, calculate the CRC32 checksum and add it to a collection of known checksums, comparing each new checksum with the previously calculated ones. If you need to know which files specifically are duplicates, you could use a dictionary keyed by this checksum (achievable with another GroupBy); otherwise a simple list will suffice to detect any duplicates.
The code might look something like this:
var filesSetsWithPossibleDupes = files.GroupBy(f => f.Length)
                                      .Where(group => group.Count() > 1);
foreach (var grp in filesSetsWithPossibleDupes)
{
    var checksums = new List<CRC32CheckSum>(); //or whatever type
    foreach (var file in grp)
    {
        var currentCheckSum = crc.ComputeChecksum(file);
        if (checksums.Contains(currentCheckSum))
        {
            //Found a duplicate
        }
        else
        {
            checksums.Add(currentCheckSum);
        }
    }
}
Or if you need the specific objects that could be duplicates, the inner foreach loop might look like
var filesSetsWithPossibleDupes = files.GroupBy(f => f.FileSize)
                                      .Where(grp => grp.Count() > 1);
var masterDuplicateDict = new Dictionary<DupStats, IEnumerable<DupInfo>>();
//A dictionary keyed by the basic duplicate stats,
//whose value is a collection of the possible duplicates
foreach (var grp in filesSetsWithPossibleDupes)
{
    var likelyDuplicates = grp.GroupBy(dup => dup.Checksum)
                              .Where(g => g.Count() > 1);
    //Same GroupBy logic, but applied to the checksum (instead of file size)
    foreach (var dupGrp in likelyDuplicates)
    {
        //Create the key for the dictionary (your code is likely different)
        var sample = dupGrp.First();
        var key = new DupStats() { FileSize = sample.FileSize, Checksum = sample.Checksum };
        masterDuplicateDict.Add(key, dupGrp);
    }
}
A demo of this idea.
I think the for loop should be: for (int i = 1; i < dupInfos.Count() - 1; i++)
var grps = dupInfos.GroupBy(d => d.Size);
grps.Where(g => g.Count() > 1).ToList().ForEach(g =>
{
    ...
});
Can you do a union between your two lists? If you have a list of file names, doing a union should result in a list of only the overlapping files. I can write out an example if you want, but this link should give you the general idea:
https://stackoverflow.com/a/13505715/1856992
Edit: Sorry, for some reason I thought you were comparing file names, not sizes.
So here is an actual answer for you.
using System;
using System.Collections.Generic;
using System.Linq;

public class ObjectWithSize
{
    public int Size { get; set; }
    public ObjectWithSize(int size)
    {
        Size = size;
    }
}

public class Program
{
    public static void Main()
    {
        Console.WriteLine("start");
        var list = new List<ObjectWithSize>();
        list.Add(new ObjectWithSize(12));
        list.Add(new ObjectWithSize(13));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(14));
        list.Add(new ObjectWithSize(18));
        list.Add(new ObjectWithSize(15));
        list.Add(new ObjectWithSize(15));
        var duplicates = list.GroupBy(x => x.Size)
                             .Where(g => g.Count() > 1);
        foreach (var dup in duplicates)
            foreach (var objWithSize in dup)
                Console.WriteLine(objWithSize.Size);
    }
}
This will print out
14
14
15
15
Here is a .NET Fiddle for that:
https://dotnetfiddle.net/0ub6Bs
Final note: I actually think your answer looks better and will run faster. This was just an implementation in LINQ.
