find and findindex painful slow for List<Object> why?

find and findindex painful slow for List<Object> why? - c#

Im am working in a project that uses intensively List and i try to find the object via the name (that is a member of the object).
My code worked without searching it using a single for-next loop (function find1) but i found that it is possible to the same using the build-in found find, and the code works. However, it feel a bit slow. So, i did a project for test the speed:
I have the next code
public List<MyObject> varbig = new List<MyObject>();
public Dictionary<string,string> myDictionary=new Dictionary<string, string>();
public Form1() {
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e) {
myDictionary.Clear();
varbig.Clear();
for (int i = 0; i < 5000; i++) {
varbig.Add(new MyObject("name" + i.ToString(),"value"+i.ToString()));
myDictionary.Add("name" + i.ToString(), i.ToString());
}
// first test
var start1 = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss=find1("name499");
}
var end1 = Environment.TickCount;
Console.WriteLine("time 1 :" + (end1 - start1));
// second test
var start2 = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss=find2("name499");
}
var end2 = Environment.TickCount;
Console.WriteLine("time 2 :" + (end2 - start2));
// third test
var start3 = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss = find3("name499");
}
var end3 = Environment.TickCount;
Console.WriteLine("time 3 :" + (end3 - start3));
// first test b
var start1b = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss=find1("name4999");
}
var end1b = Environment.TickCount;
Console.WriteLine("timeb 1 :" + (end1b - start1b));
// second test
var start2b = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss=find2("name4999");
}
var end2b = Environment.TickCount;
Console.WriteLine("timeb 2 :" + (end2b - start2b));
// third test
var start3b = Environment.TickCount;
for (int i = 0; i < 3000; i++) {
var ss = find3("name4999");
}
var end3b = Environment.TickCount;
Console.WriteLine("timeb 3 :" + (end3b - start3b));
}
public int find1(string name) {
for (int i = 0; i < varbig.Count; i++) {
if(varbig[i].Name == name) {
return i;
}
}
return -1;
}
public int find2(string name) {
int idx = varbig.FindIndex(tmpvar => Name == name);
return idx;
}
public int find3(string name) {
var ss=myDictionary[name];
return int.Parse(ss);
}
}
And i use the next object
public class MyObject {
private string _name = "";
private string _value = "";
public MyObject() {}
public MyObject(string name, string value) {
_name = name;
_value = value;
}
public string Name {
get { return _name; }
set { _name = value; }
}
public string Value {
get { return _value; }
set { _value = value; }
}
}
Mostly it do the next thing:
I create an array with 5000 elements.
time 1 = search the 499th object (index) using a simple for-next.
time 2 = search the 499th using the build in function find of List
time 3 = it do the search of the 499th element using dictionary.
Timeb 1, timeb 2 and timeb 3 do the same but try to search the 4999th element instead of the 499th element.
I ran a couple of times :
time 1 :141
time 2 :1248
time 3 :0
timeb 1 :811
timeb 2 :1170
timeb 3 :0
time 1 :109
time 2 :1170
time 3 :0
timeb 1 :796
timeb 2 :1170
timeb 3 :0
(the small then the fast)
And, for my surprise, the build in function findindex is absurdly slow (in some cases, close to 10x slower. Also, the dictionary approach is almost instantly.
My question is, why?. is it because the predicate?.

The problem is in this line:
int idx = varbig.FindIndex(tmpvar => Name == name);
Name == name is wrong, you should write tmpvar.Name == name instead.
In your code you're comparing name argument with the Name property of your form; they are obviously different, and so the method always examines the whole list instead of stopping when the searched value is found. In fact, as you can see looking the numbers, the time spent by find2() is basically always equal.
About the dictionary, it's obviously faster than the other methods because dictionaries are memory structure specifically built to provide fast keyed access.
In fact they arrive close to O(1) time complexity, while looping a list you have a time complexity equal to O(n).

Find1 is using a simple for( i = 0 to count) method
Find2 uses the built in Find method (which is exactly find1 above), except that you have passed a predicate along with it, which I believe is slowing it down.
Find3 using a dictionary, I would assume is the fastest without any timers, becuase a dictionary uses hashtables under the covers which has an 0(1) look up (contant time)

There is the error in your code - the find2 method uses the Form.Name for the comparison instead of your collection objects names. It should looks like this:
public int find2(string name) {
return varbig.FindIndex((obj) => obj.Name == name);
}
The results without using the Form.Name are more consistent:
time 1 :54
time 2 :50
time 3 :0
timeb 1 :438
timeb 2 :506
timeb 3 :0

You don't need to put for loop to search in find2...
Just call find2 directly, then result will be 0.

Related

Checking if instance is null VS calling empty function (C#)

Introduction
So I was making a game, thinking how do I structure and update all my game objects. Do I (case 1) create a simple GameObj as a parent class and put some physics in virtual Update method, some default drawing in virtual Draw, etc, and make every other object (wall, enemy, player...) be the child, OR do I (case 2) use components as described in this article. In short, the writer explains that we could make interfaces for user input, physics update and draw (lets stop at those 3) and describe our GameObj with preprogrammed instances of these interfaces.
Now, in both cases I will get a loop of GameObj class.
In case 1 it would probably look something like this
// in Update function of the level class
for(int i = 0; i < gameObjList.Count; i++)
{
gameObjList[i].Update();
}
And in case 2, something like this
// in UpdatePhysics function of the level class
for(int i = 0; i < gameObjList.Count; i++)
{
gameObjList[i].PhysicComponent.Update();
}
And so on (in case 2) for other interfaces such as InputComponent.Update and DrawComponent.Draw (or CollisionComponent.Check(gameObj[x]), I dunno).
Reasons listed are ment to be inside a level class that takes care of all of our game objects
Reasons to consider if ( x != null )
In both cases we (could) have a situation where we need to call if ( x != null ). In case 1 we maybe don't want to delete and add to the gameObjList all the time, but recycle the instances, so we set them to null without doing something along the lines of gameObjList.Remove(x). In case 2 maybe we want to be able not to set some of the components, so we'd have to ask if (gameObjList[i].someComponent != null) to be able to call gameObjList[i].someComponent.Update().
Reasons to consider calling empty function
Also in both cases, we could just call an empty function (e.g. public void myFunction(){}). Lets consider the self explanatory Wall class. It exists just to be there. Id doesn't update, but it does have a certain relation to other GameObjs. Also, some of it's children in case 1, like a lets say MovingWall or Platform would have some sort of update. As for case 2, we could always declare a default, empty class of someComponent whose Update function would be empty, and so an instance of this class would be set to our GameObj component if none is set in the constructor. Maybe something like this
public GameObj(IPhysicsComponent physicsComponent, ...){
if(physicsComponent == null)
physicsComponent = PhysicsComponent.Default;
this.physicsComponent = physicsComponent;
}
Research
Now, I didn't find what would be the most efficient thing to do in a game engine we are building here. Here are some examples I just tested (note some of them are just for reference):
1. empty loop
2. empty function
3. if(x != null) x.empyFunction(); x is always null
4. x?.emptyFunction(); x is always null
5. if(x != null) x.empyFunction(); x is not null
6. x?.emptyFunction(); x is not null
7. myClass.staticEmptyFunction();
These 7 points are tested 100 000 times, 10 000 times. The code below is the code that I tested with. You can run in locally, change some of the static variables, and the result will appear in "result.txt" in the folder where you ran the program. Here is the code :
public enum TimeType
{
emptyLoop = 1,
loopEmptyFunction = 2,
loopNullCheck = 3,
loopNullCheckShort = 4,
loopNullCheckInstanceNotNull = 5,
loopNullCheckInstanceNotNullShort = 6,
loopEmptyStaticFunction = 7
}
class myTime
{
public double miliseconds { get; set; }
public long ticks { get; set; }
public TimeType type { get; set; }
public myTime() { }
public myTime(Stopwatch stopwatch, TimeType type)
{
miliseconds = stopwatch.Elapsed.TotalMilliseconds;
ticks = stopwatch.ElapsedTicks;
this.type = type;
}
}
class myClass
{
public static void staticEmptyFunction() { }
public void emptyFunction() { }
}
class Program
{
static List<myTime> timesList = new List<myTime>();
static int testTimesCount = 10000;
static int oneTestDuration = 100000;
static void RunTest()
{
Stopwatch stopwatch = new Stopwatch();
Console.Write("TEST ");
for (int j = 0; j < testTimesCount; j++)
{
Console.Write("{0}, ", j + 1);
myClass myInstance = null;
// 1. EMPTY LOOP
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)1));
stopwatch.Reset();
// 3. LOOP WITH NULL CHECKING (INSTANCE IS NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
if (myInstance != null)
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)3));
stopwatch.Reset();
// 4. LOOP WITH SHORT NULL CHECKING (INSTANCE IS NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance?.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)4));
stopwatch.Reset();
myInstance = new myClass();
// 2. LOOP WITH EMPTY FUNCTION
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)2));
stopwatch.Reset();
// 5. LOOP WITH NULL CHECKING (INSTANCE IS NOT NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
if (myInstance != null)
myInstance.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)5));
stopwatch.Reset();
// 6. LOOP WITH SHORT NULL CHECKING (INSTANCE IS NOT NULL)
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myInstance?.emptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)6));
stopwatch.Reset();
// 7. LOOP WITH STATIC FUNCTION
stopwatch.Start();
for (int i = 0; i < oneTestDuration; i++)
{
myClass.staticEmptyFunction();
}
stopwatch.Stop();
timesList.Add(new myTime(stopwatch, (TimeType)7));
stopwatch.Reset();
}
Console.WriteLine("\nDONE TESTING");
}
static void GetResults()
{
// SUMS
double sum1t, sum2t, sum3t, sum4t, sum5t, sum6t, sum7t,
sum1m, sum2m, sum3m, sum4m, sum5m, sum6m, sum7m;
sum1t = sum2t = sum3t = sum4t = sum5t = sum6t = sum7t =
sum1m = sum2m = sum3m = sum4m = sum5m = sum6m = sum7m = 0;
foreach (myTime time in timesList)
{
switch (time.type)
{
case (TimeType)1: sum1t += time.ticks; sum1m += time.miliseconds; break;
case (TimeType)2: sum2t += time.ticks; sum2m += time.miliseconds; break;
case (TimeType)3: sum3t += time.ticks; sum3m += time.miliseconds; break;
case (TimeType)4: sum4t += time.ticks; sum4m += time.miliseconds; break;
case (TimeType)5: sum5t += time.ticks; sum5m += time.miliseconds; break;
case (TimeType)6: sum6t += time.ticks; sum6m += time.miliseconds; break;
case (TimeType)7: sum7t += time.ticks; sum7m += time.miliseconds; break;
}
}
// AVERAGES
double avg1t, avg2t, avg3t, avg4t, avg5t, avg6t, avg7t,
avg1m, avg2m, avg3m, avg4m, avg5m, avg6m, avg7m;
avg1t = sum1t / (double)testTimesCount;
avg2t = sum2t / (double)testTimesCount;
avg3t = sum3t / (double)testTimesCount;
avg4t = sum4t / (double)testTimesCount;
avg5t = sum5t / (double)testTimesCount;
avg6t = sum6t / (double)testTimesCount;
avg7t = sum7t / (double)testTimesCount;
avg1m = sum1m / (double)testTimesCount;
avg2m = sum2m / (double)testTimesCount;
avg3m = sum3m / (double)testTimesCount;
avg4m = sum4m / (double)testTimesCount;
avg5m = sum5m / (double)testTimesCount;
avg6m = sum6m / (double)testTimesCount;
avg7m = sum7m / (double)testTimesCount;
string fileName = "/result.txt";
using (StreamWriter tr = new StreamWriter(AppDomain.CurrentDomain.BaseDirectory + fileName))
{
tr.WriteLine(((TimeType)1).ToString() + "\t" + avg1t + "\t" + avg1m);
tr.WriteLine(((TimeType)2).ToString() + "\t" + avg2t + "\t" + avg2m);
tr.WriteLine(((TimeType)3).ToString() + "\t" + avg3t + "\t" + avg3m);
tr.WriteLine(((TimeType)4).ToString() + "\t" + avg4t + "\t" + avg4m);
tr.WriteLine(((TimeType)5).ToString() + "\t" + avg5t + "\t" + avg5m);
tr.WriteLine(((TimeType)6).ToString() + "\t" + avg6t + "\t" + avg6m);
tr.WriteLine(((TimeType)7).ToString() + "\t" + avg7t + "\t" + avg7m);
}
}
static void Main(string[] args)
{
RunTest();
GetResults();
Console.ReadLine();
}
}
When I put all the data in excel and made a chart, it looked like this (DEBUG):
EDIT - RELEASE version. I guess this answers my question.
The questions are
Q1. What approach to use to be more efficient?
Q2. In what case?
Q3. Is there official documentation on this?
Q4. Did anybody else test this, maybe more intensively?
Q5. Is there a better way to test this (is my code at fault)?
Q6. Is there a better way around the problem of huge lists of instances that need to be quickly and efficiently updated, as in - every frame?
EDIT Q7. Why does the static method take so much longer to execute in the release version?

As #grek40 suggested, I did another test where I called myClass.staticEmptyFunction(); 100 times before starting the test so that it can be cashed. I also did set testTimesCount to 10 000 and oneTestDuration to 1 000 000. Here are the results:
Now, it seems much more stable. Even the little differences you can spot I blame on my google chrome, excel and deluge running in the background. The questions I asked I asked because I thought that there would be a greater difference, but I guess that the optimisation worsk much much better than I expected. I also guess that nobody did this test because they probably knew that there's C behind it, and that people did amasing work on the optimisation.

Permutation on array of type "Location", from GoogleMapsAPI NuGet Package

this is my very first post here on StackOverflow so please tell me if I did anything wrong, also english is not my native language, forgive me if there is any gramatical errors.
My question is how can I permutate the items of an array of type "Location", I need to get all possible permutations of waypoints given by the user to then calculate the best route based on time or distance. (I don't want to use the normal route calculation)
I've searched for algorithms but all of them when I put the array of type "Location[]" in the function's parameter I get the error that the object needs to be IEnumerable and I don't know how to convert to that if is even possible, I never worked with IEnumerable.
If it is of any help this is my code for calculating the route:
//Gets the waypoints from a listBox provided by the user, "mode" selects between best time and best distance
//backgroundworker so the UI dont freezes, and return the optimal waypoint order
public Location[] CalcularRota(Location[] waypoints, int mode, BackgroundWorker work, DoWorkEventArgs e)
{
//Declarations
string origem = "";
string destino = "";
Rota[] prop = new Rota[100]; //this index is the number of times the algorithm will be executed, more equals accuracy but much more time to complete
Rota bestDist = new Rota();
Rota bestTime = new Rota();
DirectionService serv = new DirectionService();
DirectionRequest reqs = new DirectionRequest();
DirectionResponse resp;
Random rnd = new Random();
Location[] rndWays;
int dist = 0;
int ti = 0;
bestDist.Distance = 1000000000; //put higher values for the first comparation to be true (end of code)
bestTime.Time = 1000000000;
if (waypoints != null)
{
reqs.Sensor = false;
reqs.Mode = TravelMode.driving;
for (int i = 0; i < prop.Length; i++) //initializes prop
prop[i] = new Rota();
for (int i = 0; i < prop.Length; i++)
{
rndWays = waypoints.OrderBy(x => rnd.Next()).ToArray(); //randomizes the order, I want to get all permutations and then test
//but I dont know how so I've been using randomized
dist = ti = 0;
origem = prop[0].ToString(); //save this particular waypoint's origin and destination
destino = prop[1].ToString();
reqs.Origin = origem;
reqs.Destination = destino;
if (waypoints.Length > 0)
reqs.Waypoints = rndWays;
resp = serv.GetResponse(reqs); //request the route with X order of waypoints to google
if (resp.Status == ServiceResponseStatus.Ok) //wait the response otherwise the program crashes
{
for (int j = 0; j < resp.Routes[0].Legs.Length; j++) //gets the distance and time of this particular order
{
ti += int.Parse(resp.Routes[0].Legs[j].Duration.Value);
dist += int.Parse(resp.Routes[0].Legs[j].Distance.Value);
}
}
prop[i].Origem = origem; //saves this waypoints order details for further comparison
prop[i].Destino = destino;
prop[i].Distance = dist;
prop[i].Time = ti;
prop[i].Order = rndWays;
work.ReportProgress(i); //report the progress
}
for (int i = 0; i < prop.Length; i++) //gets the best distance and time
{
if (bestDist.Distance > prop[i].Distance)
{
bestDist.Distance = prop[i].Distance;
bestDist.Time = prop[i].Time;
bestDist.Order = prop[i].Order;
bestDist.Origem = prop[i].Origem;
bestDist.Destino = prop[i].Destino;
}
if (bestTime.Time > prop[i].Time)
{
bestTime.Distance = prop[i].Distance;
bestTime.Time = prop[i].Time;
bestTime.Order = prop[i].Order;
bestTime.Origem = prop[i].Origem;
bestTime.Destino = prop[i].Destino;
}
}
if (bestDist.Order == bestTime.Order) //if the same waypoint order has the same time and distance
return bestDist.Order; // returns whatever bestDist.Order or bestTime.Order
else if (bestDist.Order != bestTime.Order) //if different returns corresponding to the mode selected
{
if (mode == 1) return bestDist.Order;
if (mode == 2) return bestTime.Order;
}
}
return null;
}
What I want is to permutate the waypoints given and test each permutation, I've been struggling with this for a time, if u guys could help me with any way possible would be great.
Ty.
EDIT.
I found this function here on StackOverflow:
public static bool NextPermutation<T>(T[] elements) where T : IComparable<T>
{
var count = elements.Length;
var done = true;
for (var i = count - 1; i > 0; i--)
{
var curr = elements[i];
// Check if the current element is less than the one before it
if (curr.CompareTo(elements[i - 1]) < 0)
{
continue;
}
// An element bigger than the one before it has been found,
// so this isn't the last lexicographic permutation.
done = false;
// Save the previous (bigger) element in a variable for more efficiency.
var prev = elements[i - 1];
// Have a variable to hold the index of the element to swap
// with the previous element (the to-swap element would be
// the smallest element that comes after the previous element
// and is bigger than the previous element), initializing it
// as the current index of the current item (curr).
var currIndex = i;
// Go through the array from the element after the current one to last
for (var j = i + 1; j < count; j++)
{
// Save into variable for more efficiency
var tmp = elements[j];
// Check if tmp suits the "next swap" conditions:
// Smallest, but bigger than the "prev" element
if (tmp.CompareTo(curr) < 0 && tmp.CompareTo(prev) > 0)
{
curr = tmp;
currIndex = j;
}
}
// Swap the "prev" with the new "curr" (the swap-with element)
elements[currIndex] = prev;
elements[i - 1] = curr;
// Reverse the order of the tail, in order to reset it's lexicographic order
for (var j = count - 1; j > i; j--, i++)
{
var tmp = elements[j];
elements[j] = elements[i];
elements[i] = tmp;
}
// Break since we have got the next permutation
// The reason to have all the logic inside the loop is
// to prevent the need of an extra variable indicating "i" when
// the next needed swap is found (moving "i" outside the loop is a
// bad practice, and isn't very readable, so I preferred not doing
// that as well).
break;
}
// Return whether this has been the last lexicographic permutation.
return done;
}
The usage is:
NextPermutation(array);
Doing this and putting my array (rndWays) as overload I get the following error:
The type 'Google.Maps.Location' cannot be used as type parameter 'T' in the generic type or method 'Form1.NextPermutation< T >(T[])'. There is no implicit reference conversion from 'Google.Maps.Location' to 'System.IComparable< Google.Maps.Location >'.

The problem is that Location does not implement the IComparable interface.
Change:
public static bool NextPermutation<T>(T[] elements) where T : IComparable<T>
to:
public static bool NextPermutation(Location[] elements)
And replace each CompareTo() with your own comparison function.

Thread safety Parallel.For c#

im frenchi so sorry first sorry for my english .
I have an error on visual studio (index out of range) i have this problem only with a Parallel.For not with classic for.
I think one thread want acces on my array[i] and another thread want too ..
It's a code for calcul Kmeans clustering for building link between document (with cosine similarity).
more information :
IndexOutOfRange is on similarityMeasure[i]=.....
I have a computer with 2 Processor (12logical)
with classic for , cpu usage is 9-14% , time for 1 iteration=9min..
with parallel.for , cpu usage is 70-90% =p, time for 1 iteration =~1min30
Sometimes it works longer before generating an error
My function is :
private static int FindClosestClusterCenter(List<Centroid> clustercenter, DocumentVector obj)
{
float[] similarityMeasure = new float[clustercenter.Count()];
float[] copy = similarityMeasure;
object sync = new Object();
Parallel.For(0, clustercenter.Count(), (i) => //for(int i = 0; i < clustercenter.Count(); i++) Parallel.For(0, clustercenter.Count(), (i) => //
{
similarityMeasure[i] = SimilarityMatrics.FindCosineSimilarity(clustercenter[i].GroupedDocument[0].VectorSpace, obj.VectorSpace);
});
int index = 0;
float maxValue = similarityMeasure[0];
for (int i = 0; i < similarityMeasure.Count(); i++)
{
if (similarityMeasure[i] > maxValue)
{
maxValue = similarityMeasure[i];
index = i;
}
}
return index;
}
My function is call here :
do
{
prevClusterCenter = centroidCollection;
DateTime starttime = DateTime.Now;
foreach (DocumentVector obj in documentCollection)//Parallel.ForEach(documentCollection, parallelOptions, obj =>//foreach (DocumentVector obj in documentCollection)
{
int ind = FindClosestClusterCenter(centroidCollection, obj);
resultSet[ind].GroupedDocument.Add(obj);
}
TimeSpan tempsecoule = DateTime.Now.Subtract(starttime);
Console.WriteLine(tempsecoule);
//Console.ReadKey();
InitializeClusterCentroid(out centroidCollection, centroidCollection.Count());
centroidCollection = CalculMeanPoints(resultSet);
stoppingCriteria = CheckStoppingCriteria(prevClusterCenter, centroidCollection);
if (!stoppingCriteria)
{
//initialisation du resultat pour la prochaine itération
InitializeClusterCentroid(out resultSet, centroidCollection.Count);
}
} while (stoppingCriteria == false);
_counter = counter;
return resultSet;
FindCosSimilarity :
public static float FindCosineSimilarity(float[] vecA, float[] vecB)
{
var dotProduct = DotProduct(vecA, vecB);
var magnitudeOfA = Magnitude(vecA);
var magnitudeOfB = Magnitude(vecB);
float result = dotProduct / (float)Math.Pow((magnitudeOfA * magnitudeOfB),2);
//when 0 is divided by 0 it shows result NaN so return 0 in such case.
if (float.IsNaN(result))
return 0;
else
return (float)result;
}
CalculMeansPoint :
private static List<Centroid> CalculMeanPoints(List<Centroid> _clust)
{
for (int i = 0; i < _clust.Count(); i++)
{
if (_clust[i].GroupedDocument.Count() > 0)
{
for (int j = 0; j < _clust[i].GroupedDocument[0].VectorSpace.Count(); j++)
{
float total = 0;
foreach (DocumentVector vspace in _clust[i].GroupedDocument)
{
total += vspace.VectorSpace[j];
}
_clust[i].GroupedDocument[0].VectorSpace[j] = total / _clust[i].GroupedDocument.Count();
}
}
}
return _clust;
}

You may have some side effects in FindCosineSimilarity, make sure it does not modify any field or input parameter. Example: resultSet[ind].GroupedDocument.Add(obj);. If resultSet is not a reference to locally instantiated array, then that is a side effect.
That may fix it. But FYI you could use AsParallel for this rather than Parallel.For:
similarityMeasure = clustercenter
.AsParallel().AsOrdered()
.Select(c=> SimilarityMatrics.FindCosineSimilarity(c.GroupedDocument[0].VectorSpace, obj.VectorSpace))
.ToArray();

You realize that if you synchronize the whole Content of the Parallel-For, it's just the same as having a normal synchrone for-loop, right? Meaning the code as it is doesnt do anything in parallel, so I dont think you'll have any Problems with concurrency. My guess from what I can tell is clustercenter[i].GroupedDocument is propably an empty Array.

Fast way to use String.Contains with huge list C#

I have somethings like this:
List<string> listUser;
listUser.Add("user1");
listUser.Add("user2");
listUser.Add("userhacker");
listUser.Add("user1other");
List<string> key_blacklist;
key_blacklist.Add("hacker");
key_blacklist.Add("other");
foreach (string user in listUser)
{
foreach (string key in key_blacklist)
{
if (user.Contains(key))
{
// remove it in listUser
}
}
}
The result of listUser is: user1, user2.
The problem is if i have a huge listUser (more than 10 million) and huge key_blacklist (100.000). That code is very very slow.
Is have anyway to get that faster?
UPDATE: I find new solution in there.
http://cc.davelozinski.com/c-sharp/fastest-way-to-check-if-a-string-occurs-within-a-string
Hope that will help someone when he got in there! :)

If you don't have much control over how the list of users is constructed, you can at least test each item in the list in parallel, which on modern machines with multiple cores will speed up the checking a fair bit.
listuser.AsParallel().Where(
s =>
{
foreach (var key in key_blacklist)
{
if (s.Contains(key))
{
return false; //Not to be included
}
}
return true; //To be included, as no match with the blacklist
});
Also - do you have to use .Contains? .Equals is going to be much much quicker, because in almost all cases a non-match will be determined when the HashCodes differ, which can be found only by an integer comparison. Super quick.
If you do need .Contains, you may want to think about restructuring the app. What do these strings in the list really represent? Separate sub-groups of users? Can I test each string, at the time it's added, for whether it represents a user on the blacklist?
UPDATE: In response to #Rawling's comment below - If you know that there is a finite set of usernames which have, say, "hacker" as a substring, that set would have to be pretty large before running a .Equals test of each username against a candidate would be slower than running .Contains on the candidate. This is because HashCode is really quick.

If you are using entity framework or linq to sql then using linq and sending the query to a server can improve the performance.
Then instead of removing the items you are actually querying for the items that fulfil the requirements, i.e. user where the name doesn't contain the banned expression:
listUser.Where(u => !key_blacklist.Any(u.Contains)).Select(u => u).ToList();

A possible solution is to use a tree-like data structure.
The basic idea is to have the blacklisted words organised like this:
+ h
| + ha
| + hac
| - hacker
| - [other words beginning with hac]
|
+ f
| + fu
| + fuk
| - fukoff
| - [other words beginning with fuk]
Then, when you check for blacklisted words, you avoid searching the whole list of words beginning with "hac" if you find out that your user string does not even contain "h".
In the example I provided, with your sample data, this does not of course make any difference, but with the real data sets this should reduce significantly the number of Contains, since you don't check against the full list of blacklisted words every time.
Here is a code example (please note that the code is pretty bad, this is just to illustrate my idea)
using System;
using System.Collections.Generic;
using System.Linq;
class Program {
class Blacklist {
public string Start;
public int Level;
const int MaxLevel = 3;
public Dictionary<string, Blacklist> SubBlacklists = new Dictionary<string, Blacklist>();
public List<string> BlacklistedWords = new List<string>();
public Blacklist() {
Start = string.Empty;
Level = 0;
}
Blacklist(string start, int level) {
Start = start;
Level = level;
}
public void AddBlacklistedWord(string word) {
if (word.Length > Level && Level < MaxLevel) {
string index = word.Substring(0, Level + 1);
Blacklist sublist = null;
if (!SubBlacklists.TryGetValue(index, out sublist)) {
sublist = new Blacklist(index, Level + 1);
SubBlacklists[index] = sublist;
}
sublist.AddBlacklistedWord(word);
} else {
BlacklistedWords.Add(word);
}
}
public bool ContainsBlacklistedWord(string wordToCheck) {
if (wordToCheck.Length > Level && Level < MaxLevel) {
foreach (var sublist in SubBlacklists.Values) {
if (wordToCheck.Contains(sublist.Start)) {
return sublist.ContainsBlacklistedWord(wordToCheck);
}
}
}
return BlacklistedWords.Any(x => wordToCheck.Contains(x));
}
}
static void Main(string[] args) {
List<string> listUser = new List<string>();
listUser.Add("user1");
listUser.Add("user2");
listUser.Add("userhacker");
listUser.Add("userfukoff1");
Blacklist blacklist = new Blacklist();
blacklist.AddBlacklistedWord("hacker");
blacklist.AddBlacklistedWord("fukoff");
foreach (string user in listUser) {
if (blacklist.ContainsBlacklistedWord(user)) {
Console.WriteLine("Contains blacklisted word: {0}", user);
}
}
}
}

You are using the wrong thing. If you have a lot of data, you should be using either HashSet<T> or SortedSet<T>. If you don't need the data sorted, go with HashSet<T>. Here is a program I wrote to demonstrate the time differences:
class Program
{
private static readonly Random random = new Random((int)DateTime.Now.Ticks);
static void Main(string[] args)
{
Console.WriteLine("Creating Lists...");
var stringList = new List<string>();
var hashList = new HashSet<string>();
var sortedList = new SortedSet<string>();
var searchWords1 = new string[3];
int ndx = 0;
for (int x = 0; x < 1000000; x++)
{
string str = RandomString(10);
if (x == 5 || x == 500000 || x == 999999)
{
str = "Z" + str;
searchWords1[ndx] = str;
ndx++;
}
stringList.Add(str);
hashList.Add(str);
sortedList.Add(str);
}
Console.WriteLine("Lists created!");
var sw = new Stopwatch();
sw.Start();
bool search1 = stringList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("List<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
search1 = hashList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("HashSet<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
search1 = sortedList.Contains(searchWords1[2]);
sw.Stop();
Console.WriteLine("SortedSet<T> {0} ==> {1}ms", search1, sw.ElapsedMilliseconds);
}
private static string RandomString(int size)
{
var builder = new StringBuilder();
char ch;
for (int i = 0; i < size; i++)
{
ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
builder.Append(ch);
}
return builder.ToString();
}
}
On my machine, I got the following results:
Creating Lists...
Lists created!
List<T> True ==> 15ms
HashSet<T> True ==> 0ms
SortedSet<T> True ==> 0ms
As you can see, List<T> was extremely slow comparted to HashSet<T> and SortedSet<T>. Those were almost instantaneous.

C# Properties, Why check for equality before assignment

Why do I see people implement properties like this?
What is the point of checking if the value is equal to the current value?
public double? Price
{
get
{
return _price;
}
set
{
if (_price == value)
return;
_price = value;
}
}

In this case it would be moot; however, in the case where there is an associated side-effect (typically an event), it avoids trivial events. For example:
set
{
if (_price == value)
return;
_price = value;
OnPriceChanged(); // invokes the Price event
}
Now, if we do:
foo.Price = 16;
foo.Price = 16;
foo.Price = 16;
foo.Price = 16;
we don't get 4 events; we get at most 1 (maybe 0 if it is already 16).
In more complex examples there could be validation, pre-change actions and post-change actions. All of these can be avoided if you know that it isn't actually a change.
set
{
if (_price == value)
return;
if(value < 0 || value > MaxPrice) throw new ArgumentOutOfRangeException();
OnPriceChanging();
_price = value;
OnPriceChanged();
}

This is not an answer, more: it is an evidence-based response to the claim (in another answer) that it is quicker to check than to assign. In short: no, it isn't. No difference whatsoever. I get (for non-nullable int):
AutoProp: 356ms
Field: 356ms
BasicProp: 357ms
CheckedProp: 356ms
(with some small variations on successive runs - but essentially they all take exactly the same time within any sensible rounding - when doing something 500 MILLION times, we can ignore 1ms difference)
In fact, if we change to int? I get:
AutoProp: 714ms
Field: 536ms
BasicProp: 714ms
CheckedProp: 2323ms
or double? (like in the question):
AutoProp: 535ms
Field: 535ms
BasicProp: 539ms
CheckedProp: 3035ms
so this is not a performance helper!
with tests
class Test
{
static void Main()
{
var obj = new Test();
Stopwatch watch;
const int LOOP = 500000000;
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.AutoProp = 17;
}
watch.Stop();
Console.WriteLine("AutoProp: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.Field = 17;
}
watch.Stop();
Console.WriteLine("Field: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.BasicProp = 17;
}
watch.Stop();
Console.WriteLine("BasicProp: {0}ms", watch.ElapsedMilliseconds);
watch = Stopwatch.StartNew();
for (int i = 0; i < LOOP; i++)
{
obj.CheckedProp = 17;
}
watch.Stop();
Console.WriteLine("CheckedProp: {0}ms", watch.ElapsedMilliseconds);
Console.ReadLine();
}
public int AutoProp { get; set; }
public int Field;
private int basicProp;
public int BasicProp
{
get { return basicProp; }
set { basicProp = value; }
}
private int checkedProp;
public int CheckedProp
{
get { return checkedProp; }
set { if (value != checkedProp) checkedProp = value; }
}
}

Let's suppose we don't handle any change related events.
I don't think comparing is faster than assingment. It depends on the data type. Let's say you have a string, Comparison is much longer in the worst case than a simple assignment where the member simply changes reference to the ref of the new string.
So my guess is that it's better in that case to assign right away.
In the case of simple data types it doesn't have a real impact.

Such that, you dont have to re-assign the same value. Its just faster execution for comparing values. AFAIK

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

find and findindex painful slow for List<Object> why? - c#

You don't need to put for loop to search in find2... Just call find2 directly, then result will be 0.

Related

Checking if instance is null VS calling empty function (C#)

Permutation on array of type "Location", from GoogleMapsAPI NuGet Package

Thread safety Parallel.For c#

Fast way to use String.Contains with huge list C#

C# Properties, Why check for equality before assignment

Categories

Resources