Modify items in List<T> fast - c#

I tried to make this code perform faster using Parallel.ForEach and ConcurrentBag, but it's still running way too long (especially keeping in mind that in my scenario the count may also be 1,000,000+):
List<Point> points = new List<Point>();
for (int i = 0; i < 100000; i++) {
    Point point = new Point { X = i - 50000, Y = i + 50000, CanDelete = false };
    points.Add(point);
}
foreach (Point point in points) {
    foreach (Point innerPoint in points) {
        if (innerPoint.CanDelete == false && (point.X - innerPoint.X) < 2) {
            innerPoint.Y = point.Y;
            point.CanDelete = true;
        }
    }
}

That code will perform WORSE in parallel, due to the data access patterns.
The best way to speed it up is to recognize that you don't need to consider all O(N^2) pairs of points, but only the ones having nearby X-coordinates.
First, sort the list by X-coordinate, O(N log N), then process forward and backward in the list from each point until you leave the neighborhood. You'll need to use indexing and not foreach.
In your sample data, the list is already sorted.
Since your distance test is symmetric, and removes matching points from consideration, you can skip looking at earlier points.
for (int j = 0; j < points.Count; ++j) {
    int x1 = points[j].X;
    //for (int k = j; k >= 0 && points[k].X > x1 - 2; --k) { /* merge points */ }
    for (int k = j + 1; k < points.Count && points[k].X < x1 + 2; ++k) { /* merge points */ }
}
Not only is the complexity better, the cache behavior is far superior. And it can be split among multiple threads with far less cache contention.

Well, I don't know exactly what you want, but let's try.
First, when creating the List, you might want to set its initial capacity, since you know how many items it will hold; that way it does not need to grow all the time.
List<Point> points = new List<Point>(100000);
Next, you could sort the list by the X property. So you would only compare each point with the points that are near it: when you find the first, forward or backward, that is too distant, you can stop comparing.
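A minimal sketch of that approach, assuming Point is the question's own class (a reference type with int X, int Y, and bool CanDelete) and that the intended test is an X-distance threshold of 2:
List<Point> points = new List<Point>(100000);
for (int i = 0; i < 100000; i++)
    points.Add(new Point { X = i - 50000, Y = i + 50000, CanDelete = false });

// Sort by X once, O(N log N); the sample data is already in this order.
points.Sort((a, b) => a.X.CompareTo(b.X));

for (int j = 0; j < points.Count; j++)
{
    Point point = points[j];
    // Only scan forward while the X-distance stays below the threshold.
    for (int k = j + 1; k < points.Count && points[k].X - point.X < 2; k++)
    {
        if (!points[k].CanDelete)
        {
            points[k].Y = point.Y;   // merge as in the original loop body
            point.CanDelete = true;
        }
    }
}
Each point is now compared only against its neighbors in X, so the inner loop ends almost immediately instead of walking the whole list.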


A for loop somehow overrides an entire array, even though I can see no reason for that

So, I've been looking at this code for a good while now, and I am lost.
The point is to run a for loop that adds class instances to an array, and then, for each instance, run through an array of points inside of it and add variations to it.
This then shows as a bunch of dots on a form, which are supposed to move independently of each other, but they now follow each other completely.
It does not matter how much variation there is or anything; it's just 99 dots with the exact same acceleration, velocity, location, and path.
The code is here, the method isn't touched by any other code, and the problem arises before it returns.
//Point of the method is to put variations of Baby into an array, and return that array
Dot.Class[] MutateAndListBaby(Dot.Class Baby)
{
    //Making the empty array
    Dot.Class[] BabyList = new Dot.Class[dots.Length];
    //For loop that goes through the whole array
    for (int i = 1; i < BabyList.Length; i++)
    {
        //For each iteration the for loop adds the class reference to the index, then puts the standard directions into that reference, and then sets a value preventing it from being changed by other code
        BabyList[i] = new Dot.Class();
        BabyList[i].Directions = Baby.Directions;
        BabyList[i].StartupComplete = true;
        //The zero index variation, when made like this, is not overridden, which would lead one to believe that how the directions are copied is the problem
        //But it shouldn't be; BabyList[i].Directions = Baby.Directions; should be fire and forget, it should just add the Directions to the array and then leave it
        BabyList[0] = new Dot.Class();
        BabyList[0].Directions = new PointF[100];
        for (int b = 0; b < BabyList[0].Directions.Length; b++)
        {
            BabyList[0].Directions[b] = new Point (5, 10);
        }
        BabyList[0].StartupComplete = true;
        //The for loop that should add variation, but it seems like it somehow overrides itself
        for (int b = 0; b < BabyList[i].Directions.Length; b++)
        {
            if (rand.Next(0, 101) >= 100)
            {
                int rando = rand.Next(-50, 51);
                float mod = (float)rando / 50;
                float x = BabyList[i].Directions[b].X;
                x = x + mod;
                BabyList[i].Directions[b].X = rand.Next(-5, 6);
            }
            if (rand.Next(0, 101) >= 100)
            {
                int rando = rand.Next(-50, 51);
                float mod = (float)rando / 50;
                float y = BabyList[i].Directions[b].Y;
                y = y * mod;
                BabyList[i].Directions[b].Y = rand.Next(-5, 6);
            }
        }
        //Now one would assume this would create a unique dot that would move 100% independently, right? Since it's at the end of the for loop, nothing should change it
        // Nope, somehow it makes every other dot copy its directions...
        if (i == 5)
        {
            for (int b = 0; b < BabyList[5].Directions.Length; b++)
            {
                BabyList[5].Directions[b] = new PointF(-5f, -5f);
            }
        }
    }
    return BabyList;
}
With the code there, what I get is the 0 index dot going its own way, while the other 99 dots for some reason follow the 5th index's Directions, even though they should get their own variations later on in the code.
Any help would be much appreciated. It's probably something obvious, but trust me, I've been looking at this thing for quite a while and can't see anything.
If I understand you correctly, this might be the issue:
BabyList[i].Directions = Baby.Directions;
Directions is of type array of PointF - a reference. The line above does not copy the array. Is that what you assume? If I'm not misreading the code you're presenting, you're creating one Dot.Class with its own array of PointF at index 0 and filling the rest of your Dot.Class array with instances that all share one single array.
Directions is an array, which is a reference type. When you make an assignment of a variable of this type,
BabyList[i].Directions = Baby.Directions;
no new instance is created; the reference is just copied into the new variable, which still references the original instance. Essentially, in your loop only the very first item gets a new instance of Directions, since it is explicitly constructed. The rest share the instance that comes in as a member of the parameter passed to the method.
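A minimal sketch of the fix against the question's own types (assuming Directions really is a plain PointF[] field): give each entry its own copy of the array instead of sharing Baby's.
// Inside the outer for loop: copy the array instead of sharing the reference.
// Clone() makes a shallow copy, which is enough here because PointF is a value type.
BabyList[i] = new Dot.Class();
BabyList[i].Directions = (PointF[])Baby.Directions.Clone();
BabyList[i].StartupComplete = true;
With that change, mutating BabyList[i].Directions[b] no longer writes through to every other dot.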
You probably want to change your if conditions:
(rand.Next(0, 101) >= 100
to
(rand.Next(0, 100) < 99
This will run an average of 99 times out of 100, whereas your current condition runs 1 time out of 101 (on average).
Oh, and Benjamin Podszun's answer about assigning the same array (not a copy of the same array) to Directions applies as well!
(Assuming that Directions isn't a getter that you created to return a copy of an array instead of a reference!)

I can't seem to generate pseudorandom numbers based on co-ordinates

I am trying to create an infinite map of 1s and 0s, procedurally generated by a PRNG, but only storing the viewable 7x7 grid. E.g. on the screen you will see a 7x7 grid showing part of the map of 1s and 0s, and when you push right it shifts the 1s and 0s across and puts the next row in place.
If you then shift left, since it is pseudorandomly generated, it regenerates and shows the original. You will be able to shift either way, and up and down, infinitely.
The problem I have is that the initial map does not look that random, and the pattern repeats itself as you scroll right or left; I think it is something to do with how I generate the numbers. E.g. you could start viewing at x=5000 and y=10000, and it therefore needs to generate a unique seed from these two values.
(Complexity is the variable for the size of the viewport)
void CreateMap(){
    if(complexity%2==0) {
        complexity--;
    }
    if (randomseed) {
        levelseed=Time.time.ToString();
    }
    psuedoRandom = new System.Random(levelseed.GetHashCode());
    offsetx=psuedoRandom.Next();
    offsety=psuedoRandom.Next();
    levellocation=psuedoRandom.Next();
    level = new int[complexity,complexity];
    middle=complexity/2;
    for (int x=0;x<complexity;x++){
        for (int y=0;y<complexity;y++){
            int hashme=levellocation+offsetx+x*offsety+y;
            psuedoRandom = new System.Random(hashme.GetHashCode());
            level[x,y]=psuedoRandom.Next(0,2);
        }
    }
}
This is what I use for left and right movement,
void MoveRight(){
    offsetx++;
    for (int x=0;x<complexity-1;x++){
        for (int y=0;y<complexity;y++){
            level[x,y]=level[x+1,y];
        }
    }
    for (int y=0;y<complexity;y++){
        psuedoRandom = new System.Random(levellocation+offsetx*y);
        level[complexity-1,y]=psuedoRandom.Next(0,2);
    }
}
void MoveLeft(){
    offsetx--;
    for (int x=complexity-1;x>0;x--){
        for (int y=0;y<complexity;y++){
            level[x,y]=level[x-1,y];
        }
    }
    for (int y=0;y<complexity;y++){
        psuedoRandom = new System.Random(levellocation+offsetx*y);
        level[0,y]=psuedoRandom.Next(0,2);
    }
}
To distill it down I need to be able to set
Level[x,y]=Returnrandom(offsetx,offsety)
int RandomReturn(int x, int y){
    psuedoRandom = new System.Random(Whatseedshouldiuse);
    return (psuedoRandom.Next (0,2));
}
Your problem, I think, is that you are creating a new Random inside your loops. Don't re-declare it inside the loops as you are; instead, declare it at class level and then simply call psuedoRandom.Next. See this SO post for an example of the issue you are experiencing.
instead of re-instantiating a Random() class at every iteration like you are doing:
for (int x=0;x<complexity;x++){
    for (int y=0;y<complexity;y++){
        int hashme=levellocation+offsetx+x*offsety+y;
        psuedoRandom = new System.Random(hashme.GetHashCode());
        level[x,y]=psuedoRandom.Next(0,2);
    }
}
do something more like
for (int x=0;x<complexity;x++){
    for (int y=0;y<complexity;y++){
        // Give me the next random integer
        moreRandom = psuedoRandom.Next();
    }
}
EDIT: As Enigmativity has pointed out in a comment below this post, re-instantiating it at every iteration is also a waste of time and resources.
P.S. If you really need to do it then why not use the 'time-dependent default seed value' instead of specifying one?
I'd use SipHash to hash the coordinates:
It's relatively simple to implement
It takes both a key (seed for your map) and a message (the coordinates) as inputs
You can increase or decrease the number of rounds as a quality/performance trade-off.
Unlike your code, the results will be reproducible across processes and machines.
The SipHash website lists two C# implementations:
https://github.com/tanglebones/ch-siphash
https://github.com/BrandonHaynes/siphash-csharp
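For illustration only, here is a minimal stateless sketch of the same idea without pulling in a library; the mixing constants are arbitrary avalanche-style values (not SipHash), so treat this as a placeholder rather than a vetted hash:
// Same (x, y, seed) always gives the same 0/1 value, in any visit order.
static int RandomReturn(int x, int y, int seed)
{
    unchecked
    {
        uint h = (uint)seed;
        h ^= (uint)x * 0x9E3779B1u;   // mix in x
        h = (h << 13) | (h >> 19);    // rotate to spread the bits
        h ^= (uint)y * 0x85EBCA77u;   // mix in y
        h *= 0xC2B2AE3Du;             // final multiply for avalanche
        h ^= h >> 16;
        return (int)(h & 1);          // map to 0 or 1
    }
}
Then level[x,y] = RandomReturn(x + offsetx, y + offsety, levelseed.GetHashCode()); stays reproducible when scrolling in either direction, because nothing depends on the order in which cells are generated.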
I've played a bit to create rnd(x, y, seed):
class Program
{
    const int Min = -1000; // define the starting point

    static void Main(string[] args)
    {
        for (int x = 0; x < 10; x++)
        {
            for (int y = 0; y < 10; y++)
                Console.Write((RND(x, y, 123) % 100).ToString().PadLeft(4));
            Console.WriteLine();
        }
        Console.ReadKey();
    }

    static int RND(int x, int y, int seed)
    {
        var rndX = new Random(seed);
        var rndY = new Random(seed + 123); // a bit different
        // because Random is an LCG we can only move forward
        for (int i = Min; i < x - 1; i++)
            rndX.Next();
        for (int i = Min; i < y - 1; i++)
            rndY.Next();
        // return some f(x, y, seed) = function(rnd(x, seed), rnd(y, seed))
        return rndX.Next() + rndY.Next() << 1;
    }
}
It works, but it's of course an awful implementation (because Random is an LCG); it should give the general idea, though. It's memory efficient (no arrays). An acceptable compromise is to implement value caching: while you are located in a certain part of the map, surrounding values will be taken from the cache instead of being regenerated.
For the same x, y, and seed you will always get the same value (which in turn can be used as a cell seed).
TODO: find a way to implement Rnd(x) and Rnd(y) without an LCG and its highly inefficient initialization loop.
I solved it with this, after all of your suggested approaches.
void Drawmap(){
    for (int x=0;x<complexity;x++){
        for (int y=0;y<complexity;y++){
            psuedoRandom = new System.Random((x+offsetx)*(y+offsety));
            level[x,y]=psuedoRandom.Next (0,2);
        }
    }
}

Binary search slower, what am I doing wrong?

EDIT: so it looks like this is normal behavior; can anyone recommend a faster way to do these numerous intersections?
So my problem is this: I have 8000 lists (strings in each list). For each list (ranging in size from 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection count. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, just looping through the other list if it's smaller/bigger
}
Then I made the lists into regular lists and applied Sort(), which changed my code to:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the
                                    // smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
}
The first version takes about 6 seconds; the other one takes more than 20 (I just stopped it there because otherwise it would take more than a minute!), and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for(int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    for(int j = 0; j < rand.Next(50, 400); ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
    }
}
first method:
used IEnumerable.Intersect() to get the intersection with the other list and checked IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if(allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
        count = 0;
        HashSet<int> loopingSet;
        HashSet<int> containsSet;
        if(allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach(int k in loopingSet) {
            if(containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
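As one possible concurrent variation of the third method, a sketch only: it assumes the per-pair counts can be written into a preallocated jagged array, so no locking is needed because each outer index writes only its own row (requires using System.Threading.Tasks;):
int n = allSets.Count;
int[][] counts = new int[n][];
Parallel.For(0, n, i =>
{
    counts[i] = new int[n];
    for (int j = i + 1; j < n; ++j)
    {
        // Reads of HashSet<int> from multiple threads are safe as long as nothing mutates the sets.
        HashSet<int> loopingSet = allSets[i].Count < allSets[j].Count ? allSets[i] : allSets[j];
        HashSet<int> containsSet = ReferenceEquals(loopingSet, allSets[i]) ? allSets[j] : allSets[i];
        int c = 0;
        foreach (int k in loopingSet)
            if (containsSet.Contains(k))
                ++c;
        counts[i][j] = c;
    }
});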
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. To iterate through a normal collection for your purposes will not be the most optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter of the two lists (as you mentioned you're already doing) should give close to O(1) performance per lookup, the same as key lookups in the generic Dictionary type.
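A minimal sketch of that for the original string data, assuming it is available as a List<List<string>> called allLists (a name made up for this example; requires using System.Linq;):
// Convert each list to a HashSet once, up front.
var allSets = allLists.Select(l => new HashSet<string>(l)).ToList();

// Count an intersection by iterating the smaller set and probing the larger one.
int CountIntersection(HashSet<string> a, HashSet<string> b)
{
    HashSet<string> small = a.Count < b.Count ? a : b;
    HashSet<string> large = ReferenceEquals(small, a) ? b : a;
    int count = 0;
    foreach (string word in small)
        if (large.Contains(word))   // average O(1) per lookup
            count++;
    return count;
}
The one-time conversion cost is small compared to the millions of pairwise intersections that follow.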

Getting all the possible distances between points

I have created a program to spawn a number of points (the number is given by the user).
Therefore the program spawns N points
See the image for an example; it has 3 points in that case.
What I need is to get all the possible distances between those villages (in the example that is distances AB, AC, and BC).
The points are stored in a single list (that stores the x-coordinate and y-coordinate):
List<Villages>
I know that I need the Pythagorean theorem; I just cannot get the foreach loop right.
I would think you would want a regular nested for-loop rather than foreach.
Something like this should work:
for (int i = 0; i < villageList.Count; ++i)
{
    for (int j = i + 1; j < villageList.Count; ++j)
    {
        distanceFunc(villageList[i], villageList[j]);
    }
}
Where distanceFunc is whatever implementation of a distance function you want to use and villageList is your List of villages.
The reason you would use for-loops is that you need the inner loop to start one element past the outer loop (i + 1), and foreach loops don't let you easily access the index you're currently at (they let you access the element itself, but not easily see its position in the array).
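For completeness, a minimal sketch of such a distance function, assuming the Villages class exposes numeric X and Y properties (an assumption about your type):
// Euclidean distance between two villages.
static double distanceFunc(Villages a, Villages b)
{
    double dx = a.X - b.X;
    double dy = a.Y - b.Y;
    return Math.Sqrt(dx * dx + dy * dy);   // Pythagorean theorem
}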
You need two for loops:
var villages = new List<Villages>() { ... };
for (int i = 0; i < villages.Count - 1; i++)
    for (int j = i + 1; j < villages.Count; j++)
        Console.WriteLine(getDistance(villages[i], villages[j]));
Where getDistance is something you should write yourself; it should return the distance between the two specified Villages.
How about something like this pseudocode:
villages = [a, b, c, ...]
for i=0 to len(villages)-2:
    for j=i+1 to len(villages)-1:
        print(villages[i], villages[j], dist(villages[i], villages[j]))

Get 3 most common Point from List<Point>

I have a quick question that I haven't found out how to do efficiently (in C#).
I have a list of Points (X, Y). I need to find which 3 points are the tightest cluster. It's for a mapping project.
What would the best way to do this be? There's only about 6 to 9 items in the list.
Thanks in advance.
Cheers!
For such small numbers, the brute force method should work just fine. With six points, there are 20 possible combinations of three points. With 9 points, there are 84 possible combinations. I wouldn't recommend this approach for a lot of points, but with just a handful, it's going to be plenty fast enough and it's dead simple to write.
You can easily generate the combinations:
for (int i = 0; i < points.Length - 2; ++i)
{
    for (int j = i + 1; j < points.Length - 1; j++)
    {
        for (int k = j + 1; k < points.Length; k++)
        {
            // Here, your three points are
            // points[i], points[j], and points[k]
            // compute "tightness" and store
        }
    }
}
You'll need a structure to hold your combinations:
struct PointGroup
{
    public readonly int i;
    public readonly int j;
    public readonly int k;
    public readonly double tightness;

    public PointGroup(int i, int j, int k, double tight)
    {
        this.i = i;
        this.j = j;
        this.k = k;
        this.tightness = tight;
    }
}
If you create one of those structures for each group and store them in an array, you can simply sort the array and take the best three.
Your bigger problem is coming up with a definition of "tight group." Also, you have to decide if a point can be in more than one of those "tightest" groups. Three possible ways to define tightness are:
The sum of the distances between the points is minimized (sketched below the list).
The average distance from each point to the center of the group is minimized.
The circumference of the triangle formed by the three points is minimized.
Undoubtedly there are more.
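As an illustration of the first definition, a minimal sketch (assuming the points expose numeric X and Y):
// "Tightness" as the sum of the three pairwise distances; smaller is tighter.
static double Tightness(Point a, Point b, Point c)
{
    return Distance(a, b) + Distance(b, c) + Distance(a, c);
}

static double Distance(Point a, Point b)
{
    double dx = a.X - b.X;
    double dy = a.Y - b.Y;
    return Math.Sqrt(dx * dx + dy * dy);
}
Plug the result into the nested loops above, store a PointGroup per combination, and keep whichever group scores lowest.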
If the points are not identical, this becomes a form of cluster analysis.
There are various algorithms that differ in how they measure and "cluster" points, though with only a few points, a brute force approach might be the easiest... You could just measure the distance between each pair of points, and sort...
You can simplify the problem as follows:
Don't check a Point against itself; distance is zero.
Exploit symmetry: distance from Point i to Point j is the same as Point j to Point i
Those eliminate a number of combinations.
But, given those, you have to calculate the distance between each pair and sort.
