List of Lists in C#, problem with indexes - c#

In C#, I create a list of lists and try to access and modify its elements. The problem is that an operation (e.g. adding a constant) seems to apply not only to the intended index but to all elements.
Here is the piece of code:
List<List<double>> D = new List<List<double>>();
List<double> ttmp = new List<double>(new double[256]);
for (int i = 0; i < 100; i++)
{
    D.Add(ttmp);
}
for (int i = 0; i < 100; i++)
{
    D[i][0] = D[i][0] + 1;
}
D is a list of 100 lists, each of size 256. It initially contains only zeroes. In the second loop, I ask that the first element of each of the 100 lists be incremented by one.
As a result, the entire "matrix" is filled with ones, i.e. not only D[0][0], D[1][0] ... D[99][0], but also D[0][1], D[0][2], etc.
Why is that?
NB: the C++ equivalent with vector<vector<double>> works perfectly fine...

When the code is executed, the result is not what you describe: D[0][1], D[0][2], etc. are not modified.
The actual result is that element [0] of every inner list equals 100.
Why? Because you created one list and added it to D 100 times. Every entry in D refers to that same list, so when you modify it through one entry you see the change through all of them (List<double> is a reference type).
Change it to this instead:
List<List<double>> D = new List<List<double>>();
for (int i = 0; i < 100; i++)
{
    D.Add(new List<double>(new double[256]));
}
foreach (var innerList in D)
{
    innerList[0]++;
}
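To see the difference for yourself, here is a minimal sketch that builds both versions and compares them. The class and method names (SharedRowDemo, BuildShared, BuildIndependent) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class SharedRowDemo
{
    // Adds the SAME list instance 'rows' times (the situation in the question).
    public static List<List<double>> BuildShared(int rows, int cols)
    {
        var shared = new List<double>(new double[cols]);
        var d = new List<List<double>>();
        for (int i = 0; i < rows; i++) d.Add(shared);
        return d;
    }

    // Adds a NEW list instance for every row (the fix).
    public static List<List<double>> BuildIndependent(int rows, int cols)
    {
        var d = new List<List<double>>();
        for (int i = 0; i < rows; i++) d.Add(new List<double>(new double[cols]));
        return d;
    }

    public static void Main()
    {
        var shared = BuildShared(100, 256);
        var independent = BuildIndependent(100, 256);
        foreach (var row in shared) row[0]++;
        foreach (var row in independent) row[0]++;

        Console.WriteLine(object.ReferenceEquals(shared[0], shared[99]));           // True
        Console.WriteLine(object.ReferenceEquals(independent[0], independent[99])); // False
        Console.WriteLine(shared[0][0]);      // 100 (one list, incremented 100 times)
        Console.WriteLine(shared[0][1]);      // 0
        Console.WriteLine(independent[0][0]); // 1
    }
}
```

object.ReferenceEquals is a quick way to check whether two slots point at the same object rather than at two objects with equal contents.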

Related

Problem adding an item into a 2 dimension List in C# (unexpected behavior)

Can you tell me why I am getting the following result from this code?
public static void CountSort(List<List<string>> arr)
{
    for (int i = 0; i < arr.Count / 2; i++)
        arr[i][1] = "-";
    List<List<string>> result = Enumerable.Repeat(new List<string>(), arr.Count).ToList();
    for (int i = 0; i < arr.Count; i++)
    {
        string tempValue = arr[i][1];
        int index = int.Parse(arr[i][0]);
        result[index].Add(tempValue);
    }
}
The result is a multi-dimensional List of strings. Whenever the loop runs, it adds the temp value to all of the sub-lists rather than only to the one selected by the index.
I am breaking my head now trying to figure out why is that.
Output looks something like this:
List<List<string>("1"), List<string>("1"),List<string>("1")>
Output I am expecting is :
List<List<string>(), List<string>("1"), List<string>()>
Look at this line:
result = Enumerable.Repeat(new List<string>(), arr.Count).ToList();
This creates one List<string> instance and repeats this instance in the resulting enumerable/list. Again, only one List<string> instance is being created here. Regardless which index you are using on result to get a sub list, it's always the same List<string> instance.
Thus, your code adds the temp values to all sub lists, because all the sub lists are in reality just the same single List<string> instance.
Solution: Instead of using Enumerable.Repeat (which just repeats the given List<string> instance instead of creating new List<string> instances) just do it the old-fashioned way and create all the required List<string> instances yourself:
List<List<string>> result = new();
for (int i = 0; i < arr.Count; ++i)
    result.Add(new List<string>());
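If you prefer to stay with LINQ, one possible alternative is Enumerable.Range combined with Select, because the lambda passed to Select runs once per element and therefore creates a fresh list each time. This is only a sketch; BucketDemo and CreateBuckets are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketDemo
{
    // Select invokes the lambda once per index, so every bucket is a distinct list.
    public static List<List<string>> CreateBuckets(int count) =>
        Enumerable.Range(0, count).Select(_ => new List<string>()).ToList();

    public static void Main()
    {
        var result = CreateBuckets(3);
        result[1].Add("1");
        // Only the middle bucket received the value.
        Console.WriteLine(string.Join(",", result.Select(b => b.Count))); // 0,1,0
    }
}
```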

How to update array inside a for loop and add it to a list

I am trying to update an array and add it to a list if a certain condition is true. As you can see in my code, the array "rows" is updated every time inside the if condition, and then it is added to "checkList".
The problem is that when I iterate through the list to check the values, it seems that only the last value of rows has been added in every entry in the list.
Here is some code to explain
int[] rows = new int[2];
List<int[]> checkList = new List<int[]>();
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        if (true)
        {
            rows[0] = i;
            rows[1] = j;
            checkList.Add(rows);
        }
    }
}
foreach (var row in checkList)
{
    Console.WriteLine(row[0] + " " + row[1]);
}
Output: every one of the 16 lines is "3 3", the last values written to rows.
I hope someone can explain this. Thanks
Most object types in .NET (including arrays) are reference types, so checkList.Add(rows); adds a reference to the same array to the list, multiple times.
Instead, you'll want to create a new array instance every time:
List<int[]> checkList = new List<int[]>();
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        if (true)
        {
            checkList.Add(new int[]{ i, j });
        }
    }
}
I believe the issue here is that when you call
checkList.Add(rows);
you are adding a reference to the same rows array to the list every time, not a separate copy of it. This leads to your current behaviour.
A solution would be to instantiate the array inside the loop, so a new array is created every iteration.
List<int[]> checkList = new List<int[]>();
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        if (true)
        {
            int[] rows = new int[2];
            rows[0] = i;
            rows[1] = j;
            checkList.Add(rows);
        }
    }
}
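If for some reason you need to keep a single reusable buffer outside the loop, another option is to add a copy of it each time. The sketch below uses Array.Clone for that; SnapshotDemo and Collect are hypothetical names, and the always-true condition from the question is left out.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotDemo
{
    public static List<int[]> Collect()
    {
        int[] rows = new int[2];            // one reusable buffer
        var checkList = new List<int[]>();
        for (int i = 0; i < 4; i++)
        {
            for (int j = 0; j < 4; j++)
            {
                rows[0] = i;
                rows[1] = j;
                // Clone() is a shallow copy; for an int[] that is a full, independent copy.
                checkList.Add((int[])rows.Clone());
            }
        }
        return checkList;
    }

    public static void Main()
    {
        var list = Collect();
        Console.WriteLine(list.Count);                      // 16
        Console.WriteLine(list[0][0] + " " + list[0][1]);   // 0 0
        Console.WriteLine(list[15][0] + " " + list[15][1]); // 3 3
    }
}
```

Creating the array inside the loop is usually simpler; cloning is mainly useful when the buffer is filled by code you cannot change.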
As a supplement to Matthias' answer, one of the things that's perhaps not easy to appreciate about C# is that most variables you have and use are merely a reference to something else. When you assign some variable like this:
int[] rows = new int[2];
C# creates some space in memory to keep an array of 2 integers, it attaches a reference to it, and that thing becomes your variable that you use, named rows. If you then do:
int[] rows2 = rows;
It doesn't clone the memory space used and create a new array, it just creates another reference attached to the same data in memory. If the data were a dog, it now has 2 leads attached to its collar but there is still only one dog. You can pull on either lead to urge the dog to stop peeing on a car, but it's the same dog you're affecting.
Array/list slots are just like variables in this regard. To say you have:
List<int[]> checkList = new List<int[]>();
Means declare a list where each of its slots is a variable capable of referring to an int array. It's conceptually no different to saying:
int[] checkList0 = rows;
int[] checkList1 = rows;
int[] checkList2 = rows;
int[] checkList3 = rows;
It's just that those numbers are baked into the name, whereas a list permits you a way of varying the name programmatically (and having more than 4 slots):
checkList[0] = rows;
checkList[1] = rows;
checkList[2] = rows;
checkList[3] = rows;
checkList[0] is conceptually the entire variable name, just like checkList0 is a variable name, and remember that this is hence just another variable that is just a reference to that same array in memory.
By not making a new array each time, you attached every variable slot in the list to the same array in memory, and thus you ended up with something in memory that looks like:
The black is the list, the blue is the array. Every list slot is just a reference to the same array. You might have changed the numbers in the array 200 times, but at the end of the operation, because there was only ever one array, you only see the final set of numbers you wrote into the array. You might have attached 20 leads to your dog and pulled each of them once, but it's still just the same dog that has 20 times been stopped from peeing on 20 cars.
Matthias' answer works (and is how it should be done) because you concretely make a new array each time.
Numbers in blue are fabricated and not intended to represent the answers you should see printed; the concept being explained is that of linking to new array objects in memory.
You'd be forgiven for thinking that a clone would be made, because it is for int. int is a value type, which means the value is copied when it's used:
int x = 1;
int y = x;
y = y + 1;
y is now 2, but x is still 1. It'd be pretty hard work to write C# if it wasn't this way, i.e. if every time you incremented some int variable, every other variable that had touched the variable it came from was also affected. So I think it's perhaps intrinsically reasonable to assume that whenever an assignment of anything is made, changes that affect the value of the assigned variable don't affect earlier iterations of it, but that's not the case. There's this clear divide between value types (types whose data is copied/cloned when they're assigned) and reference types (types whose data is not copied/cloned). While int is a value type (cloned), an int[] is a reference type (not cloned),
and that's something you'll really need to get down with and remember.
Roll on the what's ref/out for? query :D
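The value type versus reference type divide can be put side by side in a few lines. This is just an illustrative sketch; CopyDemo and Run are made-up names.

```csharp
using System;

public static class CopyDemo
{
    // Returns (x, y, a[0], c[0]) after the assignments below.
    public static (int X, int Y, int A0, int C0) Run()
    {
        int x = 1;
        int y = x;          // value type: y gets its own copy
        y = y + 1;

        int[] a = { 1 };
        int[] b = a;        // reference type: b refers to the same array as a
        b[0] = b[0] + 1;    // visible through a as well

        int[] c = (int[])a.Clone(); // explicit copy: c is a separate array
        c[0] = 99;                  // does not touch a

        return (x, y, a[0], c[0]);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.X);  // 1
        Console.WriteLine(r.Y);  // 2
        Console.WriteLine(r.A0); // 2
        Console.WriteLine(r.C0); // 99
    }
}
```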

Binary search slower, what am I doing wrong?

EDIT: so it looks like this is normal behavior, so can anyone just recommend a faster way to do these numerous intersections?
so my problem is this. I have 8000 lists (strings in each list). For each list (ranging from size 50 to 400), I'm comparing it to every other list and performing a calculation based on the intersection number. So I'll do
list1(intersect)list1= number
list1(intersect)list2= number
list1(intersect)list888= number
And I do this for every list. Previously, I had HashList and my code was essentially this: (well, I was actually searching through properties of an object, so I
had to modify the code a bit, but it's basically this:
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, but looping through the other list if it's smaller/bigger
}
Then I made the lists into regular lists and applied Sort(), which changed my code to
foreach (List list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
}
The first version takes about 6 seconds, the other one takes more than 20 (I just stop there cuz otherwise it would take more than a minute!!!) (and this is for a smallish subset of the data)
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for(int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    // pick the target size once; rand.Next in the loop condition would re-roll it on every iteration
    int size = rand.Next(50, 400);
    for(int j = 0; j < size; ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
        // the method being measured goes here
    }
}
first method:
used Enumerable.Intersect() to get the intersection with the other set and Enumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if(allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked Contains on the longer set.
for(int i = 0; i < allSets.Count; ++i) {
    for(int j = i + 1; j < allSets.Count; ++j) {
        int count = 0;
        HashSet<int> loopingSet;
        HashSet<int> containsSet;
        if(allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach(int k in loopingSet) {
            if(containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. Iterating through a normal collection will not be optimal for your purposes. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter list of two (as you mentioned you're already doing) should give close to O(1) performance, the same as key lookups in the generic Dictionary type.
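As a rough sketch of the two approaches for strings: the first helper probes a HashSet<string>, the second does a binary search on a sorted list. One thing that may contribute to the slow BinarySearch version is that the default comparer for strings is culture-sensitive, while HashSet<string> uses ordinal equality by default; passing StringComparer.Ordinal to both Sort and BinarySearch is worth trying, though I have not measured it on your data. IntersectionDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class IntersectionDemo
{
    // Iterates the smaller set and probes the larger one (average O(1) per lookup).
    public static int CountCommon(HashSet<string> a, HashSet<string> b)
    {
        HashSet<string> small = a;
        HashSet<string> large = b;
        if (a.Count > b.Count) { small = b; large = a; }

        int count = 0;
        foreach (string word in small)
            if (large.Contains(word)) count++;
        return count;
    }

    // 'sorted' must have been sorted with StringComparer.Ordinal,
    // because the search has to use the same comparer as the sort.
    public static int CountCommonSorted(List<string> probe, List<string> sorted)
    {
        int count = 0;
        foreach (string word in probe)
            if (sorted.BinarySearch(word, StringComparer.Ordinal) >= 0) count++;
        return count;
    }

    public static void Main()
    {
        var a = new HashSet<string> { "apple", "pear", "plum" };
        var b = new HashSet<string> { "pear", "plum", "fig", "kiwi" };
        Console.WriteLine(CountCommon(a, b)); // 2

        var sortedB = new List<string>(b);
        sortedB.Sort(StringComparer.Ordinal);
        Console.WriteLine(CountCommonSorted(new List<string>(a), sortedB)); // 2
    }
}
```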

How do I parallelise this with or without PLINQ?

I have the following code snippet:
// Initialise rectangular matrix with [][] instead of [,]
double[][] data = new double[m][];
for (int i = 0; i < m; i++)
    data[i] = new double[n];
// Populate data[][] here...
// Code to run in parallel:
for (int i = 0; i < m; i++)
    data[i] = Process(data[i]);
If this makes sense, I have a matrix of doubles. I need to apply a transformation to each individual row of the matrix. It is "embarrassingly parallel", as there is no connection for the data from one row to another.
If I do something like:
data.AsParallel().ForAll(row => { row = Process(row); });
First of all, I don't know whether data.AsParallel() knows to only look at the first subscript, or if it will enumerate all m * n doubles. Secondly, since row is the element I'm enumerating over, I have no idea if I can change it like this - I suspect not.
So, with or without PLINQ, what is a good way to parallelise this loop in C#?
Here are two ways to do it:
data.AsParallel().ForAll(row =>
{
    Process(row);
});
Parallel.For(0, data.Length, rowIndex =>
{
    Process(data[rowIndex]);
});
In both cases, each row is a one-dimensional array of doubles, which is a reference type, so Process receives a reference to the row itself and modifying values inside your Process method will modify the data array.
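Both snippets assume Process changes the row in place. If Process instead returns a new row, as the sequential loop in the question suggests, a Parallel.For over the row index can store the result back, since each iteration writes only to its own slot. The sketch below uses a made-up Process that doubles every value; ParallelRowsDemo and Run are hypothetical names too.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelRowsDemo
{
    // Hypothetical stand-in for the question's Process: returns a NEW row.
    public static double[] Process(double[] row)
    {
        var result = new double[row.Length];
        for (int k = 0; k < row.Length; k++) result[k] = row[k] * 2.0;
        return result;
    }

    public static double[][] Run(int m, int n)
    {
        var data = new double[m][];
        for (int i = 0; i < m; i++)
        {
            data[i] = new double[n];
            for (int k = 0; k < n; k++) data[i][k] = i + k; // sample values
        }

        // Each iteration reads and writes only data[i], so no locking is needed.
        Parallel.For(0, m, i => { data[i] = Process(data[i]); });
        return data;
    }

    public static void Main()
    {
        var data = Run(4, 3);
        Console.WriteLine(data[3][2]); // 10, i.e. (3 + 2) * 2
    }
}
```

Note that data.AsParallel() enumerates the m rows of the jagged array, not the m * n individual doubles.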

Which collection type to use?

I have a scenario where I have a list of classes, and I want to mix up the order. For example:
private List<Question> myQuestions = new List<Question>();
So, given that this is now populated with a set of data, I want to mix up the order. My first thought was to create a collection of integers numbered from 1 to myQuestions.Count, assign one at random to each question and then loop through them in order; however, I can’t seem to find a suitable collection type to use for this. An example of what I mean would be something like this:
for (int i = 0; i <= myQuestions.Count -1; i++)
    tempCollection[i] = myQuestions[rnd.Next(myQuestions.Count-1)];
But I’m not sure what tempCollection should be – it just needs to be a single value that I can remove as I use it. Does anyone have any suggestions as to which type to use, or of a better way to do this?
I suggest you copy the results into a new List<Question> and then shuffle that list.
However, I would use a Fisher-Yates shuffle rather than the one you've given here. There are plenty of C# examples of that on this site.
For example, you might do:
// Don't create a new instance of Random each time. That's a detail
// for another question though.
Random rng = GetAppropriateRandomInstanceForThread();
List<Question> shuffled = new List<Question>(myQuestions);
for (int i = shuffled.Count - 1; i > 0; i--)
{
    // Swap element "i" with a random earlier element (or itself)
    int swapIndex = rng.Next(i + 1);
    Question tmp = shuffled[i];
    shuffled[i] = shuffled[swapIndex];
    shuffled[swapIndex] = tmp;
}
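If you want something you can paste and run, the same Fisher-Yates loop can be wrapped in a generic helper. This is only a sketch: ShuffleDemo and Shuffled are made-up names, strings stand in for the Question class, and the Random instance is passed in by the caller so that it can be reused.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleDemo
{
    // Returns a shuffled copy; the source sequence is left untouched.
    public static List<T> Shuffled<T>(IEnumerable<T> source, Random rng)
    {
        var result = new List<T>(source);
        for (int i = result.Count - 1; i > 0; i--)
        {
            int swapIndex = rng.Next(i + 1); // 0 <= swapIndex <= i
            T tmp = result[i];
            result[i] = result[swapIndex];
            result[swapIndex] = tmp;
        }
        return result;
    }

    public static void Main()
    {
        var questions = new List<string> { "Q1", "Q2", "Q3", "Q4" };
        var shuffled = Shuffled(questions, new Random());
        // Prints the four questions in some random order.
        Console.WriteLine(string.Join(" ", shuffled));
    }
}
```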
You could use Linq and order by a random value:
List<string> items = new List<string>();
items.Add("Foo");
items.Add("Bar");
items.Add("Baz");
foreach (string item in items.OrderBy(c => Guid.NewGuid()))
{
    Console.WriteLine(item);
}
tempCollection should be the same type as myQuestions.
I would also suggest a change in your code:
for (int i = 0; i <= myQuestions.Count -1; i++)
to
for (int i = 0; i < myQuestions.Count; i++)
It does the same thing, but this is how most programmers write it, so it will make your code simpler to read.
