Getting array subsets efficiently - C#

Is there an efficient way to take a subset of a C# array and pass it to another piece of code (without modifying the original array)? I use CUDA.net, which has a function that copies an array to the GPU. I would like to, e.g., pass the function a tenth of the array and thus copy each tenth of the array to the GPU separately (for pipelining purposes).
Copying the array in this way should be as efficient as copying it in one go. It can be done with unsafe code by just referencing the proper memory location, but other than that I'm not sure. The CopyTo function copies the entire array to another array, so it does not appear useful.

Okay, I'd misunderstood the question before.
What you want is System.Buffer.BlockCopy or System.Array.Copy.
The LINQ ways (Skip/Take) will be hideously inefficient. If you're able to reuse the buffer you're copying into, that will also help the efficiency by avoiding creating a new array each time - just copy over the top. Unless you can divide your "big" array up equally, though, you'll need a new one for the last case.
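As a minimal sketch of that pattern (assuming a float array split into tenths; CopyChunkToGpu is a hypothetical stand-in for the CUDA.net upload call):

float[] source = new float[1000000];
int chunkLength = source.Length / 10;
float[] chunk = new float[chunkLength]; // reused for every slice

for (int offset = 0; offset < source.Length; offset += chunkLength)
{
    // The final slice may be shorter if the division isn't exact.
    int count = Math.Min(chunkLength, source.Length - offset);
    // BlockCopy counts in bytes, so scale the offsets by sizeof(float).
    Buffer.BlockCopy(source, offset * sizeof(float), chunk, 0, count * sizeof(float));
    // CopyChunkToGpu(chunk, count); // hypothetical GPU upload for pipelining
}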

I'm not sure how efficient this is but...
int[] myInts = new int[100];
// Code to populate the original array
for (int i = 0; i < myInts.Length; i += 10)
{
    int[] newarray = myInts.Skip(i).Take(10).ToArray();
    // Do stuff with the new array
}

You could try Marshal.Copy if you need to go from an array of bytes to an unmanaged pointer. That avoids writing unsafe code yourself.
Edit: This would clearly only work if you reimplement their API. Sorry - misunderstood. You want an efficient subarray method.
It strikes me that what you really want is an API in the original class of the form
void CopyToGpu(byte[] source, int start, int length);

You could use extension methods and yield return:
public static IEnumerable<T> Part<T>(this T[] array, int startIndex, int endIndex)
{
    for (var currentIndex = startIndex; currentIndex < endIndex; ++currentIndex)
        yield return array[currentIndex];
}
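Note that this streams the elements lazily rather than copying memory, so it avoids the allocation but won't give you the contiguous buffer a bulk GPU upload needs. A quick usage sketch:

foreach (int value in myInts.Part(10, 20))
{
    // Consume the slice one element at a time.
}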

Related

Algorithm for generating a nearly sorted list on predefined data

Note: This is part 1 of a 2-part question.
Part 2 here
I want to learn more about sorting algorithms, and what better way to do that than to code! So I figure I need some data to work with.
My approach to creating some "standard" data will be as follows: create a set number of items; I'm not sure how large to make it, but I want to have fun and make my computer groan a little bit :D
Once I have that list, I'll push it into a text file and just read off that to run my algorithms against. I should have a total of 4 text files filled with the same data but just sorted differently to run my algorithms against (see below).
Correct me if I'm wrong, but I believe I need 4 different types of scenarios to profile my algorithms:
Randomly sorted data (for this I'm going to use the Knuth shuffle - see the sketch after this list)
Reversed data (easy enough)
Nearly sorted (not sure how to implement this)
Few unique (once again, not sure how to approach this)
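For reference, a minimal sketch of the Knuth (Fisher-Yates) shuffle mentioned in the first item:

static void Shuffle<T>(T[] array, Random rng)
{
    // Walk backwards, swapping each element with a randomly chosen
    // earlier element (or itself), which yields a uniform permutation.
    for (int i = array.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1); // 0 <= j <= i
        (array[i], array[j]) = (array[j], array[i]);
    }
}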
This question is for generating a nearly sorted list.
Which approach is best to generate a nearly sorted list on predefined data?
To "shuffle" a sorted list to make it "almost sorted":
Create a list of functions you can think of which you can apply to parts of the array, like:
Negate(array, startIndex, endIndex);
Reverse(array, startIndex, endIndex);
Swap(array, startIndex, endIndex);
For i from zero to some function of the array's length (e.g. Log(array.Length)):
Randomly choose 2 integers*
Randomly choose a function from the functions you thought of
Apply that function to those indices of the array
*Note: The integers should not be constrained to the array size. Rather, choose random integers and "wrap" them around the array -- that way the elements near the ends will have the same chance of being modified as the elements in the middle.
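A minimal sketch of this scheme, using Reverse and Swap as the operations (Negate would change the element values, so it is less useful for sorting test data):

static void Perturb(int[] array, Random rng)
{
    // Run a number of passes proportional to Log(array.Length), as suggested above.
    int passes = Math.Max(1, (int)Math.Log(array.Length));
    for (int pass = 0; pass < passes; pass++)
    {
        // Choose two random integers and wrap them around the array, so
        // elements near the ends move as often as those in the middle.
        int a = rng.Next() % array.Length;
        int b = rng.Next() % array.Length;
        int start = Math.Min(a, b), end = Math.Max(a, b);

        // Randomly choose which operation to apply to that span.
        if (rng.Next(2) == 0)
            Array.Reverse(array, start, end - start + 1);
        else
            (array[start], array[end]) = (array[end], array[start]);
    }
}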
Answering my own question here. All this does is take a sorted list and shuffle up small sections of it.
private static readonly Random _random = new Random();

public static T[] ShuffleBagSort<T>(T[] array, int shuffleSize)
{
    Random r = _random;
    for (int i = 0; i < array.Length; i += shuffleSize)
    {
        // Prevents an index-out-of-bounds while still shuffling the last,
        // partial section; skips it entirely if only one element remains.
        if (i + shuffleSize > array.Length)
        {
            shuffleSize = array.Length - i;
            if (shuffleSize <= 1)
                continue;
        }
        for (int j = i; j < i + shuffleSize; j++)
        {
            // Pick a random element within our small section of the array.
            int k = r.Next(i, i + shuffleSize);
            // Swap.
            T tmp = array[k];
            array[k] = array[j];
            array[j] = tmp;
        }
    }
    return array;
}
Sort the array.
Start sorting it in descending order with bubble sort.
Stop after a few iterations (depending on how 'dis-sorted' you want the array to be).
Add some randomness (each time bubble sort wants to swap two elements, toss a coin and perform the swap or not depending on the result, or use a probability other than 50/50).
This will give you an array which is roughly equally modified across its whole range, preserving most of the order (the beginning will hold the smallest elements, the end the greatest). That's because the changes performed by bubble sort with a random test are rather local. It won't mix the whole array at once so much that it stops resembling the original.
If you want, you can also completely randomly shuffle whole parts of the array (but keep the parts not too big, because you'll completely lose the ordering).
Or you may randomly swap whole sorted parts of the array. That would be an interesting test case, for example:
[1,2,3,4,5,6,7,8] -> [1,2,6,7,8,3,4,5]
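A minimal sketch of the coin-toss bubble sort variant described above, assuming a configurable swap probability:

static void PartiallyDisorder(int[] array, int passes, double swapProbability, Random rng)
{
    for (int p = 0; p < passes; p++)
        for (int i = 0; i < array.Length - 1; i++)
            // Bubble toward descending order, but only when the coin toss says so.
            if (array[i] < array[i + 1] && rng.NextDouble() < swapProbability)
                (array[i], array[i + 1]) = (array[i + 1], array[i]);
}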
Almost-sorted lists are the reason Timsort (Python's built-in sort) is so efficient in the real world: data is typically "almost sorted". There is an article about it explaining the math behind the entropy of data.

C# array of objects, very large, looking for a better way

OK, so in one of my projects I'm trying to remake the way it stores certain variables. I have a simple array of objects. The class that these objects refer to is:
class Blocks
{
    public byte type = Block.Empty;
    byte lastblock = Block.Zero;
}
I plan to add more to it. In the current class, type is the object's current value, and lastblock is what the object used to be.
I create the array like this:
blocks = new Blocks[width * depth * length];
for (int i = 0; i < ((width * length) * depth); i++)
{
    blocks[i] = new Blocks();
}
The problem that I'm having is that when I create an array that's very large (512 x 512 x 512, or 134,217,728 elements for those of you who don't like math), the array gets huge, upwards of 3.5 GB.
The old way this array was created was a little simpler, but much harder to expand: it simply created an array of bytes representing the current block, and seems to only use 2 megs of RAM when it's actually loaded (which I don't get, since 134,217,728 bytes should be around 134 MB... right?). It just baffles me that object references could generate that much more RAM usage.
Am I doing something wrong, or should I just go back to the old way it was done? I would like the object references simply because it means all my variables are in one array rather than in four separate arrays, which seems as though it would be better for the system.
EDIT:
After working through a few different ways of doing this, I found that changing
class Blocks
to
struct Blocks
made a world of difference. Thank you, community, for that welcome tip for future use. Unfortunately I didn't want to just add two bytes in a struct and call it done; that was where I stopped to test my design and wound up with the original problem. After adding anything else to the struct (anything else that's on my list at least, meaning either a player object reference or a player name string), it causes an out-of-memory exception, which means I'm not going to be able to use this system after all.
But the knowledge of how to do it will be quite helpful in the future. For that, thank you again.
Here's what I tried with a structure type instead of a class type:
public struct Block
{
    public byte _current;
    public byte _last;
}

public static void RunSnippet()
{
    Block[] blocks = new Block[512 * 512 * 512];
    for (int i = 0; i < ((512 * 512) * 512); i++)
    {
        blocks[i] = new Block();
    }
}
The snippet ran almost instantly and ate around 267 MB of RAM.
So give struct a try if that's possible.
You can use the List<T> class to manage large numbers of objects. Please take a look at the link that I provided. You can add an unlimited (well, theoretically) number of objects to a list.
Using lists, you can easily access any item by its index. The class also has methods to search, sort and manipulate the objects contained in it.
If you use a list, your code will look somewhat like below -
List<Blocks> blocks = new List<Blocks>();
for (int i = 0; i < ((width * length) * depth); i++) // The number of items you want to add
{
    Blocks b = new Blocks();
    blocks.Add(b);
}
You can access every item in this list as follows -
foreach (Blocks b in blocks)
{
    // You have the object, do whatever you want
}
You can find the index of any particular object contained in the list. See this method example.
So using a list, you will be able to easily manage a large number of objects in a uniform way.
To learn more, go here.
You should consider using "struct" instead of "class".
http://msdn.microsoft.com/en-us/library/ah19swz4(v=vs.71).aspx
"The struct type is suitable for representing lightweight objects such as Point, Rectangle, and Color. Although it is possible to represent a point as a class, a struct is more efficient in some scenarios. For example, if you declare an array of 1000 Point objects, you will allocate additional memory for referencing each object. In this case, the struct is less expensive."
Please publish your results if you try it.
When you create the array, you also instantiate a Blocks in each cell. Do you really need to do that?
blocks[i] = new Blocks();
When you don't instantiate a Blocks, you just have an array of null references. In the access code you could check for null and return a default value. Something along these lines:
if (blocks[i] == null) return new Blocks();
else return blocks[i];
When writing, also check whether there is one and, if not, create it first. That should save a lot of memory.
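A minimal sketch of that write path, reusing the question's one-dimensional blocks array (newType is a placeholder value):

if (blocks[i] == null)
    blocks[i] = new Blocks(); // create the cell on first write only
blocks[i].type = newType;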
Using jagged arrays or nested lists should also help a lot.
Regards GJ
Structs are the way forward here, and they would open up the possibility of optimising with unsafe code/pointer arithmetic:
struct Block
{
    public byte Current;
    public byte Last;
}

Block[] blocks = new Block[512 * 512 * 512];
unsafe
{
    // Pin the array so the GC can't move it while we hold a raw pointer.
    fixed (Block* start = blocks)
    {
        Block* currentBlock = start;
        for (int i = 0; i < (512 * 512) * 512; i++)
        {
            currentBlock->Current = 0xff;
            currentBlock->Last = 0x00;
            currentBlock++;
        }
    }
}
Of course someone will come along and say mutable structs are evil! (Only if you don't know how to use them.)
Read this: Object Overhead: The Hidden .NET Memory Allocation Cost.
The total cost of one of your objects is about 16 bytes on a 32-bit system (8 bytes for the object header, 4 bytes for your fields, 4 for the reference), and 512 * 512 * 512 * 16 = 2.1 GB :-)
But you are probably on a 64-bit system, so it's 16 + 8 + 8 = 32 bytes per object, or 4.29 GB.

C# linked lists

Very basic question, but is there any ToArray-like function for C# linked lists that would return an array of only part of the elements in the linked list?
E.g.: let's say my list has 50 items and I need an array of only the first 20. I really want to avoid for loops.
Thanks,
PM
Use LINQ?
myLinkedList.Take(20).ToArray()
or
myLinkedList.Skip(5).Take(20).ToArray()
You say you "really want to avoid for loops" - why?
If you're using .NET 3.5 (or have LINQBridge), it's really easy:
var array = list.Take(20).ToArray();
... but obviously that will have to loop internally.
Note that this will create a smaller array if the original linked list has fewer than 20 elements. It's unclear whether or not that's what you want.
Something is going to have to loop internally, sooner or later - it's not like there's going to be a dedicated CPU instruction for "navigate this linked list and copy a fixed number of pointers into a new array". So the question is really whether you do it yourself or let a library method do it.
If you can't use LINQ, it's pretty easy to write the equivalent code yourself:
int size = Math.Min(list.Count, 20);
MyType[] array = new MyType[size];
var node = list.First;
for (int i = 0; i < size; i++)
{
    array[i] = node.Value;
    node = node.Next;
}
That will actually be slightly more efficient than the LINQ approach, too, because it creates the array to be exactly the right size to start with. Yes, it uses a loop - but as I say, something's got to.
If you're using the LinkedList collection class (from System.Collections.Generic), you can use LINQ to get it:
var myArray = list.Take(20).ToArray();

Is it correct to use Array.CopyTo to copy elements or should a for-loop always be used?

It's easier to write
intArray1.CopyTo(intArray2, 0)
than the for-loop equivalent, but System.Array does not provide any generic Copy/CopyTo methods.
Is it better to write the for-loop? Or is using Copy/CopyTo compiled or JIT'd efficiently enough?
Array.Copy/CopyTo will perform faster than a manual loop in most cases, as they can do direct memory copying.
If you don't have huge arrays or speed is not an issue, use whatever would look best in your code where you need to copy the items.
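For the simple full-copy case, a minimal sketch of both forms (array names are illustrative):

int[] source = { 1, 2, 3, 4, 5 };
int[] dest = new int[source.Length];
Array.Copy(source, dest, source.Length); // static form; overloads can copy sub-ranges
source.CopyTo(dest, 0);                  // instance form; copies the whole array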
If you are copying an array of primitive types, as your sample would imply, you can use the memory-copy technique yourself via the Buffer class's BlockCopy method:
int[] CopyArray(int[] A, int index)
{
    const int INT_SIZE = 4;
    int length = A.Length - index;
    int[] B = new int[length];
    Buffer.BlockCopy(A, index * INT_SIZE, B, 0 * INT_SIZE, length * INT_SIZE);
    return B;
}
This method is the most efficient way to copy an array of primitives. (It only works with primitives.)
I say if you know that you want to copy the entirety of the first array to the second array without changing the values or doing any specific processing on the copy, then use Array.CopyTo.
There are some limitations to this. The array must only have a single dimension, as I remember it. Also, if the arrays are quite large you might have some speed-related issues with CopyTo, but I would imagine that would only come into play with very large arrays. So I would try it and test it, but your mileage may vary.

Quickest way to determine if a 2D array contains an element?

Let's assume that I've got a 2D array like:
int[,] my_array = new int[100, 100];
The array is filled with ints. What would be the quickest way to check if a target value is contained within the array?
(* this is not homework, I'm trying to come up with the most efficient solution for this case)
If the array isn't sorted in some fashion, I don't see how anything would be faster than checking every single value using two nested for loops. If it is sorted, you can use a binary search.
Edit:
If you need to do this repeatedly, your approach will depend on the data. If the integers within this array only range up to 256, you can have a boolean array of that length and go through the values in your data, flipping the bits in the boolean array. If the integers can range higher, you can use a HashSet. The first call to your contains function would be a little slow because it would have to index the data, but subsequent calls would be O(1).
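A minimal sketch of the boolean-array variant, assuming the values fall in 0..255:

bool[] seen = new bool[256];
foreach (int value in my_array) // 2D arrays enumerate element by element
    seen[value] = true;         // one O(n) indexing pass
bool has42 = seen[42];          // O(1) per lookup afterwards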
Edit1:
This will index the data on the first run; benchmarking found that Contains takes 0 milliseconds after the first call, and 13 milliseconds to index. If I had more time I might multithread it and have it return the result while asynchronously continuing the indexing on the first call. Also, since arrays are reference types, changing the array's values before or after it has been indexed will produce strange results, so this is just a sample and should be refactored prior to use.
private class DataContainer
{
    private readonly int[,] _data;
    private HashSet<int> _index;

    public DataContainer(int[,] data)
    {
        _data = data;
    }

    public bool Contains(int value)
    {
        if (_index == null)
        {
            _index = new HashSet<int>();
            for (int i = 0; i < _data.GetLength(0); i++)
            {
                for (int j = 0; j < _data.GetLength(1); j++)
                {
                    _index.Add(_data[i, j]);
                }
            }
        }
        return _index.Contains(value);
    }
}
Assumptions:
There is no kind of ordering in the arrays we can take advantage of
You are going to check for existence in the array several times
I think some kind of index might work nicely if you want a yes/no answer for whether a given number is in the array. A hash table could be used for this, giving you amortized O(1) lookups.
Also don't forget that, realistically, for small MxN array sizes it might actually be faster just to do a linear O(n) scan.
Create a hash out of the 2D array, where
1 --> 1st row
2 --> 2nd row
...
n --> nth row
O(n) to check the presence of a given element, assuming each hash check is O(1).
This data structure gives you an opportunity to preserve your 2D array.
upd: ignore the above, it does not add any value. See comments.
You could encapsulate the data itself and keep a Dictionary alongside it that gets modified as the data gets modified.
The key of the Dictionary would be the element value, and the value would be the number of occurrences of that element. To test whether an element exists, simply check the dictionary for a count > 0, which is an O(1) lookup on average. You could also get other statistics on the data much more quickly with this construct, particularly if the data is sparse.
The biggest drawback to this solution is that data modifications involve more operations (still O(1), though), so if you're mostly doing data manipulation, this might not be suitable.
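A minimal sketch of that wrapper (the class and member names are illustrative):

using System.Collections.Generic;

class CountedGrid
{
    private readonly int[,] _data;
    private readonly Dictionary<int, int> _counts = new Dictionary<int, int>();

    public CountedGrid(int[,] data)
    {
        _data = data;
        foreach (int v in data) // 2D arrays enumerate element by element
            _counts[v] = _counts.TryGetValue(v, out int n) ? n + 1 : 1;
    }

    public bool Contains(int value) =>
        _counts.TryGetValue(value, out int n) && n > 0;

    public void Set(int row, int col, int value)
    {
        _counts[_data[row, col]]--; // keep the index in sync on writes
        _data[row, col] = value;
        _counts[value] = _counts.TryGetValue(value, out int n) ? n + 1 : 1;
    }
}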
