break up array into little arrays - c#

I am sending out email to a list of people. I have the list of recipients in an array, but the list can grow to 500 people. My mail server limits the number of recipients it will send to at once (50 recipients),
so if the list is > 50 I need to break it up into separate mails.
What is the best way to take one array and break it up into arrays of 50?
For example:
If the array is 120 long, I would expect 3 arrays returned: one with 50, another with 50 and a third with 20.

You could use the Batch operation from MoreLINQ:
Person[] array = ...;
var arrays = array.Batch(50).Select(x => x.ToArray());
foreach (Person[] shorterArray in arrays)
{
...
}
(If you're happy with IEnumerable<Person> instead of arrays, you don't need the Select call of course.)

Maybe ArraySegment<T> works for you? You'd have to split it up manually though, but this is not hard in a loop.
int recipient = 0;
while (recipient < recipients.Length) { // recipients is assumed to be a string[]
ArraySegment<string> recipientSegment = new ArraySegment<string>(recipients, recipient, Math.Min(50, recipients.Length - recipient));
// build your message here, using the recipientSegment for the names
recipient += 50;
}

I would simply iterate over the complete array, building up the recipients string, then sending out an email when the limit is reached, then resetting the string and continuing on with the iteration until the next limit event or until the end of the array is reached.
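A minimal sketch of that accumulate-and-send loop, under some assumptions: the `send` callback stands in for whatever actually sends the mail, and the comma separator and the `BatchedMailer` name are purely illustrative.

```csharp
using System;
using System.Collections.Generic;

static class BatchedMailer
{
    // Walks the recipients once and calls `send` (a hypothetical callback)
    // with a comma-separated string each time `limit` addresses are collected.
    public static void SendInBatches(string[] recipients, int limit, Action<string> send)
    {
        var current = new List<string>(limit);
        foreach (string recipient in recipients)
        {
            current.Add(recipient);
            if (current.Count == limit)
            {
                send(string.Join(",", current)); // limit reached: send, then reset
                current.Clear();
            }
        }
        if (current.Count > 0)
        {
            send(string.Join(",", current)); // leftover partial batch at the end
        }
    }
}
```

With 120 recipients and a limit of 50, `send` would be called three times, the last time with 20 addresses.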

If you can use LINQ then you may find this useful: Linq: How to group by maximum number of items

Shouldn't LINQ be the right stuff for this?

A common method for "paging" results from a set is to combine the Skip and Take methods provided by LINQ. This solution is great because it can be further combined with other LINQ methods to implement filtering, ordering, etc. as needed.
I'm not sure what the performance considerations are for your application, so keep in mind that this may not perform very well for sets where the number of pages is relatively large (i.e., batch size is significantly smaller than the total size of the set), but it's at least fairly straightforward for anyone familiar with this style of coding.
Here's an example of what this implementation might look like:
List<EmailAddress> list = new List<EmailAddress>();
const int BATCH_SIZE = 50;
for (int i = 0; i < list.Count; i += BATCH_SIZE)
{
IEnumerable<EmailAddress> currentBatch =
list.Skip(i).Take(BATCH_SIZE);
// do stuff...
}
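If the repeated Skip calls turn out to matter, one alternative is to batch in a single pass with an iterator. The following is only a sketch: `InBatchesOf` is a made-up name, not a framework or MoreLINQ method.

```csharp
using System;
using System.Collections.Generic;

static class BatchingExtensions
{
    // Hypothetical helper: yields arrays of up to `size` items while
    // enumerating the source only once.
    public static IEnumerable<T[]> InBatchesOf<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var bucket = new List<T>(size);
        foreach (T item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket.ToArray(); // full batch
                bucket.Clear();
            }
        }
        if (bucket.Count > 0)
            yield return bucket.ToArray(); // final, shorter batch
    }
}
```

Usage would look like `foreach (EmailAddress[] batch in list.InBatchesOf(50)) { ... }`. On .NET 6 or later, the built-in Enumerable.Chunk does much the same job.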

Related

Growing and shrinking a List<int> vs a big sized bool Array using the index as value

I'm unable to determine whether having a growing and shrinking List vs using a big bool Array will be more efficient for my application.
To expand on this comparison and the actual situation, here are examples of each option I believe I have:
Option 1 (List):
public List<int> list = new List<int>();
while (true) { // game loop
list.Add(Random.Range(0, 300));
list.Add(Random.Range(0, 300));
... // maximum of 10 of these can happen
if (list.Contains(42)) { // roughly 10 - 50 of these checks can be true
list.Remove(42);
}
}
Option 2 (Array):
bool[] arr = new bool[300];
while (true) { // game loop
arr[Random.Range(0, 300)] = true;
arr[Random.Range(0, 300)] = true;
... // maximum of 10 of these can happen
for (int i = 0; i < 300; i++) {
if (arr[i]) { // roughly 10 - 50 of these checks can be true
arr[i] = false;
}
}
}
So essentially my question is:
At what point do too many .Contains checks become more expensive than a for loop over each possible element (based on my ranges)?
IMPORTANT
This is not a List vs Array question. The datatypes are important because of the condition checks. So it is specifically an integer list vs bool array comparison since these two options can give me the same results.
I would say the array implementation would be much faster. List.Add(T) may have to resize the internal array and List.Remove(T) has to shift the remaining elements. On top of that, if you check the List implementation code, you will notice that List.Contains(T) and List.Remove(T) both do a linear search through the list (Remove uses IndexOf(T)). In your example you call List.Contains(T) and List.Remove(T) around 10-50 times. In the best case that costs you 20 (Contains + Remove), but in the worst case it costs (N * 50) + N, where N is the number of items in your list.
From this I would conclude that the bigger your list grows, the worse the performance gets.
If you care more about performance, it may be worth taking a look at the HashSet data structure. It has much better lookup and remove performance than a List.
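As a rough sketch of the HashSet idea (the class and method names are illustrative only): `HashSet<int>.Remove` already reports whether the value was present, so the separate Contains check from the question is not needed.

```csharp
using System.Collections.Generic;

static class HashSetSketch
{
    // Adds the values, then removes `target` if it is present.
    // Returns true when `target` was found and removed.
    public static bool AddThenConsume(HashSet<int> set, IEnumerable<int> values, int target)
    {
        foreach (int v in values)
            set.Add(v);            // duplicates are ignored, like setting arr[v] = true twice
        return set.Remove(target); // Remove reports whether the item existed
    }
}
```

Note that a set stores each value at most once, which matches the bool array option but not a List<int> that may contain the same number several times.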
Here's an interesting writeup by Jon Skeet comparing for, foreach, IEnumerable foreach and Enumerable.Sum on both arrays and lists:
https://codeblog.jonskeet.uk/2009/01/29/for-vs-foreach-on-arrays-and-lists/
As per the article, the performance goes like this:
============ int[] ============
For 1.00
ForHoistLength 2.03
ForEach 1.36
IEnumerableForEach 15.22
Enumerable.Sum 15.73
============ List<int> ============
For 2.82
ForHoistLength 3.49
ForEach 4.78
IEnumerableForEach 25.71
Enumerable.Sum 26.03
By these numbers, a for loop over an int array is about 2.8 times faster than one over a List<int>. If you know the size of the collection and it is fixed, go with an array; otherwise use a List.
Here is another link: Performance of Arrays vs. Lists
Also, stay away from LINQ for large data and go with for/foreach loops.

IEnumerable<string> and string[]

Is there any advantage to using this
private static IEnumerable<string> GetColumnNames(this IDataReader reader)
{
for (int i = 0; i < reader.FieldCount; i++)
yield return reader.GetName(i);
}
instead of this
private static string[] GetColumnNames(this IDataReader reader)
{
var columnNames = new string[reader.FieldCount];
for (int i = 0; i < reader.FieldCount; i++)
columnNames[i] = reader.GetName(i);
return columnNames;
}
Here is how I use this method
int orderId = _noOrdinal;
IEnumerable<string> columnNames = reader.GetColumnNames();
if (columnNames.Contains("OrderId"))
orderId = reader.GetOrdinal("OrderId");
while (reader.Read())
yield return new BEContractB2CService
{
//..................
Order = new BEOrder
{ Id = orderId == _noOrdinal ?
Guid.Empty : reader.GetGuid(orderId)
},
//............................
The two approaches are quite different so it depends on what you are subsequently going to do with the result I would say.
For example:
The first case requires the data reader to remain open until the result is read, the second doesn't. So how long are you going to hold this result for and do you want to leave the data reader open that long.
The first case is less performant if you are definitely going to read the data, but probably more performant if you often don't, particularly if there is a lot of data.
The result from your first case should only be read/iterated/searched once. The second case can be stored and searched multiple times.
If you have a large amount of data then the first case could be used in such a way that you don't need to bring all that data in to memory in one go. But again that really depends on what you do with the IEnumerable in the calling method.
Edit:
Given your use-case the methods are probably pretty much equivalent for any given measure of 'good-ness'. Tables don't tend to have many columns, and your use of .Contains ensures the data will be read every time. Personally I would stick with the array method here if only because it's a more straightforward approach.
What's the next line of the code... is it looking for a different column name? If so the second case is the only way to go.
One reason off the top of my head: the array version means you have to spend time building the array first. Most of the code's clients may not need an array specifically. I've found that most code is just going to iterate over the result, in which case why waste time building an array (or a list as an alternative) you never actually need?
The first one is lazy. That is, your code is not evaluated until you iterate the enumerable. Because the method is an iterator block (it uses yield return), it runs until it yields a value, then hands control back to the calling code until you ask for the next value via MoveNext. Additionally, with LINQ you can get the second behaviour by calling the first and then calling ToArray. The reason you might want to do this is to make sure you get the data as it is when you make the call rather than as it is when you iterate, in case the values change in between.
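To make the deferred execution visible, here is a small sketch that uses a plain list instead of an IDataReader. The `LazyDemo` class and its `Calls` counter are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    public static int Calls; // counts how many items the iterator body has produced

    // Iterator block: nothing in here runs until the result is enumerated.
    public static IEnumerable<string> Names(IList<string> source)
    {
        for (int i = 0; i < source.Count; i++)
        {
            Calls++;
            yield return source[i];
        }
    }
}
```

Calling `LazyDemo.Names(source)` on its own leaves `Calls` at 0. Appending `.ToArray()` runs the loop immediately and takes a snapshot. An enumerable that has not been iterated yet will see items added to `source` afterwards, while the array will not.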
One advantage has to do with memory consumption. If FieldCount is say 1 million, then the latter needs to allocate an array with 1 million entries, while the former does not.
This benefit depends on how the method is consumed though. For example, if you are processing a list of files one-by-one, then there is no need to know all the files up front.

How do I insert an int into a sorted array quickly?

I'd like to insert an int into a sorted array. This operation is going to be performed very often, so it needs to be as fast as possible.
It is possible and even preferred to use a List or any other class instead of an array
All values are in the 1 to 34 range
The array typically contains exactly 14 values
I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide. Also, I felt like I missed an idea. Do you have experiences on this topic or any new ideas to consider?
I would use an int array of length 35 (because you said the range is 1-34) to record how many times each number occurs.
int[] status = Enumerable.Repeat(0, 35).ToArray();
// an array containing 35 zeros,
// which means there are currently no elements in the list
status[10] = 1; // now the list has only one number: 10
status[11]++; // a new number 11 is added to the list
So if you want to add a number i to the list:
status[i]++; // O(1) to add a number
To remove an i from the list:
status[i]--; // O(1) to remove a number
Want to know all the numbers in the list?
for (int i = 0; i < status.Length; i++)
{
if (status[i] > 0)
{
for (int j = 0; j < status[i]; j++)
Console.WriteLine(i);
}
}
// or, more simply, using LINQ
var result = status.SelectMany((i, index) => Enumerable.Repeat(index, i));
The following example may help you understand my code better:
the real number array: 1 12 12 15 9 34 // i don't care if it's sorted
the status array: status[1]=1,status[12]=2,status[15]=1,status[9]=1,status[34]=1
all others are 0
At 14 values this is a pretty small array. I don't think switching to a smarter data structure such as a list will win you much, especially if you need fast random access. Even binary search may actually be slower than linear search at this scale. Are you sure that, say, insert-on-copy does not satisfy your performance requirements?
"This operation is going to be performed very often, so it needs to be as fast as possible."
The things that you notice happen "very often" are frequently not the bottlenecks in the program - it's often surprising what the actual bottlenecks are. You should code something simple and measure the actual performance of your program before performing any optimizations.
"I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide."
Assuming that this is the bottleneck, the big-O performance of the different methods is not going to be relevant here because of the small size of your array. It is easier to just try a few different approaches, measure the results, see which performs best and choose that method. If you have followed the advice from the first paragraph you already have a profiler setup that you can use for this step too.
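One candidate worth including in such a measurement is a plain List<int> with a binary search for the insertion point. This is a sketch only (the `SortedInsert` name is made up), and nothing here claims it is the fastest option for 14 elements.

```csharp
using System.Collections.Generic;

static class SortedInsert
{
    // Inserts `value` into an already sorted list, keeping it sorted.
    public static void InsertSorted(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value);
        if (index < 0)
            index = ~index;          // not found: the bitwise complement is the insertion point
        sorted.Insert(index, value); // shifts the tail of the list by one slot
    }
}
```

Duplicates are kept: if the value is already present, the new copy is inserted next to an existing one.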
For inserting into the middle, a LinkedList<int> would be the fastest option - anything else involves copying data. At 14 elements, don't stress over binary search etc - just walk forwards to the item you want:
using System;
using System.Collections.Generic;
static class Program
{
static void Main()
{
LinkedList<int> data = new LinkedList<int>();
Random rand = new Random(12345);
for (int i = 0; i < 20; i++)
{
data.InsertSortedValue(rand.Next(300));
}
foreach (int i in data) Console.WriteLine(i);
}
}
static class LinkedListExtensions {
public static void InsertSortedValue(this LinkedList<int> list, int value)
{
LinkedListNode<int> node = list.First, next;
if (node == null || node.Value > value)
{
list.AddFirst(value);
}
else
{
while ((next = node.Next) != null && next.Value < value)
node = next;
list.AddAfter(node, value);
}
}
}
Doing the brute-force approach is the best decision here because 14 isn't a number :). However, this does not scale: should 14 become 14000 one day, it will cause problems.
What is the most common operation with your array?
Insert? Read?
A heap data structure will give you O(log(14)) for both of them. SortedDictionary may hurt your performance.
Using a simple array will give you O(1) for reading and O(14) for insert.
By the way, have you tried System.Collections.Generic.SortedDictionary or System.Collections.Generic.SortedList?
If you're on .NET 4 you should take a look at SortedSet<T>. Otherwise take a look at SortedDictionary<TKey, TValue>, where you make TValue object and just put null into it, because you're only interested in the keys.
If there are no repeated values in the array and the possible values won't change, maybe a fixed-size array where the value is equal to the index is a good choice.
Both insert and read are O(1).
You have a range of possible values from 1-34, which is rather narrow. So the fastest way would likely be using an array with 34 slots. To insert a number n just do array[n-1]++, and to remove it do array[n-1]-- (if array[n-1] > 0).
To check if a value exists in your collection you do array[n-1]>0.
edit: Damn...Danny was faster. :)
Write a method that takes an array of integers and sorts them in place using Bubble Sort. The method is not allowed to create any additional arrays. Bubble Sort is a simple sorting algorithm that works by looping through the array to be sorted, comparing each pair of adjacent elements and swapping them if they are in the wrong order.
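One possible answer, as a minimal sketch: the `Sorting` class name is arbitrary, and the early exit when a pass makes no swaps is a common refinement rather than part of the exercise statement.

```csharp
static class Sorting
{
    // Sorts `values` in place in ascending order; allocates no additional arrays.
    public static void BubbleSort(int[] values)
    {
        for (int end = values.Length - 1; end > 0; end--)
        {
            bool swapped = false;
            for (int i = 0; i < end; i++)
            {
                if (values[i] > values[i + 1])
                {
                    // adjacent pair is in the wrong order: swap it
                    int tmp = values[i];
                    values[i] = values[i + 1];
                    values[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break; // a full pass without swaps means the array is sorted
        }
    }
}
```

After each outer pass the largest remaining element has moved to position `end`, so the inner loop can stop one element earlier each time.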

C# linked lists

Very basic question, but is there any ToArray-like function for C# linked lists that would return an array of only part of the elements in the linked list?
E.g.: let's say my list has 50 items and I need an array of only the first 20. I really want to avoid for loops.
Thanks,
PM
Use Linq?
myLinkedList.Take(20).ToArray()
or
myLinkedList.Skip(5).Take(20).ToArray()
You say you "really want to avoid for loops" - why?
If you're using .NET 3.5 (or have LINQBridge), it's really easy:
var array = list.Take(20).ToArray();
... but obviously that will have to loop internally.
Note that this will create a smaller array if the original linked list has fewer than 20 elements. It's unclear whether or not that's what you want.
Something is going to have to loop internally, sooner or later - it's not like there's going to be a dedicated CPU instruction for "navigate this linked list and copy a fixed number of pointers into a new array". So the question is really whether you do it or a library method.
If you can't use LINQ, it's pretty easy to write the equivalent code yourself:
int size = Math.Min(list.Count, 20);
MyType[] array = new MyType[size];
var node = list.First;
for (int i = 0; i < size; i++)
{
array[i] = node.Value;
node = node.Next;
}
That will actually be slightly more efficient than the LINQ approach, too, because it creates the array to be exactly the right size to start with. Yes, it uses a loop - but as I say, something's got to.
If you're using the LinkedList collection class (from System.Collections.Generic), you can use LINQ to get it:
var myArray = list.Take(20).ToArray();

Array Question. Any ideas on how to solve my problem?

I have around 1000 array elements. The elements consist of 1 user ID per element.
What I would like to do is shorten this array so each element contains 10 user ID's per element and each user ID per element is delimited by a comma.
Current array:
324234
2342234
0983242
....
New Array:
324234,2342342,234234234,234234,5436436,457456,456456,234234,234234,546456436
34234,23423426,54645654,34532423,23423432,4634634,2342342,234234,264353,345345
....
You may be asking WHY DO THAT?!
Well, I am sending the ID's to a post request. The request accepts either 1 ID or more. To shorten the number of requests I would like to send along 10 ID's per request.
Any ideas on how to group the old array into the new shorter array?
Then why don't you send them all at once as a CSV?
Later edit:
Note: code not tested
int amountPerCall = 10;
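List<string> ids = new List<string>();
// add ids...
string[] idArray = ids.ToArray(); // this String.Join overload needs a string[]
for (int i = 0; i < idArray.Length; i += amountPerCall) {
int count = Math.Min(amountPerCall, idArray.Length - i); // the last batch may be shorter
Send(String.Join(",", idArray, i, count));
}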
You could use MoreLINQ's Batch method:
IEnumerable<string> batches = input.Batch(10, ids => string.Join(",", ids));
If you don't want to use MoreLINQ itself, you can just look at the code to get an idea for what it does.
(Depending on which version of .NET you're using, you may need to convert the input to string.Join into an array. .NET 4 has introduced some helpful new overloads to that method.)
