I have this code below:
InstanceCollection instances = this.MyService(typeID, referencesIDs);
My problem is that when referencesIDs.Count() is greater than a certain number, the call throws a SQL-related error.
It was suggested to me to call this.MyService multiple times so that it doesn't have to process too many referencesIDs at once.
What is the way to do that? I am thinking of using a while loop like this:
while (referencesIDs.Count() != maxCount)
{
newReferencesIDs = referencesIDs.Take(500).ToArray();
instances = this.MyService(typeID, newReferencesIDs);
maxCount += newReferencesIDs.Count();
}
The problem I can see here is: how can I remove the first 500 referencesIDs from referencesIDs after each call? If I don't remove the first 500 after the first loop, Take(500) will keep returning the same referencesIDs.
Are you just looking to update the referencesIDs value? Something like this?:
referencesIDs = referencesIDs.Skip(500);
Then the next time you call .Take(500) on referencesIDs it'll get the next 500 values.
Alternatively, without updating the referencesIDs variable, you can include the Skip in your loop. Something like this:
var pageSize = 500;
var skipCount = 0;
while(...)
{
newReferencesIDs = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
skipCount += pageSize;
...
}
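For completeness, here is a minimal sketch of what the full paging loop could look like, assuming InstanceCollection is enumerable so results can be accumulated; allInstances and the Instance element type are illustrative names, not from your code:

const int pageSize = 500;
var allInstances = new List<Instance>(); // hypothetical element type of InstanceCollection
int skipCount = 0;

while (skipCount < referencesIDs.Count())
{
    // take the next page of IDs and pass only that page to the service
    var page = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
    allInstances.AddRange(this.MyService(typeID, page)); // assumes the returned collection is enumerable
    skipCount += pageSize;
}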
My first choice would be to fix the service, if you have access to it. A SQL-specific error could be a result of an incomplete database configuration, or a poorly written SQL query on the server. For example, Oracle limits IN lists in SQL queries to about 1000 items by default, but your Oracle DBA should be able to re-configure this limit for you. Alternatively, server side programmers could rewrite their query to avoid hitting this limit in the first place.
If this does not work, you could split your list into blocks of a maximum size that does not trigger the error, make multiple calls to the server, and combine the instances on your end, like this:
InstanceCollection instances = referencesIDs
.Select((id, index) => new {Id = id, Index = index})
.GroupBy(p => p.Index / 500) // 500 is the max number of IDs
.SelectMany(g => this.MyService(typeID, g.Select(item => item.Id).ToArray()))
.ToList();
If you want a general way of splitting lists into chunks, you can use something like:
/// <summary>
/// Split a source IEnumerable into smaller (more manageable) lists.
/// </summary>
public static IEnumerable<IList<TSource>>
SplitIntoChunks<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
long i = 1;
var list = new List<TSource>();
foreach (var t in source)
{
list.Add(t);
if (i++ % chunkSize == 0)
{
yield return list;
list = new List<TSource>();
}
}
if (list.Count > 0)
yield return list;
}
And then you can use SelectMany to flatten results:
InstanceCollection instances = referencesIDs
.SplitIntoChunks(500)
.SelectMany(chunk => MyService(typeID, chunk.ToArray()))
.ToList();
I need to return the 3 latest elements in a collection. If I use LINQ, e.g. .OrderByDescending(a => a.Year).Take(3), this is fine as long as the collection contains at least 3 elements. What I want is for it to always return 3, so for example if there are only 2 items then the last item would be a blank/initialised element (ideally one where I could configure what is returned).
Is this possible?
You can concatenate the sequence with another (lazily created) sequence of 3 elements:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Range(0, 3).Select(_ => new ResultElement()))
.Take(3);
Or perhaps:
var result = query
.OrderByDescending(a => a.Year)
.Concat(Enumerable.Repeat(new ResultElement(), 3))
.Take(3);
(The latter will end up with duplicate references and will always create an empty element, so I'd probably recommend the former... but it depends on the context. You might want to use Enumerable.Repeat(null, 3) and handle null elements instead.)
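For example, a sketch of that null-padding variant, assuming the elements of query are of type ResultElement (a reference type):

var padded = query
    .OrderByDescending(a => a.Year)
    .Concat(Enumerable.Repeat((ResultElement)null, 3)) // pad with nulls rather than one shared instance
    .Take(3)
    .Select(r => r ?? new ResultElement());            // create a fresh blank only where one is needed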
You could write your own extension method:
public static IEnumerable<T> TakeAndCreate<T>(this IEnumerable<T> input, int amount, Func<T> defaultElement)
{
int counter = 0;
foreach(T element in input.Take(amount))
{
yield return element;
counter++;
}
for(int i = 0; i < amount - counter; i++)
{
yield return defaultElement.Invoke();
}
}
Usage is
var result = input.OrderByDescending(a => a.Year).TakeAndCreate(3, () => new ResultElement());
One advantage of this solution is that it will create new elements only if they are actually needed, which might be good for performance if you have a lot of elements to be created or their creation is not trivial.
Online demo: https://dotnetfiddle.net/HHexGd
I am attempting to create batches of 100 records I can delete from Azure Table Storage. I found a great article on efficiently creating batches to delete table records here: https://blog.bitscry.com/2019/03/25/efficiently-deleting-rows-from-azure-table-storage/
and have followed it along. The issue I am facing that is different from the example in this blog post is that my deletes will have different partition keys. So rather than simply splitting my results into batches of 100 (as it does in the example), I first need to split them into groups of like partition keys, and THEN examine those lists and further sub-divide them if the count is greater than 100 (as Azure recommends only batches of 100 records at a time, and they all require the same partition key).
Let me say I am TERRIBLE with enumerable LINQ and the non-query style described in this blog post, so I'm a bit lost. I have written a small workaround that does create these batches by partition key, and the code works to delete them; I just am not handling the possibility that there may be more than 100 rows to delete for a given partition key. So the code below is just used as an example to show you how I approached splitting the updates by partition key.
List<string> partitionKeys = toDeleteEntities.Select(x => x.PartitionKey).Distinct().ToList();
List<List<DynamicTableEntity>> chunks = new List<List<DynamicTableEntity>>();
for (int i = 0; i < partitionKeys.Count; ++i)
{
var count = toDeleteEntities.Where(x => x.PartitionKey == partitionKeys[i]).Count();
// still need to figure out how to split into groups of 100
chunks.Add(toDeleteEntities.Distinct().Where(x=>x.PartitionKey == partitionKeys[i]).ToList());
}
I have tried to do multiple GroupBy statements in a LINQ expression similar to this:
// Split into chunks of 100 for batching
List<List<TableEntity>> rowsChunked = tableQueryResult.Result.Select((x, index) => new { Index = index, Value = x })
.Where(x => x.Value != null)
.GroupBy(x => x.Index / 100)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
but once I add a second parameter to group by (e.g. x => x.PartitionKey), the Select below starts to go pear-shaped. The end result should be a LIST of LISTS that contain DynamicTableEntity objects, indexed like this:
[0]
[0][Entity]
[1][Entity]
...
[99][Entity]
[1]
[0][Entity]
[1][Entity]
...
I hope this makes sense; if not, please feel free to ask for clarification.
Thanks in advance.
EDIT FOR CLARIFICATION:
The idea is simply that I want to group by PARTITION KEY and only take 100 rows before creating another batch with the SAME partition key for the rest of the rows.
Thanks,
One useful LINQ method you can use is GroupBy(keySelector). It basically divides your collection into groups based on a selector. So in your case, you'd probably want to group by PartitionKey:
var partitionGroups = toDeleteEntities.GroupBy(d => d.PartitionKey);
When you iterate through this collection, you'll get an IGrouping. Finally, to get the correct batch, you can use Skip(int count) and Take(int count)
const int maxBatchSize = 100; // Azure Table Storage batches are limited to 100 operations
foreach (var partitionGroup in partitionGroups)
{
    var partitionKey = partitionGroup.Key;
    int startPosition = 0;
    int count = partitionGroup.Count();
    while (count > 0)
    {
        // take the remainder first, then whole batches of maxBatchSize
        int batchSize = count % maxBatchSize > 0 ? count % maxBatchSize : maxBatchSize;
        var partitionBatch = partitionGroup.Skip(startPosition).Take(batchSize);
        // process your batches here
        chunks.Add(new List<DynamicTableEntity>(partitionBatch));
        startPosition += batchSize;
        count -= batchSize;
    }
}
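If you later want to collapse this into a single LINQ expression, one way is to group by partition key and then sub-divide each partition by a running index. A sketch, assuming toDeleteEntities is an IEnumerable<DynamicTableEntity> and 100 is your maximum batch size:

List<List<DynamicTableEntity>> chunks = toDeleteEntities
    .GroupBy(e => e.PartitionKey)
    .SelectMany(g => g
        .Select((entity, index) => new { entity, index })
        .GroupBy(x => x.index / 100)                      // sub-divide each partition into blocks of 100
        .Select(block => block.Select(x => x.entity).ToList()))
    .ToList();

Each inner list then shares a single partition key and contains at most 100 entities.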
I would like to do something like this (below), but I'm not sure if there is a formal/optimized syntax to do so:
.OrderBy(i => i.Value1)
.Take("Bottom 100 & Top 100")
.OrderBy(i => i.Value2);
basically, I want to sort by one variable, then take the top 100 and bottom 100, and then sort those results by another variable.
Any suggestions?
var sorted = list.OrderBy(i => i.Value1);
var top100 = sorted.Take(100);
var last100 = sorted.Reverse().Take(100);
var result = top100.Concat(last100).OrderBy(i => i.Value2);
I don't know if you want Concat or Union at the end. Concat will combine all entries of both lists even if there are duplicate entries, which would be the case if your original list contains fewer than 200 entries. Union would only add items from last100 that are not already in top100.
Some things that are not clear but that should be considered:
If list is an IQueryable against a database, it is probably advisable to use ToArray() or ToList(), e.g.
var sorted = list.OrderBy(i => i.Value).ToArray();
at the beginning. This way only one query to the database is done while the rest is done in memory.
The Reverse method is not optimized the way I hoped for, but it shouldn't be a problem, since ordering the list is the expensive part here. For the record though, the Skip approach explained in other answers is probably a little bit faster, but it needs to know the number of elements in list.
If list were a LinkedList or another class implementing IList, Reverse could be done in an optimized way.
You can use an extension method like this:
public static IEnumerable<T> TakeFirstAndLast<T>(this IEnumerable<T> source, int count)
{
var first = new List<T>();
var last = new LinkedList<T>();
foreach (var item in source)
{
if (first.Count < count)
first.Add(item);
if (last.Count >= count)
last.RemoveFirst();
last.AddLast(item);
}
return first.Concat(last);
}
(I'm using a LinkedList<T> for last because it can remove items in O(1))
You can use it like this:
.OrderBy(i => i.Value1)
.TakeFirstAndLast(100)
.OrderBy(i => i.Value2);
Note that it doesn't handle the case where there are fewer than 200 items: if that's the case, you will get duplicates. You can remove them using Distinct if necessary.
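For example, assuming the elements are reference types (so the overlapping items really are the same objects), a sketch of that would be (source here stands in for your sequence):

var result = source
    .OrderBy(i => i.Value1)
    .TakeFirstAndLast(100)
    .Distinct()               // drops the overlap when the source has fewer than 200 items
    .OrderBy(i => i.Value2);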
Take the top 100 and bottom 100 separately and union them:
var tempresults = yourenumerable.OrderBy(i => i.Value1);
var results = tempresults.Take(100);
results = results.Union(tempresults.Skip(tempresults.Count() - 100).Take(100))
.OrderBy(i => i.Value2);
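Note that tempresults is enumerated more than once here (for Count() and again for Take and Skip). If that matters, e.g. because it is an IQueryable, a sketch that materializes the ordered sequence once would be:

var ordered = yourenumerable.OrderBy(i => i.Value1).ToList();
var results = ordered.Take(100)
    .Union(ordered.Skip(Math.Max(0, ordered.Count - 100))) // Math.Max guards the short-list case
    .OrderBy(i => i.Value2)
    .ToList();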
You can also do it in one statement using the indexed .Where overload, if you have the number of elements available:
var elements = ...
var count = elements.Length; // or .Count for list
var result = elements
.OrderBy(i => i.Value1)
.Where((v, i) => i < 100 || i >= count - 100)
.OrderBy(i => i.Value2)
.ToArray(); // evaluate
Here's how it works:
| first 100 elements | middle elements        | last 100 elements |
| i < 100            | 100 <= i < count - 100 | i >= count - 100  |
| kept               | filtered out           | kept              |
You can write your own extension method, like Take(), Skip() and the other methods in the Enumerable class. It takes the number of elements and the total length of the list as input, and returns the first and last N elements of the sequence.
var result = yourList.OrderBy(x => x.Value1)
.GetLastAndFirst(100, yourList.Length)
.OrderBy(x => x.Value2)
.ToList();
Here is the extension method:
public static class SOExtensions
{
public static IEnumerable<T> GetLastAndFirst<T>(
this IEnumerable<T> seq, int number, int totalLength
)
{
if (totalLength < number*2)
throw new Exception("List length must be >= (number * 2)");
using (var en = seq.GetEnumerator())
{
int i = 0;
while (en.MoveNext())
{
i++;
if (i <= number || i >= totalLength - number)
yield return en.Current;
}
}
}
}
I have a LINQ query that returns the following error:
"The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100".
All I need is to count all clients that have a BirthDate and whose IDs are in my list.
My list of client IDs could be huge (millions of records).
Here is the query:
List<int> allClients = GetClientIDs();
int total = context.Clients.Where(x => allClients.Contains(x.ClientID) && x.BirthDate != null).Count();
When the query is rewritten this way,
int total = context
.Clients
.Count(x => allClients.Contains(x.ClientID) && x.BirthDate != null);
it causes the same error.
I also tried to do it a different way, but it eats all the memory:
List<int> allClients = GetClientIDs();
total = (from x in allClients.AsQueryable()
join y in context.Clients
on x equals y.ClientID
where y.BirthDate != null
select x).Count();
We ran into this same issue at work. The problem is that list.Contains() creates a WHERE column IN (val1, val2, ... valN) clause, so you're limited in how many values you can put in there. What we ended up doing was, in fact, doing it in batches, much like you did.
However, I think I can offer you a cleaner and more elegant piece of code to do this with. Here is an extension method that sits alongside the other LINQ methods you normally use:
public static IEnumerable<IEnumerable<T>> BulkForEach<T>(this IEnumerable<T> list, int size = 1000)
{
    int count = list.Count(); // enumerate once up front
    for (int index = 0; index < (count + size - 1) / size; index++)
    {
        IEnumerable<T> returnVal = list.Skip(index * size).Take(size).ToList();
        yield return returnVal;
    }
}
Then you use it like this:
foreach (var item in list.BulkForEach())
{
// Do logic here. item is an IEnumerable<T> (in your case, int)
}
EDIT
Or, if you prefer, you can make it act like the normal List.ForEach() like this:
public static void BulkForEach<T>(this IEnumerable<T> list, Action<IEnumerable<T>> action, int size = 1000)
{
    int count = list.Count(); // enumerate once up front
    for (int index = 0; index < (count + size - 1) / size; index++)
    {
        IEnumerable<T> returnVal = list.Skip(index * size).Take(size).ToList();
        action.Invoke(returnVal);
    }
}
Used like this:
list.BulkForEach(p => { /* Do logic */ });
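Applied to your count, a sketch would look something like this (it assumes your provider translates Contains on a small in-memory list into an IN clause, which both LINQ to SQL and Entity Framework do):

List<int> allClients = GetClientIDs();
int total = 0;

allClients.BulkForEach(batch =>
{
    var ids = batch.ToList(); // each batch stays well under the ~2100 parameter limit
    total += context.Clients.Count(x => ids.Contains(x.ClientID) && x.BirthDate != null);
}, 2000);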
Well, as Gert Arnold mentioned before, making the query in chunks solves the problem, but it looks nasty:
List<int> allClients = GetClientIDs();
int total = 0;
const int sqlLimit = 2000;
int iterations = allClients.Count() / sqlLimit;
for (int i = 0; i <= iterations; i++)
{
List<int> tempList = allClients.Skip(i * sqlLimit).Take(sqlLimit).ToList();
int thisTotal = context.Clients.Count(x => tempList.Contains(x.ClientID) && x.BirthDate != null);
total = total + thisTotal;
}
As has been said above, your query is probably being translated to something like:
select count(1)
from Clients
where ClientID in (@id1, @id2, ...) -- one parameter per ID returned by GetClientIDs
You will need to change your query such that you aren't passing so many parameters to it.
To see the generated SQL you can set the context's Log property (e.g. context.Log = Console.Out in LINQ to SQL, or context.Database.Log = Console.Write in Entity Framework 6), which will cause the SQL to be written to the console when the query is executed.
EDIT:
A possible alternative to chunking would be to send the IDs to the server as a delimited string, and create a UDF in your database which can convert that string back to a list.
var clientIds = string.Join(",", allClients);
var total = (from client in context.Clients
             join id in context.udf_SplitString(clientIds)
             on client.ClientId equals id.Id
             select client).Count();
There are lots of examples on Google for UDFs that split strings.
Another alternative, and probably the fastest at query time, is to load your IDs into a temporary table in your database and then do a join query.
Doing a query in chunks means a lot of round-trips between your client and database. If the list of IDs you are interested in is static or changes rarely, I recommend the approach of a temporary table.
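If you're on SQL Server and can add a user-defined table type, a closely related option is to pass the IDs as a table-valued parameter instead of loading a temporary table yourself. A rough ADO.NET sketch, assuming a dbo.IntList table type with a single Id column exists (the type name and the connection variable are assumptions, not something your context already has):

// requires System.Data and System.Data.SqlClient
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in allClients)
    table.Rows.Add(id);

using (var cmd = new SqlCommand(
    @"select count(1)
      from Clients c
      join @ids i on i.Id = c.ClientID
      where c.BirthDate is not null", connection))
{
    var p = cmd.Parameters.AddWithValue("@ids", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IntList"; // hypothetical user-defined table type
    int total = (int)cmd.ExecuteScalar();
}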
If you don't mind moving the work from the database to the application server and have the memory, try this.
int total = context.Clients.AsEnumerable().Where(x => allClients.Contains(x.ClientID) && x.BirthDate != null).Count();
I have a loop like the following; can I do the same using multiple Sum calls?
foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
pd.InventoryType == InventoryTypes.Finished))
{
weight += detail.GrossWeight;
length += detail.Length;
items += detail.NrDistaff;
}
Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that might make it simpler:
public static class EnumerableExtensions
{
public static void Each<T>(this IEnumerable<T> col, Action<T> itemWorker)
{
foreach (var item in col)
{
itemWorker(item);
}
}
}
And call it like so:
// Declare variables in parent scope
double weight = 0;
double length = 0;
int items = 0;
ArticleLedgerEntries
.Where(
pd =>
pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
pd.InventoryType == InventoryTypes.Finished
)
.Each(
pd =>
{
// Close around variables defined in parent scope
weight += pd.GrossWeight;
length += pd.Length;
items += pd.NrDistaff;
}
);
UPDATE:
Just one additional note. The above example relies on a closure. The variables weight, length, and items should be declared in a parent scope, allowing them to persist beyond each call to the itemWorker action. I've updated the example to reflect this for clarity sake.
You can call Sum three times, but it will be slower because it will make three loops.
For example:
var list = ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
                                         && pd.InventoryType == InventoryTypes.Finished);
var totalWeight = list.Sum(pd => pd.GrossWeight);
var totalLength = list.Sum(pd => pd.Length);
var items = list.Sum(pd => pd.NrDistaff);
Because of delayed execution, it will also re-evaluate the Where call every time, although that's not such an issue in your case. This could be avoided by calling ToArray, but that will cause an array allocation. (And it would still run three loops)
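For example, materializing first would look like the sketch below: still three passes, but over an in-memory array, and the Where filter only runs once:

var entries = ArticleLedgerEntries
    .Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
              && pd.InventoryType == InventoryTypes.Finished)
    .ToArray(); // evaluate the filter once

var totalWeight = entries.Sum(pd => pd.GrossWeight);
var totalLength = entries.Sum(pd => pd.Length);
var items       = entries.Sum(pd => pd.NrDistaff);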
However, unless you have a very large number of entries or are running this code in a tight loop, you don't need to worry about performance.
EDIT: If you really want to use LINQ, you could misuse Aggregate, like this:
double totalWeight = 0, totalLength = 0;
int items = 0;
list.Aggregate(0, (a, pd) =>
{
    totalWeight += pd.GrossWeight;
    totalLength += pd.Length;
    items += pd.NrDistaff;
    return a;
});
This is phenomenally ugly code, but should perform almost as well as a straight loop.
You could also sum in the accumulator (see example below), but this would allocate a temporary object for every item in your list, which is a dumb idea (anonymous types are immutable).
var totals = list.Aggregate(
new { Weight = 0, Length = 0, Items = 0},
(t, pd) => new {
Weight = t.Weight + pd.GrossWeight,
Length = t.Length + pd.Length,
Items = t.Items + pd.NrDistaff
}
);
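If you like the aggregate-into-one-object shape but want to avoid the per-item allocations, a variant is to fold into a single mutable accumulator instead; Totals below is a hypothetical helper class, and the double/int member types are assumptions about your model:

class Totals
{
    public double Weight;
    public double Length;
    public int Items;
}

var totals = list.Aggregate(
    new Totals(),              // one accumulator instance, reused for every item
    (t, pd) =>
    {
        t.Weight += pd.GrossWeight;
        t.Length += pd.Length;
        t.Items  += pd.NrDistaff;
        return t;              // returning the same instance avoids new allocations
    });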
You could also group by a constant (1), which puts all the items into a single group that can then be counted or summed:
var results = from x in ArticleLedgerEntries
group x by 1
into aggregatedTable
select new
{
SumOfWeight = aggregatedTable.Sum(y => y.GrossWeight),
SumOfLength = aggregatedTable.Sum(y => y.Length),
SumOfNrDistaff = aggregatedTable.Sum(y => y.NrDistaff)
};
As far as running time goes, it is almost as good as the loop (with a small constant overhead).
You'd be able to do this pivot-style, using the answer in this topic: Is it possible to Pivot data using LINQ?
OK. I realize that there isn't an easy way to do this using LINQ. I'll keep my foreach loop because I understand it isn't so bad. Thanks to all of you.