IQueryable<T> first free spot - c#

What's the most efficient way to get a first free spot from sorted query returning int collection ?
eg.: {1,2,3,4,6} | result.: 5
At the moment I am using foreach and counter that compares current value from sorted quety.ToList()
which takes about 600ms on 100 000 records.

Unless you multithread it, reading one at a time is your fastest solution since this is an O(n) problem.

I am not sure how you would write it in LINQ, but I suppose a binary search could be faster in this way - starting from the middle, comparing the index with value - if they are equal, continue in the right half, otherwise in the left half etc.
Even if you're starting from an index start_index which is not 1, you can simply compare the value with index increased by start_index.

Why do you use ToList()? Turning it to a list renders the whole IEnumerable idea absolute and you might lose some performance there. Why not iterate with foreach over the IQueryable and find the first missing memeber?

Related

Pure Speed for Lookup Single Value Type c#?

.NET 4.5.1
I have a "bunch" of Int16 values that fit in a range from -4 to 32760. The numbers in the range are not consecutive, but they are ordered from -4 to 32760. In other words, the numbers from 16-302 are not in the "bunch", but numbers 303-400 are in there, number 2102 is not there, etc.
What is the all-out fastest way to determine if a particular value (eg 18400) is in the "bunch"? Right now it is in an Int16[] and the Linq Contains method is used to determine if a value is in the array, but if anyone can say why/how a different structure would deliver a single value faster I would appreciate it. Speed is the key for this lookup (the "bunch" is a static property on a static class).
Sample code that works
Int16[] someShorts = new[] { (short)4 ,(short) 5 , (short)6};
var isInIt = someShorts.Contains( (short)4 );
I am not sure if that is the most performant thing that can be done.
Thanks.
It sounds like you really want BitArray - just offset the value by 4 so you've got a range of [0, 32764] and you should be fine.
That will allocate an array which is effectively 4K in size (32764 / 8), with one bit per value in the array. It will handle finding the relevant element in the array, and applying bit masking. (I don't know whether it uses a byte[] internally or something else.)
This is a potentially less compact representation than storing ranges, but the only cost involved in getting/setting a bit will be computing an index (basically a shift), getting the relevant bit of memory to the CPU, and then bit masking. It takes 1/8th the size of a bool[], making your CPU cache usage more efficient.
Of course, if this is really a performance bottleneck for you, you should compare both this solution and a bool[] approach in your real application - microbenchmarks aren't nearly as important here as how your real app behaves.
Make one bool for each possible value:
var isPresentItems = new bool[32760-(-4)+1];
Set the corresponding element to true if the given item is present in the set. Lookup is easy:
var isPresent = isPresentItems[myIndex];
Can't be done any faster. The bools will fit into L1 or L2 cache.
I advise against using BitArray because it stores multiple values per byte. This means that each access is slower. Bit-arithmetic is required.
And if you want insane speed, don't make LINQ call a delegate once for each item. LINQ is not the first choice for performance-critical code. Many indirections that stall the CPU.
If you want to optimize for lookup time, pick a data structure with O(1) (constant-time) lookups. You have several choices since you only care about set membership, and not sorting or ordering.
A HashSet<Int16> will give this to you, as will a BitArray indexed on max - min + 1. The absolute fastest ad-hoc solution would probably be a simple array indexed on max - min + 1, as #usr suggests. Any of these should be plenty "fast enough". The HashSet<Int16> will probably use the most memory, as the size of the internal hash table is an implementation detail. BitArray would be the most space efficient out of these options.
If you only have a single lookup, then memory should not be a concern, and I suggest first going with a HashSet<Int16>. That solution is easy to reason about and deal with in a bug-free manner, as you don't have to worry about staying within array boundaries; you can simply check set.Contains(n). This is particularly useful if your value range might change in the future. You can fall back to one of the other solutions if you need to optimize further for speed or performance.
One option is to use the HashSet. To find if the value is in it, it is a O(1) operation
The code example:
HashSet<Int16> evenNumbers = new HashSet<Int16>();
for (Int16 i = 0; i < 20; i++)
{
evenNumbers.Add(i);
}
if (evenNumbers.Contains(0))
{
/////
}
Because the numbers are sorted, I would loop through the list one time and generate a list of Range objects that have a start and end number. That list would be much smaller than having a list or dictionary of thousands of numbers.
If your "bunch" of numbers can be identified as a series of intervals, I suggest you use Interval Trees. An interval tree allows dynamic insertion/deletions and also searching if a an interval intersects any interval in the tree is O(log(n)) where n is the number of intervals in the tree. In your case the number of intervals would be way less than the number of ints and the search is much faster.

geting a new sorted list from a list, i only need x items and dont want to sort the whole list before getting the top items

I have a class
public class Entity : IComparable<Entity>
{
public float Priority { get; set; }
{
I create a list and fill it with Y items that are in no particular order
list <Entity> = get_unorderd_list();
now i want to sort the list according to the Priority value , but I only care about getting the highest X items in the right order, for performance reasons I don't want to use a regular .sort() method as X is a lot smaller then Y.
Should I write a custom sorting method? Or is there a way to do this?
edit:not talking about geting a single velue by using .max()
Should I write a custom sorting method?
I don't know of a way of doing it easily from .NET itself. When I implemented sorting for Edulinq, I took this sort of approach - the whole ordering is a quick-sort, but I only sort as much as I need to in order to return the results so far.
Now, that's still a general approach - if you know how many results you need beforehand, you can probably do a lot better. You might want to build a heap-based solution, where you have a bounded heap (max size X) and then iterate over your input, adding values into your heap as you go. As soon as you've filled up the first X elements, you can start discarding new ones which are smaller than your smallest heap element, without even looking at the rest of the tree. For other elements, you perform the normal operations to insert the element, maintaining the ordering within the heap.
See the Wikipedia article on binary heaps stored as arrays for more information about how you might structure your heap.
The trouble is the last item in your unsorted list could be the highest item in X so you will have to iterate over the whole of Y i should think.
if you mean MAX simply use linq Max to find highest priority item.you can not write more efficient method than Max,cause you must compare all items in the list to find max Anyway.
var highestPeriorityItem = unorderd_list.Max(x=>x.Periority);
EDIT:
there is another way more efficient than this , that is keeping list sorted from very beginning.that mean you must keep list sorted with each insert(insert item in sorted order). this way time complexity of finding max item is O(1) and time complexity of inserting new item is O(1) < x < O(N).
hope this helps.

C# LINQ First() faster than ToArray()[0]?

I am running a test.
It looks like:
method 1)
List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).First();
method 2)
List<int> = new List<int>{1,2,4, .....} //assume 1000k
var result = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).ToArray()[0];
Why is method 2 is so slow compared to method 1?
You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:
Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.
Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.
Is it now clear why method 1 is a whole lot faster than method 2?
Erm... because you are creating an extra array (rather than just using the iterator). The first approach stops after the first match (Where is a non-buffered streaming API). The second loads all the matches into an array (presumably with several re-sizes), then takes the first item.
As a side note; you can create infinite sequences; the first approach would still work, the second would run forever (or explode).
It could also be:
var result ErrorCodes.First(x => ReturnedErrorCodes.Contains(x));
(that won't make it any faster, but is perhaps easier to read)
Because of deferred execution.
The code ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)) doesn't return a collection of integers, instead it returns an expression that is capable of returning a stream of integers. It doesn't do any actual work until you start reading integers from it.
The ToArray method will consume the entire stream and put all the integers in an array. This means that every item in the entire list has to be compared to the error codes.
The First method on the other hand will only get the first item from the stream, and then stop reading from the stream. This will make it a lot faster, because it will stop comparing items from the list to the error codes as soon as it finds a match.
Because ToArray() copies the entire sequence to an array.
Method 2 has to iterate over the whole sequence to build an array, and then returns the first element.
Method 1 just iterates over enough of the sequence to find the first matching element.
ToArray() walks through the whole sequence it has been given and creates and array out of it.
If you don't callt ToArray(), First() lets Where() return just the first item that matches and immediatelly returns.
First() is complexity of O(1)
ToArray()[0] is complexity O(n)+1
var #e = array.GetEnumerator();
// First
#e.MoveNext();
return #e.Current;
// ToArray (with yield [0] should as fast as First...)
while (#e.MoveNext() {
yield return #e.Current;
}
Because in the second example, you are actually converting the IEnumerable<T> to an array, whereas in the first example, no conversion is taking place.
In method 2 the entire array must be converted to an array first. Also, it seems awkward to mix array access when First() is so much more readable.
This makes sense, ToArray probably involves a copy, which is always going to be more expensive, since Linq can't make any guarantees about how you're going to use your array, while First() can just return the single element at the beginning.

In C#, is there a kind of a SortedList<double> that allows fast querying (with LINQ) for the nearest value?

I am looking for a structure that holds a sorted set of double values. I want to query this set to find the closest value to a specified reference value.
I have looked at the SortedList<double, double>, and it does quite well for me. However, since I do not need explicit key/value pairs. this seems to be overkill to me, and i wonder if i could do faster.
Conditions:
The structure is initialised only once, and does never change (no insert/deletes)
The amount of values is in the range of 100k.
The structure is queried often with new references, which must execute fast.
For simplicity and speed, the set's value just below of the reference may be returned, not actually the nearest value
I want to use LINQ for the query, if possible, for simplicity of code.
I want to use no 3rd party code if possible. .NET 3.5 is available.
Speed is more importand than memory footprint
I currently use the following code, where SortedValues is the aforementioned SortedList
IEnumerable<double> nearest = from item in SortedValues.Keys
where item <= suggestion
select item;
return nearest.ElementAt(nearest.Count() - 1);
Can I do faster?
Also I am not 100% percent sure, if this code is really safe. IEnumerable, the return type of my query is not by definition sorted anymore. However, a Unit test with a large test data base has shown that it is in practice, so this works for me. Have you hints regarding this aspect?
P.S. I know that there are many similar questions, but none actually answers my specific needs. Especially there is this one C# Data Structure Like Dictionary But Without A Value, but the questioner does just want to check the existence not find anything.
The way you are doing it is incredibly slow as it must search from the beginning of the list each time giving O(n) performance.
A better way is to put the elements into a List and then sort the list. You say you don't need to change the contents once initialized, so sorting once is enough.
Then you can use List<T>.BinarySearch to find elements or to find the insertion point of an element if it doesn't already exist in the list.
From the docs:
Return Value
The zero-based index of
item in the sorted List<T>,
if item is found; otherwise, a
negative number that is the bitwise
complement of the index of the next
element that is larger than item or,
if there is no larger element, the
bitwise complement of Count.
Once you have the insertion point, you need to check the elements on either side to see which is closest.
Might not be useful to you right now, but .Net 4 has a SortedSet class in the BCL.
I think it can be more elegant as follows:
In case your items are not sorted:
double nearest = values.OrderBy(x => x.Key).Last(x => x.Key <= requestedValue);
In case your items are sorted, you may omit the OrderBy call...

How can I sort an array of strings?

I have a list of input words separated by comma. I want to sort these words by alphabetical and length. How can I do this without using the built-in sorting functions?
Good question!! Sorting is probably the most important concept to learn as an up-and-coming computer scientist.
There are actually lots of different algorithms for sorting a list.
When you break all of those algorithms down, the most fundamental operation is the comparison of two items in the list, defining their "natural order".
For example, in order to sort a list of integers, I'd need a function that tells me, given any two integers X and Y whether X is less than, equal to, or greater than Y.
For your strings, you'll need the same thing: a function that tells you which of the strings has the "lesser" or "greater" value, or whether they're equal.
Traditionally, these "comparator" functions look something like this:
int CompareStrings(String a, String b) {
if (a < b)
return -1;
else if (a > b)
return 1;
else
return 0;
}
I've left out some of the details (like, how do you compute whether a is less than or greater than b? clue: iterate through the characters), but that's the basic skeleton of any comparison function. It returns a value less than zero if the first element is smaller and a value greater than zero if the first element is greater, returning zero if the elements have equal value.
But what does that have to do with sorting?
A sort routing will call that function for pairs of elements in your list, using the result of the function to figure out how to rearrange the items into a sorted list. The comparison function defines the "natural order", and the "sorting algorithm" defines the logic for calling and responding to the results of the comparison function.
Each algorithm is like a big-picture strategy for guaranteeing that ANY input will be correctly sorted. Here are a few of the algorithms that you'll probably want to know about:
Bubble Sort:
Iterate through the list, calling the comparison function for all adjacent pairs of elements. Whenever you get a result greater than zero (meaning that the first element is larger than the second one), swap the two values. Then move on to the next pair. When you get to the end of the list, if you didn't have to swap ANY pairs, then congratulations, the list is sorted! If you DID have to perform any swaps, go back to the beginning and start over. Repeat this process until there are no more swaps.
NOTE: this is usually not a very efficient way to sort a list, because in the worst cases, it might require you to scan the whole list as many as N times, for a list with N elements.
Merge Sort:
This is one of the most popular divide-and-conquer algorithms for sorting a list. The basic idea is that, if you have two already-sorted lists, it's easy to merge them. Just start from the beginning of each list and remove the first element of whichever list has the smallest starting value. Repeat this process until you've consumed all the items from both lists, and then you're done!
1 4 8 10
2 5 7 9
------------ becomes ------------>
1 2 4 5 7 8 9 10
But what if you don't have two sorted lists? What if you have just one list, and its elements are in random order?
That's the clever thing about merge sort. You can break any single list into smaller pieces, each of which is either an unsorted list, a sorted list, or a single element (which, if you thing about it, is actually a sorted list, with length = 1).
So the first step in a merge sort algorithm is to divide your overall list into smaller and smaller sub lists, At the tiniest levels (where each list only has one or two elements), they're very easy to sort. And once sorted, it's easy to merge any two adjacent sorted lists into a larger sorted list containing all the elements of the two sub lists.
NOTE: This algorithm is much better than the bubble sort method, described above, in terms of its worst-case-scenario efficiency. I won't go into a detailed explanation (which involves some fairly trivial math, but would take some time to explain), but the quick reason for the increased efficiency is that this algorithm breaks its problem into ideal-sized chunks and then merges the results of those chunks. The bubble sort algorithm tackles the whole thing at once, so it doesn't get the benefit of "divide-and-conquer".
Those are just two algorithms for sorting a list, but there are a lot of other interesting techniques, each with its own advantages and disadvantages: Quick Sort, Radix Sort, Selection Sort, Heap Sort, Shell Sort, and Bucket Sort.
The internet is overflowing with interesting information about sorting. Here's a good place to start:
http://en.wikipedia.org/wiki/Sorting_algorithms
Create a console application and paste this into the Program.cs as the body of the class.
public static void Main(string[] args)
{
string [] strList = "a,b,c,d,e,f,a,a,b".Split(new [] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in strList.Sort())
Console.WriteLine(s);
}
public static string [] Sort(this string [] strList)
{
return strList.OrderBy(i => i).ToArray();
}
Notice that I do use a built in method, OrderBy. As other answers point out there are many different sort algorithms you could implement there and I think my code snippet does everything for you except the actual sort algorithm.
Some C# specific sorting tutorials
There is an entire area of study built around sorting algorithms. You may want to choose a simple one and implement it.
Though it won't be the most performant, it shouldn't take you too long to implement a bubble sort.
If you don't want to use build-in-functions, you have to create one by your self. I would recommend Bubble sort or some similar algorithm. Bubble sort is not an effective algoritm, but it get the works done, and is easy to understand.
You will find much good reading on wikipedia.
I would recommend doing a wiki for quicksort.
Still not sure why you don't want to use the built in sort?
Bubble sort damages the brain.
Insertion sort is at least as simple to understand and code, and is actually useful in practice (for very small data sets, and nearly-sorted data). It works like this:
Suppose that the first n items are already in order (you can start with n = 1, since obviously one thing on its own is "in the correct order").
Take the (n+1)th item in your array. Call this the "pivot". Starting with the nth item and working down:
- if it is bigger than the pivot, move it one space to the right (to create a "gap" to the left of it).
- otherwise, leave it in place, put the "pivot" one space to the right of it (that is, in the "gap" if you moved anything, or where it started if you moved nothing), and stop.
Now the first n+1 items in the array are in order, because the pivot is to the right of everything smaller than it, and to the left of everything bigger than it. Since you started with n items in order, that's progress.
Repeat, with n increasing by 1 at each step, until you've processed the whole list.
This corresponds to one way that you might physically put a series of folders into a filing cabinet in order: put one in; then put another one into its correct position by pushing everything that belongs after it over by one space to make room; repeat until finished. Nobody ever sorts physical objects by bubble sort, so it's a mystery to me why it's considered "simple".
All that's left now is that you need to be able to work out, given two strings, whether the first is greater than the second. I'm not quite sure what you mean by "alphabetical and length" : alphabetical order is done by comparing one character at a time from each string. If there not the same, that's your order. If they are the same, look at the next one, unless you're out of characters in one of the strings, in which case that's the one that's "smaller".
Use NSort
I ran across the NSort library a couple of years ago in the book Windows Developer Power Tools. The NSort library implements a number of sorting algorithms. The main advantage to using something like NSort over writing your own sorting is that is is already tested and optimized.
Posting link to fast string sort code in C#:
http://www.codeproject.com/KB/cs/fast_string_sort.aspx
Another point:
The suggested comparator above is not recommended for non-English languages:
int CompareStrings(String a, String b) {
if (a < b) return -1;
else if (a > b)
return 1; else
return 0; }
Checkout this link for non-English language sort:
http://msdn.microsoft.com/en-us/goglobal/bb688122
And as mentioned, use nsort for really gigantic arrays that don't fit in memory.

Categories

Resources