Searching List<string> based on string match

I have the following C# List<string>:
var lists = new List<string>
{
"a", "b", "c", "ee", "ja"
};
I now want to find the index of the last item whose alphanumeric value is less than or equal to d, which in this case would be 2 - which represents "c"
Can anyone suggest how I can do this? It needs to be fast as it will be searching large lists.
Is there also a way to do the same comparison for the closest match to "ef" or any other string of multiple characters?
EDIT - I know I could write a for loop to do this, but is there any other way to do this? Maybe a built in function.
I know if it was a numeric function I could use Linq.

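You want FindLastIndex:
var index = lists.FindLastIndex(value => value.CompareTo("d") <= 0);
NOTE: You have to use CompareTo because the < and <= operators are not defined for strings. The comparison is <= 0 because the question asks for "less than or equal to". FindLastIndex is a linear search, so it is O(n) in the worst case.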

You'll get great performance by using the BinarySearch method, under the condition that your List is sorted. If it isn't, then don't use this method because you'll get incorrect results.
// List.BinarySearch returns:
// The zero-based index of item in the sorted System.Collections.Generic.List`1,
// if item is found; otherwise, a negative number that is the bitwise complement
// of the index of the next element that is larger than item or, if there is no
// larger element, the bitwise complement of System.Collections.Generic.List`1.Count.
int pos = lists.BinarySearch("d");
int resultPos = pos >= 0 ? pos : ~pos - 1;
Console.WriteLine("Result: " + resultPos);
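To make the edge cases concrete, here is a minimal sketch of the same lookup wrapped in a helper. The name LastIndexAtOrBelow is made up for illustration, and the sketch assumes the list was sorted with StringComparer.Ordinal, which is why that comparer is passed explicitly. It returns -1 when every item is greater than the target.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListSearch
{
    // Hypothetical helper: index of the last item <= target, or -1 if there is none.
    // Assumption: sortedItems is sorted with StringComparer.Ordinal.
    public static int LastIndexAtOrBelow(List<string> sortedItems, string target)
    {
        int pos = sortedItems.BinarySearch(target, StringComparer.Ordinal);
        // Found: pos is the index of a matching item.
        // Not found: ~pos is the index of the first larger item, so step back by one.
        return pos >= 0 ? pos : ~pos - 1;
    }

    public static void Main()
    {
        var lists = new List<string> { "a", "b", "c", "ee", "ja" };
        Console.WriteLine(LastIndexAtOrBelow(lists, "d"));  // 2 ("c")
        Console.WriteLine(LastIndexAtOrBelow(lists, "ef")); // 3 ("ee")
        Console.WriteLine(LastIndexAtOrBelow(lists, "A"));  // -1 (no item at or below "A")
    }
}
```

If the list can contain duplicates, BinarySearch may land on any of the equal items, so a short forward scan would be needed to reach the last one.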

Related

Please suggest different approach for this CountNumbers algorithm

Implement function CountNumbers that accepts a sorted array of unique integers and counts the number of array elements that are less than the parameter lessThan
For example, SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4) should return 2 because there are two array elements less than 4.
Below is my approach. But the score given by the online tool for this is 50%. What am I missing?
using System;
public class SortedSearch
{
public static int CountNumbers(int[] sortedArray, int lessThan)
{
int iRes=0;
for (int i=0; i<sortedArray.Length; i++)
{
if(sortedArray[i]< lessThan)
{
iRes=iRes+1;
}
}
return iRes;
}
public static void Main(string[] args)
{
Console.WriteLine(SortedSearch.CountNumbers(new int[] { 1, 3, 5, 7 }, 4));
}
}

Your current solution takes up to O(N) time, where N is the size of the array. You could leverage the fact that your input array is sorted to decrease the time complexity of the solution to O(log N) by using BinarySearch:
public static int CountNumbers(int[] sortedArray, int lessThan)
{
var result = Array.BinarySearch(sortedArray, lessThan);
return result >= 0 ? result : -1 * result - 1;
}
Why do the strange -1 * result - 1 code? Because, as per the docs:
Returns
The index of the specified value in the specified array, if value is
found; otherwise, a negative number. If value is not found and value
is less than one or more elements in array, the negative number
returned is the bitwise complement of the index of the first element
that is larger than value. If value is not found and value is greater
than all elements in array, the negative number returned is the
bitwise complement of (the index of the last element plus 1). If this
method is called with a non-sorted array, the return value can be
incorrect and a negative number could be returned, even if value is
present in array.
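-1 * result - 1 is the same as ~result: it reverses the bitwise complement and gives the index of the first element that is larger than value (or the array length if there is none). Because the array is sorted and its values are unique, that index equals the number of elements less than value.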
BinarySearch will generally perform faster than Where or TakeWhile particularly over large sets of data - since it will perform a binary search.
From wikipedia:
In computer science, binary search, also known as half-interval
search, logarithmic search, or binary chop, is a search algorithm that
finds the position of a target value within a sorted array.
The clue to use a binary search is the "accepts a sorted array of unique integers" part of the requirement. My above solution only works, as is, with a sorted array of unique values. It thus seems to me that whoever wrote the online test likely had binary search in mind.
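As a quick check of the complement arithmetic, the sketch below uses the ~ operator in place of -1 * result - 1 (the two are equivalent for int) and runs a few boundary inputs. The class name SortedSearchSketch is only for illustration.

```csharp
using System;

public static class SortedSearchSketch
{
    // Assumes sortedArray is sorted ascending and contains unique values.
    public static int CountNumbers(int[] sortedArray, int lessThan)
    {
        int result = Array.BinarySearch(sortedArray, lessThan);
        // Found: the index itself is the number of smaller elements.
        // Not found: ~result is the index of the first larger element.
        return result >= 0 ? result : ~result;
    }

    public static void Main()
    {
        int[] data = { 1, 3, 5, 7 };
        Console.WriteLine(CountNumbers(data, 4));  // 2
        Console.WriteLine(CountNumbers(data, 5));  // 2
        Console.WriteLine(CountNumbers(data, 0));  // 0
        Console.WriteLine(CountNumbers(data, 10)); // 4
    }
}
```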

You could make use of Linq for the purpose.
int CountNumbers(IEnumerable<int> source,int limit)
{
return source.TakeWhile(x=>x<limit).Count();
}
Since the OP mentions that the input array is sorted, you can stop searching at the first element that is greater than or equal to the limit. TakeWhile does exactly that.
The method above takes elements while the condition holds and then counts them. Note that it is still O(N) in the worst case.
Example.
var result = CountNumbers(new int[] {1, 3, 5, 7},4);
Output : 2

Is there a fast extraction of specific value located inside tags in a slice from a list of strings possible?

I'm using a two-step regex to extract the value of the first occurrence of a specific marker inside a list of strings:
Regex regexComplete = new Regex(
@"MyNumberMarker"
+ @"[\d]+"
+ @"[\s]+Endmarker"
);
Regex regexOnlyNumber = new Regex(
@"MyNumberMarker"
+ @"[\d]+"
);
int indexmyNumber = eintraegeListe.FindIndex(
5,
10000,
x => regexComplete.IsMatch(x)
);
if (indexmyNumber >= 0)
{
int myNumber = 0;
string myNumberString = regexOnlyNumber.Match(regexComplete.Match(eintraegeListe[indexmyNumber]).Value).Value;
myNumberString = myNumberString.Replace("MyNumberMarker", "").Replace("\n", "").Replace("\r", "").Trim();
if (Int32.TryParse(myNumberString, out myNumber))
{
return myNumber;
}
}
As one can see, the value I really want is located between "MyNumberMarker" and "Endmarker". It is in a specific part of the list, which I search through with the FindIndex command. Then I use a regex to extract the complete value plus tags, reduce it to just the begin tag and the value, and finally cut away the begin tag and any whitespace (including \n and \r) manually.
Now this works quite fine as intended but if I do this a couple of thousand times it is quite slow in the end. Thus my question.
Is there any better (faster) way to do this?
As a note: eintraegeListe can have between 100 and 30000 entries.
For example if I have the following small list:
[0]This is a test
[1]22.09.2015 01:00:00
[2]Until 22.09.2015 03:00:00
[3]................................
[4]................................
[5]........ TESTDATA
[6]...............................
[7]................................
[8]MyNumberMarker519 Endmarker
[9]This is a small
[10]Slice of Test data with
[11]520 - 1 as data.
I would expect 519 to be returned.

Since you are returning a single item, the performance of code past FindIndex is irrelevant: it is executed only once, and it takes a single string, so it should complete in microseconds on any modern hardware.
The code that takes the bulk of CPU is in x => regexComplete.IsMatch(x) call. You can tell that this code is returning false most of the time, because the loop is over the first time it returns true.
This means that you should be optimizing for the negative case, i.e. returning false as soon as you can. One way to achieve this would be to look for "MyNumberMarker" before employing regex. If there is no marker, return false right away; otherwise, fall back on using the regex, and start from the position where you found the marker:
int indexmyNumber = eintraegeListe.FindIndex(
5,
10000,
x => {
// Scan the string for the marker in non-regex mode
int pos = x.IndexOf("MyNumberMarker", StringComparison.Ordinal);
// If the marker is not there, do not bother with regex, and return false
return pos < 0
? false
// Only if the marker is there, check the match with regex.
: regexComplete.IsMatch(x, pos);
}
);

You can actually merge the two regexes into one containing a capturing group that lets you access the sequence of digits directly via the group name (here, "number").
Regex regexComplete = new Regex(@"MyNumberMarker(?<number>\d+)\s+Endmarker");
Now, you do not need regexOnlyNumber.
Then, you can add a non-regex condition as in the other answer. Maybe this will be enough: Contains is executed first, and thanks to short-circuit evaluation the regex is not run at all when the marker is absent (IndexOf with StringComparison.Ordinal looks preferable anyway):
int indexmyNumber = eintraegeListe.FindIndex(5, 10000, x => x.Contains("MyNumberMarker") && regexComplete.IsMatch(x));
And then:
if (indexmyNumber >= 0)
{
int myNumber = 0;
string myNumberString = regexComplete.Match(eintraegeListe[indexmyNumber]).Groups["number"].Value;
if (Int32.TryParse(myNumberString, out myNumber))
{
return myNumber;
}
}
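Putting both answers together, a minimal sketch might look like the following. The helper name FindNumber, the class name, and the nullable return value are assumptions made for this illustration, not part of the original code. Unlike FindIndex(5, 10000, ...), which throws when the range runs past the end of the list, the sketch clamps the range to the list length.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MarkerExtractor
{
    private const string Marker = "MyNumberMarker";

    // Single regex with a named group, so no second regex or Replace calls are needed.
    private static readonly Regex MarkerRegex =
        new Regex(@"MyNumberMarker(?<number>\d+)\s+Endmarker", RegexOptions.Compiled);

    // Hypothetical helper: first number found between the markers in the given slice,
    // or null if no line in the slice matches.
    public static int? FindNumber(List<string> entries, int start, int count)
    {
        int end = Math.Min(entries.Count, start + count);
        for (int i = start; i < end; i++)
        {
            string line = entries[i];
            // Cheap ordinal scan first: most lines do not contain the marker.
            int pos = line.IndexOf(Marker, StringComparison.Ordinal);
            if (pos < 0)
            {
                continue;
            }
            // Only now run the regex, starting at the marker position.
            Match match = MarkerRegex.Match(line, pos);
            int value;
            if (match.Success && int.TryParse(match.Groups["number"].Value, out value))
            {
                return value;
            }
        }
        return null;
    }

    public static void Main()
    {
        var entries = new List<string>
        {
            "This is a test", "22.09.2015 01:00:00", "Until 22.09.2015 03:00:00",
            "................................", "................................",
            "........ TESTDATA", "...............................",
            "................................", "MyNumberMarker519 Endmarker",
            "This is a small", "Slice of Test data with", "520 - 1 as data."
        };
        Console.WriteLine(FindNumber(entries, 5, 10000)); // 519
    }
}
```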

LINQ Query Determine Input is in List Boundaries?

I have a List of longs from a DB query. The total number in the List is always an even number, but the quantity of items can be in the hundreds.
List item [0] is the lower boundary of a "good range", item [1] is the upper boundary of that range. A numeric range between item [1] and item [2] is considered "a bad range".
Sample:
var seekset = new SortedList();
var skd= 500;
while( skd< 1000000 )
{
seekset.Add(skd, 0);
skd = skd+ 100;
}
If an input number is compared to the List items, if the input number is between 500-600 or 700-800 it is considered "good", but if it is between 600-700 it is considered "bad".
Using the above sample, can anyone comment on the right/fast way to determine if the number 655 is a "bad" number, ie not within any good range boundary (C#, .NET 4.5)?
If a SortedList is not the proper container for this (eg it needs to be an array), I have no problem changing, the object is static (lower case "s") once it is populated but can be destroyed/repopulated by other threads at any time.

The following works, assuming the list is already sorted and both of each pair of limits are treated as "good" values:
public static bool IsGood<T>(List<T> list, T value)
{
int index = list.BinarySearch(value);
return index >= 0 || index % 2 == 0;
}
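The one-liner relies on the parity of the negative return value, which can be hard to read. The sketch below spells out the same logic for a List<long>; the class and variable names are made up for illustration. When the value is not found, ~index is the insertion point, and an odd insertion point means the value sits after a lower boundary and before its upper boundary. An even negative index corresponds exactly to an odd insertion point, which is what the original expression tests.

```csharp
using System;
using System.Collections.Generic;

public static class RangeCheck
{
    // Assumes sortedBounds is sorted ascending and holds (lower, upper) pairs back to back.
    public static bool IsGood(List<long> sortedBounds, long value)
    {
        int index = sortedBounds.BinarySearch(value);
        if (index >= 0)
        {
            return true; // exactly on a boundary, which counts as good here
        }
        int insertionPoint = ~index; // index of the first boundary larger than value
        return insertionPoint % 2 == 1;
    }

    public static void Main()
    {
        var bounds = new List<long> { 500, 600, 700, 800 };
        Console.WriteLine(IsGood(bounds, 550)); // True
        Console.WriteLine(IsGood(bounds, 655)); // False
        Console.WriteLine(IsGood(bounds, 600)); // True
        Console.WriteLine(IsGood(bounds, 400)); // False
        Console.WriteLine(IsGood(bounds, 900)); // False
    }
}
```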

If you only have a few hundred items then it's really not that bad. You can just use a regular List and do a linear search. Find the index of the first item that is not smaller than the value: if that index is even then it's no good, if it's odd then it's good:
var index = data.Select((n, i) => new { n, i })
.SkipWhile(item => item.n < someValue)
.First().i;
bool isValid = index % 2 == 1;
Note that First() throws when someValue is larger than every item, so that case needs its own check. If you have enough items that a linear search isn't desirable then you can use a BinarySearch to find the next largest item.
var searchValue = data.BinarySearch(someValue);
if (searchValue < 0)
searchValue = ~searchValue;
bool isValid = searchValue % 2 == 1;

I am thinking that LINQ may not be best suited for this problem because IEnumerable forgets about item[0] when it is ready to process item[1].
Yes, this is freshman CS, but the fastest in this case may be just
// untested code; assumes seekset is an index-addressable, sorted list of boundaries (e.g. List<long>)
Boolean found = false;
for(int i=0; i<seekset.Count; i+=2)
{
if (valueOfInterest >= seekset[i] &&
valueOfInterest <= seekset[i+1])
{
found = true;
break; // or return;
}
}
I apologize for not directly answering your question about "Best approach in Linq", but I sense that you are really asking about best approach for performance.

Comparing arrays and returning a similarity figure C#

I am trying to return a number that represents the similarity between two arrays.
I.e :
Array1: {Katy, Jenny, Sarah, Ben, Jill, Tina}
Array2: {Katy, John, Sam, Ben, Jill, Linda}
I want to return the number 3 because three comparisons are correct. Is this
possible? I can't think of any functions that will do this for me.

This is how you can count the number of items that are equal at matching indices.
var c = arr1.Where((x, i) => x.Equals(arr2[i])).Count();
Note that you might want to make sure that you don't try to access arr2 at an index that is out of range:
var c = arr1.Take(arr2.Length).Where((x, i) => x.Equals(arr2[i])).Count();
If you don't care about index positions, you should use nemesv's solution.

There are many ways to do this. Since others have already specified a few ways, I will try to post a different way of doing the same.
If you consider matching based on index, you can do something like this using Zip
var cnt = 0;
Array1.Zip(Array2,(a,b)=>{
if(a.Equals(b)) ++cnt;
return string.Empty; //we dont need this
}).Count(); // use tolist or count to force evaluation
If you don't care about ordering and are just concerned about matching, you can use Intersect
Array1.Intersect(Array2).Count()

The way I would approach this problem is to take each value in the first array and compare it with every value in the second array. If they match, then increase a counter, and that will tell you there are three comparisons that match.

This works for me:
var array1 = new string[] {"Katy", "Jenny", "Sarah", "Ben", "Jill", "Tina"};
var array2 = new string[] {"Katy", "John", "Sam", "Ben", "Jill", "Linda"};
var similarity = (array1.Length + array2.Length) - array1.Union(array2).Count();
Edit: Oh just saw you want them to be in the same position.

You're saying "According to index", assuming you mean that if "John" is on position 1 in the first list, and on position 2 on the second list => no match.
In that case:
int maxItems = Math.Min(arr1.Length, arr2.Length);
int matchCount = 0;
for(int i = 0; i < maxItems; i++)
{
if(object.Equals(arr1[i], arr2[i]))
matchCount++;
}

I'd do it like this:
int count = array1.Zip(array2, (a, b) => a.Equals(b)).Count(b => b);
The Zip part returns an IEnumerable<bool> and the Count part counts how many times true occurs in that sequence.
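For completeness, here is that approach as a small runnable sketch using the arrays from the question. The method name CountPositionalMatches is an assumption for illustration, and string.Equals is used so that null entries do not throw. Zip stops at the end of the shorter array, so no explicit length check is needed.

```csharp
using System;
using System.Linq;

public static class ArraySimilarity
{
    // Counts positions at which both arrays hold the same string.
    public static int CountPositionalMatches(string[] first, string[] second)
    {
        return first.Zip(second, (a, b) => string.Equals(a, b))
                    .Count(isMatch => isMatch);
    }

    public static void Main()
    {
        var array1 = new[] { "Katy", "Jenny", "Sarah", "Ben", "Jill", "Tina" };
        var array2 = new[] { "Katy", "John", "Sam", "Ben", "Jill", "Linda" };
        Console.WriteLine(CountPositionalMatches(array1, array2)); // 3
    }
}
```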

I have a sorted list of key/value pairs, and want to find the values adjacent to a new key

I have a list of key/value pairs (probably will be using a SortedList) and I won't be adding new values.
Instead I will be using new keys to get bounding values. For example if I have the following key/value pairs:
(0,100) (6, 200), (9, 150), (15, 100), (20, 300)
and I have the new key of 7, I want it to return 200 and 150, because 7 is between 6 and 9.
If I give 15 I want it to return 100 and 100 (because 15 is exactly 15). I want something like a binary search.
Thanks

You can do this with List<T>.BinarySearch:
var keys = new List<int>(sortedList.Keys);
int index = keys.BinarySearch(target);
int lower;
int upper;
if (index >= 0) {
lower = upper = index;
}
else {
index = ~index;
upper = index < keys.Count ? index : index - 1;
lower = index == 0 ? index : index - 1;
}
Console.WriteLine("{0} => {1}, {2}",
target, sortedList[keys[lower]], sortedList[keys[upper]]);
You have to use the return value of List<T>.BinarySearch to get to the boundary values. From msdn, its return value is:
"The zero-based index of item in the sorted List<T>, if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count."
Also, for elements that fall below the first or beyond the last, this code "returns" the first and the last twice, respectively. This might not be what you want, but it's up to you to define your boundary conditions. Another one is if the collection is empty, which I didn't address.
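One way to handle the empty-collection case mentioned above is to wrap the lookup in a Try-style method. The sketch below is an illustration under assumptions: the name TryGetBounds is made up, and keys and values are held in parallel arrays sorted by key rather than in a SortedList. It keeps the same clamping behaviour for keys below the first or beyond the last element.

```csharp
using System;

public static class NeighbourLookup
{
    // Hypothetical helper. keys must be sorted ascending; values[i] belongs to keys[i].
    // Returns false only when there is nothing to search.
    public static bool TryGetBounds(int[] keys, int[] values, int target,
                                    out int lower, out int upper)
    {
        lower = 0;
        upper = 0;
        if (keys.Length == 0)
        {
            return false;
        }

        int index = Array.BinarySearch(keys, target);
        int lowerIndex;
        int upperIndex;
        if (index >= 0)
        {
            // Exact hit: both bounds are the matching entry.
            lowerIndex = upperIndex = index;
        }
        else
        {
            int next = ~index; // index of the first key larger than target
            upperIndex = Math.Min(next, keys.Length - 1);
            lowerIndex = Math.Max(next - 1, 0);
        }

        lower = values[lowerIndex];
        upper = values[upperIndex];
        return true;
    }

    public static void Main()
    {
        int[] keys = { 0, 6, 9, 15, 20 };
        int[] values = { 100, 200, 150, 100, 300 };
        int lower, upper;

        TryGetBounds(keys, values, 7, out lower, out upper);
        Console.WriteLine("{0}, {1}", lower, upper); // 200, 150

        TryGetBounds(keys, values, 15, out lower, out upper);
        Console.WriteLine("{0}, {1}", lower, upper); // 100, 100

        TryGetBounds(keys, values, 25, out lower, out upper);
        Console.WriteLine("{0}, {1}", lower, upper); // 300, 300
    }
}
```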

Yep, you want exactly binary search -- use the List<T>.BinarySearch method, specifically the overload taking an IComparer<T> second argument (and implement that interface with a simple aux class that just compares keys).
