Faster way of finding all indices of specific string in an array

The code below finds the first string that occurs only once in an array, but it isn't very fast. Does anybody know a faster and more efficient way to find unique strings in an array?
using System;
using System.Collections.Generic;
using System.Linq;
public static class EM
{
// Extension method, using Linq to find indices.
public static int[] FindAllIndicesOf<T>(this IEnumerable<T> values, T val)
{
return values.Select((b,i) => Equals(b, val) ? i : -1).Where(i => i != -1).ToArray();
}
}
public class Program
{
public static string FindFirstUniqueName(string[] names)
{
var results = new List<string>();
for (var i = 0; i < names.Length; i++)
{
var matchedIndices = names.FindAllIndicesOf(names[i]);
if (matchedIndices.Length == 1)
{
results.Add(names[matchedIndices[0]]);
break;
}
}
return results.Count > 0 ? results[0] : null;
}
public static void Main(string[] args)
{
Console.WriteLine("Found: " + FindFirstUniqueName(new[]
{
"James",
"Bill",
"Helen",
"Bill",
"Helen",
"Giles",
"James",
}
));
}
}

Your solution has O(n^2) complexity. You can improve it to O(n) by using a hash map.
Consider a hash map that stores, for each name, the number of times it occurs in your original list. Now all you have to do is check every key in the dictionary (aka hash map) and return those whose count equals 1. Note that checking every key in this dictionary takes at most O(n), because it cannot hold more than n names.
To build this dictionary in C#, you can do the following:
List<string> stuff = new List<string>();
var groups = stuff.GroupBy(s => s).Select(
s => new { Stuff = s.Key, Count = s.Count() });
var dictionary = groups.ToDictionary(g => g.Stuff, g => g.Count);
Taken from here or as suggested by juharr
O(n) is the minimum required, as you have to go over all names at least once.
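The GroupBy dictionary above gives the counts, but the question asks for the first unique name, so the original order still matters. A minimal sketch of one way to do that in two passes, assuming the array contains no null entries (the class name UniqueNames is hypothetical, not from the question):

```csharp
using System;
using System.Collections.Generic;

public static class UniqueNames
{
    // Returns the first name that occurs exactly once, or null if there is none.
    public static string FindFirstUniqueName(string[] names)
    {
        // First pass: count how often each name occurs, O(n) on average.
        var counts = new Dictionary<string, int>();
        foreach (var name in names)
        {
            counts.TryGetValue(name, out var seen); // seen is 0 for a name not yet counted
            counts[name] = seen + 1;
        }

        // Second pass: walk the array in its original order so that "first" is well defined.
        foreach (var name in names)
        {
            if (counts[name] == 1)
                return name;
        }
        return null;
    }
}
```

Called with the names from the question, UniqueNames.FindFirstUniqueName(names) returns "Giles", the only name that appears once.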

Related

Group list of strings with common prefixes

Suppose I have a list of strings [city01, city01002, state02, state03, city04, statebg, countryqw, countrypo]
How do I group them in a dictionary of <string, List<string>> like
city - [city01, city04, city01002]
state - [state02, state03, statebg]
country - [countryqw, countrypo]
If not code, can anyone please help with how to approach or proceed?

As shown in other answers, you can use the GroupBy method from LINQ to create this grouping based on any condition you want. Before you can group your strings, you need to know the condition that decides which group a string belongs to. It could be that the string starts with one of a set of predefined prefixes, that strings are grouped by what comes before the first digit, or any other condition you can describe in code. In my code example, GroupBy calls another method for every string in your list. In that method you can place whatever logic you need, returning the key under which the given string should be grouped. You can test this example online with dotnetfiddle: https://dotnetfiddle.net/UHNXvZ
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
List<string> ungroupedList = new List<string>() {"city01", "city01002", "state02", "state03", "city04", "statebg", "countryqw", "countrypo", "theFirstTown"};
var groupedStrings = ungroupedList.GroupBy(x => groupingCondition(x));
foreach (var a in groupedStrings) {
Console.WriteLine("key: " + a.Key);
foreach (var b in a) {
Console.WriteLine("value: " + b);
}
}
}
public static string groupingCondition(String s) {
if(s.StartsWith("city") || s.EndsWith("Town"))
return "city";
if(s.StartsWith("country"))
return "country";
if(s.StartsWith("state"))
return "state";
return "unknown";
}
}

You can use LINQ:
var input = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
var output = input.GroupBy(c => string.Join("", c.TakeWhile(d => !char.IsDigit(d))
.Take(4))).ToDictionary(c => c.Key, c => c.ToList());

I suppose you have a list of prefixes that you are searching for in the list:
var list = new List<string>()
{ "city01", "city01002", "state02",
"state03", "city04", "statebg", "countryqw", "countrypo" };
var tofound = new List<string>() { "city", "state", "country" }; // prefixes to look for
var result = new Dictionary<string, List<string>>();
foreach (var f in tofound)
{
result.Add(f, list.FindAll(x => x.StartsWith(f)));
}
The result holds the dictionary you want. If no values are found for a prefix, the value for that key is an empty list, because FindAll returns an empty list rather than null.

Warning: This answer has a combinatorial expansion and will fail if your original string set is large. For 65 words I gave up after running for a couple of hours.
Using some IEnumerable extension methods to find Distinct sets and to find all possible combinations of sets, you can generate a group of prefixes and then group the original strings by these.
public static class IEnumerableExt {
public static bool IsDistinct<T>(this IEnumerable<T> items) {
var hs = new HashSet<T>();
foreach (var item in items)
if (!hs.Add(item))
return false;
return true;
}
public static bool IsEmpty<T>(this IEnumerable<T> items) => !items.Any();
public static IEnumerable<IEnumerable<T>> AllCombinations<T>(this IEnumerable<T> start) {
IEnumerable<IEnumerable<T>> HelperCombinations(IEnumerable<T> items) {
if (items.IsEmpty())
yield return items;
else {
var head = items.First();
var tail = items.Skip(1);
foreach (var sequence in HelperCombinations(tail)) {
yield return sequence; // Without first
yield return sequence.Prepend(head);
}
}
}
return HelperCombinations(start).Skip(1); // don't return the empty set
}
}
var keys = Enumerable.Range(0, src.Count - 1)
.SelectMany(n1 => Enumerable.Range(n1 + 1, src.Count - n1 - 1).Select(n2 => new { n1, n2 }))
.Select(n1n2 => new { s1 = src[n1n2.n1], s2 = src[n1n2.n2], Dist = src[n1n2.n1].TakeWhile((ch, n) => n < src[n1n2.n2].Length && ch == src[n1n2.n2][n]).Count() })
.SelectMany(s1s2d => new[] { new { s = s1s2d.s1, s1s2d.Dist }, new { s = s1s2d.s2, s1s2d.Dist } })
.Where(sd => sd.Dist > 0)
.GroupBy(sd => sd.s.Substring(0, sd.Dist))
.Select(sdg => sdg.Distinct())
.AllCombinations()
.Where(sdgc => sdgc.Sum(sdg => sdg.Count()) == src.Count)
.Where(sdgc => sdgc.SelectMany(sdg => sdg.Select(sd => sd.s)).IsDistinct())
.OrderByDescending(sdgc => sdgc.Sum(sdg => sdg.First().Dist)).First()
.Select(sdg => sdg.First())
.Select(sd => sd.s.Substring(0, sd.Dist))
.ToList();
var groups = src.GroupBy(s => keys.First(k => s.StartsWith(k)));

C#: How to split a string with a changing prefix

Hello, I looked at several posts about this topic, but no answer could help me.
I extract data about various machines which look like this:
"time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C"
M1 is the prefix which specifies which machine. A,B,C are attributes of this machine like temperature, pressure, etc.
The output should then look like this:
{{"time", "M1.A", "M1.B", "M1.C"}, {"time", "M2.A",....}}
I know that I could possibly split at "," and then create the list but I was wondering if there is another way to detect if the prefix changed.

Regex.Matches(myList, @"M(?<digit>\d+)\..") //find all M1.A etc
.Cast<Match>() //convert the resulting list to an enumerable of Match
.GroupBy(m => m.Groups["digit"].Value) //find the groups with the same digits
.Select(g => new[] { "time" }.Union(g.Select(m => m.Value)).ToArray());
//combine the groups into arrays beginning with "time"

You mention "the output should then look like this...", but then you mention a list, so I'm going to assume that you mean to make the original string into a list of lists of strings.
List<string> split = new List<string>(s.Split(','));
string first = split[0];
split.RemoveAt(0);
List<List<string>> result = new List<List<string>>();
foreach (var dist in split.Select(o => o.Split('.')[0]).Distinct())
{
List<string> temp = new List<string> {first};
temp.AddRange(split.Where(o => o.StartsWith(dist)));
result.Add(temp);
}
This does the original split, removes the first value (you didn't really specify that, I assumed), then loops around each machine. The machines are created by splitting each value further by '.' and making a distinct list. It then selects all values in the list that start with the machine and adds them with the first value to the resulting list.
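One thing to watch in the loop above: StartsWith compares prefixes as plain text, so a machine named M1 would also pick up the columns of M10. A minimal sketch of the same idea using GroupBy on the exact text before the dot, assuming every column after the first has the form machine.attribute (the names MachineColumns and SplitByMachine are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MachineColumns
{
    // Splits a header such as "time, M1.A, M1.B, M2.A" into one list per machine,
    // each list starting with the first column ("time").
    public static List<List<string>> SplitByMachine(string header)
    {
        // Trim each entry so the spaces after the commas do not end up in the names.
        var columns = header.Split(',').Select(c => c.Trim()).ToList();
        var first = columns[0];

        return columns
            .Skip(1)
            .GroupBy(c => c.Split('.')[0])                    // key is the text before the first '.'
            .Select(g => new[] { first }.Concat(g).ToList())  // prepend the shared first column
            .ToList();
    }
}
```

GroupBy keeps groups in the order their keys first appear and keeps elements in source order, so the machines come out in the order they occur in the input.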

Using Regex, I created a dictionary:
string input = "time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C";
string pattern1 = @"^(?'name'[^,]*),(?'machines'.*)";
Match match1 = Regex.Match(input, pattern1);
string name = match1.Groups["name"].Value;
string machines = match1.Groups["machines"].Value.Trim();
// The dot between machine and attribute is escaped so it matches a literal '.'
string pattern2 = @"\s*(?'machine'[^.]*)\.(?'attribute'\w+)(,|$)";
MatchCollection matches = Regex.Matches(machines, pattern2);
Dictionary<string, List<string>> dict = matches.Cast<Match>()
.GroupBy(x => x.Groups["machine"].Value, y => y.Groups["attribute"].Value)
.ToDictionary(x => x.Key, y => y.ToList());

Here is a quick example for you. I think it is better to parse the input yourself and keep each machine/attribute pair in its own structure.
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp4 {
class Program {
static void Main(string[] args) {
string inputString = "time, M1.A, M1.B, M1.C, M2.A, M2.B, M2.C, M3.A, M3.B, M3.C";
string[] attrList = inputString.Split(',');
// 1. Get all machines with attributes
List<MachineAttribute> MachineAttributeList = new List<MachineAttribute>();
for (int i = 1; i < attrList.Length; i++) {
MachineAttributeList.Add(new MachineAttribute(attrList[i]));
}
// 2. For each machine create
foreach (var machine in MachineAttributeList.Select(x=>x.Machine).Distinct()) {
Console.Write(attrList[0]);
foreach (var attribute in MachineAttributeList.Where(x=>x.Machine == machine)) {
Console.Write(attribute + ",");
}
Console.WriteLine();
}
Console.ReadLine();
}
}
public class MachineAttribute {
public string Machine { get; }
public string Attribute { get; }
public MachineAttribute(string inputData) {
var array = inputData.Split('.');
if (array.Length > 0) Machine = array[0];
if (array.Length > 1) Attribute = array[1];
}
public override string ToString() {
return Machine + "." + Attribute;
}
}
}

Generate all possible coverage options

Suppose I have 2 lists: one containing strings, one containing integers, they differ in length. The application I am building will use these lists to generate combinations of vehicle and coverage areas. Strings represent area names and ints represent vehicle ID's.
My goal is to generate a list of all possible unique combinations used for further investigation. One vehicle can service many areas, but one area can't be served by multiple vehicles. Every area must receive service, and every vehicle must be used.
So to conclude the constraints:
Every area is used only once
Every vehicle is used at least once
No area can be left out.
No vehicle can be left out
Here is an example:
public class record {
    public string areaId { get; set; }
    public int vehicleId { get; set; }
}
List<string> areas = new List<string>{ "A","B","C","D"};
List<int> vehicles = new List<int>{ 1,2};
List<List<record>> uniqueCombinationLists = retrieveUniqueCombinations(areas,vehicles);
I just have no clue how to make the retrieveUniqueCombinations function. Maybe I am just looking wrong or thinking too hard. I am stuck thinking about massive loops and other brute force approaches. An explanation of a better approach would be much appreciated.
The results should resemble something like this, I think this contains all possibilities for this example.
A1;B1;C1;D2
A1;B1;C2;D1
A1;B2;C1;D1
A2;B1;C1;D1
A2;B2;C2;D1
A2;B2;C1;D2
A2;B1;C2;D2
A1;B2;C2;D2
A2;B1;C1;D2
A1;B2;C2;D1
A2;B2;C1;D1
A1;B1;C2;D2
A2;B1;C2;D1
A1;B2;C1;D2

Here's something I threw together that may or may not work. Borrowing heavily from dtb's work on this answer.
Basically, I generate them all, then remove the ones that don't meet the requirements.
List<string> areas = new List<string> { "A", "B", "C", "D" };
List<int> vehicles = new List<int> { 1, 2 };
var result = retrieveUniqueCombinations(areas, vehicles);
result.ToList().ForEach((recordList) => {
recordList.ToList().ForEach((record) =>
Console.Write("{0}{1};", record.areaId, record.vehicleId));
Console.WriteLine();
});
public IEnumerable<IEnumerable<record>> retrieveUniqueCombinations(IEnumerable<string> areas, IEnumerable<int> vehicles)
{
var items = from a in areas
from v in vehicles
select new record { areaId = a, vehicleId = v };
var result = items.GroupBy(i => i.areaId).CartesianProduct().ToList();
result.RemoveAll((records) =>
records.All(record =>
record.vehicleId == records.First().vehicleId));
return result;
}
public class record
{
public string areaId { get; set; }
public int vehicleId { get; set; }
}
static class Extensions
{
public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] { item }));
}
}
This produces the following:
A1;B1;C1;D2;
A1;B1;C2;D1;
A1;B1;C2;D2;
A1;B2;C1;D1;
A1;B2;C1;D2;
A1;B2;C2;D1;
A1;B2;C2;D2;
A2;B1;C1;D1;
A2;B1;C1;D2;
A2;B1;C2;D1;
A2;B1;C2;D2;
A2;B2;C1;D1;
A2;B2;C1;D2;
A2;B2;C2;D1;
Note that these are not in the same order as yours, but I'll leave the verification to you. Also, there's likely a better way of doing this (for instance, by putting the logic in the RemoveAll step in the CartesianProduct function), but hey, you get what you pay for ;).
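The RemoveAll step above only discards assignments in which every area gets the same vehicle. That matches "every vehicle is used" for exactly two vehicles, but not for three or more. A minimal sketch of a variant that checks the constraint directly, assuming the vehicle IDs are distinct (the names Coverage, Assignments and Build are hypothetical, and KeyValuePair stands in for the record class):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Coverage
{
    // Returns every assignment of exactly one vehicle per area
    // in which each vehicle is used at least once.
    public static List<List<KeyValuePair<string, int>>> Assignments(
        IList<string> areas, IList<int> vehicles)
    {
        return Build(areas, vehicles, 0, new List<KeyValuePair<string, int>>())
            // Keep only assignments that mention every vehicle.
            .Where(a => a.Select(p => p.Value).Distinct().Count() == vehicles.Count)
            .ToList();
    }

    // Depth-first enumeration: choose a vehicle for areas[index], then recurse.
    private static IEnumerable<List<KeyValuePair<string, int>>> Build(
        IList<string> areas, IList<int> vehicles, int index,
        List<KeyValuePair<string, int>> current)
    {
        if (index == areas.Count)
        {
            // Copy, because 'current' is reused while backtracking.
            yield return new List<KeyValuePair<string, int>>(current);
            yield break;
        }
        foreach (var vehicle in vehicles)
        {
            current.Add(new KeyValuePair<string, int>(areas[index], vehicle));
            foreach (var complete in Build(areas, vehicles, index + 1, current))
                yield return complete;
            current.RemoveAt(current.Count - 1);
        }
    }
}
```

For the four areas and two vehicles in the question this yields 14 assignments (16 in total minus the 2 that use a single vehicle). It still generates all combinations before filtering, so it grows as vehicles^areas.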

So let's use some helper methods to convert numbers to IEnumerable<int> digit sequences in different bases. It may be more efficient to use List<>, but since we are trying to use LINQ:
public static IEnumerable<int> LeadingZeros(this IEnumerable<int> digits, int minLength) {
var dc = digits.Count();
if (dc < minLength) {
for (int j1 = 0; j1 < minLength - dc; ++j1)
yield return 0;
}
foreach (var j2 in digits)
yield return j2;
}
public static IEnumerable<int> ToBase(this int num, int numBase) {
IEnumerable<int> ToBaseRev(int n, int nb) {
do {
yield return n % nb;
n /= nb;
} while (n > 0);
}
foreach (var n in ToBaseRev(num, numBase).Reverse())
yield return n;
}
Now we can create an enumeration that lists all the possible answers (and a few extras). I converted the Lists to Arrays for indexing efficiency.
var areas = new List<string> { "A", "B", "C", "D" };
var vehicles = new List<int> { 1, 2 };
var areasArray = areas.ToArray();
var vehiclesArray = vehicles.ToArray();
var numVehicles = vehiclesArray.Length;
var numAreas = areasArray.Length;
var NumberOfCombos = Convert.ToInt32(Math.Pow(numVehicles, numAreas));
var ansMap = Enumerable.Range(0, NumberOfCombos).Select(n => new { n, nd = n.ToBase(numVehicles).LeadingZeros(numAreas)});
Given the enumeration of the possible combinations, we can convert into areas and vehicles and exclude the ones that don't use all vehicles.
var ans = ansMap.Select(nnd => nnd.nd).Select(m => m.Select((d, i) => new { a = areasArray[i], v = vehiclesArray[d] })).Where(avc => avc.Select(av => av.v).Distinct().Count() == numVehicles);

Easy way to select more than one field using LINQ

Take a look at this sample object,
public class Demo
{
public string DisplayName { get; set; }
public int Code1 { get; set; }
public int Code2 { get; set; }
...
}
and let's say I want to put all codes (Code1, Code2) in one list (IEnumerable)... one way is this one:
var codes = demoList.Select(item => item.Code1).ToList();
codes.AddRange(demoList.Select(item => item.Code2));
//var uniqueCodes = codes.Distinct(); // optional
I know this is neither a nice nor an optimal solution, so I am curious to know what would be a better approach (best practice)?

How about with SelectMany:
var codes = demoList.SelectMany(item => new[] { item.Code1, item.Code2 });
By the way, the idiomatic way of doing a concatenation in LINQ is with Concat:
var codes = demoList.Select(item => item.Code1)
.Concat(demoList.Select(item => item.Code2));
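The two snippets above return the same values but in a different order, which matters if position is significant. A small sketch to illustrate, using the Demo class from the question (the class CodeOrdering and its method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Demo
{
    public string DisplayName { get; set; }
    public int Code1 { get; set; }
    public int Code2 { get; set; }
}

public static class CodeOrdering
{
    // SelectMany emits both codes of an item before moving to the next item.
    public static List<int> Interleaved(IEnumerable<Demo> items) =>
        items.SelectMany(item => new[] { item.Code1, item.Code2 }).ToList();

    // Concat emits every Code1 first, then every Code2; it enumerates 'items' twice.
    public static List<int> Sequential(IEnumerable<Demo> items) =>
        items.Select(item => item.Code1)
             .Concat(items.Select(item => item.Code2))
             .ToList();
}
```

For two items with codes (1, 2) and (3, 4), Interleaved gives 1, 2, 3, 4 while Sequential gives 1, 3, 2, 4.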

LINQ is not a silver bullet for everything.
For your purpose I'd propose the following:
var codes = new List<int>(demoList.Count * 2);
foreach(var demo in demoList)
{
codes.Add(demo.Code1);
codes.Add(demo.Code2);
}
BENCHMARK
I did a benchmark iterating a list of 1 million and 1 thousand instances with my solution and Ani's
Amount: 1 million
Mine : 2ms
Ani's: 20ms
Amount 1000 items
Mine : 1ms
Ani's: 12ms
the sample code
List<MyClass> list = new List<MyClass>(1000);
for (int i = 0; i < 100000; i++)
{
list.Add(new MyClass
{
Code1 = i,
Code2 = i * 2,
});
}
System.Diagnostics.Stopwatch timer1 = System.Diagnostics.Stopwatch.StartNew();
var resultLinq = list.SelectMany(item => new[] { item.Code1, item.Code2 }).ToList();
Console.WriteLine("Ani's: {0}", timer1.ElapsedMilliseconds);
System.Diagnostics.Stopwatch timer2 = System.Diagnostics.Stopwatch.StartNew();
var codes = new List<int>(list.Count * 2);
foreach (var item in list)
{
codes.Add(item.Code1);
codes.Add(item.Code2);
}
Console.WriteLine("Mine : {0}", timer2.ElapsedMilliseconds);

// this won't return duplicates so no need to use Distinct.
var codes = demoList.Select(i=> i.Code1)
.Union(demoList.Select(i=>i.Code2));
Edited just for completeness (see @Ani's answer) after some comments:
// Optionally use .Distinct()
var codes = demoList.Select(i=>i.Code1)
.Concat(demoList.Select(i=>i.Code2))
.Distinct();

The code you have written is fine; I am just giving you another option.
Try this:
var output = Enumerable.Concat(demoList.Select(item => item.Code1).ToList(), demoList.Select(item => item.Code2).ToList()).ToList();

Luis's answer is good enough for me, but I refactored it into an extension method that works for any number of fields. The fastest result is still Luis's answer (example with 100000 records):
Ani's: 21
Luis: 4
Jaider's: 15
Here is my extension method:
public static IEnumerable<T> SelectExt<R, T>(this IEnumerable<R> list, params Func<R, T>[] GetValueList)
{
var result = new List<T>(list.Count() * GetValueList.Length);
foreach (var item in list)
{
foreach (var getValue in GetValueList)
{
var value = getValue(item);
result.Add(value);
}
}
return result;
}
The usage would be:
var codes = demoList.SelectExt(item => item.Code1, item => item.Code2).ToList();

How to merge multi sets in LinQ

I have 3 sets in Linq, like this:
struct Index
{
string code;
int indexValue;
}
List<Index> reviews
List<Index> products
List<Index> pages
These lists hold different codes.
I want to merge these sets as following:
Take the first in reviews
Take the first in products
Take the first in pages
Take the second in reviews
... and so on. Note that these lists are not the same size.
How can I do this in Linq?
EDIT: Wait, is there a chance to do this without .NET 4.0?
Thank you very much

You could use Zip to do your bidding.
var trios = reviews
.Zip(products, (r, p) => new { Review = r, Product = p })
.Zip(pages, (rp, p) => new { rp.Review, rp.Product, Page = p });
Edit:
For .NET 3.5, it's possible to implement Zip quite easily, but there are a few gotchas. Jon Skeet has a great post series on how to implement LINQ to Objects operators (for educational purposes), including a post on Zip. The source code of the whole series, edulinq, can be found on Google Code.
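As a rough illustration of what such an implementation can look like, here is a minimal sketch. It is not the edulinq code; the method is named ZipCompat (a hypothetical name) so it does not clash with Enumerable.Zip on newer frameworks. The gotcha shown here is argument validation: an iterator block defers all of its code until enumeration starts, so the null checks live in a separate non-iterator method.

```csharp
using System;
using System.Collections.Generic;

public static class ZipCompatExtensions
{
    public static IEnumerable<TResult> ZipCompat<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        // Validate eagerly, at call time, before any iteration happens.
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");
        return ZipIterator(first, second, resultSelector);
    }

    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        // Dispose both enumerators when iteration ends or is abandoned.
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            // Stops as soon as either sequence runs out, like Enumerable.Zip.
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
        }
    }
}
```

With this in place, reviews.ZipCompat(products, (r, p) => ...) can be chained the same way as Zip in the snippet above.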

The simple answer
To merge them into a common list without any common data, using the order in which they appear, you can use the Zip method:
var rows = reviews
.Zip(products, (r, p) => new { Review = r, Product = p })
.Zip(pages, (rp, page) => new { rp.Review, rp.Product, Page = page });
The problem with this solution is that the lists must be identical length, or your result will be chopped to the shortest list of those original three.
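If what you want is a single flat sequence (first review, first product, first page, second review, and so on) rather than rows, the truncation can be avoided without Zip by taking one item from each list in turn and dropping a list once it runs out. A minimal sketch under that reading of the question (the names RoundRobinMerge and RoundRobin are hypothetical); it also works on .NET 3.5:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundRobinMerge
{
    // Yields one element from each source in turn; exhausted sources are skipped.
    public static IEnumerable<T> RoundRobin<T>(params IEnumerable<T>[] sources)
    {
        var enumerators = sources.Select(s => s.GetEnumerator()).ToList();
        try
        {
            while (enumerators.Count > 0)
            {
                for (int i = 0; i < enumerators.Count; )
                {
                    if (enumerators[i].MoveNext())
                    {
                        yield return enumerators[i].Current;
                        i++;
                    }
                    else
                    {
                        // This source is finished: release it and stop visiting it.
                        enumerators[i].Dispose();
                        enumerators.RemoveAt(i);
                    }
                }
            }
        }
        finally
        {
            // Clean up any enumerators left over if the caller stops early.
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```

With the three lists from the question this would be called as RoundRobinMerge.RoundRobin(reviews, products, pages).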
Edit:
If you can't use .Net 4, check out Jon Skeet's blog posts on a clean-room implementation of Linq, and his article on Zip in particular.
If you're using .Net 2, then try his library (possibly) or try LinqBridge.
How to deal with lists of different lengths
You can pre-pad the list to the desired length. I couldn't find an existing method to do this, so I'd use an extension method:
public static class EnumerableExtensions
{
public static IEnumerable<T> Pad<T>(this IEnumerable<T> source,
int desiredCount, T padWith = default(T))
{
// Note: Not using source.Count() to avoid double-enumeration
int counter = 0;
var enumerator = source.GetEnumerator();
while(counter < desiredCount)
{
yield return enumerator.MoveNext()
? enumerator.Current
: padWith;
++counter;
}
}
}
You can use it like this:
var paddedReviews = reviews.Pad(desiredLength);
var paddedProducts = products.Pad(desiredLength,
new Product { Value2 = DateTime.Now }
);
Full compiling sample and corresponding output
using System;
using System.Collections.Generic;
using System.Linq;
class Review
{
public string Value1;
}
class Product
{
public DateTime Value2;
}
class Page
{
public int Value3;
}
public static class EnumerableExtensions
{
public static IEnumerable<T> Pad<T>(this IEnumerable<T> source,
int desiredCount, T padWith = default(T))
{
int counter = 0;
var enumerator = source.GetEnumerator();
while(counter < desiredCount)
{
yield return enumerator.MoveNext()
? enumerator.Current
: padWith;
++counter;
}
}
}
class Program
{
static void Main(string[] args)
{
var reviews = new List<Review>
{
new Review { Value1 = "123" },
new Review { Value1 = "456" },
new Review { Value1 = "789" },
};
var products = new List<Product>()
{
new Product { Value2 = DateTime.Now },
new Product { Value2 = DateTime.Now.Subtract(TimeSpan.FromSeconds(5)) },
};
var pages = new List<Page>()
{
new Page { Value3 = 123 },
};
int maxCount = Math.Max(Math.Max(reviews.Count, products.Count), pages.Count);
var rows = reviews.Pad(maxCount)
.Zip(products.Pad(maxCount), (r, p) => new { Review = r, Product = p })
.Zip(pages.Pad(maxCount), (rp, page) => new { rp.Review, rp.Product, Page = page });
foreach (var row in rows)
{
Console.WriteLine("{0} - {1} - {2}"
, row.Review != null ? row.Review.Value1 : "(null)"
, row.Product != null ? row.Product.Value2.ToString() : "(null)"
, row.Page != null ? row.Page.Value3.ToString() : "(null)"
);
}
}
}
123 - 9/7/2011 10:02:22 PM - 123
456 - 9/7/2011 10:02:17 PM - (null)
789 - (null) - (null)
On use of the Join tag
This operation isn't a logical Join. This is because you're matching on index, not on any data out of each object. Each object would have to have other data in common (besides their position in the lists) to be joined in the sense of a Join that you would find in a relational database.
