Dictionary<string, string> Value lookup performance - c#

I am working on a small project but have run into a performance roadblock.
I have a Dictionary<string, string>()
I have a string[].
Let's say my Dictionary has 50,000 entries, and my string[] has 30,000 entries.
I want to collect the Keys from my Dictionary where the value.ToCharArray().OrderBy(x => x) equals a value.ToCharArray().OrderBy(x => x) of my string[].
I have tried reducing the number of KeyValue pairs I have to look through by comparing the length of my string[] value to the values in the Dictionary, but that has not really gained me any performance.
Does anyone have any ideas how I can improve the performance of this lookup?
Thanks!
To expand the pseudocode:
var stringToLookUp = GetSomeStrings(s.ToString()).Select(x => x).OrderBy(x => x).ToArray();
var aDictionaryOfStringString = GetDictionary(Resources.stringList);
var results = new List<string>();
foreach (var theString in stringToLookUp.Where(aString=> aString.Length > 0))
{
if (theString.Length > 0)
{
var theStringClosure = theString;
var filteredKeyValuePairs = aDictionaryOfStringString.Where(w => w.Value.Length == theStringClosure.Length && !results.Contains(w.Key)).ToArray();
var foundStrings = filteredKeyValuePairs.Where(kv => kv.Value.ToCharArray().OrderBy(c => c).ToArray().SequenceEqual(theStringClosure))
.Select(kv => kv.Key)
.ToArray();
if (foundStrings.Any()) results.AddRange(foundStrings);
}
}

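I think the main problem is that you iterate over the whole dictionary for every string you look up, which is O(N*M) for N dictionary entries and M strings. It is better to build a HashSet from the sorted-character form of one collection (either the dictionary values or the array) and then iterate once over the other, checking membership. Each membership check is O(1) on average, so the whole pass needs only O(N + M) lookups, plus the cost of sorting the characters of each string.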
// some values
var dictionary = new Dictionary<string, string>();
var fields = new string[]{};
string[] modifiedFields = new string[fields.Length];
for(var i =0; i < fields.Length; i++)
{
modifiedFields[i] = new string(fields[i].ToCharArray().OrderBy(x =>x).ToArray());
}
var set = new HashSet<string>(modifiedFields);
var results = new List<string>();
foreach(var pair in dictionary)
{
string key = new string(pair.Value.ToCharArray().OrderBy(x =>x).ToArray());
if (set.Contains(key))
{
results.Add(pair.Key);
}
}
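As a self-contained illustration of the same idea, here is a minimal sketch that wraps the sorted-character key in a helper. The names AnagramLookup, SortedKey and FindKeys are made up for this example and do not come from the question; empty lookup strings are skipped, as in the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramLookup
{
    // Canonical form: the characters of the input sorted by char value.
    // Two strings are anagrams exactly when their canonical forms are equal.
    public static string SortedKey(string value)
    {
        char[] chars = value.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Returns the keys of 'source' whose value is an anagram of any candidate.
    public static List<string> FindKeys(
        IDictionary<string, string> source, IEnumerable<string> candidates)
    {
        // Built once, so each candidate is sorted a single time.
        var wanted = new HashSet<string>(
            candidates.Where(c => c.Length > 0).Select(c => SortedKey(c)));

        var results = new List<string>();
        foreach (var pair in source)
        {
            // Average O(1) membership test per dictionary entry.
            if (wanted.Contains(SortedKey(pair.Value)))
                results.Add(pair.Key);
        }
        return results;
    }
}
```

With a dictionary { "k1": "listen", "k2": "apple", "k3": "enlist" } and the candidate "silent", FindKeys returns k1 and k3.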

You can try this
var stringToLookUp = GetSomeStrings(s.ToString()).Select(x => x).OrderBy(x => x).ToArray();
var aDictionaryOfStringString = GetDictionary(Resources.stringList);
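var sortedLookups = new HashSet<string>(stringToLookUp.Select(str => new string(str.OrderBy(c => c).ToArray())));
var results = aDictionaryOfStringString.Where(kvp => sortedLookups.Contains(new string(kvp.Value.OrderBy(c => c).ToArray()))).Select(kvp => kvp.Key).ToList();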

Related

how to find members that exist in at least two lists in a list of lists

I have an array of lists:
var stringLists = new List<string>[]
{
new List<string>(){ "a", "b", "c" },
new List<string>(){ "d", "b", "c" },
new List<string>(){ "a", "d", "c" }
};
I want to extract all elements that are common in at least 2 lists. So for this example, I should get all elements ["a", "b", "c", "d"]. I know how to find elements common to all but couldn't think of any way to solve this problem.
You could use something like this:
var result = stringLists.SelectMany(l => l.Distinct())
.GroupBy(e => e)
.Where(g => g.Count() >= 2)
.Select(g => g.Key);
Just for fun some iterative solutions:
var seen = new HashSet<string>();
var current = new HashSet<string>();
var result = new HashSet<string>();
foreach (var list in stringLists)
{
foreach(var element in list)
if(current.Add(element) && !seen.Add(element))
result.Add(element);
current.Clear();
}
or:
var already_seen = new Dictionary<string, bool>();
foreach(var list in stringLists)
foreach(var element in list.Distinct())
already_seen[element] = already_seen.ContainsKey(element);
var result = already_seen.Where(kvp => kvp.Value).Select(kvp => kvp.Key);
or (inspired by Tim's answer):
int tmp;
var items = new Dictionary<string,int>();
foreach(var str in stringLists.SelectMany(l => l.Distinct()))
{
items.TryGetValue(str, out tmp);
items[str] = tmp + 1;
}
var result = items.Where(kv => kv.Value >= 2).Select(kv => kv.Key);
You could use a Dictionary<string, int>, the key is the string and the value is the count:
Dictionary<string, int> itemCounts = new Dictionary<string,int>();
for(int i = 0; i < stringLists.Length; i++)
{
List<string> list = stringLists[i];
foreach(string str in list.Distinct())
{
if(itemCounts.ContainsKey(str))
itemCounts[str] += 1;
else
itemCounts.Add(str, 1);
}
}
var result = itemCounts.Where(kv => kv.Value >= 2);
I use list.Distinct() since you only want to count occurrences in different lists.
As requested, here is an extension method which you can reuse with any type:
public static IEnumerable<T> GetItemsWhichOccurAtLeastIn<T>(this IEnumerable<IEnumerable<T>> seq, int minCount, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
Dictionary<T, int> itemCounts = new Dictionary<T, int>(comparer);
foreach (IEnumerable<T> subSeq in seq)
{
foreach (T x in subSeq.Distinct(comparer))
{
if (itemCounts.ContainsKey(x))
itemCounts[x] += 1;
else
itemCounts.Add(x, 1);
}
}
foreach(var kv in itemCounts.Where(kv => kv.Value >= minCount))
yield return kv.Key;
}
Usage is simple:
string result = String.Join(",", stringLists.GetItemsWhichOccurAtLeastIn(2)); // a,b,c,d
Follow these steps:
Create a Dictionary that maps each element to a list of list indices.
Loop over all lists.
For list number i, for each element in the list, add i to that element's index list: dictionary[element].Add(i) (if i is not already present).
Collect the elements whose index list has at least two entries.
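The steps above could be sketched roughly as follows. The class and method names (SharedElements, InAtLeastTwoLists) are hypothetical, and the sketch assumes string elements as in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SharedElements
{
    public static List<string> InAtLeastTwoLists(IList<List<string>> lists)
    {
        // element -> indices of the lists that contain it
        var positions = new Dictionary<string, List<int>>();
        for (int i = 0; i < lists.Count; i++)
        {
            foreach (string element in lists[i])
            {
                List<int> indices;
                if (!positions.TryGetValue(element, out indices))
                {
                    indices = new List<int>();
                    positions[element] = indices;
                }
                // Indices are appended in increasing order, so comparing with
                // the last one is enough to skip repeats inside the same list.
                if (indices.Count == 0 || indices[indices.Count - 1] != i)
                    indices.Add(i);
            }
        }
        // Keep the elements that were seen in two or more different lists.
        return positions.Where(p => p.Value.Count >= 2)
                        .Select(p => p.Key)
                        .ToList();
    }
}
```

Calling SharedElements.InAtLeastTwoLists(stringLists) works directly on the List<string>[] from the question, because an array implements IList<T>.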
You can use SelectMany to flatten the lists and then pick all elements which occur twice or more:
var singleList = stringLists.SelectMany(p => p);
var results = singleList.Where(p => singleList.Count(q => p == q) >= 2).Distinct();

Split string into custom List<T>

I need to split a custom string into the following format using C#.
The following string: AD=Demo,OU=WEB,OU=IT,L=MyCity,C=MyCountry, I want to split it at each comma into a
List<CustomDictionary> myList = new
List<CustomDictionary>();
Based on the text above and after the split, the myList list should contain 5 objects of type CustomDictionary.
object1.Key = AD
object1.Value = Demo
object2.Key = OU
object2.Value = WEB
object3.Key = OU
object3.Value = IT
object4.Key = L
object4.Value = MyCity
object5.Key = C
object5.Value = MyCountry
Here is the CustomDictionary class
public class CustomDictionary
{
public string Key { get; set; }
public string Value { get; set; }
public CustomDictionary(string key, string value)
{
this.Key = key;
this.Value = value;
}
}
So far I tried this:
Here I am stuck!
List<CustomDictionary> keyVal = new List<CustomDictionary>val.Split(',').Select(x=>x.Split('=')).Select(x=>x.));
where val is the actual string ...
With linq:
var query = from x in str.Split(',')
let p = x.Split('=')
select new CustomDictionary(p[0], p[1]);
var list = query.ToList();
Also seems like you want to get a dictionary as a result. If so, try this code:
var dict = str.Split(',').Select(x => x.Split('='))
.ToDictionary(x => x[0], x => x[1]);
To handle duplicate keys, you can store the objects in a Lookup. Just call ToLookup instead of ToDictionary.
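A minimal sketch of the ToLookup variant, assuming every comma-separated part contains an '=' sign. The DnParser class and its Parse method are names invented for this example.

```csharp
using System.Linq;

public static class DnParser
{
    // Splits "K=V,K=V,..." into a lookup. A key that appears more than once
    // (such as OU in the question) keeps all of its values.
    public static ILookup<string, string> Parse(string input)
    {
        return input.Split(',')
                    // Limit to 2 pieces so a value may itself contain '='.
                    .Select(part => part.Split(new[] { '=' }, 2))
                    .ToLookup(p => p[0], p => p[1]);
    }
}
```

For the string from the question, DnParser.Parse(str)["OU"] enumerates WEB and IT, and indexing with a key that is not present gives an empty sequence rather than throwing.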
After splitting the second time you create a CustomDictionary from the items in that array, then use ToList to make a list of the result.
List<CustomDictionary> keyVal =
val.Split(',')
.Select(x => x.Split('='))
.Select(a => new CustomDictionary(a[0], a[1]))
.ToList();
There is already a class in the framework having a key and value, which you can use instead:
List<KeyValuePair<string, string>> keyVal =
val.Split(',')
.Select(x => x.Split('='))
.Select(a => new KeyValuePair<string, string>(a[0], a[1]))
.ToList();
You can also use a Dictionary<string, string> instead of a list of key-value pairs. It stores the value based on the hash code of the key, so getting a value by key is much faster than looking through a list (but it doesn't retain the order of the items):
Dictionary<string, string> keyVal =
val.Split(',')
.Select(x => x.Split('='))
.ToDictionary(a => a[0], a => a[1]);
This is how you would do it:
var parts = theString.Split(',');
var myList = new List<CustomDictionary>();
foreach(string part in parts)
{
var kvp = part.Split('=');
myList.Add(new CustomDictionary(kvp[0], kvp[1]));
}
This can also be done using LINQ.
Since you have 2 OUs you can't use Dictionary. Instead use Lookup
string input = "AD=Demo,OU=WEB,OU=IT,L=MyCity,C=MyCountry";
var dict = Regex.Matches(input, @"(\w+)=([^,$]+)").Cast<Match>()
.ToLookup(m => m.Groups[1].Value, m => m.Groups[2].Value);
What about MyString.Split(','); and then, for each substring you get:
CO.Key = SubString.Split('=')[0];
CO.Value = SubString.Split('=')[1];
With LINQ:
List<CustomDictionary> myList = (from x in input.Split(new char[] { ',' })
select
new CustomDictionary (x.Substring(0, x.IndexOf('=')), x.Substring(x.IndexOf('=') + 1))
).ToList();
string str = "AD=Demo,OU=WEB,OU=IT,L=MyCity,C=MyCountry";
var result = str.Split(',').Select(s =>
{
var tmp = s.Split('=');
return new CustomDictionary(tmp[0], tmp[1]);
}).ToList();

How to check for duplicates in an array and then do something with their values?

I have an array, for example ("1:2","5:90","7:12","1:70","29:60"), wherein ID and Qty are separated by a ':' (colon). What I want to do is, when there's a duplicate of IDs, the program will add the qty and return the new set of arrays, so in the example it will become ("1:72","5:90","7:12","29:60").
Ex.2 ("1:2","5:90","7:12","1:70","29:60","1:5") becomes ("1:77","5:90","7:12","29:60").
I want to solve it without using linq.
var foo = array.Select(s => s.Split(':'))
.GroupBy(x => x[0])
.Select(g =>
String.Format(
"{0}:{1}",
g.Key,
g.Sum(x => Int32.Parse(x[1]))
)
)
.ToArray();
Note, it's not necessary to parse the "keys," only the values.
Without LINQ:
var dictionary = new Dictionary<string, int>();
foreach (var group in array) {
var fields = group.Split(':');
if (!dictionary.ContainsKey(fields[0])) {
dictionary.Add(fields[0], 0);
}
dictionary[fields[0]] += Int32.Parse(fields[1]);
}
string[] foo = new string[dictionary.Count];
int index = 0;
foreach (var kvp in dictionary) {
foo[index++] = String.Format("{0}:{1}", kvp.Key, kvp.Value);
}
You have to do this manually. Loop through each list, check the ID for each element. Put it in a Dictionary<int, int>, Dictionary<id, qt>. If the dictionary contains the id, add it to the value.
Loop, add, check using Dictionary class.
If you want it without LINQ...
var totalQuantities = new Dictionary<int, int>();
foreach(var raw in sourceArr) {
var splitted = raw.Split(':');
int id = int.Parse(splitted[0]);
int qty = int.Parse(splitted[1]);
if(!totalQuantities.ContainsKey(id)) {
totalQuantities[id] = 0;
}
totalQuantities[id] += qty;
}
var result = new string[totalQuantities.Count];
int i=0;
foreach(var kvp in totalQuantities) {
result[i] = string.Format("{0}:{1}", kvp.Key, kvp.Value);
i++;
}
(
from raw in arr
let splitted = raw.Split(':')
let id = int.Parse(splitted[0])
let qty = int.Parse(splitted[1])
let data = new { id, qty }
group data by data.id into grp
let totalQty = grp.Sum(val => val.qty)
let newStr = string.Format("{0}:{1}", grp.Key, totalQty)
select newStr
)
.ToArray()
Note that the code may contain accidental errors, as it was written in notepad.
var input=new string[]{"1:2","5:90","7:12","1:70","29:60","1:5"};
var result=input
.Select(s=>s.Split(':'))
.Select(x=>x.Select(s=>int.Parse(s)).ToArray())
.GroupBy(x=>x[0])
.Select(g=>g.Key+":"+g.Sum(x=>x[1]));
I was too lazy to specify the culture everywhere. You probably want to do that before putting it into production, or it will fail for cultures with unusual integer representations.
var totals=new Dictionary<int,int>();
foreach(string s in input)
{
string[] parts=s.Split(':');
int id=int.Parse(parts[0]);
int quantity=int.Parse(parts[1]);
int totalQuantity;
if(!totals.TryGetValue(id,out totalQuantity))
totalQuantity=0;//Yes I know this is redundant
totalQuantity+=quantity;
totals[id]=totalQuantity;
}
var result=new List<string>();
foreach(var pair in totals)
{
result.Add(pair.Key+":"+pair.Value);
}
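To address the culture remark above, one option is to pass CultureInfo.InvariantCulture explicitly when parsing and formatting. The following is only a sketch under that assumption; QuantityMerger and Merge are hypothetical names, and the extra list exists just to return IDs in first-seen order as in the question's expected output.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class QuantityMerger
{
    // Sums the quantities of "id:qty" entries that share the same id.
    public static string[] Merge(IEnumerable<string> input)
    {
        var totals = new Dictionary<int, int>();
        var order = new List<int>(); // ids in the order they first appear

        foreach (string s in input)
        {
            string[] parts = s.Split(':');
            // Parse independently of the current thread culture.
            int id = int.Parse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture);
            int qty = int.Parse(parts[1], NumberStyles.Integer, CultureInfo.InvariantCulture);

            int total; // TryGetValue leaves this at 0 when the id is new
            if (!totals.TryGetValue(id, out total))
                order.Add(id);
            totals[id] = total + qty;
        }

        var result = new string[order.Count];
        for (int i = 0; i < order.Count; i++)
        {
            result[i] = order[i].ToString(CultureInfo.InvariantCulture) + ":" +
                        totals[order[i]].ToString(CultureInfo.InvariantCulture);
        }
        return result;
    }
}
```

Applied to the second example from the question, Merge returns "1:77", "5:90", "7:12", "29:60".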
try this:
List<string> items = new List<string>(new string[] { "1:2", "5:90", "7:12", "1:70", "29:60" });
Dictionary<string, int> dictionary = new Dictionary<string, int>();
foreach (string item in items)
{
string[] data = item.Split(':');
string key = data[0];
if (!dictionary.ContainsKey(key))
{
dictionary[key] = 0;
}
dictionary[key] += int.Parse(data[1]);
}
//Used dictionary values here

How to convert a String[] to an IDictionary<String, String>?

How to convert a String[] to an IDictionary<String, String>?
The values at the indices 0,2,4,... shall be keys, and consequently values at the indices 1,3,5,... shall be values.
Example:
new[] { "^BI", "connectORCL", "^CR", "connectCR" }
=>
new Dictionary<String, String> {{"^BI", "connectORCL"}, {"^CR", "connectCR"}};
I'd recommend a good old for loop for clarity. But if you insist on a LINQ query, this should work:
var dictionary = Enumerable.Range(0, array.Length/2)
.ToDictionary(i => array[2*i], i => array[2*i+1]);
Dictionary<string,string> ArrayToDict(string[] arr)
{
if(arr.Length%2!=0)
throw new ArgumentException("Array doesn't contain an even number of entries");
Dictionary<string,string> dict=new Dictionary<string,string>();
for(int i=0;i<arr.Length/2;i++)
{
string key=arr[2*i];
string value=arr[2*i+1];
dict.Add(key,value);
}
return dict;
}
There's really no easy way to do this in LINQ (And even if there were, it's certainly not going to be clear as to the intent). It's easily accomplished by a simple loop though:
// This code assumes you can guarantee your array to always have an even number
// of elements.
var array = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dict = new Dictionary<string, string>();
for(int i=0; i < array.Length; i+=2)
{
dict.Add(array[i], array[i+1]);
}
Something like this maybe:
string[] keyValues = new string[20];
Dictionary<string, string> dict = new Dictionary<string, string>();
for (int i = 0; i < keyValues.Length; i+=2)
{
dict.Add(keyValues[i], keyValues[i + 1]);
}
Edit: People in the C# tag are damn fast...
If you have Rx as a dependency you can do:
strings
.BufferWithCount(2)
.ToDictionary(
buffer => buffer.First(), // key selector
buffer => buffer.Last()); // value selector
BufferWithCount(int count) takes the first count values from the input sequence and yields them as a list, then it takes the next count values, and so on. That is, from your input sequence you will get the pairs as lists: {"^BI", "connectORCL"}, {"^CR", "connectCR"}. ToDictionary then takes the first list item as the key and the last (== second for lists of two items) as the value.
However, if you don't use Rx, you can use this implementation of BufferWithCount:
static class EnumerableX
{
public static IEnumerable<IList<T>> BufferWithCount<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count <= 0)
{
throw new ArgumentOutOfRangeException("count");
}
var buffer = new List<T>();
foreach (var t in source)
{
buffer.Add(t);
if (buffer.Count == count)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count > 0)
{
yield return buffer;
}
}
}
It looks like other people have already beaten me to it and/or have more efficient answers but I'm posting 2 ways:
A for loop might be the clearest way to accomplish in this case...
var words = new[] { "^BI", "connectORCL", "^CR", "connectCR" };
var final = words.Where((w, i) => i % 2 == 0)
.Select((w, i) => new[] { w, words[(i * 2) + 1] })
.ToDictionary(arr => arr[0], arr => arr[1])
;
final.Dump();
//alternate way using zip
var As = words.Where((w, i) => i % 2 == 0);
var Bs = words.Where((w, i) => i % 2 == 1);
var dictionary = new Dictionary<string, string>(As.Count());
var pairs = As.Zip(Bs, (first, second) => new[] {first, second})
.ToDictionary(arr => arr[0], arr => arr[1])
;
pairs.Dump();
FYI, this is what I ended up with using a loop and implementing it as an extension method:
internal static Boolean IsEven(this Int32 @this)
{
return @this % 2 == 0;
}
internal static IDictionary<String, String> ToDictionary(this String[] @this)
{
if (!@this.Length.IsEven())
throw new ArgumentException( "Array doesn't contain an even number of entries" );
var dictionary = new Dictionary<String, String>();
for (var i = 0; i < @this.Length; i += 2)
{
var key = @this[i];
var value = @this[i + 1];
dictionary.Add(key, value);
}
return dictionary;
}
Pure Linq
Select : Project original string value and its index.
GroupBy : Group adjacent pairs.
Convert each group into dictionary entry.
string[] arr = new string[] { "^BI", "connectORCL", "^CR", "connectCR" };
var dictionary = arr.Select((value,i) => new {Value = value,Index = i})
.GroupBy(value => value.Index / 2)
.ToDictionary(g => g.FirstOrDefault().Value,
g => g.Skip(1).FirstOrDefault().Value);

an array question

I have the array below:
string[] stringArray = new string[12];
stringArray[0] = "0,1";
stringArray[1] = "1,3";
stringArray[2] = "1,4";
stringArray[3] = "2,1";
stringArray[4] = "2,4";
stringArray[5] = "3,7";
stringArray[6] = "4,3";
stringArray[7] = "4,2";
stringArray[8] = "4,8";
stringArray[9] = "5,5";
stringArray[10] = "5,6";
stringArray[11] = "6,2";
I need to transform it as shown below:
List<List<string>> listStringArray = new List<List<string>>();
listStringArray[["1"],["3","4"],["1","4"],["7"],["3","2","8"],["5","6"],["2"]];
how is that possible?
I think what you actually want is probably this:
var indexGroups = x.Select(s => s.Split(',')).GroupBy(s => s[0], s => s[1]);
This will return the elements as a grouped enumeration.
To return a list of lists, which is what you literally asked for, then try:
var lists = x.Select(s => s.Split(',')).GroupBy(s => s[0], s => s[1])
.Select(g => g.ToList()).ToList();
There's no shorthand like that. You'll have to break into a loop and split each array and add to the list.
Non-LINQ version (I must admit it's much uglier, but you may have no choice)
var index = new Dictionary<string, List<string>>();
foreach (var str in stringArray) {
string[] split = str.Split(',');
List<string> items;
if (!index.TryGetValue(split[0], out items)) {
items = new List<string>();
index[split[0]] = items;
}
items.Add(split[1]);
}
var transformed = new List<List<string>>();
foreach (List<string> list in index.Values) {
transformed.Add(list);
}
