Scan nested dictionary with LINQ syntax - c#

I have this working code:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
public class Example {
    public static void Main(string[] args) {
        var files = new Dictionary<string, Dictionary<string, int>>()
            { { "file1", new Dictionary<string, int>() { { "A", 1 } } } };
        foreach (var file in files) {
            File.WriteAllLines(file.Key + ".txt", file.Value.Select(
                item => item.Key + item.Value.ToString("000")).ToArray());
        }
    }
}
But I want to replace the foreach with LINQ syntax. Nothing I've tried so far has worked.

Is this what you are after?
var files = new Dictionary<string, Dictionary<string, int>>()
    { { "file1", new Dictionary<string, int>() { { "A", 1 } } } };
files.ForEach(kvp =>
    File.WriteAllLines(kvp.Key + ".txt", kvp.Value.Select(
        item => item.Key + item.Value.ToString("000")).ToArray()));
As per Alexei's comment, IEnumerable.ForEach isn't a standard extension method as it implies mutation, which isn't the aim of functional programming. You can add it with a helper method like this one:
public static void ForEach<T>(
    this IEnumerable<T> source,
    Action<T> action)
{
    foreach (T element in source)
        action(element);
}
Also, your original title implied that the initializer syntax for Dictionaries is unwieldy. What you can do to reduce the amount of typing / code real estate for a large number of elements is to build up an array of anonymous objects and then ToDictionary(). Unfortunately there is a small performance impact:
var files = new [] { new { key = "file1",
                           value = new [] { new { key = "A", value = 1 } } } }
    .ToDictionary(
        _ => _.key,
        _ => _.value.ToDictionary(x => x.key, x => x.value));
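On C# 7 or later, value tuples can play the same role as the anonymous objects and may cut down on allocations. A rough sketch of that variant (not part of the original answer):
var files = new[] { (key: "file1", value: new[] { (key: "A", value: 1) }) }
    .ToDictionary(
        f => f.key,
        f => f.value.ToDictionary(x => x.key, x => x.value));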

foreach is exactly what you should be using here. LINQ is all about querying data: projecting, filtering, sorting, grouping, etc. You're trying to execute an action for each element in a collection, and that is exactly what foreach is for.
Just iterate using foreach.
There are reasons why there is no ForEach extension method on IEnumerable<T>:
Why is there not a ForEach extension method on the IEnumerable interface?
Why I Don’t Use the ForEach Extension Method
It's mostly about:
The reason to not use ForEach is that it blurs the boundary between pure functional code and state-full imperative code.
The only reason I can see not to use a foreach loop is when you want to make your actions run in parallel by using Parallel.ForEach instead:
Parallel.ForEach(
    files,
    kvp => File.WriteAllLines(kvp.Key + ".txt", kvp.Value.Select(
        item => item.Key + item.Value.ToString("000")).ToArray()));
Having a ForEach extension method on IEnumerable<T> is bad design and I advise against it.
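If you still want the LINQ flavour, one option is to keep the querying (projection) in LINQ and the side effect in a plain foreach. A small sketch reusing the files dictionary from the earlier answer:
// Pure projection: build the file name and the lines, but perform no I/O yet.
var outputs = files.Select(kvp => new
{
    Path = kvp.Key + ".txt",
    Lines = kvp.Value.Select(item => item.Key + item.Value.ToString("000")).ToArray()
});

// Imperative part: execute the side effect in an ordinary foreach.
foreach (var output in outputs)
    File.WriteAllLines(output.Path, output.Lines);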

Related

How is this parallel for not processing all elements?

I've created this normal foreach loop:
public static Dictionary<string, Dictionary<string, bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
    Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
    foreach (var item in files)
    {
        filesAnalyzed[item] = AnalyzeFile(item, dependencies);
    }
    return filesAnalyzed;
}
The loop just checks whether each file in the "files" variable has all the dependencies specified in the "dependencies" variable.
The "files" variable should only contain unique elements, because its values are used as keys in the resulting dictionary, but I check this before calling the method.
The foreach loop works correctly and all elements are processed on a single thread, so I wanted to increase performance by switching to a parallel loop. The problem is that not all the elements from the "files" variable are processed in the parallel version (in my test case I get 30 elements instead of 53).
I've tried increasing the timespan, and removing all the Monitor.TryEnter code and just using lock(filesAnalyzed), but I still get the same result.
I'm not very familiar with Parallel.For, so it might be something in the syntax that I'm using.
public static Dictionary<string, Dictionary<string, bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
    var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
    Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
        // start index
        0,
        // end index
        files.Count(),
        // initialization?
        () => new KeyValuePair<string, Dictionary<string, bool>>(),
        (index, loop, result) =>
        {
            var temp = new KeyValuePair<string, Dictionary<string, bool>>(
                files.ElementAt(index),
                AnalyzeFile(files.ElementAt(index), dependencies));
            return temp;
        },
        // finally
        (x) =>
        {
            if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
            {
                try
                {
                    filesAnalyzed.Add(x.Key, x.Value);
                }
                finally
                {
                    Monitor.Exit(filesAnalyzed);
                }
            }
        }
    );
    return filesAnalyzed;
}
Any feedback is appreciated.
Assuming the code inside AnalyzeFile and dependencies is thread safe, how about something like this:
var filesAnalyzed = files
    .AsParallel()
    .Select(x => new { Item = x, File = AnalyzeFile(x, dependencies) })
    .ToDictionary(x => x.Item, x => x.File);
Rewrite your normal loop this way:
Parallel.ForEach(files, item =>
{
    filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
You should also use a ConcurrentDictionary instead of a Dictionary to make the whole process thread-safe.
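Putting those two points together inside the AnalyzeFiles method, a minimal sketch (assuming AnalyzeFile and dependencies are thread-safe; requires using System.Collections.Concurrent):
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, item =>
{
    // The ConcurrentDictionary indexer is safe to use from multiple threads.
    filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
// Copy into a plain Dictionary to match the declared return type.
return new Dictionary<string, Dictionary<string, bool>>(filesAnalyzed);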
You can simplify your code a lot if you use Parallel LINQ instead:
public static Dictionary<string, Dictionary<string, bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
    var filesAnalyzed = ( from item in files.AsParallel()
                          let result = AnalyzeFile(item, dependencies)
                          select (Item: item, Result: result)
                        ).ToDictionary(it => it.Item, it => it.Result);
    return filesAnalyzed;
}
I used tuple syntax in this case to avoid noise. It also cuts down on allocations.
Using method syntax, the same can be written as:
var filesAnalyzed = files.AsParallel()
    .Select(item => (Item: item, Result: AnalyzeFile(item, dependencies)))
    .ToDictionary(it => it.Item, it => it.Result);
Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use a ConcurrentDictionary:
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, file => {
    filesAnalyzed[file] = AnalyzeFile(file, dependencies);
});
In this case at least, there is no benefit in using Parallel over PLINQ.
It's hard to say exactly what is going wrong without debugging the code. Just looking at it, though, I would have used a ConcurrentDictionary for the filesAnalyzed variable instead of a normal Dictionary and got rid of the Monitor.
I would also check whether the same key already exists in filesAnalyzed; it could be that you are trying to add a kvp whose key has already been added to the dictionary. A rough sketch of both suggestions follows below.
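A minimal sketch of that idea (ConcurrentDictionary plus an explicit duplicate check; it assumes AnalyzeFile is thread-safe):
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, item =>
{
    // TryAdd is atomic and returns false if the key is already present,
    // so a duplicate key is reported instead of throwing.
    if (!filesAnalyzed.TryAdd(item, AnalyzeFile(item, dependencies)))
    {
        Console.WriteLine("Duplicate key skipped: " + item);
    }
});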

How to sort by descending nested Dictionary in C# using LINQ

I need some help sorting a nested dictionary in descending order, which is quite hard for me because I'm not so advanced. I've been searching many sites but with no success. If someone can give me a hand with this I'll be grateful. Here is the code:
Dictionary<string, Dictionary<string, int>> champLeague = new Dictionary<string, Dictionary<string, int>>();
For example when I add -
Barcelona, Arsenal, 1
Man United, Liverpool, 2
Man City, Stoke City, 3
And I want to print out the dictionary ordered descending by the inner dictionary's value. This is what I tried:
var orderedDic = champLeague.OrderByDescending(x => x.Value.Values).ThenBy(x => x.Value.Keys);
foreach (var kvp in orderedDic) { Console.WriteLine(kvp.Key); }
It throws an exception: "At least one object must implement IComparable".
I want it to look like this:
Man City
Man United
Barcelona
foreach (var firstTeam in dictionary)
{
    Console.WriteLine($"{firstTeam.Key}");
    foreach (var secondTeam in firstTeam.Value.OrderByDescending(x => x.Value))
    {
        Console.WriteLine($"# {secondTeam.Key} -> {secondTeam.Value}");
    }
}
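The exception in the question comes from ordering by x.Value.Values, which is a collection rather than a single comparable value. If the goal is to order the outer dictionary by the goals stored in the inner dictionary, a sketch that assumes each inner dictionary holds a single opponent/goals entry would be:
var orderedDic = champLeague
    .OrderByDescending(x => x.Value.Values.First()) // goals of the single inner entry
    .ThenBy(x => x.Value.Keys.First());             // tie-break on the opponent's name

foreach (var kvp in orderedDic)
{
    Console.WriteLine(kvp.Key);
}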
From what I understand, you want to sort the matches in descending order by the number of goals. For this particular problem I don't think a dictionary is the right structure; you can use tuples instead. They will save you the hassle when a team has more than one match.
Here is the code.
using System;
using System.Linq;
using System.Collections.Generic;
public class Test
{
    public static void Main()
    {
        var tuple1 =
            new Tuple<string, string, int>("Man City", "Stoke City", 3);
        var tuple2 =
            new Tuple<string, string, int>("Man United", "Liverpool", 2);
        var tuple3 =
            new Tuple<string, string, int>("Barcelona", "Arsenal", 1);
        var championsLeague = new List<Tuple<string, string, int>>();
        championsLeague.Add(tuple1);
        championsLeague.Add(tuple2);
        championsLeague.Add(tuple3);
        // Item3 is the third item mentioned in the definition of the tuple, i.e. the number of goals.
        var lst = championsLeague.OrderByDescending(x => x.Item3)
                                 .Select(x => x.Item1); // Item1 is the first item mentioned in the tuple definition.
        lst.ToList().ForEach(Console.WriteLine);
    }
}
You should try
var allMatches = new List<KeyValuePair<KeyValuePair<string, string>, int>>();
foreach (var left in champLeague.Keys)
{
    foreach (var right in champLeague[left])
    {
        allMatches.Add(new KeyValuePair<KeyValuePair<string, string>, int>(
            new KeyValuePair<string, string>(left, right.Key), right.Value));
    }
}
foreach (var match in allMatches.OrderByDescending(x => x.Value))
{
    Console.WriteLine("{0} - {1} : {2}", match.Key.Key, match.Key.Value, match.Value);
}
This is not efficient or "pretty". You should use classes: a Match class that has two teams and a result, or something like that (a sketch follows below).
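A minimal sketch of that class-based idea (the Match type and its property names are illustrative, not taken from the question):
public class Match
{
    public string HomeTeam { get; set; }
    public string AwayTeam { get; set; }
    public int Goals { get; set; }
}

// Sort matches by goals, descending, and print the home teams.
var matches = new List<Match>
{
    new Match { HomeTeam = "Barcelona",  AwayTeam = "Arsenal",    Goals = 1 },
    new Match { HomeTeam = "Man United", AwayTeam = "Liverpool",  Goals = 2 },
    new Match { HomeTeam = "Man City",   AwayTeam = "Stoke City", Goals = 3 }
};

foreach (var match in matches.OrderByDescending(m => m.Goals))
{
    Console.WriteLine(match.HomeTeam);
}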

Generic List Contains() performance and alternatives

I need to store a large number of key/value pairs where the key is not unique. Both key and value are strings, and the item count is about 5 million.
My goal is to hold only unique pairs.
I've tried to use List<KeyValuePair<string, string>>, but the Contains() is extremely slow.
LINQ Any() looks a little bit faster, but still too slow.
Are there any alternatives that would make the search faster on a generic list? Or maybe I should use a different kind of storage?
I would use a Dictionary<string, HashSet<string>> mapping one key to all its values.
Here is a full solution. First, write a couple of extension methods: one to add a (key, value) pair to your Dictionary and another to get all (key, value) pairs. Note that I use generic type parameters for keys and values; you can substitute string without a problem.
You can even write these methods somewhere else instead of as extensions, or not use methods at all and just use this code somewhere in your program.
public static class Program
{
    public static void Add<TKey, TValue>(
        this Dictionary<TKey, HashSet<TValue>> data, TKey key, TValue value)
    {
        HashSet<TValue> values = null;
        if (!data.TryGetValue(key, out values)) {
            // first time using this key? create a new HashSet
            values = new HashSet<TValue>();
            data.Add(key, values);
        }
        values.Add(value);
    }

    public static IEnumerable<KeyValuePair<TKey, TValue>> KeyValuePairs<TKey, TValue>(
        this Dictionary<TKey, HashSet<TValue>> data)
    {
        return data.SelectMany(k => k.Value,
            (k, v) => new KeyValuePair<TKey, TValue>(k.Key, v));
    }
}
Now you can use it as follows:
public static void Main(string[] args)
{
    Dictionary<string, HashSet<string>> data = new Dictionary<string, HashSet<string>>();
    data.Add("k1", "v1.1");
    data.Add("k1", "v1.2");
    data.Add("k1", "v1.1"); // already in, so nothing happens here
    data.Add("k2", "v2.1");
    foreach (var kv in data.KeyValuePairs())
        Console.WriteLine(kv.Key + " : " + kv.Value);
}
Which will print this:
k1 : v1.1
k1 : v1.2
k2 : v2.1
If your key mapped to a List<string> then you would need to take care of duplicates yourself. HashSet<string> does that for you already.
I guess that Dictionary<string, List<string>> will do the trick.
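If you do go with Dictionary<string, List<string>>, here is a small sketch of the duplicate check you would have to add yourself (AddPair is a hypothetical helper, not an existing API):
var data = new Dictionary<string, List<string>>();

void AddPair(string key, string value)
{
    if (!data.TryGetValue(key, out var values))
    {
        values = new List<string>();
        data[key] = values;
    }
    // Unlike HashSet<string>, List<string> does not de-duplicate, so check by hand.
    // Contains is linear, but only over the values stored under a single key.
    if (!values.Contains(value))
        values.Add(value);
}

AddPair("k1", "v1.1");
AddPair("k1", "v1.1"); // second call is a no-op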
I would consider using some in-proc NoSQL database like RavenDB (RavenDB Embedded in this case) as they state on their website:
RavenDB can be used for application that needs to store millions of records and has fast query times.
Using it requires no big boilerplate (example from the RavenDB website):
var myCompany = new Company
{
    Name = "Hibernating Rhinos",
    Employees = {
        new Employee
        {
            Name = "Ayende Rahien"
        }
    },
    Country = "Israel"
};

// Store the company in our RavenDB server
using (var session = documentStore.OpenSession())
{
    session.Store(myCompany);
    session.SaveChanges();
}

// Create a new session, retrieve an entity, and change it a bit
using (var session = documentStore.OpenSession())
{
    Company entity = session.Query<Company>()
                            .Where(x => x.Country == "Israel")
                            .FirstOrDefault();
    // We can also load by ID: session.Load<Company>(companyId);
    entity.Name = "Another Company";
    session.SaveChanges(); // will send the change to the database
}
To build a unique list you want to use .Distinct() to generate it, not .Contains(). However, whatever type holds your strings must implement GetHashCode() and Equals() correctly to get good performance, or you must pass in a custom comparer.
Here is how you could do it with a custom comparer
private static void Main(string[] args)
{
    List<KeyValuePair<string, string>> giantList = Populate();
    var uniqueItems = giantList.Distinct(new MyStringEquater()).ToList();
}

class MyStringEquater : IEqualityComparer<KeyValuePair<string, string>>
{
    // Choose the comparer based on whether you want your comparisons to be case-sensitive or not
    private static StringComparer comparer = StringComparer.OrdinalIgnoreCase;

    public bool Equals(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
    {
        return comparer.Equals(x.Key, y.Key) && comparer.Equals(x.Value, y.Value);
    }

    public int GetHashCode(KeyValuePair<string, string> obj)
    {
        unchecked
        {
            int x = 27;
            x = x * 11 + comparer.GetHashCode(obj.Key);
            x = x * 11 + comparer.GetHashCode(obj.Value);
            return x;
        }
    }
}
Also, per your comment on the other answer, you could use the above comparer with a HashSet and have it store your unique items that way. You just need to pass the comparer to the constructor:
var hashSetWithComparer = new HashSet<KeyValuePair<string, string>>(new MyStringEquater());
You will most likely see an improvement if you use a HashSet<KeyValuePair<string, string>>.
The test below finishes on my machine in about 10 seconds. If I change...
var collection = new HashSet<KeyValuePair<string, string>>();
...to...
var collection = new List<KeyValuePair<string, string>>();
...I get tired of waiting for it to complete (more than a few minutes).
Using a KeyValuePair<string, string> has the advantage that equality is determined by the values of Key and Value. KeyValuePair<TKey, TValue> is a struct, and its default equality compares the Key and Value members using their own Equals implementations, so pairs with the same Key and Value strings will be considered equal by the runtime.
You can see that equality with this test:
var hs = new HashSet<KeyValuePair<string, string>>();
hs.Add(new KeyValuePair<string, string>("key", "value"));
var b = hs.Contains(new KeyValuePair<string, string>("key", "value"));
Console.WriteLine(b);
One thing worth noting is that this doesn't rely on string interning: String.Equals compares the characters themselves, so pairs built from strings that were read from a file (and therefore not interned) still compare equal.
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1 {
    internal class Program {
        static void Main(string[] args) {
            var key = default(string);
            var value = default(string);
            var collection = new HashSet<KeyValuePair<string, string>>();
            for (var i = 0; i < 5000000; i++) {
                if (key == null || i % 2 == 0) {
                    key = "k" + i;
                }
                value = "v" + i;
                collection.Add(new KeyValuePair<string, string>(key, value));
            }
            var found = 0;
            var sw = new Stopwatch();
            sw.Start();
            for (var i = 0; i < 5000000; i++) {
                if (collection.Contains(new KeyValuePair<string, string>("k" + i, "v" + i))) {
                    found++;
                }
            }
            sw.Stop();
            Console.WriteLine("Found " + found);
            Console.WriteLine(sw.Elapsed);
            Console.ReadLine();
        }
    }
}
Have you tried using a HashSet? It's much quicker than a list when large numbers of items are involved, although I don't know whether it would still be too slow.
This answer has a lot of information: HashSet vs. List performance

Using LINQ's Zip with a closure that doesn't return a value

Disclaimer: this question is driven by my personal curiosity more than an actual need to accomplish something. So my example is going to be contrived.
Nevertheless I think it's an issue that might very well crop up.
Let's say we are using Zip to iterate over two sequences, invoking a void method that just throws an exception if one item of the pair is found to be different from the other (therefore discarding any return value). The point here is not that the method throws an exception, so much as that it returns void.
In other words, we're kind of doing a ForEach over two collections (and by the way, I know what Eric Lippert thinks about ForEach, and fully agree with him and never use it).
Now, Zip wants a Func<TFirst, TSecond, TResult>, so of course passing something equivalent to Action<TFirst, TSecond> won't work.
My question is: is there an idiomatic way that is better than this (i.e. returning a dummy value)?
var collection1 = new List<int>() { ... };
var collection2 = new List<int>() { ... };
collection1.Zip(collection2, (first, second) =>
{
    VoidMethodThatThrows(first, second);
    return true;
});
Use Zip() to throw the items into an object, then do your foreach however you choose (do a normal foreach loop please, not the bad ToList/ForEach combo).
var items = collection1.Zip(collection2, (x, y) => new { First = x, Second = y });
foreach (var item in items)
{
    VoidMethodThatThrows(item.First, item.Second);
}
As of C# 7.0, improved tuple support and deconstruction makes it far more pleasing to work with.
var items = collection1.Zip(collection2, (x, y) => (x, y));
// or collection1.Zip(collection2, ValueTuple.Create);
foreach (var (first, second) in items)
{
    VoidMethodThatThrows(first, second);
}
Furthermore, .NET Core 3.0 and later (including .NET 5) add an overload which automatically pairs the values into tuples, so you don't have to do that mapping yourself.
var items = collection1.Zip(collection2); // IEnumerable<(Type1, Type2)>
.NET 6 adds a third collection to the mix.
var items = collection1.Zip(collection2, collection3); // IEnumerable<(Type1, Type2, Type3)>
I often need to execute an action on each pair in two collections. The Zip method is not useful in this case.
This extension method ForPair can be used:
public static void ForPair<TFirst, TSecond>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second,
    Action<TFirst, TSecond> action)
{
    using (var enumFirst = first.GetEnumerator())
    using (var enumSecond = second.GetEnumerator())
    {
        while (enumFirst.MoveNext() && enumSecond.MoveNext())
        {
            action(enumFirst.Current, enumSecond.Current);
        }
    }
}
So for your example, you could write:
var collection1 = new List<int>() { 1, 2 };
var collection2 = new List<int>() { 3, 4 };
collection1.ForPair(collection2, VoidMethodThatThrows);

Create sequence consisting of multiple property values

I have an existing collection of objects with two properties of interest. Both properties are of the same type. I want to create a new sequence consisting of the property values. Here's one way (I'm using tuples instead of my custom type for simplicity):
var list = new List<Tuple<string, string>>
    { Tuple.Create("dog", "cat"), Tuple.Create("fish", "frog") };
var result =
    list.SelectMany(x => new[] { x.Item1, x.Item2 });
foreach (string item in result)
{
    Console.WriteLine(item);
}
Results in:
dog
cat
fish
frog
This gives me the results I want, but is there a better way to accomplish this (in particular, without the need to create arrays or collections)?
Edit:
This also works, at the cost of iterating over the collection twice:
var result = list.Select(x => x.Item1).Concat(list.Select(x => x.Item2));
If you want to avoid creating another collection, you could yield the results instead.
void Main()
{
    var list = new List<Tuple<string, string>>
        { Tuple.Create("dog", "cat"), Tuple.Create("fish", "frog") };
    foreach (var element in GetSingleList(list))
    {
        Console.WriteLine(element);
    }
}

// A reusable extension method would be a better approach.
IEnumerable<T> GetSingleList<T>(IEnumerable<Tuple<T, T>> list) {
    foreach (var element in list)
    {
        yield return element.Item1;
        yield return element.Item2;
    }
}
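A sketch of the reusable extension-method variant mentioned in the comment above (the name Flatten is illustrative):
public static class TupleEnumerableExtensions
{
    // Turns a sequence of 2-tuples into a flat sequence of their items.
    public static IEnumerable<T> Flatten<T>(this IEnumerable<Tuple<T, T>> source)
    {
        foreach (var element in source)
        {
            yield return element.Item1;
            yield return element.Item2;
        }
    }
}

// Usage:
// var result = list.Flatten();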
I think your approach is fine and I would stick with that. The use of the array nicely gets the job done when using SelectMany, and the final result is an IEnumerable<string>.
There are some alternate approaches, but I think they're more verbose than your approach.
Aggregate approach:
var result = list.Aggregate(new List<string>(), (seed, t) =>
{
    seed.Add(t.Item1);
    seed.Add(t.Item2);
    return seed;
});
result.ForEach(Console.WriteLine);
ForEach approach:
var result = new List<string>();
list.ForEach(t => { result.Add(t.Item1); result.Add(t.Item2); });
result.ForEach(Console.WriteLine);
In both cases a new List<string> is created.
