I'm writing a program that simply reads two different .csv files containing the following information:
file1      file2
AA,2.34    BA,6.45
AB,1.46    BB,5.45
AC,9.69    BC,6.21
AD,3.6     AC,7.56
The first column is a string, the second a double.
So far I have had no difficulty reading those files and placing the values into lists:
firstFile = new List<KeyValuePair<string, double>>();
secondFile = new List<KeyValuePair<string, double>>();
I'm trying to instruct my program:
to take the first value from the first column of the first row of the first file (in this case AA)
and check whether there is a match anywhere in the first column of the second file.
If a string match is found, compare the corresponding second values (doubles in this case), and if those match too, add the entire row to a separate List.
Something similar to the below pseudo-code:
for(var i=0;i<firstFile.Count;i++)
{
firstFile.Column[0].value[i].SearchMatchesInAnotherFile(secondFile.Column[0].values.All);
if(MatchFound)
{
CompareCorrespondingDoubles();
if(true)
{
AddFirstValueToList();
}
}
}
Instead of a List I tried using a Dictionary, but that data structure is not sorted and offers no way to access a key by index.
I'm not asking for the exact code; rather, the question is:
What would you suggest as an appropriate data structure for this program, so that I can investigate further myself?
KeyValuePair is mainly intended for use with dictionaries. I suggest creating your own custom type:
public class MyRow
{
public string StringValue {get;set;}
public double DoubleValue {get;set;}
public override bool Equals(object o)
{
MyRow r = o as MyRow;
if (ReferenceEquals(r, null)) return false;
return r.StringValue == this.StringValue && r.DoubleValue == this.DoubleValue;
}
public override int GetHashCode()
{
// Combine the hash codes of both fields that take part in Equals.
unchecked { return StringValue.GetHashCode() ^ DoubleValue.GetHashCode(); }
}
}
And store the files in lists of this type:
List<MyRow> firstFile = ...
List<MyRow> secondFile = ...
Then you can determine the intersection (all elements that occur in both lists) via LINQ's Intersect method:
var result = firstFile.Intersect(secondFile).ToList();
It's necessary to override Equals and GetHashCode, because otherwise Intersect would only make a reference comparison. Alternatively, you could implement your own IEqualityComparer<MyRow> that does the comparison and pass it to the appropriate Intersect overload.
But if you can ensure that the keys (the string values) are unique, you can also use a
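As a rough illustration of the Intersect idea, here is a minimal sketch that swaps MyRow for value tuples, which already compare by value and therefore need no Equals/GetHashCode override. The sample rows are hypothetical (the second list was altered so that one row matches), and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows. Value tuples compare field by field,
// so Intersect works without a custom Equals or comparer.
var firstRows = new List<(string Key, double Value)>
{
    ("AA", 2.34), ("AB", 1.46), ("AC", 9.69), ("AD", 3.6)
};
var secondRows = new List<(string Key, double Value)>
{
    ("BA", 6.45), ("AB", 1.46), ("BC", 6.21), ("AC", 7.56)
};

// Keeps only the rows whose key AND value both occur in the second list.
// "AC" is dropped because its values differ (9.69 vs 7.56).
List<(string Key, double Value)> common = firstRows.Intersect(secondRows).ToList();

foreach (var row in common)
    Console.WriteLine($"{row.Key} -> {row.Value}");
```

With a class such as MyRow the call looks the same, but the equality members (or an IEqualityComparer<MyRow>) have to be supplied by hand.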
Dictionary<string, double> firstFile = ...
Dictionary<string, double> secondFile = ...
And in this case use this LINQ statement:
var result = firstFile
.Where(x => secondFile.TryGetValue(x.Key, out double other) && other == x.Value)
.ToDictionary(x => x.Key, x => x.Value);
Because each TryGetValue lookup is O(1) on average, this has a time complexity of O(m+n) including building the dictionaries (m and n being the row counts of the two files). Intersect is hash-based as well; only a nested scan of the second file for every row of the first would be O(m*n).
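For the dictionary variant, a minimal runnable sketch might look like the following. It assumes unique keys, uses hypothetical sample data (one row of the second file was changed so that a match exists), and relies on Dictionary.TryGetValue for the lookup; top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contents of the two files, keyed by the string column.
var firstFile = new Dictionary<string, double>
{
    ["AA"] = 2.34, ["AB"] = 1.46, ["AC"] = 9.69, ["AD"] = 3.6
};
var secondFile = new Dictionary<string, double>
{
    ["BA"] = 6.45, ["AB"] = 1.46, ["BC"] = 6.21, ["AC"] = 7.56
};

// One hash lookup per row of the first file: a row is kept only if the key
// exists in the second file and the doubles are exactly equal.
Dictionary<string, double> matches = firstFile
    .Where(row => secondFile.TryGetValue(row.Key, out double other) && other == row.Value)
    .ToDictionary(row => row.Key, row => row.Value);

foreach (var pair in matches)
    Console.WriteLine($"{pair.Key} -> {pair.Value}");
```

Exact equality on doubles is fine for values parsed from identical text; if the numbers come from calculations, comparing with a small tolerance may be more appropriate.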
I am trying to write something similar to the following with LINQ:
var media = from s in db.Media select s;
string[] criteria = {"zombies", "horror"};
mediaList.RemoveAll(media.Where(s => s.description.Intersect(criteria).Any()));
//mediaList is a List(T) containing instances of the Media model.
I thought the solution from "linq where list contains any in list" would apply in this case, but my compiler complains that "string does not contain a definition for Intersect".
The behaviour I expect is that Media items whose description contains either zombies or horror, but not both, are removed from the list, e.g.:
A horror movie.
A movie with a lot of zombies.
But items like the following should stay in the list:
A horror movie with zombies.
The best zombies and the best horror.
The Media class:
public class Media
{
public int mediaID { get; set; }
public string type { get; set; }
public string description { get; set; }
}
The description field contains very long paragraphs. I am afraid the solution is very obvious but for the life of me I cannot work it out.
EDIT: added a better explanation of the behaviour expected.
You're confusing some methods here.
List<T>.RemoveAll() takes a Predicate<T> as its parameter and removes all elements from the list for which this predicate returns true. So what you want could be something like this:
mediaList.RemoveAll(m => criteria.Any(crit => m.description.Contains(crit)));
But note that this will also remove "A movie about nonzombies".
UPDATE after your clarification:
mediaList.RemoveAll(m =>
{
int count = criteria.Count(crit => m.description.Contains(crit));
return count > 0 && count < criteria.Length;
});
This removes all entries that contain at least one word of criteria, but not all of them. (It still does not match "whole words only", though.)
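To see the "at least one, but not all" rule in action, here is a small sketch that runs it over the four example descriptions from the question. For brevity it assumes a plain List<string> of descriptions instead of Media objects; top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] criteria = { "zombies", "horror" };

// Simplification: plain description strings instead of Media instances.
var descriptions = new List<string>
{
    "A horror movie.",
    "A movie with a lot of zombies.",
    "A horror movie with zombies.",
    "The best zombies and the best horror."
};

// RemoveAll returns the number of removed elements.
int removed = descriptions.RemoveAll(d =>
{
    int count = criteria.Count(crit => d.Contains(crit));
    return count > 0 && count < criteria.Length;   // some, but not all
});

Console.WriteLine(removed);   // prints 2
foreach (string d in descriptions)
    Console.WriteLine(d);     // the two descriptions containing both words
```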
You should use
var result = mediaList.RemoveAll(media => criteria.Any(c => media.description.Contains(c)));
You can't intersect a string with a string[] array, but you could split the description string into words and then do the intersection:
mediaList.RemoveAll(entry => entry.description.Split(new string[]{" "},
StringSplitOptions.None).Intersect(criteria).Any());
This avoids matching words that merely contain one of the criterion strings as a substring.
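One caveat with splitting on spaces only is that punctuation stays attached, so "zombies." would not equal "zombies". The sketch below is one possible way around that, under a few assumptions that are not in the original: it splits on a small, hypothetical set of separator characters, compares words case-insensitively, and combines the whole-word check with the "some, but not all" rule. Plain strings stand in for Media objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] criteria = { "zombies", "horror" };

// Assumption: these separators are enough for the sample text below.
char[] separators = { ' ', '.', ',', ';', ':', '!', '?' };

var descriptions = new List<string>
{
    "A horror movie.",
    "A movie with a lot of zombies.",
    "A movie about nonzombies.",
    "A horror movie with zombies."
};

descriptions.RemoveAll(d =>
{
    // Whole words of the description, compared case-insensitively.
    var words = new HashSet<string>(
        d.Split(separators, StringSplitOptions.RemoveEmptyEntries),
        StringComparer.OrdinalIgnoreCase);

    int count = criteria.Count(c => words.Contains(c));
    return count > 0 && count < criteria.Length;   // some, but not all
});

foreach (string d in descriptions)
    Console.WriteLine(d);
```

Here "A movie about nonzombies." stays in the list because "nonzombies" is a different word from "zombies".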
I have basically two string-lists and want to get the elements of the first list that contain every word of the second list.
List<Sentence> sentences = new List<Sentence> { many elements };
List<string> keyWords= new List<string>{"cat", "the", "house"};
class Sentence
{
public string shortname {get; set; }
}
Now, how do I perform a contain-check for every element of the keyWords-List for a sentence? Something like
var found = sentences.Where(x => x.shortname.ContainsAll(keyWords));
Try this:
var found = sentences.Where(x=> keyWords.All(y => x.shortname.Contains(y)));
The All method keeps only those sentences that contain every keyword from the list of keywords.
Use All
sentences.Where(x => keyWords.All(k => x.shortname.Contains(k)));
If you find this to be a common search, you could create your own extension method
public static class StringExtensions
{
// Returns true when src contains every string in target as a substring.
public static bool ContainsAll(this string src, IEnumerable<string> target)
{
return target.All(x => src.Contains(x));
}
}
This allows you to write the code as you originally expressed it:
sentences.Where(x => x.shortname.ContainsAll(keyWords));
sentences.Where(s => keyWords.All(kw => s.shortname.Contains(kw)));
Use All, it returns true only if all elements in the sequence satisfy the condition
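Note that string.Contains is case-sensitive, so "The" would not satisfy the keyword "the" on its own. If that matters, a case-insensitive variant can be sketched with IndexOf and StringComparison.OrdinalIgnoreCase. The sample sentences are hypothetical and plain strings are used in place of Sentence objects; top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var keyWords = new List<string> { "cat", "the", "house" };

// Hypothetical shortname values.
var shortnames = new List<string>
{
    "The cat sat in the house",
    "The dog sat in the house",
    "A HOUSE for the CAT"
};

// A sentence is kept only if every keyword occurs in it, ignoring case.
List<string> found = shortnames
    .Where(s => keyWords.All(kw => s.IndexOf(kw, StringComparison.OrdinalIgnoreCase) >= 0))
    .ToList();

foreach (string s in found)
    Console.WriteLine(s);
```

This is still a substring match, so "cat" would also be found inside "category".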
With the code below, I get an exception in the foreach.
When I place a breakpoint on csv (the second statement) and expand the result, I see 2 entries, which is correct.
When I do the same on csv in the foreach, I get an exception: cannot read from a closed TextReader.
Any idea?
Thanks,
My CSV file :
A0;A1;A2;A3;A4
B0;B1;B2;B3;B4
The code
var lines = File.ReadLines("filecsv").Select(a => a.Split(';'));
IEnumerable<IEnumerable<MyClass>> csv =
from line in lines
select (from piece in line
select new MyClass
{
Field0 = piece[0].ToString(),
Field1 = piece[1].ToString()
}
).AsEnumerable<MyClass>();
foreach (MyClass myClass in csv)
Console.WriteLine(myClass.Field0);
Console.ReadLine();
MyClass :
public class MyClass
{
public string Field0 { get; set; }
public string Field1 { get; set; }
}
Perhaps something like this instead will give you exactly what you want:
var jobs = File.ReadLines("filecsv")
.Select(line => line.Split(';'))
.Select(tokens => new MyClass { Field0 = tokens[0], Field1 = tokens[1] })
.ToList();
The problem you have is that you're saving the enumerable, which has deferred execution. You're then looking at it through the debugger, which enumerates it: it loops through the file, does all the work and disposes of the underlying reader. Then your foreach tries to enumerate it again.
The above code achieves what you currently want, is somewhat cleaner, and forces conversion to a list so the lazy behaviour is gone.
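The lazy behaviour can be observed without touching a file. The following sketch counts how often the Select lambda runs, using a hypothetical in-memory array in place of File.ReadLines; top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int calls = 0;
string[] rawLines = { "A0;A1;A2", "B0;B1;B2" };   // stand-in for the file

// Building the query runs nothing yet: Select is deferred.
IEnumerable<string[]> lazy = rawLines.Select(line =>
{
    calls++;
    return line.Split(';');
});
int callsBeforeEnumeration = calls;               // 0

// Every enumeration re-runs the lambda for every line.
foreach (string[] tokens in lazy) { }
foreach (string[] tokens in lazy) { }
int callsAfterTwoPasses = calls;                  // 4

// ToList runs the query once more and stores the results...
List<string[]> materialized = lazy.ToList();
// ...so enumerating the list does not touch the source again.
foreach (string[] tokens in materialized) { }
foreach (string[] tokens in materialized) { }
int callsAfterList = calls;                       // 6

Console.WriteLine($"{callsBeforeEnumeration} {callsAfterTwoPasses} {callsAfterList}");
```

With File.ReadLines the second enumeration is where things can go wrong, which is why materializing with ToList or ToArray sidesteps the issue.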
Note also that I can't see how your "from piece in line" could work correctly as it currently stands: each piece is a single token (a string), so piece[0] is just its first character.
Perhaps it is because LINQ does not read all the items immediately; it only sets up the query and reads items when they are needed.
You could force evaluation with ToArray:
var lines = File.ReadLines("filecsv").Select(a => a.Split(';')).ToArray();
I suspect it is a combination of the yield keyword (used in Select()) and the internal text reader (in ReadLines) not "agreeing".
Change the lines variable to var lines = File.ReadLines("filecsv").Select(a => a.Split(';')).ToArray();
That should sort it.
Is there any easy LINQ expression to concatenate my entire List<string> collection items to a single string with a delimiter character?
What if the collection is of custom objects instead of string? Imagine I need to concatenate on object.Name.
string result = String.Join(delimiter, list);
is sufficient.
Warning - Serious Performance Issues
Though this answer does produce the desired result, it suffers from poor performance compared to other answers here. Be very careful about deciding to use it.
Using LINQ, this should work:
string delimiter = ",";
List<string> items = new List<string>() { "foo", "boo", "john", "doe" };
Console.WriteLine(items.Aggregate((i, j) => i + delimiter + j));
class description:
public class Foo
{
public string Boo { get; set; }
}
Usage:
class Program
{
static void Main(string[] args)
{
string delimiter = ",";
List<Foo> items = new List<Foo>() { new Foo { Boo = "ABC" }, new Foo { Boo = "DEF" },
new Foo { Boo = "GHI" }, new Foo { Boo = "JKL" } };
Console.WriteLine(items.Aggregate((i, j) => new Foo{Boo = (i.Boo + delimiter + j.Boo)}).Boo);
Console.ReadKey();
}
}
And here is my best :)
items.Select(i => i.Boo).Aggregate((i, j) => i + delimiter + j)
Note: This answer does not use LINQ to generate the concatenated string. Using LINQ to turn enumerables into delimited strings can cause serious performance problems.
Modern .NET (since .NET 4)
This is for an array, list or any type that implements IEnumerable:
string.Join(delimiter, enumerable);
And this is for an enumerable of custom objects:
string.Join(delimiter, enumerable.Select(i => i.Boo));
Old .NET (before .NET 4)
This is for a string array:
string.Join(delimiter, array);
This is for a List<string>:
string.Join(delimiter, list.ToArray());
And this is for a list of custom objects:
string.Join(delimiter, list.Select(i => i.Boo).ToArray());
using System.Linq;
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
List<Person> persons = new List<Person>();
string listOfPersons = string.Join(",", persons.Select(p => p.FirstName));
Good question. I've been using
List<string> myStrings = new List<string>{ "ours", "mine", "yours"};
string joinedString = string.Join(", ", myStrings.ToArray());
It's not LINQ, but it works.
You can simply use:
List<string> items = new List<string>() { "foo", "boo", "john", "doe" };
Console.WriteLine(string.Join(",", items));
Happy coding!
I think that if you define the logic in an extension method the code will be much more readable:
public static class EnumerableExtensions {
public static string Join<T>(this IEnumerable<T> self, string separator) {
return String.Join(separator, self.Select(e => e.ToString()).ToArray());
}
}
public class Person {
public string FirstName { get; set; }
public string LastName { get; set; }
public override string ToString() {
return string.Format("{0} {1}", FirstName, LastName);
}
}
// ...
List<Person> people = new List<Person>();
// ...
string fullNames = people.Join(", ");
string lastNames = people.Select(p => p.LastName).Join(", ");
List<string> strings = new List<string>() { "ABC", "DEF", "GHI" };
string s = strings.Aggregate((a, b) => a + ',' + b);
I have done this using LINQ:
var oCSP = (from P in db.Products select new { P.ProductName });
string joinedString = string.Join(",", oCSP.Select(p => p.ProductName));
Put String.Join into an extension method. Here is the version I use, which is less verbose than Jordao's version.
It returns an empty string "" when the list is empty; Aggregate would throw an exception instead.
It probably performs better than Aggregate.
It is easier to read when combined with other LINQ methods than a pure String.Join().
Usage
var myStrings = new List<string>() { "a", "b", "c" };
var joinedStrings = myStrings.Join(","); // "a,b,c"
Extension methods class
public static class ExtensionMethods
{
public static string Join(this IEnumerable<string> texts, string separator)
{
return String.Join(separator, texts);
}
}
This answer aims to extend and improve some mentions of LINQ-based solutions. It is not an example of a "good" way to solve this per se. Just use string.Join as suggested when it fits your needs.
Context
This answer is prompted by the second part of the question (a generic approach) and some comments expressing a deep affinity for LINQ.
The currently accepted answer does not work with empty sequences (Aggregate without a seed throws on them). It also suffers from a performance issue.
The currently most upvoted answer does not explicitly address the generic string conversion requirement, when ToString does not yield the desired result. (This can be remedied by adding a call to Select.)
Another answer includes a note that may lead some to believe that the performance issue is inherent to LINQ. ("Using LINQ to turn enumerables into delimited strings can cause serious performance problems.")
I noticed this comment about sending the query to the database.
Given that there is no answer matching all these requirements, I propose an implementation that is based on LINQ, running in linear time, works with enumerations of arbitrary length, and supports generic conversions to string for the elements.
So, LINQ or bust? Okay.
static string Serialize<T>(IEnumerable<T> enumerable, char delim, Func<T, string> toString)
{
return enumerable.Aggregate(
new StringBuilder(),
(sb, t) => sb.Append(toString(t)).Append(delim),
sb =>
{
if (sb.Length > 0)
{
sb.Length--;
}
return sb.ToString();
});
}
This implementation is more involved than many alternatives, predominantly because we need to manage the boundary conditions for the delimiter (separator) in our own code.
It should run in linear time, traversing the elements at most twice.
Once for generating all the strings to be appended in the first place, and zero to one time while generating the final result during the final ToString call. This is because the latter may be able to just return the buffer that happened to be large enough to contain all the appended strings from the get go, or it has to regenerate the full thing (unlikely), or something in between. See e.g. What is the Complexity of the StringBuilder.ToString() on SO for more information.
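To check the boundary cases mentioned above, here is a short usage sketch. Serialize is reproduced as a local function (with the same body as above) only so that the snippet is self-contained; the sample inputs are hypothetical, and top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Same logic as the Serialize method above, written as a local function.
static string Serialize<T>(IEnumerable<T> enumerable, char delim, Func<T, string> toString)
{
    return enumerable.Aggregate(
        new StringBuilder(),
        (sb, t) => sb.Append(toString(t)).Append(delim),
        sb =>
        {
            if (sb.Length > 0) sb.Length--;   // drop the trailing delimiter
            return sb.ToString();
        });
}

string none = Serialize(new int[0], ',', n => n.ToString());              // ""
string one = Serialize(new[] { 1 }, ',', n => n.ToString());              // "1"
string many = Serialize(new[] { 1, 2, 3 }, ';', n => (n * n).ToString()); // "1;4;9"

Console.WriteLine($"[{none}] [{one}] [{many}]");
```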
Final Words
Just use string.Join as suggested if it fits your needs, adding a Select when you need to massage the sequence first.
This answer's main intent is to illustrate that it is possible to keep the performance in check using LINQ. The result is (probably) too verbose to recommend, but it exists.
You can use Aggregate to concatenate the strings into a single, character-separated string, but it will throw an InvalidOperationException if the collection is empty.
To avoid that, you can use the Aggregate overload that takes a seed string:
var seed = string.Empty;
var separator = ",";
var cars = new List<string>() { "Ford", "McLaren Senna", "Aston Martin Vanquish"};
var carAggregate = cars.Aggregate(seed,
(partialPhrase, word) => $"{partialPhrase}{separator}{word}").TrimStart(',');
Alternatively, you can use string.Join, which doesn't care if you pass it an empty collection:
var separator = ",";
var cars = new List<string>() { "Ford", "McLaren Senna", "Aston Martin Vanquish"};
var carJoin = string.Join(separator, cars);
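The difference on an empty collection can be checked with a small sketch like the one below. Variable names are illustrative; top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var empty = new List<string>();

// string.Join simply returns an empty string.
string joined = string.Join(",", empty);

// Aggregate without a seed has nothing to start from and throws.
bool aggregateThrew = false;
try
{
    empty.Aggregate((a, b) => a + "," + b);
}
catch (InvalidOperationException)
{
    aggregateThrew = true;
}

// Aggregate with a seed returns the seed for an empty sequence.
string seeded = empty
    .Aggregate(string.Empty, (acc, word) => acc + "," + word)
    .TrimStart(',');

Console.WriteLine($"joined='{joined}', threw={aggregateThrew}, seeded='{seeded}'");
```

Keep in mind that TrimStart(',') would also strip commas that belong to the first element itself, so string.Join remains the safer choice.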