C#: LINQ query with split and parsing

I have an object with a String field containing a comma separated list of integers in it. I'm trying to use LINQ to retrieve the ones that have a specific number in the list.
Here's my approach
from p in source
where (p.Keywords.Split(',').something.Contains(val))
select p;
Where p.Keywords is the field to split.
I've seen the following on the net, but it just doesn't compile:
from p in source
where (p.Keywords.Split(',').Select(x=>x.Trim()).Contains(val))
select p;
I'm a LINQ newbie, but had success with simpler queries.
Update:
Looks like I was missing some details:
source is a List containing the object with the field Keywords with strings like 1,2,4,7
Error I get is about x not being defined.

Here's an example of selecting numbers that are greater than 3:
string str = "1,2,3,4,5,6,7,8";
var numbers = str.Split(',').Select(int.Parse).Where(num => num > 3); // 4,5,6,7,8
If you have a list then change the Where clause:
string str = "1,2,3,4,5,6,7,8";
List<int> relevantNums = new List<int>{5,6,7};
var numbers = str.Split(',').Select(int.Parse).Where(num => relevantNums.Contains(num)); // 5,6,7
If you are looking for strings rather than numbers, then:
string str = "1,2,3,4,5,6,7,8";
List<string> relevantNumsStr = new List<string>{"5","6","7"};
var numbers = str.Split(',').Where(numStr => relevantNumsStr.Contains(numStr)); // 5,6,7

Here is an example of how you can achieve this. For simplicity I called ToString() on the number to check for, but you get the point.
// class to mimic your structure
public class MyObj
{
public string MyStr{get;set;}
}
//method
void Method()
{
var myObj = new List <MyObj>
{
new MyObj{ MyStr="1,2,3,4,5"},
new MyObj{ MyStr="9,2,3,4,5"}
};
var num =9;
var searchResults = from obj in myObj
where !string.IsNullOrEmpty(obj.MyStr) &&
obj.MyStr.Split(new []{','})
.Contains(num.ToString())
select obj;
foreach(var item in searchResults)
Console.WriteLine(item.MyStr);
}

Thanks for all the answers. Although not in the right language, they led me to the answer:
from p in source where (p.Keywords.Split(',').Contains(val.ToString())) select p;
Where val is the number I'm looking for.
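If the stored lists may contain spaces or stray non-numeric entries, comparing parsed integers can be more forgiving than comparing strings. The following is only a minimal sketch: the `Item` class is a hypothetical stand-in for the objects in `source`, and `KeywordFilter.WithKeyword` is an illustrative name, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the objects held in "source".
public class Item
{
    public string Keywords { get; set; }
}

public static class KeywordFilter
{
    // Returns the items whose comma-separated Keywords contain val.
    // Entries that are not valid integers are skipped instead of throwing.
    public static IEnumerable<Item> WithKeyword(IEnumerable<Item> source, int val)
    {
        return source.Where(p =>
            !string.IsNullOrEmpty(p.Keywords) &&
            p.Keywords
                .Split(',')
                .Any(s => int.TryParse(s.Trim(), out int n) && n == val));
    }
}
```

Called as `KeywordFilter.WithKeyword(source, val)`, this would match a Keywords value such as "1, 2,4,7" when val is 2, while a value such as "10,20" would not match when val is 1.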

Related

C# List of splitted strings

I have the following problem.
I have these strings with whitespace between them.
"+name:string" "+age:int"
I split them with this code:
List<string> stringValueList = new List<string>();
stringValueList = System.Text.RegularExpressions.Regex.Split(stringValue, @"\s{2,}").ToList<string>();
now the elements of List looks like this
"+name:string"
"+age:int"
Now I want to split these strings and create Objects.
This looks like this:
// Storing the created objects in a List of objects
List<myObject> objectList = new List<myObject>();
for(i = 1; i < stringValueList.Count ; i+=2)
{
myObject object = new myObject();
object.modifier = '+';
object.name = stringValueList[i-1].Trim('+'); // out of the example the object.name should be "name"
object.type = stringValueList[i]; // out of the example the object.type value should "string"
objectList.Add(object);
}
At the end I should get two objects with these values:
List<myObject> objectList{ myObject object1{modifier = '+' , name ="name" , type="string"}, myObject object2{modifier='+', name="age" type="int"}}
But my result looks like this:
List<myObject> objectList {myObject object1 {modifier='+', name="name:string" type="+age:int"}}
So instead of getting 2 Objects, I am getting 1 Object. It puts both strings into the elements of the first object.
Can anyone help me out? I guess my problem is in the for loop because i-1 value is the first string in the List and i is the second string but I cant change this.
I guess my problem is in the for loop because i-1 value is the first string in the List and i is the second string but I cant change this.
I don't know why you do i += 2, because apparently you want to split each string in two again. So you just have to change that.
Use foreach(), and inside your loop, split your string again:
foreach (var entry in stringValueList)
{
    // "object" is a reserved keyword in C#, so the variable is named obj
    myObject obj = new myObject();
    var kvp = entry.Split(':');
    obj.modifier = '+';
    obj.name = kvp[0].Trim('+');
    obj.type = kvp[1];
    objectList.Add(obj);
}
Of course this code assumes your inputs are always valid; you'd have to add some boundary checks to make it more robust.
Alternatively, you could expand your Regex formula to do the whole thing in one go.
For example, with (?<=")[+](.*?):(.*?)(?="), all you'd have to do is assign the matched group values.
foreach (Match m in Regex.Matches(stringValue, "(?<=\")[+](.*?):(.*?)(?=\")"))
{
myObject obj = new myObject
{
modifier = '+',
name = m.Groups[1].Value,
type = m.Groups[2].Value
};
objectList.Add(obj);
}
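The boundary checks mentioned above could look roughly like the following. This is a sketch under assumptions rather than a definitive implementation: `ParsedMember` and `MemberParser` are hypothetical names (the question calls its type myObject), and silently skipping malformed entries is just one possible policy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type; the original question calls it myObject.
public class ParsedMember
{
    public char Modifier { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
}

public static class MemberParser
{
    // Parses entries such as "+name:string"; malformed entries are skipped.
    public static List<ParsedMember> Parse(IEnumerable<string> entries)
    {
        var result = new List<ParsedMember>();
        foreach (var raw in entries)
        {
            if (string.IsNullOrWhiteSpace(raw)) continue;
            var entry = raw.Trim();

            // Split into at most two parts so a ':' inside the type is preserved.
            var parts = entry.Split(new[] { ':' }, 2);
            if (parts.Length != 2) continue;

            // The left part needs a modifier plus a name; the right part needs a type.
            if (parts[0].Length < 2 || parts[1].Length == 0) continue;

            result.Add(new ParsedMember
            {
                Modifier = parts[0][0],
                Name = parts[0].Substring(1),
                Type = parts[1]
            });
        }
        return result;
    }
}
```

Reading the modifier from the first character, instead of hard-coding '+', also leaves room for other modifiers should the input ever contain them.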
It's interesting to see how others approach a problem. I would have done something like this:
public class MyObject
{
    public char Modifier { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }

    public static IEnumerable<MyObject> Parse(string str)
    {
        // Select projects each entry to a MyObject; List.ForEach returns void
        // and therefore cannot be used to build the result.
        return str
            .Split(' ')
            .Where(s => !string.IsNullOrEmpty(s))
            .Select(i =>
            {
                var sections = i.Remove(0, 1).Split(':');
                return new MyObject()
                {
                    Modifier = i[0],
                    Name = sections[0],
                    Type = sections[1]
                };
            });
    }
}

Compare two Lists using Linq for partial matches

I tried looking through some of the other questions, but couldn't find any that did a partial match.
I have two List<string>
They have codes in them. One is a list of selected codes, one is a list of required codes. The entire code list is a tree though, so they have sub codes. An example would be
Code B
Code B.1
Code B.11
So let's say the Required code is B, but anything under its tree will meet that requirement. If the Selected codes are A and C the match would fail, but if one of the selected codes was B.1 it would count as a partial match.
I just need to know if any of the selected codes partially match any of the required codes. Here is my current attempt at this.
//Required is List<string> and Selected is a List<string>
int count = (from c in Selected where c.Contains(Required.Any()) select c).Count();
The error I get is on the Required.Any() call: "cannot convert from bool to string".
Sorry if this is confusing, let me know if adding any additional information would help.
I think you need something like this:
using System;
using System.Collections.Generic;
using System.Linq;
static class Program {
static void Main(string[] args) {
List<string> selected = new List<string> { "A", "B", "B.1", "B.11", "C" };
List<string> required = new List<string> { "B", "C" };
var matching = from s in selected where required.Any(r => s.StartsWith(r)) select s;
foreach (string m in matching) {
Console.WriteLine(m);
}
}
}
Applying the Any condition on required in this way should give you the elements that match - I'm not sure if you should use StartsWith or Contains, that depends on your requirements.
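One thing to watch with a plain StartsWith is that a required code B would also accept an unrelated code such as BA. A minimal sketch of a stricter check follows; it assumes that '.' separates the levels of a code (as in B.1), and `CodeMatcher`, `IsSameOrDescendant` and `AnyMatch` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeMatcher
{
    // True when "selected" equals "required" or lies beneath it in the tree.
    // Assumes '.' separates the levels of a code, as in "B.1".
    public static bool IsSameOrDescendant(string selected, string required)
    {
        return selected.Equals(required, StringComparison.Ordinal)
            || selected.StartsWith(required + ".", StringComparison.Ordinal);
    }

    // True when at least one selected code satisfies at least one required code.
    public static bool AnyMatch(IEnumerable<string> selected, IEnumerable<string> required)
    {
        var requiredList = required.ToList();
        return selected.Any(s => requiredList.Any(r => IsSameOrDescendant(s, r)));
    }
}
```

Since the question only asks whether any selected code meets any requirement, `AnyMatch` returns a bool directly instead of counting the matches.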
If selected and required lists are large enough the following is faster than the accepted answer:
static void Main(string[] args)
{
List<string> selected = new List<string> { "A", "B", "B.1", "B.11", "C" };
List<string> required = new List<string> { "B", "C" };
required.Sort();
var matching = selected.Where(s =>
{
int index = required.BinarySearch(s);
if (index >= 0) return true; //exact match
index = ~index;
if (index == 0) return false;
return s.StartsWith(required[index - 1]);
});
foreach (string m in matching)
{
Console.WriteLine(m);
}
}
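Given n = required.Count and m = selected.Count, the accepted answer's algorithm complexity is O(n*m). What I propose has a better algorithm complexity: O((n+m)*Log(n)). Note that checking only the closest preceding entry is reliable only when no required code is itself a prefix of another required code; with both "B" and "B.1" required, for example, "B.2" would be missed.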
This query finds any match that exists in two lists. If a value exists in both lists, it returns true, otherwise false.
List<string> listString1 = new List<string>();
List<string> listString2 = new List<string>();
listString1.Add("A");
listString1.Add("B");
listString1.Add("C");
listString1.Add("D");
listString1.Add("E");
listString2.Add("C");
listString2.Add("X");
listString2.Add("Y");
listString2.Add("Z");
bool isItemExist = listString1.Any(x => listString2.Contains(x));

How to use GroupBy using Dynamic LINQ

I am trying to do a GroupBy using Dynamic LINQ but have trouble getting it to work.
This is some sample code illustrating the problem:
List<dtoMyAlbum> listAlbums = new List<dtoMyAlbum>();
for (int i = 0; i < 5000; i++)
{
dtoMyAlbum album = new dtoMyAlbum
{
Author = "My Author",
BookID = i,
CurrSymbol = "USD",
Price = 23.23,
Shop = i % 3 == 0 ? "TESCO" : "HMV"
};
listAlbums.Add(album);
}
IQueryable<dtoMyAlbum> mydata = listAlbums.AsQueryable();
int count = mydata.Count();
//var mydataGrouped = mydata.GroupBy(a => a.Shop); // <-- this works well (but is not dynamic....)
var mydataGrouped = mydata.GroupBy("Shop"); // <-- does not compile but is kind of what I want...
foreach (var group in mydataGrouped)
{
//count = group.Count();
}
I realise that I am missing the 'elementSelector' in the GroupBy overload but all I want to do is to end up with (in this case) two sets of dtoMyAlbum objects so I wish to select ALL elements for all sets...
How would I go about this?
There is a default identifier, it, defined; you can use it to return the matched elements:
var mydataGrouped = mydata.GroupBy("Shop", "it");
To iterate through the results, you should additionally Select the elements in order to name them, and use dynamic:
var mydataGrouped = mydata.GroupBy("Shop", "it").Select("new (it.Key as Shop, it as Albums)");
foreach (dynamic group in mydataGrouped)
{
foreach (dynamic album in group.Albums)
{
Console.WriteLine(album.Author);
}
}
You may construct the group-by expression dynamically, or try the Dynamic LINQ library presented on ScottGu's page:
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
This is the method I have copied from DynamicQuery:
public static IQueryable GroupBy(this IQueryable source, string keySelector, string elementSelector, params object[] values) {..}
In my example I provide the keySelector and the elementSelector.
listAlbums.AsQueryable().GroupBy("it.Shop", "it").Select("Author");
You can use new with GroupBy or with Select to create new types, much like instantiating a class in OOP.
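For the other route mentioned above, constructing the group-by expression dynamically, a small sketch using System.Linq.Expressions might look like this. It does not use the Dynamic LINQ library at all; `DynamicGrouping.GroupByProperty` is a hypothetical helper, and `Album` is a cut-down stand-in for dtoMyAlbum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Cut-down, hypothetical stand-in for dtoMyAlbum.
public class Album
{
    public int BookID { get; set; }
    public string Shop { get; set; }
}

public static class DynamicGrouping
{
    // Groups source by the public property named propertyName.
    // The key is converted to object so one method works for any property type.
    public static IEnumerable<IGrouping<object, T>> GroupByProperty<T>(
        IEnumerable<T> source, string propertyName)
    {
        ParameterExpression it = Expression.Parameter(typeof(T), "it");
        Expression key = Expression.Convert(
            Expression.Property(it, propertyName), typeof(object));
        Func<T, object> keySelector =
            Expression.Lambda<Func<T, object>>(key, it).Compile();
        return source.GroupBy(keySelector);
    }
}
```

Each resulting group is an ordinary IGrouping, so `foreach (var group in DynamicGrouping.GroupByProperty(listAlbums, "Shop"))` gives typed access to the albums without resorting to dynamic. Expression.Property throws an ArgumentException if the named property does not exist.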

In C#, What is the fastest way to search for elements in a list but do a "StartsWith()" search?

I have a list of strings:
var list = new List<string>();
list.Add("CAT");
list.Add("DOG");
var listofItems = new List<string>();
listofItems.Add("CATS ARE GOOD");
listofItems.Add("DOGS ARE NICE");
listofItems.Add("BIRD");
listofItems.Add("CATAPULT");
listofItems.Add("DOGGY");
And now I want a function like this:
listofItems.Where(r=> list.Contains(r));
But instead of Contains, I want it to do a "starts with" check, so 4 out of the 5 items would be returned (BIRD would NOT).
What is the fastest way to do that?
You can use StartsWith inside of an Any
listofItems.Where(item=>list.Any(startsWithWord=>item.StartsWith(startsWithWord)))
You can visualize this as a double for loop, with the second for breaking out as soon as it hits a true case
var filteredList = new List<string>();
foreach (var item in listofItems)
{
    foreach (var startsWithWord in list)
    {
        if (item.StartsWith(startsWithWord))
        {
            filteredList.Add(item);
            break; // stop at the first matching prefix
        }
    }
}
return filteredList;
The fastest way would be usage of another data structure, for example Trie. Basic C# implementation can be found here: https://github.com/kpol/trie
This should get you what you need in a more simplified format:
var result = listofItems.Select(n =>
{
bool res = list.Any(v => n.StartsWith(v));
return res
? n
: string.Empty;
}).Where(b => !b.Equals(string.Empty));
The Trie data structure is what you need. Take a look at this more mature library: TrieNet
using Gma.DataStructures.StringSearch;
...
var trie = new SuffixTrie<int>(3);
trie.Add("hello", 1);
trie.Add("world", 2);
trie.Add("hell", 3);
var result = trie.Retrieve("hel");
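To show the idea behind the trie suggestions without depending on either library, here is a minimal hand-rolled sketch. `PrefixTrie` and `StartsWithAny` are illustrative names and do not reflect the API of kpol/trie or TrieNet; matching is ordinal and case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal trie over the prefix words; illustrative only.
public class PrefixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWordEnd;
    }

    private readonly Node _root = new Node();

    public PrefixTrie(IEnumerable<string> prefixes)
    {
        foreach (var prefix in prefixes)
        {
            var node = _root;
            foreach (var c in prefix)
            {
                if (!node.Children.TryGetValue(c, out var next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsWordEnd = true;
        }
    }

    // True when text starts with any stored prefix (ordinal, case-sensitive).
    public bool StartsWithAny(string text)
    {
        var node = _root;
        if (node.IsWordEnd) return true; // an empty prefix matches everything
        foreach (var c in text)
        {
            if (!node.Children.TryGetValue(c, out node)) return false;
            if (node.IsWordEnd) return true;
        }
        return false;
    }
}
```

With the lists from the question, `listofItems.Where(new PrefixTrie(list).StartsWithAny)` walks each item at most once, so the cost per item depends on the item's length and not on how many prefixes are stored. Whether that beats the nested Any for a list of only two prefixes would need measuring.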

C# dedupe List based on split

I'm having a hard time deduping a list based on a specific delimiter.
For example I have 4 strings like below:
apple|pear|fruit|basket
orange|mango|fruit|turtle
purple|red|black|green
hero|thor|ironman|hulk
In this example I want my list to only have unique values in column 3, so it would result in a List that looks like this:
apple|pear|fruit|basket
purple|red|black|green
hero|thor|ironman|hulk
In the above example I would have gotten rid of line 2 because line 1 had the same result in column 3. Any help would be awesome, deduping is tough in C#.
How I'm testing this:
static void Main(string[] args)
{
BeginListSet = new List<string>();
startHashSet();
}
public static List<string> BeginListSet { get; set; }
public static void startHashSet()
{
string[] BeginFileLine = File.ReadAllLines(@"C:\testit.txt");
foreach (string begLine in BeginFileLine)
{
BeginListSet.Add(begLine);
}
}
public static IEnumerable<string> Dedupe(IEnumerable<string> list, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in list)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
Something like this should work for you
static IEnumerable<string> Dedupe(this IEnumerable<string> input, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in input)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
...
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk"
};
foreach (string item in list.Dedupe('|', 2))
Console.WriteLine(item);
Edit: In the linked question Distinct() with Lambda, Jon Skeet presents the idea in a much better fashion, in the form of a DistinctBy custom method. While similar, his is far more reusable than the idea presented here.
Using his method, you could write
var deduped = list.DistinctBy(item => item.Split('|')[2]);
And you could later reuse the same method to "dedupe" another list of objects of a different type by a key of possibly yet another type.
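For readers without the linked question to hand, a custom method of that kind can be sketched as below. This is an approximation of the general idea, not a copy of the linked answer; it is named `DistinctByKey` here (a hypothetical name) to avoid clashing with the built-in Enumerable.DistinctBy that ships with .NET 6 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Yields the first element seen for each distinct key, in source order.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var element in source)
        {
            // HashSet.Add returns false when the key was already present.
            if (seen.Add(keySelector(element)))
                yield return element;
        }
    }
}
```

Usage mirrors the line above: `list.DistinctByKey(item => item.Split('|')[2])`. On .NET 6 or later the built-in DistinctBy can be called the same way, so the helper is only needed on older frameworks.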
Try this:
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk "
};
var dedup = new List<string>();
var filtered = new List<string>();
foreach (var s in list)
{
var filter = s.Split('|')[2];
if (dedup.Contains(filter)) continue;
filtered.Add(s);
dedup.Add(filter);
}
// Console.WriteLine(filtered);
Can you use a HashSet instead? That will eliminate dupes automatically for you as they are added.
Maybe you can sort the |-delimited words in alphabetical order and store them in a grid (columns). Then, when you try to insert, just check whether there is already a column containing a word that starts with the same character.
If LINQ is an option, you can do something like this:
// assume strings is a collection of strings
List<string> list = strings.Select(a => a.Split('|')) // split each line by '|'
.GroupBy(a => a[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.Select(a => string.Join("|", a))
.ToList(); // convert to list of strings
Edit (per Jeff Mercado's comment), this can be simplified further:
List<string> list =
    strings.GroupBy(a => a.Split('|')[2]) // group by third column
           .Select(a => a.First()) // select first line from each group
           .ToList(); // convert to list of strings
