Intersection operation on a list of comma separated string - c#

I have a list of comma-separated strings like below:
List<string> IdList=new List<string>();
and each element of the list is a comma-separated string like
1,2,4,5,6,7,8,10,12,15,16
2,3,5,7,8,9,0,10,16,17
4,5,89,12,13,1,2,3,6,7,10,16
I want to apply an AND operation across these strings so that I get output like below:
2,5,7,10,16
Is there an efficient way to implement this intersection operation?

You're actually looking for an intersection.
If you don't need the values in numeric order, you can treat each string as a plain set of comma-separated values. Start with the values of the first string and intersect that set with each of the remaining strings in turn (Skip needs using System.Linq):
HashSet<string> set = new HashSet<string>(list[0].Split(','));
foreach (var item in list.Skip(1))
{
set.IntersectWith(item.Split(','));
}
string result = string.Join(",", set);
Complete sample code:
using System;
using System.Collections.Generic;
using System.Linq;
class Test
{
static void Main()
{
var list = new List<string>
{
"1,2,4,5,6,7,8,10,12,15,16",
"2,3,5,7,8,9,0,10,16,17",
"4,5,89,12,13,1,2,3,6,7,10,16"
};
HashSet<string> set = new HashSet<string>(list[0].Split(','));
foreach (var item in list.Skip(1))
{
set.IntersectWith(item.Split(','));
}
string result = string.Join(",", set);
Console.WriteLine(result);
}
}
Result (order not guaranteed):
2,5,7,10,16
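If numeric order does matter, one possible variant parses the values into integers and sorts the final set. This is only a sketch and not part of the answer above: the SortedIntersection class and its Intersect method are names made up for illustration, and it assumes every value parses as an int.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortedIntersection
{
    // Hypothetical helper: intersects comma-separated integer lists
    // and returns the common values in ascending numeric order.
    public static string Intersect(IEnumerable<string> lines)
    {
        HashSet<int> set = null;
        foreach (string line in lines)
        {
            IEnumerable<int> numbers = line.Split(',').Select(s => int.Parse(s));
            if (set == null)
            {
                // The first line seeds the set.
                set = new HashSet<int>(numbers);
            }
            else
            {
                // Every later line removes values it does not contain.
                set.IntersectWith(numbers);
            }
        }
        // No input lines at all: return an empty string.
        return set == null ? "" : string.Join(",", set.OrderBy(n => n));
    }
}
```

Calling SortedIntersection.Intersect(list) on the three sample strings gives 2,5,7,10,16, this time in a guaranteed order.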

I don't know about "less memory utilization", but my first shot at this would be something along these lines (untested, coded in browser, no Visual Studio handy yadda yadda):
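// Requires: using System.Collections.Generic; using System.Linq;
Dictionary<string, int> occurrences = new Dictionary<string, int>();
int numberOfLists = YourCollectionOfOuterLists.Count;
foreach (string list in YourCollectionOfOuterLists) {
    // Distinct() counts a value at most once per list
    foreach (string value in list.Split(',').Distinct()) {
        int count;
        occurrences.TryGetValue(value, out count); // count stays 0 if the key is absent
        occurrences[value] = count + 1;
    }
}
List<string> output = new List<string>();
foreach (string key in occurrences.Keys) {
    // A value present in every list has been counted once per list
    if (occurrences[key] == numberOfLists) {
        output.Add(key);
    }
}
return String.Join(",", output);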
It might very well be possible to write the code more tersely, but anything that accomplishes what you seem to be after will still have to perform roughly the same steps: decide which elements exist in all lists (which is slightly non-trivial as the number of lists is unknown), then make a new list out of those values.
If you have access to it, something like Parallel.ForEach() might help cut down on wallclock execution time at least of the second loop (and possibly the first, with proper locking/synchronization in place).
If you are after something other than this, please clarify your question to describe exactly what you want.
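The same counting idea can be written more tersely with LINQ. The following is a rough sketch, not the answer's own code: OccurrenceCount and CommonValues are illustrative names, and the values are kept as strings rather than parsed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceCount
{
    // Hypothetical helper: keeps the values that occur in every input string.
    public static string CommonValues(IList<string> lists)
    {
        IEnumerable<string> common = lists
            // Distinct() so a value repeated inside one string counts once.
            .SelectMany(list => list.Split(',').Distinct())
            .GroupBy(value => value)
            // A value seen in every string has exactly lists.Count occurrences.
            .Where(group => group.Count() == lists.Count)
            .Select(group => group.Key);
        return string.Join(",", common);
    }
}
```

GroupBy yields groups in the order their keys first appear, so for the sample data this returns 2,5,7,10,16.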

I'm not sure about performance but you can use the Aggregate extension method to 'fold intersections'.
var data = new List<string>
{
"1,2,4,5,6,7,8,10,12,15,16",
"2,3,5,7,8,9,0,10,16,17",
"4,5,89,12,13,1,2,3,6,7,10,16",
};
var fold = data.Aggregate(data[0].Split(',').AsEnumerable(), (d1, d2) => d1.Intersect(d2.Split(',')));
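To turn the folded sequence back into a comma-separated string, something like the sketch below should work. FoldIntersection is a made-up wrapper name, and Skip(1) is an optional tweak so the first string is not intersected with itself.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FoldIntersection
{
    // Hypothetical wrapper around the Aggregate-based fold.
    public static string Intersect(IList<string> data)
    {
        if (data.Count == 0)
        {
            return "";
        }
        IEnumerable<string> fold = data
            .Skip(1)
            .Aggregate(
                data[0].Split(',').AsEnumerable(),               // seed: values of the first string
                (acc, next) => acc.Intersect(next.Split(','))); // narrow with each later string
        return string.Join(",", fold);
    }
}
```

Intersect yields elements in the order they appear in the first sequence, so the sample data produces 2,5,7,10,16.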

Related

More efficient method to search a list of strings for certain strings?

I have a list of strings. Neither the number of nor the order of these strings is guaranteed. The only thing that is certain is that this list WILL at least contain my 3 strings of interest and inside those strings we'll say "string1", "string2", and "string3" will be contained within them respectively (i.e. these strings can contain more information but those keywords will definitely be in there). I then want to use these results in a function.
My current implementation to solve this is as such:
foreach(var item in myList)
{
if (item.Contains("string1"))
{
myFunction1(item);
}
else if (item.Contains("string2"))
{
myFunction2(item);
}
else if (item.Contains("string3"))
{
myFunction3(item);
}
}
Is there a better way to check string lists and apply functions to those items that match some criteria?
One approach is to use Regex for the fixed list of strings, and check which group is present, like this:
// Note the matching groups around each string
var regex = new Regex("(string1)|(string2)|(string3)");
foreach(var item in myList) {
var match = regex.Match(item);
if (!match.Success) {
continue;
}
if (match.Groups[1].Success) {
myFunction1(item);
}
else if (match.Groups[2].Success)
{
myFunction2(item);
}
else if (match.Groups[3].Success)
{
myFunction3(item);
}
}
This way all three matches would be done with a single pass through the target string.
You could reduce some of the duplicated code in the if statements by creating a Dictionary that maps the strings to their respective functions. (This snippet assumes that myList contains string values, but can easily be adapted to a list of any type.)
Dictionary<string, Action<string>> actions = new Dictionary<string, Action<string>>
{
["string1"] = myFunction1,
["string2"] = myFunction2,
["string3"] = myFunction3
};
foreach (var item in myList)
{
foreach (var action in actions)
{
if (item.Contains(action.Key))
{
action.Value(item);
break;
}
}
}
For a list of only three items, this might not be much of an improvement, but if you have a large list of strings/functions to search for it can make your code much shorter. It also means that adding a new string/function pair is a one-line change. The biggest downside is that the foreach loop is a bit more difficult to read.
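One caveat with the dictionary version: Dictionary does not promise any enumeration order, and because of the break the order decides which function runs when an item contains more than one keyword. If that matters, an ordered list of pairs is one way around it. The sketch below uses made-up names (KeywordDispatch, Dispatch) and is only an illustration of the idea.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordDispatch
{
    // Hypothetical helper: runs the first handler whose keyword occurs in each item.
    // Handlers are tried in list order, so earlier entries take priority.
    public static void Dispatch(
        IEnumerable<string> items,
        IList<KeyValuePair<string, Action<string>>> handlers)
    {
        foreach (string item in items)
        {
            foreach (KeyValuePair<string, Action<string>> handler in handlers)
            {
                if (item.Contains(handler.Key))
                {
                    handler.Value(item);
                    break; // at most one handler per item, as in the original loop
                }
            }
        }
    }
}
```

Items that match no keyword are simply skipped, which mirrors the if/else chain in the question.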

Reduce code bloat when trimming an array of strings

I have a string of comma separated values, to get a string array from this I use
string[] values = value.Split(',');
I want to trim all these values by creating a new list of string and calling a foreach on the array like this
List<string> trimmedValues = new List<string>();
foreach (string str in values)
trimmedValues.Add(str.Trim());
Is there a more efficient way to do this with less code, say by calling a method on the array itself?
Use LINQ (needs using System.Linq):
List<string> trimmedValues = values.Select(v => v.Trim()).ToList();
Try this :
var myTrimResult = "a ,b , c ".Split(',').Select(x => x.Trim());
The "myTrimResult" variable will contain trimmed elements.
To effectively reduce code bloat, I suggest using an extension method.
Declare a method like the following inside a static class (extension methods must live in one), in a common place in your project (or even better, in an external helper DLL shared among projects):
public static List<string> TrimList(this string[] array)
{
var list = new List<string>();
foreach (var s in array)
{
list.Add(s.Trim());
}
return list;
}
Now, in your code, you can simply use:
var trimmedValues = values.TrimList();
I find it more readable than using LINQ expression in code
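For the "method on the array itself" part of the question, Array.ConvertAll is another option that avoids both the explicit loop and LINQ. A minimal sketch, with TrimExample and SplitAndTrim as illustrative names:

```csharp
using System;

public static class TrimExample
{
    // Hypothetical helper: splits on commas and trims every piece.
    public static string[] SplitAndTrim(string value)
    {
        string[] values = value.Split(',');
        // ConvertAll applies the converter to each element and returns a new array.
        return Array.ConvertAll(values, s => s.Trim());
    }
}
```

TrimExample.SplitAndTrim("a ,b , c ") returns the three elements a, b and c. On .NET 5 or later, value.Split(',', StringSplitOptions.TrimEntries) should do the trimming as part of the split itself.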

How do I return a list of Distinct Words using LINQ in C#?

The goal is to sort through a text (i.e. a speech) and output a list of the distinct words in the speech to a textbox. I have read through a lot of tips on the boards and played around a lot, but at this point I am more confused than when I started. Here is my code:
private void GenerateList(string[] wordlist)
{
List<string> wordList = new List<string>();
for (int i = 0; i < wordlist.Length; i++)
{
wordList.Add(wordlist[i]);
}
var uniqueStr = from item in wordList.Distinct().ToList()
orderby item
select item;
for (int i = 0; i < uniqueStr.Count(); i++ )
{
txtOutput.Text = uniqueStr.ElementAt(i) + "\n";
}
}
At this point I am getting a return of one word. For the text I am using (the gettysburg address) it is the word "year" and it is the only instance of that word in the text.
I am passing the function each individual word loaded into a string array that is then put into a list (which may be redundant?).
I hope this does what you need in a simple and efficient manner (using .Dump() from LINQPad)
void Main()
{
// can be any IEnumerable<string> including string[]
var words = new List<string>{"one", "two", "four", "three", "four", "a", "z"};
words.ToDistinctList().Dump();
// you would use txtOutput.Text = words.ToDistinctList()
}
static class StringHelpers
{
public static string ToDistinctList(this IEnumerable<string> words)
{
return string.Join("\n", new SortedSet<string>(words));
}
}
A few tips regarding your question:
There is no reason to turn the array into list, because LINQ extension methods are defined on IEnumerable<T>, which is implemented by both the array and the list
Make sure that all letters are in the same case - use ToLower, for instance
You are overwriting txtOutput.Text in every iteration. Instead of setting the new value, append new part to the existing value
Here is the simple piece of code which produces the output you wanted:
IEnumerable<string> distinct =
wordList
.Select(word => word.ToLower())
.Distinct()
.OrderBy(word => word);
txtOutput.Text = string.Join("\n", distinct.ToArray());
On a related note, here is a very simple LINQ expression which returns distinct words from a text, where the whole text is specified as one string:
public static IEnumerable<string> SplitIntoWords(this string text)
{
string pattern = @"\b[\p{L}]+\b";
return
Regex.Matches(text, pattern)
.Cast<Match>() // Extract matches
.Select(match => match.Value.ToLower()) // Change to same case
.Distinct(); // Remove duplicates
}
You can find more variations of regex pattern for the same problem here: Regex and LINQ Query to Split Text into Distinct Words
Here's how I'd simplify your code, as well as achieve what you want to achieve.
private void GenerateList(string[] wordlist)
{
List<string> wordList = wordlist.ToList(); // initialize the list passing in the array
var uniqueStr = from item in wordList.Distinct().ToList()
orderby item
select item;
txtOutput.Text = String.Join("\n", uniqueStr.ToArray());
}
You can use the fact that the StringBuilder class has a fluent interface along with LINQ to simplify this greatly.
First, you can create the StringBuilder and concatenate all of the words into the same instance like so:
// The builder.
var builder = new StringBuilder();
// A copy of the builder *reference*.
var builderCopy = builder;
// Get the distinct list, order by the string.
builder = wordList
// Get the distinct elements.
.Distinct()
// Order the words.
.OrderBy(w => w).
// Append the builder.
Select(w => builderCopy.AppendLine(w)).
// Get the last or default element, this will
// cycle through all of the elements.
LastOrDefault();
// If the builder is not null, then assign to the output, otherwise,
// assign null.
txtOutput.Text = builder == null ? null : builder.ToString();
Note, you don't have to actually materialize the list, as wordList is already a materialized list, it's an array (and as a side note, typed arrays in C# implement the IList<T> interface).
The AppendLine method (and most of the methods on StringBuilder) return the instance of the StringBuilder that the operation was performed on, which is why the LastOrDefault method call works; simply call the operation and return the result (each item returned will be the same reference).
The builderCopy variable is used to avoid access to a modified closure (it never hurts to be safe).
The null check at the end is for the case where wordList doesn't contain any elements. In this case, the call to LastOrDefault will return null.

Iteration bound variable?

This is non-language-specific, but I'll use examples in C#. Often I face the problem in which I need to add a parameter to an object inside any given iteration of at least one of its parameters, and I have always to come up with a lame temporary list or array of some kind concomitant with the problem of keeping it properly correlated.
So, please bear with me on the examples below:
Is there an easier and better way to do this in C sharp?
List<String> storeStr;
void AssignStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
storeStr.add(str);
str = AProcedureToGenerateNewUniqueStr();
}
}
void PrintStringListWithNewUniqueStr (List<String> aList) {
int i = 0;
foreach (String str in aList) {
print(str + storeStr[i]);
i++;
}
}
Notice the correlation above is guaranteed only because I'm iterating through an unchanged aList. When asking about a "easier and better way" I mean it should also make sure the storeStr would always be correlated with its equivalent on aList while keeping it as short and simple as possible. The List could also have been any kind of array or object.
Is there any language in which something like this is possible? It must give the same results as above.
IterationBound<String> storeStr;
void AssignStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
storeStr = str;
str = AProcedureToGenerateNewUniqueStr();
}
}
void PrintStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
print(str + storeStr);
}
}
In this case, the fictitious "IterationBound" kind would guarantee the correlation between the list and the new parameter (in a way, just like Garbage Collectors guarantee allocs). It would somehow notice it was created inside an iteration and associate itself with that specific index (no matter if the syntax there would be uglier, of course). Then, when its called back again in another iteration and it was already created or stored in that specific index, it would retrieve this specific value of that iteration.
Why not simply project your enumerable into a new form?
var combination = aList
    .Select(x => new { Initial = x, Addition = AProcedureToGenerateNewUniqueStr() })
    .ToList();
// ForEach returns void, so call it on the materialized list as a separate statement
combination.ForEach(x => print(x.Initial + x.Addition));
This way you keep each element associated with the new data.
aList.ForEach(x => print(x + AProcedureToGenerateNewUniqueStr()));
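If the generated strings have to survive between two separate methods, as in the question, one possible approach is to key them by the list item in a dictionary. This is a sketch under assumptions: BoundValues, Assign and Combine are invented names, the generator is passed in as a delegate standing in for AProcedureToGenerateNewUniqueStr, and the items are assumed to be distinct.

```csharp
using System;
using System.Collections.Generic;

public class BoundValues
{
    // Maps each item to the string generated for it.
    private readonly Dictionary<string, string> stored = new Dictionary<string, string>();
    private readonly Func<string> generate;

    public BoundValues(Func<string> generate)
    {
        this.generate = generate;
    }

    // Generates and remembers a value for every item not seen before.
    public void Assign(IEnumerable<string> items)
    {
        foreach (string item in items)
        {
            if (!stored.ContainsKey(item))
            {
                stored[item] = generate();
            }
        }
    }

    // Looks values up by item, so iteration order no longer matters.
    public IEnumerable<string> Combine(IEnumerable<string> items)
    {
        foreach (string item in items)
        {
            yield return item + stored[item];
        }
    }
}
```

Because the lookup is by key rather than by index, the association holds even if the list is reordered between the two calls.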

How to add items to a collection while consuming it?

The example below throws an InvalidOperationException, "Collection was modified; enumeration operation may not execute." when executing the code.
var urls = new List<string>();
urls.Add("http://www.google.com");
foreach (string url in urls)
{
// Get all links from the url
List<string> newUrls = GetLinks(url);
urls.AddRange(newUrls); // <-- This is really the problematic row, adding values to the collection I'm looping
}
How can I rewrite this in a better way? I'm guessing a recursive solution?
You can't, basically. What you really want here is a queue:
var urls = new Queue<string>();
urls.Enqueue("http://www.google.com");
while(urls.Count != 0)
{
string url = urls.Dequeue();
// Get all links from the url
List<string> newUrls = GetLinks(url);
foreach (string newUrl in newUrls)
{
urls.Enqueue(newUrl);
}
}
It's slightly ugly due to there not being an AddRange method in Queue<T> but I think it's basically what you want.
There are three strategies you can use.
Copy the List<> to a second collection (list or array - perhaps use ToArray()). Loop through that second collection, adding urls to the first.
Create a second List<>, and loop through your urls List<> adding new values to the second list. Copy those to the original list when done looping.
Use a for loop instead of a foreach loop. Grab your count up front. List should leave things indexed correctly, so if you add things they will go to the end of the list.
I prefer #3 as it doesn't have any of the overhead associated with #1 or #2. Here is an example:
var urls = new List<string>();
urls.Add("http://www.google.com");
int count = urls.Count;
for (int index = 0; index < count; index++)
{
// Get all links from the url
List<string> newUrls = GetLinks(urls[index]);
urls.AddRange(newUrls);
}
Edit: The last example (#3) assumes that you don't want to process additional URLs as they are found in the loop. If you do want to process additional URLs as they are found, just use urls.Count in the for loop instead of the local count variable as mentioned by configurator in the comments for this answer.
Use foreach with a lambda, it's more fun!
var urls = new List<string>();
var destUrls = new List<string>();
urls.Add("http://www.google.com");
urls.ForEach(i => destUrls.AddRange(GetLinks(i))); // AddRange, since GetLinks returns a list
urls.AddRange(destUrls);
Alternatively, you could treat the collection as a queue:
IList<string> urls = new List<string>();
urls.Add("http://www.google.com");
while (urls.Count > 0)
{
string url = urls[0];
urls.RemoveAt(0);
// Get all links from the url
List<string> newUrls = GetLinks(url);
urls.AddRange(newUrls);
}
I would create two lists, add into the second, and then update the reference like this:
var urls = new List<string>();
urls.Add("http://www.google.com");
// Copy after seeding so the starting URL is kept in the result
var destUrls = new List<string>(urls);
foreach (string url in urls)
{
// Get all links from the url
List<string> newUrls = GetLinks(url);
destUrls.AddRange(newUrls);
}
urls = destUrls;
Consider using a Queue with while loop (while q.Count > 0, url = q.Dequeue()) instead of iteration.
I assume you want to iterate over the whole list, and each item you add to it? If so I would suggest recursion:
var urls = new List<string>();
var turls = new List<string>();
turls.Add("http://www.google.com");
iterate(turls);
// Note: this recursion never ends if the links form a cycle
void iterate(List<string> u)
{
foreach(string url in u)
{
List<string> newUrls = GetLinks(url);
urls.AddRange(newUrls);
iterate(newUrls);
}
}
You can probably also create a recursive function, like this (untested):
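IEnumerable<string> GetUrl(string url)
{
    foreach (string ret_url in WHERE_I_GET_MY_URLS) // placeholder, e.g. GetLinks(url)
    {
        yield return ret_url;
        // Recurse into each URL found, not into url itself (that would never end)
        foreach (string u in GetUrl(ret_url))
            yield return u;
    }
}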
List<string> MyEnumerateFunction()
{
return new List<string>(GetUrl("http://www.google.com"));
}
In this case, you will not have to create two lists, since GetUrl does all the work.
But I may have missed the point of your program.
Don't change the collection you're looping through via for each. Just use a while loop on the Count property of the list and access the List items by index. This way, even if you add items, the iteration should pick up the changes.
Edit: Then again, it sort of depends on whether you WANT the new items you added to be picked up by the loop. If not, then this won't help.
Edit 2: I guess the easiest way to do it would be to just change your loop to:
foreach (string url in urls.ToArray())
This will create an Array copy of your list, and it will loop through this instead of the original list. This will have the effect of not looping over your added items.
Jon's approach is right; a queue's the right data structure for this kind of application.
Assuming that you'd eventually like your program to terminate, I'd suggest two other things:
don't use string for your URLs, use System.Uri: it provides a canonical string representation of the URL. This will be useful for the second suggestion, which is...
put the canonical string representation of each URL you process in a Dictionary. Before you enqueue a URL, check to see if it's in the Dictionary first.
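The two suggestions above can be combined with the queue roughly as follows. This is a sketch, not code from any of the answers: UrlCrawler and Crawl are invented names, the link lookup is passed in as a delegate standing in for GetLinks, and a HashSet is used for the "already seen" check instead of a Dictionary because only the keys are needed. URLs are left as plain strings here; canonicalizing them first (for example via System.Uri) would be a separate step.

```csharp
using System;
using System.Collections.Generic;

public static class UrlCrawler
{
    // Hypothetical helper: breadth-first walk that visits every URL at most once.
    public static List<string> Crawl(string start, Func<string, IEnumerable<string>> getLinks)
    {
        var seen = new HashSet<string> { start };
        var queue = new Queue<string>();
        queue.Enqueue(start);
        var result = new List<string>();

        while (queue.Count > 0)
        {
            string url = queue.Dequeue();
            result.Add(url);
            foreach (string link in getLinks(url))
            {
                // HashSet.Add returns false for a URL that was already seen,
                // so cycles such as A -> B -> A cannot loop forever.
                if (seen.Add(link))
                {
                    queue.Enqueue(link);
                }
            }
        }
        return result;
    }
}
```

Marking a URL as seen when it is enqueued, rather than when it is processed, also keeps duplicates out of the queue itself.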
It's hard to make the code better without knowing what GetLinks() does. In any event, this avoids recursion. The standard idiom is you don't alter a collection when you're enumerating over it. While the runtime could have let you do it, the reasoning is that it's a source of error, so better to create a new collection or control the iteration yourself.
create a queue with all urls.
when dequeueing, we're pretty much saying we've processed it, so add it to result.
If GetLinks() returns anything, add those to the queue and process them as well.
public List<string> ExpandLinksOrSomething(List<string> urls)
{
List<string> result = new List<string>();
Queue<string> queue = new Queue<string>(urls);
while (queue.Any())
{
string url = queue.Dequeue();
result.Add(url);
foreach( string newResult in GetLinks(url) )
{
queue.Enqueue(newResult);
}
}
return result;
}
The naive implementation assumes that GetLinks() will not return circular references. e.g. A returns B, and B returns A. This can be fixed by:
List<string> newItems = GetLinks(url).Except(result).ToList();
foreach( string newResult in newItems )
{
queue.Enqueue(newResult);
}
* As others point out using a dictionary may be more efficient depending on how many items you process.
I find it strange that GetLinks() would return a value, and then later resolve that to more Url's. Maybe all you want to do is 1-level expansion. If so, we can get rid of the Queue altogether.
public static List<string> StraightProcess(List<string> urls)
{
List<string> result = new List<string>();
foreach (string url in urls)
{
result.Add(url);
result.AddRange(GetLinks(url));
}
return result;
}
I decided to rewrite it because while other answers used queues, it wasn't apparent that they didn't run forever.
