Determine if string appears more than once in string array (C#)

I have an array of strings, for example:
string[] letters = { "a", "a", "b", "c" };
I need to find a way to determine if any string in the array appears more than once.
I thought the best way would be to build a new string array without the string in question and then use Contains:
foreach (string letter in letters)
{
    string[] otherLetters = //?
    if (otherLetters.Contains(letter))
    {
        //etc.
    }
}
but I cannot figure out how.
If anyone has a solution for this or a better approach, please answer.

The easiest way is to use GroupBy:
var lettersWithMultipleOccurences = letters.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
This will first group your array using the letters as keys. It then returns only those groups with multiple entries and returns the key of these groups. As a result, you will have an IEnumerable<string> containing all letters that occur more than once in the original array. In your sample, this is only "a".
Beware: because LINQ uses deferred execution, enumerating lettersWithMultipleOccurences multiple times will perform the grouping and filtering multiple times. To avoid this, call ToList() on the result:
var lettersWithMultipleOccurences = letters.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
lettersWithMultipleOccurences will now be of type List<string>.
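If all you need is a yes/no answer, the same grouping can feed Any(). A minimal sketch, written as a top-level C# program (C# 9 or later assumed) with the letters array from the question:

```csharp
using System;
using System.Linq;

string[] letters = { "a", "a", "b", "c" };

// Any() stops at the first group with more than one element, although
// GroupBy still reads the whole array before it yields the first group.
bool hasDuplicates = letters.GroupBy(x => x).Any(g => g.Count() > 1);

// ToList() runs the query once and stores the result.
var duplicated = letters.GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(hasDuplicates);                // True
Console.WriteLine(string.Join(",", duplicated)); // a
```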

You can use the LINQ extension methods:
if (letters.Distinct().Count() == letters.Count()) {
// no duplicates
}
Enumerable.Distinct removes duplicates. Thus, letters.Distinct() would return three elements in your example.
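Distinct uses the default, case-sensitive string equality unless you pass an IEqualityComparer<string>. A small sketch; the mixed-case sample array is made up for illustration:

```csharp
using System;
using System.Linq;

string[] letters = { "a", "A", "b", "c" };

// Default equality: "a" and "A" are different strings, so no duplicates.
bool dupCaseSensitive = letters.Distinct().Count() != letters.Length;

// Distinct also accepts an IEqualityComparer<string>; this one ignores case.
bool dupIgnoreCase = letters.Distinct(StringComparer.OrdinalIgnoreCase).Count() != letters.Length;

Console.WriteLine($"{dupCaseSensitive} {dupIgnoreCase}"); // False True
```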

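Create a HashSet from the array and compare their sizes. The set drops duplicates, so if it ends up smaller than the array, some letter occurs more than once:
var set = new HashSet<string>(letters);
bool hasDoubleLetters = set.Count != letters.Length;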

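A HashSet will give you good performance, since Contains and Add are O(1) on average, so the whole check is a single pass over the array:
HashSet<string> hs = new HashSet<string>();
foreach (string letter in letters)
{
    if (hs.Contains(letter))
    {
        // letter occurs more than once
    }
    else
    {
        hs.Add(letter);
    }
}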
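Because HashSet<T>.Add returns false when the item is already in the set, the Contains/Add pair can be collapsed into one call, and the loop can stop at the first repeat. A sketch; the helper name HasDuplicate is made up for this example:

```csharp
using System;
using System.Collections.Generic;

string[] letters = { "a", "a", "b", "c" };

// HashSet<T>.Add returns false if the item was already present,
// so a single call both checks and inserts.
bool HasDuplicate(string[] items)
{
    var seen = new HashSet<string>();
    foreach (string item in items)
    {
        if (!seen.Add(item))
            return true; // second occurrence found, stop early
    }
    return false;
}

Console.WriteLine(HasDuplicate(letters));                 // True
Console.WriteLine(HasDuplicate(new[] { "x", "y", "z" })); // False
```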

Related

Convert ordered comma separated list into tuples with ordered element number (a la SQL SPLIT_STRING) using C# 6.0/.Net Framework 4.8

I can't find a ready answer to this, or even whether the question has been asked before. I want functionality similar to the SQL STRING_SPLIT functions floating around, where each item in a comma-separated list is identified by its ordinal in the string.
Given the string "abc,xyz,def,tuv", I want to get a list of tuples like:
<1, "abc">
<2, "xyz">
<3, "def">
<4, "tuv">
Order is important: I need to preserve it, and I need to be able to take the list, join it further with another list using LINQ, and still keep that order. For example, if a second list is <"tuv", "abc">, I want the final output of the join to be:
<1, "abc">
<4, "tuv">
Basically, I want the comma separated string to determine the ORDER of the end result, where the comma separated string contains ALL possible strings, and it is joined with an unordered list of a subset of strings, and the output is a list of ordered tuples that consists only of the elements in the second list, but in the order determined by the comma separated string at the beginning.
I could likely figure out all of this on my own if I could just get a C# equivalent to all the various SQL STRING_SPLIT functions out there, which do the split but also include the ordinal element number in the output. But I've searched, and I find nothing for C# but splitting a string into individual elements, or splitting them into tuples where both elements of the tuple are in the string itself, not generated integers to preserve order.
The order is the important thing to me here. So if an element number isn't readily possible, a way to inner join two lists and guarantee preserving the order of the first list while returning only those elements in the second list would be welcome. The tricky part for me is this last part: the result of a join needs a specific (not easy to sort by) order. The ordinal number would give me something to sort by, but if I can inner join with some guarantee the output is in the same order as the first input, that'd work too.

That should work on .NET Framework.
using System;
using System.Collections.Generic;
using System.Linq;
string str = "abc,xyz,def,tuv";
string str2 = "abc,tuv";
IEnumerable<PretendFileObject> secondList = str2.Split(',').Select(x => new PretendFileObject() { FileName = x });
var tups = str.Split(',')
    .Select((x, i) => (i + 1, x))
    .Join(secondList,                  // join the second list ON
        item => item.Item2,            // the filename in the tuples
        item2 => item2.FileName,       // the filename property of each object in the second list
        (item, item2) => new { Index = item.Item1, FileName = item.Item2, Obj = item2 })
    .OrderBy(joinedObject => joinedObject.Index)
    .ToList();
foreach (var tup in tups)
{
    Console.WriteLine(tup.Obj.FileName);
}
public class PretendFileObject
{
    public string FileName { get; set; }
    public string Foo { get; set; }
}
Original Response Below
If you want to stick to something SQL-like, here is how to do it with LINQ operators. The Select method has an overload with a built-in index parameter you can use, and IntersectBy performs an easy inner join. Note that IntersectBy was added in .NET 6, so it is not available on .NET Framework 4.8.
using System;
using System.Linq;
string str = "abc,xyz,def,tuv";
string str2 = "abc,tuv";
var secondList = str2.Split(',');
var tups = str.Split(',')
.Select((x, i) => { return (i + 1, x); })
.IntersectBy(secondList, s=>s.Item2) //Filter down to only the strings found in both.
.ToList();
foreach(var tup in tups)
{
Console.WriteLine(tup);
}

This will get you a list of tuples:
var input = "abc,xyz,def,tuv";
string[] items = input.Split(',');
var tuples = new List<(int, string)>();
for (int i = 0; i < items.Length; i++)
{
    tuples.Add((i + 1, items[i]));
}
If you then want to take the list "tuv", "abc" and keep the 1, you probably want a "left join". I am not sure how to do that in a single LINQ query, because you first need to iterate over the original list of tuples and assign the numbers, and only then join. Alternatively, you can join first and assign the numbers afterwards, but technically the order is then not guaranteed. If you assign the numbers first, you can always sort by them at the end.
I am slightly confused by "and be able to take the list and further join it with another list using linq". A join usually means combining data from both lists, but in your case it seems you want a subset, not joined data.
--
"I want to remove any items from the second list that are not in the first list, and then I need to iterate over the second list IN THE ORDER of the first list"
var input2 = "xxx,xyz,yyy,tuv,";
string[] items2 = input2.Split(',');
IEnumerable<(int, string)> finalTupleOutput =
tuples.Join(items2, t => t.Item2, i2 => i2, (t, i2) => (t.Item1, i2)).OrderBy(tpl => tpl.Item1);
This will give you what you want: the matching items from the second list, in the order of the first list.
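For LINQ to Objects, the Enumerable.Join documentation states that Join preserves the order of the outer sequence, so numbering the split string first and joining the unordered list against it keeps the comma-separated order without a separate sort (a database query provider may not make the same guarantee, so keep the OrderBy there). A minimal sketch with anonymous types; the variable names are made up and the ordinal is 1-based as in the question. It is written as a top-level program for brevity, but the LINQ calls it uses also exist on .NET Framework 4.8:

```csharp
using System;
using System.Linq;

string all = "abc,xyz,def,tuv";
string[] subset = { "tuv", "abc" }; // unordered second list

var ordered = all.Split(',')
    // Select's (element, index) overload supplies the 1-based ordinal.
    .Select((name, i) => new { Ordinal = i + 1, Name = name })
    // Join yields results in the order of the outer sequence (the split string).
    .Join(subset, o => o.Name, s => s, (o, s) => o)
    .ToList();

foreach (var item in ordered)
    Console.WriteLine($"<{item.Ordinal}, \"{item.Name}\">");
// <1, "abc">
// <4, "tuv">
```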

With LINQ:
string inputString = "abc,xyz,def,tuv";
var output = inputString.Split(',')
.Select((item, index) => { return (index + 1, item); });
Now you can use output however you need (note that it is an IEnumerable of tuples, not a List).

Not 100% sure what you're after, but here's an attempt:
string[] vals = new[] { "abc", "xyz", "dev", "tuv"};
string[] results = new string[vals.Length];
int index = 0;
for (int i = 0; i < vals.Length; i++)
{
    results[i] = $"<{++index},\"{vals[i]}\">";
}
foreach (var item in results)
{
    Console.WriteLine(item);
}
This produces:
<1,"abc">
<2,"xyz">
<3,"dev">
<4,"tuv">

Given the example:
For example, if a second list is <"tuv", "abc">, I want the final output of the join to be: <1, "abc"> <4, "tuv">
I think this might be close?
List<string> temp = new List<string>() { "abc", "def", "xyz", "tuv" };
List<string> temp2 = new List<string>() { "dbc", "ace", "zyw", "tke", "abc", "xyz" };
var intersect = temp.Intersect(temp2).Select((list, idx) => (idx+1, list));
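This produces an intersect result containing the elements of list 1 that are also in list 2, in the order of list 1. Note that the numbers are assigned after the intersection, so they are positions in the result, not in the original list (xyz is third in temp but gets 2 here). In this case the result would be: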
<1, "abc">
<2, "xyz">
If you want all the elements from both lists, switch the Intersect to Union.

Get Elements from String List in order of Occurrence in provided string

Hi, I have a List of strings as below.
List<string> MyList = new List<string> { "[FirstName]", "[LastName]", "[VoicePhoneNumber]", "[SMSPhoneNumber]" };
I need to get all the elements of the list that occur in a string, in the order in which they occur. For example, my string is
string MessageContent = "Hello [LastName] [FirstName]There, this message is for [SMSPhoneNumber]";
Right now I am doing
var Exists = MyList.Where(MessageContent.Contains);
This new list has all the items from MyList that occur in the MessageContent string, but not in the order in which they occur.
How can I get them in order of occurrence in the string?
Desired List as per example is = { "[LastName]","[FirstName]","[SMSPhoneNumber]" }

I would suggest using IndexOf to determine both existence and position (and thereby order). This avoids searching MessageContent twice, at the cost of sorting the answer:
var ans = MyList.Select(w => new { w, pos = MessageContent.IndexOf(w) })
.Where(wp => wp.pos >= 0)
.OrderBy(wp => wp.pos)
.Select(wp => wp.w)
.ToList();
However, if a field may appear more than once, or if you think that scanning MessageContent only once and skipping the sort is faster than one IndexOf per MyList member (probably not), then you can invert the search and test each position of MessageContent (using Span to avoid allocating lots of substrings):
var ans2 = Enumerable.Range(0, MessageContent.Length - MyList.Select(w => w.Length).Min() + 1) // +1 so the last possible start position is tested
.Select(p => MyList.FirstOrDefault(w => MessageContent.AsSpan().Slice(p).StartsWith(w)))
.Where(w => w != null)
.ToList();
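If a field can appear more than once but you would rather stay with IndexOf, one alternative (not from the answers above) is to loop IndexOf with a start index to collect every hit, then sort by position. A sketch using ordinal comparison; a second [LastName] has been appended to the sample message to show a repeat:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var MyList = new List<string> { "[FirstName]", "[LastName]", "[VoicePhoneNumber]", "[SMSPhoneNumber]" };
string MessageContent = "Hello [LastName] [FirstName]There, this message is for [SMSPhoneNumber] [LastName]";

// Collect every (position, token) hit, including repeats.
var hits = new List<(int Pos, string Token)>();
foreach (string token in MyList)
{
    int pos = MessageContent.IndexOf(token, StringComparison.Ordinal);
    while (pos >= 0)
    {
        hits.Add((pos, token));
        // Continue searching after the end of this match.
        pos = MessageContent.IndexOf(token, pos + token.Length, StringComparison.Ordinal);
    }
}

// Sorting by position recovers the order of occurrence in the message.
var inOrder = hits.OrderBy(h => h.Pos).Select(h => h.Token).ToList();

Console.WriteLine(string.Join(", ", inOrder));
// [LastName], [FirstName], [SMSPhoneNumber], [LastName]
```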

I did it using:
var Exists = MyList.Where(MessageContent.Contains).OrderBy(s => MessageContent.IndexOf(s));

Get the matching index of a value in a list

So I've got the following code:
string matchingName = "Bob";
List<string> names = GetAllNames();
if (names.Contains(matchingName))
// Get the index/position in the list of names where Bob exists
Is it possible to do this with a couple of lines of code, rather than iterating through the list to get the index or position?

If you have multiple matching instances and want to get all the indices you can use this:
var result = Enumerable.Range(0, names.Count).Where(i => names[i] == matchingName);
If it is just one index you want, then this will work:
int result = names.IndexOf(matchingName);
If there is no matching instance in names, the former solution will yield an empty enumeration, while the latter will give -1.

var index = names.IndexOf(matchingName);
if (index != -1)
{
// do something with index
}

If you want to look for a single match, then IndexOf will suit your purposes.
If you want to look for multiple matches, consider:
var names = new List<string> {"Bob", "Sally", "Hello", "Bob"};
var bobIndexes = names.Select((value, index) => new {value, index})
.Where(z => z.value == "Bob")
.Select(z => z.index);
Console.WriteLine(string.Join(",", bobIndexes)); // this outputs 0,3
The use of (value, index) within Select gives you access to both the element and its index.
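List<T>.FindIndex is a related option when the match is not plain equality, since it takes a predicate and, like IndexOf, returns -1 when nothing matches. A short sketch with a made-up list, matching "Bob" case-insensitively:

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "Alice", "bob", "Sally", "Bob" };

// FindIndex takes a predicate, so the match rule can be anything; here it ignores case.
int first = names.FindIndex(n => string.Equals(n, "Bob", StringComparison.OrdinalIgnoreCase));
int exact = names.IndexOf("Bob");            // IndexOf uses default (case-sensitive) equality
int missing = names.FindIndex(n => n == "Zoe");

Console.WriteLine($"{first} {exact} {missing}"); // 1 3 -1
```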

Sort a C# list by word

I want to sort a C# list by word. Assume I have a C# list (of objects) which contains the following words:
[{id:1, name: "ABC"},
{id:2, name: "XXX"},
{id:3, name: "Mille"},
{id:4, name: "YYY"},
{id:5, name: "Mill"},
{id:6, name: "Millen"},
{id:7, name: "OOO"},
{id:8, name: "GGGG"},
{id:9, name: null},
{id:10, name: "XXX"},
{id:11, name: "mil"}]
If the user passes "Mil" as a search key, I want to return all the words starting with the search key first, then all the words that do not match, sorted alphabetically.
The easiest way I can think of is to run a for loop over the result set, put all the words starting with the search key into one list and the remaining words into another list. Sort the second list and then combine both lists to return the result.
I wonder if there is a smarter or inbuilt way to get the desired result.

Sure! You will sort by the presence of a match, then by the name, like this:
var results = objects.OrderByDescending(o => o.Name.StartsWith(searchKey))
.ThenBy(o => o.Name);
Note that false comes before true in a sort, so you'll need to use OrderByDescending.
As AlexD points out, the name can be null. You'll have to decide how you want to treat this. The easiest way would be to use o.Name?.StartsWith(searchKey) ?? false, but you'll have to decide based on your needs. Also, not all Linq scenarios support null propagation (Linq To Entities comes to mind).
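Putting the null handling and a case-insensitive match together, here is a sketch against the sample data from the question. It uses (Id, Name) tuples in place of the asker's object type, which is an assumption, and StringComparison.OrdinalIgnoreCase so that "mil" matches "Mil":

```csharp
using System;
using System.Linq;

// Sample data from the question as (Id, Name) tuples; id 9 has a null name.
var objects = new (int Id, string Name)[]
{
    (1, "ABC"), (2, "XXX"), (3, "Mille"), (4, "YYY"), (5, "Mill"), (6, "Millen"),
    (7, "OOO"), (8, "GGGG"), (9, null), (10, "XXX"), (11, "mil")
};
string searchKey = "Mil";

var results = objects
    // Null names count as "no match" instead of throwing.
    .OrderByDescending(o => o.Name?.StartsWith(searchKey, StringComparison.OrdinalIgnoreCase) ?? false)
    // OrdinalIgnoreCase also orders null before any string.
    .ThenBy(o => o.Name, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(string.Join(", ", results.Select(o => o.Name ?? "null")));
```

With this data the matches come first (mil, Mill, Mille, Millen), followed by the null name and then the remaining names alphabetically. LINQ's OrderBy/ThenBy sort is stable, so the two XXX entries keep their original relative order (id 2, then id 10).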

This should do it, but there's probably a faster way, maybe using GroupBy somehow.
var sorted = collection
.Where(x => x.Name.StartsWith(criteria))
.OrderBy(x => x.Name)
.Concat(collection
.Where(x => !x.Name.StartsWith(criteria))
.OrderBy(x => x.Name));

You can try GroupBy like this:
var sorted = collection
.GroupBy(item => item.Name.StartsWith(criteria))
.OrderByDescending(chunk => chunk.Key)
.SelectMany(chunk => chunk
.OrderBy(item => item.Name));
Separate items into two groups (meets and doesn't meet the criteria)
Order the groups as whole (1st that meets)
Order items within each group
Finally combine the items

There's nothing C#-specific to solve this, but it sounds like you're really looking for algorithm design guidance.
You should sort the list first. If this is a static list you should just keep it sorted all the time. If the list is large, you may consider using a different data structure (Binary Search Tree, Skip List, etc.) which is more optimized for this scenario.
Once it's sorted, finding matching elements becomes a simple binary search. Move the matching elements to the beginning of the result set, then return.
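One way to sketch the binary-search idea uses List<T>.BinarySearch, assuming case-sensitive ordinal comparison and no null names (both simplifications of the question's data). Under ordinal ordering, all strings that share a prefix form one contiguous run, so only the start of that run has to be searched for:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new List<string> { "ABC", "XXX", "Mille", "YYY", "Mill", "Millen", "OOO", "GGGG", "XXX" };
string key = "Mil";

// Keep the list sorted with the same comparer that the search uses.
words.Sort(StringComparer.Ordinal);

// BinarySearch returns the index of a match, or the bitwise complement
// of the index of the next larger element when there is no exact match.
int start = words.BinarySearch(key, StringComparer.Ordinal);
if (start < 0)
{
    start = ~start;
}
else
{
    // With duplicates, BinarySearch may land on any of them; step back to the first.
    while (start > 0 && words[start - 1] == key) start--;
}

// Prefix matches form one contiguous run beginning at start.
int end = start;
while (end < words.Count && words[end].StartsWith(key, StringComparison.Ordinal))
    end++;

// Matching run first, then the rest; both parts are already sorted.
var result = words.GetRange(start, end - start)
    .Concat(words.Take(start))
    .Concat(words.Skip(end))
    .ToList();

Console.WriteLine(string.Join(", ", result));
```

Here the result is Mill, Mille, Millen followed by ABC, GGGG, OOO, XXX, XXX, YYY. The trade-off is that the list must stay sorted, which pays off mainly when it is searched many times.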

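Add an indicator of a match into the select, and then sort on that. This is written as a LINQPad script (Dump() is LINQPad's output method), and it filters out the null name: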
void Main()
{
word[] Words = new word[11]
{new word {id=1, name= "ABC"},
new word {id=2, name= "XXX"},
new word {id=3, name= "Mille"},
new word {id=4, name= "YYY"},
new word {id=5, name= "Mill"},
new word {id=6, name= "Millen"},
new word {id=7, name= "OOO"},
new word {id=8, name= "GGGG"},
new word {id=9, name= null},
new word {id=10, name= "XXX"},
new word {id=11, name= "mil"}};
var target = "mil";
var comparison = StringComparison.InvariantCultureIgnoreCase;
var q = (from w in Words
where w.name != null
select new {
Match = w.name.StartsWith(target, comparison)?1:2,
name = w.name})
.OrderBy(w=>w.Match).ThenBy(w=>w.name);
q.Dump();
}
public struct word
{
public int id;
public string name;
}

It is probably not easier, but you could create a class that implements the IComparable interface and has a property Mil that is used by CompareTo.
Then you could just call List.Sort(). Alternatively, you can pass an IComparer to List.Sort.
It would probably be the most efficient option, and you can sort in place rather than producing a new List.
On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation.
// Assumes a class Word with string properties Name and Mil (the search key).
public int CompareTo(object obj)
{
    if (obj == null) return 1;
    Word other = obj as Word;
    if (other == null)
        throw new ArgumentException("Object is not a Word");
    if (string.IsNullOrEmpty(Mil))
        return string.Compare(this.Name, other.Name);
    bool thisMatches = this.Name != null && this.Name.StartsWith(Mil);
    bool otherMatches = other.Name != null && other.Name.StartsWith(Mil);
    // Both match or both don't: plain alphabetical order.
    if (thisMatches == otherMatches)
        return string.Compare(this.Name, other.Name);
    // Otherwise the matching name sorts first.
    return thisMatches ? -1 : 1;
}
Here a null Name never matches and string.Compare sorts it first; you will need to adjust that if you want null names handled differently.
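The IComparer route mentioned above can also be written with the List<T>.Sort(Comparison<T>) overload, which sorts in place without a separate comparer class. A sketch over plain strings from the question; the Matches helper is made up, null is treated as non-matching, and the comparison ignores case:

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "ABC", "XXX", "Mille", "YYY", "Mill", "Millen", "OOO", "GGGG", null, "XXX", "mil" };
string key = "Mil";

// Hypothetical helper: a name matches if it starts with the key, ignoring case.
bool Matches(string s) => s != null && s.StartsWith(key, StringComparison.OrdinalIgnoreCase);

// Sort(Comparison<T>) reorders the existing list in place.
names.Sort((a, b) =>
{
    bool ma = Matches(a), mb = Matches(b);
    if (ma != mb)
        return ma ? -1 : 1;                                // matches before non-matches
    return StringComparer.OrdinalIgnoreCase.Compare(a, b); // alphabetical; null sorts first
});

Console.WriteLine(string.Join(", ", names.ConvertAll(n => n ?? "null")));
```

List<T>.Sort is not stable, which does not matter here because equal entries (the two XXX strings) are indistinguishable.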

First create a list of the words that match, sorted.
Then add to that list all of the words that weren't added to the first list, also sorted.
public IEnumerable<Word> GetSortedByMatches(string keyword, Word[] words)
{
var result = new List<Word>(words.Where(word => word.Name.StartsWith(keyword))
.OrderBy(word => word.Name));
result.AddRange(words.Except(result).OrderBy(word => word.Name));
return result;
}
Some of the comments suggest that it should be case-insensitive. That would be
public IEnumerable<Word> GetSortedByMatches(string keyword, Word[] words)
{
var result = new List<Word>(
words.Where(word => word.Name.StartsWith(keyword, StringComparison.OrdinalIgnoreCase)) //<-- ignore case
.OrderBy(word => word.Name));
result.AddRange(words.Except(result).OrderBy(word => word.Name));
return result;
}

Parsing delimited data for specific instance of repeated line

I have an array of strings in the following format, where each string begins with a series of three characters indicating what type of data it contains. For example:
ABC|.....
DEF|...
RHG|1........
RHG|2........
RHG|3........
XDF|......
I want to find any repeating lines (RHG in this example) and mark the last line with a special character:
>RHG|3.........
What's the best way to do this? My current solution has a method to count the line headers and create a dictionary with the header counts.
protected Dictionary<string, int> CountHeaders(string[] lines)
{
    Dictionary<string, int> headerCounts = new Dictionary<string, int>();
    for (int i = 0; i < lines.Length; i++)
    {
        string s = lines[i].Substring(0, 3);
        int value;
        if (headerCounts.TryGetValue(s, out value))
            headerCounts[s]++;
        else
            headerCounts.Add(s, 1);
    }
    return headerCounts;
}
In the main parsing method, I select the lines that are repeated.
var repeats = CountHeaders(lines).Where(x => x.Value > 1).Select(x => x.Key);
foreach (string s in repeats)
{
// Get last instance of line in lines and mark it
}
This is as far as I've gotten. I think I can do what I want with another LINQ query but I'm not too sure. Also, I can't help but feel that there's a more optimal solution.

You can use LINQ to achieve that.
Input string:
var input = @"ABC|.....
DEF|...
RHG|1........
RHG|2........
RHG|3........
XDF|......";
LINQ query:
var results = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .GroupBy(x => x.Substring(0, 3))
    .Select(g => g.ToList())
    .SelectMany(g => g.Count > 1
        ? g.Take(g.Count - 1).Concat(new[] { string.Format(">{0}", g[g.Count - 1]) })
        : g)
    .ToArray();
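I used the Select(g => g.ToList()) projection so that g.Count is an O(1) operation in the later query steps. Note that GroupBy returns the groups in order of first appearance, so the original line order is preserved here only because the RHG lines are adjacent. Also, the line breaks inside the verbatim string are those of the source file, so they must match Environment.NewLine for the split to work.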
You can Join result array into one string using String.Join method:
var output = String.Join(Environment.NewLine, results);
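If the original line order must be kept even when repeated headers are not adjacent, a different approach is to count the headers (as CountHeaders in the question does) and then walk the array backwards, marking the first occurrence of each repeated header seen from the end. A sketch with the question's sample lines:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] lines = { "ABC|.....", "DEF|...", "RHG|1........", "RHG|2........", "RHG|3........", "XDF|......" };

// Count each 3-character header (same idea as CountHeaders in the question).
var counts = lines.GroupBy(l => l.Substring(0, 3))
                  .ToDictionary(g => g.Key, g => g.Count());

// Walk backwards: the first time a repeated header is seen, that is its last line.
var marked = new HashSet<string>();
for (int i = lines.Length - 1; i >= 0; i--)
{
    string header = lines[i].Substring(0, 3);
    if (counts[header] > 1 && marked.Add(header))
        lines[i] = ">" + lines[i];
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
```

This marks lines in place, so no regrouping or reordering happens.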

Alternatively, you could find repeating lines with a backreferencing regex. I wrote this hacky regex using your sample data; it matches consecutive lines that start with the same 'tag' followed by pipe-separated values (use RegexOptions.Multiline so that ^ matches at the start of each line).
^(?<Tag>.+)[|].+[\n\r](\k<Tag>[|].+[\n\r])+
The match range starts at the beginning of the first RHG line and selects up to the last RHG line.

Here's an example that includes the parsing and the counting in one Linq statement - feel free to break it up if you want to:
string[] data = new string[]
{
"ABC|.....",
"DEF|...",
"RHG|1........",
"RHG|2........",
"RHG|3........",
"XDF|......"
};
var duplicates = data.Select(d => d.Split('|'))     // split the strings
    .Select(d => new { Key = d[0], Value = d[1] })  // select the key and value
    .GroupBy(d => d.Key)                            // group by the key
    .Where(g => g.Count() > 1)                      // find duplicates
    .Select(g => g.Skip(1))                         // select the repeating elements
    .SelectMany(g => g);                            // flatten into a single list
This will give you a sequence of the key/value pairs that are repeats of an earlier key. So with the sample data it will return:
Key Value
RHG 2........
RHG 3........
I'm not sure what you mean by "marking" the line, however...
