Identifying and grouping similar items in a collection of strings

I have a collection of strings like the following:
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
Each string is made up of two components separated by a full stop: a prefix code and a subcode. Some of the strings don't have subcodes.
I want to be able to combine the strings whose prefixes are the same and output them as follows, along with the other codes:
44(01,02,03,04,05,06,07,08),46,47.10
I'm stuck at the first hurdle, which is how to identify and group together the codes whose prefix values are the same, so that I can combine them into a single string as shown above.

You can do:
var query = codes.Select(c =>
new
{
SplitArray = c.Split('.'), //to avoid multiple split
Value = c
})
.Select(c => new
{
Prefix = c.SplitArray.First(), //you can avoid multiple split if you split first and use it later
PostFix = c.SplitArray.Last(),
Value = c.Value,
})
.GroupBy(r => r.Prefix)
.Select(grp => new
{
Key = grp.Key,
Items = grp.Count() > 1 ? String.Join(",", grp.Select(t => t.PostFix)) : "",
Value = grp.First().Value,
});
This is how it works:
Split each item in the list on the delimiter and populate an anonymous type with Prefix, PostFix and the original value.
Then group on Prefix.
After that, select the values and join the postfix values using string.Join.
For output:
foreach (var item in query)
{
if(String.IsNullOrWhiteSpace(item.Items))
Console.WriteLine(item.Value);
else
Console.WriteLine("{0}({1})", item.Key, item.Items);
}
Output would be:
44(01,02,03,04,05,06,07,08)
46
47.10

Try this:-
var result = codes.Select(x => new { SplitArr = x.Split('.'), OriginalValue = x })
.GroupBy(x => x.SplitArr[0])
.Select(x => new
{
Prefix= x.Key,
subCode = x.Count() > 1 ?
String.Join(",", x.Select(z => z.SplitArr[1])) : "",
OriginalValue = x.First().OriginalValue
});
You can print your desired output like this:-
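foreach (var item in result)
{
    // A group with a single member keeps its original form, e.g. "46" or "47.10".
    if (item.subCode == "")
        Console.Write("{0},", item.OriginalValue);
    else
        Console.Write("{0}({1}),", item.Prefix, item.subCode);
}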

Outlined idea:
Use a Dictionary<string, List<string>> for collecting your result.
In a loop over your list, use string.Split(). The first element will be your Dictionary key; create a new List<string> there if the key doesn't exist yet.
If the result of Split has a second element, append that to the List.
Use a second loop to format that Dictionary to your output string.
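The outline above could be sketched roughly as follows. This is only one possible reading of it: the class and method names (CodeGrouper, Combine) are made up for illustration, and the extra prefixOrder list is an assumption added because Dictionary does not guarantee enumeration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeGrouper
{
    // Hypothetical helper: groups codes by prefix, then formats each group.
    public static string Combine(IEnumerable<string> codes)
    {
        var subcodesByPrefix = new Dictionary<string, List<string>>();
        // Assumption: remember prefixes in first-seen order for stable output.
        var prefixOrder = new List<string>();

        // First loop: split each code and collect subcodes under its prefix.
        foreach (string code in codes)
        {
            string[] parts = code.Split('.');
            if (!subcodesByPrefix.TryGetValue(parts[0], out List<string> subcodes))
            {
                subcodes = new List<string>();
                subcodesByPrefix[parts[0]] = subcodes;
                prefixOrder.Add(parts[0]);
            }
            if (parts.Length > 1)
                subcodes.Add(parts[1]);
        }

        // Second loop: format each prefix according to how many subcodes it has.
        var formatted = new List<string>();
        foreach (string prefix in prefixOrder)
        {
            List<string> subcodes = subcodesByPrefix[prefix];
            if (subcodes.Count == 0)
                formatted.Add(prefix);
            else if (subcodes.Count == 1)
                formatted.Add(prefix + "." + subcodes[0]);
            else
                formatted.Add(prefix + "(" + string.Join(",", subcodes) + ")");
        }
        return string.Join(",", formatted);
    }
}
```

Calling CodeGrouper.Combine on the list from the question should give 44(01,02,03,04,05,06,07,08),46,47.10. A prefix that appears both with and without a subcode (44 and 44.01) is simply treated as having one subcode here.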
Of course, LINQ is possible too, e.g.
List<string> codes = new List<string>() {
"44.01", "44.05", "47", "42.02", "44.03" };
var result = string.Join(",",
codes.OrderBy(x => x)
.Select(x => x.Split('.'))
.GroupBy(x => x[0])
.Select((x) =>
{
if (x.Count() == 0) return x.Key;
else if (x.Count() == 1) return string.Join(".", x.First());
else return x.Key + "(" + string.Join(",", x.Select(e => e[1]).ToArray()) + ")";
}).ToArray());
Gotta love linq ... haha ... I think this is a monster.

You can do it all in one clever LINQ:
var grouped = codes.Select(x => x.Split('.'))
.Select(x => new
{
Prefix = int.Parse(x[0]),
Subcode = x.Length > 1 ? int.Parse(x[1]) : (int?)null
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode.HasValue).Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 1 ? string.Format(".{0}", x.Subcodes.First()) :
x.Subcodes.Count() > 1 ? string.Format("({0})", string.Join(",", x.Subcodes))
: string.Empty)
).ToArray();
First, it splits each item into code and subcode.
Then it groups by code and collects all the subcodes.
Finally, it selects each group in the appropriate format.
Looking at the problem, I think you should stop just before the last Select and let the data presentation be done in another part/method of your application.
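As a rough illustration of that separation, the sketch below splits the work into a grouping step and a formatting step. The names CodeGrouping, GroupByPrefix and Format are hypothetical. Unlike the query above, it keeps prefixes and subcodes as strings rather than parsing them with int.Parse; that is an assumption made here so that leading zeros such as "01" survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeGrouping
{
    // Data step: one group per prefix, holding its subcodes (null when a code has none).
    public static List<IGrouping<string, string>> GroupByPrefix(IEnumerable<string> codes)
    {
        return codes
            .Select(c => c.Split('.'))
            .GroupBy(p => p[0], p => p.Length > 1 ? p[1] : null)
            .ToList();
    }

    // Presentation step: turns the groups into the comma-separated output string.
    public static string Format(IEnumerable<IGrouping<string, string>> groups)
    {
        return string.Join(",", groups.Select(g =>
        {
            List<string> subs = g.Where(s => s != null).ToList();
            if (subs.Count == 0) return g.Key;                    // e.g. "46"
            if (subs.Count == 1) return g.Key + "." + subs[0];    // e.g. "47.10"
            return g.Key + "(" + string.Join(",", subs) + ")";    // e.g. "44(01,02)"
        }));
    }
}
```

The caller can then use GroupByPrefix on its own wherever the grouped data is needed and call Format only where the string is displayed.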

The old fashioned way:
List<string> codes = new List<string>() {"44.01", "44.05", "47", "42.02", "44.03" };
string output = "";
for (int i = 0; i < codes.Count; i++)
{
    // Appending ".." guarantees that Split returns at least two elements.
    string[] items = (codes[i] + "..").Split('.');
    int pos1 = output.IndexOf("," + items[0] + "(");
    if (pos1 < 0) output += "," + items[0] + "(" + items[1] + ")"; // first occurrence of code: add it
    else
    {   // code already inserted: find the insert point (the closing parenthesis of its group)
        int pos2 = output.IndexOf(')', pos1);
        output = output.Substring(0, pos2) + "," + items[1] + output.Substring(pos2);
    }
}
if (output.Length > 0) output = output.Substring(1).Replace("()", "");

This will work, including the correct formats for no subcodes, a single subcode, multiple subcodes. It also doesn't assume the prefix or subcodes are numeric, so it leaves leading zeros as is. Your question didn't show what to do in the case you have a prefix without subcode AND the same prefix with subcode, so it may not work in that edge case (44,44.01). I have it so that it ignores the prefix without subcode in that edge case.
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
var result=codes.Select(x => (x+".").Split('.'))
.Select(x => new
{
Prefix = x[0],
Subcode = x[1]
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode!="").Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 0 ? string.Empty :
string.Format(x.Subcodes.Count()>1?"({0})":".{0}",
string.Join(",", x.Subcodes)))
).ToArray();

General idea, but I'm sure replacing the Substring calls with Regex would be a lot better as well:
List<string> newCodes = new List<string>();
foreach (string sub1 in codes.Select(item => item.Substring(0, 2)).Distinct())
{
    StringBuilder code = new StringBuilder();
    code.Append(sub1 + "(");
    // Substring(2) still starts with the '.', so trim it before joining the subcodes.
    code.Append(string.Join(",", codes.Where(item => item.Substring(0, 2) == sub1)
                                      .Select(item => item.Substring(2).TrimStart('.'))));
    code.Append(")");
    newCodes.Add(code.ToString());
}

You could go a couple of ways. You could make a Dictionary<string, List<string>> so that "44" maps to a list of {".01", ".02", ".03", etc.}. This would require processing the codes before adding them to the list (i.e. separating out the two parts of the code and handling the case where there is only one part).
Or you could put them into a SortedSet and provide your own IComparer<string> which knows that these are codes and how to sort them (at least that would be more reliable than grouping them alphabetically). Iterating over this SortedSet would still require special logic, though, so the Dictionary-of-lists option above is probably still preferable.
In either case you would still need to handle the special case "46", where there is no second element in the code. In the dictionary example, would you insert a String.Empty into the list? I'm not sure what you'd output if you got a list {"46", "46.1"}: would you display it as "46(null,1)", "46(0,1)", "46(,1)" or "46(1)"?

Related

Remove duplicate by matching string part of text

Check the code below. Here I am creating a method that should simply remove the duplicates from the list foo. The list values are a product id and a quantity separated by ":", so the number before the ":" is the product id and the number after it is the product quantity. I pass this list to the RemoveDuplicateItems() method for processing. This method should remove all items with a matching product id from the list, but my current method just returns exactly the same list that I pass in. How can I fix my method so that it removes the items whose first part (the number before the ":") matches another item?
The final output in the doo variable should have the first item, 22:10, removed from the list, since its product id matches that of the second one.
C#:
[HttpPost]
public JsonResult DoSomething()
{
var foo = new List<string>();
foo.Add("22:10");//this should removed by RemoveDuplicateItems() since it has `22` matching with second one
foo.Add("22:15");
foo.Add("25:30");
foo.Add("26:30");
var doo = RemoveDuplicateItems(foo);
return Json("done");
}
public List<string> RemoveDuplicateItems(List<string> AllItems)
{
var FinalList = new List<string>();
var onlyProductIds = new List<string>();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
onlyProductIds.Add(result.Value);
}
var unique_onlyProductIds = onlyProductIds.Distinct().ToList();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
var id = unique_onlyProductIds.Where(x => x.Contains(result.Value)).FirstOrDefault();
if (id != null)
{
FinalList.Add(item);
}
}
return FinalList;
}

Does this work for you?
List<string> doo =
foo
.Select(x => x.Split(':'))
.GroupBy(x => x[0], x => x[1])
.Select(x => $"{x.Key}:{x.Last()}")
.ToList();

There are multiple ways to achieve this. One, as suggested by @Aluan Haddad, is to use LINQ. His comment uses the query syntax, but you could use the method syntax too (I assume you use C# 8):
List<string> doo = foo.GroupBy(str => str[0..2])
.Select(entry => entry.Last())
.ToList();
Note that this works because Enumerable.GroupBy yields the elements of each group in the order they appear in the source, so Last() is the latest entry for each product id.

You can do it using LINQ:
var doo = foo.Select(x =>
{
var split = x.Split(':');
return new { Key = split[0], Value = split[1] };
})
.GroupBy(x => x.Key)
.OrderBy(x => x.Key)
.Select(x =>
{
var max = x.LastOrDefault();
return $"{max.Key}:{max.Value}";
}
).ToList();

Order by multiple properties

I have a list which I want to order like this:
First by "ab"
Then by alphabetical order inside the list by "ab"
Then by "cd"
Then by alphabetical order inside the list by "cd"
Then by "ef"
Then by alphabetical order inside the list by "ef"
and then the rest by alphabetical order
I have this LINQ query:
var groups = profileModel.Groups.
OrderByDescending(i => i.FullName.ToLower().Contains("ab")).
ThenByDescending(i => i.FullName.ToLower().Contains("cd")).
ThenByDescending(i => i.FullName.ToLower().Contains("ef"));
How should I extend this one? Do I have to use group by?

It seems that you want this; there is no need to use GroupBy:
var groupOrders = new List<string> { "ab", "cd", "ef" };
var resultList = profileModel.Groups
.Select(x => new { ModelGroup = x, First2Letter = x.FullName.Substring(0, Math.Min(2, x.FullName.Length)) })
.Select(x => new
{
x.ModelGroup,
x.First2Letter,
Index = groupOrders.FindIndex(s => s.Equals(x.First2Letter, StringComparison.OrdinalIgnoreCase))
})
.OrderByDescending(x => x.Index >= 0) // first all known groups
.ThenBy(x => x.Index)
.ThenBy(x => x.ModelGroup.FullName)
.Select(x => x.ModelGroup)
.ToList();

For custom ordering, you can assign a value to each compare condition and OrderByDescending will order by that. Something like this:
lstModel = profileModel.Groups.OrderByDescending(m => m.FullName.ToLower().Contains("ab") ? 3 :
m.FullName.ToLower().Contains("cd") ? 2 :
m.FullName.ToLower().Contains("ef") ? 1 : 0).ToList();

If I got the problem correctly, this will order items based on what they contain. It should work in EF as well.
var orderItems = from item in profileModel.Groups
let name = item.FullName.ToLower()
orderby (name.Contains("ab") ? 1 : 0) + (name.Contains("cd") ? 0.1 : 0) + (name.Contains("ef") ? 0.01 : 0) descending
select item;
EDIT
After rereading the problem this might be the right solution
var orderItems = from item in profileModel.Groups
let name = item.FullName.ToLower()
let order = name.Contains("ab") ? 3 : name.Contains("cd") ? 2 : name.Contains("ef") ? 1 : 0
orderby order descending, item.FullName
select item;

If you might have more or different level1 values to test, you may want a generic version.
Use a convenient extension method FirstOrDefault that takes the default value to return:
public static T FirstOrDefault<T>(this IEnumerable<T> src, Func<T, bool> test, T defval) => src.Where(aT => test(aT)).DefaultIfEmpty(defval).First();
You can create an array of first level values in order, and sort first on that, then alphabetically:
var level1 = new[] { "ab", "cd", "ef" };
var ans = groups.OrderBy(i => level1.Select((op, n) => (op, n))
.FirstOrDefault(opn => i.FullName.Contains(opn.op),
(op: String.Empty, n: level1.Length)).n)
.ThenBy(i => i.FullName);

Check if String Contains Match in Enumerable.Range Filter List

I want to check if a string contains a word or number from a list and remove it from the string.
I want to use Enumerable.Range() to create the filter list and use it to filter many different strings.
I'm trying to combine two previous answers:
https://stackoverflow.com/a/49733139/6806643
https://stackoverflow.com/a/49740832/6806643
The sentence I want to filter:
This is a A05B09 hello 02 100 test
Filter
A00B00-A100B100, 01-100, 000-100, hello
Should read:
This is a test
Old Way
For Loop - Works
http://rextester.com/BJL70824
New Way
Enumerable Range List - Does not work
http://rextester.com/ZSCM64375
C#
List<List<string>> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}"))
.Select(i => Enumerable.Range(0, 10).Select(c => string.Empty).ToList())
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:000}"))
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:00}"))
.SelectMany(a => Enumerable.Range(0, 1).Select(b => "hello"))
.ToList();
List<string> matches = new List<string>();
// Sentence
string sentence = "This is a A05B09 hello 02 100 test";
string newSentence = string.Empty;
// Find Matches
for (int i = 0; i < filters.Count; i++)
{
// Add to Matches List
if (sentence.Contains(filters[i].ToString()))
{
matches.Add(filters[i]);
}
}
// Filter Sentence
newSentence = Regex.Replace(
sentence
, @"(?<!\S)(" + string.Join("|", matches) + @")(?!\S)"
, ""
, RegexOptions.IgnoreCase
);
// Display New Sentence
Console.WriteLine(newSentence);

I think creating a list of all possible combinations is a very bad approach. You are creating huge lists which will make your process use a lot of RAM and be very slow without any good reason. Why not just create a good Regex? For example, with this expression, you get your desired string:
\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*
That is assuming you don't want to replace stuff like A101B101 or 123.
If you want to replace those as well, the regex is a bit simpler:
\b(A\d\d\d?B\d\d\d?|\d\d\d?|hello)\b\s*
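A minimal sketch of applying the first of these patterns with Regex.Replace, using the sentence from the question. The class and method names (SentenceFilter, Clean) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceFilter
{
    // The first pattern from the answer: A00B00 to A100B100, 00 to 100, 000 to 100, and "hello".
    private const string Pattern = @"\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*";

    // Removes every match together with the whitespace that follows it.
    public static string Clean(string sentence)
    {
        return Regex.Replace(sentence, Pattern, string.Empty);
    }
}
```

SentenceFilter.Clean("This is a A05B09 hello 02 100 test") returns "This is a test". Because the trailing \s* is part of each match, no double spaces are left behind.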

This line of yours does not seem to meet your requirements: .SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}"))
Can you try this LINQ?
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => $"A{a:00}B{b:00}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:000}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:00}"))
.Union(new List<string> {"hello"})
.ToList();
This version gives the expected result on rextester:
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => string.Format("A{0:00}B{1:00}", a, b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:000}", b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:00}", b)))
.Union(new List<string> { "hello" })
.ToList();

Regex Split String at particular word

I would like to use the .net Regex.Split method to split this input string into an array. It must group the word.
Input: AAA-1111,AAA-666,SMT-QWQE,SMT-TTTR
Expected output:
AAA : 1111,666
SMT : QWQE,TTTR
What pattern do I need to use?

As the comment on the question notes, you cannot do this in a single step (regex or not).
So:
Split on commas.
Split on dash (but keep the pairs)
Group by the first part of each pair.
Something like:
var result = from outer in input.Split(',')
             let p = outer.Split('-') // will be string[2]
             select new { identifier = p[0], value = p[1] }
             into pair
             group pair.value by pair.identifier into g
             select new {
                 identifier = g.Key,
                 values = String.Join(",", g)
             };

This should give you an IEnumerable with a key string and a string listing (separated by commas) the values for each:
var input = "AAA-1111,AAA-666,SMT-QWQE,SMT-TTTR";
var list = input.Split(',')
.Select(pair => pair.Split('-'))
.GroupBy(pair => pair.First())
.Select(grp =>
new{
key = grp.Key,
items = String.Join(",", grp.Select(x => x[1]))
});
You can then use it for example like this (if you just want to output the values):
string output = "";
foreach(var grp in list)
{
output += grp.key + ": " + grp.items + Environment.NewLine;
}

FWIW here's the same solution in fluent syntax which might be easier to understand:
string input = "AAA-1111,AAA-666,SMT-QWQE,SMT-TTTR";
Dictionary<string, string> output = input.Split(',') // first split by ','
.Select(el => el.Split('-')) // then split each inner element by '-'
.GroupBy(el => el.ElementAt(0), el => el.ElementAt(1)) // group by the part that comes before '-'
.ToDictionary(grp => grp.Key, grp => string.Join(",", grp)); // convert to a dictionary with comma separated values
output["AAA"] // 1111,666
output["SMT"] // QWQE,TTTR

Sorting a list of strings by placing words starting with a certain letter at the start

Assuming I have the following list:
IList<string> list = new List<string>();
list.Add("Mouse");
list.Add("Dinner");
list.Add("House");
list.Add("Out");
list.Add("Phone");
list.Add("Hat");
list.Add("Ounce");
Using LINQ, how would I select the words containing "ou" and sort the selection such that the words beginning with "ou" are listed first, followed by the words that contain but do not start with "ou"? The list I'm trying to create would be:
Ounce
Out
House
Mouse
I came up with the following but it is not working:
list.Where(x => x.Contains("ou"))
.OrderBy(x => x.StartsWith("ou"))
.Select(x => x);

You're getting a case-sensitive comparison, and also you need OrderByDescending(). A quick and dirty way to achieve the case-insensitivity is ToLowerInvariant():
var result = list.Where(x => x.ToLowerInvariant().Contains("ou"))
.OrderByDescending(x => x.ToLowerInvariant().StartsWith("ou"))
.Select(x => x);
Live example: http://rextester.com/GUR97180
This previous answer shows the correct way to do a case-insensitive comparison (i.e., don't use my example above; it's bad).

Your first mistake is not comparing strings in a case-insensitive way; "Out" and "Ounce" have capital Os and would not return "true" when you use Contains("ou"). The solution is to use ToLower() when checking letters.
list.Where(x => x.ToLower().Contains("ou"))
.OrderByDescending(x => x.ToLower().StartsWith("ou")) //true is greater than false.
.Select(x => x);

Three problems:
You need to assign the result to something, otherwise it is simply discarded.
You need to use OrderByDescending because true sorts after false if you use OrderBy.
You need to use a case-insensitive compare.
Try this:
var needle = "ou";
var stringComparison = StringComparison.OrdinalIgnoreCase;
var query =
from word in list
let index = word.IndexOf(needle, stringComparison)
where index != -1
orderby index
select word;

This prepends a space to the sort key of words that start with "ou", so that they sort first. The words themselves are not changed.
var result = list.Where(x => x.ToLowerInvariant().Contains("ou"))
.OrderBy(x => x.ToLowerInvariant()
.StartsWith("ou") ? " " + x : x.Trim());

list = list.Where(x => x.ToLower().Contains("ou"))
.OrderBy(x => !x.ToLower().StartsWith("ou")).ToList();
Or by using the methods of List (changing it from IList to List):
list.RemoveAll(x => !x.ToLower().Contains("ou"));
list.Sort((s1, s2) => -1 * s1.ToLower().StartsWith("ou")
.CompareTo(s2.ToLower().StartsWith("ou")));

I think this is what you're looking for:
list = list.Where(x => x.IndexOf("ou", StringComparison.OrdinalIgnoreCase) >= 0)
.OrderByDescending(x => x.StartsWith("ou", StringComparison.OrdinalIgnoreCase))
.ThenBy(x => x)
.ToList();
Note that instead of converting the strings ToLower (or upper), I use a StringComparison enum (currently OrdinalIgnoreCase). This ensures that it works consistently as expected in any culture. Choose the right case-insensitive comparison depending on your circumstance.
If you prefer the LINQ query syntax that's:
list = (from x in list
where x.IndexOf("ou", StringComparison.OrdinalIgnoreCase) >= 0
orderby x.StartsWith("ou", StringComparison.OrdinalIgnoreCase) descending, x
select x).ToList();

var bla = "ou";
var list = new List<string>{
"Mouse",
"Dinner",
"House",
"Out",
"Phone",
"Hat",
"Ounce"};
var groupa = list.GroupBy(x =>x.ToLower().Contains(bla));
// Take the group of words that contain "ou" and keep the sorted result.
var result = groupa.First(g => g.Key).OrderByDescending(x => x.ToLower().StartsWith(bla)).ToList();

You can simply call the list.Sort method by passing in an instance of a custom comparer as follows:
public class MyCustomStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int result = 0;
        if (x.ToLower().StartsWith("ou") && y.ToLower().StartsWith("ou"))
            result = x.CompareTo(y);
        else if (x.ToLower().StartsWith("ou") && !y.ToLower().StartsWith("ou"))
            result = -1;
        else if (!x.ToLower().StartsWith("ou") && y.ToLower().StartsWith("ou"))
            result = 1;
        else
            result = x.CompareTo(y);
        return (result);
    }
}
