Remove duplicate by matching string part of text


Check the code below. I am writing a method that should remove duplicates from the list foo. Each list value is a product id and a quantity separated by a colon, so the part before the : is the product id and the part after it is the product quantity. I pass this list to the RemoveDuplicateItems() method for processing. The method should remove every item whose product id matches another item in the list, but my current method returns exactly the same list it receives as input. How can I fix my method so that it removes items with a matching first part (the part before the :)?
The final output in the doo variable should no longer contain the first item, 22:10, since its product id matches the second one.
C#:
[HttpPost]
public JsonResult DoSomething()
{
var foo = new List<string>();
foo.Add("22:10");//this should be removed by RemoveDuplicateItems() since its `22` matches the second one
foo.Add("22:15");
foo.Add("25:30");
foo.Add("26:30");
var doo = RemoveDuplicateItems(foo);
return Json("done");
}
public List<string> RemoveDuplicateItems(List<string> AllItems)
{
var FinalList = new List<string>();
var onlyProductIds = new List<string>();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
onlyProductIds.Add(result.Value);
}
var unique_onlyProductIds = onlyProductIds.Distinct().ToList();
foreach (var item in AllItems)
{
Match result = Regex.Match(item, @"^.*?(?=:)");
var id = unique_onlyProductIds.Where(x => x.Contains(result.Value)).FirstOrDefault();
if (id != null)
{
FinalList.Add(item);
}
}
return FinalList;
}

Does this work for you?
List<string> doo =
foo
.Select(x => x.Split(':'))
.GroupBy(x => x[0], x => x[1])
.Select(x => $"{x.Key}:{x.Last()}")
.ToList();

There are multiple ways to achieve this. One, as suggested by @Aluan Haddad, is to use LINQ. His comment uses the query syntax, but you could use the method syntax too (I assume you use C# 8, which the range syntax below requires; the range also assumes two-character product ids):
List<string> doo = foo.GroupBy(str => str[0..2])
.Select(entry => entry.Last())
.ToList();
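Note that this works because GroupBy preserves ordering: the elements within each group are yielded in the order they appear in the source, so Last() returns the latest entry for each id.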

You can do it using LINQ:
var doo = foo.Select(x =>
{
var split = x.Split(':');
return new { Key = split[0], Value = split[1] };
})
.GroupBy(x => x.Key)
.OrderBy(x => x.Key)
.Select(x =>
{
var max = x.LastOrDefault();
return $"{max.Key}:{max.Value}";
}
).ToList();
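
For comparison, the same idea can be written inside the original RemoveDuplicateItems signature without LINQ. This is only a minimal sketch: it assumes that the last entry for each product id should win (as in the answers above) and that items look like "id:quantity". The class name ProductListHelper is hypothetical and not part of the question.

```csharp
using System;
using System.Collections.Generic;

public static class ProductListHelper
{
    // Keeps one entry per product id (the text before ':').
    // A later entry with the same id replaces the earlier one in place.
    public static List<string> RemoveDuplicateItems(List<string> allItems)
    {
        var positionById = new Dictionary<string, int>(); // product id -> index in result
        var result = new List<string>();

        foreach (var item in allItems)
        {
            // Assumption: items look like "id:quantity"; only the first ':' is treated as the separator.
            var id = item.Split(new[] { ':' }, 2)[0];

            if (positionById.TryGetValue(id, out var index))
            {
                result[index] = item; // duplicate id: overwrite the earlier entry
            }
            else
            {
                positionById[id] = result.Count;
                result.Add(item);
            }
        }

        return result;
    }
}
```

With the list from the question ("22:10", "22:15", "25:30", "26:30"), this returns "22:15", "25:30", "26:30".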

Related

OrderBy search results that startswith search word then by results that contains search word

I have an object like this:
public class MyObject
{
public List<string> Names { get; set; }
// other props
}
and I have a filtered list like this:
var freeText = "comm";
var list = new List<MyObject>(); // Initialized
var searchResult = list.Where(o =>
o.Names.Any(n => n.ToLower().StartsWith(freeText.ToLower())) ||
o.Names.Any(n => n.ToLower().Contains(freeText.ToLower())));
It works fine, but what I'm trying to do is order the search results so that items with a name that starts with the search word come first, followed by items with a name that only contains it.
Example:
Obj 1 { Names : [ "free communication", "some name" ] },
Obj 2 { Names : [ "communcation Center", "whatever" ] }
I want the result to be [Obj2, Obj1].
I tried to order by the index of freeText, but that doesn't seem to work on an array/list of strings.
I tried to alter the solution from this question so it works on an array instead of a string, but it didn't work.
Any idea how to do this?

You can implement a simple scoring mechanism where you'll capture two flags (startsWith and contains) and use those flags both for filtering and for sorting:
var result = list.Select(item => new
{
item,
startsWith = item.Names.Any(n => n.ToLower().StartsWith(freeText.ToLower())),
contains = item.Names.Any(n => n.ToLower().Contains(freeText.ToLower())),
})
.Where(item => item.startsWith || item.contains)
.OrderByDescending(item => item.startsWith)
.ThenByDescending(item => item.contains)
.Select(x => x.item);

Linq on List<object>

Existing legacy code is as follows:
List<object> myItems;
//myItems gets populated by a method call
foreach (object[] item in myItems)
{
string Id = item[0].ToString();
string Number = item[1].ToString();
//now do some processing if Number satisfies some criteria
}
I would like to convert this to LINQ to select all Ids that match a certain Number.
All suggestions would be appreciated.
Thanks.

Use Select() and Where()
bool IsSatisfyingNumber(String number) {
// True if number satisfies some criteria
}
List<String> matchingIds = myItems
.Cast<object[]>() // each element is really an object[], so cast before indexing
.Where(item => IsSatisfyingNumber(item[1].ToString()))
.Select(item => item[0].ToString())
.ToList();

The list myItems contains items of type object, where each item is actually an object[], so we need to cast to object[] first and then filter and select based on the number we are looking for.
string certainNumber = "1";
var myIds = myItems
.Where(o => ((object[]) o)[1].ToString() == certainNumber)
.Select(o => ((object[]) o)[0].ToString());
The equality operator on strings performs an ordinal (case-sensitive and culture-insensitive) comparison, so change the comparison in the Where clause if you need a different kind of comparison in your case.
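
A minimal sketch of swapping the comparison: string.Equals accepts a StringComparison value, so the filter can be made case-insensitive, for example. The names IdFilter and SelectIds are hypothetical, and OrdinalIgnoreCase is only one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdFilter
{
    // Returns the Ids (element 0) of all rows whose Number (element 1)
    // equals certainNumber, ignoring case.
    public static List<string> SelectIds(List<object> myItems, string certainNumber)
    {
        return myItems
            .Cast<object[]>() // each element is assumed to be an object[]
            .Where(row => string.Equals(row[1].ToString(), certainNumber,
                                        StringComparison.OrdinalIgnoreCase))
            .Select(row => row[0].ToString())
            .ToList();
    }
}
```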

Got it working and wanted to share the information:
var myIds =
(from item in myItems.Cast<object[]>()
select new
{ Id = item[0], Number = (string)item[1] }
)
.Where(x => x.Number == filtercondition)
.Select(x => (string)x.Id)
.ToList();

LINQ: Enumerate through duplicates in List and remove them

I need to remove duplicates, but also log which ones I am removing. I have two solutions right now, one that can go through each duplicate and one that removes duplicates. I know that removing in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.
What I have right now:
var duplicates = ListOfThings
.GroupBy(x => x.ID)
.Where(g => g.Skip(1).Any())
.SelectMany(g => g);
foreach (var duplicate in duplicates)
{
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}
ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();

Well, ToList() will materialize the query, so if you allow side effects (i.e. writing to log) it could be like that:
var cleared = ListOfThings
.GroupBy(x => x.ID)
.Select(chunk => {
// Side effect: writing to log while selecting
if (chunk.Skip(1).Any())
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
// if there're duplicates by Id take the 1st one
return chunk.First();
})
.ToList();
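
If side effects inside Select are undesirable, the groups can be materialized once and then used for both the logging and the de-duplication. This is a sketch under assumptions: the Thing class and the reportDuplicateId callback are hypothetical stand-ins for the real item type and for the Log.Append call in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Thing
{
    public int ID { get; set; }
}

public static class Deduplicator
{
    // Returns the first item per ID and reports every ID that occurred more than once.
    public static List<Thing> RemoveDuplicates(IEnumerable<Thing> things, Action<int> reportDuplicateId)
    {
        // Materialize the groups once so the source is enumerated a single time.
        var groups = things.GroupBy(t => t.ID).ToList();

        // Logging pass: only groups with at least two members are duplicates.
        foreach (var group in groups.Where(g => g.Skip(1).Any()))
            reportDuplicateId(group.Key);

        // De-duplication pass: keep the first item of every group.
        return groups.Select(g => g.First()).ToList();
    }
}
```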

Why group when one can use the Aggregate function to determine the duplicates for the report and the result?
Example
var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha"};
var duplicatesDictionary =
items.Aggregate (new Dictionary<string, int>(),
(results, itm) =>
{
if (results.ContainsKey(itm))
results[itm]++;
else
results.Add(itm, 1);
return results;
});
The resulting dictionary holds the number of occurrences of each item: Alpha 3, Beta 1, Gamma 1.
Now extract the duplicates report for any count above 1.
duplicatesDictionary.Where (kvp => kvp.Value > 1)
.Select (kvp => string.Format("{0} had {1} duplicates", kvp.Key, kvp.Value))
Now the final result is to just extract all the keys.
duplicatesDictionary.Select (kvp => kvp.Key);

You can use a hash set and union it with a list to get unique items; just override the reference comparison. Implementing IEqualityComparer<T> is flexible; if it's just ID that makes two objects unique then ok; but if it's more you can extend it, too.
You can get duplicates with LINQ.
void Main()
{
//your original class:
List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
//i'm doing this in LINQPad; if you're using VS you may need to foreach the object
Console.WriteLine(originalList);
//put your duplicates back in a list and log them as you did.
var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList();//.Select(x => x.GetHashCode());
Console.WriteLine(duplicateItems);
//create a custom comparer to compare your list; if you care about more than ID then you can extend this
var tec = new ThingsEqualityComparer();
var listThings = new HashSet<Things>(tec);
listThings.UnionWith(originalList);
Console.WriteLine(listThings);
}
// Define other methods and classes here
public class Things
{
public int ID {get;set;}
public Things(int id)
{
ID = id;
}
}
public class ThingsEqualityComparer : IEqualityComparer<Things>
{
public bool Equals(Things thing1, Things thing2)
{
return thing1.ID == thing2.ID;
}
public int GetHashCode(Things thing)
{
int hCode = thing.ID;
return hCode.GetHashCode();
}
}

Identifying and grouping similar items in a collection of strings

I have a collection of strings like the following:
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
Each string is made up of two components separated by a full stop - a prefix code and a subcode. Some of the strings don't have sub codes.
I want to be able combine the strings whose prefixes are the same and output them as follows with the other codes also:
44(01,02,03,04,05,06,07,08),46,47.10
I'm stuck at the first hurdle of this, which is how to identify and group together the codes whose prefix values are the same, so that I can combine them into a single string as you can see above.

You can do:
var query = codes.Select(c =>
new
{
SplitArray = c.Split('.'), //to avoid multiple split
Value = c
})
.Select(c => new
{
Prefix = c.SplitArray.First(), //you can avoid multiple split if you split first and use it later
PostFix = c.SplitArray.Last(),
Value = c.Value,
})
.GroupBy(r => r.Prefix)
.Select(grp => new
{
Key = grp.Key,
Items = grp.Count() > 1 ? String.Join(",", grp.Select(t => t.PostFix)) : "",
Value = grp.First().Value,
});
This is how it works:
First, split each item in the list on the delimiter and populate an anonymous type with Prefix, PostFix and the original value.
Then group on Prefix.
Finally, select the values and combine the postfix values using string.Join.
For output:
foreach (var item in query)
{
if(String.IsNullOrWhiteSpace(item.Items))
Console.WriteLine(item.Value);
else
Console.WriteLine("{0}({1})", item.Key, item.Items);
}
Output would be:
44(01,02,03,04,05,06,07,08)
46
47.10

Try this:-
var result = codes.Select(x => new { SplitArr = x.Split('.'), OriginalValue = x })
.GroupBy(x => x.SplitArr[0])
.Select(x => new
{
Prefix= x.Key,
subCode = x.Count() > 1 ?
String.Join(",", x.Select(z => z.SplitArr[1])) : "",
OriginalValue = x.First().OriginalValue
});
You can print your desired output like this:-
foreach (var item in result)
{
Console.Write("{0}({1}),",item.Prefix,item.subCode);
}

Outlined idea:
Use Dictionary<string, List<string>> for collecting your result
in a loop over your list, use string.split() .. the first element will be your Dictionary key ... create a new List<string> there if the key doesn't exist yet
if the result of split has a second element, append that to the List
use a second loop to format that Dictionary to your output string
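
The steps outlined above could be sketched as follows. This is a minimal sketch, not the answerer's code: the names CodeGrouper and Combine are hypothetical, and a separate list tracks the order in which prefixes first appear, because Dictionary does not guarantee enumeration order.

```csharp
using System;
using System.Collections.Generic;

public static class CodeGrouper
{
    public static string Combine(IEnumerable<string> codes)
    {
        var groups = new Dictionary<string, List<string>>(); // prefix -> subcodes
        var order = new List<string>();                      // prefixes in first-seen order

        // First loop: collect the subcodes of every prefix.
        foreach (var code in codes)
        {
            var parts = code.Split('.');
            if (!groups.TryGetValue(parts[0], out var subcodes))
            {
                subcodes = new List<string>();
                groups[parts[0]] = subcodes;
                order.Add(parts[0]);
            }
            if (parts.Length > 1)
                subcodes.Add(parts[1]);
        }

        // Second loop: format each prefix according to how many subcodes it has.
        var formatted = new List<string>();
        foreach (var prefix in order)
        {
            var subcodes = groups[prefix];
            if (subcodes.Count == 0) formatted.Add(prefix);
            else if (subcodes.Count == 1) formatted.Add(prefix + "." + subcodes[0]);
            else formatted.Add(prefix + "(" + string.Join(",", subcodes) + ")");
        }
        return string.Join(",", formatted);
    }
}
```

For the list in the question this produces 44(01,02,03,04,05,06,07,08),46,47.10.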
Of course, linq is possible too, e.g.
List<string> codes = new List<string>() {
"44.01", "44.05", "47", "42.02", "44.03" };
var result = string.Join(",",
codes.OrderBy(x => x)
.Select(x => x.Split('.'))
.GroupBy(x => x[0])
.Select((x) =>
{
if (x.Count() == 0) return x.Key;
else if (x.Count() == 1) return string.Join(".", x.First());
else return x.Key + "(" + string.Join(",", x.Select(e => e[1]).ToArray()) + ")";
}).ToArray());
Gotta love linq ... haha ... I think this is a monster.

You can do it all in one clever LINQ:
var grouped = codes.Select(x => x.Split('.'))
.Select(x => new
{
Prefix = int.Parse(x[0]),
Subcode = x.Length > 1 ? int.Parse(x[1]) : (int?)null
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode.HasValue).Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 1 ? string.Format(".{0}", x.Subcodes.First()) :
x.Subcodes.Count() > 1 ? string.Format("({0})", string.Join(",", x.Subcodes))
: string.Empty)
).ToArray();
First it splits each entry into code and subcode.
Then it groups by your code and collects all subcodes into a collection.
Finally it selects the result in the appropriate format.
Looking at the problem, I think you should stop just before the last Select and let the data presentation be done in another part/method of your application.

The old fashioned way:
List<string> codes = new List<string>() { "44.01", "44.05", "47", "42.02", "44.03" };
string output = "";
for (int i = 0; i < codes.Count; i++)
{
    // Appending ".." guarantees that items[1] exists even when there is no subcode
    string[] items = (codes[i] + "..").Split('.');
    int pos1 = output.IndexOf("," + items[0] + "(");
    if (pos1 < 0) output += "," + items[0] + "(" + items[1] + ")"; // first occurrence of code: add it
    else
    { // code already inserted: the insert point is the closing parenthesis of its group
        int pos2 = output.IndexOf(')', pos1);
        output = output.Substring(0, pos2) + "," + items[1] + output.Substring(pos2);
    }
}
// Drop the leading comma and the empty "()" left by codes without a subcode
if (output.Length > 0) output = output.Substring(1).Replace("()", "");

This will work, including the correct formats for no subcodes, a single subcode, multiple subcodes. It also doesn't assume the prefix or subcodes are numeric, so it leaves leading zeros as is. Your question didn't show what to do in the case you have a prefix without subcode AND the same prefix with subcode, so it may not work in that edge case (44,44.01). I have it so that it ignores the prefix without subcode in that edge case.
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
var result=codes.Select(x => (x+".").Split('.'))
.Select(x => new
{
Prefix = x[0],
Subcode = x[1]
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode!="").Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 0 ? string.Empty :
string.Format(x.Subcodes.Count()>1?"({0})":".{0}",
string.Join(",", x.Subcodes)))
).ToArray();

General idea, but i'm sure replacing the Substring calls with Regex would be a lot better as well
List<string> newCodes = new List<string>();
foreach (string sub1 in codes.Select(item => item.Substring(0, 2)).Distinct())
{
    StringBuilder code = new StringBuilder();
    code.Append(sub1 + "(");
    // Substring(2) leaves ".01", so trim the separator to get the bare subcode
    code.Append(string.Join(",", codes.Where(item => item.Substring(0, 2) == sub1)
        .Select(item => item.Substring(2).TrimStart('.'))));
    code.Append(")");
    newCodes.Add(code.ToString());
}

You could go a couple ways... I could see you making a Dictionary<string,List<string>> so that you could have "44" map to a list of {".01", ".02", ".03", etc.} This would require you processing the codes before adding them to this list (i.e. separating out the two parts of the code and handling the case where there is only one part).
Or you could put them into a SortedSet and provide your own comparer (an IComparer<string>) which knows that these are codes and how to sort them (at least that'd be more reliable than grouping them alphabetically). Iterating over this SortedSet would still require special logic, though, so perhaps the Dictionary to List option above is still preferable.
In either case you would still need to handle a special case "46" where there is no second element in the code. In the dictionary example, would you insert a String.Empty into the list? Not sure what you'd output if you got a list {"46", "46.1"} -- would you display as "46(null,1)" or... "46(0,1)"... or "46(,1)" or "46(1)"?

How do i sum a list of items by code(or any field)?

I have an object that has a list of another object in it. i.e Object1 contains List<Object2>.
Assuming this is the definition of object 2:
public class Object2
{
public string code;
public string name;
public decimal amount;
}
I want to be able to build a second list from this list whose values resemble what a SQL statement like select name, code, sum(amount) ... group by code would give me.
This is what I did, but the result didn't contain what I needed:
var newlist = obj2List.GroupBy(x => x.code)
.Select(g => new { Amount = g.Sum(x => x.amount) });
I want code and name in the new list just like the sql statement above.

You're almost there:
var newlist = obj2List.GroupBy(x => x.code)
.Select(g => new
{
Code = g.First().code,
Name = g.First().name,
Amount = g.Sum(x => x.amount)
});
This groups the items by code and creates an anonymous object for each group, taking the code and name of first item of the group. (I assume that all items with the same code also have the same name.)

If you are grouping by code and not by name you'd have to choose something for name from the list, perhaps with First() or Last() or something.
var newlist = obj2List.GroupBy(x => x.code).Select(g => new {
Code = g.Key,
Name = g.First().name,
Amount = g.Sum(x => x.amount)
});

var query = Object1.Obj2List
.GroupBy(obj2 => obj2.code)
.Select(g => new {
Names = string.Join(",", g.Select(obj2 => obj2.name)),
Code = g.Key,
Amount = g.Sum(obj2 => obj2.amount)
});
Since you group by code only you need to aggregate the name also in some way. I have used string.Join to create a string like "Name1,Name2,Name3" for each code-group.
Now you could consume the query for example with a foreach:
foreach(var x in query)
{
Console.WriteLine("Code: {0} Names: {1} Amount: {2}"
, x.Code, x.Names, x.Amount);
}

Instead of using the LINQ extension methods .GroupBy() and .Select(), you could also use LINQ query syntax, which is easier to read if you come from a SQL background.
var ls = new List<Object2>();
var newLs = from obj in ls
group obj by obj.code into codeGroup
select new { code = codeGroup.Key, amount = codeGroup.Sum(s => s.amount) };
