Remove sub-domains from list of domains using LINQ - c#

I have a list of strings like this:
a@domain.com
b@sub.domain.com
c@sub.sub.domain.com
d@sub.domain2.com
I want to remove the subdomains and leave only domain.com, domain2.com, etc.
Here is what I have tried so far, without success:
string[] campusCup(string[] emails)
{
    var emailList = emails.Select(x => x.Split('@').Last())
        .Distinct()
        .Select(x => x.Where(y => x.Split('.').Length > 2).Select(z => x.Split('.').Reverse().Take(2).Reverse()))
        .Select(x => x)
        .Distinct();
    return emailList.ToArray();
}
Any help solving the task, or an explanation of what I am doing wrong and how to fix it, is appreciated. Thank you.

You could first use MailAddress (from the System.Net.Mail namespace) to get the host, then some string methods to keep only the last two parts:
string[] domains = emails
    .Select(e => new MailAddress(e).Host.Split('.'))
    .Select(arr => String.Join(".", arr.Skip(arr.Length - 2)))
    .Distinct()
    .ToArray();

This seems to work for me given your data set:
var domains = emails.Select(e => e.Split('@')[1]).Select(d =>
{
    var parts = d.Split('.');
    return string.Join(".", parts.Skip(parts.Length - 2));
}).Distinct();

If you just want to learn about LINQ, as you mention in the comments of your question, here is another fun option:
var reg = new Regex(@"[a-z0-9\.]+@[a-z0-9\.]*?(?<domain>[a-z0-9]+\.[a-z0-9]+)$");
var secondLevelDomains = domains.SelectMany(domainName => reg.Matches(domainName).Cast<Match>()
        .Select(m => m.Groups["domain"])
        .Select(m => m.Value))
    .Distinct();
It uses matching groups in regular expressions to parse the domain names, and several of the more interesting LINQ functions, like Cast (for converting older collections into LINQ-friendly enumerables), SelectMany (to merge enumerable properties of multiple items), and Distinct (to return only unique entries).
This is probably not the ideal way to do this in a real application, but it exposes a lot of LINQ functionality for learning purposes.
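As for what goes wrong in the original attempt: inside the third Select, x is a string, so x.Where(...) iterates over its characters, and the inner Select then produces one sequence of parts per character. The result is a nested sequence rather than a sequence of strings, so it cannot be returned as string[]. Below is a minimal, self-contained sketch of the "last two labels" approach from the answers above. It assumes C# 9 or later (top-level statements) and assumes every registrable domain has exactly two labels, which does not hold for suffixes such as co.uk.

```csharp
using System;
using System.Linq;

// Sample data from the question
string[] emails =
{
    "a@domain.com",
    "b@sub.domain.com",
    "c@sub.sub.domain.com",
    "d@sub.domain2.com"
};

string[] domains = emails
    .Select(e => e.Split('@').Last())      // keep the host part after '@'
    .Select(host => host.Split('.'))       // split the host into labels
    // keep the last two labels; Math.Max guards hosts with fewer than two
    .Select(labels => string.Join(".", labels.Skip(Math.Max(0, labels.Length - 2))))
    .Distinct()
    .ToArray();

Console.WriteLine(string.Join(", ", domains)); // prints: domain.com, domain2.com
```

The key difference from the attempt in the question is that each step maps one string (or one array of labels) to one value, so the final sequence is still a flat sequence of strings.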

Related

LINQ Lambda efficiency of code groupby orderby

I have this code, but I think it could run faster, or at least I hope so. I have a lot of data, so I'd like it to be as efficient as possible.
Here is the code:
(It needs to return the newest translations of words (Language and Value) from resources, grouped by resource and language, filtered by an Expression<Func<ResourcesTranslation, bool>> ConditionExpression.)
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language })
    .Select(m => m.OrderByDescending(o => o.Changed ?? o.Created))
    .Select(s => new KeyValues
    {
        Language = s.FirstOrDefault().Language,
        KeyValue = s.FirstOrDefault().Value
    }).ToList();
As you need only one element after grouping, you can return it right in the GroupBy clause, which simplifies your code:
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language },
        (x, y) => new { Max = y.OrderByDescending(o => o.Changed ?? o.Created).First() })
    .Select(s => new KeyValues
    {
        Language = s.Max.Language,
        KeyValue = s.Max.Value
    })
    .ToList();
You can get some performance by removing the first, unneeded Select (depending on the volume of data, the improvement could be minimal to moderate), like this:
KeyValues = item.Resources
    .Where(ConditionExpression)
    .GroupBy(g => new { g.ResourceId, g.Language })
    .Select(s => new KeyValues
    {
        // Language is part of the group key, so no sorting is needed to read it
        Language = s.Key.Language,
        KeyValue = s.OrderByDescending(o => o.Changed.HasValue ? o.Changed : o.Created)
            .FirstOrDefault().Value
    }).ToList();
Beyond that, it depends on your case:
If your data is in a database, you can make database-side improvements such as adding indexes, updating statistics, or using hints.
If this is local data, you can use some strategy to split new and old data between separate enumerables.
There is no other way to significantly improve your LINQ query; you need a different strategy to achieve that.
I found out that Visual Studio translates it into selects, so I realized that the best solution for cases like this is to create a view. I'm posting this answer to my own question in case it helps others.
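For reference, the "newest item per group" pattern discussed above can be tried in memory as follows. This is only a sketch under assumptions: the tuple fields stand in for the ResourcesTranslation properties named in the question, the sample values are made up, and it runs as LINQ to Objects with C# 9 top-level statements. It says nothing about how a database provider would translate the query.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows; field names mirror the properties in the question
var resources = new[]
{
    (ResourceId: 1, Language: "en", Value: "Hello", Created: new DateTime(2020, 1, 1), Changed: (DateTime?)null),
    (ResourceId: 1, Language: "en", Value: "Hi",    Created: new DateTime(2020, 1, 1), Changed: (DateTime?)new DateTime(2021, 6, 1)),
    (ResourceId: 1, Language: "de", Value: "Hallo", Created: new DateTime(2020, 2, 1), Changed: (DateTime?)null)
};

var newest = resources
    .GroupBy(r => new { r.ResourceId, r.Language })
    // within each group, take the row with the latest Changed (or Created) date
    .Select(g => g.OrderByDescending(r => r.Changed ?? r.Created).First())
    .Select(r => new { r.Language, KeyValue = r.Value })
    .ToList();

foreach (var item in newest)
    Console.WriteLine($"{item.Language}: {item.KeyValue}");
```

Sorting each group only to take its first element is simple to read. For large in-memory groups, a single pass that tracks the maximum would avoid the sort, at the cost of more code.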

How can I split a List<T> into two lists, one containing all duplicate values and the other containing the remainder?

I have a basic class for an Account (other properties removed for brevity):
public class Account
{
public string Email { get; set; }
}
I have a List<T> of these accounts.
I can remove duplicates based on the e-mail address easily:
var uniques = list.GroupBy(x => x.Email).Select(x => x.First()).ToList();
The list named 'uniques' now contains only one of each account based on e-mail address, any that were duplicates were discarded.
I want to do something a little different and split the list into two.
One list will contain only 'true' unique values, the other list will contain all duplicates.
For example the following list of Account e-mails:
unique@email.com
dupe@email.com
dupe@email.com
Would be split into two lists:
Unique
unique@email.com
Duplicates
dupe@email.com
dupe@email.com
I have been able to achieve this already by creating a list of unique values using the example at the top. I then use .Except() on the original list to get the differences which are the duplicates. Lastly I can loop over each duplicate to 'pop' it out of the unique list and move it to the duplicate list.
Here is a working example on .NET Fiddle
Can I split the list in a more efficient or syntactically sugary way?
I'd be happy to use a third party library if necessary but I'd rather just stick to pure LINQ.
I'm aware of CodeReview but feel the question also fits here.
var groups = list.GroupBy(x => x.Email)
    .GroupBy(g => g.Count() == 1 ? 0 : 1)
    .OrderBy(g => g.Key)
    .Select(g => g.SelectMany(x => x))
    .ToList();
groups[0] will be the unique ones and groups[1] will be the non-unique ones.
var duplicates = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() > 1)
.SelectMany(g => g);
var uniques = list.GroupBy(x => x) // or x.Property if you are grouping by some property.
.Where(g => g.Count() == 1)
.SelectMany(g => g);
Alternatively, once you get one list, you can get the other one using Except:
var uniques = list.Except(duplicates);
// or
var duplicates = list.Except(uniques);
Another way to do it would be to get uniques, and then for duplicates simply get the elements in the original list that aren't in uniques.
IEnumerable<Account> uniques;
IEnumerable<Account> dupes;
dupes = list.Where(d =>
!(uniques = list.GroupBy(x => x.Email)
.Where(g => g.Count() == 1)
.SelectMany(u => u))
.Contains(d));
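The answers above either group the list twice or index into a list of groups, which fails when one of the two categories is empty. A possible variation, sketched here with plain strings standing in for Account objects (with the real class you would group by x.Email), groups once and partitions the groups with ToLookup. A lookup returns an empty sequence for a missing key, so both results are always available. The sketch assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain strings stand in for Account.Email values from the question
var emails = new List<string> { "unique@email.com", "dupe@email.com", "dupe@email.com" };

// Group once, then partition the groups by whether they hold exactly one item
var bySingleton = emails
    .GroupBy(e => e)
    .ToLookup(g => g.Count() == 1);

List<string> uniques = bySingleton[true].SelectMany(g => g).ToList();
List<string> duplicates = bySingleton[false].SelectMany(g => g).ToList();

Console.WriteLine("Unique: " + string.Join(", ", uniques));
Console.WriteLine("Duplicates: " + string.Join(", ", duplicates));
```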

Issue with many-to-many query with linq to entities

I've got these tables:
Application
    ApplicationID,
    Name
ApplicationSteps
    AplicationStepID,
    AplicationID,
    StepID
ApplicationStepCriterias
    ApplicationStepID,
    CriteriaID
So I have one SelectedCriteriaID: the user chooses a criterion from a dropdown and wants all the applications that have this SelectedCriteriaID in the ApplicationStepCriterias table.
I tried
var ds = context.Applications
.Where(a => a.ApplicationSteps
.Select(x=>x.ApplicationStepCriterias
.Select(t=>t.CriteriaId))
.Contains(SelectesdCriteria));
But since the result is an IEnumerable<IEnumerable<int>>, I cannot use Contains.
I just get a list of all the CriteriaIds for each ApplicationStep (which is itself a sequence), and I cannot think of a way to get all the CriteriaIds into one list.
First, let me try to get the names right. This is not a pure many-to-many association, because the junction class is part of the class model. It is what I unofficially call a 1-n-1 association. So you have
Application -< ApplicationSteps >- ApplicationStepCriterias
I'd strongly recommend using singular names for your classes ...
Application -< ApplicationStep >- ApplicationStepCriterion
... so you can use plural for collection property names without getting confused.
If I'm right so far, your query should be
context.Applications
    .Where(a => a.ApplicationSteps
        .Any(x => selectedCriteria
            .Contains(x.ApplicationStepCriterion.CriteriaId)));
(and I'd also prefer CriterionId, probably referring to a Criterion class)
You may try something like this:
var applicationStepIds = context.ApplicationStepCriterias
.Where(i => i.CriteriaID == selectedCriteria)
.Select(i => i.ApplicationStepID)
.Distinct();
var applicationIds = context.ApplicationSteps
.Where(i => applicationStepIds.Contains(i.AplicationStepID))
.Select(i => i.AplicationID)
.Distinct();
var result = context.Applications.Where(i => applicationIds.Contains(i.ApplicationId));
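The nested-sequence problem from the question (an IEnumerable<IEnumerable<int>> that Contains cannot be applied to) can also be addressed by flattening with SelectMany. The sketch below runs in memory only, using anonymous objects whose property names (ApplicationSteps, CriteriaIds) and sample values are made up to mimic the question's model. Whether a given Entity Framework version translates the same shape to SQL is not verified here. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the entity model in the question
var applications = new[]
{
    new
    {
        Name = "App A",
        ApplicationSteps = new[]
        {
            new { CriteriaIds = new[] { 1, 2 } },
            new { CriteriaIds = new[] { 3 } }
        }
    },
    new
    {
        Name = "App B",
        ApplicationSteps = new[]
        {
            new { CriteriaIds = new[] { 4 } }
        }
    }
};

int selectedCriteriaId = 3;

var matches = applications
    .Where(a => a.ApplicationSteps
        .SelectMany(step => step.CriteriaIds)   // flatten all steps into one sequence of ids
        .Contains(selectedCriteriaId))
    .Select(a => a.Name)
    .ToList();

Console.WriteLine(string.Join(", ", matches)); // prints: App A
```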

Linq IEnumerable<IGrouping<string, Class>> back to List<Class>

How can I turn the following statement back to List<DocumentData>
IEnumerable<IGrouping<string, DocumentData>> documents =
documentCollection.Select(d => d).GroupBy(g => g.FileName);
The goal is to get a List that is smaller than documentCollection.
FileName contains duplicates, so I want to make sure I don't have duplicate names.
I have also tried the following, but it still gives me duplicate file names:
documentCollection =
documentCollection.GroupBy(g => g.FileName).SelectMany(d => d).ToList();
Each IGrouping<string, DocumentData> is an IEnumerable<DocumentData>, so you could simply call SelectMany to flatten the sequences:
var list = documents.SelectMany(d => d).ToList();
Edit: Per the updated question, it seems like the OP wants to select just the first document for any given filename. This can be achieved by calling First() on each IGrouping<string, DocumentData> instance:
IEnumerable<DocumentData> documents =
documentCollection.GroupBy(g => g.FileName, StringComparer.OrdinalIgnoreCase)
.Select(g => g.First())
.ToList();
You haven't said what T should stand for in the List<T> you're looking for, so here are a few of the most likely candidates:
List<DocumentData> - rather pointless, as you already have that in documentCollection
var results = documents.SelectMany(g => g).ToList();
List<KeyValuePair<string, List<DocumentData>>>
var results =
    documents.Select(g => new KeyValuePair<string, List<DocumentData>>(g.Key, g.ToList())).ToList();
List<string> - just the names
var results = documents.Select(g => g.Key).ToList();
List<IGrouping<string, DocumentData>>
var results = documents.ToList();

Order By on the Basis of Integer present in string

I have a problem in my C# application. I have some school classes in a database, for example 8-B, 9-A, 10-C, 11-C and so on. When I use an order by clause to sort them, the string comparison gives these results:
10-C
11-C
8-B
9-A
but I want integer sorting on the basis of first integer present in string...
i.e.
8-B
9-A
10-C
11-C
hope you'll understand...
I've tried this, but it throws an exception:
var query = cx.Classes.Select(x=>x.Name)
.OrderBy( x=> new string(x.TakeWhile(char.IsDigit).ToArray()));
Please help me... want ordering on the basis of classes ....
Maybe Split will do?
.OrderBy(x => Convert.ToInt32(x.Split('-')[0]))
.ThenBy(x => x.Split('-')[1])
If the input is well-formed enough, this would do:
var maxLen = cx.Classes.Max(x => x.Name.Length);
var query = cx.Classes.Select(x => x.Name).OrderBy(x => x.PadLeft(maxLen));
You can left-pad the names with '0' to a fixed length that suits your data, for example 6:
.OrderBy(x => x.PadLeft(6, '0'))
This is fundamentally the same approach as Andrius's answer, written out more explicitly:
var names = new[] { "10-C", "8-B", "9-A", "11-C" };
var sortedNames =
    (from name in names
     let parts = name.Split('-')
     select new {
         fullName = name,
         number = Convert.ToInt32(parts[0]),
         letter = parts[1]
     })
    .OrderBy(x => x.number)
    .ThenBy(x => x.letter)
    .Select(x => x.fullName);
It's my naive assumption that this would be more efficient because the Split is only processed once in the initial select rather than in both OrderBy and ThenBy, but for all I know the extra "layers" of LINQ may outweigh any gains from that.
