Improving O(n^2) algorithm - c#

I have the following piece of code:
var keywordItems = adwordsService
.ParseReport(report)
.Where(e => e.Keyword.IndexOf('+') == -1);
var keywordTranslations = keywordTranslationService
.GetKeywordTranslationsByClient(id);
model.KeywordItems = keywordItems
.Where(e =>
{
int lastUnderscore = e.CampaignName.LastIndexOf('_');
var identifer = e.CampaignName.Substring(lastUnderscore + 1);
var translation = keywordTranslations
.FirstOrDefault(t => t.translation == e.Keyword &&
t.LocalCombination_id == identifer);
return translation == null;
})
.OrderBy(e => e.Keyword);
It receives an array and then filters the elements based on whether or not they have already been seen before.
However, this runs pretty slowly, as there are a lot of new elements, so I would appreciate it if someone could point me in the right direction regarding the best algorithm to use in this case.

A simple group join will do the job. In LINQ to Objects, a join builds a hash-based lookup over the inner collection, so each match is found in O(1) on average instead of by a linear scan:
from k in keywordItems
let identifer = k.CampaignName.Substring(k.CampaignName.LastIndexOf('_') + 1)
join t in keywordTranslations on
new { k.Keyword, Id = identifer } equals
new { Keyword = t.translation, Id = t.LocalCombination_id } into g
where !g.Any()
orderby k.Keyword
select k
To improve performance further, you can move the identifier extraction directly into the key creation, which avoids introducing a new range variable.

I suggest using hashing, e.g. HashSet<T> or Dictionary<TKey, TValue>. Provided that translation as well as LocalCombination_id are of type string:
HashSet<Tuple<string, string>> keywordTranslations =
new HashSet<Tuple<string, string>>(keywordTranslationService
.GetKeywordTranslationsByClient(id)
.Select(t => new Tuple<string, string>(t.translation, t.LocalCombination_id)));
model.KeywordItems = keywordItems
.Where(e => !keywordTranslations.Contains(new Tuple<string, string>(
e.Keyword,
e.CampaignName.Substring(e.CampaignName.LastIndexOf('_') + 1))))
.OrderBy(e => e.Keyword);
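
As a rough, self-contained sketch of the same hashing idea: the KeywordItem and KeywordTranslation records below are hypothetical stand-ins for the question's types (only the members used in the filter are modelled), and value tuples are used instead of Tuple<,> purely for brevity. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types; only the members used here are modelled.
public record KeywordItem(string Keyword, string CampaignName);
public record KeywordTranslation(string translation, string LocalCombination_id);

public static class KeywordFilter
{
    // Returns the items that have no matching translation, ordered by keyword.
    public static List<KeywordItem> Untranslated(
        IEnumerable<KeywordItem> items,
        IEnumerable<KeywordTranslation> translations)
    {
        // Build the lookup once. Value tuples compare by value, so they work as set keys.
        var seen = new HashSet<(string, string)>(
            translations.Select(t => (t.translation, t.LocalCombination_id)));

        // Each Contains call is a hash lookup rather than a scan over all translations.
        return items
            .Where(e => !seen.Contains((e.Keyword, IdentifierOf(e.CampaignName))))
            .OrderBy(e => e.Keyword)
            .ToList();
    }

    // Text after the last underscore; the whole name when there is no underscore.
    private static string IdentifierOf(string campaignName) =>
        campaignName.Substring(campaignName.LastIndexOf('_') + 1);

    public static void Main()
    {
        var items = new[]
        {
            new KeywordItem("shoes", "summer_12"),
            new KeywordItem("boots", "winter_7"),
            new KeywordItem("hats", "summer_12"),
        };
        var translations = new[] { new KeywordTranslation("shoes", "12") };

        // "shoes" already has a translation for identifier "12", so it is filtered out.
        foreach (var item in Untranslated(items, translations))
            Console.WriteLine(item.Keyword);
    }
}
```

Building the set costs one pass over the translations, and the filter then costs one pass over the items, so the work grows with n + m rather than n * m.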

Related

LINQ subquery with multiple columns

I'm trying to recreate this SQL query in LINQ:
SELECT *
FROM Policies
WHERE PolicyID IN(SELECT PolicyID
FROM PolicyRegister
WHERE PolicyRegister.StaffNumber = @CurrentUserStaffNo
AND ( PolicyRegister.IsPolicyAccepted = 0
OR PolicyRegister.IsPolicyAccepted IS NULL ))
Relationship Diagram for the two tables:
Here is my attempt so far:
var staffNumber = GetStaffNumber();
var policyRegisterIds = db.PolicyRegisters
.Where(pr => pr.StaffNumber == staffNumber && (pr.IsPolicyAccepted == false || pr.IsPolicyAccepted == null))
.Select(pr => pr.PolicyID)
.ToList();
var policies = db.Policies.Where(p => p.PolicyID.//Appears in PolicyRegisterIdsList)
I think I'm close. I will probably make two lists and use Intersect() somehow, but I looked at my code this morning and thought there has to be an easier way to do this. LINQ is supposed to be a more readable database language, right?
Any help provided is greatly appreciated.
Just use Contains:
var policies = db.Policies.Where(p => policyRegisterIds.Contains(p.PolicyID));
It is also better to store policyRegisterIds in a HashSet<T> rather than a list, so that an in-memory lookup is O(1) instead of the O(n) of List<T>:
var policyRegisterIds = new HashSet<IdType>(db.PolicyRegisters......);
But better still is to remove the ToList() and let it all happen as one query in the database:
var policyRegisterIds = db.PolicyRegisters.Where(pr => pr.StaffNumber == staffNumber &&
(pr.IsPolicyAccepted == false || pr.IsPolicyAccepted == null));
var policies = db.Policies.Where(p => policyRegisterIds.Any(pr => pr.PolicyID == p.PolicyID));

Why does my LINQ query always return 0?

I'm facing a weird problem. I haven't programmed much with C# and only started recently, so I apologise in advance if the question is in fact just a beginner mistake.
int i = 0;
var index = from x in (
from v in Category.Items
select new { Key = i++, Value = v })
where ((MenuCategory) x.Value).id == menuItems[items.SelectedIndex].category
select x.Key;
I'm trying to get the index of a specific object in Category.Items[] (where the field id is a specific value, menuItems[items.SelectedIndex].category)
LINQ queries should not cause side effects like this. You can get what you want with method syntax and the overload of Select:
var selectedCatId = menuItems[items.SelectedIndex].category;
var indexes = Category.Items
.Select((c, index) => new { Key = index, Value = c })
.Where(x => ((MenuCategory)x.Value).id == selectedCatId)
.Select(x => x.Key);
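
A minimal, self-contained sketch of the indexed Select overload, in case it helps to see it run. The MenuCategory record here is a hypothetical stand-in for the question's class, and returning -1 for "not found" is an assumption of this example, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MenuCategory; only id is needed here.
public record MenuCategory(int id, string Name);

public static class IndexLookup
{
    // Index of the first category with the given id, or -1 when none matches.
    public static int IndexOfCategory(IEnumerable<MenuCategory> categories, int id) =>
        categories
            // The two-argument Select overload supplies the element's position,
            // so no external counter has to be mutated inside the query.
            .Select((c, index) => new { Index = index, Category = c })
            .Where(x => x.Category.id == id)
            .Select(x => x.Index)
            .DefaultIfEmpty(-1)
            .First();

    public static void Main()
    {
        var categories = new List<MenuCategory>
        {
            new MenuCategory(10, "Starters"),
            new MenuCategory(20, "Mains"),
            new MenuCategory(30, "Desserts"),
        };

        Console.WriteLine(IndexOfCategory(categories, 20));
        Console.WriteLine(IndexOfCategory(categories, 99));
    }
}
```

Because the index comes from the operator itself, enumerating the query more than once gives the same positions each time.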

Use List<Tuple<int, int>> to return data in Linq

Given:
List<int> myList;
If I wanted to return data where the record ID was contained in this list I would simply do:
var q = db.Table.Where(c=> myList.Contains(c.ID));
However, given:
List<Tuple<int, int>> myList;
How would I write a Linq query to return records where both conditions are met? With one data point I would write:
var q = db.Table.Where(c=>
c.ID == myList.Item1
&& c.AnotherValue == myList.Item2);
How would I convert the above statement to work on a List<Tuple<int, int>>?
A Tuple is a type that cannot be translated to SQL by your LINQ provider. One solution is to switch to LINQ to Objects:
var q = db.Table.AsEnumerable()
.Where(c=> myList.Any(tuple => c.ID == tuple.Item1 &&
c.AnotherValue == tuple.Item2));
But the bad thing about this solution is that you're going to load all the rows from that table to filter in memory.
Another solution could be using Linqkit:
var predicate = PredicateBuilder.False<Table>();
foreach (var t in myList)
{
predicate = predicate.Or(c => c.ID == t.Item1 && c.AnotherValue == t.Item2);
}
db.Table.AsExpandable().Where(predicate);
You will find more info about this last solution in this link
var q = db.Table.AsEnumerable().Where(c => myList.Any(tuple => c.ID == tuple.Item1 &&
c.AnotherValue == tuple.Item2));
With Any you can check whether at least one element in myList matches your condition.
But as @octaviocci pointed out, this is not translatable to SQL, so you would need to call AsEnumerable() first and do the filtering locally, which may not be what you want if there are a lot of irrelevant records.
Here is some sample code that illustrates one approach:
DataTable dt = new DataTable("demo");
// hydrate your table here...
List<Tuple<int, int>> matches = new List<Tuple<int, int>>();
Func<List<Tuple<int,int>>, DataRow, bool> RowMatches = (items, row) => {
var rowValue1 = (int)row["Id"];
var rowValue2 = (int)row["SomeOtherValue"];
return items.Any(item => item.Item1 == rowValue1 && item.Item2 == rowValue2);
};
var results = dt.Rows.Cast<DataRow>().Where(r => RowMatches(matches, r));
Console.WriteLine(results.Any());
See code below:
List<Tuple<int, int>> myList;
var set = new HashSet<Tuple<int, int>>(myList);
var q = db.Table.AsEnumerable().Where(c => set.Contains(Tuple.Create(c.ID, c.AnotherValue)));
Note that the hash set is used to speed up the Where clause when myList is large.
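Since Tuple cannot be used in LINQ to Entities, you can try something like the following. Note that it is only a pre-filter: it also returns rows whose ID and AnotherValue each appear in myList but in different tuples, so check the exact pairs in memory afterwards if that matters:
List<int> items1 = myList.Select(t => t.Item1).ToList();
List<int> items2 = myList.Select(t => t.Item2).ToList();
var q = db.Table.GroupBy(m => new { m.ID, m.AnotherValue })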
.Where(g => items1.Contains(g.Key.ID) &&
items2.Contains(g.Key.AnotherValue))
.SelectMany(g => g);

Select Value from Dictionary if Key exists using LINQ

I have a dictionary where I have a List as value. I want to select specific elements from the list that belongs to a specific key. I tried this so far:
Dictionary<int, List<bool>> dic = new Dictionary<int, List<bool>>();
dic.Add(1, new List<bool> { true, true, false });
var works = dic.Where(x => x.Key == 1).SingleOrDefault().Value.Where(x => x == true).ToList();
var doesNotWork = dic.Where(x => x.Key == 2).SingleOrDefault().Value.Where(x => x == true).ToList();
The first LINQ works because there is a key equal to 1. Thus I get a List<bool> with two elements.
The second LINQ does not work because Value is null. How can I rewrite that LINQ such that if there is no suitable key in the dictionary I get an empty List<bool>?
I thought my approach would work because I thought the default element had an empty list instead of null as Value.
Disclaimer: This only works on C# 6.0 and later (VS 2015+)
If you really want to do it in a single line using LINQ, you can use the ?. operator (null-conditional operator). It has to be applied to Value, because that is the part that is null when the key is missing:
var shouldWork = dic.Where(x => x.Key == 2).SingleOrDefault().Value?.Where(x => x == true).ToList() ?? new List<bool>();
This will set shouldWork to either the result of the linq query or an empty list. You can replace new List<bool>() with anything you want
See MSDN post here and Github post here for information on the new features in C# 6.0 specifically this example from the github site:
int length = customers?.Length ?? 0; // 0 if customers is null
and description of how it works
The null-conditional operator exhibits short-circuiting behavior,
where an immediately following chain of member accesses, element
accesses and invocations will only be executed if the original
receiver was not null
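Edit: Since ?. checks for null, you could shorten the query above to the following, but only when the key is known to exist. The dictionary indexer throws KeyNotFoundException for a missing key, so here ?. only guards against a null list stored under an existing key:
var shouldWork = dic[key]?.Where(x => x == true).ToList() ?? new List<bool>();
where key is some variable holding your key.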
You shouldn't be using LINQ to find a key in a Dictionary. The Dictionary has more efficient ways of doing that: the ContainsKey/indexer pair or, better still, TryGetValue, which needs only one lookup.
For instance:
int key = 2;
(A)
var result = dic.ContainsKey(key) ? dic[key].Where(x => x == true).ToList() : new List<bool>();
(B)
List<bool> values;
var result = dic.TryGetValue(key, out values) ? values.Where(x => x == true).ToList() : new List<bool>();
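
Variant (B) can be wrapped in a small helper, sketched below. The name TrueValuesFor is made up for this example, the extra null check is an added assumption for dictionaries that may store a null list, and the `out var` syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryExample
{
    // Hypothetical helper: the true entries stored under key, or an empty list
    // when the key is missing (or maps to a null list).
    public static List<bool> TrueValuesFor(Dictionary<int, List<bool>> dic, int key) =>
        dic.TryGetValue(key, out var values) && values != null
            ? values.Where(x => x).ToList()
            : new List<bool>();

    public static void Main()
    {
        var dic = new Dictionary<int, List<bool>>
        {
            { 1, new List<bool> { true, true, false } },
        };

        Console.WriteLine(TrueValuesFor(dic, 1).Count); // key present
        Console.WriteLine(TrueValuesFor(dic, 2).Count); // key missing, empty list
    }
}
```

TryGetValue performs a single hash lookup, whereas ContainsKey followed by the indexer performs two.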
Why does it need to be LINQ?
List<bool> works1 = dic.ContainsKey(1) ? dic[1] : new List<bool>();
The simplest solution would be to use Dictionary<TKey, TValue>.TryGetValue so that you don't look up the value twice, i.e.:
Dictionary<int, List<bool>> dic = new Dictionary<int, List<bool>>();
dic.Add(1, new List<bool> { true, true, false });
List<bool> match = null;
var found = dic.TryGetValue(2, out match);
if (!found) match = new List<bool>();
This is an unusual result to want, but this code will help you anyway:
var result = (dic.FirstOrDefault(x => x.Key == 2).Value ?? new List<bool>(0))
.Where(x => x)
.ToList();
Using SingleOrDefault() is unnecessary, because keys in a Dictionary are unique.
The predicate x => x == true is redundant as well; x => x is enough.
Try this
Dictionary<int, List<bool>> dic = new Dictionary<int, List<bool>>();
dic.Add(1, new List<bool> { true, true, false });
var works = !dic.ContainsKey(1)? new List<bool>(): dic[1].Where(x => x == true).ToList();
var doesNotWork = !dic.ContainsKey(2) ? new List<bool>(): dic[2].Where(x => x == true).ToList();
We don't know why it needs to be LINQ, but this could be an option:
var nowItWork = dic.Where(x => x.Key == 2).SelectMany(x => x.Value).Where(x => x).ToList();

Combining two simple related linq queries

I have two queries and I'm using the result of the first one in the second one, like this:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == temp.Max(x => x.Value))});
Is there a way to combine these into one query?
EDIT:
I cannot just chain them directly because I'm using temp.Max() in the second query.
Why? It would be clearer (and more efficient) to make it three:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
int max = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == max)});
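You can do it in one statement using query syntax with the let keyword. Be aware, though, that the compiler translates let into a Select that projects the extra value for each element, so in LINQ to Objects Max is still evaluated once per item; a provider that translates the whole expression (for example to SQL) may be able to fold it into a single subquery. Note also that this version takes the maximum over all of ObjectTable rather than only the rows in category "Y".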
var anonymousObjList = from o in ObjectTable
where o.Category == "Y"
let max = ObjectTable.Max(m => m.Value)
select new { o, IsMax = (o.Value == max) };
This is one of the few cases where I use query syntax. The method-syntax equivalent needs an explicit intermediate anonymous type.
edit: ReSharper suggests
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, max = ObjectTable.Max(m => m.Value)})
.Select(@t => new {@t.o, IsMax = (@t.o.Value == @t.max)});
However, this is not optimal. It is the same query that the compiler generates for the let version, and it makes the cost visible: the first Select projects a max property for each item that passes the filter, so in LINQ to Objects the Max function is evaluated for every item in both forms.
To evaluate Max only once, compute it in a separate statement before the Select. I'm not a fan of query syntax, but let does make the single-statement version easier to read.
Possibly the most straightforward refactoring is to replace all instances of "temp" with the value of temp. Since it appears that this value is immutable, the refactoring should be valid (yet ugly):
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, IsMax = (o.Value == ObjectTable.Where(x => x.Category == "Y").Max(x => x.Value))});
As has already been pointed out, this query really has no advantages over the original, since queries use deferred execution and can be built up. I would actually suggest splitting the query even more:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var maxValue = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == maxValue)});
This is better than the original because every call to Max iterates over the entire dataset again. Since the original calls Max inside the Select, it is evaluated n times, which makes the original O(n^2)!
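
One way to check how often Max runs is to count the calls. The sketch below does that for LINQ to Objects only; CountingMax and the MaxCalls counter are instrumentation invented for this example, and a plain List<int> stands in for ObjectTable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxCountDemo
{
    // Number of times CountingMax has run since the last reset (demo instrumentation).
    public static int MaxCalls;

    private static int CountingMax(IEnumerable<int> values)
    {
        MaxCalls++;
        return values.Max();
    }

    // Max inside the query via let: it is part of the per-element projection.
    public static int CallsWithLet(List<int> values)
    {
        MaxCalls = 0;
        var query = from v in values
                    let max = CountingMax(values)
                    select new { v, IsMax = v == max };
        query.ToList(); // force enumeration
        return MaxCalls;
    }

    // Max computed in its own statement before the projection.
    public static int CallsWhenHoisted(List<int> values)
    {
        MaxCalls = 0;
        int max = CountingMax(values);
        var query = values.Select(v => new { v, IsMax = v == max });
        query.ToList(); // force enumeration
        return MaxCalls;
    }

    public static void Main()
    {
        var values = new List<int> { 3, 9, 4 };
        Console.WriteLine(CallsWithLet(values));     // prints 3: once per element
        Console.WriteLine(CallsWhenHoisted(values)); // prints 1
    }
}
```

With a database-backed IQueryable the provider decides how the expression is translated, so these counts say nothing about the generated SQL.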
