Bug in .Net's LINQ, or me missing something?

I had a strange bug I don't understand. Changing LINQ's IEnumerable to a list halfway through fixed it, and I don't understand why.
This is not the real code, but it is very similar.
The code below doesn't work:
// an IEnumerable of some objects (classes), internally an array
var ansestors = GetAnsestors();
var current = GetCurrentServerNode();
var result = from serverNode in ansestors
             select new PolicyResult
             {
                 //Some irrelevant stuff
                 OnNotAvailableNode = NodeProcessingActionEnum.ContinueExecution,
             };
var thisNode = new PolicyResult
{
    //Some irrelevant stuff
    OnNotAvailableNode = NodeProcessingActionEnum.ThrowException,
};
result = result.Reverse();
result = result.Concat(new List<PolicyResult> { thisNode });
result.First().OnNotAvailableNode = NodeProcessingActionEnum.ThrowException;
// When looking in the debugger, and in logs, the first element of the
// result sequence has OnNotAvailableNode set to ContinueExecution
// Which doesn't make any sense...
But when I change the ending to the following it works:
result = result.Reverse();
result = result.Concat(new List<PolicyResult> { thisNode });
var policyResults = result.ToList();
var firstPolicyResult = policyResults.First();
firstPolicyResult.OnNotAvailableNode = NodeProcessingActionEnum.ThrowException;
return policyResults;
All the types here are classes (reference types) except NodeProcessingActionEnum which is an enum.
Is this a bug?
Me missing something crucial about LINQ?
Help?

result.First() executes the (deferred/lazy) query.
That line sets the value correctly, but when you use result later, the query is executed again.
At that point you are looking at a newly fetched copy. The fact that it is different makes me assume that GetAnsestors() is also lazily evaluated and is not an in-memory List<>.
This means that ToList() is a worthwhile optimization as well as a fix. Note that after the ToList() you can also use
var firstPolicyResult = policyResults[0];
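A minimal sketch of the effect described above, using a hypothetical, cut-down PolicyResult with a single string property (the real class is not shown in the question). It suggests that the select new projection alone is enough to produce a fresh object on every enumeration, even when the source is a plain in-memory array:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PolicyResult class in the question.
public class PolicyResult
{
    public string Action { get; set; } = "";
}

public static class DeferredDemo
{
    // Mutates the first element of a lazy query, then reads it back.
    public static string MutateLazy()
    {
        var source = new[] { 1, 2, 3 };   // in-memory array
        IEnumerable<PolicyResult> query =
            source.Select(n => new PolicyResult { Action = "Continue" });

        query.First().Action = "Throw";   // changes a temporary object
        return query.First().Action;      // query runs again: new object
    }

    // Same steps, but the query is materialized once with ToList().
    public static string MutateMaterialized()
    {
        var source = new[] { 1, 2, 3 };
        List<PolicyResult> list =
            source.Select(n => new PolicyResult { Action = "Continue" }).ToList();

        list.First().Action = "Throw";    // changes the stored object
        return list.First().Action;       // same object is read back
    }

    public static void Main()
    {
        Console.WriteLine(MutateLazy());          // Continue
        Console.WriteLine(MutateMaterialized());  // Throw
    }
}
```

The list version keeps the created objects alive, so the assignment is still visible when the first element is read again.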

The problem is that running First on your IEnumerable removes it from the enumerator, so you're then checking the next element. Actually I've changed my mind - that's probably not it. This solution might be worth a shot, though.
You could wrap the IEnumerable with something which makes the change for you, e.g. using the Select overload which accepts an index too:
var modifiedResults = result.Select((r, index) => {
    if (index == 0) {
        // This is the first element
        r.OnNotAvailableNode = NodeProcessingActionEnum.ThrowException;
    }
    return r;
});
(untested) should do the trick.

Related

Function to linq conversion

I have a function which I believe can be simplified into LINQ but have been unable to do so yet.
The function looks like this:
private IList<Colour> GetDifference(IList<Colour> firstList, IList<Colour> secondList)
{
    // Create a new list
    var list = new List<Colour>();
    // Loop through the first list
    foreach (var first in firstList)
    {
        // Create a boolean and set to false
        var found = false;
        // Loop through the second list
        foreach (var second in secondList)
        {
            // If the first item id is the same as the second item id
            if (first.Id == second.Id)
            {
                // Mark it as being found
                found = true;
            }
        }
        // After we have looped through the second list, if we haven't found a match
        if (!found)
        {
            // Add the item to our list
            list.Add(first);
        }
    }
    // Return our differences
    return list;
}
Can this be converted to a LINQ expression easily?

What is Colour? If it overrides Equals to compare by Id then this would work:
firstList.Except(secondList);
If Colour does not override Equals or it would be wrong for you to do so in the wider context, you could implement an IEqualityComparer<Colour> and pass this as a parameter:
firstList.Except(secondList, comparer);
See the documentation
As noted in the comments below, Except has the added side effect of removing any duplicates in the source (firstList in this example). This may or may not be an issue to you, but should be considered.
If keeping any duplicates in firstList is of importance, then this is the alternative:
var secondSet = new HashSet<Colour>(secondList, comparer);
var result = firstList.Where(c => !secondSet.Contains(c));
As before, comparer is optional if Colour implements appropriate equality.
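A sketch of what such a comparer might look like, assuming a hypothetical Colour class with an int Id (the real type is not shown in the question). ColourIdComparer and the two helper methods are illustrative names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Colour type; only Id matters for the comparison.
public class Colour
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Treats two Colour objects as equal when their Ids match.
public sealed class ColourIdComparer : IEqualityComparer<Colour>
{
    public bool Equals(Colour x, Colour y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Colour obj) => obj.Id.GetHashCode();
}

public static class ColourDemo
{
    // Keeps duplicates from the first list (HashSet + Where).
    public static List<int> DifferenceKeepingDuplicates(
        IEnumerable<Colour> first, IEnumerable<Colour> second)
    {
        var comparer = new ColourIdComparer();
        var secondSet = new HashSet<Colour>(second, comparer);
        return first.Where(c => !secondSet.Contains(c)).Select(c => c.Id).ToList();
    }

    // Uses Except, which also collapses duplicates from the first list.
    public static List<int> DifferenceWithExcept(
        IEnumerable<Colour> first, IEnumerable<Colour> second)
    {
        return first.Except(second, new ColourIdComparer()).Select(c => c.Id).ToList();
    }

    public static void Main()
    {
        var first = new[] { new Colour { Id = 1 }, new Colour { Id = 1 }, new Colour { Id = 3 } };
        var second = new[] { new Colour { Id = 3 } };

        Console.WriteLine(string.Join(", ", DifferenceKeepingDuplicates(first, second))); // 1, 1
        Console.WriteLine(string.Join(", ", DifferenceWithExcept(first, second)));        // 1
    }
}
```

Both helpers return Ids only to keep the output easy to read; returning the Colour objects themselves works the same way.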

Try the following:
var result = firstList.Where(x => !secondList.Any(y => y.Id == x.Id));
Edit:
If you care about runtime and don't mind creating your own IEqualityComparer<>, I would suggest you use Except as Charles suggested in his answer. Except seems to use a hash table for the second list, which speeds it up quite a bit compared to my O(n*m) query. However, be aware that Except removes duplicates from firstList as well.

What is the correct way of ignoring arguments in LINQ?

I have the following code:
foreach (var b in userNames.Select(a => new User()))
{
...
}
This works quite well, since it gives me all "fresh" user objects. However, Code Analysis complains that I shouldn't create unused locals, so my question is: is there a way of ignoring the arguments (similar to the "_" in Haskell)?
PS: perhaps my example is not the best. I am sorry for this.
Thanks!
Update 1
I got the following code analysis error:
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1804:RemoveUnusedLocals", MessageId = "a"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1804:RemoveUnusedLocals", MessageId = "b")]

_ is a perfectly valid variable name in C#. So writing
foreach(var b in userNames.Select(_ => new User()))
{
}
is perfectly valid code. It depends on your analysis rules whether it accepts such cases or not.
However, your code is indeed quite suspicious: you're mapping a collection of user names to a collection of users but you're not specifying a direct relation between the two: maybe you wanted to write something like this:
foreach(var b in userNames.Select(username => new User(username)))

To create a collection of objects of a given size, just use the length from the original collection.
var newCollection = Enumerable.Repeat(false, input.Length)
    .Select(_ => new User());
but perhaps better would be your own helper method
static class MyEnumerable {
    // Calls generator once per element, so every item is a fresh object.
    public static IEnumerable<T> Repeat<T>(Func<T> generator, int count) {
        for (var i = 0; i < count; ++i) {
            yield return generator();
        }
    }
}
and then use
var newCollection = MyEnumerable.Repeat(() => new User(), oldCollection.Length);

If quantity is your concern and you need LINQ, rewrite it as
foreach (var user in Enumerable.Repeat<Func<User>>(() => new User(), userNames.Count).Select(x => x()))
{
}
But it may look ugly, depending on how you see it.

I'm sure there is a valid case where you would want to ignore arguments like this, but this doesn't seem like one of them. If you are creating N User objects for N userNames, surely you want to couple those together?
foreach (var b in userNames.Select(a => new { name = a, user = new User() }))
{
...
}
Then you won't have any unused arguments.
But the question remains why you aren't just doing this:
foreach (var name in userNames)
{
var user = new User();
// ...
}
As far as I can see, your use of .Select() makes no sense here.

Select performs a projection on the collection on which it is called, applying a specified operation to each element and returning the transformed results as a new sequence. If you do not need to do anything with the lambda's argument, you are better off simply creating an array of User objects directly.
Answering your edit, as I said above, you can simply do this:
var NewColl = new User[userNames.Length];
Note that this only allocates the array; every slot is null until you assign a User to it.
As for initialization, you could have done this:
Enumerable.Repeat<User>(new User(), userNames.Length);
Be aware that this repeats a reference to one and the same User instance; it does not create a new User per element.

LINQ, Lambda, confused

Can anybody explain why I don't see my expected output from the WriteLine?
I can see it when I'm debugging in VS and refresh 'result' to see its content in the Locals window.
Thanks
Func<Category, bool> del = (Category cat) => {
System.Console.WriteLine(cat.CategoryName);
return cat.CategoryID > 1;
};
NorthwindEntities nw = new NorthwindEntities();
var result = nw.Categories.Where<Category>(del);
Console.Read();

LINQ structures are lazy-evaluated, which means that your lambda function will not be invoked until items are requested from the enumeration (and even then, not necessarily all at once). This should cause the values to be output to the console:
var result = nw.Categories.Where<Category>(del).ToList();
Please note the implications here: if you did this, the values would be output to the console twice:
var result = nw.Categories.Where<Category>(del);
var otherVariable = result.ToList();
foreach(var item in result)
{
// do something
}
This is a good reason why you should avoid involving code with side-effects in your LINQ queries.
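To make the double execution concrete, here is a small sketch that counts predicate calls instead of writing to the console. It uses an in-memory array rather than the NorthwindEntities context from the question, and LazyCallCounter is a made-up name for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyCallCounter
{
    // Returns the number of predicate calls seen right after the query is
    // defined, after ToList(), and after a further foreach over the query.
    public static int[] CountCalls()
    {
        int calls = 0;
        Func<int, bool> predicate = n => { calls++; return n > 1; };

        var numbers = new[] { 1, 2, 3 };
        IEnumerable<int> query = numbers.Where(predicate);
        int afterDefinition = calls;      // nothing has run yet

        List<int> list = query.ToList();  // first enumeration
        int afterToList = calls;

        foreach (var item in query) { }   // second enumeration of the query
        int afterForeach = calls;

        return new[] { afterDefinition, afterToList, afterForeach };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountCalls())); // 0, 3, 6
    }
}
```

Iterating over list instead of query in the foreach would leave the count at 3, because the list already holds the filtered items.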

You need to do something with result in order for your lambda to execute. Try this:
var result = nw.Categories.Where<Category>(del);
foreach(var r in result)
{
}
As you enumerate over result your lambda will be called.

Perhaps you need to materialize the query. Your result is an IEnumerable, so the delegate will fire only when the result is actually enumerated.
Try this: var result = nw.Categories.Where<Category>(del).ToList();

This is due to lazy evaluation. The query hasn't actually been executed yet, and it won't be until you either enumerate it yourself or do something like this:
Category[] categories = nw.Categories.Where<Category>(del).ToArray();
Calling this will invoke the evaluation. You can read up about this on the web but here is an article to kick things off.

How do I write a C# lambda returning "true" at all times most elegantly?

I want to invoke Queryable.Where() and get all elements. There's no version of Where() that works without a predicate function, so I have to write this:
var result = table.Where( x => true );
and it works but that feels really stupid to me - x is never used, and there's no "transformation" for the => "arrow" symbol.
Is there a more elegant solution?

You can use the following, which is more elegant:
var result = table;
You could also omit result completely, and use table directly.

Isn't table.Where(x => true) essentially a no-op? I mean, what is the point? You can use _ instead of x though, which is idiomatic.
table.Where(_ => true);
But really, the following is what you are doing:
foreach (var item in table)
{
    if (true) // your Where() clause..
    {
        yield return item;
    }
}
See how it doesn't really make sense?

table.Where( x => true ) is not "returning all elements". It simply returns an enumerable that has enough information to return some subset of elements when it is being enumerated upon. Until you enumerate it, no elements are "returned".
And since this subset is not even proper in this case (i.e. all elements are returned), this is essentially a no-op.
To enumerate over all elements, write a simple foreach, or use ToList or ToArray or if you don't care about actually returning any elements (and just want to enumerate, presumably for side-effects): table.All(x => true) or table.Any(x => false), or even just table.Count().

In this case you would not need to call Where because you are not filtering the Queryable.
If you still wish to call Where and you do this in many places you could define a static Func and reuse that:
public static Func<int, bool> t = ReturnTrue;
public static bool ReturnTrue(int i)
{
return true;
}
table.Where(t);

If you're trying to get a copy of the contents of table instead of a reference,
var result = table.ToList();
but it's not clear if that's really what you're trying to accomplish. Details?

Looping on IEnumerator<T>, Any Suggestions

I am having a situation where looping through the result of LINQ is getting on my nerves. Well here is my scenario:
I have a DataTable, that comes from database, from which I am taking data as:
var results = from d in dtAllData.AsEnumerable()
              select new MyType
              {
                  ID = d.Field<Decimal>("ID"),
                  Name = d.Field<string>("Name")
              };
After doing the order by depending on the sort order as:
if(orderBy != "")
{
    string[] ord = orderBy.Split(' ');
    if (ord != null && ord.Length == 2 && ord[0] != "")
    {
        if (ord[1].ToLower() != "desc")
        {
            results = from sorted in results
                      orderby GetPropertyValue(sorted, ord[0])
                      select sorted;
        }
        else
        {
            results = from sorted in results
                      orderby GetPropertyValue(sorted, ord[0]) descending
                      select sorted;
        }
    }
}
The GetPropertyValue method is as:
private object GetPropertyValue(object obj, string property)
{
System.Reflection.PropertyInfo propertyInfo = obj.GetType().GetProperty(property);
return propertyInfo.GetValue(obj, null);
}
After this I take the 25 records for the first page like:
results = from sorted in results
.Skip(0)
.Take(25)
select sorted;
So far things are going well. Now I have to pass these results to a method which does some manipulation on the data and returns the desired data. In this method, when I loop over these 25 records, it takes a long time. My method definition is:
public MyTypeCollection GetMyTypes(IEnumerable<MyType> myData, String dateFormat, String offset)
I have tried foreach and it takes about 8-10 seconds on my machine. The time is spent at this line:
foreach(var _data in myData)
I tried a while loop and it does the same thing. I used it like:
var enumerator = myData.GetEnumerator();
while(enumerator.MoveNext())
{
    var item = enumerator.Current;
    Console.WriteLine(item);
}
This piece of code is taking time at MoveNext.
Then I went for a for loop like:
int length = myData.Count();
for (int i = 0; i < 25;i++ )
{
var temp = myData.ElementAt(i);
}
This code is taking time at ElementAt
Can anyone please tell me what I am doing wrong? I am using Framework 3.5 in VS 2008.
Thanks in advance

EDIT: I suspect the problem is in how you're ordering. You're using reflection to first fetch and then invoke a property for every record. Even though you only want the first 25 records, it has to call GetPropertyValue on all the records first, in order to order them.
It would be much better if you could do this without reflection at all... but if you do need to use reflection, at least call Type.GetProperty() once instead of for every record.
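A rough sketch of the "look the property up once" idea, assuming the MyType shape from the question (a decimal ID and a string Name). SortDemo and SortByProperty are hypothetical names; reflection is still used to read each value, but Type.GetProperty() now runs once per sort rather than once per record:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Assumed shape of MyType, based on the projection in the question.
public class MyType
{
    public decimal ID { get; set; }
    public string Name { get; set; } = "";
}

public static class SortDemo
{
    public static List<MyType> SortByProperty(
        IEnumerable<MyType> items, string propertyName, bool descending)
    {
        // Resolve the PropertyInfo a single time, outside the key selector.
        PropertyInfo property = typeof(MyType).GetProperty(propertyName);
        if (property == null)
            throw new ArgumentException("Unknown property: " + propertyName);

        // The selector only reads the value; it no longer searches for the property.
        Func<MyType, object> key = item => property.GetValue(item, null);

        var sorted = descending ? items.OrderByDescending(key) : items.OrderBy(key);
        return sorted.ToList();
    }

    public static void Main()
    {
        var data = new[]
        {
            new MyType { ID = 2, Name = "b" },
            new MyType { ID = 1, Name = "a" },
            new MyType { ID = 3, Name = "c" },
        };

        foreach (var item in SortByProperty(data, "Name", false))
            Console.WriteLine(item.ID + " " + item.Name);
    }
}
```

The key values are compared through their IComparable implementations, so this sketch only works for property types such as string or decimal that provide one.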
(In some ways this is more to do with helping you diagnose the problem more easily than a full answer as such...)
As Henk said, this is very odd:
results = from sorted in results
.Skip(0)
.Take(25)
select sorted;
You almost certainly really just want:
results = results.Take(25);
(Skip(0) is pointless.)
It may not actually help, but it will make the code simpler to debug.
The next problem is that we can't actually see all your code. You've written:
After doing the order by depending on the sort order
... but you haven't shown how you're performing the ordering.
You should show us a complete example going from DataTable to its use.
Changing how you iterate over the sequence will not help - it's going to do the same thing either way, really - although it's surprising that in your last attempt, Count() apparently works quickly. Stick to the foreach - but work out exactly what that's going to be doing. LINQ uses a lot of lazy evaluation, and if you've done something which makes that very heavy going, that could be the problem. It's hard to know without seeing the whole pipeline.

The problem is that your "results" IEnumerable isn't actually being evaluated until it is passed into your method and enumerated. That means that the whole operation, getting all the data from dtAllData, selecting out the new type (which is happening on the whole enumerable, not just the first 25), and then finally the take 25 operation, are all happening on the first enumeration of the IEnumerable (foreach, while, whatever).
That's why your method is taking so long. It's actually doing some of the work defined elsewhere inside the method. If you want that to happen before your method, you could do a "ToList()" prior to the method.

You might find it easier to adopt a hybrid approach;
In order:
1) Sort your datatable in-situ. It's probably best to do this at the database level, but, if you can't, then DataTable.DefaultView.Sort is pretty efficient:
dtAllData.DefaultView.Sort = ord[0] + " " + ord[1];
This assumes that ord[0] is the column name, and ord[1] is either ASC or DESC
2) Page through the DefaultView by index:
int pageStart = 0;
List<DataRowView> pageRows = new List<DataRowView>();
// Stop after 25 rows or at the end of the view, whichever comes first
for (int i = pageStart; i < dtAllData.DefaultView.Count && i < pageStart + 25; i++)
{
    pageRows.Add(dtAllData.DefaultView[i]);
}
...and create your objects from this much smaller list... (I've assumed the columns are called Id and Name, as well as the types)
List<MyType> myObjects = new List<MyType>();
foreach(DataRowView pageRow in pageRows)
{
    myObjects.Add(new MyType() { Id = Convert.ToInt32(pageRow["Id"]), Name = Convert.ToString(pageRow["Name"]) });
}
You can then proceed with the rest of what you were doing.
