Accessing IEnumerable clears content - c#

I've got a database (Cassandra) query that returns an IEnumerable. While trying to track down why it was returning no data (when I know there's data in the database), I found a curious issue.
The query does in fact return data: 25 entries. I was checking this with Data.Count(), but later in the code the data was empty. I realised that the Count method was inexplicably clearing the data.
After a quick investigation: any form of reading this data clears it completely. Even in the debugger, if I open the "Results View" to see the data, I initially see my 25 entries; then if I click away and reopen the Results View, it is empty.
Anyone ever had anything like this before?
String connectionString = "SELECT * FROM thedatabase WHERE thecondition";
RowSet Data = ExecuteComand(connectionString);
if (Data == null) // interestingly, I can check for null without issue
{
OutputMessage("Output to " + exportFile + " failed");
return;
}
int b = Data.Count(); // results in 25
int c = Data.Count(); // results in 0

I don't know Cassandra, but I assume the returned IEnumerable<T> is executed lazily (deferred). When you consume it (e.g. with foreach, ToList or Count), the query gets executed and the resources (e.g. the connection) are disposed/closed.
If that's the case, you could load it into an in-memory collection, for example:
var data = ExecuteComand("SELECT * FROM thedatabase WHERE thecondition").ToList();
Now you can use data.Count without "clearing" it.
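To illustrate why Count() seems to "clear" the data, here is a minimal sketch with a simulated forward-only source. The `ReadRows` iterator and its queue are hypothetical stand-ins for a driver result that hands out each row only once; they are not the Cassandra driver's actual implementation. The first Count() drains the source, the second finds nothing, and materializing with ToList() gives a stable collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical forward-only source: rows are dequeued as they are read,
// so a row that has been read is gone for every later enumeration.
var pending = new Queue<int>(Enumerable.Range(1, 25));

IEnumerable<int> ReadRows()
{
    while (pending.Count > 0)
        yield return pending.Dequeue();
}

IEnumerable<int> data = ReadRows();
int b = data.Count(); // drains the queue: 25
int c = data.Count(); // nothing left: 0

// Refill, then materialize once; the List<int> can be read as often as needed.
pending = new Queue<int>(Enumerable.Range(1, 25));
List<int> cached = ReadRows().ToList();
int d = cached.Count;
int e = cached.Count;

Console.WriteLine($"{b} {c} {d} {e}"); // 25 0 25 25
```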

There is no guarantee that an enumerable can be enumerated twice. A correctly built single-use enumerable should at least throw an exception on the second attempt, but bad code can always happen. What you can do is materialize it into a well-behaved collection, a List<T> for example:
var data2 = Data.ToList();
int b = data2.Count;
int c = data2.Count;
Now you even have the advantage that Count is O(1) instead of O(N) :-) (in a List<T> the Count value is "cached")
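For comparison, a sketch of the better-behaved variant described above: a single-use sequence that throws on a second enumeration instead of silently returning nothing. `OnceOnly` is a hypothetical wrapper written for illustration. Because iterator bodies run lazily, the exception surfaces on the first MoveNext of the second pass, i.e. inside the second Count() call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

bool consumed = false;

// Hypothetical wrapper: allows exactly one enumeration and fails loudly afterwards.
IEnumerable<int> OnceOnly(IEnumerable<int> source)
{
    // Iterator bodies are deferred, so this check runs on the first MoveNext of each pass.
    if (consumed)
        throw new InvalidOperationException("This sequence can only be enumerated once.");
    consumed = true;
    foreach (int item in source)
        yield return item;
}

IEnumerable<int> rows = OnceOnly(Enumerable.Range(1, 25));
int first = rows.Count();

bool threw = false;
try
{
    rows.Count(); // second pass over the same sequence
}
catch (InvalidOperationException)
{
    threw = true;
}

// Materializing once sidesteps the problem entirely.
consumed = false;
List<int> list = OnceOnly(Enumerable.Range(1, 25)).ToList();
Console.WriteLine($"first={first}, secondPassThrew={threw}, list.Count={list.Count}");
```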

Related

ViewData[var] being populated by the same SelectList, not a different one

I have a list of Band Titles, and I wish to attach a SelectList with each one of them, depending on the BandID.
So first I get the list:
List<BandQuestionTitles> bandQuestTitles = viewModel.PopulateBandQuestionTitles();
and then I loop over the BandQuestionTitles to populate a ViewData[var] from a SelectList:
foreach (var bandQuestTitleItem in bandQuestTitles)
{
//populate the dropdownlist
string strViewDataString = bandQuestTitleItem.BandQuestTitlesText + "Data";
ViewData[strViewDataString] = new SelectList(viewModel.bandQuestionList.Where(p => p.BandQuestTitleID == bandQuestTitleItem.BandQuestTitlesID), "BandQuestID", "BandQuestText");
}
However, for some reason, although I correctly get the 7 ViewData entries, they all contain the same SelectList.
When I hard-code it, it works fine:
ViewData["PersonalData"] = new SelectList(viewModel.bandQuestionList.Where(p => p.BandQuestTitleID == 1), "BandQuestID", "BandQuestText");
ViewData["BusinessData"] = new SelectList(viewModel.bandQuestionList.Where(p => p.BandQuestTitleID == 2), "BandQuestID", "BandQuestText");
What am I doing wrong in the loop?
Thanks for your help and time
Looks like a problem with your LINQ not being executed when you think it is.
Try this:
ViewData[strViewDataString] = new SelectList(viewModel.bandQuestionList.Where(p => p.BandQuestTitleID == bandQuestTitleItem.BandQuestTitlesID).ToList(), "BandQuestID", "BandQuestText"); // .ToList() runs the query now
Relevant article from MS:
http://msdn.microsoft.com/en-us/library/bb738633.aspx
In a query that returns a sequence of values, the query variable itself never holds the query results and only stores the query commands. Execution of the query is deferred until the query variable is iterated over in a foreach or For Each loop. This is known as deferred execution; that is, query execution occurs some time after the query is constructed.
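In other words, by the time the queries are executed, bandQuestTitleItem.BandQuestTitlesID holds the last (7th) ID in your collection, so all of the queries use it. (This applies to C# 4 and earlier, where a single foreach variable is shared by all iterations and captured by the lambda; since C# 5, each iteration gets its own variable.)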
Adding the .ToList() will cause the queries to execute immediately.
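The same trap can be reproduced in a small console sketch with no MVC involved. The data and the `currentId` variable are hypothetical; `currentId` is declared outside the loop to mimic the single shared foreach variable of C# 4 and earlier. Each deferred query captures the variable rather than its value, so when the queries finally run they all filter on the last ID, while calling ToList() inside the loop snapshots the intended results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (question ID, title ID) pairs standing in for bandQuestionList.
var questions = new[] { (Id: 10, TitleId: 1), (Id: 11, TitleId: 1), (Id: 20, TitleId: 2), (Id: 30, TitleId: 3) };
int[] titleIds = { 1, 2, 3 };

var deferred = new List<IEnumerable<int>>();
var materialized = new List<List<int>>();

// One shared variable, like the pre-C# 5 foreach variable.
int currentId = 0;
foreach (int titleId in titleIds)
{
    currentId = titleId;
    // Captures the variable currentId; the filter runs only when enumerated later.
    deferred.Add(questions.Where(q => q.TitleId == currentId).Select(q => q.Id));
    // Runs the query now, while currentId still holds this iteration's value.
    materialized.Add(questions.Where(q => q.TitleId == currentId).Select(q => q.Id).ToList());
}

// currentId is now 3, so every deferred query returns only the title-3 questions.
foreach (var ids in deferred) Console.WriteLine("deferred: " + string.Join(",", ids));
foreach (var ids in materialized) Console.WriteLine("materialized: " + string.Join(",", ids));
```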

Looping on IEnumerator<T>, Any Suggestions

Looping through the results of a LINQ query is getting on my nerves. Here is my scenario:
I have a DataTable that comes from the database, and I take data from it like this:
var results = from d in dtAllData.AsEnumerable()
select new MyType
{
ID = d.Field<Decimal>("ID"),
Name = d.Field<string>("Name")
};
After that, I apply the ordering, depending on the sort order:
if(orderBy != "")
{
string[] ord = orderBy.Split(' ');
if (ord != null && ord.Length == 2 && ord[0] != "")
{
if (ord[1].ToLower() != "desc")
{
results = from sorted in results
orderby GetPropertyValue(sorted, ord[0])
select sorted;
}
else
{
results = from sorted in results
orderby GetPropertyValue(sorted, ord[0]) descending
select sorted;
}
}
}
The GetPropertyValue method is:
private object GetPropertyValue(object obj, string property)
{
System.Reflection.PropertyInfo propertyInfo = obj.GetType().GetProperty(property);
return propertyInfo.GetValue(obj, null);
}
After this, I take 25 records for the first page like this:
results = from sorted in results
.Skip(0)
.Take(25)
select sorted;
So far things are going well. Now I have to pass these results to a method that does some manipulation on the data and returns the desired data. In this method, looping over these 25 records takes a long time. My method definition is:
public MyTypeCollection GetMyTypes(IEnumerable<MyType> myData, String dateFormat, String offset)
I have tried foreach and it takes about 8-10 seconds on my machine; the time is spent at this line:
foreach(var _data in myData)
I tried a while loop and it does the same thing. I used it like this:
var enumerator = myData.GetEnumerator();
while (enumerator.MoveNext())
{
MyType n = enumerator.Current;
Console.WriteLine(n.Name);
}
This code spends its time in MoveNext.
Then I tried a for loop like this:
int length = myData.Count();
for (int i = 0; i < length; i++)
{
var temp = myData.ElementAt(i);
}
This code spends its time in ElementAt.
Can anyone please tell me what I am doing wrong? I am using .NET Framework 3.5 in VS 2008.
Thanks in advance
EDIT: I suspect the problem is in how you're ordering. You're using reflection to first fetch and then invoke a property for every record. Even though you only want the first 25 records, it has to call GetPropertyValue on all the records first, in order to order them.
It would be much better if you could do this without reflection at all... but if you do need to use reflection, at least call Type.GetProperty() once instead of for every record.
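A minimal sketch of that advice, assuming the sort column arrives as a property name: the PropertyInfo is looked up once and reused by the key selector. `OrderByProperty` is a hypothetical helper, not a framework API, and anonymous-type rows stand in for MyType. Note that GetValue still runs once per element while sorting; mapping the column name to a strongly typed lambda would avoid reflection altogether.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical rows with the same shape as MyType (ID, Name).
var rows = new[]
{
    new { ID = 3m, Name = "Carol" },
    new { ID = 1m, Name = "Alice" },
    new { ID = 2m, Name = "Bob" },
};

// Hypothetical helper: resolves the property a single time, not once per record.
IEnumerable<T> OrderByProperty<T>(IEnumerable<T> source, string property, bool descending)
{
    PropertyInfo info = typeof(T).GetProperty(property)
        ?? throw new ArgumentException($"No property named {property}", nameof(property));
    Func<T, object> key = item => info.GetValue(item, null);
    return descending ? source.OrderByDescending(key) : source.OrderBy(key);
}

var byName = OrderByProperty(rows, "Name", descending: false).Take(25).ToList();
var byIdDesc = OrderByProperty(rows, "ID", descending: true).Take(25).ToList();

Console.WriteLine(string.Join(", ", byName.Select(r => r.Name)));
Console.WriteLine(string.Join(", ", byIdDesc.Select(r => r.ID)));
```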
(In some ways this is more to do with helping you diagnose the problem more easily than a full answer as such...)
As Henk said, this is very odd:
results = from sorted in results
.Skip(0)
.Take(25)
select sorted;
You almost certainly really just want:
results = results.Take(25);
(Skip(0) is pointless.)
It may not actually help, but it will make the code simpler to debug.
The next problem is that we can't actually see all your code. You've written:
After doing the order by depending on the sort order
... but you haven't shown how you're performing the ordering.
You should show us a complete example going from DataTable to its use.
Changing how you iterate over the sequence will not help - it's going to do the same thing either way, really - although it's surprising that in your last attempt, Count() apparently works quickly. Stick to the foreach - but work out exactly what that's going to be doing. LINQ uses a lot of lazy evaluation, and if you've done something which makes that very heavy going, that could be the problem. It's hard to know without seeing the whole pipeline.
The problem is that your "results" IEnumerable isn't actually evaluated until it is passed into your method and enumerated. That means the whole operation (getting all the data from dtAllData, selecting out the new type, which happens for the whole enumerable and not just the first 25, sorting it, and finally taking 25) happens on the first enumeration of the IEnumerable (foreach, while, whatever).
That's why your method is taking so long: it is actually doing, inside the method, work that was defined elsewhere. If you want that work to happen before your method runs, call ToList() before passing the results in.
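The deferred evaluation described here can be made visible with a counter. In this sketch, `rows`, the `projections` counter and `ProcessPage` are hypothetical stand-ins for dtAllData, the MyType projection and GetMyTypes. Building the query does no work; the first foreach inside the consuming method runs the whole pipeline, and because of the sort every row is projected even though only 25 are taken. Calling ToList() first moves that cost to the caller; it does not make it smaller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int projections = 0;
int[] rows = Enumerable.Range(1, 1000).ToArray(); // hypothetical stand-in for dtAllData rows

// Nothing runs here: this only describes the pipeline.
IEnumerable<(int ID, string Name)> results = rows
    .Select(r => { projections++; return (ID: r, Name: "Row " + r); })
    .OrderByDescending(x => x.ID)
    .Take(25);

int afterBuild = projections; // still 0

// Hypothetical consumer in the role of GetMyTypes.
int ProcessPage(IEnumerable<(int ID, string Name)> items)
{
    int seen = 0;
    foreach (var item in items) // the deferred work actually happens here
        seen++;
    return seen;
}

int seenCount = ProcessPage(results);
int afterConsume = projections; // every row was projected so that it could be sorted

// Materializing first: the cost is paid by ToList(), before the method is called.
projections = 0;
List<(int ID, string Name)> page = results.ToList();
int afterToList = projections;
ProcessPage(page);
int afterSecondConsume = projections; // unchanged: the list is already in memory

Console.WriteLine($"{afterBuild} {seenCount} {afterConsume} {afterToList} {afterSecondConsume}");
```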
You might find it easier to adopt a hybrid approach:
In order:
1) Sort your DataTable in place. It's probably best to do this at the database level, but if you can't, DataTable.DefaultView.Sort is pretty efficient:
dtAllData.DefaultView.Sort = ord[0] + " " + ord[1];
This assumes that ord[0] is the column name and ord[1] is either ASC or DESC.
2) Page through the DefaultView by index:
int pageStart = 0;
int pageSize = 25;
List<DataRowView> pageRows = new List<DataRowView>();
// Stop at the end of the page or at the last row, whichever comes first
int pageEnd = Math.Min(pageStart + pageSize, dtAllData.DefaultView.Count);
for (int i = pageStart; i < pageEnd; i++)
{
pageRows.Add(dtAllData.DefaultView[i]);
}
3) Create your objects from this much smaller list (this uses the ID and Name columns and the MyType class from the question):
List<MyType> myObjects = new List<MyType>();
foreach (DataRowView pageRow in pageRows)
{
myObjects.Add(new MyType { ID = Convert.ToDecimal(pageRow["ID"]), Name = Convert.ToString(pageRow["Name"]) });
}
You can then proceed with the rest of what you were doing.
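Putting the three steps together, here is a self-contained sketch. It assumes the ID (decimal) and Name (string) columns from the question, fills the table with made-up rows, and uses value tuples in place of MyType. DataView.Sort takes a column name followed by ASC or DESC, and indexing DefaultView returns rows in that sorted order.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical table with the question's column names and types.
var dtAllData = new DataTable();
dtAllData.Columns.Add("ID", typeof(decimal));
dtAllData.Columns.Add("Name", typeof(string));
for (int n = 1; n <= 60; n++)
    dtAllData.Rows.Add((decimal)n, "Item " + n);

// 1) Let the DataView do the sorting ("column ASC|DESC").
dtAllData.DefaultView.Sort = "ID DESC";

// 2) Work out the bounds of the requested page.
int pageIndex = 1; // second page, zero-based
int pageSize = 25;
int pageStart = pageIndex * pageSize;
int pageEnd = Math.Min(pageStart + pageSize, dtAllData.DefaultView.Count);

// 3) Build objects only for the rows on that page (tuples stand in for MyType).
var page = new List<(decimal ID, string Name)>();
for (int i = pageStart; i < pageEnd; i++)
{
    DataRowView row = dtAllData.DefaultView[i];
    page.Add((Convert.ToDecimal(row["ID"]), (string)row["Name"]));
}

Console.WriteLine($"{page.Count} rows, first ID {page[0].ID}, last ID {page[^1].ID}");
```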

Updating an item property within IEnumerable but the property doesn't stay set?

I have two tables: Transactions and TransactionAgents. TransactionAgents has a foreign key to Transactions called TransactionID. Pretty standard.
I also have this code:
BrokerManagerDataContext db = new BrokerManagerDataContext();
var transactions = from t in db.Transactions
where t.SellingPrice != 0
select t;
var taAgents = from ta in db.TransactionAgents
select ta;
foreach (var transaction in transactions)
{
foreach(var agent in taAgents)
{
agent.AgentCommission = ((transaction.CommissionPercent / 100) * (agent.CommissionPercent / 100) * transaction.SellingPrice) - agent.BrokerageSplit;
}
}
dataGridView1.DataSource = taAgents;
Basically, a TransactionAgent has a property/column named AgentCommission, which is null for all TransactionAgents in my database.
My goal is to perform the math you see in the foreach(var agent in taAgents) to patch up the value for each agent so that it isn't null.
Oddly, when I run this code and set a breakpoint on agent.AgentCommission = (formula), it shows the value being calculated for AgentCommission and the object being updated, but when it displays in my DataGridView (used only for testing), it does not show the calculated value.
So, to me, it seems that the property isn't being permanently set on the object. What's more, if I persist this newly updated object back to the database with an update, I doubt the calculated AgentCommission will be saved there.
Without having my table set up the same way, is there anyone that can look at the code and see why I am not retaining the property's value?
IEnumerable<T>s do not guarantee that updated values will persist across enumerations. For instance, a List<T> returns the same set of objects on every iteration, so if you update a property, the change is still there on the next iteration. However, many other IEnumerable<T> implementations return a new set of objects each time, so any changes made will not persist.
If you need to store and update the results, pull the IEnumerable<T> down to a List<T> using .ToList() or project it into a new IEnumerable<T> using .Select() with the changes applied.
To specifically apply that to your code, it would look like this:
var transactions = (from t in db.Transactions
where t.SellingPrice != 0
select t).ToList();
var taAgents = (from ta in db.TransactionAgents
select ta).ToList();
foreach (var transaction in transactions)
{
foreach(var agent in taAgents)
{
agent.AgentCommission = ((transaction.CommissionPercent / 100) * (agent.CommissionPercent / 100) * transaction.SellingPrice) - agent.BrokerageSplit;
}
}
dataGridView1.DataSource = taAgents;
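The difference can be reproduced without a database. In this sketch, which is illustrative rather than LINQ to SQL itself, an in-memory array stands in for the TransactionAgents table, each agent is a mutable dictionary standing in for an entity object, and the commission inputs are made-up values. Updates made while enumerating the deferred query are lost on the next enumeration, whereas the objects in the ToList() snapshot keep their values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored rows: (CommissionPercent, BrokerageSplit) per agent.
var storedRows = new[] { (Percent: 50m, Split: 1000m), (Percent: 25m, Split: 0m) };

// Each agent is a mutable dictionary standing in for an entity object.
Dictionary<string, decimal?> ToAgent((decimal Percent, decimal Split) row) =>
    new Dictionary<string, decimal?>
    {
        ["CommissionPercent"] = row.Percent,
        ["BrokerageSplit"] = row.Split,
        ["AgentCommission"] = null,
    };

void ApplyCommission(IEnumerable<Dictionary<string, decimal?>> agents)
{
    const decimal transactionPercent = 6m, sellingPrice = 200000m; // hypothetical transaction
    foreach (var agent in agents)
        agent["AgentCommission"] = (transactionPercent / 100) * (agent["CommissionPercent"] / 100) * sellingPrice - agent["BrokerageSplit"];
}

// Deferred: every enumeration re-runs Select and builds brand-new objects.
IEnumerable<Dictionary<string, decimal?>> deferredAgents = storedRows.Select(r => ToAgent(r));
ApplyCommission(deferredAgents);
bool deferredLost = deferredAgents.All(a => a["AgentCommission"] == null);

// Materialized: the same objects are updated and then read back.
List<Dictionary<string, decimal?>> cachedAgents = storedRows.Select(r => ToAgent(r)).ToList();
ApplyCommission(cachedAgents);

Console.WriteLine($"deferred values lost: {deferredLost}");
Console.WriteLine(string.Join(", ", cachedAgents.Select(a => a["AgentCommission"])));
```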
Specifically, the problem is that each time you access the IEnumerable, it enumerates over the collection. In this case, the collection is a call to the database. In the first part, you're getting the values from the database and updating them. In the second part, you're getting the values from the database again and setting that as the datasource (or, pedantically, you're setting the enumerator as the datasource, and then that is getting the values from the database).
Use .ToList() or similar to keep the results in memory, and access the same collection every time.
Assuming you are using LINQ to SQL: if ObjectTrackingEnabled is false on the DataContext, the objects are constructed anew every time the query is run. Otherwise, you get the same object instances each time and your changes survive. However, as others have shown, instead of executing the query multiple times, cache the results in a list. Not only will you get what you want working, you'll also make fewer database round trips.
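The identity-map behaviour mentioned here can be simulated to show why tracking makes changes survive. This is a hypothetical sketch, not how LINQ to SQL is implemented: the `tracked` dictionary plays the role of the DataContext's object cache and hands back the existing instance for a key instead of building a new one. The query is still re-run on every enumeration, but it returns the same objects, so the update remains visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored rows: (ID, CommissionPercent).
var storedRows = new[] { (ID: 1, Percent: 50m), (ID: 2, Percent: 25m) };

// Simulated identity map: one object per primary key, reused across queries.
var tracked = new Dictionary<int, Dictionary<string, decimal?>>();

Dictionary<string, decimal?> Materialize((int ID, decimal Percent) row)
{
    if (!tracked.TryGetValue(row.ID, out var existing))
    {
        existing = new Dictionary<string, decimal?> { ["CommissionPercent"] = row.Percent, ["AgentCommission"] = null };
        tracked[row.ID] = existing;
    }
    return existing;
}

// Still deferred: the query re-runs on each enumeration, but returns cached instances.
IEnumerable<Dictionary<string, decimal?>> agents = storedRows.Select(r => Materialize(r));

foreach (var agent in agents)
    agent["AgentCommission"] = agent["CommissionPercent"] * 10;

// The second enumeration sees the updated values because the instances are the same.
bool survived = agents.All(a => a["AgentCommission"] != null);
Console.WriteLine($"changes survived: {survived}");
```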
I found that I had to locate the item in the list that I wanted to modify, extract the copy, modify the copy (by incrementing its count property), remove the original from the list and add the modified copy.
var x = stats.Where(d => d.word == s).FirstOrDefault();
var statCount = stats.IndexOf(x);
x.count++;
stats.RemoveAt(statCount);
stats.Add(x);
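The extract, modify and re-insert dance is only needed when the list holds a value type (struct): FirstOrDefault or the indexer then returns a copy, so changing it leaves the list untouched until the copy is written back. For a class, `x` already refers to the object in the list and `x.count++` is enough. The sketch below uses a value tuple as a hypothetical stand-in for the element type of `stats` and writes the copy back through the indexer, which also keeps the item in its original position rather than moving it to the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples are structs, so reading an element yields a copy.
var stats = new List<(string word, int count)> { ("apple", 1), ("pear", 4), ("plum", 2) };
string s = "pear";

bool copyOnlyChanged = false;
int index = stats.FindIndex(d => d.word == s);
if (index >= 0) // guard against a word that is not in the list
{
    var x = stats[index]; // a copy of the element
    x.count++;            // changes the copy only
    copyOnlyChanged = stats[index].count == 4 && x.count == 5;
    stats[index] = x;     // write the modified copy back, in place
}

Console.WriteLine(string.Join(", ", stats.Select(t => $"{t.word}={t.count}"))); // apple=1, pear=5, plum=2
```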
It is helpful to rewrite your LINQ expression using lambdas so that we can consider the code in more explicit terms.
//Original code from question
var taAgents = from ta in db.TransactionAgents
select ta;
//Rewritten to explicitly call attention to what Select() is actually doing
var taAgents = db.TransactionAgents.Select(ta => new TransactionAgents(/*database row's data*/));
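In the rewritten code, we can clearly see that Select() constructs a new object based on each row returned from the database. What's more, this object construction occurs every time the IEnumerable taAgents is iterated through (assuming each enumeration materializes fresh objects, which in LINQ to SQL is the case when object tracking is disabled).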
So, explained more concretely, if there are 5 TransactionAgents rows in the database, in the following example, the TransactionAgents() constructor is called a total of 10 times.
// Assume there are 5 rows in the TransactionAgents table
var taAgents = from ta in db.TransactionAgents
select ta;
//foreach will iterate through the IEnumerable, thus calling the TransactionAgents() constructor 5 times
foreach(var ta in taAgents)
{
Console.WriteLine($"first iteration through taAgents - element {ta}");
}
// these first 5 TransactionAgents objects are now unreachable and eligible for garbage collection
//foreach will iterate through the IEnumerable, thus calling the TransactionAgents() constructor 5 MORE times
foreach(var ta in taAgents)
{
Console.WriteLine($"second iteration through taAgents - element {ta}");
}
// these second 5 TransactionAgents objects are now unreachable and eligible for garbage collection
As we can see, all 10 of our TransactionAgents objects were created by the lambda in our Select() method, and do not exist outside of the scope of the foreach statement.
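The counting argument can be checked directly. In this sketch, an array of 5 integers stands in for the table and a hypothetical `NewAgent` function plays the role of the TransactionAgents constructor, counting each object it creates. Two passes over the deferred query build 10 objects; a ToList() snapshot builds 5 once and reuses them on both passes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int constructed = 0;
int[] tableRows = { 1, 2, 3, 4, 5 }; // assume 5 rows in the table

// Stand-in for the entity constructor; counts every object created.
object NewAgent(int row) { constructed++; return new { Row = row }; }

IEnumerable<object> taAgents = tableRows.Select(r => NewAgent(r));
foreach (var ta in taAgents) { } // builds 5 objects
foreach (var ta in taAgents) { } // builds 5 more objects
int deferredCount = constructed;

constructed = 0;
List<object> cached = tableRows.Select(r => NewAgent(r)).ToList(); // builds 5 objects once
foreach (var ta in cached) { }
foreach (var ta in cached) { }
int cachedCount = constructed;

Console.WriteLine($"deferred: {deferredCount}, cached: {cachedCount}"); // deferred: 10, cached: 5
```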

LINQ ToList().Take(10) vs Take(10).ToList(): which one generates a more efficient query?

Given the following LINQ Statement(s), which will be more efficient?
ONE:
public List<Log> GetLatestLogEntries()
{
var logEntries = from entry in db.Logs
select entry;
return logEntries.ToList().Take(10);
}
TWO:
public List<Log> GetLatestLogEntries()
{
var logEntries = from entry in db.Logs
select entry;
return logEntries.Take(10).ToList();
}
I am aware that .ToList() executes the query immediately.
The first version wouldn't even compile - because the return value of Take is an IEnumerable<T>, not a List<T>. So you'd need it to be:
public List<Log> GetLatestLogEntries()
{
var logEntries = from entry in db.Logs
select entry;
return logEntries.ToList().Take(10).ToList();
}
That would fetch all the data from the database and convert it to a list, then take the first 10 entries, then convert it to a list again.
Getting the Take(10) to occur in the database (i.e. the second form) certainly looks a heck of a lot cheaper to me...
Note that there's no Queryable.ToList() method - you'll end up calling Enumerable.ToList() which will fetch all the entries. In other words, the call to ToList doesn't participate in SQL translation, whereas Take does.
Also note that using a query expression here doesn't make much sense either. I'd write it as:
public List<Log> GetLatestLogEntries()
{
return db.Logs.Take(10).ToList();
}
Mind you, you may want an OrderBy call - otherwise it'll just take the first 10 entries it finds, which may not be the latest ones...
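One way to see which calls take part in translation, without a database, is to use AsQueryable() over an in-memory list and inspect the results. In this sketch integers stand in for Log entries, with a higher value assumed to mean a newer entry. OrderByDescending and Take become part of the IQueryable expression tree that a provider such as LINQ to SQL would translate, while Take applied after ToList() is plain LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical log IDs; assume a higher ID means a newer entry.
IQueryable<int> logs = Enumerable.Range(1, 1000).ToList().AsQueryable();

// Both operators are recorded in the expression tree a provider would translate.
IQueryable<int> latest = logs.OrderByDescending(id => id).Take(10);
string expressionText = latest.Expression.ToString();

// After ToList() the data is already in memory; this Take is Enumerable.Take.
IEnumerable<int> afterToList = logs.ToList().Take(10);
bool stillQueryable = afterToList is IQueryable<int>;

List<int> result = latest.ToList();
Console.WriteLine(expressionText);
Console.WriteLine($"ToList().Take(10) still IQueryable: {stillQueryable}; newest: {result[0]}");
```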
Your first option won't work, because .Take(10) converts it to IEnumerable<Log>. Your return type is List<Log>, so you would have to do return logEntries.ToList().Take(10).ToList(), which is less efficient.
By doing .ToList().Take(10), you are forcing the .Take(10) to be LINQ to objects, while the other way the filter could be passed on to the database or other underlying data source. In other words, if you first do .ToList(), ALL the objects have to be transferred from the database and allocated in memory. THEN you filter to the first 10. If you're talking about millions of database rows (and objects) you can imagine how this is VERY inefficient and not scalable.
The second one will also run immediately because you have .ToList(), so no difference there.
The second version will be more efficient (in both time and memory usage). For example, imagine that you have a sequence containing 1,000,000 items:
The first version iterates through all 1,000,000 items, adding them to a list as it goes. Then, finally, it will take the first 10 items from that large list.
The second version only needs to iterate the first 10 items, adding them to a list as it goes. (The remaining 999,990 items don't even need to be considered.)
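The difference is easy to measure in LINQ to Objects. This sketch uses a hypothetical iterator that counts how many of its 1,000,000 items are actually produced: ToList().Take(10) forces all of them to be generated, while Take(10).ToList() stops after the first 10.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical source of 1,000,000 items that counts how many are actually produced.
IEnumerable<int> Source()
{
    for (int i = 0; i < 1_000_000; i++)
    {
        pulled++;
        yield return i;
    }
}

List<int> first = Source().ToList().Take(10).ToList();
int pulledEager = pulled; // every item was produced to build the big list

pulled = 0;
List<int> second = Source().Take(10).ToList();
int pulledLazy = pulled; // only the first 10 were produced

Console.WriteLine($"ToList().Take(10): {pulledEager} items produced; Take(10).ToList(): {pulledLazy}");
```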
How about this?
I have 5000 records in "items".
version 1:
IQueryable<T> items = Items; // my items
items = ApplyFilteringCriteria(items, filter); // my filter BL
items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
items = items.Skip(0);
items = items.Take(25);
return items.ToList();
This took 20 seconds on the server.
version 2:
IQueryable<T> items = Items; // my items
items = ApplyFilteringCriteria(items, filter); // my filter BL
items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
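List<T> x = items.ToList(); // runs the query: every filtered, sorted row is loaded into memory
items = x.Skip(0).ToList(); // note: a List<T> is not an IQueryable<T>, so this assignment does not compile as written
items = x.Take(25).ToList(); // same problem; the result is also never used
return x; // note: this returns every row in x, not the 25-row page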
This took 1 second on the server.
What do you think now? Any idea why?
The second option.
The first will evaluate the entire enumerable, slurping it into a List<T>; then you set up the iterator that will iterate through the first ten objects and then exit.
The second sets up the Take() iterator first, so whatever happens later than that, only 10 objects will be evaluated and sent to the "downstream" processing (in this case the ToList() which will take those ten elements and return them as the concrete List).

Help needed for optimizing linq data extraction

I'm fetching data from all 3 tables at once to avoid network latency. Fetching the data is pretty fast, but looping through the results takes a lot of time:
Int32[] arr = { 1 };
var query = from a in arr
select new
{
Basket = from b in ent.Basket
where b.SUPERBASKETID == parentId
select new
{
Basket = b,
ObjectTypeId = 0,
firstObjectId = "-1",
},
BasketImage = from b in ent.Image
where b.BASKETID == parentId
select new
{
Image = b,
ObjectTypeId = 1,
CheckedOutBy = b.CHECKEDOUTBY,
firstObjectId = b.FIRSTOBJECTID,
ParentBasket = (from parentBasket in ent.Basket
where parentBasket.ID == b.BASKETID
select parentBasket).ToList()[0],
},
BasketFile = from b in ent.BasketFile
where b.BASKETID == parentId
select new
{
BasketFile = b,
ObjectTypeId = 2,
CheckedOutBy = b.CHECKEDOUTBY,
firstObjectId = b.FIRSTOBJECTID,
ParentBasket = (from parentBasket in ent.Basket
where parentBasket.ID == b.BASKETID
select parentBasket),
}
};
//Exception handling
var mixedElements = query.First();
ICollection<BasketItem> basketItems = new Collection<BasketItem>();
//Here 15 millis has been used
//only 6 elements were found
if (mixedElements.Basket.Count() > 0)
{
foreach (var mixedBasket in mixedElements.Basket){}
}
if (mixedElements.BasketFile.Count() > 0)
{
foreach (var mixedBasketFile in mixedElements.BasketFile){}
}
if (mixedElements.BasketImage.Count() > 0)
{
foreach (var mixedBasketImage in mixedElements.BasketImage){}
}
//the empty loops take 811 millis!!
Why are you bothering to check the counts before the foreach statements? If there are no results, the foreach will just finish immediately.
Your queries are actually all being deferred - they'll be executed as and when you ask for the data. Don't forget that your outermost query is a LINQ to Objects query: it's just returning the result of calling ent.Basket.Where(...).Select(...) etc... which doesn't actually execute the query.
Your plan to do all three queries in one go isn't actually working. Worse, by asking for the count separately, you may be executing each database query twice: once just to get the count and once for the results.
I strongly suggest that you get rid of the "optimizations" in this code which are making it much more complicated and slower than just writing the simplest code you can.
I don't know of any way of getting LINQ to SQL (or LINQ to EF) to execute multiple queries in a single call - but this approach certainly isn't going to do it.
One other minor hint which is irrelevant in this case, but can be useful in LINQ to Objects - if you want to find out whether there's any data in a collection, just use Any() instead of Count() > 0 - that way it can stop as soon as it's found anything.
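A quick LINQ to Objects sketch of the Any() hint, using a hypothetical iterator that counts the elements it produces: Count() > 0 walks the whole sequence, while Any() stops after the first element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Hypothetical lazily produced sequence that counts every element it yields.
IEnumerable<int> Items()
{
    for (int i = 0; i < 1000; i++)
    {
        produced++;
        yield return i;
    }
}

bool hasAnyViaCount = Items().Count() > 0;
int producedByCount = produced; // the whole sequence was walked

produced = 0;
bool hasAnyViaAny = Items().Any();
int producedByAny = produced; // stopped after the first element

Console.WriteLine($"Count() > 0 produced {producedByCount} items; Any() produced {producedByAny}");
```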
You're using IEnumerable in the foreach loop. Implementations only have to prepare data when it's asked for, so I'd suggest that the above code is accessing your data lazily, that is, only when you enumerate the items (which actually happens when you call Count()).
Put a System.Diagnostics.Stopwatch around the call to Count() and see whether that's taking the bulk of the time you're seeing.
I can't comment further here because you don't specify the type of ent in your code sample.
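Following the Stopwatch suggestion, a minimal sketch: `SlowItems` is a hypothetical lazy source whose Thread.Sleep(50) stands in for fetching each element. Creating the sequence returns immediately, and the time shows up around the Count() call, which is where the deferred work actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical slow, lazily evaluated source: each element simulates a 50 ms fetch.
IEnumerable<int> SlowItems()
{
    for (int i = 0; i < 3; i++)
    {
        Thread.Sleep(50);
        yield return i;
    }
}

IEnumerable<int> items = SlowItems(); // returns immediately; nothing has been fetched yet

var stopwatch = Stopwatch.StartNew();
int count = items.Count();            // the fetching happens here
stopwatch.Stop();
TimeSpan countTime = stopwatch.Elapsed;

Console.WriteLine($"Count() = {count}, took {countTime.TotalMilliseconds:F0} ms");
```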
