I have objects from which measurements are saved to a single table. I want to find out how long an object has been in a certain state within a time period.
So in addition to getting the record with the wanted state, I need to pair it up with the next measurement made from the same object to calculate the time between them.
I came up with this monster:
// Get the object entry from the database
MeasuredObject object1;
try
{
object1 = (from getObject in db.MeasuredObject
           where wantedObject.Id.Equals(getObject.Id)
           select getObject).Single();
}
catch (System.InvalidOperationException e)
{
throw new System.ArgumentException("Object does not exist", "wantedObject", e);
}
// Get every measurement which matches the state in the time period and the next measurement from it
var pairs = (from m in object1.Measurements
join nextM in object1.Measurements
on (from next in object1.Measurements where (m.Id < next.Id) select next.Id).Min() equals nextM.Id
where 'm is in time period and has required state'
select new { meas = m, next = nextM });
I would say this doesn't seem very efficient, especially since I'm using SQL Server Compact Edition 3.5.
Is there any way to navigate to the next measurement through m or could I somehow use orderby or group to select next by Id? Or even make the join clause simpler?
From the posted code it looks like you are working with an in-memory collection. If that's true, then the following should be sufficient:
var items = (from m in object1.Measurements
where 'm is in time period and has required state'
orderby m.Id
select m)
.ToList();
var pairs = items.Select((item, index) => new
{
meas = item,
next = index + 1 < items.Count ? items[index + 1] : null
});
EDIT: The above is not the exact equivalent of your code because it applies the filter before pairing the items. The exact optimized equivalent would be like this:
var items = object1.Measurements.OrderBy(m => m.Id).ToList();
var pairs = items.Select((item, index) => new
{
meas = item,
next = index + 1 < items.Count ? items[index + 1] : null
})
.Where(pair => 'pair.meas is in time period and has required state');
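To get the actual time spent in the state, which was the original goal, the difference between each pair can then be summed. A minimal sketch, assuming a hypothetical DateTime property named Timestamp on the measurement entity (the post never names the time column):
// 'Timestamp' is a placeholder for whatever DateTime column the measurement rows actually have.
var durations = pairs
    .Where(p => p.next != null)                             // the last measurement has no successor
    .Select(p => new
    {
        Measurement = p.meas,
        TimeInState = p.next.Timestamp - p.meas.Timestamp   // TimeSpan between this and the next measurement
    });

var totalTimeInState = new TimeSpan(durations.Sum(d => d.TimeInState.Ticks));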
Whenever I'm adding a new object from the front end, the id = 0. In the WebApi layer, I'm trying to find the max ID that exists in the list of objects and then assign the next ID to the new objects. The code below doesn't increment the ID correctly:
List<Event> events = eventVal.Where(e => e != null).ToList();
int eventMaxID = events.Max(e => e.id);
events.Where(e => e.id == 0)
.Select((e, ixc) => new { id = eventMaxID + 1, Iter = eventMaxID + 1 })
.ToList();
I'm not sure how to use the second parameter for the Select method.
Any help would be appreciated! Thanks.
In the second form of Select that you're using, ixc is the index of the item in the collection. You'll need to add that as well as the previous max ID. That way you shouldn't have to worry about assigning to Iter either (it appears you're just using it as some kind of counter), so I've removed it.
var autoIncrementedEvents = events.Where(e => e.id == 0)
.Select((e, ixc) =>
{
e.id = eventMaxID + 1 + ixc;
return e;
})
.ToList();
Note that, the way your code is written, the result of this LINQ statement is thrown away. You'll want to assign it to something, as I've done above.
I'm not going to comment on the validity of this as an overall approach in a web setting (race conditions, duplicate IDs, etc.). Ideally your data store should be assigning the ID.
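For completeness, a minimal sketch of letting the database generate the key instead, assuming Entity Framework with a SQL Server identity column (none of this is in the original post):
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Event
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]   // the database assigns the id on insert
    public int id { get; set; }

    // ... other properties unchanged
}
With that in place, newly posted events can be inserted with id left at 0 and the increment logic in the WebApi layer goes away.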
What I have got so far is this:
var a = from e in tcdb.timeclockevent
group e by e.workerId into r
select new { workerId = r.Key, Date = r.Max(d => d.timestamp) };
This query gives me the latest "timestamp" for every workerId (note: workerId is not the primary key of tcdb.timeclockevent). So it only gives me pairs of two values, but I need the whole data sets.
Does anybody know how I can get the whole data sets of tcdb.timeclockevent with the maximal timestamp for every workerId?
OR
Does anybody know how I can get the Id of the data sets with the maximal date for each worker?
Thank you in advance :)
You can order your r grouping by timestamp descending and select the first one:
var a = from e in tcdb.timeclockevent
group e by e.workerId into r
select r.OrderByDescending(d => d.timestamp).FirstOrDefault();
Does anybody know how I can get the whole data sets of tcdb.timeclockevent with the maximal timestamp for every workerId?
Well, the straightforward query would be like this:
var queryA =
from e in tcdb.timeclockevent
group e by e.workerId into g
let maxDate = g.Max(e => e.timestamp)
select new { workerId = g.Key, events = g.Where(e => e.timestamp == maxDate) };
If you don't need an IQueryable<T> result (and since there is no SQL construct that directly returns the grouped result set), you could try the following query, which filters the records with the maximal timestamp for every workerId inside the database in a different way and then does the grouping in memory:
var queryB = tcdb.timeclockevent
.Where(e => !tcdb.timeclockevent.Any(e2 =>
e2.workerId == e.workerId && e2.timestamp > e.timestamp))
.AsEnumerable()
.GroupBy(e => e.workerId);
You can try and see which one performs better with your data.
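For illustration, here is how the two results might be consumed; note that the shapes differ slightly (queryA yields an anonymous { workerId, events } pair, while queryB yields an IGrouping keyed by workerId):
// queryA: anonymous pairs of the key plus the matching events
foreach (var w in queryA)
    Console.WriteLine("{0}: {1} event(s) at the latest timestamp", w.workerId, w.events.Count());

// queryB: standard IGrouping<TKey, TElement> groups
foreach (var g in queryB)
    Console.WriteLine("{0}: {1} event(s) at the latest timestamp", g.Key, g.Count());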
Here's a stumper in LINQ to SQL:
string p = prefix ?? "";
string d = delimiter ?? "";
var filegroups = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
group b by data.DataContext.GetFileFolder(p, d, b.Uri);
//var folders = from g in filegroups where g.Key.Length > 0 select g;
//var files = from g in filegroups where g.Key.Length == 0 select g;
var files = filegroups.SelectMany(g => g.Key.Length > 0
? from b in g.Take(1) select new FilePrefix { Name = g.Key }
: from b in g select new FilePrefix { Name = b.Uri, Original = b });
var retval = files.Take(maxresults);
Folders cannot be nested (out of my control), but filenames can contain slashes and the like, so a deeper folder structure can be emulated.
folder.GetFiles is a simple LINQ equivalent (IOrderedQueryable) of select * from files where folderid=#folderid order by Uri.
prefix is a filter saying return only those files that start with...
delimiter is the path delimiter, such as '/'.
marker is for pagination - starts returning at a specified point.
data.DataContext.GetFileFolder maps to a SQL scalar function: return the whole string up to and including the next delimiter that occurs after the prefix string.
RETURN substring(#uri, 0, charindex(#delimiter, #uri, len(#prefix)) + len(#delimiter))
That was for troubleshooting - the original was a client-side where clause that did map correctly to T-SQL. I had just hoped doing a function would change things in the final graph, but nope.
In the above, filegroups and the commented-out folders and files all work as expected.
The goal is to hit the database just once. I'd like to, in a single return, show subfolders and files based upon interpretation of the FilePrefix object (folders have a null Original value).
The issue is the final SelectMany throwing "Could not format node 'ClientQuery' for execution as SQL."
I strongly suspect this would work perfectly if it weren't for the T-SQL translation, but looking at this logically, why would it not do its database work and then select the FilePrefixes client-side as a final step?
It's late ;) but tomorrow I'll revert to a double tap on the database by slipping a ToList() or something similar somewhere up there to cause that final step to run fully client-side (a kludge). But if anyone has any insights on how to accomplish this with one database hit (short of writing a stored procedure), I'd love to hear it!!
The downside to the kludge is that the final Take(maxresults) could be expensive if the db hit results in a number of records that far exceeds that. And the subsequent Skip(maxresults).Take(1) that I didn't quote, for marking the next page, would hurt twice as much.
Thank you very much
Welp, it looks like two database hits are necessary. I started by noticing that the call graph converted the ternary operator into an IIF, which led me to think that IIF, on the SQL side, probably doesn't like subqueries as parameters.
string p = prefix ?? "";
string d = delimiter ?? "";
var filegroups = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
group b by data.DataContext.nx_GetFileFolder(p, d, b.Uri);
var folders = from g in filegroups where g.Key.Length > 0 select g.Key;
var files = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
&& data.DataContext.nx_GetFileFolder(p, d, b.Uri).Length == 0
select b;
folders = folders.OrderBy(f => f).Take(maxresults + 1);
files = files.OrderBy(f => f.Uri).Take(maxresults + 1);
var retval = folders.AsEnumerable().Select(f => new FilePrefix { Name = f })
.Concat(files.AsEnumerable().Select(f => new FilePrefix { Name = f.Uri, Original = f }))
.OrderBy(b => b.Name).Take(maxresults + 1);
int count = 0;
foreach (var bp in retval)
{
if (count++ < maxresults)
yield return bp;
else
newmarker.Name = bp.Name;
}
yield break;
A bit less elegant... I left filegroups and folders, but rewrote the files query to get rid of the group (it generates cleaner SQL and is probably more efficient).
Concat still gave me trouble in this new approach, so I kicked in the AsEnumerable calls, which is the point that breaks this into two hits to the database.
I kept maxresults in the SQL to limit traffic, so the worst case is twice as much data as I want going over the wire. The +1 is to get the next record so the user can be notified where to start on the next page. And I used the iterator pattern so I wouldn't have to loop again to get that next record.
I made an SQL query and filled the data into an ObservableCollection. The table contains many columns, so I want to count how many rows have a specific column = 1, then return that number as an int.
The query:
var test = from x in m_dcSQL_Connection.Testheaders
where dtStartTime <= x.StartTime && dtEndtime >= x.StartTime
select new {
x.N,
x.StartTime,
x.TestTime,
x.TestStatus,
x.Operator,
x.Login,
x.DUT_id,
x.Tester_id,
x.PrintID
};
Then I add the data pulled from the database to an ObservableCollection via:
lstTestData.Add(new clsTestNrData(item.N.ToString(),
item.StartTime.ToString(),
item.TestTime.ToString()
etc.....
I want to count how many times TestStatus = 1.
I have read about the .Count property but I do not fully understand how it works on ObservableCollections.
Any help?
The standard ObservableCollection<T>.Count property will give you the number of items in the collection.
What you are looking for is this:
testStatusOneItemCount = lstTestData.Where(item => item.TestStatus == 1).Count();
...which uses the Count() extension method on IEnumerable<T>, which is part of LINQ.
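Count() also has an overload that takes a predicate, so the Where step can be folded in:
testStatusOneItemCount = lstTestData.Count(item => item.TestStatus == 1);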
To elaborate a bit, Count will simply count the objects in your collection.
I suggest having a quick look at LINQ 101. Very good examples.
Here's an example:
// Assuming you have :
var list = new List<int>{1,2,3,4,5,6 };
var items_in_list = list.Count(); // = 6;
Using LINQ's Where, you're basically filtering items, producing a new sequence. So the following will give you the count of all the numbers which are even:
var pair = list.Where(item => item % 2 == 0);
var pair_count = pair.Count(); // = 3
You can combine this without the temp variables:
var total = Enumerable.Range(1,6).Where(x => x % 2 == 0).Count(); // total = 3;
Or you can then select something else:
var squares_of_pairs = Enumerable.Range(1,6)
    .Where(x => x % 2 == 0)
    .Select(pair => pair * pair);
// squares_of_pairs = {4, 16, 36}. You can count them, but still get 3 :)
I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to lookup the trip for each stop time, using the trip_id value. From that trip, I have to lookup the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
.Where(z => z.Field<string>("stop_id") == stopID &&
z["departure_time"].ToString() != "" &&
z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
var trip = (from z in Trips
where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
z["route_id"].ToString() != ""
select z).Single();
var route = (from z in Routes
where z.Field<string>("route_id") == trip.Field<string>("route_id")
select z).Single();
// do stuff (not time-consuming)
}
Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
from st in StopTimes
let StopID = st.Field<string>("stop_id")
where StopID == stopID
where st["departure_time"].ToString() != ""
let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
where DepartureTime >= tod0
where DepartureTime <= tod1
let TripID = st.Field<string>("trip_id")
select new
{
StopID,
TripID,
DepartureTime,
};
Note that there is no orderby in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach applies to Trips and Routes.
var ts =
from t in Trips
where t["route_id"].ToString() != ""
let TripID = t.Field<string>("trip_id")
let RouteID = t.Field<string>("route_id")
select new
{
TripID,
RouteID,
};
var rs =
from r in Routes
let RouteID = r.Field<string>("route_id")
select new
{
RouteID,
};
Since you're getting a single record for each lookup, using ToDictionary(...) is a good choice.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray() and I've put the orderby right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.
I would make a Dictionary<int, Trip> from Trips where the key is the trip_id, and a Dictionary<int, Route> from Routes where the key is route_id. Your code is iterating over the 19092 items in Trips once for every one of the items in the filtered IEnumerable<StopTime>. Same deal for Routes, but at least there are only 27 items in there.
Edit:
Actually, looking at it more closely, the first dictionary would be Dictionary<int, int> where the value is the route_id. And given the one-to-one relationship between trip_id and route_id, you could just build a Dictionary<trip_id, Route> and do one lookup.
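A sketch of that single-lookup version, using the DataRow field names from the question (trip_id and route_id are read as strings there, so string keys are used here):
// Build once, before the loop: map each trip_id straight to its Route row,
// so every stop time costs one hash lookup instead of a scan over Trips plus a scan over Routes.
var routeByTripId = Trips
    .Where(t => t["route_id"].ToString() != "")
    .ToDictionary(
        t => t.Field<string>("trip_id"),
        t => Routes.Single(rt => rt.Field<string>("route_id") == t.Field<string>("route_id")));

// Inside the existing foreach over the filtered stop times:
var route = routeByTripId[r.Field<string>("trip_id")];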
It helps to understand deferred query execution so you can make case-by-case decisions on how to optimize your runtime. Here is a good blog post that can get you started: http://ox.no/posts/linq-vs-loop-a-performance-test
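As a tiny illustration of the deferred execution point (hypothetical code, not from the post): a LINQ query is only a description until it is enumerated, and it re-runs on every enumeration.
var evens = Enumerable.Range(1, 1000000).Where(n => n % 2 == 0); // nothing is evaluated yet
var firstPass = evens.Count();   // the whole range is scanned here...
var secondPass = evens.Count();  // ...and scanned again here, because the query re-runs each time
var cached = evens.ToList();     // materialize once if the result is needed repeatedly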