Linq2SQL grouping and ungrouping in the same query


Here's a stumper in LINQ to SQL:
string p = prefix ?? "";
string d = delimiter ?? "";
var filegroups = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
group b by data.DataContext.GetFileFolder(p, d, b.Uri);
//var folders = from g in filegroups where g.Key.Length > 0 select g;
//var files = from g in filegroups where g.Key.Length == 0 select g;
var files = filegroups.SelectMany(g => g.Key.Length > 0
? from b in g.Take(1) select new FilePrefix { Name = g.Key }
: from b in g select new FilePrefix { Name = b.Uri, Original = b });
var retval = files.Take(maxresults);
Folders cannot be nested (out of my control), but file names can contain slashes and so on, so a deeper folder structure can be emulated.
folder.GetFiles is a simple LINQ equivalent (IOrderedQueryable) of select * from files where folderid=@folderid order by Uri
prefix is a filter saying return only those files that start with...
delimiter is the path delimiter, such as '/'
marker is for pagination - starts returning at a specified point
data.DataContext.GetFileFolder maps to a SQL scalar function: return the whole string up to and including the next delimiter that occurs after the prefix string
RETURN substring(@uri, 0, charindex(@delimiter, @uri, len(@prefix)) + len(@delimiter))
That was for troubleshooting - the original was a client-side where clause that did map correctly to TSQL. I had just hoped doing a function would change things in the final graph, but nope.
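For readers who want to see the folder-key rule outside the database, here is a minimal client-side sketch of the behaviour described above: return everything up to and including the first delimiter after the prefix, or an empty string for a plain file. The class name FolderKeys is hypothetical, and the sketch follows the prose description rather than the exact index arithmetic of the T-SQL function, so edge cases may differ.

```csharp
using System;

public static class FolderKeys
{
    // Returns the part of uri up to and including the first delimiter found
    // after the prefix, or "" when there is no such delimiter (a plain file).
    // Assumes prefix and uri are non-null, as in the question (p = prefix ?? "").
    public static string GetFileFolder(string prefix, string delimiter, string uri)
    {
        if (string.IsNullOrEmpty(delimiter) || uri.Length < prefix.Length)
            return "";

        // Start searching right after the prefix so delimiters inside it are ignored.
        int pos = uri.IndexOf(delimiter, prefix.Length, StringComparison.Ordinal);
        return pos < 0 ? "" : uri.Substring(0, pos + delimiter.Length);
    }
}
```

With prefix "a/" and delimiter "/", the URI "a/b/c.txt" maps to the folder key "a/b/", while "a/file.txt" maps to "" and is therefore treated as a file.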
in the above, filegroups, and the commented out folders, and files, all work as expected
The goal is to hit the database just once. I'd like to, in a single return, show subfolders and files based upon interpretation of the FilePrefix object (folders have a null 'original' value)
The issue is with the final selectmany throwing "Could not format node 'ClientQuery' for execution as SQL."
I strongly suspect this would work perfectly if it weren't for the TSQL translation, but looking at this logically, why would it not do its database work and then select the FilePrefixes client side as a final step?
It's late ;) but tomorrow I'll revert to a double tap on the database by slipping a ToList() or something similar somewhere up there to cause that final step to be full client side (kludge). But if anyone has any insights on how to accomplish this with one database hit (short of writing a stored procedure), I'd love to hear it!!
The downside to the kludge is that the final Take(maxresults) could be expensive if the db hit results in a number of records that far exceeds that. And the subsequent Skip(maxresults).Take(1) that I didn't quote, for marking the next page, would hurt twice as much.
Thank you very much

Welp, it looks like 2 database hits are necessary. I started by noticing that the call graph converted the ternary operator into an IIF, which led me to think that IIF, on the SQL side, probably doesn't like subqueries as parameters.
string p = prefix ?? "";
string d = delimiter ?? "";
var filegroups = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
group b by data.DataContext.nx_GetFileFolder(p, d, b.Uri);
var folders = from g in filegroups where g.Key.Length > 0 select g.Key;
var files = from b in folder.GetFiles(data)
where b.Uri.StartsWith(p) && b.Uri.CompareTo(marker ?? "") >= 0
&& data.DataContext.nx_GetFileFolder(p, d, b.Uri).Length == 0
select b;
folders = folders.OrderBy(f => f).Take(maxresults + 1);
files = files.OrderBy(f => f.Uri).Take(maxresults + 1);
var retval = folders.AsEnumerable().Select(f => new FilePrefix { Name = f })
.Concat(files.AsEnumerable().Select(f => new FilePrefix { Name = f.Uri, Original = f }))
.OrderBy(b => b.Name).Take(maxresults + 1);
int count = 0;
foreach (var bp in retval)
{
if (count++ < maxresults)
yield return bp;
else
newmarker.Name = bp.Name;
}
yield break;
a bit less elegant... I left filegroups and folders, but rewrote the files query to get rid of the group (generated cleaner sql and probably more efficient).
Concat still gave me trouble in this new approach, so I kicked in the AsEnumerable calls, which is the point that breaks this into 2 hits to the database.
I kept maxresults in the sql to limit traffic, so worst case is twice as much data as I want going over the wire. The +1 is to get the next record so the user can be notified where to start on the next page. And I used the iterator pattern so I wouldn't have to loop again to get that next record.
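The "+1" paging idea can be shown in isolation. The following is a sketch against an in-memory, already sorted sequence rather than LINQ to SQL; the names Paging and TakePage are hypothetical, and ordinal string comparison is an assumption (the queries above use CompareTo).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Returns up to maxResults names starting at marker, plus the name at which
    // the next page starts (null when there is no further page).
    // sortedNames is assumed to be sorted ascending already.
    public static (List<string> Page, string NextMarker) TakePage(
        IEnumerable<string> sortedNames, string marker, int maxResults)
    {
        // Fetch one extra element so we know where the next page begins.
        var window = sortedNames
            .Where(n => string.CompareOrdinal(n, marker ?? "") >= 0)
            .Take(maxResults + 1)
            .ToList();

        string next = window.Count > maxResults ? window[maxResults] : null;
        return (window.Take(maxResults).ToList(), next);
    }
}
```

Calling TakePage again with the returned NextMarker as the marker yields the following page; a null NextMarker means the listing is complete.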

Related

Linq join using split in condition

I'm a newcomer in Linq C#.
I have a scenario where I need to check part of a sentence is equal to another value of the field.
I use IndexOf to get part of the sentence in the left join condition. The result is good when any data match between 'a' and 'c'. But when data does not exist in 'c', then the value of 'test' is all data of table dbData.Data3.
Does anyone know what I'm missing here?
var test = (from a in dbData.Data2
let COLODescIndexOfSpace = a.LongKeywordDesc.IndexOf(' ') < 0 ? 0 : a.LongKeywordDesc.IndexOf(' ')
join c in dbData.Data3 on
new
{
KeywordDesc = a.LongKeywordDesc.Substring(0, COLODescIndexOfSpace),
Stsrc = true
}
equals new
{
KeywordDesc = c.KeywordDesc,
Stsrc = (c.Stsrc != AppConstant.StrSc.Deactive)
}
into c_leftjoin
from c in c_leftjoin.DefaultIfEmpty()
where lmsCourseOutlineIds.Contains(a.CourseOutlineID)
select new
{
data = a.KeywordID,
data2 = c.KeywordID,
data3 = c.KeywordDesc,
}).ToList();
this is some example of data
here
and this is what I expect as result
here

Here is one way to solve the problem (I will leave it to my Linq betters to figure out a solution purely with Linq). In short, just create a SQL view in the database and query it to get your desired results.
For example, in the database:
create view dbo.GetSentencesByKeywords
as
select
d2.ID,
d2.longKeywordDesc,
d3.keywordDesc,
d3.AdditionalInfo
from
dbo.Data2 d2
left join
(
select
ID,
keyWordDesc,
AdditionalInfo,
len(keyWordDesc) as keyWordLength
from dbo.Data3 data3
) d3
on substring(d2.longKeywordDesc, 0, d3.keyWordLength + 1) = d3.keywordDesc
Now the linq is nice and simple (assuming you can re-scaffold the dbContext to add it to your models, or otherwise can get it in there one way or another):
var x = dbData.GetSentencesByKeywords.Select(c => c);

LINQ select next record with each matching result

I have objects from which measurements are saved to a single table. I want to find out how long an object has been in a certain state within a time period.
So in addition to getting the record with the wanted state I need to pair it up with the next measurement made from the same object to calculate the time between them.
I came up with this monster:
// Get the object entry from the database
MeasuredObject object1;
try
{
object1 = (MeasuredObject)(from getObject in db.MeasuredObject where wantedObject.Id.Equals(getObject.Id) select getObject).Single();
}
catch (System.InvalidOperationException e) // Single() throws this when no element (or more than one) matches
{
throw new System.ArgumentException("Object does not exist", "wantedObject", e);
}
// Get every measurement which matches the state in the time period and the next measurement from it
var pairs = (from m in object1.Measurements
join nextM in object1.Measurements
on (from next in object1.Measurements where (m.Id < next.Id) select next.Id).Min() equals nextM.Id
where 'm is in time period and has required state'
select new { meas = m, next = nextM });
I would say this doesn't seem very efficient especially when I'm using Compact Edition 3.5.
Is there any way to navigate to the next measurement through m or could I somehow use orderby or group to select next by Id? Or even make the join clause simpler?

From the posted code it looks like you are working with an in-memory collection. If that's true, then the following should be sufficient:
var items = (from m in object1.Measurements
where 'm is in time period and has required state'
orderby m.Id
select m)
.ToList();
var pairs = items.Select((item, index) => new
{
meas = item,
next = index + 1 < items.Count ? items[index + 1] : null
});
EDIT: The above is not the exact equivalent of your code because it applies the filter before pairing the items. The exact optimized equivalent would be like this:
var items = object1.Measurements.OrderBy(m => m.Id).ToList();
var pairs = items.Select((item, index) => new
{
meas = item,
next = index + 1 < items.Count ? items[index + 1] : null
})
.Where(pair => 'pair.meas is in time period and has required state');
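To connect the pairing back to the original goal (how long an object spent in a given state), here is a sketch that pairs each measurement with its successor using Zip and sums the gaps. The Measurement class with Id, State and Time properties is hypothetical, since the question does not show the real type, and the time-period filter is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a measurement; the real entity is not shown in the question.
public sealed class Measurement
{
    public int Id { get; set; }
    public string State { get; set; }
    public DateTime Time { get; set; }
}

public static class StateDurations
{
    // Sums the time between each measurement in the wanted state and the next
    // measurement (by Id) of the same object.
    public static TimeSpan TimeInState(IEnumerable<Measurement> measurements, string state)
    {
        var items = measurements.OrderBy(m => m.Id).ToList();

        return items
            // Pair item i with item i + 1; the last item has no successor and is dropped.
            .Zip(items.Skip(1), (meas, next) => new { meas, next })
            .Where(p => p.meas.State == state)
            .Aggregate(TimeSpan.Zero, (total, p) => total + (p.next.Time - p.meas.Time));
    }
}
```

Unlike the index-based version above, Zip never produces a pair with a null next, so the final measurement contributes nothing to the total. Whether that is the desired treatment of a still-open state is a design decision.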

Using Intersect I'm getting a Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator

I'm using a Linq to SQL query to provide a list of search term matches against a database field. The search terms are an in-memory string array. Specifically, I'm using an "intersect" within the Linq query, comparing the search terms with a database field "Description". In the code below, the description field is iss.Description. The description is split into an array within the Linq query, and the intersect compares the search terms with the description terms, so that all of the comparing and conditions stay within the Linq query and the database is not taxed. In my research, trying to overcome the problem, I have found that the use of an in-memory, or "local", sequence is not supported. I have also tried a few suggestions found during my research, like using "AsEnumerable" or "AsQueryable", without success.
searchText = searchText.ToUpper();
var searchTerms = searchText.Split(' ');
var issuesList1 = (
from iss in DatabaseConnection.CustomerIssues
let desc = iss.Description.ToUpper().Split(' ')
let count = desc.Intersect(searchTerms).Count()
where desc.Intersect(searchTerms).Count() > 0
join stoi in DatabaseConnection.SolutionToIssues on iss.IssueID equals stoi.IssueID into stoiToiss
from stTois in stoiToiss.DefaultIfEmpty()
join solJoin in DatabaseConnection.Solutions on stTois.SolutionID equals solJoin.SolutionID into solutionJoin
from solution in solutionJoin.DefaultIfEmpty()
select new IssuesAndSolutions
{
IssueID = iss.IssueID,
IssueDesc = iss.Description,
SearchHits = count,
SolutionDesc = (solution.Description == null)? "No Solutions":solution.Description,
SolutionID = (solution.SolutionID == null) ? 0 : solution.SolutionID,
SolutionToIssueID = (stTois.SolutionToIssueID == null) ? 0 : stTois.SolutionToIssueID,
Successful = (stTois.Successful == null)? false : stTois.Successful
}).ToList();
...
The only way I have been successful is to create two queries and call a method as shown below. But this requires the first Linq query to return all of the results (with the number of hits for search terms in the description), including the non-matched records, into an in-memory List<>, and then a second Linq query to filter out the non-matched records.
public static int CountHits(string[] searchTerms, string Description)
{
int hits = 0;
foreach (string item in searchTerms)
{
if (Description.ToUpper().Contains(item.Trim().ToUpper())) hits++;
}
return hits;
}
public static List<IssuesAndSolutions> SearchIssuesAndSolutions(string searchText)
{
using (BYCNCDatabaseDataContext DatabaseConnection = new BYCNCDatabaseDataContext())
{
searchText = searchText.ToUpper();
var searchTerms = searchText.Split(' ');
var issuesList1 = (
from iss in DatabaseConnection.CustomerIssues
join stoi in DatabaseConnection.SolutionToIssues on iss.IssueID equals stoi.IssueID into stoiToiss
from stTois in stoiToiss.DefaultIfEmpty()
join solJoin in DatabaseConnection.Solutions on stTois.SolutionID equals solJoin.SolutionID into solutionJoin
from solution in solutionJoin.DefaultIfEmpty()
select new IssuesAndSolutions
{
IssueID = iss.IssueID,
IssueDesc = iss.Description,
SearchHits = CountHits(searchTerms, iss.Description),
SolutionDesc = (solution.Description == null)? "No Solutions":solution.Description,
SolutionID = (solution.SolutionID == null) ? 0 : solution.SolutionID,
SolutionToIssueID = (stTois.SolutionToIssueID == null) ? 0 : stTois.SolutionToIssueID,
Successful = (stTois.Successful == null)? false : stTois.Successful
}).ToList();
var issuesList = (
from iss in issuesList1
where iss.SearchHits > 0
select iss).ToList();
...
I would be comfortable with two Linq Queries, but with the first Linq Query only returning the matched records and then maybe using a second, maybe lambda expression to order them, but my trials have not been successful.
Any help would be most appreciated.

Ok, so after more searching, trying more techniques, and trying user1010609's technique, I managed to get it working after an almost complete rewrite. The following code first defines a flat record query with all of the information I am searching, then a new list is formed with the filtered information compared against the search terms (counting the hits of each search term for ordering by relevance). I was careful not to return a list of the flat file so there would be some efficiency in the final database retrieval (during the formation of the filtered List<>). I am positive this is not even close to being an efficient method, but it works. I am eager to see more and unique techniques for solving this type of problem. Thanks!
searchText = searchText.ToUpper();
List<string> searchTerms = searchText.Split(' ').ToList();
var allIssues =
from iss in DatabaseConnection.CustomerIssues
join stoi in DatabaseConnection.SolutionToIssues on iss.IssueID equals stoi.IssueID into stoiToiss
from stTois in stoiToiss.DefaultIfEmpty()
join solJoin in DatabaseConnection.Solutions on stTois.SolutionID equals solJoin.SolutionID into solutionJoin
from solution in solutionJoin.DefaultIfEmpty()
select new IssuesAndSolutions
{
IssueID = iss.IssueID,
IssueDesc = iss.Description,
SolutionDesc = (solution.Description == null) ? "No Solutions" : solution.Description,
SolutionID = (solution.SolutionID == null) ? 0 : solution.SolutionID,
SolutionToIssueID = (stTois.SolutionToIssueID == null) ? 0 : stTois.SolutionToIssueID,
Successful = (stTois.Successful == null) ? false : stTois.Successful
};
List<IssuesAndSolutions> filteredIssues = new List<IssuesAndSolutions>();
foreach (var issue in allIssues)
{
int hits = 0;
foreach (var term in searchTerms)
{
if (issue.IssueDesc.ToUpper().Contains(term.Trim())) hits++;
}
if (hits > 0)
{
IssuesAndSolutions matchedIssue = new IssuesAndSolutions();
matchedIssue.IssueID = issue.IssueID;
matchedIssue.IssueDesc = issue.IssueDesc;
matchedIssue.SearchHits = hits;
matchedIssue.CustomerID = issue.CustomerID;
matchedIssue.AssemblyID = issue.AssemblyID;
matchedIssue.DateOfIssue = issue.DateOfIssue;
matchedIssue.DateOfResolution = issue.DateOfResolution;
matchedIssue.CostOFIssue = issue.CostOFIssue;
matchedIssue.ProductID = issue.ProductID;
filteredIssues.Add(matchedIssue);
}
}
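The counting and filtering loop above can also be written with LINQ to Objects once the rows are in memory. This is a sketch, not a drop-in replacement: SearchHits and RankByHits are hypothetical names, and unlike the loop it ignores empty and duplicate search terms (an empty term would otherwise match every description).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchHits
{
    // Counts how many distinct, non-empty search terms occur in the description,
    // ignoring case.
    public static int CountHits(IEnumerable<string> searchTerms, string description)
    {
        string upper = (description ?? "").ToUpperInvariant();
        return searchTerms
            .Select(t => t.Trim().ToUpperInvariant())
            .Where(t => t.Length > 0)
            .Distinct()
            .Count(t => upper.Contains(t));
    }

    // Keeps only the items with at least one hit, most relevant first.
    // describe extracts the text to search from each item.
    public static List<(T Item, int Hits)> RankByHits<T>(
        IEnumerable<T> items, Func<T, string> describe, string searchText)
    {
        var terms = searchText.Split(' ');
        return items
            .Select(i => (Item: i, Hits: CountHits(terms, describe(i))))
            .Where(x => x.Hits > 0)
            .OrderByDescending(x => x.Hits)
            .ToList();
    }
}
```

Applied to the query above, the call would look something like SearchHits.RankByHits(allIssues.AsEnumerable(), i => i.IssueDesc, searchText), which still pulls every row from the database before filtering, as the loop does.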

LINQ query optimization

I retrieve data from two different repositories:
List<F> allFs = fRepository.GetFs().ToList();
List<E> allEs = eRepository.GetEs().ToList();
Now I need to join them so I do the following:
var EFs = from c in allFs.AsQueryable()
join e in allEs on c.SerialNumber equals e.FSerialNumber
where e.Year == Convert.ToInt32(billingYear) &&
e.Month == Convert.ToInt32(billingMonth)
select new EReport
{
FSerialNumber = c.SerialNumber,
FName = c.Name,
IntCustID = Convert.ToInt32(e.IntCustID),
TotalECases = 0,
TotalPrice = "$0"
};
How can I make this LINQ query better so it will run faster? I would appreciate any suggestions.
Thanks

Unless you're able to create one repository that contains both pieces of data, which would be a far preferred solution, I can see the following things which might speed up the process.
1. Since you're always filtering all E's by Month and Year, you should do that before calling ToList on the IQueryable; that way you reduce the number of E's in the join (probably considerably).
2. Since you're only using a subset of fields from E and F, you can use an anonymous type to limit the amount of data to transfer.
3. Depending on how many serial numbers you're retrieving from F's, you could filter your E's by serials in the database (or vice versa). But if most of the serial numbers are expected to be in both sets, that doesn't help you much further.
Reasons why you might not be able to combine the repositories into one are probably because the data is coming from two separate databases.
The code, updated with the above mentioned points 1 and 2 would be similar to this:
var allFs = fRepository.GetFs().Select(f => new {f.Name, f.SerialNumber}).ToList();
int year = Convert.ToInt32(billingYear);
int month = Convert.ToInt32(billingMonth);
var allEs = eRepository.GetEs().Where(e => e.Year == year && e.Month == month).Select(e => new {e.FSerialNumber, e.IntCustID}).ToList();
var EFs = from c in allFs
join e in allEs on c.SerialNumber equals e.FSerialNumber
select new EReport
{
FSerialNumber = c.SerialNumber,
FName = c.Name,
IntCustID = Convert.ToInt32(e.IntCustID),
TotalECases = 0,
TotalPrice = "$0"
};

How can I make this LINQ query of an Enumerable DataTable of GTFS data faster?

I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to lookup the trip for each stop time, using the trip_id value. From that trip, I have to lookup the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
.Where(z => z.Field<string>("stop_id") == stopID &&
z["departure_time"].ToString() != "" &&
z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
var trip = (from z in Trips
where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
z["route_id"].ToString() != ""
select z).Single();
var route = (from z in Routes
where z.Field<string>("route_id") == trip.Field<string>("route_id")
select z).Single();
// do stuff (not time-consuming)
}

Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
from st in StopTimes
let StopID = st.Field<string>("stop_id")
where StopID == stopID
where st["departure_time"].ToString() != ""
let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
where DepartureTime >= tod0
where DepartureTime <= tod1
let TripID = st.Field<string>("trip_id")
select new
{
StopID,
TripID,
DepartureTime,
};
Note that there is no orderby in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach happens for Trips & Routes.
var ts =
from t in Trips
where t["route_id"].ToString() != ""
let TripID = t.Field<string>("trip_id")
let RouteID = t.Field<string>("route_id")
select new
{
TripID,
RouteID,
};
var rs =
from r in Routes
let RouteID = r.Field<string>("route_id")
select new
{
RouteID,
};
Since you're getting a single record for each look up then using ToDictionary(...) is a good choice to use.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray() and I've put the orderby right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.

I would make a Dictionary<int, Trip> from Trips where the key is the trip_id, and a Dictionary<int, Route> from Routes where the key is route_id. Your code is iterating over the 19092 items in Trips once for every one of the items in the filtered IEnumerable<StopTime>. Same deal for Routes, but at least there are only 27 items in there.
Edit:
Actually, looking at it more closely, the first dictionary would be Dictionary<int, int> where the value is the route_id. And given the one-to-one relationship between trip_id and route_id, you could just build a Dictionary<trip_id, Route> and do one lookup.
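A sketch of that single-lookup dictionary, using plain tuples in place of the DataRow objects. The keys are strings here because the question reads trip_id and route_id with Field<string>; the class name, tuple shapes and route names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RouteLookup
{
    // Builds trip_id -> route name in one pass over each table, so every stop
    // time afterwards needs a single dictionary lookup instead of two scans.
    public static Dictionary<string, string> BuildTripToRouteName(
        IEnumerable<(string TripId, string RouteId)> trips,
        IEnumerable<(string RouteId, string Name)> routes)
    {
        var routeNames = routes.ToDictionary(r => r.RouteId, r => r.Name);

        return trips
            .Where(t => !string.IsNullOrEmpty(t.RouteId)) // mirrors the route_id != "" check
            .ToDictionary(t => t.TripId, t => routeNames[t.RouteId]);
    }
}
```

Building both dictionaries costs one pass over Trips and one over Routes, after which each of the filtered stop times is resolved in constant time.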

It helps to understand deferred query execution so you can make case by case decisions on how to optimize your runtime. Here is a good blog post that can get you started: http://ox.no/posts/linq-vs-loop-a-performance-test
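As a small illustration of deferred execution, the sketch below counts how often a Where predicate runs. The class and method names are hypothetical; the point is only that a LINQ to Objects query is re-evaluated on every enumeration until it is materialized with ToList or ToArray.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns the number of predicate calls observed at three points in time.
    public static (int BeforeEnumeration, int AfterTwoPasses, int AfterToList) CountPredicateCalls()
    {
        int calls = 0;
        var numbers = new[] { 1, 2, 3, 4 };

        // Defining the query runs nothing yet.
        var evens = numbers.Where(n => { calls++; return n % 2 == 0; });
        int before = calls;

        // Each enumeration runs the predicate over all four elements again.
        evens.Count();
        evens.Count();
        int afterTwoPasses = calls;

        // ToList runs the predicate once more; later reads use the cached list.
        List<int> cached = evens.ToList();
        int total = cached.Count + cached.Count;
        int afterToList = calls;

        return (before, afterTwoPasses, afterToList);
    }
}
```

This is why the answers above place ToList, ToArray or ToDictionary deliberately: materializing once avoids repeating the same filter inside a loop.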
