LINQ Union not returning wanted results - c#

I have a problem I've been trying to solve for the last couple of hours, without any luck.
First, let me show you what my database looks like (just the important part of it):
[radno_mjesto] = JOB
[grupa_radnih_mjesta] = JOB GROUPS (jobs that fall into specific groups; for example if the group name was judges jobs that would fall into this group would be: supreme court judge, administrative law judge, senior judge, etc.)
[osoba] = PERSON
What I'd like to achieve is to query all the people who fall into a specific job group, but after a couple of hours I haven't managed it. I tried various combinations of the following code and only ever got one of two results: all people (no matter what their job is), or only the people with one specific job (the last job in the job group, as in this case).
var sveOsobe = from p in db.osobas
select p;
if (chkGrupaRadnihMjesta.Checked)
{
int id = Convert.ToInt32(GrupaRadnihMjesta.SelectedValue);
var radnaMjesta = from rm in db.grupe_radnih_mjesta_radna_mjesta
where rm.grm_id == id
select rm;
var praznoOsobe = sveOsobe.Where(o => o.osoba_id == -1);
foreach (var radnoMjesto in radnaMjesta)
{
var sveOsobeRadnaMjesta = from p in db.osobas
where p.osoba_id == -1
select p;
sveOsobeRadnaMjesta = sveOsobe.Where(o => o.rm_id == radnoMjesto.rm_id).Union(sveOsobeRadnaMjesta);
praznoOsobe = praznoOsobe.Union(sveOsobeRadnaMjesta);
}
sveOsobe = sveOsobe.Intersect(praznoOsobe);
}
Any help would be appreciated.

This should work....
if (chkGrupaRadnihMjesta.Checked) {
int id = Convert.ToInt32(GrupaRadnihMjesta.SelectedValue);
var sveOsobe = (
from p in db.osobas
join l in db.grupe_radnih_mjesta_radna_mjesta on p.rm_id equals l.rm_id
where l.grm_id == id
select p
).Distinct();
}
I'm guessing at names here!!!
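For reference, here is a minimal in-memory sketch of the same idea without the Union loop. The Person and JobGroupLink classes and their property names are assumptions standing in for the osoba and grupe_radnih_mjesta_radna_mjesta tables, not the real schema. One likely reason the original loop returned only the last job's people: the Where lambda captures radnoMjesto, the query is not executed until after the loop, and compilers before C# 5 share a single foreach variable across all iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the osoba table and the group/job link table.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int JobId { get; set; }
}

public class JobGroupLink
{
    public int GroupId { get; set; }
    public int JobId { get; set; }
}

public static class JobGroupFilterSketch
{
    // Returns the people whose job belongs to the given job group.
    public static List<Person> PeopleInGroup(
        IEnumerable<Person> people, IEnumerable<JobGroupLink> links, int groupId)
    {
        // Collect the job ids of the group once, then filter with Contains.
        var jobIds = new HashSet<int>(
            links.Where(l => l.GroupId == groupId).Select(l => l.JobId));

        return people.Where(p => jobIds.Contains(p.JobId)).ToList();
    }
}
```

Against a database the same shape can be written as a join (as in the answer above) or as a Contains over the selected job ids; which one produces better SQL depends on the provider.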

Related

Implement this "not in" where clause in LINQ

I have a table here that gets populated with ActiveDirectory users every night. This list includes generic AD accounts used for a variety of purposes.
Examples of lastnames of generic accounts:
vendor testing
IT support
Dept1 Printer
Visitor1
Visitor2
Guest1
Guest2 and etc
I want to retrieve all records ignoring these records. Something like
select * from table where lastname not like '%visitor%'
and lastname not like '%support%'
and so on. I made this query, but it does not do substring comparison.
List<String> _ignoreList = new List<String> { "visitor", "test" };
IQueryable<String> _records =
from _adUserDatas in _adUserDataDBDataContext.ADUserDatas
where
_adUserDatas.accountActive.ToLower().Contains("yes")
&& _adUserDatas.staffStudentType.ToLower().Contains("neither")
&& !_ignoreList.Contains(_adUserDatas.lastName)
orderby _adUserDatas.username
select _adUserDatas.username;
Here's the resulting SQL being sent to SQL Server.
{
SELECT [t0].[username]
FROM [dbo].[ADUserData] AS [t0]
WHERE
(LOWER([t0].[accountActive]) LIKE @p0)
AND
(LOWER([t0].[staffStudentType]) LIKE @p1)
AND
(NOT ([t0].[lastName] IN (@p2, @p3)))
ORDER BY [t0].[username]
}
The LINQ query above did not ignore a record with the lastname "only for testing acct".
Any ideas on how to implement it using LINQ?
I've searched the net but nothing came up.
Thanks a lot.
That is because you are checking whether the ignore list contains the LastName. Try doing it the other way round, i.e. check whether the LastName contains anything from the ignore list:
&& !_ignoreList.Any( il => _adUserDatas.lastName.Contains( il ) )
This way it will check whether "only for testing acct" contains anything from { "visitor", "test" }
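As a rough in-memory illustration of that reversed check (LINQ to Objects only): the class and method names here are made up for the example, and the case-insensitive comparison is an assumption about what is wanted. Note that a LINQ to SQL provider may refuse to translate Any over a local list, which is why the loop-based and PredicateBuilder variants further down exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IgnoreListSketch
{
    // Keeps only the last names that contain none of the ignore terms.
    public static List<string> KeepAllowed(
        IEnumerable<string> lastNames, IEnumerable<string> ignoreList)
    {
        var ignore = ignoreList.ToList();

        return lastNames
            // Substring test, ignoring case, against every ignore term.
            .Where(name => !ignore.Any(term =>
                name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

With the names "only for testing acct", "Smith" and "Visitor1" and the ignore list { "visitor", "test" }, only "Smith" should survive.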
Hm, it could be hard to get this to work as a predicate with an IN clause. My solution would be different:
var queryable = from _adUserDatas in _adUserDataDBDataContext.ADUserDatas
where
_adUserDatas.accountActive.ToLower().Contains("yes")
&& _adUserDatas.staffStudentType.ToLower().Contains("neither")
orderby _adUserDatas.username
select _adUserDatas.username;
foreach (var ignore in _ignoreList)
{
var localIgnore = ignore;
queryable = queryable.Where(userName => !userName.Contains(localIgnore));
}
var result = queryable.ToList();
The answer from pwas led me to one that works for my situation: PredicateBuilder, which is mentioned in lots of topics here. http://www.albahari.com/nutshell/predicatebuilder.aspx
Here's the final code:
ADUserDataDBDataContext _adUserDataDBDataContext = new ADUserDataDBDataContext();
IQueryable<String> _records = null;
Expression<Func<ADUserData,Boolean>> _whereClause = PredicateBuilder.True<ADUserData>();
_whereClause = _whereClause.And(ADUserData => ADUserData.accountActive.ToLower().Contains("yes"));
foreach (var _item in _ignoreList)
{
_whereClause = _whereClause.And(ADUserData => !ADUserData.lastName.ToLower().Contains(_item));
}
_records = _adUserDataDBDataContext.ADUserDatas
.Where(_whereClause)
.Select(ADUserData => ADUserData.fan);
return _records.ToList();

Linq To Sql - return table result and count

I'm very new to LINQ to SQL and in need of a little assistance.
Basically I'm building a message board in C#. I have 3 database tables; the basic info is as follows.
FORUMS
forumid
name
THREADS
threadid
forumid
title
userid
POSTS
postid
threadid
text
userid
date
Basically I want to bring back everything I need in one query. I want to list a page of THREADS (for a particular FORUM) and also display the number of POSTS in that THREAD row and when the last POST was for that THREAD.
At the moment I'm getting back all THREADS, then looping through the result set and making separate calls to the POST table for each thread's POST count and latest post. Obviously this will cause problems in terms of hitting the database as the message board gets bigger.
My Linq To SQL so far:
public IList<Thread> ListAll(int forumid)
{
var threads =
from t in db.Threads
where t.forumid == forumid
select t;
return threads.ToList();
}
Basically, I now need to get the number of POSTS in each thread and the date of the last post in each thread.
Any help would be most appreciated :)
EDIT
Hi guys. Thanks for your help so far. Basically I'm almost there. However, I left an important part out of my initial question: I need to retrieve the user name of the person who made the last POST. Therefore I need to join p.userid with u.userid on the USERS table. So far I have the following, but I just need to amend it to join the POST table with the USER table:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
var threads = (from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
UPDATE:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
var threads = (from t in db.Threads
from u in db.Users
where t.forumid == forumid && t.hide == "No" && t.userid == u.userid
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, deactivated = u.deactivated, lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
I finally figured that part of it out with thanks to all of you guys :). My only problem now is the Search Results method. At the moment it is like this:
public IList<Thread> SearchThreads(string text, int forumid)
{
var searchResults = (from t in db.Threads
from p in db.Posts
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
&& p.threadid == t.threadid
&& t.forumid == forumid
select t).Distinct();
return searchResults.ToList();
}
Note that I need to get the where clause into the new linq code:
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
so incorporating this clause into the new linq method. Any help is gratefully received :)
SOLUTION:
I figured out a solution, but I don't know if it's the best or most efficient one. Maybe you guys can tell me, because I'm still getting my head around LINQ. James, I think your answer was closest and got me near to where I wanted to be. Thanks :)
public IList<ThreadWithPostInfo> SearchThreads(string text, int forumid)
{
var searchResults = (from t in db.Threads
from p in db.Posts
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
&& p.threadid == t.threadid
&& t.forumid == forumid
select t).Distinct();
//return searchResults.ToList();
var threads = (from t in searchResults
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
This may be a case of too many database calls per session.
Calling the database, whether to query or to write, is a remote call, and we want to reduce the number of remote calls as much as possible. This warning is raised when the profiler notices that a single session is making an excessive number of calls to the database. This is usually an indication of a potential optimization in the way the session is used.
There are several reasons why this can be:
A large number of queries as a result of a Select N + 1
Calling the database in a loop
Updating (or inserting / deleting) a large number of entities
A large number of (different) queries that we execute to perform our task
For the first reason, you can see the suggestions for Select N + 1. Select N + 1 is a data access anti-pattern where the database is accessed in a suboptimal way. Take a look at this code sample :
// SELECT * FROM Posts
var postsQuery = from post in blogDataContext.Posts
select post;
foreach (Post post in postsQuery)
{
//lazy loading of comments list causes:
// SELECT * FROM Comments where PostId = @p0
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
In this example, we can see that we are loading a list of posts (the first select) and then traversing the object graph. However, we access the collection in a lazy fashion, causing Linq to Sql to go to the database and bring the results back one row at a time. This is incredibly inefficient, and the Linq to Sql Profiler will generate a warning whenever it encounters such a case.
The solution for this example is simple. Force an eager load of the collection using the DataLoadOptions class to specify what pieces of the object model we want to load upfront.
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Post>(p => p.Comments);
blogDataContext.LoadOptions = loadOptions;
// SELECT * FROM Posts JOIN Comments ...
var postsQuery = (from post in blogDataContext.Posts
select post);
foreach (Post post in postsQuery)
{
// comments were loaded eagerly above, so no extra query is issued here
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
Next, updating a large number of entities is discussed in Use Statement Batching, and can be achieved by using the PLinqO project, which is a set of extensions on top of Linq to Sql. PLINQO can also store items in cache as a group: when storing items in cache, tell PLINQO that the query result belongs to a group and specify the name. Grouping pays off when invalidating the cache, because the cache and the actions taken on it stay decoupled when they are in a group. Check out this example:
public ActionResult MyTasks(int userId)
{
// will be separate cache for each user id, group all with name MyTasks
var tasks = db.Task
.ByAssignedId(userId)
.ByStatus(Status.InProgress)
.FromCache(CacheManager.GetProfile().WithGroup("MyTasks"));
return View(tasks);
}
public ActionResult UpdateTask(Task task)
{
db.Task.Attach(task, true);
db.SubmitChanges();
// since we made an update to the tasks table, we expire the MyTasks cache
CacheManager.InvalidateGroup("MyTasks");
}
PLinqO supports the notion of query batching, using a feature called futures, which allow you to take several different queries and send them to the database in a single remote call. This can dramatically reduce the number of remote calls that you make and increase your application performance significantly.
cmiiw ^_^
public IList<Thread> ListAll(int forumid)
{
var threads =
from t in db.Threads
where t.forumid == forumid
select new
{
Thread = t,
Count = t.Post.Count,
Latest = t.Post.OrderByDescending(p=>p.Date).Select(p=>p.Date).FirstOrDefault()
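};
// An anonymous type cannot be returned as IList<Thread>; project into a named class here and return threads.ToList().
}
It should be something like that.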
I think what you're really looking for is this:
var threadsWithPostStats = from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into j
select new { Thread = t, PostCount = j.Count(), LatestPost = j.Max(post => post.date) };
Per your comment and updated question, I'm adding this restatement:
var threadsWithPostsUsers = from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into threadPosts
let latestPostDate = threadPosts.Max(post => post.date)
join post in db.Posts on new { ThreadID = t.threadid, PostDate = latestPostDate } equals new { ThreadID = post.threadid, PostDate = post.date} into latestThreadPosts
let latestThreadPost = latestThreadPosts.First()
join u in db.Users on latestThreadPost.userid equals u.userid
select new { Thread = t, LatestPost = latestThreadPost, User = u };
Wouldn't hurt to get familiar with group by in LINQ and aggregates (Max, Min, Count).
Something like this:
var forums = (from t in db.Threads
group t by t.forumid into g
select new { forumid = g.Key, MaxDate = g.Max(d => d.ForumCreateDate) }).ToList();
Also check out this article for how to count items in a LINQ query with group by:
LINQ to SQL using GROUP BY and COUNT(DISTINCT)
LINQ aggregates:
LINQ Aggregate with Sub-Aggregates
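To make the group join with Count and Max concrete, here is a small LINQ to Objects sketch. The ForumThread, Post and ThreadStats classes are hypothetical stand-ins for the THREADS and POSTS tables. One detail worth noting for in-memory data: Max over an empty sequence of non-nullable values throws, so the sketch casts to DateTime? and lets a thread without posts come back with a null date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the THREADS and POSTS tables.
public class ForumThread
{
    public int ThreadId { get; set; }
    public int ForumId { get; set; }
    public string Title { get; set; } = "";
}

public class Post
{
    public int ThreadId { get; set; }
    public DateTime Date { get; set; }
}

public class ThreadStats
{
    public string Title { get; set; } = "";
    public int PostCount { get; set; }
    public DateTime? LastPost { get; set; }
}

public static class ThreadStatsSketch
{
    // One row per thread in the forum, with its post count and latest post date.
    public static List<ThreadStats> ForForum(
        IEnumerable<ForumThread> threads, IEnumerable<Post> posts, int forumId)
    {
        return (from t in threads
                where t.ForumId == forumId
                join p in posts on t.ThreadId equals p.ThreadId into threadPosts
                select new ThreadStats
                {
                    Title = t.Title,
                    PostCount = threadPosts.Count(),
                    // Nullable Max returns null for a thread with no posts.
                    LastPost = threadPosts.Max(p => (DateTime?)p.Date)
                }).ToList();
    }
}
```

Because the join uses "into", threads without any posts are kept (with a count of 0) rather than dropped, which is usually what a thread listing needs.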

How can I make this LINQ query of an Enumerable DataTable of GTFS data faster?

I'm working with the GTFS data for the New York City MTA subway system. I need to find the stop times for each route at a specific stop. To do that, I get the stop times from a StopTimes DataTable that I have, for a specific stop_id. I only want stop times between now and the next 2 hours.
Then, I need to lookup the trip for each stop time, using the trip_id value. From that trip, I have to lookup the route, using the route_id value, in order to get the route name or number for the stop time.
Here are the counts for each DataTable: StopTimes(522712), Trips(19092), Routes(27).
Right now, this takes anywhere from 20 seconds to 40 seconds to execute. How can I speed this up? Any and all suggestions are appreciated. Thanks!
foreach (var r in StopTimes.OrderBy(z => z.Field<DateTime>("departure_time").TimeOfDay)
.Where(z => z.Field<string>("stop_id") == stopID &&
z["departure_time"].ToString() != "" &&
z.Field<DateTime>("departure_time").TimeOfDay >= DateTime.UtcNow.AddHours(-5).TimeOfDay &&
z.Field<DateTime>("departure_time").TimeOfDay <= DateTime.UtcNow.AddHours(-5).AddHours(2).TimeOfDay))
{
var trip = (from z in Trips
where z.Field<string>("trip_id") == r.Field<string>("trip_id") &&
z["route_id"].ToString() != ""
select z).Single();
var route = (from z in Routes
where z.Field<string>("route_id") == trip.Field<string>("route_id")
select z).Single();
// do stuff (not time-consuming)
}
Try this:
var now = DateTime.UtcNow;
var tod0 = now.AddHours(-5).TimeOfDay;
var tod1 = now.AddHours(-5).AddHours(2).TimeOfDay;
var sts =
from st in StopTimes
let StopID = st.Field<string>("stop_id")
where StopID == stopID
where st["departure_time"].ToString() != ""
let DepartureTime = st.Field<DateTime>("departure_time").TimeOfDay
where DepartureTime >= tod0
where DepartureTime <= tod1
let TripID = st.Field<string>("trip_id")
select new
{
StopID,
TripID,
DepartureTime,
};
Note that there is no orderby in this query and that we're returning an anonymous type. For your "do stuff (not time-consuming)" code to run you may need to add some more properties.
The same approach applies to Trips and Routes.
var ts =
from t in Trips
where t["route_id"].ToString() != ""
let TripID = t.Field<string>("trip_id")
let RouteID = t.Field<string>("route_id")
select new
{
TripID,
RouteID,
};
var rs =
from r in Routes
let RouteID = r.Field<string>("route_id")
select new
{
RouteID,
};
Since you're getting a single record for each lookup, ToDictionary(...) is a good choice.
var tripLookup = ts.ToDictionary(t => t.TripID);
var routeLookup = rs.ToDictionary(r => r.RouteID);
Now your query looks like this:
var query = from StopTime in sts.ToArray()
let Trip = tripLookup[StopTime.TripID]
let Route = routeLookup[Trip.RouteID]
orderby StopTime.DepartureTime
select new
{
StopTime,
Trip,
Route,
};
Notice that I've used .ToArray() and I've put the orderby right at the end.
And you run your code like this:
foreach (var q in query)
{
// do stuff (not time-consuming)
}
Let me know if this helps.
I would make a Dictionary<int, Trip> from Trips where the key is the trip_id, and a Dictionary<int, Route> from Routes where the key is the route_id (use string keys if the ids are stored as strings, as they appear to be in your code). Your code is iterating over the 19092 items in Trips once for every item in the filtered IEnumerable<StopTime>. Same deal for Routes, but at least there are only 27 items in there.
Edit:
Actually, looking at it more closely, the first dictionary would be Dictionary<int, int> where the value is the route_id. And given that each trip_id maps to exactly one route_id, you could just build a Dictionary<trip_id, Route> and do one lookup.
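A rough sketch of that single-lookup dictionary, using plain tuples instead of DataRows so it stays self-contained. String ids are assumed here because the question reads them with Field<string>, and the Name field on a route is a hypothetical column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RouteLookupSketch
{
    // Builds trip_id -> route name, so each stop time needs one dictionary lookup.
    public static Dictionary<string, string> Build(
        IEnumerable<(string TripId, string RouteId)> trips,
        IEnumerable<(string RouteId, string Name)> routes)
    {
        // Small table: index the routes by id first.
        var routeNames = routes.ToDictionary(r => r.RouteId, r => r.Name);

        // Resolve every trip's route once, up front, skipping trips without a route.
        return trips
            .Where(t => !string.IsNullOrEmpty(t.RouteId))
            .ToDictionary(t => t.TripId, t => routeNames[t.RouteId]);
    }
}
```

Building the dictionary costs one pass over Trips; after that, each filtered stop time is resolved with a hash lookup instead of a scan over all 19092 trips.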
It helps to understand deferred query execution so you can make case by case decisions on how to optimize your runtime. Here is a good blog post that can get you started: http://ox.no/posts/linq-vs-loop-a-performance-test
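A tiny sketch of what deferred execution means in practice (the names are made up for the example): a Where query is re-run every time it is enumerated, while ToList runs it once and keeps the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Counts how often the predicate runs with and without materializing the query.
    public static (int LazyRuns, int CachedRuns) CountPredicateRuns()
    {
        var numbers = new[] { 1, 2, 3, 4 };
        int calls = 0;

        // Nothing executes here; the query is only a description of the work.
        var evens = numbers.Where(n => { calls++; return n % 2 == 0; });

        // Each enumeration re-runs the predicate over all four numbers.
        evens.Count();
        evens.Count();
        int lazyRuns = calls;

        calls = 0;
        // ToList executes the query once; later loops read the stored list.
        var cached = evens.ToList();
        int sum = 0;
        foreach (var n in cached) sum += n;
        foreach (var n in cached) sum += n;
        int cachedRuns = calls;

        return (lazyRuns, cachedRuns);
    }
}
```

In the GTFS case this is why the filtered stop times are worth materializing (ToArray or ToList) before looping, and why lookups inside the loop should hit a dictionary rather than a fresh query.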

Adding where clause to nested Linq selects

I'm still new to Linq so if you see something I really shouldn't be doing, please feel free to suggest a change.
I am working on a new system to allow officers to sign up for overtime. Part of the data is displayed on a map with search criteria filtering unwanted positions. In order to make the data easier to work with, it is read into a hierarchy object structure using Linq. In this example, a job can contain multiple shifts and each shift can have multiple positions available. The Linq statement to read them in looks like the following.
var jobs = (from j in db.Job
join s in db.Shift on j.Id equals s.JobId into shifts
select new JobSearchResult
{
JobNumber = j.Id,
Name = j.JobName,
Latitude = j.LocationLatitude,
Longitude = j.LocationLongitude,
Address = j.AddressLine1,
Shifts = (from shift in shifts
join p in db.Position on shift.Id equals p.ShiftId into positions
select new ShiftSearchResult
{
Id = shift.Id,
Title = shift.ShiftTitle,
StartTime = shift.StartTime,
EndTime = shift.EndTime,
Positions = (from position in positions
select new PositionSearchResult
{
Id = position.Id,
Status = position.Status
}).ToList()
}).ToList()
});
That works fine and has been tested. There may be a better way to do it, and if you know of one, feel free to suggest it. My problem is this: after the query is created, search criteria will be added. I know that I could add them when the query is created, but for this it's easier to do it afterwards. Now, I can easily add criteria that look like this.
jobs = jobs.Where(j => j.JobNumber == 1234);
However, I am having trouble figuring out how to do the same for Shifts or Positions. In other words, how would I code it to add the condition that a shift starts after a particular time? The following example is what I am trying to accomplish but will not (obviously) work.
jobs = jobs.Shifts.Where(s => s.StartTime > JobSearch.StartTime) //JobSearch.StartTime is a form variable.
Anyone have any suggestions?
Step 1: create associations so you can have the joins hidden behind EntitySet properties.
http://msdn.microsoft.com/en-us/library/bb629295.aspx
Step 2: construct your filters. You have 3 queryables and the possibility of filter interaction. Specify the innermost filter first so that the outer filters may make use of them.
Here are all jobs (unfiltered). Each job has only the shifts with 3 open positions. Each shift has those open positions.
Expression<Func<Position, bool>> PositionFilterExpression =
p => p.Status == "Open";
Expression<Func<Shift, bool>> ShiftFilterExpression =
s => s.Positions.Where(PositionFilterExpression).Count() == 3;
Expression<Func<Job, bool>> JobFilterExpression =
j => true;
Step 3: put it all together:
List<JobSearchResult> jobs = db.Jobs
.Where(JobFilterExpression)
.Select(j => new JobSearchResult
{
JobNumber = j.Id,
Name = j.JobName,
Latitude = j.LocationLatitude,
Longitude = j.LocationLongitude,
Address = j.AddressLine1,
Shifts = j.Shifts
.Where(ShiftFilterExpression)
.Select(s => new ShiftSearchResult
{
Id = s.Id,
Title = s.ShiftTitle,
StartTime = s.StartTime,
EndTime = s.EndTime,
Positions = s.Positions
.Where(PositionFilterExpression)
.Select(p => new PositionSearchResult
{
Id = p.Id,
Status = p.Status
})
.ToList()
})
.ToList()
})
.ToList();
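If the filter really has to be applied after the results are built, one option is to filter the nested collections in memory. This is only a sketch over simplified, hypothetical result classes (JobResult and ShiftResult stand in for JobSearchResult and ShiftSearchResult), and it runs as LINQ to Objects rather than in the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for JobSearchResult and ShiftSearchResult.
public class ShiftResult
{
    public int Id { get; set; }
    public DateTime StartTime { get; set; }
}

public class JobResult
{
    public int JobNumber { get; set; }
    public List<ShiftResult> Shifts { get; set; } = new List<ShiftResult>();
}

public static class ShiftFilterSketch
{
    // Keeps jobs that have a shift starting after 'earliest',
    // and trims each kept job's shift list to the matching shifts.
    public static List<JobResult> StartingAfter(
        IEnumerable<JobResult> jobs, DateTime earliest)
    {
        return jobs
            .Where(j => j.Shifts.Any(s => s.StartTime > earliest))
            .Select(j => new JobResult
            {
                JobNumber = j.JobNumber,
                Shifts = j.Shifts.Where(s => s.StartTime > earliest).ToList()
            })
            .ToList();
    }
}
```

The trade-off is that all jobs and shifts are fetched first; pushing the filters into the query, as in the steps above, keeps the work on the server.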

LINQ group by month question

I'm new to LINQ to SQL and I would like to know how to achieve something like this in LINQ:
Month   Hires   Terminations
Jan     5       7
Feb     8       8
Mar     8       5
I've got this so far, and I think there is something wrong with it but I'm not sure:
from term1 in HRSystemDB.Terminations
group term1 by new { term1.TerminationDate.Month, term1.TerminationDate.Year } into grpTerm
select new HiresVsTerminationsQuery
{
Date = Criteria.Period,
TerminationsCount = grpTerm.Count(term => term.TerminationDate.Month == Criteria.Period.Value.Month),
HiresCount = (from emp in HRSystemDB.Persons.OfType<Employee>()
group emp by new { emp.HireDate.Month, emp.HireDate.Year } into grpEmp
select grpEmp).Count(e => e.Key.Month == Criteria.Period.Value.Month)
});
Thanks in advance.
I'm not quite sure where the Criteria.Period value comes from in your sample query.
However, I think you're trying to read both hires and terminations for all available months (and then you can easily filter them). Your query could go wrong if the first table (Terminations) didn't include any records for some month (say May). Then the select clause wouldn't be called with "May" as the parameter at all, and even if you had some data in the second table (representing hires), you wouldn't be able to find it.
This can be elegantly solved using the Concat method (see MSDN samples). You could select all terminations and all hires (into a data structure of some type) and then group all the data by month:
var terms = from t in HRSystemDB.Terminations
select new { Month = t.TerminationDate.Month,
Year = t.TerminationDate.Year,
IsHire = false };
var hires = from emp in HRSystemDB.Persons.OfType<Employee>()
select new { Month = emp.HireDate.Month,
Year = emp.HireDate.Year,
IsHire = true };
// Now we can merge the two inputs into one
var summary = terms.Concat(hires);
// And group the data using month or year
var res = from s in summary
group s by new { s.Year, s.Month } into g
select new { Period = g.Key,
Hires = g.Count(info => info.IsHire),
Terminations = g.Count(info => !info.IsHire) };
Looking at the code now, I'm pretty sure there is some shorter way to write this. On the other hand, this code should be quite readable, which is a benefit. Also note that it doesn't matter that we split the code into a couple of sub-queries. Thanks to the lazy evaluation of LINQ to SQL, this should be executed as a single query.
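The Concat-then-group idea can be tried out in memory. The sketch below takes plain lists of dates instead of the HRSystemDB tables, and the MonthSummary class is a hypothetical result type; it shows that a month present in only one of the two inputs still gets a row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MonthSummary
{
    public int Year { get; set; }
    public int Month { get; set; }
    public int Hires { get; set; }
    public int Terminations { get; set; }
}

public static class HiresVsTerminationsSketch
{
    // Merges both event streams, then counts hires and terminations per month.
    public static List<MonthSummary> Summarize(
        IEnumerable<DateTime> hireDates, IEnumerable<DateTime> terminationDates)
    {
        // Both projections have the same anonymous shape, so Concat can merge them.
        var hires = hireDates.Select(d => new { d.Year, d.Month, IsHire = true });
        var terms = terminationDates.Select(d => new { d.Year, d.Month, IsHire = false });

        return hires.Concat(terms)
            .GroupBy(e => new { e.Year, e.Month })
            .OrderBy(g => g.Key.Year).ThenBy(g => g.Key.Month)
            .Select(g => new MonthSummary
            {
                Year = g.Key.Year,
                Month = g.Key.Month,
                Hires = g.Count(e => e.IsHire),
                Terminations = g.Count(e => !e.IsHire)
            })
            .ToList();
    }
}
```

A month with hires but no terminations (or the other way round) appears with a zero in the missing column, which an inner join between two grouped queries would not give you.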
I don't know if it is shorter, but you can also try this version to see if it works better with your server. I don't know exactly how these two answers turn into SQL statements. One might be better depending on your indexes and such.
var terms =
from t in Terminations
group t by new {t.TerminationDate.Month, t.TerminationDate.Year} into g
select new {g.Key, Count = g.Count()};
var hires =
from p in Persons
group p by new {p.HireDate.Month, p.HireDate.Year} into g
select new {g.Key, Count = g.Count()};
var summary =
from t in terms
join h in hires on t.Key equals h.Key
select new {t.Key.Month, t.Key.Year,
Hires = h.Count, Terms = t.Count};
