I am developing a website where records are not displayed sequentially; they are displayed in random order.
I use MySQL's RAND() with a parameterized LIMIT to fetch the next batch of random records page by page.
I am using the ASP.NET MVC framework with a MySQL database.
Here is the MySQL query:
select distinct
    lw.Lawyer_id,
    lw.name,
    StateName,
    ct.city,
    Date_of_registration,
    lawyer_views,
    fl.practice
from registration lw
left join states st ON st.Id = lw.State_Id
left join city ct ON ct.id = lw.City_Id
left join total_views lwv ON lwv.l_id = lw.L_id
left join rsuper rsub ON rsub.l_id = lw.L_Id
left join lfilter fl ON fl.L_Id = lw.L_Id
where lw.City_Id = '577'
    and rsub.special_id = 1
    and lw.status = 'Active'
    and lw.L_id != 1
    and lw.service = 'Free'
order by rand()
limit start, pageSize
In this query, pageSize is the number of records per page, which is 18, and start changes with the request parameter (its default value is 0).
It returns 18 random records per AJAX request, but the problem is that some records are duplicated across requests.
Please tell me how to prevent this, or suggest a better solution. I have also tried alternatives to the RAND() function, but they did not help.
I used this tutorial to improve the random-record selection:
https://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/
Look here: http://jan.kneschke.de/projects/mysql/order-by-rand/
It covers research into how ORDER BY RAND() selection works; maybe one of the solutions there can solve your problem.
P.S. I can't make comments yet, so I'm writing this as a post, but it really belongs in the comments.
UPD: also look at MySQL select 10 random rows from 600K rows fast.
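For the duplicate-rows problem specifically, one common trick is to seed RAND(): ORDER BY RAND(N) returns the same order every time for a fixed N, so LIMIT paging walks one stable shuffle and never repeats rows. A minimal sketch, assuming the MySql.Data provider and a simplified version of your query (the method shape, connection string, and session-held seed are placeholders, not your actual code):

using System.Collections.Generic;
using MySql.Data.MySqlClient;

public static class RandomPaging
{
    // Fetch one page of a stable shuffle: RAND(seed) yields the same row
    // order for the same seed, so LIMIT paging never repeats records.
    public static List<int> GetRandomPage(string connStr, int seed, int start, int pageSize)
    {
        const string sql = @"
            select distinct lw.Lawyer_id
            from registration lw
            where lw.status = 'Active' and lw.service = 'Free'
            order by rand(@seed)
            limit @start, @pageSize";

        var ids = new List<int>();
        using (var conn = new MySqlConnection(connStr))
        using (var cmd = new MySqlCommand(sql, conn))
        {
            // Generate the seed once per visitor (e.g. on the first AJAX
            // call), keep it in session, and reuse it for every next page.
            cmd.Parameters.AddWithValue("@seed", seed);
            cmd.Parameters.AddWithValue("@start", start);
            cmd.Parameters.AddWithValue("@pageSize", pageSize);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    ids.Add(reader.GetInt32(0));
        }
        return ids;
    }
}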
I am doing a school project and need help with one last problem I am having.
Currently I am trying to do a many-to-many join and then fill an IEnumerable with the result, using LINQ and lambda syntax.
The purpose is to show the compatible games alongside each product.
My code as of now:
else
{
var result = (from g in db.Games
join gs in db.GameSize
on g.GameId equals gs.GameId
join s in db.Size
on gs.SizeId equals s.SizeId
join p in db.Product
on s.SizeId equals p.SizeId
select p.Size.Name);
games = db.Games
.Where(game => game.GameSize
.All(s => s.Size.Name == result.FirstOrDefault()));
}
My idea is to join through the tables, find the game IDs that have a matching product ID, and then add those games to "games".
I am aware that this table design is horrible and that I am only getting the first result in the list with FirstOrDefault().
Does anyone have a suggestion or solution to help me? Thanks.
Please ask if I am not making any sense.
Essentially I just want to show the games linked to a size. My tables look like this:
--SIZE
insert into size values ('Large')
insert into size values ('Medium')
insert into size values ('Small')
--GAMES
insert into games values ('Magic The Gathering')
insert into games values ('Pokemon')
insert into games values ('Dead of Winter')
--GAMESIZE (RELATION GAMES AND SIZE) (SIZEID, GAMEID)
insert into gamesize values (1, 1)
insert into gamesize values (2, 2)
insert into gamesize values (2, 3)
First, you should really move on from the LINQ to SQL syntax. The EF syntax is much easier and more readable. For example:
var result = db.Games.Include("GameSize.Size.Product").Select(m => m.Size.Name);
Second, All doesn't do what you seem to think it does. It returns a boolean: true if all the items match the condition, false if not. It's very unlikely that all your GameSizes have the same Size.Name. You might be looking for Any here.
Third, this whole thing seems counter-intuitive. You're getting all the games and selecting all the size names, then using that list of size names to select games that have those size names. In other words, you're doing an extra query to get results you already had. Remove the Select from result and use that for games instead.
Long and short, if I'm understanding your code properly, you can reduce all of this to just one simple line:
games = db.Games.Include("GameSize.Size.Product");
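And if you ever do need to narrow games to one particular size, a minimal sketch of the Any approach from the second point above (targetSizeName is a placeholder value, and GameSize is the navigation collection from your own code):

string targetSizeName = "Medium"; // placeholder
// Keep a game when at least one of its GameSize rows points at that size.
var games = db.Games
    .Where(g => g.GameSize.Any(gs => gs.Size.Name == targetSizeName));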
I'm using Entity Framework 6, and I make LINQ queries from an ASP.NET server to an Azure SQL database.
I need to retrieve the latest 20 rows that satisfy a certain condition.
Here's a rough example of my query:
using (PostHubDbContext postHubDbContext = new PostHubDbContext())
{
DbGeography location = DbGeography.FromText(string.Format("POINT({1} {0})", latitude, longitude));
IQueryable<Post> postQueryable =
from postDbEntry in postHubDbContext.PostDbEntries
orderby postDbEntry.Id descending
where postDbEntry.OriginDbGeography.Distance(location) < (DistanceConstant)
select new Post(postDbEntry);
postQueryable = postQueryable.Take(20);
IOrderedQueryable<Post> postOrderedQueryable = postQueryable.OrderBy(Post => Post.DatePosted);
return postOrderedQueryable.ToList();
}
The question is: what if I literally have a billion rows in my database? Will that query brutally select the millions of rows that meet the condition and then take 20 of them, or will it be smart and realise that I only want 20 rows, and so select only 20?
Basically, how do I make this query work efficiently against a database that has a billion rows?
According to http://msdn.microsoft.com/en-us/library/bb882641.aspx, the Take() function uses deferred streaming execution, as does the select. That means it is equivalent to TOP 20 in SQL, and SQL Server will fetch only 20 rows from the database.
This link shows that Take has a direct translation in LINQ to SQL: http://msdn.microsoft.com/en-us/library/bb399342(v=vs.110).aspx
So the only performance gains to be made are in the database. As @usr suggested, you can use indexes to increase performance. Storing the table in sorted order also helps a lot (which is likely your case, since you sort by Id).
Why not try it? :) You can inspect the generated SQL, then look at its execution plan and see whether it scans the entire table.
Check out this question for the details:
How do I view the SQL generated by the Entity Framework?
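With EF6 specifically, you can also dump the generated SQL straight from the context. A minimal sketch using the built-in Database.Log hook and the context type from the question:

using (var ctx = new PostHubDbContext())
{
    // EF6 passes every SQL statement it generates to this delegate.
    ctx.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);

    var posts = ctx.PostDbEntries.Take(20).ToList();
    // Look for a SELECT TOP (20) ... statement in the debug output.
}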
This will be hard to get really fast. You want one index to give you the sort order on Id, but a different (spatial) index for efficient filtering. No single index can serve both goals efficiently.
Assume both indexes exist:
If the filter is very selective, expect SQL Server to select all rows where the filter is true, sort them, and give you the top 20. Imagine only 21 rows pass the filter; then this strategy is clearly very efficient.
If the filter is not at all selective, SQL Server will instead traverse the table ordered by Id, test each row it comes across, and output the first 20. Imagine the filter applies to all rows; then SQL Server can just output the first 20 rows it sees. Very fast.
So at 100% or 0% selectivity the query will be fast. In between there are nasty mixtures; if that's your situation, the question requires further thought. You probably need more than a clever indexing strategy; you need app changes.
Btw, you don't need an index on DatePosted. The sort by DatePosted happens only after the set has been limited to 20 rows, and you don't need an index to sort 20 rows.
I have a LINQ query that takes 11 minutes to execute against SQL Server 2008. I used SQL Server Profiler to find the query that was taking so long, and then ran it on its own against my database.
I also removed all the parameters, inlined the values directly, and ran the query again. It took less than 1 second to execute!
I have googled and found that using parameters can really hurt performance, because the plan is compiled before the values in the WHERE clause are known.
Since LINQ to SQL always runs parameterized SQL, what can I do to improve performance in this case?
I haven't found anything to improve regarding indexes on the columns. The first table in the inner join has 192,014 rows, and the SQL without parameters takes less than a second to execute. Screenshots of the execution plans are attached.
Edits are below the screenshots.
This is the LINQ query:
var criteria = CreateBaseCriteria();
var wordsGroup = from word in QueryExecutor.GetSearchWords()
join searchEntry in QueryExecutor.GetReportData(criteria) on (word.SearchID + 100000000) equals searchEntry.EventId
group searchEntry by word.SearchWord into wg
select new SearchAggregate
{
Value = wg.Key,
FirstTime = wg.Min(l => l.EventTime),
LastTime = wg.Max(l => l.EventTime),
AverageHits = wg.Average(l => l.NumberOfHits.HasValue ? l.NumberOfHits.Value : 0),
Count = wg.Count()
};
return wordsGroup.OrderByDescending(w => w.Count).Take(maxRows);
Edit: the screenshots came out a little small here. There are only 5 parameters in the parameterized SQL.
Edit 2: it is the inner join condition with parameter @p0 that causes the execution plan to change. When I replace just the @p0 variable with its literal value, the query runs in less than a second. If this value turns out to be constant in all cases (I have to investigate that), can I do anything so it isn't passed as a parameter?
Management Studio is advising you, in green above the query plan, to create a missing index. Try that first.
I found a way to work around the statement that was causing the execution time to blow up:
on (word.SearchID + 100000000) equals searchEntry.EventId
What I did was add a computed column, [SearchIdUnique] AS ([SearchID]+(100000000)), to the table. Then I could change my LINQ query to this:
on word.SearchIdUnique equals searchEntry.EventId
Query execution is down to less than a second. Issue solved.
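For reference, a hedged sketch of that computed-column setup as a one-time change (the SearchWords table name and the MyDataContext type are my assumptions; PERSISTED plus an index lets SQL Server seek on the precomputed value instead of evaluating SearchID + 100000000 per row):

using (var dc = new MyDataContext()) // hypothetical LINQ to SQL DataContext
{
    // Add the precomputed join key, stored on disk so it can be indexed.
    dc.ExecuteCommand(
        @"ALTER TABLE SearchWords
          ADD SearchIdUnique AS (SearchID + 100000000) PERSISTED");
    // Index it so the join can seek instead of scan.
    dc.ExecuteCommand(
        @"CREATE INDEX IX_SearchWords_SearchIdUnique
          ON SearchWords (SearchIdUnique)");
}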
I have a page that I'm trying to spice up a bit.
Basically, my current SQL SELECT is generated dynamically from options chosen on the page.
However, I'd like the default sort order to be random, but from what I've read I should avoid both nested subqueries in MySQL and ORDER BY RAND().
But what if I ORDER BY RAND() over only the limited number of records returned (fewer than 50) via a nested subquery?
Is this a bad idea? And would it be a problem with a larger table of, say, 2 million+ rows?
Here's what I'm thinking of doing:
select * from (
    select a.*, b.*
    from a
    left join b on b.id = a.id
    where a.isactive and etc etc
    order by a.id desc -- select, say, page 2 of the 50 newest records
    limit 50, 50
) tblrand order by rand() -- randomize the order of those 50 records
EDIT: changed the inner order by a.displayorder to order by a.id DESC to better reflect the situation, since the default sort order (if none is specified) is newest records first (i.e. id sorted in descending order), and added some comments.
SOLVED (however, I'd still like to know how the subquery approach would pan out, i.e. how bad it would be performance-wise for MySQL; thanks in advance):
Thanks dbsman, I was thinking about the problem in a different way.
At first I wanted totally random rows which, as I explained, would be a pain in the butt (I got mixed up in what I wanted to do myself :)
But I think I found a solution that works like a charm for my problem, doing it on the application side:
// code that gets Dataset ds goes here
var rand = new Random();
// shuffle the rows by ordering on a random key, then rebind
var result = ds.Tables[0].AsEnumerable().OrderBy(r => rand.Next());
lst.DataSource = result.CopyToDataTable();
I simply cannot get this to work at all, so any expert help would be very much appreciated.
I'm trying (as the subject suggests) to join two DataTables on zip code, but return a table grouped by state with a SUM() of sales.
Here's the latest version of my troubles:
var results =(
from a in dtList.AsEnumerable()
join b in dtListCoded.AsEnumerable()
on a.Field<string>("ZIP") equals b.Field<string>("zip")
group a by {a.Field<string>("StateCode")} into g
select new {
StateCode = a.Field<string>("StateCode"),
SumSales = b.Sum(b => b.Field<double>("SUMSales"))
});
I can join the two tables, but getting the result I need is the tricky bit. If need be I will just do two queries, but that seems a bit backward.
Thanks in advance.
Two queries wouldn't be any slower (with deferred execution the second query composes over the first, so nothing runs until you enumerate the result), and they would be a lot more readable, easier to debug, and reusable. I'd recommend breaking it down.
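A hedged sketch of what that breakdown could look like, using the column names from your snippet (both queries are deferred, so nothing executes until the result is enumerated):

// Requires System.Linq and System.Data.DataSetExtensions (AsEnumerable/Field).
// Step 1: join the two DataTables on zip code, keeping just what we need.
var joined = from a in dtList.AsEnumerable()
             join b in dtListCoded.AsEnumerable()
                 on a.Field<string>("ZIP") equals b.Field<string>("zip")
             select new
             {
                 State = a.Field<string>("StateCode"),
                 Sales = b.Field<double>("SUMSales")
             };

// Step 2: group the joined rows by state and total the sales.
var results = from row in joined
              group row by row.State into g
              select new { StateCode = g.Key, SumSales = g.Sum(r => r.Sales) };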