Find random content in the database - LINQ - C#

I need to find out how to pick a random record from the database to show to the audience, and it must only show one record at a time.
Normally I have done it like this:
cmd1.CommandText = @"SELECT TOP 1 opgaver.id, opgaver.rigtigsvar, opgaver.overskift, opgaver.svar1,
    opgaver.svar2, opgaver.svar3, opgaveLydefiler.mp3
    FROM opgaver
    INNER JOIN opgaveLydefiler ON opgaver.overskift = opgaveLydefiler.navn
    ORDER BY newid()";
The tasks (opgaver) and the task sound files (opgaveLydefiler) belong together, which is why they are joined with an inner join.
I've tried to do it like this, but I can't get it to return only one record and at the same time pick it randomly from the database:
Opgaver opgaver = db.opgavers.FirstOrDefault().Take(1);
EDIT - I have chosen to do it like this:
var random = new Random();
var antalopgaver = db.opgavers.Count();
var number = random.Next(antalopgaver);
var randomlySelectedItem = db.opgavers.Skip(number).Take(1).FirstOrDefault();

Here you go:
var item = db.opgavers.OrderBy(q=>Guid.NewGuid()).FirstOrDefault();

Here is the idea in short.
Let's say you are interested in the top 100 records, for example ordered by the date they were added. Then generate a random number between 0 and 99 like this:
var random = new Random();
var number = random.Next(100);
Then use this number as offset for your query:
var item = db.opgavers.OrderByDescending(e => e.DateAdded).Skip(number).Take(1).FirstOrDefault();
I advise using FirstOrDefault instead of First because that way you can handle, for example, the empty-database case, which is sometimes a valid state.
I used Take(1) because I think it is the safest way to ensure that the query will contain the LIMIT clause. Otherwise some LINQ providers might translate it differently.
If you can't use an ordering like the one I assumed, then as others have pointed out, you could get the number of rows before the query and use it instead of 100. But that's another round trip to the database, which is sometimes OK and sometimes not.
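A minimal sketch of that count-then-offset variant, assuming the same db.opgavers set and an id column as in the SQL above (note that most LINQ providers require an ordering before Skip):
var random = new Random();
var count = db.opgavers.Count();          // the extra round trip to the database

Opgaver item = null;
if (count > 0)
{
    var offset = random.Next(count);      // 0 .. count - 1
    item = db.opgavers
             .OrderBy(o => o.id)          // Skip needs a stable ordering
             .Skip(offset)
             .Take(1)
             .FirstOrDefault();
}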

You can order your records by a random number and then take the first:
var random = new Random();
var randomlySelectedItem =
db.opgavers.Select(o => new { op = o, sort = random.Next(0,10000) })
.OrderBy(obj => obj.sort)
.FirstOrDefault();
Replace the hard-coded 10,000 with a number that roughly corresponds to the count of your items (it does not need to be the exact count).
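A hedged variation that sizes the sort range from the actual count; note that Random.Next cannot be translated to SQL by most providers, so this sketch evaluates the random key in memory (only reasonable for small tables):
var random = new Random();
var count = db.opgavers.Count();

var randomlySelectedItem =
    db.opgavers.AsEnumerable()            // switch to LINQ to Objects for the random sort key
               .Select(o => new { op = o, sort = random.Next(0, count) })
               .OrderBy(x => x.sort)
               .Select(x => x.op)
               .FirstOrDefault();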

Related

Group with LINQ on multiple properties via grouping method

I have the following code:
var linqResults = (from rst in QBModel.ResultsTable
                   group rst by GetGroupRepresentation(rst.CallerZipCode, rst.CallerState) into newGroup
                   select newGroup).ToList();
With the grouping method:
private string[] GetGroupRepresentation(string ZipCode, string State)
{
    string ZipResult;
    if (string.IsNullOrEmpty(ZipCode) || ZipCode.Trim().Length < 3)
        ZipResult = string.Empty;
    else
        ZipResult = ZipCode.Substring(0, 3);
    return new string[] { ZipResult, State };
}
This runs just fine but it does not group at all. QBModel.ResultsTable has 427 records, and after the LINQ query has run linqResults still has 427. In debug I can see duplicates of the same truncated zip code and state name. I'm guessing it has to do with the array I'm returning from the grouping method.
What am I doing wrong here?
If I concatenate the return value of the truncated zip code and state name without using an array I get 84 groupings.
If I strip out the rst.CallerState argument and change the grouping method to:
private string GetGroupRepresentation(string ZipCode)
{
    if (string.IsNullOrEmpty(ZipCode) || ZipCode.Trim().Length < 3)
        return string.Empty;
    return ZipCode.Substring(0, 3);
}
it returns 66 groups.
I don't really want to concatenate the group values, because I want to use them separately later. What I'd like is something kind of like the following (this is wrong as written, since it assumes the array key worked):
List<MapDataSourceRecord> linqResults =
    (from rst in QBModel.ResultsTable
     group rst by GetGroupRepresentation(rst.CallerZipCode, rst.CallerState) into newGroup
     select new MapDataSourceRecord()
     {
         State = ToTitleCase(newGroup.Key[1]),
         ZipCode = newGroup.Key[0],
         Population = GetZipCode3Population(newGroup.Key[0])
     }).ToList();
An array is a reference type, so when GroupBy compares two arrays containing the same values it cannot tell that they are equal, because the references are different. You can read more here.
One solution would be to use a class instead of an array for the result of the method, and another class implementing the IEqualityComparer interface to compare the keys, passing it to the GroupBy method so that the grouping can tell which combinations of ZipCode and State are actually equal. Read more here.
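For example, here is a minimal sketch of the comparer approach; the StringArrayComparer class is hypothetical, GroupBy has to be called in method syntax to accept a comparer, and QBModel.ResultsTable is assumed to be the same strongly typed collection used in your query (requires System.Linq and System.Collections.Generic):
// Hypothetical comparer that treats two string[] keys as equal when their elements match.
class StringArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(string[] obj)
    {
        unchecked
        {
            var hash = 17;
            foreach (var s in obj)
                hash = hash * 31 + (s == null ? 0 : s.GetHashCode());
            return hash;
        }
    }
}

// Query syntax's "group ... by" cannot take a comparer, so use method syntax:
var linqResults = QBModel.ResultsTable
    .GroupBy(rst => GetGroupRepresentation(rst.CallerZipCode, rst.CallerState),
             new StringArrayComparer())
    .ToList();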
Not sure if this will work because I can't replicate your code,
but maybe it would be easier to compute a group key and your string[] as separate properties before you do the grouping, like this:
var linqdatacleanup = QBModel.ResultsTable.Select(x =>
    new
    {
        value = x,
        Representation = GetGroupRepresentation(x.CallerZipCode, x.CallerState),
        GroupKey = GetGroupRepresentationKey(x.CallerZipCode, x.CallerState)
    }).ToList();
So GetGroupRepresentationKey returns a single string and your GetGroupRepresentation still returns your string[].
This will allow you to do your grouping on this dataset and access your data as you wanted.
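As a rough sketch of what that follow-up grouping could look like, building on the linqdatacleanup list above:
var grouped = linqdatacleanup
    .GroupBy(x => x.GroupKey)
    .Select(g => new
    {
        Representation = g.First().Representation,   // the string[] is still available per group
        Items = g.Select(x => x.value).ToList()
    })
    .ToList();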
But before you spend too much time on this, check this Stack Overflow question; maybe it will help:
GroupBy on complex object (e.g. List<T>)

How to skip last 2 records and get all other records with linq?

I have a table called Test:
Test: Id, CreatedBy, CreatedDate
Now I want to get the list of tests but skip the last 2. So if I have, say, 10 tests, I want to get tests 1-8 and skip tests 9 and 10.
This is how I am trying to do that:
var query = context.Test.OrderByDescending(t=>t.Id).Skip(2) // How to take other records?
In this case: Take(8)
With Take and Skip you can get any range you want.
E.g.:
var query = context.Test.OrderByDescending(t=>t.Id);
var allButTheLastTwoElements = query.Take(query.Count() - 2);
Safest way:
var query = context.Test.OrderByDescending(t=>t.Id).ToList();
var allButTheLastTwoElements = query.Take(Math.Max(0,query.Count() - 2));
Or you could just do it the other way around (depending on your requirements):
var query = context.Test.OrderBy(t => t.Id).Skip(2);
If records size is not fixed, you would use:
test.Take(test.Count-2);
//If records are already sorted in the order you like,
or
test.Where(t=>t.ID <= test.Max(m=>m.ID)-2);
//Where ID is a unique key and the list may not be sorted by id
//This will return the lowest 8 ID even if the list is sorted by address or whatever.
What you need is very simple, you don't even need to use Take or query the database twice.
If you OrderByDescending and Skip the first N elements, then you're taking all the remaining elements by default. So you can just do this:
var query = context.Test.OrderByDescending(t=>t.Id).Skip(2);
Docs:
Bypasses a specified number of elements in a sequence and then returns
the remaining elements.
If you don't really intend to defer the execution or to append additional query logic, then calling .ToList() at the end (which actually executes the query against the database) makes sense.
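Putting that together, a minimal sketch using the context.Test set from the question:
// Deferred query: nothing runs against the database until it is enumerated.
var allButLastTwo = context.Test
    .OrderByDescending(t => t.Id)
    .Skip(2);

// Materialize the results when you actually need them.
var results = allButLastTwo.ToList();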

EF - A proper way to search several items in database

I have about 100 items (allRights) in the database and about 10 ids to search for (inputRightsIds). Which is better: first getting all rights and then searching through them in memory (Variant 1), or making 10 separate requests to the database (Variant 2)?
Here is some example code:
DbContext db = new DbContext();
int[] inputRightsIds = new int[10]{...};
Variant 1
var allRights = db.Rights.ToList();
foreach (var right in allRights)
{
    for (int i = 0; i < inputRightsIds.Length; i++)
    {
        if (inputRightsIds[i] == right.Id)
        {
            // Do something
        }
    }
}
Variant 2
for (int i = 0; i < inputRightsIds.Length; i++)
{
    if (db.Rights.Any(r => r.Id == inputRightsIds[i]))
    {
        // Do something
    }
}
Thanks in advance!
As others have already stated, you should do the following.
var matchingIds = from r in db.Rights
                  where inputRightsIds.Contains(r.Id)
                  select r.Id;

foreach (var id in matchingIds)
{
    // Do something
}
But this is different from both of your approaches. In your first approach you are making one SQL call to the DB that returns more results than you are interested in. The second makes multiple SQL calls, each returning part of the information you want. The query above makes one SQL call to the DB and returns only the data you are interested in. This is the best approach, as it avoids the two bottlenecks of making multiple calls to the DB and of having too much data returned.
You can use the following:
db.Rights.Where(right => inputRightsIds.Contains(right.Id));
The two variants should run at very similar speeds, since both must enumerate the collections the same number of times. There might be subtle differences in speed between the two depending on the input data, but in general I would go with Variant 2. I think you should almost always prefer LINQ over manual enumeration when possible. Also consider using the following LINQ statement to simplify the whole search to a single line.
var matches = db.Rights.Where(r => inputRightsIds.Contains(r.Id));
...//Do stuff with matches
Don't forget to pull all your items into memory if you need to process the list further:
var itemsFromDatabase = db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList();
Or you could even enumerate through the collection and do some work on each item:
db.Rights.Where(r => inputRightsIds.Contains(r.Id)).ToList().ForEach(item => {
    // your code here
});

Would there be any performance difference between looping over every row of a dataset and over the same dataset in list form

I need to loop every row of a dataset 100k times.
This dataset contains 1 Primary key and another string column. Dataset has 600k rows.
So at the moment I am looping like this:
for (int i = 0; i < dsProductNameInfo.Tables[0].Rows.Count; i++)
{
    for (int k = 0; k < dsFull.Tables[0].Rows.Count; k++)
    {
    }
}
Now dsProductNameInfo has 100k rows and dsFull has 600k rows. Should I convert dsFull to a list of key/value pairs and loop over that, or would there not be any speed difference?
What solution would work fastest ?
Thank you.
C# 4.0 WPF application
In the exact scenario you mentioned, the performance would be the same except converting to the list would take some time and cause the list approach to be slower. You can easily find out by writing a unit test and timing it.
I would think it'd be best to do this:
// create a class for each type of object you're going to be dealing with
public class ProductNameInformation { ... }
public class Product { ... }
// load a list from a SqlDataReader (much faster than loading a DataSet)
List<Product> products = GetProductsUsingSqlDataReader(); // don't actually call it that :)
// The only thing I can think of where DataSets are better is indexing certain columns.
// So if you have indices, just emulate them with a hashtable:
Dictionary<string, Product> index1 = products.ToDictionary( ... );
Here are references to the SqlDataReader and ToDictionary concepts that you may or may not be familiar with.
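As a rough sketch of what that loader could look like with a SqlDataReader (the connection string, the SQL, and the Id/Name properties assumed on Product are placeholders for whatever your real schema has):
using System.Collections.Generic;
using System.Data.SqlClient;

public static List<Product> GetProductsUsingSqlDataReader()
{
    var products = new List<Product>();
    using (var connection = new SqlConnection("<your connection string>"))
    using (var command = new SqlCommand("SELECT Id, Name FROM Products", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Rows are streamed one at a time instead of being buffered in a DataSet.
                products.Add(new Product
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1)
                });
            }
        }
    }
    return products;
}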
The real question is, why isn't this kind of heavy processing done at the database layer? SQL servers are much more optimized for this type of work. Also, you may not have to actually do this at all; why don't you post the original problem, and maybe we can help you optimize it at a deeper level?
HTH
There might be quite a few things that could be optimized that are not related to the looping. E.g. reducing the number of iterations would yield a lot: at present the body of the inner loop is executed 100k * 600k times, so eliminating one iteration of the outer loop eliminates 600k iterations of the inner loop (or you might be able to swap the inner and outer loops if it's easier to remove iterations from the inner loop).
One thing that you could do in any case is only index once for each table:
var productNameInfoRows = dsProductNameInfo.Tables[0].Rows;
var productInfoCount = productNameInfoRows.Count;
var fullRows = dsFull.Tables[0].Rows;
var fullCount = fullRows.Count;
for (int i = 0; i < productInfoCount; i++)
{
    for (int k = 0; k < fullCount; k++)
    {
    }
}
Inside the loops you'd get to the rows with productNameInfoRows[i] and fullRows[k], which is faster than using the long-hand version. I'm guessing there might be more to gain from optimizing the body than from the way you are looping over the collections, unless of course you have already profiled the code and found the actual looping to be the bottleneck.
EDIT After reading your comment to Marc about what you are trying to accomplish, here's a go at how you could do it. It's worth noting that the algorithm below is probabilistic: there's a 1 in 2^32 chance of two words being seen as equal without actually being equal. It is, however, a lot faster than comparing strings.
The code assumes that the first column is the one you are comparing.
//store all the values that will not change through the execution for faster access
var productNameInfoRows = dsProductNameInfo.Tables[0].Rows;
var fullRows = dsFull.Tables[0].Rows;
var productInfoCount = productNameInfoRows.Count;
var fullCount = fullRows.Count;
var full = new List<int[]>(fullCount);
for (int i = 0; i < productInfoCount; i++)
{
    //we're going to compare hash codes and not strings
    var prd = productNameInfoRows[i][0].ToString().Split(';')
        .Select(s => s.GetHashCode()).OrderBy(t => t).ToArray();
    for (int k = 0; k < fullCount; k++)
    {
        //cache the calculation for all subsequent iterations of the outer loop
        if (i == 0)
        {
            full.Add(fullRows[k][0].ToString().Split(';')
                .Select(s => s.GetHashCode()).OrderBy(t => t).ToArray());
        }
        var fl = full[k];
        var count = 0;
        for (var j = 0; j < fl.Length; j++)
        {
            var f = fl[j];
            //the values are sorted so we can exit early
            for (var m = 0; m < prd.Length && prd[m] <= f; m++)
            {
                count += prd[m] == f ? 1 : 0;
            }
        }
        //treat the rows as a match when roughly 60% or more of the words overlap
        if ((2.0 * count) / (fl.Length + prd.Length) >= 0.6)
        {
            //there's a match
        }
    }
}
EDIT Your comment motivated me to give it another try. The code below could have fewer iterations. "Could have" because it depends on the number of matches and the number of unique words: a lot of unique words with a lot of matches for each (which would require a LOT of words per column) could potentially yield more iterations. However, under the assumption that each row has few words, this should yield substantially fewer iterations. Your code has N*M complexity; this has roughly N + M + (matches * productInfoMatches * fullMatches). In other words, the latter term would have to approach 99999 * 600k for this to have more iterations than yours.
//store all the values that will not change through the execution for faster access
var productNameInfoRows = dsProductNameInfo.Tables[0].Rows;
var fullRows = dsFull.Tables[0].Rows;
var productInfoCount = productNameInfoRows.Count;
var fullCount = fullRows.Count;

//Create a list of the words from the product info
var lists = new Dictionary<int, Tuple<List<int>, List<int>>>(productInfoCount * 3);
for (var i = 0; i < productInfoCount; i++)
{
    foreach (var token in productNameInfoRows[i][0].ToString().Split(';')
        .Select(p => p.GetHashCode()))
    {
        if (!lists.ContainsKey(token))
        {
            lists.Add(token, Tuple.Create(new List<int>(), new List<int>()));
        }
        lists[token].Item1.Add(i);
    }
}

//Pair words from full with those from productinfo
for (var i = 0; i < fullCount; i++)
{
    foreach (var token in fullRows[i][0].ToString().Split(';')
        .Select(p => p.GetHashCode()))
    {
        if (lists.ContainsKey(token))
        {
            lists[token].Item2.Add(i);
        }
    }
}

//Count all matches for each pair of rows
var counts = new Dictionary<int, Dictionary<int, int>>();
foreach (var key in lists.Keys)
{
    foreach (var p in lists[key].Item1)
    {
        if (!counts.ContainsKey(p))
        {
            counts.Add(p, new Dictionary<int, int>());
        }
        foreach (var f in lists[key].Item2)
        {
            var dic = counts[p];
            if (!dic.ContainsKey(f))
            {
                dic.Add(f, 0);
            }
            dic[f]++;
        }
    }
}
If performance is the critical factor, then I would suggest trying an array of structs; this has minimal indirection (DataSet/DataTable has quite a lot of indirection). You mention KeyValuePair, and that would work, although it might not necessarily be my first choice. Milimetric is right to say that there is an overhead if you create a DataSet first and then build an array/list from that; however, even then the time saved when looping may exceed the build time. If you can restructure the load to remove the DataSet completely, great.
I would also look carefully at the loops to see if anything could reduce the actual work needed; for example, would building a dictionary/grouping allow faster lookups? Would sorting allow a binary search? Can any operations be pre-aggregated and applied at a higher level (with fewer rows)?
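For instance, if the inner loop is matching rows on the string column, a dictionary built once over the 600k rows turns the nested scan into a single pass plus lookups; a sketch under that assumption (the "Name" column is a hypothetical placeholder):
// Group dsFull's rows once by the string column (column name is a placeholder).
var fullByName = dsFull.Tables[0].Rows
    .Cast<DataRow>()
    .GroupBy(r => Convert.ToString(r["Name"]))
    .ToDictionary(g => g.Key, g => g.ToList());

foreach (DataRow productRow in dsProductNameInfo.Tables[0].Rows)
{
    List<DataRow> matches;
    if (fullByName.TryGetValue(Convert.ToString(productRow["Name"]), out matches))
    {
        // only the rows that actually match are touched here
    }
}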
What are you doing with the data inside the nested loop?
Is the source of your datasets a SQL database? If so, the best possible performance you could get would be to perform your calculation in SQL using an inner join and return the result to .net.
Another alternative would be to use the dataset's built in querying methods that act like SQL, but in-memory.
If neither of those options are appropriate, you would get a performance improvement by retrieving the 'full' dataset as a DataReader and looping over it as the outer loop. A dataset loads all of the data from SQL into memory in one hit. With 600k rows, this will take up a lot of memory! Whereas a DataReader will keep the connection to the DB open and stream rows as they are read. Once you have read a row the memory will be reused/reclaimed by the garbage collector.
In your comment reply to my earlier answer you said that both datasets are essentially lists of strings, and that each string is effectively a delimited list of tags. I would first look to normalise the CSV strings in the database, i.e. split the CSVs, add the values to a tag table, and link from the product to the tags via a link table.
You can then quite easily create a SQL statement that will do your matching according to the link records rather than by string (which will be more performant in its own right).
The issue you would then have is that if your sub-set product list needs to be passed into SQL from .net, you would need to call the SP 100k times. Thankfully SQL Server 2008 R2 introduced table types. You could define a table type in your database with one column to hold your product ID, have your SP accept that as an input parameter, and then perform an inner join between your actual tables and your table parameter. I've used this in my own project with very large datasets and the performance gain was massive.
On the .net side you can create a DataTable matching the structure of the SQL table type and then pass that as a command parameter when calling your SP (once!).
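A hedged sketch of the .net side; the table type dbo.ProductIdList, the stored procedure dbo.MatchProducts, and the productIds list are placeholders standing in for your real names:
using System.Data;
using System.Data.SqlClient;

// Build a DataTable that mirrors the SQL table type (one int column).
var idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (var id in productIds)          // productIds: the product ids you already have in memory
    idTable.Rows.Add(id);

using (var connection = new SqlConnection("<your connection string>"))
using (var command = new SqlCommand("dbo.MatchProducts", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@ProductIds", idTable);
    parameter.SqlDbType = SqlDbType.Structured;   // marks it as a table-valued parameter
    parameter.TypeName = "dbo.ProductIdList";

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume the matched rows
        }
    }
}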
This article shows you how to do both the SQL and .net sides. http://www.mssqltips.com/sqlservertip/2112/table-value-parameters-in-sql-server-2008-and-net-c/

Is it possible to use Take in LINQ when the query returns an anonymous type?

I'm trying to get the n-th element out of a list of anonymous types returned by a LINQ query, where n is a random number from 0 to 100. I've messed around with it for a while now and I'm not getting anywhere. My code (with names changed to protect IP):
var query = from Table1 t1 in theContext.Table1
            join Table2 t2 in theContext.Table2
                on ...
            where ...
            select new
            {
                partNum = t1.part_number,
                partSource = t2.part_source
            };
int num = new Random().Next(0, 100);
// here's where the code I've tried fails
Can I somehow do a Take<T>(100).ToList<T>()[num] to get a single anonymous type with partNum and partSource? I ended up solving this by explicitly defining a type, but it seemed like I was missing a more elegant solution here. All I want to do is return a Dictionary<string, string> to the caller so I'd prefer not to have to define a type outside of this method.
Update: ElementAt doesn't work for this. I tried adding:
// get a random part from the parts list
int num = new Random().Next(0, query.Count() - 1 );
var nthElement = query.ElementAt(num);
And I got an exception: The query operator 'ElementAt' is not supported.
You should be able to use:
var item = query.Take(100).ToList()[num];
Of course, it would be more efficient to do:
var item = query.Skip(num).First();
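If the end goal is the Dictionary<string, string> mentioned in the question, the anonymous result can be mapped after it comes back; a minimal sketch, assuming both selected columns are strings:
int num = new Random().Next(0, 100);
var item = query.Take(100).ToList()        // materialize up to 100 rows
                .ElementAtOrDefault(num);  // then pick the n-th in memory

var result = new Dictionary<string, string>();
if (item != null)
{
    result["partNum"] = item.partNum;
    result["partSource"] = item.partSource;
}
return result;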
I believe you just want the ElementAt extension method:
var nthElement = query.ElementAt(num);
No need to mess with Take queries or such, and certainly not ToList.
