What is the best (and fastest) way to retrieve random rows using LINQ to SQL with unique data / no duplicate records? I'd prefer to do it in one statement; is that possible?
I found this relevant question, but I don't think that approach results in unique records.
I have tried this so far:
//first approach
AirAsiaDataContext LinqDataCtx = new AirAsiaDataContext();
var tes = (from u in LinqDataCtx.users.AsEnumerable()
orderby Guid.NewGuid()
select u).Take(5);
//second approach
var usr = from u in LinqDataCtx.users
select u;
int count = usr.Count(); // 1st round-trip
int index = new Random().Next(count);
List<user> tes2 = new List<user>();
for (int i = 0; i < 5; i++)
{
tes2.Add(usr.Skip(index).FirstOrDefault()); // 2nd round-trip
}
As you can see above, I have tried two solutions. They work, but the code above does not return unique records; there is a chance of duplicates.
db.TableName.OrderBy(x=>Guid.NewGuid()).FirstOrDefault();
If you want unique data with no duplicate records,
you'd be better off using another list to store the rows you have already taken.
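As a rough sketch of one way to guarantee no duplicates once the rows are in memory: the helper below uses a partial Fisher-Yates shuffle, which is a different technique from keeping a separate list of already-taken rows. The names `RandomPicker` and `TakeRandomUnique` are made up for illustration, and the sketch assumes the table is small enough to load completely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LINQ to SQL.
public static class RandomPicker
{
    // Returns up to `count` distinct elements of `source` in random order.
    // A partial Fisher-Yates shuffle moves each picked element to the front,
    // so the same element can never be picked twice.
    public static List<T> TakeRandomUnique<T>(IEnumerable<T> source, int count, Random rng)
    {
        var items = source.ToList();
        int n = Math.Max(0, Math.Min(count, items.Count));
        for (int i = 0; i < n; i++)
        {
            // Swap position i with a random position in [i, items.Count).
            int j = rng.Next(i, items.Count);
            (items[i], items[j]) = (items[j], items[i]);
        }
        return items.GetRange(0, n);
    }
}
```

It could be called as `RandomPicker.TakeRandomUnique(LinqDataCtx.users, 5, new Random())`. This costs one round-trip, but it pulls every row to the client first.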
I have a DataTable in memory and I need to select some records from it, walk through the records making changes to fields, and then save the changes back to the DataTable. I can do this with filters, views, and SQL, but I'm trying to do it in LINQ.
var results = (from rows in dtTheRows.AsEnumerable()
select new
{
rows.Job,
}).Distinct();
foreach (var row in results)
{
firstRow = true;
thisOnHand = 0;
var here = from thisRow in dtTheRows.AsEnumerable()
orderby thisRow.PromisedDate
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
foreach(var theRow in here)
{
// business logic here ...
theRow.OnHandQuantity = 5;
} // foreach ...
The first LINQ query and foreach get the list of subsets of data to be considered. I include them here in case they are relevant. My problem is at this line:
theRow.OnHandQuantity = 5;
My error is:
"Error 19 Property or indexer 'AnonymousType#1.OnHandQuantity' cannot be assigned to -- it is read only"
What am I missing here? Can I update this query back into the original datatable?
var here = from thisRow in dtTheRows.AsEnumerable()
orderby thisRow.PromisedDate
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
Instead of selecting three fields into a new object, select thisRow itself. That should resolve the error on the statement theRow.OnHandQuantity = 5;
The error is self-descriptive: you can't modify an anonymous type, because its properties are read-only. You have to return the original entity you want to modify from your query.
select thisRow;
instead of
select new
{
thisRow.OnHandQuantity,
thisRow.Balance,
thisRow.RemainingQuantity
};
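To illustrate the point, here is a minimal sketch using an untyped DataTable. The question appears to use a typed one, so the string column names and the `RowUpdater` class are assumptions. Because the query yields the DataRow objects themselves, assignments inside the loop write straight back into the table.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper for illustration only.
public static class RowUpdater
{
    // Visits rows in PromisedDate order and sets OnHandQuantity on each one.
    // The query returns the original DataRow instances, not copies,
    // so the changes are visible in the DataTable afterwards.
    public static void SetOnHand(DataTable table, int quantity)
    {
        var ordered = from row in table.AsEnumerable()
                      orderby row.Field<DateTime>("PromisedDate")
                      select row;

        foreach (DataRow row in ordered)
        {
            row["OnHandQuantity"] = quantity;
        }
    }
}
```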
I have a list that holds one or more GUIDs.
var getMyfirstListGuids = (from dlist in db.MyTable
where dlist.id == theid
select dlist).ToList();
List<MyfirstList> myfirstList = new List<MyfirstList>();
foreach (var item in myLandingPageList)
{
Guid theguid = new Guid(item.guid);
MyfirstList addnewRow = new MyfirstList();
addnewRow.LpGuid = theguid; // theguid is already a Guid; there is no Guid(Guid) constructor
myfirstList.Add(addnewRow);
}
Now I have a list with one or more GUIDs.
My next step is to build a list with data from SQL, using the GUIDs in the first list.
In SQL there could be one or more rows for each GUID. With a single-row result I can guess what to do, but if there are many results I have no idea.
Okay so you wanna do it like this:
List<SecondListGUID> secondListGUID = new List<SecondListGUID>();
foreach (var item in myfirstList)
{
for(int i = 0; i<_yourDBEntity.GUIDs.Count(); i++)
{
if(item.LpGuid == _yourDBEntity.GUIDs[i].GUID)
secondListGUID.Add(
new SecondListGUID() {
// add the corresponding GUID's here
});
}
}
Basically, you do a foreach through your first list and then a for loop (or foreach, whichever you prefer) through your DB table (an entity in this case, if you're using Entity Framework). You simply compare the GUIDs from your DB table, and if they match you add the item to your second list.
P.S. I've worked with the information you provided. You can change the second list's type to the one you need, and change the Entity Framework data model name to the one you actually use :)
You can try this
List<Guid> getMyfirstListGuids = new List<Guid>();
getMyfirstListGuids.AddRange((from dlist in db.MyTable
where dlist.id == theid
select dlist).ToList());
List<MySecList> mySecList = new List<MySecList>();
mySecList.AddRange(_db.myLandingPageList.Where(p => p.Guid.Any(x => getMyfirstListGuids.Contains(x))));
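The "many rows per GUID" case can be sketched in memory as a simple membership filter. `LandingPage`, `GuidLookup` and `FindMatches` are hypothetical names, and the row shape is an assumption since the question doesn't show it. Every row whose GUID is in the wanted set is returned, whether that is zero, one, or many rows per GUID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the real table entity.
public class LandingPage
{
    public Guid LpGuid { get; set; }
    public string Title { get; set; }
}

public static class GuidLookup
{
    // Returns every row whose LpGuid appears in `wanted`.
    // A HashSet makes each membership check O(1) on average.
    public static List<LandingPage> FindMatches(IEnumerable<Guid> wanted,
                                                IEnumerable<LandingPage> rows)
    {
        var wantedSet = new HashSet<Guid>(wanted);
        return rows.Where(r => wantedSet.Contains(r.LpGuid)).ToList();
    }
}
```

Against a real database the same `Where(... Contains ...)` shape with a plain `List<Guid>` is usually translated into an SQL `IN` clause, which avoids looping over the table in client code. Whether a `HashSet` is translated depends on the provider, so treat that as something to verify.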
I have a small program where you can select some database tables and create an Excel file with all values for each table. This is my solution for creating the Excel file:
foreach (var selectedDatabase in this.lstSourceDatabaseTables.SelectedItems)
{
//creates a new worksheet foreach selected table
foreach (TableRetrieverItem databaseTable in tableItems.FindAll(e => e.TableName.Equals(selectedDatabase)))
{
_xlWorksheet = (Excel.Worksheet) xlApp.Worksheets.Add();
_xlWorksheet.Name = databaseTable.TableName.Length > 31 ? databaseTable.TableName.Substring(0, 31): databaseTable.TableName;
_xlWorksheet.Cells[1, 1] = string.Format("{0}.{1}", databaseTable.TableOwner,databaseTable.TableName);
ColumnRetriever retrieveColumn = new ColumnRetriever(SourceConnectionString);
IEnumerable<ColumnRetrieverItem> dbColumns = retrieveColumn.RetrieveColumns(databaseTable.TableName);
var results = retrieveColumn.GetValues(databaseTable.TableName);
int i = 1;
(results.Item3 is a List<List<string>> that contains all values from a table; a new inner list is added for each row.)
for (int j = 0; j < results.Item3.Count(); j++)
{
int tmp = 1;
foreach (var value in results.Item3[j])
{
_xlWorksheet.Cells[j + 3, tmp] = value;
tmp++;
}
}
}
}
It works, but when you have a table with 5,000 or more values it takes a very long time.
Does anyone know a better way to write the List<List<string>> row by row than my for/foreach solution?
I use a GetExcelColumnName function in my code sample to convert from a column count to the Excel column name.
The whole idea is that writing Excel cells one by one is very slow. So instead, precompute the whole table of values and then assign the result in a single operation. To assign values to a two-dimensional range, use a two-dimensional array of values:
var rows = results.Item3.Count;
var cols = results.Item3.Max(x => x.Count);
object[,] values = new object[rows, cols];
// TODO: initialize values from results content
// get the appropriate range
Range range = w.Range["A3", GetExcelColumnName(cols) + (rows + 2)];
// assign all values at once
range.Value = values;
Maybe you need to change some details about the used index ranges - can't test my code right now.
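The two pieces the snippet above leaves open can be sketched without touching Excel at all. `ExcelHelpers` and `ToGrid` are hypothetical names, and this `GetExcelColumnName` is only a stand-in for the answer's function, which isn't shown: it assumes 1-based column numbers (1 is "A", 27 is "AA").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers; no Excel interop is needed to build the array.
public static class ExcelHelpers
{
    // Copies a jagged list of rows into a rectangular array.
    // Rows shorter than the widest row leave their trailing cells null.
    public static object[,] ToGrid(List<List<string>> rows)
    {
        int rowCount = rows.Count;
        int colCount = rowCount == 0 ? 0 : rows.Max(r => r.Count);
        var grid = new object[rowCount, colCount];
        for (int r = 0; r < rowCount; r++)
        {
            for (int c = 0; c < rows[r].Count; c++)
            {
                grid[r, c] = rows[r][c];
            }
        }
        return grid;
    }

    // Converts a 1-based column number to letters: 1 -> "A", 26 -> "Z", 27 -> "AA".
    public static string GetExcelColumnName(int columnNumber)
    {
        var name = string.Empty;
        while (columnNumber > 0)
        {
            int remainder = (columnNumber - 1) % 26;
            name = (char)('A' + remainder) + name;
            columnNumber = (columnNumber - 1) / 26;
        }
        return name;
    }
}
```

The resulting array would then be assigned to the range's Value in one call, as in the answer's last line.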
As far as I can see, you didn't do any profiling. I recommend profiling first (for example with dotTrace) to see which parts of your code are actually causing the performance issues.
In my experience it is rare (almost never the case) that code executes slower than database requests, even if the code is really awful in algorithmic terms.
First, I recommend filling your Excel sheet by rows, not by columns. If your table has many columns, filling by columns causes multiple round trips to the database, which has a big impact on performance.
Second, write to Excel in batches, by rows. Think of Excel files as mini-databases, with the same "batch is faster than one by one" principle.
In my service, first I generate 40,000 possible combinations of home and host countries, like so (clientLocations contains 200 records, so 200 x 200 is 40,000):
foreach (var homeLocation in clientLocations)
{
foreach (var hostLocation in clientLocations)
{
allLocationCombinations.Add(new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName,
});
}
}
Then I run the following query to find existing rates for the locations above, but also include empty entries for the missing rates, resulting in a complete recordset of 40,000 rows.
var allLocationRates = (from l in allLocationCombinations
join r in Db.PaymentRates_AirShipment
on new { home = l.HomeLocationId, host = l.HostLocationId }
equals new { home = r.HomeLocationId, host = (Guid?)r.HostLocationId }
into matches
from rate in matches.DefaultIfEmpty(new PaymentRates_AirShipment
{
Id = Guid.NewGuid()
})
select new AirShipmentRate
{
Id = rate.Id,
HomeCountry = l.HomeCountry,
HomeCountryId = l.HomeCountryId,
HomeLocation = l.HomeLocation,
HomeLocationId = l.HomeLocationId,
HostCountry = l.HostCountry,
HostCountryId = l.HostCountryId,
HostLocation = l.HostLocation,
HostLocationId = l.HostLocationId,
AssigneeAirShipmentPlusInsurance = rate.AssigneeAirShipmentPlusInsurance,
DependentAirShipmentPlusInsurance = rate.DependentAirShipmentPlusInsurance,
SmallContainerPlusInsurance = rate.SmallContainerPlusInsurance,
LargeContainerPlusInsurance = rate.LargeContainerPlusInsurance,
CurrencyId = rate.RateCurrencyId
});
I have tried using .AsEnumerable() and .AsNoTracking() and that has sped things up quite a bit. The following code shaves several seconds off of my query:
var allLocationRates = (from l in allLocationCombinations.AsEnumerable()
join r in Db.PaymentRates_AirShipment.AsNoTracking()
But, I am wondering: How can I speed this up even more?
Edit: Can't replicate foreach functionality in linq.
allLocationCombinations = (from homeLocation in clientLocations
from hostLocation in clientLocations
select new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName
});
I get an error on from hostLocation in clientLocations which says "cannot convert type IEnumerable to Generic.List."
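One plausible cause, assuming `allLocationCombinations` is declared as a `List<AirShipmentRate>`: a query expression produces an `IEnumerable<T>`, which cannot be assigned to a `List<T>` without materializing it. The sketch below uses simplified, hypothetical types (`Location`, `LocationPair`) to show the same cross join with a trailing `ToList()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the question's types.
public class Location
{
    public Guid LocationId { get; set; }
    public string LocationName { get; set; }
}

public class LocationPair
{
    public Guid HomeLocationId { get; set; }
    public string HomeLocation { get; set; }
    public Guid HostLocationId { get; set; }
    public string HostLocation { get; set; }
}

public static class Combos
{
    // Two `from` clauses form a cross join (n * n pairs).
    // ToList() turns the lazy IEnumerable into a List so it can be
    // assigned to a List-typed variable.
    public static List<LocationPair> AllPairs(IEnumerable<Location> locations)
    {
        var source = locations.ToList();
        List<LocationPair> pairs = (from home in source
                                    from host in source
                                    select new LocationPair
                                    {
                                        HomeLocationId = home.LocationId,
                                        HomeLocation = home.LocationName,
                                        HostLocationId = host.LocationId,
                                        HostLocation = host.LocationName
                                    }).ToList();
        return pairs;
    }
}
```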
The fastest way to query a database is to use the power of the database engine itself.
While LINQ is a fantastic technology, it still generates a SELECT statement from the LINQ query and runs that statement against the database.
Your best bet is to create a database view or a stored procedure.
Views and stored procedures can easily be integrated into LINQ.
Materialized views (called indexed views in MS SQL) can further speed up execution, and adding missing indexes is by far the most effective way to speed up database queries.
How can I speed this up even more?
Optimizing is a bitch.
Your code looks fine to me. Make sure to set the index on your DB schema where it's appropriate. And as already mentioned: Run your Linq against SQL to get a better idea of the performance.
Well, but how to improve performance anyway?
You may want to have a glance at the following link:
10 tips to improve LINQ to SQL Performance
To me, probably the most important points listed (in the link above):
Retrieve Only the Number of Records You Need
Turn off ObjectTrackingEnabled Property of Data Context If Not Necessary
Filter Data Down to What You Need Using DataLoadOptions.AssociateWith
Use compiled queries when it's needed (please be careful with that one...)
I know what an index-out-of-bounds error is, and when I debug I can see why it happens. Basically, I filter my database for records that are potential/pending. I then gather an array of those numbers and send them to another server to check whether they have been upgraded to a sale. If one has been upgraded, the server responds with the new Sales Order ID and my old Pending Sales Order ID (SourceID). I then loop over that list, filter down to the specific SourceID, update the SourceID to be the Sales Order ID, and change a couple of other values. The problem is that when I apply that filter for the very first record, it throws an index-out-of-bounds error. I check the results returned by the filter and the count is 0, which I find strange because I took the sales order number from the list, so it should be there. Here is the code that throws the error. It doesn't happen every time: I ran the code this morning and it didn't throw, but last night it did before I went home.
filter.RowFilter = string.Format("Stage = '{0}'", Potential.PotentialSale);
if (filter.Count > 0)
{
var Soids = new int[filter.Count];
Console.Write("Searching for Soids - (");
for (int i = 0; i < filter.Count; i++)
{
Console.Write(filter[i][1].ToString() + ",");
Soids[i] = (int)filter[i][1];
}
Console.WriteLine(")");
var pendingRecords = Server.GetSoldRecords(Soids);
var updateRecords = new NameValueCollection();
for (int i = 0; i < pendingRecords.Length; i++)
{
filter.RowFilter = "Soid = " + pendingRecords[i][1];
filter[0].Row["Soid"] = pendingRecords[i][0];
filter[0].Row["SourceId"] = pendingRecords[i][1];
filter[0].Row["Stage"] = Potential.ClosedWon;
var potentialXML = Potential.GetUpdatePotentialXML(filter[0].Row["Soid"].ToString(), filter[0].Row["Stage"].ToString());
updateRecords.Add(filter[0].Row["ZohoID"].ToString(), potentialXML);
}
If I'm counting right, line 17 is where the error is thrown. pendingRecords is an object[][] array, and pendingRecords[i] is an individual record. pendingRecords[i][0] is the new Sales Order ID (SOID) and pendingRecords[i][1] is the old SOID (now the SourceID).
Any help on this one? Is it because I'm changing the SOID to the new SOID, and the filter automatically updates itself? I just don't know.
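A small experiment can check the suspicion about the filter updating itself. The sketch below uses a made-up two-column table and a hypothetical `FilterDemo` class. It captures the DataRow once and then changes the filtered column. A DataView re-evaluates its RowFilter when row data changes, so the row should drop out of the view, and any later `filter[0]` access would fail. This may not be the whole story for the first failing access, but it does argue for holding on to the DataRow instead of re-indexing the view.

```csharp
using System;
using System.Data;

// Hypothetical demo class; the columns mirror the question's names only loosely.
public static class FilterDemo
{
    // Returns the view's row count before and after changing the filtered column.
    public static (int before, int after) ChangeFilteredColumn()
    {
        var table = new DataTable();
        table.Columns.Add("Soid", typeof(int));
        table.Columns.Add("SourceId", typeof(int));
        table.Rows.Add(100, 0);

        DataView view = table.DefaultView;
        view.RowFilter = "Soid = 100";
        int before = view.Count;

        // Capture the row once; do not index the view again after editing.
        DataRow row = view[0].Row;
        row["Soid"] = 200;      // the row no longer matches "Soid = 100"
        row["SourceId"] = 100;  // still safe: we hold the DataRow itself
        int after = view.Count;

        return (before, after);
    }
}
```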
Well, I ended up changing how it works altogether, and it actually sorts things out a bit more nicely now. The code I am about to post has a bunch of hard-coded numbers due to the structure of the table that is returned. Sorry about that. I have since learned not to do that, but I am working on a different project now and will change it when I next have to change the program. Here is the solution:
var potentials = Server.GetNewPotentials(); //loads all records from server
for (int i = 0; i < potentials.Length; i++)
{
var filter = AllPotentials.DefaultView;
var result1 = CheckSoidOrSource(potentials[i].Soid, true);
var result2 = CheckSoidOrSource(potentials[i].SourceID,false) ;
//This potential can't be found at all so let's add it to our table
if (result1+result2==0)
{
Logger.WriteLine("Found new record. Adding it to DataTable and sending it to Zoho");
AllPotentials.Add(potentials[i]);
filter.RowFilter = string.Format("Soid = '{0}'", potentials[i].SourceID);
var index = AllPotentials.Rows.IndexOf(filter[0].Row);
ZohoPoster posterInsert = new ZohoPoster(Zoho.Fields.Potentials, Zoho.Calls.insertRecords);
AllPotentials.Rows[index]["ZohoID"] = posterInsert.PostNewPotentialRecord(3, filter[0].Row);
}
//This potential is not found, but has a SourceId that matches a Soid of another record.
if (result1==0 && result2 == 1)
{
Logger.WriteLine("Found a record that needs to be updated on Zoho");
ZohoPoster posterUpdate = new ZohoPoster(Zoho.Fields.Potentials, Zoho.Calls.updateRecords);
filter.RowFilter = string.Format("Soid = '{0}'", potentials[i].SourceID);
var index = AllPotentials.Rows.IndexOf(filter[0].Row);
AllPotentials.Rows[index]["Soid"] = potentials[i].Soid;
AllPotentials.Rows[index]["SourceId"] = potentials[i].SourceID;
AllPotentials.Rows[index]["PotentialStage"] = potentials[i].PotentialStage;
AllPotentials.Rows[index]["UpdateRecord"] = true;
AllPotentials.Rows[index]["Amount"] = potentials[i].Amount;
AllPotentials.Rows[index]["ZohoID"] = posterUpdate.UpdatePotentialRecord(3, filter[0].Row);
}
}
AllPotentials.AcceptChanges();
}
private int CheckSoidOrSource(string Soid, bool checkSource)
{
var filter = AllPotentials.DefaultView;
if (checkSource)
filter.RowFilter = string.Format("Soid = '{0}' OR SourceId = '{1}'",Soid, Soid);
else
filter.RowFilter = string.Format("Soid = '{0}'", Soid);
return filter.Count;
}
Basically, I noticed something about my data when I filter it this way: the two checks only ever return (0,0), (0,1), or (1,0). (0,0) means the record doesn't exist in this table at all, so I need to add it. (1,0) means the Sales Order ID (Soid) matches another Soid in the table, so it already exists. Lastly, (0,1) means the Soid doesn't exist in this table, but I found a record that has the Soid as its source, which to me means that the one that had it as a source has been upgraded from a potential to a sale, which in turn means I have to update the record and Zoho. This worked out to be much less work for me, because now I don't have to search for won and lost records; I only have to search for lost records. Less code with the same results is always a good thing :)