I have a table containing weekly sales data from multiple years for a few hundred products.
Simplified, I have three columns: ProductId, Quantity, and Date (week/year; not relevant to the question).
In order to process the data, I want to fetch everything using LINQ. In the next step I would like to create a List of objects for the sales data, where each object consists of the ProductId and an array of the corresponding sales data.
EDIT: directly afterwards, I will process all the retrieved data product by product in my program by passing the sales as an array to statistics software (R, via R.NET) in order to get predictions.
Is there a simple (built in) way to accomplish this?
If not, in order to process the sales product by product,
should I just create the mentioned List using a loop?
Or should I, in terms of performance, avoid that altogether and:
Fetch the sales data product by product from the database as I need it?
Or should I make one big List (with query.ToList()) from the result set and get my sales data product by product from there?
Erm, something like:
var groupedByProductId = query.GroupBy(p => p.ProductId).Select(g => new
{
    ProductId = g.Key,
    Quantity = g.Sum(p => p.Quantity)
});
Or perhaps, if you don't want to sum and instead need the quantities as an array of int ordered by Date:
var groupedByProductId = query.GroupBy(p => p.ProductId).Select(g => new
{
    ProductId = g.Key,
    Quantities = g.OrderBy(p => p.Date).Select(p => p.Quantity).ToArray()
});
Or maybe you need to pass the data around and an anonymous type is inappropriate; then you could make an IDictionary<int, int[]>:
var salesData = query.GroupBy(p => p.ProductId).ToDictionary(
    g => g.Key,
    g => g.OrderBy(p => p.Date).Select(p => p.Quantity).ToArray());
so later,
int productId = ...
int[] orderedQuantities = salesData[productId];
would be valid code (less the ellipsis).
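To make the dictionary variant concrete, here is a minimal sketch that runs against an in-memory list instead of a database query. The Sale class and the SalesGrouping helper are hypothetical stand-ins, not part of the original question; with a real provider the same GroupBy/ToDictionary chain should apply, but whether the grouping runs in SQL or in memory depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for one record of the sales table.
public class Sale
{
    public Sale(int productId, DateTime date, int quantity)
    {
        ProductId = productId;
        Date = date;
        Quantity = quantity;
    }

    public int ProductId { get; }
    public DateTime Date { get; }
    public int Quantity { get; }
}

public static class SalesGrouping
{
    // Groups the sales by product and returns, per product,
    // the quantities ordered by date as an int[].
    public static Dictionary<int, int[]> ToQuantityArrays(IEnumerable<Sale> sales)
    {
        return sales
            .GroupBy(s => s.ProductId)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(s => s.Date).Select(s => s.Quantity).ToArray());
    }
}
```

Each int[] in the result can then be handed to the statistics step one product at a time, so the data is fetched once and processed product by product from memory.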
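You could create a Product class with an id and a list of int data, something like this:
public class Product
{
    public int Id;
    public List<int> Data = new List<int>();

    public Product(int id, params int[] data)
    {
        Id = id;
        // The field and the parameter need different names; otherwise the
        // parameter hides the field and Add would be called on the array.
        Data.AddRange(data);
    }
}
Then use:
// Select (not Where) projects each row into a Product. AsEnumerable runs the
// projection in memory, since some providers cannot translate a parameterized constructor.
query.AsEnumerable().Select(x => new Product(x.ProductId, x.datum1, x.datum2, x.datum3));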
I have a database table with records for each user/year combination.
How can I get data from the database using EF and a list of userId/year combinations?
Sample combinations:
UserId Year
1 2015
1 2016
1 2018
12 2016
12 2019
3 2015
91 1999
I only need the records defined in the above combinations, but I can't wrap my head around how to write this using EF/LINQ.
List<UserYearCombination> userYears = GetApprovedYears();
var records = dbcontext.YearResults.Where(?????);
Classes
public class YearResult
{
public int UserId;
public int Year;
public DateTime CreatedOn;
public int StatusId;
public double Production;
public double Area;
public double Fte;
public double Revenue;
public double Diesel;
public double EmissionsCo2;
public double EmissionInTonsN;
public double EmissionInTonsP;
public double EmissionInTonsA;
....
}
public class UserYearCombination
{
public int UserId;
public int Year;
}
This is a notorious problem that I discussed before here. Krishna Muppalla's solution is among the solutions I came up with there. Its disadvantage is that it's not sargable, i.e. it can't benefit from any indexes on the involved database fields.
In the meantime I came up with another solution that may be helpful in some circumstances. Basically, it groups the input data by one of the fields, and then for each group queries the database by the grouping key plus a Contains query over the group's elements, concatenating the results:
IQueryable<YearResult> items = null;
foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
{
    var userIds = yearUserIds.ToList();
    var grp = dbcontext.YearResults
        .Where(x => x.Year == yearUserIds.Key
            && userIds.Contains(x.UserId));
    items = items == null ? grp : items.Concat(grp);
}
I use Concat here because Union would waste time making the results distinct. Also, in EF6 Concat generates SQL with chained UNION statements, whereas Union generates nested UNION statements, so the maximum nesting level may be hit.
This query may perform well enough when indexes are in place. In theory, the maximum number of UNIONs in a SQL statement is unlimited, but the number of items in an IN clause (which Contains translates to) should not exceed a couple of thousand. That means the content of your data determines which grouping field performs better, Year or UserId.
The challenge is to minimize the number of UNIONs while keeping the number of items in each IN clause below approx. 5000.
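For anyone who wants to try the grouping idea without a database, the following is a minimal sketch that wraps the loop in a helper and runs it over an in-memory IQueryable. The YearResultFilter class and the trimmed-down YearResult/UserYearCombination types are assumptions for illustration only; against EF the source argument would be dbcontext.YearResults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the classes in the question.
public class YearResult
{
    public int UserId { get; set; }
    public int Year { get; set; }
}

public class UserYearCombination
{
    public int UserId { get; set; }
    public int Year { get; set; }
}

public static class YearResultFilter
{
    // Builds one "Year == y AND UserId IN (...)" query per distinct year
    // and concatenates them into a single deferred query.
    public static IQueryable<YearResult> FilterByCombinations(
        IQueryable<YearResult> source,
        IEnumerable<UserYearCombination> userYears)
    {
        IQueryable<YearResult> items = null;
        foreach (var yearUserIds in userYears.GroupBy(t => t.Year, t => t.UserId))
        {
            var year = yearUserIds.Key;
            var userIds = yearUserIds.Distinct().ToList();
            var grp = source.Where(x => x.Year == year && userIds.Contains(x.UserId));
            items = items == null ? grp : items.Concat(grp);
        }
        // With no combinations at all, return an empty query instead of null.
        return items ?? source.Where(x => false);
    }
}
```

Because each group filters on both the year and the user ids belonging to that year, a row such as (12, 2015) is not returned when only (12, 2016) and (1, 2015) were requested.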
You can try this:
// Add the combinations to search for to a list
var searchIds = new List<string> { "1-2015", "1-2016", "2-2018" };
// Use the list in the Where clause
var result = (from x in YearResults
              where searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString())
              select new UserYearCombination
              {
                  UserId = x.UserId,
                  Year = x.Year
              }).ToList();
Method 2 (the same query in method syntax):
var d = YearResults
    .Where(x => searchIds.Contains(x.UserId.ToString() + '-' + x.Year.ToString()))
    .Select(x => new UserYearCombination
    {
        UserId = x.UserId,
        Year = x.Year
    }).ToList();
I have the following code to extract records from a dbcontext randomly using Guid class:
var CategoryList = new List<int> { 1, 5 };
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0)
{
    generatedQues = db.Questions
        .Where(q => CategoryList.Contains(q.CategoryId))
        .OrderBy(q => Guid.NewGuid()).Take(ColNum).ToList();
}
First, I have a list of CategoryId values stored in CategoryList as a condition to be fulfilled when getting records from the db. However, I would like to achieve an even distribution among the questions based on the CategoryId.
For example:
If ColNum is 10 and the CategoryIds are {1,5}, I would like to get 5 records with CategoryId = 1 and another 5 records with CategoryId = 5. If ColNum is an odd number like 11, I would still like the distribution to be as even as possible, for example 5 records from CategoryId 1 and 6 records from CategoryId 5.
How do I do this?
This is a two-step process:
1. Determine how many you want for each category.
2. Select that many items from each category in a random order.
For the first part, define a class to represent the category and how many items are required:
public class CategoryLookup
{
    public CategoryLookup(int catId)
    {
        this.CategoryId = catId;
    }

    public int CategoryId
    {
        get; private set;
    }

    public int RequiredAmount
    {
        get; private set;
    }

    public void Increment()
    {
        this.RequiredAmount++;
    }
}
And then, given your inputs of the required categories and the total number of items required, work out how many are required for each category:
var categoryList = new[] { 1, 5 };
var colNum = 7;
var categoryLookup = categoryList.Select(x => new CategoryLookup(x)).ToArray();
for (var i = 0; i < colNum; i++)
{
    categoryLookup[i % categoryList.Length].Increment();
}
The second part is really easy: just use SelectMany to get the list of questions. (I've tested this with plain LINQ to Objects, but it should work fine for a database query; questions in my code would just be db.Questions in yours.)
var result = categoryLookup.SelectMany(
    c => questions.Where(q => q.CategoryId == c.CategoryId)
                  .OrderBy(x => Guid.NewGuid())
                  .Take(c.RequiredAmount)
);
Live example: http://rextester.com/RHF33878
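The two steps can also be sketched as one small helper, assuming the category ids are distinct and the data is already in memory. CategoryDistribution, Distribute and Pick are hypothetical names, and a Random instance is used for shuffling here instead of Guid.NewGuid(), which is a swapped-in choice for LINQ to Objects rather than what a database provider would need.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryDistribution
{
    // Step 1: spread `total` over the categories round-robin, so the
    // counts differ by at most one. Assumes distinct category ids.
    public static Dictionary<int, int> Distribute(IReadOnlyList<int> categoryIds, int total)
    {
        var counts = categoryIds.ToDictionary(id => id, _ => 0);
        if (categoryIds.Count == 0) return counts;
        for (var i = 0; i < total; i++)
        {
            counts[categoryIds[i % categoryIds.Count]]++;
        }
        return counts;
    }

    // Step 2: per category, shuffle and take the required number of items.
    public static List<T> Pick<T>(
        IEnumerable<T> items,
        Func<T, int> categoryOf,
        IReadOnlyDictionary<int, int> counts,
        Random rng)
    {
        return items
            .Where(x => counts.ContainsKey(categoryOf(x)))
            .GroupBy(categoryOf)
            .SelectMany(g => g.OrderBy(_ => rng.Next()).Take(counts[g.Key]))
            .ToList();
    }
}
```

With categories {1, 5} and a total of 11, Distribute gives 6 for category 1 and 5 for category 5; which category receives the extra item simply follows the order of the list.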
You could try something like this:
var CategoryList = new List<int> { 1, 5 };
var generatedQues = new List<Question>();
//Algorithm 1 :)
if (ColNum > 0 && CategoryList.Count > 0)
{
    // Calculate how many of each. Assumption: a plain even split;
    // adjust per category if ColNum does not divide evenly.
    var take = ColNum / CategoryList.Count;
    // First category
    var firstId = CategoryList[0];
    var query = db.Questions
        .Where(q => q.CategoryId == firstId)
        .OrderBy(q => Guid.NewGuid()).Take(take);
    // For all remaining categories
    for (int i = 1; i < CategoryList.Count; i++)
    {
        // Copy to a local: the lambda runs later, when i has already changed
        var categoryId = CategoryList[i];
        // Calculate how many you want for this category here (same split assumed)
        // Union the questions for that category to query
        query = query.Union(
            db.Questions
                .Where(q => q.CategoryId == categoryId)
                .OrderBy(q => Guid.NewGuid()).Take(take));
    }
    // Randomize again and execute query
    generatedQues = query.OrderBy(q => Guid.NewGuid()).ToList();
}
The idea is to get a random list for each category and add them all together. Then you randomize that again and create your list. I do not know whether all of this runs on the database or in memory, but I think it should be on the database. The resulting SQL will look horrible though.
Full disclosure: I'm pretty much a total noob when it comes to LINQ. I could be way off base on how I should be approaching this.
I have a DataTable with 3 columns:
oid,idate,amount
Each id has multiple dates, and each date has multiple amounts. What I need to do is sum the amounts for each day for each id, so instead of:
id,date,amount
00045,02/13/2011,11.50
00045,02/14/2011,11.00
00045,02/14/2011,12.00
00045,02/15/2011,10.00
00045,02/15/2011,5.00
00045,02/15/2011,12.00
00054,02/13/2011,8.00
00054,02/13/2011,9.00
I would have:
id,date,SumOfAmounts
00045,02/13/2011,11.50
00045,02/14/2011,23.00
00045,02/15/2011,27.00
00054,02/13/2011,17.00
private void excelDaily_Copy_Into(DataTable copyFrom, DataTable copyTo)
{
    var results = from row in copyFrom.AsEnumerable()
                  group row by new
                  {
                      oid = row["oid"],
                      idate = row["idate"]
                  } into n
                  select new
                  {
                      // unsure what to do
                  };
}
I've tried a dozen or so different ways of doing this and I always hit a wall where I can't figure out how to progress. I've been all over Stack Overflow and MSDN, and nothing so far has really helped me.
Thank you in advance!
You could try this:
var results = from row in copyFrom.AsEnumerable()
              group row by new
              {
                  oid = row.Field<int>("oid"), // Or string, depending on the real type of your column
                  idate = row.Field<DateTime>("idate")
              } into g
              select new
              {
                  g.Key.oid,
                  g.Key.idate,
                  SumOfAmounts = g.Sum(e => e.Field<decimal>("amount"))
              };
I suggest using the Field extension method, which provides strongly-typed access to each of the column values in the specified row.
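As a self-contained sketch of the same grouping, the helper below builds the summed table from an input DataTable. It assumes that oid is stored as a string (the sample ids have leading zeros), idate as DateTime and amount as decimal; DailyTotals and SumPerIdAndDate are hypothetical names. AsEnumerable and Field come from the DataSet extensions, which on .NET Framework need a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DailyTotals
{
    // Returns a new table with one row per (oid, idate) pair and the
    // summed amount for that pair, ordered by id and then by date.
    public static DataTable SumPerIdAndDate(DataTable copyFrom)
    {
        var result = new DataTable();
        result.Columns.Add("oid", typeof(string));
        result.Columns.Add("idate", typeof(DateTime));
        result.Columns.Add("SumOfAmounts", typeof(decimal));

        var groups = copyFrom.AsEnumerable()
            .GroupBy(r => new
            {
                Oid = r.Field<string>("oid"),
                Date = r.Field<DateTime>("idate")
            })
            .OrderBy(g => g.Key.Oid)
            .ThenBy(g => g.Key.Date);

        foreach (var g in groups)
        {
            result.Rows.Add(g.Key.Oid, g.Key.Date, g.Sum(r => r.Field<decimal>("amount")));
        }
        return result;
    }
}
```

The anonymous type used as the key compares by value, which is why rows with the same id and date end up in the same group.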
Although you don't specify it, copyFrom is apparently an object of a DataTable class that implements IEnumerable.
According to the MSDN documentation for System.Data.DataTable, that class does not implement it. If you use that class, you need its Rows property, which returns a collection of rows that implements IEnumerable:
IEnumerable<DataRow> rows = copyFrom.Rows.Cast<DataRow>();
If you use a different DataTable class, you'll probably do something similar to cast it to a sequence of DataRow.
An object of class System.Data.DataRow has an Item property (indexer) to access the columns in the row. In your case the column names are oid, idate and amount.
To convert your copyFrom to the sequence of items you want to process:
var itemsToProcess = copyFrom.Rows.Cast<DataRow>()
    .Select(row => new
    {
        Id = row["oid"],
        Date = (DateTime)row["idate"],
        Amount = (decimal)row["amount"],
    });
I'm not sure, but I assume that column idate contains dates and column amount contains some decimal value. Feel free to use other types if your columns contain other types.
If your columns contain strings, convert them to the proper types using Parse:
var itemsToProcess = copyFrom.Rows.Cast<DataRow>()
    .Select(row => new
    {
        Id = (string)row["oid"],
        Date = DateTime.Parse((string)row["idate"]),
        Amount = decimal.Parse((string)row["amount"]),
    });
If you are unfamiliar with lambda expressions, it helped me a lot to read them as follows:
itemsToProcess is a collection of items, taken from the collection of
DataRows, where from each row in this collection we created a new
object with three properties: Id = ...; Date = ...; Amount = ...
See:
Explanation of Standard LINQ operations for Cast and Select
Anonymous Types
Now we have a sequence where we can compare dates and sum the amounts.
What you want is to group all items in this sequence into groups with the same Id and Date. So you want one group with Id = 00045 and Date = 02/13/2011, another group with Id = 00045 and Date = 02/14/2011, and so on.
For this you use Enumerable.GroupBy. As the key selector (= what all items in one group have in common) you use the combination of Id and Date:
var groups = itemsToProcess.GroupBy(item => new
    { Id = item.Id, Date = item.Date });
Now you have groups.
Each group has a property Key, of a type with two properties: Id and Date.
Each group is a sequence of items from your itemsToProcess collection (so each element is an "itemToProcess" with Id / Date / Amount properties).
All items in one group have the same Id and the same Date.
So all you have to do is sum the Amount of all elements in each group:
var resultSequence = groups.Select(groupItem => new
{
    Id = groupItem.Key.Id,
    Date = groupItem.Key.Date,
    Sum = groupItem.Sum(itemToProcess => itemToProcess.Amount),
});
So putting it all together into one statement:
var resultSequence = copyFrom.Rows.Cast<DataRow>()
    .Select(row => new
    {
        Id = (string)row["oid"],
        Date = DateTime.Parse((string)row["idate"]),
        Amount = decimal.Parse((string)row["amount"]),
    })
    .GroupBy(itemToProcess => new
    {
        Id = itemToProcess.Id,
        Date = itemToProcess.Date
    })
    .Select(groupItem => new
    {
        Id = groupItem.Key.Id,
        Date = groupItem.Key.Date,
        Sum = groupItem.Sum(itemToProcess => itemToProcess.Amount),
    });
I am wondering what is recommended in the following scenario:
I have a large loop that I traverse to get an ID which I then store in a database like so:
foreach (var rate in rates)
{
    // get ID from rate name
    Guid Id = dbContext.DifferentEntity
        .Where(x => x.Name == rate.Name)
        .Select(x => x.Id)
        .FirstOrDefault();
    // create new object with the newly discovered
    // ID to insert into the database
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = Id,
    });
}
Would it be better/faster to do this instead (first get all DifferentEntity records, rather than querying for each ID separately)?
List<DifferentEntity> differentEntities = dbContext.DifferentEntity.ToList();
foreach (var rate in rates)
{
    // get ID from rate name
    Guid Id = differentEntities
        .Where(x => x.Name == rate.Name)
        .Select(x => x.Id)
        .FirstOrDefault();
    // create new object with the newly discovered
    // ID to insert into the database
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = Id,
    });
}
Is the difference negligible or is this something I should consider? Thanks for your advice.
Store your rate names in a sorted string array (string[]) instead of a List or Collection. Then use Array.BinarySearch() to make your search much faster. The rest of what I was going to write has already been written by @Felipe.
Run them horses! There is really a lot we do not know. Is it possible to keep all the entities in memory? How many of them are duplicates with respect to Name?
A simplistic solution with one fetch from the database and usage of parallelism:
// Fetch entities
var entitiesDict = dbContext.DifferentEntity
    .Distinct(EqualityComparerForNameProperty).ToDictionary(e => e.Name);
// Create the new ones real quick and divide into groups of 500
// (cause that horse wins in my environment with complex entities,
// maybe 5 000 or 50 000 fits your scenario better since they are not that complex?)
var groupedEnts = rates.AsParallel().Select((rate, index) =>
    new
    {
        Value = new YetAnotherEntity
            { Id = Guid.NewGuid(), DiffId = entitiesDict[rate.Name].Id },
        Index = index
    })
    // integer division, and note GroupAdjacent! (not GroupBy)
    // GroupAdjacent is not part of System.Linq; it is available in e.g. MoreLINQ
    .GroupAdjacent(anon => anon.Index / 500)
    .Select(group => group.Select(anon => anon.Value)); // do the select so we get the IEnumerables
// Now we have to add them to the database
Parallel.ForEach(groupedEnts, ents =>
{
    using (var db = new DBCONTEXT()) // your dbcontext
    {
        foreach (var ent in ents)
            db.YetAnotherEntity.Add(ent);
        db.SaveChanges();
    }
});
In general, in database scenarios the expensive parts are the fetches and the commits, so try to keep them to a minimum.
You can decrease the number of queries you send to the database. For example, take all the names and run one query that finds the Ids whose names are in that list.
Try something like this:
// get all names you have in rates list...
var rateNames = rates.Select(x => x.Name).ToList();
// query all Ids you need where the name is in rateNames... 1 query, 1 column (Id, I imagine)
var Ids = dbContext.DifferentEntity
    .Where(x => rateNames.Contains(x.Name))
    .Select(x => x.Id)
    .ToList();
// loop over the Ids result, and add one by one
foreach (var id in Ids)
    dbContext.YetAnotherEntity.Add(new YetAnotherEntity
    {
        Id = Guid.NewGuid(),
        DiffId = id,
    });
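If each new row has to stay tied to its rate (including rates that share a name), one option is to fetch the name/Id pairs in a single query and then resolve every rate from a dictionary. The sketch below assumes names are unique enough that taking the first Id per name is acceptable; NamedEntity and RateLookup are hypothetical stand-ins, and the in-memory IQueryable takes the place of dbContext.DifferentEntity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for DifferentEntity.
public class NamedEntity
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public static class RateLookup
{
    // Runs one query for all needed names, then resolves each rate name
    // from a dictionary. Names without a match are skipped.
    public static List<Guid> ResolveIds(
        IQueryable<NamedEntity> entities,
        IReadOnlyList<string> rateNames)
    {
        var names = rateNames.Distinct().ToList();

        var idByName = entities
            .Where(e => names.Contains(e.Name))
            .Select(e => new { e.Name, e.Id })
            .AsEnumerable()                       // the rest runs in memory
            .GroupBy(x => x.Name)
            .ToDictionary(g => g.Key, g => g.First().Id);

        var ids = new List<Guid>();
        foreach (var name in rateNames)
        {
            if (idByName.TryGetValue(name, out var id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }
}
```

The result has one Id per rate, in the order of the rates, so the loop that adds YetAnotherEntity rows can run without touching the database until SaveChanges.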
The situation is that I need to create a table in a grid view looking like this:
----------| ID---|---Name--|--1/2002--|--2/2002--|--1/2003--|........| 2/2009 |
Cust1--|
Cust2--|
:
:
I have two tables in the database, Customers and Orders, accessed through a LINQ to SQL DataContext.
I'm getting the ID and name of the customers from a simple query:
var custInfo = from cust in db.Customers
               select new { ID = cust.Id,
                            FullName = cust.FirstName + " " + cust.LastName };
dataGridOrdersPreview.DataSource = custInfo;
I need some clue how to generate the columns in the format t/year, where t indicates the first or second half of the year, and how to assign to those generated columns each customer's orders in that half of the year (displaying only costs).
[edit]
So far, I'm attempting something like this:
var orders = from ord in db.Orders
             group ord by ord.Id_cust into grouped
             let costs = grouped
                 .Where(s => s.YearSession == session && s.Year == year)
                 .Select(a => new { Costs = a.Cost })
             select new { ID = grouped.Key,
                          Name = custInfo
                              .Where(a => a.ID == grouped.Key)
                              .Select(j => j.FullName).Single(),
                          Cost = ExtensionLibrary.Sum(costs, "\n")
                        };
(Cost gets only the summed costs in that year session for each customer.)
Then I think about iterating through the years and sessions and somehow
getting the query results into the corresponding columns:
while (year <= DateTime.Today.Year)
{
    year++;
    while (session < 2)
    {
        session++;
        dataGridOrdersPreview.Columns.Add(session + "/" + year);
        col.Add((session + "/" + year),
            orders.Select(a => a.Cost).ToList());
        /* col is Dictionary<string, List<string> > */
    }
    session = 0;
}
Here I have generated the columns that I want, and I have the orders in a Dictionary where the key is the column name and the value is the list of orders in that column, but I need some help binding it to those columns.
The way I've seen it done is to create a class that has the properties that you want, for example:
class CustOrders
{
    public string CustName { get; set; }
    // A hyphen is not valid in a C# identifier, so use an underscore.
    public int Orders2002_1 { get; set; }
    public int Orders2002_2 { get; set; }
    ...
    public int Orders2009_1 { get; set; }
}
Then use a System.Windows.Forms.BindingSource, call it say CustOrdsBindingSource, and set its DataSource to a list of your new class:
List<CustOrders> myListOfCustOrders = new List<CustOrders>();
/* Code to populate myListOfCustOrders */
CustOrdsBindingSource.DataSource = myListOfCustOrders;
In this case you will have to write the code that converts each result of your query to an instance of CustOrders and stores it in myListOfCustOrders.
Finally, the grid view's data source will also have to be set:
gridView1.DataSource = CustOrdsBindingSource;
The big problem I see with this approach is that you will have to change the CustOrders class every year, unless there is some voodoo someone can suggest to insert properties into the class at run time.
Either way, I hope this gives you a start.
As long as there will be no updating/adding/deleting of rows, I think I would just generate that grid manually.
Fetch the list of customers and the totals per customer for each year/session. Then, in the form, take that list and create the needed columns.
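One way to generate the grid manually is to pivot the orders into a DataTable whose columns are created at run time, which avoids changing a class every year. This is only a sketch under assumptions: OrderRow is a hypothetical flattened shape (customer name, year, session 1 or 2, cost) that the LINQ to SQL query would have to produce, and OrderPivot.Build is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical flattened order: one cost in a given half-year for a customer.
public class OrderRow
{
    public string Customer { get; set; }
    public int Year { get; set; }
    public int Session { get; set; }   // 1 = first half, 2 = second half
    public decimal Cost { get; set; }
}

public static class OrderPivot
{
    // Builds a table with a Name column plus one "session/year" column
    // per half-year, holding the summed costs per customer.
    public static DataTable Build(IEnumerable<OrderRow> orders, int firstYear, int lastYear)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        for (var year = firstYear; year <= lastYear; year++)
        {
            for (var session = 1; session <= 2; session++)
            {
                table.Columns.Add(session + "/" + year, typeof(decimal));
            }
        }

        foreach (var customer in orders.GroupBy(o => o.Customer))
        {
            var row = table.NewRow();
            row["Name"] = customer.Key;
            foreach (var cell in customer.GroupBy(o => o.Session + "/" + o.Year))
            {
                // Orders outside the requested year range have no column.
                if (table.Columns.Contains(cell.Key))
                {
                    row[cell.Key] = cell.Sum(o => o.Cost);
                }
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The resulting table can be assigned to the grid's DataSource; half-years without orders stay empty (DBNull) for that customer.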