Efficient Way To Query Nested Data - C#

I need to select a number of 'master' rows from a table, and for each result also return a number of detail rows from another table. What is a good way of achieving this without multiple queries (one for the master rows, plus one per master row to get its detail rows)?
For example, with a database structure like below:
MasterTable:
- MasterId BIGINT
- Name NVARCHAR(100)
DetailTable:
- DetailId BIGINT
- MasterId BIGINT
- Amount MONEY
How would I most efficiently populate the data object below?
IList<Master> data;
public class Master
{
    private readonly List<Detail> _details = new List<Detail>();

    public long MasterId { get; set; }
    public string Name { get; set; }

    public IList<Detail> Details
    {
        get { return _details; }
    }
}

public class Detail
{
    public long DetailId { get; set; }
    public decimal Amount { get; set; }
}

Normally, I'd go for the two-grids approach - however, you might also want to look at FOR XML - it is fairly easy (in SQL Server 2005 and above) to shape the parent/child data as XML, and load it from there.
SELECT parent.*,
       (SELECT * FROM child
        WHERE child.parentid = parent.id
        FOR XML PATH('child'), TYPE)
FROM parent
FOR XML PATH('parent')
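To consume that shape from C#, here is a minimal sketch (not from the original answer): it assumes the outer FOR XML is given a ", ROOT('parents')" suffix so the result has a single root element, that connectionString and xmlQuery are supplied by you, and that System.Data.SqlClient and System.Xml.Linq are imported.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(xmlQuery, conn)) // xmlQuery = the FOR XML SELECT above
{
    conn.Open();
    using (var reader = cmd.ExecuteXmlReader())
    {
        var doc = XDocument.Load(reader);
        foreach (var parent in doc.Root.Elements("parent"))
        {
            // each <parent> element carries its <child> elements inline
            var children = parent.Elements("child").ToList();
        }
    }
}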
Also - LINQ-to-SQL supports this type of model, but you need to tell it which data you want ahead of time, via DataLoadOptions.LoadWith:
// sample from MSDN
Northwnd db = new Northwnd(@"c:\northwnd.mdf");
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Customer>(c => c.Orders);
db.LoadOptions = dlo;

var londonCustomers =
    from cust in db.Customers
    where cust.City == "London"
    select cust;

foreach (var custObj in londonCustomers)
{
    Console.WriteLine(custObj.CustomerID);
}
If you don't use LoadWith, you will get n+1 queries - one master, and one child list per master row.

It can be done with a single query like this:
select MasterTable.MasterId,
       MasterTable.Name,
       DetailTable.DetailId,
       DetailTable.Amount
from MasterTable
inner join DetailTable
    on MasterTable.MasterId = DetailTable.MasterId
order by MasterTable.MasterId
Then, in C#:
Master currentMaster = null;
foreach (var row in result)
{
    if (currentMaster == null || row.MasterId != currentMaster.MasterId)
    {
        if (currentMaster != null)
            list.Add(currentMaster);
        currentMaster = new Master { MasterId = row.MasterId, Name = row.Name };
    }
    currentMaster.Details.Add(new Detail { DetailId = row.DetailId, Amount = row.Amount });
}
if (currentMaster != null)
    list.Add(currentMaster);
There are a few edges to knock off that, but it should give you the general idea.
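If the flat result set has already been materialized into an in-memory list, a hedged alternative is to let LINQ to Objects do the grouping (rows here is a hypothetical list of the joined rows):
var masters = rows
    .GroupBy(r => new { r.MasterId, r.Name })
    .Select(g =>
    {
        var m = new Master { MasterId = g.Key.MasterId, Name = g.Key.Name };
        foreach (var r in g)
            m.Details.Add(new Detail { DetailId = r.DetailId, Amount = r.Amount });
        return m;
    })
    .ToList();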

select <columns> from master
select <columns> from master M join Child C on M.Id = C.MasterID

You can do it with two queries and one pass over each result set:
Query all masters ordered by MasterId, then query all details, also ordered by MasterId. Then, with two nested loops, iterate the master rows, creating a new Master object for each row of the outer loop; in the inner loop, iterate the details while they share the current Master object's MasterId, populating its _details collection.
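A minimal sketch of that merge (masterRows and detailRows are hypothetical lists, each already ordered by MasterId, and it assumes Detail exposes a MasterId property, which the class in the question does not show):
var masters = new List<Master>();
int d = 0;
foreach (var row in masterRows)
{
    var master = new Master { MasterId = row.MasterId, Name = row.Name };
    // consume details while they still belong to the current master
    while (d < detailRows.Count && detailRows[d].MasterId == master.MasterId)
    {
        master.Details.Add(new Detail
        {
            DetailId = detailRows[d].DetailId,
            Amount = detailRows[d].Amount
        });
        d++;
    }
    masters.Add(master);
}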

Depending on the size of your dataset, you can pull all of the data into your application's memory with two queries (one for all masters and one for all nested data) and then use that to programmatically build the sublists for each of your objects, giving something like:
List<Master> allMasters = GetAllMasters();
List<Detail> allDetail = GetAllDetail();

foreach (Master m in allMasters)
{
    foreach (Detail d in allDetail.FindAll(x => x.MasterId == m.MasterId))
    {
        m.Details.Add(d);
    }
}
You're essentially trading memory footprint for speed with this approach. You can easily adapt this so that GetAllMasters and GetAllDetail only return the master and detail items you're interested in. Also note that for this to work, you need to add a MasterId property to the Detail class.
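If the detail list is large, one refinement (a sketch, relying on that added MasterId property) is to index the details once with ToLookup rather than calling FindAll per master, turning the O(masters x details) scan into a single pass:
// group details by MasterId once, then do cheap lookups per master
var detailsByMaster = allDetail.ToLookup(d => d.MasterId);
foreach (Master m in allMasters)
{
    foreach (Detail d in detailsByMaster[m.MasterId])
    {
        m.Details.Add(d);
    }
}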

This is an alternative you might consider. It does cost $150 per developer, but time is money too...
We use an object persistence layer called Entity Spaces that generates the code for you to do exactly what you want, and you can regenerate whenever your schema changes. Populating the objects with data is transparent. Using the objects you described above would look like this (excuse my VB, but it works in C# too):
Dim master As New BusinessObjects.Master
master.LoadByPrimaryKey(43)
Console.WriteLine(master.Name)

For Each detail As BusinessObjects.Detail In master.DetailCollectionByMasterId
    Console.WriteLine(detail.Amount)
    detail.Amount *= 1.15
Next

With master.DetailCollectionByMasterId.AddNew
    .Amount = 13
End With

master.Save()

Related

Optimize EF core query in an alphabetically ordered list

I've been dealing with an issue lately, and although I have some solutions in mind, I'd like to find the best one from every point of view.
Let's say I have a WPF app with EF Core. There are about 3,000 customers in my database (SQLite in my case, but in the future this should also work with slower ones). When the user opens the customer list, I load only some of them (quantity = 50, page = 0), in alphabetical order. As soon as the user scrolls down to the bottom, 50 more are loaded (quantity = 50, page = 1).
CustomerRepository.GetQueryableAll().Skip(page * quantity).Take(quantity).ToList();
Everything works fine. Here comes the problem, though: there's a button to create a new customer, which opens a modal window. Let's say the user creates a customer whose name starts with W. As soon as he/she hits SAVE, the new customer is saved to the database, the window is closed, and the list must be reloaded. But loading the whole list up to W is, of course, really slow.
So far, I've tried querying the database in a background task and storing in a static Dictionary how many customers start with each letter: as soon as SAVE is hit, I can guess more or less how many "pages" to Skip() and in which group of 50 the new customer will land. It works, and it's quite fast, but I'm worried that it won't work in countries with non-Latin alphabets:
public async Task<Dictionary<char, int>> GetCustomersByInitialsCount()
{
    return await Task.Run(async delegate
    {
        var dictionary = new Dictionary<char, int>();
        for (char c = 'A'; c <= 'Z'; c++)
        {
            var count = await CustomerRepository.GetCustomerCountStartingWith(c.ToString());
            dictionary.Add(c, count);
        }
        return dictionary;
    });
}
[... and in the repository:]
public async Task<int> GetCustomerCountStartingWith(string startingLetter)
{
    using (var dbContext = new MyDbContext())
    {
        return await dbContext.Set<Customer>().CountAsync(p => p.LastName.ToUpper().StartsWith(startingLetter.ToUpper()));
    }
}
Otherwise, instead of this background query, I could also try to "guess" the right page from the starting character, but I'm still puzzled by the unexpected outcomes I could get with non-Latin languages.
If anybody knows better tools or has any other useful ideas, I'll gladly consider them!
Thank you very much in advance and happy coding.
What if you add a request to get all the first "letters" in your table?
public async Task<List<string>> GetCustomerFirstLetter()
{
    using (var dbContext = new MyDbContext())
    {
        return await dbContext.Set<Customer>()
            .Select(x => x.LastName.Substring(0, 1))
            .Distinct()
            .ToListAsync();
    }
}
and then
public async Task<Dictionary<char, int>> GetCustomersByInitialsCount()
{
    return await Task.Run(async delegate
    {
        var dictionary = new Dictionary<char, int>();
        var letters = await CustomerRepository.GetCustomerFirstLetter();
        foreach (var letter in letters)
        {
            var count = await CustomerRepository.GetCustomerCountStartingWith(letter);
            dictionary.Add(letter[0], count);
        }
        return dictionary;
    });
}
Alternative solution. A little bit more efficient, from my point of view.
Your problem boils down to: how do you get the new customer's row number in the whole dataset, ordered by customer name?
First of all, in plain SQL you can solve the problem of getting the right page number with the ROW_NUMBER window function (available in both SQL Server and SQLite 3.25+; note that the TOP 1 below is SQL Server syntax - in SQLite you'd use LIMIT 1 instead). Query example:
SELECT TOP 1 rnd.rownum, rnd.LastName
FROM (SELECT ROW_NUMBER() OVER (ORDER BY c.LastName) AS rownum, c.LastName
      FROM [Customer] c) rnd
WHERE rnd.LastName = '<your new customers name here>'
So, after getting the exact row number, and already having the page-size parameter, you can easily calculate the page you need.
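For example, a one-line sketch (rowNum being the 1-based row number returned by the query above, quantity the page size from the question):
// ROW_NUMBER is 1-based, so subtract one before the integer division
int page = (int)((rowNum - 1) / quantity);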
Getting back to your code: this could be implemented in EF with an overloaded version of the Select method, but unfortunately that overload has not been implemented for IQueryable in EF Core yet.
But you can still pass the exact query to the database using the FromSql method.
The solution consists of two steps:
To get the required data, you need to define a query type on the model builder this way (the additional fields are just for example - you'll need RowNum only):
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Query<CustomerRownNum>();
}

public class CustomerRownNum
{
    public long RowNum { get; set; }
    public Guid Id { get; set; }
    public string LastName { get; set; }
}
Then you need to pass the SQL query mentioned above to the context's Query method, like this:
string customerLastName = "<your customer's last name>";

var result = dbContext.Query<CustomerRownNum>().FromSql(
    @"select top 1 rnd.RowNum, rnd.Id, rnd.LastName
      from (SELECT ROW_NUMBER() OVER (ORDER BY c.LastName) AS RowNum,
                   c.Id, c.LastName
            FROM [Customer] c) rnd
      WHERE rnd.LastName = {0}", customerLastName).FirstOrDefault();
Finally, you'll get the data you need right in the result variable.
Hope that helps!

Update collection from DbSet object via Linq

I know it is not complicated, but I'm struggling with it.
I have an IList<Material> collection:
public class Material
{
    public string Number { get; set; }
    public decimal? Value { get; set; }
}

var materials = new List<Material>();
materials.Add(new Material { Number = "111" });
materials.Add(new Material { Number = "222" });
And I have a DbSet<Material> collection
with columns Number and ValueColumn.
I need to update the IList<Material> Value property based on the DbSet<Material> collection, but with the following conditions:
Only one query request to the database.
The data returned from the database has to be limited by the Number identifier (do not load the whole database table into memory).
I tried the following (based on my previous question).
Working solution 1, but it downloads the whole table into memory (monitored in SQL Server Profiler):
var result = (
    from db_m in db.Material
    join m in model.Materials
        on db_m.Number.ToString() equals m.Number
    select new
    {
        db_m.Number,
        db_m.Value
    }
).ToList();

model.Materials.ToList().ForEach(m =>
    m.Value = result.SingleOrDefault(db_m => db_m.Number.ToString() == m.Number).Value);
Working solution 2, but it executes a query for each item in the collection:
model.Materials.ToList().ForEach(m =>
    m.Value = db.Material.FirstOrDefault(db_m => db_m.Number.ToString() == m.Number).Value);
Incomplete solution, where I tried to use the Contains method:
// I am trying to get a new filtered collection from the database, which I will iterate afterwards
var result = db.Material
    .Where(x =>
        // here is the reasonable error: cannot convert int into Material class, but I do not know how to solve this
        model.Materials.Contains(x.Number)
    )
    .Select(material => new Material { Number = material.Number.ToString(), Value = material.Value });
Any ideas? For me it would be much easier to execute a stored procedure with comma-separated id values as a parameter and get the data directly, but I want to master LINQ too.
I'd do something like this, without trying to get too cute:
var numbersToFilterBy = model.Materials.Select(m => m.Number).ToArray();
...
var result = from db_m in db.Material
             where numbersToFilterBy.Contains(db_m.Number.ToString())
             select new { ... };
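Filled out, a hedged sketch of the same idea (it assumes the database's Number column is numeric, hence the ToString() calls, which your provider evidently translates since your first solution works):
var numbersToFilterBy = model.Materials.Select(m => m.Number).ToArray();

// one round trip: only rows whose Number is in the list come back
var result = (from db_m in db.Material
              where numbersToFilterBy.Contains(db_m.Number.ToString())
              select new { db_m.Number, db_m.Value }).ToList();

// update the in-memory collection from the fetched pairs
foreach (var m in model.Materials)
{
    var match = result.FirstOrDefault(r => r.Number.ToString() == m.Number);
    if (match != null)
        m.Value = match.Value;
}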

Indexing subcategories vs finding them dynamically (performance)

I'm building a web-based store application, and I have to deal with many subcategories nested within each other. The point is, I have no idea whether my script will handle thousands of them (the new system will replace the old one, so I know what traffic to expect). At present, the response lag from the local server is 1-2 seconds more than on other pages, with only about 30 products added across different categories.
My code is the following:
BazaArkadiaDataContext db = new BazaArkadiaDataContext();
List<A_Kategorie> Podkategorie = new List<A_Kategorie>();

public int IdKat { get; set; }

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        List<A_Produkty> Produkty = new List<A_Produkty>(); // list of all products within the category and remaining subcategories
        if (Page.RouteData.Values["IdKategorii"] != null)
        {
            string tmpkat = Page.RouteData.Values["IdKategorii"].ToString();
            int index = tmpkat.IndexOf("-");
            if (index > 0)
                tmpkat = tmpkat.Substring(0, index);
            IdKat = db.A_Kategories.Where(k => k.ID == Convert.ToInt32(tmpkat)).Select(k => k.IDAllegro).FirstOrDefault();
        }
        else
            return;

        PobierzPodkategorie(IdKat);

        foreach (var item in Podkategorie)
        {
            var x = db.A_Produkties.Where(k => k.IDKategorii == item.ID);
            foreach (var itemm in x)
            {
                Produkty.Add(itemm);
            }
        }
        // data binding here
    }
}

List<A_Kategorie> PobierzPodkategorie(int IdKat, List<A_Kategorie> kat = null)
{
    List<A_Kategorie> Kategorie = new List<A_Kategorie>();
    if (kat != null)
        Kategorie.Concat(kat);
    Kategorie = db.A_Kategories.Where(k => k.KatNadrzedna == IdKat).ToList();
    if (Kategorie.Count() > 0)
    {
        foreach (var item in Kategorie)
        {
            PobierzPodkategorie(item.IDAllegro, Kategorie);
            Podkategorie.Add(item);
        }
    }
    return Kategorie;
}
TMC;DR*
My function PobierzPodkategorie recursively seeks through the subcategories (a subcategory's KatNadrzedna column holds the IDAllegro of its parent category), selects all the products with the subcategory ID and adds them to the Produkty list. The database structure is pretty wicked, as the category list is downloaded from another shop service's server, and it needed its own ID column in case the foreign server changed the structure.
There are more than 30,000 entries in the category list, some of them have 5 or more parents, and the website will show only the main categories and subcategories (the "lower" subcategories are needed by an external shop connected via SOAP).
My question is:
Will adding an index table to the database (category 123 is parent of 1234, 12738...) improve performance, or is it just a waste of time? (The index would have to be updated whenever the API version changes, and I have no idea how often that would be.) Or is there another way to do it?
I'm asking because changing the script won't be possible in production, and I don't know how the db engine handles lots of requests - I'd really appreciate any help with this.
The database is MSSQL
*Too much code; didn't read
The big efficiency gain you can get is to load all subproducts in a single query. The time saved by reducing network trips can be huge. If 1 is a root category and 12 a child category, you can query all root categories and their children like:
select *
from Categories
where len(Category) <= 2
An index on Category would not help with the above query. But it's good practice to have a primary key on any table. So I'd make Category the primary key. A primary key is unique, preventing duplicates, and it is indexed automatically.
Moving away from RBAR (row by agonizing row) has more effect than proper tuning of the database. So I'd tackle that first.
You definitely should move the recursion into the database. It can be done using a WITH statement and a Common Table Expression. Then create a view or stored procedure and map it to your application.
With that you should be able to reduce the SQL queries to two (or even one).
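A hedged sketch of that idea as a recursive CTE, issued here through LINQ to SQL's ExecuteQuery (the columns ID, IDAllegro and KatNadrzedna come from the question; the table name A_Kategorie is an assumption):
// all subcategories of IdKat in one round trip, instead of one query per level
var podkategorie = db.ExecuteQuery<A_Kategorie>(@"
    WITH Podkat (ID, IDAllegro, KatNadrzedna) AS (
        SELECT ID, IDAllegro, KatNadrzedna
        FROM A_Kategorie
        WHERE KatNadrzedna = {0}
        UNION ALL
        SELECT k.ID, k.IDAllegro, k.KatNadrzedna
        FROM A_Kategorie k
        INNER JOIN Podkat p ON k.KatNadrzedna = p.IDAllegro
    )
    SELECT ID, IDAllegro, KatNadrzedna FROM Podkat", IdKat).ToList();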

Linq-to-Entities return list along with average of ratings for each item

I am returning a list of restaurants that pulls information from the RESTAURANT, CUISINE, CITY, and STARRATING tables. I want to get a list of each restaurant with its associated city and cuisine along with the average rating in the STARRATING table. This is what I have, so far ... Thanks in advance.
RestaurantsEntities db = new RestaurantsEntities();

public List<RESTAURANT> getRestaurantsWRating(string cuisineName, string cityName, string priceName, string ratingName)
{
    var cuisineID = db.CUISINEs.First(s => s.CUISINE_NAME == cuisineName).CUISINE_ID;

    List<RESTAURANT> result = (from RESTAURANT in db.RESTAURANTs.Include("CITY").Include("CUISINE").Include("STARRATING")
                               where RESTAURANT.CUISINE_ID == cuisineID
                               orderby RESTAURANT.REST_NAME ascending
                               select RESTAURANT).ToList();
    return result;
}
From what you have it looks like Restaurant has a STARRATING collection. If so, this is what you can do:
from r in db.RESTAURANTs
where r.CUISINE_ID == cuisineID
orderby r.REST_NAME ascending
select new
{
    Restaurant = r,
    City = r.CITY,
    Cuisine = r.CUISINE,
    AvgRating = r.STARRATING.Average(rt => rt.Rating)
}
You'd need to give more information about your classes and associations (preferably a class diagram) if this is not right.
(BTW, using all capitals for class and property names is not conventional.)
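One caveat worth adding: an anonymous type can't be returned from the question's getRestaurantsWRating method, so if the values need to leave the method, project into a named DTO instead. A sketch (RestaurantWithRating is made up for the example; the nullable cast guards against restaurants with no ratings yet):
public class RestaurantWithRating
{
    public RESTAURANT Restaurant { get; set; }
    public decimal? AvgRating { get; set; } // null when there are no ratings
}

var result = (from r in db.RESTAURANTs
              where r.CUISINE_ID == cuisineID
              orderby r.REST_NAME ascending
              select new RestaurantWithRating
              {
                  Restaurant = r,
                  AvgRating = r.STARRATING.Average(rt => (decimal?)rt.Rating)
              }).ToList();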
First I would wrap your whole code block above in a using statement:
using (RestaurantsEntities db = new RestaurantsEntities())
{
    ...
}
This will help with cleanup for the EF context.
If you have control of your database, the way I would typically do this is to create a view in the database that does this work, add the view to your entity model, and query the view. This simplifies the whole process and offloads the aggregation work to the database.
If you don't have control over the database, or don't prefer the view technique, then I would query using the Include technique as you have done, add a partial class to RESTAURANT (if using model-first) with an AverageRating property, then manually calculate the average over each related set of STARRATING rows and assign the result to the added property. You can do this through LINQ to Objects once you have all the data back. This technique will not scale very well as more data accumulates, unless you are confident you will only ever return one or a few RESTAURANT instances. You could use something like:
// query data as you have done above...
foreach (RESTAURANT r in result)
{
    if (r.STARRATING.Count() > 0)
    {
        r.AverageRating = r.STARRATING.Average(rating => rating.Value); // .Value is your field name
    }
    else
    {
        r.AverageRating = 0; // or whatever default you prefer...
    }
}
Hope this helps.

Best way to dynamically get column names from Oracle tables

We are using an extractor application that will export data from the database to csv files. Based on some condition variable it extracts data from different tables, and for some conditions we have to use UNION ALL as the data has to be extracted from more than one table. So to satisfy the UNION ALL condition we are using nulls to match the number of columns.
Right now all the queries in the system are pre-built based on the condition variable. The problem is that whenever there is a change in a table's projection (i.e. a new column added, an existing column modified, or a column dropped) we have to manually change the code in the application.
Can you please give some suggestions for how to extract the column names dynamically, so that changes in the table structure do not require changes in the code?
My concern is the condition that decides which table to query. The variable condition is like:
if the condition is A, then load from TableX
if the condition is B, then load from TableA and TableY
We must know which table we need to get data from. Once we know the table, it is straightforward to query the column names from the data dictionary. But there is one more condition: some columns need to be excluded, and these columns are different for each table.
I am trying to solve the problem only for dynamically generating the list of columns. But my manager told me to design a solution at the conceptual level rather than just fixing this case. This is a very big system, with providers and consumers constantly loading and consuming data, so he wanted a solution that can be general.
So what is the best way to store the condition, table name, and excluded columns? One way is storing them in the database. Are there any other ways? If so, what is the best? I have to present at least a couple of ideas before finalizing.
Thanks,
A simple query like this lets you find each column name of a table in Oracle:
Select COLUMN_NAME from user_tab_columns where table_name='EMP'
Use it in your code :)
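From the application side, a hedged C# sketch of running that query with the Oracle managed driver (Oracle.ManagedDataAccess.Client; connectionString is assumed, and the table name is bound as a parameter):
var columns = new List<string>();
using (var conn = new OracleConnection(connectionString))
using (var cmd = new OracleCommand(
    "SELECT column_name FROM user_tab_columns WHERE table_name = :t ORDER BY column_id", conn))
{
    cmd.Parameters.Add(new OracleParameter("t", "EMP")); // table names are upper-case in the dictionary
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            columns.Add(reader.GetString(0)); // one column name per row
    }
}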
Ok, MNC, try this for size (paste it into a new console app):
using System;
using System.Collections.Generic;
using System.Linq;
using Test.Api;
using Test.Api.Classes;
using Test.Api.Interfaces;
using Test.Api.Models;

namespace Test.Api.Interfaces
{
    public interface ITable
    {
        int Id { get; set; }
        string Name { get; set; }
    }
}

namespace Test.Api.Models
{
    public class MemberTable : ITable
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class TableWithRelations
    {
        public MemberTable Member { get; set; }
        // list to contain partnered tables
        public IList<ITable> Partner { get; set; }

        public TableWithRelations()
        {
            Member = new MemberTable();
            Partner = new List<ITable>();
        }
    }
}

namespace Test.Api.Classes
{
    public class MyClass
    {
        private readonly IList<TableWithRelations> _tables;

        public MyClass()
        {
            // tableA stuff
            var tableA = new TableWithRelations { Member = { Id = 1, Name = "A" } };
            var relatedclasses = new List<ITable>
            {
                new MemberTable
                {
                    Id = 2,
                    Name = "B"
                }
            };
            tableA.Partner = relatedclasses;

            // tableB stuff
            var tableB = new TableWithRelations { Member = { Id = 2, Name = "B" } };
            relatedclasses = new List<ITable>
            {
                new MemberTable
                {
                    Id = 3,
                    Name = "C"
                }
            };
            tableB.Partner = relatedclasses;

            // tableC stuff
            var tableC = new TableWithRelations { Member = { Id = 3, Name = "C" } };
            relatedclasses = new List<ITable>
            {
                new MemberTable
                {
                    Id = 2,
                    Name = "D"
                }
            };
            tableC.Partner = relatedclasses;

            // tableD stuff
            var tableD = new TableWithRelations { Member = { Id = 3, Name = "D" } };
            relatedclasses = new List<ITable>
            {
                new MemberTable
                {
                    Id = 1,
                    Name = "A"
                },
                new MemberTable
                {
                    Id = 2,
                    Name = "B"
                },
            };
            tableD.Partner = relatedclasses;

            // add tables to the base tables collection
            _tables = new List<TableWithRelations> { tableA, tableB, tableC, tableD };
        }

        public IList<ITable> Compare(int tableId, string tableName)
        {
            return _tables.Where(table => table.Member.Id == tableId
                                          && table.Member.Name == tableName)
                          .SelectMany(table => table.Partner).ToList();
        }
    }
}

namespace Test.Api
{
    public class TestClass
    {
        private readonly MyClass _myclass;
        private readonly IList<ITable> _relatedMembers;

        public IList<ITable> RelatedMembers
        {
            get { return _relatedMembers; }
        }

        public TestClass(int id, string name)
        {
            this._myclass = new MyClass();
            // the Compare method would take your two parameters and return
            // a matching set of related tables
            _relatedMembers = _myclass.Compare(id, name);
            // now do something with the resulting list
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        // change these values to suit, along with the rules in MyClass
        var id = 3;
        var name = "D";
        var testClass = new TestClass(id, name);

        Console.Write(string.Format("For Table{0} on Id{1}\r\n", name, id));
        Console.Write("----------------------\r\n");
        foreach (var relatedTable in testClass.RelatedMembers)
        {
            Console.Write(string.Format("Related Table{0} on Id{1}\r\n",
                relatedTable.Name, relatedTable.Id));
        }
        Console.Read();
    }
}
I'll get back in a bit to see if it fits or not.
So what you are really after is designing a rule engine for building dynamic queries. This is no small undertaking. The requirements you have provided are:
Store rules (what you call a "condition variable")
Each rule selects from one or more tables
Additionally some rules specify columns to be excluded from a table
Rules which select from multiple tables are satisfied with the UNION ALL operator; tables whose projections do not match must be brought into alignment with null columns.
Some possible requirements you don't mention:
Format masking e.g. including or excluding the time element of DATE columns
Changing the order of columns in the query's projection
The previous requirement is particularly significant when it comes to the multi-table rules, because the projections of the tables need to match by datatype as well as number of columns.
Following on from that, the padding NULL columns may not necessarily be tacked on to the end of the projection e.g. a three column table may be mapped to a four column table as col1, col2, null, col3.
Some multi-table queries may need to be satisfied by joins rather than set operations.
Rules for adding WHERE clauses.
A mechanism for defining default sets of excluded columns (i.e. ones applied every time a table is queried).
I would store these rules in database tables. Because they are data and storing data is what databases are for. (Unless you already have a rules engine to hand.)
Taking the first set of requirements you need three tables:
RULES
-----
RuleID
Description
primary key (RuleID)
RULE_TABLES
-----------
RuleID
Table_Name
Table_Query_Order
All_Columns_YN
No_of_padding_cols
primary key (RuleID, Table_Name)
RULE_EXCLUDED_COLUMNS
---------------------
RuleID
Table_Name
Column_Name
primary key (RuleID, Table_Name, Column_Name)
I've used compound primary keys just because it's easier to work with them in this context e.g. running impact analyses; I wouldn't recommend it for regular applications.
I think all of these are self-explanatory except the additional columns on RULE_TABLES.
Table_Query_Order specifies the order in which the tables appear in UNION ALL queries; this matters only if you want to use the column_names of the leading table as headings in the CSV file.
All_Columns_YN indicates whether the query can be written as SELECT * or whether you need to query the column names from the data dictionary and the RULE_EXCLUDED_COLUMNS table.
No_of_padding_cols is a simplistic implementation for matching projections in those UNION ALL columns, by specifying how many NULLs to add to the end of the column list.
I'm not going to tackle those requirements you didn't specify because I don't know whether you care about them. The basic thing is, what your boss is asking for is an application in its own right. Remember that as well as an application for generating queries you're going to need an interface for maintaining the rules.
MNC,
How about creating, up front, a dictionary of all the known tables involved in the application process (irrespective of the combinations - just a dictionary of the tables), keyed on table name? The members of this dictionary would be an IList<string> of the column names. This would let you compare two tables both on the number of columns present (dicTable[myVarTableName].Count) and by iterating round dicTable[myVarTableName] to pull out the column names.
At the end of the piece, you could use a little LINQ to determine the table with the greatest number of columns and create the structure with nulls accordingly, as sketched below.
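A minimal sketch of that idea (table and column names are hypothetical, and it assumes System.Linq is imported):
// dicTable maps table name -> ordered list of column names
var dicTable = new Dictionary<string, IList<string>>
{
    { "TABLE_X", new List<string> { "COL1", "COL2", "COL3" } },
    { "TABLE_Y", new List<string> { "COL1", "COL2" } }
};

// the widest table fixes the projection size for the UNION ALL
var widest = dicTable.Values.Max(cols => cols.Count);

// pad each table's select list with NULLs up to the widest projection
var selectLists = dicTable.ToDictionary(
    kv => kv.Key,
    kv => string.Join(", ", kv.Value.Concat(
        Enumerable.Repeat("NULL", widest - kv.Value.Count))));

// selectLists["TABLE_Y"] is now "COL1, COL2, NULL"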
Hope this gives food for thought.
