Best practices for my dynamic queries in Entity Framework - c#

I am building a web application that is a recreation of an older system, and I am trying to build it in an architected, yet pragmatic and maintainable way (unlike the old system). Anyways, I am currently designing my queries for my models in my application. The old system allows developers to assign any field through a boolean to be a searchable value from a table, meaning a single view for maintaining some models' records might contain 20 searchable fields in the front-end and doing that only requires ticking a single box.
Now I would like to implement something similar in this new system in C#, with a backend using EF as the data mapper, but I am not sure which approach is the most maintainable. In my current approach, the filters are sent by the client as a record that (at most) contains all the possible filterable fields, e.g.:
public record GetOrderQuery()
{
    public string OrderReference { get; set; }
    public string OrdererName { get; set; }
    public int ItemCount { get; set; }
    //etc...
}
I am fine with the record limiting which filters can be applied (should I instead have the record contain an iterable property of objects with fieldName, fieldValue and queryType?), but I would like to streamline the actual filtering. Basically, if the client sends any of the above fields in the request (as JSON; none are required), the filtering is applied to those fields. I am currently thinking that I could implement this with reflection: I try to find a property in the actual model whose name is the same as in the record, then I construct the predicate for the Where() by chaining expressions.
I construct an expression for each property that has a value in the query and can be found through reflection (a property with the same name), then I link those together using binary expressions, combining the filters into a single expression. I am not sure if this is the best approach, or even what a good way to implement it would be (performance-wise, maintainability-wise or just in general). Are there any other ways to implement this, are there any pitfalls I should look out for, any resources I should read? Thanks!
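For reference, here is a minimal sketch of the approach described above: one equality expression per filter property that has a value, chained with AndAlso. The Order entity, the nullable filter properties and the FilterBuilder name are assumptions made for illustration; only equality is handled, and the filter and entity property types are assumed to match.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity, for illustration only.
public class Order
{
    public string OrderReference { get; set; } = "";
    public string OrdererName { get; set; } = "";
    public int ItemCount { get; set; }
}

// Filter record with nullable properties, so "not sent" differs from a real value.
public record GetOrderQuery
{
    public string? OrderReference { get; init; }
    public string? OrdererName { get; init; }
    public int? ItemCount { get; init; }
}

public static class FilterBuilder
{
    // Builds x => x.P1 == v1 && x.P2 == v2 ... for every non-null filter
    // property that has a same-named public property on the entity.
    public static Expression<Func<TEntity, bool>> Build<TEntity, TFilter>(TFilter filter)
    {
        ParameterExpression x = Expression.Parameter(typeof(TEntity), "x");
        Expression? body = null;

        foreach (PropertyInfo filterProp in
                 typeof(TFilter).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            object? value = filterProp.GetValue(filter);
            if (value is null) continue; // field not sent by the client

            PropertyInfo? entityProp = typeof(TEntity).GetProperty(filterProp.Name);
            if (entityProp is null) continue; // no matching property on the entity

            // x.Prop == value
            Expression equal = Expression.Equal(
                Expression.Property(x, entityProp),
                Expression.Constant(value, entityProp.PropertyType));

            body = body is null ? equal : Expression.AndAlso(body, equal);
        }

        // No filters supplied: match everything.
        return Expression.Lambda<Func<TEntity, bool>>(body ?? Expression.Constant(true), x);
    }
}
```

The result can be passed straight to Where() on any IQueryable<Order>, for example FilterBuilder.Build<Order, GetOrderQuery>(query). One thing to check against a real provider: values embedded with Expression.Constant may end up as literals in the generated SQL rather than as parameters.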

Related

RE: CRUD operations. Is it pulling more data than is needed a bad thing?

Let me preface this with saying I really did search for this answer. On and off for some time now. I'm certain it's been asked/answered before but I can't seem to find it. Most articles seem to be geared towards how to perform basic CRUD operations. I'm really wanting to get deeper into best practices. Having said that, here's an example model I mocked up for example purposes.
public class Book
{
    public long Id { get; set; }
    public string Name { get; set; }
    public decimal AverageRating { get; set; }
    public decimal ArPoints { get; set; }
    public decimal BookLevel { get; set; }
    public string Isbn { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime PublishedAt { get; set; }
    public Author Author { get; set; }
    public IEnumerable<Genre> Genres { get; set; }
}
I'm using ServiceStack's OrmLite, migrating string queries to object model binding wherever possible. It's a C# MVC.NET project, using Controller/Service/Repository layers with DI. My biggest problem is with Read and Update operations. Take Reads for example. Here are two methods (only wrote what I thought was germane) for example purposes.
public class BookRepository
{
    public Book Single(long id)
    {
        return _db.SelectById<Book>(id);
    }
    public IEnumerable<Book> List()
    {
        return _db.Select<Book>();
    }
}
Regardless of how this would need to change for the real world, the problem is simply that too much information is returned. Say I were displaying a list of books to the user. Even if the List method were written so that it didn't pull nested objects (Author & Genres), it would still return data for properties that were not used.
It seems like I could either learn to live with getting data I don't need, or write a bunch of extra methods that change which properties are pulled. Using the Single method, here are a few examples...
public Book SinglePublic(long id): Returns a few properties
public Book SingleSubscribed(long id): Returns most properties
public Book SingleAdmin(long id): Returns all properties
Having to write out methods like this for most tables doesn't seem very maintainable to me. But then, almost always getting unused information on most calls has to affect performance, right? I have to be missing something. Any help would be GREATLY appreciated. Feel free to just share a link, give me a PluralSight video to watch, recommend a book, whatever. I'm open to anything. Thank you.
As a general rule you should avoid premature optimization and always start with the simplest & most productive solution, as avoiding complexity & large code-base sizes should be your first priority.
If you're only fetching a single row, you should definitely start by using a single API and fetching the full Book entity. I'd personally also avoid the Repository abstraction, which I view as an additional unnecessary abstraction, so I'd just use the OrmLite APIs directly in your Controller or Service, e.g.:
Book book = db.SingleById<Book>(id);
You're definitely not going to notice the additional unused fields next to the I/O cost of the RDBMS network call, and the latency & bandwidth between your App and your RDBMS matter far less than the additional info on the wire over the Internet. Having multiple APIs for the sake of reducing unused fields adds unnecessary complexity, increases code-base size / technical debt, and reduces the reusability, cacheability & refactorability of your code.
Times when to consider multiple DB calls for a single entity:
You've received feedback & given a task to improve the performance of a page/service
Your entity contains large blobbed text or binary fields like images
The first speaks to avoiding premature optimization by first focusing on simplicity & productivity before optimizing to resolve known, realizable performance issues. In that case first profile the code, then if it shows the issue is with the DB query you can optimize for only returning the data that's necessary for that API/page.
To improve performance I'd typically first evaluate whether caching is viable, as it's typically the least effort / max value solution: you can easily cache APIs with a [CacheResponse] attribute, which will cache the optimal API output for the specified duration, or you can take advantage of caching primitives in HTTP to avoid needing to return any non-modified resources over the wire.
For the second case, to avoid needing different queries that exclude the large blobbed data, I would extract it out into a different 1:1 row & only retrieve it when it's needed, as large row sizes hurt overall performance in accessing that table.
Custom Results for Summary Data
So it's very rare that I'd have different APIs for accessing different fields of a single entity (more likely due to additional joins) but for returning multiple results of the same entity I would have a different optimized view with just the data required. This existing answer shows some ways to retrieve custom resultsets with OrmLite (See also Dynamic Result Sets in OrmLite docs).
I'll generally prefer to use a custom Typed POCO with just the fields I want the RDBMS to return, e.g. in a summary BookResult Entity:
var q = db.From<Book>()
    .Where(x => ...);
var results = db.Select<BookResult>(q);
This is all relative to the task at hand, e.g. the fewer results returned or fewer concurrent users accessing the Page/API the less likely you should be to use multiple optimized queries whereas for public APIs with 1000's of concurrent users of frequently accessed features I'd definitely be looking to profiling frequently & optimizing every query. Although these cases would typically be made clear from stakeholders who'd maintain "performance is a feature" as a primary objective & allocate time & resources accordingly.
I can't speak to OrmLite, but for Entity Framework the ORM will look ahead and only return the columns that are necessary to fulfill subsequent execution. If you couple this with view models, you are in a pretty good spot. So, for example, let's say you have a grid to display the titles of your books. You only need a subset of columns from the database to do so. You could create a view model like this:
public class BookListViewItem
{
    public long Id { get; set; }
    public string Title { get; set; }
}
And then, when you need it, fill it like this:
var viewModel = dbcontext.Books
    .Where(i => i.whateverFilter)
    // An object initializer (rather than a constructor taking the whole Book)
    // lets EF see exactly which columns the projection uses.
    .Select(i => new BookListViewItem { Id = i.Id, Title = i.Title })
    .ToList();
That should limit the generated SQL to only request the id and title columns.
In Entity Framework, this is called 'projection'. See:
https://social.technet.microsoft.com/wiki/contents/articles/53881.entity-framework-core-3-projections.aspx

How to search large amount of data based on tags?

I'm planning to create an application to sort and view photos and images I have.
I want to give the program a list of folders (with subfolders) to handle and tag images with multiple, custom tags as I go through them. If I then enter one or more tags in a search bar, I want all images with those tags to appear in a panel.
The go-to approach would be SQL, but I don't want to have a SQL server running in the background. I want the program to be fully portable, so just the exe and maybe a small number of files it creates.
I thought I would create a tree where every node is a folder and the leaves are the images. I would then add the tags of the leaves to the parent node and cascade that upwards, so that the root node has a list of all the tags. This should allow for a fast search and, with parallelisation, a fast build of the tree.
But before I start to work on such a tree I wondered if there is already something like this, or if there is a better approach?
Just to make it clear, I'm talking about multiple tags here, so a Dictionary won't work.
Tags by definition are unique and so cry out to be indexed and sorted.
A Dictionary<Tag,ImageCollection>. Why not? Seems ideal for tags.
A Dictionary<Image, TagCollection>. The reverse reference of the above. You don't want to try going through dictionary values to get at keys.
Create custom classes: Tag, Image, TagCollection, ImageCollection; then override Equals and GetHashCode, and implement IComparable. This will optimize the built-in .NET indexing, sorting and searching. Many collection "Find" methods take delegates for customized searching. Be sure to read the MSDN documentation.
I think this could constitute the core structure. For any given query, starting with initial fetches from these structures should be pretty quick. And yielding custom collections will help too.
There is nothing wrong with a mix of LINQ and "traditional" coding. I expect that in any case you're better off with indexed/sorted tags.
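As a rough sketch of the two dictionaries described above, here is a small in-memory index that maps tags to images and images back to tags, and answers multi-tag searches by intersecting sets. The TagIndex class, its method names and the use of plain strings for image paths and tags are assumptions for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory index: tag -> image paths, plus the reverse map.
public class TagIndex
{
    private readonly Dictionary<string, HashSet<string>> _imagesByTag =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, HashSet<string>> _tagsByImage =
        new Dictionary<string, HashSet<string>>();

    public void AddTag(string imagePath, string tag)
    {
        GetOrAdd(_imagesByTag, tag).Add(imagePath);
        GetOrAdd(_tagsByImage, imagePath).Add(tag);
    }

    // Returns the images that carry ALL of the given tags.
    public IReadOnlyCollection<string> Search(params string[] tags)
    {
        HashSet<string>? result = null;
        foreach (string tag in tags)
        {
            if (!_imagesByTag.TryGetValue(tag, out var images))
                return Array.Empty<string>(); // unknown tag: nothing can match

            if (result is null)
                result = new HashSet<string>(images); // copy, so the index is not mutated
            else
                result.IntersectWith(images);
        }
        return result is null ? Array.Empty<string>() : (IReadOnlyCollection<string>)result;
    }

    // Reverse lookup: all tags of one image.
    public IReadOnlyCollection<string> TagsOf(string imagePath)
    {
        return _tagsByImage.TryGetValue(imagePath, out var tags)
            ? (IReadOnlyCollection<string>)tags
            : Array.Empty<string>();
    }

    private static HashSet<string> GetOrAdd(Dictionary<string, HashSet<string>> map, string key)
    {
        if (!map.TryGetValue(key, out var set))
        {
            set = new HashSet<string>();
            map[key] = set;
        }
        return set;
    }
}
```

Tag lookups are case-insensitive here, which is a design choice rather than a requirement. Persisting the index to disk is left out entirely.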
Here's how I'd handle it.
First, use SQLite. It's a single-dll, lightweight, superfast and impressively capable database whose sole purpose is to be used by these types of applications. A database is a far better approach than trying to persist trees to files (the issue with custom persistence isn't that the idea in itself is bad, but rather that there are a dozen edge cases it'll need to handle that you're not likely to have thought of, whereas a database has them automatically covered).
Second, set up some POCOs for your media and your tags. Something like this:
abstract class Media
{
    public string Filename {get;set;}
    public virtual ICollection<Tag> Tags {get;set;}
}
public class Image : Media
{
    public ImageFormat Format {get;set;}
    public int ResX {get;set;}
    public int ResY {get;set;} // or whatever
}
public class Video : Media
{
    public VideoFormat Format {get;set;}
    public int Bitrate {get;set;}
}
public class Tag
{
    public string Name {get;set;}
    public virtual ICollection<Media> Media {get;set;}
}
This forms the basis for all of your MVVM stuff (you're using MVVM with WPF, right?)
Use Entity Framework for your data access (persistence and querying).
With that, you can do something like this to query your items:
public IEnumerable<Media> SearchByTags(List<Tag> tags) {
    // LINQ joins use the 'equals' keyword, not '='
    var q = from m in _context.Media
            join mt in _context.MediaTags on m.ID equals mt.ID
            join t in tags on mt.Name equals t.Name
            select m;
    return q;
}
That will convert to a relatively optimized database query and get you a list of applicable media based on the tags that you want to search by. Feed this list back to your presentation (MVVM) layer and build your tree from the results.
(this assumes that you have a table of Media, a table of Tags, and a junction/bridge table of MediaTags - I've left many details out and this is very much aircode, but as a general concept, I think it works just fine).

MongoDB: How to define a dynamic entity in my own domain class?

New to MongoDB. Set up a C# web project in VS 2013.
Need to insert data as document into MongoDB. The number of Key-Value pair every time could be different.
For example,
document 1: Id is "1", data is one pair key-value: "order":"shoes"
document 2: Id is "2", data is a 3-pair key-value: "order":"shoes", "package":"big", "country":"Norway"
The "Getting Started" guide says that, because it is so much easier to work with your own domain classes, the quick-start will assume that you are going to do that. It suggests making your own class like:
public class Entity
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}
then use it like:
var entity = new Entity { Name = "Tom" };
...
entity.Name = "Dick";
collection.Save(entity);
Well, it defeats the idea of no-fixed columns, right?
So, I guess BsonDocument is the model to use. Are there any good samples for beginners?
I'm amazed how often this topic comes up... Essentially, this is more of a 'statically typed language limitation' than a MongoDB issue:
Schemaless doesn't mean you don't have any schema per se, it basically means you don't have to tell the database up front what you're going to store. It's basically "code first" - the code just writes to the database like it would to RAM, with all the flexibility involved.
Of course, the typical application will have some sort of recurring data structure, some classes, some object-oriented paradigm in one way or another. That is also true for the indexes: indexes are (usually) 'static' in the sense that you do have to tell MongoDB up front which fields to index.
However, there is also the use case where you don't know what to store. If your data is really that unforeseeable, it makes sense to think "code first": what would you do in C#? Would you use the BsonDocument? Probably not. Maybe an embedded Dictionary does the trick, e.g.
public class Product {
    public ObjectId Id {get;set;}
    public decimal Price {get;set;}
    public Dictionary<string, string> Attributes {get;set;}
    // ...
}
This solution can also work with multikeys to simulate a large number of indexes to make queries on the attributes reasonably fast (though the lack of static typing makes range queries tricky).
It really depends on your needs. If you want to have nested objects and static typing, things get a lot more complicated than this. Then again, the consumer of such a data structure (i.e. the frontend or client application) often needs to make assumptions that make it easy to digest this information, so it's often not possible to make this type safe anyway.
Other options include indeed using the BsonDocument, which I find too invasive in the sense that you make your business models depend on the database driver implementation; or using a common base class like ProductAttributes that can be extended by classes such as ProductAttributesShoes, etc. This question really revolves around the whole system design - do you know the properties at compile time? Do you have dropdowns for the property values in your frontend? Where do they come from?
If you want something reusable and flexible, you could simply use a JSON library, serialize the object to string and store that to the database. In any case, the interaction with such objects will be ugly from the C# side because they're not statically typed.
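To make the last option concrete, here is a minimal sketch of round-tripping an object with a free-form attribute dictionary through a JSON string. It assumes System.Text.Json as the JSON library (the answer does not name one), uses a plain string Id instead of ObjectId so that it runs without the MongoDB driver, and ProductJson is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Simplified stand-in for the Product class above (string Id instead of ObjectId).
public class Product
{
    public string Id { get; set; } = "";
    public decimal Price { get; set; }

    // Free-form key/value pairs; the set of keys can differ per document.
    public Dictionary<string, string> Attributes { get; set; } =
        new Dictionary<string, string>();
}

public static class ProductJson
{
    // Serializes the whole object, including the dynamic attributes, to a string.
    public static string ToJson(Product product)
    {
        return JsonSerializer.Serialize(product);
    }

    // Restores the object from the stored string; returns null for the JSON literal "null".
    public static Product? FromJson(string json)
    {
        return JsonSerializer.Deserialize<Product>(json);
    }
}
```

Storing the serialized string means the database can no longer index or query the individual attributes, which is the trade-off against the embedded Dictionary approach.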

SetFields in Mongo C# driver

I'm using the C# Mongo driver and I have a users collection like below:
public class User
{
    public string Name { get; set; }
    public DateField Date { get; set; }
    /*
     * Some more properties
     */
    public List<string> Slugs { get; set; } //I just need to return this property
}
I'm writing a query that should return just the Slugs property.
To do this I'm trying to use the SetFields(...) method from the Mongo driver. SetFields returns a cursor of the User type, but I'm expecting something of the type of my Slugs property, so that I don't return the whole set of properties when I just need one.
Is it possible?
Yes and no. You can use the aggregation framework's projection operator $project to change the structure of the data, but I wouldn't do that for two reasons:
MongoDB generally tries to preserve the structure unless you force it not to, particularly because that makes it easier to work with statically typed languages (the old object/relational mismatch: SQL queries don't 'answer' in users or blog posts, but in some wild chimaera of properties collected from various tables, which might require additional DTOs depending on the query itself, which is all a bit ugly).
Aggregation framework queries are a bit more complicated and a bit slower, and I wouldn't let the urge to do some micro-optimization dictate a lot of unnecessary complexity.
After all, omitting a few fields is a micro-optimization already (setting index covered queries aside), but on the client-side the cost of empty fields should be next to none.

Securing/omitting selected properties of a domain entity in NHibernate (subclass, projection, ?)

Consider the following simplified scenario:
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    // restricted
    public string SocialSecurityNumber { get; set; }
    // restricted
    public string MothersMaidenName { get; set; }
}
So, in the application, many users can view Person data. Some users can view all; other users can view only Name and Age.
Having the UI display only authorized data is easy enough on the client side, but I really don't want to even send that data to the client.
I've tried to achieve this by creating a FullPerson : BasicPerson hierarchy (table-per-class-hierarchy). I used two implementations of a StaffRepository to get the desired type, but the necessary casting fails at runtime because of the NH proxies. Of course in the RDBMS any given row in the People table can represent a FullPerson or a BasicPerson, and giving them both the same discriminator value does not work either.
I considered mapping only FullPerson and using AliasToBean result transformer to filter down to BasicPerson, but I understand this to be a one-way street, whereas I want the full benefit of entity management and lazy loading (though the example above doesn't include collections) in the session.
Another thought I had was to wrap up all the restricted fields into a class and add this as a property. My concerns with this approach are several:
It compromises my domain model,
I'd have to declare the property as a collection (always of 1) in order to have it load lazily, and
I'm not even sure how I'd prevent that lazy collection from loading.
All this feels wrong. Is there a known approach to achieve the desired result?
clarification:
This is an intranet-only desktop application; the session lives on the client. While I can certainly create an intermediate service layer, I would have to give up lazy loading and change tracking, which I'd really like to keep in place.
First, let me say that I do not think it is NHibernate's responsibility to handle security, and data redaction based on same. I think you're overcomplicating this by trying to put it in the data access layer.
I would insert a layer into the service or controller that receives this data request from the client (which shouldn't be the Repository itself) and will perform the data redaction based on user permissions. So, you'd perform the full query to the DB, and then based on user permissions, the service layer would clear out fields of the result set before returning that result set over the service connection. It is not the absolute most performant solution, but is both more secure and more performant than sending all the data to the client and having the client software "censor" it. The Ethernet connection between DB and service layers of the server architecture can handle far more bandwidth than the Internet connection between service layer and client, and in a remote client app you generally have very little control over what the client does with the data; you could be talking to a hacked copy of your software, or a workalike, that doesn't give two flips about user security.
If network bandwidth between service and DB is of high importance, or if a LOT of information is restricted, Linq2NH should be smart enough to let you specify what should or shouldn't be included in a query's results using a select list:
// Declared up front: a 'var' declaration cannot be the body of an if/else.
IQueryable<Person> results;
if (!user.CanSeeRestrictedFields)
{
    results = from p in Repository.AsQueryable<Person>()
              //rest of Linq statement
              select new Person {
                  Name = p.Name,
                  Age = p.Age
              };
}
else
{
    results = from p in Repository.AsQueryable<Person>()
              //rest of Linq statement
              select new Person {
                  Name = p.Name,
                  Age = p.Age,
                  SocialSecurityNumber = p.SocialSecurityNumber,
                  MothersMaidenName = p.MothersMaidenName
              };
}
I do not know if Linq2NH is smart enough to parse conditional operators into SQL; I doubt it, but on the off-chance it's possible, you can specify conditional operators in the initializer for the SSN and MMN fields based on whether the user has rights to see them, allowing you to combine these two queries.
I would leave your domain model alone, and instead use AutoMapper to map to specific DTOs based on the security level of the current user (or whatever criteria you use to determine access to the specific properties). This should be done in some sort of service layer that sits between your UI and your repositories.
Edit:
Based upon your requirements of keeping lazy-loading and change tracking in place, perhaps using the proxy pattern to wrap your domain object is a viable alternative? You could wrap your original domain model in a proxy that performed your security checks on each given property. I believe CSLA.NET uses a method like this for field-level security, so it may be worth browsing the source to get some inspiration. You could perhaps take this one step further and use interfaces implemented by your proxy that only exposed the properties that the user had access to.
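A minimal sketch of that proxy idea, assuming a simple boolean permission flag: the wrapper forwards unrestricted properties and throws on restricted ones when the user lacks access. The IBasicPerson, IFullPerson and SecuredPerson names are hypothetical, and this is not how CSLA.NET itself implements field-level security.

```csharp
using System;

// The domain entity from the question, left untouched.
public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public string SocialSecurityNumber { get; set; } = "";   // restricted
    public string MothersMaidenName { get; set; } = "";      // restricted
}

// Hypothetical interfaces exposing only what each class of user may see.
public interface IBasicPerson
{
    string Name { get; }
    int Age { get; }
}

public interface IFullPerson : IBasicPerson
{
    string SocialSecurityNumber { get; }
    string MothersMaidenName { get; }
}

// Proxy that wraps the (possibly lazily loaded) entity and checks access per property.
public class SecuredPerson : IFullPerson
{
    private readonly Person _inner;
    private readonly bool _canSeeRestricted;

    public SecuredPerson(Person inner, bool canSeeRestricted)
    {
        _inner = inner;
        _canSeeRestricted = canSeeRestricted;
    }

    public string Name => _inner.Name;
    public int Age => _inner.Age;
    public string SocialSecurityNumber => Guard(_inner.SocialSecurityNumber);
    public string MothersMaidenName => Guard(_inner.MothersMaidenName);

    // Returns the value only if the current user may see restricted fields.
    private string Guard(string value)
    {
        return _canSeeRestricted
            ? value
            : throw new UnauthorizedAccessException("Restricted field.");
    }
}
```

Code that should only ever see the basic fields can be handed the proxy typed as IBasicPerson, so the restricted properties are not even visible at compile time. Note that the wrapped entity still holds the restricted values in memory on the client.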
