I'm creating an ASP.NET MVC application where users upload multiple files.
The data will be compared with DB data, processed, and exported later. Paging is also needed.
When displaying this data, sorting and filtering are important.
When data is uploaded, some of it will be stored in the DB, some will be displayed as not found in the DB, some will be modified and then stored, etc.
My question is: what is the best way to store the uploaded data so that it is available to be processed or viewed?
Load in memory
Create temp tables for every session? (I don't even know if that's possible)
Different storage which can be queried (access data using LINQ) (JSON??)
Another option?
The source files are CSV or Excel.
An example of one of the files:
Name Age Street City Country Code VIP
---------------------------------------------------------
Mike 42 AntwSt Leuven Belgium T5Df No
Peter 32 Ut123 Utricht Netherland T666 Yes
Example of the classes:
public class User
{
    public string Name { get; set; }
    public Address Address { get; set; } // Street, City, Country
    public Info Info { get; set; }       // Age and Cres
}

public class Info
{
    public int Age { get; set; }
    public Cres Cres { get; set; }
}

public class Cres
{
    public string Code { get; set; }
    public bool VIP { get; set; }
}
There are a variety of strategies for handling this (I actually just wrote an entire dissertation on the subject), and there are many different considerations you'll need to take into account.
Depending on the amount of data present and what you're doing with it, it may be enough to simply store the information in session storage. How you actually implement the session store is up to you, and there are pros and cons to each approach.
I would personally recommend a server-side session store to handle everything, and there are a variety of options for that, for example SSDB or Redis.
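Purely as an illustration, here is a minimal sketch of stashing the parsed rows in a Redis-backed store keyed by session id using ServiceStack.Redis (the key name, expiry, Review action, and the ParseUsers helper are all assumptions):

// Sketch: keep the parsed upload per session so later requests can page/filter/process it.
public class UploadController : Controller
{
    private readonly IRedisClientsManager _redisManager; // ServiceStack.Redis

    public UploadController(IRedisClientsManager redisManager)
    {
        _redisManager = redisManager;
    }

    [HttpPost]
    public ActionResult Upload(HttpPostedFileBase file)
    {
        List<User> users = ParseUsers(file.InputStream); // assumed CSV/Excel parser

        using (var redis = _redisManager.GetClient())
        {
            // Keep the upload around for an hour; the key is scoped to the current session.
            redis.Set("upload:" + Session.SessionID, users, TimeSpan.FromHours(1));
        }

        return RedirectToAction("Review");
    }
}

A later request can Get the same key, run LINQ over the list for sorting/filtering, and decide what actually gets persisted to the real database.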
Then from there, you'll need a way of communicating to clients what has actually happened with their data. If multiple clients need to access the same data set and a single user uploads a change, how will you alert every other user of that change? Again, there are a lot of options: you could use a pub/sub framework to alert all listening clients, or you could tap into Microsoft's SignalR framework to handle this.
There are a lot of ifs, buts, and maybes to this question, and unfortunately I don't believe there is any one perfect solution to your problem without knowing exactly what you're trying to achieve.
If the data size is small and you just need it to exist temporarily, feel free to store it in memory and cut all the overhead you would have with other solutions.
Just be sure to consider that data in memory will be gone if the server or the app is shut down for whatever reason.
It is also worth considering what happens if the same user performs the operation a second time while the operation on the first data set has not completed yet. If this can happen (it usually can), make sure to use proper synchronization to prevent race conditions.
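As a rough illustration of the in-memory option, assuming a single server, the upload could be kept in MemoryCache under a per-user key and queried with LINQ (the key name, expiry, and the users/pageIndex/pageSize variables are assumptions):

// Sketch: System.Runtime.Caching.MemoryCache, one entry per user, with a sliding expiration.
var cache = MemoryCache.Default;
string key = "upload:" + User.Identity.Name; // assumed per-user key

cache.Set(key, users, new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(30) });

// Later: filter, sort and page the cached data with LINQ.
var cached = (List<User>)cache.Get(key);
var page = cached
    .Where(u => u.Info.Age > 30)
    .OrderBy(u => u.Name)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();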
Related
RE: CRUD operations... Is pulling more data than is needed a bad thing?
Let me preface this by saying I really did search for this answer, on and off, for some time now. I'm certain it's been asked and answered before, but I can't seem to find it. Most articles seem to be geared towards how to perform basic CRUD operations; I'm really wanting to get deeper into best practices. Having said that, here's a model I mocked up for example purposes.
public class Book
{
    public long Id { get; set; }
    public string Name { get; set; }
    public decimal AverageRating { get; set; }
    public decimal ArPoints { get; set; }
    public decimal BookLevel { get; set; }
    public string Isbn { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime PublishedAt { get; set; }
    public Author Author { get; set; }
    public IEnumerable<Genre> Genres { get; set; }
}
I'm using ServiceStack's OrmLite, migrating string queries to object model binding wherever possible. It's a C# MVC.NET project, using Controller/Service/Repository layers with DI. My biggest problem is with Read and Update operations. Take Reads for example. Here are two methods (only wrote what I thought was germane) for example purposes.
public class BookRepository
{
    public Book Single(long id)
    {
        return _db.SingleById<Book>(id);
    }

    public IEnumerable<Book> List()
    {
        return _db.Select<Book>();
    }
}
Regardless of how this would need to change for the real world, the problem is simply that too much information is returned. Say I were displaying a list of books to the user. Even if the List method were written so that it didn't pull the nested objects (Author & Genres), it would still return data for properties that are not used.
It seems like I could either learn to live with getting data I don't need, or write a bunch of extra methods that change which properties are pulled. Using the Single method, here are a few examples...
public Book SinglePublic(long id): Returns a few properties
public Book SingleSubscribed(long id): Returns most properties
public Book SingleAdmin(long id): Returns all properties
Having to write out methods like this for most tables doesn't seem very maintainable to me. But then, almost always fetching unused information on most calls has to affect performance, right? I have to be missing something. Any help would be GREATLY appreciated. Feel free to just share a link, give me a Pluralsight video to watch, recommend a book, whatever. I'm open to anything. Thank you.
As a general rule you should avoid premature optimization and always start with the simplest & most productive solution first, as avoiding complexity & a large code-base should be your first priority.
If you're only fetching a single row, you should definitely start by using a single API that fetches the full Book entity. I'd personally also avoid the Repository abstraction, which I view as an additional unnecessary abstraction, so I'd just use OrmLite APIs directly in your Controller or Service, e.g.:
Book book = db.SingleById<Book>(id);
You're definitely not going to notice the additional unused fields over the I/O cost of the RDBMS network call, and the latency & bandwidth between your App and your RDBMS matter far more than a little additional info on the wire. Having multiple APIs for the sake of reducing unused fields adds unnecessary complexity, increases code-base size / technical debt, and reduces the reusability, cacheability & refactorability of your code.
Times when to consider multiple DB calls for a single entity:
You've received feedback & been given a task to improve the performance of a page/service
Your entity contains large blobbed text or binary fields like images
The first speaks to avoiding premature optimization by focusing on simplicity & productivity first, and only optimizing to resolve known, observed performance issues. In that case, first profile the code; then, if it shows the issue is with the DB query, you can optimize to return only the data that's necessary for that API/page.
To improve performance I'd typically first evaluate whether caching is viable, as it's typically the least-effort / maximum-value solution. You can easily cache APIs with a [CacheResponse] attribute, which will cache the optimal API output for the specified duration, or you can take advantage of HTTP caching primitives to avoid returning any non-modified resources over the wire.
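For illustration, caching a ServiceStack service's response might look roughly like this (the GetBook request DTO and the duration are assumptions):

public class BookServices : Service
{
    // Sketch: [CacheResponse] caches the serialized output for the given duration (in seconds).
    [CacheResponse(Duration = 600)]
    public object Get(GetBook request) // GetBook is an assumed request DTO
    {
        return Db.SingleById<Book>(request.Id);
    }
}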
To avoid the second issue, rather than having different queries I would extract the large blobbed data out into a separate 1:1 row & only retrieve it when it's needed, as large row sizes hurt the overall performance of accessing that table.
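A rough sketch of what that split could look like (the BookContent name and its fields are assumptions):

// Sketch: keep heavy blobbed data in a separate 1:1 table so normal Book queries stay lean.
public class BookContent
{
    public long Id { get; set; }          // same Id as the Book row (1:1)
    public string FullDescription { get; set; }
    public byte[] CoverImage { get; set; }
}

// Only fetch the heavy row when the page actually needs it:
var content = db.SingleById<BookContent>(book.Id);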
Custom Results for Summary Data
So it's very rare that I'd have different APIs for accessing different fields of a single entity (more likely due to additional joins), but for returning multiple results of the same entity I would have a different optimized view with just the data required. This existing answer shows some ways to retrieve custom result sets with OrmLite (see also Dynamic Result Sets in the OrmLite docs).
I'll generally prefer to use a custom typed POCO with just the fields I want the RDBMS to return, e.g. a summary BookResult entity:
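For illustration, the summary POCO might look like this (the chosen fields are just an assumption):

// Sketch: OrmLite maps the selected columns onto matching property names.
public class BookResult
{
    public long Id { get; set; }
    public string Name { get; set; }
    public decimal AverageRating { get; set; }
}

The query then selects into it: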
var q = db.From<Book>()
    .Where(x => ...);

var results = db.Select<BookResult>(q);
This is all relative to the task at hand: the fewer results returned, or the fewer concurrent users accessing the page/API, the less likely you are to need multiple optimized queries, whereas for public APIs with 1000's of concurrent users of frequently accessed features I'd definitely be profiling frequently & optimizing every query. Those cases would typically be made clear by stakeholders who maintain "performance is a feature" as a primary objective & allocate time & resources accordingly.
I can't speak to OrmLite, but with Entity Framework, if you project a query into just the members you need, the ORM will only request the columns necessary to fulfill that projection. If you couple this with view models, you are in a pretty good spot. So, for example, let's say you have a grid to display the titles of your books. You only need a subset of columns from the database to do so. You could create a view model like this:
public class BookListViewItem
{
    public long Id { get; set; }
    public string Title { get; set; }
}
And then, when you need it, fill it like this:
var viewModel = dbcontext.Books
    .Where(i => i.whateverFilter)
    .Select(i => new BookListViewItem { Id = i.Id, Title = i.Name }) // project only the columns the view needs
    .ToList();
Projecting into the view model's properties like this limits the generated SQL to requesting only the Id and Name columns.
In Entity Framework, this is called 'projection'. See:
https://social.technet.microsoft.com/wiki/contents/articles/53881.entity-framework-core-3-projections.aspx
I'm having trouble finding a standard for how such an update should look. I have this model (simplified). Bear in mind that a Team is allowed without any players, and a Team can have up to 500 players:
public class Team
{
    public int TeamId { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
    public List<Player> Players { get; set; }
}

public class Player
{
    public int PlayerId { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}
and these endpoints:
Partial Team Update (without players): [PATCH] /api/teams/{teamId}. Offers me options to update particular fields of the team, but no players.
Update Team (with players): [PUT] /api/teams/{teamId}. In payload data I pass json with entire Team object, including collection of players.
Update Player alone: [PUT] /api/teams/{teamId}/players/{playerId}
I started wondering if I need endpoint #2 at all. The only advantage of endpoint #2 is that I can update many players in one request; I can also delete or add many players at once. So I started looking for a standard for how such a popular scenario is handled in the real world.
I have two options:
Keep endpoint #2 to be able to update/add/remove many child records at the same time.
Remove endpoint #2. Allow the Team to be changed only via PATCH, without the ability to manipulate the Player collection. The Player collection can then be changed only via these endpoints:
[POST] /api/teams/{teamId}/players
[PUT] /api/teams/{teamId}/players/{playerId}
[DELETE] /api/teams/{teamId}/players/{playerId}
Which option is a better practice? Is there a standard how to handle Entity with Collection situation?
Thanks.
This one here https://softwareengineering.stackexchange.com/questions/232130/what-is-the-best-pattern-for-adding-an-existing-item-to-a-collection-in-rest-api could really help you.
In essence it says that POST is the real append verb. If you are not really updating the player resource as a whole, then you are appending just another player to the list.
The main argument, with which I agree, is that the PUT verb requires the entire representation of what you are updating.
PATCH, on the other hand, I would use to update a bunch of resources at the same time.
There is no really wrong or right way to do it. It depends on how you view the domain at the end of the day.
You can have bulk operations and I would certainly use POST with that. There are some things to consider though.
How to handle partial success. Would one fail the others? If not, what is your response?
How will you send back the new resources url? The new resources should be easily discoverable.
Apart from some design considerations, if you are talking about multiple inserts, you'd better do it in bulk; if it's a couple at a time, save yourself and the people who will consume it some time and go with one by one. A rough sketch of a bulk endpoint follows below.
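Purely as an illustration (the route, the per-item result shape, and the ITeamService abstraction are assumptions, not a standard), a bulk append endpoint might look like this:

// Sketch: bulk-append players to a team; each item reports its own success/failure,
// so a partial failure doesn't have to fail the whole batch.
public class TeamsController : ApiController
{
    private readonly ITeamService _teamService; // assumed service abstraction

    public TeamsController(ITeamService teamService)
    {
        _teamService = teamService;
    }

    [HttpPost]
    [Route("api/teams/{teamId}/players/bulk")] // assumed route
    public IHttpActionResult AddPlayers(int teamId, [FromBody] List<Player> players)
    {
        var results = new List<object>();

        foreach (var player in players)
        {
            try
            {
                var created = _teamService.AddPlayer(teamId, player); // assumed method
                results.Add(new
                {
                    success = true,
                    // make the new resource easily discoverable
                    location = $"/api/teams/{teamId}/players/{created.PlayerId}"
                });
            }
            catch (Exception ex)
            {
                results.Add(new { success = false, error = ex.Message });
            }
        }

        return Ok(results);
    }
}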
So, I've got an aggregate (Project) that has a collection of entities (ProjectVariables) in it. The variables do not have Ids on them because they have no identity outside of the Project aggregate root.
public class Project
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<ProjectVariable> ProjectVariables { get; set; }
}

public class ProjectVariable
{
    public string Key { get; set; }
    public string Value { get; set; }
    public List<string> Scopes { get; set; }
}
The user interface for the project is an Angular web app. A user visits the details for the project and can add/remove/edit the project variables. He can change the name. No changes persist to the database until the user clicks Save and the web app posts some JSON to the backend, which in turn passes it down to the domain.
In accordance with DDD, it's proper practice to have small, succinct methods on the aggregate roots that make atomic changes to them. An example in this domain could be a method Project.AddProjectVariable(projectVariable).
In order to keep this practice, the front-end app would need to track changes and submit them something like this:
public class SaveProjectCommand
{
    public string NewName { get; set; }
    public List<ProjectVariable> AddedProjectVariables { get; set; }
    public List<ProjectVariable> RemovedProjectVariables { get; set; }
    public List<ProjectVariable> EditedProjectVariables { get; set; }
}
I suppose it's also possible to post the now edited Project, retrieve the original Project from the repo, and diff them, but that seems a little ridiculous.
This object would get translated into Service Layer methods, which would call methods on the Aggregate root to accomplish the intended behaviors.
So, here's where my questions come...
ProjectVariables have no Id. They are transient objects. If I need to remove them, as passed in from the UI's change tracking, how do I identify the ones that need to be removed on the aggregate? Again, they have no identity. I could add surrogate Ids to the ProjectVariables entity, but that seems wrong and dirty.
Does change tracking in my UI seem like it's making the UI do too much?
Are there alternative mechanisms? One thought was to just replace all of the ProjectVariables in the Project aggregate root every time it's saved. Wouldn't that have me adding a Project.ClearVariables() and then using Project.AddProjectVariable() to replace them? Project.ReplaceProjectVariables(List) seems very "CRUDish".
Am I missing a key component? It seems to me that DDD atomic methods don't mesh well with a pattern where you can make a number of different changes to an entity before committing it.
In accordance with DDD, it's proper practice to have small, succinct methods on the aggregate roots that make atomic changes to them.
I wouldn't phrase it that way. The methods should, as much as possible, reflect cohesive operations that have a domain meaning and correspond with a verb or noun in the ubiquitous language. But the state transitions that happen as a consequence are not necessarily small; they can change vast swaths of Aggregate data.
I agree that it is not always feasible though. Sometimes, you'll just want to change some entities field by field. If it happens too much, maybe it's time to consider changing from a rich domain model approach to a CRUD one.
ProjectVariables have no Id. They are transient objects.
So they are probably Value Objects instead of Entities.
You usually don't modify Value Objects but replace them (especially if they're immutable). Project.ReplaceProjectVariables(List) or some equivalent is probably your best option here. I don't see it as being too CRUDish. Pure CRUD here would mean that you only have a setter on the Variables property and aren't even allowed to create a method and name it as you want.
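As a rough, non-prescriptive sketch, the replace operation on the aggregate might look like this (the unique-key check is just an assumed example of an invariant the aggregate could enforce):

// Sketch: variables are treated as value objects, so an edit replaces the whole collection.
public class Project
{
    public Guid Id { get; private set; }
    public string Name { get; private set; }

    private readonly List<ProjectVariable> _projectVariables = new List<ProjectVariable>();
    public IReadOnlyList<ProjectVariable> ProjectVariables => _projectVariables;

    public void ReplaceProjectVariables(IEnumerable<ProjectVariable> variables)
    {
        var newVariables = variables.ToList();

        // Enforce any invariants before the swap (assumed rule: keys must be unique).
        if (newVariables.Select(v => v.Key).Distinct().Count() != newVariables.Count)
            throw new InvalidOperationException("Duplicate variable keys are not allowed.");

        _projectVariables.Clear();
        _projectVariables.AddRange(newVariables);
    }
}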
My title is probably terrible because I'm having trouble wording what I am trying to do.
I have an object that can potentially contain a huge number of records that looks something like this:
public class AssignmentGenerator : BaseGenerator
{
    public bool IsLibrary { get; set; } = false;
    public List<LineItem> LineItems { get; set; } = new List<LineItem>();
}

public class LineItem
{
    public string Name { get; set; }
    public string Value { get; set; }
}
I have a form that allows editing of the values of the object, but it is possible for the list of line items to become very large (one example I have is ~3000 items). This being the case, I would like to make the line item list a paged list in my view, allowing editing of, say, 10 to 50 items at a time.
I've read a lot of tutorials and posts about how to do paging, but none of the ones I've found go into how to edit a large set of data. I don't want to save the changes on each page to the database until the user actually clicks the Save button. Is there a way to store the values in the object, retrieve them as needed, and then save upon user action?
The short answer is yes, there's a way - you're the programmer, you can do what you want. It's hard to give real code examples without more details, so below is just vague guidance.
You have to store their changes somewhere, but you can choose to save them in a staging database, or keep your AssignmentGenerator in memory on the server and just update the collection when they page (assuming one server or pinned sessions).
You will have to post the current state of the objects as the user changes pages (instead of just a GET endpoint). You don't have to save to the real database; you just update your temporary copy. The Save button should trigger a different controller action which moves your temporary copy to the real data store.
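As a sketch of the in-memory variant (the controller, action names, session key, and repository are all assumptions), each page post could merge the edited slice back into a working copy held in Session, and Save could persist it:

// Sketch: classic ASP.NET MVC controller keeping a working copy in Session between page posts.
public class AssignmentController : Controller
{
    private const string WorkingCopyKey = "AssignmentGenerator.WorkingCopy"; // assumed key
    private readonly IAssignmentRepository _repository; // assumed repository abstraction

    public AssignmentController(IAssignmentRepository repository)
    {
        _repository = repository;
    }

    [HttpPost]
    public ActionResult SavePage(int pageIndex, int pageSize, List<LineItem> editedItems)
    {
        var generator = (AssignmentGenerator)Session[WorkingCopyKey];

        // Overwrite just the slice the user was editing; nothing touches the database yet.
        for (int i = 0; i < editedItems.Count; i++)
        {
            generator.LineItems[pageIndex * pageSize + i] = editedItems[i];
        }

        Session[WorkingCopyKey] = generator;
        return RedirectToAction("Edit", new { page = pageIndex + 1 });
    }

    [HttpPost]
    public ActionResult Save()
    {
        var generator = (AssignmentGenerator)Session[WorkingCopyKey];
        _repository.Save(generator); // only now does the real data store get updated
        Session.Remove(WorkingCopyKey);
        return RedirectToAction("Index");
    }
}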
I have been using ServiceStack.Redis for a couple of days and the last puzzle in my app is searching the cache.
I have a simple object
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public int Age { get; set; }
    public string Profession { get; set; }
}
e.g. I want to return all persons whose Name is Joe and who are older than 10 years.
What is better speed-wise?
To run a query against the database which returns a list of ids, and then get the matched records via Redis's GetByIds function,
or,
since RedisClient doesn't have native LINQ support (it doesn't have AsEnumerable, only IList), to run GetAll() and then perform further filtering?
Does anyone have experience with this ?
I've been struggling with the same problem. My option was to save a "light" set of data that represents the attributes I need to identify a whole record, or the attributes I need to filter the whole bunch of data, then go to the database for the rest if necessary.
I just started to use Redis and I know this is probably not the best option, but it's better than going to the database each time, even just for filtering information.
Hope to hear whether you found a better solution :)
I think Redis is not a good candidate for such queries; it doesn't have secondary indexes, so you might end up building your own in order to meet speed requirements, which is not a good idea at all. So I would go with a SQL DB, which can help with such queries, or even more complex ones, on the Person type.
Then you can use the Redis cache only to store the query results, so you can easily move through them for things like paging or sorting.
At least this is how we do it in our apps.
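For illustration, caching a query's results with ServiceStack.Redis might look roughly like this (the key convention and expiry are assumptions; OrmLite is used for the SQL side just as an example):

// Sketch: SQL does the filtering; Redis only caches the resulting list for paging/sorting.
var cacheKey = "persons:name=Joe:minAge=10"; // assumed key convention

var persons = redis.Get<List<Person>>(cacheKey);
if (persons == null)
{
    // Query the SQL database (OrmLite here, but any ORM/SQL would do).
    persons = db.Select<Person>(p => p.Name == "Joe" && p.Age > 10);

    redis.Set(cacheKey, persons, TimeSpan.FromMinutes(10));
}

// Page/sort the cached list in memory.
var page = persons.OrderBy(p => p.Surname).Skip(0).Take(20).ToList();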