I'm starting to build an application with ASP.NET MVC, with Angular on the front-end and SQL Server for the database. In some cases I have a complex query that I have to use and cannot modify because of a business restriction. I am using a structure similar to this one: using simple queries in ASP.NET MVC, but I don't know the correct way to handle a lot of data and show it on the front-end.
I have a ViewModel with the data structure of the query results, a DomainModel where the query is located, and the Controller to communicate with the front-end.
My problem is that I don't know what the right way to build this would be. Right now I'm creating as many objects in a list as there are rows in my query, but when this method runs my computer locks up with no error shown (I can guess it is because it is using all the memory).
Note that the table on the front-end only has to show 25 results per page, so maybe I could execute the query each time the user chooses a different page of the table, getting a different batch of results. I haven't tried this option yet.
This is part of the DomainModel:
public IEnumerable<OperationView> GetOperations()
{
    List<OperationView> operationsList = new List<OperationView>();

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand("", connection))
    {
        command.CommandText = /*Query joining 8-10 tables*/;
        connection.Open();

        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                var operationView = new OperationView();
                operationView.IdOperacion = reader["ID_OPERACION"].ToString();
                //Loading here some other variables of OperationView

                operationsList.Add(operationView);
            }
        }
    }

    return operationsList;
}
This is part of the Controller:
public IEnumerable<OperationView> GetOperaciones()
{
    var operation = new OperationPDomainModel();
    return operation.GetOperations();
}
I think my front-end and ViewModel are not important for this problem, but I can include them if needed.
Currently, if I try to execute this, the computer shuts down unexpectedly...
Since your system is running out of memory, you need pagination.
This paging should be done on the database side; the UI just needs to pass the page index and the number of records displayed per page.
So your query should be something like the one below. Note that ROW_NUMBER() cannot be referenced in the WHERE clause of the same SELECT, so it is wrapped in a derived table:
SELECT a, b, c
FROM (
    SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY a) AS rnum
    FROM foo
) AS paged
WHERE rnum BETWEEN (25 * @Page_Index) + 1 AND (25 * @Page_Index) + 25
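On the C# side, the domain model can then pass the paging values as parameters. This is only a rough sketch that keeps the existing, unmodifiable query as an inner derived table and assumes a hypothetical GetOperations(int pageIndex, int pageSize) signature and an ID_OPERACION ordering column:
public IEnumerable<OperationView> GetOperations(int pageIndex, int pageSize = 25)
{
    var operations = new List<OperationView>();

    // The original query (which cannot be changed) goes inside the derived table "q".
    const string sql = @"
        SELECT *
        FROM (
            SELECT q.*, ROW_NUMBER() OVER (ORDER BY q.ID_OPERACION) AS rnum
            FROM ( /*Query joining 8-10 tables*/ ) AS q
        ) AS paged
        WHERE paged.rnum BETWEEN (@PageSize * @PageIndex) + 1
                             AND (@PageSize * @PageIndex) + @PageSize;";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@PageIndex", pageIndex);
        command.Parameters.AddWithValue("@PageSize", pageSize);

        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                operations.Add(new OperationView
                {
                    IdOperacion = reader["ID_OPERACION"].ToString()
                    // map the remaining columns here
                });
            }
        }
    }

    return operations;
}
Only one page of rows is ever materialized in memory at a time, which addresses the out-of-memory symptom.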
There are a few improvements you could make.
Make the call async
The operation hangs because it blocks the main thread. If possible, make this operation async. Use task-based programming to run the operation on a different thread. That should make things a little better, but it won't improve performance significantly.
Use pagination
Get only the number of records that you need to display on the page. This should be the biggest improvement based on the code you have. It would also help to add more filters if possible. But if you only need 25 records, fetching only 25 is the way to go.
It would also help if you could use modern data-access techniques like EF and LINQ instead of traditional ADO.NET.
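For instance, with Entity Framework the same paging becomes a Skip/Take expression. A minimal sketch, assuming a hypothetical OperationsContext whose Operations set maps to the query results:
public IList<OperationView> GetOperationsPage(int pageIndex, int pageSize = 25)
{
    using (var db = new OperationsContext())
    {
        return db.Operations
                 .OrderBy(o => o.IdOperacion)   // a stable ordering is required before Skip/Take
                 .Skip(pageIndex * pageSize)    // skip the rows of the previous pages
                 .Take(pageSize)                // fetch only the current page
                 .ToList();                     // the paging is translated into SQL on the server
    }
}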
Use Ajax
Such large processing should be done using AJAX calls. If you do not want the user to wait for the data to be loaded, you can load the page and make the data retrieval a part of a separate AJAX call.
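The controller action the AJAX call hits can then stay very small. A sketch that reuses the hypothetical paged GetOperations method from above:
public JsonResult GetOperationsPage(int pageIndex)
{
    var domainModel = new OperationPDomainModel();
    var page = domainModel.GetOperations(pageIndex);

    // AllowGet because the Angular front-end requests the page data via HTTP GET.
    return Json(page, JsonRequestBehavior.AllowGet);
}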
Check this article: View Millions Of Records
https://www.c-sharpcorner.com/article/how-to-scroll-and-view-millions-of-records/
Related
This question has probably been asked correctly before, and I'll gladly accept an answer pointing me to the right spot. The problem is I don't know how to ask the question correctly to get anything returned in a search.
I'm trying to pull data from a 3rd party API (ADP) and store it in my database using ASP.NET Core.
I want to take the users returned from the API and store them in my database, where I have an ADP ancillary table seeded with the majority of the data from the API.
I would then like to update or add any missing or altered records in my database FROM the API.
I'm thinking about using an AJAX call to the API to retrieve the records, then either storing the data in another table and using SQL to look for records that differ between the two tables and making any necessary changes (this would be manually activated via a button), or some kind of scheduled background task that performs this through methods in my C# code instead of AJAX.
The question I have is:
Is it a better fit to do this as a stored procedure in SQL, or to have a method in my web app perform the data transformation?
I'm looking for any examples of iterating through the returned data and updating/creating records in my database.
I've only seen vague examples that aren't quite what I'm looking for, and nothing definitive on the best way to accomplish this. If I can find any reference material or examples, I'll gladly research, but I don't even know where to start or the correct terms to search for. I've looked into model binding, AJAX calls, JSON serialization & deserialization. I'm probably overthinking this.
Any suggestions or tech I should look at would be appreciated. Thanks for your time in advance.
My app is written in ASP.NET Core 2.2 using EF Core.
* EDIT *
For anyone looking - https://learn.microsoft.com/en-us/dotnet/csharp/tutorials/console-webapiclient
This with John Wu's Answer helped me achieve what I was looking for.
If this were my project this is how I would break down the tasks, in this order.
First, start an empty console application.
Next, write a method that gets the list of users from the API. You didn't tell us anything at all about the API, so here is a dummy example that uses an HTTP client.
public async Task<List<User>> GetUsers()
{
    var client = new HttpClient();
    var response = await client.GetAsync("https://SomeApi.com/Users");
    var users = await ParseResponse(response);
    return users.ToList();
}
Test the above (e.g. write a little shoestring code to run it and dump the results, or something) to ensure that it works independently. You want to make sure it is solid before moving on.
Next, create a temporary table (or tables) that matches the schema of the data objects that are returned from the API. For now you will just want to store it exactly the way you retrieve it.
Next, write some code to insert records into the table(s). Again, test this independently, and review the data in the table to make sure it all worked correctly. It might look a little like this:
public async Task InsertUser(User user)
{
    using (var conn = new SqlConnection(Configuration.ConnectionString))
    using (var cmd = new SqlCommand())
    {
        cmd.Connection = conn;
        //etc.
        await conn.OpenAsync();
        await cmd.ExecuteNonQueryAsync();
    }
}
Once you know how to pull the data and store it, you can finish the code to extract the data from the API and insert it. It might look a little like this:
public async Task DoTheMigration()
{
    var users = await GetUsers();
    var tasks = users.Select(u => InsertUser(u));
    await Task.WhenAll(tasks.ToArray());
}
As a final step, write a series of stored procedures or a DTS package to move the data from the temp tables to their final resting place. If you are using MS Access, you can write a series of queries and execute them in order with some VBA. At a high level it would (see the sketch after this list):
Check for any records that exist in the temp table but not in the final table and insert them into the final table.
Check for any records that exist in the final table but not the temp table and remove them or mark them as deleted.
Check for any records in common that have different column values and update the final table.
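On SQL Server, all three of those checks can be expressed in a single MERGE statement and fired from the console app. This is only a sketch; the table and column names (dbo.UsersStaging, dbo.Users, Id, Name, Email, IsDeleted) are placeholders for your real schema:
public async Task MergeStagingIntoFinal()
{
    const string mergeSql = @"
        MERGE dbo.Users AS target
        USING dbo.UsersStaging AS source
            ON target.Id = source.Id
        WHEN NOT MATCHED BY TARGET THEN              -- step 1: in staging, missing from final
            INSERT (Id, Name, Email) VALUES (source.Id, source.Name, source.Email)
        WHEN NOT MATCHED BY SOURCE THEN              -- step 2: in final, missing from staging
            UPDATE SET target.IsDeleted = 1
        WHEN MATCHED AND (target.Name <> source.Name OR target.Email <> source.Email) THEN
            UPDATE SET target.Name = source.Name,    -- step 3: changed column values
                       target.Email = source.Email;";

    using (var conn = new SqlConnection(Configuration.ConnectionString))
    using (var cmd = new SqlCommand(mergeSql, conn))
    {
        await conn.OpenAsync();
        await cmd.ExecuteNonQueryAsync();
    }
}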
Each of these development activities raises its own set of questions, of course, which you can post back to StackOverflow with details. As it is, your question doesn't have enough specificity for a more in-depth answer.
I have an application written in ASP.NET MVC 4. I have a requirement to return large results from a table accessed with an OleDbDataReader.
I am using AJAX to return a JsonResult object that contains a List: List<TableRow>
What I do not understand is, if I am in the DataReader loop
using (OleDbDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        names.Add(Convert.ToString(reader[0]));
    }
}
Is there a way to periodically send the list object, and then create a new object to pickup and continue on?
Technically the server can only return a single response to every request, so there is no way to do what you want (short of setting up some sort of crazy socket stuff).
I'd flip what you're doing on its head and have your javascript request chunks of the dataset in batches of 1000 (or whatever size), and have it start rendering while requesting the next chunk.
Better yet, you could implement some form of infinite scrolling in your UI so that the next chunk is only requested just in time for it to be displayed, that way you're not sending unneeded data to the client.
I think you have a few options that are rather common to implement. If you have 10,000 records that you need to give back to the client, you can manage this in your MVC application. If you are using Entity Framework and LINQ, you can write your business logic to send back just 100 rows each time the user clicks the next button. This keeps the transmission to the client small and even keeps the call from the web server to the SQL server small.
If you don't want the user to click a next button (i.e. paging) but want to do an infinite-scroll style instead, then do the same thing: as the user keeps scrolling, keep calling the Ajax method to send back 100 rows at a time.
The web server and database server aren't going to choke on 10,000 records; the choke point is going down to the client. Even if you open a socket with SignalR, ask yourself: do I really need to push 10,000 rows to the client all at once?
Think about the Twitter app on a mobile phone: it sends the data to you as you scroll rather than all at once. Does that help any?
Updated based on your comment that it's straight SQL
Here is an example of doing a simple version of paging in SQL:
DECLARE @intStartRow int;
DECLARE @intEndRow int;

SET @intStartRow = (@intPage - 1) * @intPageSize + 1;
SET @intEndRow = @intPage * @intPageSize;

WITH blogs AS
(
    SELECT strBlogName,
           ROW_NUMBER() OVER(ORDER BY intID DESC) AS intRow,
           COUNT(intID) OVER() AS intTotalHits
    FROM tblBlog
)
SELECT strBlogName, intTotalHits
FROM blogs
WHERE intRow BETWEEN @intStartRow AND @intEndRow
Source: http://joelabrahamsson.com/my-favorite-way-to-do-paging-with-t-sql/
I'm creating a Windows application in which I need to get data from one table using ADO.NET (or any other way using C#, if there is one). The database table apparently has around 100,000 records and it takes forever to download.
Is there any faster way I could get the data?
I tried the DataReader but still isn't fast enough.
The data-reader API is about the most direct approach you can take. The important thing is: where does the time go?
is it bandwidth in transferring the data?
or is it in the fundamental query?
You can find out by running the query locally on the machine, and see how long it takes. If bandwidth is your limit, then all you can really try is removing columns you don't actually need (don't do select *). Or pay for a fatter pipe between you and the server. In some cases, querying the data locally, and returning it in some compressed form might help - but then you're really talking about something like a web-service, which has other bandwidth considerations.
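One quick way to see where the time goes is to time the query separately from the row transfer, for example with a Stopwatch (System.Diagnostics). This is only a sketch; connectionString and sql are placeholders:
var queryTimer = Stopwatch.StartNew();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // ExecuteReader returns when the first rows are available: a rough proxy for query cost.
        queryTimer.Stop();

        var transferTimer = Stopwatch.StartNew();
        int rows = 0;
        while (reader.Read())
        {
            rows++;   // draining the reader is dominated by data volume / bandwidth
        }
        transferTimer.Stop();

        Console.WriteLine("Query: {0} ms, transfer of {1} rows: {2} ms",
            queryTimer.ElapsedMilliseconds, rows, transferTimer.ElapsedMilliseconds);
    }
}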
More likely, though, the problem is the query itself. Often the fix involves things like:
writing sensible TSQL
adding an appropriate index
avoiding cursors, complex processing, etc.
You might want to implement a need-to-know-basis approach: only pull down the first chunk of data that is needed, and then when the next set is needed, pull those rows.
It's probably your query that is slow, not the streaming process. You should show us your SQL query; then we could help you improve it.
Assuming you want to get all 100000 records from you table, you could use a SqlDataAdapter to fill a DataTable or a SqlDataReader to fill a List<YourCustomClass>:
The DataTable approach (since I don't know your fields, it's difficult to show a class):
var table = new DataTable();
const string sql = "SELECT * FROM dbo.YourTable ORDER BY SomeColumn";

using (var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
using (var da = new SqlDataAdapter(sql, con))
{
    da.Fill(table);
}
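And a sketch of the SqlDataReader-into-a-list approach; YourCustomClass and the column names are placeholders, since the real fields aren't known:
var items = new List<YourCustomClass>();
const string sql = "SELECT Id, Name FROM dbo.YourTable ORDER BY SomeColumn";

using (var con = new SqlConnection(Properties.Settings.Default.ConnectionString))
using (var cmd = new SqlCommand(sql, con))
{
    con.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            items.Add(new YourCustomClass
            {
                Id = reader.GetInt32(0),     // adjust the ordinals/types to your schema
                Name = reader.GetString(1)
            });
        }
    }
}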
I have to implement an algorithm on data which is (for good reasons) stored inside SQL server. The algorithm does not fit SQL very well, so I would like to implement it as a CLR function or procedure. Here's what I want to do:
Execute several queries (usually 20-50, but up to 100-200) which all have the form select a,b,... from some_table order by xyz. There's an index which fits that query, so the result should be available more or less without any calculation.
Consume the results step by step. The exact stepping depends on the results, so it's not exactly predictable.
Aggregate some result by stepping over the results. I will only consume the first parts of the results, but cannot predict how much I will need. The stop criteria depends on some threshold inside the algorithm.
My idea was to open several SqlDataReader, but I have two problems with that solution:
You can have only one SqlDataReader per connection and inside a CLR method I have only one connection - as far as I understand.
I don't know how to tell SqlDataReader to read data in chunks. I could not find documentation on how SqlDataReader is supposed to behave. As far as I understand, it prepares the whole result set and would load the whole result into memory, even if I consume only a small part of it.
Any hint how to solve that as a CLR method? Or is there a more low level interface to SQL server which is more suitable for my problem?
Update: I should have made two points more explicit:
I'm talking about big data sets, so a query might return 1 million records, but my algorithm would consume only the first 100-200. As I said before: I don't know the exact number beforehand.
I'm aware that SQL might not be the best choice for that kind of algorithm. But due to other constraints it has to be a SQL server. So I'm looking for the best possible solution.
SqlDataReader does not read the whole dataset, you are confusing it with the Dataset class. It reads row by row, as the .Read() method is being called. If a client does not consume the resultset the server will suspend the query execution because it has no room to write the output into (the selected rows). Execution will resume as the client consumes more rows (SqlDataReader.Read is being called). There is even a special command behavior flag SequentialAccess that instructs the ADO.Net not to pre-load in memory the entire row, useful for accessing large BLOB columns in a streaming fashion (see Download and Upload images from SQL Server via ASP.Net MVC for a practical example).
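To illustrate that flag, here is only a sketch (dbo.Documents and its Content column are hypothetical, and conn, documentId and outputStream are assumed to exist in the surrounding code) of streaming a large value in chunks with SequentialAccess instead of buffering the whole row:
using (var cmd = new SqlCommand("SELECT Content FROM dbo.Documents WHERE Id = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", documentId);

    // With SequentialAccess, columns must be read in order and large values can be streamed.
    using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (reader.Read())
        {
            var buffer = new byte[8192];
            long offset = 0;
            long bytesRead;
            while ((bytesRead = reader.GetBytes(0, offset, buffer, 0, buffer.Length)) > 0)
            {
                outputStream.Write(buffer, 0, (int)bytesRead);  // pass the chunk onwards
                offset += bytesRead;
            }
        }
    }
}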
You can have multiple active result sets (multiple SqlDataReaders) on a single connection when MARS is enabled. However, MARS is incompatible with SQLCLR context connections.
So you can create a CLR streaming TVF to do some of what you need in the CLR, but only if you have one single SQL query source. Multiple queries would require you to abandon the context connection and use a fully fledged connection instead, i.e. connect back to the same instance in a loopback, and this would allow MARS and thus let you consume multiple result sets. But loopback has its own issues, as it breaks the transaction boundaries you get from the context connection. Specifically, with a loopback connection your TVF won't be able to read the changes made by the same transaction that called the TVF, because it is a different transaction on a different connection.
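Outside of SQLCLR, MARS itself is just a connection-string switch. A minimal sketch of two readers active on one connection (the table names are placeholders):
var connStr = "Data Source=.;Initial Catalog=MyDb;Integrated Security=True;" +
              "MultipleActiveResultSets=True";

using (var conn = new SqlConnection(connStr))
{
    conn.Open();

    using (var cmd1 = new SqlCommand("SELECT a, b FROM dbo.Table1 ORDER BY xyz", conn))
    using (var cmd2 = new SqlCommand("SELECT a, b FROM dbo.Table2 ORDER BY xyz", conn))
    using (var reader1 = cmd1.ExecuteReader())
    using (var reader2 = cmd2.ExecuteReader())   // would throw without MARS enabled
    {
        // Step through both result sets at whatever pace the algorithm needs.
        bool more1 = reader1.Read();
        bool more2 = reader2.Read();
        // ... interleave further reads as required
    }
}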
SQL is designed to work against huge data sets, and is extremely powerful. With set based logic it's often unnecessary to iterate over the data to perform operations, and there are a number of built-in ways to do this within SQL itself.
1) write set-based logic to update the data without cursors
2) use deterministic user-defined functions with set-based logic (you can do this with the SqlFunction attribute in CLR code). Non-deterministic functions have the effect of turning the query into a cursor internally; non-deterministic means the output value is not always the same given the same input.
[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static int algorithm(int value1, int value2)
{
    int value3 = ... ;
    return value3;
}
3) use cursors as a last resort. This is a powerful way to execute logic per row on the database but has a performance impact. It appears from this article that CLR can outperform SQL cursors (thanks Martin).
I saw your comment that the complexity of using set based logic was too much. Can you provide an example? There are many SQL ways to solve complex problems - CTE, Views, partitioning etc.
Of course you may well be right in your approach, and I don't know what you are trying to do, but my gut says leverage the tools of SQL. Spawning multiple readers isn't the right way to approach the database implementation. It may well be that you need multiple threads calling into a SP to run concurrent processing, but don't do this inside the CLR.
To answer your question: with CLR implementations (and IDataReader) you don't really need to page results in chunks, because you are not loading the data into memory or transporting it over the network. IDataReader gives you access to the data stream row by row. By the sounds of it, your algorithm determines the number of records that need updating, so when that point is reached simply stop calling Read() and end there.
SqlMetaData[] columns = new SqlMetaData[3];
columns[0] = new SqlMetaData("Value1", SqlDbType.Int);
columns[1] = new SqlMetaData("Value2", SqlDbType.Int);
columns[2] = new SqlMetaData("Value3", SqlDbType.Int);

SqlDataRecord record = new SqlDataRecord(columns);
SqlContext.Pipe.SendResultsStart(record);

SqlDataReader reader = comm.ExecuteReader();
bool flag = true;
while (reader.Read() && flag)
{
    int value1 = Convert.ToInt32(reader[0]);
    int value2 = Convert.ToInt32(reader[1]);

    // some algorithm
    int newValue = ...;

    record.SetInt32(0, value1);
    record.SetInt32(1, value2);
    record.SetInt32(2, newValue);
    SqlContext.Pipe.SendResultsRow(record);

    // keep going?
    flag = newValue < 100;
}
SqlContext.Pipe.SendResultsEnd();
Cursors are a SQL-only feature. If you wanted to read chunks of data at a time, some sort of paging would be required so that only a certain number of records is returned. If using LINQ,
.Skip(skip)
.Take(pageSize)
Skip and Take can be used to limit the results returned.
You can simply iterate over the DataReader by doing something like this:
using (IDataReader reader = Command.ExecuteReader())
{
    while (reader.Read())
    {
        //Do something with this record
    }
}
This iterates over the results one at a time, similar to a cursor in SQL Server.
For multiple recordsets at once, try MARS
(if SQL Server)
http://msdn.microsoft.com/en-us/library/ms131686.aspx
I am building an application and I want to batch multiple queries into a single round-trip to the database. For example, lets say a single page needs to display a list of users, a list of groups and a list of permissions.
So I have stored procs (or just simple sql commands like "select * from Users"), and I want to execute three of them. However, to populate this one page I have to make 3 round trips.
Now I could write a single stored proc ("getUsersTeamsAndPermissions") or execute a single SQL command "select * from Users;exec getTeams;select * from Permissions".
But I was wondering if there was a better way to do 3 operations in a single round trip. Benefits include being easier to unit test and allowing the database engine to parallelize the queries.
I'm using C# 3.5 and SQL Server 2008.
Something like this; here's a cleaned-up version that properly disposes its objects:
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.CommandText = "select id from test1; select id from test2";

    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetInt32(0));
            }
            Console.WriteLine("--next command--");
        } while (reader.NextResult());
    }
}
The single multi-part command and the stored procedure options that you mention are the two options. You can't do them in such a way that they are "parallelized" on the db. However, both of those options do result in a single round trip, so you're good there. There's no way to send them more efficiently. In SQL Server 2005 onwards, a multi-part command that is fully parameterized is very efficient.
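For example, a fully parameterized multi-part command can look like this (the tables and the @tenantId filter are just placeholders for your own schema):
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    command.CommandText =
        "select * from Users where TenantId = @tenantId; " +
        "select * from Teams where TenantId = @tenantId; " +
        "select * from Permissions where TenantId = @tenantId;";
    command.Parameters.AddWithValue("@tenantId", tenantId);

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // walk the three result sets with reader.NextResult(), as in the example above
    }
}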
Edit: adding information on why cram into a single call.
Although you don't want to care too much about reducing calls, there can be legitimate reasons for this.
I once was limited to a crummy ODBC driver against a mainframe, and there was a 1.2 second overhead on each call! I'm serious. There were times when I crammed a little extra into my db calls. Not pretty.
You also might find yourself in a situation where you have to configure your sql queries somewhere, and you can't just make 3 calls: it has to be one. It shouldn't be that way, bad design, but it is. You do what you gotta do!
Sometimes of course it can be very good to encapsulate multiple steps in a stored procedure. Usually not for saving round trips though, but for tighter transactions, getting ID for new records, constraining for permissions, providing encapsulation, blah blah blah.
Making one round trip vs. three will indeed be more efficient. The question is whether it is worth the trouble. The entire ADO.NET and C# 3.5 toolset and framework opposes what you are trying to do. TableAdapters, Linq2SQL, EF: all of these like to deal with simple one-call == one-result-set semantics. So you may lose some serious productivity by trying to beat the Framework into submission.
I would say that unless you have some serious measurements showing that you need to reduce the number of roundtrips, abstain. If you do end up requiring this, then use a stored procedure to at least give an API kind of semantics.
But if your query really is what you posted (i.e. select all users, all teams and all permissions) then you obviously have much bigger fish to fry before reducing the round trips... reduce the result sets first.
I think this link might be helpful.
Consider at least reusing the same opened connection; according to what it says there, opening a connection is nearly the biggest performance cost in Entity Framework.
Firstly, 3 round trips isn't really a big deal. If you were talking about 300 round trips then that would be another matter, but for just 3 round trips I would consider this to definitely be a case of premature optimisation.
That said, the way I'd do this would probably be to execute the 3 stored procedures using SQL:
exec dbo.p_myproc_1 @param_1 = @in_param_1, @param_2 = @in_param_2
exec dbo.p_myproc_2
exec dbo.p_myproc_3
You can then iterate through the returned result sets as you would if you had directly executed multiple rowsets.
Build a temp table? Insert all the results into the temp table and then select * from #temptable.
as in,
-- create #temptable with the columns you need, then:
INSERT INTO #temptable (field)  SELECT field  FROM mytable
INSERT INTO #temptable (field2) SELECT field2 FROM mytable2
etc... Only one trip to the database, though I'm not sure it is actually more efficient.