Is it acceptable to cache a huge amount of data in .NET? [closed] - c#

I'm designing an accounting application with more than 400 tables in SQL Server.
About 10% of those tables are operational tables, and the others are used for decoding and reference information.
For example, the Invoice tables (Master and Details) use about 10 tables to decode information such as buyer, item, marketer and so on.
I want to know whether it is acceptable to cache the decode tables in the ASP.NET cache and use the cached items for decoding instead of querying SQL Server (I know that any change to a cached item must be committed to SQL Server as well).
I think this would make the application much faster than a regular application.
All of the cached tables together would probably amount to about 500 MB after a few years, because they don't change frequently.
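For illustration, a minimal sketch of the kind of decode-table caching being described, using System.Runtime.Caching.MemoryCache; the cache key and the LoadBuyersFromSql loader are hypothetical placeholders, not part of the question's actual code:

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    public static class DecodeCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        // Returns the cached buyer lookup, loading it from SQL Server on first use.
        public static IReadOnlyDictionary<int, string> GetBuyers()
        {
            var buyers = Cache.Get("Buyers") as IReadOnlyDictionary<int, string>;
            if (buyers == null)
            {
                buyers = LoadBuyersFromSql();   // hypothetical SQL Server query
                Cache.Set("Buyers", buyers,
                    new CacheItemPolicy { SlidingExpiration = TimeSpan.FromHours(12) });
            }
            return buyers;
        }

        // When a buyer changes, commit to SQL Server first, then drop the cached copy.
        public static void InvalidateBuyers() => Cache.Remove("Buyers");

        private static IReadOnlyDictionary<int, string> LoadBuyersFromSql()
        {
            // Placeholder: replace with a real query against the buyer decode table.
            return new Dictionary<int, string>();
        }
    }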

If you've got the RAM then it's fine to use 500 MB.
However, unless you have a performance problem now, caching will only cause problems. Don't fix problems that you haven't encountered; design for performance and optimize only when you actually have problems, because otherwise the optimization can cause more problems than it solves.
So I would advise that it is usually better to ensure that your queries are optimized and well structured, that you have the correct indexes on the tables, and that you issue the minimum number of queries.
Although 500 MB isn't a lot of data to cache, with all due respect, SQL Server will usually do a better job of caching than you can, provided that you use it correctly.
Using a cache will always improve performance, but at the cost of higher implementation complexity.
For static data that never changes, a cache is useful; but it still needs to be loaded and shared between threads, which in itself can present challenges.
For data that rarely changes it becomes much more complex, simply because it could have changed. If a single application (process) is the only updater of the cache then it isn't as difficult, but it is still not a simple task.
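As a rough illustration of the "loaded and shared between threads" point, here is a minimal sketch using Lazy&lt;T&gt; so the load runs exactly once even under concurrent access; LoadFromDatabase is a hypothetical placeholder:

    using System;
    using System.Collections.Generic;

    public sealed class StaticLookupCache
    {
        // The factory runs exactly once; all threads share the same completed result.
        private static readonly Lazy<IReadOnlyDictionary<int, string>> Items =
            new Lazy<IReadOnlyDictionary<int, string>>(LoadFromDatabase, isThreadSafe: true);

        public static IReadOnlyDictionary<int, string> Instance => Items.Value;

        private static IReadOnlyDictionary<int, string> LoadFromDatabase()
        {
            // Placeholder: replace with the real query for the reference data.
            return new Dictionary<int, string> { { 1, "Example" } };
        }
    }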
I have spent months optimizing an offline batch processing system (where the code has complete control of the database for a period of 12 hours). Part of the optimisation is to use various caches and data reprojections. All of the caches are read-only. Memory usage is around the 10 GB mark during execution; the database is around 170 GB, with 60 million records.
Even with the caching there have been considerable changes to the underlying schema to improve efficiency. The read-only caches exist to eliminate reading during processing, to allow multi-threaded processing, and to improve insert performance.
The processing rate has gone from 6 items per second 20 months ago to around 6,000 items per second (yesterday), but there is a genuine need for this optimization, as the number of items to process has risen from 100,000 to 8 million in the same period.
If you don't have a need then don't optimize.

Related

Parallel processing in server applications [closed]

Since in a server-side application the work is done by the server, and since the server also needs to serve other requests, I would like to know whether there are any real benefits to using parallel processing in server-side applications. The way I see it, parallel processing is usually a bad idea there: by focusing the CPU power on only part of the problem, other requests cannot get served.
If there are advantages, I guess they should be considered only when specific conditions are met. So, what are some good guidelines for when to use the Parallel class in server applications?
You are balancing two concerns: Fast response for a given user, and supporting all users that wish to connect to the server in a given time period.
Before considering parallelism for faster computation for a given user, consider whether precomputation and caching allow you to meet your performance requirements. Perform hotspot analysis and see if there are opportunities to optimize existing code.
If your deployment hardware is a given, observe the CPU load during peak times. If the CPU is busy (rule of thumb 70%+ utilization), parallel computing will be detrimental to both concerns. If the CPU isn't heavily loaded, you might improve response time for a given user without affecting the number of users the server can handle at once (benchmark to be sure).
If you aren't meeting your single-user performance targets and have exhausted options to precalculate and cache (and have analyzed performance hotspots and don't see opportunities to optimize), you can always parallelize workloads that lend themselves to parallel computation if you're willing to upgrade your server(s) as needed so that during peak periods you don't over-tax the CPU.
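If you do decide to parallelize, one way to avoid starving other requests is to cap the degree of parallelism rather than letting the TPL take every core. A sketch, where ScoreItem stands in for some hypothetical CPU-bound work:

    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class Scoring
    {
        public static IList<double> ScoreAll(IEnumerable<int> itemIds)
        {
            var results = new List<double>();
            var gate = new object();
            // Leave cores free for other incoming requests.
            var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

            Parallel.ForEach(itemIds, options, id =>
            {
                double score = ScoreItem(id);       // hypothetical CPU-bound work
                lock (gate) { results.Add(score); }
            });

            return results;
        }

        private static double ScoreItem(int id) => id * 0.5;   // placeholder computation
    }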
As with most performance-related questions: it depends on a lot of factors. Things like:
do you tend to have a lot of requests hitting your server at the same time?
how much of a typical request's turnaround time is spent waiting on I/O, as opposed to actually exercising the CPU?
are you able to have multiple instances of your server code sitting behind a load balancer?
how much does the operation you're looking at actually benefit from being parallelized?
how important is it for the operation you're performing to return an answer to the user faster than it would without parallelism?
In my experience, most of the time for a typical request is spent waiting for things like database queries and REST API calls to complete, or loading files from disk. These are not CPU-intensive operations, and inasmuch as they can be made concurrent, that can usually be done by simply orchestrating async Tasks in a concurrent manner, not necessarily by using parallel threads.
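As a sketch of that idea, the following overlaps two I/O-bound calls with Task.WhenAll instead of parallel threads; the endpoint URLs are purely illustrative:

    using System.Net.Http;
    using System.Threading.Tasks;

    public class OrderPageLoader
    {
        private readonly HttpClient _http = new HttpClient();

        // Starts both I/O operations, then awaits them together; no extra threads
        // are consumed while the requests are in flight.
        public async Task<(string customer, string orders)> LoadAsync(int customerId)
        {
            Task<string> customerTask = _http.GetStringAsync($"https://example.test/customers/{customerId}");
            Task<string> ordersTask = _http.GetStringAsync($"https://example.test/customers/{customerId}/orders");

            await Task.WhenAll(customerTask, ordersTask);

            return (await customerTask, await ordersTask);
        }
    }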
Also in my experience, most attempts to use the TPL to improve performance of an algorithm end up yielding only marginal performance improvements, whereas other approaches (like using more appropriate data structures, caching, etc.) often yield improvements of orders of magnitude.
And of course, if your application isn't running too slowly for your needs in the first place, then any optimization would count as premature optimization, which you want to avoid.
But if you for some reason find yourself doing a CPU-intensive operation that responds well to parallelism, in a part of your code that absolutely must perform faster than it currently does, then parallel processing is a good choice.

C# Implement cache for WPF [closed]

I want to implement a cache for my WPF application.
My application holds 2,328,681 items, and what I want to do is cache all of these items into a file saved on the computer (or something similar), which should reduce the workload of retrieving the data from the database on the next run.
I'm going to have a function that checks the latest DBUpdateTime; if the DBUpdateTime in the cache differs from the one in SQL Server, it retrieves the newest updates.
Does someone know how I can achieve this? What kind of library do you suggest I use to achieve this caching?
I'm going to show active items, but I also want to show inactive items. Should I save all items in the cache and then filter them at runtime?
Caching a dynamic database like this is the wrong approach. You are not going to display 300,000 records in one window.
It is better to put a limit of 200 records wherever you display them, add a proper filter, and optimize your query.
Instead of 300,000 records, a reasonable view shows 200, or optionally 300, 500, 1,000 or 10,000.
For example, I have a "Connections" window, a "Contracts" window and a link window. I have about 2 million entries, and I show the last 200 through a filter.
With small amounts of data, serialisation is better than a local database.
In this case it seems you have over 2 million records, so if you stored them in a flat file you'd need to pull them all into memory to work with them.
That sounds like too much data to handle that way.
Meaning a local database is very likely your best candidate. Which one suits best depends on how you will access the data and what you'll store.
SQLite would be a candidate if these are simple records.
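A minimal sketch of what initializing such a local SQLite cache might look like, assuming the Microsoft.Data.Sqlite package; the table and column names are illustrative only:

    using Microsoft.Data.Sqlite;

    public static class LocalCache
    {
        public static void Initialize(string path = "cache.db")
        {
            using (var connection = new SqliteConnection($"Data Source={path}"))
            {
                connection.Open();
                var command = connection.CreateCommand();
                command.CommandText =
                    @"CREATE TABLE IF NOT EXISTS Items (
                          Id INTEGER PRIMARY KEY,
                          Name TEXT NOT NULL,
                          IsActive INTEGER NOT NULL,
                          UpdatedUtc TEXT NOT NULL
                      )";
                command.ExecuteNonQuery();   // creates the local cache table once
            }
        }
    }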
If you need to know immediately when any change is made to a record, then a push mechanism would be an idea.
You could use SignalR to tell clients that data has changed.
If you don't need to know immediately then you could have the client poll and ask what's changed every once in a while.
What I have done in the past is to add a RecentChanges table per logical entity. When a record is changed, a row is added with the id, a timestamp and the user. You can then read this table to find what has been changed since a specific time. Where heavy usage and database overheads have called for a more sophisticated approach, I've cached copies of recently changed records on a business server.
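A rough sketch of the polling side of that approach; the RecentChanges table and its columns are hypothetical names matching the description above:

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public class ChangePoller
    {
        private readonly string _connectionString;
        private DateTime _lastCheckedUtc = DateTime.UtcNow;

        public ChangePoller(string connectionString) => _connectionString = connectionString;

        // Returns the ids of entities changed since the previous poll.
        public IList<int> GetChangedEntityIds()
        {
            var changed = new List<int>();
            var checkedAt = DateTime.UtcNow;

            using (var connection = new SqlConnection(_connectionString))
            using (var command = new SqlCommand(
                "SELECT EntityId FROM RecentChanges WHERE ChangedUtc > @since", connection))
            {
                command.Parameters.AddWithValue("@since", _lastCheckedUtc);
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        changed.Add(reader.GetInt32(0));
                }
            }

            _lastCheckedUtc = checkedAt;
            return changed;
        }
    }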

SQLite vs. SQL Server [closed]

I created an application in C# using WinForms which handles about 2,000 rows of transaction data per day. I'm using SQL Server 2012, but I'm considering SQLite because of its popularity and because many people recommend it.
So, can you give me some ideas about which one is better for my needs?
Thanks
SQLite integrates with your .NET application better than SQL Server does.
SQLite is generally a lot faster than SQL Server.
However, SQLite only supports a single writer at a time (meaning the execution of an individual transaction). SQLite locks the entire database when it needs a lock (either read or write) and only one writer can hold a write lock at a time. Due to its speed this actually isn't a problem for low to moderate size applications, but if you have a higher volume of writes (hundreds per second) then it could become a bottleneck. There are a number of possible solutions like separating the database data into different databases and caching the writes to a queue and writing them asynchronously. However, if your application is likely to run into these usage requirements and hasn't already been written for SQLite, then it's best to use something else like SQL Server that has finer grained locking.
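One of the workarounds mentioned, caching writes to a queue, might look roughly like this sketch: all writes are funnelled through a single background consumer so only one write reaches SQLite at a time.

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public sealed class WriteQueue : IDisposable
    {
        private readonly BlockingCollection<Action> _writes = new BlockingCollection<Action>();
        private readonly Task _writer;

        public WriteQueue()
        {
            // A single consumer drains the queue, so writes never contend for the lock.
            _writer = Task.Run(() =>
            {
                foreach (var write in _writes.GetConsumingEnumerable())
                    write();
            });
        }

        // Callers enqueue a delegate that performs one SQLite write.
        public void Enqueue(Action write) => _writes.Add(write);

        public void Dispose()
        {
            _writes.CompleteAdding();
            _writer.Wait();
            _writes.Dispose();
        }
    }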
SQLite is a nice, fast database to use in standalone applications. There are dozens of GUIs around to create the schema you want, and interfaces for pretty much any language you would want (C#/C/C++/Java/Python/Perl). It's also cross-platform and is suitable for Windows, Linux, Mac, Android, iOS and many other operating systems.
Here are some advantages for SQLite:
Performance
In many cases at least 2-3 times faster than MySQL/PostgreSQL.
No socket and/or TCP/IP overhead - SQLite runs in the same process as your application.
Functionality
Sub-selects, Triggers, Transactions, Views.
Up to 281 TB of data storage.
Small memory footprint.
Self-contained: no external dependencies.
Atomic commit and rollback protect data integrity.
Easily movable database.
Security
Each user has their own completely independent database(s).

How much async/await is OK? [closed]

In our project we are using async/await for three main purposes (in all of their methods):
Data access layer: fetching from and updating the database (using Dapper).
Cache (Redis): read/write.
ASP.Net MVC 5 controllers.
The question is how much async/await is OK. Is it OK to use it even when reading or writing small amounts of data? How about the cache and the controllers?
Remarks: the project is a little special, and it may handle about 50,000 requests per second for a few hours a day.
According to an article I've read:
Async/await is great for avoiding blocking while potentially time-consuming work is performed in a .NET application, but there are overheads associated with running an async method. The cost of this is comparatively negligible when the asynchronous work takes a long time, but it's worth keeping in mind.
Based on what you asked ("even when reading or writing small amounts of data?"), it doesn't seem to be a good idea, as there are overheads involved.
Here is the article: The overhead of async/await in .NET 4.5
In the article, the author used a profiler to measure the overhead of async/await.
QUOTE:
Despite this async method being relatively simple, ANTS Performance Profiler shows that it's caused over 900 framework methods to be run in order to initialize it and the work it does the first time that it's run.
The question here is whether you're going to accept these minimal overheads, taking into consideration that they can pile up into something problematic.
The question is how much async/await is OK. Is it OK to use it even when reading or writing small amounts of data? How about the cache and the controllers?
You should use async/await for I/O-bound operations; it doesn't matter if it's a small amount of data. What matters more is to avoid blocking on potentially long-running I/O-bound operations, mainly disk and network calls. ASP.NET has a thread pool of limited size, and these operations can tie up its threads. Using asynchronous calls helps your application scale better and allows it to handle more concurrent requests.
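As a sketch of what that looks like in a Dapper-based data access layer, the repository below awaits QueryAsync so the request thread is freed while SQL Server works; the Orders table and Order class are illustrative:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Threading.Tasks;
    using Dapper;

    public class OrderRepository
    {
        private readonly string _connectionString;

        public OrderRepository(string connectionString) => _connectionString = connectionString;

        // The thread is released back to the pool while SQL Server does the work,
        // which is what lets ASP.NET serve other requests in the meantime.
        public async Task<IEnumerable<Order>> GetOrdersAsync(int customerId)
        {
            using (var connection = new SqlConnection(_connectionString))
            {
                return await connection.QueryAsync<Order>(
                    "SELECT Id, Total FROM Orders WHERE CustomerId = @customerId",
                    new { customerId });
            }
        }
    }

    public class Order
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
    }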
For more info: http://msdn.microsoft.com/en-us/magazine/dn802603.aspx

Best practice - load a lot of stuff in the application_start? [closed]

I have a webshop with a lot of products and other content. Currently I load all of the content into a global list at Application_Start, which takes approximately 15-25 seconds.
This makes the site really fast, as I can get any product or piece of content in O(1) time.
However, is this best practice?
Currently I'm on shared web hosting rather than a VPS or dedicated server, so the application gets recycled from time to time, which gives random visitors load times of up to 15-25 seconds (a number that will only grow with more content). This is of course totally unacceptable, but I guess it would be solved with a VPS.
What is the normal way of doing this? I guess a webshop like Amazon probably doesn't load all of its products into a huge list :-D
Any thoughts and ideas would be highly appreciated.
It looks like you've already answered the question for your own case: "This is of course totally unacceptable".
If your goal is O(1) access, a normal database request for a single product is effectively O(1) anyway, unless you need complicated joins between products. Consider dropping all of your pre-caching logic and checking whether you actually have a performance problem. You can limit the startup impact by caching lazily instead.
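A rough sketch of the lazy-caching alternative, using System.Runtime.Caching.MemoryCache; LoadProductFromDatabase and the expiration policy are illustrative assumptions. Each product is cached on first request instead of everything being loaded at startup:

    using System;
    using System.Runtime.Caching;

    public static class ProductCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static Product Get(int productId)
        {
            string key = "product:" + productId;
            var product = Cache.Get(key) as Product;
            if (product == null)
            {
                product = LoadProductFromDatabase(productId);   // hypothetical single-row query
                Cache.Set(key, product, new CacheItemPolicy
                {
                    AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(30)
                });
            }
            return product;
        }

        private static Product LoadProductFromDatabase(int productId) =>
            new Product { Id = productId };                      // placeholder
    }

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }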
Large sites often use distributed caching like Memcached.
A more scalable setup is to set up a web service to provide the content, which the website calls when it needs it. The web service will need to cache frequently needed content to achieve fast response times.
First of all, 15-20 seconds to load data is too much time, so I suspect one of these causes:
The time is spent on compilation, not on the data load.
The data is too large and you are filling up memory.
The method you use to read the data is very slow.
The data storage is too slow, or the data is structured as a text file and reading it is slow.
My opinion is that you should cache only small amounts of data that you need to use many times within a short period. The way you describe it is not good practice, for several reasons:
If you have many application pools, each pool reads the same data and you spend memory for no reason.
The data that you cache cannot be changed - it is read-only.
Even if you cache the data you still need to render the page, and that is where you actually gain from caching: on the final rendered output, not on the data (see the output-caching sketch at the end of this answer).
What and how to cache:
We cache the final rendered page.
We also send cache headers for the page and other elements to the client.
We read and write the data from the database as requests come in, and we let the database do the caching - it knows better.
If we do cache data, it must be a small amount that is needed inside a long loop, so that we avoid calling the database many times.
We also cache on demand, and if an entry is not used for a long time, or memory is needed, that part of the cache goes away. If some part of the data comes from complex combinations of many tables, we build a big temporary flat table that keeps all the data together, one item per row. This table is temporary, and if we need too much of this we create a second temporary database file to hold that part of the data.
How fast is a database read? It is so fast that you do not need to worry about it; you need to look for delays elsewhere, such as, as I said, the full render of a page or of parts of the page.
What you need to worry about is a good database design, a good and fast way to retrieve your data, and well-optimized code to display it.
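A sketch of the render-level caching described above, assuming ASP.NET MVC's OutputCache attribute; the controller, the duration and the cache-key parameter are illustrative:

    using System.Web.Mvc;

    public class ProductController : Controller
    {
        // The rendered HTML is cached for 60 seconds per product id and served
        // without re-running the action or hitting the database.
        [OutputCache(Duration = 60, VaryByParam = "id")]
        public ActionResult Details(int id)
        {
            var model = LoadProduct(id);    // hypothetical data access
            return View(model);
        }

        private object LoadProduct(int id) => new { Id = id };
    }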
Separation of responsibilities will help you scale for the future.
With your current setup, you are limited to the resources of your web server, and, like you said, your start up times will grow out of control as you continue adding more products.
If you share the burden of each page request with SQL Server, you open up your application to allow it to scale as needed. Over time, you may decide to add more web servers, cluster SQL Server, or switch to a new database back-end altogether. However, if all the burden is on the application pool, then you are drastically limiting yourself.
