I want to implement a cache for my WPF application.
My application holds 2,328,681 items, and what I want to do is cache all of these items into a file saved on the computer (or something similar), which should reduce the workload of retrieving the data from the database on the next run.
I'm going to have a function which checks the latest DBUpdateTime, compares whether the DBUpdateTime in the cache differs from the one in SQL, and if so retrieves the newest data.
Does someone know how I can achieve this? What kind of library do you suggest I use in order to implement the cache?
I'm going to show active items, but I also want to show inactive items. Should I save all items in the cache and then filter them at runtime?
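A rough sketch of the DBUpdateTime check described above might look like the following; the file path, table name and column name are assumptions for illustration only.

```csharp
using System;
using System.Data.SqlClient;
using System.IO;

static class CacheFreshnessCheck
{
    // Hypothetical location where the cached update time is stored.
    const string CacheTimestampFile = @"C:\MyApp\cache-updatetime.txt";

    // Returns true when the local cache is still up to date.
    public static bool CacheIsCurrent(string connectionString)
    {
        if (!File.Exists(CacheTimestampFile))
            return false;

        DateTime cachedTime = DateTime.Parse(File.ReadAllText(CacheTimestampFile));

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT MAX(DBUpdateTime) FROM dbo.Items", connection)) // assumed table/column
        {
            connection.Open();
            object result = command.ExecuteScalar();
            if (result == null || result == DBNull.Value)
                return false;

            return (DateTime)result <= cachedTime;
        }
    }
}
```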
Caching the whole dynamic database is the wrong approach. You are not going to display 300,000 records in one window anyway.
Better to put a limit of 200 records where you display them, provide a proper filter, and if you already have one, optimize your query.
Instead of 300,000 records, it is reasonable to show 200, or if needed 300, 500, 1000 or 10000.
For example, I have a "Connections" window and a "Contracts" window, plus a link window. I have about 2 million entries, and I show only the last 200 through a filter.
With small amounts of data, serialisation to a file is better than a local database.
In this case, though, it seems you need over 2 million records, so you'd need to pull them all into memory to work with them if you stored them in a flat file.
That sounds like too much data to handle that way.
Meaning a local database is very likely your best candidate. Which one suits best depends on how you will access the data and what you'll store.
SQLite would be a candidate if these are simple records.
If you need to know immediately when any change is made to a record, then a push mechanism would be an idea.
You could use SignalR to tell clients the data has changed.
If you don't need to know immediately then you could have the client poll and ask what's changed every once in a while.
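For the push option, a minimal SignalR sketch might look like this; the hub name and the client-side method name are hypothetical, and this assumes the classic ASP.NET SignalR 2 packages.

```csharp
using Microsoft.AspNet.SignalR;

// Hypothetical hub that the clients connect to.
public class DataChangeHub : Hub
{
}

public static class DataChangeNotifier
{
    // Call this on the server whenever a record is modified.
    public static void NotifyClients(int changedItemId)
    {
        var context = GlobalHost.ConnectionManager.GetHubContext<DataChangeHub>();
        // "itemChanged" is a client-side handler name chosen for this example.
        context.Clients.All.itemChanged(changedItemId);
    }
}
```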
What I have done in the past is to add a RecentChanges table per logical entity. When a record is changed, a record is added to this table with the id, timestamp and user. You can then read this table to find what's been changed since a specific time. Where heavy usage and database overheads called for a more sophisticated approach, I've cached copies of recently changed records on a business server.
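A sketch of reading such a RecentChanges table from the client could look like this; the table and column names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class ChangePoller
{
    // Returns the ids of entities changed since the last check.
    public static List<int> GetChangedIds(string connectionString, DateTime since)
    {
        var ids = new List<int>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT EntityId FROM dbo.RecentChanges WHERE ChangedAt > @since", connection))
        {
            command.Parameters.AddWithValue("@since", since);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    ids.Add(reader.GetInt32(0));
            }
        }
        return ids;
    }
}
```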
I'm currently working on point-of-sale software in which I have a table that records each and every item of a transaction. Since it's going to hold hundreds of records each day after its release, I just want to know the maximum number of records that can be held by a table, and can anyone please let me know whether it can slow down the software over time?
For practical day-to-day purposes (where you're inserting hundreds or thousands of rows per day) there is no limit to the size of the table, except if it fills up your disk.
Remember that organisations with userbases larger than yours use databases that take not hundreds of rows per day, but millions of rows per day.
Typically though, you will start to run into performance issues that need fixing. You can still get good performance, you just need to do more to watch and tweak it.
For example, you may have a typical table with, say,
An ID (autoincrement/identity) that is the Primary Key (and clustered index).
A date/time field recording when it occurred.
Some other data, e.g. user IDs, amounts, types of action, etc.
Each row you insert into the table just puts a new row at the end of that table, which databases typically have no problem doing. Even if the table is already large, adding more rows isn't much of a problem.
However, imagine you have a query/report that gets the data for the last week - for example, SELECT * FROM trn_log WHERE trn_datetime >= DATEADD(day, -7, getdate())
At first that runs fine.
After a while, it slows down. Why? Because the database doesn't know that the datetimes are sequential, and therefore it must read every row of the table and work out which of the rows are the ones you want to use.
At that point, you start to think about indexes - which is a good next step. But when you add an index, it slows down your new row inserts (by a small amount).
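As a rough illustration only, an index to support that report could be added like this, reusing the trn_log / trn_datetime names from the example query above.

```csharp
using System.Data.SqlClient;

static class IndexSetup
{
    // Adds a nonclustered index on trn_datetime so that date-range reports
    // no longer have to scan the whole table.
    public static void CreateDateIndex(string connectionString)
    {
        const string ddl =
            "CREATE NONCLUSTERED INDEX IX_trn_log_trn_datetime " +
            "ON dbo.trn_log (trn_datetime)";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(ddl, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```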
I learned a lot from watching Brent Ozar's videos. I recommend watching his How to Think Like the SQL Server Engine series.
Note that the above is based on my experience with SQL Server - but it's likely that (at this fundamental level) most other databases behave the same way.
The number of rows per page is limited to 255 rows so that works out to 4.1 billion rows per partition. A table can have an unlimited number of partitions and a single server can manage up to 128PB of storage.
https://www.quora.com/How-many-rows-can-recent-SQL-and-NoSQL-databases-reasonably-handle-within-one-table#:~:text=The%20number%20of%20rows%20per,up%20to%20128PB%20of%20storage.
I'm trying to record logs in my database. My question is which approach puts less load on the database when writing the logs. I'm thinking of storing long-term logs, maybe 3-5 years maximum, for an inventory program.
Process: I'll be using a barcode scanner.
After scanning a barcode, I'll get all the details of who is logged in, the date and time, and the product details, which are then saved per piece.
I came up with two ideas.
After each scan event, the record will be saved into a DataTable; after finishing a batch, the DataTable will be written to a *.txt file and then uploaded to my database.
After every scanned barcode, an INSERT query will be executed. I suspect this option will be heavy on the server side, since I'm not the only one using this server.
What are the pros and cons of the two options?
Are there more efficient ways of storing logs?
Based on your use case, I also think you need to consider at least two additional factors. The first is how important it is that the scanned item is logged in the database immediately. If you need the scanned item to be logged because you'll be checking whether it's been scanned, for example to prevent duplicate scans, then doing a single insert is probably a very good idea. The second thing to consider is whether you will ever need to "unscan" an item, and at which part of the process. If the person scanning needs the ability to revert a scan immediately, it might be a good idea to wait until they're done with all their scanning before dumping the data to the database, as this will let you avoid ever having to delete from the table.
Overall I wouldn't worry too much about what the database can handle; SQL Server is very good at handling simultaneous single inserts into a table that's designed for that use case. If you're only going to be inserting new data at the end of the table, and not updating or deleting existing records, performance is going to scale very well. The same goes for larger batch inserts: they're very efficient no matter how many rows you want to bring in, assuming your table is designed for that purpose.
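If you go with the batch route, a minimal sketch of pushing the accumulated DataTable into SQL Server with SqlBulkCopy could look like this; the destination table name is an assumption.

```csharp
using System.Data;
using System.Data.SqlClient;

static class ScanLogWriter
{
    // Writes a batch of scanned rows to the log table in a single bulk operation.
    public static void WriteBatch(string connectionString, DataTable scannedRows)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.ScanLog"; // assumed table name
            bulkCopy.WriteToServer(scannedRows);
        }
    }
}
```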
So overall I would probably pick the more efficient solution from the application side for your specific use case, and then once you have decided that, you can shape the database around the code, rather than trying to shape your code around suspected limitations of the database.
What are the pros and cons of the two options?
Basically your question is which way is more efficient (bulk insert or multiple single inserts)?
The answer is always "it depends" and is always situation based, so unfortunately I don't think there's a single right answer for you. It depends on things like:
The way you structure the log table.
If you choose bulk insert, how many rows do you want to insert at a time?
Is it a read-only table? And if you want to read from it, how often do you do the reads?
Do you need to scale it up?
etc...
Are there more efficient ways of storing logs?
There are some possible ways to improve that I can think of (not all of them can work together):
If you go with the first option, maybe you can schedule the insert to non-peak hours
If you go with the first option, chunk the log files and do the insert
Use another database to do the logging
If you go with the second option, do some load testing
Personally, I prefer to go with the second option if the project is small to medium size and the logging is a critical part of the project.
Hope it helps.
Go with the second option, and use transactions. This way the data will not be committed to the database until you complete the transaction (which can be scheduled). This will also prevent broken data from getting into your database when a crash or something similar occurs.
Transactions in .net
Transaction Tutorial in C#
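A minimal sketch of that transactional approach, assuming System.Data.SqlClient and a hypothetical ScanLog table, might look like this.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static class TransactionalLogger
{
    // Inserts a batch of scans and commits them together; if anything fails,
    // the whole batch is rolled back so no broken data reaches the table.
    public static void LogScans(string connectionString,
                                IEnumerable<(string Barcode, int UserId)> scans)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                try
                {
                    foreach (var scan in scans)
                    {
                        using (var command = new SqlCommand(
                            "INSERT INTO dbo.ScanLog (Barcode, UserId, ScannedAt) " +
                            "VALUES (@barcode, @userId, GETDATE())",
                            connection, transaction))
                        {
                            command.Parameters.AddWithValue("@barcode", scan.Barcode);
                            command.Parameters.AddWithValue("@userId", scan.UserId);
                            command.ExecuteNonQuery();
                        }
                    }
                    transaction.Commit();
                }
                catch
                {
                    transaction.Rollback();
                    throw;
                }
            }
        }
    }
}
```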
I'm designing an accounting application with more than 400 tables in SQL Server.
About 10% of those tables are operational tables, and the others are used for decoding and reference information.
For example, the invoice tables (master and details) use about 10 tables to decode information like buyer, item, marketer and so on.
I want to know whether it is acceptable to cache the decode tables in the ASP.NET cache and not query them from SQL Server (I know that changes to cached items should be committed to SQL Server too), and to use the cached items for decoding.
I think it would make the application so much faster than a regular one.
All of the cached tables together would maybe be about 500 MB after some years, because they don't change frequently.
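A rough sketch of caching one decode table in memory using System.Runtime.Caching; the table, column and cache key names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Runtime.Caching;

static class DecodeCache
{
    static readonly MemoryCache Cache = MemoryCache.Default;

    // Returns the buyer decode table from the cache, loading it on first use.
    public static Dictionary<int, string> GetBuyers(string connectionString)
    {
        const string cacheKey = "decode:buyers"; // hypothetical key
        if (Cache.Get(cacheKey) is Dictionary<int, string> cached)
            return cached;

        var buyers = new Dictionary<int, string>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT BuyerId, Name FROM dbo.Buyer", connection)) // assumed decode table
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    buyers.Add(reader.GetInt32(0), reader.GetString(1));
        }

        // Keep the table for an hour; tune this, or invalidate on writes, as needed.
        Cache.Set(cacheKey, buyers, DateTimeOffset.Now.AddHours(1));
        return buyers;
    }
}
```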
If you've got the RAM then it's fine to use 500 MB.
However, unless you have a performance problem now, caching will only cause problems. Don't fix problems that you haven't encountered; design for performance and optimize only when you have problems, because otherwise the optimization can cause more problems than it solves.
So I would advise that usually it is better to ensure that your queries are optimized and well structured, that you have the correct indexes on the tables, and that you issue a minimum number of queries.
Although 500MB isn't a lot of data to cache, with all due respect, usually SQL Server will do a better job of caching than you can - providing that you use it correctly.
Using a cache will always improve performance, at the cost of higher implementation complexity.
For static data that never changes, a cache is useful; but it still needs to be loaded and shared between threads, which in itself can present challenges.
For data that rarely changes it becomes much more complex, simply because it could have changed. If a single application (process) is the only updater of the cache then it isn't as difficult, but it's still not a simple task.
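As a sketch of the "load once, share between threads" case, a Lazy&lt;T&gt;-based read-only cache could look like this; the loader is a placeholder.

```csharp
using System;
using System.Collections.Generic;

static class StaticLookupCache
{
    // A thread-safe Lazy<T> guarantees the load runs exactly once, even when
    // several threads ask for the data at the same time.
    static readonly Lazy<IReadOnlyDictionary<int, string>> Lookup =
        new Lazy<IReadOnlyDictionary<int, string>>(LoadFromDatabase, isThreadSafe: true);

    public static IReadOnlyDictionary<int, string> Values => Lookup.Value;

    static IReadOnlyDictionary<int, string> LoadFromDatabase()
    {
        // Placeholder for the real database read.
        return new Dictionary<int, string> { { 1, "example" } };
    }
}
```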
I have spent months optimizing an offline batch processing system (where the code has complete control of the database for a period of 12 hours). Part of the optimisation is to use various caches and data reprojections. All of the caches are read-only. Memory usage is around the 10 GB mark during execution; the database is around 170 GB, 60 million records.
Even with the caching there have been considerable changes to the underlying schema to improve efficiency. The read-only caches are there to eliminate reading during processing, to allow multi-threaded processing and to improve the insert performance.
Processing rate has gone from 6 items processed per second 20 months ago to around 6000 items per second (yesterday) - but there is a genuine need for this optimization as the number of items to process has risen from 100,000 to 8 million in the same period.
If you don't have a need then don't optimize.
I have a webshop with a lot of products and other content. Currently I load all content into a global list at Application_Start, which takes approx. 15-25 seconds.
This makes the site really fast, as I can get any product/content in O(1) time.
However, is this best practice?
Currently I have a web hotel which is not a VPS / dedicated server, so it recycles the application from time to time, which gives random visitors load times of up to 15-25 seconds (a number that will only grow with more content). This is of course totally unacceptable, but I guess it would be solved with a VPS.
What is the normal way of doing this? I guess a webshop like Amazon probably doesn't load all its products into a huge list :-D
Any thoughts and ideas would be highly appreciated.
It looks like you've answered the question for your own case: "This is of course totally unacceptable".
If your goal is O(1) access, a normal database request for a single product is likely O(1) anyway, unless you need complicated joins between products. Consider dropping all your pre-caching logic and see whether you actually have a performance problem. You can limit the startup impact by caching lazily instead.
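A sketch of lazy, per-product caching with System.Runtime.Caching; the Product type and the loader delegate are hypothetical.

```csharp
using System;
using System.Runtime.Caching;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProductCache
{
    static readonly MemoryCache Cache = MemoryCache.Default;

    // Loads a product on first request and serves it from memory afterwards,
    // so nothing has to be preloaded at Application_Start.
    public static Product Get(int id, Func<int, Product> loadFromDatabase)
    {
        string key = "product:" + id;
        if (Cache.Get(key) is Product cached)
            return cached;

        Product product = loadFromDatabase(id);
        Cache.Set(key, product, DateTimeOffset.Now.AddMinutes(30));
        return product;
    }
}
```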
Large sites often use distributed caching like Memcached.
A more scalable setup is to set up a web service to provide the content, which the website calls when it needs it. The web service will need to cache frequently needed content to achieve fast response times.
First of all, 15-20 seconds to load the data is too much time, so I suspect one of these causes:
The time is spent on compilation, not on loading the data.
There is too much data and you are filling up the memory.
The method you use to read the data is very slow.
The data storage is too slow, or the data is structured as a text file and reading it is slow.
My opinion is that you should cache only a small amount of data that you need to use many times in a short period. The way you describe it is not good practice, for several reasons:
If you have many application pools, you read the same data in every pool and you spend memory for no reason.
The data that you cache cannot be changed - it is read-only.
Even if you cache some data, you still need to render the page, and that is where you actually need to cache: the final render, not the data.
What and how to cache:
We cache the final rendered page (see the sketch after this list).
We also set cache headers for the page and other elements so the client caches them.
We read and write the data from the database as requests come in and let the database do the caching - it knows better.
If we do cache data, it must be a small amount that is needed inside a long loop, so we avoid calling the database many times.
Also, we cache items as they are asked for, and if a part of the cache is not used for a long time, or the memory needs space, that part goes away. If some part of the data comes from a complex combination of many tables, we build a big temporary flat table that keeps all the data together, one entity per row. This table is temporary, and if we need it too often we create a second, temporary database file to keep that part of the data.
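As a sketch of caching the final rendered page, assuming ASP.NET MVC, output caching can be declared per action; the controller and action names are made up for the example.

```csharp
using System.Web.Mvc;

public class ProductController : Controller
{
    // Caches the rendered HTML for 60 seconds per product id, so repeated
    // requests don't hit the database or re-render the view.
    [OutputCache(Duration = 60, VaryByParam = "id")]
    public ActionResult Details(int id)
    {
        var product = LoadProduct(id); // placeholder for the real lookup
        return View(product);
    }

    private object LoadProduct(int id)
    {
        return new { Id = id };
    }
}
```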
How fast is the database read? Well, it is so fast that you do not need to worry about it; you need to check other points of delay, like, as I said, the full render of a page or some parts of the page.
What you need to worry about is a good database design, a good and fast way to retrieve your data, and well-optimized code to display it.
Separation of responsibilities will help you scale for the future.
With your current setup, you are limited to the resources of your web server, and, like you said, your start up times will grow out of control as you continue adding more products.
If you share the burden of each page request with SQL Server, you open up your application to allow it to scale as needed. Over time, you may decide to add more web servers, cluster SQL Server, or switch to a new database back-end altogether. However, if all the burden is on the application pool, then you are drastically limiting yourself.
I would like some advice on how to best go about what I'm trying to achieve.
I'd like to provide the user with a screen that will display one or more "icons" (so to speak) and display a total next to each (a bit like the iPhone does). Don't worry about the UI; the question is not about that, it is more about how to handle the back-end.
Let's say for argument sake, I want to provide the following:
Total number of unread records
Total number of waiting for approval
Total number of pre-approved
Total number of approved
etc...
I suppose the easiest way to describe the above would be "MS Outlook". Whenever emails arrive in your inbox, you can see the number of unread emails being updated immediately. I know it's local, so it's a bit different, but now imagine having the same principle but for the queries above.
This could vary from user to user, and while dynamic stored procedures are not ideal, I don't think I could write one stored procedure for each scenario - but again, that's not the issue here.
Now the recommendation part:
Should I be creating a timer that polls the database every minute (for example) and runs all my relevant SQL queries, which will then provide me with the relevant information?
Is there a way to do this in real time without having a "polling" mechanism, i.e. whenever a query's result changes, the total/count is updated and then pushed out to the relevant client(s)?
Should I have some sort of table storing these "totals" for each query and handle the updating of them immediately via triggers in SQL, so that when queried by a user it would only read the "total" rather than trying to calculate it?
The problem with triggers is that these would have to be defined individually, and I'm really trying to keep this as generic as possible... Again, I'm not 100% clear on how to handle this to be honest, so let me know what you think is best or how you would go about it.
Ideally, when a specific query is created, I'd like to provide two choices: a) general, where anyone can use it, and b) specific, where the username would be used as part of the query and the count returned would only apply to that user - but that's another issue.
The important part is really the notification part. While the polling is easy, I'm not sure I like it.
Imagine if I had 50 queries to execute and I've got 500 users (unlikely, but still!) looking at the screen with these icons. 500 users would poll the database every minute and 50 queries would also be executed; this could potentially be 25,000 queries per minute... It just doesn't sound right.
As mentioned, ideally, a) I'd love to have the data changes in real-time rather than having to wait a minute to be notified of a new "count" and b) I want to reduce the amount of queries to a minimum. Maybe I won't have a choice.
The idea behind this, is that they will have a small icon for each of these queries, and a little number will be displayed indicating how many records apply to the relevant query. When they click on this, it will bring them the relevant result data rather than the actual count and then can deal with it accordingly.
I don't know if I've explained this correctly; if it's unclear, please ask, but hopefully I have and I'll be able to get some feedback on this.
Looking forward to your feedback.
Thanks.
I am not sure if this is the ideal solution, but it may be a decent one.
The following are the assumptions I have made:
Your front end is a web application, i.e. ASP.NET.
The data which needs to be fetched on a regular basis is not huge.
The data which needs to be fetched does not change very frequently.
If I were in this situation, I would go with the following approach:
Implement SQL caching using the SqlCacheDependency class. This class will fetch the data from the database and store it in the application's cache. The cache gets invalidated whenever the data in the table on which the dependency is created changes, at which point the new data is fetched and the cache is rebuilt. You just need to get the data from the cache; everything else (polling the database, etc.) is done by ASP.NET itself. Here is a link which describes the steps to implement SQL caching, and believe me, it is not that difficult to implement.
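A sketch of the SqlCacheDependency approach; the database entry name "MyDb" and the table "PendingApprovals" are assumptions, and the entry must also be registered under the sqlCacheDependency section in web.config.

```csharp
using System.Web;
using System.Web.Caching;

public static class CountCache
{
    // Returns the cached count; the query runs again only after the
    // dependency invalidates the entry because the underlying table changed.
    public static int GetPendingApprovalCount()
    {
        var cache = HttpRuntime.Cache;
        if (cache["pendingApprovalCount"] is int cached)
            return cached;

        int count = QueryCountFromDatabase(); // placeholder for the real query query

        // "MyDb" must match a <sqlCacheDependency> database entry in web.config,
        // and the table has to be enabled for change notifications.
        var dependency = new SqlCacheDependency("MyDb", "PendingApprovals");
        cache.Insert("pendingApprovalCount", count, dependency);
        return count;
    }

    private static int QueryCountFromDatabase()
    {
        return 0; // stub for the example
    }
}
```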
Use AJAX to update the counts on the UI so that the User does not feel the pinch of PostBack.
What about "Improving Performance with SQL Server 2008 Indexed Views"?
"This is often particularly effective for aggregate views in decision
support or data warehouse environments"