I'm working on a Web 2.0 project that I expect will receive thousands of new rows per day from users.
To handle this volume of data, I designed the database like this:
many small databases, each a single .mdf and .ldf file pair (the "minor" databases), plus one "major" database that stores and queries the user accounts and the addresses of the database files.
I have worked on this plan for several months, and now I can manage it easily.
I want to know: is it a good idea to handle a huge amount of independent data this way?
Which has better performance, in your opinion: opening connections to many small .mdf files, or just one huge database?
Later, I'll distribute the .mdf repository across several computers.
All of this is handled with C# and LINQ (.NET 4).
Update:
I built this design and it works fine.
For example, opening each small .mdf file takes about 1 second and querying it takes close to 0 seconds, so every connection has a fixed cost. In a single database, to get 50 rows the system has to find them among, for instance, 200,000 rows, and that takes about 4-5 seconds on my system with a simple SELECT on the primary key.
As another example, I want to fetch one row out of 500,000 to bind page content, select 50 comments out of 2 million rows, and for each comment get the vote count, the view count for the day, week, month, and in total, the number of likes, the replies to the comment, and more data from 2-3 other tables. These queries are heavy and take more time than querying a small slave database.
I think a good design and good processes should make the work easy for the system.
The only problem is that the small slave databases, as SQL Server files, take up more physical space: about 3 MB per database.
There is no reason to split something that could/should exist as a single database into multiple independent parts.
There are already mechanisms to partition a single logical database across multiple files (Files and Filegroups Architecture), as well as to partition large tables (a few thousand rows per day doesn't really qualify as a large table).
"Thousands of rows per day" should be pocket change for SQL Server.
First, I voted up Alex K.'s answer. Filegroups will most likely get you where you want to be. Partitioned tables may be overkill; they are only available in the Enterprise edition and are not for the faint-hearted.
What I will add is:
http://www.google.com/#q=glenn+berry+dmv&bav=on.2,or.r_gc.r_pw.&fp=73d2ceaabb6b01bf&hl=en
You need to tweak your indexes. In the good vs. better vs. best category, Glenn Berry's DMV queries are "better". Those queries will help you fix the majority of issues.
In the "best" category is painstakingly looking at each stored procedure, examining its execution plan, and trying out different things. This is what a good DBA is able to provide.
Here are some "basics" on file setup considerations. Pay attention to the tempdb setup.
http://technet.microsoft.com/en-us/library/cc966534.aspx
It's difficult to manage many small MDF files. Go with a single SQL Server database; even SQL Server Express allows up to 10 GB of data per database, so it's easy.
Symantec backup software writes backup image details to a local SQLite database. I'm writing a utility to query several of these databases on multiple devices from a central location. I only need the most recently added records from a single table in each database.
However, based on the network traffic I'm observing and on procmon results, it appears that all 4,920 records in the table are being transferred across the network. Is there a way to pull just the records I need? Perhaps one must sort by an index to avoid pulling over all the records?
I should be seeing just a few KB of data transfer, but instead I'm seeing several MB per query. I know it is possible to transfer just the records you need with MS Access databases (which are also file-based), but I don't have much experience with SQLite.
I'm open to more creative solutions as well.
I had to fix two things to reduce the amount of network traffic generated by my sqlite queries:
Add an index that matched the ORDER BY clause of my query
Add a LIMIT clause to my query so that I was only pulling the records I actually needed
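A minimal sketch of both fixes, using Python's built-in sqlite3 module against an in-memory database. The table name `backup_images` and its columns are hypothetical (the real Symantec schema isn't given). The `EXPLAIN QUERY PLAN` output is inspected only to show that, once the index exists, SQLite no longer builds a temporary B-tree to sort every row; the exact wording of that output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the backup-image table; created_at is a Unix timestamp.
conn.execute(
    "CREATE TABLE backup_images (id INTEGER PRIMARY KEY, job_name TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO backup_images (job_name, created_at) VALUES (?, ?)",
    [(f"job{i % 7}", 1_356_998_400 + i * 60) for i in range(4920)],
)

# Only the newest five rows: ORDER BY on the (soon to be) indexed column plus LIMIT.
NEWEST = (
    "SELECT id, job_name, created_at FROM backup_images "
    "ORDER BY created_at DESC LIMIT 5"
)


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(NEWEST)  # full scan plus a temporary B-tree to sort every row
conn.execute("CREATE INDEX idx_backup_images_created_at ON backup_images (created_at)")
plan_after = plan(NEWEST)   # reads the index in descending order and stops after 5 rows

newest = conn.execute(NEWEST).fetchall()
for line in plan_before + plan_after:
    print(line)
```

Over a network share, the indexed plan should only need the last index pages and the few table pages holding those five rows, rather than every page of the table.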
Years back, I created a small system for a requirement where an image snapped on Android was uploaded to a server along with its custom data; the image was stored on disk, and the custom data describing it was broken up and stored in the database. Each snapped image is part of a campaign. Over time the system has grown, and there are now over 10,000 campaigns with 500-1000 images per campaign. Performance is not that bad yet, but I believe it's just a matter of time. We are now thinking of archiving past campaigns in another database called Archive. Here is what I am planning to do:
1) The Archive database will have exactly the same structure. The archive functionality may have a search mechanism; however, retrieval speed is not much of a concern, as this will happen very rarely.
2) I was thinking of removing records from one database and cloning them in the other; however, the identity column probably will not let me do that very seamlessly (and I may be wrong too).
3) There needs to be a restore option too. (This is probably the most challenging part.)
4) If I just blank out the records (except for the identity) in the original database and copy them to the other database with no identity constraint, it probably won't help, and I think it defeats the purpose of the exercise.
Any advice on this? Is there any known strategy, pattern, literature, or even a link that may guide me on this?
Thank you in advance for your help.
I say: as long as you don't run out of space on your server, leave it as it is.
Over the period, the system went on growing enough and now there are now over 10,000 campaigns already and over 500-1000 images per campaign.
→ That's 5-10 million rows (created over several years).
For SQL Server, that's not that much.
Yes, I know...we're talking about image files stored in the database, not "regular" rows. Still, if your server has reasonably sized hardware, it shouldn't really matter.
I'm talking from experience here - at work, we have a SQL Server database which we use to store PDF files and images.
In our case, we're using a "regular" image column - since you're using SQL Server 2008, you could even use FILESTREAM (maybe you already do, but I don't know - you didn't say anything how exactly you're storing the image in the database).
We started the project on SQL Server 2005, where FILESTREAM wasn't available yet. In the meantime, we upgraded to SQL Server 2012, but never changed the data type in the table where we're storing the files.
If you still prefer creating a separate archive database and moving old data there, one piece of advice concerning this:
2) I was thinking of removing records from one database and cloning it in the other, however the identity column probably will not let me do that very seamlessly. (and I may be wrong too.)
[...]
4) If I just make the records blank (except for the identity) from the original database and copy it to the other with no identity constraint, probably it is not going to help and I think it will lose the purpose of the exercise.
You don't need to set the column to identity in the archive database as well.
Just leave everything as it is in the main database, but remove the identity setting from the primary key in the archive database.
The archive database doesn't ever need to generate new keys (hence no need for identity), you're just copying rows with already existing keys from the main database.
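A minimal sketch of that advice, using Python's sqlite3 with an attached second database standing in for the Archive database; the table and column names are hypothetical. The main table generates keys (`AUTOINCREMENT`, SQLite's closest analogue to an IDENTITY column), while the archive table only ever receives ids copied from the main table. The restore path copies a row back under its original id; in SQL Server, inserting an explicit value into an IDENTITY column requires `SET IDENTITY_INSERT <table> ON` for the duration of that insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for the separate Archive database.
conn.execute("ATTACH DATABASE ':memory:' AS archive")

# Main table: the engine generates keys (SQLite's closest analogue to IDENTITY).
conn.execute(
    "CREATE TABLE main.campaigns "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, year INTEGER)"
)
# Archive table: same columns; rows always arrive with ids copied from main.
conn.execute(
    "CREATE TABLE archive.campaigns (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)"
)


def archive_before(conn, year):
    """Move campaigns older than `year` into the archive, keeping their ids."""
    with conn:  # copy and delete commit together, or both roll back on error
        conn.execute(
            "INSERT INTO archive.campaigns (id, name, year) "
            "SELECT id, name, year FROM main.campaigns WHERE year < ?",
            (year,),
        )
        conn.execute("DELETE FROM main.campaigns WHERE year < ?", (year,))


def restore(conn, campaign_id):
    """Move one campaign back into the main table under its original id."""
    with conn:
        conn.execute(
            "INSERT INTO main.campaigns (id, name, year) "
            "SELECT id, name, year FROM archive.campaigns WHERE id = ?",
            (campaign_id,),
        )
        conn.execute("DELETE FROM archive.campaigns WHERE id = ?", (campaign_id,))


conn.executemany(
    "INSERT INTO main.campaigns (name, year) VALUES (?, ?)",
    [("spring", 2010), ("summer", 2011), ("autumn", 2014)],
)
conn.commit()

archive_before(conn, 2012)  # ids 1 and 2 move to the archive
restore(conn, 2)            # id 2 comes back unchanged
with conn:
    # New rows still get fresh keys; AUTOINCREMENT never reuses 1 or 2.
    conn.execute("INSERT INTO main.campaigns (name, year) VALUES ('winter', 2015)")

main_ids = [r[0] for r in conn.execute("SELECT id FROM main.campaigns ORDER BY id")]
archive_ids = [r[0] for r in conn.execute("SELECT id FROM archive.campaigns ORDER BY id")]
print(main_ids, archive_ids)
```

Because each copy and its matching delete run inside one `with conn:` block, an error in either statement rolls both back, so a campaign is never left half-moved.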
I think a good solution for your case is SSIS. It can load large volumes of data into your Archive system quickly. In addition, you can use table partitioning to improve the performance of manipulating large amounts of data in the Archive system. Also look into columnstore indexes (availability depends on your version of SQL Server).
I created such a solution with the following steps:
1) Switch the partition holding the oldest rows from the main table t to another table, t_1, in the production system.
2) Load the data from table t_1 into the Archive system.
3) Drop or truncate table t_1.
The situation is as follows: I have a large-ish dataset with a couple thousand entries that I populate from an Excel file. For each entry, I have to match it to a field in a certain table in the database (this table contains only a couple hundred entries).
What's the best way to go about it? I could make a query for each entry in the dataset, but this seems fairly wasteful. Alternatively, I could select the fields I need from all the entries in the table, put them in a Dictionary or some other data structure, and match them on IIS, effectively making only one query but doing all the processing on the web server.
Dataset: ~1000 to ~3000 entries
Table in the DB: ~300 entries
I'm using ASP.NET on IIS, but the database is an MS Access file.
Is either of these better than the other? Is there a third, better way I haven't thought of?
Databases are designed to do many things that are useful for data processing. A lot of benefits for transactional processing are contained in the acronym ACID -- atomicity, consistency, isolation, durability. In other words, databases behave the way you would expect when you store something in them. The data is there, relationships are enforced, it will be there tomorrow.
The features that you want are on the querying side. Databases in general (although perhaps not MS Access in particular) provide a relatively standard interface to powerful processing. Database engines know how to optimize queries. Database engines know how to manage memory, including the memory hierarchy of disk, RAM, and cache. Databases know how to take advantage of indexes, row partitions, and other optimizations (you can get this functionality by using a free version of a more advanced database, such as SQL Server, Oracle, Postgres, or even MySQL).
You are talking about thousands of rows of data. Databases can easily work with millions of rows. You are talking about two tables. Databases can easily manage many more tables, and queries that join a dozen of them.
So, no, you should not load your data into in-memory structures on the application side. You should do the processing in the database and bring back the results you want. Then, you can format the results on the application side, to take advantage of what applications do best: interface to the user.
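A minimal sketch of doing the matching in the database, assuming the Excel rows have already been read into a list of tuples (the Excel parsing is not shown). It uses Python's sqlite3 as a stand-in for the Access back end, and the `lookup` and `staging` tables and their columns are hypothetical. The imported rows go into a temporary staging table, and a single `LEFT JOIN` against the indexed lookup table returns every imported row with its match, or `None` where there is no match.

```python
import sqlite3


def match_rows(conn, imported):
    """Match imported (code, amount) rows against `lookup` with one JOIN.

    Hypothetical schema: lookup(code TEXT PRIMARY KEY, description TEXT).
    """
    conn.execute("CREATE TEMP TABLE staging (code TEXT, amount REAL)")
    try:
        conn.executemany("INSERT INTO staging (code, amount) VALUES (?, ?)", imported)
        return conn.execute(
            """
            SELECT s.code, s.amount, l.description
            FROM staging AS s
            LEFT JOIN lookup AS l ON l.code = s.code
            ORDER BY s.rowid  -- keep the spreadsheet's row order
            """
        ).fetchall()
    finally:
        conn.execute("DROP TABLE staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup (code TEXT PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO lookup VALUES (?, ?)", [("A1", "Alpha"), ("B2", "Beta")])

result = match_rows(conn, [("A1", 10.0), ("ZZ", 5.0), ("B2", 7.5)])
print(result)
```

The primary key on `lookup.code` gives the join an index to seek on, so the engine does one pass over the staged rows instead of the application issuing one query per spreadsheet row.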
In my application I have a SQL Server 2008 table, Swipedaily_Tbl, with 11 columns,
where employees' daily swipes are inserted.
I have about 8,000 employees in my company, which means at least 16,000 rows will be created daily.
I am planning to delete all the rows at the end of each month and save them to another table in order to improve performance, or to back up the previous month's data as a dump file from my application itself.
As I am new to SQL Server and database administration, can anyone suggest a better idea?
Can I create a dump file from the application?
Either use table partitioning, so that inserting large volumes of new data into a huge table won't affect its performance, or use a script, run monthly by a SQL Agent job, that backs up the data and deletes it from the existing table. If you are using an identity column, you might need some changes in the script to avoid conflicts between old and new data.
Create an identical table.
Create a SQL script to copy all data older than a given date (say, today's date) to that table and delete it from your table.
Configure a SQL Agent job to execute that script on the 1st of every month.
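A minimal sketch of the copy-then-delete script, using Python's sqlite3 as a stand-in for SQL Server. The two-column table layout and the `Swipedaily_Archive` name are hypothetical (the real `Swipedaily_Tbl` has 11 columns), and the SQL Agent scheduling is not shown. The cutoff is midnight on the first day of the current month, and both statements run in one transaction, so if either fails neither takes effect.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified layout; swipe_time is ISO-8601 text, so string order is time order.
conn.execute("CREATE TABLE Swipedaily_Tbl (emp_id INTEGER, swipe_time TEXT)")
conn.execute("CREATE TABLE Swipedaily_Archive (emp_id INTEGER, swipe_time TEXT)")


def first_of_month(when):
    """Midnight on the first day of `when`'s month, as ISO-8601 text."""
    start = when.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return start.strftime("%Y-%m-%d %H:%M:%S")


def archive_older_than(conn, cutoff):
    """Copy rows older than `cutoff` to the archive table, then delete them."""
    with conn:  # commit both statements together, or roll both back on error
        conn.execute(
            "INSERT INTO Swipedaily_Archive (emp_id, swipe_time) "
            "SELECT emp_id, swipe_time FROM Swipedaily_Tbl WHERE swipe_time < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM Swipedaily_Tbl WHERE swipe_time < ?", (cutoff,)
        ).rowcount
    return deleted


conn.executemany(
    "INSERT INTO Swipedaily_Tbl VALUES (?, ?)",
    [
        (1, "2013-05-31 08:59:00"), (1, "2013-05-31 17:30:00"),
        (2, "2013-06-01 09:01:00"), (2, "2013-06-01 18:00:00"),
    ],
)
conn.commit()

moved = archive_older_than(conn, first_of_month(datetime(2013, 6, 15)))
print(moved, "rows archived")
```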
However, with proper indexing, you should be fine retaining the data in your original table for much longer: 365 days × 8,000 employees × 2 swipes = 5.84 million records, which is not too much for SQL Server to handle.
Raj
You can create another table identical to Swipedaily_Tbl (11 columns) with one additional column that records when each row was inserted into the backup table. You can then create a script that backs up the data older than one month and deletes it from the original table. Finally, you can create a batch file or a console application scheduled to run at the end of each month.
Hope this helps.
Thanks.
It would depend on your requirements for the "old" data.
Personally, I would strongly consider using table partitioning.
See: http://technet.microsoft.com/en-us/library/dd578580(v=sql.100).aspx
Keep all records in table; this will make queries that look at current and historic data simultaneously simpler and potentially cheaper.
As all too often, it depends. Native partitioning requires the Enterprise Edition of SQL Server; however, there are ways around it (although not very clean ones), like this.
If you do have the Enterprise Edition of SQL Server, I would take a serious look at partitioning (well linked in some of the other answers here). However, I wouldn't split on a monthly basis, but maybe on a quarterly or semi-annual basis, since at two swipes per day that is less than half a million rows per month, and a 1.5-3 million row table isn't that much for SQL Server to handle.
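A quick back-of-the-envelope check of those partition sizes, assuming exactly 8,000 employees, two swipes per employee on every calendar day (the question's minimum), and approximate period lengths:

```python
EMPLOYEES = 8_000
SWIPES_PER_DAY = 2  # assumption: one swipe in, one swipe out per employee per day

# Approximate period lengths in days.
PERIODS = {"month": 30, "quarter": 91, "half-year": 182, "year": 365}

rows_per_period = {
    name: EMPLOYEES * SWIPES_PER_DAY * days for name, days in PERIODS.items()
}

for name, rows in rows_per_period.items():
    print(f"{name:>9}: {rows:,} rows")
```

This gives 480,000 rows per month, about 1.46 million per quarter, about 2.9 million per half-year, and 5.84 million per year, consistent with the figures above.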
If you are experiencing performance issues at this point in time with maybe a few months of data, have you reviewed the most frequent queries hitting the table and ensured that they're using indexes?
I have developed a network application that has been in use in my company for the last few years.
At start it was managing information about users, rights etc.
Over the time it grew with other functionality. It grew to the point that I have tables with, let's say 10-20 columns and even 20,000 - 40,000 records.
I keep hearing that Access is not good for multi-user environments.
Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to the SQL Server but unfortunately it cannot be done in this case.
I was wondering if there is more reliable solution for me than using Access Database and still stay with a single-file database system.
We have quite a large system using dBase IV.
As far as I know it is fully multiuser database system.
Maybe it will be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
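A minimal sketch of what "look at your indexing" means in practice, using Python's sqlite3 as a stand-in file-based engine (Jet/ACE is not SQLite, and its plan output looks different). The `orders` table and its columns are hypothetical, sized like the 40,000-row tables in the question. Without an index, the plan is a full SCAN of the table; with one, it becomes a SEARCH that visits only the matching index entries and their rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 40,000 rows, 500 distinct customers -> 80 rows per customer.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, notes) VALUES (?, ?)",
    [(i % 500, "x" * 50) for i in range(40_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)  # reads every row of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(QUERY)     # seeks straight to customer_id = 42

matches = conn.execute(QUERY).fetchall()
print(plan_without_index, plan_with_index, len(matches))
```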
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios, at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So... you CAN use Access as your data store. Then, if you can get beyond the company politics controlling your network, perhaps you could begin upsizing your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into front end : back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access database shared among your users will be the same with any file-based database.
A read will pull a lot of data into memory, and writes are guarded with some type of file lock. In your environment, it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually, no. This is a common misstatement spread by folks who do not understand how Jet, the database engine inside Access, works. Pulling down all the records, or an excessive number of records, happens because you don't have all the fields used in the selection criteria or sorting in an index. We've also found that indexing yes/no (Boolean) fields can make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages that are required. While this is a lot more data than a server-based database engine would send, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on all at the same time before without any issues. As long as they don't access the same record then I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance, I have a Quoting form that contains 13 different tabs with 10-20 fields on each tab. Users typically stay in a single record for minutes, editing and looking for information. To avoid write conflicts, I call the function below any time a field is changed. As long as the record is not a new one being entered, it saves the record.
Function funSaveTheRecord()
    ' Save the current record if it has unsaved edits and is not a new record
    If ([chkNewRecord].Value = False And Me.Dirty) Then
        ' Setting the form's Dirty property to False saves the record
        Me.Dirty = False
    End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front end, stored on each user's machine. Every user has their own copy. Links to tables in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle the Shift-key bypass on/off; only I have access to it.
PDC_be.mdb <-- Back end, stored on the server. Contains all data. The only form and VBA it contains are for toggling the Shift-key bypass on/off; only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on each user's desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically, Access cannot handle more than 5 people connected at the same time, or the database will become corrupted.