Querying remote sqlite database - c#

Symantec backup software writes backup image details to a local SQLite database. I'm writing a utility to query several of these databases on multiple devices from a central location. I only need the most recently added records from a single table in each database.
However, based on the network traffic I'm observing, and confirmed by procmon results, it appears that all 4,920 records in the table - effectively the entire database - are being transferred across the network. Is there a way to pull just the records I need? Perhaps one must sort by an index to avoid pulling over all records?
I should be seeing just a few KB of data transfer, but instead I'm seeing several MB per query. I know it is possible to transfer just the records you need with MS Access databases - which are also file-based - but I don't have much experience with SQLite.
I'm open to more creative solutions as well.

I had to fix two things to reduce the amount of network traffic generated by my SQLite queries:
Add an index that matched the ORDER BY clause of my query
Add a LIMIT clause to my query so that I was only pulling the records I actually needed
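For example, here's a minimal sketch of the resulting query using Microsoft.Data.Sqlite; the UNC path, the BackupImages table, and the column names are hypothetical placeholders, not the vendor's actual catalog schema:
using System;
using Microsoft.Data.Sqlite;

class CatalogReader
{
    static void Main()
    {
        // Hypothetical remote catalog file; Mode=ReadOnly avoids locking it.
        using var conn = new SqliteConnection(
            @"Data Source=\\backupserver\share\catalog.db;Mode=ReadOnly");
        conn.Open();

        // One-time setup (requires write access):
        //   CREATE INDEX idx_images_created ON BackupImages(CreatedAt);
        // With an index matching the ORDER BY, SQLite reads only the tail
        // of the index plus the ten matching rows instead of the whole table.
        using var cmd = conn.CreateCommand();
        cmd.CommandText = @"SELECT ImageName, CreatedAt
                            FROM BackupImages
                            ORDER BY CreatedAt DESC
                            LIMIT 10";

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine($"{reader.GetString(0)}  {reader.GetValue(1)}");
    }
}
Because SQLite reads database pages on demand, satisfying the query from the index tail means only a handful of pages cross the network instead of the full file.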

Related

Tools or database tips to show big data information?

I'm having a problem with an application built in .NET Core (C#) and SQL Server 2017, with AngularJS version 1.x on the frontend.
The problem is the following: we have very big tables, with millions of records. Even a simple SELECT COUNT on one of these tables takes too long. We execute the query directly from the code without passing through any ORM library, but even without an ORM the queries take too long.
I was asking myself if there is a better way to query these giant tables (external tools, another type of database, etc.), since in many cases you need to show reports and statistics graphs.
One possible strategy is to use table partitioning with a partition function that matches your business needs. With this you can split the data in a table among many files, reducing the number of rows to scan.
See this link for detailed info.
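For illustration, a minimal sketch of such a setup, run from C# since the rest of the stack is .NET; the function/scheme names, date boundaries, and connection string are hypothetical:
using System.Data.SqlClient;

class PartitionSetup
{
    static void Main()
    {
        // Hypothetical monthly partitioning on a date column.
        string[] ddl =
        {
            @"CREATE PARTITION FUNCTION pfByMonth (date)
                  AS RANGE RIGHT FOR VALUES ('2017-01-01', '2017-02-01', '2017-03-01')",
            @"CREATE PARTITION SCHEME psByMonth
                  AS PARTITION pfByMonth ALL TO ([PRIMARY])"
        };

        using var conn = new SqlConnection(
            "Server=.;Database=BigDb;Integrated Security=true");
        conn.Open();
        foreach (var stmt in ddl)
        {
            using var cmd = new SqlCommand(stmt, conn);
            cmd.ExecuteNonQuery();
        }
        // Large tables are then created (or rebuilt) ON psByMonth(CreatedAt),
        // so a report filtered to one month touches only that partition.
    }
}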
OLTP databases like SQL Server are not designed for handling OLAP (aggregate) queries in real time on large datasets. Typical workarounds are:
Limit the number of aggregated rows with extra WHERE conditions, and add indexes on those columns. This is usually possible with historical data like orders, event logs etc. - show reports only for the last month or year (see the sketch after this list).
Use materialized views for reports that don't need much detail.
Configure a read-only replica of SQL Server, possibly add columnstore indexes, and use it for OLAP queries.
Replicate your SQL Server data to a specialized (possibly distributed) analytical database that can handle OLAP queries in real time (like Amazon Redshift, Vertica, MongoDB, ElasticSearch, Yandex ClickHouse etc).
If reports can be configured by end users, ensure that your ROLAP-like engine produces efficient SQL GROUP BY queries.
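Here's a minimal sketch of that first workaround, assuming a hypothetical Orders table with a CreatedAt column and a supporting index:
using System;
using System.Data.SqlClient;

class MonthlyCount
{
    static void Main()
    {
        // One-time setup: CREATE INDEX IX_Orders_CreatedAt ON dbo.Orders (CreatedAt);
        // The WHERE condition turns a full table scan into a range seek on the index.
        const string sql = @"SELECT COUNT_BIG(*)
                             FROM dbo.Orders
                             WHERE CreatedAt >= @from";

        using var conn = new SqlConnection(
            "Server=.;Database=BigDb;Integrated Security=true");
        conn.Open();
        using var cmd = new SqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("@from", DateTime.Today.AddMonths(-1));
        Console.WriteLine($"Orders in the last month: {cmd.ExecuteScalar()}");
    }
}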

Best way to transfer data between SQL Servers

I have a general in-office database and a bunch of laptop-installed SQL Server Express databases of the same structure. In these databases I log information, which is partitioned by the id of a specific event. The laptops are taken by employees mostly to service one event (2-3 at the most sometimes), and obviously I have an eventId column in all tables except the dictionaries in the databases (though it rarely goes above 3 on the laptop databases).
When the employees come back to the office, they need to back up the last event's data to the office SQL Server, where eventIds can number in the hundreds or thousands - this is the history of all past events.
Another reason for my question is I would like to be able to make an eventId-based selective copy of an existing database to a different server for development purposes (from local to SQL Azure actually).
What is the efficient way of moving such data between servers? I have approximately 50-100 tables with keys and references.
What we do at the moment: selectively copy the data to a new clean database on the source server (only whatever is required for copying); back up this new DB; restore it on the target server; then copy the data between databases using dynamic T-SQL with the database name as a parameter. What I don't like is the dynamic T-SQL - no syntax check at all until runtime.
Another option is loading the data into RAM with a C# program acting as a proxy, with RAM as intermediate storage, and transferring it to the new server. But that may be too resource-consuming, since the data may be large - challenging, and in some situations impossible (too much data).
I started looking into SSIS, but it seems that the general advice is to use SSIS when copying is done in a fixed environment ("in-house"). We don't supply software to a vast number of customers, but there is certainly more than one, and we would need to pass connection information dynamically from somewhere, since we can't visit all our customers to prepare a setup for them on location.
Any other suggestions?
Any help appreciated!
Here is what I would recommend.
Create an SSIS package. Install it on the laptops' SQL Server Express instances.
You can use a configuration file that will allow you to select which event IDs you want to export.
You can also keep all of your connection information in the configuration file. For those non in-house customers, you can remove the configuration file between exports.
That would be my recommendation.
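If it helps, here's a hedged sketch of driving such a package from C# with the Microsoft.SqlServer.Dts.Runtime API; the package path and the EventId variable are placeholders your package would have to define:
using System;
using Microsoft.SqlServer.Dts.Runtime;

class RunExport
{
    static void Main(string[] args)
    {
        // Load the package deployed to the laptop and pass in the event to export.
        var app = new Application();
        Package pkg = app.LoadPackage(@"C:\Packages\ExportEvent.dtsx", null);
        pkg.Variables["EventId"].Value = int.Parse(args[0]);

        DTSExecResult result = pkg.Execute();
        Console.WriteLine($"Export finished: {result}");
    }
}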

Multiple MDF files vs single database (SQL Server)

I'm working on a Web 2.0 project that I expect to grow by thousands of rows per day from users.
To handle this amount of data I designed the database like this:
one .mdf and .ldf file per "minor" database, and one major DB to store and query the user accounts and the file paths of the minor databases.
I have worked several months on this plan and now I can manage it easily.
I want to know whether this is a good way to handle a huge amount of independent data.
Which has better performance in your opinion: opening connections to many small .mdf files, or one huge database?
Afterwards I'll spread the MDF repository across several computers.
All of it is handled with C# and LINQ (.NET 4).
// Later description
I built this plan and it works fine.
For example: opening each small MDF file takes about 1 second, and querying it takes almost no time. That is a fixed cost per connection, whereas in a single database the system must find those 50 rows among, say, 200,000 rows, which takes about 4-5 seconds on my system with a simple SELECT on the primary key.
In another case I want to fetch one row out of 500,000 to bind the page content, select 50 comments out of 2 million rows, get the vote count for every comment, the view count per day, week, month and in total, the like count, the replies to each comment, and more data from 2-3 other tables. These queries are heavy and take more time than on a small slave database.
I think a good design and process should make things easy for the system.
The only problem is that the small slave databases take more physical space as SQL Server files, about 3 MB per database.
There is no reason to split something that could/should exist as a single database in to multiple independent parts.
There are already mechanisms to partition a single logical database across multiple files: Files and Filegroups Architecture as well as to partition large tables (A few thousand rows per day doesn't really qualify as a large table).
"Thousands of rows per day" should be pocket change for Sql Server.
First, I voted up Alex K's answer. Filegroups will most likely get you to where you want to be. Partitioned tables may be overkill; they are only available in the Enterprise edition and are not for the faint-hearted.
What I will add is:
http://www.google.com/#q=glenn+berry+dmv&bav=on.2,or.r_gc.r_pw.&fp=73d2ceaabb6b01bf&hl=en
You need to tweak your indexes. In the good vs. better vs. best category, Glenn Berry's DMV queries are "better". Those queries will help you fix the majority of issues.
In the "best" category is pain staking looking at each stored procedure, and looking at the execution plan and trying out different things. This is what a good dba is able to provide.
Here are some "basics" on file setup considerations. Pay attention the TEMP database setup.
http://technet.microsoft.com/en-us/library/cc966534.aspx
It's difficult to manage many small MDF files; you should go with a single SQL Server database. SQL Server Express provides 10 GB of storage per database, so it's easier.

What's the best way to compare large amounts of data between two different databases?

I have one desktop application receiving data from a webservice and storing it inside a local PostgreSQL database (while the webservice retrieves data from a SQL Server database). At the end of the process there will be a minimum of 2.5 million entries in a table in my local database, but these will be received from the webservice in batches of about 300 rows at a time, within a time frame of about 15 days.
What I need is a way to make sure that my local database has the exact same information the server's database has.
I'm thinking of creating some sort of checksum for each batch received and then, after all batches have been received, another checksum of the entire table, but I don't know if this is the best solution and, if it is, I don't know where to start.
PS: TCP already handles integrity checks, so I don't even know if this is needed, but it is critical that the data are the same.
I can see how a checksum could possibly be useful, but the amount of transformation you're doing would probably make it impractical. You'd have to derive the checksum on either the original form of the data or on the transformed form; it wouldn't be valid on both.
You have some strange constraints (been there myself), so it's kind of hard to come up with a clear strategy without knowing all the details. Maybe one of the following suggestions would work.
1. A simple count(*) on the SQL Server side and on the PostgreSQL side after the migration is complete (see the sketch below).
2. Dump out a list of keys from the SQL Server side and from the PostgreSQL side after the migration is complete, and then sort and compare those files.
3. If 1 and 2 aren't possible because of limited access to SQL Server, maybe dump out the results of the web service calls to a single file location as you go along, then extract the same data from PostgreSQL at the end, and compare those files.
There are numerous tools available for comparing files if you choose options 2 or 3.
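For instance, a minimal sketch of suggestion 1, assuming Npgsql on the PostgreSQL side; the table names and connection details are placeholders:
using System;
using System.Data.SqlClient;
using Npgsql;

class RowCountCheck
{
    static void Main()
    {
        long sqlServerCount, postgresCount;

        // Count on the source (SQL Server) side.
        using (var conn = new SqlConnection("Server=source;Database=Src;Integrated Security=true"))
        {
            conn.Open();
            using var cmd = new SqlCommand("SELECT COUNT_BIG(*) FROM dbo.Entries", conn);
            sqlServerCount = (long)cmd.ExecuteScalar();
        }

        // Count on the local (PostgreSQL) side.
        using (var conn = new NpgsqlConnection("Host=localhost;Database=local;Username=app;Password=secret"))
        {
            conn.Open();
            using var cmd = new NpgsqlCommand("SELECT count(*) FROM entries", conn);
            postgresCount = (long)cmd.ExecuteScalar();
        }

        Console.WriteLine(sqlServerCount == postgresCount
            ? $"OK: {postgresCount} rows on both sides"
            : $"MISMATCH: {sqlServerCount} vs {postgresCount}");
    }
}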
Do you have control over the web service and the SQL Server DB? If you do, SQL Server Change Tracking should do the trick (see MSDN: Change Tracking). It will track every change (or just the changes you care about) on a per-table basis. Each time you synchronize, you just pass it your version number and it will return the changeset required to bring you up to date.
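A minimal sketch of a sync pass, assuming change tracking is already enabled and that the dbo.Entries table and its Id key column are placeholders:
using System;
using System.Data.SqlClient;

class ChangeSync
{
    static void Main(string[] args)
    {
        // One-time setup on the server:
        //   ALTER DATABASE Src SET CHANGE_TRACKING = ON;
        //   ALTER TABLE dbo.Entries ENABLE CHANGE_TRACKING;
        long lastSyncVersion = long.Parse(args[0]); // version saved from the previous sync

        const string sql = @"
            SELECT ct.Id, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
            FROM CHANGETABLE(CHANGES dbo.Entries, @lastVersion) AS ct";

        using var conn = new SqlConnection("Server=source;Database=Src;Integrated Security=true");
        conn.Open();
        using var cmd = new SqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("@lastVersion", lastSyncVersion);
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            Console.WriteLine($"{reader["SYS_CHANGE_OPERATION"]} row {reader["Id"]}");
        // Afterwards, persist SELECT CHANGE_TRACKING_CURRENT_VERSION() as the new baseline.
    }
}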

Cross-referencing across multiple databases

I have two databases, one is an MS Access file, the other is a SQL Server database. I need to create a SELECT command that filters data from the SQL Server database based on the data in the Access database. What is the best way to accomplish this with ADO.NET?
Could I pull the required data from each database into two new tables, put these in a single DataSet, and then perform another SELECT command on the DataSet to combine the data?
Additional Information:
The Access database is not permanent. The Access file to use is set at runtime by the user.
Here's a bit of background information to explain why there are two databases. My company uses a CAD program to design buildings. The program stores materials used in the CAD model in an Access database. There is one file for each model. I am writing a program that will generate costing information for each model. This is based on current material prices stored in a SQL Server database.
My Solution
I ended up just importing the data in the Access DB into a temporary table in the SQL Server DB, performing all the necessary processing, and then removing the temporary table. It wasn't a pretty solution, but it worked.
You don't want to pull both datasets across if you don't have to do that. You are also going to have trouble implementing Tomalak's solution since the file location may change and might not even be readily available to the server itself.
My guess is that your users set up an Access database with the people/products or whatever that they are interested in working with and that's why you need to select across the two databases. If that's the case, the Access table is probably smaller than the SQL Server table(s). Your best bet is to pull in the Access data, then use that to generate a filtered query to SQL Server so that you can minimize the data that is sent over the network.
So, the most important things are:
Filter the data ON THE SERVER so that you can minimize network traffic and also because the database is going to be faster at filtering than ADO.NET
If you have to choose a dataset to pull into your application, pull in the smaller dataset and then use that to filter the other table.
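Here's a minimal sketch of that approach for the costing scenario, with hypothetical table and column names on both sides:
using System;
using System.Collections.Generic;
using System.Data.OleDb;
using System.Data.SqlClient;

class CostingQuery
{
    static void Main()
    {
        // 1. Pull the (small) material list from the Access file chosen at runtime.
        var materialIds = new List<int>();
        var accessConn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Models\Building1.mdb";
        using (var conn = new OleDbConnection(accessConn))
        {
            conn.Open();
            using var cmd = new OleDbCommand("SELECT MaterialId FROM Materials", conn);
            using var reader = cmd.ExecuteReader();
            while (reader.Read())
                materialIds.Add(reader.GetInt32(0));
        }

        // 2. Build a parameterized IN list so SQL Server filters on its side
        //    and only the matching price rows cross the network.
        var paramNames = new List<string>();
        using var sqlConn = new SqlConnection("Server=.;Database=Prices;Integrated Security=true");
        sqlConn.Open();
        using var sqlCmd = sqlConn.CreateCommand();
        for (int i = 0; i < materialIds.Count; i++)
        {
            paramNames.Add($"@p{i}");
            sqlCmd.Parameters.AddWithValue($"@p{i}", materialIds[i]);
        }
        sqlCmd.CommandText =
            $"SELECT MaterialId, Price FROM dbo.MaterialPrices WHERE MaterialId IN ({string.Join(",", paramNames)})";

        using var priceReader = sqlCmd.ExecuteReader();
        while (priceReader.Read())
            Console.WriteLine($"{priceReader.GetInt32(0)}: {priceReader.GetDecimal(1)}");
    }
}
One caveat with the parameterized IN list: SQL Server caps a command at around 2,100 parameters, so very large key sets would need batching or a bulk-loaded temp table instead.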
Assuming Sql Server can get to the Access databases, you could construct an OPENROWSET query across them.
SELECT a.*
FROM SqlTable a
JOIN OPENROWSET(
'Microsoft.Jet.OLEDB.4.0',
'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';'admin';'',
Orders
) AS b ON
a.Id = b.Id
You would just change the path to the Access database at runtime to get to different MDBs.
First you need to do something on the server - reference the Access DB as a "Linked Server".
Then you will be able to query it from within SQL Server, pulling out or stuffing in data however you like. This web page gives a nice overview of how to do it.
http://blogs.meetandplay.com/WTilton/archive/2005/04/22/318.aspx
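For reference, a minimal sketch of the one-time setup; the linked server name and path are placeholders, and the path must be visible to the SQL Server machine, not the client:
using System.Data.SqlClient;

class LinkedServerSetup
{
    static void Main()
    {
        const string sql = @"
            EXEC sp_addlinkedserver
                @server     = N'ACCESS_MODEL',
                @srvproduct = N'Access',
                @provider   = N'Microsoft.Jet.OLEDB.4.0',
                @datasrc    = N'C:\Models\Building1.mdb';";

        using var conn = new SqlConnection("Server=.;Integrated Security=true");
        conn.Open();
        using var cmd = new SqlCommand(sql, conn);
        cmd.ExecuteNonQuery();
        // Afterwards the Access tables are reachable via four-part names, e.g.
        //   SELECT * FROM ACCESS_MODEL...Materials
    }
}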
If I read the question correctly, you are NOT attempting to cross reference across multiple databases.
You need merely to reference details about a particular FILE, which in this case, could contain:
primary key, parent file checksum (if it is a modification), file checksum, last known author, revision number, date of last change...
And then use that primary key when adding information obtained from analysing that file with your program.
If you actually do need a distributed database, perhaps you would prefer to use a non-relational database such as LDAP.
If you can't use LDAP but must use a relational database, you might consider using GUIDs to ensure that your primary keys are good.
Since you don't give enough information, I'm going to have to make some assumptions.
Assuming:
The SQL Server and the Access Database are not on the same computer
The SQL Server cannot see the Access database over a file share (or making that work would be too difficult).
You don't need to do joins between the Access database and SQL Server; you only use data from the Access database as lookup elements in your WHERE clause.
If the above assumptions are correct, then you can simply use ADO.NET to open the Access database and retrieve the data you need, possibly into a DataSet or DataTable. Then extract the data you need and feed it to a different ADO.NET query against your SQL Server, in a dynamic WHERE clause, a prepared statement, or via parameters to a stored procedure.
The other solutions people are giving all assume you need to do joins on your data or otherwise execute SQL which includes both databases. To do that, you have to use linked databases, or else import the data into a table (perhaps temporary).
Have you tried benchmarking what happens if you link from the Access front end to your SQL Server via ODBC and write your SQL as though both tables are local? You could then do a trace on the server to see exactly what Jet sends to the server. You might be surprised as to how efficient Jet is with this kind of thing. If you're joining on a key field (e.g., an ID field, whether from the SQL Server or not), it would likely be the case that Jet would send a list of the IDs. Or you could write your SQL to do it that way (using IN SELECT ... in your WHERE clause).
Basically, how efficient things will be depends on where your WHERE clause is going to be executed. If, for instance, you are joining a local Jet table with a linked SQL Server table on a single field, and filtering the results based on values in the local table, it's very likely to be extremely efficient, in that the only thing Jet will send to the server is whatever is necessary to filter the SQL Server table.
Again, though, it's going to depend entirely on exactly what you're trying to do (i.e., which fields you're filtering on). But give Jet a chance to see if it is smart, as opposed to assuming off the bat that Jet will screw it up. It may very well require some tweaking to get Jet to work efficiently, but if you can keep all your logic client-side, you're better off than trying to muck around with tracking all the Access databases from the server.
