I'm a C++ programmer and I'm not familiar with the .NET database model. I usually use IDataReader (OdbcDataReader, OleDbDataReader or SqlDataReader) to read data from a database. Sometimes, when I need a bulk of data, I use a DataAdapter, but what should I do to get the functionality of the scrollable cursors that exist in native libraries like ODBC?
Thank you all for your answers, but I am in a situation where I can't accept them. Of course this is my fault, because I didn't explain my problem completely. I explained it in a comment on one of the answers, which has since been removed.
I have to write a program that will act as a proxy between a client-side program and MSSQL. For this library I have the following requirements:
My program should be compatible with MSSQL 2000.
I don't know all the tables and queries that will be sent by the user. I should simply add some information to each query, write a log entry, and so on, and then execute it against MSSQL, so it is really hard to use techniques that are based on the ordered field(s) of the query or the primary key of the table (all my work is in one database, but that database is huge and may change over time).
Only a part of the data is needed by the client. Most DBMSs support LIMIT OFFSET, but unfortunately MSSQL does not, and ROW_NUMBER does not exist in MSSQL 2000. Even if it were supported, I would again need to understand the program logic, and that requires parsing the SQL command (I actually wrote a parsing library with boost::spirit, but that's native code, and besides that I'm not yet 100% sure about its functionality).
I may have multiple clients, but most of the queries they send are one of a few predefined queries (of course users still send custom queries, but those are about 30% of all queries), so I think I can open some scrollable cursors and respond to clients using those cursors and a custom cache.
The server machine and its MSSQL instance will be dedicated to my program, so I really want to use all of the power of the server and DBMS to achieve my functionality.
So now:
What is the problem with using scrollable cursors, and why should I avoid them?
How can I use scrollable cursors in .NET?
In SQL Server you can write paged queries as shown below. The page number is easy to handle from the application. You do not need to create cursors for this task.
For SQL Server 2005 or higher
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS ROW FROM TABLEA ) AS ALIAS
WHERE ROW > 40
AND ROW <= 49
For SQL Server 2000
SELECT TOP 10 T.* FROM TABLA AS T WHERE T.ID NOT IN
( SELECT TOP 39 id from tabla order by id desc )
ORDER BY T.ID DESC
PS: edited to include support for SQL Server 2000
I usually use DataReader.Read() to skip all the rows that I do not want when doing paging on a DB which does not support paging.
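The "read and discard" approach can be sketched as follows. This is only an illustration in Python, with the standard-library sqlite3 module standing in for ADO.NET and SQL Server; the `items` table and the `fetch_page` helper are made up for the example. Iterating the cursor plays the role of calling DataReader.Read().

```python
import sqlite3
from itertools import islice

def fetch_page(conn, query, page, page_size):
    """Return one page of rows by reading and discarding the rows before it."""
    cursor = conn.execute(query)
    start = page * page_size
    # islice consumes and drops the first `start` rows, then yields page_size rows.
    return list(islice(cursor, start, start + page_size))

# Hypothetical sample data: 100 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item{i}",) for i in range(1, 101)])

# Pages are zero-based here, so page 4 holds rows 41 to 50.
page = fetch_page(conn, "SELECT id, name FROM items ORDER BY id", page=4, page_size=10)
print(page[0], page[-1])
```

The skipped rows still travel from the server to the client, so the cost grows with the page number; that is the price of not having to rewrite the query.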
If you don't want to build the SQL paged query yourself you are free to use my paging class: https://github.com/jgauffin/Griffin.Data/blob/master/src/Griffin.Data/BasicLayer/Paging/SqlServerPager.cs
When Microsoft designed the ADO.NET API, they made the decision to expose only firehose cursors (IDataReader etc). This may or may not actually pose a problem for you. You say that you want "functionality of scrollable cursors", but that can mean all sorts of things, not just paging, and each particular use case can be tackled in a variety of ways. For example:
Requirement: The user should be able to arbitrarily page up and down the resultset.
Retrieve only one page of data at a time, e.g. using the ROW_NUMBER() function. This is more efficient than scrolling through a cursor.
Requirement: I have an extremely large data set and I only want to process one row at a time to avoid running out of memory.
Use the firehose cursor provided by ADO.NET. Note that this is only practical if (a) you don't need to hit the database at all during the loop, or (b) you have MARS configured in your connection string.
Simulate a keyset cursor by retrieving the set of unique identifiers into an array, then loop through the array and read one row of data at a time.
Requirement: I am doing a complicated calculation that involves moving forwards and backwards through the resultset.
You should be able to re-write your algorithm to eliminate this requirement. For example, read one set of rows, process them, read another set of rows, process them, etc.
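The "simulate a keyset cursor" idea mentioned above can be sketched like this. It is a minimal illustration in Python with sqlite3 as a stand-in database; the `orders` table and the `fetch_at` helper are hypothetical, and it assumes the result has a unique identifier column.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 21)])

# Step 1: capture the keyset (only the identifiers) in the desired order.
keys = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]

def fetch_at(position):
    """Fetch the row at an absolute keyset position; None if it was deleted since."""
    return conn.execute("SELECT id, amount FROM orders WHERE id = ?",
                        (keys[position],)).fetchone()

# Step 2: move freely forwards and backwards, one row at a time.
first = fetch_at(0)
last = fetch_at(len(keys) - 1)
middle = fetch_at(9)
print(first, last, middle)
```

Only the identifiers are held in memory, and each row is read fresh when it is visited, which is roughly how a keyset cursor behaves: membership and order are fixed when the keys are captured, while the row values are current.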
UPDATE (more information provided in the question)
Your business requirements are asking too much. You have to handle arbitrary queries that assume the presence of scrollable cursors, but you can't provide scrollable cursors, and you can't re-write the client code to not use scrollable cursors. That's an impossible position to be in. I recommend you stick with what you currently have (C++ and ODBC) and don't bother trying to re-write it in .NET.
I don't think cursors will work for your particular case. The main reason is that you have 3 tiers. But let's take two steps back.
Most 3 tier applications have a stateless middle tier (your C++ code). Caching is fine, since it is really just an optimization and does not create any real state in the middle tier. The middle tier normally keeps a small number of open sessions to the database, because opening a DB session is expensive for the processor, and once a session is open a set amount of RAM is reserved on the database server. When a request is received by the middle tier, the request is processed and handed on to the SQL database. An algorithm may be used to pick any of the open sessions, or it can even be done at random. In this model it is not possible to know which session will receive the next request. Cursors belong to the session that received the original query request, so you can't really expect that the receiving session will be the one that has your open cursor.
The 3 tier model I described is used mainly for web applications so they can scale to hundreds or thousands of clients, since SQL Server would never be able to open that many sessions. Microsoft ADO.NET already has many features to support the kind of architecture I described, so it is not very hard to implement. The same model is used even in non-web applications, depending on the circumstances. You could potentially keep track of your sessions so that you open a single session per client, but I would first make sure that the use case justifies that. Know that open cursors can take up a lot of resources as well.
Cursors still have a place within a single transaction, it's just hard to keep them open so that the client application can fetch/update values within the result set.
What I would suggest is that you do the following within the query transaction. Store the primary key values of the main table in your query in a separate table. In that separate table, include other values such as a session ID and a row number. Return the first few rows by joining the original query to the new table. In subsequent calls, just query the corresponding rows again by joining to your new table. You will need something equivalent to a caching mechanism to purge old data and to refresh the result set according to your needs.
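A rough sketch of that key-snapshot table, written in Python with sqlite3 as a stand-in for SQL Server. The table names (`customers`, `result_keys`), the column names, and the three helper functions are assumptions made for the example, not part of the original suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer{i}") for i in range(1, 51)])
# The separate table: one numbered primary key per result row, per session.
conn.execute("""CREATE TABLE result_keys (
    session_id TEXT, row_num INTEGER, customer_id INTEGER,
    PRIMARY KEY (session_id, row_num))""")

def open_result(session_id):
    """Snapshot the primary keys of the query result, numbered in result order."""
    ids = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
    conn.executemany(
        "INSERT INTO result_keys (session_id, row_num, customer_id) VALUES (?, ?, ?)",
        [(session_id, n, cid) for n, cid in enumerate(ids, start=1)])

def fetch_rows(session_id, first, last):
    """Return rows first..last (inclusive) by joining back to the key table."""
    return conn.execute(
        """SELECT k.row_num, c.id, c.name
           FROM result_keys k JOIN customers c ON c.id = k.customer_id
           WHERE k.session_id = ? AND k.row_num BETWEEN ? AND ?
           ORDER BY k.row_num""", (session_id, first, last)).fetchall()

def close_result(session_id):
    """Purge the snapshot once the client is done (or the cache entry expires)."""
    conn.execute("DELETE FROM result_keys WHERE session_id = ?", (session_id,))

open_result("client-1")
page = fetch_rows("client-1", 11, 15)
print(page)
close_result("client-1")
```

Because the snapshot lives in an ordinary table, any database session can serve the next page, which sidesteps the problem of cursors being tied to one session. Here the rows are numbered on the client for brevity; on the server the same numbering could come from an identity column on the snapshot table.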
Related
I don't know whether it is better to use temporary tables in SQL Server or use the DataTable in C# for a report. Here is the scope of the report: it will be copied into a workbook with about 10 worksheets - each worksheet containing about 1000 rows and about 30 columns so it's a lot of data. There is some guidance out there but I could not find anything specific regarding the amount of data that is too much for a DataTable. According to https://msdn.microsoft.com/en-us/library/system.data.datatable.aspx, 16M rows but my data set seems unwieldy considering the number of columns I have. Plus, I will either have to make multiple SQL queries to collect the data in my report or try to write a stored procedure in SQL to collect that data. How do I figure out this quandary?
My rule of thumb is that if it can be processed on the database server, it probably should be. Keep in mind that, no matter how efficient your C# code is, SQL Server will most likely do it faster and more efficiently; after all, it was designed for data manipulation.
There is no shame in using #temp tables. They maintain stats and can be indexed and/or manipulated. One recent example: a developer created an admittedly elegant query using a CTE, but it ran in 12-14 seconds versus 1 second for mine using #temp tables.
Now, one carefully structured stored procedure could produce and return the 10 data-sets for your worksheets. If you are using a product like SpreadSheetLight (there are many options available), it becomes a small matter of passing the results and creating the tabs (no cell level looping... unless you want or need to).
I would also like to add, you can dramatically reduce the number of touch points and better enforce the business logic by making SQL Server do the heavy lifting. For example, a client introduced a 6W risk rating, which was essentially a 6.5. HUNDREDS of legacy reports had to be updated, while I only had to add the 6W into my mapping table.
There's a lot of missing context here - how is this report going to be accessed and run? Is this going to run as a scripted event every day?
Have you considered SSRS?
In my opinion it's best to abstract away your business logic by creating Views or Stored Procedures in the database. Stored Procedures would probably be the way to go but it really depends on your specific environment. Then you can point whatever tools you want to use at the database object. This has several advantages:
if you end up having different versions or different formats of the report, and your logic ever changes, you can update the logic in one place rather than many.
your code is simpler and cleaner, typically:
select v.col1, v.col2, v.col3
from MY_VIEW v
where v.date between @startdate and @enddate
I assume your 10 spreadsheets are going to be something like
Summary Page | Department 1 | Department 2 | ...
So you could make a generalized View or SP, create a master spreadsheet linked to the db object that pulls all the relevant data from SQL, and use Pivot Tables or filters or whatever else you want, and use that to generate your copies that get sent out.
But before going to all that trouble, I would make sure that SSRS is not an option, because if you can use that, it has a lot of baked in functionality that would make your life easier (export to Excel, automatic date parameters, scheduled execution, email subscriptions, etc).
I have an ERP database "A" on which I have only read permission, so I can't create triggers on its tables.
A belongs to the ERP system (a program unknown to me). I have another database "B" that is private to my application, and this application works with both databases. I want to reflect A's changes (any insert/update/delete) instantly in B.
Is there any functionality in C# that works exactly as a trigger works in a database?
You have a few solutions; the best one depends on which kind of database you have to support.
Generic solution, changes in A database aren't allowed
If you can't change master database and this must work with every kind of database then you have only one option: polling.
You shouldn't check too often (so forget about doing it more or less instantly), to save network traffic, and it's better to do it in different ways for insert/update/delete. What you can do depends on how the database is structured, for example:
Insert: to catch an insert you may simply check for highest row ID (assuming what you need to monitor has an integer column used as key).
Update: for updates you may check a timestamp column (if it's present).
Delete: this may be trickier to detect. A first check would be to count the number of rows: if it has changed and no insert occurred, then you have detected a delete; otherwise just subtract the number of inserts.
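The three checks above could be combined into one polling step along these lines. This is a sketch in Python with sqlite3 standing in for database A; the `parts` table, its integer `modified` column (used in place of a real timestamp column), and both helper functions are assumptions made for the example.

```python
import sqlite3

def snapshot(conn):
    """Summarise the table: highest id, newest modification stamp, row count."""
    return conn.execute(
        "SELECT COALESCE(MAX(id), 0), COALESCE(MAX(modified), 0), COUNT(*) FROM parts"
    ).fetchone()

def detect_changes(conn, previous):
    """Compare the table with an earlier snapshot and count each kind of change."""
    prev_id, prev_modified, prev_count = previous
    count = conn.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
    # Inserts: rows whose id is above the highest id seen last time.
    inserts = conn.execute(
        "SELECT COUNT(*) FROM parts WHERE id > ?", (prev_id,)).fetchone()[0]
    # Updates: previously known rows with a newer modification stamp.
    updates = conn.execute(
        "SELECT COUNT(*) FROM parts WHERE id <= ? AND modified > ?",
        (prev_id, prev_modified)).fetchone()[0]
    # Deletes: whatever the row count does not account for after adding the inserts.
    deletes = prev_count + inserts - count
    return {"inserts": inserts, "updates": updates, "deletes": deletes}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, modified INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [(1, "bolt", 1), (2, "nut", 2), (3, "washer", 3)])
before = snapshot(conn)

# Simulate activity in the monitored database between two polls.
conn.execute("INSERT INTO parts VALUES (4, 'screw', 4)")
conn.execute("UPDATE parts SET name = 'hex bolt', modified = 5 WHERE id = 1")
conn.execute("DELETE FROM parts WHERE id = 2")

changes = detect_changes(conn, before)
print(changes)
```

This only reports how many deletes happened, not which rows were deleted; finding those still requires comparing key lists between A and B. It also assumes ids are never reused, which is worth verifying for the real table.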
Generic solution, changes in A database are allowed
If you can change the original database you can decrease network traffic (and complexity) using triggers on database side, when a trigger is fired just put a record in an internal log table (just few columns: one for the change type, one for affected table, one for affected record).
You will need to poll only on this table (using a simple query to check if number of rows increased). Because action (insert/update/delete) is stored in the table you just need to switch on that column to execute proper action.
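A small sketch of the trigger-fed log table and its poller, again in Python with sqlite3 as a stand-in. The `parts` and `change_log` tables, the trigger names, and the `poll` helper are hypothetical. Instead of counting rows, this version remembers the last sequence number it has seen, which is a slight variation on the check described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT);
-- Log table: one row per change, with change type, table and affected record.
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    action TEXT, table_name TEXT, record_id INTEGER);
CREATE TRIGGER parts_ins AFTER INSERT ON parts BEGIN
    INSERT INTO change_log (action, table_name, record_id)
    VALUES ('insert', 'parts', NEW.id);
END;
CREATE TRIGGER parts_upd AFTER UPDATE ON parts BEGIN
    INSERT INTO change_log (action, table_name, record_id)
    VALUES ('update', 'parts', NEW.id);
END;
CREATE TRIGGER parts_del AFTER DELETE ON parts BEGIN
    INSERT INTO change_log (action, table_name, record_id)
    VALUES ('delete', 'parts', OLD.id);
END;
""")

def poll(last_seq):
    """Return (new_last_seq, entries) for log rows written after last_seq."""
    rows = conn.execute(
        "SELECT seq, action, table_name, record_id FROM change_log "
        "WHERE seq > ? ORDER BY seq", (last_seq,)).fetchall()
    return (rows[-1][0] if rows else last_seq), rows

conn.execute("INSERT INTO parts (id, name) VALUES (1, 'bolt')")
conn.execute("UPDATE parts SET name = 'nut' WHERE id = 1")
conn.execute("DELETE FROM parts WHERE id = 1")

last_seq, entries = poll(0)
actions = [entry[1] for entry in entries]
print(last_seq, actions)
```

The consumer would switch on the `action` column of each entry to apply the matching insert, update or delete to its own database.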
This has a big disadvantage (in my point of view): it puts logic related to your application inside the master database. This may be terrible or not but it depends on many many factors.
SQL Server/Vendor specific
If your application is tied to Microsoft SQL Server, you can use the SqlDependency class to track changes. It works for SQL Server only, but I think there may be implementations for other databases. The disadvantage is that this will always be specific to one vendor (so if database A changes host, you'll have to change your code too).
From MSDN:
SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server.
Anyway if you're using SQL Server you have other options, just follow links in MSDN documentation.
Addendum: if you need a more fine control you may check TraceServer and Object:Altered (and friends) classes. This is even more tied to Microsoft SQL Server but it should be usable on a more wide context (and you may keep your applications unaware of these things).
You may find useful, depending on your DBMS:
Change Data Capture (MS SQL)
http://msdn.microsoft.com/en-us/library/bb522489%28v=SQL.100%29.aspx
Database Change Notification (Oracle)
http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_dcn.htm
http://www.oracle.com/technetwork/issue-archive/2006/06-mar/o26odpnet-093584.html
Unfortunately, there's no SQL92 solution on data change notification
Yes. There is an excellent post here, please check it out:
http://devzone.advantagedatabase.com/dz/webhelp/advantage9.1/mergedprojects/devguide/part1point5/creating_triggers_in_c_with_visual_studio_net.htm
If this post solves your question, then mark it as answered.
Thanks
I have a lot of data which needs to be paired based on a few simple criteria. There is a time window (both records have a DateTime column), if one record is very close in time (within 5 seconds) to another then it is a potential match, the record which is the closest in time is considered a complete match. There are other fields which help narrow this down also.
I wrote a stored procedure which does this matching on the server before returning the full, matched dataset to a C# application. My question is, would it be better to pull in the 1 million (x2) rows and deal with them in C#, or is SQL Server better suited to perform this matching? If SQL Server is, then what is the fastest way of pairing data using datetime fields?
Right now I select all records from Table 1/Table 2 into temporary tables, iterate through each record in Table 1, look for a match in Table 2 and store the match (if one exists) in a temporary table, then I delete both records in their own temporary tables.
I had to rush this piece for a game I'm writing, so excuse the bad (very bad) procedure... It works, it's just horribly inefficient! The whole SP is available on pastebin: http://pastebin.com/qaieDsW7
I know the SP is written poorly, so saying "hey, dumbass... write it better" doesn't help! I'm looking for help in improving it, or help/advice on how I should do the whole thing differently! I have about 3/5 days to rewrite it, I can push that deadline back a bit, but I'd rather not if you guys can help me in time! :)
Thanks!
Ultimately, compiling your data on the database side is preferable 99% of the time, as it's designed for data crunching (through the use of indexes, relations, etc). A lot of your code can be consolidated by the use of joins to compile the data in exactly the format you need. In fact, you can bypass almost all your temp tables entirely and just fill a master Event temp table.
The general pattern is this:
INSERT INTO #Events
SELECT <all interested columns>
FROM
FireEvent
LEFT OUTER JOIN HitEvent ON <all join conditions for HitEvent>
This way you match all fire events to zero or more HitEvents. After our discussion in chat, you can even limit it to zero or one hit event by wrapping it in a subquery and using a window function for ROW_NUMBER() OVER (PARTITION BY HitEvent.EventID ORDER BY ...) AS HitRank and add a WHERE HitRank = 1 to the outer query. This is ultimately what you ended up doing and got the results you were expecting (with a bit of work and learning in the process).
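The join-plus-ranking pattern can be tried out with a sketch like the one below, written in Python against sqlite3 (window functions need SQLite 3.25 or later). Everything specific is an assumption for illustration: the column names `FiredAt` and `HitAt`, timestamps stored as seconds, the sample rows, and the choice to partition by the fire event so that each fire event keeps at most one hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FireEvent (EventID INTEGER PRIMARY KEY, FiredAt REAL);
CREATE TABLE HitEvent  (EventID INTEGER PRIMARY KEY, HitAt REAL);
INSERT INTO FireEvent VALUES (1, 100.0), (2, 200.0), (3, 300.0);
INSERT INTO HitEvent  VALUES (10, 101.0), (11, 103.5), (12, 204.0), (13, 400.0);
""")

pairs = conn.execute("""
SELECT FireID, HitID FROM (
    SELECT f.EventID AS FireID, h.EventID AS HitID,
           -- Rank the candidate hits of each fire event, closest in time first.
           ROW_NUMBER() OVER (
               PARTITION BY f.EventID
               ORDER BY ABS(h.HitAt - f.FiredAt)
           ) AS HitRank
    FROM FireEvent f
    -- Candidates are hits within 5 seconds; unmatched fire events are kept.
    LEFT OUTER JOIN HitEvent h
      ON h.HitAt BETWEEN f.FiredAt - 5 AND f.FiredAt + 5
) AS ranked
WHERE HitRank = 1
ORDER BY FireID
""").fetchall()
print(pairs)
```

Fire events with no hit inside the window come back with a NULL hit. Note that this sketch does not stop one hit from being paired with two different fire events; if that matters, the other matching fields mentioned in the question would need to go into the join condition.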
If the data is already in the database, that is where you should do the work. You absolutely should learn to display and read query plans using SQL Server Management Studio, and become able to notice and optimize away expensive computations like nested loops.
Your task probably does not require any use of temporary tables. Temporary tables tend to be efficient when they are relatively small and/or heavily reused, which is not your case.
I would advise you to try to optimize the stored procedure if it is not running fast enough, rather than rewrite it in C#. Why would you want to transfer millions of rows out of SQL Server anyway?
Unfortunately I don't have an SQL Server installation so I can't test your script, but I don't see any CREATE INDEX statements in there. Unless you just skipped them for brevity, you should analyze your queries and see which indexes are needed.
So the answer depends on several factors like resources available per client/server (Ram/CPU/Concurrent Users/Concurrent processes, etc.)
Here are some basic rules that will improve your performance regardless of what you use:
Loading a million rows into a C# program is not good practice, unless this is a standalone process with plenty of RAM.
Uniqueidentifiers will never out perform Integers. Comparisons
Common Table Expression are a good alternative for fast performing matching. How to use CTE
Finally you have to consider output. If there is constant reading and writing that affects the user interface, then you should manage that in memory (c#), otherwise all CRUD operations should be kept inside the database.
I have developed a network application that has been in use in my company for the last few years.
At start it was managing information about users, rights etc.
Over the time it grew with other functionality. It grew to the point that I have tables with, let's say 10-20 columns and even 20,000 - 40,000 records.
I keep hearing that Access is not good for multi-user environments.
Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to SQL Server, but unfortunately that cannot be done in this case.
I was wondering if there is more reliable solution for me than using Access Database and still stay with a single-file database system.
We have quite huge system using dBase IV.
As far as I know it is fully multiuser database system.
Maybe it will be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that access will choke in multi-user scenarios--at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use access as your data store. Then if you can get beyond the company politics controlling your network, perhaps then you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into front end : back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access Database shared amongst your users will be the same with any file based database.
A read will pull a lot of data into memory and writes are guarded with some type of file lock. Under your environment it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually no. This is a common misstatement spread by folks who do not understand the nature of how Jet, the database engine inside Access, works. Pulling down all the records, or excessive number of records, happens because you don't have all the fields used in the selection criteria or sorting in the index. We've also found that indexing yes/no aka boolean fields can also make a huge difference in some queries.
What really happens is that Jet brings down the index pages and data pages which are required. While this is a lot more data than a database engine would create this is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on all at the same time before without any issues. As long as they don't access the same record then I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes editing and looking for information. To avoid any write conflicts I call the below function any time a field is changed. As long as it is not a new record being entered, then it updates.
Function funSaveTheRecord()
If ([chkNewRecord].value = False And Me.Dirty) Then
'To save the record, turn off the form's Dirty property
Me.Dirty = False
End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front End, stored on the user's machine. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle the shift key bypass on/off. Only I have access to it.
PDC_be.mdb <-- Back End, stored on the server. Contains all data. The only form and VBA it contains is to toggle the shift key bypass on/off. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically with Access, it cannot handle more than 5 people connected at the same time, or it will corrupt on you.
I'm a complete novice in database/PC applications, so please forgive my ignorance.
I'd like to capture packets to a database in real time so that multiple applications would have the ability to monitor physical I/O data being returned via udp packets from a PLC and I had a few questions.
In the long run it will need to be cross-platform, but for the time being I'm using a C# packet capture library on Windows. Any suggestions on database type, MySQL vs SQLite?
At ~1500 200-byte packets a second, is it feasible to insert a packet 1500 times a second? I've read that SQLite has some problems with concurrency; if I have an app querying the packet data in the database ~10 times a second on a 25-50ms delay, is that doable?
I expect to "only" need to store 20MB or so of data in the DB at any one time. Can the database be forced to run in memory only? When writing the packet data, can the data packet (byte array) be written in one statement rather than iteratively inserting each byte/word? I suppose I could turn it into a string but I expect that would make it nearly impossible to query with any speed. I don't see any mention of anything like a "byte array type" in any of the databases I briefly looked at. FWIW All the data is coming up to a dedicated NIC on a static IP. The packets are sequential (I know it's not guaranteed with UDP but I've never seen one out of order yet) I could stride through the data easily if the database supported an array type. -That's good right, no random searches?
Thanks for taking the time to read this.
Bob
What is the perceived advantage you're looking for in a relational database for this? Since you say you're not much into databases, here is a brief list of the usual reasons why SQL is an option; perhaps it helps you clarify your requirements and your options:
Queryability. If you want to expose the data for a rich search that includes options to filter out records, to sort results, to aggregate calculations then indeed SQL databases offer such facilities. They do not come for free though. To speed up searches a database engine has to duplicate parts of the data into several indexes, which adds to the insert/update times as all those indexes have to be maintained.
Recoverability. Databases can ensure that data is kept in a consistent state in case of a crash. Using either a write-ahead log or versioned updates, they write changes in a fashion that assures the client that, once his statement has returned, the changes it made are durable (I'm omitting a bunch of details for simplicity).
Consistency. By isolating changes between users until they explicitly commit a group of related operations the database exposes always a consistent state to a viewer. To achieve this a database will have to deploy either locking or versioning.
Scalability. Databases can take care of maintaining very large sets of data, much larger than a process viable address space. They'll use a buffer pool to keep hot pages cached and manage the underlying file-offset-to-memory-address mapping and also all the needed I/O to read from disk and write back changes. They will also present multiple files as an united storage area, thus surpassing OS file size limitations, if any.
Interoperability. Other processes can use standard libraries (ie. ODBC, ADO etc) and languages (SQL) to operate on the data, so there is no need to develop a custom library/access API.
Now, are any of these needed by your scenario? Is there something else I omitted? I'm asking these questions because what you want to achieve is not trivial. You can achieve 1500 inserts per second with relative ease, but it is much harder to do that and offer decent read performance. Also, it seems that much of what relational databases offer (consistency, recoverability, scalability) is not a goal for you. There are a number of products tuned specifically for the in-memory niche that are much faster than what you'd get from a typical disk-oriented relational database.
EDIT: I forgot you're working in C#.
First of all, are you planning to query the database from more than one computer? If so, you would want to use MySQL. Otherwise, SQLite is probably a good choice. But note that MySQL is probably necessary for multiple C# apps and an in-memory database. If you choose MySQL, use MySQL Connector/NET. For SQLite, there is System.Data.SQLite (which I've used for a WinForms app and can recommend).
You say you need to do 1500 200-byte insert statements each second. SQLite reports that it can do 50,000 per second. The key caveat is that this refers to raw inserts, not transactions. Committing a transaction slows you down, as that usually means flushing to disk.
Both SQLite (see their In-Memory Databases) and MySQL (see their MEMORY (HEAP) Storage Engine) can use in-memory databases. However, for SQLite this may defeat your goal of letting "multiple applications" access it. With SQLite, there is an undocumented (and "not guaranteed to work in future SQLite releases") way you may be able to share in-memory databases (e.g. using shared memory). It was discussed in a prior SO question; see also the linked mail message from SQLite's main author. Note that sharing a SQLite in-memory database will probably not be possible if you stick to managed code. You can definitely have a MySQL in-memory database shared between multiple clients.
Using either C# client, you should be able to insert a whole packet in a single line with a DbParameter (i.e. SQLiteParameter or MySqlParameter). Note the Value and Size properties in particular.
I don't think you need any "array type". You can simply have an incrementing primary key (INTEGER PRIMARY KEY) column and a packet content column (BLOB or TEXT). I'm not sure which of BLOB or TEXT will give you the best performance for SQLite. Your SQLite schema could look like
CREATE TABLE packets ( id INTEGER PRIMARY KEY, packet BLOB);
Then, you can easily select e.g. packets within a certain range of primary keys. Of course you could add a datetime column, but that will require indexing. For MySQL, it would be something like:
CREATE TABLE packets ( id INTEGER PRIMARY KEY, packet VARCHAR(200)) ENGINE=MEMORY;
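To make the SQLite variant concrete, here is a minimal sketch in Python's sqlite3 module rather than C#. The packet contents are fabricated filler and the batch size is simply the 1500 packets per second from the question. It shows the three points made above: an in-memory database, each packet bound as one BLOB parameter, and one commit per batch rather than per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, private to this connection
conn.execute("CREATE TABLE packets (id INTEGER PRIMARY KEY, packet BLOB)")

# 1500 fake 200-byte packets standing in for one second of captured UDP payloads.
batch = [bytes([i % 256]) * 200 for i in range(1500)]

# One transaction per batch: the batch is committed once instead of 1500 times.
with conn:
    # Each packet is bound as a single parameter; no per-byte inserts are needed.
    conn.executemany("INSERT INTO packets (packet) VALUES (?)",
                     [(p,) for p in batch])

# Sequential access: select a range of primary keys.
rows = conn.execute(
    "SELECT id, packet FROM packets WHERE id BETWEEN ? AND ? ORDER BY id",
    (101, 110)).fetchall()
print(len(rows), len(rows[0][1]))
```

A plain `:memory:` database like this one is visible only to the connection that created it, which is the sharing limitation described above.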
I hope this helps. Keep in mind profiling is the best way to be sure what works well for your app.
libpcap, wireshark round robin files
Look around, play with wireshark, look at how it achieves similar results to yours.