I created a couple of tables procedurally via C# named something like [MyTableOneCustom0] and [MyTableTwoCustom0]. When I try to return all of the values from these tables via "Open Table" in MSSQL Server Management Studio, I receive the following error:
Error Source: Microsoft.VisualStudio.DataTools
Error Message: Exception has been thrown by the target of an invocation.
However, I can still bring up all of the data via a SELECT * statement.
Does anyone know what is causing this?
Based on a similar post located at Egg Head Cafe, it looks like Management Studio will throw an exception if there are too many columns included explicitly in the query. SELECT * returns them implicitly, so there doesn't seem to be an issue.
I have over 800 columns in this table, so I'm sure this is the problem.
I hesitate to ask, but normally you would not want 800 or more columns in a table, so why did you do this? Given how databases store information, you are possibly creating many problems for yourself with a design like that in terms of data retrieval and storage. How many bytes of data would a full row have? You know there is a limit to the number of bytes that can be stored in a row (8,060 bytes per row in SQL Server), so you could be setting yourself up for issues entering data when a row exceeds that limit. It might be best to break this into separate tables even if there is a one-to-one relationship. Read in BOL about data pages and how data is stored to understand why this concerns me.
Related
Let's say I executed an SQL statement from a C# application that caused an SqlException with the following error message to be thrown:
Violation of UNIQUE KEY constraint 'MyUniqueConstraint'. Cannot insert duplicate key in object 'MyTable'. The duplicate key value is (MyValue).
I understand that the entire string comes from the database. I would like to extract the relevant parts from the message - similar to this question, but I want to actually extract data such as MyUniqueConstraint, MyTable and MyValue, not just the SQL error number.
Actually parsing the message is not an option as it is error-prone: you would have to do it for every possible error, and if the error text changes from one version of SQL Server to another, there would be serious problems.
Is it possible to reasonably obtain such structured information from the application when a database-level error occurs (ideally from the exception)?
Not long ago I faced a similar problem, and I found out that even filtering SqlExceptions based on their numbers is problematic.
Other than that, I haven't yet encountered a solution that could get data details out of a SqlException - that is, other than parsing the message. One of the problems here is that the text of the server message differs across languages and server versions.
Actually, I recall there is a system view in the SQL Server master database (sys.messages) that contains all error messages for the different languages. You can try parsing the messages based on that, but some SqlExceptions aren't even from the server side of things...
I want to say at least one remotely helpful thing here, so:
In the System.Data namespace there is DBConcurrencyException, which seems to be useful to an extent (meaning: in a single case that is not entirely related to this question). It has a Row property (the row that caused the exception to be thrown; you can use it to get the table name and the data).
Wishful thinking.
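For completeness, a minimal sketch of that single case (the adapter, DataSet and table names are made up): DBConcurrencyException is thrown by a DataAdapter.Update when an optimistic concurrency check fails, and it carries the offending row.

using System.Data;
using System.Data.SqlClient;

try
{
    adapter.Update(dataSet, "MyTable"); // adapter: a configured SqlDataAdapter
}
catch (DBConcurrencyException ex)
{
    DataRow row = ex.Row;                   // the row that caused the failure
    string tableName = row.Table.TableName; // the table name, as promised above
    object firstColumn = row[0];            // and the row's data
}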
I understand that the entire string comes from the database
Yes, it does. It is a string.
Actually parsing the message is not an option as it is error-prone
Parsing is the ONLY option, as there are only two items provided by the database: the SQL error number and the string. So either you come up with a generic crystal ball, or you work with what you have - which means parsing.
No database that I know of provides more information mysteriously hidden in the exception - and if one did, it would not be a standard approach across databases anyway.
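If you do parse, a defensive sketch would filter on the error number first and only then apply a regex to the text. The numbers are stable (2627 for a unique/primary key constraint violation, 2601 for a duplicate key in a unique index), but the pattern below assumes the English (en-US) wording of the 2627 message and will break for other languages or future versions - which is exactly the caveat raised above.

using System.Data.SqlClient;
using System.Text.RegularExpressions;

try
{
    command.ExecuteNonQuery(); // command: the SqlCommand doing the INSERT
}
catch (SqlException ex)
{
    if (ex.Number == 2627) // 2601 has different wording and would need its own pattern
    {
        Match m = Regex.Match(ex.Message,
            @"constraint '(?<constraint>[^']+)'\..*object '(?<table>[^']+)'\..*value is \((?<value>.*)\)",
            RegexOptions.Singleline);
        if (m.Success)
        {
            string constraintName = m.Groups["constraint"].Value; // MyUniqueConstraint
            string tableName = m.Groups["table"].Value;           // MyTable
            string keyValue = m.Groups["value"].Value;            // MyValue
        }
    }
}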
I have a fully working production site based on Entity Framework, and now I need to import a large amount of data into the database weekly.
The data comes in the form of text files, which I go through line by line, checking against the database to see if each record exists; if it does, I update anything that has changed, and if not, I insert it.
The problem I'm having is that it takes around 32 hours to run the full import process, and some of the files have to be manually split into smaller chunks to avoid memory issues seemingly caused by Entity Framework. I have managed to slow down the memory increase, but the last time I ran a file without splitting it, it ran for about 12 hours before running out of memory at somewhere over 1.5 GB.
So can someone suggest the best way of importing this data? I have heard of SqlBulkCopy but wasn't sure if it was the correct thing to use. Can anyone provide any examples, or suggest anything more appropriate? For instance, should I create a duplicate of the entity using standard .NET SQL commands, and possibly use a stored procedure?
Although SqlBulkCopy is handy from managed code, I reckon the fastest way is to do it in "pure" SQL - given that SqlBulkCopy doesn't easily do upserts, you would need to execute the MERGE part below anyway.
Assuming that your text file is in CSV format, that it exists on the SQL Server as "C:\Data\TheFile.txt", that line endings are normalised as CR-LF (\r\n), and that the data is ID, Value1, Value2, this SQL will insert into a staging table TheFile_Staging (which has ID, Value1, Value2 columns with compatible data types) and then update the "real" table TheFile_Table (note: code below not tested!):
truncate table TheFile_Staging

BULK INSERT TheFile_Staging FROM 'C:\Data\TheFile.txt'
WITH (FIELDTERMINATOR=',', ROWTERMINATOR='\r\n', FIRSTROW=2)
-- FIRSTROW=2 means skip row #1; use this when the first row is a header.

MERGE TheFile_Table AS target
USING (SELECT ID, Value1, Value2 FROM TheFile_Staging) AS source
ON target.ID = source.ID
WHEN MATCHED THEN
  UPDATE SET target.Value1 = source.Value1, target.Value2 = source.Value2
WHEN NOT MATCHED THEN
  INSERT (ID, Value1, Value2) VALUES (source.ID, source.Value1, source.Value2);
You can create a stored procedure and set it to run on a schedule, invoke it from code, etc. The only problem with this approach is that error handling with BULK INSERT is a bit of a mess - but as long as your incoming data is OK, it's quite fast.
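If you wrap the TRUNCATE / BULK INSERT / MERGE above in a stored procedure (dbo.ImportTheFile is a made-up name), invoking it from C# might look like this:

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.ImportTheFile", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 0; // bulk loads can exceed the 30-second default
    conn.Open();
    cmd.ExecuteNonQuery();
}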
Normally I'd add some kind of validation check in the WHERE clause of the USING() SELECT of the MERGE to take only the rows that are valid in terms of data.
It's probably also worth pointing out that the definition of the staging table should omit any non-null, primary key and identity constraints, so that the data can be read in without error, especially if there are empty fields here and there in your source data. I also normally prefer to pull in date/time data as a plain nvarchar - this way you avoid incorrectly formatted dates causing import errors, and your MERGE statement can perform a CAST or CONVERT as needed, whilst at the same time ignoring and/or logging to an error table any invalid data it comes across.
Sadly, you need to move away from Entity Framework in this kind of scenario; out of the box, EF only does line-by-line inserts. You can do interesting things like this, or you can completely disregard EF and manually code the class that will do the bulk inserts using ADO.NET (SqlBulkCopy).
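A rough sketch of the SqlBulkCopy route (the staging table name and columns follow the SQL answer above; everything else is illustrative):

using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("ID", typeof(int));
table.Columns.Add("Value1", typeof(string));
table.Columns.Add("Value2", typeof(string));

// ... parse the text file and add rows to the DataTable, ideally in chunks
// rather than holding the whole file in memory at once ...

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.TheFile_Staging";
    bulk.BatchSize = 10000;   // rows per round trip
    bulk.BulkCopyTimeout = 0; // no timeout for long loads
    bulk.WriteToServer(table);
}
// Then run the MERGE from the SQL answer (e.g. via SqlCommand) to upsert
// the staging rows into the real table.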
Edit: you can also keep the current approach if the performance is acceptable, but you will need to recreate the context periodically, not use the same context for all records. I suspect that's the reason for the outrageous memory consumption.
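A sketch of that periodic recreation (the context and file names are placeholders, and AutoDetectChangesEnabled assumes the DbContext API of EF 4.1+):

using System.IO;

int count = 0;
var context = new MyDbContext();
context.Configuration.AutoDetectChangesEnabled = false; // big win for bulk adds

foreach (string line in File.ReadLines(path))
{
    // ... look up / insert / update the entity for this line ...

    if (++count % 1000 == 0)
    {
        context.SaveChanges();
        context.Dispose();
        context = new MyDbContext(); // fresh context, empty change tracker
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}
context.SaveChanges();
context.Dispose();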
EDIT: Solution (kind of)
So, what I did had very little in common with what I originally wanted to do, but my application now works much faster (DataSets that took upward of 15 minutes to process now go through in 30-40 seconds tops). Here's roughly what I did:
- Read spreadsheet & populate DataTable/DataSet normally
- [HACK WARNING] Instead of using UpdateDataSet, I generate my own SQL queries, mostly by having a skeleton string for each type of update (e.g. String skeleton = "UPDATE ... SET ... WHERE ..."). I then consult the template database and replace the placeholder ... with the appropriate entries.
- [MORE HACK WARNING] The way I dealt with errors was by manually checking whether those errors would occur. So if I know I am about to do an insert, I'll run an error-checking command before the actual insert; what the error checker does is construct a JOIN statement, checking whether any of the entries in the user's DataSet already exist in the database. Just by executing the JOIN command, I get back a DataSet with the results, so I know that if there is anything in it, those are the errors. Then I can proceed to print them (a rough sketch follows after this list).
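A rough sketch of that error checker (all table, column and variable names are hypothetical): collect the keys from the uploaded DataTable, ask the database which of them already exist, and treat whatever comes back as the error set.

using System.Data;
using System.Data.SqlClient;
using System.Linq;

DataTable uploaded = dataSet.Tables[0]; // rows parsed from the spreadsheet
var keys = uploaded.Rows.Cast<DataRow>().Select(r => r["KeyColumn"]).ToList();
var paramNames = keys.Select((k, i) => "@k" + i).ToList();

string checkSql = "SELECT * FROM dbo.TargetTable WHERE KeyColumn IN ("
                  + string.Join(", ", paramNames) + ")";

var conflicts = new DataSet();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(checkSql, conn))
{
    for (int i = 0; i < keys.Count; i++)
        cmd.Parameters.AddWithValue(paramNames[i], keys[i]);
    new SqlDataAdapter(cmd).Fill(conflicts, "Conflicts");
}
// Any rows in conflicts.Tables["Conflicts"] would violate the PK on insert,
// so they can be reported back to the user before anything is written.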
If anyone needs more details, I'll be happy to provide them. It's a fairly specific question, so I should probably keep this outline fairly high level.
Original Question
For (good) reasons outside of my control, I need to use the Database.UpdateDataSet() method from Microsoft's Enterprise Library. The way my project works, I am letting the user make changes to the database (multiple databases, multiple schemas, multiple tables, but always only one at a time) by uploading Excel spreadsheets to a web application. The spreadsheets follow a design/template specified by me (usually). I am at the stage where I read the spreadsheet, turn it into a DataTable/DataSet, and use (dynamically generated) prepared statements to make the appropriate changes to the database. Here's the problem:
Each spreadsheet only allows for one type of change (insert/update/delete). I want to make it so that if the user uploads an insert spreadsheet, but several (let's say 10) of the entries are already in the database, I not only return an error but also tell them which entries (DataRows) violated the primary key constraint.
The ideal solution would be to get a DataSet with the list of errors back, but I don't see how I can do that. Perhaps there is a way to construct the prepared statements in such a way that if a DataRow is to be inserted (following the example from above), it proceeds normally; however, if it attempts to update or delete, it skips it and adds it to an error collection of some sort?
Note that I am trying to avoid using stored procedures. Since the number of different templates will grow extremely quickly after deployment, it is important that I stay away from manually written code and as close to a database-driven model as possible.
I am trying to write an SSIS package which would migrate queried data from a MySQL server to SQL Server. I need to modify a particular column, say "stream" (DT_I4): values get simple integer replacements (1 would become 2, 2 would become 4, etc.). Then I need to check another column's value (emp_id) against SQL Server before inserting: if it already exists, do not insert; if it does not, write the values.
I am an SSIS newbie; so far I have been able to add both an ADO.NET source and an ADO.NET destination. I need help with the following:
Should I use a derived column or a script component to convert the values?
How do I check if emp_id exists in SQL Server?
How do I map the errors?
What is the best practice to implement the above situation? Thanks for reading and for your help.
Generally speaking, it is better to use the stock components to accomplish a task than to write a custom script. Performance and maintenance are two big reasons for that advice. Also, don't try to do too many things in a single transformation. The pipeline can really take advantage of parallelization if you let it.
1) Specifically speaking, perhaps I didn't understand where the conversion needs to happen in your problem description, but I would start with neither a Derived Column Transformation nor a Script Component. Instead, for a straight type conversion, I'd use a Data Conversion Transformation.
Rereading it, perhaps you are attempting a value conversion. Depending on the complexity, it could be accomplished with a derived column or two; worst case, drop to a Script Component. But even better: does the data need to come over with the unmapped value at all? Toss a CASE statement into your source query and skip the SSIS complexity of mapping value A to value B.
2) The Lookup Transformation will help you in this department. It is important to note that failure to find a value would result in the package failing in 2005. In 2008+, the option for handling not-found rows is more readily available: there is an output path "Redirect Rows to No Match Output", and this is the path you will want to use, as you only want the rows that don't already exist. As a general guideline on a Lookup, only pull back the columns of interest, as the package will cache that lookup locally. That does not go well on server memory when it's hundreds of millions of rows and 80+ columns wide.
3) What errors? Conversion errors? Lookup errors? Some other error not defined? In general, you'll probably want to read about Integration Services Paths. Everything in a data flow has an error path leading out of it. Most everything has one or more non-error paths leading out. In cases where there are multiple non-error paths available, when you connect them to the next component, BIDS will ask which output you intend to use.
4) Knowing only the extremely general problem defined, your package may look something like: ADO.NET Source -> Derived Column (or a CASE in the source query) -> Lookup with "Redirect Rows to No Match Output" -> ADO.NET Destination.
Refine your question if that doesn't address the specifics.
I have two databases, one is an MS Access file, the other is a SQL Server database. I need to create a SELECT command that filters data from the SQL Server database based on the data in the Access database. What is the best way to accomplish this with ADO.NET?
Could I pull the required data from each database into two new tables, put these in a single DataSet, and then perform another SELECT command on the DataSet to combine the data?
Additional Information:
The Access database is not permanent. The Access file to use is set at runtime by the user.
Here's a bit of background information to explain why there are two databases. My company uses a CAD program to design buildings. The program stores materials used in the CAD model in an Access database. There is one file for each model. I am writing a program that will generate costing information for each model. This is based on current material prices stored in a SQL Server database.
My Solution
I ended up just importing the data in the Access DB into a temporary table in the SQL Server DB, performing all the necessary processing, and then removing the temporary table. It wasn't a pretty solution, but it worked.
You don't want to pull both datasets across if you don't have to. You are also going to have trouble implementing Tomalak's solution, since the file location may change and might not even be readily available to the server itself.
My guess is that your users set up an Access database with the people/products or whatever that they are interested in working with and that's why you need to select across the two databases. If that's the case, the Access table is probably smaller than the SQL Server table(s). Your best bet is to pull in the Access data, then use that to generate a filtered query to SQL Server so that you can minimize the data that is sent over the network.
So, the most important things are:
Filter the data ON THE SERVER so that you can minimize network traffic and also because the database is going to be faster at filtering than ADO.NET
If you have to choose a dataset to pull into your application, pull in the smaller dataset and then use that to filter the other table.
Assuming SQL Server can get to the Access database, you could construct an OPENROWSET query across them.
SELECT a.*
FROM SqlTable AS a
JOIN OPENROWSET(
    'Microsoft.Jet.OLEDB.4.0',
    'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';'admin';'',
    Orders
) AS b ON
    a.Id = b.Id
You would just change the path to the Access database at runtime to get to different MDBs.
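For instance, a sketch of composing that query at runtime from C# (variable names are illustrative; the user-supplied path is naively escaped before being spliced into the SQL):

string mdbPath = userSelectedFile.Replace("'", "''"); // escape single quotes in the literal
string sql =
    "SELECT a.* " +
    "FROM SqlTable AS a " +
    "JOIN OPENROWSET('Microsoft.Jet.OLEDB.4.0', '" + mdbPath + "';'admin';'', Orders) AS b " +
    "ON a.Id = b.Id";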
First you need to do something on the server - reference the Access DB as a "Linked Server".
Then you will be able to query it from within SQL Server, pulling out or stuffing in data however you like. This web page gives a nice overview of how to do it:
http://blogs.meetandplay.com/WTilton/archive/2005/04/22/318.aspx
If I read the question correctly, you are NOT attempting to cross reference across multiple databases.
You merely need to reference details about a particular FILE, which in this case could contain:
primary key, parent file checksum (if it is a modification), file checksum, last known author, revision number, date of last change...
And then use that primary key when adding information obtained from analysing that file with your program.
If you actually do need a distributed database, perhaps you would prefer to use a non-relational database such as LDAP.
If you can't use LDAP, but must use a relational database, you might consider using GUIDs to ensure that your primary keys are good.
Since you don't give enough information, I'm going to have to make some assumptions.
Assuming:
The SQL Server and the Access Database are not on the same computer
The SQL Server cannot see the Access database over a file share, or it would be too difficult to set that up
You don't need to do joins between the Access database and the SQL Server, only use data from the Access database as lookup elements of your WHERE clause
If the above assumptions are correct, then you can simply use ADO.NET to open the Access database and retrieve the data you need, possibly in a DataSet or DataTable. Then extract the data you need and feed it to a different ADO.NET query against your SQL Server in a dynamic WHERE clause, a prepared statement, or via parameters to a stored procedure.
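A sketch of that flow (the connection strings, table and column names are all assumptions based on the costing scenario described above): pull the lookup values from the user's Access file, then feed them to SQL Server as parameters so the filtering happens server-side.

using System.Collections.Generic;
using System.Data.OleDb;
using System.Data.SqlClient;

// 1) Read the material IDs out of the user's Access file.
var materialIds = new List<int>();
using (var access = new OleDbConnection(
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + mdbPath))
using (var cmd = new OleDbCommand("SELECT MaterialId FROM Materials", access))
{
    access.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            materialIds.Add(reader.GetInt32(0));
}

// 2) Build a parameterized IN list so SQL Server does the filtering.
var paramNames = new List<string>();
for (int i = 0; i < materialIds.Count; i++)
    paramNames.Add("@p" + i);

string sql = "SELECT MaterialId, Price FROM dbo.MaterialPrices WHERE MaterialId IN ("
             + string.Join(", ", paramNames) + ")";

using (var conn = new SqlConnection(sqlConnectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    for (int i = 0; i < materialIds.Count; i++)
        cmd.Parameters.AddWithValue(paramNames[i], materialIds[i]);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // ... build the costing information from the filtered rows ...
    }
}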
The other solutions people are giving all assume you need to do joins on your data or otherwise execute SQL which includes both databases. To do that, you have to use linked databases, or else import the data into a table (perhaps a temporary one).
Have you tried benchmarking what happens if you link from the Access front end to your SQL Server via ODBC and write your SQL as though both tables are local? You could then do a trace on the server to see exactly what Jet sends to the server. You might be surprised at how efficient Jet is with this kind of thing. If you're linking on a key field (e.g., an ID field, whether from the SQL Server or not), it would likely be the case that Jet would send a list of the IDs. Or you could write your SQL to do it that way (using IN (SELECT ...) in your WHERE clause).
Basically, how efficient things will be depends on where your WHERE clause is going to be executed. If, for instance, you are joining a local Jet table with a linked SQL Server table on a single field, and filtering the results based on values in the local table, it's very likely to be extremely efficient, in that the only thing Jet will send to the server is whatever is necessary to filter the SQL Server table.
Again, though, it's going to depend entirely on exactly what you're trying to do (i.e., which fields you're filtering on). But give Jet a chance to see if it is smart, as opposed to assuming off the bat that Jet will screw it up. It may very well require some tweaking to get Jet to work efficiently, but if you can keep all your logic client-side, you're better off than trying to muck around with tracking all the Access databases from the server.