I'm importing data into a database, about 5000 rows each time. One of the columns I insert holds a location; there are about 80 possible locations in total. Before I insert each row, I want to check that value and map it to one of another 80 location names instead. I have a switch statement helping me at the moment, but I was wondering if anyone thinks that this is a bad way to do it or whether I'm on the right track.
So basically, at the moment, every time I upload my data that switch statement gets evaluated and a value changed 5000 times. Is a switch the right way to go?
Don't use a switch statement; it's very hard to maintain. Create another table in your DB that maps your input location to the required database location and query off that instead. That makes it much easier to update or insert new locations, and keeps the length of your script to a sane level.
You could use either a conversion table in your database or a Dictionary in your application instead of a switch.
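A minimal sketch of the Dictionary approach; the location names and the pass-through behaviour for unknown values are assumptions, not something from the question:

```csharp
using System;
using System.Collections.Generic;

public static class LocationMapper
{
    // Built once at startup: 80 entries instead of 80 switch cases.
    // The pairs could equally be loaded from the mapping table suggested above.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "LOC_NYC", "New York Warehouse" },
            { "LOC_LAX", "Los Angeles Depot" },
            // ... the remaining ~78 mappings
        };

    public static string Translate(string source) =>
        Map.TryGetValue(source, out var target) ? target : source; // pass unknowns through
}
```

Dictionary lookups are O(1), so remapping 5000 rows this way is negligible; the real win over a switch is that the mapping becomes data rather than code.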
It seems inappropriate to convert during the import process.
I would import the data as is, then either UPDATE the table or use a lookup table, as previously suggested.
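A sketch of that post-import fix-up, assuming SQL Server and a mapping table as suggested above; the table and column names (ImportedData, LocationMap, SourceLocation, TargetLocation) are invented:

```csharp
using System.Data.SqlClient;

public static class LocationFixup
{
    public static void RemapLocations(string connectionString)
    {
        const string sql = @"
            UPDATE d
            SET    d.Location = m.TargetLocation
            FROM   dbo.ImportedData AS d
            JOIN   dbo.LocationMap  AS m ON m.SourceLocation = d.Location;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery(); // one set-based statement instead of 5000 per-row checks
        }
    }
}
```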
Related
I'm making an API call with C#, getting back a JSON, breaking it down into nested objects, breaking each object into fields and putting the fields into an SQL Server table.
There is one field (OnlineURL) which should be unique.
What is an efficient way of achieving this goal? I currently make a database call for every nested object I pull out of the JSON and then use an if statement. But this is not efficient.
Database Layer
Creating a unique index/constraint for the OnlineURL field in the database will enforce the field being unique no matter what system/codebase references it. This will result in applications erroring on inserts of new records where the OnlineURL already exists or updating record X to an OnlineURL that is already being used by record Y.
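For illustration, a sketch of what enforcing this from C# might look like, assuming a hypothetical dbo.Listings table; 2601 and 2627 are SQL Server's duplicate-key error numbers:

```csharp
using System.Data.SqlClient;

public static class ListingWriter
{
    // One-time setup (could equally live in a migration script):
    //   CREATE UNIQUE INDEX UX_Listings_OnlineURL ON dbo.Listings (OnlineURL);

    public static bool TryInsertListing(SqlConnection conn, string onlineUrl, string title)
    {
        const string sql =
            "INSERT INTO dbo.Listings (OnlineURL, Title) VALUES (@url, @title);";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@url", onlineUrl);
            cmd.Parameters.AddWithValue("@title", title);
            try
            {
                cmd.ExecuteNonQuery();
                return true;  // inserted
            }
            catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627)
            {
                return false; // OnlineURL already exists
            }
        }
    }
}
```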
Application Layer
What is the rule when OnlineURL already exists? Do you reject the data? Do you update the matching row? Maybe you want to leverage a stored procedure that inserts a new row keyed on OnlineURL or updates the existing one. That turns a two-query process into a single round trip, which adds up on large-scale inserts.
Assuming your application is serial and the only one working against the database, you could also keep a local cache of OnlineURLs for use during your loop: read the list in from the database once, check each incoming record against it, and add each new OnlineURL you insert to the list. Reading the initial list is a single query, and each comparison is done in memory.
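A sketch of that local cache using a HashSet (table and column names are the same hypothetical ones as above; this only holds if this process is the sole writer):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public static class UrlCache
{
    public static HashSet<string> LoadExistingUrls(SqlConnection conn)
    {
        var urls = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        using (var cmd = new SqlCommand("SELECT OnlineURL FROM dbo.Listings;", conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                urls.Add(reader.GetString(0));
        }
        return urls;
    }
}

// In the import loop:
//   if (urls.Add(record.OnlineURL))      // Add() returns false for duplicates
//       InsertListing(conn, record);
```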
Create an index for that field and it will be.
It is necessary to check uniqueness, and that can't be done without querying the data, which means you will have to check the entire data in that column. Your first option is to speed that query up with an index, using a fill factor of 80 so you can avoid unnecessary page splits caused by the inserts.
Another option is to use caching, and that depends on your setup.
You could load the entire column into memory and check for uniqueness there, or you could use a distributed cache like Redis. Either way, analyze the complexity and costs, and you'll probably find that the index is the most ergonomic option.
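A sketch of what that fill-factor index might look like when created from C#; the table and index names are assumptions:

```csharp
using System.Data.SqlClient;

public static class IndexSetup
{
    public static void EnsureUrlIndex(SqlConnection conn)
    {
        // FILLFACTOR = 80 leaves 20% free space per leaf page to absorb new
        // inserts without immediate page splits.
        const string ddl = @"
            IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'UX_Listings_OnlineURL')
                CREATE UNIQUE INDEX UX_Listings_OnlineURL
                    ON dbo.Listings (OnlineURL)
                    WITH (FILLFACTOR = 80);";

        using (var cmd = new SqlCommand(ddl, conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
```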
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was: I create a list of SQL insert/update statements and execute them all at once by using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload. So, SqlBulkCopy for every table. For this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 - 20000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
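A sketch of that bulk-copy-then-merge pattern, with invented table and column names (the real shape of the DataTable obviously depends on the XML):

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkUpserter
{
    public static void BulkUpsert(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                // 1. Stage everything with a single bulk copy into a session temp table.
                new SqlCommand("CREATE TABLE #Staging (Id INT PRIMARY KEY, Name NVARCHAR(200));",
                               conn, tx).ExecuteNonQuery();

                using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tx))
                {
                    bulk.DestinationTableName = "#Staging";
                    bulk.WriteToServer(rows);
                }

                // 2. One set-based DML statement against the real table.
                const string merge = @"
                    MERGE dbo.Target AS t
                    USING #Staging   AS s ON t.Id = s.Id
                    WHEN MATCHED THEN UPDATE SET t.Name = s.Name
                    WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);";
                new SqlCommand(merge, conn, tx).ExecuteNonQuery();

                tx.Commit();
            }
        }
    }
}
```

Since the question mentions foreign-key constraints: stage parent and child data separately and run the MERGE statements in dependency order inside the same transaction.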
If you want to stay in T-SQL minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
Don't do it from C# unless you have to; it's a huge overhead, and SQL can do it so much faster and better by itself.
Insert to table from XML file using INSERT INTO SELECT
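A sketch of that approach, passing the XML in as a parameter and shredding it with a single INSERT ... SELECT; the element and column names are assumptions about the file's layout:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class XmlImporter
{
    public static void ImportXml(string connectionString, string xml)
    {
        const string sql = @"
            INSERT INTO dbo.Items (Id, Name)
            SELECT  r.value('(Id)[1]',   'int'),
                    r.value('(Name)[1]', 'nvarchar(200)')
            FROM    @xml.nodes('/Items/Item') AS x(r);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xml;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```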
I am trying to write an SSIS package which would migrate queried data from a MySQL server to SQL Server. I would need to modify the values of one particular column, say "stream" (DT_I4) (1 would become 2, 2 would become 4, etc., just some fixed integer replacements), and then check another column value (emp_id) to see whether it already exists in SQL Server before inserting. If it exists, do not insert; if it does not, write the row.
I am an SSIS newbie; so far I have been able to add both an ADO.NET source and an ADO.NET destination. I need help with the following:
Should I use a derived column or a script component to convert the values?
How do I check if emp_id exists in SQL Server?
How do I map the errors?
What is the best practice for implementing the above? Thanks for reading and for your help.
Generally speaking, it is better to use the stock components to accomplish a task than to write a custom script. Performance and maintenance are two big reasons for that advice. Also, don't try to do too many things in a single transformation. The pipeline can really take advantage of parallelization if you let it.
1) Specifically speaking, perhaps I didn't understand where the conversion needs to happen in your problem description, but I would start with neither a Derived Column Transformation nor a Script Component. Instead, for a straight type conversion I'd use the Data Conversion Transformation.
Rereading it, perhaps you are attempting a value conversion. Depending on the complexity, it could be accomplished with a derived column or two; worst case, drop to a script task. But even better: does the data need to come over with the unmapped value at all? Toss a CASE statement into your source query and skip the SSIS complexity of mapping value A to value B (see the sketch after this answer).
2) The Lookup Transformation will help you in this department. It is important to note that failure to find a value would cause the package to fail in 2005. In 2008+ the option for handling not-found rows is more readily available: there is an output path "Redirect Rows to No Match Output", and this is the path you will want to use, as you only want the rows that don't already exist. As a general guideline on a Lookup, only pull back the columns of interest, because the package caches that lookup locally. That does not go well on server memory when it's hundreds of millions of rows and 80+ columns wide.
3) What errors? Conversion errors? Lookup errors? Some-other-error-not-defined? In general, you'll probably want to read about Integration Services Paths. Everything in a data flow has an Error path leading out of it. Most everything has 1+ non-error paths leading out. In cases where there are multiple non-error paths available, when you connect them to the next component, BIDS will ask which output you are intending to use.
4) Given the extremely general problem as defined, your package may look something like: ADO.NET Source (with the CASE remapping or a Derived Column) → Lookup on emp_id against the SQL Server table → No Match Output → ADO.NET Destination.
Refine your question if that doesn't address the specifics.
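To make point 1 concrete, a sketch of what the remapping-in-the-source-query idea could look like; only the 1→2 and 2→4 pairs come from the question, and the table name is invented:

```csharp
internal static class SourceQueries
{
    // Paste this as the 'SQL command' of the ADO.NET Source instead of picking a table,
    // so the stream values arrive already remapped; the Lookup in point 2 then filters
    // out emp_id values that already exist in SQL Server.
    public const string RemappedStream = @"
        SELECT  emp_id,
                CASE stream
                    WHEN 1 THEN 2
                    WHEN 2 THEN 4
                    -- ... remaining replacements
                    ELSE stream
                END AS stream
        FROM    employees;";
}
```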
Is there a way to bulk upload data into a SQL table without issuing an INSERT statement for each row? I'm using MySQL.
I currently have a program that constantly pulls live data off the internet and stores it in a database. When doing this, it erases a table and then puts the new data in, a row at a time. This would be OK, except that other programs pull that data at asynchronous times, and as such there is no guarantee that they pick up the complete table. Sometimes they will pull 10 rows, 20, etc. If there is a way I could insert all rows at once, so that other programs pull either 0 rows (just after the table is erased) or all rows, that would be awesome.
Thanks, and any thoughts much appreciated!
If your MySQL table type is InnoDB, just use a transaction. Then all the updates will become visible at the same time, when you commit.
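A sketch of that, assuming the MySql.Data / MySqlConnector ADO.NET provider and invented table and column names; readers on another connection keep seeing the old rows until the commit:

```csharp
using System.Collections.Generic;
using MySql.Data.MySqlClient;

public static class SnapshotLoader
{
    public static void ReloadSnapshot(string connectionString, IEnumerable<(int Id, string Value)> rows)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                new MySqlCommand("DELETE FROM live_data;", conn, tx).ExecuteNonQuery();

                var insert = new MySqlCommand(
                    "INSERT INTO live_data (id, value) VALUES (@id, @value);", conn, tx);
                insert.Parameters.Add("@id", MySqlDbType.Int32);
                insert.Parameters.Add("@value", MySqlDbType.VarChar);

                foreach (var (id, value) in rows)
                {
                    insert.Parameters["@id"].Value = id;
                    insert.Parameters["@value"].Value = value;
                    insert.ExecuteNonQuery();
                }

                tx.Commit(); // the delete and all inserts become visible together here
            }
        }
    }
}
```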
You can use LOAD DATA INFILE to quickly load the table from a file.
If you are concerned about clients getting a partial view of the data, the easiest way to avoid that would be to do your load into a temp table, then do some table renaming to make the change appear to be instantaneous.
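A sketch of that staging-and-rename idea with the same assumed MySQL provider and invented names; RENAME TABLE swaps both names atomically, and LOAD DATA INFILE requires the appropriate file privileges and a server-readable path:

```csharp
using MySql.Data.MySqlClient;

public static class TableSwapper
{
    public static void SwapInNewData(MySqlConnection conn)
    {
        // 1. Build the replacement off to the side.
        new MySqlCommand("CREATE TABLE live_data_new LIKE live_data;", conn).ExecuteNonQuery();
        new MySqlCommand(
            "LOAD DATA INFILE '/tmp/feed.csv' INTO TABLE live_data_new " +
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n';", conn).ExecuteNonQuery();

        // 2. Atomic swap, then drop the old copy; readers see either the old table or the new one.
        new MySqlCommand(
            "RENAME TABLE live_data TO live_data_old, live_data_new TO live_data;",
            conn).ExecuteNonQuery();
        new MySqlCommand("DROP TABLE live_data_old;", conn).ExecuteNonQuery();
    }
}
```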
Continuing from Eric's alternate solution: another way to do it, instead of a temp table, is to use effective dates. The SQL that reads the table would change to select only the most recent date for each piece of data. If a read happens half-way through an update, the select will return a mix of new and old rows, but still a complete set. Once the new data has all arrived, you can feel free to delete the old data.
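A sketch of what that read might look like; the loaded_at column and table name are assumptions:

```csharp
public static class EffectiveDateQueries
{
    // Returns one complete row per id: the newest version that has finished loading.
    public const string LatestRows = @"
        SELECT  d.*
        FROM    live_data AS d
        JOIN   (SELECT id, MAX(loaded_at) AS loaded_at
                FROM   live_data
                GROUP  BY id) AS latest
               ON  latest.id = d.id
               AND latest.loaded_at = d.loaded_at;";
}
```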
I have a list of strings; the list is constant and won't change. There are 12 strings.
Inside a database table I have a column holding an index into that list.
I don't think it's wise to keep a separate table just to hold those strings, because they never change, nor to store the string itself in that column.
So the only option is to hold the list in some other form.
What about holding the strings in an XML file and using LINQ to XML to load them into a dictionary?
If so, is this better, performance-wise, than using a database table?
Those strings will most likely get cached by your SQL server and add almost no performance hit, but they will give you flexibility in case you have multiple applications sharing the same database. Overall, keep them in the database unless you have or expect millions of database hits.
I agree with Zepplock, keep the strings in the database. You won't have to worry about performance. One of the big reasons is also that, if you do so, it will be easier for future developers to find the strings and understand their function within the application, since they are stored in the database in their proper context.
It sounds as if you're describing a table holding product catalog data. Suggest keeping those values in their own rows, and not stored as an XML datatype or in XML in a varchar column.
It sounds as if the data is static today and is rarely, if ever, changed. By storing it in XML, you lose the potential future advantage of the relational nature of the database.
Suggest keeping them in a table. As you say, it's only 12 strings/products, and the performance hit will be zero.