Can I execute IF statements from an external source? - c#

I have a data table, and each row contains a set of about 77 values. I need to run these values through about 1200 IF statements, but the IF statements (rules) may vary.
My original idea was to have a SQL table with columns memberName, operator, target, trueValue and falseValue, read the rules from that table, build expressions at run time and run the data through them. The idea worked well; the actual IF statements just turn out to be more complicated than "IF age > 30".
Read more at: How can I calculate an expression parameter in run time?
So I thought: if I could have an external file that can be swapped out after the application has been installed, I could send the data table to a method(?) in that file and have it return a single result value. That would also work perfectly.
Can I do this with a .dll file maybe? Or something similar.
So to summarize, I want to run my data table values through IF statements that I get from an external source, because the IF statements might change from time to time.
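For what it's worth, one common way to do this with a .dll is to define a small interface that both the application and the external assembly reference, then load the assembly at run time. A minimal sketch, assuming a hypothetical IRuleSet interface and a Rules.dll file name (all names here are made up for illustration):

using System;
using System.Data;
using System.Linq;
using System.Reflection;

// Contract shared between the main application and the external rules assembly.
public interface IRuleSet
{
    // Runs one row's values through all the rules and returns the single result.
    object Evaluate(DataRow row);
}

// In the main application: load whatever rules DLL sits next to the exe.
// Replacing that DLL after installation replaces the IF statements.
public static class RuleLoader
{
    public static IRuleSet Load(string path)
    {
        Assembly assembly = Assembly.LoadFrom(path);   // e.g. "Rules.dll"
        Type type = assembly.GetTypes()
            .First(t => typeof(IRuleSet).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract);
        return (IRuleSet)Activator.CreateInstance(type);
    }
}

// Usage:
//   IRuleSet rules = RuleLoader.Load("Rules.dll");
//   foreach (DataRow row in myDataTable.Rows)
//       results.Add(rules.Evaluate(row));

If recompiling a .dll for every rule change is too heavy, the other route is to keep the rules as text and build the expressions at run time, as in the linked expression question above.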

Related

Optimal postgreSQL read-modify-write rows access

I am trying to modify some fields in all table records, using the Npgsql data provider for PostgreSQL.
Each record needs:
to be read,
to have some fields modified by a C# procedure,
and to be written back to the table.
Is there an object or mechanism that allows me to point at each record and do this, without issuing multiple queries for the C# procedure call between the reading and the writing of each record?
If you're looking for a way to update a value via an open cursor, to avoid an additional UPDATE, then that doesn't exist in PostgreSQL. On the other hand, I'm pretty sure (but not 100%) that on other databases it doesn't actually improve perf either, i.e. that an additional roundtrip for each update is required anyway. In other words, "updating a cursor" for results from a SELECT is probably API sugar rather than an actual optimization.
The most efficient way to accomplish this with Npgsql is probably to do a SELECT, buffer results in memory, iterate them to calculate the new values, and then issue a prepared batched update that updates the rows (i.e. a single command with several UPDATE ...; UPDATE ... statements). If the number of rows is too large, this can be split into several batches, i.e. "load x rows, calculate, update those x rows; load next x rows...". You can use PostgreSQL's cursor functionality to load the next X rows each time, or simply issue new SELECTs and use LIMIT/OFFSET for paging (likely to have similar performance).
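A minimal sketch of that pattern with Npgsql; the items table, its columns and the Transform method standing in for the C# procedure are all made up for illustration:

using System.Collections.Generic;
using System.Text;
using Npgsql;

public static class BatchUpdater
{
    // Placeholder for whatever the C# procedure actually does to a field.
    static string Transform(string name) => name.Trim().ToUpperInvariant();

    public static void ReadModifyWrite(NpgsqlConnection conn)
    {
        // 1. SELECT and buffer the rows in memory.
        var rows = new List<(int Id, string Name)>();
        using (var select = new NpgsqlCommand("SELECT id, name FROM items", conn))
        using (var reader = select.ExecuteReader())
        {
            while (reader.Read())
                rows.Add((reader.GetInt32(0), reader.GetString(1)));
        }
        if (rows.Count == 0) return;

        // 2. Calculate the new values in C#, then send one command containing
        //    an UPDATE statement per row.
        var sql = new StringBuilder();
        using (var update = new NpgsqlCommand { Connection = conn })
        {
            for (int i = 0; i < rows.Count; i++)
            {
                sql.Append($"UPDATE items SET name = @n{i} WHERE id = @i{i};");
                update.Parameters.AddWithValue($"n{i}", Transform(rows[i].Name));
                update.Parameters.AddWithValue($"i{i}", rows[i].Id);
            }
            update.CommandText = sql.ToString();
            update.Prepare();        // optional: prepare the batch server-side
            update.ExecuteNonQuery();
        }
    }
}

For very large tables, run the same loop once per batch of X rows, as described above.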

Huge oracle insert statement gets stuck

I'm trying to add data from multiple XML files into my Oracle database. I'm receiving the data in my .NET C# application and I'm using Dapper as ORM.
Currently there are 13 XML files which need to be added into the database. I deserialized all of them and turned them into an INSERT query to add to my database.
Everything seems to be working fine, except after 31 minutes of trying to insert the data in the database, I noticed the following error:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
After some searching I found this solution: https://stackoverflow.com/a/11895078/2664414 but this seems way too hacky for something this "simple".
As for the details of the INSERT statement, I'm adding 3271 rows for the second INSERT. Here is a short snippet:
INSERT ALL
INTO S1589_XML_LineSection (ID, VALID_FROM_DATE, VALID_TO_DATE, LINE_ID, CHANNEL_NO, TENSION, NATURE, P70, C70, P400, C400, OPERATIONAL_IMPORTANCE, TECHNICAL_EQUIPMENT, OPERATIONALISATION, STANDARD_SPEED_DIR_1, STANDARD_SPEED_DIR_2, LAST_UPDATE_DATE) VALUES ('1','1996-06-02','2005-12-10','404','0','1','1','0','0','330','330','9','9','1','0','0','2010-06-04T04:54:51')
INTO S1589_XML_LineSection (ID, VALID_FROM_DATE, VALID_TO_DATE, LINE_ID, CHANNEL_NO, TENSION, NATURE, P70, C70, P400, C400, OPERATIONAL_IMPORTANCE, TECHNICAL_EQUIPMENT, OPERATIONALISATION, STANDARD_SPEED_DIR_1, STANDARD_SPEED_DIR_2, LAST_UPDATE_DATE) VALUES ('1','2005-12-11','2007-12-08','404','0','1','1','0','0','330','330','9','9','1','60','60','2010-06-04T04:54:51')
SELECT 1 FROM dual
I can add these rows individually without any problem, but when I try to add them all at once, it takes far too long.
So, any tips on either improving the query, or making sure I don't get the ORA-00054 error?
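Not an answer from the original thread, but for comparison: rather than one giant INSERT ALL (which Oracle has to parse as a single enormous statement), Dapper can execute one parameterised INSERT repeatedly, once per element of a list, inside a single transaction. A rough sketch, assuming a LineSection POCO and the Oracle.ManagedDataAccess provider; only a few of the columns are shown:

using System.Collections.Generic;
using Dapper;
using Oracle.ManagedDataAccess.Client;

// POCO with one property per column (most columns omitted here for brevity).
public class LineSection
{
    public string Id { get; set; }
    public string ValidFromDate { get; set; }
    public string ValidToDate { get; set; }
    public string LineId { get; set; }
}

public static class LineSectionImporter
{
    public static void Insert(string connectionString, IEnumerable<LineSection> sections)
    {
        using (var conn = new OracleConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                const string sql =
                    "INSERT INTO S1589_XML_LineSection (ID, VALID_FROM_DATE, VALID_TO_DATE, LINE_ID) " +
                    "VALUES (:Id, :ValidFromDate, :ValidToDate, :LineId)";
                // Passing an IEnumerable makes Dapper execute the statement once per element,
                // all inside one transaction, so Oracle parses the statement only once.
                conn.Execute(sql, sections, transaction: tx);
                tx.Commit();
            }
        }
    }
}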

How to query an SQLite db in batches

I am using C# with .NET 4.5. I am making a scraper which collects specific data. Each time a value is scraped, I need to make sure it hasn't already been added to the SQLite db.
To do this, I am making a call each time a value is scraped to query against the db to check if it contains the value, and if not, I make another call to insert the value into the db.
Since I am scraping multiple values per second, this gets to be very IO-intensive, with constant calls to the db.
My question is, is there any better way to do this? Perhaps I could queue the values scraped and then run a batch query at once? Is that possible?
I see three approaches:
Use INSERT OR IGNORE, which will reject an entry if it is already present (based on primary key and unique fields). Or use a plain INSERT (equivalent to INSERT OR ABORT), which will return SQLITE_CONSTRAINT, a value you will have to catch and handle if you want to count failed insertions.
Accumulate, outside the database, the inserts you want to make. When you have accumulated enough (or all) of them, start a transaction (BEGIN;), do your insertions (you can use INSERT OR IGNORE here as well), and commit the transaction (COMMIT;); a sketch of this appears after the list.
You could pre-fetch a list of items you already have and check against that list in memory, if your data model allows it.
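A rough sketch of the second approach (accumulate, then flush in one transaction), assuming System.Data.SQLite and a hypothetical scraped_values table with a unique value column:

using System.Collections.Generic;
using System.Data.SQLite;

public static class ScrapeStore
{
    // Flush a batch of scraped values in a single transaction.
    // INSERT OR IGNORE silently skips values that are already present
    // (based on the table's primary key / unique constraint).
    public static void FlushBatch(SQLiteConnection conn, IEnumerable<string> values)
    {
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SQLiteCommand(
            "INSERT OR IGNORE INTO scraped_values (value) VALUES (@value)", conn, tx))
        {
            var p = cmd.Parameters.Add("@value", System.Data.DbType.String);
            foreach (var v in values)
            {
                p.Value = v;
                cmd.ExecuteNonQuery();
            }
            tx.Commit();
        }
    }
}

The scraper keeps appending to an in-memory list and calls FlushBatch every N values, which replaces the two round trips per value with one bulk write.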

SSIS Task for inconsistent column count import?

Problem.
I regularly receive feed files from different suppliers. Although the column names are consistent, the problem comes when some suppliers send text files with more or fewer columns in their feed file.
Furthermore, the column arrangement of these files is inconsistent.
Other than the Dynamic Data Flow Task provided by CozyRoc, is there another way I could import these files? I am not a C# guru, but I am drawn towards using a "Script Task" in the control flow or a "Script Component" data flow task.
Any suggestions, samples or direction will be greatly appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
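For context, the Script Task / Script Component route the question mentions usually boils down to code along these lines: read the header row, build a name-to-index map, and pull values out by column name so that column order and extra columns stop mattering. This is only a plain C# illustration (the file path and column names are examples), not the full SSIS buffer plumbing:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FeedReader
{
    public static IEnumerable<Dictionary<string, string>> ReadRows(string path)
    {
        string[] lines = File.ReadAllLines(path);
        // Map column name -> position from the header line.
        string[] header = lines[0].Split(',');
        var index = header
            .Select((name, i) => new { Name = name.Trim(), Index = i })
            .ToDictionary(x => x.Name, x => x.Index, StringComparer.OrdinalIgnoreCase);

        foreach (var line in lines.Skip(1))
        {
            string[] fields = line.Split(',');
            var row = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (var col in new[] { "col1", "col2", "col3", "col4", "col5" })
            {
                // Tolerate missing or short columns by returning null.
                row[col] = index.ContainsKey(col) && index[col] < fields.Length
                    ? fields[index[col]]
                    : null;
            }
            yield return row;
        }
    }
}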
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata makes the Flat File Source troublesome to use here.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that, conceptually, it's the same as querying a table: the order of the columns does not matter, nor does the presence of extra columns.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression: "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File Enumerator is configured to operate on files. I put an expression on the Directory property for @[User::InputFolder]. Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension".
In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable.
At this point, each iteration of the loop will change the value of @[User::Query] to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL command from variable". Use the @[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files, input5columns.csv and input7columns.csv. All of the columns of the 5-column file are in the 7-column file, but the 7-column file has them in a different order (col2, for example, sits at zero-based position 2 in one file and 6 in the other). I negated all the values in the 7-column file to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screenshots.
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
Our solution: we use parent-child packages. In the parent package we take the individual client files and transform them to our standard-format files, then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send, though; if they try to change their format from what they agreed to send us, we return the file.

Minimise database updates from changes in DataTable/SqlDataAdapter

My goal is to maximise performance. The basics of the scenario are:
I read some data from SQL Server 2005 into a DataTable (1000 records x 10 columns)
I do some processing of the data in .NET; all records have at least 1 field changed in the DataTable, but potentially all 10 fields could be changed
I also add some new records into the DataTable
I do a SqlDataAdapter.Update(myDataTable.GetChanges()) to persist the updates (and inserts) back to the db, using an InsertCommand and UpdateCommand I defined at the start
Assume table being updated contains 10s of millions of records
This is fine. However, if a row has changed in the DataTable then ALL columns for that record are updated in the database even if only 1 out of 9 columns has actually changed value. This means unnecessary work, particularly if indexes are involved. I don't believe SQL Server optimises this scenario?
I think, if I was able to only update the columns that had actually changed for any given record, that I should see a noticeable performance improvement (esp. as cumulatively I will be dealing with millions of rows).
I found this article: http://netcode.ru/dotnet/?lang=&katID=30&skatID=253&artID=6635
But I don't like the idea of doing multiple UPDATEs within the sproc.
Short of creating individual UPDATE statements for each changed DataRow and then firing them off somehow in a batch, I'm looking for other people's experiences/suggestions.
(Please assume I can't use triggers)
Thanks in advance
Edit: Is there any way to get SqlDataAdapter to send UPDATE statements specific to each changed DataRow (updating only the columns that actually changed in that row), rather than supplying a general .UpdateCommand that updates all columns?
Isn't it possible to implement your own IDataAdapter in which you implement this functionality?
Of course, the DataAdapter only fires the appropriate SqlCommand, which is determined by the RowState of each DataRow.
So this means that you would have to generate the SQL command that has to be executed for each situation ...
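A minimal sketch of that per-row generation, comparing each column's Original and Current versions and emitting only the changed columns (the table name MyTable and key column Id are placeholders):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class UpdateBuilder
{
    // Build an UPDATE for a single modified DataRow that touches only the
    // columns whose value differs between the Original and Current versions.
    public static SqlCommand Build(DataRow row, SqlConnection conn)
    {
        var cmd = new SqlCommand { Connection = conn };
        var setClauses = new List<string>();

        foreach (DataColumn col in row.Table.Columns)
        {
            object original = row[col, DataRowVersion.Original];
            object current = row[col, DataRowVersion.Current];
            if (!Equals(original, current))
            {
                setClauses.Add("[" + col.ColumnName + "] = @" + col.ColumnName);
                cmd.Parameters.AddWithValue("@" + col.ColumnName, current);
            }
        }

        if (setClauses.Count == 0)
            return null;    // nothing actually changed in this row

        cmd.CommandText = "UPDATE MyTable SET " + string.Join(", ", setClauses) +
                          " WHERE Id = @Id";
        cmd.Parameters.AddWithValue("@Id", row["Id", DataRowVersion.Original]);
        return cmd;
    }
}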
But I wonder if it is worth the effort. How much performance will you gain?
I think that - if it is really necessary - I would disable all my indexes and constraints, do the update using the regular SqlDataAdapter, and afterwards enable the indexes and constraints.
You might try to create an XML representation of your changed dataset, pass it as a parameter to a sproc, and then do a single update by using the SQL nodes() function to translate the XML into a tabular form.
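On the client side that might look roughly like this; the stored procedure name dbo.UpdateFromXml and its @xml parameter are hypothetical, and the sproc itself would shred the XML with nodes() and do one set-based UPDATE:

using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class XmlUpdater
{
    public static void SendChanges(DataTable table, SqlConnection conn)
    {
        // Serialize only the modified rows to XML.
        DataTable changes = table.GetChanges(DataRowState.Modified);
        if (changes == null) return;

        string xml;
        using (var writer = new StringWriter())
        {
            changes.WriteXml(writer);
            xml = writer.ToString();
        }

        using (var cmd = new SqlCommand("dbo.UpdateFromXml", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xml;
            cmd.ExecuteNonQuery();
        }
    }
}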
You should never try to update a clustered index. If you do, it's time to rethink your db schema.
I would VERY much suggest that you do this with a stored procedure.
Let's say that you have 10 million records to update, and let's say that each record is 100 bytes (for 10 columns this could be too small, but let's be conservative). That amounts to roughly 1 GB of data that must be transferred from the database (network traffic), held in memory, and then sent back to the database in the form of UPDATE or INSERT statements, which are much more verbose to transfer.
I expect that an SP would perform much better.
Then again, you could divide your work into smaller SPs (called from the main SP) that update just the necessary fields, and gain additional performance that way.
Disabling indexes/constraints is also an option.
EDIT:
Another thing you must consider is the potential number of different UPDATE statements. With 10 fields per row, each field can either stay the same or change, so if you construct your UPDATE statement to include only the changed columns you could potentially get 2^10 = 1024 different UPDATE statements, and each of those must be parsed by SQL Server, have an execution plan calculated, and have the parsed statement cached somewhere. There is a price to pay for this.
