Is there any way I can push a new record to SqlDataReader after i pull a table down? I have this piece of trash code that I need to modify, and this seems like the easiest way to do what I need. I understand that this should not be done, and if you have to do it there is something seriously wrong with your logic, but is there a way?
Easiest way from that point is to just manually create a command with a command string of an insert(parametized if not sanitized/clean data, best to do that anyways, but could make code bulkier). Code for that should be quite small, considering you already have everything else setup.
When you say "push a new record to"... do you mean you want to add a record to the results? Or do you mean you want to do an INSERT?
The INSERT cannot be done with a reader; however, you can do things with readers. Of course, it would be simpler to update the original query so that you UNION the data.
In particular: you can't create your own SqlDataReader, but you can create your own bespoke IDataReader implementation; this could wrap the SqlDataReader, simply proxying data from the inner SqlDataReader until the SqlDataReader.Read() method returns false - then you could swap to returning you own data, returning true until you have run out of data. Not trivial to implement (mainly because you need to implement a lot of methods to write your own IDataReader), but certainly not impossible.
SqlDataReaders are forward read only so I doubt you can add a record in (regardless of whether you have pulled the whole table down). In fact anything that inherits DbDataReader is forward read only.
I'm guessing you need to do some manipulation with the records. Maybe what you can do instead is use the SqlDataReader to fill a DataTable and put a new record into the DataTable. But then you'd need to change your code to juggle a DataTable.
You need to expand your question a bit,
If, for example, you need to walk through a million records and update a field on the same table while walking through the data.
You can create a second SqlConnection to your db and execute update statements on the table (prone to locking issues), or better still insert all your changes into a temp table and merge the changes back into the original table after you are done with the reader.
There is little question I am tempted to ask, can this piece of logic be replaced with a single SQL UPDATE statement?
Related
I am looking at Dapper as ORM for our next project, but something is not clear to me.
In this question there are answers on how to do inserts, updates and deletes.
Since this question is already a bit older, maybe there are better ways now a days..
But my biggest concern is how to do an ApplyUpdates on a list.
Suppose you have a List<Customer> that is build like shown here
And suppose you show this list in a DataGridView.
Now, the user will
alter the data of a few rows,
insert a few new rows,
delete a few rows
And when he clicks on the save button, at that time you want to save all these changes in this List<Customer> to your database, using Dapper.
How can I go about that ?
If I have to loop through the list and for each row call an insert, update or delete statement, then how can I determine what operation to use ? The deleted rows will be gone from the list.
I also want to make sure that if one statement fails, all will be rollbacked.
And I need the primary key for all new rows returned and filled in the DataGridView.
In other words, all that ADO DataAdapter/DataTable does for you.
What is the best way to do this using Dapper ?
EDIT
The best way I can think of now is to keep 3 list in memory and when the user alters some data, add a row in the update list, same for insert list and deleted list so I can run through these 3 list on the button click.
But I am hoping there is a better alternative build in Dapper for this kind of situation.
You need to handle this yourself, as Dapper doesn't manage it. There are several theories for how to do it.
Delete all items and then add them again.
Easy to implement.
Bad for DB performance, which is effectively 2 DB writes per row.
Loop through the items and update without checking for changes
Not too difficult to implement.
DB performance better than option 1, but not ideal.
Add and deletes are more complex to detect than updates.
Loop through the items and update only if there are differences
More difficult to implement.
Requires reading from the DB first to compare values (extra DB action)
Store changes in a separate list
Even more difficult to implement, as you need to "wrap" List updates into another class (first class collection?) and store changes
Most efficient for DB, as you execute only the minimum on each DB item.
In the end, you might select different approaches for different Entities depending on how you need to optimise. e.g. Option 1 is fine if you know you will only have a few entities and not many updates.
I have a SQLite database for which I want to populate a new field based on an existing one. I want to derive the new field value using a C# function.
In pseudocode, it would be something like:
foreach ( record in the SQLite database)
{
my_new_field[record_num] = my_C#_function(existing_field_value[record_num]);
}
Having looked at other suggestions on StackOverflow, I'm using a SqliteDataReader to read each record, and then running a SQlite "UPDATE" command based on the specific RowId to update the new field value for the same record.
It works .... but it's REALLY slow and thrashes the hard drive like crazy. Is there really no better way to do this?
Some of the databases I need to update might be millions of records.
Thanks in advance for any help.
Edit:
In response to the comment, here's some real code in a legacy language called Concordance CPL. The important point to note is that you can read and write changes to the current record in one go:
int db;
cycle(db)
{
db->FIRSTFIELD = myfunction(db->SECONDFIELD);
}
myfunction(text input)
{
text output;
/// code in here to derive output from input
return output;
}
I have a feeling there's no equivalent way to do this in SQLite as SQL is inherently transactional, whereas Concordance allowed you to traverse and update the database sequentially.
The answer to this is to wrap all of the updates into a single transaction.
There is an example here that does it for bulk inserts:
https://www.jokecamp.com/blog/make-your-sqlite-bulk-inserts-very-fast-in-c/
In my case, it would be bulk updates based on RowID wrapped into a single transaction.
It's now working, and performance is many orders of magnitude better.
EDIT: per the helpful comment above, defining a custom C# function and then reference it in a single UPDATE command also works well, and in some ways is better than the above as you don't have to loop through within C# itself. See e.g. Create/Use User-defined functions in System.Data.SQLite?
When having a query that returns multiple results, we are iterating among them using the NextResult() of the SqlDataReader. This way we are accessing results sequentially.
Is there any way to access the result in a random / non sequential way. For example jump first to the third result, then to the first e.t.c
I am searching if there is something like rdr.GetResult(1), or a workaround.
Since I was asked Why I want something like this,
First of all I have no access to the query and so I can not changes, so in my client I will have the Results in the sequence that server writes / produces them.
To process (build collections, entities --> business logic) the first I need the Information from both the second and the third one.
Again since it is not an option to modify some of the code, I can not somehow (without writing a lot of code) 'store' the connection info (eg. ids) in order to connect in a later step the two ResultSets
The most 'elegant' solution (for sure not the only one) is to process the result sets in non sequential way. So that is why I am asking if there is such a way.
Update 13/6
While Jeroen Mostert answer gives a thoughtful explanation on why, Think2ceCode1ce answer shows the right directions for a workaround. The content of the link in the comments how in additional dataset could be utilized to work in an async way. IMHO this would be the way to go if was going to write a general solution. However in my case, I based my solution in the nature of my data and the logic behind them. In short terms, (1) I read the data as they come sequentially using the SqlDataReader; (2) I store some of the data I need in a dictionary and a Collection, when I am reading the first in row but second in logic ResultSet; (3) When I am Reading the third in row, but first in logic ResultSet I am iterating in through the collection I built earlier and based on the dictionary data I am building my final result.
The final code seems more efficient and it is more maintainable than using the async DataAdapter. However this is a very specific solution based on my data.
Provides a way of reading a forward-only stream of rows from a SQL
Server database.
https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqldatareader(v=vs.110).aspx
You need to use DataAdapter for disconnected and non-sequential access. To use this you just have to change bit of your ADO.NET code.
Instead of
SqlDataReader sqlReader = sqlCmd.ExecuteReader();
You need
DataTable dt = new DataTable();
SqlDataAdapter sqlAdapter = new SqlDataAdapter(sqlCmd);
sqlAdapter.Fill(dt);
If your SQL returns multiple result sets, you would use DataSet instead of DataTable, and then access result sets like ds.Tables[index_or_name].
https://msdn.microsoft.com/en-us/library/bh8kx08z(v=vs.110).aspx
No, this is not possible. The reason why is quite elementary: if a batch returns multiple results, it must return them in order -- the statement that returns result set #2 does not run before the one that returns result set #1, nor does the client have any way of saying "please just skip that first statement entirely" (as that could have dire consequences for the batch as a whole). Indeed, there's not even any way in general to tell how many result sets a batch will produce -- all of this is done at runtime, the server doesn't know in advance what will happen.
Since there's no way, server-side, to skip or index result sets, there's no meaningful way to do it client-side either. You're free to ignore the result sets streamed back to you, but you must still process them in order before you can move on -- and once you've moved on, you can't go back.
There are two possible global workarounds:
If you process all data and cache it locally (with a DataAdapter, for example) you can go back and forth in the data as you please, but this requires keeping all data in memory.
If you enable MARS (Multiple Active Result Sets) you can execute another query even as the first one is still processing. This does require splitting up your existing single batch code into individual statements (which, if you really can't change anything about the SQL at all, is not an option), but you could go through result sets at will (without caching). It would still not be possible for you to "go back" within a single result set, though.
I have a CSV file and I have to insert it into a SQL Server database. Is there a way to speed up the LINQ inserts?
I've created a simple Repository method to save a record:
public void SaveOffer(Offer offer)
{
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
// add new offer
if (dbOffer == null)
{
this.db.Offers.InsertOnSubmit(offer);
}
//update existing offer
else
{
dbOffer = offer;
}
this.db.SubmitChanges();
}
But using this method, the program is way much slower then inserting the data using ADO.net SQL inserts (new SqlConnection, new SqlCommand for select if exists, new SqlCommand for update/insert).
On 100k csv rows it takes about an hour vs 1 minute or so for the ADO.net way. For 2M csv rows it took ADO.net about 20 minutes. LINQ added about 30k of those 2M rows in 25 minutes. My database has 3 tables, linked in the dbml, but the other two tables are empty. The tests were made with all the tables empty.
P.S. I've tried to use SqlBulkCopy, but I need to do some transformations on Offer before inserting it into the db, and I think that defeats the purpose of SqlBulkCopy.
Updates/Edits:
After 18hours, the LINQ version added just ~200K rows.
I've tested the import just with LINQ inserts too, and also is really slow compared with ADO.net. I haven't seen a big difference between just inserts/submitchanges and selects/updates/inserts/submitchanges.
I still have to try batch commit, manually connecting to the db and compiled queries.
SubmitChanges does not batch changes, it does a single insert statement per object. If you want to do fast inserts, I think you need to stop using LINQ.
While SubmitChanges is executing, fire up SQL Profiler and watch the SQL being executed.
See question "Can LINQ to SQL perform batch updates and deletes? Or does it always do one row update at a time?" here: http://www.hookedonlinq.com/LINQToSQLFAQ.ashx
It links to this article: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx that uses extension methods to fix linq's inability to batch inserts and updates etc.
Have you tried wrapping the inserts within a transaction and/or delaying db.SubmitChanges so that you can batch several inserts?
Transactions help throughput by reducing the needs for fsync()'s, and delaying db.SubmitChanges will reduce the number of .NET<->db roundtrips.
Edit: see http://www.sidarok.com/web/blog/content/2008/05/02/10-tips-to-improve-your-linq-to-sql-application-performance.html for some more optimization principles.
Have a look at the following page for a simple walk-through of how to change your code to use a Bulk Insert instead of using LINQ's InsertOnSubmit() function.
You just need to add the (provided) BulkInsert class to your code, make a few subtle changes to your code, and you'll see a huge improvement in performance.
Mikes Knowledge Base - BulkInserts with LINQ
Good luck !
I wonder if you're suffering from an overly large set of data accumulating in the data-context, making it slow to resolve rows against the internal identity cache (which is checked once during the SingleOrDefault, and for "misses" I would expect to see a second hit when the entity is materialized).
I can't recall 100% whether the short-circuit works for SingleOrDefault (although it will in .NET 4.0).
I would try ditching the data-context (submit-changes and replace with an empty one) every n operations for some n - maybe 250 or something.
Given that you're calling SubmitChanges per isntance at the moment, you may also be wasting a lot of time checking the delta - pointless if you've only changed one row. Only call SubmitChanges in batches; not per record.
Alex gave the best answer, but I think a few things are being over looked.
One of the major bottlenecks you have here is calling SubmitChanges for each item individually. A problem I don't think most people know about is that if you haven't manually opened your DataContext's connection yourself, then the DataContext will repeatedly open and close it itself. However, if you open it yourself, and then close it yourself when you're absolutely finished, things will run a lot faster since it won't have to reconnect to the database every time. I found this out when trying to find out why DataContext.ExecuteCommand() was so unbelievably slow when executing multiple commands at once.
A few other areas where you could speed things up:
While Linq To SQL doesn't support your straight up batch processing, you should wait to call SubmitChanges() until you've analyzed everything first. You don't need to call SubmitChanges() after each InsertOnSubmit call.
If live data integrity isn't super crucial, you could retrieve a list of offer_id back from the server before you start checking to see if an offer already exists. This could significantly reduce the amount of times you're calling the server to get an existing item when it's not even there.
Why not pass an offer[] into that method, and doing all the changes in cache before submitting them to the database. Or you could use groups for submission, so you don't run out of cache. The main thing would be how long till you send over the data, the biggest time wasting is in the closing and opening of the connection.
Converting this to a compiled query is the easiest way I can think of to boost your performance here:
Change the following:
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
to:
Offer dbOffer = RetrieveOffer(offer.offer_id);
private static readonly Func<DataContext, int> RetrieveOffer
{
CompiledQuery.Compile((DataContext context, int offerId) => context.Offers.SingleOrDefault(o => o.offer_id == offerid))
}
This change alone will not make it as fast as your ado.net version, but it will be a significant improvement because without the compiled query you are dynamically building the expression tree every time you run this method.
As one poster already mentioned, you must refactor your code so that submit changes is called only once if you want optimal performance.
Do you really need to check if the record exist before inserting it into the DB. I thought it looked strange as the data comes from a csv file.
P.S. I've tried to use SqlBulkCopy,
but I need to do some transformations
on Offer before inserting it into the
db, and I think that defeats the
purpose of SqlBulkCopy.
I don't think it defeat the purpose at all, why would it? Just fill a simple dataset with all the data from the csv and do a SqlBulkCopy. I did a similar thing with a collection of 30000+ rows and the import time went from minutes to seconds
I suspect it isn't the inserting or updating operations that are taking a long time, rather the code that determines if your offer already exists:
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
If you look to optimise this, I think you'll be on the right track. Perhaps use the Stopwatch class to do some timing that will help to prove me right or wrong.
Usually, when not using Linq-to-Sql, you would have an insert/update procedure or sql script that would determine whether the record you pass already exists. You're doing this expensive operation in Linq, which certainly will never hope to match the speed of native sql (which is what's happening when you use a SqlCommand and select if the record exists) looking-up on a primary key.
Well you must understand linq creates code dynamically for all ADO operations that you do instead handwritten, so it will always take up more time then your manual code. Its simply an easy way to write code but if you want to talk about performance, ADO.NET code will always be faster depending upon how you write it.
I dont know if linq will try to reuse its last statement or not, if it does then seperating insert batch with update batch may improve performance little bit.
This code runs ok, and prevents large amounts of data:
if (repository2.GeoItems.GetChangeSet().Inserts.Count > 1000)
{
repository2.GeoItems.SubmitChanges();
}
Then, at the end of the bulk insertion, use this:
repository2.GeoItems.SubmitChanges();
Suppose I have an ADO.NET DataTable that I was to 'persist' by saving it to a new table in a SQL Server database - is there a fast way of doing this?
I realise I could write code generating the DDL for the 'CREATE TABLE' statement by looping through the DataColumns collection and working out the right type mappings and so on ... but I'm wondering if there is an existing method to do this, or a framework someone has written?
(NB: I need to be able to handle arbitrary columns, nothing too fancy like blobs; just common column types like strings, numbers, guids and dates. The program won't know what the columns in the DataTable are until run-time so they can't be hard-coded.)
ADO.net cannot create tables in SQL Server directly, however, SMO can do this with the .Create method of the Table class. Unfortunately, there is no built-in way to use a DataTable to define an SMO Table object.
Fortunately, Nick Tompson wrote just such a DataTable-to-SMO.Table routine back in 2006. It is posted as one of the replies to this MSDN forums topic http://social.msdn.microsoft.com/forums/en-US/adodotnetdataproviders/thread/4929a0a8-0137-45f6-86e8-d11e220048c3/ (edit: I can make hyperlinks now).
Note also, the reply post that shows how to add SQLBulkCopy to it.
If the table exists, you can use SqlBulkCopy (which will accept a DataTable) to get the data into the table in the fastest possible way (much faster than via an adapter). I don't think it will create the table though. You might have to write the DDL yourself, or find some existing code to loop over the DataTable.Columns to do it.
I think this post can help
I might be easier to store your DataTable as XML. just create a table with an XML column.