Combining multiple data readers into one - c#

I have some code (C#/ADO.NET) where I get 2 or more readers (IDataReader instances) - each of which could be a reader on multiple datasets meant to be enumerated through the NextResult API.
My task is to combine these into a single reader and return it to my caller, so that they can enumerate all the different results through this single reader - calling NextResult as necessary.
(Note that each of these datasets holds a different kind of data.)
This seems like a valid use case. Is there some way to do this?

Just for the fun of it I tried creating such a class, below. It would definitely be a hassle to test.
Before explaining why it would be a pain, I'll offer you an excuse to save yourself the trouble. If your class is creating IDataReaders, there's likely no good reason to pass them to the caller. You can just read the data from them and pass that to the caller. Is there any good reason why the callers need readers and not just the actual data? Opening a data reader and closing it is something you want to accomplish without getting too many hands in the process, so if you can open it, get what you need, and then close it, that's ideal.
When we advance from one result set to the next within an IDataReader it's usually in the context of making a single call, so it's easier to follow what the result sets are. We just called procedure XYZ, it returns two result sets, so we have to check for both result sets. I wouldn't want to deal with an IDataReader that had lots and lots of result sets, especially when it's a bunch of smaller ones artificially combined. You'd have to keep track of a lot of result sets and switch between different methods for reading them since they contain different columns.
And there's also the issue of those open connections. We usually close a connection when we're done with a reader. But now it's less clear when that connection would get closed. How do you know which connection belongs to which reader? What if you close the connection for a reader while a different reader is still using it?
So here's a rough idea of what it might look like. I obviously didn't test this. You would have to keep track of which one is current and handle advancing to the next reader if NextResult is called and there isn't a next result within the current reader. And you'd have to close the readers and make sure they all get disposed. This could be tested, but just the testing would be a headache, and that's often a good warning not to do something.
public class AggregateDataReader : IDataReader
{
    private readonly Queue<IDataReader> _readers;
    private IDataReader _current;

    public AggregateDataReader(IEnumerable<IDataReader> readers)
    {
        _readers = new Queue<IDataReader>(readers);
        // Make the first reader current so Read() works before any NextResult() call.
        AdvanceToNextReader();
    }

    private bool AdvanceToNextReader()
    {
        _current?.Dispose();
        var moreReaders = _readers.Count > 0;
        _current = moreReaders ? _readers.Dequeue() : null;
        return moreReaders;
    }

    public bool NextResult()
    {
        if (_current == null) return false;
        // Exhaust the current reader's result sets before moving to the next reader.
        if (_current.NextResult()) return true;
        return AdvanceToNextReader();
    }

    public bool Read()
    {
        return _current != null && _current.Read();
    }

    public void Dispose()
    {
        _current?.Dispose();
        while (_readers.Count > 0) _readers.Dequeue().Dispose();
    }

    public string GetName(int i)
    {
        return _current.GetName(i);
    }

    ... lots of these...

    public byte GetByte(int i)
    {
        return _current.GetByte(i);
    }

    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length)
    {
        return _current.GetBytes(i, fieldOffset, buffer, bufferoffset, length);
    }

    ... etc...

    public void Close()
    {
        _current?.Close();
        while (_readers.Count > 0) _readers.Dequeue().Close();
    }

    ... etc...
}
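For what it's worth, here is a hedged sketch of how a caller might consume such a reader; GetCustomersReader and GetOrdersReader are hypothetical stand-ins for whatever actually produces your IDataReader instances:
using (IDataReader combined = new AggregateDataReader(
    new[] { GetCustomersReader(), GetOrdersReader() })) // hypothetical sources
{
    do
    {
        while (combined.Read())
        {
            // Each result set can have a different shape, so the caller has to
            // know which result set it is currently positioned on.
            Console.WriteLine(combined.GetValue(0));
        }
    } while (combined.NextResult());
}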

Related

Lock on Static List or access by Key

Please give an expert opinion on the static sorted list below, which stores connections by key-value pair.
Method1 closes a connection by looking it up in the sorted list by key.
Method2 closes a connection inside a lock statement on the sorted list, accessing it by index.
Please advise which approach is better when thousands of users are simultaneously creating thousands of connections in a web application. Note that accessing by index without locking can raise an index-out-of-bounds exception.
internal class ConnA
{
    internal static SortedList slCons = new SortedList();

    internal static bool CreateCon(string ConnID)
    {
        string constring = "sqlconnectionstring_containing_DataSource_UserInfo_InitialCatalog";
        SqlConnection objSqlCon = new SqlConnection(constring);
        objSqlCon.Open();
        bool connSuccess = objSqlCon.State == ConnectionState.Open;
        if (connSuccess && slCons.ContainsKey(ConnID) == false)
        {
            slCons.Add(ConnID, objSqlCon);
        }
        return connSuccess;
    }

    // Method1 (alternative: access by key, no locking; only one of these would exist)
    internal static void CloseConnection(string ConnID)
    {
        if (slCons.ContainsKey(ConnID))
        {
            SqlConnection objSqlCon = slCons[ConnID] as SqlConnection;
            objSqlCon.Close();
            objSqlCon.ResetStatistics();
            objSqlCon.Dispose();
            slCons.Remove(ConnID);
        }
    }

    // Method2 (alternative: lock the list, access by index)
    internal static void CloseConnection(string ConnID)
    {
        lock (slCons)
        {
            int nIndex = slCons.IndexOfKey(ConnID);
            if (nIndex != -1)
            {
                SqlConnection objSqlCon = (SqlConnection)slCons.GetByIndex(nIndex);
                objSqlCon.Close();
                objSqlCon.ResetStatistics();
                objSqlCon.Dispose();
                slCons.RemoveAt(nIndex);
            }
        }
    }
}

internal class UserA
{
    public string ConnectionID { get { return HttpContext.Current.Session.SessionID; } }

    public void ConnectDB()
    {
        ConnA.CreateCon(ConnectionID);
    }

    public void DisConnectDB()
    {
        ConnA.CloseConnection(ConnectionID);
    }
}
Access to the SortedList isn't thread safe.
In CreateCon, two threads could access this simultaneously:
if (connSuccess && slCons.ContainsKey(ConnID) == false)
Both threads could determine that the key isn't present, and then both threads try to add it, so that one of them fails.
In method 2:
When this is called - slCons.RemoveAt(nIndex); - the lock guarantees that another call to the same method won't remove another connection, which is good. But nothing guarantees that another thread won't call CreateCon and insert a new connection, changing the indexes so that nIndex now refers to a different item in the collection. You would end up closing, disposing, and removing the wrong connection, likely one that another thread was still using.
It looks like you're attempting an orchestration which ensures that a single connection will be reused across multiple operations. But there's no need to introduce that complication. Whatever class or method needs a connection, there's no need for it to collaborate with this collection and these methods. You can just let each of them open a connection when it needs one and dispose the connection when it's done.
That sounds expensive, but that's why the framework implements connection pooling. From the perspective of your code, connections are being created, opened, closed, and disposed.
But behind the scenes, the "closed" connection isn't really closed, at least not right away. It's actually kept open. If, in a brief period, you "open" another connection with the same connection string, you're actually getting the same connection again, which is still open. That's how the number of connections opened and closed is reduced without us having to manage it manually.
That in turn prevents us from having to do what it looks like you're doing. This might be different if we were opening a transaction on a connection and then had to ensure that multiple operations were performed on the same connection. But even then it would likely be clearer and simpler to pass around the connection, not an ID.
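To make that concrete, here is a minimal sketch of the per-operation pattern that pooling makes cheap; the connection string and query are placeholders:
public int GetOrderCount()
{
    // Open, use, and dispose a connection per operation. The repeated
    // open/close is cheap because the pool reuses the underlying connection.
    using (var connection = new SqlConnection("your_connection_string_here"))
    using (var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
    {
        connection.Open();
        return (int)command.ExecuteScalar();
    } // Dispose "closes" the connection, i.e. returns it to the pool.
}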

Parallel.ForEach: Best way to save off a collection when its record count gets high?

So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently, if the record count is above a certain threshold, I save the data in the current collection within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});

if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to the output collection? If not, is there a better way to do this?
Your lock will work correctly but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // finally block gets executed after each thread exits
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there are too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    protected BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            // Drain whatever is in the buffer right now.
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Count > 0)
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, Flush can keep dequeuing records even as other threads add more.
What WriteRecords does could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind, because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. The _flushing flag will avoid most concurrent executions, and it's okay if concurrent executions both dequeue from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split in two: one as a buffer, and another as the "final" writer that sends them to a database, a file, or some other destination.
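As an illustration of that split (a sketch with hypothetical names, not code from the question), the "final" writer could be as small as this:
// Hypothetical concrete writer: sends each flushed batch to the console.
public class ConsoleRecordWriter<T> : BufferedRecordWriter<T>
{
    public ConsoleRecordWriter(int maxCapacity = 100) : base(maxCapacity) { }

    protected override void WriteRecords(IEnumerable<T> records)
    {
        foreach (var record in records)
            Console.WriteLine(record);
    }
}
The parallel loop then only knows about the interface:
IRecordWriter<OutRecord> writer = new ConsoleRecordWriter<OutRecord>(50000);
Parallel.ForEach(inputrecords, input => writer.WriteRecord(CreateOutputRecord(input)));
writer.Flush(); // flush whatever is left in the buffer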

implementing a c# read write lock where some reads produce writes

I have to implement some .NET code involving a shared resource accessed by different threads. In principle, this could be solved with a simple read-write lock. However, my solution requires that some of the read accesses end up producing a write operation. I first checked ReaderWriterLockSlim, but by itself it does not solve the problem, because it requires that I know in advance whether a read operation can turn into a write operation, and that is not my case. I finally opted to simply use a ReaderWriterLockSlim and, when the read operation "detects" that it needs to do a write, release the read lock and acquire the write lock. I am not sure whether there is a better solution, or even whether this solution could lead to some synchronization issue (I have experience with Java, but I am fairly new to .NET).
Below some sample code illustrating my solution:
public class MyClass
{
    private int[] data;
    private readonly ReaderWriterLockSlim syncLock = new ReaderWriterLockSlim();

    public void modifyData()
    {
        try
        {
            syncLock.EnterWriteLock();
            // clear my array and read from database...
        }
        finally
        {
            syncLock.ExitWriteLock();
        }
    }

    public int readData(int index)
    {
        try
        {
            syncLock.EnterReadLock();
            // some initial preprocessing of the arguments
            try
            {
                syncLock.ExitReadLock();
                syncLock.EnterWriteLock();
                // check if a write is needed <--- this operation is fast, and, in most cases, the result will be false
                // if true, perform the write operation
            }
            finally
            {
                syncLock.ExitWriteLock();
                syncLock.EnterReadLock();
            }
            return data[index];
        }
        finally
        {
            syncLock.ExitReadLock();
        }
    }
}
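For reference, ReaderWriterLockSlim also offers an upgradeable mode (EnterUpgradeableReadLock) aimed at exactly this read-that-may-become-a-write case; the caveat is that only one thread may hold the upgradeable lock at a time, which may be why it doesn't fit here. A minimal sketch, with WriteIsNeeded as a hypothetical fast check:
public int readDataUpgradeable(int index)
{
    // Only one thread can hold the upgradeable lock at a time; plain readers
    // can still run concurrently alongside it.
    syncLock.EnterUpgradeableReadLock();
    try
    {
        if (WriteIsNeeded(index)) // hypothetical fast check
        {
            syncLock.EnterWriteLock(); // legal while holding the upgradeable lock
            try
            {
                // perform the write operation
            }
            finally
            {
                syncLock.ExitWriteLock();
            }
        }
        return data[index];
    }
    finally
    {
        syncLock.ExitUpgradeableReadLock();
    }
}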

SSMS SMO Objects: Get query results

I came across this tutorial to understand how to execute SQL scripts with GO statements.
Now I want to know how I can get the output of the Messages tab.
With several GO statements, the output would be like this:
1 rows affected
912 rows affected
...
But server.ConnectionContext.ExecuteNonQuery() can only return an int, while I need all the text. If there is an error in some part of the query, that should also appear in the output.
Any help would be appreciated.
The easiest thing is possibly to just print the number you get back from ExecuteNonQuery():
int rowsAffected = server.ConnectionContext.ExecuteNonQuery(/* ... */);
if (rowsAffected != -1)
{
    Console.WriteLine("{0} rows affected.", rowsAffected);
}
This should work, but will not honor the SET NOCOUNT setting of the current session/scope.
Otherwise you would do it like you would with "plain" ADO.NET: don't use the ServerConnection.ExecuteNonQuery() method, but create a SqlCommand object by accessing the underlying SqlConnection object, and on that subscribe to the StatementCompleted event.
using (SqlCommand command = server.ConnectionContext.SqlConnectionObject.CreateCommand())
{
    // Set other properties for "command", like CommandText, etc.
    command.StatementCompleted += (s, e) =>
    {
        Console.WriteLine("{0} row(s) affected.", e.RecordCount);
    };
    command.ExecuteNonQuery();
}
Using StatementCompleted (instead of, say, manually printing the value that ExecuteNonQuery() returned) has the benefit that it works exactly like SSMS or SQLCMD.EXE would:
For commands that do not have a ROWCOUNT it will not be called at all (e.g. GO, USE).
If SET NOCOUNT ON was set, it will not be called at all.
If SET NOCOUNT OFF was set, it will be called for every statement inside a batch.
(Sidebar: it looks like StatementCompleted is exactly what the TDS protocol talks about when DONE_IN_PROC event is mentioned; see Remarks of the SET NOCOUNT command on MSDN.)
Personally, I have used this approach with success in my own "clone" of SQLCMD.EXE.
UPDATE: It should be noted that this approach (of course) requires you to manually split the input script/statements at the GO separator, because you're back to using SqlCommand.Execute*(), which cannot handle multiple batches at a time. For this, there are multiple options:
Manually split the input on lines starting with GO (caveat: GO can be called like GO 5, for example, to execute the previous batch 5 times); a naive sketch of this follows below.
Use the ManagedBatchParser class/library to help you split the input into single batches, especially implement ICommandExecutor.ProcessBatch with the code above (or something resembling it).
I chose the latter option, which was quite some work, given that it is not well documented and examples are rare (google a bit and you'll find some stuff, or use Reflector to see how the SMO assemblies use that class).
The benefit (and maybe burden) of using the ManagedBatchParser is that it will also parse all the other constructs of T-SQL scripts (intended for SQLCMD.EXE) for you, including :setvar, :connect, :quit, etc. You don't have to implement the respective ICommandExecutor members if your scripts don't use them, of course. But mind you that you may not be able to execute "arbitrary" scripts.
Well, where did that leave you? From the "simple question" of how to print "... rows affected" to the realization that it is not trivial to do in a robust and general manner (given the background work required). YMMV, good luck.
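If you do go the manual route, a naive splitter might look like the sketch below; it deliberately ignores the GO n repeat count and GO appearing inside comments or string literals, which is exactly the kind of thing the ManagedBatchParser handles for you:
// Naive sketch: split a script on lines consisting solely of "GO".
private static IEnumerable<string> SplitBatches(string script)
{
    var batch = new StringBuilder();
    foreach (string line in script.Split('\n'))
    {
        if (line.Trim().Equals("GO", StringComparison.OrdinalIgnoreCase))
        {
            if (batch.Length > 0)
            {
                yield return batch.ToString();
                batch.Clear();
            }
        }
        else
        {
            batch.AppendLine(line.TrimEnd('\r'));
        }
    }
    if (batch.Length > 0)
        yield return batch.ToString();
}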
Update on ManagedBatchParser Usage
There seems to be no good documentation or example of how to implement IBatchSource, so here is what I went with.
internal abstract class BatchSource : IBatchSource
{
    private string m_content;

    public void Populate()
    {
        m_content = GetContent();
    }

    public void Reset()
    {
        m_content = null;
    }

    protected abstract string GetContent();

    public ParserAction GetMoreData(ref string str)
    {
        str = null;
        if (m_content != null)
        {
            str = m_content;
            m_content = null;
        }
        return ParserAction.Continue;
    }
}

internal class FileBatchSource : BatchSource
{
    private readonly string m_fileName;

    public FileBatchSource(string fileName)
    {
        m_fileName = fileName;
    }

    protected override string GetContent()
    {
        return File.ReadAllText(m_fileName);
    }
}

internal class StatementBatchSource : BatchSource
{
    private readonly string m_statement;

    public StatementBatchSource(string statement)
    {
        m_statement = statement;
    }

    protected override string GetContent()
    {
        return m_statement;
    }
}
And this is how you would use it:
var source = new StatementBatchSource("SELECT GETUTCDATE()");
source.Populate();
var parser = new Parser();
parser.SetBatchSource(source);
/* other parser.Set*() calls */
parser.Parse();
Note that both implementations, whether for direct statements (StatementBatchSource) or for a file (FileBatchSource), have the problem that they read the complete text into memory at once. I had one case where that blew up: a huge(!) script with gazillions of generated INSERT statements. Even though I don't think that is a practical issue, SQLCMD.EXE could handle it. But for the life of me, I couldn't figure out exactly how you would need to form the chunks returned from GetMoreData() so that the parser can still work with them (it looks like they would need to be complete statements, which would sort of defeat the purpose of the parser in the first place...).

How to detect EOF on DataReader in C# without executing Read()

I'm familiar with using .Read() to detect EOF:
using (IDataReader reader = SqlHelper.ExecuteReader(_connectionString, "dbo.GetOrders"))
{
    AssertOrder(reader);
    while (reader.Read())
    {
        yield return FillRecord<Order>(reader, StringComparer.OrdinalIgnoreCase);
    }
    reader.Close();
}
Due to some weird situation I got myself into, the FillRecord actually advances the reader. So now the .Read() in the while loop actually causes this function to skip some rows -- because we are advancing twice.
I wish there was an IDataReader.EOF, but there isn't. Any thoughts?
I think I probably made my opinion clear in the comments... but, since you asked for "Any Thoughts"... here ya go:
Obviously a two-second hack job, but you'll get the idea.
You'd surely want to implement IDisposable (since I think IDataReader does?), etc. Probably just make the class implement IDataReader and act as a facade. Or get real cool and implement a transparent proxy which hijacks the Read method.
public class DataReaderWithEOF
{
    public bool EOF { get; private set; }

    private readonly IDataReader reader;

    public DataReaderWithEOF(IDataReader reader)
    {
        this.reader = reader;
    }

    public bool Read()
    {
        bool result = reader.Read();
        this.EOF = !result;
        return result;
    }
}
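A hedged usage sketch, assuming the wrapper were fleshed out to implement IDataReader (and therefore IDisposable) as suggested above, so that FillRecord can accept it and its internal reads update EOF:
using (var wrapped = new DataReaderWithEOF(
    SqlHelper.ExecuteReader(_connectionString, "dbo.GetOrders")))
{
    wrapped.Read(); // prime the first record
    while (!wrapped.EOF)
    {
        // FillRecord advances the reader itself, updating EOF as it goes.
        yield return FillRecord<Order>(wrapped, StringComparer.OrdinalIgnoreCase);
    }
}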
using (IDataReader reader = SqlHelper.ExecuteReader(_connectionString, "dbo.GetOrders"))
{
    if (!reader.HasRows)
    {
        Response.Write("EOF"); // empty result set
    }
    while (reader.Read()) // each Read() advances one record
    {
        yield return FillRecord<Order>(reader, StringComparer.OrdinalIgnoreCase);
    }
    reader.Close();
}
Every time you call Read() it advances to the next record, so with if (reader.Read()) you would already have advanced a record.
An alternative to Steve's solution is to call reader.Close() in FillRecord when Read() returns false. Then you can check reader.IsClosed in your main loop.
