I've created a file processing service that reads and imports XML files from a specific directory.
The service starts several workers that poll a file queue for new files and use LINQ to SQL for data access. Each worker thread has its own DataContext.
The files being processed contain several orders, and each order contains several addresses (Customer/Contractor/Subcontractor).
I've defined a transactionscope around the handling of each file. This way I want to ensure that the whole file is handled correctly, or that the whole file is rolled back when an exception occurs:
try
{
using (var tx = new TransactionScope(TransactionScopeOption.RequiresNew))
{
foreach (var order in orders)
{
HandleType1Order(order);
}
tx.Complete();
}
}
catch (SqlException ex)
{
if (ex.Number == SqlErrorNumbers.Deadlock)
{
throw new FileHandlerException("File Caused a Deadlock, retrying later", ex, true);
}
else
throw;
}
One of the requirements for the service is that it creates or updates the addresses found in the XML files. So I've created an address service which is responsible for address management. The following piece of code is executed for each order (within the method HandleType1Order()) in the XML import file, and is thus part of the TransactionScope for the entire file:
using (var tx = new TransactionScope())
{
address = GetAddressByReference(number);
if (address != null) //address is already known
{
Log.Debug("Found address {0} - {1}. Updating...", address.Code, address.Name);
UpdateAddress(address, name, number, isContractor, isSubContractor, isCustomer);
}
else
{
//address not known, so create it
Log.Debug("Address {0} not known, creating address", number);
address = CreateAddress(name, number, sourceSystemId, isContractor, isSubContractor,
isCustomer);
_addressRepository.Save(address);
}
_addressRepository.Flush();
tx.Complete();
}
What I'm trying to do here, is to create or update an address, with the number being unique.
The method GetAddressByReference(string number) returns a known address or null when an address is not found.
public virtual Address GetAddressByReference(string reference)
{
return _addressRepository.GetAll().SingleOrDefault(a=>a.Code==reference);
}
When I run the service, however, it creates multiple addresses with the same number. The method GetAddressByReference() gets called concurrently and should return a known address when a second thread executes the method with the same address number; instead it returns null. There is probably something wrong with my transaction boundaries or isolation level, but I can't seem to get it to work.
Can someone point me in the right direction? Help is much appreciated!!
p.s. I've no problem with the transactions being deadlocked and causing a rollback, the file will just be retried when a deadlock occurs.
Edit 1 Threading code:
public void Work()
{
_isRunning = true;
while (true)
{
ImportFileTask task = _queue.Dequeue(); //dequeue blocks on empty queue
if (task == null)
break; //Shutdown worker when a null task is read from the queue
IFileImporter importer = null;
try
{
using (new LockFile(task.FilePath).Acquire()) //create a file lock to sync access to the file across all processes
{
importer = _kernel.Resolve<IFileImporter>();
Log.DebugFormat("Processing file {0}", task.FilePath);
importer.Import(task.FilePath);
Log.DebugFormat("Done Processing file {0}", task.FilePath);
}
}
catch(Exception ex)
{
Log.Fatal(
"A Fatal exception occured while handling {0} --> {1}".FormatWith(task.FilePath, ex.Message), ex);
}
finally
{
if (importer != null)
_kernel.ReleaseComponent(importer);
}
}
_isRunning = false;
}
The above method runs in all of our worker threads. It uses Castle Windsor to resolve the FileImporter, which has a transient lifestyle (and is thus not shared across threads).
You didn't post your threading code, so it's difficult to say what the issue is. I'm assuming you have started DTC (Distributed Transaction Coordinator)?
Are you using a ThreadPool? Are you using the "lock" keyword?
http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
Related
I'm debugging an existing windows service (written in C#) that needs to be manually restarted every few months because it keeps eating memory.
The service is not very complicated. It requests a json file from an external server, which holds products.
Next it parses this json file into a list of products.
For each of these products it checks whether the product already exists in the database. If not, it is added; if it does exist, its properties are updated.
The database is a PostgreSQL database and we use NHibernate v3.2.0 as ORM.
I've been using JetBrains DotMemory to profile the service when it runs:
The service starts and after 30s it starts doing its work. SnapShot #1 is made before the first run.
Snapshot #6 was made after the 5th run.
The other snapshots are also made after a run.
As you can see after each run the number of objects increases with approx. 60k and the memory used increases with a few MBs after every run.
Looking closer at Snapshot #6, shows the retained size is mostly used by NHibernate session objects:
Here's my OnStart code:
try
{
// Trying to fix certificate errors:
ServicePointManager.ServerCertificateValidationCallback += delegate
{
_logger.Debug("Cert validation work around");
return true;
};
_timer = new Timer(_interval)
{
AutoReset = false // makes it fire only once, restart when work is done to prevent multiple runs
};
_timer.Elapsed += DoServiceWork;
_timer.Start();
}
catch (Exception ex)
{
_logger.Error("Exception in OnStart: " + ex.Message, ex);
}
And my DoServiceWork:
try
{
// Call execute
var processor = new SAPProductProcessor();
processor.Execute();
}
catch (Exception ex)
{
_logger.Error("Error in DoServiceWork", ex);
}
finally
{
// Next round:
_timer.Start();
}
In SAPProductProcessor I use two db calls. Both in a loop.
I loop through all products from the JSON file and check if the product is already in the table using the product code:
ProductDto dto;
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction(IsolationLevel.ReadCommitted))
{
var criteria = session.CreateCriteria<ProductDto>();
criteria.Add(Restrictions.Eq("Code", code));
dto = criteria.UniqueResult<ProductDto>();
transaction.Commit();
}
}
return dto;
And when the productDto is updated I save it using:
using (var session = SessionFactory.OpenSession())
{
using (var transaction = session.BeginTransaction(IsolationLevel.ReadCommitted))
{
session.SaveOrUpdate(item);
transaction.Commit();
}
}
I'm not sure how to change the code above to stop the memory and the number of objects from increasing.
I already tried using var session = SessionFactory.GetCurrentSession(); instead of using (var session = SessionFactory.OpenSession()) but that didn't stop the increase of memory.
Update
In the constructor of my data access class MultiSessionFactoryProvider sessionFactoryProvider is injected. And the base class is called with : base(sessionFactoryProvider.GetFactory("data")). This base class has a method BeginSession:
ISession session = _sessionFactory.GetCurrentSession();
if (session == null)
{
session = _sessionFactory.OpenSession();
ThreadLocalSessionContext.Bind(session);
}
And a EndSession:
ISession session = ThreadLocalSessionContext.Unbind(_sessionFactory);
if (session != null)
{
session.Close();
}
In my data access class I call base.BeginSession at the start and base.EndSession at the end.
The suggestion about the Singleton made me have a closer look at my data access class.
I thought that creating this class on every run would free the NHibernate memory when it goes out of scope. I even added some Dispose calls in the class's destructor. But that didn't work, or more likely I'm not doing it correctly.
I now save my data access class in a static field and reuse it. Now my memory doesn't increase anymore and, more importantly, the number of open objects stays the same. I just ran the service under DotMemory again for over an hour, triggering around 150 runs: the memory in the last snapshot is still around 105MB, the number of objects is still 117k, and my SessionFactory dictionary is now just 4MB instead of 150*4MB.
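The fix described above boils down to building the expensive object once per process and reusing it on every run. A minimal sketch of that idea with `Lazy<T>` follows. `ExpensiveFactory` and `DataAccess` are hypothetical stand-ins (they are not the classes from the service); in the real code the expensive object would be whatever owns the session factories.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an object that is expensive to build and holds
// a lot of memory (in the service above: the class owning the session factories).
public sealed class ExpensiveFactory
{
    private static int _instancesCreated;

    // Number of times the constructor has run in this process.
    public static int InstancesCreated => _instancesCreated;

    public ExpensiveFactory()
    {
        Interlocked.Increment(ref _instancesCreated);
    }
}

public static class DataAccess
{
    // Lazy<T> in its default mode (ExecutionAndPublication) runs the factory
    // delegate at most once, even if several threads race on Value.
    private static readonly Lazy<ExpensiveFactory> _factory =
        new Lazy<ExpensiveFactory>(() => new ExpensiveFactory());

    // Every run of the service asks for the same shared instance.
    public static ExpensiveFactory Factory => _factory.Value;
}
```

With this shape, each timer tick can read `DataAccess.Factory` and the constructor still only runs once, which matches the observation that the dictionary stays at one copy instead of one per run.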
I have a service layer project in an ASP.NET MVC 5 application on .NET 4.5.2 which calls out to an external third-party WCF service to get information asynchronously. An original method that calls the external service is shown below. There are three of these in total, all similar, which I call in order from my GetInfoFromExternalService method (it isn't actually called that; I'm just naming it for illustration):
private async Task<string> GetTokenIdForCarsAsync(Car[] cars)
{
try
{
if (_externalpServiceClient == null)
{
_externalpServiceClient = new ExternalServiceClient("WSHttpBinding_IExternalService");
}
string tokenId = await _externalpServiceClient.GetInfoForCarsAsync(cars).ConfigureAwait(false);
return tokenId;
}
catch (Exception ex)
{
//TODO plug in log 4 net
throw new Exception("Failed" + ex.Message);
}
finally
{
CloseExternalServiceClient(_externalpServiceClient);
_externalpServiceClient= null;
}
}
This meant that when each async call completed, the finally block ran: the WCF client was closed and set to null, and then newed up again when another request was made. This was working fine until a change was needed: if the number of cars passed in by the user exceeds 1000, I split them into batches and call my GetInfoFromExternalService method in a WhenAll with each batch of 1000, as below:
if (cars.Count > 1000)
{
const int packageSize = 1000;
var packages = SplitCarss(cars, packageSize);
//kick off the number of split packages we got above in Parallel and await until they all complete
await Task.WhenAll(packages.Select(GetInfoFromExternalService));
}
However, this now falls over. If I have 3000 cars, the first call to GetTokenIdForCarsAsync news up the WCF client, but its finally block closes it, so the second batch of 1000 that is attempting to run throws an exception. If I remove the finally block the code works OK, but it is obviously not good practice to leave this WCF client unclosed.
I tried closing it after my if/else block where cars.Count is evaluated. But if a user uploads, for example, 2000 cars and that takes, say, 1 minute to complete, then in the meantime (since the user still has control of the web page) they could upload another 2000, or another user could upload, and again it falls over with an exception.
Is there a good way anyone can see to correctly close the External Service Client?
Based on the related question of yours, your "split" logic doesn't seem to give you what you're trying to achieve. WhenAll still executes requests in parallel, so you may end up running more than 1000 requests at any given moment of time. Use SemaphoreSlim to throttle the number of simultaneously active requests and limit that number to 1000. This way, you don't need to do any splits.
Another issue might be in how you handle the creation/disposal of the ExternalServiceClient client. I suspect there might be a race condition there.
Lastly, when you re-throw from the catch block, you should at least include a reference to the original exception.
Here's how to address these issues (untested, but should give you the idea):
const int MAX_PARALLEL = 1000;
SemaphoreSlim _semaphoreSlim = new SemaphoreSlim(MAX_PARALLEL);
volatile int _activeClients = 0;
readonly object _lock = new Object();
ExternalServiceClient _externalpServiceClient = null;
ExternalServiceClient GetClient()
{
lock (_lock)
{
if (_activeClients == 0)
_externalpServiceClient = new ExternalServiceClient("WSHttpBinding_IExternalService");
_activeClients++;
return _externalpServiceClient;
}
}
void ReleaseClient()
{
lock (_lock)
{
_activeClients--;
if (_activeClients == 0)
{
_externalpServiceClient.Close();
_externalpServiceClient = null;
}
}
}
private async Task<string> GetTokenIdForCarsAsync(Car[] cars)
{
var client = GetClient();
try
{
await _semaphoreSlim.WaitAsync().ConfigureAwait(false);
try
{
string tokenId = await client.GetInfoForCarsAsync(cars).ConfigureAwait(false);
return tokenId;
}
catch (Exception ex)
{
//TODO plug in log 4 net
throw new Exception("Failed" + ex.Message, ex);
}
finally
{
_semaphoreSlim.Release();
}
}
finally
{
ReleaseClient();
}
}
Updated based on the comment:
the External WebService company can accept me passing up to 5000 car
objects in one call - though they recommend splitting into batches of
1000 and run up to 5 in parallel at one time - so when I mention 7000
- I dont mean GetTokenIdForCarAsync would be called 7000 times - with my code currently it should be called 7 times - i.e giving me back 7
token ids - I am wondering can I use your semaphore slim to run first
5 in parallel and then 2
The changes are minimal (but untested). First:
const int MAX_PARALLEL = 5;
Then, using Marc Gravell's ChunkExtension.Chunkify, we introduce GetAllTokenIdForCarsAsync, which in turn will be calling GetTokenIdForCarsAsync from above:
private async Task<string[]> GetAllTokenIdForCarsAsync(Car[] cars)
{
var results = new List<string>();
var chunks = cars.Chunkify(1000);
var tasks = chunks.Select(chunk => GetTokenIdForCarsAsync(chunk)).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(task => task.Result).ToArray();
}
Now you can pass all 7000 cars into GetAllTokenIdForCarsAsync. This is a skeleton, it can be improved with some retry logic if any of the batch requests has failed (I'm leaving that up to you).
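The `Chunkify` extension itself isn't shown here. As a rough idea of what such a helper does, here is a minimal sketch of a batching extension. `ChunkSketch.InChunksOf` is a hypothetical name and this is not Marc Gravell's implementation; it only assumes that each batch should come back as an array so it can be passed to a method taking `Car[]`.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkSketch
{
    // Splits 'source' into consecutive arrays of at most 'size' items.
    // The last chunk may be shorter. Hypothetical helper, not the
    // ChunkExtension.Chunkify implementation referenced above.
    public static IEnumerable<T[]> InChunksOf<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    // Separate iterator method so the argument checks above run eagerly.
    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int size)
    {
        var buffer = new List<T>(size);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray(); // copy, so Clear() below is safe
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            yield return buffer.ToArray(); // trailing partial chunk
    }
}
```

With 7000 items and a size of 1000 this yields 7 arrays, so the token method would be called 7 times, with the semaphore letting at most MAX_PARALLEL of those calls be in flight at once.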
I recently inherited some complex code that unfortunately I do not fully understand. It handles a large number of inventory records, reading them from and writing them to a database. The solution is extremely large and advanced, and I am still fairly new to C#. The issue I am encountering is that periodically the program throws an IO exception. It doesn't actually return a failure code, but it messes up our output data.
The try/catch block is as follows:
private static void ReadRecords(OleDbRecordReader recordReader, long maxRows, int executionTimeout, BlockingCollection<List<ProcessRecord>> processingBuffer, CancellationTokenSource cts, Job theStack, string threadName) {
ProcessRecord rec = null;
try {
Thread.CurrentThread.Name = threadName;
if(null == cts)
throw new InvalidOperationException("Passed CancellationToken was null.");
if(cts.IsCancellationRequested)
throw new InvalidOperationException("Passed CancellationToken is already been cancelled.");
long reportingFrequency = (maxRows <250000)?10000:100000;
theStack.FireStatusEvent("Opening "+ threadName);
recordReader.Open(maxRows, executionTimeout);
theStack.FireStatusEvent(threadName + " Opened");
theStack.FireInitializationComplete();
List<ProcessRecord> inRecs = new List<ProcessRecord>(500);
ProcessRecord priorRec = rec = recordReader.Read();
while(null != priorRec) { //-- note that this is priorRec, not Rec. We process one row in arrears.
if(cts.IsCancellationRequested)
theStack.FireStatusEvent(threadName + " cancelling due to request or error.");
cts.Token.ThrowIfCancellationRequested();
if(rec != null) //-- We only want to count the loop when there actually is a record.
theStack.RecordCountRead++;
if(theStack.RecordCountRead % reportingFrequency == 0)
theStack.FireProgressEvent();
if((rec != null) && (priorRec.SKU == rec.SKU) && (priorRec.Store == rec.Store) && (priorRec.BatchId == rec.BatchId))
inRecs.Add(rec); //-- just store it and keep going
else { //-- otherwise, we need to process it
processingBuffer.Add(inRecs.ToList(),cts.Token); //-- note that we don't enqueue the original LIST! That could be very bad.
inRecs.Clear();
if(rec != null) //-- Again, we need this check here to ensure that we don't try to enqueue the EOF null record.
inRecs.Add(rec); //-- Now, enqueue the record that fired this condition and start the loop again
}
priorRec = rec;
rec = recordReader.Read();
} //-- end While
}
catch(OperationCanceledException) {
theStack.FireStatusEvent(threadName +" Canceled.");
}
catch(Exception ex) {
theStack.FireExceptionEvent(ex);
theStack.FireStatusEvent("Error in RecordReader. Requesting cancellation of other threads.");
cts.Cancel(); // If an exception occurs, notify all other pipeline stages, then rethrow
// throw; //-- This will also propagate Cancellation, but that's OK
}
}
In the log of our job we see the output loader stopping and the exception is
System.Core: Pipe is broken.
Does any one have any ideas as to what may cause this? More importantly, the individual who made this large-scale application is no longer here. When I debug all of my applications, I am able to add break points in the solution and do the standard VS stepping through everything to find the issue. However, this application is huge and has a GUI that pops up when I debug the application. I believe the GUI was made for testing purposes, but it hinders me from actually being able to step through everything. However when the .exe is run from our actual job stream, there is no GUI it just executes the way it's supposed to.
The help I am asking for is 2 things:
1) Just suggestions as to what may cause this. Could an OleDB driver be the cause? The reason I ask is that I have this running on 2 different servers, one test and one not. The one with the newer OleDB driver version does not fail (7.0, I believe, whereas the one where it fails has 6.0).
2) Is there any code that I could add that may give me a better indication as to what may be causing the broken pipe? The error only happens periodically. If I run the job again right after, it may not happen. I'd say it throws the exception 30-40% of the time.
If you have any additional questions about the structure just let me know.
I have a real time app that tracks assets around a number of sites across the country. As part of this solution I have 8 client apps that update a central server.
My question is that sometimes the apps lose connection to the central server and I am wondering what is the best way to deal with this ? I know I could just increase the max send/receive times to deal with the timeout BUT I also want a graceful solution to deal with if the connection to the server is down:
For example I'm calling my services like this :
using (var statusRepository = new StatusRepositoryClient.StatusRepositoryClient())
{
statusId = statusRepository.GetIdByName(licencePlateSeen.CameraId.ToString());
}
I was thinking of adding a try/catch so...
using (var statusRepository = new StatusRepositoryClient.StatusRepositoryClient())
{
try
{
statusId = statusRepository.GetIdByName(licencePlateSeen.CameraId.ToString());
}
catch (TimeoutException timeout)
{
LogMessage(timeout);
}
catch (CommunicationException comm)
{
LogMessage(comm);
}
}
Dealing it this way doesn't allow me to rerun the code without having a ton of code repeat. Any one got any suggestions ?
EDIT: Looking into Sixto Saez's and user24601's answers, having an overall solution is better than dealing with timeouts at the individual exception level, BUT I was thinking that the code below would solve my problem (though it would add a TON of extra error-handling code):
void Method(int statusId)
{
var statusRepository = new StatusRepositoryClient.StatusRepositoryClient();
try
{
IsServerUp();
statusId = statusRepository.GetIdByName(licencePlateSeen.CameraId.ToString());
statusRepository.Close();
}
catch (Exception ex)
{
statusRepository.Abort();
if (ex is TimeoutException || ex is CommunicationException)
{
LogMessage(ex);
Method(statusId);
}
else
{
throw new Exception(ex.Message + ex.InnerException);
}
}
}
bool IsServerUp()
{
var x = new Ping();
var reply = x.Send(IPAddress.Parse("127.0.0.1"));
if (reply == null)
{
IsServerUp();
}
else
{
if (reply.Status != IPStatus.Success)
{
IsServerUp();
}
}
return true;
}
For starters, I think your WCF error handling is wrong. It should look like this:
var statusRepository = new StatusRepositoryClient.StatusRepositoryClient();
try
{
statusId = statusRepository.GetIdByName(licencePlateSeen.CameraId.ToString());
statusRepository.Close();
}
catch(Exception e)
{
statusRepository.Abort();
LogMessage(e);
throw; //I would do this to let user know.
}
I would also re-throw the error to let the user know about the problem.
Prior to designing your exception handling, one important decision to make is whether you want guaranteed delivery of each message the client sends or is it OK for the service to "lose" some. For guaranteed delivery, the best built-in solution is the netMsmqBinding assuming the client can be configured to support it. Otherwise, there is a lightweight reliable messaging capability built into WCF. You'll be going down a rabbit hole if you try to handle message delivery purely through exception handling... :)
I have a two-pronged approach to verifying the server is up:
1) I have set up a 'PING' to the server every 5 seconds. The server responds with a 'PONG' and a load rating (low, medium, high, so the client can adjust its load on the server). If the client EVER doesn't receive a pong it assumes the server is down (since this is very low stress on the server - just listen and respond).
2) Random timeouts like the one you are catching are logged in a ConnectionMonitor class along with all successful connections. A single one of these calls timing out is not enough to consider the server down since some may be very processor heavy, or may just take a very long time. However, a high enough percentage of these will cause the application to go into server timeout.
I also didn't want to throw up a message for every single connection timeout, because it was happening too frequently to people who use poorer servers (or just some computer lying in their lab as a server). Most of my calls can be missed once or twice, but missing 5 or 6 calls is clearly going to cause intrusion.
When a server-timeout state happens, I throw up a little dialog box explaining what's happening to the user.
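To make point 2) a bit more concrete, here is a minimal sketch of what such a monitor could look like. It is only an illustration under my own assumptions: the class shape, the sliding window of the last N calls, and the threshold parameter are all hypothetical, not the actual ConnectionMonitor described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the ConnectionMonitor idea: remember the outcome of
// the last N calls and report the server as down when too many of them failed.
public sealed class ConnectionMonitor
{
    private readonly object _gate = new object();
    private readonly Queue<bool> _outcomes = new Queue<bool>(); // true = success
    private readonly int _windowSize;
    private readonly double _failureThreshold;
    private int _failures;

    public ConnectionMonitor(int windowSize, double failureThreshold)
    {
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
        if (failureThreshold <= 0 || failureThreshold > 1)
            throw new ArgumentOutOfRangeException(nameof(failureThreshold));
        _windowSize = windowSize;
        _failureThreshold = failureThreshold;
    }

    public void RecordSuccess() => Record(true);
    public void RecordTimeout() => Record(false);

    private void Record(bool success)
    {
        lock (_gate)
        {
            _outcomes.Enqueue(success);
            if (!success) _failures++;
            // Drop the oldest outcome once the window overflows; if it was
            // a failure, it no longer counts.
            if (_outcomes.Count > _windowSize && !_outcomes.Dequeue())
                _failures--;
        }
    }

    // True once the window is full and the failure ratio reaches the threshold,
    // so a single slow call is never enough to flag the server as down.
    public bool IsServerConsideredDown
    {
        get
        {
            lock (_gate)
            {
                return _outcomes.Count == _windowSize
                    && (double)_failures / _windowSize >= _failureThreshold;
            }
        }
    }
}
```

Each service call would report its outcome to the monitor, and the UI would only show the "server timeout" dialog when `IsServerConsideredDown` flips to true.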
Hi, please see my solution below. Note that the code below has not been compiled, so it may contain logic and typing errors.
bool IsServerUp()
{
var x = new Ping();
var reply = x.Send(IPAddress.Parse("127.0.0.1"));
if (reply == null) return false;
return reply.Status == IPStatus.Success;
}
int? GetStatusId()
{
try
{
using (var statusRepository = new StatusRepositoryClient.StatusRepositoryClient())
{
return statusRepository.GetIdByName(licencePlateSeen.CameraId.ToString());
}
}catch(TimeoutException te)
{
//Log TimeOutException occured
return null;
}
}
void GetStatus()
{
try
{
TimeSpan sleepTime = new TimeSpan(0,0,5);
int maxRetries = 10;
while(!IsServerUp())
{
System.Threading.Thread.Sleep(sleepTime);
}
int? statusId = null;
int retryCount = 0;
while (!statusId.HasValue)
{
statusId = GetStatusId();
retryCount++;
if (retryCount > maxRetries)
throw new ApplicationException(String.Format("{0} Maximum Retries reached in order to get StatusId", maxRetries));
System.Threading.Thread.Sleep(sleepTime);
}
}catch(Exception ex)
{
//Log Exception Occured
}
}
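If the concern is repeating the retry loop around every service call, the loop can be pulled into one generic helper. The sketch below is untested against a real WCF client; `RetrySketch.Execute` is a hypothetical name, and it only treats `TimeoutException` as transient because that type lives in the base class library. In a WCF project you would probably want to handle `CommunicationException` the same way.

```csharp
using System;
using System.Threading;

public static class RetrySketch
{
    // Runs 'operation' until it succeeds or 'maxAttempts' is reached.
    // Only TimeoutException is treated as transient here; anything else
    // propagates immediately. On the final attempt the TimeoutException
    // is not caught, so the caller sees the original exception.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay); // wait before the next attempt
            }
        }
    }
}
```

A call site then shrinks to something like `RetrySketch.Execute(() => GetStatusIdOnce(), 10, TimeSpan.FromSeconds(5))`, where `GetStatusIdOnce` is a placeholder for a method that creates the client, makes the call, and closes or aborts the client.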
I'm working on an Azure based project for some research and have been running into some issues when deleting messages from a CloudQueue instance. The code is fairly straightforward, so I'm a bit baffled as to why an exception is being thrown when I try to delete a message from the queue.
Here is the code that produces data for the queue:
foreach (var cell in scheme(cells))
{
string id = Guid.NewGuid().ToString();
var blob = sweepItemContainer.GetBlobReference(id);
using (BlobStream stream = blob.OpenWrite())
{
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(stream, cell);
}
sweepItemQueue.AddMessage(new CloudQueueMessage(id), new TimeSpan(1, 0, 0));
}
Here is the code that consumes the data from the queue:
var msgs = sweepItemsQueue.GetMessages(MsgAmt);
foreach (var msg in msgs)
{
_handleMessage(msg, sweepItemsContainer);
sweepItemsQueue.DeleteMessage(msg);
mergeItemsQueue.AddMessage(new CloudQueueMessage(msg.AsString), new TimeSpan(1, 0, 0));
}
I don't see how the message cannot exist in the queue. Nothing else is mutating the queue besides other consumers. But I am assured that they cannot get the same message (so long as the timespan doesn't run out), so how is this happening?
There are two timeouts that you need to worry about: how long the message lives in the queue (which you've specified in your .AddMessage() call), and the visibility timeout that is set when you call .GetMessages() (by default this is 30 seconds; there is an overload that allows you to specify the timeout). When you call .GetMessages(), all of the messages returned are invisible to other consumers for the period 'visibilityTimeout'. Once this period finishes, all of the messages you haven't already deleted become visible to all other consumers.
To check if this is the problem, I would try using the overload of .GetMessages() with its maximum visibility timeout of 2 hours. If this is the problem, you can fine-tune this value down to a more sensible number. Another option would be to just retrieve one message at a time.
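To see why a delete can fail even though "nobody else touched the queue", it may help to look at a toy model of the mechanism. The class below is NOT the Azure storage client; `ToyQueue` and all of its members are hypothetical, and it only assumes that each dequeue hands out a fresh pop receipt that replaces the previous one.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory model of a queue with visibility timeouts and pop receipts.
// It is NOT the Azure storage client; all names here are hypothetical.
public sealed class ToyQueue
{
    private sealed class Entry
    {
        public string Body;
        public DateTime VisibleAt;
        public Guid PopReceipt; // Guid.Empty until the first dequeue
    }

    private readonly List<Entry> _entries = new List<Entry>();

    public void Add(string body, DateTime now)
    {
        _entries.Add(new Entry { Body = body, VisibleAt = now });
    }

    // Returns the first visible message and hides it until now + visibilityTimeout.
    // Each dequeue issues a fresh pop receipt, invalidating any older one.
    public bool TryGet(DateTime now, TimeSpan visibilityTimeout, out string body, out Guid popReceipt)
    {
        foreach (var e in _entries)
        {
            if (e.VisibleAt <= now)
            {
                e.VisibleAt = now + visibilityTimeout;
                e.PopReceipt = Guid.NewGuid();
                body = e.Body;
                popReceipt = e.PopReceipt;
                return true;
            }
        }
        body = null;
        popReceipt = Guid.Empty;
        return false;
    }

    // Deleting succeeds only with the receipt from the most recent dequeue.
    public bool TryDelete(Guid popReceipt)
    {
        if (popReceipt == Guid.Empty) return false;
        return _entries.RemoveAll(e => e.PopReceipt == popReceipt) > 0;
    }
}
```

In this model, if consumer A takes a message with a 30 second timeout, spends 31 seconds handling it, and consumer B picks the same message up in the meantime, then A's `TryDelete` returns false while B's succeeds. That corresponds to the exception in the question: the first consumer's handling outlived the visibility timeout.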
Another answer, from Steve Marx: basically, inspect the storage exception and move on. I have seen this approach in other frameworks too:
Steve Marx blog post
try
{
q.DeleteMessage(msg);
}
catch (StorageClientException ex)
{
if (ex.ExtendedErrorInformation.ErrorCode == "MessageNotFound")
{
// pop receipt must be invalid
// ignore or log (so we can tune the visibility timeout)
}
else
{
// not the error we were expecting
throw;
}
}