Capture Slow Output from Method - C#

I have a slow-running utility method that logs one line of output at a time. I need to be able to output each of those lines and then read them from other locations in code. I have attempted using Tasks and Streams, similar to the code below:
public static Task SlowOutput(Stream output)
{
    Task result = new Task(() =>
    {
        using (StreamWriter sw = new StreamWriter(output))
        {
            for (var i = 0; i < int.MaxValue; i++)
            {
                sw.WriteLine(i.ToString());
                System.Threading.Thread.Sleep(1000);
            }
        }
    });
    result.Start();
    return result;
}
And then called like this:
MemoryStream ms = new MemoryStream();
var t = SlowOutput(ms);
using (var sr = new StreamReader(ms))
{
    while (!t.IsCompleted)
    {
        Console.WriteLine(sr.ReadLine());
    }
}
But of course, sr.ReadLine() is always empty because as soon as the method's sw.WriteLine() is called, it changes the position of the underlying stream to the end.
What I'm trying to do is pipe the output of the stream by maybe queueing up the characters that the method outputs and then consuming them from outside the method. Streams don't seem to be the way to go.
Is there a generally accepted way to do this?

What I would do is switch to a BlockingCollection<string>.
public static Task SlowOutput(BlockingCollection<string> output)
{
    return Task.Run(() =>
    {
        for (var i = 0; i < int.MaxValue; i++)
        {
            output.Add(i.ToString());
            System.Threading.Thread.Sleep(1000);
        }
        output.CompleteAdding();
    });
}
consumed by:
var bc = new BlockingCollection<string>();
SlowOutput(bc);
// GetConsumingEnumerable blocks until an item is added to the collection, and
// leaves the foreach loop after CompleteAdding() is called and there are no
// more items to be processed.
foreach (var line in bc.GetConsumingEnumerable())
{
    Console.WriteLine(line);
}
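For reference, here is a minimal, self-contained sketch of the same producer/consumer pattern. The item count is small and the Sleep is dropped so it finishes immediately; the class and variable names are illustrative only:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    // Producer: adds items, then signals that no more will arrive.
    static Task SlowOutput(BlockingCollection<string> output) =>
        Task.Run(() =>
        {
            for (var i = 0; i < 5; i++)
            {
                output.Add(i.ToString());
            }
            output.CompleteAdding();
        });

    static void Main()
    {
        var bc = new BlockingCollection<string>();
        SlowOutput(bc);

        var received = new List<string>();
        // Blocks per item; the loop ends once CompleteAdding() has been
        // called and the collection has drained.
        foreach (var line in bc.GetConsumingEnumerable())
        {
            received.Add(line);
            Console.WriteLine(line);
        }

        Console.WriteLine(received.Count); // 5
    }
}
```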

Related

Loading collection view with task async method

I am trying to load thumbnails with an async task method using a dependency service.
In my PCL page I have this:
protected override void OnAppearing()
{
    Device.BeginInvokeOnMainThread(() => UserDialogs.Instance.ShowLoading("Loading...", MaskType.Black));
    Task.Run(async () =>
    {
        directoryPath = await getThumbnails.GetBitmaps(fileInfo.FullName);
        List<ThumbnailsModel> thumbnailsModels = new List<ThumbnailsModel>();
        int i = 1;
        Directory.GetFiles(directoryPath).ToList<string>().ForEach(delegate (string thumbnailsEmplacement)
        {
            thumbnailsModels.Add(new ThumbnailsModel(i, thumbnailsEmplacement));
            i++;
        });
        CollectionViewThumbnails.ItemsSource = thumbnailsModels;
    }).ContinueWith(result => Device.BeginInvokeOnMainThread(() =>
    {
        UserDialogs.Instance.HideLoading();
    }));
}
My method to get the thumbnails :
public async Task<string> GetBitmaps(string filePath)
{
    //TODO-- WORK ON THIS
    var appDirectory = System.Environment.GetFolderPath(System.Environment.SpecialFolder.MyDocuments);
    string fileName = System.IO.Path.GetFileNameWithoutExtension(filePath);
    string directoryPath = System.IO.Path.Combine(appDirectory, "thumbnailsTemp", System.IO.Path.GetFileNameWithoutExtension(fileName));
    var stream = new MemoryStream();
    using (Stream resourceStream = new FileStream(filePath, FileMode.Open))
    {
        resourceStream.CopyTo(stream);
    }
    Document document = new Document(stream);
    int count = document.Pages.Count;
    for (int i = 0; i <= count; i++)
    {
        TallComponents.PDF.Rasterizer.Page page = document.Pages[0];
        using (var outputStream = new FileStream(System.IO.Path.Combine(directoryPath, fileName + "Thumbnails" + i + ".png"), FileMode.Create, FileAccess.Write))
        {
            await Task.Run(() =>
            {
                page.SaveAsBitmap(outputStream, CompressFormat.Png, 5);
            });
        }
    }
    return directoryPath;
}
The problem is that my application goes into my dependency service method, then returns to my PCL OnAppearing method before the thumbnails are done, and reaches this line:
UserDialogs.Instance.HideLoading();
It seems like you have an unhandled exception. That continuation will run even if an exception is thrown on the Task you're continuing.
This can be changed using something like TaskContinuationOptions.OnlyOnRanToCompletion (and others) in the overload for ContinueWith. The default is TaskContinuationOptions.None if not specified.
Alternatively, you can access result.Exception in your continuation if you want it to run on failure and handle it.
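A minimal sketch of the difference (illustrative only; `Task.FromException` stands in for the failing thumbnail work):

```csharp
using System;
using System.Threading.Tasks;

class ContinuationDemo
{
    static void Main()
    {
        Task failing = Task.FromException(new InvalidOperationException("boom"));

        // Runs only if the antecedent completed successfully - skipped here,
        // since the antecedent faulted.
        var onSuccess = failing.ContinueWith(
            t => Console.WriteLine("success path"),
            TaskContinuationOptions.OnlyOnRanToCompletion);

        // Default (None): runs regardless; inspect t.Exception to see what failed.
        var always = failing.ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                Console.WriteLine("faulted: " + t.Exception.InnerException.Message);
            }
        });

        always.Wait();
        Console.WriteLine("done");
    }
}
```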

C# Reading XPS causing memory leak

I have a simple program that just reads an XPS file. I've read the following post and it did solve part of the issue:
Opening XPS document in .Net causes a memory leak
class Program
{
    static int intCounter = 0;
    static object _intLock = new object();

    static int getInt()
    {
        lock (_intLock)
        {
            return intCounter++;
        }
    }

    static void Main(string[] args)
    {
        Console.ReadLine();
        for (int i = 0; i < 100; i++)
        {
            Thread t = new Thread(() =>
            {
                var ogXps = File.ReadAllBytes(@"C:\Users\Nathan\Desktop\Objective.xps");
                readXps(ogXps);
                Console.WriteLine(getInt().ToString());
            });
            t.SetApartmentState(ApartmentState.STA);
            t.Start();
            Thread.Sleep(50);
        }
        Console.ReadLine();
    }

    static void readXps(byte[] originalXPS)
    {
        try
        {
            MemoryStream inputStream = new MemoryStream(originalXPS);
            string memoryStreamUri = "memorystream://" + Path.GetFileName(Guid.NewGuid().ToString() + ".xps");
            Uri packageUri = new Uri(memoryStreamUri);
            Package oldPackage = Package.Open(inputStream);
            PackageStore.AddPackage(packageUri, oldPackage);
            XpsDocument xpsOld = new XpsDocument(oldPackage, CompressionOption.Normal, memoryStreamUri);
            FixedDocumentSequence seqOld = xpsOld.GetFixedDocumentSequence();

            //The following did solve some of the memory issue
            //-----------------------------------------------
            var docPager = seqOld.DocumentPaginator;
            docPager.ComputePageCount();
            for (int i = 0; i < docPager.PageCount; i++)
            {
                FixedPage fp = docPager.GetPage(i).Visual as FixedPage;
                fp.UpdateLayout();
            }
            seqOld = null;
            //-----------------------------------------------

            xpsOld.Close();
            oldPackage.Close();
            oldPackage = null;
            inputStream.Close();
            inputStream.Dispose();
            inputStream = null;
            PackageStore.RemovePackage(packageUri);
        }
        catch (Exception e)
        {
        }
    }
}
^ The program reads the XPS file a hundred times.
(Screenshots omitted: memory profiler snapshots before and after applying the fix.)
So the fix suggested in the post did eliminate some objects. However, I found that objects like Dispatcher, ContextLayoutManager and MediaContext still exist in memory, and their count is exactly 100. Is this normal behavior or a memory leak? How do I fix this? Thanks.
25/7/2018 Update
Adding the line Dispatcher.CurrentDispatcher.InvokeShutdown(); did get rid of the Dispatcher, ContextLayoutManager and MediaContext objects. I don't know if this is an ideal way to fix it.
It looks like the classes you're left with come from the XpsDocument, which implements IDisposable, but you never call Dispose. A few more of the classes involved implement that same interface; as a rule of thumb, either wrap them in a using statement so their Dispose method is guaranteed to be called, or call their Dispose method yourself.
An improved version of your readXps method will look like this:
static void readXps(byte[] originalXPS)
{
    try
    {
        using (MemoryStream inputStream = new MemoryStream(originalXPS))
        {
            string memoryStreamUri = "memorystream://" + Path.GetFileName(Guid.NewGuid().ToString() + ".xps");
            Uri packageUri = new Uri(memoryStreamUri);
            using (Package oldPackage = Package.Open(inputStream))
            {
                PackageStore.AddPackage(packageUri, oldPackage);
                using (XpsDocument xpsOld = new XpsDocument(oldPackage, CompressionOption.Normal, memoryStreamUri))
                {
                    FixedDocumentSequence seqOld = xpsOld.GetFixedDocumentSequence();

                    //The following did solve some of the memory issue
                    //-----------------------------------------------
                    var docPager = seqOld.DocumentPaginator;
                    docPager.ComputePageCount();
                    for (int i = 0; i < docPager.PageCount; i++)
                    {
                        FixedPage fp = docPager.GetPage(i).Visual as FixedPage;
                        fp.UpdateLayout();
                    }
                    seqOld = null;
                    //-----------------------------------------------
                } // disposes XpsDocument
            } // disposes Package
            PackageStore.RemovePackage(packageUri);
        } // disposes MemoryStream
    }
    catch (Exception e)
    {
        // really do something here, at least:
        Debug.WriteLine(e);
    }
}
This should at least clean up most of the objects. I'm not sure you'll see the effect in your profiling, as that depends on whether the objects are actually collected during your analysis. Profiling a debug build might give unanticipated results.
As the remaining object instances seem to be bound to System.Windows.Threading.Dispatcher, I suggest you keep a reference to your Threads (though at this point you might consider looking into Tasks) and, once all threads are done, call the static ExitAllFrames on the Dispatcher.
Your main method will then look like this:
Console.ReadLine();
Thread[] all = new Thread[100];
for (int i = 0; i < all.Length; i++)
{
    var t = new Thread(() =>
    {
        var ogXps = File.ReadAllBytes(@"C:\Users\Nathan\Desktop\Objective.xps");
        readXps(ogXps);
        Console.WriteLine(getInt().ToString());
    });
    t.SetApartmentState(ApartmentState.STA);
    t.Start();
    all[i] = t; // keep reference
    Thread.Sleep(50);
}
foreach (var t in all) t.Join(); // https://stackoverflow.com/questions/263116/c-waiting-for-all-threads-to-complete
all = null; // meh
Dispatcher.ExitAllFrames(); // https://stackoverflow.com/a/41953265/578411
Console.ReadLine();

Run Async every x number of times in a for loop

I'm downloading 100K+ files and want to do it in batches, such as 100 files at a time.
static void Main(string[] args)
{
    Task.WaitAll(new Task[] { RunAsync() });
}

// each group has 100 attachments.
static async Task RunAsync()
{
    foreach (var group in groups)
    {
        var tasks = new List<Task>();
        foreach (var attachment in group.attachments)
        {
            tasks.Add(DownloadFileAsync(attachment, downloadPath));
        }
        await Task.WhenAll(tasks);
    }
}

static async Task DownloadFileAsync(Attachment attachment, string path)
{
    using (var client = new HttpClient())
    {
        using (var fileStream = File.Create(path + attachment.FileName))
        {
            var downloadedFileStream = await client.GetStreamAsync(attachment.url);
            await downloadedFileStream.CopyToAsync(fileStream);
        }
    }
}
Expected
I was hoping it would download 100 files at a time, then download the next 100.
Actual
It downloads a lot more at the same time, and quickly fails with the error Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
Running tasks in a "batch" is not a good idea in terms of performance. One long-running task would block the whole batch. A better approach is to start a new task as soon as one finishes.
This can be implemented with a queue as @MertAkcakaya suggested, but I will post another alternative based on my other answer, Have a set of Tasks with only X running at a time.
int maxThreads = 3;
System.Net.ServicePointManager.DefaultConnectionLimit = 50; //Set this once to a max value in your app

var urls = new Tuple<string, string>[] {
    Tuple.Create("http://cnn.com","temp/cnn1.htm"),
    Tuple.Create("http://cnn.com","temp/cnn2.htm"),
    Tuple.Create("http://bbc.com","temp/bbc1.htm"),
    Tuple.Create("http://bbc.com","temp/bbc2.htm"),
    Tuple.Create("http://stackoverflow.com","temp/stackoverflow.htm"),
    Tuple.Create("http://google.com","temp/google1.htm"),
    Tuple.Create("http://google.com","temp/google2.htm"),
};

DownloadParallel(urls, maxThreads);

async Task DownloadParallel(IEnumerable<Tuple<string, string>> urls, int maxThreads)
{
    SemaphoreSlim maxThread = new SemaphoreSlim(maxThreads);
    var client = new HttpClient();
    foreach (var url in urls)
    {
        await maxThread.WaitAsync();
        DownloadFile(client, url.Item1, url.Item2)
            .ContinueWith((task) => maxThread.Release());
    }
}

async Task DownloadFile(HttpClient client, string url, string fileName)
{
    var stream = await client.GetStreamAsync(url);
    using (var fileStream = File.Create(fileName))
    {
        await stream.CopyToAsync(fileStream);
    }
}
PS: DownloadParallel will return as soon as it starts the last download, so don't await it. If you really want to await it, add for (int i = 0; i < maxThreads; i++) await maxThread.WaitAsync(); at the end of the method.
PS2: Don't forget to add exception handling to DownloadFile
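The throttling-plus-drain pattern described above can be sketched with a deterministic stand-in for the download, so it runs without network access. The WorkAsync delay and the item counts are illustrative; the final loop of WaitAsync calls is the drain from the PS:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    static int running = 0;
    static int maxObserved = 0;

    // Stand-in for DownloadFile: records how many copies run concurrently.
    static async Task WorkAsync(int id)
    {
        int now = Interlocked.Increment(ref running);
        InterlockedMax(ref maxObserved, now);
        await Task.Delay(50);
        Interlocked.Decrement(ref running);
    }

    static void InterlockedMax(ref int target, int value)
    {
        int current;
        while (value > (current = Volatile.Read(ref target)))
        {
            Interlocked.CompareExchange(ref target, value, current);
        }
    }

    static async Task RunThrottled(int items, int maxThreads)
    {
        var gate = new SemaphoreSlim(maxThreads);
        for (int i = 0; i < items; i++)
        {
            await gate.WaitAsync();
            _ = WorkAsync(i).ContinueWith(t => gate.Release());
        }
        // Drain: once we can re-acquire all permits, every task has finished.
        for (int i = 0; i < maxThreads; i++)
        {
            await gate.WaitAsync();
        }
    }

    static void Main()
    {
        RunThrottled(items: 10, maxThreads: 3).Wait();
        Console.WriteLine(maxObserved <= 3); // True
    }
}
```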

DbContext OutOfMemoryException

I have a DbContext with a dataset of >20M records, that has to be converted to a different data format. Therefore, I read the data into memory, perform some tasks and then dispose the DbContext. The code works fine, but after a while I get OutOfMemoryExceptions. I have been able to narrow it down to the following piece of code, where I retrieve 2M records, then release them and fetch them again. The first retrieval works just fine, the second one throws an exception.
// first call runs fine
using (var dbContext = new CustomDbContext())
{
var list = dbContext.Items.Take(2000000).ToArray();
foreach (var item in list)
{
// perform conversion tasks...
item.Converted = true;
}
}
// second call throws exception
using (var dbContext = new CustomDbContext())
{
var list = dbContext.Items.Take(2000000).ToArray();
foreach (var item in list)
{
// perform conversion tasks...
item.Converted = true;
}
}
Shouldn't the GC automatically release all memory allocated in the first using block, such that the second block should run as fine as the first one?
In my actual code, I do not retrieve 2 million records at once, but something between 0 and 30K in each iteration. However, after about 15 minutes, I run out of memory, although all objects should have been released.
I suspect you've hit the Large Object Heap (LOH). Your objects are probably bigger than the threshold and end up there, so the GC doesn't help by default.
Try this: https://www.simple-talk.com/dotnet/.net-framework/large-object-heap-compaction-should-you-use-it/
and see if your exception goes away.
I.e. add this between the first and second part:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
IEnumerable has GetEnumerator(), so you could try this to avoid .ToArray() or .ToList(), which aren't necessary if you just want to read:
// first call
using (var dbContext = new CustomDbContext())
{
foreach (var item in dbContext.Items.Take(2000000))
{
// perform conversion tasks...
item.Converted = true;
}
}
// second call
using (var dbContext = new CustomDbContext())
{
foreach (var item in dbContext.Items.Take(2000000))
{
// perform conversion tasks...
item.Converted = true;
}
}
Running the GC will not help you; you have to run each iteration in a different context, and dispose your context.
// ID is your primary key
long startID = 0;
while (true)
{
    using (var db = new CustomDbContext())
    {
        var slice = db.Items.Where(x => x.ID > startID)
                            .OrderBy(x => x.ID)
                            .Take(1000).ToList();

        // stop if there is nothing to process
        if (!slice.Any())
            break;

        foreach (var item in slice)
        {
            // your logic...
            item.Converted = true;
        }

        startID = slice.Last().ID;
    }
}
If you want to process these things faster, an alternative approach would be to run the slices in parallel.
Alternate Approach
I would recommend dividing the work into 100x100 slices, so that 100 slices of 100 items can be processed in parallel.
You can always customize the slicing to meet your speed needs.
public static IEnumerable<IEnumerable<T>> Slice<T>(this IEnumerable<T> src, int size)
{
    while (src.Any())
    {
        var s = src.Take(size);
        src = src.Skip(size);
        yield return s;
    }
}

long startID = 0;
while (true)
{
    using (var db = new CustomDbContext())
    {
        var src = db.Items.Where(x => x.ID > startID)
                          .OrderBy(x => x.ID)
                          .Take(10000).Select(x => x.ID).ToList();

        // stop if there is nothing to process
        if (!src.Any())
            break;

        Parallel.ForEach(src.Slice(100), slice =>
        {
            using (var sdb = new CustomDbContext())
            {
                foreach (var item in sdb.Items.Where(x => slice.Contains(x.ID)))
                {
                    item.Converted = true;
                }
            }
        });

        startID = src.Last();
    }
}
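The Slice helper can be exercised on its own; this sketch uses an in-memory range in place of the database query (the values are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SliceDemo
{
    // Same idea as the answer's Slice helper: lazily chop a sequence into chunks.
    public static IEnumerable<IEnumerable<T>> Slice<T>(this IEnumerable<T> src, int size)
    {
        while (src.Any())
        {
            var s = src.Take(size);
            src = src.Skip(size);
            yield return s;
        }
    }

    static void Main()
    {
        var ids = Enumerable.Range(1, 10).ToList();
        foreach (var chunk in ids.Slice(4))
        {
            Console.WriteLine(string.Join(",", chunk));
        }
        // 1,2,3,4
        // 5,6,7,8
        // 9,10
    }
}
```

Note that this implementation re-enumerates the source on each Skip/Take, which is fine for an in-memory list of IDs but would be wasteful on a deferred query.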
After refactoring, memory gets released. I don't know why, but it works.
private static void Debug()
{
    var iteration = 0;
    while (true)
    {
        Console.WriteLine("Iteration {0}", iteration++);
        Convert();
    }
}

private static void Convert()
{
    using (var dbContext = new CustomDbContext(args[0]))
    {
        var list = dbContext.Items.Take(2000000).ToList();
        foreach (var item in list)
        {
            item.Converted = true;
        }
    }
}
When I move the content of Convert() into the while loop in Debug(), an OutOfMemoryException is thrown.
private static void Debug()
{
    var iteration = 0;
    while (true)
    {
        Console.WriteLine("Iteration {0}", iteration++);
        using (var dbContext = new CustomDbContext(args[0]))
        {
            // OutOfMemoryException in second iteration
            var list = dbContext.Items.Take(2000000).ToList();
            foreach (var item in list)
            {
                item.Converted = true;
            }
        }
    }
}

Memory buffer and IO operations

Are the following two code samples equal in terms of performance?
Code Sample 1:
var count = 9999999999;
using (var sw = new StreamWriter())
{
    for (int i = 0; i < count; i++)
    {
        var result = SomeRelativeLongOperation(i);
        sw.WriteLine(result);
    }
}
Code Sample 2:
var count = 9999999999;
var resultCollection = new ....
using (var sw = new StreamWriter())
{
    for (int i = 0; i < count; i++)
    {
        resultCollection.Add(SomeRelativeLongOperation(i));
        if (resultCollection.Count % 100 == 0)
        {
            WriteBlock(sw, resultCollection);
            resultCollection.Clear();
        }
    }
}
I know that Windows uses memory buffers for IO operations. So when I call the StreamWriter.WriteLine method, it first stores the data in memory and then flushes it to the hard drive, right?
StreamWriter is already buffered, so adding an additional buffer is simply going to make it less efficient.
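A small sketch demonstrating StreamWriter's built-in buffering over a MemoryStream (the bufferSize value is illustrative; leaveOpen keeps the stream inspectable after the writer is disposed):

```csharp
using System;
using System.IO;
using System.Text;

class BufferDemo
{
    static void Main()
    {
        var ms = new MemoryStream();
        // leaveOpen: true so we can inspect the stream after disposing the writer.
        using (var sw = new StreamWriter(ms, Encoding.UTF8, bufferSize: 4096, leaveOpen: true))
        {
            sw.WriteLine("hello");
            // The line sits in StreamWriter's internal buffer at this point;
            // nothing has reached the underlying stream yet.
            long beforeFlush = ms.Length;
            sw.Flush();
            long afterFlush = ms.Length;
            Console.WriteLine(beforeFlush < afterFlush); // True
        }
    }
}
```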
