I have a little sample application I was working on, trying to get some of the new .Net 4.0 Parallel Extensions going (they are very nice). I'm running into a (probably really stupid) problem with an OutOfMemoryException. My main app that I'm looking to plug this sample into reads some data and lots of files, does some processing on them, and then writes them out somewhere. I was running into some issues with the files getting bigger (possibly GBs) and was concerned about memory, so I wanted to parallelize things, which led me down this path.
Now the below code gets an OOME on smaller files and I think I'm just missing something. It will read in 10-15 files and write them out in parallel nicely, but then it chokes on the next one. It looks like it's read and written about 650MB. A second set of eyes would be appreciated.
I'm reading into a MemoryStream from the FileStream because that is what is needed for the main application and I'm just trying to replicate that to some degree. It reads data and files from all types of places and works on them as MemoryStreams.
This is using .Net 4.0 Beta 2, VS 2010.
namespace ParellelJob
{
    class Program
    {
        BlockingCollection<FileHolder> serviceToSolutionShare;

        static void Main(string[] args)
        {
            Program p = new Program();
            p.serviceToSolutionShare = new BlockingCollection<FileHolder>();
            ServiceStage svc = new ServiceStage(ref p.serviceToSolutionShare);
            SolutionStage sol = new SolutionStage(ref p.serviceToSolutionShare);

            var svcTask = Task.Factory.StartNew(() => svc.Execute());
            var solTask = Task.Factory.StartNew(() => sol.Execute());
            while (!solTask.IsCompleted)
            {
            }
        }
    }

    class ServiceStage
    {
        BlockingCollection<FileHolder> outputCollection;

        public ServiceStage(ref BlockingCollection<FileHolder> output)
        {
            outputCollection = output;
        }

        public void Execute()
        {
            var di = new DirectoryInfo(@"C:\temp\testfiles");
            var files = di.GetFiles();
            foreach (FileInfo fi in files)
            {
                using (var fs = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read))
                {
                    int b;
                    var ms = new MemoryStream();
                    while ((b = fs.ReadByte()) != -1)
                    {
                        ms.WriteByte((byte)b); // OutOfMemoryException occurs here
                    }
                    var f = new FileHolder();
                    f.filename = fi.Name;
                    f.contents = ms;

                    outputCollection.TryAdd(f);
                }
            }
            outputCollection.CompleteAdding();
        }
    }

    class SolutionStage
    {
        BlockingCollection<FileHolder> inputCollection;

        public SolutionStage(ref BlockingCollection<FileHolder> input)
        {
            inputCollection = input;
        }

        public void Execute()
        {
            FileHolder current;
            while (!inputCollection.IsCompleted)
            {
                if (inputCollection.TryTake(out current))
                {
                    using (var fs = new FileStream(String.Format(@"c:\temp\parellel\{0}", current.filename), FileMode.OpenOrCreate, FileAccess.Write))
                    {
                        using (MemoryStream ms = (MemoryStream)current.contents)
                        {
                            ms.WriteTo(fs);
                            current.contents.Close();
                        }
                    }
                }
            }
        }
    }

    class FileHolder
    {
        public string filename { get; set; }
        public Stream contents { get; set; }
    }
}
The main logic seems OK, but if that empty while-loop in Main is literal, then you are burning unnecessary CPU cycles. Better to use solTask.Wait() instead.
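For example, a minimal sketch of that change (Task.WaitAll is an alternative if you also want to observe exceptions from the service task):

solTask.Wait();
// or, to wait on both stages and surface faults from either:
Task.WaitAll(svcTask, solTask);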
But if individual files can run into gigabytes, you still have the problem of holding at least one completely in memory, and usually two (one being read, one being processed/written).
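One sketch of how to cap that: give the BlockingCollection a bounded capacity so the reading stage blocks instead of buffering every file (the capacity of 2 here is an assumption; tune it to your memory budget). Note that the producer must then use Add rather than TryAdd, because TryAdd returns false immediately when a bounded collection is full:

p.serviceToSolutionShare = new BlockingCollection<FileHolder>(boundedCapacity: 2);
// ...and in ServiceStage.Execute:
outputCollection.Add(f); // blocks until the writer stage drains a slot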
PS1: I just realized you don't pre-allocate the MemoryStream. That's bad; it will have to re-size very often for a big file, and that costs a lot of memory. Better to use something like the following (the constructor takes an int, so the long Length needs a cast):
var ms = new MemoryStream((int)fs.Length);
And then, for big files, you have to consider the Large Object Heap (LOH). Are you sure you can't break a file up into segments and process them?
PS2: And you don't need the refs on the constructor parameters, but that's not the problem.
Just looking through quickly, inside your ServiceStage.Execute method you have
var ms = new MemoryStream();
I don't see where you are closing ms or putting it in a using block. You do have the using in the other class. That's one thing to check out.
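For what it's worth, here is a sketch of how the read loop could be tightened up, pre-sizing the stream and copying in bulk (Stream.CopyTo exists in .NET 4.0); ownership of ms still transfers to the consuming stage, which disposes it after writing:

using (var fs = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read))
{
    // Pre-sizing avoids repeated internal re-allocations,
    // and CopyTo replaces the byte-by-byte loop.
    var ms = new MemoryStream((int)fs.Length);
    fs.CopyTo(ms);
    ms.Position = 0;
    var f = new FileHolder { filename = fi.Name, contents = ms };
    outputCollection.TryAdd(f);
}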
What is the correct way to get the thumbnails of images when using C#? There must be some built-in system method for that, but I seem to be unable to find it anywhere.
Right now I'm using a workaround, but it seems to be much heavier on the computing side: generating the thumbnails of 50 images with parallel processing takes about 1 to 1.5 seconds, and during that time my CPU is 100% loaded. Not to mention that it builds up quite a lot of garbage, which later needs to be collected.
This is what my class currently looks like:
public class ImageData
{
    public const int THUMBNAIL_SIZE = 160;
    public string path;
    private Image _thumbnail;

    public string imageName { get { return Path.GetFileNameWithoutExtension(path); } }
    public string folder { get { return Path.GetDirectoryName(path); } }

    public Image image
    {
        get
        {
            try
            {
                using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
                using (BinaryReader reader = new BinaryReader(stream))
                {
                    var memoryStream = new MemoryStream(reader.ReadBytes((int)stream.Length));
                    return new Bitmap(memoryStream);
                }
            }
            catch (Exception) { }
            return null;
        }
    }

    public Image thumbnail
    {
        get
        {
            if (_thumbnail == null)
                LoadThumbnail();
            return _thumbnail;
        }
    }

    public void LoadThumbnail()
    {
        if (_thumbnail != null) return;
        Image img = image;
        if (img == null) return;
        // Use the already-loaded img from here on; calling the image property
        // again would re-read and re-decode the file each time.
        float ratio = (float)img.Width / (float)img.Height;
        int h = THUMBNAIL_SIZE;
        int w = THUMBNAIL_SIZE;
        if (ratio > 1)
            h = (int)(THUMBNAIL_SIZE / ratio);
        else
            w = (int)(THUMBNAIL_SIZE * ratio);
        _thumbnail = new Bitmap(img, w, h);
    }
}
I save the thumbnail once generated, to save some computing time later on. Meanwhile, I have an array of 50 elements containing picture boxes, into which I inject the thumbnails.
Anyway, when I open a folder containing images, my PC certainly doesn't use up 100% CPU for the thumbnails, so I am wondering what the correct method to generate them is.
Windows pre-generates the thumbnails and stores them in the hidden thumbs.db file for later use.
So unless you either access the thumbs.db file (and are fine with relying on it being available) or cache the thumbnails yourself somewhere, you will always have to render them in some way or another.
That being said, you can probably rely on whatever framework you are using for your UI to display them scaled down, seeing as you load them into memory anyway.
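As a possible sketch (not necessarily what Explorer itself does): System.Drawing's built-in Image.GetThumbnailImage can return the thumbnail embedded in the file's EXIF data when one exists, which avoids decoding the full image. THUMBNAIL_SIZE and the aspect-ratio math below mirror the question's code:

using (var img = Image.FromFile(path))
{
    float ratio = (float)img.Width / img.Height;
    int w = ratio > 1 ? THUMBNAIL_SIZE : (int)(THUMBNAIL_SIZE * ratio);
    int h = ratio > 1 ? (int)(THUMBNAIL_SIZE / ratio) : THUMBNAIL_SIZE;
    // The abort callback is required by the signature but is not used by GDI+.
    Image thumb = img.GetThumbnailImage(w, h, () => false, IntPtr.Zero);
}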
public string SavePath { get; set; } = @"I:\files\";

public void DownloadList(List<string> list)
{
    var rest = ExcludeDownloaded(list);
    var result = Parallel.ForEach(rest, link =>
    {
        Download(link);
    });
}

private void Download(string link)
{
    using (var net = new System.Net.WebClient())
    {
        var data = net.DownloadData(link);
        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;
        File.WriteAllBytes(fileName, data);
    }
}

var downloader = new DownloaderService();
var links = downloader.GetLinks();
downloader.DownloadList(links);
I observed that the RAM usage of the project keeps growing.
I guess there is something wrong with the Parallel.ForEach(), but I cannot figure it out.
Is there a memory leak, or what is happening?
Update 1
After changing to the new code:
private void Download(string link)
{
    using (var net = new System.Net.WebClient())
    {
        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;
        net.DownloadFile(link, fileName); // DownloadFile returns void, so there is nothing to assign
        Track theTrack = new Track(fileName);
        theTrack.Title = GetCDName();
        theTrack.Save();
    }
}
I still observed increasing memory usage after it kept running for 9 hours, though the usage grows much more slowly.
Just wondering, is it because I didn't free the memory used by the theTrack object?
Btw, I use the ATL package to update file metadata; unfortunately, it doesn't implement the IDisposable interface.
The Parallel.ForEach method is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O bound workload, and so the Parallel.ForEach is not ideal for this case because it needlessly blocks ThreadPool threads. The correct way to do it is asynchronously, with async/await. The recommended class for making asynchronous web requests is the HttpClient, and for controlling the level of concurrency an excellent option is the TPL Dataflow library. For this case it is enough to use the simplest component of this library, the ActionBlock class:
async Task DownloadListAsync(List<string> list)
{
    using (var httpClient = new HttpClient())
    {
        var rest = ExcludeDownloaded(list);
        var block = new ActionBlock<string>(async link =>
        {
            await DownloadFileAsync(httpClient, link);
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 10
        });
        foreach (var link in rest)
        {
            await block.SendAsync(link);
        }
        block.Complete();
        await block.Completion;
    }
}

async Task DownloadFileAsync(HttpClient httpClient, string link)
{
    var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
    var filePath = Path.Combine(SavePath, fileName);
    if (File.Exists(filePath)) return;
    var response = await httpClient.GetAsync(link);
    response.EnsureSuccessStatusCode();
    using (var contentStream = await response.Content.ReadAsStreamAsync())
    using (var fileStream = new FileStream(filePath, FileMode.Create,
        FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
    {
        await contentStream.CopyToAsync(fileStream);
    }
}
The code for downloading a file with HttpClient is not as simple as the WebClient.DownloadFile(), but it's what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).
Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.
.NET 6 update: The preferable way for parallelizing asynchronous work is now the Parallel.ForEachAsync API. A usage example can be found here.
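A sketch of what that could look like for this example (assuming the same rest list, httpClient, and DownloadFileAsync helper as above):

await Parallel.ForEachAsync(rest,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async (link, cancellationToken) =>
    {
        await DownloadFileAsync(httpClient, link);
    });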
Use WebClient.DownloadFile() to download directly to a file so you don't have the whole file in memory.
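A minimal sketch of that suggestion, reusing the question's shape (the fileName generation is still whatever scheme the asker had in mind; Guid.NewGuid() is just a stand-in):

private void Download(string link)
{
    var fileName = Path.Combine(SavePath, Guid.NewGuid().ToString()); // code to generate unique fileName
    if (File.Exists(fileName))
        return;
    using (var net = new System.Net.WebClient())
    {
        // Streams the response straight to disk; the body never sits in a byte[].
        net.DownloadFile(link, fileName);
    }
}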
I'm writing a Windows Phone Silverlight app. I want to save an object to a JSON file. I've written the following piece of code.
string jsonFile = JsonConvert.SerializeObject(usr);
IsolatedStorageFile isoStore = IsolatedStorageFile.GetUserStoreForApplication();
IsolatedStorageFileStream isoStream = new IsolatedStorageFileStream("users.json", FileMode.Create, isoStore);
StreamWriter str = new StreamWriter(isoStream);
str.Write(jsonFile);
This is enough to create a JSON file but it is empty. Am I doing something wrong? Wasn't this supposed to write the object to the file?
The problem is that you're not closing the stream.
File I/O in Windows has buffers at the operating system level, and .NET might even implement buffers at the API level, which means that unless you tell the class "Now I'm done", it will never know when to ensure those buffers are propagated all the way down to the platter.
You should rewrite your code just slightly, like this:
using (StreamWriter str = new StreamWriter(isoStream))
{
    str.Write(jsonFile);
}
using (...) { ... } ensures that when execution leaves the block (the { ... } part), IDisposable.Dispose is called on the object, which in this case flushes the buffers and closes the underlying file.
I use these. Should work for you as well.
public async Task SaveFile(string fileName, string data)
{
    System.IO.IsolatedStorage.IsolatedStorageFile local =
        System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication();

    if (!local.DirectoryExists("MyDirectory"))
        local.CreateDirectory("MyDirectory");

    using (var isoFileStream =
        new System.IO.IsolatedStorage.IsolatedStorageFileStream(
            string.Format("MyDirectory\\{0}.txt", fileName),
            System.IO.FileMode.Create, System.IO.FileAccess.ReadWrite, System.IO.FileShare.ReadWrite,
            local))
    {
        using (var isoFileWriter = new System.IO.StreamWriter(isoFileStream))
        {
            await isoFileWriter.WriteAsync(data);
        }
    }
}

public async Task<string> LoadFile(string fileName)
{
    string data;
    System.IO.IsolatedStorage.IsolatedStorageFile local =
        System.IO.IsolatedStorage.IsolatedStorageFile.GetUserStoreForApplication();

    using (var isoFileStream =
        new System.IO.IsolatedStorage.IsolatedStorageFileStream(
            string.Format("MyDirectory\\{0}.txt", fileName),
            System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read,
            local))
    {
        using (var isoFileReader = new System.IO.StreamReader(isoFileStream))
        {
            data = await isoFileReader.ReadToEndAsync();
        }
    }
    return data;
}
I have a class library that gets called from a Windows service; the class library can be called many times at the same time.
I have an issue where I have to read file contents in my class, so I get an error that the file is being used by another process when the class is called by many processes.
This is how I read the file contents:
File.ReadAllBytes("path");
What is the best solution in this case?
Thank you
The following code demonstrates how to access a file by setting its share permissions. The first using block creates and writes the file; the second and third using blocks access and read it.
var fileName = "test.txt";
using (var fsWrite = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite))
{
    var content = Encoding.UTF8.GetBytes("test");
    fsWrite.Write(content, 0, content.Length);
    fsWrite.Flush();

    using (var fsRead_1 = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        var bufRead_1 = new byte[fsRead_1.Length];
        fsRead_1.Read(bufRead_1, 0, bufRead_1.Length);
        Console.WriteLine("fsRead_1:" + Encoding.UTF8.GetString(bufRead_1));

        using (var fsRead_2 = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            var bufRead_2 = new byte[fsRead_2.Length];
            fsRead_2.Read(bufRead_2, 0, bufRead_2.Length);
            Console.WriteLine("fsRead_2:" + Encoding.UTF8.GetString(bufRead_2));
        }
    }
}
You need to synchronize the access to the whole file using standard thread synchronization approaches.
The simplest one is the Monitor class, used via the lock statement:
public class A
{
    private static readonly object _sync = new object();

    public void DoStuff()
    {
        // All threads trying to enter this critical section will
        // wait until the first to enter exits it
        lock (_sync)
        {
            byte[] buffer = File.ReadAllBytes(@"C:\file.jpg");
        }
    }
}
Note
At first, I understood that OP was accessing the file from different processes, but when I double-checked the statement:
I have a class library that gets called from a windows service, the
class library can be called many times at the same time.
...I realized OP is calling a method that reads all bytes from some file within the same Windows service instance.
Use a Mutex for syncing among different processes. File.ReadAllBytes uses FileAccess.Read and FileShare.Read when it reads the file, so normally you don't need to use any locks here. You get this exception because the file is being written somewhere (or at least is locked for writing).
Solution 1 - if you are the one who writes this file
private static Mutex mutex;

public void WriteFile(string path)
{
    Mutex mutex = GetOrCreateMutex();
    try
    {
        mutex.WaitOne();
        // TODO: ... write file
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}

public byte[] ReadFile(string path)
{
    // Note: If you just read the file, this lock is completely unnecessary
    // because ReadAllBytes uses Read access. This just protects the file from
    // being read and written at the same time.
    Mutex mutex = GetOrCreateMutex();
    try
    {
        mutex.WaitOne();
        return File.ReadAllBytes(path);
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}

private static Mutex GetOrCreateMutex()
{
    try
    {
        mutex = Mutex.OpenExisting("MyMutex");
    }
    catch (WaitHandleCannotBeOpenedException)
    {
        mutex = new Mutex(false, "MyMutex");
    }
    return mutex; // without this return the method would not compile
}
Remark: A read-write lock would be better here because you can safely read a file in parallel when it is not being written; however, there is no built-in inter-process read-write lock in .NET. Here is an example of how you can implement one with the Mutex and Semaphore types.
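A rough sketch of that idea, under stated assumptions (all names are hypothetical, and MaxReaders is an arbitrary cap on concurrent readers): readers each take one Semaphore slot, while a writer first takes a named Mutex so only one writer at a time drains all the slots, making writes exclusive:

class InterProcessReadWriteLock
{
    private const int MaxReaders = 10; // assumption: upper bound on concurrent readers
    private readonly Semaphore readSlots =
        new Semaphore(MaxReaders, MaxReaders, "Global\\MyRwLock.Slots");
    private readonly Mutex writerMutex = new Mutex(false, "Global\\MyRwLock.Writer");

    public void EnterRead() { readSlots.WaitOne(); }
    public void ExitRead() { readSlots.Release(); }

    public void EnterWrite()
    {
        writerMutex.WaitOne();               // only one writer may drain slots at a time
        for (int i = 0; i < MaxReaders; i++)
            readSlots.WaitOne();             // blocks until all readers have left
    }

    public void ExitWrite()
    {
        readSlots.Release(MaxReaders);
        writerMutex.ReleaseMutex();
    }
}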
Solution 2 - if you just read the file
You simply must be prepared for the file to be locked while it is being written by a third process:
public byte[] TryReadFile(string path, int maxTry)
{
    Exception e = null; // must be initialized, or "throw e" below won't compile
    for (int i = 0; i < maxTry; i++)
    {
        try
        {
            return File.ReadAllBytes(path);
        }
        catch (IOException io)
        {
            e = io;
            Thread.Sleep(100);
        }
    }
    throw e; // or just return null
}
I'm using DotNetZip to add multiple MemoryStreams to a single archive. So far, my code works when I select 1 or 2 files, but does not work if I add more. I found that the difference is that the CRC32 values are all 00000000 in those bad archives. Is it something about the archive size? Any help is appreciated!
My code in C#:
foreach (.....)
{
    var zipEntryName = .....; // Get the file name as a string
    var UDocument = .....; // Get an object
    var UStream = UDocument.GetStream();
    UStream.Seek(0, SeekOrigin.Begin);
    ZipEntry entry = zipFile.AddEntry(zipEntryName, UStream);
}
var outputStream = new MemoryStream();
outputStream.Seek(0, SeekOrigin.Begin);
zipFile.Save(outputStream);
outputStream.Flush();
return outputStream;
I think it's because of a memory leak.
You are creating an object in the foreach loop, and that is where the problem comes from if the loop iterates many times.
Here is where the problem comes up in your code:
var UDocument = .....; // Get an object
A singleton is a class that can be instantiated once, and only once.
Use a singleton class as below:
public class SingletonSample
{
    private static readonly object lockingObject = new object();
    private static SingletonSample singletonObject;

    public static SingletonSample InstanceCreation()
    {
        if (singletonObject == null)
        {
            lock (lockingObject)
            {
                if (singletonObject == null) // double-check inside the lock
                    singletonObject = new SingletonSample();
            }
        }
        return singletonObject;
    }
}