C# exception: file being used by another process [duplicate] - c#

Writing a StringBuilder to a file asynchronously. This code takes control of a file, writes a stream to it, and releases it. It deals with requests from asynchronous operations, which may come in at any time.
The FilePath is set per class instance (so the lock object is per instance), but there is potential for conflict since these classes may share FilePaths. That sort of conflict, as well as all other types from outside the class instance, would be dealt with by retries.
Is this code suitable for its purpose? Is there a better way to handle this that means less (or no) reliance on the catch and retry mechanic?
Also, how do I avoid catching exceptions that have occurred for other reasons?
public string Filepath { get; set; }
private Object locker = new Object();
public async Task WriteToFile(StringBuilder text)
{
    int timeOut = 100;
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    while (true)
    {
        try
        {
            //Wait for resource to be free
            lock (locker)
            {
                using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
                using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
                {
                    writer.Write(text.ToString());
                }
            }
            break;
        }
        catch
        {
            //File not available, conflict with other class instances or application
        }
        if (stopwatch.ElapsedMilliseconds > timeOut)
        {
            //Give up.
            break;
        }
        //Wait and Retry
        await Task.Delay(5);
    }
    stopwatch.Stop();
}

How you approach this is going to depend a lot on how frequently you're writing. If you're writing a relatively small amount of text fairly infrequently, then just use a static lock and be done with it. That might be your best bet in any case because the disk drive can only satisfy one request at a time. Assuming that all of your output files are on the same drive (perhaps not a fair assumption, but bear with me), there's not going to be much difference between locking at the application level and the lock that's done at the OS level.
So if you declare locker as:
static object locker = new object();
You'll be assured that there are no conflicts with other threads in your program.
If you want this thing to be bulletproof (or at least reasonably so), you can't get away from catching exceptions. Bad things can happen. You must handle exceptions in some way. What you do in the face of error is something else entirely. You'll probably want to retry a few times if the file is locked. If you get a bad path or filename error or disk full or any of a number of other errors, you probably want to kill the program. Again, that's up to you. But you can't avoid exception handling unless you're okay with the program crashing on error.
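On the follow-up question about not swallowing unrelated exceptions: one common approach is to catch only IOException and inspect the Win32 error code, so that sharing violations trigger a retry while everything else propagates. A minimal sketch, assuming .NET 4.5+ (where Exception.HResult has a public getter) and C# 6 exception filters; treating codes 0x20 and 0x21 as the only retryable errors is a policy assumption:
private static bool IsSharingViolation(IOException ex)
{
    // The low 16 bits of HResult carry the Win32 error code:
    // 0x20 = ERROR_SHARING_VIOLATION, 0x21 = ERROR_LOCK_VIOLATION.
    int errorCode = ex.HResult & 0xFFFF;
    return errorCode == 0x20 || errorCode == 0x21;
}

// inside the retry loop, instead of the bare catch:
try
{
    lock (locker)
    {
        using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
        using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
        {
            writer.Write(text.ToString());
        }
    }
    break;
}
catch (IOException ex) when (IsSharingViolation(ex))
{
    //File not available - fall through to the delay-and-retry code
}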
By the way, you can replace all of this code:
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
    writer.Write(text.ToString());
}
With a single call:
File.AppendAllText(Filepath, text.ToString());
That method has been available since .NET 2.0. See File.AppendAllText.
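One caveat: the original code wrote with Encoding.Unicode, whereas the two-argument AppendAllText writes UTF-8 by default. If keeping the encoding matters, use the overload that takes one:
File.AppendAllText(Filepath, text.ToString(), Encoding.Unicode);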
One other way you could handle this is to have the threads write their messages to a queue, and have a dedicated thread that services that queue. You'd have a BlockingCollection of messages and associated file paths. For example:
class LogMessage
{
    public string Filepath { get; set; }
    public string Text { get; set; }
}
BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>();
Your threads write data to that queue:
_logMessages.Add(new LogMessage { Filepath = "foo.log", Text = "this is a test" });
You start a long-running background task that does nothing but service that queue:
foreach (var msg in _logMessages.GetConsumingEnumerable())
{
    // of course you'll want your exception handling in here
    File.AppendAllText(msg.Filepath, msg.Text);
}
Your potential risk here is that threads create messages too fast, causing the queue to grow without bound because the consumer can't keep up. Whether that's a real risk in your application is something only you can say. If you think it might be a risk, you can put a maximum size (number of entries) on the queue so that if the queue size exceeds that value, producers will wait until there is room in the queue before they can add.
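For example, a bound is just a constructor argument; with the (arbitrary) capacity below, Add blocks once 10,000 messages are queued, until the consumer catches up:
BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>(boundedCapacity: 10000);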

You could also use a ReaderWriterLock; it is considered a more 'appropriate' way to control thread safety when dealing with read/write operations...
To debug my web apps (when remote debug fails) I use the following ('debug.txt' ends up in the \bin folder on the server):
public static class LoggingExtensions
{
    static ReaderWriterLock locker = new ReaderWriterLock();
    public static void WriteDebug(string text)
    {
        try
        {
            locker.AcquireWriterLock(int.MaxValue);
            System.IO.File.AppendAllLines(Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", ""), "debug.txt"), new[] { text });
        }
        finally
        {
            locker.ReleaseWriterLock();
        }
    }
}
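As an aside, newer code usually reaches for ReaderWriterLockSlim rather than the older ReaderWriterLock; a minimal sketch of the same helper using it (same path logic, just the slim lock) might look like this:
public static class LoggingExtensions
{
    static ReaderWriterLockSlim locker = new ReaderWriterLockSlim();
    public static void WriteDebug(string text)
    {
        locker.EnterWriteLock();
        try
        {
            System.IO.File.AppendAllLines(Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", ""), "debug.txt"), new[] { text });
        }
        finally
        {
            locker.ExitWriteLock();
        }
    }
}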
Hope this saves you some time.

Related

lock(obj) may not be working c#

I came across an issue, and I'm not sure if it's me or if there's an issue with thread locking.
I have a class I use for basic utilities. In that class is a method to create or append a text file. Because I use it for debugging, the method uses lock() to keep access single-threaded. Except it appears to be failing and allowing multiple threads into the locked code.
When running my test threads it doesn't throw an error every time, which is a little weird. There are 50 threads/tasks being created. Each thread writes a line to a single file using the class below. It cycles through about 3100 individual tasks, but a maximum of 50 tasks exist at once to handle each batch; as each thread completes its task, a new one is created to take its place. The last batch processed 3188 commands and threw 16 errors.
I have tried using Monitor.Enter and Exit, but I get the same results. I have also tried making StdLibLockObj readonly, all with the same results.
Error: The process cannot access the file 'ThreadExe.txt' because it is being used by another process.
static class StdLib
{
    private static object StdLibLockObj = new object();
    public static void WriteLogFile(string AFileName, string FileData, bool AppendIfExists = true, bool AddAppPath = true)
    {
        lock (StdLibLockObj)
        {
            StreamWriter sw = null;
            try
            {
                if (AddAppPath)
                {
                    AFileName = Path.Combine(ApplicationPath(), AFileName);
                }
                if ((AppendIfExists) && File.Exists(AFileName))
                {
                    sw = File.AppendText(AFileName);
                }
                else
                {
                    sw = File.CreateText(AFileName);
                }
                sw.Write(FileData);
            }
            finally
            {
                if (sw != null)
                {
                    sw.Flush();
                    sw.Close();
                    sw.Dispose();
                }
                sw = null;
            }
        }
    }
}
My background is mostly in Delphi, where threading is a bit more granular.
Any help would be appreciated.
Wrap your StreamWriter in a "using" block so the file handle is reliably closed and released, even when a write throws. Sort of like this:
public static void ErrorMessage(string logMessage)
{
    using (StreamWriter sw_errors = new StreamWriter(m_errors, true))
    {
        sw_errors.Write("\r\nLog Entry : ");
        sw_errors.WriteLine("{0} {1}", DateTime.Now.ToLongTimeString(),
            DateTime.Now.ToLongDateString());
        sw_errors.WriteLine(" :");
        sw_errors.WriteLine(" :{0}", logMessage);
        sw_errors.WriteLine("-------------------------------");
    }
}
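Note that the using block only guarantees the handle is released; if several threads can still call into this at once against the same file, you may still want the lock as well. A minimal combined sketch (the names here are illustrative, not the original method):
private static readonly object _logLock = new object();

public static void WriteLogFile(string fileName, string fileData, bool append = true)
{
    lock (_logLock)
    {
        // StreamWriter(path, append) creates the file if it does not exist,
        // and the using block disposes it even if Write throws.
        using (var sw = new StreamWriter(fileName, append))
        {
            sw.Write(fileData);
        }
    }
}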

C# Read/Write file from multiple applications

I have a scenario where we maintain a rates file (.xml) which is accessed by 3 different applications running on 3 different servers. All 3 applications use RatesMaintenance.dll, which has the 4 methods below: Load, Write, Read and Close.
All 3 applications write to the file continuously, and hence I have added a Monitor.Enter and Monitor.Exit mechanism, assuming that these operations from the 3 different applications would not collide. But at the moment, in some cases, I am getting the error - "Could not open the rates file".
As per my understanding, this means that for some reason the 3 applications try to access the file at the same time. Could anyone please suggest how to handle such a scenario?
Monitor.Enter(RatesFileLock);
try
{
    //Open Rates file
    LoadRatesFile(false);
    //Write Rates into file
    WriteRatesToFile();
    //Close Rates file
    CloseRatesFile();
}
finally
{
    Monitor.Exit(RatesFileLock);
}
Method signature of Load-
LoadRatesFile(bool isReadOnly)
For opening file-
new FileStream(RatesFilePath,
isReadOnly ? FileMode.Open : FileMode.OpenOrCreate,
isReadOnly ? FileAccess.Read : FileAccess.ReadWrite,
isReadOnly ? FileShare.ReadWrite : FileShare.None);
.... remaining Rates reading logic code here
For reading Rates from file-
Rates = LoadRatesFile(true);
For Writing Rates into file-
if (_RatesFileStream != null && _RatesInfo != null && _RatesFileSerializer != null)
{
    _RatesFileStream.SetLength(0);
    _RatesFileSerializer.Serialize(_RatesFileStream, _RatesInfo);
}
In closing file method-
_RatesFileStream.Close();
_RatesFileStream = null;
I hope I have explained my scenario in enough detail. Please let me know if anyone needs more details.
While the other answers are correct and that you won't be able to get a perfect solution with files that are accessed concurrently by multiple processes, adding a retry mechanism may make it reliable enough for your use case.
Before I show one way to do that, I've got two minor suggestions. C#'s "using" blocks are really useful for dealing with resources such as files and locks that you want to be sure to dispose of after use. In your code, the monitor is always exited because you use try..finally (though this would still be clearer with an outer "lock" block), but you don't close the file if the WriteRatesToFile method fails.
So, firstly, I'd suggest changing your code to something like the following -
private static object _ratesFileLock = new object();
public void UpdateRates()
{
    lock (_ratesFileLock)
    {
        using (var stream = GetRatesFileStream())
        {
            var rates = LoadRatesFile(stream);
            // Apply any other update logic here
            WriteRatesToFile(rates, stream);
        }
    }
}
private Stream GetRatesFileStream()
{
    return File.Open("rates.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
}
private IEnumerable<Rate> LoadRatesFile(Stream stream)
{
    // Apply any other logic here
    return RatesSerialiser.Deserialise(stream);
}
private void WriteRatesToFile(IEnumerable<Rate> rates, Stream stream)
{
    RatesSerialiser.Serialise(rates, stream);
}
This opens the file stream once and then reuses it between the load and write actions - and reliably disposes of it, even if an error is encountered inside the using block (the same applies to the "lock" block, which is simpler than Monitor.Enter/Exit with try..finally).
This could quite simply be extended to include a retry mechanism so that if the file is locked by another process then we wait a short time and then try again -
private static object _ratesFileLock = new object();
public void UpdateRates()
{
    Attempt(TryToUpdateRates, maximumNumberOfAttempts: 50, timeToWaitBetweenRetriesInMs: 100);
}
private void TryToUpdateRates()
{
    lock (_ratesFileLock)
    {
        using (var stream = GetRatesFileStream())
        {
            var rates = LoadRatesFile(stream);
            // Apply any other update logic here
            WriteRatesToFile(rates, stream);
        }
    }
}
private Stream GetRatesFileStream()
{
    return File.Open("rates.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
}
private IEnumerable<Rate> LoadRatesFile(Stream stream)
{
    // Apply any other logic here
    return RatesSerialiser.Deserialise(stream);
}
private void WriteRatesToFile(IEnumerable<Rate> rates, Stream stream)
{
    RatesSerialiser.Serialise(rates, stream);
}
private static void Attempt(Action work, int maximumNumberOfAttempts, int timeToWaitBetweenRetriesInMs)
{
    var numberOfFailedAttempts = 0;
    while (true)
    {
        try
        {
            work();
            return;
        }
        catch
        {
            numberOfFailedAttempts++;
            if (numberOfFailedAttempts >= maximumNumberOfAttempts)
                throw;
            Thread.Sleep(timeToWaitBetweenRetriesInMs);
        }
    }
}
What you're trying to do is difficult bordering on impossible. I won't say that it's impossible because there is always a way, but it's better not to try to make something work in a way it wasn't intended to.
And even if you get it to work and you could ensure that applications on multiple servers don't overstep each other, someone could write some other process that locks the same file because it doesn't know about the system in place for gaining access to that file and playing well with the other apps.
You could check to see if the file is in use before opening it, but there's no guarantee that another server won't open it in between when you checked and when you tried to open it.
The ideal answer is not to try to use a file as database accessed by multiple applications concurrently. That's exactly what databases are for. They can handle multiple concurrent requests to read and write records. Sometimes we use files for logs or other data. But if you've got applications going on three servers then you really need a database.
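For example, a minimal sketch of pushing a rate update through SQL Server instead (a hypothetical Rates table with Symbol/Rate columns, using System.Data.SqlClient; connectionString is assumed to be configured elsewhere) - the database serialises the concurrent writers for you:
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "UPDATE Rates SET Rate = @rate WHERE Symbol = @symbol", connection))
{
    command.Parameters.AddWithValue("@symbol", "EURUSD");
    command.Parameters.AddWithValue("@rate", 1.0835m);
    connection.Open();
    command.ExecuteNonQuery();
}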

How to minimize Logging impact

There's already a question about this issue, but it's not telling me what I need to know:
Let's assume I have a web application, and there's a lot of logging on every roundtrip. I don't want to open a debate about why there's so much logging, or how I can do fewer logging operations. I want to know what possibilities I have to make this logging performant and clean.
So far, I've implemented declarative (attribute-based) and imperative logging, which seems to be a cool and clean way of doing it... now, what can I do about performance, assuming I can expect those logs to take more time than expected? Is it OK to open a thread and leave that work to it?
Things I would consider:
Use an efficient file format to minimise the quantity of data to be written (e.g. XML and text formats are easy to read but usually terribly inefficient - the same information can be stored in a binary format in a much smaller space). But don't spend lots of CPU time trying to pack data "optimally". Just go for a simple format that is compact but fast to write.
Test use of compression on the logs. This may not pay off with a fast SSD, but in most I/O situations the overhead of compressing data is less than the I/O overhead, so compression gives a net gain (although it is a compromise - raising CPU usage to lower I/O usage); see the sketch after this list.
Only log useful information. No matter how important you think everything is, it's likely you can find something to cut out.
Eliminate repeated data. e.g. Are you logging a client's IP address or domain name repeatedly? Can these be reported once for a session and then not repeated? Or can you store them in a map file and use a compact index value whenever you need to reference them? etc
Test whether buffering the logged data in RAM helps improve performance (e.g. writing a thousand 20-byte log records means 1,000 function calls and could cause a lot of disk seeking and other write overheads, while writing a single 20,000-byte block in one burst means only one function call and could give a significant performance increase and maximise the burst rate you get out to disk). Often writing blocks in sizes like 4k, 16k, 32k or 64k works well, as those sizes tend to fit the disk and I/O architecture (but check your specific architecture for clues about what sizes might improve efficiency). The down side of a RAM buffer is that if there is a power outage you will lose more data, so you may have to balance performance against robustness.
(Especially if you are buffering...) Dump the information to an in-memory data structure and pass it to another thread to stream it out to disk. This will help stop your primary thread being held up by log I/O. Take care with threads though - for example, you may have to consider how you will deal with times when you are creating data faster than it can be logged for short bursts - do you need to implement a queue, etc?
Are you logging multiple streams? Can these be multiplexed into a single log to possibly reduce disk seeking and the number of open files?
Is there a hardware solution that will give a large bang for your buck? e.g. Have you used SSD or RAID disks? Will dumping the data to a different server help or hinder? It may not always make much sense to spend $10,000 of developer time making something perform better if you can spend $500 to simply upgrade the disk.
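The compression sketch promised above: wrapping the log's FileStream in a GZipStream (System.IO.Compression) is a small change, at the cost of some CPU. The file name and message below are placeholders:
using (var file = new FileStream("app.log.gz", FileMode.Create, FileAccess.Write, FileShare.Read))
using (var gzip = new GZipStream(file, CompressionMode.Compress))
using (var writer = new StreamWriter(gzip))
{
    // Everything written through the writer is compressed before it hits the disk.
    writer.WriteLine("log message here");
}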
I use the code below to log. It is a singleton that accepts log messages and puts every message into a ConcurrentQueue. Every two seconds it writes everything that has come in to the disk. Your app is now only delayed by the time it takes to enqueue each message. It's my own code, feel free to use it.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Windows.Forms;
namespace FastLibrary
{
public enum Severity : byte
{
Info = 0,
Error = 1,
Debug = 2
}
public class Log
{
private struct LogMsg
{
public DateTime ReportedOn;
public string Message;
public Severity Seriousness;
}
// Nice and Threadsafe Singleton Instance
private static Log _instance;
public static Log File
{
get { return _instance; }
}
static Log()
{
_instance = new Log();
_instance.Message("Started");
_instance.Start("");
}
~Log()
{
Exit();
}
public static void Exit()
{
if (_instance != null)
{
_instance.Message("Stopped");
_instance.Stop();
_instance = null;
}
}
private ConcurrentQueue<LogMsg> _queue = new ConcurrentQueue<LogMsg>();
private Thread _thread;
private string _logFileName;
private volatile bool _isRunning;
public void Message(string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = DateTime.Now, Message = msg, Seriousness = Severity.Info });
}
public void Message(DateTime time, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = time, Message = msg, Seriousness = Severity.Info });
}
public void Message(Severity seriousness, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = DateTime.Now, Message = msg, Seriousness = seriousness });
}
public void Message(DateTime time, Severity seriousness, string msg)
{
_queue.Enqueue(new LogMsg { ReportedOn = time, Message = msg, Seriousness = seriousness });
}
private void Start(string fileName = "", bool oneLogPerProcess = false)
{
_isRunning = true;
// Unique FileName with date in it. And ProcessId so the same process running twice will log to different files
string lp = oneLogPerProcess ? "_" + System.Diagnostics.Process.GetCurrentProcess().Id : "";
_logFileName = fileName == ""
? DateTime.Now.Year.ToString("0000") + DateTime.Now.Month.ToString("00") +
DateTime.Now.Day.ToString("00") + lp + "_" +
System.IO.Path.GetFileNameWithoutExtension(Application.ExecutablePath) + ".log"
: fileName;
_thread = new Thread(LogProcessor);
_thread.IsBackground = true;
_thread.Start();
}
public void Flush()
{
EmptyQueue();
}
private void EmptyQueue()
{
while (_queue.Any())
{
var strList = new List<string>();
//
try
{
// Block concurrent writing to file due to flush commands from other context
lock (_queue)
{
LogMsg l;
while (_queue.TryDequeue(out l)) strList.Add(l.ReportedOn.ToLongTimeString() + "|" + l.Seriousness + "|" + l.Message);
if (strList.Count > 0)
{
System.IO.File.AppendAllLines(_logFileName, strList);
strList.Clear();
}
}
}
catch
{
//ignore errors on errorlogging ;-)
}
}
}
public void LogProcessor()
{
while (_isRunning)
{
EmptyQueue();
// Sleep while running so we write in efficient blocks
if (_isRunning) Thread.Sleep(2000);
else break;
}
}
private void Stop()
{
// This is never called in the singleton.
// But we made it a background thread so all will be killed anyway
_isRunning = false;
if (_thread != null)
{
_thread.Join(5000);
_thread.Abort();
_thread = null;
}
}
}
}
Check if the logger is debug-enabled before calling logger.Debug; that way your code does not have to evaluate the message string when debug logging is turned off.
if (_logger.IsDebugEnabled) _logger.Debug($"slow old string {this.foo} {this.bar}");

Stop thread until enough memory is available

Environment : .net 4.0
I have a task that transforms XML files with an XSLT stylesheet; here is my code:
public string TransformFileIntoTempFile(string xsltPath, string xmlPath)
{
    var transform = new MvpXslTransform();
    transform.Load(xsltPath, new XsltSettings(true, false), new XmlUrlResolver());
    string tempPath = Path.GetTempFileName();
    using (var writer = new StreamWriter(tempPath))
    {
        using (XmlReader reader = XmlReader.Create(xmlPath))
        {
            transform.Transform(new XmlInput(reader), null, new XmlOutput(writer));
        }
    }
    return tempPath;
}
I have X threads that can launch this task in parallel.
Sometimes my input files are about 300 MB, sometimes only a few MB.
My problem: I get an OutOfMemoryException when my program tries to transform several big XML files at the same time.
How can I avoid these OutOfMemoryExceptions? My idea is to stop a thread before executing the task until there is enough memory available, but I don't know how to do that. Or is there some other solution (like putting my task in a distinct application)?
Thanks
I don't recommend blocking a thread. In the worst case, you'll just end up starving the task that could potentially free the memory you need, leading to deadlock or very bad performance in general.
Instead, I suggest you keep a work queue with priorities. Get the tasks from the Queue scheduled fairly across a thread pool. Make sure no thread ever blocks on a wait operation, instead repost the task to the queue (with a lower priority).
So what you'd do (e.g. on receiving an OutOfMemory exception), is post the same job/task onto the queue and terminate the current task, freeing up the thread for another task.
A simplistic approach is to use LIFO which ensures that a task posted to the queue will have 'lower priority' than any other jobs already on that queue.
Since .NET Framework 4 we have an API for the good old memory-mapped files feature, which has been available for many years in the Win32 API, so now you can use it from .NET managed code.
For your task, the "Persisted memory-mapped files" option is the better fit.
MSDN:
Persisted files are memory-mapped files that are associated with a
source file on a disk. When the last process has finished working with
the file, the data is saved to the source file on the disk. These
memory-mapped files are suitable for working with extremely large
source files.
On the page of MemoryMappedFile.CreateFromFile() method description you can find a nice example describing how to create a memory mapped Views for the extremely large file.
EDIT: Update regarding the notes in the comments
I just found the method MemoryMappedFile.CreateViewStream(), which creates a stream of type MemoryMappedViewStream that inherits from System.IO.Stream.
I believe you can create an instance of XmlReader from this stream and then instantiate your custom implementation of the XslTransform using this reader/stream.
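A minimal sketch of that wiring (System.IO.MemoryMappedFiles), reusing the transform and writer from the question's method and assuming the transform can consume any XmlReader built over an ordinary Stream:
using (var mmf = MemoryMappedFile.CreateFromFile(xmlPath, FileMode.Open))
using (var viewStream = mmf.CreateViewStream())
using (XmlReader reader = XmlReader.Create(viewStream))
{
    // Hand the reader to the transform instead of XmlReader.Create(xmlPath).
    transform.Transform(new XmlInput(reader), null, new XmlOutput(writer));
}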
EDIT2: remi bourgarel (OP) has already tested this approach, and it looks like this particular XslTransform implementation (I wonder whether any would) won't work with a memory-mapped view stream in the way that was hoped.
The main problem is that you are loading the entire XML file. If you were to just transform-as-you-read, the out-of-memory problem should not normally appear.
That being said, I found an MS support article which suggests how it can be done:
http://support.microsoft.com/kb/300934
Disclaimer: I did not test this so if you use it and it works please let us know.
You could consider using a queue to throttle how many concurrent transforms are being done, based on some sort of artificial memory boundary, e.g. file size. Something like the following could be used.
This sort of throttling strategy can be combined with a maximum number of concurrent files being processed to ensure your disk is not being thrashed too much.
NB I have not included the necessary try\catch\finally around execution to ensure that exceptions are propagated to the calling thread and WaitHandles are always released. I could go into further detail here.
public static class QueuedXmlTransform
{
private const int MaxBatchSizeMB = 300;
private const double MB = (1024 * 1024);
private static readonly object SyncObj = new object();
private static readonly TaskQueue Tasks = new TaskQueue();
private static readonly Action Join = () => { };
private static double _CurrentBatchSizeMb;
public static string Transform(string xsltPath, string xmlPath)
{
string tempPath = Path.GetTempFileName();
using (AutoResetEvent transformedEvent = new AutoResetEvent(false))
{
Action transformTask = () =>
{
MvpXslTransform transform = new MvpXslTransform();
transform.Load(xsltPath, new XsltSettings(true, false),
new XmlUrlResolver());
using (StreamWriter writer = new StreamWriter(tempPath))
using (XmlReader reader = XmlReader.Create(xmlPath))
{
transform.Transform(new XmlInput(reader), null,
new XmlOutput(writer));
}
transformedEvent.Set();
};
double fileSizeMb = new FileInfo(xmlPath).Length / MB;
lock (SyncObj)
{
if ((_CurrentBatchSizeMb += fileSizeMb) > MaxBatchSizeMB)
{
_CurrentBatchSizeMb = fileSizeMb;
Tasks.Queue(isParallel: false, task: Join);
}
Tasks.Queue(isParallel: true, task: transformTask);
}
transformedEvent.WaitOne();
}
return tempPath;
}
private class TaskQueue
{
private readonly object _syncObj = new object();
private readonly Queue<QTask> _tasks = new Queue<QTask>();
private int _runningTaskCount;
public void Queue(bool isParallel, Action task)
{
lock (_syncObj)
{
_tasks.Enqueue(new QTask { IsParallel = isParallel, Task = task });
}
ProcessTaskQueue();
}
private void ProcessTaskQueue()
{
lock (_syncObj)
{
if (_runningTaskCount != 0) return;
while (_tasks.Count > 0 && _tasks.Peek().IsParallel)
{
QTask parallelTask = _tasks.Dequeue();
QueueUserWorkItem(parallelTask);
}
if (_tasks.Count > 0 && _runningTaskCount == 0)
{
QTask serialTask = _tasks.Dequeue();
QueueUserWorkItem(serialTask);
}
}
}
private void QueueUserWorkItem(QTask qTask)
{
Action completionTask = () =>
{
qTask.Task();
OnTaskCompleted();
};
_runningTaskCount++;
ThreadPool.QueueUserWorkItem(_ => completionTask());
}
private void OnTaskCompleted()
{
lock (_syncObj)
{
if (--_runningTaskCount == 0)
{
ProcessTaskQueue();
}
}
}
private class QTask
{
public Action Task { get; set; }
public bool IsParallel { get; set; }
}
}
}
Update
Fixed bug in maintaining batch size when rolling over to next batch window:
_CurrentBatchSizeMb = fileSizeMb;

Out of memory exceptions when using MemoryStream in cache

We are dealing with a lot of files which need to be opened and closed, mostly for data reads.
Is it a good idea or not to cache the MemoryStream of each file in a temp hashtable or some other object?
We have noticed that when opening files over 100MB we are running into out-of-memory exceptions.
We are using a WPF app.
We can successfully open the files 1 or 2 times, sometimes 3 to 4, but after that we run into out-of-memory exceptions.
If you are currently caching these files, then you would expect to run out of memory quite quickly.
If you aren't caching them yet, don't, because you'll just make it worse. Perhaps you have a memory leak? Are you disposing of the MemoryStream once you've used it?
The best way to deal with large files is to stream data in and out (using FileStreams), so that you don't have to have the whole file in memory at once...
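A minimal sketch of the streaming idea - ProcessChunk is a hypothetical placeholder for whatever you actually do with the bytes:
using (var stream = File.OpenRead(path))
{
    var buffer = new byte[81920];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Only one buffer's worth of the file is ever in memory at a time.
        ProcessChunk(buffer, bytesRead);
    }
}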
One issue with MemoryStream is that its internal buffer doubles in size each time the capacity is forced to increase. Even if your MemoryStream holds 100MB and your file is 101MB, as soon as you try to write that last 1MB, the internal buffer doubles to 200MB. You can reduce this by giving the MemoryStream a starting capacity equal to the length of your file. But this will still let the files use all of the memory and stop any new allocations after some of the files are loaded. If you create a cache object that is held inside a WeakReference, you allow the garbage collector to toss a few of your cached files as needed. But don't forget you will need to add code to recreate the lost cache on demand.
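The pre-sizing point on its own looks like the snippet below (so the buffer never doubles mid-copy); the WeakReference-based cache described above is what the longer example that follows implements:
var info = new FileInfo(path);
using (var file = File.OpenRead(path))
using (var buffer = new MemoryStream((int)info.Length))
{
    // Capacity already matches the file length, so the internal byte[]
    // is allocated once instead of repeatedly doubling.
    file.CopyTo(buffer);
}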
public class CacheStore<TKey, TCache>
{
private static object _lockStore = new object();
private static CacheStore<TKey, TCache> _store;
private static object _lockCache = new object();
private static Dictionary<TKey, TCache> _cache =
new Dictionary<TKey, TCache>();
public TCache this[TKey index]
{
get
{
lock (_lockCache)
{
if (_cache.ContainsKey(index))
return _cache[index];
return default(TCache);
}
}
set
{
lock (_lockCache)
{
if (_cache.ContainsKey(index))
_cache.Remove(index);
_cache.Add(index, value);
}
}
}
public static CacheStore<TKey, TCache> Instance
{
get
{
lock (_lockStore)
{
if (_store == null)
_store = new CacheStore<TKey, TCache>();
return _store;
}
}
}
}
public class FileCache
{
private WeakReference _cache;
public FileCache(string fileLocation)
{
if (!File.Exists(fileLocation))
throw new FileNotFoundException("fileLocation", fileLocation);
this.FileLocation = fileLocation;
}
private MemoryStream GetStream()
{
if (!File.Exists(this.FileLocation))
throw new FileNotFoundException("fileLocation", FileLocation);
return new MemoryStream(File.ReadAllBytes(this.FileLocation));
}
public string FileLocation { get; private set; }
public MemoryStream Data
{
get
{
if (_cache == null)
_cache = new WeakReference(GetStream(), false);
var ret = _cache.Target as MemoryStream;
if (ret == null)
{
Recreated++;
ret = GetStream();
_cache.Target = ret;
}
return ret;
}
}
public int Recreated { get; private set; }
}
class Program
{
static void Main(string[] args)
{
var cache = CacheStore<string, FileCache>.Instance;
var fileName = @"c:\boot.ini";
cache[fileName] = new FileCache(fileName);
var ret = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret));
GC.Collect();
var ret2 = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret2));
GC.Collect();
var ret3 = cache[fileName].Data.ToArray();
Console.WriteLine("Recreated {0}", cache[fileName].Recreated);
Console.WriteLine(Encoding.ASCII.GetString(ret3));
Console.Read();
}
}
It's very difficult to say "yes" or "no" to whether caching file content is right in the general case, and with so little information. However, finite resources are a fact of life, and you (as a developer) must account for them. If you want to cache something, you should use some mechanism for automatically unloading data. In the .NET Framework you can use the WeakReference class, which lets the garbage collector reclaim the target object (byte arrays and memory streams are objects too).
If you control the hardware, can run 64-bit, and have the funds for a lot of RAM, you can cache big files.
However, you should be frugal with resources (CPU, RAM) and use the "cheap" implementation.
I think the problem is that after you are done, the file is not disposed immediately; it waits for the next GC cycle.
Streams are IDisposable, which means you can and should use a using block; then the stream is disposed immediately when you are done with it.
I don't think caching that amount of data is a good solution, even if you never get a memory overflow. Check out the memory-mapped files solution, which means the file stays on the file system but the speed of reading is almost equal to in-memory (there is some overhead, for sure). Check out this link: MemoryMappedFiles
P.S. There are pretty good articles and examples on this topic around the internet.
Good luck.
