Hi,
My question has to do with a very basic understanding of writing data using a StreamWriter.
If you consider the following code:
StreamWriter writer = new StreamWriter(@"C:\TEST.XML");
writer.WriteLine("somestring");
writer.Flush();
writer.Close();
When the writer object is initialized with the filename, all it has is a pointer to the file.
However, when we write any string to the writer object, does it actually LOAD the whole file, read its contents, append the string at the end, and then close the handle?
I hope it's not a silly question.
I ask this because I came across an application that writes to a file frequently, probably every half a second (logging), and the file size increased to about 1 GB, yet it still continued to write to the file.
Do you think this could have resulted in a CPU usage of 100%?
Please let me know if my question is unclear.
Thanks in advance.
does it actually LOAD the whole file, read its contents
After the framework opens the file, it will perform a FileStream.Seek operation to position the file pointer to the end of the file. This is supported by the operating system, and does not require reading or writing any file data.
and then close the handle
The handle is closed when you call Close or Dispose. Both are equivalent. (Note for convenience that you can take advantage of the C# using statement to create a scope where the call to Dispose is handled by the compiler on exiting the scope.)
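For example, the original snippet could be rewritten with using (a minimal sketch, reusing the same example path):
// Dispose (and therefore Flush and Close) runs automatically when
// execution leaves the using block, even if an exception is thrown.
using (StreamWriter writer = new StreamWriter(@"C:\TEST.XML"))
{
    writer.WriteLine("somestring");
}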
every half a second to a file
That doesn't sound frequent enough to load the machine at 100%. Especially since disk I/O mainly consists of waiting on the disk, and this kind of wait does not contribute to CPU usage. Use a profiler to see where your application is spending its time. Alternatively, a simple technique that you might try is to run under the debugger, click pause, and examine the call stacks of your threads. There is a good chance that a method that is consuming a lot of time will be on a stack when you randomly pause the application.
The code you provided above will overwrite the content of the file, so it has no need to load the file upfront.
Nonetheless, you can append to a file by saying:
StreamWriter writer = new StreamWriter(@"C:\TEST.XML", true);
The true parameter tells it to append to the file.
Even then, it does not load the entire file into memory before appending to it.
That's what makes it a "stream": data flows in one direction, and nothing already written needs to be read back.
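For illustration, a minimal sketch combining append mode with a using block (same example path as above):
using (var writer = new StreamWriter(@"C:\TEST.XML", true))
{
    // Seeks to the end and writes; the existing contents are never read.
    writer.WriteLine("another line");
}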
I am working on code written by an ex-colleague. We have several image files and we are converting them to XAML. The code uses XDocument to load each image file (not of huge size, but there are quite a lot of them) and does the processing on multiple threads. I have tried to look for every object that I think can be disposed once each iteration completes, but the issue is still there: if I keep the process running, it consumes all the RAM and then Visual Studio crashes. What surprises me most is that once this has happened, I am unable to open anything on my PC; every single thing, including Visual Studio, complains that memory is full.
I am unable to upload the image here.
What I have tried is to run it on a single thread; I encounter GC pressure, but I am still able to run the code, and memory stays fine until the end.
I know I need to look for an alternative to XDocument, but that is out of scope at the moment, and I need to work with the existing code.
Can you please help me or give me some pointers?
Code below is how I load the XML before sending it to API for processing:
XDocument doc;
using (var fileStream = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(Image1.sv.ToString())))
{
    doc = XDocument.Load(fileStream);
}
The API then uses multi-threading to process the image files and convert them to XAML using different methods. Each of these uses XDocument, loading via a memory stream, saving in memory, and continuing the processing.
I have used Diagnostic Tools within VS to identify the memory leak.
Kind regards
The new MemoryStream(Encoding.ASCII.GetBytes(someString)) step seems very redundant, so we can shave a lot of things by just... not doing that, and using XDocument.Parse(someString):
var doc = XDocument.Parse(Image1.sv.ToString());
This also avoids losing data by going via ASCII, which is almost always the wrong choice.
More savings may be possible, if we knew what Image1.sv was here - i.e. it may be possible to avoid allocating a single large string in the first place.
Let's assume that exactly 1 byte after file 1's EOF, another file (file2) starts.
If I open up file 1 and use FileStream FileMode.Append, does it overwrite file2, or does it make another copy at a place where there is enough space?
Thanks in advance!
Edit:
For everyone after me: I forgot that there is a file system, which is split into chunks, making this question moot!
You appear to be laboring under the misapprehension that files are stored sequentially on disk, and that extending one file might overwrite parts of another file. This doesn't happen when you append via a FileStream in C#. The operating system will write the bytes you add however it likes, wherever it likes (and it likes to not overwrite other files), which is how files end up broken into smaller chunks scattered all over the disk (and why defragging is a thing). None of this is of any concern to you, because the OS presents those scattered file fragments as a single contiguous stream of bytes to any program that wants to read them.
Of course, if you wrote a program that bypassed the OS and performed low-level disk access, located the end of the file, and then blindly wrote more bytes into the locations after it, then you would end up damaging other files, and even the OS's carefully curated filesystem... but a .NET FileStream won't make that possible.
TLDR: add your bytes and don't worry about it. Keeping the filesystem in order is not your job.
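For illustration, a minimal sketch of appending with FileMode.Append (the file name is hypothetical):
using System.IO;

using (var fs = new FileStream("file1.dat", FileMode.Append, FileAccess.Write))
{
    // The OS allocates new clusters as needed; file2 is never touched.
    byte[] moreBytes = { 0x01, 0x02, 0x03 };
    fs.Write(moreBytes, 0, moreBytes.Length);
}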
If I open up file 1 and use FileStream FileMode.Append, does it overwrite file2, or does it make another copy at a place where there is enough space?
Thankfully, no.
Here's a brief overview of why:
Your .NET C# code does not interact with the OS directly.
Your code is compiled into byte-code (IL), which is executed at runtime by the .NET runtime.
The .NET runtime itself is built mostly in a combination of C#, C, and C++.
The runtime secures what it calls SafeHandles, which are wrappers around the file handles provided by what I assume is the Win32 API (for Windows applications, at least), or whatever OS-level provider of file handles your architecture is running on.
The runtime uses these handles to read and write data through the OS-level API.
It is the OS's job to ensure that changes to yourfile.txt, made through the handle it has provided to the runtime, affect only that file.
Files are not generally stored in memory, and as such are not subject to buffer overflows.
The runtime may use an in-memory buffer for your reads and writes, but that is implemented by the runtime and has no effect on the file or the operating system.
Any attempt to overflow this buffer is safeguarded against by the runtime itself, and the execution of your code will stop. Even if an overflow of this buffer somehow succeeded, no extra bytes would be written to the underlying handle; rather, the runtime would likely stop executing with a memory access violation or other unspecified behavior.
The handle you're given is little more than a token that the OS uses to keep track of which file you want to read or write bytes to.
If you attempt to write more bytes to a file than the architecture allows, most operating systems have safeguards in place to end your process, close the file, or outright raise an interrupt to halt the system.
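To make the SafeHandle point concrete, here is a minimal sketch (the file name is hypothetical):
using System.IO;
using Microsoft.Win32.SafeHandles;

using (var fs = new FileStream("file1.txt", FileMode.Append))
{
    // The FileStream wraps the OS file handle in a SafeFileHandle;
    // the OS uses this token to track which file the writes target.
    SafeFileHandle handle = fs.SafeFileHandle;
    fs.WriteByte(0x42);
}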
I used StreamWriter in my code, without using or Dispose, to create a CSV file. It worked fine at first, but it always generated the same file that it generated the first time I ran my code. Even if I changed the data selection, it was the same file. Then I copied my DLL to a different environment (only the DLL, no other file was changed) and it still generated the same file, with exactly the same data from the previous environment. It seems that my code is buffering the data from the first run, but where? Even after changing the hosting environment, why didn't the buffer change?
When you create a StreamWriter it either accepts a Stream to write to or creates its own. The Stream instance uses buffering to hold data that is being written and needs to be flushed. This is done when you Dispose the Stream or the StreamWriter but is likely to be skipped if you let the garbage collector finalize either.
That is why you should always dispose of your streams when you're done with them, and why you should call Flush on a Stream when you've finished writing data to it that you don't want to lose. An unhandled exception in your code that bypasses your Dispose can cause you to lose data.
Personally I prefer to dispose of the Stream as soon as possible. Unless you're constantly writing hundreds of times a second it doesn't cost much to re-open a file to append some more data to it later.
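A minimal sketch of that pattern (the file name and content are hypothetical):
// Dispose at the end of the using block flushes the buffer and
// releases the file handle; reopening later for append is cheap.
using (var writer = new StreamWriter("log.csv", true))
{
    writer.WriteLine("col1,col2");
}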
Is using the FileStream class to write to a file and the .NET File.Copy method to copy the file at the same time thread safe? It seems like the operating system should safely handle concurrent access to the file, but I cannot find any documentation on this. I've written a simple application to test this and am seeing weird results. The copy of the file shows as 2 MB, but when I inspect its content with Notepad++, it's empty inside. The original file contains data.
using System;
using System.Threading.Tasks;
using System.Threading;
using System.IO;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = Environment.CurrentDirectory + @"\test.txt";
            using (FileStream fileStream = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
            {
                Task fileWriteTask = Task.Run(() =>
                {
                    for (int i = 0; i < 10000000; i++)
                    {
                        fileStream.WriteByte((byte)i);
                    }
                });
                Thread.Sleep(50);
                File.Copy(filePath, filePath + ".copy", true);
                fileWriteTask.Wait();
            }
        }
    }
}
Thanks for the help!
It depends.
It depends on what you mean when you say "thread safe".
First of all, look at this constructor:
public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
Notice the last parameter: it states what you allow other threads and processes to do with the file. The default that applies to constructors without it is FileShare.Read, which means you allow others to view the file as read-only. This is, of course, unwise if you are writing to it.
That's basically what you did: you opened a file for writing while allowing others to read it, and "read" includes copying.
Also, please note that without fileWriteTask.Wait(); at the end of your code, your entire function isn't thread safe, because the FileStream might be closed before you even start writing.
Windows does make file access thread safe, but in a pretty non-trivial manner. For example, if you had opened the file with FileShare.None, File.Copy would have failed, and to the best of my knowledge there isn't an elegant way to handle this with .NET. The general approach Windows uses to synchronize file access is called optimistic concurrency: assume your action is possible, and fail if it isn't.
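For example, a minimal sketch of opening with FileShare.None (assuming the same filePath as in the question's code):
using (var fileStream = new FileStream(filePath, FileMode.Create,
    FileAccess.ReadWrite, FileShare.None))
{
    // While this handle is open, File.Copy from another thread or
    // process fails with an IOException instead of copying a
    // partially written file.
    fileStream.WriteByte(0x42);
}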
This question discusses waiting for a file lock in .NET.
Sharing files between processes is a common issue, and one of the ways to do this, mostly for Inter-Process Communication (IPC), is memory-mapped files; this is covered in the MSDN documentation.
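A minimal sketch of a named memory-mapped file (the map name and capacity are hypothetical):
using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateOrOpen("shared-region", 1024))
using (var accessor = mmf.CreateViewAccessor())
{
    accessor.Write(0, (byte)42);        // one process writes...
    byte value = accessor.ReadByte(0);  // ...another can read it back
}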
If you are brave and willing to play around with the WinAPI and overlapped I/O: if I remember correctly, LockFileEx allows nice file locking...
Also, once there was a magical thing called Transactional NTFS, but it has moved on into the realm of Microsoft deprecated technologies.
It is thread-safe in the sense that "neither of the C# objects will be corrupted".
The result of the operation will be more or less random (empty file, partial copy, access denied) and depends on the sharing mode used to open the file for each operation.
If carefully set up, this can produce sensible results. For example, flushing the file after each line and specifying a compatible share mode will let you be reasonably sure that complete lines are copied.
The answer is no. You cannot in general operate on file system objects from different threads and achieve consistent or predictable results for the file contents.
Individual .NET Framework functions may or may not be thread-safe, but this is of little consequence. The timing and order in which data is read from, written to, or copied between individual files on disk is essentially non-deterministic. By which I mean that if you do the same thing multiple times you will get different results, depending on factors outside your control, such as machine load and disk layout.
The situation is made worse because the Windows API responsible for File.Copy is run on a system process and is only loosely synchronised with your program.
The bottom line is that if you want file-level synchronisation, you have no choice but to use file-level primitives to achieve it. That means things like open/close, flushing, and locking. Finding combinations that work is non-trivial.
In general you are better off keeping all the operations on a file inside one thread, and synchronising access to that thread.
In answer to a comment, if you operate on a file by making it memory-mapped, the in-memory contents are not guaranteed to be consistent with the on-disk contents until the file is closed. The in-memory contents can be synchronised between processes or threads, but the on-disk contents cannot.
A named mutex locks as between processes, but does not guarantee anything as to consistency of file system objects.
File system locks are one of the ways I mentioned that could be used to ensure file system consistency, but in many situations there are still no guarantees. You are relying on the operating system to invalidate cached disk contents and flush to disk, and this is not guaranteed for all files at all times. For example, it may be necessary to use the FILE_FLAG_NO_BUFFERING, FILE_FLAG_OVERLAPPED and FILE_FLAG_WRITE_THROUGH flags, which may severely affect performance.
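In .NET, FILE_FLAG_WRITE_THROUGH is exposed as FileOptions.WriteThrough; a minimal sketch (FILE_FLAG_NO_BUFFERING has no managed equivalent and would need P/Invoke; the file name is hypothetical):
using (var fs = new FileStream("data.bin", FileMode.Create, FileAccess.Write,
    FileShare.None, 4096, FileOptions.WriteThrough))
{
    // WriteThrough asks the OS to push writes past its cache to the disk.
    fs.Write(new byte[] { 1, 2, 3 }, 0, 3);
}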
Anyone who thinks this is an easy problem with a simple one-size-fits-all solution has simply never tried to get it to work in practice.
I have a file that I'm going to fill up, so I wondered whether it would be better to do it simultaneously.
Notes:
I get the file from multiple computers simultaneously.
I set the position every time before calling StartWrite. Do I have to lock it each time before using it?
Is it a good solution? Do you have a better one?
By the way, what does Stream.Flush() do?
Thanks.
No, that would be conceptually wrong. Stream (I assume you mean the System.IO.Stream class) is an abstract class. When you instantiate an object, you are using one of its many child classes.
Assuming anything about the child classes is the wrong approach because:
a) Somebody might come after you to make modifications to your code and not see what the actual child class implementation does.
b) Less likely, but the implementation can change. For example, what if someone runs your code on the Mono framework?
If you are using the FileStream class, consider creating two (or more) FileStream objects over the same underlying file with the FileShare parameter set to Write. This way you specify that there might be simultaneous writing, but each stream has its own location pointer.
Update: only now did I see your comment "each computer sends me a part with start index, end index and byte[]". Actually, multiple FileStreams should work OK for this scenario.
void DataReceived(int start, byte[] data)
{
    // Each call opens its own FileStream, so each writer keeps an
    // independent file position; FileShare.ReadWrite permits the
    // other simultaneous writers.
    using (var f = new System.IO.FileStream("file.dat", System.IO.FileMode.Open, System.IO.FileAccess.Write, System.IO.FileShare.ReadWrite))
    {
        f.Seek(start, System.IO.SeekOrigin.Begin);
        f.Write(data, 0, data.Length); // offset 0 is within the buffer, not the file
    }
}
This is unsafe in principle, because even if your stream were thread-safe, you would still have to set the position and write non-atomically.
The native Windows file APIs support this; .NET doesn't. Windows is perfectly capable of concurrent I/O to the same file (how would SQL Server work if it weren't?).
I suggest you just use one writing FileStream per thread.
It's pointless to try to do several write operations to the same stream at the same time.
The underlying system can only write to one position in the file at a time, so even if the asynchronous write method supported multithreading, the writes would still be blocked.
Just do regular writes to the file, and use locking so that only one thread at a time writes to the file.
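A minimal sketch of that approach (the shared stream and the WriteChunk helper are hypothetical):
static readonly object fileLock = new object();

static void WriteChunk(FileStream f, long start, byte[] data)
{
    // Only one thread positions and writes at a time, so the
    // Seek/Write pair is atomic with respect to other writers.
    lock (fileLock)
    {
        f.Seek(start, SeekOrigin.Begin);
        f.Write(data, 0, data.Length);
    }
}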