I have the following C# algorithm for config file writeback:
string strPathConfigFile = @"C:\File.txt";
string strPathBackupFile = @"C:\File.backup";
string strContent = "File Content Text";
bool oldFilePresent = File.Exists(strPathConfigFile);
// Step 1
if (oldFilePresent)
{
File.Move(strPathConfigFile, strPathBackupFile);
}
// Step 2
using (FileStream f = new FileStream(strPathConfigFile, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
{
using (StreamWriter s = new StreamWriter(f))
{
s.Write(strContent);
s.Close();
}
f.Close();
}
// Step 3
if (oldFilePresent)
{
File.Delete(strPathBackupFile);
}
It works like this:
The original File.txt is renamed to File.backup.
The new File.txt is written.
File.backup is deleted.
This way, if there is a power blackout during the write operation, there is still an intact backup file present. The backup file is only deleted after the write operation is completed. The reading process can check if the backup file is present. If it is, the normal file is considered broken.
For this approach to work, it is crucial that the order of the 3 steps is strictly followed.
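The reading side of the scheme can be sketched like this (a minimal illustration of the recovery rule described above, not code from the question; the relative paths and the restore-by-rename choice are assumptions):

```csharp
using System;
using System.IO;

string pathConfig = "File.txt";
string pathBackup = "File.backup";

string ReadConfig()
{
    // If the backup still exists, the last writeback did not complete,
    // so the backup holds the last known-good content: restore it.
    if (File.Exists(pathBackup))
    {
        if (File.Exists(pathConfig))
            File.Delete(pathConfig);    // discard the possibly torn file
        File.Move(pathBackup, pathConfig);
    }
    return File.ReadAllText(pathConfig);
}
```

With this rule, the pair of files always contains at least one intact copy, no matter where the writer was interrupted.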
Now to my question: Is it possible that the C# compiler swaps step 2 and 3?
There might be a slight performance benefit, as Steps 1 and 3 are wrapped in identical if conditions, which could tempt the compiler to merge them.
I suspect the compiler might do it, as Steps 2 and 3 operate on completely different files. To a compiler that doesn't know the semantics of my exceptionally clever writeback procedure, Steps 2 and 3 might seem unrelated.
According to the language specification, the C# compiler must preserve side effects when reordering statements. Writing to files is such a side effect.
In general, the compiler/jitter/CPU is free to reorder instructions as long as the result would be identical for a single thread. However, IO, system calls and most things involved with multi threading would involve memory barriers or other synchronization that prevents such reordering.
In the given example there is only a single thread involved. So if the File APIs are implemented correctly (and that would be a fairly safe assumption) there should be no risk of unintended behavior.
Reordering issues mostly pop up when writing multi-threaded code without being aware of all the potential hazards and requirements for synchronization. As long as you only use a single thread, you should not need to worry about reordering.
I have a function that reads a file and, if it doesn't exist, creates and fills it. However, this is in a .NET Standard library and is called from an AWS Lambda function, which gives back the following error:
"Read-only file system"
How can I determine in code that the file system is read-only, so I can skip the file creation when that is the case?
EDIT: My question is different from Is there a way to check if a file is in use. That question is about trying to read a file before it is saved. I am asking how to know whether the file system would allow me to save a file at all (due to restrictions in the file system).
Generally, for nearly everything to do with file systems, the trick is to remember they are volatile, and therefore the answer is to just try whatever you want to do and handle the exception if it fails.
To understand what I mean by "volatile", let's say you have some magic Path.IsReadOnlyFileSystem() method that does exactly what you want. So you want to run some code like this:
if (!Path.IsReadOnlyFileSystem(fileIWantToCreate))
{
using(var sw = new StreamWriter(fileIWantToCreate))
{
//fill file here
}
}
But then something happens in your AWS cloud forcing the service to go into read-only mode. And it happens right here:
if (!Path.IsReadOnlyFileSystem(fileIWantToCreate))
{
/* <==== AWS goes read-only here ====> */
using(var sw = new StreamWriter(fileIWantToCreate))
Your magic method did its job perfectly. At the moment you called the method, the file system still allowed writes. Unfortunately, a volatile file system means you can't trust operations from one moment to the next (and a non-volatile file system wouldn't be much good to you for saving data). So you might be tempted to go for a pattern like this:
try
{
if (!Path.IsReadOnlyFileSystem(fileIWantToCreate))
{
using(var sw = new StreamWriter(fileIWantToCreate))
{
//fill file here
}
}
}
catch (IOException)
{
// handle the exception here
}
And now it's clear you have to handle the exception anyway. The if() condition doesn't really gain anything in terms of actual functionality. All it really does is add complexity to your code.
Even so, there's a tendency to believe keeping the if() condition gains a performance benefit by saving an exception handler and saving a disk access. This belief is flawed.
The if() condition actually adds a disk access when it checks the file system status. Moreover, this cost is paid on every execution. Keeping the if() condition does save the exception handler, but only on the occasions where the initial access fails. Admittedly, unwinding the stack to handle an exception is among the most expensive operations you can do in all of computer science. This is the "why" behind the "don't use exceptions for normal control flow" mantra. However, disk access is one of the few things that is far and away worse for performance.
In summary, we have an exception handler stack unwind sometimes vs an extra disk access every time. From a performance standpoint it's clear you're better off without the if() condition at all. We don't need the if() condition for correctness reasons, and we don't want it for performance reasons.... remind me again why we have it? The better pattern skips that block completely, like this:
try
{
using(var sw = new StreamWriter(fileIWantToCreate))
{
//fill file here
}
}
catch (IOException)
{
// handle the exception here
}
Thus the magic IsReadOnlyFileSystem() method is not needed or even helpful.
This isn't to say that method doesn't exist (I don't think it's part of .Net Standard, but depending on your platform there's likely a lower-level construct you can call into). It's just that it's not a particularly good idea.
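To make "handle the exception" concrete, here is a sketch of the try-only pattern factored into a helper that reports success (the method name is invented; which exception type a read-only file system raises can vary by platform, which is another reason to catch rather than pre-check):

```csharp
using System;
using System.IO;

// Attempt the write and report the outcome instead of pre-checking.
static bool TryWriteFile(string path, string content)
{
    try
    {
        using (var sw = new StreamWriter(path))
        {
            sw.Write(content);
        }
        return true;
    }
    catch (UnauthorizedAccessException)
    {
        return false;   // permission problem (e.g. read-only attribute)
    }
    catch (IOException)
    {
        return false;   // read-only file system, missing directory, disk full, ...
    }
}
```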
A little background: I've been experimenting with using the FILE_FLAG_NO_BUFFERING flag when doing IO with large files. We're trying to reduce the load on the cache manager in the hope that with background IO, we'll reduce the impact of our app on user machines. Performance is not an issue. Being behind the scenes as much as possible is a big issue. I have a close-to-working wrapper for doing unbuffered IO but I ran into a strange issue. I get this error when I call Read with an offset that is not a multiple of 4.
Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O).
Why does this happen? And doesn't this message contradict itself? If I add the Asynchronous file option, I get an IOException ("The parameter is incorrect.").
I guess the real question is: what do these requirements, http://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx, have to do with multiples of 4?
Here is the code that demonstrates the issue:
FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
int MinSectorSize = 512;
byte[] buffer = new byte[MinSectorSize * 2];
int i = 0;
while (i < MinSectorSize)
{
try
{
using (FileStream fs = new FileStream(@"<some file>", FileMode.Open, FileAccess.Read, FileShare.None, 8, FileFlagNoBuffering | FileOptions.Asynchronous))
{
fs.Read(buffer, i, MinSectorSize);
Console.WriteLine(i);
}
}
catch { }
i++;
}
Console.ReadLine();
When using FILE_FLAG_NO_BUFFERING, the documented requirement is that the memory address for a read or write must be a multiple of the physical sector size. In your code, you've allowed the address of the byte array to be randomly chosen (hence unlikely to be a multiple of the physical sector size) and then you're adding an offset.
The behaviour you're observing is that the call works if the offset is a multiple of 4. It is likely that the byte array is aligned to a 4-byte boundary, so the call is working if the memory address is a multiple of 4.
Therefore, your question can be rewritten like this: why is the read working when the memory address is a multiple of 4, when the documentation says it has to be a multiple of 512?
The answer is that the documentation doesn't make any specific guarantees about what happens if you break the rules. It may happen that the call works anyway. It may happen that the call works anyway, but only in September on even-numbered years. It may happen that the call works anyway, but only if the memory address is a multiple of 4. (It is likely that this depends on the specific hardware and device drivers involved in the read operation. Just because it works on your machine doesn't mean it will work on anybody else's.)
It probably isn't a good idea to use FILE_FLAG_NO_BUFFERING with FileStream in the first place, because I doubt that FileStream actually guarantees that it will pass the address you give it unmodified to the underlying ReadFile call. Instead, use P/Invoke to call the underlying API functions directly. You may also need to allocate your memory this way, because I don't know whether .NET provides any way to allocate memory with a particular alignment or not.
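On the alignment point: .NET has no direct way to request an aligned managed byte[], but you can over-allocate unmanaged memory and round the pointer up yourself. A minimal sketch of that rounding (the sector size of 512 here is an assumption; query the actual device in real code):

```csharp
using System;
using System.Runtime.InteropServices;

// Over-allocate by (alignment - 1) bytes, then round the address up
// to the next multiple of 'alignment'. Keep the raw pointer to free it.
static (IntPtr raw, IntPtr aligned) AllocAligned(int size, int alignment)
{
    IntPtr raw = Marshal.AllocHGlobal(size + alignment - 1);
    long addr = raw.ToInt64();
    long alignedAddr = (addr + alignment - 1) / alignment * alignment;
    return (raw, new IntPtr(alignedAddr));
}

var (raw, aligned) = AllocAligned(1024, 512);
Console.WriteLine(aligned.ToInt64() % 512);   // 0: address is 512-byte aligned
Marshal.FreeHGlobal(raw);                     // always free the raw pointer
```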
Just call CreateFile directly with FILE_FLAG_NO_BUFFERING and then close it before opening with FileStream to achieve the same effect.
I know it sounds really stupid, but I have a really simple application that saves some data from some users to a database, and then I want to write all the data to a .txt file.
The code is as follows:
List<MIEMBRO> listaMiembros = bd.MIEMBRO.ToList<MIEMBRO>();
fic.WriteLine("PARTICIPACIONES GRATUITAS MIEMBROS: ");
mi = new Miembro();
foreach (MIEMBRO_GRATIS m in listaMiembroGratis)
{
mi.setNomMiembro(m.nomMiembro);
mi.setNumRifa(m.numRifa.ToString());
fic.WriteLine(mi.ToString());
}
fic.WriteLine();
As you see, really easy code. The thing is: I show the information on a datagrid and I know there are lots more members, but the file stops at some point.
Is there any limit on the number of lines or characters you can write with the StreamWriter? Why can't I write all the members, only part of them?
fic is probably not being flushed by the time you look at the output file. If you instantiate it as the argument of a using block, it will be flushed, closed, and disposed of when you are done.
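For example (a sketch: the file name is invented, and the member-writing loop from the question is elided):

```csharp
using System.IO;

// The using block guarantees that fic is flushed and closed on exit,
// even if an exception is thrown while writing.
using (var fic = new StreamWriter("miembros.txt"))
{
    fic.WriteLine("PARTICIPACIONES GRATUITAS MIEMBROS: ");
    // ... write each member here ...
    fic.WriteLine();
}   // <- flushed, closed and disposed here
```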
Also, in case you are flushing properly (but it is not being flushed by the time you are checking the file), you could flush at the end of each iteration:
foreach (MIEMBRO_GRATIS m in listaMiembroGratis)
{
mi.setNomMiembro(m.nomMiembro);
mi.setNumRifa(m.numRifa.ToString());
fic.WriteLine(mi.ToString());
fic.Flush();
}
This will decrease performance slightly, but it will at least give you an opportunity to see which record is failing to write (if, indeed, an exception is being thrown).
Is there any limit on the number of lines or characters you can write with the StreamWriter?
No, there isn't.
As you see, really easy code. The thing is: I show the information on a datagrid and I know there are lots more members, but it stops writing at some point
My guess is that your code is throwing an exception and you aren't catching it. I would look at the implementations of setNomMiembro, setNumRifa and ToString in Miembro, which, by the way, in the case of setNomMiembro and setNumRifa should probably be implemented as properties ({ get; set; }) rather than as methods.
For example, calling ToString on numRifa would throw a NullReferenceException if numRifa is null.
Basically, is it better practice to store a value in a variable the first time you fetch it, or to fetch it again every time you use it? The code will explain it better:
TextWriter tw = null;
if (!File.Exists(ConfigurationManager.AppSettings["LoggingFile"]))
{
// ...
tw = File.CreateText(ConfigurationManager.AppSettings["LoggingFile"]);
}
or
TextWriter tw = null;
string logFile = ConfigurationManager.AppSettings["LoggingFile"];
if (!File.Exists(logFile))
{
// ...
tw = File.CreateText(logFile);
}
Clarity is important, and DRY (don't repeat yourself) is important. This is a micro-abstraction - hiding a small, but still significant, piece of functionality behind a variable. The performance difference is negligible, but the positive impact on clarity can't be overstated. Use a well-named variable to hold the value once it's been acquired.
The 2nd solution is better for me because:
the dictionary lookup has a cost
it's more readable
Or you can have a singleton object with a private constructor that loads all the configuration data you need once.
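A minimal sketch of that singleton idea (the class and property names are invented; the hard-coded value stands in for reading ConfigurationManager.AppSettings once):

```csharp
using System;

public sealed class AppConfig
{
    // Lazy<T> gives thread-safe, run-once initialization.
    private static readonly Lazy<AppConfig> instance =
        new Lazy<AppConfig>(() => new AppConfig());

    public static AppConfig Instance => instance.Value;

    public string LoggingFile { get; }

    private AppConfig()
    {
        // Real code would read ConfigurationManager.AppSettings here;
        // hard-coded so the sketch is self-contained.
        LoggingFile = "app.log";
    }
}
```

Every reader then uses AppConfig.Instance.LoggingFile and sees one consistent value for the lifetime of the process.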
Second one would be the best choice.
Imagine this situation: settings are updated by other threads, and since the setting value isn't locked, it can change to another value at any time.
With the first version, your execution can fail, or it will run fine but check for a file with one name and later save to a file that is not the one it checked before. That would be bad, wouldn't it?
Another benefit is that you don't retrieve the value twice. You get it once, and use it wherever your code needs to read the setting.
I'm pretty sure the second one is more readable. But if you're talking about performance: do not optimize at an early stage, and not without a profiler.
I must agree with the others. Readability and DRY are important, and the cost of the variable is very low, considering that you are often just storing a reference rather than a copy of the value.
There might be exceptions for special or large objects. Keep in mind that the value you cache might change in between, and ask whether you want your code to see the new value (most of the time you don't!). In your example, think about what might happen if ConfigurationManager.AppSettings["LoggingFile"] changes between the two calls (due to accessor logic, another thread, or the value always being re-read from a file on disk).
In short: about 99% of the time you will want the second method / the cache!
IMO that depends on what you are trying to cache. Caching a setting from App.config might not be as beneficial (apart from code readability) as caching the result of a web service call over a GPRS connection.
I often find myself adding either concatenated strings or a string formatter to my debug statements in log4net and log4j. Should I surround these debug statements with an "if debug" block, to stop myself from wasting resources evaluating the parameters even though the debug statement will not be printed?
I would assume that checking if (isDebug) would be quicker and more efficient than performing the string operations. However, it would lead to the program operating differently (faster) when the log level is set higher than debug, which could mean that synchronisation problems that happen in production don't happen while I'm writing to the log.
For Java you can try log5j.
log4j:
log.debug("This thing broke: " + foo + " due to bar: " + bar + " on this thing: " + car);
log5j:
log.debug("This thing broke: %s due to bar: %s on this thing: %s", foo, bar, car);
log.debug("Exception #%d", aThrowable, exceptionsCount++);
I'd say it depends on how often the debug statement is called and how important performance is. Beware of premature optimization and all.
This question is answered in detail in the SLF4J FAQ. In short, use parameterized messages. For example, if entry is an object, you can write:
Object entry = new SomeObject();
logger.debug("The entry is {}.", entry);
Only after evaluating whether to log, and only if the decision is affirmative, will the logger implementation format the message and replace the '{}' pair with the string value of entry. In other words, this form does not incur the cost of parameter construction when the log statement is disabled.
The following two lines will yield the exact same output. However, the second form will outperform the first form by a factor of at least 30, in case of a disabled logging statement.
logger.debug("The new entry is "+entry+".");
logger.debug("The new entry is {}.", entry);
Have you measured how much extra time is taken concatenating these strings ? Given that the logging infrastructure will take the resultant messages, check whether they need to be dispatched to (possibly) multiple sinks, and then possibly write using some I/O, then you may not be gaining anything. My feeling is that unless your toString() mechanism is slow, this may be an optimisation too far.
I'm also wary of this approach in case someone writes something like this (and we know they shouldn't, but I've seen it before):
if (Log.isDebugEnabled()) {
Log.debug("Received " + (items++) + " items");
}
and this will then work/fail depending on your logging level.
In log4j the following is recommended best practice:
if ( log.isDebugEnabled() )
{
log.debug("my " + var + " message");
}
This saves on system resources from the string concatenation etc. You are correct in your assumption that the program may perform slower when debug level is enabled, but that is to be expected: A system under observation is changed because it is under observation. Synchronization issues deal (mostly) with unlucky timing or variable visibility between threads, both of which will not be directly affected by an alteration of the debug level. You will still need to "play" with the system to reproduce multi-threaded problems.
We have adopted the practice to define a private static readonly boolean DEBUG variable in each class.
private static readonly log4net.ILog LOG =
    log4net.LogManager.GetLogger(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);
private static readonly bool DEBUG = LOG.IsDebugEnabled;
each actual debug log line looks like this
if (DEBUG) LOG.Debug(...);
where ... can have arbitrary complexity and is only evaluated when debugging is required.
See: http://logging.apache.org/log4net/release/faq.html, the answer to "What is REALLY the FASTEST way of (not) logging?"
This works for us since we only read the log config at startup.
Since we have at least one debug statement per function call, we felt that having to write if (DEBUG) was worth the effort to get maximum performance with debugging switched off. We did some measurements with debugging switched on and off and found performance increases between 10 and 20%. We haven't measured the effect of the if (DEBUG) check itself.
By the way: we only do this for debug messages. Warnings, informationals and errors are produced directly via LOG.Warn, etc.
Try slf4j (http://www.slf4j.org/). You write statements like:
log.debug("Foo completed operation {} on widget {}", operation, widget);
The log message is not assembled inside the library until the log level has been determined to be high enough. I find this the best answer.
(This is a lot like the log5j solution above.)
With C#, we've started using delegates for expensive log statements. It's only called if the log level is high enough:
log.Debug(() => "Getting the value " + SomeValue() + " is expensive!");
This prevents logging bugs where the level checked for is different from the level logged at, i.e.:
if(log.Level == Level.Info)
log.Debug("Getting the value " + SomeValue() + " is expensive!");
and I find it much more readable.
[Edit] If that was downvoted for not applying to Log4Net - it is trivial to write a wrapper around Log4Net to do this.
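Such a wrapper can be as small as one extension method. A sketch (ILog here is a stand-in interface so the example is self-contained; with the real log4net ILog the extension looks the same):

```csharp
using System;

// Stand-in for log4net's ILog, reduced to what the sketch needs.
public interface ILog
{
    bool IsDebugEnabled { get; }
    void Debug(object message);
}

public static class LogExtensions
{
    // The Func<string> is only invoked when debug logging is enabled,
    // so expensive message construction is skipped otherwise.
    public static void Debug(this ILog log, Func<string> message)
    {
        if (log.IsDebugEnabled)
            log.Debug(message());
    }
}
```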
Conditional compilation in Java works with final constants, i.e. static final variables initialized with a compile-time constant expression. A class might define the constant like this:
private static final boolean DEBUG = false;
With such a constant defined, any code within an:
if (DEBUG) {
// code
}
is not actually compiled into the class file. To activate debugging for the class, you only need to change the value of the constant to true and recompile the class (you can have two versions of your binaries, one for development and one for production).
However, this solution is suboptimal for several reasons.
I tend to wrap all calls to debug in an isDebug check. I don't think that's premature optimization; it's just good practice.
Worrying about the app running at different speeds isn't really sensible, as the load on the processor can and will influence your app more than the debugging code.
Look for the externalized log level in the log4j config file. That way you have the option to switch logging on or off based on your environment (and avoid all those string concatenations).
if(log.isDebugEnabled())
{
log.debug("Debug message");
}