C# File.WriteAllBytes sometimes fails to create a file when the system seems busy

Community,
I wrote a small .NET C# console app (target framework 4.7.2) that reads all .msg files (exported from Outlook via drag and drop) located in a folder, then uses MsgReader to find attachments with the file extension .csv, and when it finds a .csv file, writes it back to the folder where the .msg files are located. The pattern is always 1 .csv file per 1 .msg file, with file sizes of a few KB per .msg/.csv file, so pretty small. Everything straightforward so far.
Now I noticed that sometimes, out of 100 .msg files, I only get 99 or 98 .csv files. When I run my console app 100 times in a row over the SAME test set of 100 .msg files, 9 out of 10 times it gives me 100 .csv files (correct), and one time out of ten it fails and gives me only 99 or 98 .csv files (wrong). All other possibilities can be ruled out: my test set of 100 .msg files is valid (ALWAYS 1 .csv file per 1 .msg file, so the result must be 100 .csv files).
After about a week of trying to figure out what the problem is, I suspect it has something to do with the system the console app runs on (Win 11 Pro with the latest updates, 16 GB RAM, 500 GB NVMe M.2 SSD, Intel i7): when there are many other file operations going on, such as shortly after a reboot or during a Windows update, the error occurs. When the system is more or less idle, the error never occurs. To me, it looks like my console app writes out the .csv files too fast for the underlying OS. To "prove" my suspicion, I added a "dumb" Sleep(500) in my foreach loop, and that seems to fix the problem. When I run my console app now, I get 100 files in every run.
BUT: I don't think it is a good idea to add an artificial pause to the code. Who knows, on my system 500 ms is sufficient; on your system it may need 2000 ms. And I do not want to add 2 seconds to each loop iteration, because that would potentially be too long when handling thousands of .msg files.
Can someone please have a look at the code below and tell me what I am doing wrong when writing the bytes from the attachment out to the file (or any other mistake I made)?
Thank you very much in advance!
// requires: using System; using System.IO;
DirectoryInfo myDir = new DirectoryInfo(@"C:\Test\"); // define working directory
FileInfo[] msgFiles = myDir.GetFiles("*.msg"); // read all files with .msg extension

foreach (FileInfo msgFile in msgFiles) // loop over result
{
    using (var msg = new MsgReader.Outlook.Storage.Message(msgFile.FullName)) // use MsgReader to read the .msg file
    {
        foreach (var attachment in msg.Attachments) // loop over attachments
        {
            if (attachment.GetType() == typeof(MsgReader.Outlook.Storage.Attachment))
            {
                var attach = (MsgReader.Outlook.Storage.Attachment)attachment;
                if (Path.GetExtension(attach.FileName).ToLower() == ".csv") // if attachment extension is .csv
                {
                    var tempFilename = Path.GetFileNameWithoutExtension(attach.FileName) + DateTime.Now.ToString("yyyyMMdd HHmmss ffffff") + Path.GetExtension(attach.FileName); // create a filename for the .csv file
                    tempFilename = Path.Combine(myDir.FullName, tempFilename); // create full path (Path.Combine expects strings, not a DirectoryInfo)
                    File.WriteAllBytes(tempFilename, attach.Data); // write attachment data to file
                    //System.Threading.Thread.Sleep(500); // when I add this PAUSE, I get 100 csv files every time - works, but ugly, right?
                }
            }
        }
    }
}
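
For what it's worth, this is the kind of retry-and-verify helper I was thinking about instead of the fixed Sleep (a sketch only; WriteBytesWithRetry is a made-up name and I am not sure it addresses the root cause):

// Sketch: retry the write a few times with a short, growing back-off and verify
// that the file really exists afterwards. Requires using System; using System.IO;
static void WriteBytesWithRetry(string path, byte[] data, int maxAttempts = 3)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            File.WriteAllBytes(path, data);
            if (File.Exists(path))
                return; // write confirmed
        }
        catch (IOException) when (attempt < maxAttempts)
        {
            // transient error (file locked, device busy) - fall through and retry
        }
        System.Threading.Thread.Sleep(100 * attempt); // brief back-off instead of a fixed 500 ms
    }
    throw new IOException($"Could not write '{path}' after {maxAttempts} attempts.");
}

One more thing I still need to double-check (just an assumption on my part): DateTime.Now only advances every ~10-15 ms, so two attachments that share the same base file name and are processed within the same tick would get the same target path, and File.WriteAllBytes would silently overwrite the first file; appending a running counter to the generated name would rule that out.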

Related

Extracting a 7z file with millions of tiny (200-bit) files inside takes hours to finish. How can I speed it up?

Good day, I've created my own custom wizard installer for my website project. My goal is to minimize the work our client has to do during installation.
I'm trying to extract a 7z file that has millions of tiny files (about 200 bits each) inside. I'm using SharpCompress for the extraction, but it seems it will take hours to finish the task, which is very bad for the user.
I don't care about compression. What I need is to reduce the time of the extraction process for these millions of tiny files, or, if possible, to speed up the extraction some other way.
My question is: what is the fastest way to extract millions of tiny files, or is there any method to pack and unpack the files with the highest possible unpacking speed?
I'm trying to extract the 7z file with this code:
using (SevenZipArchive zipArchive = SevenZipArchive.Open(source7z))
{
    zipArchive.WriteToDirectory(destination7z,
        new ExtractionOptions { Overwrite = true, ExtractFullPath = true });
}
But the extraction time seems to be very slow for these tiny files.
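
A possibly faster route (a sketch, assuming a reasonably recent SharpCompress; not verified against millions of files): 7z archives are usually solid, and extracting through the archive API can mean decompressing from the start of the solid stream over and over, while the forward-only reader returned by ExtractAllEntries() goes through the archive in a single pass.

using SharpCompress.Archives.SevenZip;
using SharpCompress.Common;
using SharpCompress.Readers;

// Sketch: single-pass extraction of a (typically solid) 7z archive.
// source7z / destination7z are the same variables as in the snippet above.
using (SevenZipArchive archive = SevenZipArchive.Open(source7z))
using (IReader reader = archive.ExtractAllEntries())
{
    // The reader walks the entries in order, so the solid stream is
    // decompressed only once instead of once per entry.
    reader.WriteAllToDirectory(destination7z,
        new ExtractionOptions { Overwrite = true, ExtractFullPath = true });
}

If most of the cost turns out to be creating millions of small files on disk rather than decompression, the archive format will not help much; in that case packing the data into fewer, larger files is the more promising direction.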

Optimised way to get a byte array into a memory stream

I use the following lines of code
byte[] byteInfo = workbook.SaveToMemory(FileFormat.OpenXMLWorkbookMacroEnabled);
workStream.Write(byteInfo, 0, byteInfo.Length);
workStream.Position = 0;
return workStream;
to download an Excel file via the browser with the help of C# and SpreadsheetGear.
It works fine for smaller amounts of data, but when I try to download a workbook with a lot of data (an Excel file with 50k rows, 1k columns, and macros enabled), this line
byte[] byteInfo = workbook.SaveToMemory(FileFormat.OpenXMLWorkbookMacroEnabled);
alone takes nearly 4-5 minutes. Is there an optimised way of doing this so that downloading a huge file takes only 1 or 2 minutes?
Try:
workbook.SaveToStream(outputstream, SpreadsheetGear.FileFormat.OpenXMLWorkbook);
Streams can typically be faster, as they save as fast as the RAM will allow instead of filling up a page file.
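
A minimal sketch of that stream-based variant (keeping the macro-enabled format from the question; workbook is the same object as above, and the caller is assumed to dispose the returned stream):

// Sketch: save the workbook straight into the stream that is handed back to the
// caller, instead of materialising a byte[] first and copying it over.
// Requires using System.IO;
MemoryStream workStream = new MemoryStream();
workbook.SaveToStream(workStream, SpreadsheetGear.FileFormat.OpenXMLWorkbookMacroEnabled);
workStream.Position = 0; // rewind so the download starts at the beginning
return workStream;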
For large workbooks, the time taken for SpreadsheetGear processes to run is related to the time taken to open the workbook into memory, rather than the time taken to compile the information and download it. You can test this by making a request that opens the workbook and sends back a simple success response without the downloaded byte array. Spreadsheet files larger than 1-2 MB start to slow things down, and the process can get very slow beyond 5 MB.
Options are:
- Create an empty spreadsheet and build the content by extracting it from a database and inserting it into the spreadsheet for download. This is faster than opening a large spreadsheet file.
- Get rid of non-essential content that takes up space and increases the size of the spreadsheet.
- Break up the spreadsheet into smaller individual files. You can test this by comparing the speed to process 10 x 1 MB files vs 2 x 5 MB files vs 1 x 10 MB file.
- Work with CSV import/export file processes. They are faster, but obviously not as smart as the SpreadsheetGear functionality (see the CSV sketch below).
- Set up a direct download of a stored spreadsheet file, not via SpreadsheetGear. This is faster, but obviously requires a static file in storage, or one that has been created and stored by another process.
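
For the CSV route, a minimal sketch (the rows variable and the naive quoting are assumptions; a proper CSV library is advisable if fields can contain commas or quotes):

// Sketch: stream rows straight to a file (or to the response stream) without
// building a workbook at all. 'rows' is a hypothetical data source.
// Requires using System.Collections.Generic; using System.IO;
static void WriteCsv(string path, IEnumerable<string[]> rows)
{
    using (var writer = new StreamWriter(path))
    {
        foreach (string[] row in rows)
        {
            writer.WriteLine(string.Join(",", row));
        }
    }
}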

Code to copy a file when it reaches a certain size and move it to another directory

We have developer code that uses log4net. It creates a new file when the current one reaches 500 MB and keeps only two files. We cannot change this code, as it is not visible to us. When the application starts, it creates a file named 1.log. When that reaches 500 MB, the program creates 1_1.log. Now there are two files on disk, and the program then overwrites the first one, so at any point in time there are only two log files.
I want to write another piece of C# code that copies these files to some directory when they reach a certain size, so that I can keep more log files.
To sum up:
- There are two log files, 1.log and 1_1.log
- 1_1.log is a copied version of 1.log, created when 1.log grows above 500 MB
- 1.log is smaller than 500 MB
- The aim is to copy the bigger file somewhere else and archive it
I believe the best solution, if you cannot modify this mechanism, is to create a Windows service. The service should watch the directory where the log files are placed, and when 1_1.log is created it should rename it to 1_<current date>.log (or copy it under that name to another folder).
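
A minimal sketch of that watcher idea (the paths are made up for illustration; in a real Windows service this setup would go into OnStart):

// Sketch: watch the log directory and archive 1_1.log as soon as it appears.
// Requires using System; using System.IO;
string logDir = @"C:\Logs";
string archiveDir = @"C:\Logs\Archive";
Directory.CreateDirectory(archiveDir);

var watcher = new FileSystemWatcher(logDir, "1_1.log");
watcher.Created += (sender, e) =>
{
    // The logger may still be writing; a short delay or a retry loop is advisable here.
    string target = Path.Combine(archiveDir, $"1_{DateTime.Now:yyyyMMdd_HHmmss}.log");
    File.Copy(e.FullPath, target, overwrite: true);
};
watcher.EnableRaisingEvents = true;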

Best way to make a constant data file for an application?

I have about 50,000 text files (~5 KB each). I need to build the data file once; after that my app only reads it (never writes).
I'm looking for a way to keep all these files in one (or a few) files. Currently I store them in a .zip file; when the app runs, I read the zip file and get the entry I need. This way is very slow to read data (about 2 seconds).
Is there any way I can store the data that is both fast to use and convenient when transferring the app between computers? Thanks!
[Edit: I have not worked much with data before, and my app is portable. The data is created once, never modified afterwards, and is plain text but has structure.]
Data structure:
Section abcd
    Item 1234
    Item klmn
Section def
    Item ...
    Item ...
...
As a suggestion (see the sketch below):
- Use the zip as the primary source.
- Before the app starts, check whether a specific folder exists.
- If it does not exist, unzip the files there.
- If it does exist, check the file count in the directory (as a dirty check that you do have data in the folder).
- Read the files from that folder.
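
A minimal sketch of that check-and-unzip approach (the zip and folder names are assumptions; ZipFile needs a reference to System.IO.Compression.FileSystem):

// Sketch: unpack the zip to a cache folder on first run, then read plain files
// from the folder. Requires using System.IO; using System.IO.Compression;
string dataZip = "data.zip";       // hypothetical archive shipped with the app
string cacheDir = "datacache";     // hypothetical extraction folder

if (!Directory.Exists(cacheDir) || Directory.GetFiles(cacheDir).Length == 0)
{
    // First run (or the dirty check failed): extract everything once.
    Directory.CreateDirectory(cacheDir);
    ZipFile.ExtractToDirectory(dataZip, cacheDir);
}

// From here on, read the individual files directly from the folder.
string[] dataFiles = Directory.GetFiles(cacheDir, "*.txt");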

LastWriteTime is changing while extracting a zip file in C#?

I am using SharpZipLib version 0.86 to extract a zip file. It works fine, but when extracting a WinZip-created file through code, the LastWriteTime changes by a second or two...
I have also used File.SetLastWriteTime(fullPath, theEntry.DateTime);
Actual file LastWriteTime: 4/8/2010 2:29:03 PM
After zipping that file using WinZip and extracting it with the code, the extracted file's LastWriteTime changes to 4/8/2010 2:29:04 PM... Is there any fix for this?
I got this response from the SharpZipLib forum:
Hi
This appears to be a WinZip bug. I have not noticed this before.
I did this test:
1) Use WinZip to add a file to a zip. In WinZip, click Properties and then Details. Look through the details list and find the file's time stamp.
2) Use SharpZipLib to create a similar zip file with the same input file. Open the result in WinZip and look at Properties > Details for the file's time stamp.
My input file has a Modified timestamp (file properties) of 2010-12-14 15:51:28 and in my test, SharpZipLib stored it correctly in the zip, while WinZip stored it as 2010-12-14 15:51:30
In other words WinZip added 2 seconds when putting it into the zip.
After extracting (either with WinZip or SharpZip), the Modified is now 15:51:30 instead of the original 15:51:28.
It is amazing that such an obvious bug in WinZip can have gone unreported and unfixed for so long. If you have a paid version, you should certainly raise a bug report with them.
I just remembered something about 2 second granularity in the old 8.3 file system timestamps.
A quick Google search found this...
Quote "Original DOS file system had only 32 bytes to represent a file in the directory. The very restrictive 8.3 filename and the limited granularity (2 second) in file date are corrected in the Win32 file systems (VFAT)."
from http://www.xxcopy.com/xxcopy15.htm
The Zip format only allows 2-second granularity in the standard time stamp entry. The date and time are encoded in standard MS-DOS format.
An optional NTFS Extra Data field (0x000a) can be included, which may hold last modification time, last access time and creation time. WinZip does not appear to create it. SharpZip will use it if present but as far as I can see, it is not created when using FastZip to create a zip. That might be a useful option to add to the code. You can certainly create it manually if using ZipFile.
Hope this helps,
David
I think it might just be the operating system that is causing this. I tried what happens in Explorer: I have a text file with a modified time stamp of 17:06:45. I right-click the file and choose Send to | Compressed (zipped) folder. Then I right-click the new zip file and choose Extract All..., followed by Next, Next, Finish. Now the extracted text file has a time stamp of 17:06:46.
The same happens when I use 7-Zip or WinRAR, but only when using a .zip file. If I let them create a .7z or a .rar file, the time stamp isn't changed.
I found an article on Wikipedia about the zip format. If you search it for "seconds", you'll find a section describing that the zip format mimics the DOS FAT file system, which only has a time resolution of two seconds.
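
For illustration, a small self-contained sketch of the effect being described (no zip library involved): the standard zip timestamp stores seconds divided by two, so an odd second such as :03 cannot survive a round trip, and whether it ends up at :02 or :04 depends on whether the tool truncates or rounds up.

using System;

class DosTimeDemo
{
    // The standard (MS-DOS) zip timestamp stores seconds / 2, so odd seconds are
    // lost; some tools truncate down, others round up to the next even second.
    static DateTime ToDosGranularity(DateTime t, bool roundUp)
    {
        int even = roundUp ? t.Second + (t.Second % 2) : t.Second - (t.Second % 2);
        return new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0).AddSeconds(even);
    }

    static void Main()
    {
        // The example from the question: 4/8/2010 2:29:03 PM
        var original = new DateTime(2010, 4, 8, 14, 29, 3);
        Console.WriteLine(ToDosGranularity(original, roundUp: false)); // ... 2:29:02 PM
        Console.WriteLine(ToDosGranularity(original, roundUp: true));  // ... 2:29:04 PM
    }
}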
