I am writing an encryption program for fun. I stumbled across the problem that a deleted file is not necessarily gone from disk, even if it was overwritten by opening it with a FileStream and writing a bunch of random bytes into it.
My current implementation creates a temporary file and writes the encrypted/decrypted data into it to save RAM. I was wondering whether the same problem applies even if I never close the FileStream object, which has existed since the file's creation.
So if I just set my stream position back to zero and overwrite every single byte, does it really write to the same positions as in the beginning, or can parts of the temp file survive? If so, is there any workaround I could use?
My current approach:
var fileStream = new FileStream(path, FileMode.Create);
// fileStream.Write(...); // the possibly decrypted data is written here
fileStream.Position = 0;

byte[] bytes = RandomBytes();
// Number of writes needed to cover every byte of the file at least once.
long amount = fileStream.Length / bytes.Length + 1;
for (long i = 0; i < amount; i++)
{
    fileStream.Write(bytes, 0, bytes.Length);
}

string name = fileStream.Name;
fileStream.Close();
File.Delete(name);
I would like to anticipate the exact size of my file before writing it to my device, so that I can handle the error or prevent a crash in case there is no space left on the corresponding drive. So I have this simple console script that generates the file:
using System;
using System.IO;

namespace myNamespace
{
    class Program
    {
        static void Main(string[] args)
        {
            byte[] myByteArray = new byte[100];
            MemoryStream stream = new MemoryStream();
            string fileName = "E:\\myFile.mine";
            FileStream myFs = new FileStream(fileName, FileMode.CreateNew);
            BinaryWriter toStreamWriter = new BinaryWriter(stream);
            BinaryWriter toFileWriter = new BinaryWriter(myFs, System.Text.Encoding.ASCII);

            myFs.Write(myByteArray, 0, myByteArray.Length);
            for (int i = 0; i < 30000; i++)
            {
                toStreamWriter.Write(i);
            }

            Console.WriteLine($"allocated memory: {stream.Capacity}");
            Console.WriteLine($"stream length: {stream.Length}");
            Console.WriteLine($"file size: {(stream.Length / 4) * 4.096}");
            toFileWriter.Write(stream.ToArray());
            Console.ReadLine();
        }
    }
}
I got to the point where I can anticipate the size of the file.
It will be (stream.Length / 4) * 4.096, but only as long as the remainder of stream.Length / 4 is 0.
For example, in the case of adding 13589 integers to the stream:
for (int i = 0; i < 13589; i++) {
    toStreamWriter.Write(i);
}
I get that the file size is 55660,544 bytes in the script, but it's 57344 bytes in Explorer.
That's the same result as if the number of integers added had been 14000 instead of 13589.
How can I anticipate the exact size of my created file when the remainder of stream.Length / 4 is not 0?
Edit: for anyone kindly running the script, you need to delete the created file every time the script is run! And of course, use a path and fileName of your choice :)
Regarding the relation (stream.Length / 4) * 4.096: the 4 comes from the byte size of an integer, and I guess the 4.096 comes from the array and file generation; however, any further explanation would be much appreciated.
Edit 2: Note that if the pending results are logged with:
for (int i = 13589; i <= 14000; i++) {
    Console.WriteLine($"result for {i} : {(i * 4 / 4) * 4.096} ");
}
You obtain:
....
result for 13991 : 57307,136
result for 13992 : 57311,232
result for 13993 : 57315,328
result for 13994 : 57319,424
result for 13995 : 57323,52
result for 13996 : 57327,616
result for 13997 : 57331,712
result for 13998 : 57335,808
result for 13999 : 57339,904
result for 14000 : 57344
So I assume that the file size is the byte stream's size rounded up to the next cluster boundary, with no decimal remainder. Would this logic make sense for anticipating the file size, even if the stream is very big?
From what I understand from the comments, the question is about how to get the actual file size of the file, not the file size on disk. And your code is actually almost correct in doing so.
The math is pretty basic. In your example, you create a file stream and write a 100-byte-long array to it. Then you create a memory stream and write 30000 integers into it. Then you write the memory stream into the file stream. Considering that each integer here is 4 bytes long, as specified by C#, the resulting file has a file size of (30000 * 4) + 100 = 120100 bytes. At least for me, that's exactly what the file properties say in Windows Explorer.
You could get the same result a bit easier with the following code:
FileStream myFs = new FileStream("test.file", FileMode.CreateNew);
byte[] myByteArray = new byte[100];
myFs.Write(myByteArray, 0, myByteArray.Length);
BinaryWriter toFileWriter = new BinaryWriter(myFs, System.Text.Encoding.ASCII);
for (int i = 0; i < 30000; i++)
{
    toFileWriter.Write(i);
}
Console.WriteLine($"stream length: {myFs.Length}");
myFs.Close();
This will return a stream length of 120100 bytes.
In case I misunderstood your question and comments and you were actually trying to get the file size on disk:
Don't go there. You cannot reliably predict the file size on disk due to variable circumstances. For example, file compression, encryption, various RAID types, various file systems, various disk types, various operating systems.
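That said, if a rough estimate is still acceptable despite those caveats, one common approximation is to round the logical file size up to the next multiple of the cluster size. The sketch below assumes a 4096-byte cluster, which is a typical NTFS default but by no means guaranteed (and compression or sparse files break the estimate entirely):

```csharp
// Sketch: estimate size on disk by rounding the logical file size up
// to an assumed cluster size. 4096 bytes is a common NTFS default,
// but the real value varies per volume.
const long AssumedClusterSize = 4096;

static long EstimateSizeOnDisk(long logicalSize)
{
    if (logicalSize == 0) return 0;
    return ((logicalSize + AssumedClusterSize - 1) / AssumedClusterSize) * AssumedClusterSize;
}

// For example, a 120100-byte file would be estimated at 122880 bytes (30 clusters).
```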
I tried to read a file (650 megabytes) from SQL Server:
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    if (reader.Read())
    {
        using (var dbStream = reader.GetStream(0))
        {
            if (!reader.IsDBNull(0))
            {
                stream.Position = 0;
                dbStream.CopyTo(stream, 256);
            }
            dbStream.Close();
        }
    }
    reader.Close();
}
But I got an OutOfMemoryException on CopyTo().
With small files, this code snippet works fine. How can I handle a large file?
You can read and write the data to a temp file in small chunks. See the example on MSDN – Retrieving Binary Data.
// Column index in the result set.
const int colIdx = 0;

// Writes the BLOB to a file (*.bmp).
FileStream stream;
// Streams the BLOB to the FileStream object.
BinaryWriter writer;

// Size of the BLOB buffer.
int bufferSize = 100;
// The BLOB byte[] buffer to be filled by GetBytes.
byte[] outByte = new byte[bufferSize];
// The bytes returned from GetBytes.
long retval;
// The starting position in the BLOB output.
long startIndex = 0;

// Open the connection and read data into the DataReader.
connection.Open();
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);

while (reader.Read())
{
    // Create a file to hold the output.
    stream = new FileStream(
        "some-physical-file-name-to-dump-data.bmp", FileMode.OpenOrCreate, FileAccess.Write);
    writer = new BinaryWriter(stream);

    // Reset the starting byte for the new BLOB.
    startIndex = 0;

    // Read bytes into outByte[] and retain the number of bytes returned.
    retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);

    // Continue while there are bytes beyond the size of the buffer.
    while (retval == bufferSize)
    {
        writer.Write(outByte);
        writer.Flush();

        // Reposition start index to end of last buffer and fill buffer.
        startIndex += bufferSize;
        retval = reader.GetBytes(colIdx, startIndex, outByte, 0, bufferSize);
    }

    // Write the remaining buffer.
    writer.Write(outByte, 0, (int)retval);
    writer.Flush();

    // Close the output file.
    writer.Close();
    stream.Close();
}

// Close the reader and the connection.
reader.Close();
connection.Close();
Make sure you are using SqlDataReader with CommandBehavior.SequentialAccess; note this line in the code snippet above.
SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
More information on CommandBehavior enum can be found here.
EDIT:
Let me clarify. I agree with @MickyD: the cause of the problem is not whether you use CommandBehavior.SequentialAccess or not, but reading the large file all at once.
I emphasized this because it is commonly missed by developers: they tend to read files in chunks, but without setting CommandBehavior.SequentialAccess they will encounter other problems. Although it was already present in the original question, I highlighted it in my answer to point it out to any newcomers.
> @MatthewWatson yeah var stream = new MemoryStream(); What is not right with it? – Kliver Max
Your problem is not whether or not you are using:
`command.ExecuteReader(CommandBehavior.SequentialAccess)`
...which you are, as we can see; nor that your stream copy buffer size is too big (it's actually tiny), but rather that you are using a MemoryStream, as you indicated in the comments above. More than likely you are loading the 650 MB file into memory twice: once from SQL and once into the MemoryStream, thus leading to your OutOfMemoryException.
Though the solution is to write to a FileStream instead, the cause of the problem wasn't highlighted in the accepted answer. Unless you know the cause of a problem, you won't learn to avoid such issues in the future.
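A minimal sketch of that fix, streaming the BLOB straight into a FileStream so only one small buffer is ever in memory (the output path and column index 0 are placeholders):

```csharp
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    if (reader.Read())
    {
        using (var dbStream = reader.GetStream(0))
        using (var fileStream = new FileStream(@"C:\temp\blob.bin", FileMode.Create, FileAccess.Write))
        {
            // CopyTo reuses a single internal buffer (80 KB by default),
            // so memory usage stays flat regardless of the BLOB size.
            dbStream.CopyTo(fileStream);
        }
    }
}
```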
I have written an application that implements a file copy as shown below. I was wondering why, when copying from one network drive to another network drive, the copy times are huge (20-30 minutes to copy a 300 MB file) with the following code:
public static void CopyFileToDestination(string source, string dest)
{
    _log.Debug(string.Format("Copying file {0} to {1}", source, dest));
    DateTime start = DateTime.Now;

    string destinationFolderPath = Path.GetDirectoryName(dest);
    if (!Directory.Exists(destinationFolderPath))
    {
        Directory.CreateDirectory(destinationFolderPath);
    }

    if (File.Exists(dest))
    {
        File.Delete(dest);
    }

    FileInfo sourceFile = new FileInfo(source);
    if (!sourceFile.Exists)
    {
        throw new FileNotFoundException("source = " + source);
    }

    long totalBytesToTransfer = sourceFile.Length;
    if (!CheckForFreeDiskSpace(dest, totalBytesToTransfer))
    {
        throw new ApplicationException(string.Format("Unable to copy file {0}: Not enough disk space on drive {1}.",
            source, dest.Substring(0, 1).ToUpper()));
    }

    long bytesTransferred = 0;
    using (FileStream reader = sourceFile.OpenRead())
    {
        using (FileStream writer = new FileStream(dest, FileMode.OpenOrCreate, FileAccess.Write))
        {
            byte[] buf = new byte[64 * 1024];
            int bytesRead = reader.Read(buf, 0, buf.Length);
            double lastPercentage = 0;

            while (bytesRead > 0)
            {
                double percentage = ((float)bytesTransferred / (float)totalBytesToTransfer) * 100.0;
                writer.Write(buf, 0, bytesRead);
                bytesTransferred += bytesRead;

                if (Math.Abs(lastPercentage - percentage) > 0.25)
                {
                    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Copied {1:#,##0} of {2:#,##0} MB ({3:0.0}%)",
                        sourceFile.Name,
                        bytesTransferred / (1024 * 1024),
                        totalBytesToTransfer / (1024 * 1024),
                        percentage));
                    lastPercentage = percentage;
                }

                bytesRead = reader.Read(buf, 0, buf.Length);
            }
        }
    }

    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Done copying", sourceFile.Name));
    _log.Debug(string.Format("{0} copied in {1:#,##0} seconds", sourceFile.Name, (DateTime.Now - start).TotalSeconds));
}
However, with a simple File.Copy, the time is as expected.
Does anyone have any insight? Could it be because we are making the copy in small chunks?
Changing the size of your buf variable doesn't change the size of the buffer that FileStream.Read or FileStream.Write use when communicating with the file system. To see any change with buffer size, you have to specify the buffer size when you open the file.
As I recall, the default buffer size is 4K. Performance testing I did some time ago showed that the sweet spot is somewhere between 64K and 256K, with 64K being more consistently the best choice.
You should change your File.OpenRead() to:
new FileStream(sourceFile.FullName, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize)
Change the FileShare value if you don't want exclusive access, and declare BufferSize as a constant equal to whatever buffer size you want. I use 64*1024.
Also, change the way you open your output file to:
new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize)
Note that I used FileMode.Create rather than FileMode.OpenOrCreate. If you use OpenOrCreate and the source file is smaller than the existing destination file, I don't think the file is truncated when you're done writing. So the destination file would contain extraneous data.
That said, I wouldn't expect this to change your copy time from 20-30 minutes down to the few seconds that it should take. I suppose it could if every low-level read requires a network call. With the default 4K buffer, you're making 16 read calls to the file system in order to fill your 64K buffer. So by increasing your buffer size you greatly reduce the number of OS calls (and potentially the number of network transactions) your code makes.
Finally, there's no need to check to see if a file exists before you delete it. File.Delete silently ignores an attempt to delete a file that doesn't exist.
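Putting those suggestions together, a sketch of the revised copy loop might look like this (BufferSize is the constant proposed above; progress reporting is omitted for brevity):

```csharp
const int BufferSize = 64 * 1024; // 64K: the sweet spot found in testing

using (FileStream reader = new FileStream(source, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize))
using (FileStream writer = new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize))
{
    byte[] buf = new byte[BufferSize];
    int bytesRead;
    while ((bytesRead = reader.Read(buf, 0, buf.Length)) > 0)
    {
        writer.Write(buf, 0, bytesRead);
    }
}
// No File.Exists/File.Delete needed beforehand: FileMode.Create
// creates the file or truncates an existing one.
```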
Call the SetLength method on your writer stream before the actual copying; this should reduce the operations performed by the target disk.
Like so:
writer.SetLength(totalBytesToTransfer);
You may need to set the stream's position back to the start after calling this method, using Seek. Check the position of the stream after calling SetLength; it should still be zero.
writer.Seek(0, SeekOrigin.Begin); // Not sure on that one
If that is still too slow, use the SetFileValidData function.
I am using this code to extract a chunk from a file:
// info is FileInfo object pointing to file
var percentSplit = info.Length * 50 / 100; // extract 50% of file
var bytes = new byte[percentSplit];
var fileStream = File.OpenRead(fileName);
fileStream.Read(bytes, 0, bytes.Length);
fileStream.Dispose();
File.WriteAllBytes(splitName, bytes);
Is there any way to speed up this process?
Currently for a 530 MB file it takes around 4 - 5 seconds. Can this time be improved?
There are several aspects to your question, and none of them is language specific.
Here are some things to consider:
What is the file system of the source/destination file?
Do you want to keep the original source file?
Do they lie on the same drive?
In C#, you hardly have a method faster than File.Copy, which invokes the WINAPI CopyFile function internally. Because the split percentage is fifty, however, the following code might not be faster: it copies the whole file and then sets the length of the destination file.
var info = new FileInfo(fileName);
var percentSplit = info.Length * 50 / 100; // extract 50% of file

File.Copy(info.FullName, splitName);
using (var outStream = File.OpenWrite(splitName))
    outStream.SetLength(percentSplit);
Further, if:
you don't need to keep the original source after the file is split,
the destination drive is the same as the source, and
you are not using a crypto/compression enabled file system,
then the best thing you can do is not copy the file data at all.
For example, if your source file lies on a FAT or FAT32 file system, what you can do is:
create a new directory entry (or entries) for the newly split part(s) of the file
let the entry (or entries) point to the cluster of the target part(s)
set the correct file size for each entry
check for cross-links and avoid them
If your file system is NTFS, you might need to spend a long time studying the spec.
Good luck!
var percentSplit = (int)(info.Length * 50 / 100); // extract 50% of file
var buffer = new byte[8192];

using (Stream input = File.OpenRead(info.FullName))
using (Stream output = File.OpenWrite(splitName))
{
    int bytesRead = 1;
    while (percentSplit > 0 && bytesRead > 0)
    {
        bytesRead = input.Read(buffer, 0, Math.Min(percentSplit, buffer.Length));
        output.Write(buffer, 0, bytesRead);
        percentSplit -= bytesRead;
    }
    output.Flush();
}
The flush may not be needed, but it doesn't hurt. Interestingly, changing the loop to a do-while rather than a while had a big hit on performance; I suppose the IL is not as fast. My PC ran the original code in 4-6 seconds, while the attached code seemed to run in about 1 second.
I get better results when reading/writing in chunks of a few megabytes. The performance also changes depending on the size of the chunk.
FileInfo info = new FileInfo(@"C:\source.bin");
FileStream f = File.OpenRead(info.FullName);
BinaryReader br = new BinaryReader(f);
FileStream t = File.OpenWrite(@"C:\split.bin");
BinaryWriter bw = new BinaryWriter(t);

long count = 0;
long split = info.Length * 50 / 100;
long chunk = 8000000;

DateTime start = DateTime.Now;
while (count < split)
{
    if (count + chunk > split)
    {
        chunk = split - count;
    }
    bw.Write(br.ReadBytes((int)chunk));
    count += chunk;
}
Console.WriteLine(DateTime.Now - start);
I am developing a TCP file transfer client-server program. At the moment I can send text files and other file formats perfectly fine, such as .zip, with all contents intact on the server end. However, when I transfer a .gif, the end result is a gif with the same size as the original but with only part of the image showing, as if most of the bytes were lost or not written correctly on the server end.
The client sends a 1KB header packet with the name and size of the file to the server. The server then responds with OK if ready and then creates a fileBuffer as large as the file to be sent is.
Here is some code to demonstrate my problem:
// Serverside method snippet dealing with data being sent
while (true)
{
    // Spin the data in
    if (streams[0].DataAvailable)
    {
        streams[0].Read(fileBuffer, 0, fileBuffer.Length);
        break;
    }
}

// Finished receiving file, write from buffer to created file
FileStream fs = File.Open(LOCAL_FOLDER + fileName, FileMode.CreateNew, FileAccess.Write);
fs.Write(fileBuffer, 0, fileBuffer.Length);
fs.Close();
Print("File successfully received.");

// Clientside method snippet dealing with a file send
while (true)
{
    con.Read(ackBuffer, 0, ackBuffer.Length);
    // Wait for OK response to start sending
    if (Encoding.ASCII.GetString(ackBuffer) == "OK")
    {
        // Convert file to bytes
        FileStream fs = new FileStream(inPath, FileMode.Open, FileAccess.Read);
        fileBuffer = new byte[fs.Length];
        fs.Read(fileBuffer, 0, (int)fs.Length);
        fs.Close();
        con.Write(fileBuffer, 0, fileBuffer.Length);
        con.Flush();
        break;
    }
}
I've tried a binary writer instead of just using the filestream with the same result.
Am I incorrect in believing successful file transfer to be as simple as conversion to bytes, transportation and then conversion back to filename/type?
All help/advice much appreciated.
It's not about your image; it's about your code.
If your image bytes were lost or not written correctly, that means your file transfer code is wrong, and even a .zip file or any other file could arrive corrupted.
It's also a huge mistake to set the byte buffer length to the file size. Imagine you're going to send a large file of about 1 GB: it would take 1 GB of RAM. For an ideal transfer, you should loop over the file to send it.
This is a way to send/receive files nicely with no size limitation.
Send File
using (FileStream fs = new FileStream(srcPath, FileMode.Open, FileAccess.Read))
{
    long fileSize = fs.Length;
    long sum = 0; // sum here is the total of sent bytes
    int count = 0;
    data = new byte[1024]; // 1 KB buffer .. you might use a different size as well
    while (sum < fileSize)
    {
        count = fs.Read(data, 0, data.Length);
        network.Write(data, 0, count);
        sum += count;
    }
    network.Flush();
}
Receive File
long fileSize = ...; // the file size that you are going to receive
using (FileStream fs = new FileStream(destPath, FileMode.Create, FileAccess.Write))
{
    int count = 0;
    long sum = 0; // sum here is the total of received bytes
    data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size as well
    while (sum < fileSize)
    {
        if (network.DataAvailable)
        {
            count = network.Read(data, 0, data.Length);
            fs.Write(data, 0, count);
            sum += count;
        }
    }
}
happy coding :)
When you write over TCP, the data can arrive in a number of packets. I think your early tests happened to fit into one packet, but this gif file is arriving in two or more. So when you call Read, you'll only get what has arrived so far; you'll need to keep checking until you've got as many bytes as the header told you to expect.
I found Beej's guide to network programming a big help when doing some work with TCP.
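Since the header already tells the server how many bytes to expect, the receive loop can keep reading until exactly that many bytes have arrived. A sketch, assuming `stream` is the server's NetworkStream and `fileSize` is an int parsed from the header:

```csharp
byte[] fileBuffer = new byte[fileSize];
int totalRead = 0;
while (totalRead < fileSize)
{
    // Read returns whatever has arrived so far, possibly fewer bytes than requested.
    int read = stream.Read(fileBuffer, totalRead, fileSize - totalRead);
    if (read == 0)
        throw new IOException("Connection closed before the full file arrived.");
    totalRead += read;
}
```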
As others have pointed out, the data doesn't necessarily all arrive at once, and your code is overwriting the beginning of the buffer each time through the loop. The more robust way to write your reading loop is to read as many bytes as are available and increment a counter to keep track of how many bytes have been read so far so that you know where to put them in the buffer. Something like this works well:
int totalBytesRead = 0;
int bytesRead;
do
{
    bytesRead = streams[0].Read(fileBuffer, totalBytesRead, fileBuffer.Length - totalBytesRead);
    totalBytesRead += bytesRead;
} while (bytesRead != 0);
Stream.Read will return 0 when there's no data left to read.
Doing things this way will perform better than reading a byte at a time. It also gives you a way to ensure that you read the proper number of bytes. If totalBytesRead is not equal to the number of bytes you expected when the loop is finished, then something bad happened.
Thanks for your input, tvanfosson. I tinkered with my code and managed to get it working; the synchronicity between my client and server was off. I took your advice, though, and replaced the single read with reading one byte at a time.