FileStream read from A after 128 bytes and write to B C#

I have a large file (around 400GB) and need to copy it to another file with a FileStream, skipping the first 128 bytes. I have the following code, but it is not working properly: when I check the file sizes after the copy has finished, File B is missing a lot more than 128 bytes. What am I doing wrong?
private void SplitUnwantedHeader(string file1, string file2)
{
FileStream fr = new FileStream(file1, FileMode.Open, FileAccess.Read);
FileStream fw = new FileStream(file2, FileMode.Create, FileAccess.Write);
byte[] fByte = new byte[65534];
long headerToSplit = 128;
int bytesRead = 0;
try
{
fr.Position = headerToSplit;
do
{
bytesRead = fr.Read(fByte, 0, fByte.Length);
fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);
} while (bytesRead != 0);
}
catch (Exception ex)
{
UpdateStatusBarMessage.ShowStatusMessage(ex.Message);
}
finally
{
fw.Close();
fr.Close();
}
}
Thanks.

The line
fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);
is wrong when used in a loop like this. It will write the "buffer size" minus 128 bytes every loop cycle. Instead, the code should write bytesRead count during the copy.
fw.Write(fByte, 0, bytesRead);
Only perform the offset before entering the copy-everything-else loop. Also, the loop can be replaced with Stream.CopyTo (available since .NET 4 and inherited by FileStream), and using can tidy up resource management.
That is, consider:
using (var fr = new FileStream(file1, FileMode.Open, FileAccess.Read))
using (var fw = new FileStream(file2, FileMode.Create, FileAccess.Write)) {
fr.Position = 128; // or fr.Seek(128, SeekOrigin.Begin);
fr.CopyTo(fw, 65534);
}

There are two things wrong with the code:
Instead of skipping the first 128 bytes of the first block, it is skipping the last 128 bytes of every block.
It is ignoring the bytesRead value when writing, so it may be writing data from the buffer that was never read into the buffer. The number of bytes read can be less than the number of bytes requested, even when you are not at the end of the file.
The code is a mix between skipping the header before the loop and skipping the header inside the loop. You should do one, not both.
You can check how much data you have in the buffer compared to how much you should skip, and update the number of bytes to skip so that it's zero once you are beyond the header:
do {
bytesRead = fr.Read(fByte, 0, fByte.Length);
if (bytesRead > headerToSplit) {
fw.Write(fByte, (int)headerToSplit, bytesRead - (int)headerToSplit);
headerToSplit = 0;
} else {
headerToSplit -= bytesRead;
}
} while (bytesRead != 0);
Or if you are skipping the header before the loop, just write all the data that you have in the buffer:
fr.Position = headerToSplit;
do {
bytesRead = fr.Read(fByte, 0, fByte.Length);
fw.Write(fByte, 0, bytesRead);
} while (bytesRead != 0);
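The "skip inside the loop" variant above can also be written against plain Stream objects, which helps when the source cannot seek. This is only a sketch: the class name HeaderSkipper, the method name CopyAfterHeader and the default buffer size are made up for illustration and are not part of the original question.

```csharp
using System;
using System.IO;

static class HeaderSkipper
{
    // Copies everything after the first headerLength bytes of input to output.
    // Returns the number of bytes written. Works without seeking.
    public static long CopyAfterHeader(Stream input, Stream output, int headerLength, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        long remainingHeader = headerLength;
        long written = 0;
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Bytes at the start of this block that still belong to the header.
            int skip = (int)Math.Min(remainingHeader, bytesRead);
            remainingHeader -= skip;
            // Write only what was actually read, minus the skipped part.
            output.Write(buffer, skip, bytesRead - skip);
            written += bytesRead - skip;
        }
        return written;
    }
}
```

With two FileStream objects opened as in the question, the call would be HeaderSkipper.CopyAfterHeader(fr, fw, 128).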

Related

Correct way to use GZipStream in dotNET C#

I'm working with GZipStream at the moment using .net 3.5.
I have two methods listed below. As the input file I use a 2MB text file that consists only of the character 's'. This code works fine with .NET 4.5, but with .NET 3.5 the file I get after compressing and decompressing is 435KB, which of course isn't the same as the source file.
If I decompress the file with WinRAR, the result looks good (the same as the source file).
If I decompress the file using GZipStream from .NET 4.5 (the file having been compressed via GZipStream from .NET 3.5), the result is bad.
UPD:
In general I really need to read the file as several separate gzip chunks. In this case all the bytes of the compressed file are read in one call of the Read() method, so I still don't understand why decompressing doesn't work.
public void CompressFile()
{
string fileIn = @"D:\sin2.txt";
string fileOut = @"D:\sin2.txt.pgz";
using (var fout = File.Create(fileOut))
{
using (var fin = File.OpenRead(fileIn))
{
using (var zip = new GZipStream(fout, CompressionMode.Compress))
{
var buffer = new byte[1024 * 1024 * 10];
int n = fin.Read(buffer, 0, buffer.Length);
zip.Write(buffer, 0, n);
}
}
}
}
public void DecompressFile()
{
string fileIn = @"D:\sin2.txt.pgz";
string fileOut = @"D:\sin2.1.txt";
using (var fsout = File.Create(fileOut))
{
using (var fsIn = File.OpenRead(fileIn))
{
var buffer = new byte[1024 * 1024 * 10];
int n;
while ((n = fsIn.Read(buffer, 0, buffer.Length)) > 0)
{
using (var ms = new MemoryStream(buffer, 0, n))
{
using (var zip = new GZipStream(ms, CompressionMode.Decompress))
{
int nRead = zip.Read(buffer, 0, buffer.Length);
fsout.Write(buffer, 0, nRead);
}
}
}
}
}
}
You're trying to decompress each "chunk" as if it's a separate gzip file. Don't do that - just read from the GZipStream in a loop:
using (var fsout = File.Create(fileOut))
{
using (var fsIn = File.OpenRead(fileIn))
{
using (var zip = new GZipStream(fsIn, CompressionMode.Decompress))
{
var buffer = new byte[1024 * 32];
int bytesRead;
while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
{
fsout.Write(buffer, 0, bytesRead);
}
}
}
}
Note that your compression code should look similar, reading in a loop rather than assuming a single call to Read will read all the data.
(Personally I'd skip fsIn, and just use new GZipStream(File.OpenRead(fileIn)) but that's just a personal preference.)
First, as @Jon Skeet mentioned, you are not using the Stream.Read method correctly. It doesn't matter whether your buffer is big enough or not: the stream is allowed to return fewer bytes than requested, with zero indicating no more data, so reading from a stream should always be performed in a loop.
However, the main problem in your decompression code is the way you share the buffer. You read the input into a buffer, then wrap it in a MemoryStream (note that the constructor used does not make a copy of the passed array, but actually sets it as its internal buffer), and then you try to read from and write to that buffer at the same time. Taking into account that decompressing writes data "faster" than it reads, it's surprising that your code works at all.
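To see that the MemoryStream(byte[], int, int) constructor wraps the array rather than copying it, a small experiment helps. The class and method names here are hypothetical, chosen only for this sketch:

```csharp
using System;
using System.IO;

static class SharedBufferDemo
{
    // Returns the first byte seen through the MemoryStream after the
    // wrapped array has been modified behind its back.
    public static byte ReadAfterOverwrite()
    {
        var data = new byte[] { 1, 2, 3, 4 };
        using (var ms = new MemoryStream(data, 0, data.Length))
        {
            data[0] = 99;                 // change the array after wrapping it
            return (byte)ms.ReadByte();   // the stream sees the change: no copy was made
        }
    }
}
```

The method returns 99, not 1. In the question's code the decompressor writes its output into the very array the MemoryStream is still reading compressed input from, so the input can be overwritten before it is consumed.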
The correct implementation is quite simple
static void CompressFile()
{
string fileIn = @"D:\sin2.txt";
string fileOut = @"D:\sin2.txt.pgz";
using (var input = File.OpenRead(fileIn))
using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
Write(input, output);
}
static void DecompressFile()
{
string fileIn = @"D:\sin2.txt.pgz";
string fileOut = @"D:\sin2.1.txt";
using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
using (var output = File.Create(fileOut))
Write(input, output);
}
static void Write(Stream input, Stream output, int bufferSize = 10 * 1024 * 1024)
{
var buffer = new byte[bufferSize];
for (int readCount; (readCount = input.Read(buffer, 0, buffer.Length)) > 0;)
output.Write(buffer, 0, readCount);
}

Reading(/Writing) Files in C#

I recently wanted to track the progress of an HttpWebRequest upload. So I started small, with a buffered read of a simple text file. I then discovered that a simple task like
File.ReadAllText("text.txt");
becomes something like the code below, with all the streams, readers, writers, etc. Or can some things be removed? Also, the code below is not working. Maybe I did something wrong. What's the way to read (I guess write will be similar) into a buffer so that I can track progress, assuming the streams are not local, e.g. a WebRequest?
byte[] buffer = new byte[2560]; // 20KB Buffer, btw, how should I decide the buffer size?
int bytesRead = 0, read = 0;
FileStream inStream = new FileStream("./text.txt", FileMode.Open, FileAccess.Read);
MemoryStream outStream = new MemoryStream();
BinaryWriter outWriter = new BinaryWriter(outStream);
// I am getting "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
// inStream.Length = Length = 9335092
// bytesRead = 2560
// buffer.Length = 2560
while ((read = inStream.Read(buffer, bytesRead, buffer.Length)) > 0)
{
outWriter.Write(buffer);
//outStream.Write(buffer, bytesRead, buffer.Length);
bytesRead += read;
Debug.WriteLine("Progress: " + bytesRead / inStream.Length * 100 + "%");
}
outWriter.Flush();
txtLog.Text = outStream.ToString();
Update: Solution
byte[] buffer = new byte[2560];
int bytesRead = 0, read = 0;
FileStream inStream = File.OpenRead("text.txt");
MemoryStream outStream = new MemoryStream();
while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
{
outStream.Write(buffer, 0, read); // write only the bytes actually read, not the whole buffer
bytesRead += read;
Debug.WriteLine((double)bytesRead / inStream.Length * 100);
}
inStream.Close();
outStream.Close();
The call outWriter.Write(buffer); should probably be
outWriter.Write(buffer,0,read);
Since you seem to be reading text (although I could be wrong), it seems that your program could be a lot simpler if you read character by character instead of calling the standard Read():
BinaryReader reader = new BinaryReader(File.Open("./text.txt", FileMode.Open));
MemoryStream outStream = new MemoryStream();
StreamWriter outWriter = new StreamWriter(outStream);
while (reader.BaseStream.Position < reader.BaseStream.Length)
{
outWriter.Write(reader.ReadChar());
Debug.WriteLine("Progress: " + ((double)reader.BaseStream.Position) / (double)(reader.BaseStream.Length) * 100 + "%");
}
outWriter.Close();
reader.Close();
// MemoryStream.ToString() only returns the type name; decode the buffered bytes instead.
txtLog.Text = System.Text.Encoding.UTF8.GetString(outStream.ToArray());
Since you only need to check the progress of the upload operation, you can just check the size of the file using a FileInfo object.
The FileInfo class has a property called Length that returns the file size in bytes. I'm not sure whether it gives the current size while the file is being written, but I think it is worth a try, as it is simpler and more efficient than the method that you are using.
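For the buffered approach the question asks about, a copy loop with a progress callback might look like the sketch below. The names ProgressCopy, Copy and report are invented for this example. Two details from the question are worth noting: the offset passed to Read indexes into the buffer (so it stays 0), which is why passing the running total there produced the "Offset and length were out of bounds" error, and bytesRead / inStream.Length * 100 is integer division, which yields 0 until the very end.

```csharp
using System;
using System.IO;

static class ProgressCopy
{
    // Copies input to output and calls report with the running byte total
    // after each block. Returns the total number of bytes copied.
    public static long Copy(Stream input, Stream output, Action<long> report, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // The offset is always 0: it is a position in buffer, not in the file.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);   // write only the bytes actually read
            total += read;
            report(total);
        }
        return total;
    }
}
```

When the source length is known (for example a FileStream), a percentage can be computed in the callback as 100.0 * total / length, using floating-point arithmetic. For a network stream whose length is unknown, only the running total is available.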

How to write contents of one file to another file?

I need to write contents of a file to another file using File.OpenRead and File.OpenWrite methods. I am unable to figure out how to do it.
How can I modify the following code to work for me?
using (FileStream stream = File.OpenRead("C:\\file1.txt"))
using (FileStream writeStream = File.OpenWrite("D:\\file2.txt"))
{
BinaryReader reader = new BinaryReader(stream);
BinaryWriter writer = new BinaryWriter(writeStream);
writer.Write(reader.ReadBytes(stream.Length));
}
using (FileStream stream = File.OpenRead("C:\\file1.txt"))
using (FileStream writeStream = File.OpenWrite("D:\\file2.txt"))
{
BinaryReader reader = new BinaryReader(stream);
BinaryWriter writer = new BinaryWriter(writeStream);
// create a buffer to hold the bytes
byte[] buffer = new Byte[1024];
int bytesRead;
// while the read method returns bytes
// keep writing them to the output stream
while ((bytesRead =
stream.Read(buffer, 0, 1024)) > 0)
{
writeStream.Write(buffer, 0, bytesRead);
}
}
Just wonder why not to use this:
File.Copy("C:\\file1.txt", "D:\\file2.txt");
You should be using File.Copy unless you want to append to the second file.
If you want to append you can still use the File class.
string content = File.ReadAllText("C:\\file1.txt");
File.AppendAllText("D:\\file2.txt",content);
This works for small files only, as the entire file is loaded into memory.
Try something along these lines:
using (FileStream input = File.OpenRead(pathToInputFile),
output = File.OpenWrite(pathToOutputFile))
{
int read = -1;
byte[] buffer = new byte[4096];
while (read != 0)
{
read = input.Read(buffer, 0, buffer.Length);
output.Write(buffer, 0, read);
}
}
Note that this is somewhat 'skeletal' and you should amend as required for your application of it.
Is it necessary to use FileStream? You can do this very easily with the simple File class, like this:
using System.IO;
string FileContent = File.ReadAllText(FilePathWhoseTextYouWantToCopy);
File.WriteAllText(FilePathToWhomYouWantToPasteTheText,FileContent);
using (var inputStream = File.OpenRead(@"C:\file1.txt"))
{
using (var outputStream = File.OpenWrite(@"D:\file2.txt"))
{
int bufferLength = 128;
byte[] buffer = new byte[bufferLength];
int bytesRead = 0;
do
{
bytesRead = inputStream.Read(buffer, 0, bufferLength);
outputStream.Write(buffer, 0, bytesRead);
}
while (bytesRead != 0);
}
}
Use FileStream class, from System.IO.
[ComVisibleAttribute(true)]
public class FileStream : Stream
Have you checked that the reader is reading all the data? This MSDN page has an example that checks all the data is read:
byte[] verifyArray = binReader.ReadBytes(arrayLength);
if(verifyArray.Length != arrayLength)
{
Console.WriteLine("Error reading the data.");
return;
}
The other alternative is that you probably need to Flush the output buffer:
writer.Flush();
If you are not keen on using the Read/Write functions of File, you could try the Copy functionality instead.
The easiest will be:
File.Copy(source_file_name, destination_file_name, true)
The true argument allows overwriting an existing file. Without it, File.Copy creates a new file, but throws an exception if the destination file already exists.
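If the stream-based route is still preferred, one detail matters for several of the snippets above: File.OpenWrite opens an existing file without truncating it, so copying a short file over a longer one leaves the tail of the old content in place. A sketch that avoids this by using File.Create follows; the class and method names are placeholders, and Stream.CopyTo requires .NET 4 or later.

```csharp
using System;
using System.IO;

static class CopyWithStreams
{
    // Copies sourcePath to destinationPath, replacing any existing destination content.
    public static void Copy(string sourcePath, string destinationPath)
    {
        using (var input = File.OpenRead(sourcePath))
        using (var output = File.Create(destinationPath))   // truncates; File.OpenWrite would not
        {
            input.CopyTo(output);
        }
    }
}
```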

C# 4.0: Convert pdf to byte[] and vice versa

How do I convert a pdf file to a byte[] and vice versa?
// loading bytes from a file is very easy in C#. The built in System.IO.File.ReadAll* methods take care of making sure every byte is read properly.
// note that for Linux, you will not need the c: part
// just swap out the example folder here with your actual full file path
string pdfFilePath = "c:/pdfdocuments/myfile.pdf";
byte[] bytes = System.IO.File.ReadAllBytes(pdfFilePath);
// munge bytes with whatever pdf software you want, i.e. http://sourceforge.net/projects/itextsharp/
// bytes = MungePdfBytes(bytes); // MungePdfBytes is your custom method to change the PDF data
// ...
// make sure to cleanup after yourself
// and save back - System.IO.File.WriteAll* makes sure all bytes are written properly - this will overwrite the file, if you don't want that, change the path here to something else
System.IO.File.WriteAllBytes(pdfFilePath, bytes);
using (FileStream fs = new FileStream("sample.pdf", FileMode.Open, FileAccess.Read))
{
byte[] bytes = new byte[fs.Length];
int numBytesToRead = (int)fs.Length;
int numBytesRead = 0;
while (numBytesToRead > 0)
{
// Read may return anything from 0 to numBytesToRead.
int n = fs.Read(bytes, numBytesRead, numBytesToRead);
// Break when the end of the file is reached.
if (n == 0)
{
break;
}
numBytesRead += n;
numBytesToRead -= n;
}
numBytesToRead = bytes.Length;
}
Easiest way:
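byte[] buffer;
using (Stream stream = new FileStream("file.pdf", FileMode.Open, FileAccess.Read))
{
buffer = new byte[stream.Length];
int offset = 0, n;
// Read may return fewer bytes than requested, so loop until the buffer is full.
while (offset < buffer.Length && (n = stream.Read(buffer, offset, buffer.Length - offset)) > 0)
offset += n;
}
using (Stream stream = new FileStream("newFile.pdf", FileMode.Create, FileAccess.Write))
{
stream.Write(buffer, 0, buffer.Length);
}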
Or something along these lines...

If I check stream for valid image I can't write bytes to server

I am trying to check if a file is an image before I upload it to the image server.
I am doing it with the following function, which works exceptionally well:
static bool IsValidImage(Stream imageStream)
{
bool isValid = false;
try
{
// Read the image without validating image data
using (Image img = Image.FromStream(imageStream, false, false))
{
isValid = true;
}
}
catch
{
;
}
return isValid;
}
The problem is that when the code below is called immediately afterwards, the line
while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
evaluates to zero and no bytes are read. I notice that when I remove the
IsValidImage function, bytes are read and the file is written. It seems
that bytes can only be read once? Any idea how to fix this?
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
{
const int bufferSize = 65536; // 64K
int bytesRead = 0;
Byte[] buffer = new Byte[bufferSize];
while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
{
outfile.Write(buffer, 0, bytesRead);
}
outfile.Close(); //necessary?
}
UPDATE: Thanks for your help Marc. I am new to stream manipulation and could use a little
more help here. I took a shot but may be mixing up the use of filestream and memorystream.
Would you mind taking a look? Thanks again.
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
using (MemoryStream ms = new MemoryStream())
{
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = request.FileByteStream.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, bytesRead);
}
// ms now has a seekable/rewindable copy of the data
// TODO: read ms the first time
// I replaced request.FileByteStream with ms but am unsure about
// the using statement in the IsValidImage function.
if (!IsValidImage(ms) == true)
{
ms.Close();
request.FileByteStream.Close();
return;
}
ms.Position = 0;
// TODO: read ms the second time
byte[] m_buffer = new byte[ms.Length];
while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
{
outfile.Write(m_buffer, 0, bytesRead);
}
}
static bool IsValidImage(MemoryStream imageStream)
{
bool isValid = false;
try
{
// Read the image without validating image data
using (Image img = Image.FromStream(imageStream, false, false))
{
isValid = true;
}
}
catch
{
;
}
return isValid;
}
As you read from any stream, the position increases. If you read a stream to the end (as is typical), and then try to read again, then it will return EOF.
For some streams, you can seek - set the Position to 0, for example. However, you should try to avoid relying on this as it is not available for many streams (especially when network IO is involved). You can query this ability via CanSeek, but it would be simpler to avoid this - partly as if you are branching based on this, you suddenly have twice as much code to maintain.
If you need the data twice, then the options depend on the size of the data. For small streams, buffer it in-memory, as either a byte[] or a MemoryStream. For larger streams (or if you don't know the size) then writing to a scratch file (and deleting afterwards) is a reasonable approach. You can open and read the file as many times (in series, not in parallel) as you like.
If you are happy the stream isn't too large (although maybe add a cap to prevent people uploading swap-files, etc):
using (MemoryStream ms = new MemoryStream()) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0) {
ms.Write(buffer, 0, bytesRead);
}
// ms now has a seekable/rewindable copy of the data
// TODO: read ms the first time
ms.Position = 0;
// TODO: read ms the second time
}
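The cap mentioned above could be added to the buffering loop roughly as follows. This is a sketch under assumptions: the names UploadBuffer and BufferWithCap, the 8 KB block size and the choice of InvalidOperationException are all illustrative, not taken from the answer.

```csharp
using System;
using System.IO;

static class UploadBuffer
{
    // Reads input fully into a MemoryStream, refusing inputs larger than maxBytes.
    // The returned stream is rewound to position 0, ready for the first read.
    public static MemoryStream BufferWithCap(Stream input, long maxBytes)
    {
        var ms = new MemoryStream();
        var buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (ms.Length + bytesRead > maxBytes)
            {
                ms.Dispose();
                throw new InvalidOperationException("Input exceeds the allowed size.");
            }
            ms.Write(buffer, 0, bytesRead);
        }
        ms.Position = 0;
        return ms;
    }
}
```

The caller still has to set Position back to 0 between the first and the second read of the returned stream.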
Indeed, Stream instances remember where the current "cursor" is. Some streams support "rewinding"; the CanSeek property will then return true. In the case of an HTTP request this won't work (CanSeek = false).
Isn't a MIME-type sent from the browser as well?
If you really want to keep your way of checking you'll have to go with Marc's proposition
In your update, you have a problem reading the stream a second time.
byte[] m_buffer = new byte[ms.Length];
while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
{
outfile.Write(m_buffer, 0, bytesRead);
}
The solution is simple:
byte[] m_buffer = ms.ToArray();
outfile.Write(m_buffer, 0, m_buffer.Length);
See also MemoryStream.ToArray
public static bool IsImagen(System.IO.Stream stream, String fileName)
{
try
{
using (Image img = Image.FromStream(stream, false, false))
{
if (fileName.ToLower().IndexOf(".jpg") > 0)
return true;
if (fileName.ToLower().IndexOf(".gif") > 0)
return true;
if (fileName.ToLower().IndexOf(".png") > 0)
return true;
}
}
catch (ArgumentException){}
return false;
}
