Setting Buffer Size in GZipStream - c#

I was writing a lightweight proxy in C#. While decoding the gzip content encoding, I noticed that with a small buffer size (4096) the stream is only partially decoded, depending on the size of the input. Is it a bug in my code, or is something else needed to make it work? If I set the buffer to 10 MB it works okay, but that defeats the purpose of writing a lightweight proxy.
response = webEx.Response as HttpWebResponse;
Stream input = response.GetResponseStream();
//some other operations on response header
//calling DecompressGzip here
private static string DecompressGzip(Stream input, Encoding e)
{
    StringBuilder sb = new StringBuilder();
    using (Ionic.Zlib.GZipStream decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress))
    {
        // works okay for [1024*1024*8];
        byte[] buffer = new byte[4096];
        int n = 0;
        do
        {
            n = decompressor.Read(buffer, 0, buffer.Length);
            if (n > 0)
            {
                sb.Append(e.GetString(buffer));
            }
        } while (n > 0);
    }
    return sb.ToString();
}

Actually, I figured it out. The problem was the StringBuilder approach: e.GetString(buffer) decodes the entire 4096-byte buffer rather than only the n bytes actually read, and a multi-byte character can be split across two reads. Buffering the raw bytes into a MemoryStream and decoding once at the end works well.
private static string DecompressGzip(Stream input, Encoding e)
{
    using (Ionic.Zlib.GZipStream decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress))
    {
        int read = 0;
        var buffer = new byte[4096];
        using (MemoryStream output = new MemoryStream())
        {
            while ((read = decompressor.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return e.GetString(output.ToArray());
        }
    }
}
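
An alternative sketch (not from the original thread): wrapping the GZipStream in a StreamReader sidesteps both issues, since the reader decodes only the bytes it reads and correctly handles multi-byte characters that span buffer boundaries:

private static string DecompressGzip(Stream input, Encoding e)
{
    using (var decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress))
    using (var reader = new StreamReader(decompressor, e))
    {
        // StreamReader buffers internally, so a small buffer size is fine here
        return reader.ReadToEnd();
    }
}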

Related

Correct way to use GZipStream in .NET C#

I'm working with GZipStream at the moment using .NET 3.5.
I have two methods listed below. As the input file I use a text file consisting of the character 's'. The size of the file is 2 MB. This code works fine if I use .NET 4.5, but with .NET 3.5, after compressing and decompressing I get a file of 435 KB, which of course isn't the same as the source file.
If I decompress the file via WinRAR, the result also looks good (the same as the source file).
If I decompress the file using GZipStream from .NET 4.5 (a file compressed via GZipStream from .NET 3.5), the result is bad.
UPD:
In general I really need to read the file as several separate gzip chunks; in this case all the bytes of the compressed file are read in one call of the Read() method, so I still don't understand why decompressing doesn't work.
public void CompressFile()
{
    string fileIn = @"D:\sin2.txt";
    string fileOut = @"D:\sin2.txt.pgz";
    using (var fout = File.Create(fileOut))
    {
        using (var fin = File.OpenRead(fileIn))
        {
            using (var zip = new GZipStream(fout, CompressionMode.Compress))
            {
                var buffer = new byte[1024 * 1024 * 10];
                int n = fin.Read(buffer, 0, buffer.Length);
                zip.Write(buffer, 0, n);
            }
        }
    }
}
public void DecompressFile()
{
    string fileIn = @"D:\sin2.txt.pgz";
    string fileOut = @"D:\sin2.1.txt";
    using (var fsout = File.Create(fileOut))
    {
        using (var fsIn = File.OpenRead(fileIn))
        {
            var buffer = new byte[1024 * 1024 * 10];
            int n;
            while ((n = fsIn.Read(buffer, 0, buffer.Length)) > 0)
            {
                using (var ms = new MemoryStream(buffer, 0, n))
                {
                    using (var zip = new GZipStream(ms, CompressionMode.Decompress))
                    {
                        int nRead = zip.Read(buffer, 0, buffer.Length);
                        fsout.Write(buffer, 0, nRead);
                    }
                }
            }
        }
    }
}
You're trying to decompress each "chunk" as if it's a separate gzip file. Don't do that - just read from the GZipStream in a loop:
using (var fsout = File.Create(fileOut))
{
    using (var fsIn = File.OpenRead(fileIn))
    {
        using (var zip = new GZipStream(fsIn, CompressionMode.Decompress))
        {
            var buffer = new byte[1024 * 32];
            int bytesRead;
            while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
            {
                fsout.Write(buffer, 0, bytesRead);
            }
        }
    }
}
Note that your compression code should look similar, reading in a loop rather than assuming a single call to Read will read all the data.
(Personally I'd skip fsIn, and just use new GZipStream(File.OpenRead(fileIn)) but that's just a personal preference.)
First, as @Jon Skeet mentioned, you are not using Stream.Read correctly. It doesn't matter whether your buffer is big enough: the stream is allowed to return fewer bytes than requested, with zero indicating the end, so reading from a stream should always be performed in a loop.
However, the main problem in your decompression code is the way you share the buffer. You read the input into a buffer, then wrap it in a MemoryStream (note that the constructor used does not copy the passed array, but actually sets it as its internal buffer), and then you try to read from and write to that same buffer at the same time. Given that decompressing writes data "faster" than reading consumes it, it's surprising that your code works at all.
The correct implementation is quite simple:
static void CompressFile()
{
    string fileIn = @"D:\sin2.txt";
    string fileOut = @"D:\sin2.txt.pgz";
    using (var input = File.OpenRead(fileIn))
    using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
        Write(input, output);
}

static void DecompressFile()
{
    string fileIn = @"D:\sin2.txt.pgz";
    string fileOut = @"D:\sin2.1.txt";
    using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
    using (var output = File.Create(fileOut))
        Write(input, output);
}

static void Write(Stream input, Stream output, int bufferSize = 10 * 1024 * 1024)
{
    var buffer = new byte[bufferSize];
    for (int readCount; (readCount = input.Read(buffer, 0, buffer.Length)) > 0;)
        output.Write(buffer, 0, readCount);
}
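
For what it's worth, on .NET 4 and later the Write helper above is essentially Stream.CopyTo, so (outside the .NET 3.5 constraint of this question) each method body reduces to a sketch like:

using (var input = File.OpenRead(fileIn))
using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
    input.CopyTo(output); // the same read/write loop, performed internally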

How to get all data from NetworkStream

I am trying to read all the data present in the buffer of the machine connected through TCP/IP, but I don't know why I am not getting all the data; some of it gets missed.
Here is the code that I am using:
using (NetworkStream stream = client.GetStream())
{
    byte[] data = new byte[1024];
    int numBytesRead = stream.Read(data, 0, data.Length);
    if (numBytesRead > 0)
    {
        string str = Encoding.ASCII.GetString(data, 0, numBytesRead);
    }
}
Please tell me what I am missing to get all the data from the machine.
Thanks in advance.
The problem with your code is that you will not get all the data if the data size is bigger than the buffer size (1024 bytes in your case), so you have to Read the stream inside a loop. You can then Write all the data into a MemoryStream until the end of the NetworkStream.
string str;
using (NetworkStream stream = client.GetStream())
{
    byte[] data = new byte[1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int numBytesRead;
        while ((numBytesRead = stream.Read(data, 0, data.Length)) > 0)
        {
            ms.Write(data, 0, numBytesRead);
        }
        str = Encoding.ASCII.GetString(ms.ToArray(), 0, (int)ms.Length);
    }
}
This example from MSDN's NetworkStream.DataAvailable documentation shows how you can use that property to read all the data:
// Examples for CanRead, Read, and DataAvailable.
// Check to see if this NetworkStream is readable.
if (myNetworkStream.CanRead)
{
    byte[] myReadBuffer = new byte[1024];
    StringBuilder myCompleteMessage = new StringBuilder();
    int numberOfBytesRead = 0;
    // Incoming message may be larger than the buffer size.
    do
    {
        numberOfBytesRead = myNetworkStream.Read(myReadBuffer, 0, myReadBuffer.Length);
        myCompleteMessage.AppendFormat("{0}", Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead));
    }
    while (myNetworkStream.DataAvailable);
    // Print out the received message to the console.
    Console.WriteLine("You received the following message : " +
        myCompleteMessage);
}
else
{
    Console.WriteLine("Sorry. You cannot read from this NetworkStream.");
}
Try this:
private string GetResponse(NetworkStream stream)
{
    byte[] data = new byte[1024];
    using (MemoryStream memoryStream = new MemoryStream())
    {
        do
        {
            // write only the bytes actually read, not the whole buffer
            int numBytesRead = stream.Read(data, 0, data.Length);
            memoryStream.Write(data, 0, numBytesRead);
        } while (stream.DataAvailable);
        return Encoding.ASCII.GetString(memoryStream.ToArray(), 0, (int)memoryStream.Length);
    }
}
Try this code:
using (NetworkStream stream = client.GetStream())
{
    while (!stream.DataAvailable)
    {
        Thread.Sleep(20);
    }
    if (stream.DataAvailable && stream.CanRead)
    {
        Byte[] data = new Byte[1024];
        List<byte> allData = new List<byte>();
        do
        {
            int numBytesRead = stream.Read(data, 0, data.Length);
            if (numBytesRead == data.Length)
            {
                allData.AddRange(data);
            }
            else if (numBytesRead > 0)
            {
                allData.AddRange(data.Take(numBytesRead));
            }
        } while (stream.DataAvailable);
    }
}
Hope this helps; it should prevent you from missing any data sent to you.
The synchronous method sometimes fails to read the whole request body; the asynchronous method reads it reliably.
string request = default(string);
StringBuilder sb = new StringBuilder();
byte[] buffer = new byte[client.ReceiveBufferSize];
int bytesCount;
if (client.GetStream().CanRead)
{
    do
    {
        bytesCount = client.GetStream().ReadAsync(buffer, 0, buffer.Length).Result;
        sb.Append(Encoding.UTF8.GetString(buffer, 0, bytesCount));
    }
    while (client.GetStream().DataAvailable);
    request = sb.ToString();
}
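
Blocking on .Result can deadlock in UI or ASP.NET contexts, so if you go asynchronous it is safer to await the read all the way up. A minimal sketch of the same loop with async/await, assuming .NET 4.5 or later:

static async Task<string> ReadRequestAsync(TcpClient client)
{
    var sb = new StringBuilder();
    NetworkStream stream = client.GetStream();
    var buffer = new byte[client.ReceiveBufferSize];
    do
    {
        int bytesCount = await stream.ReadAsync(buffer, 0, buffer.Length);
        if (bytesCount == 0)
            break; // remote side closed the connection
        sb.Append(Encoding.UTF8.GetString(buffer, 0, bytesCount));
    }
    while (stream.DataAvailable);
    return sb.ToString();
}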
TCP itself does not have any way to signal an "end of data" condition. This is the responsibility of the application-level protocol.
For instance see HTTP request description:
A client request (consisting in this case of the request line and only one header field) is followed by a blank line, so that the request ends with a double newline
So, for a request, the end of data is determined by a blank line (two consecutive newline sequences). And for the response:
Content-Type specifies the Internet media type of the data conveyed by the HTTP message, while Content-Length indicates its length in bytes.
The response content size is specified in the header, before the data.
So, it's up to you how to encode the amount of data transferred at once: it can be as simple as the first 2 or 4 bytes of the data holding the total size to read, or a more complex scheme if needed.
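
For example, a minimal sketch of length-prefixed framing (an assumption here, not part of the question's protocol): the sender writes a 4-byte length before each message, and the reader loops until exactly that many bytes have arrived:

static byte[] ReadMessage(NetworkStream stream)
{
    byte[] lengthPrefix = ReadExactly(stream, 4);
    int length = BitConverter.ToInt32(lengthPrefix, 0); // assumes matching endianness
    return ReadExactly(stream, length);
}

static byte[] ReadExactly(NetworkStream stream, int count)
{
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0)
            throw new EndOfStreamException("Connection closed before the full message arrived.");
        offset += read;
    }
    return buffer;
}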
For my scenario, the message itself carried the length of the subsequent message. Here is the code:
int lengthOfMessage = 1024;
string message = "";
byte[] MessageBytes = new byte[lengthOfMessage]; // declared here; implied in the original
using (MemoryStream ms = new MemoryStream())
{
    int numBytesRead;
    // memStream is the incoming stream; once lengthOfMessage reaches 0,
    // Read is asked for 0 bytes, returns 0, and the loop ends
    while ((numBytesRead = memStream.Read(MessageBytes, 0, lengthOfMessage)) > 0)
    {
        lengthOfMessage = lengthOfMessage - numBytesRead;
        ms.Write(MessageBytes, 0, numBytesRead);
    }
    message = Encoding.ASCII.GetString(ms.ToArray(), 0, (int)ms.Length);
}
@George Chondrompilas' answer is correct, but instead of writing the loop yourself you can use the CopyTo method, which does the same:
https://stackoverflow.com/a/65188160/4120180
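
For reference, a minimal sketch of that approach; note that CopyTo reads until the remote side closes the connection, so it suits one-shot responses rather than keep-alive protocols:

string str;
using (NetworkStream stream = client.GetStream())
using (MemoryStream ms = new MemoryStream())
{
    stream.CopyTo(ms); // blocks until the sender closes its end
    str = Encoding.ASCII.GetString(ms.ToArray());
}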

Partially download and serialize big file in C#?

As part of an upcoming project at my university, I need to write a client that downloads a media file from a server and writes it to the local disk. Since these files can be very large, I need to implement partial download and serialization in order to avoid excessive memory use.
What I came up with:
namespace PartialDownloadTester
{
    using System;
    using System.Diagnostics.Contracts;
    using System.IO;
    using System.Net;
    using System.Text;

    public class DownloadClient
    {
        public static void Main(string[] args)
        {
            var dlc = new DownloadClient(args[0], args[1], args[2]);
            dlc.DownloadAndSaveToDisk();
            Console.ReadLine();
        }

        private WebRequest request;

        // directory of file
        private string dir;

        // full file identifier
        private string filePath;

        public DownloadClient(string uri, string fileName, string fileType)
        {
            this.request = WebRequest.Create(uri);
            this.request.Method = "GET";
            var sb = new StringBuilder();
            sb.Append("C:\\testdata\\DownloadedData\\");
            this.dir = sb.ToString();
            sb.Append(fileName + "." + fileType);
            this.filePath = sb.ToString();
        }

        public void DownloadAndSaveToDisk()
        {
            // make sure directory exists
            this.CreateDir();
            var response = (HttpWebResponse)request.GetResponse();
            Console.WriteLine("Content length: " + response.ContentLength);
            var rStream = response.GetResponseStream();
            int bytesRead = -1;
            do
            {
                var buf = new byte[2048];
                bytesRead = rStream.Read(buf, 0, buf.Length);
                rStream.Flush();
                this.SerializeFileChunk(buf);
            }
            while (bytesRead != 0);
        }

        private void CreateDir()
        {
            if (!Directory.Exists(dir))
            {
                Directory.CreateDirectory(dir);
            }
        }

        private void SerializeFileChunk(byte[] bytes)
        {
            Contract.Requires(!Object.ReferenceEquals(bytes, null));
            FileStream fs = File.Open(filePath, FileMode.Append);
            fs.Write(bytes, 0, bytes.Length);
            fs.Flush();
            fs.Close();
        }
    }
}
For testing purposes, I've used the following parameters:
"http://itu.dk/people/janv/mufc_abc.jpg" "mufc_abc" "jpg"
However, the picture is incomplete (only the first ~10% looks right) even though the content length prints 63780, which is the actual size of the image.
So my questions are:
Is this the right way to go for partial download and serialization or is there a better/easier approach?
Is the full content of the response stream stored in client memory? If this is the case, do I need to use HttpWebRequest.AddRange to partially download data from the server in order to conserve my client's memory?
How come the serialization fails and I get a broken image?
Do I introduce a lot of overhead when I use the FileMode.Append? (msdn states that this option "seeks to the end of the file")
Thanks in advance
You could definitely simplify your code using a WebClient:
class Program
{
    static void Main()
    {
        DownloadClient("http://itu.dk/people/janv/mufc_abc.jpg", "mufc_abc.jpg");
    }

    public static void DownloadClient(string uri, string fileName)
    {
        using (var client = new WebClient())
        {
            using (var stream = client.OpenRead(uri))
            {
                // work with chunks of 2KB => adjust if necessary
                const int chunkSize = 2048;
                var buffer = new byte[chunkSize];
                using (var output = File.OpenWrite(fileName))
                {
                    int bytesRead;
                    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                    }
                }
            }
        }
    }
}
Notice how I am writing only the number of bytes I have actually read from the socket to the output file and not the entire 2KB buffer.
I don't know if this is the source of the problem; however, I would change the loop like this:
const int ChunkSize = 2048;
var buf = new byte[ChunkSize];
var rStream = response.GetResponseStream();
int bytesRead; // must be declared outside the loop so the while condition can see it
do
{
    bytesRead = rStream.Read(buf, 0, ChunkSize);
    if (bytesRead > 0)
    {
        this.SerializeFileChunk(buf, bytesRead);
    }
} while (bytesRead == ChunkSize);
The serialize method would get an additional argument
private void SerializeFileChunk(byte[] bytes, int numBytes)
and then write the right number of bytes
fs.Write(bytes, 0, numBytes);
UPDATE:
I do not see the need for closing and reopening the file each time. I also would use the using statement, which closes the resources even if an exception occurs. The using statement calls the Dispose() method of the resource at the end, which in turn calls Close() in the case of file streams. using can be applied to all types implementing IDisposable.
var buf = new byte[2048];
int bytesRead;
using (var rStream = response.GetResponseStream())
{
    using (FileStream fs = File.Open(filePath, FileMode.Append))
    {
        do
        {
            bytesRead = rStream.Read(buf, 0, buf.Length);
            fs.Write(buf, 0, bytesRead);
        } while (...);
    }
}
The using statement does something like this:
{
    var rStream = response.GetResponseStream();
    try
    {
        // do some work with rStream here.
    }
    finally
    {
        if (rStream != null)
        {
            rStream.Dispose();
        }
    }
}
Here is the solution from Microsoft: http://support.microsoft.com/kb/812406
Updated 2021-03-16: seems the original article is not available now. Here is the archived one: https://mskb.pkisolutions.com/kb/812406
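
On the HttpWebRequest.AddRange part of the question: the response stream is read incrementally from the network rather than buffered whole in client memory, so ranged requests are not required just to bound memory use. If you do want to download in pieces (assuming the server honours Range headers), a sketch:

var request = (HttpWebRequest)WebRequest.Create(uri);
request.AddRange(0, 1023); // request only the first 1024 bytes
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var output = File.Open(filePath, FileMode.Append))
{
    var buffer = new byte[2048];
    int n;
    while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        output.Write(buffer, 0, n);
}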

Downloading and saving an Excel file

I am using:
private void get_stocks_data()
{
    byte[] result;
    byte[] buffer = new byte[4096];
    WebRequest wr = WebRequest.Create("http://www.tase.co.il/TASE/Pages/ExcelExport.aspx?sn=he-IL_ds&enumTblType=AllSecurities&Columns=he-IL_Columns&Titles=he-IL_Titles&TblId=0&ExportType=1");
    using (WebResponse response = wr.GetResponse())
    {
        using (Stream responseStream = response.GetResponseStream())
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                int count = 0;
                do
                {
                    count = responseStream.Read(buffer, 0, buffer.Length);
                    memoryStream.Write(buffer, 0, count);
                } while (count != 0);
                result = memoryStream.ToArray();
                write_data_to_excel(result);
            }
        }
    }
}
to download the Excel file,
and this method to write the file on my computer:
private void write_data_to_excel(byte[] input)
{
    StreamWriter str = new StreamWriter("stockdata.xls");
    for (int i = 0; input.Length > i; i++)
    {
        str.WriteLine(input[i].ToString());
    }
    str.Close();
}
The result is that I get a lot of numbers...
What am I doing wrong? The file I am downloading is Excel version 2003; on my computer I have 2007...
Thanks.
I would suggest that you use WebClient.DownloadFile() instead.
This is a higher level method that will abstract from creating the request manually, dealing with encoding, etc.
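A minimal sketch of that approach, using the export URL from the question:

using (var client = new WebClient())
{
    // Streams the response straight to disk; no manual buffering needed.
    client.DownloadFile(
        "http://www.tase.co.il/TASE/Pages/ExcelExport.aspx?sn=he-IL_ds&enumTblType=AllSecurities&Columns=he-IL_Columns&Titles=he-IL_Titles&TblId=0&ExportType=1",
        "stockdata.xls");
}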
The problem is in your write_data_to_excel function:
the StreamWriter.WriteLine method needs a string, and
you are passing each byte as a string, so a binary value of, say, 10 becomes the text "10".
Write the raw bytes instead:
FileStream f = File.OpenWrite("stockdata.xls");
f.Write(input, 0, input.Length);
This will work.

If I check stream for valid image I can't write bytes to server

I am trying to check if a file is an image before I upload it to the image server.
I am doing it with the following function, which works exceptionally well:
static bool IsValidImage(Stream imageStream)
{
    bool isValid = false;
    try
    {
        // Read the image without validating image data
        using (Image img = Image.FromStream(imageStream, false, false))
        {
            isValid = true;
        }
    }
    catch
    {
    }
    return isValid;
}
The problem is that when the code below is called immediately afterwards, the line:
while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
evaluates to zero and no bytes are read. I notice that when I remove the
IsValidImage call, bytes are read and the file is written. It seems
that the bytes can only be read once? Any idea how to fix this?
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
{
    const int bufferSize = 65536; // 64K
    int bytesRead = 0;
    Byte[] buffer = new Byte[bufferSize];
    while ((bytesRead = request.FileByteStream.Read(buffer, 0, bufferSize)) > 0)
    {
        outfile.Write(buffer, 0, bytesRead);
    }
    outfile.Close(); // necessary?
}
UPDATE: Thanks for your help Marc. I am new to stream manipulation and could use a little
more help here. I took a shot but may be mixing up the use of filestream and memorystream.
Would you mind taking a look? Thanks again.
using (FileStream outfile = new FileStream(filePath, FileMode.Create))
using (MemoryStream ms = new MemoryStream())
{
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = request.FileByteStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        ms.Write(buffer, 0, bytesRead);
    }
    // ms now has a seekable/rewindable copy of the data
    // TODO: read ms the first time
    // I replaced request.FileByteStream with ms but am unsure about
    // the using statement in the IsValidImage function.
    ms.Position = 0; // rewind before validating; Image.FromStream reads from the current position
    if (!IsValidImage(ms) == true)
    {
        ms.Close();
        request.FileByteStream.Close();
        return;
    }
    ms.Position = 0;
    // TODO: read ms the second time
    byte[] m_buffer = new byte[ms.Length];
    while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
    {
        outfile.Write(m_buffer, 0, bytesRead);
    }
}
static bool IsValidImage(MemoryStream imageStream)
{
    bool isValid = false;
    try
    {
        // Read the image without validating image data
        using (Image img = Image.FromStream(imageStream, false, false))
        {
            isValid = true;
        }
    }
    catch
    {
    }
    return isValid;
}
As you read from any stream, the position advances. If you read a stream to the end (as is typical) and then try to read again, it will just report end-of-stream.
For some streams, you can seek: set the Position to 0, for example. However, you should avoid relying on this, as it is not available for many streams (especially when network IO is involved). You can query this ability via CanSeek, but it is simpler to avoid it; if you branch based on it, you suddenly have twice as much code to maintain.
If you need the data twice, the options depend on the size of the data. For small streams, buffer it in memory, as either a byte[] or a MemoryStream. For larger streams (or if you don't know the size), writing to a scratch file (and deleting it afterwards) is a reasonable approach; a sketch of this appears after the in-memory example below. You can open and read the file as many times (in series, not in parallel) as you like.
If you are happy the stream isn't too large (although maybe add a cap to prevent people uploading swap-files, etc):
using (MemoryStream ms = new MemoryStream())
{
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        ms.Write(buffer, 0, bytesRead);
    }
    // ms now has a seekable/rewindable copy of the data
    // TODO: read ms the first time
    ms.Position = 0;
    // TODO: read ms the second time
}
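
For larger uploads, the scratch-file variant mentioned above might look like this (a sketch; CopyTo can be replaced by the manual read/write loop on pre-4.0 frameworks):

string tempPath = Path.GetTempFileName();
try
{
    using (var temp = File.Create(tempPath))
    {
        inputStream.CopyTo(temp); // spool the upload to disk once
    }
    // open and read the file as many times as needed, in series
    using (var first = File.OpenRead(tempPath))
    {
        // TODO: first pass, e.g. validation
    }
    using (var second = File.OpenRead(tempPath))
    {
        // TODO: second pass, e.g. writing to the destination
    }
}
finally
{
    File.Delete(tempPath); // remove the scratch file afterwards
}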
Indeed, Stream instances remember where the current "cursor" is. Some streams support "rewinding"; the CanSeek property will then return true. In the case of an HTTP request this won't work (CanSeek = false).
Isn't a MIME type sent by the browser as well?
If you really want to keep your way of checking, you'll have to go with Marc's proposition.
In your update, you have a problem reading the stream a second time.
byte[] m_buffer = new byte[ms.Length];
while ((bytesRead = ms.Read(m_buffer, 0, (int)ms.Length)) > 0)
{
    outfile.Write(m_buffer, 0, bytesRead);
}
The solution is simple:
byte[] m_buffer = ms.ToArray();
outfile.Write(m_buffer, 0, m_buffer.Length);
See also MemoryStream.ToArray
public static bool IsImagen(System.IO.Stream stream, String fileName)
{
    try
    {
        using (Image img = Image.FromStream(stream, false, false))
        {
            if (fileName.ToLower().IndexOf(".jpg") > 0)
                return true;
            if (fileName.ToLower().IndexOf(".gif") > 0)
                return true;
            if (fileName.ToLower().IndexOf(".png") > 0)
                return true;
        }
    }
    catch (ArgumentException) { }
    return false;
}
