I'm trying to unzip and read a file from an HttpWebResponse object with the following code:
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
using (MemoryStream ms = new MemoryStream())
{
response.GetResponseStream().CopyTo(ms);
ms.Seek(0, SeekOrigin.Begin);
using (ZipArchive za = new ZipArchive(ms, ZipArchiveMode.Read))
{
foreach (ZipArchiveEntry zae in za.Entries)
{
using (StreamReader sr = new StreamReader(zae.Open(), Encoding.GetEncoding(1251), true, 2 << 18))
{
Console.WriteLine(sr.ReadLine());
}
}
}
}
but I get System.IO.InvalidDataException: End of Central Directory record could not be found.
What am I doing wrong?
Here is kind of what I was thinking with GZipStream (just an example, no guarantees here)...
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
using (GZipStream gzipStream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
using (StreamReader sr = new StreamReader(gzipStream))
{
Console.Write(sr.ReadToEnd());
}
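Before choosing between ZipArchive and GZipStream, it can help to check what the server actually sent. The sketch below is only an illustration: `ArchiveSniffer` and `DetectFormat` are hypothetical names that do not come from the posts above. It inspects the leading "magic" bytes of the payload, which are 0x1F 0x8B for gzip and "PK" followed by 0x03 0x04 (or 0x05 0x06 for an empty archive) for ZIP.

```csharp
using System;

public static class ArchiveSniffer
{
    // Hypothetical helper: classifies a payload by its first few bytes.
    public static string DetectFormat(byte[] header)
    {
        if (header == null) throw new ArgumentNullException(nameof(header));

        // gzip members start with 0x1F 0x8B (RFC 1952).
        if (header.Length >= 2 && header[0] == 0x1F && header[1] == 0x8B)
            return "gzip";

        // ZIP files start with a local file header "PK\x03\x04"; an empty
        // archive starts with the end-of-central-directory record "PK\x05\x06".
        if (header.Length >= 4 && header[0] == 0x50 && header[1] == 0x4B &&
            ((header[2] == 0x03 && header[3] == 0x04) ||
             (header[2] == 0x05 && header[3] == 0x06)))
            return "zip";

        return "unknown";
    }
}
```

If the first bytes of the buffered response come back as "gzip", ZipArchive cannot find a central directory because there is none, and GZipStream is the appropriate reader.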
I will try to keep it short and precise.
Requirement:
Download a large (400 MB) XML response from a 3rd party and store it as a ZipArchive on disk.
Current solution:
using (var memoryStream = new MemoryStream())
{
using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
{
var file = archive.CreateEntry($"{deliveryDate:yyyyMMdd}.xml");
using(var entryStream = file.Open())
{
using (var payload = new MemoryStream())
{
using var response = await _httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
await response.Content.CopyToAsync(payload);
payload.Seek(0, SeekOrigin.Begin);
await payload.CopyToAsync(entryStream);
}
}
}
using (var fileStream = new FileStream(Path.Combine(filePath), FileMode.Create, FileAccess.Write, FileShare.None))
{
memoryStream.Seek(0, SeekOrigin.Begin);
await memoryStream.CopyToAsync(fileStream);
}
}
Additional Information:
I can compress a 400 MB file to approx. 20 MB in about 40 seconds: 1/4 of that is download, 3/4 is compression.
The httpClient is re-used.
The code runs in a long-lived application hosted as a Kubernetes Linux pod.
Issues with current solution:
I fail to understand whether this implementation will clean up after itself. I would be thankful for pointers towards potential leaks.
Writing directly to the file stream may be faster and cleaner, and the response should be disposed:
using System.IO.Compression;
string url = "https://stackoverflow.com/questions/70605408/better-way-to-process-large-httpclient-response-400mb-to-ziparchive";
string filePath = "test.zip";
using(HttpClient client = new HttpClient())
using (var fileStream = new FileStream(Path.Combine(filePath), FileMode.Create, FileAccess.Write, FileShare.None))
using (var archive = new ZipArchive(fileStream, ZipArchiveMode.Create, true))
{
var file = archive.CreateEntry("test.xml");
using (var entryStream = file.Open())
using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
{
response.EnsureSuccessStatusCode();
var stream = await response.Content.ReadAsStreamAsync();
await stream.CopyToAsync(entryStream);
}
}
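The same idea can be sketched without the network call, so that the streaming part is easy to try in isolation. This is a minimal illustration, not the poster's implementation: `ZipStreaming` and `WriteAsZipAsync` are hypothetical names, and `source` stands in for the stream returned by `ReadAsStreamAsync()`. The data flows from the source straight into the zip entry, so no intermediate MemoryStream holds the whole payload.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class ZipStreaming
{
    // Hypothetical helper: copies `source` into a single-entry zip written to `destination`.
    public static async Task WriteAsZipAsync(Stream source, Stream destination, string entryName)
    {
        // leaveOpen: true keeps `destination` usable by the caller afterwards.
        // Disposing the archive is what writes the central directory.
        using (var archive = new ZipArchive(destination, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.Optimal);
            using (Stream entryStream = entry.Open())
            {
                // Data is compressed chunk by chunk as it is copied.
                await source.CopyToAsync(entryStream);
            }
        }
    }
}
```

In the scenario above, `destination` would be the FileStream and `source` the response content stream; both are still owned and disposed by the caller.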
Hi, I am trying to get this piece of code working. After copying the Word template file into a memory stream, I read it and replace some text; I then want to convert the result to a byte array, which will be used to download the document. Thanks in advance.
public byte[] GetWordFile()
{
try
{
string sourceFile = Path.Combine("C:/[...]/somefile.docx");
using (MemoryStream inStream = new MemoryStream())
{
using (Stream fs = File.Open(sourceFile, FileMode.Open, FileAccess.Read))
{
fs.CopyTo(inStream);
}
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(inStream, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
docText = docText.Replace("numpol", "HAHAHHAHA");
using (MemoryStream outStream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
sw.Flush();
sw.BaseStream.CopyTo(outStream);
outStream.Position = 0;
return outStream.ToArray();
}
}
}
}
catch (Exception ex)
{
///...
}
}
I'm trying to download a zip file from a web service in C# and extract an entry in memory, but when I read the stream the way the DotNetZip documentation shows, I get the exception "This stream does not support seek operations" in the "ZipFile.Read(stream)" part.
Could somebody tell me what I'm doing wrong? Thanks in advance.
var urlAuthentication = "https://someurl/?login=foo&token=faa";
var request = (HttpWebRequest)WebRequest.Create(urlAuthentication);
request.Proxy = WebRequest.DefaultWebProxy;
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
request.Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials;
using (var ms = new MemoryStream())
{
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var stream =response.GetResponseStream())
{
using (ZipFile zipout = ZipFile.Read(stream))
{
ZipEntry entry = zipout["file1.xml"];
entry.Extract(ms);
}
}
}
}
Apparently DotNetZip requires a stream that supports seek operations, and the response stream of an HttpWebResponse does not support seeking.
You can solve this issue by first downloading the entire file in memory, and then accessing it:
using (var ms = new MemoryStream())
{
using (MemoryStream seekable = new MemoryStream())
{
using (var stream = response.GetResponseStream())
{
int bytes;
byte[] buffer = new byte[1024];
while ((bytes = stream.Read(buffer, 0, buffer.Length)) > 0)
{
seekable.Write(buffer, 0, bytes);
}
}
seekable.Position = 0;
using (ZipFile zipout = ZipFile.Read(seekable))
{
ZipEntry entry = zipout["file1.xml"];
entry.Extract(ms);
}
}
// access ms
}
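The manual read loop above can also be expressed with `Stream.CopyTo` (available since .NET Framework 4.0). The following is a small sketch of that buffering step on its own; `StreamBuffering` and `ToSeekable` are hypothetical names used only for illustration.

```csharp
using System.IO;

public static class StreamBuffering
{
    // Hypothetical helper: drains a forward-only stream into memory
    // and returns a seekable copy positioned at the start.
    public static MemoryStream ToSeekable(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);  // reads until the source reports end of stream
        buffer.Position = 0;    // rewind so the consumer starts at the first byte
        return buffer;
    }
}
```

The returned stream could then be handed to `ZipFile.Read` in place of `seekable`. Note that this holds the entire download in memory, so for very large archives a temporary file may be a better buffer.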
Can someone tell me why I'm losing information during this process? Some UTF-8 characters appear not to be decoded:
"Biography":"\u003clink type=... or Steve Blunt \u0026 Marty Kelley
but others do : "Name":"朱敬
// Creating a Base64 string containing gzip data
string bar;
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
using (StreamWriter writer = new StreamWriter(gzip, System.Text.Encoding.UTF8))
{
writer.Write(s);
}
ms.Flush();
bar = Convert.ToBase64String(ms.ToArray());
}
// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (MemoryStream dest = new MemoryStream(itemData.Length*2))
{
gzs.CopyTo(dest);
foo = Encoding.UTF8.GetString(dest.ToArray());
}
Console.WriteLine(foo);
It could be because you are writing the string using StreamWriter but reading it using CopyTo() and Encoding.GetString().
What happens if you try this?
// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (StreamReader reader = new StreamReader(gzs, Encoding.UTF8))
{
foo = reader.ReadToEnd(); // read the whole payload, not just the first line
}
Although I think you should be using BinaryReader and BinaryWriter:
string s = "Biography:\u003clink type...";
string bar;
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress))
using (var writer = new BinaryWriter(gzip, Encoding.UTF8))
{
writer.Write(s);
}
ms.Flush();
bar = Convert.ToBase64String(ms.ToArray());
}
// Reading it
string foo;
byte[] itemData = Convert.FromBase64String(bar);
using (MemoryStream src = new MemoryStream(itemData))
using (GZipStream gzs = new GZipStream(src, CompressionMode.Decompress))
using (var reader = new BinaryReader(gzs, Encoding.UTF8))
{
foo = reader.ReadString();
}
Console.WriteLine(foo);
The issue was simply that the characters were already encoded in the source string.
PS: Credit goes to rik for this answer :)
Edit: I also had the StreamReader issue that matthew-watson suggested.
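To illustrate that conclusion, here is a minimal round-trip sketch (the `GzipBase64` class and its method names are hypothetical). It assumes the input already contains literal JSON escapes such as `\u003c`, i.e. a backslash followed by `u003c`. Gzip and Base64 are lossless, so such escapes come back exactly as they went in, as do real non-ASCII characters like 朱敬; turning `\u003c` into `<` is the job of a JSON parser, not of the decompression step.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipBase64
{
    // Compresses text as UTF-8 (no byte order mark) and returns it Base64-encoded.
    public static string Compress(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses Compress: Base64 -> gzip bytes -> UTF-8 text.
    public static string Decompress(string base64)
    {
        using (var src = new MemoryStream(Convert.FromBase64String(base64)))
        using (var gzip = new GZipStream(src, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Reading with a StreamReader on both sides keeps the writer and reader symmetric, which is the point raised in the first answer.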
I currently use the following code to retrieve and decompress string data from Amazon S3 in C#:
GetObjectRequest getObjectRequest = new GetObjectRequest().WithBucketName(bucketName).WithKey(key);
using (S3Response getObjectResponse = client.GetObject(getObjectRequest))
{
using (Stream s = getObjectResponse.ResponseStream)
{
using (GZipStream gzipStream = new GZipStream(s, CompressionMode.Decompress))
{
StreamReader Reader = new StreamReader(gzipStream, Encoding.Default);
string Html = Reader.ReadToEnd();
parseFile(Html);
}
}
}
I want to reverse this code so that I can compress and upload string data to S3 without being written to disk. I tried the following, but I am getting an Exception:
using (AmazonS3 client = Amazon.AWSClientFactory.CreateAmazonS3Client(AWSAccessKeyID, AWSSecretAccessKeyID))
{
string awsPath = AWSS3PrefixPath + "/" + keyName+ ".htm.gz";
byte[] buffer = Encoding.UTF8.GetBytes(content);
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress))
{
zip.Write(buffer, 0, buffer.Length);
PutObjectRequest request = new PutObjectRequest();
request.InputStream = ms;
request.Key = awsPath;
request.BucketName = AWSS3BuckenName;
using (S3Response putResponse = client.PutObject(request))
{
//process response
}
}
}
}
The exception I am getting is:
Cannot access a closed Stream.
What am I doing wrong?
EDIT:
The exception occurs on the closing bracket of using (GZipStream zip
Stack trace:
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Compression.DeflateStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
at System.IO.Compression.GZipStream.Dispose(Boolean disposing)
at System.IO.Stream.Close()
You need to flush and close the GZipStream and reset the Position of the MemoryStream to 0 before using it as input to the request:
MemoryStream ms = new MemoryStream();
using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
{
byte[] buffer = Encoding.UTF8.GetBytes(content);
zip.Write(buffer, 0, buffer.Length);
zip.Flush();
}
ms.Position = 0;
PutObjectRequest request = new PutObjectRequest();
request.InputStream = ms;
request.Key = AWSS3PrefixPath + "/" + keyName+ ".htm.gz";
request.BucketName = AWSS3BuckenName;
using (AmazonS3 client = Amazon.AWSClientFactory.CreateAmazonS3Client(
AWSAccessKeyID, AWSSecretAccessKeyID))
using (S3Response putResponse = client.PutObject(request))
{
//process response
}
It might also be possible to use the GZipStream as input if you first fill the MemoryStream with the data, but I haven't tried this.
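The compression half of the answer above can be tried without the AWS SDK. The sketch below is an illustration under that assumption; `GzipPayload.Create` is a hypothetical helper name. It shows the two details that matter: `leaveOpen: true` so that disposing the GZipStream does not close the MemoryStream, and rewinding the MemoryStream before handing it over as an input stream.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipPayload
{
    // Hypothetical helper: returns a rewound, still-open stream of gzip bytes.
    public static MemoryStream Create(string content)
    {
        var ms = new MemoryStream();
        using (var zip = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] buffer = Encoding.UTF8.GetBytes(content);
            zip.Write(buffer, 0, buffer.Length);
        } // disposing the GZipStream writes the remaining compressed data and the gzip trailer

        ms.Position = 0; // the consumer reads from the current position
        return ms;
    }
}
```

The caller owns the returned stream and should dispose it once the upload has finished.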