GzipStream blockwise compression results in a corrupted file - c#

I am trying to implement block-by-block compression using GzipStream class. .NET Core 3.1, Visual Studio 2019, Console App. My OS is Windows 10. I believe it should be possible because gzip files consist of independent blocks one after another as per format specification. But resulting files that I get are corrupted. Here's the code:
using System;
using System.Globalization;
using System.IO;
using System.IO.Compression;
namespace GzipToMemoryStreamExample
{
class Program
{
private static int blockSize;
private static string sourceFileName = "e:\\SomeFolder\\SomeFile.ext";
private static byte[] currentBlock;
private static FileStream readingStream;
private static FileStream writingStream;
static void Main(string[] args)
{
Console.WriteLine("Enter block size:");
string blockSizeStr = Console.ReadLine();
int.TryParse(blockSizeStr, out blockSize);
readingStream = new FileStream(sourceFileName, FileMode.Open);
string resultingFileName = Path.ChangeExtension(sourceFileName, ".gz");
CreateAndOpenResultingFile(resultingFileName);
while (ReadBlock())
{
byte[] processedBlock = ProcessBlock(currentBlock);
writingStream.Write(processedBlock, 0, processedBlock.Length);
}
readingStream.Dispose();
writingStream.Dispose();
Console.WriteLine("Finished.");
Console.ReadKey();
}
private static bool ReadBlock()
{
bool result;
int bytesRead;
currentBlock = new byte[blockSize];
bytesRead = readingStream.Read(currentBlock, 0, blockSize);
result = bytesRead > 0;
return result;
}
private static byte[] ProcessBlock(byte[] sourceData)
{
byte[] result;
using (var outputStream = new MemoryStream())
{
using var compressionStream = new GZipStream(outputStream, CompressionMode.Compress);
compressionStream.Write(sourceData, 0, sourceData.Length);
result = outputStream.ToArray();
}
return result;
}
private static void CreateAndOpenResultingFile(string fileName)
{
if (File.Exists(fileName))
{
File.Delete(fileName);
}
writingStream = File.Create(fileName);
}
}
}
When I look at resulting files I see that result somehow depends on block size I choose. if it's smaller than ~100 Kb, resulting "compressed" blocks are of 10 bytes size each, which leads to extremely small useless file. If size of block is greater than ~100 Kb, then the size of file becomes reasonably large, about 80% of the original, but still corrupted.
Also I checked the block headers and it turns out they're strange. OS is set to TOPS-20 (0x0a value), ISIZE at the end of block is always totally wrong.
What is my mistake?

It's solved just with moving result = outputStream.ToArray(); line out of the compressionStream using scope as Mark Adler suggested in the comments.
private static byte[] ProcessBlock(byte[] sourceData)
{
byte[] result;
using (var outputStream = new MemoryStream())
{
using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
{
compressionStream.Write(sourceData, 0, sourceData.Length);
}
result = outputStream.ToArray();
}
return result;
}

Related

Base64 to ZIP Base64 without writing to disk [duplicate]

I am newbie in .net. I am doing compression and decompression string in C#. There is a XML and I am converting in string and after that I am doing compression and decompression.There is no compilation error in my code except when I decompression my code and return my string, its returning only half of the XML.
Below is my code, please correct me where I am wrong.
Code:
class Program
{
public static string Zip(string value)
{
//Transform string into byte[]
byte[] byteArray = new byte[value.Length];
int indexBA = 0;
foreach (char item in value.ToCharArray())
{
byteArray[indexBA++] = (byte)item;
}
//Prepare for compress
System.IO.MemoryStream ms = new System.IO.MemoryStream();
System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress);
//Compress
sw.Write(byteArray, 0, byteArray.Length);
//Close, DO NOT FLUSH cause bytes will go missing...
sw.Close();
//Transform byte[] zip data to string
byteArray = ms.ToArray();
System.Text.StringBuilder sB = new System.Text.StringBuilder(byteArray.Length);
foreach (byte item in byteArray)
{
sB.Append((char)item);
}
ms.Close();
sw.Dispose();
ms.Dispose();
return sB.ToString();
}
public static string UnZip(string value)
{
//Transform string into byte[]
byte[] byteArray = new byte[value.Length];
int indexBA = 0;
foreach (char item in value.ToCharArray())
{
byteArray[indexBA++] = (byte)item;
}
//Prepare for decompress
System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray);
System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms,
System.IO.Compression.CompressionMode.Decompress);
//Reset variable to collect uncompressed result
byteArray = new byte[byteArray.Length];
//Decompress
int rByte = sr.Read(byteArray, 0, byteArray.Length);
//Transform byte[] unzip data to string
System.Text.StringBuilder sB = new System.Text.StringBuilder(rByte);
//Read the number of bytes GZipStream red and do not a for each bytes in
//resultByteArray;
for (int i = 0; i < rByte; i++)
{
sB.Append((char)byteArray[i]);
}
sr.Close();
ms.Close();
sr.Dispose();
ms.Dispose();
return sB.ToString();
}
static void Main(string[] args)
{
XDocument doc = XDocument.Load(#"D:\RSP.xml");
string val = doc.ToString(SaveOptions.DisableFormatting);
val = Zip(val);
val = UnZip(val);
}
}
My XML size is 63KB.
The code to compress/decompress a string
public static void CopyTo(Stream src, Stream dest) {
byte[] bytes = new byte[4096];
int cnt;
while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
dest.Write(bytes, 0, cnt);
}
}
public static byte[] Zip(string str) {
var bytes = Encoding.UTF8.GetBytes(str);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
//msi.CopyTo(gs);
CopyTo(msi, gs);
}
return mso.ToArray();
}
}
public static string Unzip(byte[] bytes) {
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
//gs.CopyTo(mso);
CopyTo(gs, mso);
}
return Encoding.UTF8.GetString(mso.ToArray());
}
}
static void Main(string[] args) {
byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
string r2 = Unzip(r1);
}
Remember that Zip returns a byte[], while Unzip returns a string. If you want a string from Zip you can Base64 encode it (for example by using Convert.ToBase64String(r1)) (the result of Zip is VERY binary! It isn't something you can print to the screen or write directly in an XML)
The version suggested is for .NET 2.0, for .NET 4.0 use the MemoryStream.CopyTo.
IMPORTANT: The compressed contents cannot be written to the output stream until the GZipStream knows that it has all of the input (i.e., to effectively compress it needs all of the data). You need to make sure that you Dispose() of the GZipStream before inspecting the output stream (e.g., mso.ToArray()). This is done with the using() { } block above. Note that the GZipStream is the innermost block and the contents are accessed outside of it. The same goes for decompressing: Dispose() of the GZipStream before attempting to access the data.
according to
this snippet
i use this code and it's working fine:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
namespace CompressString
{
internal static class StringCompressor
{
/// <summary>
/// Compresses the string.
/// </summary>
/// <param name="text">The text.</param>
/// <returns></returns>
public static string CompressString(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
var memoryStream = new MemoryStream();
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
var compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
var gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
return Convert.ToBase64String(gZipBuffer);
}
/// <summary>
/// Decompresses the string.
/// </summary>
/// <param name="compressedText">The compressed text.</param>
/// <returns></returns>
public static string DecompressString(string compressedText)
{
byte[] gZipBuffer = Convert.FromBase64String(compressedText);
using (var memoryStream = new MemoryStream())
{
int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
var buffer = new byte[dataLength];
memoryStream.Position = 0;
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
{
gZipStream.Read(buffer, 0, buffer.Length);
}
return Encoding.UTF8.GetString(buffer);
}
}
}
}
With the advent of .NET 4.0 (and higher) with the Stream.CopyTo() methods, I thought I would post an updated approach.
I also think the below version is useful as a clear example of a self-contained class for compressing regular strings to Base64 encoded strings, and vice versa:
public static class StringCompression
{
/// <summary>
/// Compresses a string and returns a deflate compressed, Base64 encoded string.
/// </summary>
/// <param name="uncompressedString">String to compress</param>
public static string Compress(string uncompressedString)
{
byte[] compressedBytes;
using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
{
using (var compressedStream = new MemoryStream())
{
// setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
// this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
// although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
{
uncompressedStream.CopyTo(compressorStream);
}
// call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
compressedBytes = compressedStream.ToArray();
}
}
return Convert.ToBase64String(compressedBytes);
}
/// <summary>
/// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
/// </summary>
/// <param name="compressedString">String to decompress.</param>
public static string Decompress(string compressedString)
{
byte[] decompressedBytes;
var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));
using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
{
using (var decompressedStream = new MemoryStream())
{
decompressorStream.CopyTo(decompressedStream);
decompressedBytes = decompressedStream.ToArray();
}
}
return Encoding.UTF8.GetString(decompressedBytes);
}
}
Here’s another approach using the extension methods technique to extend the String class to add string compression and decompression. You can drop the class below into an existing project and then use thusly:
var uncompressedString = "Hello World!";
var compressedString = uncompressedString.Compress();
and
var decompressedString = compressedString.Decompress();
To wit:
public static class Extensions
{
/// <summary>
/// Compresses a string and returns a deflate compressed, Base64 encoded string.
/// </summary>
/// <param name="uncompressedString">String to compress</param>
public static string Compress(this string uncompressedString)
{
byte[] compressedBytes;
using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
{
using (var compressedStream = new MemoryStream())
{
// setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
// this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
// although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
{
uncompressedStream.CopyTo(compressorStream);
}
// call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
compressedBytes = compressedStream.ToArray();
}
}
return Convert.ToBase64String(compressedBytes);
}
/// <summary>
/// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
/// </summary>
/// <param name="compressedString">String to decompress.</param>
public static string Decompress(this string compressedString)
{
byte[] decompressedBytes;
var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));
using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
{
using (var decompressedStream = new MemoryStream())
{
decompressorStream.CopyTo(decompressedStream);
decompressedBytes = decompressedStream.ToArray();
}
}
return Encoding.UTF8.GetString(decompressedBytes);
}
}
I like #fubo's answer the best but I think this is much more elegant.
This method is more compatible because it doesn't manually store the length up front.
Also I've exposed extensions to support compression for string to string, byte[] to byte[], and Stream to Stream.
public static class ZipExtensions
{
public static string CompressToBase64(this string data)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(data).Compress());
}
public static string DecompressFromBase64(this string data)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(data).Decompress());
}
public static byte[] Compress(this byte[] data)
{
using (var sourceStream = new MemoryStream(data))
using (var destinationStream = new MemoryStream())
{
sourceStream.CompressTo(destinationStream);
return destinationStream.ToArray();
}
}
public static byte[] Decompress(this byte[] data)
{
using (var sourceStream = new MemoryStream(data))
using (var destinationStream = new MemoryStream())
{
sourceStream.DecompressTo(destinationStream);
return destinationStream.ToArray();
}
}
public static void CompressTo(this Stream stream, Stream outputStream)
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
{
stream.CopyTo(gZipStream);
gZipStream.Flush();
}
}
public static void DecompressTo(this Stream stream, Stream outputStream)
{
using (var gZipStream = new GZipStream(stream, CompressionMode.Decompress))
{
gZipStream.CopyTo(outputStream);
}
}
}
This is an updated version for .NET 4.5 and newer using async/await and IEnumerables:
public static class CompressionExtensions
{
public static async Task<IEnumerable<byte>> Zip(this object obj)
{
byte[] bytes = obj.Serialize();
using (MemoryStream msi = new MemoryStream(bytes))
using (MemoryStream mso = new MemoryStream())
{
using (var gs = new GZipStream(mso, CompressionMode.Compress))
await msi.CopyToAsync(gs);
return mso.ToArray().AsEnumerable();
}
}
public static async Task<object> Unzip(this byte[] bytes)
{
using (MemoryStream msi = new MemoryStream(bytes))
using (MemoryStream mso = new MemoryStream())
{
using (var gs = new GZipStream(msi, CompressionMode.Decompress))
{
// Sync example:
//gs.CopyTo(mso);
// Async way (take care of using async keyword on the method definition)
await gs.CopyToAsync(mso);
}
return mso.ToArray().Deserialize();
}
}
}
public static class SerializerExtensions
{
public static byte[] Serialize<T>(this T objectToWrite)
{
using (MemoryStream stream = new MemoryStream())
{
BinaryFormatter binaryFormatter = new BinaryFormatter();
binaryFormatter.Serialize(stream, objectToWrite);
return stream.GetBuffer();
}
}
public static async Task<T> _Deserialize<T>(this byte[] arr)
{
using (MemoryStream stream = new MemoryStream())
{
BinaryFormatter binaryFormatter = new BinaryFormatter();
await stream.WriteAsync(arr, 0, arr.Length);
stream.Position = 0;
return (T)binaryFormatter.Deserialize(stream);
}
}
public static async Task<object> Deserialize(this byte[] arr)
{
object obj = await arr._Deserialize<object>();
return obj;
}
}
With this you can serialize everything BinaryFormatter supports, instead only of strings.
Edit:
In case, you need take care of Encoding, you could just use Convert.ToBase64String(byte[])...
Take a look at this answer if you need an example!
For those who still getting The magic number in GZip header is not correct. Make sure you are passing in a GZip stream. ERROR
and if your string was zipped using php you'll need to do something like:
public static string decodeDecompress(string originalReceivedSrc) {
byte[] bytes = Convert.FromBase64String(originalReceivedSrc);
using (var mem = new MemoryStream()) {
//the trick is here
mem.Write(new byte[] { 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00 }, 0, 8);
mem.Write(bytes, 0, bytes.Length);
mem.Position = 0;
using (var gzip = new GZipStream(mem, CompressionMode.Decompress))
using (var reader = new StreamReader(gzip)) {
return reader.ReadToEnd();
}
}
}
We can reduce code complexity by using StreamReader and StreamWriter rather than manually converting strings to byte arrays. Three streams is all you need:
public static byte[] Zip(string uncompressed)
{
byte[] ret;
using (var outputMemory = new MemoryStream())
{
using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
{
using (var sw = new StreamWriter(gz, Encoding.UTF8))
{
sw.Write(uncompressed);
}
}
ret = outputMemory.ToArray();
}
return ret;
}
public static string Unzip(byte[] compressed)
{
string ret = null;
using (var inputMemory = new MemoryStream(compressed))
{
using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
{
using (var sr = new StreamReader(gz, Encoding.UTF8))
{
ret = sr.ReadToEnd();
}
}
}
return ret;
}
For .net6 cross platform Compression/Decompression string with C# using SharpZipLib library. Test for ubuntu(18.0.x) and windows.
#region helper
private byte[] Zip(string text)
{
if (text == null)
return null;
byte[] ret;
using (var outputMemory = new MemoryStream())
{
using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
{
using (var sw = new StreamWriter(gz, Encoding.UTF8))
{
sw.Write(text);
}
}
ret = outputMemory.ToArray();
}
return ret;
}
private string Unzip(byte[] bytes)
{
string ret = null;
using (var inputMemory = new MemoryStream(bytes))
{
using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
{
using (var sr = new StreamReader(gz, Encoding.UTF8))
{
ret = sr.ReadToEnd();
}
}
}
return ret;
}
#endregion

Decompress string in java from compressed string in C#

I was searching for the correct solution to decompress the string in java coming from c# code.I tried myself with lot of techniques in java like(gzip,inflatter etc.).but didn't get the solution.i got some error while trying to decompress the string in java from compressed string from c# code.
My C# code to compress the string is,
public static string CompressString(string text)
{
byte[] byteArray = Encoding.GetEncoding(1252).GetBytes(text);// Encoding.ASCII.GetBytes(text);
using (var ms = new MemoryStream())
{
// Compress the text
using (var ds = new DeflateStream(ms, CompressionMode.Compress))
{
ds.Write(byteArray, 0, byteArray.Length);
}
return Convert.ToBase64String(ms.ToArray());
}
}
And decompress the string in java using,
private static void compressAndDecompress(){
try {
// Encode a String into bytes
String string = "xxxxxxSAMPLECOMPRESSEDSTRINGxxxxxxxxxx";
// // Compress the bytes
byte[] decoded = Base64.decodeBase64(string.getBytes());
byte[] output = new byte[4096];
// Decompress the bytes
Inflater decompresser = new Inflater();
decompresser.setInput(decoded);
int resultLength = decompresser.inflate(output);
decompresser.end();
// Decode the bytes into a String
String outputString = new String(output, 0, resultLength, "UTF-8");
System.out.println(outputString);
} catch(java.io.UnsupportedEncodingException ex) {
ex.printStackTrace();
} catch (java.util.zip.DataFormatException ex) {
ex.printStackTrace();
}
}
I get this exception when running the above code:
java.util.zip.DataFormatException: incorrect header check
Kindly give me the sample code in java to decompress the string java.Thanks
My C# code to compress is
private string Compress(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
MemoryStream ms = new MemoryStream();
using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
{
zip.Write(buffer, 0, buffer.Length);
}
ms.Position = 0;
MemoryStream outStream = new MemoryStream();
byte[] compressed = new byte[ms.Length];
ms.Read(compressed, 0, compressed.Length);
byte[] gzBuffer = new byte[compressed.Length + 4];
System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
return Convert.ToBase64String(gzBuffer);
}
Java code to decompress the text is
private String Decompress(String compressedText)
{
byte[] compressed = compressedText.getBytes("UTF8");
compressed = org.apache.commons.codec.binary.Base64.decodeBase64(compressed);
byte[] buffer=new byte[compressed.length-4];
buffer = copyForDecompression(compressed,buffer, 4, 0);
final int BUFFER_SIZE = 32;
ByteArrayInputStream is = new ByteArrayInputStream(buffer);
GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
StringBuilder string = new StringBuilder();
byte[] data = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = gis.read(data)) != -1)
{
string.append(new String(data, 0, bytesRead));
}
gis.close();
is.close();
return string.toString();
}
private byte[] copyForDecompression(byte[] b1,byte[] b2,int srcoffset,int dstoffset)
{
for(int i=0;i<b2.length && i<b1.length;i++)
{
b2[i]=b1[i+4];
}
return b2;
}
This code works perfectly fine for me.
Had exactly the same issue. Could solve it via
byte[] compressed = Base64Utils.decodeFromString("mybase64encodedandwithc#zippedcrap");
Inflater decompresser = new Inflater(true);
decompresser.setInput(compressed);
byte[] result = new byte[4096];
decompresser.inflate(result);
decompresser.end();
System.out.printf(new String(result));
The magic happens with the boolen parameter on instantiating the Inflator
BW Hubert
For beloved googlers,
As #dbw mentioned,
according to post How to decompress stream deflated with java.util.zip.Deflater in .NET?,
java.util.zip.deflater equivalent in c# the default deflater used in C#
is not having any java equivalent that's why users prefer Gzip, Ziplib
or some other zip techniques.
a relatively simple method would be using GZip.
And for the accepted answer, one problem is that in this method you should append the data size to the compressed string yourself, and more importantly as per my own experience in our production app, It is buggy when the string reaches ~2000 chars!
the bug is in the System.io.Compression.GZipStream
any way using SharpZipLib in c# the problem goes away and everything would be as simple as following snippets:
JAVA:
import android.util.Base64;
import com.google.android.gms.common.util.IOUtils;
import org.jetbrains.annotations.Nullable;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
public class CompressionHelper {
#Nullable
public static String compress(#Nullable String data) {
if(data == null || data.length() == 0)
return null;
try {
// Create an output stream, and a gzip stream to wrap over.
ByteArrayOutputStream bos = new ByteArrayOutputStream(data.length());
GZIPOutputStream gzip = new GZIPOutputStream(bos);
// Compress the input string
gzip.write(data.getBytes());
gzip.close();
byte[] compressed;
// Convert to base64
compressed = Base64.encode(bos.toByteArray(),Base64.NO_WRAP);
bos.close();
// return the newly created string
return new String(compressed);
} catch(IOException e) {
return null;
}
}
#Nullable
public static String decompress(#Nullable String compressedText) {
if(compressedText == null || compressedText.length() == 0)
return null;
try {
// get the bytes for the compressed string
byte[] compressed = compressedText.getBytes("UTF-8");
// convert the bytes from base64 to normal string
compressed = Base64.decode(compressed, Base64.NO_WRAP);
ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
GZIPInputStream gis = new GZIPInputStream(bis);
byte[] bytes = IOUtils.toByteArray(gis);
return new String(bytes, "UTF-8");
}catch (IOException e){
e.printStackTrace();
}
return null;
}
}
and c#:
using ICSharpCode.SharpZipLib.GZip; //PM> Install-Package SharpZipLib
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace GeneralTools
{
public static class CompressionTools
{
public static string CompressString(string text)
{
if (string.IsNullOrEmpty(text))
return null;
byte[] buffer = Encoding.UTF8.GetBytes(text);
using (var compressedStream = new MemoryStream())
{
GZip.Compress(new MemoryStream(buffer), compressedStream, false);
byte[] compressedData = compressedStream.ToArray();
return Convert.ToBase64String(compressedData);
}
}
public static string DecompressString(string compressedText)
{
if (string.IsNullOrEmpty(compressedText))
return null;
byte[] gZipBuffer = Convert.FromBase64String(compressedText);
using (var memoryStream = new MemoryStream())
{
using (var compressedStream = new MemoryStream(gZipBuffer))
{
var decompressedStream = new MemoryStream();
GZip.Decompress(compressedStream, decompressedStream, false);
return Encoding.UTF8.GetString(decompressedStream.ToArray()).Trim();
}
}
}
}
}
you may also find the codes here
If anyone still interested, here's my full solution with outputstream to handle unknown string size. Using C# DeflateStream and Java Inflater (based on Hubert Ströbitzer answer).
C# Compression:
string CompressString(string raw)
{
byte[] uncompressedData = Encoding.UTF8.GetBytes(raw);
MemoryStream output = new MemoryStream();
using (DeflateStream dStream = new DeflateStream(output, CompressionLevel.Optimal))
{
dStream.Write(uncompressedData, 0, uncompressedData.Length);
}
string compressedString = Convert.ToBase64String(output.ToArray());
return compressedString;
}
Java decompress:
String decompressString(String compressedString) {
byte[] compressed = Base64Utils.decodeFromString(compressedString);
Inflater inflater = new Inflater(true);
inflater.setInput(compressed);
//Using output stream to handle unknown size of decompressed string
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
try {
while(!inflater.finished()){
int count = inflater.inflate(buffer);
outputStream.write(buffer, 0, count);
}
inflater.end();
outputStream.close();
} catch (DataFormatException e) {
//Handle DataFormatException
} catch (IOException e) {
//Handle IOException
}
return outputStream.toString();
}

ASP .Net C#: Get Binaries from fileupoad to generate XML

I am currently at the end of my program but am coming into a section that I have NO experience in. I've been able to convert the PDF to a Base64 in other methods but those are not permitted as per the instructions I've been given. I've posted my code below in the hopes that someone can maybe get me started in the right direction. I do not have any clue how to put in the volumes either so ANY assistance would be awesome! Contained is the Get Binaries and the IO class which it reads.
Default.cs.aspx
private static List<Binary> GetBinaries()
{
return new List<Binary>
{
new Binary
{
//hardcoded but need to call from fileUpload1
BinaryBase64Object = IO.ReadFromFile(#"..\..\EFACTS eRecord Technical Specification.pdf"), **<-- How do I get this to read from fileupload1**
BinaryID = "BIN1234", //hardcoded
BinarySizeValue = 56443, //hardcoded
FileName = " test.my.pdf", //hardcoded
PageRange = "23-89", //hardcoded
NoOfPages = 14, //hardcoded
TotalVolumes = 1, //hardcoded
Volume = 1 //hardcoded
}
};
}
IO.cs
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Electronic_Filing_of_Appeals;
namespace Electronic_Filing_of_Appeals
{
public static class IO
{
public static void ReadWriteStream(MemoryStream readStream, Stream writeStream)
{
using (writeStream)
{
int Length = 256;
Byte[] buffer = new Byte[Length];
readStream.Position = 0;
int bytesRead = readStream.Read(buffer, 0, Length);
// write the required bytes
while (bytesRead > 0)
{
writeStream.Write(buffer, 0, bytesRead);
bytesRead = readStream.Read(buffer, 0, Length);
}
readStream.Close();
writeStream.Close();
}
}
public static MemoryStream FileToMemoryStream(string filename)
{
FileStream inStream = System.IO.File.OpenRead(filename);
MemoryStream outStream = new MemoryStream();
outStream.SetLength(inStream.Length);
inStream.Read(outStream.GetBuffer(), 0, (int)inStream.Length);
outStream.Flush();
inStream.Close();
return outStream;
}
public static MemoryStream ConvertStreamToMemoryStream(Stream stream)
{
MemoryStream memoryStream = new MemoryStream();
if (stream != null)
{
byte[] buffer = stream.ReadFully();
if (buffer != null)
{
var binaryWriter = new BinaryWriter(memoryStream);
binaryWriter.Write(buffer);
}
}
return memoryStream;
}
public static byte[] ReadFromFile(string filePath)
{
FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
byte[] fileRD = br.ReadBytes((int)fs.Length);
br.Close();
fs.Close();
return fileRD;
}
public static void SaveToFile(byte[] byteData, string fileName)
{
FileStream fs = new FileStream(fileName, FileMode.Create);
fs.Write(byteData, 0, byteData.Length);
fs.Close();
}
public static byte[] ReadFully(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
}
}
You should ask for some more assistance. There are probably some library for reading PDF-files already in use. Try to find out what it is, and read the code to see how it is used.
If this is the first time in the project anyone had to read PDF-files, you will have to adopt some library. Writing one on your own is very time-consuming. What you are looking for is a library that can read the meta-data from the PDF.
Two such libraries are iTextSharp and PDFSharp. Here are an example with PDFSharp:
var doc = PdfReader.Open(#"icpc_briefing_1_sep_10.pdf");
Console.WriteLine("Number of pages: {0}", doc.PageCount);
Console.WriteLine();
Console.WriteLine("=== INFO ===");
foreach (KeyValuePair<string, PdfItem> pair in doc.Info)
{
Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
byte[] xmlData = GetMetadata(doc);
if (xmlData != null)
{
Console.WriteLine();
Console.WriteLine("=== XMP ===");
Console.Write(Encoding.UTF8.GetString(xmlData));
Console.WriteLine();
}
byte[] GetMetadata(PdfDocument doc)
{
// PdfSharp does not have direct support for the XML metadata, but it does
// allow you to go poking into the internal structure.
PdfItem metadataItem;
if (doc.Internals.Catalog.Elements.TryGetValue("/Metadata", out metadataItem))
{
var metadataRef = (PdfReference) metadataItem;
var metadata = (PdfDictionary) metadataRef.Value;
return metadata.Stream.Value;
}
return null;
}
Example output:
Number of pages: 27
=== INFO ===
/CreationDate: D:20100910191635+08'00'
/Author: Steven Halim
/Creator: PScript5.dll Version 5.2.2
/Producer: Acrobat Distiller 8.1.0 (Windows)
/ModDate: D:20130610173442+02'00'
/Title: Microsoft PowerPoint - icpc_briefing_1_sep_10 [Compatibility Mode]
=== XMP ===
<?xpacket begin="?" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xap="http://ns.adobe.com/xap/1.0/">
<xap:CreatorTool>PScript5.dll Version 5.2.2</xap:CreatorTool>
<xap:ModifyDate>2010-09-10T19:16:35+08:00</xap:ModifyDate>
<xap:CreateDate>2010-09-10T19:16:35+08:00</xap:CreateDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Microsoft PowerPoint - icpc_briefing_1_sep_10 [Compatibility Mode]</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>Steven Halim</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
<xapMM:DocumentID>uuid:b5ffc85b-6578-48fb-bddf-a5215ffa9d35</xapMM:DocumentID>
<xapMM:InstanceID>uuid:f45826a4-8c58-4a54-bac1-7f3192866d57</xapMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
Here are some more examples: http://www.pdfsharp.net/wiki/PDFsharpSamples.ashx
To get the data from a <asp:FileUpload> control, you just need to access FileUpload.FileBytes (byte[]) or FileUpload.FileContent (Stream). Alternatively, you can access FileUpload.PostedFile or Request.Files for more information.
var doc = PdfReader.Open(FileUpload1.FileContent);

Partially download and serialize big file in C#?

As part of an upcoming project at my university, I need to write a client that downloads a media file from a server and writes it to the local disk. Since these files can be very large, I need to implement partial download and serialization in order to avoid excessive memory use.
What I came up with:
namespace PartialDownloadTester
{
using System;
using System.Diagnostics.Contracts;
using System.IO;
using System.Net;
using System.Text;
public class DownloadClient
{
public static void Main(string[] args)
{
var dlc = new DownloadClient(args[0], args[1], args[2]);
dlc.DownloadAndSaveToDisk();
Console.ReadLine();
}
private WebRequest request;
// directory of file
private string dir;
// full file identifier
private string filePath;
public DownloadClient(string uri, string fileName, string fileType)
{
this.request = WebRequest.Create(uri);
this.request.Method = "GET";
var sb = new StringBuilder();
sb.Append("C:\\testdata\\DownloadedData\\");
this.dir = sb.ToString();
sb.Append(fileName + "." + fileType);
this.filePath = sb.ToString();
}
public void DownloadAndSaveToDisk()
{
// make sure directory exists
this.CreateDir();
var response = (HttpWebResponse)request.GetResponse();
Console.WriteLine("Content length: " + response.ContentLength);
var rStream = response.GetResponseStream();
int bytesRead = -1;
do
{
var buf = new byte[2048];
bytesRead = rStream.Read(buf, 0, buf.Length);
rStream.Flush();
this.SerializeFileChunk(buf);
}
while (bytesRead != 0);
}
private void CreateDir()
{
if (!Directory.Exists(dir))
{
Directory.CreateDirectory(dir);
}
}
private void SerializeFileChunk(byte[] bytes)
{
Contract.Requires(!Object.ReferenceEquals(bytes, null));
FileStream fs = File.Open(filePath, FileMode.Append);
fs.Write(bytes, 0, bytes.Length);
fs.Flush();
fs.Close();
}
}
}
For testing purposes, I've used the following parameters:
"http://itu.dk/people/janv/mufc_abc.jpg" "mufc_abc" "jpg"
However, the picture is incomplete (only the first ~10% look right) even though the content length prints 63780 which is the actual size of the image.
So my questions are:
Is this the right way to go for partial download and serialization or is there a better/easier approach?
Is the full content of the response stream stored in client memory? If this is the case, do I need to use HttpWebRequest.AddRange to partially download data from the server in order to conserve my client's memory?
How come the serialization fails and I get a broken image?
Do I introduce a lot of overhead when I use the FileMode.Append? (msdn states that this option "seeks to the end of the file")
Thanks in advance
You could definitely simplify your code using a WebClient:
class Program
{
static void Main()
{
DownloadClient("http://itu.dk/people/janv/mufc_abc.jpg", "mufc_abc.jpg");
}
public static void DownloadClient(string uri, string fileName)
{
using (var client = new WebClient())
{
using (var stream = client.OpenRead(uri))
{
// work with chunks of 2KB => adjust if necessary
const int chunkSize = 2048;
var buffer = new byte[chunkSize];
using (var output = File.OpenWrite(fileName))
{
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
}
}
}
}
Notice how I am writing only the number of bytes I have actually read from the socket to the output file and not the entire 2KB buffer.
I don't know if this is the source of the problem, however I would change the loop like this
const int ChunkSize = 2048;
var buf = new byte[ChunkSize];
var rStream = response.GetResponseStream();
do {
int bytesRead = rStream.Read(buf, 0, ChunkSize);
if (bytesRead > 0) {
this.SerializeFileChunk(buf, bytesRead);
}
} while (bytesRead == ChunkSize);
The serialize method would get an additional argument
private void SerializeFileChunk(byte[] bytes, int numBytes)
and then write the right number of bytes
fs.Write(bytes, 0, numBytes);
UPDATE:
I do not see the need for closing and reopening the file each time. I also would use the using statement, which closes the resources, even if an exception should occur. The using statement calls the Dispose() method of the resource at the end, which in turn calls Close() in the case of file streams. using can be applied to all types implementing IDisposable.
var buf = new byte[2048];
using (var rStream = response.GetResponseStream()) {
using (FileStream fs = File.Open(filePath, FileMode.Append)) {
do {
bytesRead = rStream.Read(buf, 0, buf.Length);
fs.Write(bytes, 0, bytesRead);
} while (...);
}
}
The using statement does something like this
{
var rStream = response.GetResponseStream();
try
{
// do some work with rStream here.
} finally {
if (rStream != null) {
rStream.Dispose();
}
}
}
Here is the solution from Microsoft: http://support.microsoft.com/kb/812406
Updated 2021-03-16: seems the original article is not available now. Here is the archived one: https://mskb.pkisolutions.com/kb/812406

Downloading and saving excel file

i am using: `private void get_stocks_data()
{
byte[] result;
byte[] buffer = new byte[4096];
WebRequest wr = WebRequest.Create("http://www.tase.co.il/TASE/Pages/ExcelExport.aspx?sn=he-IL_ds&enumTblType=AllSecurities&Columns=he-IL_Columns&Titles=he-IL_Titles&TblId=0&ExportType=1");
using (WebResponse response = wr.GetResponse())
{
using (Stream responseStream = response.GetResponseStream())
{
using (MemoryStream memoryStream = new MemoryStream())
{
int count = 0;
do
{
count = responseStream.Read(buffer, 0, buffer.Length);
memoryStream.Write(buffer, 0, count);
} while (count != 0);
result = memoryStream.ToArray();
write_data_to_excel(result);
}
}
}`
to download the excel file,
And this method to fill the file on my computer:
private void write_data_to_excel(byte[] input)
{
StreamWriter str = new StreamWriter("stockdata.xls");
for (int i = 0; input.Length > i; i++)
{
str.WriteLine(input[i].ToString());
}
str.Close();
}
The result is that i get a lot of numbers...
What am i doing wrong? the file i am downloadin is excel version 2003, on my computer i have 2007...
Thanks.
I would suggest that you use WebClient.DownloadFile() instead.
This is a higher level method that will abstract from creating the request manually, dealing with encoding, etc.
Problem is in your Write_data_to_excel function
as you are using StreamWriter.WriteLine method it needs string,
you are passing byte as string so your binary value say 10 will be now string 10
try FileStream f = File.OpenWrite("stockdata.xlsx");
f.Write(input,0,input.Length); this will work.

Categories

Resources