Uploading DataTable to Azure blob storage - c#

I am trying to serialize a DataTable to XML and then upload it to Azure blob storage.
The code below works, but it seems clunky and memory-hungry. Is there a better way to do this? I'm especially referring to the fact that I dump a memory stream to a byte array and then create a new memory stream from it.
var container = blobClient.GetContainerReference("container");
var blockBlob = container.GetBlockBlobReference("blob");

byte[] blobBytes;
using (var writeStream = new MemoryStream())
{
    using (var writer = new StreamWriter(writeStream))
    {
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
    }
    blobBytes = writeStream.ToArray();
}

using (var readStream = new MemoryStream(blobBytes))
{
    blockBlob.UploadFromStream(readStream);
}

New answer:
I've learned of a better approach, which is to open a write stream directly to the blob. For example:
using (var writeStream = blockBlob.OpenWrite())
{
    using (var writer = new StreamWriter(writeStream))
    {
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
    }
}
Per our developer, this does not require the entire table to be buffered in memory and should involve less copying of data.
Original answer:
You can use the CloudBlockBlob.UploadFromByteArray method to upload the byte array directly, instead of creating the second stream.
See https://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storage.blob.cloudblockblob.uploadfrombytearray.aspx for the method syntax.
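For example, a minimal sketch reusing the blobBytes array already built above (classic Microsoft.WindowsAzure.Storage SDK):

// Upload the serialized table without a second MemoryStream
blockBlob.UploadFromByteArray(blobBytes, 0, blobBytes.Length);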

Related

Azure Blob Storage SDK 12 compress files by GZipStream not working

I use the same code for gzipping streams all the time, but for some reason it does not work with the Azure.Storage.Blobs version 12.6.0 library:
m_currentFileName = Guid.NewGuid() + ".txt.gz";
var blockBlob = new BlockBlobClient(m_connectionString, m_containerName, GetTempFilePath());

using (var stream = await blockBlob.OpenWriteAsync(true))
using (var currentStream = new GZipStream(stream, CompressionMode.Compress))
using (var writer = new StreamWriter(currentStream))
{
    writer.WriteLine("Hello world!");
}
After running this, I get a 0 B file in Azure Blob Storage. The same code without the GZipStream works as expected.
I found lots of code examples that copy the data to a MemoryStream first, but I do not want to keep my data in RAM. I did not find any related issues on Stack Overflow or in the Azure Blob Storage GitHub repository, so I may be doing something wrong here. Any suggestions?
It appears the GZipStream needs to be explicitly closed so that its buffered data is flushed. According to the GZipStream.Write documentation:

The write operation might not occur immediately but is buffered until the buffer size is reached or until the Flush or Close method is called.
For example:
using (var stream = new MemoryStream())
{
    using (var gzipStream = new GZipStream(stream, CompressionMode.Compress, true))
    using (var writer = new StreamWriter(gzipStream))
    {
        writer.Write("Hello world!");
    }
    stream.Position = 0;
    await blockBlob.UploadAsync(stream);
}

Convert List<string> to a stream

The problem I have is a CSV file full of records that is currently mapped to a strongly typed collection via the open-source CsvHelper.CsvReader.GetRecords<T> method. It is passed a GZIP stream built on a FileStream, so it reads the stream from disk.
My suspicion is that CsvHelper is not very efficient when used with a FileStream, as this load takes a long time. I want to try loading the raw file into memory first, and then do the strongly typed mapping afterwards.
Unfortunately, the mapping method CsvHelper.CsvReader.GetRecords<T> accepts only a stream. I have managed to load the raw data into a List<string> very quickly, but I now cannot figure out how to "streamify" this to pass to the mapper. Is this something I can do, or is there another solution?
My code so far is
var fileStream = ...
var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress);
var entries = new List<string>();
using (var unzip = new StreamReader(gzipStream))
{
    while (!unzip.EndOfStream)
    {
        entries.Add(unzip.ReadLine());
    }
}
Parse(??);

public IReadOnlyCollection<TRow> Parse(Stream stream)
{
    Func<Stream> streamFactory = () => stream;
    return ParseCsvWithConfig<TRow>(streamFactory, _configuration).ToList().AsReadOnly();
}
public static IEnumerable<T> ParseCsvWithConfig<T>(Func<Stream> streamFactory, CsvConfiguration configuration)
{
    using (var stream = streamFactory())
    {
        var streamReader = new StreamReader(stream);
        using (var csvReader = new CsvReader(streamReader, configuration ?? new CsvConfiguration()))
        {
            return csvReader.GetRecords<T>().ToList();
        }
    }
}
Skip the list altogether:
var fileStream = ...
var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress);
var memoryStream = new MemoryStream();
gzipStream.CopyTo(memoryStream);
memoryStream.Position = 0; // rewind, or the parser starts at the end and reads nothing
// call Parse on memoryStream
Feel free to add using blocks where appropriate in your code.
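If you only ever make one pass over the data, you may not need to buffer at all. A minimal sketch, assuming path points at the gzipped CSV, that hands the decompression stream straight to your Parse method (which already disposes the stream via its using block):

var fileStream = File.OpenRead(path); // path is assumed here
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
{
    var rows = Parse(gzipStream); // CsvHelper reads directly from the decompressor
}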

create zip from byte[] and return to browser

I want to create a zip-file and return it to the browser so that it downloads the zip to the downloads-folder.
var images = imageRepository.GetAll(allCountryId);
using (FileStream f2 = new FileStream("SudaAmerica", FileMode.Create))
using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
{
    foreach (var image in images)
    {
        gz.Write(image.ImageData, 0, image.ImageData.Length);
    }
    return base.File(gz, "application/zip", "SudaAmerica");
}
I have tried the above, but I get an error saying the stream is disposed.
Is this possible, or should I use another library than GZipStream?
The problem here is exactly what the error says: you are handing it something based on gz, but gz gets disposed the moment you leave the using block.
One option would be to wait until after the using block and then tell it to use the filename of the thing you just wrote ("SudaAmerica"). However, IMO you shouldn't actually be writing a file here at all. If you use a MemoryStream instead, you can use .ToArray() to get a byte[] of the contents, which you can pass to the File method. This requires no IO access, which is a win in about 20 different ways. Well, maybe 3 ways. But still:
var images = imageRepository.GetAll(allCountryId);
using (MemoryStream ms = new MemoryStream())
{
    using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress, false))
    {
        foreach (var image in images)
        {
            gz.Write(image.ImageData, 0, image.ImageData.Length);
        }
    }
    return base.File(ms.ToArray(), "application/zip", "SudaAmerica");
}
Note that a gzip stream is not the same as a .zip archive, so I very much doubt this will have the result you want. Zip archive creation is available elsewhere in the .NET Framework, but not via GZipStream.
You probably want ZipArchive
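For instance, a hedged sketch building a real .zip in memory with System.IO.Compression.ZipArchive; the entry names here are invented for illustration:

var images = imageRepository.GetAll(allCountryId);
using (var ms = new MemoryStream())
{
    // leaveOpen: true, because disposing the archive is what finalizes the
    // zip central directory, and we still need ms afterwards
    using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
    {
        var index = 0;
        foreach (var image in images)
        {
            var entry = archive.CreateEntry($"image{index++}.jpg");
            using (var entryStream = entry.Open())
            {
                entryStream.Write(image.ImageData, 0, image.ImageData.Length);
            }
        }
    }
    return base.File(ms.ToArray(), "application/zip", "SudaAmerica.zip");
}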

Can I get a GZipStream for a file without writing to intermediate temporary storage?

Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk to avoid possible memory exhaustion from using a MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
    using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
    {
        using (var fileStream = File.OpenRead(filename))
        using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
        {
            fileStream.CopyTo(compressedStream);
        }
        temporaryFileStream.Position = 0;
        Uploader.Upload(temporaryFileStream);
    }
}
What I'd like to do is eliminate the temporary storage by creating a GZipStream that reads from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above, however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    // Start the upload on its own thread; it reads from pcStream
    // while this thread writes to it.
    var uploadThread = new Thread(UploadThreadProc);
    uploadThread.Start(pcStream);

    // Open the input file and attach the gzip stream to the pcStream
    using (var inputFile = File.OpenRead("inputFilename"))
    {
        // create gzip stream
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            var bytesRead = 0;
            var buff = new byte[65536]; // 64K buffer
            while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
            {
                gz.Write(buff, 0, bytesRead);
            }
        }
    }

    // The entire file has been compressed and copied to the buffer.
    // Mark the stream as "input complete".
    pcStream.CompleteAdding();

    // Wait for the upload thread to complete.
    uploadThread.Join();

    // It's very important that you don't close the pcStream before
    // the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
    var pcStream = (ProducerConsumerStream)state;
    Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
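On newer runtimes you can get similar producer/consumer plumbing out of the box. A sketch using System.IO.Pipelines (.NET Core 3.0+, or the System.IO.Pipelines NuGet package), still assuming the question's static Uploader.Upload(Stream) method:

using System.IO;
using System.IO.Compression;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static void UploadFile(string filename)
{
    var pipe = new Pipe();

    // Consumer: the uploader reads from the pipe on a worker task.
    var uploadTask = Task.Run(() => Uploader.Upload(pipe.Reader.AsStream()));

    // Producer: compress the file straight into the pipe.
    using (var input = File.OpenRead(filename))
    using (var gz = new GZipStream(pipe.Writer.AsStream(), CompressionMode.Compress))
    {
        input.CopyTo(gz);
    }
    // Disposing gz wrote the gzip footer and, because AsStream() defaults to
    // leaveOpen: false, also completed the pipe writer, so the reader sees
    // end-of-stream once the buffered data drains.

    uploadTask.Wait();
}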

Read a PDF into a string or byte[] and write that string/byte[] back to disk

I am having a problem in my app: it reads a PDF from disk and then has to write it back to a different location later, but the emitted file is no longer a valid PDF.
In very simplified form, I have tried reading/writing it using
var bytes = File.ReadAllBytes(@"c:\myfile.pdf");
File.WriteAllBytes(@"c:\output.pdf", bytes);
and
var input = new StreamReader(@"c:\myfile.pdf").ReadToEnd();
File.WriteAllText(@"c:\output.pdf", input);
... and about 100 permutations of the above with various encodings being specified. None of the output files were valid PDFs.
Can someone please lend a hand? Many thanks!!
In C#/.NET 4.0:
using (var i = new FileStream(@"input.pdf", FileMode.Open, FileAccess.Read))
using (var o = File.Create(@"output.pdf"))
{
    i.CopyTo(o);
}
If you insist on having the byte[] first:
using (var i = new FileStream(@"input.pdf", FileMode.Open, FileAccess.Read))
using (var ms = new MemoryStream())
{
    i.CopyTo(ms);
    byte[] rawdata = ms.ToArray(); // ToArray copies exactly the bytes written; GetBuffer can include trailing slack
    ms.Position = 0; // rewind before copying back out
    using (var o = File.Create(@"output.pdf"))
    {
        ms.CopyTo(o);
    }
}
Note the ms.Position = 0 (equivalently ms.Seek(0, SeekOrigin.Begin)) before the second CopyTo: CopyTo reads from the current position, which sits at the end of the stream after the first copy, so without the rewind nothing gets written.
You're using File.WriteAllText to write your file out.
Try File.WriteAllBytes instead: a PDF is binary data, and round-tripping it through a string decodes and re-encodes the bytes, which corrupts the file.
