I have written a C# application to archive data from a SQL Server table into an Azure blob. The archiving is configured by a JSON file and the values retrieved from the JSON file dictate what data to retrieve and archive.
The data needs to be stored in a blob with a name in this format:
year/month/day/hour/older-than-[query-date]
Where query-date is the current date minus a number of days specified in the JSON file.
The issue I am having is how to incorporate compression into the process.
We would like to compress the data being archived to save space.
Currently the JSON settings mean that any data older than 30 days should be archived, but this results in about 3.7 million rows of data, so I sometimes get out-of-memory exceptions.
Regardless, how can I use GZip to compress the data being archived to the Azure blob? Here is the existing code.
using (SqlDataAdapter adr = new SqlDataAdapter(comm))
{
adr.Fill(data);
data.TableName = config.TargetTableName;
}
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("blank");
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
blobClient.DefaultRequestOptions.ParallelOperationThreadCount = 20;
blobClient.DefaultRequestOptions.MaximumExecutionTime = TimeSpan.FromMinutes(20);
blobClient.DefaultRequestOptions.ServerTimeout = TimeSpan.FromMinutes(20);
CloudBlobContainer container = blobClient.GetContainerReference(config.AzureContainerName);
StringBuilder jsonData = new StringBuilder();
CloudBlockBlob blob = container.GetBlockBlobReference($"{config.TargetTableName}/{DateTime.Now.Year}/{DateTime.Now.Month}/{DateTime.Now.Day}/{DateTime.Now.Hour}/Older-Than-{queryParameter.Value}.log");
using (var writeStream = blob.OpenWrite())
{
using (var writer = new StreamWriter(writeStream))
{
data.WriteXml(writer, XmlWriteMode.WriteSchema);
}
}
I suggest writing your data to a MemoryStream first; then we can compress the memory stream and write it to Azure Blob Storage. The code below is for your reference.
CloudBlockBlob blob = container.GetBlockBlobReference($"{config.TargetTableName}/{DateTime.Now.Year}/{DateTime.Now.Month}/{DateTime.Now.Day}/{DateTime.Now.Hour}/Older-Than-{queryParameter.Value}.log");
using (var writeStream = blob.OpenWrite())
{
MemoryStream memoryStream = new MemoryStream();
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed
using (var writer = new StreamWriter(memoryStream, Encoding.UTF8, 1024, leaveOpen: true))
{
data.WriteXml(writer, XmlWriteMode.WriteSchema);
}
using (GZipStream compressionStream = new GZipStream(writeStream,
CompressionMode.Compress))
{
memoryStream.Position = 0;
memoryStream.CopyTo(compressionStream);
}
}
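If the out-of-memory exceptions are a concern, you could also skip the intermediate MemoryStream and wrap the blob's write stream in a GZipStream directly, so the XML is compressed as it is serialized rather than buffered first. A minimal sketch, reusing the blob and data variables from the snippet above (note the DataTable itself is still fully loaded, so this only helps on the serialization side):
using (var blobStream = blob.OpenWrite())
using (var compressionStream = new GZipStream(blobStream, CompressionMode.Compress))
using (var writer = new StreamWriter(compressionStream))
{
// WriteXml streams the rows through the GZipStream straight into the blob
data.WriteXml(writer, XmlWriteMode.WriteSchema);
}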
Related
I'm uploading files to Azure Blob Storage with the .NET package, specifying the encoding iso-8859-1. The stream looks fine in memory, but when I upload it to blob storage it ends up with corrupted characters, as if they could not be converted to that encoding. It seems the file gets stored in a corrupted state: when I download it again and inspect it, the characters are all messed up. Here is the code I'm using.
public static async Task<bool> UploadFileFromStream(this CloudStorageAccount account, string containerName, string destBlobPath, string fileName, Stream stream, Encoding encoding)
{
if (account is null) throw new ArgumentNullException(nameof(account));
if (string.IsNullOrEmpty(containerName)) throw new ArgumentException("message", nameof(containerName));
if (string.IsNullOrEmpty(destBlobPath)) throw new ArgumentException("message", nameof(destBlobPath));
if (stream is null) throw new ArgumentNullException(nameof(stream));
stream.Position = 0;
CloudBlockBlob blob = GetBlob(account, containerName, $"{destBlobPath}/{fileName}");
blob.Properties.ContentType = FileUtils.GetFileContentType(fileName);
using var reader = new StreamReader(stream, encoding);
var ct = await reader.ReadToEndAsync();
await blob.UploadTextAsync(ct, encoding ?? Encoding.UTF8, AccessCondition.GenerateEmptyCondition(), new BlobRequestOptions(), new OperationContext());
return true;
}
This is the file just before uploading it
<provinciaDatosInmueble>Sevilla</provinciaDatosInmueble>
<inePoblacionDatosInmueble>969</inePoblacionDatosInmueble>
<poblacionDatosInmueble>Valencina de la Concepción</poblacionDatosInmueble>
and this is the file after the upload
<provinciaDatosInmueble>Sevilla</provinciaDatosInmueble>
<inePoblacionDatosInmueble>969</inePoblacionDatosInmueble>
<poblacionDatosInmueble>Valencina de la Concepci�n</poblacionDatosInmueble>
The encoding I pass in the encoding parameter is ISO-8859-1. Does anybody know why Blob Storage seems to ignore the encoding I'm specifying? Thanks in advance!
We were able to achieve this using Azure.Storage.Blobs instead of WindowsAzure.Storage, which is a legacy Storage SDK. Below is the code that worked for us.
class Program
{
static async Task Main(string[] args)
{
string sourceContainerName = "<Source_Container_Name>";
string destBlobPath = "<Destination_Path>";
string fileName = "<Source_File_name>";
BlobServiceClient blobServiceClient = new BlobServiceClient("<Your_Connection_String>");
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(sourceContainerName);
BlobClient blobClientSource = containerClient.GetBlobClient(fileName);
BlobClient blobClientDestination = containerClient.GetBlobClient(destBlobPath);
// Reading From Blob
var line =" ";
if (await blobClientSource.ExistsAsync())
{
var response = await blobClientSource.DownloadAsync();
using (StreamReader streamReader = new StreamReader(response.Value.Content))
{
line = await streamReader.ReadToEndAsync();
}
}
// Writing To Blob
var content = Encoding.UTF8.GetBytes(line);
using (var ms = new MemoryStream(content))
blobClientDestination.Upload(ms);
}
}
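Note that the code above re-encodes the content as UTF-8 when writing. If the requirement is to keep the stored file in ISO-8859-1, one option (a sketch, not verified against your exact setup) is to upload the raw bytes in that encoding and record the charset in the blob's Content-Type so downloaders know how to decode it. BlobUploadOptions and BlobHttpHeaders come from Azure.Storage.Blobs.Models:
// Encode once with the desired encoding and upload the raw bytes
var iso = Encoding.GetEncoding("iso-8859-1");
byte[] isoBytes = iso.GetBytes(line);
using (var ms = new MemoryStream(isoBytes))
{
await blobClientDestination.UploadAsync(ms, new BlobUploadOptions
{
HttpHeaders = new BlobHttpHeaders { ContentType = "text/xml; charset=iso-8859-1" }
});
}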
I am trying to take a file, split it into pieces, and then push each smaller file piece to Azure. I have tried writing a MemoryStream to Azure, but that causes the file to upload immediately and the resulting file is basically empty. I have tried using a BufferedStream, which allows the data to be sent as I am writing to it, but I am not sure how to end the stream. I have tried closing each of the different streams I am using, but that does not work: it results in a stream-closed exception. Any idea how to mark the stream as complete so the Azure library will know to finish the file upload?
It does work to wait until the full file is built and then upload the memory stream, but I would like to be able to write to it while it is uploading if possible.
CloudBlobClient blobClient = StorageAccount.CreateCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.GetContainerReference("containerName");
using (FileStream fileStream = File.Open(path, FileMode.Open))
{
int key = 0;
CsvWriter csvWriter = null;
MemoryStream memoryStream = null;
BufferedStream bufferedStream = null;
StreamWriter streamWriter = null;
Task uploadTask = null;
using (var reader = new StreamReader(fileStream))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Read();
csv.ReadHeader();
await foreach (MyModel row in csv.GetRecordsAsync<MyModel>())
{
if (row.KeyColumn != key)
{
if (memoryStream != null)
{
//Wait for the current upload to finish
await csvWriter.FlushAsync();
csvWriter.Dispose();
await uploadTask;
}
//Start New Upload
key = row.KeyColumn;
memoryStream = new MemoryStream();
bufferedStream = new BufferedStream(memoryStream);
streamWriter = new StreamWriter(bufferedStream);
csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);
csvWriter.WriteHeader<MyModel>();
await csvWriter.FlushAsync();
CloudBlockBlob blockBlob = blobContainer.GetBlockBlobReference($"file_{key}.csv");
uploadTask = blockBlob.UploadFromStreamAsync(bufferedStream);
}
csvWriter.WriteRecord(row);
await csvWriter.FlushAsync();
}
if (memoryStream != null)
{
await csvWriter.FlushAsync();
csvWriter.Dispose();
await uploadTask;
}
}
}
I am using the following code to upload an XML file to an Azure Blob Storage account using the DotNetZip NuGet package.
XmlDocument doc = new XmlDocument();
doc.Load(path);
string xmlContent = doc.InnerXml;
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
var cloudBlobClient = storageAccount.CreateCloudBlobClient();
var cloudBlobContainer = cloudBlobClient.GetContainerReference(container);
cloudBlobContainer.CreateIfNotExists();
using (var fs = File.Create("test.zip"))
{
using (var s = new ZipOutputStream(fs))
{
s.PutNextEntry("entry1.xml");
byte[] buffer = Encoding.ASCII.GetBytes(xmlContent);
s.Write(buffer, 0, buffer.Length);
fs.Position = 0;
//Get the blob ref
var blob = cloudBlobContainer.GetBlockBlobReference("test.zip");
blob.Properties.ContentEncoding = "zip";
blob.Properties.ContentType = "text/plain";
blob.Metadata["filename"] = "test.zip";
blob.UploadFromStream(fs);
}
}
This code creates a zip file in my container. But when I download it and try to open it, I get the following error:
"Windows cannot open the folder. The compressed (zipped) folder is invalid". But the saved zipped file in my application directory can be unzipped fine and contains my xml file.
What am I doing wrong?
I am able to reproduce the problem you're having. Essentially, the issue is that the content has not been completely written to the zip file when you initiate the upload. In my test, the zip file on the local disk was 902 bytes, but at the time of uploading the file stream was only 40 bytes, and that is what causes the problem.
What I did was split this into two steps: the first just creates the file, and the second reads it from disk and uploads it to storage. Here's the code I used:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
var cloudBlobClient = storageAccount.CreateCloudBlobClient();
var cloudBlobContainer = cloudBlobClient.GetContainerReference("test");
cloudBlobContainer.CreateIfNotExists();
using (var fs = File.Create("test.zip"))
{
using (var s = new ZipOutputStream(fs))
{
s.PutNextEntry("entry1.xml");
byte[] buffer = File.ReadAllBytes(@"Path\To\MyFile.txt");
s.Write(buffer, 0, buffer.Length);
}
}
using (var fs = File.OpenRead("test.zip"))
{
var blob = cloudBlobContainer.GetBlockBlobReference("test.zip");
blob.Properties.ContentEncoding = "zip";
blob.Properties.ContentType = "text/plain";
blob.Metadata["filename"] = "test.zip";
fs.Position = 0;
blob.UploadFromStream(fs);
}
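If you would rather not touch the local disk at all, the same fix also works against a MemoryStream: finish writing the archive first, then upload the completed stream. Below is a rough sketch that swaps DotNetZip for the built-in System.IO.Compression.ZipArchive (with leaveOpen: true so the stream survives disposing the archive), reusing the xmlContent and cloudBlobContainer variables from the question:
using (var zipStream = new MemoryStream())
{
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
{
var entry = archive.CreateEntry("entry1.xml");
using (var entryStream = entry.Open())
{
byte[] buffer = Encoding.UTF8.GetBytes(xmlContent);
entryStream.Write(buffer, 0, buffer.Length);
}
} // disposing the archive writes the zip's central directory
zipStream.Position = 0;
var blob = cloudBlobContainer.GetBlockBlobReference("test.zip");
blob.Properties.ContentType = "application/zip";
blob.UploadFromStream(zipStream);
}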
I have an ASP.NET Core application that needs to send a stream (posted by the client) to Microsoft Cognitive Services to get an ID, and then send the same stream to an Azure blob for backup, with the file name set to the ID received from the Cognitive Service.
But it seems the MemoryStream ms is closed after being used by faceServiceClient: an error occurred at the second "ms.Position = 0" statement saying "Cannot access a closed stream".
public static async Task CreatPerson(string _key, HttpRequest _req)
{
var faceServiceClient = new FaceServiceClient(_key);
using (MemoryStream ms = new MemoryStream())
{
_req.Body.CopyTo(ms);
ms.Position = 0;
var facesTask = faceServiceClient.AddFaceToFaceListAsync("himlens", ms);
//init azure blob
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(AZURE_STORE_CONN_STR);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("xxx");
var faces = await facesTask;
var blob = container.GetBlockBlobReference(faces.PersistedFaceId.ToString());
ms.Position = 0;//Error Here
await blob.UploadFromStreamAsync(ms);
}
}
I'm confused about it. Can anybody help me solve this problem?
Thanks!
ms.Position = 0;//Error Here
To easily fix it, you could create a new instance of MemoryStream and copy the value from ms. Then you could upload it to your blob storage. Code below is for your reference.
using (MemoryStream ms = new MemoryStream())
{
_req.Body.CopyTo(ms);
ms.Position = 0;
//new code which I added
MemoryStream ms2 = new MemoryStream();
ms.CopyTo(ms2);
ms.Position = 0;
var facesTask = faceServiceClient.AddFaceToFaceListAsync("himlens", ms);
//init azure blob
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(AZURE_STORE_CONN_STR);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("xxx");
var faces = await facesTask;
var blob = container.GetBlockBlobReference(faces.PersistedFaceId.ToString());
//Code which I modified
ms2.Position = 0;
await blob.UploadFromStreamAsync(ms2);
}
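A slightly simpler variant of the same idea (a sketch, assuming the request body fits comfortably in memory) is to snapshot the bytes once and give each consumer its own stream, so neither call can close a stream the other still needs:
// Copy the request body once into a byte array
byte[] bodyBytes;
using (var ms = new MemoryStream())
{
_req.Body.CopyTo(ms);
bodyBytes = ms.ToArray();
}
// Each consumer gets an independent MemoryStream over the same bytes
var facesTask = faceServiceClient.AddFaceToFaceListAsync("himlens", new MemoryStream(bodyBytes));
// ...initialize the blob client and container exactly as above...
var faces = await facesTask;
var blob = container.GetBlockBlobReference(faces.PersistedFaceId.ToString());
await blob.UploadFromStreamAsync(new MemoryStream(bodyBytes));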
I'm working on a little C# ASP.NET web app that pulls 3 files from my server, creates a zip of those files, and sends the zip file to an e-mail recipient.
The problem I'm having is finding a way to combine those 3 files into a zip without creating a zip file on the hard drive of the server. I think I need to use some sort of MemoryStream or FileStream, but merging them into one zip file is a little beyond my understanding. I've tried SharpZipLib and DotNetZip, but I haven't been able to figure it out.
The reason I don't want the zip saved locally is that there might be a number of users on this app at once, and I don't want to clog up my server with those zips. I'm looking for two answers: how to zip the files without saving the zip as a file on disk, and how to attach that zip to a MailMessage.
Check this example for SharpZipLib:
https://github.com/icsharpcode/SharpZipLib/wiki/Zip-Samples#wiki-anchorMemory
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;
// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
//
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName) {
MemoryStream outputMemStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);
zipStream.SetLevel(3); //0-9, 9 being the highest level of compression
ZipEntry newEntry = new ZipEntry(zipEntryName);
newEntry.DateTime = DateTime.Now;
zipStream.PutNextEntry(newEntry);
StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false; // False stops the Close from also closing the underlying stream.
zipStream.Close(); // Must finish the ZipOutputStream before using outputMemStream.
outputMemStream.Position = 0;
return outputMemStream;
// Alternative outputs (unreachable after the return above, shown for reference):
// ToArray is the cleaner and easier option to use correctly, at the cost of duplicating the allocated memory.
// byte[] byteArrayOut = outputMemStream.ToArray();
// GetBuffer returns the raw buffer, so you need to account for the true length yourself.
// byte[] byteArrayOut = outputMemStream.GetBuffer();
// long len = outputMemStream.Length;
}
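To cover the second half of the question, the returned MemoryStream can be attached directly, so nothing is ever written to disk. A small sketch (the addresses, SMTP host, and input stream are placeholders; Attachment and MailMessage come from System.Net.Mail, MediaTypeNames from System.Net.Mime):
// Attach the in-memory zip to a MailMessage and send it
MemoryStream zipStream = CreateToMemoryStream(someInputStream, "report.txt");
var message = new MailMessage("from@example.com", "to@example.com", "Your files", "Zip attached.");
message.Attachments.Add(new Attachment(zipStream, "files.zip", MediaTypeNames.Application.Zip));
using (var smtp = new SmtpClient("smtp.example.com"))
{
smtp.Send(message);
}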
Try this:
public static Attachment CreateAttachment(string fileNameAndPath, bool zipIfTooLarge = true, int bytes = 1 << 20)
{
if (!zipIfTooLarge)
{
return new Attachment(fileNameAndPath);
}
var fileInfo = new FileInfo(fileNameAndPath);
// Less than 1Mb just attach as is.
if (fileInfo.Length < bytes)
{
var attachment = new Attachment(fileNameAndPath);
return attachment;
}
byte[] fileBytes = File.ReadAllBytes(fileNameAndPath);
using (var memoryStream = new MemoryStream())
{
string fileName = Path.GetFileName(fileNameAndPath);
using (var zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Create))
{
ZipArchiveEntry zipArchiveEntry = zipArchive.CreateEntry(fileName, CompressionLevel.Optimal);
using (var entryStream = zipArchiveEntry.Open())
{
// Write the raw bytes so binary content is not corrupted by a text round-trip
entryStream.Write(fileBytes, 0, fileBytes.Length);
}
}
var attachmentStream = new MemoryStream(memoryStream.ToArray());
string zipname = $"{Path.GetFileNameWithoutExtension(fileName)}.zip";
var attachment = new Attachment(attachmentStream, zipname, MediaTypeNames.Application.Zip);
return attachment;
}
}
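Usage might look like this (the path, addresses, and SMTP host are placeholders):
var mail = new MailMessage("from@example.com", "to@example.com", "Report", "See attachment.");
mail.Attachments.Add(CreateAttachment(@"C:\temp\report.xml"));
using (var smtp = new SmtpClient("smtp.example.com"))
{
smtp.Send(mail);
}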