Convert List<string> to a stream - C#

The problem I have is a CSV file full of records that is currently being mapped to a strongly typed collection via the open source CsvHelper.CsvReader.GetRecords<T> method. It gets passed a GZip stream built on a FileStream, so it is reading from disk.
My suspicion is that CsvHelper is not very efficient when used with a FileStream, as this load takes a long time. I want to load the raw file into memory efficiently first, and then do the strongly typed mapping afterwards.
Unfortunately, the mapping method CsvHelper.CsvReader.GetRecords<T> accepts only a stream. I have managed to load the raw data into a List<string> very quickly, but I now cannot figure out how to "streamify" this to pass to the mapper. Is this something I can do, or is there another solution?
My code so far is:
var fileStream = ...
var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress);
var entries = new List<string>();
using (var unzip = new StreamReader(gzipStream))
{
    while (!unzip.EndOfStream)
    {
        entries.Add(unzip.ReadLine());
    }
}
Parse(??);
public IReadOnlyCollection<TRow> Parse(Stream stream)
{
    Func<Stream> streamFactory = () => stream;
    var results = ParseCsvWithConfig<TRow>(streamFactory, _configuration).ToList().AsReadOnly();
    return results;
}
public static IEnumerable<T> ParseCsvWithConfig<T>(Func<Stream> streamFactory, CsvConfiguration configuration)
{
    using (var stream = streamFactory())
    {
        var streamReader = new StreamReader(stream);
        using (var csvReader = new CsvReader(streamReader, configuration ?? new CsvConfiguration()))
        {
            return csvReader.GetRecords<T>().ToList();
        }
    }
}

Skip the list altogether:
var fileStream = ...
var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress);
var memoryStream = new MemoryStream();
gzipStream.CopyTo(memoryStream);
memoryStream.Position = 0; // rewind so the CSV reader starts at the beginning
// call Parse on memoryStream
Feel free to add using blocks where appropriate in your code.
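For example, a minimal sketch with the using blocks and the rewind in place might look like this (the path variable is assumed, and Parse is the method from the question):
IReadOnlyCollection<TRow> rows;
using (var fileStream = File.OpenRead(path)) // 'path' is assumed to point at the .gz file
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
using (var memoryStream = new MemoryStream())
{
    gzipStream.CopyTo(memoryStream); // decompress everything into memory in one pass
    memoryStream.Position = 0;       // rewind so CsvHelper reads from the start
    rows = Parse(memoryStream);      // Parse from the question does the strongly typed mapping
}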

Related

Extract tgz file in memory and access files in C#

I have a service that downloads a *.tgz file from a remote endpoint. I use SharpZipLib to extract and write the content of that compressed archive to disk. But now I want to prevent writing the files to disk (because that process doesn't have write permissions on that disk) and keep them in memory.
How can I access the decompressed files from memory? (Let's assume the archive holds simple text files)
Here is what I have so far:
public void Decompress(byte[] byteArray)
{
    Stream inStream = new MemoryStream(byteArray);
    Stream gzipStream = new GZipInputStream(inStream);
    TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream);
    tarArchive.ExtractContents(@".");
    tarArchive.Close();
    gzipStream.Close();
    inStream.Close();
}
It turns out that ExtractContents() works by iterating over a TarInputStream. When you create your TarArchive like this:
TarArchive.CreateInputTarArchive(gzipStream);
it actually wraps the stream you're passing into a TarInputStream. Thus, if you want more fine-grained control over how you extract files, you must use TarInputStream directly.
This way, you can iterate over files, directories, and the actual file contents like this:
Stream inStream = new MemoryStream(byteArray);
Stream gzipStream = new GZipInputStream(inStream);
using (var tarInputStream = new TarInputStream(gzipStream))
{
    TarEntry entry;
    while ((entry = tarInputStream.GetNextEntry()) != null)
    {
        var fileName = entry.Name;
        using (var fileContents = new MemoryStream())
        {
            tarInputStream.CopyEntryContents(fileContents);
            // use entry, fileName or fileContents here
        }
    }
}
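Assuming the archive really does hold simple text files, as the question states, the loop above can be wrapped into a method that keeps everything in memory. This is only a sketch; UTF-8 is an assumed encoding and DecompressToMemory is a made-up name:
// requires ICSharpCode.SharpZipLib.GZip, ICSharpCode.SharpZipLib.Tar and System.Text
public static Dictionary<string, string> DecompressToMemory(byte[] byteArray)
{
    var files = new Dictionary<string, string>();
    using (var inStream = new MemoryStream(byteArray))
    using (var gzipStream = new GZipInputStream(inStream))
    using (var tarInputStream = new TarInputStream(gzipStream))
    {
        TarEntry entry;
        while ((entry = tarInputStream.GetNextEntry()) != null)
        {
            if (entry.IsDirectory)
                continue; // only file entries carry content
            using (var fileContents = new MemoryStream())
            {
                tarInputStream.CopyEntryContents(fileContents);
                files[entry.Name] = Encoding.UTF8.GetString(fileContents.ToArray()); // assumed UTF-8 text
            }
        }
    }
    return files;
}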

Uploading DataTable to Azure blob storage

I am trying to serialize a DataTable to XML and then upload it to Azure blob storage.
The code below works, but it seems clunky and memory-hungry. Is there a better way to do this? I'm especially referring to the fact that I am dumping a memory stream to a byte array and then creating a new memory stream from it.
var container = blobClient.GetContainerReference("container");
var blockBlob = container.GetBlockBlobReference("blob");
byte[] blobBytes;
using (var writeStream = new MemoryStream())
{
    using (var writer = new StreamWriter(writeStream))
    {
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
    }
    blobBytes = writeStream.ToArray();
}
using (var readStream = new MemoryStream(blobBytes))
{
    blockBlob.UploadFromStream(readStream);
}
New answer:
I've learned of a better approach, which is to open a write stream directly to the blob. For example:
using (var writeStream = blockBlob.OpenWrite())
{
    using (var writer = new StreamWriter(writeStream))
    {
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
    }
}
Per our developer, this does not require the entire table to be buffered in memory, and it likely involves less copying of data.
Original answer:
You can use the CloudBlockBlob.UploadFromByteArray method, and upload the byte array directly, instead of creating the second stream.
See https://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storage.blob.cloudblockblob.uploadfrombytearray.aspx for the method syntax.
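As a rough sketch, that is the question's code with the second stream replaced by the direct upload (blockBlob and table are assumed to be the same objects as above):
byte[] blobBytes;
using (var writeStream = new MemoryStream())
{
    using (var writer = new StreamWriter(writeStream))
    {
        table.WriteXml(writer, XmlWriteMode.WriteSchema);
    }
    blobBytes = writeStream.ToArray();
}
// Upload the byte array directly; no second MemoryStream is needed.
blockBlob.UploadFromByteArray(blobBytes, 0, blobBytes.Length);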

creating a zip file from an object directly without disk IO

I am writing a REST API which will take in a JSON request object. The request object will have to be serialized to a file in JSON format; the file has to be compressed into a zip file and the ZIP file has to be posted to another service, for which I would have to deserialize the ZIP file. All this because the service I have to call expects me to post data as ZIP file. I am trying to see if I can avoid disk IO. Is there a way to directly convert the object into a byte array representing ZIP content in-memory instead of all the above steps?
Note : I'd prefer accomplishing this using .net framework libraries (as against external libraries)
Yes, it is possible to create a ZIP file entirely in memory. Here is an example using the SharpZipLib library (update: a sample using ZipArchive is added at the end):
public static void Main()
{
    var fileContent = Encoding.UTF8.GetBytes(
        @"{
            ""fruit"":""apple"",
            ""taste"":""yummy""
        }"
    );
    var zipStream = new MemoryStream();
    var zip = new ZipOutputStream(zipStream);
    AddEntry("file0.json", fileContent, zip); // first file
    AddEntry("file1.json", fileContent, zip); // second file (with same content)
    zip.Close();
    // only for testing, to see if the zip file is valid!
    File.WriteAllBytes("test.zip", zipStream.ToArray());
}
private static void AddEntry(string fileName, byte[] fileContent, ZipOutputStream zip)
{
    var zipEntry = new ZipEntry(fileName) { DateTime = DateTime.Now, Size = fileContent.Length };
    zip.PutNextEntry(zipEntry);
    zip.Write(fileContent, 0, fileContent.Length);
    zip.CloseEntry();
}
You can obtain SharpZipLib with the NuGet command PM> Install-Package SharpZipLib
Update:
Note : I'd prefer accomplishing this using .net framework libraries (as against external libraries)
Here is an example using the built-in ZipArchive class from System.IO.Compression.dll:
public static void Main()
{
    var fileContent = Encoding.UTF8.GetBytes(
        @"{
            ""fruit"":""apple"",
            ""taste"":""yummy""
        }"
    );
    var zipContent = new MemoryStream();
    var archive = new ZipArchive(zipContent, ZipArchiveMode.Create);
    AddEntry("file1.json", fileContent, archive);
    AddEntry("file2.json", fileContent, archive); // second file (same content)
    archive.Dispose();
    File.WriteAllBytes("testa.zip", zipContent.ToArray());
}
private static void AddEntry(string fileName, byte[] fileContent, ZipArchive archive)
{
    var entry = archive.CreateEntry(fileName);
    using (var stream = entry.Open())
    {
        stream.Write(fileContent, 0, fileContent.Length);
    }
}
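Since the goal is to post the ZIP to another service without touching disk, the bytes from zipContent.ToArray() can be sent directly. Below is a minimal sketch using HttpClient inside an async method; the endpoint URL and content type are assumptions, not part of the original question:
// requires System.Net.Http and System.Net.Http.Headers
var zipBytes = zipContent.ToArray(); // zipContent from the ZipArchive example above
using (var httpClient = new HttpClient())
using (var content = new ByteArrayContent(zipBytes))
{
    content.Headers.ContentType = new MediaTypeHeaderValue("application/zip"); // assumed content type
    var response = await httpClient.PostAsync("https://example.com/upload", content); // hypothetical endpoint
    response.EnsureSuccessStatusCode();
}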
You could use the GZipStream class along with MemoryStream. Note that GZipStream produces a gzip stream rather than a .zip archive, so this only helps if the receiving service accepts gzip.
A quick example:
using System.IO;
using System.IO.Compression;
//Put JSON into a MemoryStream
var theJson = "Your JSON Here";
var jsonStream = new MemoryStream();
var jsonStreamWriter = new StreamWriter(jsonStream);
jsonStreamWriter.Write(theJson);
jsonStreamWriter.Flush();
//Reset stream so it points to the beginning of the JSON
jsonStream.Seek(0, System.IO.SeekOrigin.Begin);
// Create stream to hold your zipped JSON
var zippedStream = new MemoryStream();
// Zip the JSON and put it in zippedStream via compressionStream.
// The GZipStream must be closed before the compressed bytes are read, otherwise its
// internal buffers are not flushed; leaveOpen keeps zippedStream usable afterwards.
using (var compressionStream = new GZipStream(zippedStream, CompressionLevel.Optimal, leaveOpen: true))
{
    jsonStream.CopyTo(compressionStream);
}
// Get a byte array with the gzipped JSON (ToArray works regardless of the stream position)
var zippedJsonBytes = zippedStream.ToArray();
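If you want to check the result, the gzipped bytes can be round-tripped entirely in memory; a small sketch, assuming the variables from the example above:
using (var input = new MemoryStream(zippedJsonBytes))
using (var gunzip = new GZipStream(input, CompressionMode.Decompress))
using (var reader = new StreamReader(gunzip))
{
    var roundTrippedJson = reader.ReadToEnd(); // should match theJson from above
}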
You should try the ZipArchive class, writing it to a MemoryStream.
Yes. You can return it as a binary stream. Depending on the language, you can use special libraries. You will also need libraries on the client.

How to get contents of System.Net.Mail.Attachment

I have a System.Net.Mail.Attachment object with some .csv data in it. I need to save the contents of the attachment in a file. I tried this:
var sb = new StringBuilder();
sb.AppendLine("Accounts,JOB,Usage Count");
sb.AppendLine("One,Two,Three");
sb.AppendLine("One,Two,Three");
sb.AppendLine("One,Two,Three");
var stream = new MemoryStream(Encoding.ASCII.GetBytes(sb.ToString()));
//Add a new attachment to the E-mail message, using the correct MIME type
var attachment = new Attachment(stream, new ContentType("text/csv"))
{
Name = "theAttachment.csv"
};
var sr = new StreamWriter(@"C:\Blah\Look.csv");
sr.WriteLine(attachment.ContentStream.ToString());
sr.Close();
But the file has only the following: "System.IO.MemoryStream".
Could you please tell me how I can get the real data there?
Thanks.
You can't call ToString on an arbitrary stream. Instead you should use CopyTo:
using (var fs = new FileStream(@"C:\temp\Look.csv", FileMode.Create))
{
    attachment.ContentStream.CopyTo(fs);
}
Use this to replace the last three lines of your example. By default, ToString just returns the name of the type unless the class overrides ToString. ContentStream is typed as the abstract Stream (at runtime it is a MemoryStream), so you get the default implementation.
CopyTo is new in .NET Framework 4. If you aren't using the .NET Framework 4, you can mimic it with an extension method:
public static void CopyTo(this Stream fromStream, Stream toStream)
{
    if (fromStream == null)
        throw new ArgumentNullException("fromStream");
    if (toStream == null)
        throw new ArgumentNullException("toStream");
    var bytes = new byte[8092];
    int dataRead;
    while ((dataRead = fromStream.Read(bytes, 0, bytes.Length)) > 0)
        toStream.Write(bytes, 0, dataRead);
}
Credit to Gunnar Peipman for the extension method on his blog.
Assuming your stream isn't too big you can just write it all to the file like so:
StreamWriter writer = new StreamWriter(@"C:\Blah\Look.csv");
StreamReader reader = new StreamReader(attachment.ContentStream);
writer.WriteLine(reader.ReadToEnd());
writer.Close();
If it is bigger, you probably want to chunk the reads up in a loop so as not to exhaust your RAM (and risk out-of-memory exceptions).

ServiceStack Stream Compression

I am returning a stream of data from a ServiceStack service as follows. Note that I need to do it this way instead of the ways outlined here because I need to perform some cleanup after the data has been written to the output stream.
using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    fs.WriteTo(Response.OutputStream);
}
Response.EndRequest();
...cleanup code...
Compression is handled in the other services that return simple DTOs by using a ServiceRunner similar to this answer. However the stream response above never hits that code as the response object in OnAfterExecute is always null. I am able to manually compress the result inside of the service method as follows, but it requires a lot of setup to determine if and what compression is needed and manually setting up the correct HTTP headers (omitted below).
var outStream = new MemoryStream();
using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
using (var tinyStream = new GZipStream(outStream, CompressionMode.Compress, leaveOpen: true))
{
    fs.CopyTo(tinyStream);
}
// write after the GZipStream is closed so all compressed data has been flushed
outStream.WriteTo(Response.OutputStream);
Response.EndRequest();
...cleanup code...
Is there a way in ServiceStack to handle this compression for me similar to the way it works with the ServiceRunner?
I'm not exactly sure what you like about the way ServiceStack handles compression within ServiceRunner. Is it because it is global across ServiceStack APIs?
For your example, I think something like below works and meets your need to perform some cleanup after the data has been written to the output stream ...
public class FStreamService : Service
{
    public object Get(FStream request)
    {
        var filePath = @"c:\test.xml";
        var compressFileResult = new CompressedFileResult(filePath); // CompressedFileResult lives in ServiceStack.Common.Web
        compressFileResult.WriteTo(Response.OutputStream);
        Response.EndRequest();
        // ...cleanup code...
        return null; // the response has already been written to the output stream
    }
}
Based on the comments, here is an update of the above that adds compression using some ServiceStack extensions:
public object Get(FStream request)
{
    var filePath = @"c:\test.xml";
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        var compressedBytes = fs.ToUtf8String().Compress(this.RequestContext.CompressionType);
        new CompressedResult(compressedBytes).WriteTo(Response.OutputStream);
    }
    Response.EndRequest();
    // ...cleanup code...
    return null; // the response has already been written to the output stream
}
If you want it within a ServiceRunner, something like this should work:
public override object OnAfterExecute(IRequestContext requestContext, object response)
{
    var resp = requestContext.Get<IHttpResponse>();
    response = requestContext.ToOptimizedResult(resp.OutputStream);
    return base.OnAfterExecute(requestContext, response);
}
