Zip HUGE CSV files in C#

I am working in .NET 4.7.2. We have a List of objects, say MyObject, which is to be converted to a single CSV file.
Currently I use the code below to create a HUGE CSV file (10 GB and upwards).
using (var writ = new StreamWriter(fileStream, Encoding.UTF8))
{
    using (var csvWrit = new CsvWriter(writ))
    {
        //logic
        csvWrit.NextRecord();
    }
}
ZipFile.CreateFromDirectory(<sourceDirectoryName>, <destFileName>);
Now I need to create a zip of these HUGE files. I found ZipFile.CreateFromDirectory in C#. After the CSV is created, I call ZipFile.CreateFromDirectory to create the zip file.
My question: should I continue to first create the CSV and then zip it, or is there a more efficient way to do this?
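One alternative worth sketching (not from the original thread): ZipArchive from System.IO.Compression can open a writable stream on a new entry, so the CSV can be streamed straight into the zip and the intermediate 10 GB file never touches the disk. A minimal sketch, assuming the same CsvWriter constructor as in the question; output.zip, data.csv and myObjects are placeholder names:
using (var zipStream = new FileStream("output.zip", FileMode.Create))
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
{
    // The entry's contents are compressed as they are written
    var entry = archive.CreateEntry("data.csv", CompressionLevel.Optimal);
    using (var entryStream = entry.Open())
    using (var writ = new StreamWriter(entryStream, Encoding.UTF8))
    using (var csvWrit = new CsvWriter(writ))
    {
        csvWrit.WriteRecords(myObjects); // placeholder for your existing write logic
    }
}
Because ZipArchiveMode.Create writes each entry sequentially, memory use stays flat regardless of the file size.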

Related

Deserialize SQL Server image field back in Excel format

I have a SQL Server table that contains serialized Excel files, with 3 fields:
IdDocument -> int (PK)
DataFile -> image
FileName -> nvarchar
where DataFile contains the serialized Excel file, and FileName the name of the file (with path).
Something like this:
0xD0CF11E0A1B11AE100.....
U:\SAP_R3V4_Validation_Documents\March2012.xls
Now I need to get these files back in Excel format.
How can I accomplish this?
Using C# console application or SQL Server features could be fine.
Thank you in advance.
Luis
Excel files are binary. The xls format is obsolete, replaced since 2007 (15 years ago) by xlsx, a ZIP package containing XML files. What the question shows is how binary data looks in SSMS, not some kind of serialized format.
BTW the image type is deprecated, replaced by varbinary(max) in 2005 or 2008 (can't remember).
In any case, reading binary data is the same as reading any other data. A DbDataReader is used to retrieve the query results, and strongly typed methods are used to read specific fields per row. In this particular case GetStream() can be used to retrieve the data as a Stream that can be saved to disk:
using (var con = new SqlConnection(connectionString))
{
    using (var cmd = new SqlCommand(sql, con))
    {
        con.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Ordinals assume the query selects IdDocument, DataFile, FileName in that order
                var path = reader.GetString(2);
                var finalPath = Path.Combine(root, Path.GetFileName(path));
                using (var stream = reader.GetStream(1))
                {
                    using (var fileStream = File.Create(finalPath))
                    {
                        stream.CopyTo(fileStream);
                    }
                }
            }
        }
    }
}
The only thing that's different is the code that reads the field as a stream and saves it to disk:
using (var stream = reader.GetStream(1))
{
    using (var fileStream = File.Create(finalPath))
    {
        stream.CopyTo(fileStream);
    }
}
The using blocks ensure the data and file streams are closed even in case of error. The path itself is constructed by combining a root folder with the stored filename, not the full path.

.Net web api download csv file (Excel separator)

I have a controller endpoint from which I want to generate a CSV file and download it.
Currently I am using the CsvHelper NuGet package and my code is like this:
var cc = new CsvConfiguration(new System.Globalization.CultureInfo("sl-SI"));
using (var ms = new MemoryStream())
{
    using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true)))
    {
        using (var cw = new CsvWriter(sw, cc))
        {
            cw.WriteRecords(ListOfReports);
        } // The stream gets flushed here.
        return File(ms.ToArray(), "text/csv", $"{docNumber.Trim()}_{docType}.csv");
    }
}
It generated the CSV nicely, but the problem was that if I opened it in Excel, the whole row was in the first column and was not split.
I added this part:
cw.WriteField("sep=,", false);
cw.NextRecord();
before cw.WriteRecords(ListOfReports);, which made it work in Excel, but if I open it in Notepad there is a sep=, in my first row.
I noticed there is a difference with CultureInfo: if I set "sl-SI" it will work properly on Slovenian Windows (the separator will be ;), and if I set "en-US" it will work on English Windows (separator ,). But what do I need to do to make it work with any culture?
Does anyone have any idea how to fix this so it will work properly in Excel and any other text editor?
This is effectively the same as this SuperUser question, but it appears you want a programming-oriented solution rather than a user-oriented one.
The problem is fundamentally that Excel is really bad at dealing with CSV files, especially when taking non-US cultures into account. My suggestion would be to allow users to download a real Excel file using a library like DocumentFormat.OpenXml.
If you have two separate use cases for your downloads (i.e. some users who open the file in notepad or consume it with software that reads CSV, and others who open the file in Excel), give the users separate options to download in CSV or Excel.
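A minimal sketch of that split, assuming an ASP.NET Core controller; the format parameter and the BuildXlsxBytes/BuildCsvBytes helpers are hypothetical names you would implement yourself (the xlsx one with DocumentFormat.OpenXml):
[HttpGet("report")]
public IActionResult DownloadReport([FromQuery] string format = "csv")
{
    if (format == "xlsx")
    {
        // A real Excel file: no separator or culture guessing involved
        return File(BuildXlsxBytes(ListOfReports),
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "report.xlsx");
    }
    // Plain CSV for text editors and other consuming software
    return File(BuildCsvBytes(ListOfReports), "text/csv", "report.csv");
}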

How to read data from inner archives without extracting zip file?

I have a zip file which contains an inner zip file (e.g. ZipFile1.zip -> ZipFile2.zip -> file.txt). I want to read the content of the inner archive (file.txt) using the ICSharpCode.SharpZipLib library, without extracting to disk. Is it possible? If it is, let me know how to do this.
Based on this answer, you can open a file within the zip as a Stream. You can also open a ZipFile from a Stream. I'm sure you can see where this is heading.
using (var zip = new ZipFile("ZipFile1.zip"))
{
    var nestedZipEntry = zip.GetEntry("ZipFile2.zip");
    using (var nestedZipStream = zip.GetInputStream(nestedZipEntry))
    using (var nestedZip = new ZipFile(nestedZipStream))
    {
        var fileEntry = nestedZip.GetEntry("file.txt");
        using (var fileStream = nestedZip.GetInputStream(fileEntry))
        using (var reader = new StreamReader(fileStream))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
What we're doing here:
Open ZipFile1.zip
Find the entry for ZipFile2.zip
Open ZipFile2.zip as a Stream
Create a new ZipFile object around nestedZipStream.
Find the entry for file.txt
Create a StreamReader around fileStream to read the text file.
Read the contents of file.txt and output it to the console.
Try it online - in this sample, the base64 data is the binary data of a zip file which contains "test.zip", which in turn contains "file.txt". The content of that text file is "hello".
P.S. If an entry isn't found then GetEntry will return null. You'll want to check for that in any code you write. It works here because I'm sure that these entries exist in their respective archives.

How to use string instead of file for XSLT.Load and XSLT.Transform method?

I am using http://www.thescarms.com/dotnet/XSLT.aspx to convert comma-delimited data (CSV) to XML using an XSLT template.
It uses the following two lines of .NET code:
XSLT.Load(mstrInputXSLTFile, resolver);
XSLT.Transform(mstrInputCSVFile, mstrOutputXMLFile, resolver);
I am looking for a way to use the string contents (the contents of the XSLT and CSV files) instead of files in the above two methods. Any help will be useful.
I am planning to implement this logic in a WCF web service which will receive the CSV string. If there is no workaround then I will have to create temp files from the CSV and XSL values received, perform the conversion of CSV to XML on the server, return the XML output to the client, and then delete the files created above.
If you want to load the input from a string, then create an XmlReader over a StringReader over your string, e.g.:
XslCompiledTransform proc = new XslCompiledTransform();
using (StringReader sr = new StringReader(stringVar))
{
    using (XmlReader xr = XmlReader.Create(sr))
    {
        proc.Load(xr);
    }
}
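The same trick works for the transform itself. A sketch, assuming proc has been loaded as above and that inputString (a placeholder name) holds well-formed XML:
string result;
using (var input = XmlReader.Create(new StringReader(inputString)))
using (var sw = new StringWriter())
using (var xw = XmlWriter.Create(sw, proc.OutputSettings))
{
    proc.Transform(input, xw);   // XslCompiledTransform.Transform(XmlReader, XmlWriter)
    result = sw.ToString();      // the transformed output as a string
}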
There is no suitable method in XslTransform which does what you want.
However, you could write your own extension methods (I would call them Parse..), which take the content as a string, create files in the temporary directory, and load/transform them with the suitable methods.
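A sketch of one such extension method for Load; ParseLoad and the temp-file handling are invented for illustration:
public static class XslTransformExtensions
{
    public static void ParseLoad(this XslTransform xslt, string xsltContent, XmlResolver resolver)
    {
        // Write the string to a temp file, load it, then clean up
        var tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllText(tempPath, xsltContent);
        try
        {
            xslt.Load(tempPath, resolver);
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}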

Compress large log file before reading

We have a large number of logs (117 log files totalling about 17 GB of data). It's straight text, so I know it will compress well. I'm not looking for great compression or speed (but that would be a nice bonus). What I currently do is get a list of log files to read (they have a date stamp in the file name, so I filter on that first). After I get the list I read each file using File.ReadAllLines(), but we also filter on that...
private void GetBulkUpdateItems(List<string> allLines, Regex updatedRowsRegEx)
{
    foreach (var file in this)
        allLines.AddRange(File.ReadAllLines(file).Where(x => updatedRowsRegEx.IsMatch(x)));
    allLines.Sort();
}
Reading 5 files from the network takes about 22 seconds. What I'd like to do is compress the list of files into a single zip file, copy the zip file locally, then unzip it and do the rest. The problem is I can't figure out how to start. Since I'm using .NET 4.5 I first tried System.IO.Compression.ZipFile, but it wants a directory and I don't want all 117 files. I saw someone use a network stream and 7-Zip, which sounded promising, and I'm fairly certain 7-Zip is installed on the server I need the logs from (probably not important because we use the UNC path). So I'm stuck. Any suggestions?
ZipArchive is the underlying class for ZipFile and allows more granular manipulation.
Sample from the article adding hardcoded text:
using (FileStream zipToOpen = new FileStream(
    @"c:\users\exampleuser\release.zip", FileMode.Open))
{
    using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
    {
        ZipArchiveEntry readmeEntry = archive.CreateEntry("Readme.txt");
        using (StreamWriter writer = new StreamWriter(readmeEntry.Open()))
        {
            writer.WriteLine("Information about this package.");
            writer.WriteLine("========================");
        }
    }
}
As Praveen Paulose suggested, you can use ZipFileExtensions.CreateEntryFromFile to create an entry from a file and add it to the archive.
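A short sketch of that suggestion, assuming logFiles (a placeholder name) is your already-filtered list of log paths:
using (var zipToOpen = new FileStream(@"c:\temp\logs.zip", FileMode.Create))
using (var archive = new ZipArchive(zipToOpen, ZipArchiveMode.Create))
{
    foreach (var file in logFiles)
    {
        // Use just the file name so the archive has a flat layout
        archive.CreateEntryFromFile(file, Path.GetFileName(file));
    }
}
Note that if this runs on your machine, the bytes still cross the network uncompressed; to save transfer time the zipping has to happen on the server side.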
