Decompress large BZip2 file - C#

I need to decompress a BZip2 file, use the file's data, and then remove the decompressed file. The issue is that my method doesn't work when the BZip2 file is too large.
I've been using:
using ICSharpCode.SharpZipLib.BZip2;
This is what I've been trying to do:
private List<JObject> listWithObjects;
private void DecompressBzip2File(string bzipPath)
{
string tempPath = Path.GetRandomFileName();
FileStream fs = new FileStream(bzipPath, FileMode.Open);
using (FileStream decompressedStream = File.Create(tempPath))
{
BZip2.Decompress(fs, decompressedStream, true);
}
LoadJson(tempPath);
File.Delete(tempPath);
}
private void LoadJson(string tempPath)
{
List<JObject> jsonList = new List<JObject>();
using (StreamReader file = new StreamReader(tempPath))
{
string line;
while ((line = file.ReadLine()) != null)
{
JObject jObject = JObject.Parse(line);
jsonList.Add(jObject);
}
file.Close();
}
listWithObjects = jsonList;
}
It works with a ~14 MB .bz2 file, but when I try a ~900 MB .bz2 file my program just stops: I get no error message, and RAM usage climbs sharply. I read something about buffer size, but couldn't figure out how to use it.
Does anyone have a tip on how I could decompress a large bzip2 file? Could I split the file into smaller chunks?
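One way to sidestep both the temporary file and the single large decompression step is to decompress as a stream and handle one line at a time. This is a minimal sketch, assuming SharpZipLib's BZip2InputStream and the same one-JSON-object-per-line layout as in the question; Bzip2Lines and ReadLines are hypothetical names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.BZip2;

static class Bzip2Lines
{
    // Lazily yields decompressed lines one at a time.
    // Nothing is written to a temporary file, and only the
    // current line is held in memory by this method.
    public static IEnumerable<string> ReadLines(string bzipPath)
    {
        using (FileStream fs = File.OpenRead(bzipPath))
        using (var bz = new BZip2InputStream(fs))
        using (var reader = new StreamReader(bz))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Calling JObject.Parse(line) inside a foreach over Bzip2Lines.ReadLines(bzipPath), and handling each object immediately instead of adding it to a list, should keep memory use roughly constant. If every object really must be kept, the list itself may still exhaust memory regardless of how the file is decompressed.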

Related

C# - How can I download a zip file from url, unzip it, and read the extracted files, all in memory? [duplicate]

I have files (from third parties) that are being FTP'd to a directory on our server. I download them and process them every 'x' minutes. Works great.
Now, some of the files are .zip files, which means I can't process them as they are. I need to unzip them first.
FTP has no concept of zipping/unzipping, so I'll need to grab the zip file, unzip it, then process it.
Looking at the MSDN zip API, there seems to be no way I can unzip to a memory stream?
So is the only way to do this...
1. Unzip to a file (what directory? need some -very- temp location ...)
2. Read the file contents
3. Delete the file.
NOTE: The contents of the file are small - say 4k <-> 1000k.
Zip compression support is built in:
using System.IO;
using System.IO.Compression;
// ^^^ requires a reference to System.IO.Compression.dll
static class Program
{
const string path = ...
static void Main()
{
using(var file = File.OpenRead(path))
using(var zip = new ZipArchive(file, ZipArchiveMode.Read))
{
foreach(var entry in zip.Entries)
{
using(var stream = entry.Open())
{
// do whatever we want with stream
// ...
}
}
}
}
}
Normally you should avoid copying it into another stream - just use it "as is", however, if you absolutely need it in a MemoryStream, you could do:
using(var ms = new MemoryStream())
{
stream.CopyTo(ms);
ms.Position = 0; // rewind
// do something with ms
}
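As a sketch of how that MemoryStream copy could be packaged for a single entry, the helper below returns the entry's decompressed bytes. ZipEntryBuffer and ReadAllBytes are hypothetical names chosen for illustration; it only suits small entries such as the 4k to 1000k files mentioned in the question:

```csharp
using System.IO;
using System.IO.Compression;

static class ZipEntryBuffer
{
    // Copies one entry's decompressed content into memory and returns it.
    public static byte[] ReadAllBytes(ZipArchiveEntry entry)
    {
        using (Stream stream = entry.Open())
        using (var ms = new MemoryStream())
        {
            stream.CopyTo(ms);
            return ms.ToArray(); // a copy sized to the actual data length
        }
    }
}
```

Inside the foreach loop above this could be called as ZipEntryBuffer.ReadAllBytes(entry) in place of opening the stream by hand.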
You can use ZipArchiveEntry.Open to get a stream.
This code assumes the zip archive has one text file.
using (FileStream fs = new FileStream(path, FileMode.Open))
using (ZipArchive zip = new ZipArchive(fs) )
{
var entry = zip.Entries.First();
using (StreamReader sr = new StreamReader(entry.Open()))
{
Console.WriteLine(sr.ReadToEnd());
}
}
using (ZipArchive archive = new ZipArchive(webResponse.GetResponseStream()))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Dispose the reader (and the entry stream it wraps) after each entry.
        using (var sr = new StreamReader(entry.Open()))
        {
            var myStr = sr.ReadToEnd();
        }
    }
}
It looks like this is what you need:
using (var za = ZipFile.OpenRead(path))
{
foreach (var entry in za.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
//your code here
}
}
}
You can use SharpZipLib among a variety of other libraries to achieve this.
The following example from their wiki shows the MemoryStream pattern; note that it compresses into a MemoryStream rather than unzipping into one:
using ICSharpCode.SharpZipLib.Core; // StreamUtils
using ICSharpCode.SharpZipLib.Zip;
// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
//
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName) {
    MemoryStream outputMemStream = new MemoryStream();
    ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);
    zipStream.SetLevel(3); //0-9, 9 being the highest level of compression
    ZipEntry newEntry = new ZipEntry(zipEntryName);
    newEntry.DateTime = DateTime.Now;
    zipStream.PutNextEntry(newEntry);
    StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
    zipStream.CloseEntry();
    zipStream.IsStreamOwner = false; // False stops the Close also Closing the underlying stream.
    zipStream.Close(); // Must finish the ZipOutputStream before using outputMemStream.
    outputMemStream.Position = 0;
    return outputMemStream;
    // Alternative outputs (use one of these instead of returning the stream):
    // ToArray is the cleaner and easiest to use correctly with the penalty of duplicating allocated memory.
    // byte[] byteArrayOut = outputMemStream.ToArray();
    // GetBuffer returns a raw buffer and so you need to account for the true length yourself.
    // byte[] byteArrayOut = outputMemStream.GetBuffer();
    // long len = outputMemStream.Length;
}
Combining all of the above: suppose you want a very simple way to take a zip file called "file.zip" and extract it to the "C:\temp" folder. (Note: this example was only tested with compressed text files. You may need to make some modifications for binary files.)
using System.IO;
using System.IO.Compression;
static void Main(string[] args)
{
//Call it like this:
Unzip("file.zip", @"C:\temp");
}
static void Unzip(string sourceZip, string targetPath)
{
using (var z = ZipFile.OpenRead(sourceZip))
{
foreach (var entry in z.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
string uncompressedFile = Path.Combine(targetPath, entry.Name);
File.WriteAllText(uncompressedFile,r.ReadToEnd());
}
}
}
}
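The Unzip example above goes through StreamReader and File.WriteAllText, which decode and re-encode text and so can alter binary content. A minimal sketch of a byte-for-byte variant follows; BinaryUnzip is a hypothetical class name, and on .NET Framework ZipFile additionally needs a reference to System.IO.Compression.FileSystem:

```csharp
using System.IO;
using System.IO.Compression;

static class BinaryUnzip
{
    // Extracts every file entry as raw bytes, so binary content is preserved.
    public static void Unzip(string sourceZip, string targetPath)
    {
        Directory.CreateDirectory(targetPath);
        using (ZipArchive z = ZipFile.OpenRead(sourceZip))
        {
            foreach (ZipArchiveEntry entry in z.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name)) continue;

                // Like the example above, this flattens folders by using entry.Name.
                string outFile = Path.Combine(targetPath, entry.Name);
                using (Stream input = entry.Open())
                using (FileStream output = File.Create(outFile))
                {
                    input.CopyTo(output);
                }
            }
        }
    }
}
```

If the folder structure inside the archive should be kept, the built-in ZipFile.ExtractToDirectory(sourceZip, targetPath) is likely the simpler choice.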

Load a file from Asset using Xamarin android

I want to load a file from the Assets folder. I found a solution, but it is in Java. How can I convert the following Java code to C#?
public String loadKMLFromAsset() {
String kmlData = null;
try {
InputStream is = getAssets().open("yourKMLFile");
int size = is.available();
byte[] buffer = new byte[size];
is.read(buffer);
is.close();
kmlData = new String(buffer, "UTF-8");
} catch (IOException ex) {
ex.printStackTrace();
return null;
}
return kmlData;
}
Use AssetManager
// Read the contents of our asset
string content;
AssetManager assets = this.Assets;
using (StreamReader sr = new StreamReader (assets.Open ("read_asset.txt")))
{
content = sr.ReadToEnd ();
}
Use BinaryReader instead of StreamReader if you are working with files such as databases, KML, shapefiles, or video formats. StreamReader decodes the bytes as text in a character encoding, so when it reads a binary file, bytes that are not valid text can be altered or lost.
This code writes the asset file to a file in your mobile file system:
if (!System.IO.File.Exists("yourKMLFile_mobile"))
{
var s = Resources.OpenRawResource(Resource.Raw.yourKMLFile);
FileStream writeStream = new FileStream("yourKMLFile_mobile", FileMode.OpenOrCreate, FileAccess.Write);
ReadWriteStream(s, writeStream);
}
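ReadWriteStream is not defined in the snippet above. One plausible shape for it, offered only as an assumption about what the helper does, is a buffered copy that closes both streams when finished (StreamCopy is a hypothetical class name):

```csharp
using System.IO;

static class StreamCopy
{
    // Hypothetical stand-in for the ReadWriteStream helper used above:
    // copies everything from readStream to writeStream, then closes both.
    public static void ReadWriteStream(Stream readStream, Stream writeStream)
    {
        using (readStream)
        using (writeStream)
        {
            var buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = readStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
        }
    }
}
```

Where it is available, Stream.CopyTo does the same copy without the manual loop.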

How to access a text file in c# that is being used by another process

I have a text file that modscan writes data to. At a particular time I have to read the data and save it to a database. Offline, i.e. when modscan is not using the file, I can read the data and save it to the database without problems. However, when modscan has the file open, I get this exception:
Cannot access file as it been used by other process.
My code:
using System.IO;
string path = dt.Rows[i][11].ToString();
string[] lines = System.IO.File.ReadAllLines(@path);
path contains "E:\Metertxt\02.txt".
What changes do I need to make in order to read the file without interfering with modscan?
I googled and found this, which might work, but I am not sure how to use it:
FileShare.ReadWrite
You can use a FileStream to open a file that is already open in another application. Then you'll need a StreamReader if you want to read it line by line. This works, assuming a file encoding of UTF8:
using (var stream = new FileStream(@"c:\tmp\locked.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Do something with line, e.g. add to a list or whatever.
Console.WriteLine(line);
}
}
}
Alternative in case you really need a string[]:
var lines = new List<string>();
using (var stream = new FileStream(@"c:\tmp\locked.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
string line;
while ((line = reader.ReadLine()) != null)
{
lines.Add(line);
}
}
}
// Now you have a List<string>, which can be converted to a string[] if you really need one.
var stringArray = lines.ToArray();
var lines = new List<string>();
using (var fstream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var sreader = new StreamReader(fstream))
{
    string line;
    while ((line = sreader.ReadLine()) != null)
        lines.Add(line);
}
// do something with the lines
// if you need all lines at once, join them
// (calling ReadToEnd after the loop would return an empty string)
string allLines = string.Join(Environment.NewLine, lines);

Split large XML file after string found

What I have:
A large XML file with nearly 1 million lines of content. Example of content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
^ repeat that by 900k or so lines (content changing of course)
What I need:
Search the XML file for "<etc123". Once found, move (write) that line along with all lines below it to a separate XML file.
Would it be advisable to use a method such as File.ReadAllLines for the search portion? What would you recommend for the writing portion? Line by line is not an option as far as I can tell, as it would take much too long.
To quite literally discard the content above your search string, I would not use File.ReadAllLines, as it would load the entire file into memory. Try File.Open and wrap it in a StreamReader. Loop on StreamReader.ReadLine until the search string is found, then start writing to a new StreamWriter, or do a byte copy on the underlying FileStream.
An example of how to do so with StreamWriter/StreamReader alone is listed below.
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
FileMode.Open, FileAccess.Read, FileShare.Read))
{
//use streamreader to search for start
var srInput = new StreamReader(fsInput);
string searchString = "two";
string cSearch = null;
bool found = false;
while ((cSearch = srInput.ReadLine()) != null)
{
if (cSearch.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase))
{
found = true;
break;
}
}
if (!found)
throw new Exception("Searched string not found.");
//we have the data, write to a new file
using (StreamWriter sw = new StreamWriter(
new FileStream("out.txt", FileMode.Create, //create or overwrite (OpenOrCreate would not truncate an existing file)
FileAccess.Write, FileShare.None))) // write only, no sharing
{
//write the line that we found in the search
sw.WriteLine(cSearch);
string cline = null;
while ((cline = srInput.ReadLine()) != null)
sw.WriteLine(cline);
}
}
//both files are closed and complete
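You can copy with LINQ to XML. The input must be well-formed XML with a single root element, and the output needs one too, so the matches are wrapped in a placeholder root:
XElement doc = XElement.Load("yourXML.xml");
// "root" is a placeholder name; an XDocument can hold only one root element.
XDocument newDoc = new XDocument(new XElement("root", doc.DescendantsAndSelf("etc123")));
newDoc.Save("yourOutputXML.xml");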
You could process one line at a time. ReadToEnd is not a good fit when the contents of each line need checking.
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
using (StreamWriter write = outFile.CreateText())
using (StreamReader sr = file.OpenText())
{
    bool foundit = false;
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Start copying at the first matching line and keep every line after it.
        if (!foundit && line.Contains("<etc123"))
        {
            foundit = true;
        }
        if (foundit)
        {
            write.WriteLine(line);
        }
    }
}
Please note, this method may not produce valid XML, given your requirements.
