C# Garbage Collection Weird Behavior

I have a block of code that loads a custom storage file (data.00x) and dumps its file contents (several files...) [for this example we'll say the referenced index only contains data.001 files for export]
Example:
public void ExportFileEntries(ref List<IndexEntry> filteredIndex, string dataDirectory, string buildDirectory, int chunkSize)
{
    OnTotalMaxDetermined(new TotalMaxArgs(8));

    // For each set of dataId files in the filteredIndex
    for (int dataId = 1; dataId < 8; dataId++)
    {
        OnTotalProgressChanged(new TotalChangedArgs(dataId, string.Format("Exporting selected files from data.00{0}", dataId)));

        // Filter only entries with current dataId into temp index
        List<IndexEntry> tempIndex = GetEntriesByDataId(ref filteredIndex, dataId, SortType.Offset);

        // Determine the path of the data.xxx file being exported from
        string dataPath = string.Format(@"{0}\data.00{1}", dataDirectory, dataId);

        if (File.Exists(dataPath))
        {
            // Load the data.xxx into a filestream
            using (FileStream dataFs = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
            {
                // Loop through the files to export
                foreach (IndexEntry indexEntry in tempIndex)
                {
                    int fileLength = indexEntry.Length;
                    OnCurrentMaxDetermined(new CurrentMaxArgs(fileLength));

                    // Set the filestream's position to the file entry's offset
                    dataFs.Position = indexEntry.Offset;

                    // Read the file into a byte array (buffer)
                    byte[] fileBytes = new byte[indexEntry.Length];
                    dataFs.Read(fileBytes, 0, fileBytes.Length);

                    // Define some information about the file being exported
                    string fileExt = Path.GetExtension(indexEntry.Name).Remove(0, 1);
                    string buildPath = string.Format(@"{0}\{1}\{2}", buildDirectory, fileExt.ToUpper(), indexEntry.Name);

                    // If needed, unencrypt the data (fileBytes buffer)
                    if (XOR.Encrypted(fileExt)) { byte b = 0; XOR.Cipher(ref fileBytes, ref b); }

                    // If no chunkSize is provided, generate a default
                    if (chunkSize == 0) { chunkSize = Math.Max(64000, (int)(fileBytes.Length * .02)); }

                    // If the build directory doesn't exist yet, create it.
                    if (!Directory.Exists(Path.GetDirectoryName(buildPath))) { Directory.CreateDirectory(Path.GetDirectoryName(buildPath)); }

                    using (FileStream buildFs = new FileStream(buildPath, FileMode.Create, FileAccess.Write))
                    {
                        using (BinaryWriter bw = new BinaryWriter(buildFs, encoding))
                        {
                            for (int byteCount = 0; byteCount < fileLength; byteCount += Math.Min(fileLength - byteCount, chunkSize))
                            {
                                bw.Write(fileBytes, byteCount, Math.Min(fileLength - byteCount, chunkSize));
                                OnCurrentProgressChanged(new CurrentChangedArgs(byteCount, ""));
                            }
                        }
                    }

                    OnCurrentProgressReset(EventArgs.Empty);
                    fileBytes = null;
                }
            }
        }
        else { OnError(new ErrorArgs(string.Format("[ExportFileEntries] Cannot locate: {0}", dataPath))); }
    }

    OnTotalProgressReset(EventArgs.Empty);
    GC.Collect();
}
The data.001 stores about 12k files, most of them very small .jpg pictures, etc. For roughly the first half of the export the GC collects just fine, but toward the second half it seemingly stops collecting at all.
If I don't issue GC.Collect() at the end of the method the tool sits at around 255 MB of RAM, but if I do, it drops to about 14 MB. What I'm asking is: are there any obvious improvements to the way I coded the method (to improve GC behavior)?
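One pattern worth noting (a sketch under my own assumptions, not the original method): each entry is read into a freshly allocated byte[] the size of the whole file, so anything over about 85 KB lands on the large object heap and only goes away on a full collection. Copying through one reusable buffer avoids those allocations; the CopyBytes helper below is hypothetical and assumes the same IndexEntry Offset/Length fields.

// Hypothetical helper: stream one index entry from the data.00x file to the
// build file through a shared buffer instead of a per-file byte[] allocation.
private static void CopyBytes(FileStream source, FileStream target, long offset, int length, byte[] buffer)
{
    source.Position = offset;
    int remaining = length;
    while (remaining > 0)
    {
        int read = source.Read(buffer, 0, Math.Min(buffer.Length, remaining));
        if (read == 0) break; // unexpected end of the data file
        target.Write(buffer, 0, read);
        remaining -= read;
    }
}

The export loop would then allocate one buffer per data.00x file (for example new byte[chunkSize]) and call CopyBytes for each entry; the XOR step would have to be applied per chunk, which only works if the cipher is stateless per byte.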

Related

What is different with the writing in FileStream?

While searching for a way to decompress a file using SharpZipLib, I found a lot of methods like this:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            byte[] data = new byte[2048];
            while (true)
            {
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    fileWrite.Write(data, 0, size);
                }
                else
                {
                    break;
                }
            }
            fileWrite.Close();
        }
    }
}
The signature of FileStream.Write is:
FileStream.Write(byte[] array, int offset, int count)
Now I'm trying to separate the read and write parts, because I want to use a thread to speed up decompression in the write function, and I use List<byte[]> and List<int> to hold the file's data and sizes, like below.
Read:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            List<int> SizeList = new List<int>();
            List<byte[]> mydatalist = new List<byte[]>();
            while (true)
            {
                byte[] data = new byte[2048];
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    mydatalist.Add(data);
                    SizeList.Add(size);
                }
                else
                {
                    break;
                }
            }
            test = new Thread(() =>
                FileWriteFun(pathToTar, args, SizeList, mydatalist)
            );
            test.Start();
            streamWriter.Close();
        }
    }
}
Write:
public static void FileWriteFun(string pathToTar, string[] args, List<int> SizeList, List<byte[]> mydataList)
{
    //some codes here
    using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
    {
        for (int i = 0; i < mydataList.Count; i++)
        {
            fileWrite.Write(mydataList[i], 0, SizeList[i]);
        }
        fileWrite.Close();
    }
}
Edit
(1) Moved byte[] data = new byte[2048] into the while loop so each iteration assigns data to a new array.
(2) Changed int[] SizeList = new int[2048] to List<int> SizeList = new List<int>() because of the fixed size limit of the array.
As a Read on a stream is only guaranteed to return at least one byte (typically it will be more, but you can't rely on getting the full requested length each time), your solution could theoretically fail after as little as 2048 bytes, as your SizeList could only hold 2048 entries.
You could use a List to hold the sizes.
Or use a MemoryStream instead of inventing your own.
But the two main problems are:
1) You keep reading into the same byte array, overwriting previously read data. When you add your data byte array to mydatalist, you must assign data to a new byte array.
2) You close your stream before the second thread is done writing.
In general, threading is difficult and should only be used where you know it will improve performance. Simply reading and writing data is typically I/O bound, not CPU bound, so introducing a second thread just gives a small performance penalty and no gain in speed. You could use multithreading to ensure concurrent read/write operations, but most likely the disk cache will do this for you if you stick with the first solution; and if not, using async is easier than multithreading to achieve this.
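As a rough illustration of the async route mentioned above (a sketch only, assuming SharpZipLib's TarInputStream as used in the question, a simplified target path, and using System.Threading.Tasks):

public static async Task TarWriteCharactersAsync(string tarfile, string targetPath)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    using (FileStream fileWrite = File.Create(targetPath))
    {
        byte[] data = new byte[2048];
        int size;
        // Read from the tar stream and write asynchronously to the file,
        // so no second thread or intermediate List<byte[]> is needed.
        while ((size = s.Read(data, 0, data.Length)) > 0)
        {
            await fileWrite.WriteAsync(data, 0, size);
        }
    }
}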

BinaryReader read from FileStream which loads in chunks

I'm reading values from a huge file (> 10 GB) using the following code:
FileStream fs = new FileStream(fileName, FileMode.Open);
BinaryReader br = new BinaryReader(fs);

int count = br.ReadInt32();
List<long> numbers = new List<long>(count);
for (int i = count; i > 0; i--)
{
    numbers.Add(br.ReadInt64());
}
Unfortunately the read speed from my SSD is stuck at a few MB/s. I guess the limit is the IOPS of the SSD, so it might be better to read in chunks from the file.
Question
Does the FileStream in my code really read only 8 bytes from the file every time the BinaryReader calls ReadInt64()?
If so, is there a transparent way for the BinaryReader to provide a stream that reads in larger chunks from the file to speed up the procedure?
Test-Code
Here's a minimal example to create a test-file and to measure the read-performance.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

namespace TestWriteRead
{
    class Program
    {
        static void Main(string[] args)
        {
            System.IO.File.Delete("test");
            CreateTestFile("test", 1000000000);

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            IEnumerable<long> test = Read("test");
            stopwatch.Stop();
            Console.WriteLine("File loaded within " + stopwatch.ElapsedMilliseconds + "ms");
        }

        private static void CreateTestFile(string filename, int count)
        {
            FileStream fs = new FileStream(filename, FileMode.CreateNew);
            BinaryWriter bw = new BinaryWriter(fs);
            bw.Write(count);
            for (int i = 0; i < count; i++)
            {
                long value = i;
                bw.Write(value);
            }
            fs.Close();
        }

        private static IEnumerable<long> Read(string filename)
        {
            FileStream fs = new FileStream(filename, FileMode.Open);
            BinaryReader br = new BinaryReader(fs);
            int count = br.ReadInt32();
            List<long> values = new List<long>(count);
            for (int i = 0; i < count; i++)
            {
                long value = br.ReadInt64();
                values.Add(value);
            }
            fs.Close();
            return values;
        }
    }
}
You should configure the stream to use SequentialScan to indicate that you will read the stream from start to finish. It should improve the speed significantly.
Indicates that the file is to be accessed sequentially from beginning to end. The system can use this as a hint to optimize file caching. If an application moves the file pointer for random access, optimum caching may not occur; however, correct operation is still guaranteed.
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
    FileOptions.SequentialScan))
{
    var br = new BinaryReader(fs);
    var count = br.ReadInt32();
    var numbers = new List<long>();
    for (int i = count; i > 0; i--)
    {
        numbers.Add(br.ReadInt64());
    }
}
Try reading blocks instead:
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, 8192,
    FileOptions.SequentialScan))
{
    var br = new BinaryReader(fs);
    var numbersLeft = br.ReadInt32();
    byte[] buffer = new byte[8192];
    var bufferOffset = 0;
    long bytesLeftToReceive = sizeof(long) * (long)numbersLeft;
    var numbers = new List<long>();
    while (true)
    {
        // Do not read more than possible
        var bytesToRead = (int)Math.Min(bytesLeftToReceive, buffer.Length - bufferOffset);
        if (bytesToRead == 0)
            break;

        var bytesRead = fs.Read(buffer, bufferOffset, bytesToRead);
        if (bytesRead == 0)
            break; //TODO: Continue to read if file is not ready?

        //move forward in read counter
        bytesLeftToReceive -= bytesRead;
        bytesRead += bufferOffset; //include bytes from previous read.

        //decide how many complete numbers we got
        var numbersToCrunch = bytesRead / sizeof(long);

        //crunch them
        for (int i = 0; i < numbersToCrunch; i++)
        {
            numbers.Add(BitConverter.ToInt64(buffer, i * sizeof(long)));
        }

        // move the last incomplete number to the beginning of the buffer.
        var remainder = bytesRead % sizeof(long);
        Buffer.BlockCopy(buffer, bytesRead - remainder, buffer, 0, remainder);
        bufferOffset = remainder;
    }
}
Update in response to a comment:
May I know what's the reason that manual reading is faster than the other one?
I don't know how the BinaryReader is actually implemented, so this is just an assumption.
The actual read from the disk is not the expensive part. The expensive part is moving the reader arm to the correct position on the disk.
As your application isn't the only one reading from the hard drive, the disk has to re-position itself every time an application requests a read.
Thus, if the BinaryReader just reads the requested int, it has to wait on the disk for every read (if some other application makes a read in between).
As I read a much larger buffer directly (which is faster), I can process more integers without having to wait for the disk between reads.
Caching will of course speed things up a bit, and that's why it's "just" three times faster.
(future readers: If something above is incorrect, please correct me).
You can use a BufferedStream to increase the read buffer size.
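For instance (a sketch, not from the original answers), wrapping the FileStream in a BufferedStream keeps BinaryReader's 8-byte reads from hitting the file directly; the 1 MB buffer size here is an arbitrary choice:

// Serve BinaryReader's small reads from an in-memory buffer.
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
using (var bs = new BufferedStream(fs, 1024 * 1024))
using (var br = new BinaryReader(bs))
{
    int count = br.ReadInt32();
    var numbers = new List<long>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(br.ReadInt64());
    }
}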
In theory memory mapped files should help here. You could load it into memory using several very large chunks. Not sure though how much is this relevant when using SSDs.
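A minimal memory-mapped sketch, assuming the same count-prefixed file layout as the test code above (requires System.IO.MemoryMappedFiles):

using (var mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open))
using (var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
{
    // First 4 bytes hold the count, then the longs follow back to back.
    int count = accessor.ReadInt32(0);
    var numbers = new List<long>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(accessor.ReadInt64(4 + (long)i * sizeof(long)));
    }
}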

How Can I Edit Bytes As They Pass Through A Stream?

My goal is to have a file stream open a user-chosen file and then stream the file's bytes through in chunks (buffers) of about 4 MB (this can be changed; it's just for fun). As the bytes travel through the stream in chunks, I'd like a looping if-statement to check whether each byte's value is contained in an array I have declared elsewhere (the code below builds a random array for replacing bytes), and the replacement loop could look something like the bottom for-loop. As you can see I'm fairly fluent in this language, but for some reason editing and rewriting chunks as they are read from one file to a new one is eluding me. Thanks in advance!
private void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    const int chunkSize = 4096; // read the file by chunks of 4KB
    using (var file = File.OpenRead(textBox1.Text))
    {
        int bytesRead;
        var buffer = new byte[chunkSize];
        while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] newbytes = buffer;
            int index = 0;
            foreach (byte b in buffer)
            {
                for (int x = 0; x < 256; x++)
                {
                    if (buffer[index] == Convert.ToByte(lst[x]))
                    {
                        try
                        {
                            newbytes[index] = Convert.ToByte(lst[256 - x]);
                        }
                        catch (System.Exception ex)
                        {
                            //just to show why the error was thrown, but not really helpful..
                            MessageBox.Show(index + ", " + newbytes.Count().ToString());
                        }
                    }
                }
                index++;
            }
            AppendAllBytes(textBox1.Text + ".ENC", newbytes);
        }
    }
}

private void GenNewKey()
{
    Random rnd = new Random();
    while (lst.Count < 256)
    {
        int x = rnd.Next(0, 255);
        if (!lst.Contains(x))
        {
            lst.Add(x);
        }
    }
    foreach (int x in lst)
    {
        textBox2.Text += ", " + x.ToString();
        //just for me to see what was generated
    }
}

public static void AppendAllBytes(string path, byte[] bytes)
{
    if (!File.Exists(path + ".ENC"))
    {
        File.Create(path + ".ENC");
    }
    using (var stream = new FileStream(path, FileMode.Append))
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}
Where textBox1 holds the path and name of the file to encrypt, textBox2 holds the generated cipher for personal debugging purposes, button2 is the encrypt button, and of course I am using System.IO.
Indeed, you have an off-by-one error in newbytes[index] = Convert.ToByte(lst[256 - x]):
if x is 0 you will access lst[256], but lst only has indexes 0-255. Changing 256 to 255 should fix it.
The reason it freezes up is that your program is EXTREMELY inefficient and runs on the UI thread. It also has a few more errors: you should only process up to bytesRead bytes of buffer, otherwise you get extra data in your output that should not be there. Also, you are reusing the same array for buffer and newbytes, so your inner for loop can modify the same index more than once, because every time you do newbytes[index] = Convert.ToByte(lst[256 - x]) you are also modifying buffer[index], which will get checked again on the next iteration of the for loop.
There are a lot of ways you can improve your code. Here is a snippet that does something similar to what you are doing (I don't do the whole "find the index and use the opposite location"; I just use the byte that is passed in as the index into the array).
while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
{
    byte[] newbytes = new byte[bytesRead];
    for (int i = 0; i < newbytes.Length; i++)
    {
        newbytes[i] = (byte)lst[buffer[i]];
    }
    AppendAllBytes(textBox1.Text + ".ENC", newbytes);
}
This may also lead to freezing, but not as much. To solve the freezing you should put all of this code into a BackgroundWorker or similar to run on another thread.
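As a rough sketch of the "or similar" option (not from the original answer; EncryptFile is a hypothetical helper that would contain the read/translate/append loop above, and Task.Run requires System.Threading.Tasks):

// Hypothetical wiring: run the encryption work off the UI thread so the form stays responsive.
private async void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    string path = textBox1.Text;             // capture UI state on the UI thread
    await Task.Run(() => EncryptFile(path)); // EncryptFile: hypothetical helper holding the loop above
    MessageBox.Show("Done");
}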

Reading a file one byte at a time in reverse order

Hi, I am trying to read a file one byte at a time in reverse order. So far I have only managed to read the file from beginning to end and write it to another file.
I need to be able to read the file from the end to the beginning and print it to another file.
This is what I have so far:
string fileName = Console.ReadLine();

using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    //file.Seek(endOfFile, SeekOrigin.End);
    int bytes;
    using (FileStream newFile = new FileStream("newsFile.txt", FileMode.Create, FileAccess.Write))
    {
        while ((bytes = file.ReadByte()) >= 0)
        {
            Console.WriteLine(bytes.ToString());
            newFile.WriteByte((byte)bytes);
        }
    }
}
I know that I have to use the Seek method on the FileStream, and that gets me to the end of the file. I already did that in the commented portion of the code, but I do not know how to read the file in the while loop now.
How can I achieve this?
string fileName = Console.ReadLine();

using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    byte[] output = new byte[file.Length]; // reversed file

    // read the file backwards using SeekOrigin.Current
    long offset;
    file.Seek(0, SeekOrigin.End);
    for (offset = 0; offset < file.Length; offset++)
    {
        file.Seek(-1, SeekOrigin.Current);
        output[offset] = (byte)file.ReadByte();
        file.Seek(-1, SeekOrigin.Current);
    }

    // write entire reversed file array to new file
    File.WriteAllBytes("newsFile.txt", output);
}
You could do it by reading one byte at a time, or you could read a larger buffer, write it to the output file in reverse, and continue like that until you've reached the beginning of the file. For example:
string inputFilename = "inputFile.txt";
string outputFilename = "outputFile.txt";

using (var ofile = File.OpenWrite(outputFilename))
{
    using (var ifile = File.OpenRead(inputFilename))
    {
        int bufferSize = 4096;
        byte[] buffer = new byte[bufferSize];
        long filePos = ifile.Length;
        do
        {
            long newPos = Math.Max(0, filePos - bufferSize);
            int bytesToRead = (int)(filePos - newPos);
            ifile.Seek(newPos, SeekOrigin.Begin);
            int bytesRead = ifile.Read(buffer, 0, bytesToRead);

            // write the buffer to the output file, in reverse
            for (int i = bytesRead - 1; i >= 0; --i)
            {
                ofile.WriteByte(buffer[i]);
            }
            filePos = newPos;
        } while (filePos > 0);
    }
}
An obvious optimization would be to reverse the buffer after you've read it, and then write it in one whole chunk to the output file.
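That optimization could look roughly like this inside the loop above, replacing the byte-by-byte inner for loop (a sketch, not part of the original answer):

// Reverse the chunk in place, then write it with a single call.
Array.Reverse(buffer, 0, bytesRead);
ofile.Write(buffer, 0, bytesRead);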
And if you know that the file will fit into memory, it's really easy:
var buffer = File.ReadAllBytes(inputFilename);

// now, reverse the buffer
int i = 0;
int j = buffer.Length - 1;
while (i < j)
{
    byte b = buffer[i];
    buffer[i] = buffer[j];
    buffer[j] = b;
    ++i;
    --j;
}

// and write it
File.WriteAllBytes(outputFilename, buffer);
If the file is small (fits in your RAM) then this would work:
public static IEnumerable<byte> Reverse(string inputFilename)
{
    var bytes = File.ReadAllBytes(inputFilename);
    Array.Reverse(bytes);
    foreach (var b in bytes)
    {
        yield return b;
    }
}
Usage:
foreach (var b in Reverse("smallfile.dat"))
{
}
If the file is large (bigger than your RAM) then this would work:
using (var inputFile = File.OpenRead("bigfile.dat"))
using (var inputFileReversed = new ReverseStream(inputFile))
using (var binaryReader = new BinaryReader(inputFileReversed))
{
    while (binaryReader.BaseStream.Position != binaryReader.BaseStream.Length)
    {
        var b = binaryReader.ReadByte();
    }
}
It uses the ReverseStream class which can be found here.

insert hex data at specific offset

I need to be able to insert audio data into existing ac3 files. AC3 files are pretty simple and can be appended to each other without stripping headers or anything. The problem I have is that if you want to add/overwrite/erase a chunk of an ac3 file, you have to do it in 32ms increments, and each 32ms is equal to 1536 bytes of data. So when I insert a data chunk (which must be 1536 bytes, as I just said), I need to find the nearest offset that is divisible by 1536 (like 0, 1536 (0x600), 3072 (0xC00), etc). Let's say I can figure that out. I've read about changing a particular character at a specific offset, but I need to INSERT (not overwrite) that entire 1536-byte data chunk. How would I do that in C#, given the starting offset and the 1536-byte data chunk?
Edit: The data chunk I want to insert is basically just 32ms of silence, and I have the hex, ASCII and ANSI text translations of it. Of course, I may want to insert this chunk multiple times to get 128ms of silence instead of just 32, for example.
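For the offset-rounding step the question sets aside, a one-line sketch (integer division; rawOffset is a hypothetical variable, not from the original post):

// Round an arbitrary byte position down to the nearest 1536-byte frame boundary.
long alignedOffset = (rawOffset / 1536) * 1536;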
byte[] filbyte = File.ReadAllBytes(@"C:\abc.ac3");
byte[] tobeinserted = ; //allocate in your way using encoding whatever
byte[] total = new byte[filbyte.Length + tobeinserted.Length];

int i = 0, j = 0;
while (i < total.Length)
{
    if (i == 1536 * pos) //make pos your choice
    {
        while (j < tobeinserted.Length)
            total[i++] = tobeinserted[j++];
    }
    else
    {
        total[i] = filbyte[i - j];
        i++;
    }
}
File.WriteAllBytes(@"C:\abc.ac3", total);
Here is the helper method that will do what you need:
public static void Insert(string filepath, int insertOffset, Stream dataToInsert)
{
    var newFilePath = filepath + ".tmp";
    using (var source = File.OpenRead(filepath))
    using (var destination = File.OpenWrite(newFilePath))
    {
        CopyTo(source, destination, insertOffset);  // first copy the data before the insert
        dataToInsert.CopyTo(destination);           // write the data that needs to be inserted
        CopyTo(source, destination, (int)(source.Length - insertOffset)); // copy the remaining data
    }

    // delete the old file and rename the new one:
    File.Delete(filepath);
    File.Move(newFilePath, filepath);
}

private static void CopyTo(Stream source, Stream destination, int count)
{
    const int bufferSize = 32 * 1024;
    var buffer = new byte[bufferSize];
    var remaining = count;
    while (remaining > 0)
    {
        var toCopy = remaining > bufferSize ? bufferSize : remaining;
        var actualRead = source.Read(buffer, 0, toCopy);
        destination.Write(buffer, 0, actualRead);
        remaining -= actualRead;
    }
}
And here is an NUnit test with example usage:
[Test]
public void TestInsert()
{
    var originalString = "some original text";
    var insertString = "_ INSERTED TEXT _";
    var insertOffset = 8;

    var file = @"c:\someTextFile.txt";
    if (File.Exists(file))
        File.Delete(file);

    using (var originalData = new MemoryStream(Encoding.ASCII.GetBytes(originalString)))
    using (var f = File.OpenWrite(file))
        originalData.CopyTo(f);

    using (var dataToInsert = new MemoryStream(Encoding.ASCII.GetBytes(insertString)))
        Insert(file, insertOffset, dataToInsert);

    var expectedText = originalString.Insert(insertOffset, insertString);
    var actualText = File.ReadAllText(file);
    Assert.That(actualText, Is.EqualTo(expectedText));
}
Be aware that I have removed some checks for code clarity - do not forget to check for null, file access permissions, and file size. For example, insertOffset can be bigger than the file length; this condition is not checked here.
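For instance, a couple of the guards the answer alludes to might look like this (a sketch under my own assumptions, not the author's code):

// Hypothetical pre-checks before doing the copy work in Insert():
if (dataToInsert == null)
    throw new ArgumentNullException(nameof(dataToInsert));
if (insertOffset < 0 || insertOffset > new FileInfo(filepath).Length)
    throw new ArgumentOutOfRangeException(nameof(insertOffset));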
