What is different with the writing in FileStream? (C#)

While searching for ways to decompress a file using SharpZipLib, I found a lot of methods like this:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            byte[] data = new byte[2048];
            while (true)
            {
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    fileWrite.Write(data, 0, size);
                }
                else
                {
                    break;
                }
            }
            fileWrite.Close();
        }
    }
}
The signature of FileStream.Write is:
FileStream.Write(byte[] array, int offset, int count)
Now I am trying to separate the read and write parts, because I want to use a thread to speed up the decompression rate in the write function, and I use a List<byte[]> and a List<int> to hold the file's data and sizes, like below.
Read:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            List<int> SizeList = new List<int>();
            List<byte[]> mydatalist = new List<byte[]>();
            while (true)
            {
                byte[] data = new byte[2048];
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    mydatalist.Add(data);
                    SizeList.Add(size);
                }
                else
                {
                    break;
                }
            }
            test = new Thread(() =>
                FileWriteFun(pathToTar, args, SizeList, mydatalist)
            );
            test.Start();
            streamWriter.Close();
        }
    }
}
Write:
public static void FileWriteFun(string pathToTar, string[] args, List<int> SizeList, List<byte[]> mydataList)
{
    //some codes here
    using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
    {
        for (int i = 0; i < mydataList.Count; i++)
        {
            fileWrite.Write(mydataList[i], 0, SizeList[i]);
        }
        fileWrite.Close();
    }
}
Edit
(1) Moved byte[] data = new byte[2048] into the while loop so that data is assigned to a new array on each read.
(2) Changed int[] SizeList = new int[2048] to List<int> SizeList = new List<int>() because the fixed-size array limits the number of entries.

As a read on a stream is only guaranteed to return at least one byte (typically it will be more, but you can't rely on getting the full requested length each time), your solution can theoretically fail after 2048 bytes, as your SizeList can only hold 2048 entries.
You could use a List to hold the sizes.
Or use a MemoryStream instead of inventing your own.
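For illustration, a minimal sketch of the MemoryStream suggestion, dropped into the question's loop (s and fileWrite are the names from the posted code; everything else is assumed context):

// Sketch: let a MemoryStream accumulate the decompressed entry,
// then write the finished array out in one call.
using (MemoryStream ms = new MemoryStream())
{
    byte[] data = new byte[2048];
    int size;
    while ((size = s.Read(data, 0, data.Length)) > 0)
    {
        ms.Write(data, 0, size);          // MemoryStream tracks the total length for you
    }
    byte[] wholeFile = ms.ToArray();      // one array instead of List<byte[]> + List<int>
    fileWrite.Write(wholeFile, 0, wholeFile.Length);
}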
But the two main problems are:
1) You keep reading into the same byte array, overwriting previously read data. When you add your data byte array to mydatalist, you must assign data to a new byte array.
2) You close your stream before the second thread is done writing.
In general, threading is difficult and should only be used where you know it will improve performance. Simply reading and writing data is typically IO bound, not CPU bound, so introducing a second thread just adds a small performance penalty and no gain in speed. You could use multithreading to ensure concurrent read/write operations, but most likely the disk cache will do this for you if you stick to the first solution; and if not, async is easier than multithreading for achieving this.
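To make the async suggestion concrete, here is a rough sketch (not the poster's code; it assumes .NET 4.5+ and that it runs inside an async method, with tarfile and the target path taken from the question):

// Sketch: let the framework overlap reads and writes instead of hand-rolling a thread.
using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
using (FileStream fileWrite = new FileStream(
    targetDir + directoryName + fileName,
    FileMode.Create, FileAccess.Write, FileShare.None,
    bufferSize: 4096, useAsync: true))
{
    await s.CopyToAsync(fileWrite);   // reads from the tar stream and writes the file asynchronously
}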

Related

Channels & Memory Management Strategies for Large Objects

I'm trying to determine how to best implement .Net Core 3 Channels and whether it's a good idea to pass very large objects between tasks. In my example, one task that is very fast can read in a 1GB chunk from a very large file. A number of consumer tasks can read a chunk from the channel and process them in parallel, as processing is much slower and needs parallel (multi-threaded) execution.
In testing my code, there is a massive amount of GC happening and total RAM used far exceeds the sum of all data waiting in one bounded channel and all executing tasks. I've simplified my code down to the most basic example hoping someone can give me some tips on how to better allocate/manage memory or if this approach is a good idea?
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

namespace MergeSort
{
    public class Example
    {
        private Channel<byte[]> _channelProcessing;

        public async Task DoSort(int queueDepth, int parallelTaskCount)
        {
            // Hard-code some values so we can talk about details
            queueDepth = 2;
            parallelTaskCount = 8;
            _channelProcessing = Channel.CreateBounded<byte[]>(queueDepth);
            Task[] processingTasks = new Task[parallelTaskCount];
            int outputBufferSize = 1024 * 1024;
            for (int x = 0; x < parallelTaskCount; x++)
            {
                string outputFile = $"C:\\Output.{x:00000000}.txt";
                processingTasks[x] = Task.Run(() => ProcessChunkAsync(outputBufferSize));
            }
            // Task puts unsorted chunks on the channel
            string inputFile = "C:\\Input.txt";
            int chunkSize = 1024 * 1024 * 1024; // 1GiB
            Task inputTask = Task.Run(() => ReadInputAsync(inputFile, chunkSize));
            // Wait for all tasks building chunk files to complete before continuing
            await inputTask;
            await Task.WhenAll(processingTasks);
        }

        private async Task ReadInputAsync(string inputFile, int chunkSize)
        {
            int bytesRead = 0;
            byte[] chunkBuffer = new byte[chunkSize];
            using (FileStream fileStream = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                // Read chunks until input EOF
                while (fileStream.Position != fileStream.Length)
                {
                    bytesRead = fileStream.Read(chunkBuffer, 0, chunkBuffer.Length);
                    // Fake code here to simulate the work I need to do, showing outBuffer.Length is calculated at runtime
                    Random rnd = new Random();
                    int runtimeCalculatedAmount = rnd.Next(100, 600);
                    byte[] tempBuffer = new byte[runtimeCalculatedAmount];
                    // Create the buffer with a slightly variable size that needs to be passed to the channel for the next task
                    byte[] outBuffer = new byte[1024 * 1024 * 1024 + runtimeCalculatedAmount];
                    Array.Copy(chunkBuffer, outBuffer, bytesRead);
                    Array.Copy(tempBuffer, 0, outBuffer, bytesRead, tempBuffer.Length);
                    await _channelProcessing.Writer.WriteAsync(outBuffer);
                    outBuffer = null;
                }
            }
            // Not sure if it's safe to .Complete() before consumers have read all data from channel?
            _channelProcessing.Writer.Complete();
        }

        private async Task ProcessChunkAsync(int outputBufferSize)
        {
            while (await _channelProcessing.Reader.WaitToReadAsync())
            {
                if (_channelProcessing.Reader.TryRead(out byte[] inBuffer))
                {
                    // myBigThing is also a very large object (result of processing inBuffer and slightly larger)
                    MyBigThing myBigThing = new MyBigThing(inBuffer);
                    inBuffer = null;
                    // Create file and write all rows
                    using (FileStream fileStream = File.Create("C:\\Output.txt", outputBufferSize, FileOptions.SequentialScan))
                    {
                        // Write myBigThing to output file
                        fileStream.Write(myBigThing.Data);
                    }
                    myBigThing = null;
                }
            }
        }
    }
}
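One commonly suggested direction for this kind of pipeline, sketched below only as an illustration (the _freeBuffers / _filledBuffers names are invented here), is to pre-allocate a fixed set of chunk buffers and recycle them through a second bounded channel, so the process never holds more than queueDepth + parallelTaskCount of the large arrays at once:

// Sketch only: recycle large buffers instead of allocating a new 1 GiB array per chunk.
// Because a recycled buffer contains stale data past the current chunk, the payload
// channel carries the valid length alongside the buffer.
private Channel<byte[]> _freeBuffers;
private Channel<(byte[] Buffer, int Length)> _filledBuffers;

private void CreateBufferPool(int bufferCount, int bufferSize)
{
    _freeBuffers = Channel.CreateBounded<byte[]>(bufferCount);
    _filledBuffers = Channel.CreateBounded<(byte[] Buffer, int Length)>(bufferCount);
    for (int i = 0; i < bufferCount; i++)
        _freeBuffers.Writer.TryWrite(new byte[bufferSize]);   // allocated once, reused for the whole run
}

// Producer side (inside the read loop):
//     byte[] buffer = await _freeBuffers.Reader.ReadAsync();
//     int bytesRead = fileStream.Read(buffer, 0, buffer.Length);
//     await _filledBuffers.Writer.WriteAsync((buffer, bytesRead));
//
// Consumer side (after processing a chunk):
//     await _freeBuffers.Writer.WriteAsync(chunk.Buffer);    // hand the buffer back for reuse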

C# Garbage Collection Weird Behavior

I have a block of code that loads a custom storage file (data.00x) and dumps its file contents (several files...). [For this example we'll say the referenced index only contains data.001 files for export.]
Example:
public void ExportFileEntries(ref List<IndexEntry> filteredIndex, string dataDirectory, string buildDirectory, int chunkSize)
{
    OnTotalMaxDetermined(new TotalMaxArgs(8));
    // For each set of dataId files in the filteredIndex
    for (int dataId = 1; dataId < 8; dataId++)
    {
        OnTotalProgressChanged(new TotalChangedArgs(dataId, string.Format("Exporting selected files from data.00{0}", dataId)));
        // Filter only entries with current dataId into temp index
        List<IndexEntry> tempIndex = GetEntriesByDataId(ref filteredIndex, dataId, SortType.Offset);
        // Determine the path of the data.xxx file being exported from
        string dataPath = string.Format(@"{0}\data.00{1}", dataDirectory, dataId);
        if (File.Exists(dataPath))
        {
            // Load the data.xxx into a filestream
            using (FileStream dataFs = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
            {
                // Loop through files to export
                foreach (IndexEntry indexEntry in tempIndex)
                {
                    int fileLength = indexEntry.Length;
                    OnCurrentMaxDetermined(new CurrentMaxArgs(fileLength));
                    // Set the filestream's position to the file entry's offset
                    dataFs.Position = indexEntry.Offset;
                    // Read the file into a byte array (buffer)
                    byte[] fileBytes = new byte[indexEntry.Length];
                    dataFs.Read(fileBytes, 0, fileBytes.Length);
                    // Define some information about the file being exported
                    string fileExt = Path.GetExtension(indexEntry.Name).Remove(0, 1);
                    string buildPath = string.Format(@"{0}\{1}\{2}", buildDirectory, fileExt.ToUpper(), indexEntry.Name);
                    // If needed, unencrypt the data (fileBytes buffer)
                    if (XOR.Encrypted(fileExt)) { byte b = 0; XOR.Cipher(ref fileBytes, ref b); }
                    // If no chunkSize is provided, generate default
                    if (chunkSize == 0) { chunkSize = Math.Max(64000, (int)(fileBytes.Length * .02)); }
                    // If the build directory doesn't exist yet, create it.
                    if (!Directory.Exists(Path.GetDirectoryName(buildPath))) { Directory.CreateDirectory(Path.GetDirectoryName(buildPath)); }
                    using (FileStream buildFs = new FileStream(buildPath, FileMode.Create, FileAccess.Write))
                    {
                        using (BinaryWriter bw = new BinaryWriter(buildFs, encoding))
                        {
                            for (int byteCount = 0; byteCount < fileLength; byteCount += Math.Min(fileLength - byteCount, chunkSize))
                            {
                                bw.Write(fileBytes, byteCount, Math.Min(fileLength - byteCount, chunkSize));
                                OnCurrentProgressChanged(new CurrentChangedArgs(byteCount, ""));
                            }
                        }
                    }
                    OnCurrentProgressReset(EventArgs.Empty);
                    fileBytes = null;
                }
            }
        }
        else { OnError(new ErrorArgs(string.Format("[ExportFileEntries] Cannot locate: {0}", dataPath))); }
    }
    OnTotalProgressReset(EventArgs.Empty);
    GC.Collect();
}
The data.001 stores about 12k files, most of which are very small .jpg pictures, etc. For about the first half of the export process the GC collects just fine, but out of nowhere, toward the last half of the export process, the GC just stops giving a crap.
If I don't issue GC.Collect() at the end of the method the tool sits at around 255 MB of RAM, but if I do call it, it goes down to about 14 MB. What I'm asking is: are there any obvious improvements over the way I coded the method (to increase GC performance)?
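One standard way to keep the working set down in a loop like this, sketched here only as an illustration (it assumes the XOR step can be applied per chunk or skipped), is to copy each entry through a single reusable buffer instead of allocating a fresh byte[] per file:

// Sketch: copy `length` bytes from `source` (already positioned at the entry's offset)
// to `destination`, reusing one scratch buffer across all 12k entries.
static void CopyEntry(Stream source, Stream destination, int length, byte[] scratch)
{
    int remaining = length;
    while (remaining > 0)
    {
        int read = source.Read(scratch, 0, Math.Min(scratch.Length, remaining));
        if (read <= 0)
            throw new EndOfStreamException();
        destination.Write(scratch, 0, read);
        remaining -= read;
    }
}

// Hypothetical usage: allocate the scratch buffer once, outside the foreach loop.
// byte[] scratch = new byte[64 * 1024];
// CopyEntry(dataFs, buildFs, indexEntry.Length, scratch);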

Multithreading file compress

I've just started to work with threads, and I want to write a simple file compressor. It should create two background threads: one for reading and one for writing. The first one should read the file in small chunks and put them into a Queue<KeyValuePair<int, byte[]>>, where the int is the chunkId. The second thread should dequeue chunks and write them down in order (using chunkId) into the output stream (a file which this thread creates at the beginning).
I did it, but I can't understand why, after my program ends and I open my gzipped file, I see that my chunks are mixed and the file doesn't have its previous order.
public static class Reader
{
    private static readonly object Locker = new object();
    private const int ChunkSize = 1024*1024;
    private static readonly int MaxThreads;
    private static readonly Queue<KeyValuePair<int, byte[]>> ChunksQueue;
    private static int _chunksComplete;

    static Reader()
    {
        MaxThreads = Environment.ProcessorCount;
        ChunksQueue = new Queue<KeyValuePair<int,byte[]>>(MaxThreads);
    }

    public static void Read(string filename)
    {
        _chunksComplete = 0;
        var tRead = new Thread(Reading) { IsBackground = true };
        var tWrite = new Thread(Writing) { IsBackground = true };
        tRead.Start(filename);
        tWrite.Start(filename);
        tRead.Join();
        tWrite.Join();
        Console.WriteLine("Finished");
    }

    private static void Writing(object threadContext)
    {
        var filename = (string) threadContext;
        using (var s = File.Create(filename + ".gz"))
        {
            while (true)
            {
                var dataPair = DequeueSafe();
                if (dataPair.Value == null)
                    return;
                while (dataPair.Key != _chunksComplete)
                {
                    Thread.Sleep(1);
                }
                Console.WriteLine("write chunk {0}", dataPair.Key);
                using (var gz = new GZipStream(s, CompressionMode.Compress, true))
                {
                    gz.Write(dataPair.Value, 0, dataPair.Value.Length);
                }
                _chunksComplete++;
            }
        }
    }

    private static void Reading(object threadContext)
    {
        var filename = (string) threadContext;
        using (var s = File.OpenRead(filename))
        {
            var counter = 0;
            var buffer = new byte[ChunkSize];
            while (s.Read(buffer, 0, buffer.Length) != 0)
            {
                while (ChunksQueue.Count == MaxThreads)
                {
                    Thread.Sleep(1);
                }
                Console.WriteLine("read chunk {0}", counter);
                var dataPair = new KeyValuePair<int, byte[]>(counter, buffer);
                EnqueueSafe(dataPair);
                counter++;
            }
            EnqueueSafe(new KeyValuePair<int, byte[]>(0, null));
        }
    }

    private static void EnqueueSafe(KeyValuePair<int, byte[]> dataPair)
    {
        lock (ChunksQueue)
        {
            ChunksQueue.Enqueue(dataPair);
        }
    }

    private static KeyValuePair<int, byte[]> DequeueSafe()
    {
        while (true)
        {
            lock (ChunksQueue)
            {
                if (ChunksQueue.Count > 0)
                {
                    return ChunksQueue.Dequeue();
                }
            }
            Thread.Sleep(1);
        }
    }
}
UPD:
I can use only .NET 3.5
Stream.Read() returns the actual number of bytes it read. Use it to limit the size of the chunk handed to the writer. And since there is concurrent reading and writing involved, you'll need more than one buffer.
Try 4096 as the chunk size.
Reader:
var buffer = new byte[ChunkSize];
int bytesRead = s.Read(buffer, 0, buffer.Length);
while (bytesRead != 0)
{
    ...
    var dataPair = new KeyValuePair<int, byte[]>(bytesRead, buffer);
    buffer = new byte[ChunkSize];
    bytesRead = s.Read(buffer, 0, buffer.Length);
}
Writer:
gz.Write(dataPair.Value, 0, dataPair.Key)
PS: Performance can be improved by adding a pool of free data buffers instead of allocating a new one each time, and by using events (e.g. ManualResetEvent) to signal "queue is empty" / "queue is full" instead of using Thread.Sleep().
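A rough sketch of that event-based signalling, under the question's .NET 3.5 constraint (the NotEmpty / NotFull names are invented here; with a single reader thread and a single writer thread the pattern is sufficient):

// Sketch: replace the Thread.Sleep polling with ManualResetEvent signalling around the shared queue.
private static readonly ManualResetEvent NotEmpty = new ManualResetEvent(false);
private static readonly ManualResetEvent NotFull = new ManualResetEvent(true);

private static void EnqueueSafe(KeyValuePair<int, byte[]> dataPair)
{
    NotFull.WaitOne();                       // block while the queue is full
    lock (ChunksQueue)
    {
        ChunksQueue.Enqueue(dataPair);
        if (ChunksQueue.Count >= MaxThreads)
            NotFull.Reset();                 // queue is full, the producer must wait next time
        NotEmpty.Set();                      // there is now something to dequeue
    }
}

private static KeyValuePair<int, byte[]> DequeueSafe()
{
    while (true)
    {
        NotEmpty.WaitOne();                  // block until something has been enqueued
        lock (ChunksQueue)
        {
            if (ChunksQueue.Count > 0)
            {
                var item = ChunksQueue.Dequeue();
                if (ChunksQueue.Count == 0)
                    NotEmpty.Reset();        // nothing left, the consumer must wait next time
                NotFull.Set();               // there is room again for the producer
                return item;
            }
        }
    }
}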
While alexm's answer does bring up a very important point, that Stream.Read may fill the buffer with fewer bytes than you requested, the main problem you have is that you only have one byte[] that you keep using over and over again.
When your reading loop goes to read the 2nd value, it overwrites the byte[] that is sitting inside the dataPair you passed to the queue. You must have buffer = new byte[ChunkSize]; inside the loop to solve this problem. You also must record how many bytes were read and only write that same number of bytes.
You don't need to keep the counter in the pair, as a Queue will maintain the order; use the int in the pair to store the number of bytes read, as in alexm's example.

How to divide an array in C#?

I have to write a program that reads an image and puts it into a byte array:
var Imagenoriginal = File.ReadAllBytes("10M.bmp");
and divide that byte array into 3 different arrays, in order to send each of these new arrays to another computer (using pipes) to process them there, and finally bring them back to the original computer to give the result.
But my question is: how do I write an algorithm able to divide the byte array into three different byte arrays if the selected image can have different sizes?
Thanks for your help, have a nice day. =)
You can divide the length of the array, so that you have three integers n1, n2 and n3 which sum up to array.Length. Then this snippet, using LINQ, should be of help:
var arr1 = sourceArray.Take(n1).ToArray();
var arr2 = sourceArray.Skip(n1).Take(n2).ToArray();
var arr3 = sourceArray.Skip(n1+n2).Take(n3).ToArray();
Now arr1, arr2 and arr3 hold the three parts of your source array. Since this uses LINQ, don't forget using System.Linq; at the beginning of the file.
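For completeness, the three lengths for an arbitrarily sized image can be computed like this (a small sketch; the remainder simply goes to the last part):

int n1 = sourceArray.Length / 3;
int n2 = sourceArray.Length / 3;
int n3 = sourceArray.Length - n1 - n2;   // last part absorbs the remainder, so n1 + n2 + n3 == sourceArray.Length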
You can try it like this:
public static IEnumerable<IEnumerable<T>> DivideArray<T>(this T[] array, int size)
{
    for (var i = 0; i < (float)array.Length / size; i++)
    {
        yield return array.Skip(i * size).Take(size);
    }
}
And the call looks like this:
var arr = new byte[] { 1, 2, 3, 4, 5, 6 };
var dividedArray = arr.DivideArray(3);
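If concrete arrays are needed (for example, to hand each part to a pipe), the lazy result can be materialized, e.g.:

byte[][] parts = dividedArray.Select(part => part.ToArray()).ToArray();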
Here is a LINQ approach:
public List<List<byte>> DivideArray(List<byte> arr)
{
    return arr
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 100)
        .Select(x => x.Select(v => v.Value).ToList())
        .ToList();
}
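Note that the hard-coded 100 here is the chunk size (100 bytes per group), not the number of parts. To get three parts, the divisor would have to be derived from the array length, for example (a sketch; for arrays of five or more elements this yields exactly three groups):

int partSize = (int)Math.Ceiling(arr.Count / 3.0);   // roughly a third of the array per group
var parts = arr
    .Select((x, i) => new { Index = i, Value = x })
    .GroupBy(x => x.Index / partSize)
    .Select(g => g.Select(v => v.Value).ToList())
    .ToList();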
Have you considered using Streams? You could extend the Stream class to provide the desired behavior, as follows:
public static class Utilities
{
    public static IEnumerable<byte[]> ReadBytes(this Stream input, long bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            using (MemoryStream tempStream = new MemoryStream())
            {
                tempStream.Write(buffer, 0, read);
                yield return tempStream.ToArray();
            }
        }
    }

    public static IEnumerable<byte[]> ReadBlocks(this Stream input, int nblocks)
    {
        long bufferSize = (long)Math.Ceiling((double)input.Length / nblocks);
        return input.ReadBytes(bufferSize);
    }
}
The ReadBytes extension method reads the input Stream and returns its data as a sequence of byte arrays, using the specified bufferSize.
The ReadBlocks extension method calls ReadBytes with the appropriate buffer size, so that the number of elements in the sequence equals nblocks.
You could then use ReadBlocks to achieve what you want:
public class Program
{
    static void Main()
    {
        FileStream inputStream = File.Open(@"E:\10M.bmp", FileMode.Open);
        const int nblocks = 3;
        foreach (byte[] block in inputStream.ReadBlocks(nblocks))
        {
            Console.WriteLine("{0} bytes", block.Length);
        }
    }
}
Note how ReadBytes uses tempStream and read to buffer in memory the bytes read from the input stream before converting them into an array of bytes; this solves the problem with the leftover bytes mentioned in the comments.

How Can I Edit Bytes As They Pass Through A Stream?

My goal is to have a FileStream open a user-chosen file and then stream its bytes through in chunks (buffers) of about 4 MB (this can be changed; it's just for fun). As the bytes travel through the stream in chunks, I'd like a looping if-statement to check whether each byte's value is contained in an array I have declared elsewhere. (The code below builds a random array for replacing bytes, and the replacement loop could look something like the bottom for-loop.) As you can see I'm fairly fluent in this language, but for some reason the editing and rewriting of chunks as they are read from one file to a new one is eluding me. Thanks in advance!
private void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    const int chunkSize = 4096; // read the file by chunks of 4KB
    using (var file = File.OpenRead(textBox1.Text))
    {
        int bytesRead;
        var buffer = new byte[chunkSize];
        while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] newbytes = buffer;
            int index = 0;
            foreach (byte b in buffer)
            {
                for (int x = 0; x < 256; x++)
                {
                    if (buffer[index] == Convert.ToByte(lst[x]))
                    {
                        try
                        {
                            newbytes[index] = Convert.ToByte(lst[256 - x]);
                        }
                        catch (System.Exception ex)
                        {
                            //just to show why the error was thrown, but not really helpful..
                            MessageBox.Show(index + ", " + newbytes.Count().ToString());
                        }
                    }
                }
                index++;
            }
            AppendAllBytes(textBox1.Text + ".ENC", newbytes);
        }
    }
}

private void GenNewKey()
{
    Random rnd = new Random();
    while (lst.Count < 256)
    {
        int x = rnd.Next(0, 255);
        if (!lst.Contains(x))
        {
            lst.Add(x);
        }
    }
    foreach (int x in lst)
    {
        textBox2.Text += ", " + x.ToString();
        //just for me to see what was generated
    }
}

public static void AppendAllBytes(string path, byte[] bytes)
{
    if (!File.Exists(path + ".ENC"))
    {
        File.Create(path + ".ENC");
    }
    using (var stream = new FileStream(path, FileMode.Append))
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}
Here textBox1 holds the path and name of the file to encrypt, textBox2 holds the generated cipher for personal debugging purposes, button two is the encrypt button, and of course I am using System.IO.
Indeed, you have an off-by-one error in newbytes[index] = Convert.ToByte(lst[256 - x]): if x is 0 you will access lst[256], but lst only has indices 0-255. Changing that 256 to 255 should fix it.
The reason it freezes up is that your program is EXTREMELY inefficient and works on the UI thread. It also has a few more errors: you should only process up to bytesRead bytes of buffer, otherwise you get extra data in your output that should not be there. Also, you are reusing the same array for buffer and newbytes, so your inner for loop can modify the same index more than once, because every time you do newbytes[index] = Convert.ToByte(lst[256 - x]) you are also modifying buffer[index], which will be checked again on the next iteration of the loop.
There are a lot of ways you can improve your code. Here is a snippet that does something similar to what you are doing (I don't do the whole "find the index and use the opposite location" step; I just use the byte that is passed in as the index into the array):
while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
{
    byte[] newbytes = new byte[bytesRead];
    for (int i = 0; i < newbytes.Length; i++)
    {
        newbytes[i] = (byte)lst[buffer[i]];
    }
    AppendAllBytes(textBox1.Text + ".ENC", newbytes);
}
This may also lead to freezing, but not as much. To solve the freezing, you should put all of this code into a BackgroundWorker or similar so it runs on another thread.
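A minimal sketch of the BackgroundWorker suggestion (the handler names are invented; it assumes a standard WinForms form and using System.ComponentModel;):

// Sketch: move the file processing off the UI thread with a BackgroundWorker.
private readonly BackgroundWorker _worker = new BackgroundWorker();

private void Form1_Load(object sender, EventArgs e)
{
    _worker.DoWork += Worker_DoWork;                    // runs on a thread-pool thread
    _worker.RunWorkerCompleted += Worker_Completed;     // runs back on the UI thread
}

private void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    _worker.RunWorkerAsync(textBox1.Text);              // pass the path; don't touch controls inside DoWork
}

private void Worker_DoWork(object sender, DoWorkEventArgs e)
{
    string path = (string)e.Argument;
    // ... the read/translate/AppendAllBytes loop from the answer above goes here, using `path` ...
}

private void Worker_Completed(object sender, RunWorkerCompletedEventArgs e)
{
    MessageBox.Show("Encryption finished");             // safe: this callback is on the UI thread
}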
