My goal is to have a file stream open a user-chosen file and then stream the file's bytes through in chunks (buffers) of about 4 MB (this can be changed; it's just for fun). As the bytes travel through the stream in chunks, I'd like a loop to check whether each byte's value is contained in an array I have declared elsewhere. (The code below builds a random array for replacing bytes.) The replacement loop could just look something like the bottom for-loop. As you can see I'm fairly fluent in this language, but for some reason the editing and rewriting of chunks as they are read from one file into a new one is eluding me. Thanks in advance!
private void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    const int chunkSize = 4096; // read the file by chunks of 4KB
    using (var file = File.OpenRead(textBox1.Text))
    {
        int bytesRead;
        var buffer = new byte[chunkSize];
        while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] newbytes = buffer;
            int index = 0;
            foreach (byte b in buffer)
            {
                for (int x = 0; x < 256; x++)
                {
                    if (buffer[index] == Convert.ToByte(lst[x]))
                    {
                        try
                        {
                            newbytes[index] = Convert.ToByte(lst[256 - x]);
                        }
                        catch (System.Exception ex)
                        {
                            // just to show why the error was thrown, but not really helpful..
                            MessageBox.Show(index + ", " + newbytes.Count().ToString());
                        }
                    }
                }
                index++;
            }
            AppendAllBytes(textBox1.Text + ".ENC", newbytes);
        }
    }
}
private void GenNewKey()
{
    Random rnd = new Random();
    while (lst.Count < 256)
    {
        int x = rnd.Next(0, 255);
        if (!lst.Contains(x))
        {
            lst.Add(x);
        }
    }
    foreach (int x in lst)
    {
        // just for me to see what was generated
        textBox2.Text += ", " + x.ToString();
    }
}
public static void AppendAllBytes(string path, byte[] bytes)
{
    if (!File.Exists(path + ".ENC"))
    {
        File.Create(path + ".ENC");
    }
    using (var stream = new FileStream(path, FileMode.Append))
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}
Where textBox1 holds the path and name of the file to encrypt, textBox2 holds the generated cipher for personal debugging purposes, button2 is the encrypt button, and of course I am using System.IO.
Indeed you have an off-by-one error in newbytes[index] = Convert.ToByte(lst[256 - x]):
if x is 0 then you will access lst[256], but lst only covers indices 0-255. Changing 256 to 255 should fix it.
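That is, the corrected line would read:
newbytes[index] = Convert.ToByte(lst[255 - x]); // 255 - x stays within 0..255 for any x in 0..255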
The reason it freezes up is that your program is EXTREMELY inefficient and is working on the UI thread. It has a few more errors too: you should only process up to bytesRead bytes of buffer (otherwise you get extra data in your output that should not be there), and you are reusing the same array for buffer and newbytes, so your inner for loop can modify the same index more than once; every time you do newbytes[index] = Convert.ToByte(lst[256 - x]) you are also modifying buffer[index], which gets checked again on the next iteration of the for loop.
There are a lot of ways you can improve your code. Here is a snippet that does something similar to what you are doing (I don't do the whole "find the index and use the opposite location"; I just use the byte that is passed in as the index into the array).
while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
{
    byte[] newbytes = new byte[bytesRead];
    for (int i = 0; i < newbytes.Length; i++)
    {
        newbytes[i] = (byte)lst[buffer[i]];
    }
    AppendAllBytes(textBox1.Text + ".ENC", newbytes);
}
This may still freeze, though not as much. To solve the freezing you should put all of this code into a BackgroundWorker or similar so it runs on another thread.
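As a minimal sketch of that suggestion (using Task.Run rather than a BackgroundWorker; the EncryptFile helper name and the capture of the textbox value are illustrative, not from the original code):
private void button2_Click(object sender, EventArgs e)
{
    GenNewKey();
    string inputPath = textBox1.Text; // read UI values while still on the UI thread
    Task.Run(() => EncryptFile(inputPath)); // requires System.Threading.Tasks
}

private void EncryptFile(string inputPath)
{
    const int chunkSize = 4096;
    var buffer = new byte[chunkSize];
    int bytesRead;
    using (var file = File.OpenRead(inputPath))
    {
        while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            var newbytes = new byte[bytesRead];
            for (int i = 0; i < bytesRead; i++)
            {
                newbytes[i] = (byte)lst[buffer[i]]; // same translation as the snippet above
            }
            AppendAllBytes(inputPath + ".ENC", newbytes);
        }
    }
}
The point is simply that nothing touches UI controls inside the background work, so the UI thread stays responsive while the file is processed.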
In order to clean up some messy code and get a better understanding of the SocketAsyncEventArgs class, I'd like to know the most efficient technique for reassembling partially received messages from SocketAsyncEventArgs buffers.
To give you the big picture, I'm connected to a TCP server using a C# Socket client that will essentially receive data. The data received is message-based, delimited by a \n character.
As you're probably already aware, when using the ReceiveAsync method it is almost certain that the last received message will be incomplete, so you have to locate the index of the last complete message, copy the incomplete buffer section, keep it as the start of the next received buffer, and so on.
The thing is, I wish to abstract this operation from the upper layer and call ProcessReceiveDataImpl as soon as I get completed messages in _tmpBuffer. I find my Buffer.BlockCopy code not very readable (it's also very old code (-:), but anyway I'd like to know: what do you do in this typical use case?
Code to reassemble messages:
public class SocketClient
{
    private const int _receiveBufferSize = 8192;
    private byte[] _remBuffer = new byte[2 * _receiveBufferSize];
    private byte[] _tmpBuffer = new byte[2 * _receiveBufferSize];
    private int _remBufferSize = 0;
    private int _tmpBufferSize = 0;

    private void ProcessReceiveData(SocketAsyncEventArgs e)
    {
        // the buffer to process
        byte[] curBuffer = e.Buffer;
        int curBufferSize = e.BytesTransferred;
        int curBufferOffset = e.Offset;
        int curBufferLastIndex = e.BytesTransferred - 1;
        int curBufferLastSplitIndex = int.MinValue;

        if (_remBufferSize > 0)
        {
            curBufferLastSplitIndex = GetLastSplitIndex(curBuffer, curBufferOffset, curBufferSize);
            if (curBufferLastSplitIndex != curBufferLastIndex)
            {
                // copy the remainder + part of the current buffer into tmp
                Buffer.BlockCopy(_remBuffer, 0, _tmpBuffer, 0, _remBufferSize);
                Buffer.BlockCopy(curBuffer, curBufferOffset, _tmpBuffer, _remBufferSize, curBufferLastSplitIndex + 1);
                _tmpBufferSize = _remBufferSize + curBufferLastSplitIndex + 1;
                ProcessReceiveDataImpl(_tmpBuffer, _tmpBufferSize);
                Buffer.BlockCopy(curBuffer, curBufferLastSplitIndex + 1, _remBuffer, 0, curBufferLastIndex - curBufferLastSplitIndex);
                _remBufferSize = curBufferLastIndex - curBufferLastSplitIndex;
            }
            else
            {
                // copy the remainder + the entire current buffer into tmp
                Buffer.BlockCopy(_remBuffer, 0, _tmpBuffer, 0, _remBufferSize);
                Buffer.BlockCopy(curBuffer, curBufferOffset, _tmpBuffer, _remBufferSize, curBufferSize);
                ProcessReceiveDataImpl(_tmpBuffer, _remBufferSize + curBufferSize);
                _remBufferSize = 0;
            }
        }
        else
        {
            curBufferLastSplitIndex = GetLastSplitIndex(curBuffer, curBufferOffset, curBufferSize);
            if (curBufferLastSplitIndex != curBufferLastIndex)
            {
                // we must copy the unused bytes into the remaining buffer
                _remBufferSize = curBufferLastIndex - curBufferLastSplitIndex;
                Buffer.BlockCopy(curBuffer, curBufferLastSplitIndex + 1, _remBuffer, 0, _remBufferSize);
                // process the msg
                ProcessReceiveDataImpl(curBuffer, curBufferLastSplitIndex + 1);
            }
            else
            {
                // we can process the entire msg
                ProcessReceiveDataImpl(curBuffer, curBufferSize);
            }
        }
    }

    protected virtual void ProcessReceiveDataImpl(byte[] buffer, int bufferSize)
    {
    }

    private int GetLastSplitIndex(byte[] buffer, int offset, int bufferSize)
    {
        for (int i = offset + bufferSize - 1; i >= offset; i--)
        {
            if (buffer[i] == '\n')
            {
                return i;
            }
        }
        return -1;
    }
}
Your input is very important and appreciated!
Thank you!
Updated:
Also, rather than calling ProcessReceiveDataImpl and blocking further receive operations, would it be useful to queue completed messages and make them available to the consumer?
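As a sketch of that queuing idea (the names here are illustrative, not from the original code; requires System.Collections.Concurrent), a BlockingCollection<byte[]> would let the receive path hand off completed messages without waiting on the consumer:
private readonly BlockingCollection<byte[]> _messages = new BlockingCollection<byte[]>();

// called where ProcessReceiveDataImpl is called today
private void EnqueueMessage(byte[] buffer, int size)
{
    var msg = new byte[size]; // copy, because _tmpBuffer/_remBuffer are reused
    Buffer.BlockCopy(buffer, 0, msg, 0, size);
    _messages.Add(msg);
}

// consumer side, e.g. running on a dedicated thread
private void ConsumeMessages()
{
    foreach (byte[] msg in _messages.GetConsumingEnumerable())
    {
        ProcessReceiveDataImpl(msg, msg.Length);
    }
}
The copy per message costs an allocation, but it decouples the socket loop from however long the consumer takes to process each message.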
While searching for a way to decompress a file using SharpZipLib, I found a lot of methods like this:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            byte[] data = new byte[2048];
            while (true)
            {
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    fileWrite.Write(data, 0, size);
                }
                else
                {
                    break;
                }
            }
            fileWrite.Close();
        }
    }
}
The signature of FileStream.Write is:
FileStream.Write(byte[] array, int offset, int count)
Now I am trying to separate the read part from the write part, because I want to use a thread to speed up the decompression rate in the write function, and I use a List<byte[]> and a List<int> to hold the file's data and chunk sizes, like below.
Read:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here
        using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
        {
            int size = 2048;
            List<int> SizeList = new List<int>();
            List<byte[]> mydatalist = new List<byte[]>();
            while (true)
            {
                byte[] data = new byte[2048];
                size = s.Read(data, 0, data.Length);
                if (size > 0)
                {
                    mydatalist.Add(data);
                    SizeList.Add(size);
                }
                else
                {
                    break;
                }
            }
            test = new Thread(() =>
                FileWriteFun(pathToTar, args, SizeList, mydatalist)
            );
            test.Start();
            streamWriter.Close();
        }
    }
}
Write:
public static void FileWriteFun(string pathToTar, string[] args, List<int> SizeList, List<byte[]> mydataList)
{
    //some codes here
    using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
    {
        for (int i = 0; i < mydataList.Count; i++)
        {
            fileWrite.Write(mydataList[i], 0, SizeList[i]);
        }
        fileWrite.Close();
    }
}
Edit
(1) Moved byte[] data = new byte[2048] into the while loop, so each chunk is assigned to a new array.
(2) Changed int[] SizeList = new int[2048] to List<int> SizeList = new List<int>() because of the fixed array-size limit.
As a read on a stream is only guaranteed to return at least one byte (typically it will be more, but you can't rely on getting the full requested length each time), your original solution could theoretically fail after 2048 reads, as a SizeList declared as int[2048] can only hold 2048 entries.
You could use a List to hold the sizes.
Or use a MemoryStream instead of inventing your own.
But the two main problems are:
1) You keep reading into the same byte array, overwriting previously read data. When you add your data byte array to mydatalist, you must assign data to a new byte array.
2) You close your stream before the second thread is done writing.
In general, threading is difficult and should only be used where you know it will improve performance. Simply reading and writing data is typically IO-bound, not CPU-bound, so introducing a second thread will just add a small performance penalty and no gain in speed. You could use multithreading to ensure concurrent read/write operations, but most likely the disk cache will do this for you if you stick to the first solution; and if not, using async is easier than multithreading to achieve this.
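To illustrate the MemoryStream suggestion (a minimal sketch, assuming the same TarInputStream setup as above; directoryName and fileName come from the elided "some codes here" context): read everything into memory once, then write it out in one go, with no hand-rolled size bookkeeping:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
    using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
    {
        //some codes here, as in the original
        using (MemoryStream ms = new MemoryStream())
        {
            s.CopyTo(ms); // buffers the entry in memory; no SizeList needed
            using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
            {
                ms.Position = 0;
                ms.CopyTo(fileWrite);
            }
        }
    }
}
Stream.CopyTo handles the read-until-zero loop internally, which is exactly what the original while(true) loop was doing by hand.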
I have a block of code that loads a custom storage file (data.00x) and dumps its file contents (several files...). For this example, we'll say the referenced index only contains data.001 files for export.
Example:
public void ExportFileEntries(ref List<IndexEntry> filteredIndex, string dataDirectory, string buildDirectory, int chunkSize)
{
    OnTotalMaxDetermined(new TotalMaxArgs(8));
    // For each set of dataId files in the filteredIndex
    for (int dataId = 1; dataId < 8; dataId++)
    {
        OnTotalProgressChanged(new TotalChangedArgs(dataId, string.Format("Exporting selected files from data.00{0}", dataId)));
        // Filter only entries with current dataId into temp index
        List<IndexEntry> tempIndex = GetEntriesByDataId(ref filteredIndex, dataId, SortType.Offset);
        // Determine the path of the data.xxx file being exported from
        string dataPath = string.Format(@"{0}\data.00{1}", dataDirectory, dataId);
        if (File.Exists(dataPath))
        {
            // Load the data.xxx into a filestream
            using (FileStream dataFs = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
            {
                // Loop through files to export
                foreach (IndexEntry indexEntry in tempIndex)
                {
                    int fileLength = indexEntry.Length;
                    OnCurrentMaxDetermined(new CurrentMaxArgs(fileLength));
                    // Set the filestream's position to the file entry's offset
                    dataFs.Position = indexEntry.Offset;
                    // Read the file into a byte array (buffer)
                    byte[] fileBytes = new byte[indexEntry.Length];
                    dataFs.Read(fileBytes, 0, fileBytes.Length);
                    // Define some information about the file being exported
                    string fileExt = Path.GetExtension(indexEntry.Name).Remove(0, 1);
                    string buildPath = string.Format(@"{0}\{1}\{2}", buildDirectory, fileExt.ToUpper(), indexEntry.Name);
                    // If needed, unencrypt the data (fileBytes buffer)
                    if (XOR.Encrypted(fileExt)) { byte b = 0; XOR.Cipher(ref fileBytes, ref b); }
                    // If no chunkSize is provided, generate a default
                    if (chunkSize == 0) { chunkSize = Math.Max(64000, (int)(fileBytes.Length * .02)); }
                    // If the build directory doesn't exist yet, create it.
                    if (!Directory.Exists(Path.GetDirectoryName(buildPath))) { Directory.CreateDirectory(Path.GetDirectoryName(buildPath)); }
                    using (FileStream buildFs = new FileStream(buildPath, FileMode.Create, FileAccess.Write))
                    {
                        using (BinaryWriter bw = new BinaryWriter(buildFs, encoding))
                        {
                            for (int byteCount = 0; byteCount < fileLength; byteCount += Math.Min(fileLength - byteCount, chunkSize))
                            {
                                bw.Write(fileBytes, byteCount, Math.Min(fileLength - byteCount, chunkSize));
                                OnCurrentProgressChanged(new CurrentChangedArgs(byteCount, ""));
                            }
                        }
                    }
                    OnCurrentProgressReset(EventArgs.Empty);
                    fileBytes = null;
                }
            }
        }
        else { OnError(new ErrorArgs(string.Format("[ExportFileEntries] Cannot locate: {0}", dataPath))); }
    }
    OnTotalProgressReset(EventArgs.Empty);
    GC.Collect();
}
The data.001 file stores about 12k files; most are very small .jpg pictures, etc. For about the first half of the export process the GC collects just fine, but toward the last half of the export process the GC just stops giving a crap.
If I don't issue GC.Collect() at the end of the method the tool sits at around 255 MB of RAM, but if I do call it, usage goes down to about 14 MB. What I'm asking is: are there any obvious improvements over the way I coded the method (to increase GC performance)?
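One candidate improvement, offered only as a sketch (not from the original post, and it assumes entries can be processed chunk-by-chunk, which may not hold for entries that need XOR.Cipher over the whole buffer): stream each entry from dataFs to the output in bounded chunks instead of materializing the whole entry in fileBytes, so the large temporary arrays never exist:
// Copy `length` bytes from dataFs (already positioned at the entry's offset)
// to buildFs using one small reusable buffer.
private static void CopyChunked(FileStream dataFs, FileStream buildFs, int length, int chunkSize)
{
    byte[] buffer = new byte[chunkSize];
    int remaining = length;
    while (remaining > 0)
    {
        int read = dataFs.Read(buffer, 0, Math.Min(chunkSize, remaining));
        if (read <= 0) break; // unexpected end of file
        buildFs.Write(buffer, 0, read);
        remaining -= read;
    }
}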
I am working with filestream read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop a certain number of bytes at a time; not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to read only a fixed-size block at a time (1024 bytes here), so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    int intInputFileByteLength = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    byte[] btOutputBlock = new byte[intBytesToRead];
    intInputFileByteLength = (int)fsInputFile.Length;
    while (intInputFileByteLength - 1 >= intTotalBytesRead)
    {
        if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
        {
            intBytesToRead = intInputFileByteLength - intTotalBytesRead;
        }
        // *** Problem is here ***
        int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
    }
    fsOutputFile.Close();
}
Where the problem area is indicated, btInputBlock works on the first cycle because it reads in 1024 bytes. But on the second loop it doesn't recycle this byte array; it instead tries to append the new 1024 bytes to btInputBlock. As far as I can tell, you can only specify the offset and length of the file you want to read, not the offset and length within btInputBlock. Is there a way to "re-use" the array being filled by FileStream.Read, or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
    // Do your work here
    fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes the array to fill, the offset (the position in the array where the bytes should be placed), and the (maximum) number of bytes to read.
That's because you're incrementing intTotalBytesRead, which you are using as an offset into the array, not into the filestream. In your case it should always be zero, so each read overwrites the previous data in the array rather than trying to append it at position intTotalBytesRead.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
FileStream doesn't need an offset; every Read picks up where the last one left off. See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx for details.
Your Read call should be Read(btInputBlock, 0, intBytesToRead); the 2nd parameter is the offset into the array where you want to start writing the bytes. Similarly, for Write you want Write(btInputBlock, 0, n), as the 2nd parameter is the offset in the array to start writing bytes from. Also, you don't need to call Close, as the using will clean up the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    while (fsInputFile.Position < fsInputFile.Length)
    {
        int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, 0, n);
    }
}
In the following code, it seems that client.Client.Receive is pinning the byte[] result permanently, causing the memory to never be freed (as it's always pinned). I'm looking for a way to tell C# that result no longer needs to be pinned after its usage in this.OnReceive, but I can't find the built-in function or keyword to do this.
Does anyone know how I can get C# to unpin the byte[] array? (This is one of the sources of memory leaks in my C# application.)
this.m_TcpListener = new TcpListener(this.p_TcpEndPoint.Port);
this.m_TcpThread = new Thread(delegate()
{
    try
    {
        this.m_TcpListener.Start();
        while (this.p_Running)
        {
            TcpClient client = this.m_TcpListener.AcceptTcpClient();
            new Thread(() =>
            {
                try
                {
                    // Read the length header.
                    byte[] lenbytes = new byte[4];
                    int lbytesread = client.Client.Receive(lenbytes, 0, 4, SocketFlags.None);
                    if (lbytesread != 4) return; // drop this packet :(
                    int length = System.BitConverter.ToInt32(lenbytes, 0);
                    int r = 0;
                    // Read the actual data.
                    byte[] result = new byte[length];
                    while (r < length)
                    {
                        int bytes = client.Client.Receive(result, r, length - r, SocketFlags.None);
                        r += bytes;
                    }
                    Console.WriteLine("Received TCP packet from " + (client.Client.RemoteEndPoint as IPEndPoint).Address.ToString() + ".");
                    this.OnReceive(client.Client.RemoteEndPoint as IPEndPoint, result, length);
                }
                catch (SocketException)
                {
                    // Do nothing.
                }
                client.Close();
            }).Start();
            //this.Log(LogType.DEBUG, "Received a message from " + from.ToString());
        }
    }
    catch (Exception e)
    {
        if (e is ThreadAbortException)
            return;
        Console.WriteLine(e.ToString());
        throw e;
    }
});
this.m_TcpThread.IsBackground = true;
this.m_TcpThread.Start();
You can pin/unpin it yourself, thusly:
// Pin it
GCHandle myArrayHandle = GCHandle.Alloc(result, GCHandleType.Pinned);
// use the array
while (r < length)
{
    int bytes = client.Client.Receive(result, r, length - r, SocketFlags.None);
    r += bytes;
}
// Unpin it
myArrayHandle.Free();
But I'd personally be pretty surprised if client.Client.Receive pinned it "for all time". I've used it before (as I'm sure many have) and not run into an issue of this type. Alternatively, if you're certain that's the problem, then instead of allocating a new result array each time, you can re-use one across the entire while loop (allocate it up where you start the listener, and only use the first length bytes of it each time).
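A minimal sketch of that reuse idea (the MaxMessageLength constant is hypothetical, since a single shared buffer must be at least as large as any message; note that with one thread per client, each client would need its own buffer):
private const int MaxMessageLength = 65536; // hypothetical upper bound on message size
private byte[] m_ReceiveBuffer; // allocated once when the listener starts

// when starting the listener:
this.m_ReceiveBuffer = new byte[MaxMessageLength];

// inside the per-client receive code, instead of new byte[length]:
if (length > MaxMessageLength) return; // drop oversized packets
int r = 0;
while (r < length)
{
    int bytes = client.Client.Receive(this.m_ReceiveBuffer, r, length - r, SocketFlags.None);
    r += bytes;
}
this.OnReceive(client.Client.RemoteEndPoint as IPEndPoint, this.m_ReceiveBuffer, length);
Since only one long-lived array ever exists, even a worst-case pin would stop mattering: there is nothing per-message left for the GC to reclaim.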