I've been trying to get my program to replace Unicode text in a binary file.
The user inputs what to find, and the program replaces it with a specific string if it can find it.
I've searched around, but I can't find anything that matches my specifics. What I would like is something like:
string text = File.ReadAllText(path, Encoding.Unicode);
text = text.Replace(userInput, specificString);
File.WriteAllText(path, text);
but anything that works in a similar manner should suffice.
Using that results in a file that is larger and unusable, though.
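The growth and corruption are easy to reproduce. Decoding arbitrary bytes as text and re-encoding them is not a lossless round trip (invalid sequences get replaced), and WriteAllText with Encoding.Unicode also prepends a BOM, which alone makes the file larger. A minimal sketch with made-up byte values:

```csharp
using System;
using System.Linq;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Arbitrary binary data; 0x00 0xD8 decodes to an unpaired surrogate in UTF-16.
        byte[] original = { 0x00, 0xD8, 0x01, 0x02, 0xFF, 0xFE };

        // What ReadAllText followed by WriteAllText effectively does:
        string asText = Encoding.Unicode.GetString(original);
        byte[] roundTripped = Encoding.Unicode.GetBytes(asText);

        // The invalid sequence was replaced (with U+FFFD), so the bytes
        // no longer match the original file contents.
        Console.WriteLine(original.SequenceEqual(roundTripped)); // False
    }
}
```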
I use:
bool containsInput = File.ReadAllText(path, Encoding.Unicode).Contains(userInput);
if (containsInput)
{
    // Missing Part
}
for checking whether the file contains the user-inputted string, if it matters.
This can work only in very limited situations. Unfortunately, you haven't offered enough detail about the nature of the binary file for anyone to know whether this will work in your situation. There is a practically endless variety of binary file formats out there; at least some would be rendered invalid if you modify a single byte, and many more could be rendered invalid if the file length changes (i.e. data after your insertion point is no longer where it is expected to be).
Of course, many binary files are also either encrypted, compressed, or both. In such cases, even if you do by some miracle find the text you're looking for, it probably doesn't actually represent that text, and modifying it will render the file unusable.
All that said, for the sake of argument let's assume your scenario doesn't have any of these problems and it's perfectly okay to just completely replace some text found in the middle of the file with some entirely different text.
Note that we also need to make an assumption about the text encoding. Text can be represented in a wide variety of ways, and you will need to use the correct encoding not just to find the text, but also to ensure the replacement text will be valid. For the sake of argument, let's say your text is encoded as UTF8.
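To see why the encoding assumption matters, compare the bytes the same string produces under two common encodings (the sample string is arbitrary):

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        string text = "héllo";

        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] utf16 = Encoding.Unicode.GetBytes(text);

        // The same text produces entirely different byte patterns, so a
        // byte-level search with the wrong encoding will simply never match.
        Console.WriteLine(utf8.Length);   // 6  ('é' takes two bytes in UTF-8)
        Console.WriteLine(utf16.Length);  // 10 (two bytes per character in UTF-16)
    }
}
```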
Now we have everything we need:
void ReplaceTextInFile(string fileName, string oldText, string newText)
{
    byte[] fileBytes = File.ReadAllBytes(fileName),
           oldBytes = Encoding.UTF8.GetBytes(oldText),
           newBytes = Encoding.UTF8.GetBytes(newText);

    int index = IndexOfBytes(fileBytes, oldBytes);
    if (index < 0)
    {
        // Text was not found
        return;
    }

    byte[] newFileBytes =
        new byte[fileBytes.Length + newBytes.Length - oldBytes.Length];

    Buffer.BlockCopy(fileBytes, 0, newFileBytes, 0, index);
    Buffer.BlockCopy(newBytes, 0, newFileBytes, index, newBytes.Length);
    Buffer.BlockCopy(fileBytes, index + oldBytes.Length,
        newFileBytes, index + newBytes.Length,
        fileBytes.Length - index - oldBytes.Length);

    File.WriteAllBytes(fileName, newFileBytes);
}
int IndexOfBytes(byte[] searchBuffer, byte[] bytesToFind)
{
    // <= so that a match ending at the very last byte is still found
    for (int i = 0; i <= searchBuffer.Length - bytesToFind.Length; i++)
    {
        bool success = true;
        for (int j = 0; j < bytesToFind.Length; j++)
        {
            if (searchBuffer[i + j] != bytesToFind[j])
            {
                success = false;
                break;
            }
        }
        if (success)
        {
            return i;
        }
    }
    return -1;
}
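A quick sanity check of the search logic, using in-memory data rather than a real file (the block carries its own copy of the method so it runs standalone):

```csharp
using System;
using System.Text;

class SearchDemo
{
    // Same logic as the IndexOfBytes method above.
    static int IndexOfBytes(byte[] searchBuffer, byte[] bytesToFind)
    {
        for (int i = 0; i <= searchBuffer.Length - bytesToFind.Length; i++)
        {
            bool success = true;
            for (int j = 0; j < bytesToFind.Length; j++)
            {
                if (searchBuffer[i + j] != bytesToFind[j]) { success = false; break; }
            }
            if (success) return i;
        }
        return -1;
    }

    static void Main()
    {
        byte[] haystack = Encoding.UTF8.GetBytes("hello world");

        Console.WriteLine(IndexOfBytes(haystack, Encoding.UTF8.GetBytes("world"))); // 6
        Console.WriteLine(IndexOfBytes(haystack, Encoding.UTF8.GetBytes("xyz")));   // -1
    }
}
```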
Notes:
The above is destructive. You may want to run it only on a copy of the file, or modify the code so that it takes an additional parameter specifying a new file to which the modified data should be written.
This implementation does everything in-memory. This is much more convenient, but if you are dealing with large files, and especially if you are on a 32-bit platform, you may find you need to process the file in smaller chunks.
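If in-memory processing isn't feasible, the search itself can be done in chunks by carrying over the last pattern.Length - 1 bytes between reads, so a match spanning a chunk boundary isn't missed. A sketch of just the search part (the in-place replacement is more involved and is usually easier done by writing to a second file):

```csharp
using System;
using System.IO;
using System.Text;

class ChunkedSearchDemo
{
    // Find the first offset of 'pattern' in 'stream' without loading it all.
    static long IndexOfBytes(Stream stream, byte[] pattern, int chunkSize = 4096)
    {
        byte[] buffer = new byte[chunkSize + pattern.Length - 1];
        int carried = 0;      // bytes kept from the previous chunk
        long chunkStart = 0;  // stream offset corresponding to buffer[0]

        int read;
        while ((read = stream.Read(buffer, carried, chunkSize)) > 0)
        {
            int filled = carried + read;
            for (int i = 0; i <= filled - pattern.Length; i++)
            {
                bool match = true;
                for (int j = 0; j < pattern.Length && match; j++)
                    if (buffer[i + j] != pattern[j]) match = false;
                if (match) return chunkStart + i;
            }

            // Keep the tail so a boundary-spanning match is still found.
            carried = Math.Min(pattern.Length - 1, filled);
            Buffer.BlockCopy(buffer, filled - carried, buffer, 0, carried);
            chunkStart += filled - carried;
        }
        return -1;
    }

    static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes(
            new string('a', 5000) + "needle" + new string('b', 100));
        using (var ms = new MemoryStream(data))
            Console.WriteLine(IndexOfBytes(ms, Encoding.UTF8.GetBytes("needle"))); // 5000
    }
}
```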
Related
I have a binary file that I am reading and printing into a textbox, wrapping at a set point, but it is wrapping in places it shouldn't. I want to ignore all line-feed characters except those I have defined.
There isn't a single newline byte; rather, it seems to be a series of them. I think the series of hex values 00-01-01-0B corresponds to where the line feeds should be.
How do I ignore the existing line breaks and use what I want instead?
This is where I am at:
shortFile = new FileStream(@"tempfile.dat", FileMode.Open, FileAccess.Read);
DisplayArea.Text = "";
byte[] block = new byte[1000];
shortFile.Position = 0;
int bytesRead;
while ((bytesRead = shortFile.Read(block, 0, 1000)) > 0)
{
    // only decode the bytes actually read, not the whole buffer
    string trimmedText = System.Text.Encoding.Default.GetString(block, 0, bytesRead);
    DisplayArea.Text += trimmedText + "\n";
}
I had figured it out a couple of minutes before dlatikay posted, but I really appreciated seeing that he also had the right idea. I just replaced all control characters with spaces.
for (int i = 0; i < block.Length; i++)
{
if (block[i] < 32)
{
block[i] = 0x20;
}
}
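Building on that, here is a sketch of how the custom separator could be turned into real line breaks before stripping the remaining control characters. The 00-01-01-0B byte sequence comes straight from the question and is an assumption about this particular file format:

```csharp
using System;
using System.Text;

class LineBreakDemo
{
    // Replace each occurrence of the 4-byte separator with '\n',
    // then map any remaining control bytes to spaces.
    static string DecodeWithCustomBreaks(byte[] data)
    {
        byte[] separator = { 0x00, 0x01, 0x01, 0x0B };
        var sb = new StringBuilder();
        int i = 0;
        while (i < data.Length)
        {
            if (i <= data.Length - separator.Length
                && data[i] == separator[0] && data[i + 1] == separator[1]
                && data[i + 2] == separator[2] && data[i + 3] == separator[3])
            {
                sb.Append('\n');
                i += separator.Length;
            }
            else
            {
                sb.Append(data[i] < 32 ? ' ' : (char)data[i]);
                i++;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        byte[] sample = { (byte)'H', (byte)'i', 0x00, 0x01, 0x01, 0x0B, (byte)'!' };
        Console.WriteLine(DecodeWithCustomBreaks(sample)); // "Hi" newline "!"
    }
}
```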
So this is a section of code from my TCP client. This part converts the bytes received into characters. However, I would like to add some logic, and to do that I need this output as a single string. Since it currently prints every character individually, how would I do this? If you require any more information, feel free to ask.
byte[] bb = new byte[100];
int k = stm.Read(bb, 0, 100);
for (int i = 0; i < k; i++)
Console.Write(Convert.ToChar(bb[i]));
Thanks in advance.
As per comments, if you want to decode a byte array in a particular encoding, just use Encoding.GetString. For example:
string text = Encoding.ASCII.GetString(bb, 0, k);
(Note that ASCII is rarely a good choice if the text is meant to be arbitrary human text. UTF-8 is usually a better option at that point, but then you need to bear in mind the possibility that a single character may be split across multiple bytes - and therefore multiple calls to Stream.Read.)
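The split-character problem can be handled with a Decoder, which remembers partial byte sequences between calls. A sketch where the two single-byte chunks simulate bytes arriving across separate Stream.Read calls:

```csharp
using System;
using System.Text;

class DecoderDemo
{
    static void Main()
    {
        // "é" is two bytes in UTF-8; split them across two simulated reads.
        byte[] all = Encoding.UTF8.GetBytes("é");
        byte[] part1 = { all[0] };
        byte[] part2 = { all[1] };

        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();

        foreach (byte[] chunk in new[] { part1, part2 })
        {
            // The decoder buffers an incomplete sequence and completes
            // it on the next call instead of emitting a garbage char.
            char[] chars = new char[chunk.Length];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }

        Console.WriteLine(sb.ToString()); // é
    }
}
```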
string str = "";
foreach (byte b in bb) str += Convert.ToChar(b);
Console.Write(str);
I am wrestling with a particular issue and would like to ask for guidance on how I can achieve what I seek. In the function below, a variable-length string is used as input, producing 4-byte hex chunk equivalents. These 4-byte chunks are written to an XML file for storage, and that XML file's schema cannot be altered. My issue arises when the application that governs the XML file sorts the 4-byte chunks: when I then read the same XML file, my string is destroyed. So I'd like a way to "tag" each 4-byte chunk with some sort of identifier that I can use in my decoder function in spite of whatever sorting may have been done.
Encoding function (much of which was provided by Antonín Lejsek):
private static string StringEncoder(string strInput)
{
    try
    {
        // instantiate our StringBuilder object and set its capacity based on the length of our message.
        StringBuilder sb = new StringBuilder(strInput.Length * 9 / 4 + 10);
        int count = 0;

        // iterate through each character in our message and format the sb object to follow
        // Microsoft's implementation of ECMA-376 for rsidR values of type ST_LongHexValue
        foreach (char c in strInput)
        {
            // pad the first 4-byte chunk with a 2-digit zero.
            if (count == 0)
            {
                sb.Append("00");
            }
            // every three bytes, add a space and append a 2-digit zero.
            if (count == 3)
            {
                sb.Append(" ");
                sb.Append("00");
                count = 0;
            }
            sb.Append(String.Format("{0:X2}", (int)c));
            count++;
        }

        // pad right when the final chunk holds fewer than three encoded bytes.
        for (int i = 0; i < (3 - count) % 3; ++i)
        {
            sb.Append("00");
        }

        // DEBUG: echo results for testing.
        //Console.WriteLine("");
        //Console.WriteLine("String provided: {0}", strInput);
        //Console.WriteLine("Hex in 8-digit chunks: {0}", sb.ToString());
        //Console.WriteLine("======================================================");
        return sb.ToString();
    }
    catch (NullReferenceException e)
    {
        Console.WriteLine("");
        Console.WriteLine("ERROR : StringEncoder has received null input.");
        Console.WriteLine("ERROR : Please ensure there is something to read in the output.txt file.");
        Console.WriteLine("");
        //Console.WriteLine(e.Message);
        return null;
    }
}
For example: when provided with the input " coolsss" (note the leading space, hex 20), this function produces the following output: 0020636F 006F6C73 00737300
The above three 8-digit chunks get written to the XML file, starting with the first chunk and proceeding to the last, like so:
0020636F
006F6C73
00737300
Now, there are other 8-digit chunks in the XML file that were not created by the function above. This presents an issue, because the application can reorder all of these chunks, mine and the others already present in the file, like so:
00737300
00111111
006F6C73
00000000
0020636F
So, can you help me think of any way to add a tag of some sort, or use some C# data structure, so I can read each chunk and reconstruct my original string despite the reordering?
I appreciate any guidance you can provide. Credit to Antonín Lejsek for his help with the function above.
Thank you,
Gabriel Alicea
Well, I am reluctant to suggest this as a proposed solution because it feels a bit too hackish to me.
Having said that, I suppose you could leverage the second byte as an ordinal so you can track the chunks and reassemble the string later.
You could use the following scheme to track your chunks.
00XY0000
Where the second byte XY could be split up into two 4-bit parts representing an ordinal and a checksum.
X = Ordinal
Y = 16 % X
When reading the chunks you can split up the second byte into two words just like above and verify that the checksum aligns for the ordinal.
This solution does constrain you to 16 chunks, unless you eliminate the checksum and use the entire byte as an ordinal, which raises the limit to 256 chunks.
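A rough sketch of that scheme. The 4-bit ordinal X and the Y = 16 % X checksum are taken directly from the suggestion above (X is 1-based, so the modulus is always defined); as a variation, this sketch writes the XY tag into the chunk's leading padding byte, which the encoder always emits as "00", rather than over a data byte. The chunk values are the hypothetical ones from the question:

```csharp
using System;
using System.Linq;

class ChunkTagDemo
{
    // Stamp a chunk's leading byte with a 4-bit ordinal X and checksum Y = 16 % X.
    static string Tag(string chunk, int ordinal)
    {
        int x = ordinal;        // 1..15
        int y = 16 % x;
        return ((x << 4) | y).ToString("X2") + chunk.Substring(2);
    }

    // Recover the ordinal, verifying the checksum half of the tag.
    static int ReadOrdinal(string chunk)
    {
        int tag = Convert.ToInt32(chunk.Substring(0, 2), 16);
        int x = tag >> 4, y = tag & 0xF;
        if (16 % x != y) throw new FormatException("checksum mismatch");
        return x;
    }

    static void Main()
    {
        string[] tagged = { Tag("0020636F", 1), Tag("006F6C73", 2), Tag("00737300", 3) };

        // Simulate the application reordering the chunks...
        string[] shuffled = tagged.Reverse().ToArray();

        // ...then restore the original order using the embedded ordinals.
        string[] restored = shuffled.OrderBy(ReadOrdinal).ToArray();
        Console.WriteLine(string.Join(" ", restored)); // 1020636F 206F6C73 31737300
    }
}
```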
Suddenly, my output file decided to become Chinese. I tried to write some random ASCII characters to a file, but instead of writing ASCII, C# decided to write ancient Chinese letters instead. Is it trying to tell me something?
static void WriteToFile()
{
for (int i = 0; i < 100; i++)
{
int x = 0;
x = rand.Next(0, 127);
writer.Write((char)x);
}
writer.Close();
}
When you write a text file without a BOM, you leave it up to the program reading the file to guess which encoding was used to convert text to the bytes in the file. Notepad uses a heuristic if you don't pick the encoding from its File + Open dialog; the underlying winapi call is IsTextUnicode().
With random byte values like yours, and far too many ASCII control characters present, it isn't unlikely that the heuristic picks IS_TEXT_UNICODE_ASCII16 (i.e. UTF-16). Yes, that looks like Chinese; two bytes select the glyph. Writing the BOM keeps you out of trouble, UTF-8 being the sane choice. And avoid control characters; most don't have a matching glyph. Pick from the range 32..127. Google "bush hid the facts" for an amusing story about an early version of IsTextUnicode() fumbling the guess.
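Writing the BOM explicitly is just a matter of passing an encoding whose preamble is enabled (the file path here is hypothetical):

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");

        // UTF8Encoding(true) emits the EF BB BF byte-order mark,
        // so readers like Notepad don't have to guess the encoding.
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(true)))
        {
            var rand = new Random();
            for (int i = 0; i < 100; i++)
                writer.Write((char)rand.Next(32, 127)); // printable ASCII only
        }

        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF); // True
    }
}
```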
I guess the issue is that you are writing values that are not displayable, like the first 32 ASCII characters. When writing them as UTF-8 without a BOM (the default in .NET for StreamWriter), you might end up with unexpected results.
This code yields the expected result:
StringWriter writer = new StringWriter();
Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    writer.Write((char)rand.Next(32, 126));
}
writer.Close();
string s = writer.ToString();
File.WriteAllText(@"C:\temp\so2343.dat", s, Encoding.ASCII);
Also note the code change I made to rand.Next to only get the visible characters.
You're writing raw bytes into the file, and Notepad treats the resulting file as Unicode.
I have a requirement to test some load issues related to file size. I have a Windows application written in C# that will automatically generate the files. I know the size of each file (e.g. 100 KB) and how many files to generate. What I need help with is how to generate a string less than or equal to the required file size.
pseudo code:
long fileSizeInBytes = 1024 * 100; // 100 KB
int numberOfFiles = 5;
for (var i = 0; i < numberOfFiles; i++) {
    var buffer = new byte[fileSizeInBytes];
    using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write)) {
        fs.Write(buffer, 0, buffer.Length);
    }
}
You can always use the constructor for string that takes a char and the number of times you want that character repeated:
string myString = new string('*', 5000);
This gives you a string of 5000 stars - tweak to your needs.
The easiest way would be the following code:
var content = new string('A', charCount);
Now you've got a string with as many 'A' characters as required.
To fill it with Lorem Ipsum or some other repeating string build something like the following pseudocode:
string contentString = "Lorem Ipsum...";
for (int i = 0; i < fileSizeInKB / contentString.Length; i++)
//write contentString to file
if (fileSizeInKB % contentString.Length > 0)
// write remaining substring of contentString to file
Edit: If you're saving as Unicode (UTF-16), you may need to halve the character count, because UTF-16 uses two bytes per character.
There are so many variations on how you can do this. One would be to fill the file with a bunch of chars. You need 100 KB? No problem: 100 * 1024 * 8 = 819,200 bits. A single char is 16 bits, and 819,200 / 16 = 51,200, so you need to write 51,200 chars to the file. But consider that a file may have additional header/metadata, so you may need to account for that and decrease the number of chars you write.
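That arithmetic can be checked directly with Encoding.Unicode.GetByteCount (the output file name is hypothetical):

```csharp
using System;
using System.IO;
using System.Text;

class FileSizeDemo
{
    static void Main()
    {
        // 100 KB at two bytes per char (UTF-16) => 51,200 chars.
        const int targetBytes = 100 * 1024;
        string content = new string('*', targetBytes / 2);

        Console.WriteLine(Encoding.Unicode.GetByteCount(content)); // 102400

        // Writing UTF-16 without a BOM keeps the file at exactly the target size.
        string path = Path.Combine(Path.GetTempPath(), "junk-100kb.dat");
        File.WriteAllText(path, content, new UnicodeEncoding(false, false));
        Console.WriteLine(new FileInfo(path).Length); // 102400
    }
}
```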
As a partial answer to your question I recently created a portable WPF app that easily creates 'junk' files of almost any size: https://github.com/webmooch/FileCreator