What to do when file output turns Chinese? - c#

Suddenly, my output file decided to become Chinese. I tried to write some random ASCII characters to a file, but instead of writing ASCII, C# decided to write ancient Chinese letters instead. Is it trying to tell me something?
static void WriteToFile()
{
    for (int i = 0; i < 100; i++)
    {
        int x = 0;
        x = rand.Next(0, 127);
        writer.Write((char)x);
    }
    writer.Close();
}

When you write a text file without a BOM, you leave it up to the program that reads the file to guess which encoding was used to convert the text to the bytes in the file. Notepad uses a heuristic if you don't pick the encoding in its File + Open dialog. The underlying winapi call is IsTextUnicode().
With random byte values like the ones you use, and far too many ASCII control characters present, it isn't unlikely to pick IS_TEXT_UNICODE_ASCII16 (aka UTF-16). Yes, that looks like Chinese, since each pair of bytes selects one glyph. Writing the BOM keeps you out of trouble, UTF-8 being the sane choice. And write no control characters, most don't have a matching glyph. Pick from the range 32..126 (127 is DEL, which is also a control character). Google "bush hid the facts" for an amusing story about an early version of IsTextUnicode() fumbling the guess.
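To illustrate the advice above, here is a minimal sketch that writes printable ASCII characters as UTF-8 with a BOM. The output path is a hypothetical temp file chosen for the example, and the code uses top-level statements, so it assumes a reasonably recent C# compiler.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical output path, used only for this sketch.
string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");

var rand = new Random();

// UTF8Encoding(true) asks the writer to emit the 3-byte BOM (EF BB BF)
// at the start of the file, so readers do not have to guess the encoding.
using (var writer = new StreamWriter(path, false, new UTF8Encoding(true)))
{
    for (int i = 0; i < 100; i++)
    {
        // Random.Next's upper bound is exclusive: this yields 32..126,
        // the printable ASCII range, with no control characters.
        writer.Write((char)rand.Next(32, 127));
    }
}

byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // EF-BB-BF
```

The file ends up 103 bytes long: 3 bytes of BOM plus 100 single-byte characters.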

I guess the issue is you are writing values that are not displayable, like the first 32 characters in ASCII. When writing them as UTF-8 without a BOM (which is the default in .NET for StreamWriter), you might end up with unexpected results.
This code yields the expected result:
StringWriter writer = new StringWriter();
Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    int x = 0;
    x = rand.Next(32, 126); // upper bound is exclusive, so this yields 32..125
    writer.Write((char)x);
}
writer.Close();
string s = writer.ToString();
File.WriteAllText(@"C:\temp\so2343.dat", s, Encoding.ASCII);
Also note the code change I made to rand.Next to only get the visible characters.

You're writing raw bytes into the file and Notepad treats the resulting file as unicode.

Related

How to replace hex without a binary writer

I need to replace part of a text string in a binary (hex) file. I have already used a binary writer, but as I add more content to the file the offsets change, so I have to keep fixing the offsets.
I have already tried the binary writer method.
BinaryWriter BinaryWriter1 = new BinaryWriter((Stream) File.OpenWrite("[File]"));
for (int index = [Offset]; index <= [Offset]; ++index) {
    BinaryWriter1.BaseStream.Position = (long) index;
    BinaryWriter1.Write([Name of form].Byte1);
    BinaryWriter1.Close();
}

Custom Newline in Binary Stream using Hex Array in WPF

I have a binary file I am reading and printing into a textbox while wrapping at a set point, but it is wrapping at places it shouldn't be. I want to ignore all line feed characters except those I have defined.
There isn't a single Newline byte, rather it seems to be a series of them. I think I found the series of Hex values 00-01-01-0B that seem to correspond with where the line feeds should be.
How do I ignore existing line breaks, and use what I want instead?
This is where I am at:
shortFile = new FileStream(@"tempfile.dat", FileMode.Open, FileAccess.Read);
DisplayArea.Text = "";
byte[] block = new byte[1000];
shortFile.Position = 0;
while (shortFile.Read(block, 0, 1000) > 0)
{
    string trimmedText = System.Text.Encoding.Default.GetString(block);
    DisplayArea.Text += trimmedText + "\n";
}
I had just figured it out a couple minutes before dlatikay posted, but really appreciated seeing that he also had the right idea. I just replaced all control characters with spaces.
for (int i = 0; i < block.Length; i++)
{
    if (block[i] < 32)
    {
        block[i] = 0x20;
    }
}
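Going one step further than the loop above, here is a rough sketch that also turns the 00-01-01-0B sequence mentioned in the question into a real line break while blanking every other control byte. The helper names ToDisplayText and StartsWithAt are made up for this example, and Encoding.ASCII is only an assumption about the file's text encoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Delimiter reported in the question; treated here as the only real line break.
byte[] delimiter = { 0x00, 0x01, 0x01, 0x0B };

// Hypothetical helper: builds display text from the first `count` bytes of `data`.
string ToDisplayText(byte[] data, int count, byte[] marker, Encoding encoding)
{
    var cleaned = new List<byte>(count);
    int i = 0;
    while (i < count)
    {
        if (StartsWithAt(data, count, i, marker))
        {
            cleaned.Add((byte)'\n'); // the custom delimiter becomes a newline
            i += marker.Length;
        }
        else
        {
            // Any other control byte (including stray CR/LF) becomes a space.
            cleaned.Add(data[i] < 32 ? (byte)0x20 : data[i]);
            i++;
        }
    }
    return encoding.GetString(cleaned.ToArray());
}

// Hypothetical helper: true if `marker` occurs in `data` starting at `offset`.
bool StartsWithAt(byte[] data, int count, int offset, byte[] marker)
{
    if (offset + marker.Length > count) return false;
    for (int j = 0; j < marker.Length; j++)
    {
        if (data[offset + j] != marker[j]) return false;
    }
    return true;
}

byte[] sample = { (byte)'A', (byte)'B', 0x0A, (byte)'C', 0x00, 0x01, 0x01, 0x0B, (byte)'D' };
string text = ToDisplayText(sample, sample.Length, delimiter, Encoding.ASCII);
Console.WriteLine(text);
```

Note that a delimiter split across two 1000-byte reads would not be recognised by this sketch; reading the whole file first, or carrying the last few bytes over to the next block, would be needed for that case.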

C# How would set the output of this to a string?

This is a section of code from my TCP client. This part converts the bytes received into characters. However, I would like to apply some logic to the result, and to do that I need the output as a string. It currently prints every character individually, so how would I do this? If you require any more information, feel free to ask.
byte[] bb = new byte[100];
int k = stm.Read(bb, 0, 100);
for (int i = 0; i < k; i++)
    Console.Write(Convert.ToChar(bb[i]));
Thanks in advance.
As per comments, if you want to decode a byte array in a particular encoding, just use Encoding.GetString. For example:
string text = Encoding.ASCII.GetString(bb, 0, k);
(Note that ASCII is rarely a good choice if the text is meant to be arbitrary human text. UTF-8 is usually a better option at that point, but then you need to bear in mind the possibility that a single character may be split across multiple bytes - and therefore multiple calls to Stream.Read.)
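As a sketch of how the split-character problem can be handled, a System.Text.Decoder keeps incomplete byte sequences between calls. The MemoryStream below is only a stand-in for the network stream, and the 4-byte buffer is deliberately tiny so that a two-byte character gets split across reads.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for the network stream: "\u00E9" (e with acute) is two bytes in UTF-8.
byte[] payload = Encoding.UTF8.GetBytes("caf\u00E9 au lait");
using var stm = new MemoryStream(payload);

// The decoder remembers a trailing partial character until the next call.
Decoder decoder = Encoding.UTF8.GetDecoder();
var text = new StringBuilder();

byte[] bb = new byte[4];
char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bb.Length)];

int k;
while ((k = stm.Read(bb, 0, bb.Length)) > 0)
{
    // Decode only the k bytes that were actually read.
    int charCount = decoder.GetChars(bb, 0, k, chars, 0);
    text.Append(chars, 0, charCount);
}

string result = text.ToString();
Console.WriteLine(result);
```

Calling Encoding.UTF8.GetString on each 4-byte chunk separately would instead produce replacement characters where the two-byte sequence was cut.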
string str = "";
for (int i = 0; i < k; i++) str += Convert.ToChar(bb[i]); // only the k bytes actually read
Console.Write(str);

How can I replace a unicode string in a binary file?

I've been trying to get my program to replace Unicode text in a binary file.
The user would input what to find, and the program would replace it with a specific string if it finds it.
I've searched around, but I can't find anything that matches my specifics. What I would like is something like:
string text = File.ReadAllText(path, Encoding.Unicode);
text = text.Replace(userInput, specificString);
File.WriteAllText(path, text);
but anything that works in a similar manner should suffice.
Using that results in a file that is larger and unusable, though.
I use:
int var = File.ReadAllText(path, Encoding.Unicode).Contains(userInput) ? 1 : 0;
if (var == 1)
{
    //Missing Part
}
for checking if the file contains the user inputted string, if it matters.
This can work only in very limited situations. Unfortunately, you haven't offered enough details as to the nature of the binary file for anyone to know if this will work in your situation or not. There are a practically endless variety of binary file formats out there, at least some of which would be rendered invalid if you modify a single byte, many more of which could be rendered invalid if the file length changes (i.e. data after your insertion point is no longer where it is expected to be).
Of course, many binary files are also either encrypted, compressed, or both. In such cases, even if you do by some miracle find the text you're looking for, it probably doesn't actually represent that text, and modifying it will render the file unusable.
All that said, for the sake of argument let's assume your scenario doesn't have any of these problems and it's perfectly okay to just completely replace some text found in the middle of the file with some entirely different text.
Note that we also need to make an assumption about the text encoding. Text can be represented in a wide variety of ways, and you will need to use the correct encoding not just to find the text, but also to ensure the replacement text will be valid. For the sake of argument, let's say your text is encoded as UTF-8.
Now we have everything we need:
void ReplaceTextInFile(string fileName, string oldText, string newText)
{
    byte[] fileBytes = File.ReadAllBytes(fileName),
        oldBytes = Encoding.UTF8.GetBytes(oldText),
        newBytes = Encoding.UTF8.GetBytes(newText);
    int index = IndexOfBytes(fileBytes, oldBytes);
    if (index < 0)
    {
        // Text was not found
        return;
    }
    byte[] newFileBytes =
        new byte[fileBytes.Length + newBytes.Length - oldBytes.Length];
    // Copy the bytes before the match, then the replacement, then the rest
    Buffer.BlockCopy(fileBytes, 0, newFileBytes, 0, index);
    Buffer.BlockCopy(newBytes, 0, newFileBytes, index, newBytes.Length);
    Buffer.BlockCopy(fileBytes, index + oldBytes.Length,
        newFileBytes, index + newBytes.Length,
        fileBytes.Length - index - oldBytes.Length);
    File.WriteAllBytes(fileName, newFileBytes);
}
int IndexOfBytes(byte[] searchBuffer, byte[] bytesToFind)
{
    // "<=" so that a match ending at the very last byte is also found
    for (int i = 0; i <= searchBuffer.Length - bytesToFind.Length; i++)
    {
        bool success = true;
        for (int j = 0; j < bytesToFind.Length; j++)
        {
            if (searchBuffer[i + j] != bytesToFind[j])
            {
                success = false;
                break;
            }
        }
        if (success)
        {
            return i;
        }
    }
    return -1;
}
Notes:
The above is destructive. You may want to run it only on a copy of the file, or modify the code so that it takes an additional parameter specifying the new file to which the modification should be written.
This implementation does everything in-memory. This is much more convenient, but if you are dealing with large files, and especially if you are on a 32-bit platform, you may find you need to process the file in smaller chunks.
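On the encoding assumption made above: the same text produces different bytes in different encodings, so the search bytes have to be built with whatever encoding the file really uses. A small sketch, using a made-up sample string:

```csharp
using System;
using System.Text;

string sample = "Hi";

byte[] utf8 = Encoding.UTF8.GetBytes(sample);
byte[] utf16 = Encoding.Unicode.GetBytes(sample); // UTF-16, little-endian

Console.WriteLine(BitConverter.ToString(utf8));   // 48-69
Console.WriteLine(BitConverter.ToString(utf16));  // 48-00-69-00
```

Since the question reads the file with Encoding.Unicode, the text in that file is presumably UTF-16, in which case using Encoding.Unicode instead of Encoding.UTF8 when building the search and replacement bytes would be the corresponding change.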

How to generate string of a certain length to insert into a file to meet a file size criteria?

I have a requirement to test some load issues with regards to file size. I have a windows application written in C# which will automatically generate the files. I know the size of each file, ex. 100KB, and how many files to generate. What I need help with is how to generate a string less than or equal to the required file size.
pseudo code:
long fileSizeInKB = (1024 * 100); //100KB
int numberOfFiles = 5;
for(var i = 0; i < numberOfFiles - 1; i++) {
    var dataSize = fileSizeInKB;
    var buffer = new byte[dataSize];
    using (var fs = new FileStream(File, FileMode.Create, FileAccess.Write)) {
    }
}
You can always use the string constructor that takes a char and the number of times you want that character repeated:
string myString = new string('*', 5000);
This gives you a string of 5000 stars - tweak to your needs.
The easiest way would be the following code:
var content = new string('A', (int)fileSizeInKB); // the constructor takes an int count
Now you've got a string with as many 'A' characters as required.
To fill it with Lorem Ipsum or some other repeating string build something like the following pseudocode:
string contentString = "Lorem Ipsum...";
for (int i = 0; i < fileSizeInKB / contentString.Length; i++)
    //write contentString to file
if (fileSizeInKB % contentString.Length > 0)
    // write remaining substring of contentString to file
Edit: If you're saving as UTF-16 (what .NET calls Encoding.Unicode), you may need to halve the character count, because UTF-16 uses two bytes per character for most characters (four for characters outside the Basic Multilingual Plane).
There are many variations on how you can do this. One would be to fill the file with a bunch of chars. You need 100KB? No problem: 100 * 1024 * 8 = 819200 bits. A C# char is 16 bits, so if the file is written as UTF-16, 819200 / 16 = 51200 and you need to put 51,200 chars into the file. With a one-byte-per-character encoding such as ASCII you would need 102,400 chars instead. Also consider that a file may have additional header or metadata bytes (a BOM, for example), so you may need to account for that and decrease the number of chars written.
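Putting the arithmetic and the repeating-string idea together, here is a minimal sketch that writes a file of exactly 100 KB. The filler text and the temp-file path are assumptions made for the example, and Encoding.ASCII is chosen so that one character is exactly one byte with no BOM.

```csharp
using System;
using System.IO;
using System.Text;

int targetBytes = 100 * 1024; // 100 KB
string filler = "Lorem ipsum dolor sit amet. "; // ASCII-only filler (assumption)

// Hypothetical output path, used only for this sketch.
string path = Path.Combine(Path.GetTempPath(), "filler-100kb.txt");

// ASCII encoding: exactly one byte per character and no BOM.
using (var writer = new StreamWriter(path, false, Encoding.ASCII))
{
    int fullCopies = targetBytes / filler.Length;
    int remainder = targetBytes % filler.Length;

    for (int i = 0; i < fullCopies; i++)
    {
        writer.Write(filler);
    }
    // Top up with a partial copy so the size matches exactly.
    writer.Write(filler.Substring(0, remainder));
}

long actualSize = new FileInfo(path).Length;
Console.WriteLine(actualSize); // 102400
```

With a multi-byte encoding the character count and byte count diverge, so Encoding.GetByteCount would be the thing to check before writing.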
As a partial answer to your question I recently created a portable WPF app that easily creates 'junk' files of almost any size: https://github.com/webmooch/FileCreator
