All Characters in my Bitmap Textfile are in Chinese


So I am currently working on a program that will extract materials from .fbm files. In the ASCII fbm files, the data to be extracted looks as follows:
/9j/4Sb7RXhpZgAATU0AKgAAAAgADAEAAAMAAAABEAAAAAEBAAMAAAABEAAAAAECAAMAAAADAAAAngEGAAMAAAABAAIAAAESAAMAAAABAAEAAAEVAAMAAAABAAMAAAEaAAUAAAABAAAApAEbAAUAAAABAAAArAEoAAMAAAABAAIAAAExAAIAAAAiAAAAtAEyAAIAAAAUAAAA1odpAAQAAAABAAAA7AAAASQACAAIAAgACvyAAAAnEAAK/IAAACcQQWRvYmUgUGhvdG9zaG9wIENDIDIwMTcgKFdpbmRvd3MpADIwMTk6MDc6MD
...
There are several of these strings in the fbm file, each enclosed in quotation marks and separated by commas. Now, when I convert these strings to jpgs using the following code:
//Convert ASCII data to binary
// (assumes using System; using System.Collections.Generic; using System.IO;)
string asciiFileData = "";
using (StreamReader binaryDataReader = File.OpenText(Path.Combine(inputDirectory, convertedFBX + iterator.ToString() + ".txt")))
{
    while (!binaryDataReader.EndOfStream)
    {
        var lineData = binaryDataReader.ReadLine();
        if (String.IsNullOrEmpty(lineData)) continue;
        asciiFileData += lineData;
    }
}
// The base64 strings are quoted and comma-separated
string[] imageStrings = asciiFileData.Split(',');
List<byte[]> imageList = new List<byte[]>();
foreach (string imageString in imageStrings)
{
    if (imageString.Length > 10) //A way of checking if there's actual data for the file to save
        imageList.Add(Convert.FromBase64String(imageString.Trim().Replace("\"", "")));
}
//Save images
int iterator2 = 1;
foreach (byte[] image in imageList)
{
    File.WriteAllBytes(Path.Combine(inputDirectory, convertedFBX + iterator.ToString() + iterator2++ + ".jpg"), image);
}
The first string creates a proper jpg of the materials. When I open it in a text editor, it shows the usual strange alien characters (I'm not quite sure what those are called). However, the jpgs after the first one cannot be opened. When I open them in a text editor, all the characters are in Chinese! Why on earth is that happening? What does it mean, and is it supposed to be that way? Thanks in advance for your answers!

This resource was mentioned in the comments; however, my answer was found at https://devblogs.microsoft.com/oldnewthing/20140930-00/?p=43953. Essentially, this is what you see when binary content is read as text in an encoding it was never meant to be in.
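A small sketch of the effect, assuming the editor decoded the bytes as UTF-16LE. In that case every two bytes become one character, and any pair whose second byte is between 0x4E and 0x9F lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Decodes raw bytes as UTF-16LE (Encoding.Unicode): two bytes per char,
    // low byte first. This mimics an editor guessing the wrong encoding.
    public static string MisreadAsUtf16(byte[] bytes) => Encoding.Unicode.GetString(bytes);

    // True if the string is non-empty and every char is in U+4E00..U+9FFF.
    public static bool AllCjk(string s)
    {
        foreach (char c in s)
        {
            if (c < '\u4E00' || c > '\u9FFF') return false;
        }
        return s.Length > 0;
    }
}
```

For example, `MisreadAsUtf16(Encoding.ASCII.GetBytes("abcdef"))` yields three characters: U+6261, U+6463 and U+6665. All three are CJK ideographs.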

Related

Write the data to specific line C#

I have some data and I want to write it to a specific line in Notepad using C#.
For example, I have two textboxes: textBox1 contains "123 Hello" and textBox2 contains "565878 Hello2".
When I press the SAVE button, this data should be saved into one file, but on different lines. I want to save the first value on the first line and the second value on the third line.
How can I do this?

This question is too broad. The simple answer is that you write the two strings to a file with a newline (either "\r\n" or Environment.NewLine) between them. That will put the two strings on different lines. If you want the second string on the third line, write two newlines between them.
If neither of those is the answer, then you need to be a lot more specific about why not. Is the file empty to start with? What have you tried? Where, specifically, are you getting stuck? What platform?
And I really don't see what this has to do with Notepad.
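Here is a minimal sketch of the two-newline approach. It assumes the file should simply be overwritten with the two values. The method name and path are hypothetical.

```csharp
using System;
using System.IO;

static class LineWriter
{
    // Writes `first` on line 1 and `second` on line 3: the two NewLine
    // sequences after `first` end line 1 and leave line 2 empty.
    public static void WriteFirstAndThird(string path, string first, string second)
    {
        File.WriteAllText(path,
            first + Environment.NewLine + Environment.NewLine + second + Environment.NewLine);
    }
}
```

Usage: `LineWriter.WriteFirstAndThird(@"C:\temp\out.txt", textBox1.Text, textBox2.Text);`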
EDIT:
You have clarified that you are starting with an existing text file and want to replace the content at the specified lines.
This is a more complex thing to do, and may be beyond your skills if you are just starting out. The basic approach is this:
Assuming you can read the entire file into memory, load the file into a string. You will have to parse new lines to find the lines you want to replace. You can then just replace those parts of the string with the new data. When finished, write the file back to disk.
If the file is too big to load into memory, it becomes much more complex. I'm sorry, but since you've done such a poor job of describing the issue, I'm not going to go to the trouble of covering the details for this case. Such a task probably falls outside the scope of a Stack Overflow answer anyway.
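The in-memory approach could look like the sketch below. It swaps the manual newline parsing for File.ReadAllLines and File.WriteAllLines, which split and rejoin the lines for you. It assumes the file fits in memory, and the method name is hypothetical. Note that WriteAllLines ends every line with Environment.NewLine, so original line endings may change.

```csharp
using System;
using System.IO;

static class LineReplacer
{
    // Replaces a 1-based line in a text file, padding with empty lines if the
    // file is shorter than lineNumber. Loads the whole file into memory.
    public static void ReplaceLine(string path, int lineNumber, string newText)
    {
        if (lineNumber < 1) throw new ArgumentOutOfRangeException(nameof(lineNumber));
        string[] lines = File.Exists(path) ? File.ReadAllLines(path) : new string[0];
        if (lines.Length < lineNumber)
        {
            int oldLength = lines.Length;
            Array.Resize(ref lines, lineNumber);
            for (int i = oldLength; i < lines.Length; i++) lines[i] = ""; // fill gaps
        }
        lines[lineNumber - 1] = newText;
        File.WriteAllLines(path, lines);
    }
}
```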

If your line numbers are not fixed, you can do something like the following:
using System;
using System.IO;

class Program
{
    private static void Main()
    {
        var data = "";
        const string data1 = "Data1"; // First data
        const string data2 = "Data2"; // Second data
        const int line1 = 1;          // Line for the first data
        const int line2 = 3;          // Line for the second data
        var maxNoOfLines = Math.Max(line1, line2);
        for (var i = 1; i <= maxNoOfLines; i++)
        {
            if (i == line1)
            {
                data += data1 + Environment.NewLine;
            }
            else if (i == line2)
            {
                data += data2 + Environment.NewLine;
            }
            else
            {
                // Lines without data are left empty
                data += Environment.NewLine;
            }
        }
        File.WriteAllText(@"C:\NOBACKUP\test.txt", data);
    }
}
Otherwise, if the line numbers are fixed, it is much simpler: you can just remove the loop above and hardcode the values.

How can I replace my txt file with an array?

I made an array out of a txt file, and now I want to replace the contents of this txt file with the array (replace as in update),
because I edited the array. I hope this is possible, and I hope it's possible with line breaks.
string[] linesa = File.ReadLines("file1.txt").ToArray();
This is the line where I make an array from my txt file.
number = Array.IndexOf(linesa, commonElement);
number = number + 1;
email = linesa[number];
linesa[number] = "";
number = number - 1;
linesa[number] = "";
This is the edit I made. Now I want to put the array back into the txt file, and this is where I'm having a lot of trouble.

Just use the File.WriteAllLines method. It replaces the contents of the file if the file already exists.
File.WriteAllLines("file1.txt", linesa);
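Setting entries to "" (as in the question) leaves blank lines in the rewritten file. If the goal is to delete those two lines entirely, one option is to work on a List<string> instead of an array. In the minimal sketch below, the method name and the "marker line followed by an email line" layout are assumptions based on the question's code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineRemover
{
    // Removes the line equal to `marker` and the line after it, returning the
    // removed follow-up line; returns null (and changes nothing) if the marker
    // is missing or has no following line.
    public static string RemoveMarkerAndNext(string path, string marker)
    {
        List<string> lines = File.ReadAllLines(path).ToList();
        int index = lines.IndexOf(marker);
        if (index < 0 || index + 1 >= lines.Count) return null;
        string next = lines[index + 1];
        lines.RemoveRange(index, 2); // drop both lines instead of leaving blanks
        File.WriteAllLines(path, lines);
        return next;
    }
}
```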

How to strip out 0x0a special char from utf8 file using c# and keep file as utf8?

The following is a line from a UTF-8 file from which I am trying to remove the special char (0X0A), which shows up as a black diamond with a question mark below:
2464577 外國法譯評 True s6620178 Unspecified <1>�1009-672
This is generated when SSIS reads a SQL table and then writes it out using a flat file connection manager set to code page 65001.
When I open the file in Notepad++, it displays as 0X0A.
I'm looking for some C# code to reliably strip that char out and replace it with either nothing or a blank space.
Here's what I have tried:
string fileLocation = "c:\\MyFile.txt";
var content = string.Empty;
using (StreamReader reader = new System.IO.StreamReader(fileLocation))
{
    content = reader.ReadToEnd();
    reader.Close();
}
content = content.Replace('\u00A0', ' ');
//also tried: content.Replace((char)0X0A, ' ');
//also tried: content.Replace((char)0X0A, '');
//also tried: content.Replace((char)0X0A, (char)'\0');
Encoding encoding = Encoding.UTF8;
using (FileStream stream = new FileStream(fileLocation, FileMode.Create))
{
    using (BinaryWriter writer = new BinaryWriter(stream, encoding))
    {
        writer.Write(encoding.GetPreamble()); //This is for writing the BOM
        writer.Write(content);
    }
}
I also tried this code to get the actual string value:
byte[] bytes = { 0x0A };
string text = Encoding.UTF8.GetString(bytes);
And it comes back as "\n". So in the code above I also tried replacing "\n" with " ", both in double quotes and single quotes, but still no change.
At this point I'm out of ideas. Anyone got any advice?
Thanks.

You may want to look at regex replacement. For a good example, see the post toward the bottom of this page:
http://social.msdn.microsoft.com/Forums/en-US/1b523d24-dab6-4870-a9ca-5d313d1ee602/invalid-character-returned-from-webservice

You can convert the string to a char array and loop through the array.
Then check what char the black diamond is and just remove it.
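A sketch of that idea prints the index and code point of every control or non-ASCII character, so you can see exactly which character to remove instead of guessing. The class and method names are hypothetical. In many editors, a black diamond with a question mark is U+FFFD, the replacement character that decoders insert for invalid bytes, so it is worth checking for that as well as U+000A.

```csharp
using System.Collections.Generic;
using System.Globalization;

static class CharInspector
{
    // Reports each control or non-ASCII char with its index, code point and
    // Unicode category, e.g. "index 2: U+000A (Control)".
    public static List<string> DescribeSuspiciousChars(string text)
    {
        var report = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsControl(c) || c > '\u007F')
                report.Add($"index {i}: U+{(int)c:X4} ({CharUnicodeInfo.GetUnicodeCategory(c)})");
        }
        return report;
    }
}
```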

string content = "blahblah" + (char)10 + "blahblah";
char find = (char)10;
content = content.Replace(find.ToString(), "");
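Separately from finding the character, there are two things to note in the question's code:

- It writes the file with BinaryWriter.Write(string), which puts a length prefix before the string's bytes, so the output is not plain UTF-8 text.
- string.Replace returns a new string, so its result must be assigned back.

The sketch below assumes the unwanted character is a lone LF (0x0A) inside a record, while the file's real line breaks are CRLF. It replaces such LFs with a space using a regex, as the earlier answer suggests, and writes the file with File.WriteAllText, which adds no length prefix. If the character turns out to be something else, adjust the pattern. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class StrayLineFeedCleaner
{
    // Replaces every LF that is not preceded by CR with a space,
    // leaving normal CRLF line breaks intact.
    public static string ReplaceLoneLineFeeds(string content) =>
        Regex.Replace(content, "(?<!\r)\n", " ");

    public static void CleanFile(string path)
    {
        string content = File.ReadAllText(path, Encoding.UTF8);
        content = ReplaceLoneLineFeeds(content);
        // UTF8Encoding(true) writes a BOM; the text itself is written as-is.
        File.WriteAllText(path, content, new UTF8Encoding(true));
    }
}
```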

Seeking for a line in a text file

I need some assistance. I am writing a method that reads a text file, and if any exception occurs, I append a line (e.g. "**") to the text file.
What I need to know is how I can check for that specific line in the text file without reading every line, like a peek method or something.
Any help would be appreciated.
Thanks in advance.

You can use File.ReadLines in combination with Any:
bool isExcFile = System.IO.File.ReadLines(path).Any(l => l == "**");
The ReadLines and ReadAllLines methods differ as follows: When you use
ReadLines, you can start enumerating the collection of strings before
the whole collection is returned; when you use ReadAllLines, you must
wait for the whole array of strings be returned before you can access
the array. Therefore, when you are working with very large files,
ReadLines can be more efficient.

I have found a solution: the line I append will always be the last line in the file, so I created a method to read the last line. See below:
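public string ReadLastLine(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        // Scan backwards from the second-to-last byte (the last byte is usually the
        // trailing '\n') until the newline ending the previous line is found.
        // The byte scan assumes an ASCII-compatible encoding such as UTF-8.
        long pos = fs.Length - 2;
        while (pos >= 0)
        {
            fs.Seek(pos, SeekOrigin.Begin);
            if (fs.ReadByte() == '\n')
                break; // the stream is now positioned just after that newline
            pos--;
        }
        if (pos < 0)
            fs.Seek(0, SeekOrigin.Begin); // single-line file: read from the start
        using (StreamReader reader = new StreamReader(fs))
        {
            // Drop the trailing line break ("\n" or "\r\n") from the result
            return reader.ReadToEnd().TrimEnd('\r', '\n');
        }
    }
}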

You will need to maintain a separate file with indexes (for example, comma-delimited) of where your special markers are. Then you only need to read those indexes and use the Seek method to jump to that point in the file stream.
If your file is relatively small (say, under 50 MB), this is overkill. For larger files, you can consider maintaining the index file. You basically have to weigh the cost of an extra I/O call (reading the index file) against that of simply reading each line from the file stream.
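Here is a minimal sketch of the index-file idea. It assumes your own code appends the marker, so the byte offset can be recorded at that moment, and that the index file holds comma-separated offsets as described above. All names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class MarkerIndex
{
    // Appends the marker as a new line and records the byte offset where it
    // starts in a comma-delimited index file.
    public static void AppendMarker(string path, string indexPath, string marker = "**")
    {
        long offset = File.Exists(path) ? new FileInfo(path).Length : 0;
        File.AppendAllText(path, marker + "\n");
        File.AppendAllText(indexPath, offset + ",");
    }

    // Seeks directly to each recorded offset instead of reading every line.
    public static bool HasMarker(string path, string indexPath, string marker = "**")
    {
        if (!File.Exists(indexPath)) return false;
        byte[] expected = Encoding.UTF8.GetBytes(marker);
        string[] offsets = File.ReadAllText(indexPath)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            foreach (string entry in offsets)
            {
                fs.Seek(long.Parse(entry), SeekOrigin.Begin);
                byte[] buffer = new byte[expected.Length];
                int read = fs.Read(buffer, 0, buffer.Length);
                if (read == expected.Length && buffer.SequenceEqual(expected))
                    return true;
            }
        }
        return false;
    }
}
```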

From what I understand you want to process some files and after the processing find out which files contain the "**" symbol, without reading every line of the file.
If you append the "**" to the end of the file you could do something like:
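// Assumes the file ends with "**" followed by a single "\n" (3 bytes);
// with Windows "\r\n" line endings the last 3 bytes would be "*\r\n" instead.
using (StreamReader sr = File.OpenText(fileName))
{
    sr.BaseStream.Seek(-3, SeekOrigin.End);
    string endToken = sr.ReadToEnd();
    if (endToken == "**\n")
    {
        // if needed, go back to start of file; the reader's buffer
        // must be discarded after moving the underlying stream:
        sr.BaseStream.Seek(0, SeekOrigin.Begin);
        sr.DiscardBufferedData();
        // do something with the file
    }
}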

Decoding base64-encoded data from xml document

I receive some XML files with embedded base64-encoded images that I need to decode and save as files.
An unmodified (other than zipped) example of such a file can be downloaded below:
20091123-125320.zip (60KB)
However, I get errors like "Invalid length for a Base-64 char array" and "Invalid character in a Base-64 string". I marked the line in the code where I get the error in the code.
A file could look like this:
<?xml version="1.0" encoding="windows-1252"?>
<mediafiles>
<media media-type="image">
<media-reference mime-type="image/jpeg"/>
<media-object encoding="base64"><![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]></media-object>
<media.caption>What up</media.caption>
</media>
</mediafiles>
And the code to process like this:
var xd = new XmlDocument();
xd.Load(filename);
var nodes = xd.GetElementsByTagName("media");
foreach (XmlNode node in nodes)
{
    var mediaObjectNode = node.SelectSingleNode("media-object");
    //The line below is where the errors occur
    byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.InnerText);
    //Do stuff with the bytearray to save the image
}
The XML data comes from an enterprise newspaper system, so I am pretty sure the files are OK; there must be something wrong in the way I process them. Maybe a problem with the encoding?
I have tried writing out the contents of mediaObjectNode.InnerText, and it is the base64-encoded data, so navigating the XML document is not the issue.
I have been googling, binging, stackoverflowing and crying - and found no solution... Help!
Edit:
Added an actual example file (and a bounty). Please note that the downloadable file uses a slightly different schema, since I simplified it in the example above by removing irrelevant parts...

For a first attempt I didn't use any programming language, just Notepad++.
I opened the XML file in it and copied and pasted the raw base64 content into a new file (without the square brackets).
Afterwards I selected everything (Ctrl+A) and used Extensions - Mime Tools - Base64 decode. This threw an error about the wrong text length (it must be a multiple of 4). So I just added two equal signs ('=') as placeholders at the end to get the correct length.
On the next try it decoded successfully into 'something'. Just save the file as .jpg and it opens like a charm in any picture viewer.
So I would say there IS something wrong with the data you are getting. It just doesn't have the right number of equal signs at the end to pad it to a length that can be split into groups of 4.
The 'easy' way would be to add equal signs until decoding doesn't throw an error. The better way would be to count the number of characters (minus CR/LFs!) and add the needed ones in one step.
Further investigations
After some coding and reading up on the Convert function, it turns out the problem is that the producer appends the wrong number of equal signs. Notepad++ has no problem with tons of equal signs, but Microsoft's Convert function only accepts zero, one or two. So if you add more equal signs to the one that is already there, you get an error too! To get this damn thing to work, you have to cut off all existing signs, calculate how many are needed and add them again.
Just for the bounty, here is my code (not absolutely perfect, but a good starting point): ;-)
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (XElement element in elements)
        {
            var image = AnotherDecode64(element.Value);
        }
    }

    // Strips surrounding whitespace and any existing '=' padding, then re-pads
    // so that the number of non-whitespace characters is a multiple of 4.
    static byte[] AnotherDecode64(string base64Encoded)
    {
        string temp = base64Encoded.Trim().TrimEnd('=');
        int asciiChars = temp.Length - temp.Count(c => Char.IsWhiteSpace(c));
        switch (asciiChars % 4)
        {
            case 1:
                //This would always produce an exception!!
                //Regardless what (or what not) you attach to your string!
                //Better would be some kind of throw new Exception()
                return new byte[0];
            case 0:
                asciiChars = 0;
                break;
            case 2:
                asciiChars = 2;
                break;
            case 3:
                asciiChars = 1;
                break;
        }
        temp += new String('=', asciiChars);
        return Convert.FromBase64String(temp);
    }
}

The base64 string is not valid, as Oliver has already said: the string length must be a multiple of 4 after removing whitespace characters. If you look at the end of the base64 string (see below), you will see that the last line is shorter than the rest.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=
If you remove this line, your program will work, but the resulting image will have a missing section in the bottom right-hand corner. You need to pad this line so the overall string length is correct. By my calculation, adding 3 characters should make it work:
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=

Remove the last two characters until the image decodes properly:
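public Image Base64ToImage(string base64String)
{
    // Convert Base64 String to byte[], trimming two characters at a time from
    // the end until Convert.FromBase64String accepts the input.
    // Note: this discards trailing data, so the image may be truncated.
    byte[] imageBytes = null;
    while (imageBytes == null && base64String.Length > 0)
    {
        try
        {
            imageBytes = Convert.FromBase64String(base64String);
        }
        catch (FormatException)
        {
            int length = base64String.Length;
            base64String = base64String.Substring(0, Math.Max(0, length - 2));
        }
    }
    if (imageBytes == null)
        throw new FormatException("No decodable Base64 prefix found.");
    // Convert byte[] to Image. GDI+ requires the stream to stay open
    // for the lifetime of the Image, so it is not disposed here.
    MemoryStream ms = new MemoryStream(imageBytes);
    Image image = Image.FromStream(ms, true);
    pictureBox1.Image = image;
    return image;
}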

Try using LINQ to XML:
using System;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (var element in elements)
        {
            byte[] image = Convert.FromBase64String(element.Value);
        }
    }
}
UPDATE:
After downloading the XML file and analyzing the value of the media-object node it is clear that it is not a valid base64 string:
string value = "PUT HERE THE BASE64 STRING FROM THE XML WITHOUT THE NEW LINES";
byte[] image = Convert.FromBase64String(value);
throws a System.FormatException saying that the length is invalid for a Base-64 string. Even when I remove the \n characters from the string, it doesn't work:
var elements = XElement
    .Load("20091123-125320.xml")
    .XPathSelectElements("//media/media-object[@encoding='base64']");
foreach (var element in elements)
{
    string value = element.Value.Replace("\n", "");
    byte[] image = Convert.FromBase64String(value);
}
also throws System.FormatException.

I've also had a problem with decoding Base64 encoded string from XML document (specifically Office OpenXML package document).
It turned out that string had additional encoding applied: HTML encoding, so doing first HTML decoding and then Base64 decoding did the trick:
private static byte[] DecodeHtmlBase64String(string value)
{
return System.Convert.FromBase64String(System.Net.WebUtility.HtmlDecode(value));
}
Just in case someone else stumbles on the same issue.
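To illustrate why the extra decoding step matters: the Base64 string "Pz8+" encodes the bytes 3F 3F 3E. If its '+' arrives HTML-encoded as the entity "&#43;", Convert.FromBase64String rejects the '&', '#' and ';' characters, while decoding the entity first restores a valid string. This is a sketch with hypothetical names and a made-up sample value.

```csharp
using System;
using System.Net;

static class HtmlBase64Demo
{
    // "Pz8+" with its '+' replaced by the numeric HTML entity "&#43;".
    public const string HtmlEncoded = "Pz8&#43;";

    // Returns the decoded bytes, or null if the input is not valid Base64.
    public static byte[] TryDecodeBase64(string value)
    {
        try { return Convert.FromBase64String(value); }
        catch (FormatException) { return null; }
    }

    // Same idea as the answer above: undo the HTML encoding, then decode Base64.
    public static byte[] TryDecodeHtmlThenBase64(string value) =>
        TryDecodeBase64(WebUtility.HtmlDecode(value));
}
```

Here `TryDecodeBase64(HtmlEncoded)` returns null, while `TryDecodeHtmlThenBase64(HtmlEncoded)` returns the three bytes 3F 3F 3E.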

Well, it's all very simple. CDATA is a node itself, so mediaObjectNode.InnerText actually produces <![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]>, which is obviously not valid Base64-encoded data.
To make things work, use mediaObjectNode.ChildNodes[0].Value and pass that value to Convert.FromBase64String.

Is the character encoding correct? The error sounds like there's a problem that causes invalid characters to appear in the array. Try copying out the text and decoding manually to see if the data is indeed valid.
(For the record, windows-1252 is not exactly the same as iso-8859-1, so that may be the cause of a problem, barring other sources of corruption.)
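For the manual check suggested above, here is a small diagnostic sketch (hypothetical name). It reports characters outside the standard Base64 alphabet and whether the number of non-whitespace characters is a multiple of 4, which are the two problems named in the question's error messages. It treats only space, tab, CR and LF as ignorable, since those are the whitespace characters Convert.FromBase64String skips. It does not validate where '=' padding appears.

```csharp
using System.Collections.Generic;

static class Base64Check
{
    const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Whitespace that Convert.FromBase64String skips over.
    static bool IsIgnorable(char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n';

    // Lists invalid characters (other than '=') and a length problem, if any.
    public static List<string> Diagnose(string data)
    {
        var problems = new List<string>();
        int significant = 0;
        for (int i = 0; i < data.Length; i++)
        {
            char c = data[i];
            if (IsIgnorable(c)) continue;
            significant++;
            if (c != '=' && Alphabet.IndexOf(c) < 0)
                problems.Add($"invalid character U+{(int)c:X4} at index {i}");
        }
        if (significant % 4 != 0)
            problems.Add($"length {significant} is not a multiple of 4");
        return problems;
    }
}
```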
