Exception reading *.htm file from local app data (Metro App) - c#

I'm using FileIO.ReadTextAsync() to read an *.htm webpage which I have saved into "ms-appdata:///local", using Utf8 encoding.
But I get a System.ArgumentOutOfRangeException when doing it. The additional information is: "No mapping for the Unicode character exists in the target multi-byte code page."
Reading an ordinary *.txt file using the same function works fine. What am I doing wrong?
Edit : Code
async private void Button_Click(object sender, RoutedEventArgs e)
{
StorageFile SF = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appdata:///local/test3.html"));
string html = await FileIO.ReadTextAsync(SF, Windows.Storage.Streams.UnicodeEncoding.Utf8);
}

Change the file encoding using Visual Studio. When I opened the file it had the encoding: "Western European (Windows) - Codepage 1252"
Open the file in Visual Studio
File > Advanced Save Options... >
Change the encoding to "Unicode (UTF-8 with signature) - Codepage 65001"
Save the file
Credits: Advanced save options in visual studio
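If re-saving the file is not an option, the bytes can also be decoded in code. The following is only a sketch and not part of the original answer: the helper name DecodeUtf8OrWindows1252 is hypothetical, and it assumes code page 1252 is available on the platform (it is on .NET Framework; on newer .NET the code pages provider has to be registered first). In a Store app the bytes could be obtained with FileIO.ReadBufferAsync and then converted to an array.

```csharp
using System;
using System.Text;

static class EncodingFallback
{
    // Hypothetical helper: try strict UTF-8 first, fall back to Windows-1252.
    public static string DecodeUtf8OrWindows1252(byte[] bytes)
    {
        try
        {
            // throwOnInvalidBytes: true makes invalid UTF-8 raise an exception
            // instead of silently producing replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(bytes, 0, bytes.Length);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: the file was saved as "Western European (Windows)".
            return Encoding.GetEncoding(1252).GetString(bytes, 0, bytes.Length);
        }
    }
}
```

Any byte sequence decodes under Windows-1252, so the fallback never throws; it only produces correct text if the guess about the code page is right.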

Related

RichTextBox shows results in Chinese?

Trying to import a plain-text file with English characters using a RichTextBox in C# with UWP and VS 2017. It imports fine, except all the characters are Chinese. I have to use the StorageFile class for the file because that's the only one that works with UWP file privacy restrictions. I tried all the TextSetOptions values with no success and can't find a way to specify the format in either the stream or the rtb. Here's the code:
StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
IRandomAccessStream stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
/* NOTE: RichTextBox (Name="editor") is defined in Xaml */
editor.Document.LoadFromStream(Windows.UI.Text.TextSetOptions.ApplyRtfDocumentDefaults, stream);
As noted in the comments, this is due to an encoding mismatch. The API expects UTF-16 but you have UTF-8 (or maybe ASCII). Consider using FileIO.ReadTextAsync instead. This should auto-detect the encoding, or if it doesn't there is an overload where you can specify it directly.
Note that if you have a file encoded with an ANSI codepage (not any flavour of Unicode) you'll need to convert it first (check other SO posts).
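To see why English text turns into Chinese characters, here is a small sketch (not from the original answer; the class and method names are made up for illustration). It encodes a string as UTF-8 and then decodes those bytes as UTF-16 little-endian, which is the kind of misreading described above: each pair of ASCII bytes is fused into one 16-bit code unit, and those values mostly land in the CJK range.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: decode UTF-8 bytes as if they were UTF-16LE.
    public static string MisreadAsUtf16(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        // Encoding.Unicode is UTF-16 little-endian in .NET.
        return Encoding.Unicode.GetString(utf8Bytes, 0, utf8Bytes.Length);
    }
}
```

For example, "Hi" is the bytes 48 69 in UTF-8; read as one little-endian 16-bit unit that is U+6948, a CJK ideograph.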
The UWP RichTextBox expects the random-access stream to contain UTF-16 ("Unicode") text, so I just had to re-encode the file contents to match.
string x = await FileIO.ReadTextAsync(file);
byte[] bytes = System.Text.Encoding.Unicode.GetBytes(x);
InMemoryRandomAccessStream randomAccessStream = new InMemoryRandomAccessStream();
await randomAccessStream.WriteAsync(bytes.AsBuffer());
IRandomAccessStream stream2 = randomAccessStream; //await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
editor.Document.LoadFromStream(Windows.UI.Text.TextSetOptions.ApplyRtfDocumentDefaults, stream2);

Unable to force ANSI encoding (windows-1252)

I'm trying to save two text files in ANSI encoding for later processing by a legacy system. However when I save it in the correct encoding, it still saves as a UTF-8 file.
I've tried the following:
File.WriteAllLines(filePath, lines, Encoding.GetEncoding(1252));
File.WriteAllLines(filePath, lines, Encoding.GetEncoding("windows-1252"));
using (StreamWriter writer = new StreamWriter(fileName, false, Encoding.GetEncoding(1252)))
{
foreach (string line in lines)
{
writer.WriteLine(line);
}
}
I've also tried converting an existing utf-8 file to ansi
File.WriteAllBytes(fileName, Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(1252), File.ReadAllBytes(fileName)));
None of the above solutions have worked; the files are still UTF-8. The only way I managed to make it save as ANSI was by inserting Swedish characters like åäö, which is a hack I cannot use for one of the files.
I'm at a loss. Has anyone got a solution to this issue?
We're on .NET Framework 4.5, C# 7.3
I did a thorough investigation and found that it works, just not in the way I expected. As @jdweng said: nothing in the data contains the encoding, you're just saving bytes. For the most part you're saving regular ASCII characters, so when you open the file in Notepad++, for instance, it will default to whatever encoding it prefers, unless the file contains a special character that hints to the program which encoding to use.
I encoded a file in four encodings (default (UTF-8), ANSI, ASCII and UTF-8-BOM) and opened up all files in a hex editor and found that in most cases the ä in these files determined which decoder to use in Notepad++.
So if the legacy system uses an ANSI decoder, it should be able to open an "ANSI" encoded file without special characters, even though Notepad++ shows it as UTF-8.
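A quick way to check this claim without a hex editor is to compare the encoded bytes directly. This is a minimal sketch, assuming .NET Framework where code page 1252 is available out of the box; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

static class AsciiBytesDemo
{
    // Hypothetical helper: true when Windows-1252 and UTF-8 (no BOM)
    // produce exactly the same bytes for the given text.
    public static bool SameBytesInAnsiAndUtf8(string text)
    {
        byte[] ansi = Encoding.GetEncoding(1252).GetBytes(text);
        byte[] utf8 = new UTF8Encoding(false).GetBytes(text);
        return ansi.SequenceEqual(utf8);
    }
}
```

For pure ASCII text such as "abc123" the two byte arrays are identical, so no editor can tell the files apart. For "ä" they differ (E4 in Windows-1252, C3 A4 in UTF-8), which is what gives an editor something to guess from.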
It definitely works. Try the following program:
using System.IO;
using System.Text;

namespace Demo
{
    static class Program
    {
        static void Main()
        {
            string filePath = @"E:\tmp\test"; // Put your path here.
            string[] lines = { "ÿ" };
            // Explicit Windows-1252: "ÿ" becomes the single byte FF.
            File.WriteAllLines(filePath + ".1.bin", lines, Encoding.GetEncoding(1252));
            // No encoding given: File.WriteAllLines uses UTF-8 without a BOM.
            File.WriteAllLines(filePath + ".2.bin", lines);
        }
    }
}
Run the program and then inspect the contents of the files in a binary editor.
You will see the following:
test.1.bin contains: FF 0D 0A
test.2.bin contains: C3 BF 0D 0A
(Note: If you drag and drop a ".bin" file into Visual Studio, it will open it in binary mode.)

C# Write to file using StreamWriter with ANSI encoding doesn't work

I want to write to text file with ANSI encoding. Code looks like below:
string text = "abc123";
string filePath = "C:\\Data\\MyFile.csv";
using (StreamWriter sw = new StreamWriter(File.Open(filePath, FileMode.OpenOrCreate), Encoding.Default))
{
sw.Write(text);
}
When I open the resulting file with Notepad++ and click on 'Encoding' in the menu, it always shows 'UTF-8 (without BOM)', which I want to avoid.
I tried choosing the 'Convert to ANSI' option, but after saving the file and reopening it, it's still 'UTF-8'.
I have been stuck on this issue for a long time. Could anyone give me a hint?
There is no problem with StreamWriter, it is how Notepad++ works. You can easily see it yourself. Just open classic windows Notepad, type "test" and "save as" with ANSI encoding. Then open in Notepad++ - it will recognize encoding as UTF8.
If you export text containing special characters, for example 'é', Notepad++ will then show the encoding as ANSI instead of UTF-8.
string text = "éçë";

How to change text file encoding programmatically using C# in WinRT/Windows store app

I need to change the encoding of some text files from UTF-8 to ASCII programmatically in my Windows Store app project (C#). On WinRT/Win8.1 we can do this manually by opening the file in Notepad and choosing the "Save as" menu, but my question is how to do it in code (C#).
[EDIT]
In WinRT, we can use FileIO.WriteLinesAsync() or FileIO.WriteTextAsync() to save a string to a text file, but we can only specify a UnicodeEncoding value as the encoding. So the SDK is quite different compared to the full-fledged .NET SDK.
[EDIT]
I know ASCII is a subset of UTF-8, but I really need to make sure the file encoding is ASCII, because I want to upload the file to a web site that only accepts ASCII-encoded txt files (UTF-8/Unicode encoding causes it to complain about a file format error).
[EDIT]
Problem solved:
public async void SaveStringToAnsiFile()
{
StorageFile file = await ApplicationData.Current.LocalFolder.CreateFileAsync("test.txt", CreationCollisionOption.ReplaceExisting);
await Windows.Storage.FileIO.WriteBytesAsync(file, Encoding.GetEncoding("gb2312").GetBytes("abcd→1234"));
}
Since ASCII isn't directly supported, you'll need to convert the text to a byte array and use something like WriteBytesAsync. Here's a simple technique. Of course, non-ASCII characters won't work (but that's not what you need anyway): Convert.ToByte throws for any character above 255, and characters from 128 to 255 are written as single non-ASCII bytes.
string str = "these are characters";
byte[] bytes = new byte[str.Length];
for (var i = 0; i < str.Length; i++)
{
bytes[i] = Convert.ToByte(str[i]);
}
// create the file here ... then ...
await Windows.Storage.FileIO.WriteBytesAsync(file, bytes);
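If the upload site really rejects anything outside ASCII, it may be safer to fail early than to write bytes above 127. The following is a sketch of that idea, not part of the original answer; the helper name ToStrictAsciiBytes is hypothetical. The resulting array can be passed to WriteBytesAsync exactly as above.

```csharp
using System;

static class AsciiWriter
{
    // Hypothetical helper: convert text to ASCII bytes, rejecting any
    // character outside the 7-bit range instead of writing it.
    public static byte[] ToStrictAsciiBytes(string text)
    {
        var bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > 127)
            {
                throw new ArgumentException(
                    "Non-ASCII character at index " + i, "text");
            }
            bytes[i] = (byte)c;
        }
        return bytes;
    }
}
```

Whether to throw, replace with '?', or strip such characters depends on what the receiving system tolerates; throwing is just the most conservative choice.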

How do you read a UTF8 Arabic text file in Metro?

I'm using the following code to read the contents of the text file. The file is encoded in some sort of Utf8 format:
String File = "ms-appx:///Arabic/file.txt";
contents = await Windows.Storage.PathIO.ReadTextAsync(File, Windows.Storage.Streams.UnicodeEncoding.Utf8);
But the above gives me the error:
WinRT information: No mapping for the Unicode character exists in the target multi-byte code page.
Any ideas what I'm doing wrong here?
Thanks
I had a similar issue trying to read text files that contained certain characters (’, °, –) in a file that was using "Western European (Windows) - Codepage 1252" encoding.
The solution in my case was to force Visual Studio to save the files using UTF-8 encoding.
Open the file in Visual Studio
File > Advanced Save Options... >
Change the encoding to "Unicode (UTF-8 with signature) - Codepage 65001"
Save the file
Try using Windows.Storage.Streams.DataReader:
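StorageFolder folder =
    Windows.ApplicationModel.Package.Current.InstalledLocation;
// GetFileAsync takes a path relative to the folder, not an ms-appx URI.
StorageFile file = await folder.GetFileAsync(@"Arabic\file.txt");
var stream = await file.OpenAsync(FileAccessMode.Read);
Windows.Storage.Streams.DataReader mreader =
    new Windows.Storage.Streams.DataReader(stream.GetInputStreamAt(0));
// StorageFile has no Size property; use the size of the opened stream.
byte[] dgram = new byte[stream.Size];
await mreader.LoadAsync((uint)dgram.Length);
mreader.ReadBytes(dgram);
// Decode the raw bytes yourself; this assumes the file really is UTF-8.
string contents = System.Text.Encoding.UTF8.GetString(dgram, 0, dgram.Length);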
Hope it helps.
