I'm trying to read an RSS feed that uses the ISO-8859-1 encoding.
I can get all the elements fine; the problem is that when I put the text in a TextBlock, it does not show all characters. I'm not sure what I'm doing wrong. I've tried a few solutions I found on Google, but they didn't work for me, so I must be missing something. It's also the first time I've really worked with anything other than UTF-16; I never had to convert anything before.
The app works as follows: I call DownloadStringAsync on a WebClient, and when it completes I get a string containing the complete RSS feed.
I have tried getting the bytes and then using Encoding.Convert, but I must be missing something.
Here is a sample:
WebClient RSS = new WebClient();
RSS.Encoding = Encoding.GetEncoding("ISO-8859-1");
RSS.DownloadStringCompleted += new DownloadStringCompletedEventHandler(RSS_DSC);
RSS.DownloadStringAsync(new Uri("some rss feed"));
public void RSS_DSC(object sender, DownloadStringCompletedEventArgs args)
{
_xml = XElement.Parse(args.Result);
foreach(XElement item in _xml.Elements("channel").Elements("item"))
{
feeditem.title = item.Element("title").Value;
// + all other items
}
}
I've tried this as well:
private void RSS_ORC(object sender, OpenReadCompletedEventArgs args)
{
Encoding e = Encoding.GetEncoding("ISO-8859-1");
Stream ez = args.Result;
StreamReader rdr = new StreamReader(ez, e);
XElement _xml = XElement.Parse(rdr.ReadToEnd());
feedlist = new List<Code.NewsItem>();
XNamespace dc = "http://purl.org/dc/elements/1.1/";
foreach (XElement item in _xml.Elements("channel").Elements("item"))
{
Code.NewsItem feeditem = new Code.NewsItem();
feeditem.title = item.Element("title").Value;
feeditem.description = item.Element("description").Value;
feeditem.pubdate = item.Element("pubDate").Value;
feeditem.author = item.Element(dc + "creator").Value;
feedlist.Add(feeditem);
}
listBox1.ItemsSource = feedlist;
}
The titles contain characters that are not displayed correctly either. I can get the encoding to partially work, but some characters still show up as a square with a question mark, a question mark, or a single square.
Don't get me wrong, I'm a total beginner at this, but the solutions that have been posted on the web do not solve it for me.
Note that I removed the encoding part because it wasn't working :/
If someone would be able to help me that would be amazing.
You can specify an encoding by setting the Encoding property before calling client.DownloadStringAsync:
webClient.Encoding = Encoding.GetEncoding("iso-8859-1");
In your code sample you do not create the XML document anywhere. Is some code missing? You should initialize it with something like the following (Parse rather than Load, because args.Result holds the XML text itself, not a URI):
var xml = XDocument.Parse(args.Result);
If it helps, you can use:
var myString = HttpUtility.HtmlDecode(feeditem.description);
This way every special character will be decoded, and you can then display myString correctly.
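As a rough illustration of the decoding step, here is a minimal sketch. It assumes a runtime where System.Net.WebUtility is available and uses it in place of HttpUtility (which lives in System.Web); the HtmlText class name is made up for this example.

```csharp
using System;
using System.Net;

public static class HtmlText
{
    // Turns HTML character references such as &#233; or &amp; back into plain characters.
    // A null input is treated as an empty string.
    public static string Decode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded ?? string.Empty);
    }
}
```

For example, HtmlText.Decode("caf&#233; &amp; cr&#232;me") gives "café & crème". Note that this only helps when the feed text contains HTML entities; it does not repair text that was decoded with the wrong byte encoding.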
Windows Phone 7 and Silverlight do not support other encodings such as ISO-8859-1; they only support ASCII and the Unicode encodings. For anything else you will need to use OpenReadAsync to get a stream of bytes and then apply your own implementation of an encoding.
This blog might be helpful to you in creating one.
ISO-8859-1 most definitely is supported in WP7. It is the only one of the ISO-8859-* encodings that is. I use an XmlReader to deserialize RSS streams, and UTF-* and ISO-8859-1 are the only encodings supported by that class (windows-* and ISO-8859-2 and above throw exceptions in the XmlReader constructor).
Try using an XmlReader like this (without specifying the encoding):
using (XmlReader reader = XmlReader.Create(stream))
{
...
}
The XmlReader will get the encoding from the xml declaration in the stream.
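To make that concrete, here is a small sketch of reading item titles straight from the byte stream and letting the XmlReader honour the declared encoding. The FeedTitles class and Read method are hypothetical names, and the channel/item/title layout is taken from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FeedTitles
{
    // Reads <rss><channel><item><title> values from a raw byte stream.
    // No Encoding is passed: XmlReader uses the encoding named in the XML declaration.
    public static List<string> Read(Stream stream)
    {
        using (XmlReader reader = XmlReader.Create(stream))
        {
            XElement rss = XElement.Load(reader);
            return rss.Elements("channel")
                      .Elements("item")
                      .Select(item => (string)item.Element("title") ?? string.Empty)
                      .ToList();
        }
    }
}
```

The important part is that the stream is handed over as bytes (for example from OpenReadAsync), not as a string that something else has already decoded.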
You may still have problems displaying the upper half of the characters (above 0x80). I had this problem in feed me (my WP7 app) and used this little hack to fix things up:
public static string EncodeHtml(string text)
{
if (text == null) return string.Empty;
StringBuilder decodedText = new StringBuilder();
foreach (char value in text)
{
int i = (int)value;
if (i > 127)
{
decodedText.Append(string.Format("&#{0};", i));
}
else
{
decodedText.Append(value);
}
}
return decodedText.ToString();
}
It only works in a WebBrowser control of course, but that is the only place that I ever saw an incorrect display.
Hope this helps,
Calum
This worked for me when I needed to decode the RSS XML. It's generic enough that it will support all encodings supported by .NET.
WebClient wcRSSFeeds = new WebClient();
String rssContent;
// Support for international chars
Encoding encoding = wcRSSFeeds.Encoding;
if (encoding != null)
{
encoding = Encoding.GetEncoding(encoding.BodyName);
}
else
{
encoding = Encoding.UTF8; // set to standard if none given
}
Stream stRSSFeeds = wcRSSFeeds.OpenRead(feedURL); // feedURL is a string eg, "http://blah.com"
using (StreamReader srRSSFeeds = new StreamReader(stRSSFeeds, encoding, false))
{
rssContent = srRSSFeeds.ReadToEnd();
}
Related
I have a problem with string reading, I will explain the problem:
I have this code to read a web page and put it in a string:
System.Net.WebRequest request = System.Net.WebRequest.Create(textBox1.Text);
using (System.Net.WebResponse response = request.GetResponse())
{
using (System.IO.Stream stream = response.GetResponseStream())
{
using (StreamReader sr = new StreamReader(stream))
{
html = sr.ReadToEnd();
}
}
}
Now I would like to take only some parts of this string. How can I do that? If I use Substring, it doesn't take the selected pieces.
Example of a substring code:
Name = html.Substring((html.IndexOf("og:title")+19), (html.Substring(html.IndexOf("og:title") +19).FirstOrDefault(x=> x== '>')));
I would like it to start after "og:title" and stop at the '>', but it doesn't work.
An example of the result:
"Valchiria “Intera” Pendragon\">\n<meta property=\"og:image\" conte"
It is easier if you use a library to do it, for example you can take a look at this
Your code, if I understood what you desire, should be like the following:
static void Main(string[] args)
{
const string startingToken = "og:title\"";
const string endingToken = "\">";
var html = "<html><meta property=\"og:title\" Valchiria “Intera” Pendragon\">\n<meta property=\"og:image\" content></html>";
var indexWhereOgTitleBegins = html.IndexOf(startingToken);
var htmlTrimmedHead = html.Substring(indexWhereOgTitleBegins + startingToken.Length);
var indexOfTheEndingToken = htmlTrimmedHead.IndexOf(endingToken);
var parsedText = htmlTrimmedHead.Substring(0, indexOfTheEndingToken).TrimStart(' ').TrimEnd(' ');
Console.WriteLine(parsedText);
}
Note that you can also use regular expressions to achieve the same in fewer lines of code, but managing regexes is not always easy.
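For comparison, a regex version might look like the sketch below. It assumes the usual markup shape, <meta property="og:title" content="...">, with the attributes in that order, which is not exactly the shape used in the sample string above; the OgTitle class is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OgTitle
{
    // Matches <meta property="og:title" content="..."> and captures the content value.
    // Assumes property comes before content and both use double quotes.
    private static readonly Regex Pattern = new Regex(
        "<meta\\s+property=\"og:title\"\\s+content=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns the captured title, or null when the tag is not found.
    public static string Extract(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

This is fine for a quick one-off, but a real HTML parser copes far better with reordered attributes, single quotes, and other variations.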
Take a look at this answer:
Parsing HTML String
Your question title is probably not correct, because it looks more specific to HTML parsing.
I want to read a simple CSV file with comma separated with this code:
var reader = new StreamReader(File.OpenRead(@"d:\34.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
listA.Add(values[0]);
listB.Add(values[1]);
}
MessageBox.Show("READ IT!!!");
But when I read the file and debug that code, it cannot read Persian or Arabic characters! How can I solve that? Could it be that my file does not have a valid encoding?
If your CSV file contains just one line, ReadToEnd could be acceptable, but if you have a log file composed of more than one line, it is better to read it line by line using the ReadLine method of the StreamReader object.
link for true answer and more information
using (StreamReader sr = new StreamReader("c:/temp/34.csv"))
{
string currentLine;
// currentLine will be null when the StreamReader reaches the end of file
while((currentLine = sr.ReadLine()) != null)
{
// Search, case insensitive, if the currentLine contains the searched keyword
if(currentLine.IndexOf("I/RPTGEN", StringComparison.CurrentCultureIgnoreCase) >= 0)
{
Console.WriteLine(currentLine);
}
}
}
More information
You can create a class with a getter and setter for each field of a CSV line. You can then build a list of objects to hold the CSV lines.
Try it this way:
class Program
{
static void Main(string[] args)
{
var reader = new StreamReader(File.OpenRead(@"YourCSV"), Encoding.Unicode);
List<Customer> customer = new List<Customer>();
while (!reader.EndOfStream)
{
Customer c = new Customer
{
m_line1 = null,
m_line2 = null,
};
var line = reader.ReadLine();
var tokens = line.Split(',');
c.m_line1 = tokens[0];
c.m_line2 = tokens[1];
customer.Add(c);
}
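foreach(var s in customer)
{
// Print the two parsed fields; printing s itself would only show the type name
Console.WriteLine(s.m_line1 + "," + s.m_line2);
Console.ReadLine();
}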
}
}
class Customer
{
private string line1;
public string m_line1
{
get
{
return line1;
}
set
{
line1= value;
}
}
private string line2;
public string m_line2
{
get
{
return line2;
}
set
{
line2= value;
}
}
}
You will have to pass the character encoding to the StreamReader constructor. There is no such thing as plain text. Reading text requires knowing its encoding.
The line
using (StreamReader sr = new StreamReader("c:/temp/34.csv"))
should be
using (StreamReader sr = new StreamReader("c:/temp/34.csv", myencoding))
What myencoding is, is something only you can know. With what encoding was the file saved? That's the encoding you need there. If the file was generated on Windows, an educated guess at the most likely encoding would be UTF-16LE. That encoding is available as Encoding.Unicode, which is a bad name (it should have been Encoding.UTF16LE), but that's the name the .NET Framework uses.
Other possible encodings that are supported by StreamReader are listed on https://msdn.microsoft.com/en-us/library/System.Text.Encoding_properties(v=vs.110).aspx
If you don't know with what encoding the file was saved, some encodings leave hints in the form of a byte order mark, sometimes abbreviated to BOM. A byte order mark is the first few bytes of a text document that tell you its encoding. You can find more information on the byte order mark, and some of its values, on http://en.wikipedia.org/wiki/Byte_order_mark
Relying on the BOM is generally a bad idea, because:
It's not a foolproof solution: some encodings don't use a BOM, or make the BOM optional.
Even if you successfully determine the encoding, that doesn't mean that StreamReader knows how to handle that encoding (unlikely, but possible).
The BOM might not be a BOM at all, but part of the actual text (also unlikely, but possible).
In some cases it is impossible to know the encoding of a file, notably if the file comes from a file upload on the web, or if someone just mailed you the file and they don't know how they encoded it. This can be a good reason not to allow "plain text" uploads (which is reasonable because, and it can do with a little repetition, there is no such thing as plain text).
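If you do want to peek at the BOM anyway, a minimal sketch could look like this. The BomSniffer name is hypothetical, and the check covers only the common UTF-8, UTF-16 and UTF-32 LE marks; a null result simply means "no hint", not "no encoding".

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding suggested by a leading byte order mark,
    // or null when the first bytes match none of the known marks.
    public static Encoding Detect(byte[] head)
    {
        if (head == null) return null;
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        // UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 LE
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 BE
        return null;
    }
}
```

The UTF-32 versus UTF-16 check already shows the ambiguity described above: a UTF-16 LE file whose first character is U+0000 starts with the same four bytes as a UTF-32 LE BOM.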
tl;dr: The most likely thing to work is one of
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"), Encoding.Unicode)) {
...
}
or
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"), Encoding.UTF8))
or
using (StreamReader sr = new StreamReader(File.OpenRead(@"c:/temp/34.csv"), Encoding.UTF32))
For my Unix/Java friends I would like to send newlines ('\n') instead of CRLF ('\r\n') in Json.Net. I tried setting a StreamWriter to use a newline, without any success.
I think the Json.Net code is using Environment.NewLine instead of calling TextWriter.WriteLine(). Changing Environment.NewLine is not an option because I'm running as a server and the newline encoding is based on the request.
Is there some other way to force '\n' instead of CRLF?
Here's my code -
using (var streamWriter = new StreamWriter(writeStream, new UTF8Encoding(false))
{
NewLine = "\n"
})
using (var jsonWriter = new JsonTextWriter(streamWriter)
{
CloseOutput = true,
Indentation = 2,
Formatting = Formatting.Indented
})
{
// serialise object to JSON
}
After delving into the Json.Net code, I see the issue is with JsonTextWriter.WriteIndent, thanks Athari.
Instead of _writer.Write(Environment.NewLine); it should be _writer.WriteLine();.
I've posted a pull request to github. https://github.com/JamesNK/Newtonsoft.Json/pull/271
If you want to customize indentation whitespace, just override JsonTextWriter.WriteIndent:
public class JsonTextWriterEx : JsonTextWriter
{
public string NewLine { get; set; }
public JsonTextWriterEx (TextWriter textWriter) : base(textWriter)
{
NewLine = Environment.NewLine;
}
protected override void WriteIndent ()
{
if (Formatting == Formatting.Indented) {
WriteWhitespace(NewLine);
int currentIndentCount = Top * Indentation;
for (int i = 0; i < currentIndentCount; i++)
WriteIndentSpace();
}
}
}
I see several solutions here that are configuring the serializer. But if you want quick and dirty, just replace the characters with what you want. After all, JSON is just a string.
string json = JsonConvert.SerializeObject(myObject);
json = json.Replace("\r\n", "\n");
I know this is an old question but I had a hard time getting the accepted answer to work and maintain proper indented formatting. As a plus, I am also not creating an override to make this work.
Here is how I got this to work correctly:
using (var writer = new StringWriter())
{
writer.NewLine = "\r\n";
var serializer = JsonSerializer.Create(
new JsonSerializerSettings
{
Formatting = Formatting.Indented
});
serializer.Serialize(writer, data);
// do something with the written string
}
I am assuming that the code changes referenced in this question enabled the setting of NewLine on the StringWriter to be respected by the serializer.
The on-the-wire protocol for just about everything is that a line ends with CRLF. If you feel that you really must do this, post-processing is the way to go: write it out to a string and then change the string before returning it.
Note that this is a lot of extra processing for what I would consider an actual negative. Not recommended.
result = result.Replace("\r\n", "\n");
I know that there are a lot of tutorials about this, and even answered questions here, but I have a problem I've been trying to resolve for hours. I have read almost everything here, but it still remains a mystery to me. Please help:
I'm creating XML, and it is created, but the problem is that the encoding is UTF-16 and it should be UTF-8. This is what I have tried so far, but it is still UTF-16:
var xmlText = new StringBuilder();
using (var xml = XmlWriter.Create(xmlText))
{
xml.WriteStartDocument();
xml.WriteStartElement("Weather");
if (model.ModuleList[0] != null)
{
foreach (var weather in model.ModuleList)
{
var AddProperty = new Action<XmlWriter, ModuleModel>((a, forc) =>
{
xml.WriteStartElement("Forecast");
a.WriteElementString("Description", forc.Description);
a.WriteElementString("Date", forc.Date.ToString());
a.WriteElementString("MinTemp", forc.Min_Temp.ToString());
a.WriteElementString("MaxTemp", forc.Max_Temp.ToString());
a.WriteElementString("Pressure", forc.Pressure.ToString());
a.WriteElementString("Humidity", forc.Humidity.ToString());
xml.WriteEndElement();
});
AddProperty(xml, weather);
}
}
xml.WriteEndElement();
xml.WriteEndDocument();
}
var xmlresult = xmlText.ToString();
How to set encoding to my XML to UTF-8? Please help...
The result of your code is a string, xmlresult, and strings do not have an encoding; they are always Unicode.
You use an encoding when you convert a string to a sequence of bytes, so your problem is not in the piece of code you posted, but in the code you use to write that string to a file.
Something like this:
using (StreamWriter writer = new StreamWriter(fileName, true, Encoding.UTF8))
{
writer.Write(xmlresult);
}
will write a UTF-8 file, where fileName contains the path of the file.
If you need UTF-8 encoded bytes in memory use:
var utf8Bytes = Encoding.UTF8.GetBytes(xmlresult);
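If the concern is the encoding="utf-16" text in the XML declaration itself, one common workaround is to give XmlWriter a StringWriter that reports UTF-8 as its encoding. The sketch below assumes that is what you are after; Utf8StringWriter and WeatherXml are made-up names, and the element names are only borrowed from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// A StringWriter that claims to be UTF-8, so XmlWriter writes encoding="utf-8"
// in the declaration. The string in memory is still an ordinary .NET string.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class WeatherXml
{
    // Builds a tiny document to show the resulting declaration.
    public static string Build(string description)
    {
        using (var text = new Utf8StringWriter())
        {
            using (XmlWriter xml = XmlWriter.Create(text))
            {
                xml.WriteStartDocument();
                xml.WriteStartElement("Weather");
                xml.WriteElementString("Description", description);
                xml.WriteEndElement();
                xml.WriteEndDocument();
            }
            return text.ToString();
        }
    }
}
```

The declaration then only matches reality if the string is later written out as UTF-8 bytes, for example with the StreamWriter or Encoding.UTF8.GetBytes calls shown above.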
I'm currently playing around with the last.FM web service API, which I call via REST. When I get the response I try to convert the result into an XDocument so I can use LINQ to work with it.
But when I pass the result string to the XDocument constructor, an ArgumentException is thrown telling me that "Non white space characters cannot be added to content.". Unfortunately I'm very new to web services and XML programming, so I don't really know how to interpret this exception.
I hope someone can give me a hint on how to solve this problem.
It sounds to me as though you are holding the response in a string. If that is the case, you can try to use the Parse method on XDocument which is for parsing XML out of a string.
string myResult = "<?xml blahblahblah>";
XDocument doc = XDocument.Parse(myResult);
This may or may not solve your problem. Just a suggestion that is worth a try to see if you get a different result.
Here's a sample you can use to query the service:
class Program
{
static void Main(string[] args)
{
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead("http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=b25b959554ed76058ac220b7b2e0a026&artist=Cher&album=Believe"))
using (TextReader reader = new StreamReader(stream))
{
XDocument xdoc = XDocument.Load(reader);
var summaries = from element in xdoc.Descendants()
where element.Name == "summary"
select element;
foreach (var summary in summaries)
{
Console.WriteLine(summary.Value);
}
}
}
}
http://jamescrisp.org/2008/08/08/simple-rest-client/ has posted a little REST Client. Maybe a starting point for you.