HttpWebRequest wrong encoding determination - c#

I'm trying to read the HTML text of the page at http://konungstvo.ru/, which uses UTF-8 encoding:
var request = _requestCreator.Create(uri);
try
{
    using (var response = request.GetResponse())
    {
        if (response.ContentType.Contains("text/html"))
        {
            using (var reader = new System.IO.StreamReader(response.GetResponseStream()))
            {
                string responseText = reader.ReadToEnd();
            }
        }
    }
}
catch (System.Net.WebException)
{
    // error handling not shown in the original snippet
}
But I'm getting \u001f�\b\01V\u0002X\u0002��X�n\u001b�, and so on, although the code works with other sites.

I think you need a character encoding for the Latin/Cyrillic alphabet, which could be ISO/IEC 8859-5 or, e.g., Windows-1251:
var encoding = Encoding.GetEncoding("iso-8859-5");
using (var reader = new System.IO.StreamReader(response.GetResponseStream(), encoding))
Using this while reading the response stream yields some Cyrillic content, but unfortunately it isn't the correct output either: https://dotnetfiddle.net/x8jnN8. So, sorry, this isn't a real answer to your problem :/
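The first characters of the garbled output (\u001f, one unprintable byte shown as �, then \b) match the gzip header bytes 0x1F 0x8B 0x08. One possible explanation, inferred only from the bytes quoted above, is that the server returns a gzip-compressed body that StreamReader then decodes as plain text, in which case no choice of text encoding would help. A minimal sketch of checking for that header and decompressing before decoding (GzipAwareDecoder and its methods are hypothetical names):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipAwareDecoder
{
    // Every gzip stream begins with the magic bytes 0x1F 0x8B (RFC 1952).
    public static bool LooksLikeGzip(byte[] body)
    {
        return body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;
    }

    // Decompresses the body when it carries the gzip header, then decodes the
    // resulting bytes with the given encoding (for this site, Encoding.UTF8).
    public static string Decode(byte[] body, Encoding encoding)
    {
        if (!LooksLikeGzip(body))
        {
            return encoding.GetString(body);
        }

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's own code, a simpler route may be to let the framework handle it: HttpWebRequest has an AutomaticDecompression property, and setting it to DecompressionMethods.GZip | DecompressionMethods.Deflate makes the request advertise compression support and decompress responses whose Content-Encoding header says gzip or deflate. This assumes _requestCreator.Create returns an HttpWebRequest, which the question does not show.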

Related

Why does this UTF-16 HTTP response end up as UTF-8 when in the resulting Stream?

I have an issue where a service returns this HTTP header:
Content-Type: application/json; charset=utf-16
When this is read in C#, it ends up as a UTF-8 stream, which obviously breaks. utf-16 seems to be a valid charset according to the IANA registry, so why is this code not working?
System.Net.Http.HttpClient httpClient ...;
using (var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
{
    //response.Content.Headers.ContentType.CharSet = "utf-16"
    using (var responseContentStream = await response.Content.ReadAsStreamAsync())
    {
        using (var streamReader = new StreamReader(responseContentStream))
        {
            //streamReader.CurrentEncoding.BodyName returns utf-8 here?!
        }
    }
}
So initially the response seems fine, but once it gets as far as the StreamReader it seems to have reverted to UTF-8. Why?
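StreamReader never looks at the HTTP Content-Type header: unless you pass an encoding, it assumes UTF-8 and only switches if it finds a byte order mark at the start of the stream. You can specify the encoding the StreamReader should use in its constructor.
In your case it should look like this: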
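// Encoding.Unicode is UTF-16 little-endian; passing true lets a byte order mark override it
using (var streamReader = new StreamReader(responseContentStream, Encoding.Unicode, true))
{
    // The reader should read the Stream with UTF-16 here
}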
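Hard-coding Encoding.Unicode works for this one service. Because StreamReader never consults the header, a more general option is to read the charset parameter from Content-Type yourself and pass the matching encoding along. A minimal sketch, assuming the header value is available as a plain string (CharsetReader and its methods are hypothetical names):

```csharp
using System;
using System.IO;
using System.Text;

public static class CharsetReader
{
    // Extracts the charset parameter from a Content-Type value such as
    // "application/json; charset=utf-16" and maps it to an Encoding.
    // Returns the fallback when no charset is present or the name is unknown.
    public static Encoding EncodingFromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
            return fallback;

        foreach (var part in contentType.Split(';'))
        {
            var pair = part.Split(new[] { '=' }, 2);
            if (pair.Length == 2 &&
                pair[0].Trim().Equals("charset", StringComparison.OrdinalIgnoreCase))
            {
                try
                {
                    // Strip optional quotes, e.g. charset="utf-8"
                    return Encoding.GetEncoding(pair[1].Trim().Trim('"'));
                }
                catch (ArgumentException)
                {
                    return fallback; // charset name not supported by this runtime
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream using the charset from the Content-Type value;
    // a byte order mark at the start of the stream still takes precedence.
    public static string ReadBody(Stream stream, string contentType)
    {
        var encoding = EncodingFromContentType(contentType, Encoding.UTF8);
        using (var reader = new StreamReader(stream, encoding, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With HttpClient, the header string could come from response.Content.Headers.ContentType?.ToString(), and MediaTypeHeaderValue also exposes the parameter directly through its CharSet property. If buffering the whole body is acceptable, response.Content.ReadAsStringAsync() should already honour that charset on its own.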

UWP - How to get website contents and store them in a string?

I am trying to make an app that can read text from a website and store it in a string.
For example, my app could open this random generator website, which generates a random number string, and my program would then read it and store it in a string.
Is that even possible?
I'm not sure what your exact goal is, but you can download the whole HTML page and parse it however you wish:
var httpClient = new HttpClient();
var htmlString = await httpClient.GetStringAsync(new Uri("http://google.com"));
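As one way to parse the downloaded page, here is a rough sketch that pulls the text out of a single element by its id attribute. It assumes the generated number sits in an element such as <span id="result">, where "result" is a hypothetical id; regular expressions are brittle on real-world HTML, so an HTML parser would be the sturdier choice for anything beyond a quick scrape:

```csharp
using System.Text.RegularExpressions;

public static class HtmlSnippet
{
    // Returns the trimmed inner text of the first element whose id attribute
    // equals the given id, or null if no such element is found.
    // Only handles elements whose content contains no nested tags.
    public static string InnerTextById(string html, string id)
    {
        var pattern = "id\\s*=\\s*[\"']" + Regex.Escape(id) + "[\"'][^>]*>([^<]*)<";
        var match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Usage would be along the lines of HtmlSnippet.InnerTextById(htmlString, "result") after the GetStringAsync call above.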
You can also use the following, which lets you specify the encoding:
string text = null;
using (WebResponse response = WebRequest.Create(url).GetResponse())
{
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("iso-8859-1")))
    {
        // the using blocks dispose (and thereby close) both the reader and the response
        text = reader.ReadToEnd();
    }
}

How to read data from URL page that has no HTML tags defined using MVC

I am trying to read the content of a page from a URL using the code below in MVC (C#):
var webRequest = WebRequest.Create(@"https://example.com/aa/aa");
webRequest.Method = "GET";
using (var response = webRequest.GetResponse())
using (var content = response.GetResponseStream())
using (var reader = new StreamReader(content))
{
    var strContent = reader.ReadToEnd();
}
but I did not receive any response (the call never returned, so strContent was never assigned),
but when I run the same code with the URL https://google.com/, it works fine.
I checked the source of both pages and found that https://google.com/ has a proper doctype and tags, whereas the one I am hitting seems to be a properties file with no tags or doctype.
Any help will be appreciated.
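using (var client = new WebClient())
{
    // The URL needs a scheme: without "https://", WebClient does not treat the string as a web address
    string data = client.DownloadString("https://www.yourUrl.com");
}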
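The question does not show why the request hangs. One plausible cause, offered only as an assumption, is a server that stalls or throttles requests sent without a User-Agent header. Either way, bounding the wait turns a silent hang into a WebException that can be inspected. A minimal sketch (PageFetcher and the User-Agent string are hypothetical):

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Builds a GET request with bounded waits and a browser-like User-Agent.
    // The User-Agent value is an arbitrary example, not something the server requires.
    public static HttpWebRequest CreateRequest(string url, int timeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Timeout = timeoutMs;          // limits the wait inside GetResponse()
        request.ReadWriteTimeout = timeoutMs; // limits each read from the response stream
        request.UserAgent = "Mozilla/5.0 (compatible; PageFetcher/1.0)";
        return request;
    }

    // Downloads the page as text; throws WebException on timeout or HTTP error.
    public static string Download(string url, int timeoutMs = 10000)
    {
        var request = CreateRequest(url, timeoutMs);
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling PageFetcher.Download("https://example.com/aa/aa") inside a try/catch for WebException would show a timeout as Status == WebExceptionStatus.Timeout. On .NET 6 and later, WebRequest is marked obsolete in favour of HttpClient, whose Timeout property serves the same purpose.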

WebClient, WebRequest and Stream not returning anything...?

I have tried both WebClient's DownloadString and WebRequest + Stream to scrape a page (This one) and get the raw paste data from it. I have scoured the net but found no answers.
I have this code:
WebRequest request = WebRequest.Create("http://pastebin.com/raw.php?i=" + textBox1.Text);
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string pasteContent = "";
using (StreamReader sr = new StreamReader(data))
{
    pasteContent = sr.ReadToEnd();
}
new Note().txtMain.Text += pasteContent;
new Note().txtMain.Refresh();
I have multiple forms, so I am editing Note's txtMain textbox to add the paste content, but it seems to return nothing no matter which method I use. I know cross-form editing works because several other things already write to it.
How can I scrape the raw data?
Thank you VERY much,
P
There is no problem downloading the content of your site. You simply don't use the instance of the Note class you created: each new Note() creates a separate form that is never shown.
var note = new Note();
note.txtMain.Text += pasteContent;
note.Show();
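To see why the text never shows up, the console-only sketch below replaces the Windows Forms Note form with a hypothetical FakeNote class. Every new expression creates a separate object, so appending to new Note().txtMain changes an instance that is never displayed and is discarded immediately:

```csharp
using System;

// Hypothetical stand-in for the Note form; only the text matters here.
public class FakeNote
{
    public string Text = "";
}

public static class NoteDemo
{
    // Mirrors the question: the append goes to a second, never-shown instance.
    public static FakeNote AppendToThrowaway(string content)
    {
        var shown = new FakeNote();      // the form the user is looking at
        new FakeNote().Text += content;  // a different object, lost right away
        return shown;
    }

    // Mirrors the answer: keep one reference and update that same instance.
    public static FakeNote AppendToSameInstance(string content)
    {
        var note = new FakeNote();
        note.Text += content;
        return note;
    }
}
```

If a Note form is already open, the same reasoning suggests passing a reference to that existing instance into the scraping form rather than creating a new one.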

How do i prevent Web Response Stream to Close ?

Context:
I am improving my .dll that executes WebRequests (GET and POST) by adding a new feature: automatic encoding detection after the response arrives.
How it will work:
1. The user of the library configures its default request parameters, including the Encoding.
2. The library executes the request for a given URL.
3. The library checks whether the page's encoding matches the preconfigured one (by reading the value of the meta tag).
4. If the encodings differ, I re-decode the response using the right encoding (the one declared on the page).
Code Fragment:
// Executes web request and wait for response
using (HttpWebResponse resp = (HttpWebResponse) m_HttpWebRequest.GetResponse())
{
    using (var stream = resp.GetResponseStream ())
    {
        using (var reader = new StreamReader (stream))
        {
            // Reading Stream to the end
            response = reader.ReadToEnd ();
            // Identifying the encode used on the page
            // I will not paste method here, but it works
            m_PageEncoding = IdentifyEncoding (response);
        }
        // Checking if the page encode is not the same as the preconfigured one
        if (m_PageEncoding != System.Text.Encoding.GetEncoding (m_encoding))
        {
            using (var encodedReader = new StreamReader (stream, m_PageEncoding))
            {
                response = encodedReader.ReadToEnd();
            }
        }
    }
}
Problem:
Once I create another reader (encodedReader, with the Encoding argument), an exception is thrown: Stream was not readable.
If I nest the readers inside the stream's using block, the response value after the second read is always empty:
using (var stream = resp.GetResponseStream ())
{
    using (var reader = new StreamReader (stream))
    {
        // Reading Stream to the end
        response = reader.ReadToEnd ();
        // Identifying the encode used on the page
        m_PageEncoding = IdentifyEncoding (response);
        // Checking if the page encode is not the one i've used as argument
        if (m_PageEncoding != System.Text.Encoding.GetEncoding(m_encoding))
        {
            using (var encodedReader = new StreamReader(stream, m_PageEncoding))
            {
                response = encodedReader.ReadToEnd();
            }
        }
    }
}
Question:
How can I execute the ReadToEnd method twice on the same WebResponse without executing the request twice, which would be lousy?
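Both attempts fail for related reasons: disposing the first StreamReader also closes the response stream it wraps (hence "Stream was not readable"), and in the nested version the first ReadToEnd has already consumed the network stream, which cannot be rewound, so the second reader starts at the end. One common way around this is to copy the body into a byte array once and decode those bytes as often as needed. A minimal sketch; BufferedDecoder is a hypothetical name, and its IdentifyEncoding is a simple regex-based stand-in for the question's unpublished method:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class BufferedDecoder
{
    // Copies a forward-only response stream into memory once, so the bytes
    // can be decoded repeatedly without re-issuing the request.
    public static byte[] ReadAllBytes(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Hypothetical stand-in for IdentifyEncoding: finds charset=... inside a
    // <meta> tag (both <meta charset> and http-equiv content forms).
    public static Encoding IdentifyEncoding(string html, Encoding fallback)
    {
        var match = Regex.Match(html, "<meta[^>]+charset\\s*=\\s*[\"']?([\\w-]+)",
                                RegexOptions.IgnoreCase);
        if (!match.Success) return fallback;
        try { return Encoding.GetEncoding(match.Groups[1].Value); }
        catch (ArgumentException) { return fallback; }
    }

    // Decodes with the configured encoding first, then re-decodes the same
    // bytes if the page declares a different encoding.
    public static string Decode(byte[] bytes, Encoding configured)
    {
        string text = configured.GetString(bytes);
        Encoding declared = IdentifyEncoding(text, configured);
        return declared.Equals(configured) ? text : declared.GetString(bytes);
    }
}
```

In the question's flow, this would replace both readers: fill a byte array with BufferedDecoder.ReadAllBytes(stream) inside the using blocks, then call BufferedDecoder.Decode(bytes, System.Text.Encoding.GetEncoding(m_encoding)) once the response is closed. On .NET Core and .NET 5+, legacy code pages such as windows-1251 only become available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).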
