How can I get the title of a page on another site? - c#

This is a fairly long question.
How can I do the following in C#?
Open a web page (preferably without showing it).
Check whether the page redirects to a different page (site is down, 404, etc.).
Check that the title is not equal to a given string.
Then, separately (the user needs to click a confirm button first),
open their browser and go to the address of the first hyperlink on the site (it will be the only one).
I literally have been looking on Google for ages and haven't found anything similar to what I need.
Whether you give me a link to a site with a tutorial on this area of programming or actual source code doesn't make a difference to me.

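Check out the WebRequest class; it can follow redirects. Then you can parse the HTML and find the title tag using XPath or something similar. Note that XElement.Load only succeeds if the page is well-formed XML (XHTML); ordinary HTML usually needs an HTML parser instead.
Something like this: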
using System.Linq; // needed for Select and First below
using System.Net;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
...
HttpWebRequest myReq = ( HttpWebRequest )WebRequest.Create( "http://www.contoso.com/" );
myReq.AllowAutoRedirect = true;
myReq.MaximumAutomaticRedirections = 5;
XNode result;
using( var responseStream = myReq.GetResponse( ).GetResponseStream( ) ) {
result = XElement.Load( responseStream );
}
var title = result.XPathSelectElement( "//title" ).Value;
Obviously your XPath can (and probably should) be more sophisticated; any XPath reference covers the syntax in more detail.
On a similar note, you can use XPath on the XML you get back to find the links and pick out the first one:
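// Anchors without an href attribute are skipped instead of throwing a NullReferenceException.
var links = result.XPathSelectElements( "//a" ).Select( linktag => (string)linktag.Attribute( "href" ) ).Where( href => href != null );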
when you eventually find the url you want to open you can use
System.Diagnostics.Process.Start( links.First() );
to get it to open in the browser. A nice aspect of this is that it opens whatever browser is the default for the client. It does have security implications, though: you should make sure that it's a URL and not an exe file or something.
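The check mentioned above could be sketched roughly like this. SafeLinks and IsWebUrl are made-up names for illustration, and "only absolute http/https URLs are allowed" is an assumption about what counts as safe here, not a complete security policy.

```csharp
using System;

public static class SafeLinks
{
    // Returns true only for absolute http:// or https:// URLs.
    // Local paths, file:// URIs and relative links are rejected.
    public static bool IsWebUrl(string candidate)
    {
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

You would then call Process.Start only when SafeLinks.IsWebUrl returns true for the link. A relative href would first have to be resolved against the page address.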
Also, it's possible that the HTML uses different capitalization for its elements (for example TITLE or A); you'd have to deal with that when looking for links.
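If the page is not well-formed XML, or mixes upper and lower case in its tags, one rough alternative is a case-insensitive regular expression. This is only a sketch: HtmlTitle and Extract are hypothetical names, and a regex is a shortcut that will miss unusual markup, so a real HTML parser is the safer choice for anything beyond the title.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTitle
{
    // IgnoreCase matches <title>, <TITLE>, <Title>, ...
    // Singleline lets the title text span several lines.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, trimmed title text, or null if no title tag is found.
    public static string Extract(string html)
    {
        Match match = TitlePattern.Match(html);
        if (!match.Success)
        {
            return null;
        }
        return WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
    }
}
```

Comparing the result against the expected title is then an ordinary string comparison.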

You could use WebRequest or HttpWebRequest, but if you want a browser UI you will need to use the WebBrowser control: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
You will need to handle the DocumentCompleted event, which fires once the page requested by Navigate has finished loading:
WebBrowser myWebBrowser = new WebBrowser();
myWebBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(myWebBrowser_DocumentCompleted);
myWebBrowser.Navigate("http://myurl.com/mypage.htm");
You can then implement your handler as follows, and interact with the WebBrowser UI as necessary. The DocumentText property contains the HTML of the currently loaded web page:
private void myWebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// CheckHTMLConfirmAndRedirect is a placeholder for your own logic.
CheckHTMLConfirmAndRedirect(((WebBrowser)sender).DocumentText);
}

Use HttpWebRequest and parse the response:
private static void method1()
{
string strWORD = "pain";
const string WORDWEBURI = "http://www.wordwebonline.com/search.pl?w=";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(WORDWEBURI + strWORD.ToUpper());
request.UserAgent = @"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)";
request.ContentType = "text/html";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StringBuilder sb = new StringBuilder();
Stream resStream = response.GetResponseStream();
byte[] buffer = new byte[8192];
string tempString = null;
int count = 0;
do
{
// fill the buffer with data
count = resStream.Read(buffer, 0, buffer.Length);
// make sure we read some data
if (count != 0)
{
// translate from bytes to text (assumes the page is UTF-8 encoded)
tempString = Encoding.UTF8.GetString(buffer, 0, count);
// continue building the string
sb.Append(tempString);
}
}
while (count > 0); // any more data to read?
Console.Write(sb.ToString());
}
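A shorter variant of the same idea, which also reports whether the request was redirected, might look like the sketch below. PageFetcher and its members are hypothetical names, and UTF-8 is an assumption about the page encoding. Letting StreamReader do the decoding avoids a subtle problem with decoding each buffer separately, where a multi-byte character split across two reads comes out garbled.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Reads the whole stream as text. StreamReader decodes incrementally,
    // so characters that straddle two reads are still decoded correctly.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // True when the URI that finally answered differs from the one requested.
    public static bool WasRedirected(Uri requested, Uri final)
    {
        return !requested.Equals(final);
    }

    // Downloads a page. GetResponse throws a WebException for 404 and
    // similar errors, so callers should catch that to detect a dead page.
    public static string Fetch(string url, out bool redirected)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            redirected = WasRedirected(request.RequestUri, response.ResponseUri);
            return ReadAll(stream, Encoding.UTF8); // assumption: page is UTF-8
        }
    }
}
```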

Related

.NET won't download full XML response from REST API

When downloading an XML response from a REST API, I cannot get .NET to download the full XML document on many requests. In each case, I'm missing the last several characters of the XML file which means I can't parse it. The requests work fine in a browser.
I have tried WebResponse.GetResponseStream() using a StreamReader. Within the StreamReader I have tried Read(...) with a buffer, ReadLine(), and ReadToEnd() to build a string for the response. Wondering if there was a bug in my code, I also tried WebClient.DownloadString(url) with the same result and XmlDocument.Load(url) which just throws an exception (unexpected end of file while parsing ____).
I know for a fact that this API has had some encoding issues in the past, so I've tried specifying multiple different encodings (e.g., UTF-8, iso-8859-1) for the StreamReader as well as letting .NET detect the encoding. Changing the encoding seems to result in a different number of characters that get left off the end.
Is there any way I can detect the proper encoding myself? How does a browser do it? Is there somewhere in any browser to see the actual encoding the response is using (not what the HTTP headers say it's returning)? Any other methods of getting a string response from a web site with an unknown encoding?
StreamReader sample code
StringBuilder sb = new StringBuilder();
using (resp = (HttpWebResponse)req.GetResponse())
{
using (Stream stream = resp.GetResponseStream())
{
using (StreamReader sr = new StreamReader(stream))
{
int charsRead = 1;
char[] buffer = new char[4096];
while (charsRead > 0)
{
charsRead = sr.Read(buffer, 0, buffer.Length);
sb.Append(buffer, 0, charsRead);
}
}
}
}
WebClient sample code
WebClient wc = new WebClient();
string text = wc.DownloadString(url);
XmlDocument sample code
XmlDocument doc = new XmlDocument();
doc.Load(url);
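On the encoding part of the question, one option is to skip string decoding altogether: download the raw bytes and hand them to the XML parser, which picks the encoding from the byte order mark or the encoding="..." declaration in the document rather than from the HTTP headers. The sketch below assumes the bytes were fetched with something like WebClient.DownloadData; XmlBytes and Parse are hypothetical names. It does not explain a truncated response by itself, but it removes the encoding guesswork, and the length of the byte array can be compared with what the browser receives.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlBytes
{
    // Parses raw bytes. The XML parser detects the encoding from the BOM
    // or the XML declaration, so no Encoding has to be chosen up front.
    public static XDocument Parse(byte[] rawBytes)
    {
        using (var stream = new MemoryStream(rawBytes))
        {
            return XDocument.Load(stream);
        }
    }
}
```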

Retrieve html from website

This is a little tricky, but here is the situation.
The page loads.
It then executes some JavaScript, which generates more HTML. That generated source is what I need.
I can't use an HTML parser alone, because a parser has no way to run the script.
With a plain HTTP request I can get the initial source, but the JavaScript is never executed, so I never get the source I need.
What is the best way to retrieve the HTML that is generated afterwards?
Edit: I am trying to avoid using a hidden web browser. It does work, since the control acts as a JavaScript interpreter, but it is a very slow and very ugly approach.
Edit2: Added code
static private string _InetReadEx(string sUrl)
{
string aRet;
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(sUrl);
try
{
webReq.CookieContainer = new CookieContainer();
webReq.Method = "GET";
using (WebResponse response = webReq.GetResponse())
{
using (Stream stream = response.GetResponseStream())
{
StreamReader reader = new StreamReader(stream);
aRet = reader.ReadToEnd();
return aRet;
}
}
}
catch (Exception ex)
{
return string.Empty;
}
}
Unless you use the WebBrowser control, which you said you want to avoid, there is no other convenient way.
You can mimic the behavior of the JavaScript yourself and then format the result as the WebBrowser does, but this will not be dynamic and is thus much less desirable.

C# What is the best way to compute a hash of an xml feed

I want to detect if a feed has changed, the only way I can think of would be to hash the contents of the xml document and compare that to the last hash of the feed.
I am using XmlReader because SyndicationFeed uses it, so ideally I don't want to load the syndication feed unless the feed has been updated.
XmlReader reader = XmlReader.Create("http://www.extremetech.com/feed");
SyndicationFeed feed = SyndicationFeed.Load(reader);
Why not just check the LastUpdatedTime of the feed? That's a built-in way of telling you whether something is new or not. Instead of hashing and storing a hash you would simply keep track of the LastUpdatedTime and compare it periodically to latest LastUpdatedTime:
using System;
using System.ServiceModel.Syndication;
using System.Xml;
public class MyClass
{
private static DateTime _lastFeedTime = new DateTime(2011, 10, 10);
public static void Main()
{
XmlReader reader = XmlReader.Create("http://www.extremetech.com/feed");
SyndicationFeed feed = SyndicationFeed.Load(reader);
if (feed.LastUpdatedTime.LocalDateTime > _lastFeedTime)
{
_lastFeedTime = feed.LastUpdatedTime.LocalDateTime;
// load feed...
}
}
}
If you really want to go the hash way you can do the following:
var client = new WebClient();
var content = client.DownloadData("http://www.extremetech.com/feed");
var hash = MD5.Create().ComputeHash(content);
var hashString = Convert.ToBase64String(hash);
// you can then compare hashes and if changed load it this way
XmlReader reader = XmlReader.Create(new MemoryStream(content));
Of course going this way you will detect any change in the content, even the slightest.
IMHO the best way to go is load the feed anyway and hash just the contents of the articles, you can hash any string like this:
var toHash = "string to hash";
var hash = MD5.Create().ComputeHash(Encoding.UTF8.GetBytes(toHash));
var hashString = Convert.ToBase64String(hash);
Hope this helps.
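If the raw feed contains parts that change on every request, such as XML comments injected by a cache, hashing the bytes directly will report a change every time. A possible workaround is to strip the comments before hashing. The sketch below is only that: FeedHash and Compute are hypothetical names, SHA-256 is swapped in for MD5 as a personal preference, and it assumes comments are the only volatile part of the feed.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

public static class FeedHash
{
    // Returns a Base64 hash of the feed XML with all comments removed,
    // so two downloads that differ only in comments hash identically.
    public static string Compute(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Copy to a list first: removing nodes while enumerating is unsafe.
        foreach (XComment comment in doc.DescendantNodes().OfType<XComment>().ToList())
        {
            comment.Remove();
        }

        string normalized = doc.ToString(SaveOptions.DisableFormatting);
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(normalized)));
        }
    }
}
```

Store the returned string and reload the feed only when the next download produces a different one.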
A hash approach won't work in this case, because some server-side caching adds an XML comment that changes very frequently even when the actual feed never changes.
One thing you can do, which works for this feed, is use HTTP conditional requests to ask the server to give you the data only if it has actually been modified since the last time you requested it.
For example:
You'd have a global/member variable to hold the last modified datetime from your feed
var lastModified = DateTime.MinValue;
Then each time you'd make a request like the following
var request = (HttpWebRequest)WebRequest.Create( "http://www.extremetech.com/feed" );
request.IfModifiedSince = lastModified;
try {
using ( var response = (HttpWebResponse)request.GetResponse() ) {
lastModified = response.LastModified;
using ( var stream = response.GetResponseStream() ) {
//*** parsing the stream
var reader = XmlReader.Create( stream );
SyndicationFeed feed = SyndicationFeed.Load( reader );
}
}
}
catch ( WebException e ) {
var response = e.Response as HttpWebResponse;
// e.Response is null for failures without an HTTP response (e.g. timeouts)
if ( response == null || response.StatusCode != HttpStatusCode.NotModified )
throw; // rethrow an unexpected web exception
}

Getting a Stream from an absolute path?

I have this method:
public RasImage Load(Stream stream);
if I want to load a url like:
string _url = "http://localhost/Application1/Images/Icons/hand.jpg";
How can I make this url in to a stream and pass it into my load method?
Here's one way. I don't really know if it's the best way or not, but it works.
// requires System.Net namespace
WebRequest request = WebRequest.Create(_url);
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
{
RasImage image = Load(stream);
}
UPDATE: It looks like in Silverlight, the WebRequest class has no GetResponse method; you've no choice but to do this asynchronously.
Below is some sample code illustrating how you might go about this. (I warn you: I wrote this just now, without putting much thought into how sensible it is. How you choose to implement this functionality would likely be quite different. Anyway, this should at least give you a general idea of what you need to do.)
WebRequest request = WebRequest.Create(_url);
IAsyncResult getResponseResult = request.BeginGetResponse(
result =>
{
using (var response = request.EndGetResponse(result))
using (var stream = response.GetResponseStream())
{
RasImage image = Load(stream);
// Do something with image.
}
},
null
);
Console.WriteLine("Waiting for response from '{0}'...", _url);
getResponseResult.AsyncWaitHandle.WaitOne();
Console.WriteLine("The stream has been loaded. Press Enter to quit.");
Console.ReadLine();
Dan's answer is a good one, though you're requesting from localhost. Is this a file you can access from the filesystem? If so, I think you should be able to just pass in a FileStream:
FileStream stream = new FileStream(@"\path\to\file", FileMode.Open);

How do I update my UI from within HttpWebRequest.BeginGetRequestStream in Silverlight

I am uploading multiple files using the BeginGetRequestStream of HttpWebRequest but I want to update the progress control I have written whilst I post up the data stream.
How should this be done? I have tried calling Dispatcher.BeginInvoke (as below) from within the loop that pushes the data into the stream, but it locks the browser until it's finished, so it seems to be in some sort of worker/UI thread deadlock.
This is a code snippet of pretty much what I am doing:
class RequestState
{
public HttpWebRequest request; // holds the request
public FileDialogFileInfo file; // store our file stream data
public RequestState( HttpWebRequest request, FileDialogFileInfo file )
{
this.request = request;
this.file = file;
}
}
private void UploadFile( FileDialogFileInfo file )
{
UriBuilder ub = new UriBuilder( app.receiverURL );
ub.Query = string.Format( "filename={0}", file.Name );
// Open the selected file to read.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create( ub.Uri );
request.Method = "POST";
RequestState state = new RequestState( request, file );
request.BeginGetRequestStream( new AsyncCallback( OnUploadReadCallback ), state );
}
private void OnUploadReadCallback( IAsyncResult asynchronousResult )
{
RequestState state = (RequestState)asynchronousResult.AsyncState;
HttpWebRequest request = (HttpWebRequest)state.request;
Stream postStream = request.EndGetRequestStream( asynchronousResult );
PushData( state.file, postStream );
postStream.Close();
state.request.BeginGetResponse( new AsyncCallback( OnUploadResponseCallback ), state.request );
}
private void PushData( FileDialogFileInfo file, Stream output )
{
byte[] buffer = new byte[ 4096 ];
int bytesRead = 0;
Stream input = file.OpenRead();
while( ( bytesRead = input.Read( buffer, 0, buffer.Length ) ) != 0 )
{
output.Write( buffer, 0, bytesRead );
bytesReadTotal += bytesRead;
App app = App.Current as App;
// multiply before dividing so integer division does not truncate to 0
int totalPercentage = Convert.ToInt32( bytesReadTotal * 100 / app.totalBytesToUpload );
// enabling the following locks up my UI and browser
Dispatcher.BeginInvoke( () =>
{
this.ProgressBarWithPercentage.Percentage = totalPercentage;
} );
}
}
I was going to say that I didn't think Silverlight 2's HttpWebRequest supported streaming, because the request data gets buffered entirely into memory. It had been a while since I last looked at it, though, so I went back to check whether Beta 2 supports it. It turns out it does. You can enable it by setting AllowReadStreamBuffering to false. Did you set this property on your HttpWebRequest? That could be causing your block.
MSDN Reference
File upload component for Silverlight and ASP.NET
Edit: I found another reference for you. You may want to follow this approach of breaking the file into chunks. It was written last March, so I am not sure whether it will work in Beta 2.
Thanks for that, I will take a look at those links, I was considering chunking my data anyway, seems to be the only way I can get any reasonable progress reports out of it.
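Whichever way the upload is chunked, it may help to report progress only when the whole-number percentage actually changes, rather than once per 4 KB buffer, so the UI thread is not flooded with callbacks. The sketch below shows the idea with plain streams; ProgressCopy and its parameter names are hypothetical, and in the Silverlight code above the callback would be the place to call Dispatcher.BeginInvoke.

```csharp
using System;
using System.IO;

public static class ProgressCopy
{
    // Copies input to output and invokes onPercentChanged at most once
    // per distinct percentage value (so at most 101 times in total).
    public static void Copy(Stream input, Stream output, long totalBytes, Action<int> onPercentChanged)
    {
        byte[] buffer = new byte[4096];
        long copied = 0;
        int lastPercent = -1;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) != 0)
        {
            output.Write(buffer, 0, read);
            copied += read;

            // Multiply before dividing so integer division keeps precision.
            int percent = totalBytes > 0 ? (int)(copied * 100 / totalBytes) : 100;
            if (percent != lastPercent)
            {
                lastPercent = percent;
                onPercentChanged(percent);
            }
        }
    }
}
```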
