Fetching a string from a webpage into a C# Forms application

I want to set up a webpage with one small, simple string on it. No images, no CSS, nothing else. Just a plain string that is not very long (20 characters at most).
What is the best way to fetch this string with a C# Forms application so it can be used for a variety of purposes in the program, such as showing people what the newest version of the software is when they start it up?

I'd use System.Net.WebClient myself, like this:
using (var client = new WebClient())
{
    string result = client.DownloadString("http://www.test.com");
    // TODO: do something with the downloaded result from the remote
}

Try using System.Net.WebClient:
string remoteUri = "http://www.myserver.com/mypage.aspx";
WebClient myWebClient = new WebClient();
string version = myWebClient.DownloadString(remoteUri);
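If the string is a version number, a minimal sketch of how it could drive an update notice at startup might look like this (the URL is a placeholder, and comparing against the executing assembly's version is an assumption, not something from the question):
using System;
using System.Net;
using System.Reflection;

class VersionCheck
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Placeholder URL; point it at the page that hosts the version string.
            string latest = client.DownloadString("http://www.example.com/version.txt").Trim();

            // Compare against the version of the running assembly (an assumption about
            // how the application tracks its own version).
            Version current = Assembly.GetExecutingAssembly().GetName().Version;
            if (Version.Parse(latest) > current)
            {
                Console.WriteLine("A newer version (" + latest + ") is available.");
            }
        }
    }
}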

Related

C# WebClient DownloadString and DownloadFile giving different results

I am attempting to retrieve some information from a website, parse out a specific item, and then move on with my life.
I noticed that when I check "view source" on the website, the results match what I see when I use the WebClient class's DownloadFile method. On the other hand, when I use the DownloadString method, the contents of that string differ from both view source and DownloadFile.
I need DownloadString to return similar contents to view source and DownloadFile. Any suggestions? My relevant code is below:
string criticalPathUrl = "http://blahblahblah&sessionId=" + sessionId;
WebClient wc = new WebClient();
wc.Encoding = System.Text.Encoding.UTF8;
//this is different
string urlContentsString = wc.DownloadString(criticalPathUrl);
//than this
wc.DownloadFile(criticalPathUrl, "rawDlTxt2.txt");
Edit: Please ignore this question as I just didn't scroll up far enough. Ugh. One of those days.
Use DownloadData instead of DownloadString, then convert the bytes to a string with a suitable encoding before saving the file.
See the details here: https://www.pavey.me/2016/04/aspnet-c-downloadstring-vs-downloaddata.html
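A minimal sketch of what that suggests, assuming the page is UTF-8 and using a placeholder URL:
using System.IO;
using System.Net;
using System.Text;

class DownloadDataExample
{
    static void Main()
    {
        using (var wc = new WebClient())
        {
            // DownloadData returns the raw bytes, so the text encoding is under your control.
            byte[] raw = wc.DownloadData("http://www.example.com/page.aspx");
            string text = Encoding.UTF8.GetString(raw);

            // Save the decoded text, as the answer suggests.
            File.WriteAllText("rawDlTxt2.txt", text);
        }
    }
}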

How to detect the origin of a webpage's GET requests programmatically? (C#)

In short, I need to detect a webpage's GET requests programmatically.
The long story is that my company is currently trying to write a small installer for a piece of proprietary software that installs another piece of software.
To get this other piece of software, I realize it's as simple as calling the download link through C#'s lovely WebClient class (Dir is just the Temp directory in AppData/Local):
using (WebClient client = new WebClient())
{
    client.DownloadFile("[download link]", Dir.FullName + "\\setup.exe");
}
However, the page the installer comes from is not a direct download page. The actual download link is subject to change (our company's specific installer might be hosted on a different download server next time around).
To get around this, I realized that I can just monitor the GET requests the page makes and dynamically grab the URL from there.
So I know what I'm going to do, but I was just wondering: is there a built-in part of the language that lets you see what requests a page has made? Or do I have to write this functionality myself, and if so, what would be a good starting point?
I think I'd do it like this. First download the HTML contents of the download page (the page that contains the link to download the file). Then scrape the HTML to find the download link URL. And finally, download the file from the scraped address.
using (WebClient client = new WebClient())
{
    // Get the website HTML.
    string html = client.DownloadString("http://[website that contains the download link]");
    // Scrape the HTML to find the download URL (see below).
    // Download the desired file.
    client.DownloadFile(downloadLink, Dir.FullName + "\\setup.exe");
}
For scraping the download URL from the website, I'd recommend using the HTML Agility Pack; its documentation covers getting started with it.
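As a rough sketch of the scraping step (the download-page URL is a placeholder, and the assumption that the installer link can be recognised by its .exe extension is mine, not part of the original answer):
using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

class InstallerScraper
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString("http://www.example.com/downloads");

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Pick the first anchor whose href looks like an installer.
            string downloadLink = doc.DocumentNode
                .Descendants("a")
                .Select(a => a.GetAttributeValue("href", ""))
                .FirstOrDefault(href => href.EndsWith(".exe", StringComparison.OrdinalIgnoreCase));

            if (downloadLink != null)
            {
                client.DownloadFile(downloadLink, "setup.exe");
            }
        }
    }
}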
I think you have to write your own media handler, which returns an HttpResponseMessage.
For example, with Web API 2:
[HttpGet]
[AllowAnonymous]
[Route("route")]
public HttpResponseMessage GetFile([FromUri] string path)
{
    HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
    result.Content = new StreamContent(new FileStream(path, FileMode.Open, FileAccess.Read));
    string fileName = Path.GetFileNameWithoutExtension(path);
    string disposition = "attachment";
    result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue(disposition) { FileName = fileName + Path.GetExtension(path) };
    result.Content.Headers.ContentType = new MediaTypeHeaderValue(MimeMapping.GetMimeMapping(Path.GetExtension(path)));
    return result;
}

PhantomJS pass HTML string and return page source

For a web crawler project in C#, I'm trying to execute JavaScript and Ajax to retrieve the full page source of a crawled page.
I am using an existing web crawler (Abot) that needs a valid HttpWebResponse object, so I cannot simply use the driver.Navigate().GoToUrl() method to retrieve the page source.
The crawler downloads the page source, and I want to execute the existing JavaScript/Ajax inside that source.
In a sample project I tried the following without success:
WebClient wc = new WebClient();
string content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
File.WriteAllText(tmpPath, content);
var driverService = PhantomJSDriverService.CreateDefaultService();
var driver = new PhantomJSDriver(driverService);
driver.Navigate().GoToUrl(new Uri(tmpPath));
string renderedContent = driver.PageSource;
driver.Quit();
You need the following nuget packages to run the sample:
https://www.nuget.org/packages/phantomjs.exe/
http://www.nuget.org/packages/selenium.webdriver
The problem here is that the code hangs at GoToUrl(), and it takes several minutes until the program terminates without ever giving me driver.PageSource.
Doing this returns the correct HTML:
driver.Navigate().GoToUrl("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string renderedContent = driver.PageSource;
But I don't want to download the data twice. The crawler (Abot) downloads the HTML, and I just want to parse/render the JavaScript and Ajax.
Thank you!
Without running it, I would bet you need file:/// prior to tmpPath. That is:
WebClient wc = new WebClient();
string content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
File.WriteAllText(tmpPath, content);
var driverService = PhantomJSDriverService.CreateDefaultService();
var driver = new PhantomJSDriver(driverService);
driver.Navigate().GoToUrl(new Uri("file:///" + tmpPath));
string renderedContent = driver.PageSource;
driver.Quit();
You probably need to allow PhantomJS to make arbitrary requests. Requests are blocked when the domain/protocol doesn't match, as is the case when a local file is opened.
var driverService = PhantomJSDriverService.CreateDefaultService();
driverService.LocalToRemoteUrlAccess = true;
driverService.WebSecurity = false; // may not be necessary
var driver = new PhantomJSDriver(driverService);
You might need to combine this with the solution of Dave Bush:
driver.Navigate().GoToUrl(new Uri("file:///" + tmpPath));
Some of the resources have URLs that begin with //, which means that the protocol of the page is used when the browser retrieves those resources. When a local file is read, this protocol is file://, in which case none of those resources will be found. The protocol must be added to the local file in order to download all those resources.
File.WriteAllText(tmpPath, content.Replace("\"//", "\"http://"));
It is apparent from your output that you use PhantomJS 1.9.8. It may be the case that a newly introduced bug is responsible for this sort of thing. You should use PhantomJS 1.9.7 with driverService.SslProtocol = "tlsv1";.
You should also enable the disk cache if you do this multiple times for the same domain; otherwise, the resources are downloaded each time you try to scrape it. This can be done with driverService.DiskCache = true;
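Putting the suggestions from this thread together, a combined sketch might look roughly like the following (the product URL is the one from the question; which of the options you actually need depends on your PhantomJS version):
using System;
using System.IO;
using System.Net;
using OpenQA.Selenium.PhantomJS;

class PhantomRender
{
    static void Main()
    {
        // Download the page once (as the crawler would) and write it to a temp file,
        // making protocol-relative resource URLs absolute so they resolve from a local file.
        string content;
        using (var wc = new WebClient())
        {
            content = wc.DownloadString("http://www.newegg.com/Product/Product.aspx?Item=N82E16834257697");
        }
        string tmpPath = Path.Combine(Path.GetTempPath(), "temp.htm");
        File.WriteAllText(tmpPath, content.Replace("\"//", "\"http://"));

        var driverService = PhantomJSDriverService.CreateDefaultService();
        driverService.LocalToRemoteUrlAccess = true;  // let the local file request remote resources
        driverService.WebSecurity = false;            // may not be necessary
        driverService.DiskCache = true;               // avoid re-downloading resources on repeated runs
        // driverService.SslProtocol = "tlsv1";       // only if you are stuck on PhantomJS 1.9.7

        using (var driver = new PhantomJSDriver(driverService))
        {
            driver.Navigate().GoToUrl(new Uri("file:///" + tmpPath));
            string renderedContent = driver.PageSource;
            Console.WriteLine(renderedContent.Length);
        }
    }
}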

What's the most efficient way to visit a .html page?

I have a .html page that just has 5 characters on it (4 numbers and a period).
The only way I know of is to make a WebBrowser control that navigates to a URL and then use
browser.GetElementByID();
However, that uses IE, so I'm sure it's slow. Is there any better way (without using an external API, just what's built into C#) to simply visit a webpage in a fashion that lets you read text off of it?
Try these 2 lines:
var wc = new System.Net.WebClient();
string html = wc.DownloadString("http://google.com"); // Your page will be in that html variable
It appears that you want to download a URL, parse it as HTML, then find an element and read its inner text, right? Use NuGet to grab a reference to HtmlAgilityPack, then:
using (var wc = new System.Net.WebClient())
{
    string html = wc.DownloadString("http://foo.com");
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var el = doc.GetElementbyId("foo");
    if (el != null)
    {
        var text = el.InnerText;
        Console.WriteLine(text);
    }
}
Without using any APIs? You're on the .NET Framework, so you're already using an abstraction layer to some extent. But if you want pure C# without any add-ons, you could just open a TCP socket to the site, download the contents (it's just a formatted string, after all) and read the data.
Here's a similar question: How to get page via TcpClient?
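For illustration, a bare-bones sketch of that socket approach (a plain HTTP GET over TcpClient; the host and path are placeholders, and this ignores redirects, chunked encoding and HTTPS):
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

class RawHttpGet
{
    static void Main()
    {
        const string host = "example.com";  // placeholder host
        const string path = "/page.html";   // placeholder path

        using (var client = new TcpClient(host, 80))
        using (var stream = client.GetStream())
        {
            // Send a minimal HTTP/1.1 GET request.
            string request = "GET " + path + " HTTP/1.1\r\n" +
                             "Host: " + host + "\r\n" +
                             "Connection: close\r\n\r\n";
            byte[] requestBytes = Encoding.ASCII.GetBytes(request);
            stream.Write(requestBytes, 0, requestBytes.Length);

            // Read headers and body as one string; real code would split them apart.
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
}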

How can I migrate email functionality from ASP Classic to ASP.NET?

I previously used CDO.Message and CDO.Configuration in ASP Classic to create HTML emails, which was VERY simple to do. In .NET, it appears that you have to give the System.Net.Mail.MailMessage object an HTML string for the content and then somehow embed the required images. Is there an easy way to do this in .NET? I'm pretty new to .NET MVC and would appreciate any help.
This is how it looks in ASP Classic:
Set objCDO = Server.CreateObject("CDO.Message")
objCDO.To = "someone@something.com"
objCDO.From = "me@myaddress.com"
objCDO.CreateMHTMLBody "http://www.example.com/somepage.html"
objCDO.Subject = sSubject
'the following are for advanced CDO schematics
'for authentication and external SMTP
Set cdoConfig = CreateObject("CDO.Configuration")
With cdoConfig.Fields
    .Item(cdoSendUsingMethod) = cdoSendUsingPort '2 - send using port
    .Item(cdoSMTPServer) = "mail.myaddress.com"
    .Item(cdoSMTPServerPort) = 25
    .Item(cdoSMTPConnectionTimeout) = 10
    .Item(cdoSMTPAuthenticate) = cdoBasic
    .Item(cdoSendUsername) = "myusername"
    .Item(cdoSendPassword) = "mypassword"
    .Update
End With
Set objCDO.Configuration = cdoConfig
objCDO.Send
Basically I would like to send one of my views (minus site.master) as an email, images embedded.
I don't know of a simple way right off, but you could use WebClient to get your page, then pass the response as the body.
Example:
var webClient = new WebClient();
byte[] returnFromPost = webClient.UploadValues(Url, Inputs);
var utf = new UTF8Encoding();
string returnValue = utf.GetString(returnFromPost);
return returnValue;
Note: Inputs is just a dictionary of post variables.
One problem I think you'll run into right away is that I don't think you'd get the images. You could parse the HTML you get and then make the image URLs absolute, pointing back to your server.
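For the embedding part, a rough sketch with System.Net.Mail: fetch the rendered view as HTML, attach each image as a LinkedResource, and reference it from the markup via a cid: URL. The addresses, server, file names and the logo.png substitution below are all placeholders:
using System.Net;
using System.Net.Mail;
using System.Net.Mime;

class HtmlMailSketch
{
    static void Main()
    {
        // Grab the rendered page (the MVC view, in this case) to use as the mail body.
        string html;
        using (var wc = new WebClient())
        {
            html = wc.DownloadString("http://www.example.com/somepage.html");
        }

        // Point the image reference at a Content-ID instead of a URL.
        html = html.Replace("logo.png", "cid:logo");

        var view = AlternateView.CreateAlternateViewFromString(html, null, MediaTypeNames.Text.Html);
        var logo = new LinkedResource("logo.png", "image/png") { ContentId = "logo" };
        view.LinkedResources.Add(logo);

        var message = new MailMessage("me@myaddress.com", "someone@something.com")
        {
            Subject = "Newsletter"
        };
        message.AlternateViews.Add(view);

        using (var smtp = new SmtpClient("mail.myaddress.com", 25))
        {
            smtp.Credentials = new NetworkCredential("myusername", "mypassword");
            smtp.Send(message);
        }
    }
}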
Thank you both for your help. Here is a very clean and comprehensive tutorial posted by a .NET MVP:
http://msdn.microsoft.com/en-us/vbasic/bb630227.aspx
