c# windows forms, load first google image in app itself - c#

I am wondering if it's possible to display the first image of a Google image search in a Visual Studio Windows Forms app.
The way I imagine this would work is that a person enters a string, the app googles the string, copies the first image, and then displays it in the app itself.
Thank you.
EDIT: Please consider that I am a beginner in C# programming, so if you are going to use some difficult coding or suggest using some APIs, could you please explain in more detail how to do so. Thank you.

Short answer, Yes.
We know the URL to get an image is
https://www.google.co.uk/search?q=plane&tbm=isch&site=imghp
On the Form create a PictureBox(Call it pbImage), a TextBox(Call it tbSearch), a Button(Call it btnLookup).
Using the NuGet Package Manager (Tools -> NuGet... -> Manage...), select Browse and search for HtmlAgilityPack. Click your project on the right and then click Install.
When we send a request to Google using System.Net.WebClient, no JavaScript is executed (although this can be done with some trickery using the WinForms web browser control).
Because there is no JavaScript, the page is rendered differently from what you are used to. Inspecting the page without JavaScript tells us the following about its structure:
Within the document body there is a table with a class called 'images_table'.
Within that we can find several img elements.
Here is a code listing:
private void btnLookup_Click(object sender, EventArgs e)
{
    // Requires: using System.Net; using System.Linq;
    string templateUrl = @"https://www.google.co.uk/search?q={0}&tbm=isch&site=imghp";

    // Check that we have a term to search for.
    if (string.IsNullOrEmpty(tbSearch.Text))
    {
        MessageBox.Show("Please supply a search term");
        return;
    }
    else
    {
        using (WebClient wc = new WebClient())
        {
            // Let's pretend we are IE8 on Vista.
            wc.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)");
            string result = wc.DownloadString(String.Format(templateUrl, new object[] { tbSearch.Text }));

            // We have valid markup; this will change from time to time as Google updates.
            if (result.Contains("images_table"))
            {
                HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
                doc.LoadHtml(result);

                /*
                 * A LINQ query to find all the img elements stored under the images_table class:
                 * we search for the table with the class images_table, then take every img that has
                 * a valid src containing "images?", which is the pattern used by Google,
                 * e.g. https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQmGxh15UUyzV_HGuGZXUxxnnc6LuqLMgHR9ssUu1uRwy0Oab9OeK1wCw
                 */
                var imgList = from tables in doc.DocumentNode.Descendants("table")
                              from img in tables.Descendants("img")
                              where tables.Attributes["class"] != null && tables.Attributes["class"].Value == "images_table"
                                 && img.Attributes["src"] != null && img.Attributes["src"].Value.Contains("images?")
                              select img;

                byte[] downloadedData = wc.DownloadData(imgList.First().Attributes["src"].Value);
                if (downloadedData != null)
                {
                    // Wrap the downloaded bytes in a memory stream; the constructor already
                    // fills it, so there is no need to write the data into it again.
                    System.IO.MemoryStream ms = new System.IO.MemoryStream(downloadedData, 0, downloadedData.Length);

                    // Load an image from that stream. The stream must stay open for the
                    // lifetime of the Image, so don't dispose it while the image is displayed.
                    pbImage.Image = Image.FromStream(ms);
                }
            }
        }
    }
}
Using System.Net.WebClient, a request is sent to Google using the URL specified in the template string.
Adding headers makes the request look more genuine. WebClient downloads the markup, which is stored in result.
An HtmlAgilityPack.HtmlDocument object is created, and the data stored in result is loaded into it.
A LINQ query obtains the img elements; taking the first in that list, we download its data and store it in a byte array.
A memory stream is created over that data, and the stream is loaded into the picture box's Image.
Note that Image.FromStream requires the stream to stay open for the lifetime of the Image, so only dispose of the stream once the image is no longer in use.

Related

C#.Net Download Image from URL, Crop, and Upload without Saving or Displaying

I have a large number of images on a Web server that need to be cropped. I would like to automate this process.
So my thought is to create a routine that, given the URL of the image, downloads the image, crops it, then uploads it back to the server (as a different file). I don't want to save the image locally, and I don't want to display the image to the screen.
I already have a project in C#.Net that I'd like to do this in, but I could do .Net Core if I have to.
I have looked around, but all the information I could find for downloading an image involves saving the file locally, and all the information I could find about cropping involves displaying the image to the screen.
Is there a way to do what I need?
It's perfectly possible to issue a GET request to a URL and have the response returned to you as a byte[] using HttpClient.GetByteArrayAsync. With that binary content, you can read it into an Image using Image.FromStream.
Once you have that Image object, you can use the answer from here to do your cropping.
//Note: You only want a single HttpClient in your application
//and re-use it where possible to avoid socket exhaustion issues
using (var httpClient = new HttpClient())
{
    //Issue the GET request to a URL and read the response body into a
    //byte array that can be used to load the image
    //(this needs to run inside an async method because of the await)
    var imageContent = await httpClient.GetByteArrayAsync("<your image url>");

    using (var imageBuffer = new MemoryStream(imageContent))
    {
        var image = Image.FromStream(imageBuffer);
        //Do something with image
    }
}
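To round that out, here is a minimal sketch of the crop-and-reupload step, assuming a hypothetical upload endpoint and a hard-coded crop rectangle (both placeholders, not anything from the original answer); the crop just draws the wanted region of the source image onto a new Bitmap, with everything staying in memory.

// Minimal sketch: crop the downloaded image in memory and upload it, without
// touching disk or the screen. The upload URL and crop rectangle are placeholders.
using System;
using System.Drawing;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class ImageCropUploader
{
    private static readonly HttpClient httpClient = new HttpClient(); // re-used, see note above

    public static async Task CropAndUploadAsync(string imageUrl, string uploadUrl)
    {
        byte[] original = await httpClient.GetByteArrayAsync(imageUrl);

        using (var input = new MemoryStream(original))
        using (var source = Image.FromStream(input))
        using (var cropped = new Bitmap(200, 200))           // target size (placeholder)
        using (var g = Graphics.FromImage(cropped))
        {
            // Copy a 200x200 region starting at (10, 10) from the source image.
            g.DrawImage(source,
                        new Rectangle(0, 0, 200, 200),       // destination on the new bitmap
                        new Rectangle(10, 10, 200, 200),     // region of the source to keep
                        GraphicsUnit.Pixel);

            using (var output = new MemoryStream())
            {
                cropped.Save(output, System.Drawing.Imaging.ImageFormat.Jpeg);

                // Upload the cropped bytes; the exact request shape depends on your server.
                var content = new ByteArrayContent(output.ToArray());
                content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("image/jpeg");
                HttpResponseMessage response = await httpClient.PostAsync(uploadUrl, content);
                response.EnsureSuccessStatusCode();
            }
        }
    }
}

How the upload needs to look (PUT vs POST, multipart vs raw bytes) depends entirely on your server, so treat that part as a placeholder.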

How to detect the origin of a webpage's GET requests programmatically? (C#)

In short, I need to detect a webpage's GET requests programmatically.
The long story is that my company is currently trying to write a small installer for a piece of proprietary software that installs another piece of software.
To get this other piece of software, I realize it's as simple as calling the download link through C#'s lovely WebClient class (Dir is just the Temp directory in AppData/Local):
using (WebClient client = new WebClient())
{
    client.DownloadFile("[download link]", Dir.FullName + "\\setup.exe");
}
However, the page the installer comes from is not a direct download page. The actual download link is subject to change (our company's specific installer might be hosted on a different download server another time around).
To get around this, I realized that I can just monitor the GET requests the page makes and dynamically grab the URL from there.
So I know what I'm going to do, but I was just wondering: is there a built-in part of the language that lets you see what requests a page has made? Or do I have to write this functionality myself, and what would be a good starting point?
I think I'd do it like this. First download the HTML contents of the download page (the page that contains the link to download the file). Then scrape the HTML to find the download link URL. And finally, download the file from the scraped address.
using (WebClient client = new WebClient())
{
    // Get the website HTML.
    string html = client.DownloadString("http://[website that contains the download link]");

    // Scrape the HTML to find the download URL (see below).

    // Download the desired file.
    client.DownloadFile(downloadLink, Dir.FullName + "\\setup.exe");
}
For scraping the download URL from the website I'd recommend using the HTML Agility Pack. See here for getting started with it.
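As a rough illustration of that scraping step, here is one way it might look with the HTML Agility Pack, assuming (hypothetically) that the download link is the first anchor whose href ends in ".exe"; the real condition has to match the actual markup of the download page.

// Requires: using System; using System.Linq; using HtmlAgilityPack;
static string FindDownloadLink(string html)
{
    // Hypothetical assumption: the download link is the first <a> whose href ends in ".exe".
    // Adjust the filter to whatever the real download page actually contains.
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    return doc.DocumentNode
        .Descendants("a")
        .Where(a => a.Attributes["href"] != null
                 && a.Attributes["href"].Value.EndsWith(".exe", StringComparison.OrdinalIgnoreCase))
        .Select(a => a.Attributes["href"].Value)
        .FirstOrDefault();
}

The value returned here would then take the place of downloadLink in the snippet above.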
I think you have to write your own "media handler", which returns an HttpResponseMessage.
e.g. with Web API 2:
[HttpGet]
[AllowAnonymous]
[Route("route")]
public HttpResponseMessage GetFile([FromUri] string path)
{
    HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
    result.Content = new StreamContent(new FileStream(path, FileMode.Open, FileAccess.Read));

    string fileName = Path.GetFileNameWithoutExtension(path);
    string disposition = "attachment";
    result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue(disposition) { FileName = fileName + Path.GetExtension(path) };
    result.Content.Headers.ContentType = new MediaTypeHeaderValue(MimeMapping.GetMimeMapping(Path.GetExtension(path)));
    return result;
}

How to save dynamic image in WatIn [duplicate]

This question already has answers here:
How do I get a bitmap of the WatiN image element?
(5 answers)
Closed 8 years ago.
I am trying to save an image that changes dynamically with each request.
I tried WatiN and HttpWebRequest (which gets a new image):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("www.test.com");
request.AllowAutoRedirect = false;
WebResponse response = request.GetResponse();

using (Stream stream = response.GetResponseStream())
using (FileStream fs = File.OpenWrite(ImageCodePath))
{
    byte[] bytes = new byte[1024];
    int count;
    while ((count = stream.Read(bytes, 0, bytes.Length)) != 0)
    {
        fs.Write(bytes, 0, count);
    }
}
and urlmon.dll's URLDownloadToFile (which also gets a new image):
[DllImport("urlmon.dll", CharSet = CharSet.Auto, SetLastError = true)]
static extern Int32 URLDownloadToFile(Int32 pCaller, string szURL, string szFileName, Int32 dwReserved, Int32 lpfnCB);
URLDownloadToFile(0, "https://test.reporterImages.php?MAIN_THEME=1", ImageCodePath, 0, 0);
I looked in all the temp folders and still can't find the image.
Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.InternetCache),"Content.IE5");
Every time I try to save it, the server builds a new image and returns it. If I right-click on the image and choose:
Save picture as...
it will save the picture. I need to somehow implement this method (right-click, "Save picture as...") in WatiN in IE, or somehow download it with an HttpRequest from my HTML page without further server interaction.
Does anyone know how I can do this?
As I understand it, the idea is to capture the CAPTCHA image currently shown on the page in the browser so it can be bypassed with some text recognition (a strange idea, by the way). Getting the image URL is not a problem (it is always the same). In this case you can use an API to access the browser cache.
Specifically, for IE: FindFirstUrlCacheEntry / FindNextUrlCacheEntry.
This can help if your application hosts a WebBrowser control.
Use WebClient:
using (var client = new WebClient())
{
    client.DownloadFile("http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=3c6263c3453b", "so-sprites.png");
}
If you provide a URL to an HTML page that contains the img you want to download, rather than a direct link to the image itself, you can still use a WebClient with two consecutive requests: the first to scrape the image URL from the page source (not required if the URL is static) and the second to actually download the image.
Edit:
Sorry. It was not clear from the start. So you host a browser in your app and you see the actual image in it. You want to download the image you are seeing. Of course, making another request won't work because another image will be generated.
The solution depends on the web browser control that you use.
For WebBrowser (System.Windows.Forms) you can use IHTMLDocument2. See example here:
Saving image shown in web browser

What's the most efficient way to visit a .html page?

I have a .html page that just has 5 characters on it (4 numbers and a period).
The only way I know of is to make a webbrowser that navigates to a URL, then use
browser.GetElementByID();
However that uses IE so I'm sure it's slow. Is there any better way (without using an API, something built into C#) to simply visit a webpage in a fashion that you can read off of it?
Try these 2 lines:
var wc = new System.Net.WebClient();
string html = wc.DownloadString("http://google.com"); // Your page will be in that html variable
It appears that you want to download a URL, parse it as HTML, then find an element and read its inner text, right? Use NuGet to grab a reference to HtmlAgilityPack, then:
using (var wc = new System.Net.WebClient())
{
    string html = wc.DownloadString("http://foo.com");
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var el = doc.GetElementbyId("foo");
    if (el != null)
    {
        var text = el.InnerText;
        Console.WriteLine(text);
    }
}
Without using any APIs? You're in the .NET framework, so you're already using an abstraction layer to some extent. But if you want pure C# without any addons, you could just open a TCP socket to the site and download the contents (it's just a formatted string, after all) and read the data.
Here's a similar question: How to get page via TcpClient?
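For illustration only, here is a bare-bones sketch of that raw-socket approach with TcpClient over plain HTTP (no TLS, no redirects, no chunked decoding); WebClient and HttpClient handle all of that for you, which is why they remain the practical choice. The host name is just a placeholder.

// Requires: using System; using System.IO; using System.Net.Sockets; using System.Text;
// "example.com" is a placeholder host; plain HTTP only (an HTTPS site would also need SslStream).
string host = "example.com";

using (var client = new TcpClient(host, 80))
using (var stream = client.GetStream())
{
    // Hand-written HTTP/1.1 GET request; "Connection: close" makes ReadToEnd terminate.
    string request = "GET / HTTP/1.1\r\n" +
                     "Host: " + host + "\r\n" +
                     "Connection: close\r\n" +
                     "\r\n";
    byte[] requestBytes = Encoding.ASCII.GetBytes(request);
    stream.Write(requestBytes, 0, requestBytes.Length);

    // Read everything the server sends back (status line, headers and body in one string).
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        string response = reader.ReadToEnd();
        Console.WriteLine(response);
    }
}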

Pulling data from a webpage, parsing it for specific pieces, and displaying it

I've been using this site for a long time to find answers to my questions, but I wasn't able to find the answer on this one.
I am working with a small group on a class project. We're to build a small "game trading" website that allows people to register, put in a game they have they want to trade, and accept trades from others or request a trade.
We have the site functioning long ahead of schedule so we're trying to add more to the site. One thing I want to do myself is to link the games that are put in to Metacritic.
Here's what I need to do. I need to (using asp and c# in visual studio 2012) get the correct game page on metacritic, pull its data, parse it for specific parts, and then display the data on our page.
Essentially when you choose a game you want to trade for we want a small div to display with the game's information and rating. I'm wanting to do it this way to learn more and get something out of this project I didn't have to start with.
I was wondering if anyone could tell me where to start. I don't know how to pull data from a page. I'm still trying to figure out if I need to try and write something to automatically search for the game's title and find the page that way or if I can find some way to go straight to the game's page. And once I've gotten the data, I don't know how to pull the specific information I need from it.
One of the things that doesn't make this easy is that I'm learning c++ along with c# and asp so I keep getting my wires crossed. If someone could point me in the right direction it would be a big help. Thanks
This small example uses HtmlAgilityPack and XPath selectors to get to the desired elements.
protected void Page_Load(object sender, EventArgs e)
{
    string url = "http://www.metacritic.com/game/pc/halo-spartan-assault";
    var web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);

    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
}
An easy way to obtain the XPath for a given element is to use your web browser's Developer Tools (I use Chrome):
Open the Developer Tools (F12 or Ctrl + Shift + C on Windows or Command + Shift + C for Mac).
Select the element in the page that you want the XPath for.
Right click the element in the "Elements" tab.
Click on "Copy as XPath".
You can paste it exactly like that in c# (as shown in my code), but make sure to escape the quotes.
Make sure you use some error handling, because web scraping can break whenever the site changes the HTML structure of the page.
Edit
Per @knocte's suggestion, here is the link to the NuGet package for HtmlAgilityPack:
https://www.nuget.org/packages/HtmlAgilityPack/
I looked and Metacritic.com doesn't have an API.
You can use an HttpWebRequest to get the contents of a website as a string.
using System.Net;
using System.IO;
using System.Text;   // for Encoding
using System.Windows.Forms;

string result = null;
string url = "http://www.stackoverflow.com";
WebResponse response = null;
StreamReader reader = null;

try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
    // handle error
    MessageBox.Show(ex.Message);
}
finally
{
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}
Then you can parse the string for the data that you want by taking advantage of Metacritic's use of meta tags. Here's the information they have available in meta tags:
og:title
og:type
og:url
og:image
og:site_name
og:description
The format of each tag is: <meta name="og:title" content="In a World..." />
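As a small sketch of that parsing step (continuing from the result string above), the og: tags can be pulled out with the HTML Agility Pack; the XPath here assumes the tags use the name attribute as shown in this answer.

// Requires: using HtmlAgilityPack; (and the `result` string from the snippet above)
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);

// Select every meta tag whose name starts with "og:".
var metaTags = doc.DocumentNode.SelectNodes("//meta[starts-with(@name, 'og:')]");
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        string name = tag.GetAttributeValue("name", "");
        string content = tag.GetAttributeValue("content", "");
        Console.WriteLine("{0} = {1}", name, content); // e.g. og:title = In a World...
    }
}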
I recommend Dcsoup. There's a NuGet package for it, and it uses CSS selectors, so it is familiar if you use jQuery. I've tried others, but it is the best and easiest to use that I've found. There's not much documentation, but it's open source and a port of the Java jsoup library, which has good documentation. (Documentation for the .NET API here.) I absolutely love it.
var timeoutInMilliseconds = 5000;
var uri = new Uri("http://www.metacritic.com/game/pc/fallout-4");
var doc = Supremes.Dcsoup.Parse(uri, timeoutInMilliseconds);
// <span itemprop="ratingValue">86</span>
var ratingSpan = doc.Select("span[itemprop=ratingValue]");
int ratingValue = int.Parse(ratingSpan.Text);
// selectors match both critic and user scores
var scoreDiv = doc.Select("div.score_summary");
var scoreAnchor = scoreDiv.Select("a.metascore_anchor");
int criticRating = int.Parse(scoreAnchor[0].Text);
float userRating = float.Parse(scoreAnchor[1].Text);
I'd recommend WebsiteParser - it's based on HtmlAgilityPack (mentioned by Hanlet Escaño), but it makes web scraping easier with attributes and CSS selectors:
class PersonModel
{
    [Selector("#BirdthDate")]
    [Converter(typeof(DateTimeConverter))]
    public DateTime BirdthDate { get; set; }
}

// ...

PersonModel person = WebContentParser.Parse<PersonModel>(html);
Nuget link
