I'm wondering how to get the source of an iframe that appears in the page source returned by a web request.
Example of what I mean:
string text = streamReader.ReadToEnd(); // Sets the string Text to the source of the page.
Now the string text holds the source of the page.
And within that page of the source is
Authenticator
</h3>
</div>
<div class="RaggedBoxContainer"><div class="RaggedBoxBg"><div class="RaggedBoxTop"></div><div class="RaggedBoxContent">
<iframe src="https://secure.runescape.com/m=totp-authenticator/a=13/c=zBsBJTw2E0M/accountinfo" allowtransparency="true" frameborder="0"></iframe>
</div><div class="RaggedBoxBottom"></div></div></div>
</div>
And I need it to read the source of the iFrame which is:
<h2 class="accountSettingsTitle">RuneScape Authenticator is enabled</h2>
<p>Your account is protected from hijackers. You will need your code generator each time you log in to RuneScape.</p>
<p>It's also really important to keep your email account secure. <a target="_top" href="https://support.runescape.com/hc/en-gb/articles/207258145">Find out how to do this.</a>
<p>You can <a target="_top" href="cape.com/m=totp-authenticator/a=13/c=zBsBJTw2E0M/disableTOTPRequest">disable</a> Authenticator - but remember this will make your account much less secure.</p>
How would I do that?
I know that with the WebBrowser control it would be:
foreach (HtmlElement elm in webBrowser1.Document.GetElementsByTagName("iframe"))
{
string src = elm.GetAttribute("src");
if (src != null && src != "")
{
string content = new System.Net.WebClient().DownloadString(src); //or using HttpWebRequest
MessageBox.Show(content);
}
}
Please help me out, I'm confused.
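For the case where the page source is only available as a string (no WebBrowser control), something along these lines might work. This is only a sketch: the IframeSource class and FindIframeUrls method are hypothetical names, the regular expression is a simplification rather than a full HTML parser, and pageUrl is assumed to be the address the page was requested from.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class IframeSource
{
    // Matches src="..." or src='...' inside an <iframe ...> tag.
    private static readonly Regex IframeSrc = new Regex(
        "<iframe\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns every iframe src found in the HTML, resolved against the page URL.
    public static List<Uri> FindIframeUrls(string html, Uri pageUrl)
    {
        var urls = new List<Uri>();
        foreach (Match m in IframeSrc.Matches(html))
        {
            // Attribute values may contain entities such as &amp;
            string src = WebUtility.HtmlDecode(m.Groups[1].Value);
            // Handles both absolute and relative src values
            urls.Add(new Uri(pageUrl, src));
        }
        return urls;
    }
}
```

Each returned URL could then be fetched the same way as in the WebBrowser loop, e.g. with new System.Net.WebClient().DownloadString(url). Since the iframe here sits on an account page, that request may need the same cookies as the original request to return the expected content.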
I am scraping a website (https://www.greenlee.com/us/en/elec-bender-classic-w-single-emt-shoes-555esc) using Html Agility Pack. I want to get the image src, but I'm getting an empty result.
Here's my code:
HtmlWeb web = new HtmlWeb();
var htmldoc = web.Load(theurl);
var htmlnode = htmldoc.DocumentNode.SelectNodes("//div[contains(@class,'thumb-sizer')]");
foreach (var item in htmlnode)
{
Console.WriteLine(item.InnerHtml);
}
The result is this (the same as when viewing the page source):
<img ng-src="{{image.thumbnailUrl}}" alt="{{image.title}}" title="{{image.title}}">
But in the developer tools I see this:
<div class="thumb-sizer">
<img ng-src="https://cdn.greenlee.com/resources/images/c039f03f-cb77-4c28-9a78-af339c773365" alt="ELECTRIC BENDER CLASSIC W/SINGLE EMT SHOES" title="ELECTRIC BENDER CLASSIC W/SINGLE EMT SHOES" src="https://cdn.greenlee.com/resources/images/c039f03f-cb77-4c28-9a78-af339c773365">
</div>
I did a little research but haven't found a workaround for this. I want to extract the img src value, but the InnerHtml result contains no src value.
OK. So I found this code online and everything works, but while it shows me the div I am searching for, all the text is missing. Any idea why? Here's an example of what it outputs:
<div class="marquee"><img src="logo.png" /></div>
<div id="joke">
<div id="setup" class="exit-left"></div>
<div id="punchline">
<div class="question"></div>
<div id="zing" class="exit-right"></div>
</div>
</div>
<div id="buttons">
<input id="tell-me" class="exit-bottom no-select" type="button" value="Tell Me!" />
<!--<input id="another" class="exit-bottom" type="button" value="Another!" />-->
<table class="another exit-bottom no-select">
<tr>
<td class="another" colspan="3">Another</td>
<td class="share"><div class="share-img"></div>Share</td>
</tr>
</table>
</div>
The inner text is not shown at all.
And here is my code in VS:
var doc = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlNode.ElementsFlags["br"] = HtmlAgilityPack.HtmlElementFlag.Empty;
doc.OptionWriteEmptyNodes = true;
try
{
var webRequest = HttpWebRequest.Create("http://dadjokegenerator.com/");
Stream stream = webRequest.GetResponse().GetResponseStream();
doc.Load(stream);
stream.Close();
}
catch (System.UriFormatException uex)
{
throw;
}
catch (System.Net.WebException wex)
{
throw;
}
// Get the div by id and then read its inner HTML
string divString = doc.GetElementbyId("content").InnerHtml;
await e.Channel.SendMessage("test " + divString);
Your code correctly downloads the content of http://dadjokegenerator.com/, but InnerHtml is empty because the page doesn't actually contain the joke you are looking for (you can see this by viewing the page source in your web browser, e.g. press CTRL+U in Firefox). The joke is added to the page later by JavaScript. If you look at the source of that script at http://dadjokegenerator.com/js/main.js, you can see that individual jokes are downloaded from the URL http://dadjokegenerator.com/api/api.php?a=j&lt=r&vj=0
Here is a minimal sample that downloads a joke from this URL. I omitted all error checks for simplicity and used the free Json.NET library for JSON deserialization:
using System.IO;
using System.Linq;
using System.Net;
using Newtonsoft.Json;

public class Joke
{
public int Id;
public string Setup;
public string Punchline;
public override string ToString()
{
return Setup + " " + Punchline;
}
}
public static Joke GetJoke()
{
var request = HttpWebRequest.Create("http://dadjokegenerator.com/api/api.php?a=j&lt=r&vj=0");
using (var response = request.GetResponse())
{
using (var stream = response.GetResponseStream())
{
using (var reader = new StreamReader(stream))
{
var jokeString = reader.ReadToEnd();
Joke[] jokes = JsonConvert.DeserializeObject<Joke[]>(jokeString);
return jokes.FirstOrDefault();
}
}
}
}
Usage is e.g.
GetJoke().ToString();
These links show how to read a web page.
Html Agility Pack. Load and scrape webpage
Get HTML code from website in C#
I have a simple script where I save news details such as News Title, News URL and News Image URL. I noticed that the image doesn't show when its URL has Unicode characters, for example http://www.bbj.hu/images2/201412/párizsi_ud_20141218113410452.jpg
It is stored in the database as is, but when I display it on the web page it breaks and shows as
http://www.bbj.hu/images2/201412/pa%c2%b4rizsi_ud_20141218113410452.jpg
When I debug my ASP.NET WebForms page, it shows correctly in the code-behind:
protected String getImage(object imgSource)
{
string img = null;
img = imgSource.ToString();
return img;
// Debug show image url properly but it breaks on actual page
}
.aspx code
<asp:Image ID="NewsImage" ImageUrl='<%# getImage(Eval("NewsImageURL")) %>' runat="server" />
I tried different things, but it keeps showing up as http://www.bbj.hu/images2/201412/pa%c2%b4rizsi_ud_20141218113410452.jpg
How can I fix this?
Your problem's solution must be among one or more of the following:
C# URL Encode/Decode:
string encodedUrl = HttpUtility.UrlEncode(myUrl);
string sameMyUrl = HttpUtility.UrlDecode(encodedUrl);
Javascript URL Encode/Decode:
function myFunction() {
var uri = myUrl;
var uri_enc = encodeURIComponent(uri);
var uri_dec = decodeURIComponent(uri_enc);
}
C# HTML Encode/Decode:
string encodedHtml = HttpUtility.HtmlEncode(myHtml);
string sameMyHtml = HttpUtility.HtmlDecode(encodedHtml);
Javascript HTML Encode/Decode:
function htmlEncode(value) {
//create a in-memory div, set its inner text (which jQuery automatically encodes)
//then grab the encoded contents back out. The div never exists on the page.
return $('<div/>').text(value).html();
}
function htmlDecode(html) {
return $('<div>').html(html).text();
}
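One thing to keep in mind with the C# option above: HttpUtility.UrlEncode applied to a whole URL also escapes characters such as ":" and "/", so it is normally meant for individual values rather than a complete address. For the image URL in the question, a sketch like the following might be closer to what is needed. The ImageUrl class and ToAsciiUrl method are hypothetical names, and the idea of passing the stored value through System.Uri is an assumption about what would suit this page, not something the question confirms.

```csharp
using System;

public static class ImageUrl
{
    // Percent-encodes non-ASCII characters in an absolute URL as UTF-8,
    // leaving the scheme, host and path separators untouched.
    public static string ToAsciiUrl(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return url; // not an absolute URL: return it unchanged
        }
        return uri.AbsoluteUri;
    }
}
```

In the code-behind this could be called from getImage, e.g. return ImageUrl.ToAsciiUrl(imgSource.ToString());. Note also that %c2%b4 is the UTF-8 encoding of a standalone acute accent (U+00B4), so the stored value may contain "a" followed by "´" rather than the single character "á"; if so, the data itself would need correcting.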
With this code I have extracted all the desired text from an HTML document:
private void RunThroughSearch(string url)
{
IWebDriver driver = new FirefoxDriver();
INavigation nav = driver.Navigate();
nav.GoToUrl(url);
var div = driver.FindElement(By.Id("results"));
var element = driver.FindElements(By.ClassName("sa_wr"));
}
However, I need to refine the results extracted from the document:
Container
HEADER -> Title of a given block
Url -> Link to the relevant block
text -> body of a given block
/Container
As you can see in my code, I am able to get the text part as a text value, and that was fine. But what if I want the value of the container as HTML rather than the extracted text?
<div class="container">
<div class="Header"> Title...</div>
<div class="Url"> www.example.co.il</div>
<div class="ResConent"> bla.. </div>
</div>
The container appears about 10 times in a page, and I need to extract its innerHTML.
Any ideas? (Using Selenium.)
This seemed to work for me, and is less code:
var element = driver.FindElement(By.ClassName("sa_wr"));
var innerHtml = element.GetAttribute("innerHTML");
Find the element first, then use IJavaScriptExecutor to get the inner HTML.
var element = driver.FindElement(By.ClassName("sa_wr")); // a single element; FindElements would return a collection
IJavaScriptExecutor js = driver as IJavaScriptExecutor;
if (js != null) {
string innerHtml = (string)js.ExecuteScript("return arguments[0].innerHTML;", element);
}
I found the solution on SQA-SO:
// driver is an existing, initialized IWebDriver instance
IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
js.ExecuteScript("document.getElementById('title').innerHTML = 'New text!';");
I am working on an ASP.NET MVC 4 application. When a user visits the application for the first time, a mail is sent to the user. To send the mail I am using the WCF mail service implemented in our company.
I have created a .txt file containing the HTML for the mail format.
This is the code I am using to send the mail:
public void SendWelcomeMail(string name, string email, string filePath)
{
try
{
string subject = ConfigurationManager.AppSettings["WelcomeMailSub"];
string supportMail = ConfigurationManager.AppSettings["supportMail"];
using (StreamReader reader = File.OpenText(filePath))
{
string text = reader.ReadToEnd();
text = string.Format(text, name);
Mail mails = new Mail { MailTo = "suresh.negi89@gmail.com", Msg = text, Subject = subject, IsBodyHtml = true };
MailSenderServiceClient oClient = new MailSenderServiceClient();
oClient.SendMail(mails);
}
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
}
This is the file having the HTML format:
<html>
<body>
<div style="height:40px;width:675px;background:#000; text-align:center;color:red;">
<img src="~/Content/logo.png" alt="DTD" style="float:left">
<h1> {0} Congrates you are registered as a prime user!! </h1>
</div>
<p style="font-family: arial,sans-serif;">
Hi, Welcome to you
</p>
</body>
</html>
The image file logo.png is in the Content folder.
When the mail is sent, no image is displayed. I want to know what I am doing wrong.
Where do you expect ~/Content/logo.png to point to on the recipient's machine?
99% of the time it will point nowhere, and the other 1% won't be the file you wanted anyway.
Three solutions:
1. Host your image on a public server and reference it with the full absolute URL (it's probably already there?). This is basically the de facto standard now.
<img src="http://www.example.com/logo.png" />
If this location changes, you can of course use a placeholder and resolve the URL at runtime, before sending the email.
2. Attach the image to the email. You can then reference it inline with cid:.
<img src="cid:logo.png" />
3. Use a data URI and inline the data:
<img src="data:image/x-icon,%00%00%01%00%01%00%10%10%00%00%01%00%20%00h%04%00%00%16%00%00%00(%00%00%00%10%00%00%00%20%00%00%00%01%00%20%00%00%00%00%00%00%00%00%00%13%0B%00%00%13%0B%00%00%00%00%00%00%00%00%00%00%00%00%00%00llm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FF%00%00%00%00%00%00%00%00llm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FF%00%00%00%00%00%00%00%00llm%FFllm%FF%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00llm%FFllm%FF%00%00%00%00%00%00%00%00llm%FFllm%FF%00%00%00%00llm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FF%00%00%00%00llm%FFllm%FF%00%00%00%00%00%00%00%00llm%FFllm%FF%00%00%00%00llm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FFllm%FF%00%00%00%00llm%FFllm%FF%00%00%00%00%00%00%00%00llm%FFllm%FF%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00Tx%9B%14Ox%A0%3CJx%A4dDx%AA%91Lx%A2%19llm%FFllm%FF%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00Kx%A3%0F%3Cx%B1R%3Ex%B0%84%3Ex%B0%B9%3Ex%B0%DE%3Ex%B0%FF%3Fx%AF%FFAw%AD%FFBv%AB%FFDw%A9%3E%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%3Dx%B0*Ax%AD%FFAx%AD%FFAx%AD%FFBw%AB%ECEv%A9%C0Cw%AB%85%3B%7B%B3T*%83%C5R%15%8D%DCY%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%3Fx%AF%09Bx%ACsDv%AAQDv%AA%2B%3B%7B%B3%08%00%00%00%00%18%8C%D9%0C%09%93%E8r%03%97%EF%E2%02%97%EE%FF%00%99%F2%1B%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%09%93%E8%19%03%97%EE%82%03%96%EE%ED%05%95%EC%FF%04%97%EB%EC%09%91%EA%82%1Dx%E6N%20r%E9%02%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%0F%91%E1%02%02%97%EE%24%03%97%EE%86%05%96%EC%F3%06%95%EB%FF%06%95%EB%EB%04%98%EB%83%0D%8C%E9%10'm%E5L%2Fc%E4%FC.d%E4%81%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%04%96%ED%09%06%95%EB%D5%06%95%EB%FF%06%95%EB%E9%06%96%EB%7C%04%98%EB%17%00%00%00%00%2Bg%E46.d%E4%F8%2Cf%E4%F8%2Cf%E
4D%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%06%95%EBl%06%95%EB~%05%96%EB%15%00%00%00%00%00%00%00%00-d%E4%22-e%E4%E5%2Cf%E4%FF%2Cf%E4X%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00.c%E4%0F%2Ce%E4%CB%2Cf%E4%FF%2Cf%E4%7B%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00.c%E4%02%2Cf%E4%AD%2Cf%E4%FF%2Cf%E4%A1%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00.d%E4%02%2Cf%E4%93%2Cf%E4%FF%2Cf%E4%C0%2Cf%E4%07%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%80%01%00%00%80%01%F0%BF%9F%F9%00%00%90%09%00%00%90%09%00%00%9F%01%00%00%E0%07%00%00%E0%07%00%00%E0%83%00%00%FE%01%00%00%F0%01%00%00%F0!%00%00%F8%C3%FF%FF%FF%87%FF%FF%FF%0F%00%00%FE%0F%00%00" />
You can use the data: URI Kitchen to create one like this, or just use base64 as per the spec.
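As a rough illustration of the base64 variant, the data URI could be built in code before the mail is sent. This is a sketch only: the InlineImage class and ToDataUri method are hypothetical names, and the MIME type has to be supplied by the caller (for logo.png that would be image/png).

```csharp
using System;

public static class InlineImage
{
    // Builds a base64 data URI from raw image bytes and a MIME type.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }
}
```

With the template from the question, one option would be to change the img tag to src="{1}" and pass the result as a second argument, e.g. string.Format(text, name, InlineImage.ToDataUri(File.ReadAllBytes(logoPath), "image/png")), where logoPath is an assumed variable holding the physical path of logo.png. Not every mail client displays data URI images, so it is worth testing against the clients your users actually have.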
Provide the absolute URL of the logo image, such as
<img src="http://www.foobar.com/Content/logo.png" alt="DTD" style="float:left">
or attach the image to the email.