YouTube HTML Agility Pack C# - c#

I am trying to retrieve all the video ids from YouTube's search results page.
Each result has this code:
<a href="/watch?v=aYIC-ebAD3o" class="ux-thumb-wrap result-item-thumb">
<span class="video-thumb ux-thumb-128 ">
<span class="clip">
<img onload="tn_load(5)" alt="Thumbnail" src="//i2.ytimg.com/vi/aYIC-ebAD3o/default.jpg" >
</span>
</span>
<span class="video-time">4:16</span>
<span dir="ltr" class="yt-uix-button-group addto-container short video-actions" data-video-ids="aYIC-ebAD3o" data-feature="thumbnail">
<button type="button" class="start master-sprite yt-uix-button yt-uix-button-short yt-uix-tooltip" onclick=";return false;" title="" data-button-action="yt.www.addtomenu.add" role="button" aria-pressed="false">
<img class="yt-uix-button-icon yt-uix-button-icon-addto" src="//s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="">
<span class="yt-uix-button-content">
<span class="addto-label">Add to</span>
</span>
</button>
<button type="button" class="end yt-uix-button yt-uix-button-short yt-uix-tooltip yt-uix-button-empty" onclick=";return false;" title="" data-button-menu-id="shared-addto-menu" data-button-action="yt.www.addtomenu.load" role="button" aria-pressed="false">
<img class="yt-uix-button-arrow" src="//s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="">
</button>
</span>
<span class="video-in-quicklist">Added to queue </span>
</a>
<div class="result-item-main-content">
And I am trying to parse out the "data-video-ids" class data. What's the best way to do this with the HTML Agility Pack?
I have tried this:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//span[@class='data-video-ids']"))
{
    string text = node.InnerText;
    lblTest2.Text += text + Environment.NewLine;
}
Any ideas?

I think you will be better off in the long run if you use one of YouTube's APIs.
I would only use web requests and HtmlAgilityPack as a last resort when no API exists. The main reason for this is if YouTube ever changes their page, it breaks your code. Open APIs are generally geared to be backwards compatible so your application should work indefinitely in most cases.
Here is a code example from YouTube's API:
YouTubeQuery query = new YouTubeQuery(YouTubeQuery.DefaultVideoUri);
//order results by the number of views (most viewed first)
query.OrderBy = "viewCount";
// search for puppies and include restricted content in the search results
// query.SafeSearch could also be set to YouTubeQuery.SafeSearchValues.Moderate
query.Query = "puppy";
query.SafeSearch = YouTubeQuery.SafeSearchValues.None;
Feed<Video> videoFeed = request.Get<Video>(query);
printVideoFeed(videoFeed);
Looks simple, right?

The 'data-video-ids' you're trying to filter on is not a class but an attribute. Try the following expression in SelectNodes:
"//span[@data-video-ids]"
To retrieve the attribute value you could try this approach (since HtmlAgilityPack doesn't support attribute selection you have to get an element first and then select the actual attribute):
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//span[@data-video-ids]"))
{
    // Read the attribute from the matched element
    var videoIds = node.Attributes["data-video-ids"];
    if (videoIds == null) continue;
    string text = videoIds.Value;
    lblTest2.Text += text + Environment.NewLine;
}

Related

How to get value of nested img src with Html Agility Pack?

I'm trying to get nested img src values with Html Agility Pack and I've tried multiple things with no success. There are 17 of these nested images I need to grab, but I can't figure it out. Here is the bare-bones HTML; I need the value of src on the last line of each img tag:
<div class="largeTitle">
<article class="articleItem" data-id="0000">
<a href="#blank_link" class="img">
<img class=" lazyloaded" data-src="#blank_link" alt="test" onerror="script"
src="image_link.jpg">
</a>
</article>
<article class="articleItem" data-id="0001">
<a href="#blank_link" class="img">
<img class=" lazyloaded" data-src="#blank_link" alt="test" onerror="script"
src="image_link.jpg">
</a>
</article>
</div>
With the url you mentioned in comments, you can do:
var web = new HtmlWeb();
var doc = web.Load("https://www.investing.com/");
var images = doc.DocumentNode.SelectNodes("//*[contains(@class,'js-articles')]//a[@class='img']//img");
foreach(var image in images)
{
string source = image.Attributes["data-src"].Value;
string label = image.Attributes["alt"].Value;
Console.WriteLine($"\"{label}\" {source}");
}

Force Download of MP3 in Chrome, Firefox, Safari and IE using Amazon S3

Is there a way I can force an MP3 file to download from Amazon S3?
I have a Download button in my Razor:
<td>
<a href="@t.S3PreSignedUrl" class="js_recordingDownloadButton document-link btn btn-info btn-block br2 btn-xs fs12 @Html.Raw(t.S3PreSignedUrl.IsNullOrWhiteSpace() ? "disabled" : "")" target="_blank" data-container="body" data-toggle="tooltip" title="@t.OriginalFilename" type="@t.MimeType" download>
<span class="fa fa-cloud-download fs12"></span>
</a>
</td>
Currently, when you click on it, another browser window opens and the file starts to play automatically in a modal:
<div id="js_PlayRecordingPopup" class="popup-basic mfp-with-anim modalPopup">
<div class="panel">
<div class="admin-form">
<div class="panel-heading">
<span class="panel-title">
<i class="fa fa-play"></i> Play Recording
</span>
</div>
</div>
<div class="panel-body bt0 text-center p25">
<p class="popupInfo fs12 mb5">Playing: <b class="text-info js_playingTitle"></b></p>
<p class="popupInfo fs12">Filename: <b class="text-info js_playingFileName"></b></p>
<div class="summaryBox popupSummary text-center audioContainerBox">
<audio controls controlsList="nodownload" id="audRecording">
Your browser does not support the audio element.
</audio>
</div>
</div>
<div class="panel-footer">
<div class="text-center">
<input type="button" class="btn btn-primary" value="Done" data-bind="click: function(){ var sound = document.getElementById('audRecording'); if(sound != undefined) { sound.pause(); sound.currentTime = 0; } $.magnificPopup.close(); }">
</div>
</div>
</div>
<button title="Close (Esc)" type="button" class="mfp-close" data-bind="click: function(){ var sound = document.getElementById('audRecording'); if(sound != undefined) { sound.pause(); sound.currentTime = 0; }}">×</button>
</div>
Is there a way I can set it so that if I click Download it downloads the file straight away?
This is what it looks like in the Source Code:
<td>
<a href="https:url/Audio/Recordings/TES/39e7ca51-1ac8-f395-3ae6-ff814dbde6c3/39e7ca51-e77c-65f1-c88e-47fe20f67ee1/o_1cj5s0tntp3aa1011o51tlrd8ba.mp3?X-Amz-Expires=86400&response-content-disposition=inline%3B%20filename%3Drain-01.mp3&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOHIWJAIQSFECYZQ/20180724/eu-west-1/s3/aws4_request&X-Amz-Date=20180724T104420Z&X-Amz-SignedHeaders=host&X-Amz-Signature=02a5febff28eed31646a37fea0b8da7d7bcf4b0ffe9a3365d31a0ac3f0b2cabb" class="js_recordingDownloadButton document-link btn btn-info btn-block br2 btn-xs fs12 " target="_blank" data-container="body" data-toggle="tooltip" title="rain-01.mp3" type="audio/mp3" download>
<span class="fa fa-cloud-download fs12"></span>
</a>
</td>
You can change the Content-Type of the response to application/octet-stream, while also setting Content-Disposition: attachment; filename="filename.mp3". Make sure the filename uses the encoding defined in RFC 5987.
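As a rough illustration of the RFC 5987 point, here is a minimal sketch of building such a header value. The helper names encodeRFC5987 and contentDisposition are made up for this example and are not part of any library; the value it returns would still have to be set wherever your response headers are configured (for instance, the pre-signed URL in the question already carries a response-content-disposition parameter, currently set to inline).

```javascript
// Hypothetical helpers (not from any library): build a Content-Disposition
// value with a plain ASCII fallback plus an RFC 5987 "filename*" parameter.
function encodeRFC5987(value) {
  return encodeURIComponent(value)
    // encodeURIComponent leaves ' ( ) * unescaped, but RFC 5987 does not
    // allow them in an extended value, so percent-encode them as well.
    .replace(/['()*]/g, function (c) {
      return '%' + c.charCodeAt(0).toString(16).toUpperCase();
    });
}

function contentDisposition(filename) {
  // ASCII-only fallback for clients that ignore filename*:
  // non-ASCII characters, double quotes, and backslashes become "_".
  var fallback = filename
    .replace(/[^\x20-\x7e]/g, '_')
    .replace(/["\\]/g, '_');
  return 'attachment; filename="' + fallback + '"; ' +
    "filename*=UTF-8''" + encodeRFC5987(filename);
}

console.log(contentDisposition('rain-01.mp3'));
console.log(contentDisposition('naïve (1).mp3'));
```

Sending both parameters is a common compromise: clients that understand filename* get the exact name, and the quoted filename acts as a fallback.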
I see you already have found the download attribute on HTML5 but you're not supplying a filename. It should be used like so:
<a href="pathtofile.mp3" download="filename">
You could always test this download.js written by dandavis. If that works you could reverse his code.
I worked through a solution using S3 behind CloudFront this weekend (August 2022); and, since Amazon has recently updated CloudFront functions to allow for this fairly easily, I wanted to share how I accomplished a solution, in case someone else may need it as you work through the Amazon Documentation.
How it works
Example Listen Link:
https://d111111abcdef8.cloudfront.net/audio.mp3
(plays in browser)
Example Download Link:
https://d111111abcdef8.cloudfront.net/audio.mp3?title=Custom%20Title%20for%20File
(forces download of the .mp3 file with the custom filename: Custom Title for File.mp3)
My Solution:
Here is my custom CloudFront function's code, which you can use as an example to help you write what you need:
function handler(event) {
    // This is a viewer response function.
    var request = event.request;
    var uri = request.uri;
    var qs = request.querystring;
    //console.log ('qs: ' + qs);
    var response = event.response;
    var headers = response.headers;
    if (!qs.title) {
        //console.log('No- qs title');
    } else {
        var title = qs.title.value;
        title = decodeURIComponent(title);
        title = title.split('+').join(' ');
        //console.log('Yes- qs title LEN: ' + title.length);
        if (uri.endsWith('.mp3')) {
            var fileType = '.mp3';
            var fileName = title + fileType;
            //console.log('fileName: ' + fileName);
            var disposition = "attachment; filename=" + fileName;
            //console.log('disposition:' + disposition);
            headers['content-disposition'] = { value: disposition };
        }
    }
    return response;
}
I set up my CloudFront Distribution to have a new behavior with:
Path pattern: "*"
Origin: my S3 bucket
Cache Policy to look for and accept only the "title" key in uri query string
Origin Request Policy to also look for and accept only the "title" key in uri query string
Associated my custom CloudFront function (above) as a Viewer response function type
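One way to sanity-check a function like this before deploying it is to call it locally with a hand-built event object. The sketch below is only an illustration: it uses a condensed copy of the handler above, and mockEvent is a hypothetical helper that fills in just the fields the handler reads (request.uri, request.querystring, response.headers) with made-up values.

```javascript
// Condensed copy of the viewer-response handler from the answer above.
function handler(event) {
  var qs = event.request.querystring;
  var response = event.response;
  if (qs.title && event.request.uri.endsWith('.mp3')) {
    var title = decodeURIComponent(qs.title.value).split('+').join(' ');
    response.headers['content-disposition'] = {
      value: 'attachment; filename=' + title + '.mp3'
    };
  }
  return response;
}

// Hypothetical mock event; only the fields the handler reads are filled in.
function mockEvent(uri, title) {
  return {
    request: {
      uri: uri,
      querystring: title ? { title: { value: title } } : {}
    },
    response: { statusCode: 200, headers: {} }
  };
}

var listen = handler(mockEvent('/audio.mp3'));
var download = handler(mockEvent('/audio.mp3', 'Custom%20Title%20for%20File'));

// Without a title the headers are left untouched (plays in the browser).
console.log(listen.headers);
// With a title the download header is set.
console.log(download.headers['content-disposition'].value);
```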

Identify XPath from particular element

I'm working on a page that loads dynamically, with data added while scrolling. To identify the properties of an item, I have identified its parent div. To get the address, I need an XPath from that parent div down to the span elements.
Below is my DOM structure:
<div class = "parentdiv">
<div class = "search">
<div class="header">
<div class="data"></div>
<div class="address-data">
<div class="address" itemprop="address">
<a itemprop="url" href="/search/Los-Angeles-CA-90025">
<span itemprop="streetAddress">
Avenue
</span>
<br>
<span itemprop="Locality">Los Angeles</span>
<span itemprop="Region">CA</span>
</a>
</div>
</div>
</div>
</div>
</div>
</div>
Here I want to locate the three spans, given that I'm currently at the parent div.
Can someone explain how to locate an element using XPath relative to a particular div?
You can try the following XPaths.
To locate the street address:
//div[@class="parentdiv"]//div[@class="address"]/a/span[@itemprop="streetAddress"]
To locate the locality/city:
//div[@class="parentdiv"]//div[@class="address"]/a/span[@itemprop="Locality"]
To locate the state:
//div[@class="parentdiv"]//div[@class="address"]/a/span[@itemprop="Region"]
To print the contents of the <span> elements (such as Avenue) under the div with class "parentdiv", you can use the following block of code:
IList<IWebElement> myList = Driver.FindElements(By.CssSelector("div.parentdiv div.address > a[itemprop=url] > span"));
foreach (IWebElement element in myList)
{
string my_add = element.GetAttribute("innerHTML");
Console.WriteLine(my_add);
}
Your DOM might become fairly large, since it adds elements while scrolling, so using CSS selectors might be quicker.
To get all the span tags in the div, use:
div[class='address'] span
To get a specific span by using the itemprop attribute use:
div[class='address'] span[itemprop='streetAddress']
div[class='address'] span[itemprop='Locality']
div[class='address'] span[itemprop='Region']
You can store the elements in a variable like so:
var streetAddress = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='streetAddress']"));
var locality = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='Locality']"));
var region = driver.FindElement(By.CssSelector("div[class='address'] span[itemprop='Region']"));

Html nodes issue with HtmlAgilityPack

I'm having a lot of trouble trying to parse this HTML with the HtmlAgilityPack library.
In this piece of code, I would like to retrieve only the url (href) that refers to uploaded.net, but I can't determine whether a given url refers to it.
<div class='downloads' id='download_block'>
<h5 style='text-align:center'>FREE DOWNLOAD LINKS</h5>
<h4>uploadable.ch</h4>
<ul class='parts'>
<li>
text here
</li>
</ul>
<h4>uploaded.net</h4>
<ul class='parts'>
<li>
text here
</li>
</ul>
<h4>novafile.com</h4>
<ul class='parts'>
<li>
text here
</li>
</ul>
</div>
And this is what I have:
nodes = myHrmlDoc.DocumentNode.SelectNodes(".//div[@class='downloads']/ul[@class='parts']")
I can't just use an array-index to determine the position like:
nodes(0) = uploadable.ch node
nodes(1) = uploaded.net node
nodes(2) = novafile.com node
...because the number of nodes and the position of each host could change.
Note also that the urls do not contain the host names; they are redirections like:
http://xxxxxx/r/YEHUgL44xONfQAnCNUVw_aYfY5JYAy0DT-i--
What could I do, in C# or VB.Net?
This should do, untested though:
doc.DocumentNode.SelectSingleNode("//h4[contains(text(),'uploaded.net')]/following-sibling::ul//a").Attributes["href"].Value
It uses contains because you never know whether the text includes extra spaces.
The only way I see this working is a two-step approach. Sorry, I don't have HtmlAgilityPack at hand, but here is an example using the standard XmlDocument. Even though you said you can't use array indexes, this process lets you do that by looking up the correct index dynamically.
void Main()
{
var xml = @"
<div class=""downloads"" id=""download_block"">
<h5 style=""text-align:center"">FREE DOWNLOAD LINKS</h5>
<h4>uploadable.ch</h4>
<ul class=""parts"">
<li>
text here
</li>
</ul>
<h4>uploaded.net</h4>
<ul class=""parts"">
<li>
text here
</li>
</ul>
<h4>novafile.com</h4>
<ul class=""parts"">
<li>
text here
</li>
</ul>
</div>";
var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
var nav = xmlDocument.CreateNavigator();
var index = nav.Evaluate("count(//h4[text()='uploaded.net']/preceding-sibling::h4)+1").ToString();
var text = xmlDocument.SelectSingleNode("//ul[" + index + "]//a/@href").InnerText;
Console.WriteLine(text);
}
Basically, it gets the index of the uploaded.net h4 and then uses that index to select the correct ul tag and get the URL out of the underlying anchor tag.
Sorry for the not-so-clean and error-prone code, but it should point you in the right direction.
Given the snippet you supplied, this will help you get started.
var page = "<div class=\"downloads\" id=\"download_block\"> <h5 style=\"text-align:center\">FREE DOWNLOAD LINKS</h5> <h4>uploadable.ch</h4> <ul class=\"parts\"> <li> text here </li> </ul> <h4>uploaded.net</h4> <ul class=\"parts\"> <li> text here </li> </ul> <h4>novafile.com</h4> <ul class=\"parts\"> <li> text here </li> </ul></div>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
var nodes = doc.DocumentNode.Descendants("h4").Where(n => n.InnerText.Contains("uploaded.net"));
foreach (var node in nodes)
{
var attr = node.NextSibling.NextSibling.Descendants().Where(x=> x.Name == "a").FirstOrDefault().Attributes["href"];
attr.Value.Dump();
}

File Upload using Twitter Bootstrap, C#, asp.net and javascript

link to Jasny http://jasny.github.com/bootstrap/javascript.html#fileupload
link to what the form looks like http://img507.imageshack.us/img507/3308/picpx.png
I am using the Jasny JavaScript file upload in my Bootstrap project. It looks like this:
ASP\HTML VIEW
<div class="row-fluid">
<div class="fileupload fileupload-new" data-provides="fileupload"><input type="hidden">
<div class="input-append">
<div class="uneditable-input span2" runat="server" id="statment1"><i class="icon-file
fileupload-exists"></i> <span class="fileupload-preview" style=""></span></div><span
class="btn btn-file"><span class="fileupload-new">Select file</span><span
class="fileupload-exists">Change</span><input type="file"></span><a href="#" class="btn
fileupload-exists" data-dismiss="fileupload">Remove</a>
</div>
</div>
How do I go about using this in the code behind to save the attached file to my server as I would using the C# asp.net File Upload?
In ASP.net C# I would normally do this in the code behind:
ASP.net C# CodeBehind
string filename = FileUpload1.PostedFile.FileName;
FileUpload1.PostedFile.SaveAs(Path.Combine(Server.MapPath("\\Document"),
filename).ToString());
filelocation = "Document\\" + filename;
media = "Document";
The Jasny GitHub page explains how to set up the layout using Bootstrap, which is great as it looks really good (much better than the boring ASP.NET file upload), but how do I actually get it to post on my button click? I would really like to get this to work as I think it looks much nicer.
Since you want to do this without a standard asp.net control, you will have to do some of the wiring that asp.net does for you.
Make sure your input has an id. I will set it here to myFile.
<div class="row-fluid">
<div class="fileupload fileupload-new" data-provides="fileupload"><input type="hidden">
<div class="input-append">
<div class="uneditable-input span2" runat="server" id="statment1">
<i class="icon-file fileupload-exists"></i>
<span class="fileupload-preview" style=""></span>
</div>
<span class="btn btn-file"><span class="fileupload-new">Select file</span>
<span class="fileupload-exists">Change</span><input id="myFile" type="file" runat="server">
</span>
<a href="#" class="btn fileupload-exists" data-dismiss="fileupload" >Remove</a>
</div>
</div>
</div>
Your page should now have an HtmlInputFile control, declared like this:
protected HtmlInputFile myFile;
Then you should be able to receive the file:
if (IsPostBack)
{
if (myFile.PostedFile != null)
{
// File was sent
var postedFile = myFile.PostedFile;
int dataLength = postedFile.ContentLength;
byte[] myData = new byte[dataLength];
postedFile.InputStream.Read(myData, 0, dataLength);
}
else
{
// No file was sent
}
}
