WP Parse Enclosure url - c#

I recently made an app using Windows App Studio online, and surprisingly the app turned out all right; it handles the RSS feeds quite nicely. I've made some changes to the app because the feed contains an enclosure URL, which is a download link, but I can't get the app to detect the enclosure URL. I think I'm just expressing it wrong.
I tried every online RSS source viewer to see whether it appeared any differently. In Yahoo Pipes it appears as item.enclosure.url; on code beauty it appears as http://bla.com"/>. I've tried every combination I can think of.
I also tried putting the feed through Yahoo Pipes to give me a new feed with the enclosure URL in a new tag
<downloadlink></downloadlink> and used the code
rssItem.FeedUrl = item.GetSafeElementString("downloadlink")
and the whole app works as it should, so I know the only thing I'm doing wrong is retrieving the enclosure URL. The only problem with Yahoo Pipes is that you can't change the input URL, which I need to do, so the value has to come from the enclosure URL.
So my question is: how should I retrieve the tag?
rssItem.FeedUrl = item.GetSafeElementString("whatgoeshere")
Thanks

This worked nicely:
XElement element = item.Element("enclosure"); // the item's <enclosure> element
int length = (int)element.Attribute("length"); // read each attribute separately
string type = (string)element.Attribute("type");
string url = (string)element.Attribute("url");
rssItem.FeedUrl = url; // use the result
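
The accepted snippet throws if an item has no enclosure element, because element would be null. A minimal null-safe sketch of the same idea follows; the class name EnclosureExample, the method GetEnclosureUrl, and the sample feed item are all hypothetical and only there for illustration. It assumes the enclosure element carries no XML namespace, as in plain RSS 2.0.

```csharp
using System;
using System.Xml.Linq;

public static class EnclosureExample
{
    // Returns the enclosure's url attribute, or null when the item
    // has no <enclosure> element or the element has no url attribute.
    public static string GetEnclosureUrl(XElement item)
    {
        XElement enclosure = item.Element("enclosure");
        // The explicit string conversion on XAttribute yields null
        // for a missing attribute instead of throwing.
        return enclosure == null ? null : (string)enclosure.Attribute("url");
    }

    public static void Main()
    {
        // Hypothetical feed item, invented for this example.
        XElement item = XElement.Parse(
            "<item><title>Episode 1</title>" +
            "<enclosure url=\"http://example.com/file.mp3\" length=\"1234\" type=\"audio/mpeg\" />" +
            "</item>");

        Console.WriteLine(GetEnclosureUrl(item));
    }
}
```

In the app, the result could then be assigned as in the answer above, for example rssItem.FeedUrl = GetEnclosureUrl(item), with a fallback when it is null.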

Related

How to programmatically trigger HTML button and parse HTML page only after this click in C#?

I am trying to parse a Google Play Store HTML page in C# on .NET Core. Unfortunately, Google does not provide APIs to get mobile application info (such as version, last update ...), while Apple does. This is why I am trying to parse the HTML page and extract the info I need.
However, it seems they recently published a new version of the page, where a user has to press an arrow button to see the app info, which is displayed in a popup.
To make this concrete, consider the example of the WhatsApp application: https://play.google.com/store/apps/details?id=com.whatsapp&hl=en
To get the info for this app (like release date, version ...), the user now has to press the arrow next to "About this app".
Previously, the code below worked perfectly:
var id = "com.whatsapp";
var language = "en";
var url = string.Format("https://play.google.com/store/apps/details?id={0}&hl={1}", id, language);
string result;
WebClient client = new WebClient();
client.Encoding = System.Text.UTF8Encoding.UTF8;
result = client.DownloadString(url);
MatchCollection matches = Regex.Matches(result, "<div class=\"hAyfc\">.*?<span class=\"htlgb\"><div class=\"IQ1z0d\"><span class=\"htlgb\">(?<content>.*?)</span></div></span></div>");
objAndroidDetails.updated = matches[0].Groups["content"].Value;
objAndroidDetails.version = matches[3].Groups["content"].Value;
...
But that is no longer the case, for two reasons:
1. The regular expression is not valid anymore.
2. client.DownloadString(url) downloads only the HTML that exists before the button is triggered, so I cannot extract the info because it is not in the downloaded page.
Can anybody help me solve issue #2? I need to trigger the button somehow so that I can match the HTML I need and extract it.
Thanks

Get Information from website with C# after Js

I'm working on a little fun project (keep in mind I'm a beginner). I want to grab the info about songs that have played on this radio channel:
ilikeradio (sorry, the site is in Swedish).
I simply want to put that in a textBox.
I have tried:
WebClient web = new WebClient();
string htmlContent = new System.Net.WebClient().DownloadString(URL);
But this only gave me the page source, not the HTML with the list items for artist, song, etc.
Any help is appreciated. Keep in mind I am a beginner.
It seems that the URL you provided returns HTML, but if you compare that HTML with what is rendered in the browser (right-click the webpage and inspect the HTML), you will see that what you get differs from what is finally rendered. The reason is that the website uses Ajax to load the song list. In other words, when you call DownloadString(), you get the response from the web server before any JavaScript has run and updated the page.
It is not easy to get the final rendered HTML. But you are in luck!
Go to that website, open the debug tools in Chrome, and click the Network tab. Next, sort all the requests by Method so that the GET requests are at the top. Among those GET requests is the one you are looking for:
https://unison.mtgradio.se/api/v2/timeline?channel_id=6&client_id=6690709&to=2018-10-02T08%3A00%3A50&from=2018-10-02T07%3A00%3A50&limit=40
This URL returns JSON, which the page's JavaScript loads and renders for you to see as a "song list".
The JSON returned is a list of songs with some metadata. You will need to parse this JSON to extract and display the list of songs in your own application. I suspect that you can view the source code of that website and find the JavaScript that does this ;)
Newtonsoft's Json.NET (JsonConvert) is a widely used library for parsing JSON in .NET.
If you want to view the JSON with the song list, copy the URL above, paste it into your browser address bar, and hit Enter. Next, copy the JSON result and open it in a JSON viewer: paste the JSON into the Text tab and then click the Viewer tab. You will note that the first element is the current song, while the other elements are the song list. Also note that each element has a child element called song, which contains the title.
To get you going, try this:
using System;
using System.Net;
using Newtonsoft.Json.Linq;

public class Program
{
    public static void Main()
    {
        // The using block disposes the WebClient when the download is done.
        using (WebClient wc = new WebClient())
        {
            var json = wc.DownloadString("https://unison.mtgradio.se/api/v2/timeline?channel_id=6&client_id=6690709&to=2018-10-02T08%3A00%3A50&from=2018-10-02T07%3A00%3A50&limit=40");
            // The response is a JSON array; dynamic allows property-style access.
            dynamic stuff = JArray.Parse(json);
            // Index 0 is the current song, so index 1 is the first entry in the song list.
            string name = stuff[1].song.title;
            Console.WriteLine(name);
        }
    }
}
NOTE
By the time you try this out, you will probably notice that the song name printed to the console is not in the list on the webpage. This is because the JSON URL posted above has query parameters, two of which (from and to) are dates and times. You will need to modify the URL accordingly to get the most recent playlist (the one displayed on the website right now).
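
One way to keep the from and to parameters current is to build the URL from the clock. The sketch below is an assumption-laden illustration: the class name TimelineUrl and the method Build are hypothetical, the channel_id, client_id, and limit values are copied from the URL above, and it is only a guess that the API expects local time in this format rather than UTC.

```csharp
using System;
using System.Globalization;

public static class TimelineUrl
{
    // Builds the timeline URL for the given window. The fixed query values
    // (channel_id, client_id, limit) are copied from the URL quoted in the answer.
    public static string Build(DateTime from, DateTime to)
    {
        const string format = "yyyy-MM-dd'T'HH:mm:ss";

        // EscapeDataString percent-encodes the colons (":" becomes "%3A").
        string fromText = Uri.EscapeDataString(from.ToString(format, CultureInfo.InvariantCulture));
        string toText = Uri.EscapeDataString(to.ToString(format, CultureInfo.InvariantCulture));

        return "https://unison.mtgradio.se/api/v2/timeline?channel_id=6&client_id=6690709"
            + "&to=" + toText + "&from=" + fromText + "&limit=40";
    }

    public static void Main()
    {
        // Assumption: the API wants local time; switch to DateTime.UtcNow if it does not.
        DateTime now = DateTime.Now;
        Console.WriteLine(Build(now.AddHours(-1), now));
    }
}
```

The resulting string can be passed to DownloadString in place of the hard-coded URL.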

WebClient.DownloadFile - Invalid URI: The hostname could not be parsed

So I tried this in several different formats and produced different results. I will include all relevant information below.
My company uses a web-based application to schedule the generation of reports. The service emails a URL that can be clicked on and will immediately begin the "Open Save As Cancel" dialogue box. I am trying to automate the process of downloading these reports with a C# script as part of a Visual Studio project (the end goal is to import these reports in SQL Server).
I am having terrible difficulty initiating the download of this file using WebClient. Here is the closest I have gotten with any of the methods I have tried:
*NOTE: I removed all identifying information from the URL, but left all special characters and the basic architecture intact. Hopefully this will be a happy medium between protecting confidential info and giving you enough to understand my dilemma. The URL does work when manually copied and pasted into the address bar of internet explorer.
Error Message:
"Invalid URI: The hostname could not be parsed."
public void Main()
{
using (var wc = new System.Net.WebClient())
{
wc.DownloadFile(
new Uri(@"http:\\webapp.locality.company.com\scripts\rds\cgigetf.exe?job_id=3058352&file_id=1&format=TAB\report.tab"),
@"\\server\directory\folder1\folder2\folder3\...\...\...\rawfile.tab");
}
}
Note also that I have tried to set:
string sourceUri = @"http:\\webapp.locality.company.com\scripts\rds\cgigetf.exe?job_id=3058352&file_id=1&format=TAB\report.tab\abc123_3058352.tab";
Uri uriPath;
Uri.TryCreate(sourceUri, UriKind.Absolute, out uriPath);
But uriPath remains null - TryCreate fails.
I have attempted doing a webrequest / webresponse / WebStream, but it still cannot find the host.
I have tried including the download URL (as in my first code example) and the download URL + the file name (as in my second code example). I do not need the file name in the URL to initiate the download if I do it manually. I have also tried replacing the "report.tab" portion of the URL with the file name, but to no avail.
Help is greatly appreciated as I have simply run out of thoughts on this one. The only idea I have left is that perhaps one of the special characters in my URL is getting in the way, but I don't know which one that would be or how to handle it properly.
Thanks in advance!
My first thought would be that your URI backslashes are being interpreted as escape characters, leading to a nonsense result after evaluation. I would try a quick test where each backslash is escaped as itself (i.e. "\\" instead of "\" in each instance). I'm also a little puzzled as to why your URI is not using forward slashes...?
// Create an absolute Uri from a string.
Uri absoluteUri = new Uri("http://www.contoso.com/");
Ref: Uri Constructor on MSDN
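
To test the forward-slash suggestion, a small sketch like the following could help. It assumes the backslashes are what trips the parser, which the thread does not confirm. The class name ReportUri and the method Parse are hypothetical; the URL is the asker's redacted one with only the slashes flipped.

```csharp
using System;

public static class ReportUri
{
    // Returns the parsed absolute Uri, or null when the text is not a valid absolute URI.
    public static Uri Parse(string text)
    {
        Uri uri;
        return Uri.TryCreate(text, UriKind.Absolute, out uri) ? uri : null;
    }

    public static void Main()
    {
        // Same redacted URL as in the question, written with forward slashes.
        Uri uri = Parse("http://webapp.locality.company.com/scripts/rds/cgigetf.exe?job_id=3058352&file_id=1&format=TAB/report.tab");

        Console.WriteLine(uri == null ? "could not parse" : uri.Host);

        // With a real host, the download call from the question would then be:
        // wc.DownloadFile(uri, localPath);   // localPath is the UNC target path
    }
}
```

Only the URL needs forward slashes; the UNC destination path keeps its backslashes.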

Navigate URL from Silverlight control

I am working on a Silverlight control where I need to upload some static data synchronously from an XML file. The file is on the same web server. I can get the URI of the control like so:
HtmlPage.Document.DocumentUri.ToString();
That returns the URI of the page that hosts the control, including the query string:
http://example.com:8085/MyWeb/CustomPage.aspx?waid=a1a5780a8ddea6c517ae1-b4ef&nid=id78
What I need from there is only http://example.com:8085/MyWeb (which will always be the same except for the host name and port). I do not want to hard-code that because this will be deployed on several servers. So what I'd like to do is get the web site URI. I tried several properties of the DocumentUri object, such as LocalPath, Host, and AbsolutePath, but none seems to give me what I need. How can I do that without a ton of string manipulation?
Thanks!
Try the following:
var absoluteUri = Application.Current.Host.Source.AbsoluteUri;
int lengthWithoutParams = absoluteUri.IndexOf("?") < 0 ? absoluteUri.Length : absoluteUri.IndexOf("?");
string uploadUrl = absoluteUri.Substring(0, lengthWithoutParams).Replace("/ClientBin/<YourXAPfile>.xap", filePath);
And finally:
HtmlPage.Window.Navigate(new Uri(uploadUrl));
Try:
System.Windows.Browser.HtmlPage.Window.Navigate(new Uri("urlString"));
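
If the goal is only the site root from the question (http://example.com:8085/MyWeb), another option is to trim the page URI itself. The sketch below is a hypothetical helper, not taken from either answer: the names SiteRoot and FromPageUri are invented, and it assumes the hosting page sits directly under the application root.

```csharp
using System;

public static class SiteRoot
{
    // Returns scheme, host, and port plus the path up to (not including) the page file name.
    public static string FromPageUri(Uri pageUri)
    {
        // Scheme, host, and port; the port is omitted when it is the scheme's default.
        string server = pageUri.GetComponents(UriComponents.SchemeAndServer, UriFormat.Unescaped);

        // AbsolutePath excludes the query string and always starts with "/".
        string path = pageUri.AbsolutePath;
        int lastSlash = path.LastIndexOf('/');

        return server + path.Substring(0, lastSlash);
    }

    public static void Main()
    {
        Uri page = new Uri("http://example.com:8085/MyWeb/CustomPage.aspx?waid=a1a5780a8ddea6c517ae1-b4ef&nid=id78");
        Console.WriteLine(FromPageUri(page));
    }
}
```

In the control, the argument would be HtmlPage.Document.DocumentUri from the question.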

Trouble Scraping .HTM File

I have just begun scraping basic text off web pages, and am currently using the HtmlAgilityPack C# library. I had some success with box scores from rivals.yahoo.com (sports is my thing, so why not scrape something interesting?), but I am stuck on the NHL's game summary pages. I think this is an interesting problem, so I thought I would post it here.
The page I am testing is:
http://www.nhl.com/scores/htmlreports/20102011/GS020079.HTM
At first glance, it seems like basic text with no Ajax or other things that would trip up a basic scraper. Then I realized I can't right-click because of some JavaScript, so I worked around that. I right-clicked in Firefox, got the XPath of the home team using XPather, and I get:
/html/body/table[@id='MainTable']/tbody/tr[1]/td/table[@id='StdHeader']/tbody/tr/td/table/tbody/tr/td[3]/table[@id='Home']/tbody/tr[3]/td
When I try to grab that node's inner text, HtmlAgilityPack won't find it. Does anyone see anything strange in the page's source code that might be stopping me?
I am new to this and still learning how people might stop me from scraping; any tips or tricks are gladly appreciated!
P.S. I observe all site rules regarding bots, etc., but I noticed this strange behavior and saw it as a challenge.
OK, so it appears that my XPaths have tbody steps in them. When I remove these tbody steps manually from the XPath, HtmlAgilityPack handles it fine.
I'd still like to know why I am getting invalid XPaths, but for now I have answered my question.
Unless my XPath knowledge is heavily flawed (probably), I think the problem is with the /tbody node in your XPath expression.
When I do
string test = string.Empty;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// Load the saved page from disk; the using block closes the reader.
using (StreamReader sr = new StreamReader(@"C:\gs.htm"))
{
    doc.Load(sr);
}
// Same target node as in the question, but without any tbody steps.
string xpath = @"//table[@id='Home']/tr[3]/td";
test = doc.DocumentNode.SelectSingleNode(xpath).InnerText;
That works fine and returns
"COLUMBUS BLUE JACKETSGame 5 Home Game 3"
which I hope is the string you wanted.
Examining the HTML, I couldn't find a tbody element.
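
As for why the tool produced those paths: browsers insert a tbody element into every table in the DOM even when the markup has none, so XPaths generated from the live DOM (as XPather does) contain tbody steps that a parser working on the raw source will not see. A small sketch for stripping them automatically follows; the class name XPathCleaner and the method StripTbody are hypothetical, and the approach assumes the page source really has no tbody tags, as observed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XPathCleaner
{
    // Removes every "/tbody" step (with an optional numeric index such as "[1]")
    // from a browser-generated XPath.
    public static string StripTbody(string xpath)
    {
        // The lookahead requires the step to end at "/" or at the end of the string,
        // so a longer element name that merely starts with "tbody" is left alone.
        return Regex.Replace(xpath, @"/tbody(\[\d+\])?(?=/|$)", string.Empty);
    }

    public static void Main()
    {
        string fromBrowser =
            "/html/body/table[@id='MainTable']/tbody/tr[1]/td/table[@id='StdHeader']" +
            "/tbody/tr/td/table/tbody/tr/td[3]/table[@id='Home']/tbody/tr[3]/td";

        Console.WriteLine(StripTbody(fromBrowser));
    }
}
```

The cleaned string can be passed to SelectSingleNode as in the answer above.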
