I wrote an application that parses an HTML document and extracts some URLs; the problem is that these URLs can only be downloaded directly from the browser.
In VB.NET or C#, how could I follow this URL's redirect to obtain a direct link that I can later paste into a download manager?
Dim url As String = "http://m.mrtzcmp3.net/get.php?singer=Madonna&song=Like%20A%20Virgin%20&size=5242104&ids=687474703a2h2h63733434303876342g766s2g6f652h75323237363831362h617564696h732h3132323564303466333839622g6f7033"
I should say that I'm not very experienced with HTTP; maybe I'm wrong and the URL doesn't actually redirect, or there's some similar issue. Please just tell me how I can follow that kind of redirect, or whether I'm mistaken.
UPDATE:
I tried this, but I get the same URL back without any changes:
Dim url As String = _
"http://m.mrtzcmp3.net/get.php?singer=Madonna&song=Like%20A%20Virgin%20&size=5242104&ids=687474703a2h2h63733434303876342g766s2g6f652h75323237363831362h617564696h732h3132323564303466333839622g6f7033"
Dim request As HttpWebRequest = DirectCast(HttpWebRequest.Create(url), HttpWebRequest)
request.AllowAutoRedirect = True
' GetResponse returns a WebResponse, so cast it; ResponseUri then holds the address that actually answered
Dim response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
Dim resUri As String = response.ResponseUri.AbsoluteUri
MsgBox(resUri)
UPDATE 2:
The answer to HttpWebRequest Login data Then Redirect says:
If the redirect is handled transparently, the _response.ResponseURI
will contain the address it redirected to. If not, you have to read
the redirect header and decide yourself whether or not to request the
new page.
So... if I need to do that manually, how can I do it?
UPDATE 3:
The DownloadThemAll plugin for Firefox can obtain the direct URLs... as you can see, all of those URLs end with an .mp3 file extension, and that's what I need.
To my knowledge, the URL
http://m.mrtzcmp3.net/get.php?singer=Madonna&song=Like%20A%20Virgin%20&size=5242104&ids=687474703a2h2h63733434303876342g766s2g6f652h75323237363831362h617564696h732h3132323564303466333839622g6f7033
IS the direct URL; a direct file URL does not need to have the file extension in it.
You can download the file using:
string url = "http://m.mrtzcmp3.net/get.php?singer=Madonna&song=Like%20A%20Virgin%20&size=5242104&ids=687474703a2h2h63733434303876342g766s2g6f652h75323237363831362h617564696h732h3132323564303466333839622g6f7033";
WebClient wc = new WebClient();
wc.DownloadFile(url, fileName); // fileName is obtained below from the Content-Disposition header
You can get the fileName (Madonna-Like A Virgin -www.mrtzcmp3.net.mp3) by using:
HttpWebRequest myHttpWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
// The Content-Disposition header looks like: attachment; filename="Madonna-Like A Virgin -www.mrtzcmp3.net.mp3"
string header = myHttpWebResponse.Headers.ToString();
fileName = header.Remove(0, header.IndexOf("filename=") + 10); // skip everything up to and including filename="
fileName = fileName.Remove(fileName.IndexOf('"'));             // cut at the closing quote
That is untested, but it should work.
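If you prefer, a slightly more direct variant is to read the Content-Disposition header by name rather than scanning the whole header block. This is only a sketch; it assumes the server actually sends a quoted filename and that url is the variable from above:
using System;
using System.Net;

// Sketch: read the Content-Disposition header directly (assumes a quoted filename is present)
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    string disposition = resp.Headers["Content-Disposition"]; // e.g. attachment; filename="song.mp3"
    string name = null;
    if (disposition != null && disposition.Contains("filename=\""))
    {
        int start = disposition.IndexOf("filename=\"") + "filename=\"".Length;
        int end = disposition.IndexOf('"', start);
        name = disposition.Substring(start, end - start);
    }
    Console.WriteLine(name);
}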
Edit: I think this does what you want, but I may have misunderstood your question.
You can perform a web request using WebClient to get the content (the URL) from that URL; then you just need to follow the redirect.
Use an HttpWebRequest with AllowAutoRedirect = true to get the direct link and download the file.
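As a rough sketch (not tested against this particular site; the output file name download.mp3 is just an assumption):
using System.IO;
using System.Net;

// Sketch: let HttpWebRequest follow the redirect, then save whatever the final address returns
string url = "http://m.mrtzcmp3.net/get.php?singer=Madonna&song=Like%20A%20Virgin%20&size=5242104&ids=687474703a2h2h63733434303876342g766s2g6f652h75323237363831362h617564696h732h3132323564303466333839622g6f7033";
var request = (HttpWebRequest)WebRequest.Create(url);
request.AllowAutoRedirect = true;

using (var response = (HttpWebResponse)request.GetResponse())
{
    string directLink = response.ResponseUri.AbsoluteUri; // the address after any redirects
    using (Stream source = response.GetResponseStream())
    using (FileStream target = File.Create("download.mp3")) // assumed output name
    {
        source.CopyTo(target);
    }
}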
Can you try pasting the URL into a URL shortener like TinyURL or Bitly? Maybe there is a shortener service that provides an API.
The file can then be downloaded at: http://tinyurl.com/phzhxsr
You will never get a direct URL from the site owner, because the URL is parsed dynamically and the file is sent in the return data stream, not by downloading a specific URL.
Related
I want to scrape the HTML of a website. When I access this website with my browser (no matter if it is Chrome or FireFox), I have no problem accessing the website + HTML.
When I try to parse the HTML with C# using methods like HttpWebRequest and HtmlAgilityPack, the website redirects me to another website and thus I parse the HTML of the redirected website.
Any idea how to solve this problem?
I thought the site recognizes my program as a program and redirects immediately, so I tried using Selenium with a GoogleDriver and a FireFoxDriver, but no luck there either; I get redirected immediately.
The Website: https://www.jodel.city/7700#!home
private void bt_load_Click(object sender, EventArgs e)
{
var url = @"https://www.jodel.city/7700#!home";
var req = (HttpWebRequest)WebRequest.Create(url);
req.AllowAutoRedirect = false;
// req.Referer = "http://www.muenchen.de/";
var resp = req.GetResponse();
StreamReader sr = new StreamReader(resp.GetResponseStream());
String returnedContent = sr.ReadToEnd();
Console.WriteLine(returnedContent);
return;
}
And of course, cookies are to blame again, because cookies are great and amazing.
So, let's look at what happens in Chrome the first time you visit the site:
(I went to https://www.jodel.city/7700#!home):
Yes, I got a 302 redirect, but I also got told by the server to set a __cfduid cookie (twice actually).
When you visit the site again, you are correctly let into the site:
Notice how this time a __cfduid cookie was sent along? That's the key here.
Your C# code needs to:
Go to the site once, get redirected, but obtain the cookie value from the response header.
Go BACK to the site with the correct cookie value in the request header.
You can go to the first link in this post to see an example of how to set cookie values for requests, or see the sketch below.
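Here is a minimal sketch of that two-step approach, assuming the site only needs the cookies issued on the first response (the URL is the one from the question; everything else is illustrative):
using System;
using System.IO;
using System.Net;

// Sketch: request once to collect the cookies, then request again with them attached
var cookies = new CookieContainer();
var url = @"https://www.jodel.city/7700#!home";

// First request: we expect a 302 plus Set-Cookie headers; the CookieContainer stores them
var first = (HttpWebRequest)WebRequest.Create(url);
first.CookieContainer = cookies;
first.AllowAutoRedirect = false;
using (first.GetResponse()) { }

// Second request: the stored cookies are sent back automatically
var second = (HttpWebRequest)WebRequest.Create(url);
second.CookieContainer = cookies;
second.AllowAutoRedirect = true;
using (var resp = (HttpWebResponse)second.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());
}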
I have an application which logs you on to a website and then downloads some files from that website.
I have been able to download and save all types of files properly, except those whose link redirects to another page.
For example, if in the source code of the web page the link address is written as:
"http://someurl.com/view.php", then this link redirects and the download starts immediately (when we click on the link in a web browser).
I have been able to download this file programmatically using HttpWebRequest
and then setting the AllowAutoRedirect = true.
The problem arises while saving: I need to know the extension of the downloaded file (whether it is a Word document, a PDF file, or some other file).
How should I check that?
Some of the code which I am using is:
HttpWebRequest request = WebRequest.Create(UriObj) as HttpWebRequest;
request.Proxy = null;
request.CookieContainer = CC;
request.AllowAutoRedirect = true ;
request.ContentType = "application/x-www-form-urlencoded";
When you get a redirected response, the ResponseUri property will give you the URI of the resource that actually responded.
So if http://someurl.com/view.php redirected to http://example.com/foo.doc, then ResponseUri will contain http://example.com/foo.doc.
Understand that the "extension" might not be what you expect. For example, I've seen a URL like http://example.com/document.php return a PDF file. I've seen URLs with ".mp3" extension return image files, etc.
You can check the Content-Type header (the ContentType property on the response), which is usually a more reliable indicator of the actual content of the response, but not a guarantee.
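As a hedged sketch of both checks (the URL is the example from the question, and the content-type-to-extension mapping is only an illustrative assumption you would extend):
using System;
using System.IO;
using System.Net;

// Sketch: follow the redirect, then inspect ResponseUri and the Content-Type
var request = (HttpWebRequest)WebRequest.Create("http://someurl.com/view.php");
request.AllowAutoRedirect = true;

using (var response = (HttpWebResponse)request.GetResponse())
{
    // Extension taken from the URL that actually answered (may be empty)
    string extFromUri = Path.GetExtension(response.ResponseUri.AbsolutePath);

    // Content type reported by the server, e.g. "application/pdf"
    string contentType = response.ContentType;

    // Hypothetical mapping; extend for the types you care about
    string extension = contentType.StartsWith("application/pdf") ? ".pdf"
                     : contentType.StartsWith("application/msword") ? ".doc"
                     : extFromUri;

    Console.WriteLine("{0} -> {1}", response.ResponseUri, extension);
}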
I am trying to unshorten URLs and have not been able to find code (VB.NET/C#) to do this. These are the Twitter-shortened URLs, and I guess I could access one of the available web services and do an HttpWebRequest, but I would prefer some programmatic way of doing this.
You can get it directly from the response of the shortened URL, since it will return a MovedPermanently status code and the location of the real URL. (This should work for most sites without needing to navigate to the real URL.)
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://t.co/xqbLEi6s");
req.AllowAutoRedirect = false;
var resp = req.GetResponse();
string realUrl = resp.Headers["Location"];
Other test data: http://goo.gl/zdf2n , http://tinyurl.com/8xc9vca , http://x.co/iEup, http://is.gd/vTOlz6 , http://bit.ly/FUA4YU
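One caveat: if the shortened link itself points at another shortener, a single hop is not enough. Here is a small sketch that keeps following Location headers until there is no further redirect; the hop cap is an assumption added to avoid redirect loops:
using System;
using System.Net;

// Sketch: follow Location headers manually until the response no longer redirects
static string Unshorten(string shortUrl, int maxHops = 10)
{
    string current = shortUrl;
    for (int i = 0; i < maxHops; i++) // cap the hops to avoid redirect cycles (assumption)
    {
        var req = (HttpWebRequest)WebRequest.Create(current);
        req.AllowAutoRedirect = false;
        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            string location = resp.Headers["Location"];
            if (string.IsNullOrEmpty(location))
                return current;                                        // no more redirects: this is the real URL
            current = new Uri(new Uri(current), location).AbsoluteUri; // resolve relative redirects
        }
    }
    return current;
}

Console.WriteLine(Unshorten("http://t.co/xqbLEi6s"));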
There is no magic way to unshorten a URL without asking the service which created the URL (and the way to ask will be different for each service), or more pragmatically, just opening the URL and watching where it redirects to.
I am using the code,
string loadFile = HttpContext.Current.Request.Url.AbsoluteUri;
// this.Response.ClearContent();
// this.Response.ClearHeaders();
this.Response.AppendHeader("content-disposition", "attachment; filename=" + filename);
this.Response.ContentType = "text/html";
this.Response.WriteFile("C:\\Users\\Desktop\\Jobspoint Website\\jobpoint3.0\\print.aspx");
this.Response.Flush();
this.Response.Close();
this.Response.End();
to download an aspx page in ASP.NET C#. But it only shows the HTML tags and static values... How can I save the entire page as the server renders it, with the values retrieved from the database, rather than the raw markup?
Thanks...
Leema
Use WebClient for this. It will download your file.
If I have understood correctly, one option would be to actually make a request to the web server, using WebClient for example, and then write the response to that request to Response.OutputStream. This means that the server will make a second request to itself and then send the response to that second request back to the client.
This way you will have the web server actually process the request and return the resulting HTML back to you rather than just the raw aspx page.
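A rough sketch of that idea, assuming it runs in the page's code-behind and that print.aspx is reachable at the assumed local URL (WebClient lives in System.Net):
// Sketch: have the server request its own page so ASP.NET renders it, then stream the result back
string pageUrl = "http://localhost/jobpoint3.0/print.aspx"; // assumed address of the page to render

using (var client = new System.Net.WebClient())
{
    byte[] rendered = client.DownloadData(pageUrl); // rendered HTML, with the database values filled in

    Response.Clear();
    Response.ContentType = "text/html";
    Response.AppendHeader("content-disposition", "attachment; filename=print.html");
    Response.OutputStream.Write(rendered, 0, rendered.Length);
    Response.End();
}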
Basically, I'm trying to grab an EXE from CNet's Download.com
So I created a web parser, and so far all is going well.
Here is a sample link pulled directly from their site:
http://dw.com.com/redir?edId=3&siteId=4&oId=3001-20_4-10308491&ontId=20_4&spi=e6323e8d83a8b4374d43d519f1bd6757&lop=txt&tag=idl2&pid=10566981&mfgId=6250549&merId=6250549&pguid=PlvcGQoPjAEAAH5rQL0AAABv&destUrl=ftp%3A%2F%2F202.190.201.108%2Fpub%2Fryl2%2Fclient%2Finstaller-ryl2_v1673.exe
Here is the problem: when you attempt to download, it begins over HTTP, then redirects to an FTP site. I have tried .NET's WebClient and HttpWebRequest objects, and it looks like neither can follow that redirect.
This code fails at GetResponse():
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://dw.com.com/redir");
WebResponse response = req.GetResponse();
Now, I also tried this:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://dw.com.com/redir");
req.AllowAutoRedirect = false;
WebResponse response = req.GetResponse();
string s = new StreamReader(response.GetResponseStream()).ReadToEnd();
And it no longer throws the error; however, the variable s turns out to be an empty string.
I'm at a loss! Can anyone help out?
You can get the value of the "Location" header from response.Headers, and then create a new FtpWebRequest to download that resource.
In your first code snippet you are redirected to a link that uses a different protocol (i.e. it's no longer HTTP, as in HttpWebRequest), so the request fails due to what looks like a malformed HTTP response.
In the second snippet you no longer follow the redirect, so you receive only the (empty) HTTP redirect response rather than the FTP content.
You need to acquire the FTP link; as ferozo wrote, you can do this by getting the value of the "Location" header, and then use an FtpWebRequest to access the file.
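Here is a hedged sketch of that combination; the short redirect URL and the local file name are placeholders you would replace with the full link from the question:
using System.IO;
using System.Net;

// Sketch: grab the Location header over HTTP, then download the target with FtpWebRequest
string redirectUrl = "http://dw.com.com/redir"; // use the full redirect link from the question here
var httpReq = (HttpWebRequest)WebRequest.Create(redirectUrl);
httpReq.AllowAutoRedirect = false;

string ftpUrl;
using (var httpResp = (HttpWebResponse)httpReq.GetResponse())
{
    ftpUrl = httpResp.Headers["Location"]; // e.g. an ftp:// address pointing at the installer
}

var ftpReq = (FtpWebRequest)WebRequest.Create(ftpUrl);
ftpReq.Method = WebRequestMethods.Ftp.DownloadFile;

using (var ftpResp = (FtpWebResponse)ftpReq.GetResponse())
using (Stream source = ftpResp.GetResponseStream())
using (FileStream target = File.Create("installer.exe")) // assumed local name
{
    source.CopyTo(target);
}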