As some of you may know, you can POST with C#. This means you can "push" buttons on a website using HttpWebRequest/HttpWebResponse. Some buttons on sites work through JavaScript instead; their scripts start like this:
(function($j){
$j.data(document, 'maxPictureSize', 764327);
share_init();
})(jQuery.noConflict());
Is there any way to make those function calls from C#, using HTTP requests or some other library?
I'm assuming you have a program that wants to manipulate the server "back end" for a web page by making the server think that someone pushed a button that POSTs, and sending the data that the web page would include with its POST.
The first tool you need is Microsoft Network Monitor 3.3, or another network packet tracing tool. Use this to look at the POST from the real web page. NetMon (at least) decomposes the packet into the HTTP pieces and headers, so you can easily see what's going on.
Now you will know what data the real POST is sending, and the URL to which it is sending the data (with any possible "query string" - which is unusual for a POST).
Next you need to write C# to create the same sort of POST to the same URL. It seems that you already know about HttpWebRequest/HttpWebResponse, so I won't explain them in detail. You may have noticed in your NetMon trace that the Content-Type header was application/x-www-form-urlencoded. This is most often data from an HTML form which is URL-encoded (as the name suggests), so you need to URL-encode your data before POSTing it, and you need to know the size of the encoded data in bytes for the Content-Length header. HttpUtility.UrlEncode() is one method to use for this encoding.
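A minimal sketch of such a POST, under a few assumptions: the class and method names (FormPoster, BuildFormBody, Post) are made up for illustration, the field names and URL would come from your own trace, and WebUtility.UrlEncode from System.Net is used in place of HttpUtility.UrlEncode so that no reference to System.Web is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

public static class FormPoster
{
    // Builds "name1=value1&name2=value2" with every name and value URL-encoded.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(WebUtility.UrlEncode(field.Key) + "=" + WebUtility.UrlEncode(field.Value ?? ""));
        }
        return string.Join("&", parts);
    }

    // Sends the encoded fields as a form POST and returns the response body as text.
    public static string Post(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length; // length of the encoded data in bytes

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the real page also sends cookies or extra headers, those would have to be copied onto the request as well; the trace shows which ones are present.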
Once you think you have it, try it and use NetMon to inspect your POST request and the response from the server. Keep going until you have duplicated what the mystery web page is doing.
Use a WebBrowser control to load the page:
webBrowser1.Navigate(url);
Then, once the control's DocumentCompleted event has fired, save the contents of the WebBrowser control to a file or a string:
File.WriteAllText(@"c:\test\ajax_test.txt", webBrowser1.Document.Body.Parent.OuterHtml, Encoding.GetEncoding(webBrowser1.Document.Encoding));
If you now look at the text file, it should contain the HTML tags you are looking for.
Even when JavaScript is used to do a POST, there is a POST somewhere in the JS that works the same way as a button submit. You just have to dig down to the place where the JS code posts and see how it does it, then craft the same POST in C#.
Take, for example, ASP.NET's own __doPostBack function:
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
You can see that it finds the form, sets values for several input fields, and calls submit. Basically, you need to fill in the same values for the inputs and submit the same form, and you have done the JS submit yourself.
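As a rough sketch of collecting those values in C#: an ASP.NET form normally carries other hidden inputs too (such as __VIEWSTATE), and a real form submit sends them along, so the code below copies every hidden input from the downloaded page and then sets the two fields that __doPostBack fills in. The class name, method names and regular expression are illustrative assumptions; the regex expects the type, name, value attribute order and is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class PostBackBuilder
{
    // Matches <input type="hidden" name="..." ... value="..."> in that attribute order.
    private static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Collects the name/value pairs of all hidden inputs found in the page HTML.
    public static Dictionary<string, string> HiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match match in HiddenInput.Matches(html))
        {
            fields[match.Groups[1].Value] = WebUtility.HtmlDecode(match.Groups[2].Value);
        }
        return fields;
    }

    // Mirrors __doPostBack: keep the hidden fields, then set the event target and argument.
    public static Dictionary<string, string> BuildPostBack(string html, string eventTarget, string eventArgument)
    {
        Dictionary<string, string> fields = HiddenFields(html);
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = eventArgument;
        return fields;
    }
}
```

The resulting dictionary is what would be URL-encoded and POSTed to the form's action URL.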
You need to capture the requests and headers those buttons are sending and simulate them with HttpWebRequest. You could also take a look at WatiN if you want to automate user actions on web sites.
Related
Is there a way to get the fully rendered html of a web page using WebClient instead of the page source? I'm trying to scrape some data from the page's html. My current code is like this:
WebClient client = new WebClient();
var result = client.DownloadString("https://somepageoutthere.com/");
//using CsQuery
CQ dom = result;
var someElementHtml = dom["body > main"];
WebClient will only return the response for the URL you requested. It will not run any JavaScript on the page (which runs on the client), so if JavaScript changes the page DOM in any way, you will not get those changes through WebClient.
You are better off using some other tools. Look for those that will render the HTML and javascript in the page.
I don't know what you mean by "fully rendered", but if you mean "with all data loaded by ajax calls", the answer is: no, you can't.
The data which is not present in the initial html page is loaded through javascript in the browser, and WebClient has no idea what javascript is, and cannot interpret it, only browsers do.
To get this kind of data, you need to identify these calls (if you don't know the url of the data webservice, you can use tools like Fiddler), simulate/replay them from your application, and then, if successful, get response data, and extract data from it (will be easy if data comes as json, and more tricky if it comes as html)
Better to use the HTML Agility Pack: http://html-agility-pack.net
It has all the functionality needed to scrape web data, and there is good help on the site.
I am using Dropzone to upload files and images to the DB, which works perfectly. I generate the Dropzone divs and call the Dropzone jQuery function.
In C# the upload is received by a WebMethod and the files are stored in the database.
Now I need to send several IDs along with the files, but I would like to avoid implementing a separate AJAX call to send them. After reading the Dropzone.js documentation I could not find a simple way to do this.
My WebMethod does not accept parameters for now, but once I have a good way to implement this on the client I can add them to the WebMethod.
Did I miss something, or am I only able to do this with AJAX?
In short, I am looking for the equivalent of the "data:" object in an AJAX call, but within Dropzone.
I am not sure I understood your question correctly, but the formData object is available just before each file is sent.
From Dropzone.js. Sending: Called just before each file is sent. Gets the xhr object and the formData objects as second and third parameters, so you can modify them (for example to add a CSRF token) or add additional data.
Sample usage:
sending: function(file, xhr, formData) {
formData.append("test",$('#test').val());
},
I want to use HttpWebRequest to POST data to a website, so I used Firebug to analyze what is really sent to the server. First I browsed to www.mytargetURL.net, then I turned on Firebug, filled in all the form data and clicked the submit button (which POSTs the data to the server). Watching Firebug, I saw a lot of parameters in the request body, something like:
param1=
param2=
param3=default_value1
param4=default_value2
param5=value_I_set_byhand1
param6=value_I_set_byhand2
The question is: should I set up the request body of the HttpWebRequest object with all the parameters I saw in Firebug's parameter table (all six), only the parameters that have a value (param3 to param6), or only the parameters I filled in on the form (param5 and param6)?
Thank you for your help.
You create the HttpWebRequest object, get the request stream, and write your parameters to it. The example at HttpWebRequest.GetRequestStream should point you in the right direction.
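On the question of which parameters to include, the safest starting point is probably to send exactly what the browser sent, empty values included, since the server may expect every field to be present. The sketch below shows one way to build that body. It swaps in HttpClient and FormUrlEncodedContent from System.Net.Http instead of writing to the HttpWebRequest stream by hand; the class name CapturedForm and its methods are made up for illustration, and the values are the placeholders from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

public static class CapturedForm
{
    // All six parameters seen in the trace, in the same order, empty ones included.
    public static FormUrlEncodedContent Build()
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("param1", ""),
            new KeyValuePair<string, string>("param2", ""),
            new KeyValuePair<string, string>("param3", "default_value1"),
            new KeyValuePair<string, string>("param4", "default_value2"),
            new KeyValuePair<string, string>("param5", "value_I_set_byhand1"),
            new KeyValuePair<string, string>("param6", "value_I_set_byhand2")
        });
    }

    // Posts the form to the given URL and returns the response body as text.
    public static string Send(string url)
    {
        using (var client = new HttpClient())
        using (FormUrlEncodedContent content = Build())
        {
            HttpResponseMessage response = client.PostAsync(url, content).Result;
            return response.Content.ReadAsStringAsync().Result;
        }
    }
}
```

If the server turns out to accept the request without the empty fields, they can be dropped afterwards, one at a time.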
I have an application that contains a button. Clicking this button opens a browser window using a URL with query string parameters (the URL of a page that I am coding).
Is there a way to ensure that the request is coming from my application, and only from my application, and not from someone typing the URL manually into a web browser?
If not, what is the best way to ensure that a specific URL is coming from a specific application, and not just manually entered in the address bar of a web browser?
I'm using ASP.NET.
You can check if the request was made from one of the pages of your application using:
Request.UrlReferrer != null && Request.UrlReferrer.Host.Contains("mywebsite.com")
That's the simple way.
The more secure way is to put a cookie on the client containing a value encrypted using a secure key or hashed using a secure salt. If the cookie is set to expire when the browser is closed, it should be hard for someone to forge.
Here's an example:
On the pages that would redirect to the page you are trying to protect:
HttpCookie cookie = new HttpCookie("SecureCheck");
//don't set the cookie's expiration so it's deleted when the browser is closed
cookie.Value = System.Web.Security.FormsAuthentication.HashPasswordForStoringInConfigFile(Session.SessionID, "SHA1");
Response.Cookies.Add(cookie);
On the page you are trying to protect:
//check to see if the cookie is there and it has the correct value
HttpCookie secureCheck = Request.Cookies["SecureCheck"];
if (secureCheck == null || string.IsNullOrEmpty(secureCheck.Value) || System.Web.Security.FormsAuthentication.HashPasswordForStoringInConfigFile(Session.SessionID, "SHA1") != secureCheck.Value)
throw new Exception("Invalid request. Please access this page only from the application.");
//if we got this far the exception was not thrown and we are safe to continue
//insert whatever code here
There's no reliable way to do this for a GET request, nor is there any reason to try for a legitimate user. What you should do instead is ensure that, regardless of where the request comes from, the user has the proper permissions and access rights and that the session is protected appropriately (HTTP-only cookies, SSL, etc.). If the request is changing data, then it should be a POST, not a GET, and it should be accompanied by some suitable cross-site request forgery prevention technique (such as a cookie containing a nonce that is verified against a matching nonce on the form itself).
There is no way, other than rejecting the request if it doesn't contain a previously generated random one-time token in the parameters (that would be stored in the session, for example).
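A minimal sketch of such a one-time token, with assumptions stated up front: the class name OneTimeTokens and its methods are made up, and an in-memory set stands in for the session storage mentioned above. The application would call Issue when it builds the URL, and the page would call Redeem on the token it receives in the parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class OneTimeTokens
{
    private readonly HashSet<string> _issued = new HashSet<string>();
    private readonly object _gate = new object();

    // Generates a random, URL-safe token and remembers it until it is redeemed.
    public string Issue()
    {
        var bytes = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        string token = Convert.ToBase64String(bytes)
            .Replace('+', '-').Replace('/', '_').TrimEnd('=');
        lock (_gate)
        {
            _issued.Add(token);
        }
        return token;
    }

    // Returns true only the first time a previously issued token is presented.
    public bool Redeem(string token)
    {
        if (string.IsNullOrEmpty(token)) return false;
        lock (_gate)
        {
            return _issued.Remove(token);
        }
    }
}
```

Because Redeem removes the token, a URL copied from the address bar and requested a second time would be rejected.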
While there is no 100% secure way to do this, what I am suggesting might at least take care of your basic needs.
This is what you can do.
Client: Add an HTTP header containing an encoded string, such as the SHA-256 hash of some word.
Then make your client always send a POST request instead of a GET.
Server: Check the HTTP header for the encoded string. Also make sure it is a POST request.
This is not 100% secure, as of course someone smart enough could figure it out and still generate a request, but depending on your needs you might find this enough.
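The client and server halves of this check could be sketched as follows. The header name X-App-Token, the class name and the method names are hypothetical; only the SHA-256 hash of a shared word and the POST check come from the description above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SharedWordHeader
{
    // Hypothetical header name; any name agreed between client and server would do.
    public const string HeaderName = "X-App-Token";

    // Client side: lowercase hex SHA-256 of the shared word, to be sent in the header.
    public static string ComputeToken(string word)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(word));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Server side: accept only POST requests whose header value matches the expected hash.
    public static bool IsAllowed(string httpMethod, string headerValue, string word)
    {
        return string.Equals(httpMethod, "POST", StringComparison.OrdinalIgnoreCase)
            && string.Equals(headerValue, ComputeToken(word), StringComparison.Ordinal);
    }
}
```

On the server, the method and header value would be read from the incoming request (for example Request.HttpMethod and Request.Headers[SharedWordHeader.HeaderName] in ASP.NET) and passed to IsAllowed.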
You can check the referer or the user agent, add an additional header to the request, or always use POST requests to that URL. However, since plain HTTP is transmitted as clear text, somebody can always run Wireshark or Fiddler, capture the HTTP packets, and recreate the requests with your measures in place.
Pass parameters from your application that you can verify on the server side.
I suggest you use an encryption algorithm to encrypt some random text with a password (key). Then decrypt the parameter on the server side and check that it matches what you expect.
I am not completely sure about this, sorry; but if I had to do something like this, I would do something similar to the above.
On the MVC controller you can check a request header, for example Request.Headers["Accept"], to see whether the request comes from your own AngularJS or jQuery code:
A sample AngularJS call looks like this:
var url = ServiceServerPath + urlSearchService + '/SearchCustomer?input=' + $scope.strInput;
$http({
method: 'GET',
url: url,
headers: {
'Content-Type': 'application/json'
},.....
And on the MVC [HttpGet] Action method
[HttpGet]
[PreventDirectAccess]//It is my custom filters
// ---> /Index/SearchCustomer?input={input}/
public string SearchCustomer(string input)
{
try
{
var acceptHeader = Request.Headers["Accept"];//The $http call asks for application/json; a URL typed into the browser's address bar typically does not
if (acceptHeader == null || !acceptHeader.Contains("application/json")) return "Error Request on server!";
var serialize = new JavaScriptSerializer();
ISearch customer = new SearchCustomer();
IEnumerable<ContactInfoResult> returnSearch = customer.GetCustomerDynamic(input);
return serialize.Serialize(returnSearch);
}
catch (Exception err)
{
throw;
}
}
I'm downloading a web site using WebClient
public void download()
{
client = new WebClient();
client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
client.Encoding = Encoding.UTF8;
client.DownloadStringAsync(new Uri(eUrl.Text));
}
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
SaveFileDialog sd = new SaveFileDialog();
if (sd.ShowDialog() == DialogResult.OK)
{
StreamWriter writer = new StreamWriter(sd.FileName,false,Encoding.Unicode);
writer.Write(e.Result);
writer.Close();
}
}
This works fine. But I am unable to read content that is loaded using ajax. Like this:
<div class="center-box-body" id="boxnews" style="width:768px;height:1167px; ">
loading .... </div>
<script language="javascript">
ajax_function('boxnews',"ajax/category/personal_notes/",'');
</script>
This "ajax_function" downloads data from server on the client side.
How can I download the full web html data?
To do so, you would need to host a Javascript runtime inside of a full-blown web browser. Unfortunately, WebClient isn't capable of doing this.
Your only option would be automation of a WebBrowser control. You would need to send it to the URL, wait until both the main page and any AJAX content has been loaded (including triggering that load if user action is required to do so), then scrape the entire DOM.
If you are only scraping a particular site, you are probably better off just pulling the AJAX URL yourself (simulating all required parameters), rather than pulling the web page that calls for it.
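A sketch of pulling the AJAX URL directly, under some assumptions: the relative path is the one passed to ajax_function in the question, it is taken to resolve against the page URL the way a browser would resolve it, and the X-Requested-With header is a guess at what the page's script sends (many AJAX libraries add it, but this particular ajax_function may not). The class and method names are made up for illustration.

```csharp
using System;
using System.Net;
using System.Text;

public static class AjaxFetcher
{
    // Resolves the relative AJAX path against the page URL, as a browser would.
    public static Uri ResolveAjaxUrl(string pageUrl, string relativeAjaxPath)
    {
        return new Uri(new Uri(pageUrl), relativeAjaxPath);
    }

    // Downloads the fragment that the page would otherwise load through JavaScript.
    public static string Download(string pageUrl, string relativeAjaxPath)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            // Assumption: some servers only answer requests that look like AJAX calls.
            client.Headers["X-Requested-With"] = "XMLHttpRequest";
            return client.DownloadString(ResolveAjaxUrl(pageUrl, relativeAjaxPath));
        }
    }
}
```

For example, ResolveAjaxUrl("http://example.com/news/index.php", "ajax/category/personal_notes/") yields http://example.com/news/ajax/category/personal_notes/, where example.com is a placeholder host.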
I think you'd need to use a WebBrowser control to do this since you actually need the javascript on the page to run to complete the page load. Depending on your application this may or may not be possible for you -- note it's a Windows.Forms control.
When you visit a page in a browser, it
1. downloads a document from the requested URL
2. downloads anything referenced by an img, link, script, etc. tag (anything that references an external file)
3. executes JavaScript where applicable.
The WebClient class only performs step 1. It encapsulates a single http request and response. It does not contain a script engine, and does not, as far as I know, find image tags, etc that reference other files and initiate further requests to obtain those files.
If you want to get a page once it's been modified by an AJAX call and handler, you'll need to use a class that has the full capabilities of a web browser, which pretty much means using a web browser that you can somehow automate server-side. The WebBrowser control does this, but it's for WinForms only, I think. I shudder to think of the security issues here, or the demand that would be placed on the server if multiple users are taking advantage of this facility simultaneously.
A better question to ask yourself is: why are you doing this? If the data you're really interested in is being obtained via AJAX (probably through a web service), why not skip the webClient step and just go straight to the source?