I am trying to read a SharePoint site using HttpWebRequest, but the code below throws an exception (403 Forbidden):
HttpWebRequest r = (HttpWebRequest)WebRequest.Create(@"https://myCompany.sharepoint.com/sites/it/abc/ScriptAttest/docs/");
r.Method = "GET";
WebResponse rs = r.GetResponse();
I get the same response if I add
r.Credentials = new NetworkCredential("username", "secret");
(using my domain credentials of course)
or specify default credentials.
However, if I create a browser control (called documentBrowser) and execute the following:
documentBrowser.Navigate(@"https://myCompany.sharepoint.com/sites/it/abc/ScriptAttest/docs/");
I get the data. However, it takes a long time, and I don't really need to display the page. My objective is to parse the HTML and only pull out certain elements. Additionally, the data comes in stages and the control fires the DocumentCompleted event after each segment, so I don't really know when the entire page has loaded.
SharePoint Online does not support NetworkCredential. documentBrowser.Navigate actually uses the embedded IE browser, which may have some SPO-related cache; that is why it can navigate to the site. If you want to fetch data from SPO, you could use the REST API or CSOM (see the sketch after the snippet below). If you just want to access the site page, you could use a cookie to get it:
var login = "admin@***.onmicrosoft.com";
var password = "P@ssw0rd";
var siteUrl = "https://***.sharepoint.com/";
var creds = new SharePointOnlineCredentials(login, password);
var auth = creds.AuthenticateAsync(new Uri(siteUrl), true);
var request = (HttpWebRequest)WebRequest.Create(siteUrl);
request.CookieContainer = auth.Result.CookieContainer;
var result = (HttpWebResponse)request.GetResponse();
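If CSOM is an option, here is a minimal sketch, assuming the Microsoft.SharePoint.Client package from the SharePoint Online Client Components SDK is referenced; the site URL, credentials, and the "Documents" library title are placeholders:
using System;
using System.Security;
using Microsoft.SharePoint.Client;

var siteUrl = "https://***.sharepoint.com/sites/it";
var securePassword = new SecureString();
foreach (char c in "P@ssw0rd") securePassword.AppendChar(c);

using (var ctx = new ClientContext(siteUrl))
{
    ctx.Credentials = new SharePointOnlineCredentials("admin@***.onmicrosoft.com", securePassword);

    // Query the document library directly instead of scraping the page HTML.
    var list = ctx.Web.Lists.GetByTitle("Documents");
    var items = list.GetItems(CamlQuery.CreateAllItemsQuery());
    ctx.Load(items);
    ctx.ExecuteQuery();

    foreach (var item in items)
        Console.WriteLine(item["FileLeafRef"]);
}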
Related
I have already read many articles about the topic, but I can't find a solution.
So please don't mark this question as a duplicate; the other solutions don't work and are out of date.
I have a web application with a page containing a GridView (one button per row).
The button creates an HttpWebRequest (or WebClient, it's the same) and gets the page's HTML.
I tried using one cookie or all the cookies, but with no success.
This is the code:
String path = Request.Url.GetLeftPart(UriPartial.Authority) + VirtualPathUtility.ToAbsolute("~/");
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(path + "MyPage.aspx");
CookieContainer cookieContainer = new CookieContainer();
HttpCookie httpCookie = HttpContext.Current.Request.Cookies.Get("ASP.NET_SessionId");
if (httpCookie != null)
{
Cookie myCookie = new Cookie();
// Convert the System.Web.HttpCookie to a System.Net.Cookie...
myCookie.Domain = webRequest.RequestUri.Host;
myCookie.Expires = httpCookie.Expires;
myCookie.Name = httpCookie.Name;
myCookie.Path = httpCookie.Path;
myCookie.Secure = httpCookie.Secure;
myCookie.Value = httpCookie.Value;
cookieContainer.Add(myCookie);
}
webRequest.CookieContainer = cookieContainer;
string responseHTML = string.Empty;
using (HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse())
{
using (Stream responseStream = response.GetResponseStream())
{
using (StreamReader responseReader = new StreamReader(responseStream))
{
responseHTML = responseReader.ReadToEnd();
}
}
}
webRequest.GetResponse() times out.
I think the problem is the domain (localhost). I know it's not possible, but I don't have any domain and I won't create a fake one in web.config. Moreover, I have tried using a fake domain without success.
Without the following line
webRequest.CookieContainer = cookieContainer;
it works fine, but then the session is not shared.
I should mention that the cookie's Domain must be set, otherwise I get a corresponding error.
Session access must be serialized. When you use ASP.NET session, it is necessary to "serialize" HTTP requests to avoid threading issues. If two or more requests were processed in parallel, that would mean two threads could change or read session variables at the same time, which could cause a variety of issues.
The good news: ASP.NET will serialize the requests for you, automatically. If you send a second request with the same ASP.NET_SessionId, it will wait until the first one has completed.
The bad news: That means that a mechanism like the one you are attempting will not work. Your web request runs in the context of one HTTP request that is already in progress; it will block any additional HTTP requests until it is completed, including the request that you are sending via WebRequest.
More good news: If your page reads session data and does not write it, it can specify a hint that will allow two threads to run concurrently. Try adding this to both pages (the page your code is behind and the page that your code is attempting to access):
<%@ Page EnableSessionState="ReadOnly" %>
If ASP.NET recognizes that the session needs are read-only, it'll allow two read-only threads to run at the same time with the same session ID.
If you need read/write access in either page, you are out of luck.
An alternative would be to use HttpServerUtility.Transfer instead. The role of the first page would change. Instead of serving as a proxy to the second page, it hands off control to the second page. By putting the pages in series, you avoid any issues with parallelism.
Example:
Server.Transfer("MyPage.aspx");
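A minimal sketch of the hand-off, assuming a WebForms button handler (the handler and key names here are illustrative). Because Server.Transfer keeps the current HttpContext, the first page can pass data to the second through Context.Items:
// Hypothetical handler on the GridView page; names are illustrative.
protected void ViewRow_Click(object sender, EventArgs e)
{
    // Server.Transfer keeps the same HttpContext, so both pages share
    // the request, the session, and the Items dictionary.
    Context.Items["SelectedRowId"] = ((Button)sender).CommandArgument;
    Server.Transfer("MyPage.aspx");
}

// In MyPage.aspx.cs, read what the first page stored:
protected void Page_Load(object sender, EventArgs e)
{
    string rowId = Context.Items["SelectedRowId"] as string;
    // ... render using rowId
}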
I would like to grab some content from a website that is made with Drupal.
The challenge here is that I need to log in to this site before I can access the page I want to scrape. Is there a way to automate this login process in my C# code, so I can grab the secured content?
To access the secured content, you'll need to store and send cookies with every request to the server, starting with the request that sends your login info, and then saving the session cookie that the server gives you (which is your proof that you are who you say you are).
You can use System.Windows.Forms.WebBrowser for an out-of-the-box solution that gives you less control but handles cookies for you.
My preferred method is to use System.Net.HttpWebRequest to send and receive all web data and then use the HtmlAgilityPack to parse the returned data into a Document Object Model (DOM) which can be easily read from.
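For example, here is a minimal HtmlAgilityPack sketch; the URL and the XPath expression are placeholders:
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

CookieContainer cookieJar = new CookieContainer(); // the long-lived container described below
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/secured_page.htm");
request.CookieContainer = cookieJar;

string html;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null when nothing matches, so check before iterating.
var links = doc.DocumentNode.SelectNodes("//div[@class='content']//a[@href]");
if (links != null)
{
    foreach (var link in links)
        Console.WriteLine(link.GetAttributeValue("href", ""));
}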
The trick to getting System.Net.HttpWebRequest to work is that you must create a long-lived System.Net.CookieContainer that will keep track of your log in info (and other things the server expects you to keep track of). The good news is that the HttpWebRequest will take care of all of this for you if you provide the container.
You need a new HttpWebRequest for each call you make, so you must set their .CookieContainer to the same object every time. Here is an example:
UNTESTED
using System.Net;
public void TestConnect()
{
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/login.htm");
request.CookieContainer = cookieJar;
HttpWebResponse response = (HttpWebResponse) request.GetResponse();
// do page parsing and request setting here
request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/submit_login.htm");
// add specific page parameters here
request.CookieContainer = cookieJar;
response = (HttpWebResponse) request.GetResponse();
request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/secured_page.htm");
request.CookieContainer = cookieJar;
// this will now work since you have saved your authentication cookies in 'cookieJar'
response = (HttpWebResponse) request.GetResponse();
}
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
HttpWebRequest Class
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
You'll have to use the Services module to do that. Also check out this link for a bit of explanation.
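If the site's Services endpoint exposes the user resource over REST, the login itself can also be scripted. This is a hedged sketch: the endpoint path (/rest/user/login) and the JSON field names are assumptions and depend entirely on how the Services module is configured on that site.
using System.IO;
using System.Net;
using System.Text;

// Assumed endpoint path and JSON field names; adjust to the site's Services configuration.
CookieContainer cookies = new CookieContainer();
HttpWebRequest loginRequest = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/rest/user/login");
loginRequest.Method = "POST";
loginRequest.ContentType = "application/json";
loginRequest.CookieContainer = cookies;

byte[] body = Encoding.UTF8.GetBytes("{\"username\":\"myUser\",\"password\":\"myPassword\"}");
using (Stream stream = loginRequest.GetRequestStream())
{
    stream.Write(body, 0, body.Length);
}
loginRequest.GetResponse().Close(); // the session cookie now lives in 'cookies'

// Reuse the same container for the page you actually want to scrape.
HttpWebRequest pageRequest = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/secured_page.htm");
pageRequest.CookieContainer = cookies;
string html;
using (StreamReader reader = new StreamReader(pageRequest.GetResponse().GetResponseStream()))
{
    html = reader.ReadToEnd();
}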
Is there any chance to retrieve the DOM results when I click "older posts" on the site
http://www.facebook.com/FamilyGuy
using C# or Java? I heard that it is possible to execute a script with onclick and get the results. How can I execute this script:
onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"
I think the "older posts" link sends an AJAX request and appends the response to the page. (I'm not sure; you should check the page source.)
You can emulate this behavior in C#, Java, and JavaScript (you already have the code for JavaScript).
Edit:
It seems that Facebook uses some sort of internal API (JSCC) to load the content, and it's undocumented.
I don't know about the Facebook Developers' APIs (you may want to check that first), but if you want to emulate exactly what happens in your browser, you can use TamperData to intercept the GET requests when you click the "more posts" link and find the request URL and its parameters.
After you get this information, you have to log in to your account from your application and get the authentication cookie.
C# sample code as you requested:
private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
var webRequest = WebRequest.Create(loginURL) as HttpWebRequest;
var responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
string responseData = responseReader.ReadToEnd();
responseReader.Close();
// Now you may need to extract some values from the login form and build the POST data with your username and password.
// I don't know what exactly you need to POST but again a TamperData observation will help you to find out.
string postData = String.Format("UserName={0}&Password={1}", userName, password); // I emphasize that this is just an example.
// cookie container
var cookies = new CookieContainer();
// post the login form
webRequest = WebRequest.Create(loginURL) as HttpWebRequest;
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.CookieContainer = cookies;
// write the form values into the request message
var requestWriter = new StreamWriter(webRequest.GetRequestStream());
requestWriter.Write(postData);
requestWriter.Close();
webRequest.GetResponse().Close();
return cookies;
}
Then you can perform GET requests with the cookies you have, on the URL you got from analyzing those JSCC.get().getHandler() requests with TamperData, and eventually you'll get what you want as a response stream:
var webRequest = WebRequest.Create(url) as HttpWebRequest;
webRequest.CookieContainer = GetCookieContainer(url, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();
You can also use Selenium for browser automation. It also has C# and Java APIs (I have no experience using Selenium).
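Here is a minimal hedged sketch with Selenium's C# bindings (OpenQA.Selenium); the link text used to find the "older posts" element is a placeholder, since Facebook's markup changes frequently:
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

using (IWebDriver driver = new FirefoxDriver())
{
    driver.Navigate().GoToUrl("http://www.facebook.com/FamilyGuy");

    // The real browser runs the page's JavaScript; click the link that
    // loads the next batch of posts.
    IWebElement olderPosts = driver.FindElement(By.PartialLinkText("Older Posts"));
    olderPosts.Click();

    // The DOM now contains the appended posts.
    string html = driver.PageSource;
}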
Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.
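Once you have found the request, replaying it with HttpWebRequest is straightforward. A sketch, where the URL, query string, and header are stand-ins for whatever Firebug actually shows:
using System.IO;
using System.Net;

// Assumes 'cookieJar' has been populated by a login request beforehand (see the other answer).
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest ajaxRequest = (HttpWebRequest)WebRequest.Create(
    "http://www.facebook.com/ajax/observed-url-from-firebug?observed=parameters");
ajaxRequest.CookieContainer = cookieJar;
ajaxRequest.Headers["X-Requested-With"] = "XMLHttpRequest"; // mark it as an AJAX call

string fragment;
using (StreamReader reader = new StreamReader(ajaxRequest.GetResponse().GetResponseStream()))
{
    // The fragment the page would normally append to the DOM.
    fragment = reader.ReadToEnd();
}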
Or you can use a browser render engine like webkit to process the JavaScript for you and expose the resulting HTML:
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
There is a web file within my intranet that my computer is authorized to read and write. I can open up IE or Firefox and view the file by typing in the URL. I need to write a C# desktop app that reads/writes to that file. Even though my computer has access, all my attempts so far result in 401 Unauthorized errors. The program needs to work from any computer whose account has been authorized, so I cannot hard-code any username/password. I've never done anything like this, but I was able to scrounge the following from several sites:
WebRequest objRequest = HttpWebRequest.Create("https://site.com/file");
objRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
objRequest.Proxy = WebRequest.DefaultWebProxy;
objRequest.Proxy.Credentials = CredentialCache.DefaultCredentials;
WebResponse objResponse = (WebResponse)objRequest.GetResponse();
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
string str = sr.ReadToEnd();
sr.Close();
//... Do stuff with str
}
If it matters, I'm working in .NET 2.0
I just ran into the same problem; it all started working when I added:
objRequest.UseDefaultCredentials = true;
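In context, that looks like the following sketch based on the question's code; the property has to be set before GetResponse() is called:
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create("https://site.com/file");
objRequest.UseDefaultCredentials = true; // send the logged-on user's Windows credentials
using (WebResponse objResponse = objRequest.GetResponse())
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
    string str = sr.ReadToEnd();
    //... Do stuff with str
}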
Did you try using Fiddler to inspect the actual request that was sent to the server?
You can also check if the server requires a client certificate to allow the connection.
Since you are accessing an intranet server, do you really need to set the proxy part? I mean most of the time, the proxy is configured to ignore local addresses anyway.
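That simplification would look something like this sketch; it assumes the intranet server is reachable without any proxy:
WebRequest objRequest = WebRequest.Create("https://site.com/file");
objRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
objRequest.Proxy = null; // don't route an intranet request through a proxy
using (WebResponse objResponse = objRequest.GetResponse())
using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
{
    string str = sr.ReadToEnd();
}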
This won't work if NTLM credentials are required:
objRequest.Credentials = CredentialCache.DefaultNetworkCredentials;
You need to pass in the actual credentials like:
NetworkCredential networkCredential = new NetworkCredential(UserName, Password, Domain);
CredentialCache credCache = new CredentialCache();
credCache.Add(new Uri(url), "NTLM", networkCredential);
objRequest.Credentials = credCache;
I downloaded the source code of a page from a site, but when I downloaded it I saw that the site identifies my program as a guest. I searched on Google and figured out that I can send a cookie when I "ask" for the source code.
This is what I have managed to do, and it still doesn't identify me as a registered user:
CookieContainer cj = new CookieContainer();
string all = "";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Url);
req.CookieContainer = cj;
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
CookieCollection cs=cj.GetCookies(req.RequestUri);
CookieContainer cc = new CookieContainer();
cc.Add(cs);
req.CookieContainer = cc;
StreamReader read = new StreamReader(res.GetResponseStream());
all = read.ReadToEnd();
read.Close();
return all;
What is wrong here?
Thanks very much for the help :)
(If it helps, I have the login details of a registered user of the site.)
You would have to use the cookie that the server left behind in your cookie cache, the one that identified you as an authenticated user in a previous session. You'll need to use the Cookie(name, value) constructor. Getting the value is the tricky part; look through your cookie cache to see if you can find one. It is still going to fail if the server has expired the cookie.
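A minimal sketch of attaching a known cookie before the request is sent, reusing the question's Url variable; the cookie name and value are placeholders you would copy from your browser's cookie store for the site:
CookieContainer cj = new CookieContainer();
Cookie authCookie = new Cookie("SESSID", "value-copied-from-your-browser");
authCookie.Domain = new Uri(Url).Host; // CookieContainer.Add(Cookie) requires a domain
cj.Add(authCookie);

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Url);
req.CookieContainer = cj; // the cookie is sent with this request

using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader read = new StreamReader(res.GetResponseStream()))
{
    return read.ReadToEnd();
}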
Using a tool that lets you look at the HTTP headers and cookie values is important to debug this. Firebug is very nice.