How Do Systems Like AdSense and Web Stats Work? - c#

I am thinking about working with remote data, actually sending and receiving data to and from external web sites. There are plenty of working examples of this on the web, for instance free online web tools like web stats or Google's AdSense. In such web services some code is generated for the publisher, the publisher puts the generated code into the BODY of her web page document (the HTML file), and after that the system works: you get counts of visits to home pages, counts of clicks on advertisements, and so on. Now this is my question: how do such systems work, and how can I investigate and research them to find out how to program them? Can you suggest some keywords? Which titles should I be looking for, and which technologies are relevant to this kind of programming? Essentially I want to find some relevant references to learn from and to start experimenting with these systems. If my question is not clear I will explain it more if you want. Help me, I am confused.
Consider that I am a programmer who wants to program such systems, not just use them.

There are a few different ways to track clicks.
Redirection Tracking
One is to link the advertisement (or any link) to a redirection script. You would normally pass it some sort of ID so it knows which URL it should forward to. Before redirecting the user to that page it first records the click in a database, where it can store the user's IP, a timestamp, browser information, etc. It then forwards the user (without them really noticing) to the specified URL.
Advertisement ---> Redirection Script (records click) ---> Landing Page
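A minimal sketch of such a redirection script as an ASP.NET handler; the ad lookup and logging helpers are placeholders for illustration, not any particular product's code:

// RedirectTracker.ashx.cs - minimal redirect-and-log handler sketch.
using System;
using System.Web;

public class RedirectTracker : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string adId = context.Request.QueryString["id"];

        // Look up the destination URL for this ad (hard-coded here for illustration).
        string destinationUrl = LookUpDestination(adId);

        // Record the click before redirecting: who, when, from where.
        LogClick(adId,
                 context.Request.UserHostAddress,   // visitor IP
                 context.Request.UserAgent,         // browser information
                 DateTime.UtcNow);                  // timestamp

        // Send the visitor on to the landing page; they barely notice the hop.
        context.Response.Redirect(destinationUrl, endResponse: true);
    }

    public bool IsReusable { get { return true; } }

    private string LookUpDestination(string adId)
    {
        // Placeholder: e.g. SELECT url FROM ads WHERE id = @id
        return "http://example.com/landing";
    }

    private void LogClick(string adId, string ip, string userAgent, DateTime when)
    {
        // Placeholder: INSERT into a clicks table
    }
}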
Pixel Tracking
Another way to do it is to use pixel tracking. This is where you put a "pixel" or a piece of Javascript code onto the body of a webpage. The pixel is just an image (or a script posing as an image) which will then be requested by the user visiting the page. The tracker which hosts the pixel can record the relevant information by that image request. Some systems will use Javascript instead of an image (or they use both) to track clicks. This may allow them to gain slightly more information using Javascript's functions.
Advertisement ---> Landing Page ---> User requests pixel (records click)
Here is an example of a pixel: <img src="http://tracker.mydomain.com?id=55&type=png" />
I threw in the png at the end because some systems might require a valid image filetype.
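Here is a rough sketch of what the tracker behind such a pixel could look like, again as an ASP.NET handler. The logging helper is a placeholder, and it returns a transparent 1x1 GIF so the browser gets a valid image back (you could just as well serve a PNG):

// PixelTracker.ashx.cs - serves a transparent 1x1 GIF and records the request.
using System;
using System.Web;

public class PixelTracker : IHttpHandler
{
    // A minimal transparent GIF, so browsers accept the response as an image.
    private static readonly byte[] TransparentGif = Convert.FromBase64String(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");

    public void ProcessRequest(HttpContext context)
    {
        // Everything needed is in the request itself plus the query string put in the <img> tag.
        LogImpression(context.Request.QueryString["id"],
                      context.Request.UserHostAddress,
                      context.Request.UserAgent,
                      context.Request.UrlReferrer != null ? context.Request.UrlReferrer.ToString() : null,
                      DateTime.UtcNow);

        // Answer with a real image so nothing looks broken on the page.
        context.Response.ContentType = "image/gif";
        context.Response.Cache.SetCacheability(HttpCacheability.NoCache);  // so every view hits the tracker
        context.Response.BinaryWrite(TransparentGif);
    }

    public bool IsReusable { get { return true; } }

    private void LogImpression(string id, string ip, string userAgent, string referrer, DateTime when)
    {
        // Placeholder: INSERT into an impressions table
    }
}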
Hidden Tracking
If you do not want the user to know what the tracker is you can put code on your landing page to pass data to your tracker. This would be done on the backend (server side) so it is invisible to the user. Essentially you can just "request" the tracker URL while passing relevant data via the GET parameters. The tracker would then record that data with very limited server load on the landing page's server.
Advertisement ---> Landing Page requests tracker URL and concurrently renders page
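Server-side it can be as simple as firing a GET at the tracker with whatever you want recorded. A sketch, where the tracker URL and parameter names are made up for illustration:

// Called from the landing page's server-side code (e.g. Page_Load),
// so the visitor never sees the tracker.
using System;
using System.Net;
using System.Threading;
using System.Web;

public static class ServerSideTracker
{
    public static void RecordVisit(HttpRequest request, string campaignId)
    {
        string trackerUrl = "http://tracker.mydomain.com/track"
            + "?campaign=" + HttpUtility.UrlEncode(campaignId)
            + "&ip=" + HttpUtility.UrlEncode(request.UserHostAddress)
            + "&ua=" + HttpUtility.UrlEncode(request.UserAgent)
            + "&ts=" + DateTime.UtcNow.Ticks;

        // Fire the hit on a background thread so the page keeps rendering
        // while the tracker records the data.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                using (var client = new WebClient())
                {
                    client.DownloadString(trackerUrl);  // response body is ignored
                }
            }
            catch (WebException)
            {
                // A tracker outage should never break the landing page.
            }
        });
    }
}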

Your question really isn't clear I'm afraid.
Are you trying to find out information on who uses your site, how many clicks you get and so on? Something like Google Analytics might be what you are after - take a look here http://www.google.com/analytics/
EDIT: Adding more info in response to comment.
Ah, OK, so you want to know how Google tracks clicks on sites when those sites use Google ads? Well, a full discussion on how Google AdSense works is well beyond me I'm afraid - you'll probably find some useful info on Google itself and on Wikipedia.
In a nutshell, and at a very basic level, Google ads work by actually directing the click to Google first. If you look at the URL for a Google ad (on this site, for example) you will see it starts with "http://googleads.g.doubleclick.net..." (Google owns DoubleClick). The URL also contains a lot of other information which allows Google to detect where the click came from and where to redirect you to see the actual web site being advertised.
Google Analytics is slightly different in that it is a small chunk of JavaScript you run in your page, but that too basically reports back to Google that the page was viewed, when you landed there and how long you spent on the page.
Like I said a full discussion of this is beyond me I'm afraid, sorry.

Related

C# URL, Get all files of type "audio/webm"

I'm making an app that lets me download YouTube videos. If you weren't aware: by opening the Network tab on a YouTube video, then refreshing the page and filtering with "mime-type:audio/webm" {EDIT: and removing &range=.. from the URL}, you get access to all of the video's files - audio only, video only, low quality, high quality, etc.
The app I want to make, but can't seem to figure out, will go to https://youtube.com/watch?v=VIDEO_ID, filter the requests by "mime-type:audio/webm" and list all of the links found.
How could I access this from C#?
[a screenshot of what I'm talking about, in regard to the video types]
You can clearly see that the web request is initiated by JavaScript.
Because of that, it is hard to initiate such a request from C#; you either need to:
- parse and execute the JavaScript, or
- recreate the generated URLs in C# (a naive sketch of this is shown below), or
- use a proxy to capture the incoming data
Another option would be to check out Google's API, but I am not sure they expose such functionality.
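For what it's worth, the naive version of the second option is to download the watch page and look for stream URLs whose mime type mentions audio/webm. Treat this purely as a sketch: the page structure changes often and many URLs are generated or signed by JavaScript, which is exactly why this tends to break.

// Naive sketch only: downloads the watch page and greps for candidate stream URLs.
// Expect it to miss links or return ones that won't play without further work.
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

class AudioWebmFinder
{
    static void Main()
    {
        string videoId = "VIDEO_ID";
        string html;
        using (var client = new WebClient())
        {
            // On older .NET versions you may need:
            // ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
            html = client.DownloadString("https://youtube.com/watch?v=" + videoId);
        }

        // Look for googlevideo links whose mime type mentions audio/webm.
        var matches = Regex.Matches(html, @"https:[^""\\ ]*googlevideo[^""\\ ]*");
        var found = new HashSet<string>();
        foreach (Match m in matches)
        {
            string url = Uri.UnescapeDataString(m.Value.Replace(@"\u0026", "&"));
            if (url.Contains("mime=audio/webm") || url.Contains("mime=audio%2Fwebm"))
                found.Add(url);
        }

        foreach (var url in found)
            Console.WriteLine(url);
    }
}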

Issue with HttpWebRequest Google.com

I have a C# application that searches on Google. After a few hits, I see the captcha message.
To solve this, I open Internet Explorer, go to the same page, and I'm presented with the captcha as well. I complete it and then it's all good; search results are shown.
But in my C# application, when I hit the same URL I still see the captcha. Why is that, and how could I get past it? I am confused: I've completed the captcha (using IE), so why do I see it again on the next hit from C# but not from the browser?
I just need to be pointed in the right direction, or given some ideas or suggestions.
I don't have any knowledge of how Google does it, but I've seen websites which track how often you use them based on:
IP Address
User-Agent String
Cookies
You can spoof number 2 so it's the same as in Internet Explorer, just in case it's detecting through that.
Number 3 is easy to check I suppose, and you can transmit the cookie if there is one.
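A rough sketch of doing 2 and 3 with HttpWebRequest; the User-Agent string and the commented-out cookie below are just examples, so copy whatever your IE session actually sends:

// Sketch: issue the same search with a browser-like User-Agent and reuse cookies.
using System;
using System.IO;
using System.Net;

class GoogleSearchClient
{
    static void Main()
    {
        var cookies = new CookieContainer();
        // If you exported a cookie from the browser session that passed the captcha,
        // add it here (name/value are placeholders):
        // cookies.Add(new Cookie("NID", "copied-value", "/", ".google.com"));

        var request = (HttpWebRequest)WebRequest.Create("https://www.google.com/search?q=test");
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"; // spoofed UA
        request.CookieContainer = cookies;  // cookies set by Google are kept for later requests

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            Console.WriteLine(html.Length + " bytes received");
        }
    }
}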
Google wants to prevent other people from sending requests from their own applications: no ads get shown, and the traffic may even look like an attack. You have two options: 1. Make your application act the way a browser acts, for example by modifying the User-Agent and cookies. 2. Use an API from Google. I'm sure Google provides an API for this purpose, but I don't have more detailed information.
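If you go the official route, Google does provide one: the Custom Search JSON API, which you can call over plain HTTP once you have an API key and a search engine ID from the Google developer console. A sketch (the key and ID values are placeholders):

// Sketch of querying Google's Custom Search JSON API.
using System;
using System.Net;

class CustomSearchExample
{
    static void Main()
    {
        string apiKey = "YOUR_API_KEY";            // created in the Google developer console
        string cx = "YOUR_SEARCH_ENGINE_ID";        // the custom search engine to query
        string query = Uri.EscapeDataString("stack overflow");

        string url = "https://www.googleapis.com/customsearch/v1"
                   + "?key=" + apiKey + "&cx=" + cx + "&q=" + query;

        using (var client = new WebClient())
        {
            string json = client.DownloadString(url);  // results come back as JSON
            Console.WriteLine(json);
        }
    }
}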

Need Help in building a "robot" that extracts data from HTTP request

I am building a web site in ASP.NET and C#. One of its components involves logging in, on behalf of the user, to a website where the user has an account (for example, a cellular phone company), taking information from that site and storing it in our database.
I think this action is called "scraping".
Are there any products that already do this which I could integrate with my software?
I don't need software that does it; I need some sort of SDK that I can integrate with my C# code.
Thanks,
Koby
Use the HtmlAgilityPack to parse the HTML that you get from a web request once you've logged in.
See here for logging in: Login to website, via C#
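A minimal HtmlAgilityPack sketch, assuming you already have the logged-in page's HTML in a string; the markup and XPath below are just examples:

// Sketch: parse HTML you downloaded (after logging in) with HtmlAgilityPack.
using System;
using HtmlAgilityPack;

class ScrapeExample
{
    static void Main()
    {
        string html = "<html><body><table id='bill'><tr><td>Calls</td><td>12.50</td></tr></table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pull every cell out of the (hypothetical) billing table.
        var cells = doc.DocumentNode.SelectNodes("//table[@id='bill']//td");
        if (cells != null)
        {
            foreach (var cell in cells)
                Console.WriteLine(cell.InnerText.Trim());
        }
    }
}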
I haven't found any product that does it right so far.
One way to handle this is to:
- do the requests yourself
- use http://htmlagilitypack.codeplex.com/ to extract the important information from the downloaded HTML
- save the extracted information yourself
The thing is, depending on the context there are so many things to tune/configure that you would need a very large product, and it still wouldn't reach the performance/accuracy of a custom solution:
a) multithreading control
b) extraction rules
c) persistence control
d) web spidering (or how the next link to parse is chosen)
Check the Web Scraping Wikipedia Entry.
However, since what you need to acquire via web scraping is application specific most of the time, it may be more efficient to scrape whatever you need directly from the web response stream.

Replicate steps in downloading file

I'm trying to automate the download of a file from a website. Normally, to download the file, I log in with a username and password, navigate to a particular screen, then click a button.
I've been trying to watch the sequence of POSTs using Chrome's developer tools and then replicate all the steps using the .NET WebClient class, but with no success. I've derived from the WebClient class and added cookie handling, which seems to be working. I go to the login page and post using WebClient.UploadValues. About half the time it seems to work. The next step appears to be another POST to a reporting URL. Once again I use WebClient.UploadValues, but the response from the server is a page showing an internal error.
I have a couple of questions.
1) Are there better tools than hand-coding C# to replicate a bunch of web browser interactions? I really only care about being able to download the file at a particular time each day onto a Windows box.
2) WebClient does not seem to be the best class to use for this. Perhaps it's a bit too simplistic. I tried using HttpWebRequest, but it has no convenient facilities for encoding POST requests. Any other recommendations?
3) Although Chrome's developer tools appear to show all of the interaction, I find them a bit cumbersome to use. I'd be interested in seeing all of the raw communication (unencrypted, though; the site is only accessed via HTTPS), so I can see if I'm really replicating all of the steps.
I can even post the exact code I'm using. The site I'm pulling data from is the Standard and Poor's website. They have the ability to create custom reports for downloading historical data, which I need for reporting, not republishing.
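(The cookie handling I added to WebClient is essentially the usual pattern of overriding GetWebRequest to attach a shared CookieContainer; a simplified version is below.)

// Simplified version of the cookie-aware WebClient described above.
// Each request made through this client shares the same CookieContainer,
// so the session cookie from the login POST is sent on later requests.
using System.Net;

public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; private set; }

    public CookieAwareWebClient()
    {
        Cookies = new CookieContainer();
    }

    protected override WebRequest GetWebRequest(System.Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = Cookies;  // reuse cookies across calls
        }
        return request;
    }
}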
Using IE to download the file would be much easier than writing C# / Perl / Java code to replicate the HTTP requests.
The reason is that even a slight change in the JavaScript code can break the flow.
With IE, you can automate it using COM. The following VBA example opens IE and performs a Google search:
Sub Search_Google()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")

    IE.Navigate "http://www.google.com"    'load the web page google.com

    While IE.Busy
        DoEvents                           'wait until IE is done loading the page
    Wend

    IE.Document.all("q").Value = "what you want to put in the text box"
    IE.Document.all("btnG").Click          'clicks the button named "btnG", Google's "Google Search" button

    While IE.Busy
        DoEvents                           'wait until IE is done loading the page
    Wend
End Sub
Regarding point 3 of the question (seeing all of the raw communication to check every step is being replicated):
For this you can use Fiddler to view all of the interaction and the raw data going back and forth. To make it work with HTTPS you will need to install Fiddler's root certificate to enable decryption of the traffic.

Determining Another Site's Traffic Measurements?

I have a conceptual question.
I am wondering how companies such as Alexa Internet determine a given site's (not my own) overall traffic, and the traffic for each unique page. I would appreciate a technical response: if you were to design this feature (I am sure it is complicated, but hypothetically), how would you go about it?
Thanks in advance.
One way is to be hooked into one or more core routers. From there you could perform deep packet inspection to see where traffic is going, what pages are visited, etc.
Another way is to have people install a browser toolbar which records where they go and submits that information back to you. I think this is how Alexa works.
A third way is to have web site owners install a bit of javascript which performs analytics and submits that data back to you. This is how Google does it.
A fourth way is to buy that data from companies that do one of the above.
Alexa estimates website traffic by extrapolating the data from the browsing sessions of the subset of the Internet population who use the Alexa toolbar or browser extensions. This isn't a truly random sample, so questions are raised over the accuracy of such data:
http://en.wikipedia.org/wiki/Alexa_Internet#Accuracy_of_ranking_by_the_Alexa_Toolbar
Installing the Alexa toolbar modifies the browser user-agent, so you can estimate the % of visitors to your site who are contributing data to Alexa by scanning your server logs for requests with the appropriate user-agent strings.
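A rough sketch of that log scan in C#; the "Alexa Toolbar" marker below is the historical string the IE toolbar added to the user-agent, so check your own logs for the exact strings before trusting the numbers:

// Rough sketch: count requests in an access log whose line (including the
// User-Agent field) contains the assumed toolbar marker.
using System;
using System.IO;
using System.Linq;

class ToolbarShareEstimate
{
    static void Main()
    {
        string logPath = "access.log";       // path to your web server log
        string marker = "Alexa Toolbar";     // assumed marker; verify in your logs

        var lines = File.ReadLines(logPath).ToList();
        int toolbarHits = lines.Count(line => line.Contains(marker));

        Console.WriteLine("{0} of {1} requests ({2:P1}) carry the toolbar user-agent",
            toolbarHits, lines.Count,
            lines.Count == 0 ? 0 : (double)toolbarHits / lines.Count);
    }
}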
