Good morning everyone,
I recently got a request asking whether it's possible to retrieve data from other sites' search results. I tried searching, but didn't know exactly how to word my search.
Best explained by example.
Visit: https://bcbst.vitalschoice.com/professional?search_specialty_id=29&ci=DFT&geo_location=33688&network_id=39&sort=relevancy&radius=any&page=1
You'll see a list of doctors.
I'm looking for a way to programmatically get the list of doctors: name, address, and phone number.
I just need some direction as I will probably be doing this for multiple sites.
I program in C# and JS.
The website you linked has an API available for use. What you can do is make an AJAX request (if using jQuery) or a WebRequest (if using C#) to one of the endpoints, and then convert the JSON you get from the website into whatever you need.
You can test what you'll be getting back from the server by typing the URL into the browser.
As for the search parameters, you'll have to add those to the URL as query-string values. I'd advise taking a look at their API to see what functions they support.
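For the C# side, a rough sketch of the flow might look like the following. The endpoint URL and the Doctor field names are assumptions for illustration; you would confirm the real ones by watching the network traffic in the browser's dev tools.

    // Minimal sketch: fetch JSON from a (hypothetical) search endpoint and
    // deserialize it into objects. The URL and the Doctor shape below are
    // assumptions, not a documented contract.
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    public class Doctor
    {
        public string Name { get; set; }
        public string Address { get; set; }
        public string Phone { get; set; }
    }

    public static class ProviderSearch
    {
        public static async Task<List<Doctor>> GetDoctorsAsync()
        {
            using (var client = new HttpClient())
            {
                // Hypothetical endpoint; the query string mirrors the page's URL parameters.
                string url = "https://bcbst.vitalschoice.com/api/professional" +
                             "?search_specialty_id=29&geo_location=33688&page=1";
                string json = await client.GetStringAsync(url);
                return JsonConvert.DeserializeObject<List<Doctor>>(json);
            }
        }
    }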
Hope this helps!
I want to know how to pull data from a website and parse it in our own code to present it to the user.
For example: consider an app in which a user types a movie name and all the posters get fetched from various websites, like IMDb. Or a user enters a movie name and all the data from IMDb is fetched. I know about certain third-party API services for fetching data from IMDb, like omdbapi and imdbapi, but I want to know how to do this for any sort of website, not just IMDb.
I am a complete newbie in this context, so please guide me through this from the very beginning. I want to do this in a Windows 8 Store app using C# and XAML in Visual Studio.
A simple way, when the website offers one, is to use its RSS feed. All you have to do is pass the parameters as a query string using a web request object; the response stream will then have all the details you want, which can be parsed in C# and worked with.
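A sketch of that approach with HttpClient and LINQ to XML follows; the feed URL is a placeholder, while <item>, <title>, and <link> are standard RSS 2.0 elements.

    // Minimal sketch: request an RSS feed and print item titles and links.
    // The feed URL is a placeholder -- substitute the site's real feed.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Xml.Linq;

    public static class RssExample
    {
        public static async Task ReadFeedAsync()
        {
            using (var client = new HttpClient())
            {
                string xml = await client.GetStringAsync("https://example.com/rss");
                var doc = XDocument.Parse(xml);
                foreach (var item in doc.Descendants("item"))
                {
                    // <title> and <link> are standard RSS 2.0 item elements.
                    Console.WriteLine("{0} -> {1}",
                        (string)item.Element("title"),
                        (string)item.Element("link"));
                }
            }
        }
    }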
There is no standard way to do it for any website; you must write your own algorithm for each of the websites you want to get content from.
HttpClient is your tool for getting web content in your app.
Check out YQL:
The Yahoo Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services.
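As an illustration, a YQL statement can treat an HTML page as a table and filter it with XPath (the URL and XPath here are placeholders):

    select * from html where url="http://example.com" and xpath="//h1"

You send the statement to YQL's public REST endpoint and get the matching elements back as XML or JSON.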
You should use Html Agility Pack.
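A minimal sketch of it in use (the URL and XPath are placeholders):

    // Minimal sketch: load a page with Html Agility Pack and select nodes
    // by XPath. The URL and XPath are placeholders, not a real target.
    using System;
    using HtmlAgilityPack;

    public static class ScrapeExample
    {
        public static void Run()
        {
            var web = new HtmlWeb();
            HtmlDocument doc = web.Load("http://example.com");

            // SelectNodes returns null when nothing matches, so check first.
            var nodes = doc.DocumentNode.SelectNodes("//h2/a");
            if (nodes == null) return;

            foreach (HtmlNode node in nodes)
                Console.WriteLine(node.InnerText.Trim());
        }
    }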
For better performance, host your scraping service on Azure.
I'm looking for a website that offers an API for retrieving words from the English WordNet database.
I do not want to download the WordNet database and implement it on my server.
I simply want to call an API and get back some results in XML format from that website.
I have a web application in ASP.NET that is written in C#.
Here is a sample from WordNet; I want to do something like that in my web application.
WordNet Online
It seems there is no such API publicly available.
According to the Related Projects page, part of the WordNet data is available as an API via abbreviations.com:
Abbreviations.com has created free APIs based on REST calls which return a well-formatted XML result, providing both synonyms and definitions APIs based on the WordNet database.
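Calling an XML API like that from ASP.NET is only a few lines; here is a sketch with a placeholder endpoint, parameter, and element name, since the real ones are in the abbreviations.com API documentation.

    // Minimal sketch: call an XML-returning REST API and read the result.
    // The endpoint URL, parameter names, and <definition> element are
    // placeholders; consult the API docs for the real ones.
    using System;
    using System.Net;
    using System.Xml.Linq;

    public static class WordLookupExample
    {
        public static void LookUp(string word)
        {
            using (var client = new WebClient())
            {
                string url = "https://example.com/wordnet/api?word=" +
                             Uri.EscapeDataString(word) + "&format=xml";
                string xml = client.DownloadString(url);
                var doc = XDocument.Parse(xml);
                foreach (var def in doc.Descendants("definition"))
                    Console.WriteLine(def.Value);
            }
        }
    }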
However, in the .NET/C# section of the same page you can find some publicly available local APIs, so you don't have to implement it yourself, though you do have to download the data files.
WordNet does not seem to expose a REST or similar API that can be used. That said, you might be able to derive the URL pattern by searching online, use it in your application, and parse the response HTML.
You might want to check their website to make sure this is legal.
The ERP application we use has web services, but not with the functionality we want.
So we would like to build an in-between webservice which forwards requests to the ERP and sends the result back without the requester even noticing the difference. We don't know what the WSDL will look like; it can be a list of customers or one item, that's not important.
Is this something you have done/seen before? I have looked for examples everywhere I can think of. The code I'm trying to do it with now just acts as a web request.
I would like to show the visitor an adjusted WSDL from the ERP webservice; it has to be modified a little bit to accept a simple login and from then on forward the requests.
I was thinking the visitor logs in first and, after a check, receives a session id. This session id always needs to be added as an extra header value on calls to the original webservice, and I will translate those calls to the ERP webservice.
I hope someone has seen such an implementation and can give me some hints/links.
The webservice can run in ASP, but I'd prefer it to be a simple Windows service.
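A minimal sketch of the forwarding idea as a self-hosted HttpListener follows; the ERP endpoint, the session-header name, and the trivial session check are assumptions to illustrate the flow, not a finished design.

    // Minimal sketch: accept a SOAP POST, require a session header, and
    // forward the body to the ERP endpoint, streaming the response back.
    // The ERP URL, the header name, and the session check are assumptions.
    using System;
    using System.IO;
    using System.Net;

    public class ForwardingProxy
    {
        const string ErpEndpoint = "http://erp.internal/Service.asmx"; // assumed
        readonly HttpListener listener = new HttpListener();

        public void Start()
        {
            listener.Prefixes.Add("http://localhost:8080/erp/");
            listener.Start();
            while (true)
                Handle(listener.GetContext());
        }

        void Handle(HttpListenerContext ctx)
        {
            // Hypothetical extra header carrying the session id from the login step.
            string session = ctx.Request.Headers["X-Session-Id"];
            if (string.IsNullOrEmpty(session))
            {
                ctx.Response.StatusCode = 401;
                ctx.Response.Close();
                return;
            }

            var forward = (HttpWebRequest)WebRequest.Create(ErpEndpoint);
            forward.Method = "POST";
            forward.ContentType = ctx.Request.ContentType;
            // SOAPAction must be copied through for .asmx-style services.
            forward.Headers["SOAPAction"] = ctx.Request.Headers["SOAPAction"];
            using (Stream body = forward.GetRequestStream())
                ctx.Request.InputStream.CopyTo(body);

            using (var response = (HttpWebResponse)forward.GetResponse())
            using (Stream result = response.GetResponseStream())
            {
                ctx.Response.ContentType = response.ContentType;
                result.CopyTo(ctx.Response.OutputStream);
            }
            ctx.Response.Close();
        }
    }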
I found this:
http://www.java2s.com/Code/CSharp/Network/ImplementsamultithreadedWebproxyserver.htm
I don't know if it will work in production, but it looks to be passing it all through.
Please comment if you see issues with it!
I am building a web site in ASP.NET and C# where one of the components involves logging in, on behalf of the user, to a website where the user has an account (for example, a cellular phone company), taking information from that site, and storing it in our database.
I think this action is called "scraping".
Are there any existing products that do this which I could integrate with my software?
I don't need software that does it; I need some sort of SDK that I can integrate with my C# code.
Thanks,
Koby
Use the HtmlAgilityPack to parse the HTML that you get from a web request once you've logged in.
See here for logging in: Login to website, via C#
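The usual pattern is to POST the login form with a CookieContainer attached so the session cookie is kept, then request the protected page and hand the HTML to the HtmlAgilityPack. A sketch, where the URLs, the form field names, and the XPath are all assumptions:

    // Minimal sketch: log in by POSTing the login form, keep the session
    // cookie in a CookieContainer, then fetch a protected page and parse it.
    // All URLs, form-field names, and the XPath are assumptions.
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    public static class LoginScrapeExample
    {
        public static async Task RunAsync(string user, string pass)
        {
            var cookies = new CookieContainer();
            var handler = new HttpClientHandler { CookieContainer = cookies };
            using (var client = new HttpClient(handler))
            {
                // Hypothetical login form and field names.
                var form = new FormUrlEncodedContent(new Dictionary<string, string>
                {
                    { "username", user },
                    { "password", pass }
                });
                await client.PostAsync("https://example.com/login", form);

                // The session cookie is now in the container, so this is authenticated.
                string html = await client.GetStringAsync("https://example.com/account");

                var doc = new HtmlDocument();
                doc.LoadHtml(html);
                var node = doc.DocumentNode.SelectSingleNode("//div[@class='balance']");
                Console.WriteLine(node != null ? node.InnerText.Trim() : "not found");
            }
        }
    }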
I haven't found any product that does it right so far.
One way to handle this is to:
- do the requests yourself
- use http://htmlagilitypack.codeplex.com/ to extract the important information from the downloaded HTML
- save the extracted information yourself
The thing is that, depending on the context, there are so many things to tune/configure that you would need a very large product, and it still wouldn't reach the performance/accuracy of a custom solution:
a) multithreading control
b) extraction rules
c) persistence control
d) web spidering (or how the next link to parse is chosen)
Check the Web Scraping Wikipedia Entry.
However, since what we need to acquire via web scraping is application-specific most of the time, I would say it may be more efficient to scrape whatever you need directly from the web response stream.
I've tried to find a good how-to or example that is suitable for beginners writing their first web crawler. I would like to write it in C#. Does anybody have good example code to share, or tips on sites where I can find info on C# and basic web crawling?
Thanks
HtmlAgilityPack is your friend.
Yes, HtmlAgilityPack is a good tool to parse the HTML, but that is definitely not enough.
There are 3 elements to crawling:
1) Crawling itself, i.e. looping through websites: this could be done by sending requests to random IP addresses, but it does not work well, because many websites share an IP address and are selected by the HTTP Host header, so hitting the IP alone won't reach them. On the other hand, there are far too many IP addresses that are unused or not hosting a web server, so this does not get you anywhere.
I suggest you send requests to Google (searching for words from a dictionary) and crawl the results that come back.
2) Rendering the content: many websites generate their HTML content with JavaScript when the page loads, so if you send a simple request you will not capture the content the way a user would see it. You need to render the page as a browser does, which can be done with WebKit.NET, an open-source tool, although it is still in beta.
3) Comprehending and parsing the HTML: use HtmlAgilityPack; there are tons of examples online. It can be used to discover the links to crawl next as well (see the sketch below).
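A minimal sketch of the basic crawl loop from point 3, assuming HtmlAgilityPack; the seed URL is a placeholder, and politeness concerns (robots.txt, rate limiting) are omitted:

    // Minimal sketch of a crawl loop: keep a queue of URLs, download each
    // page, extract links with HtmlAgilityPack, and enqueue unseen ones.
    // The seed URL is a placeholder; robots.txt handling and rate limiting,
    // which a real crawler needs, are omitted.
    using System;
    using System.Collections.Generic;
    using System.Net;
    using HtmlAgilityPack;

    public static class TinyCrawler
    {
        public static void Crawl(string seed, int maxPages)
        {
            var queue = new Queue<string>();
            var seen = new HashSet<string> { seed };
            queue.Enqueue(seed);
            int crawled = 0;

            using (var client = new WebClient())
            {
                while (queue.Count > 0 && crawled < maxPages)
                {
                    string url = queue.Dequeue();
                    string html;
                    try { html = client.DownloadString(url); }
                    catch (WebException) { continue; } // skip unreachable pages
                    crawled++;
                    Console.WriteLine("Crawled: " + url);

                    var doc = new HtmlDocument();
                    doc.LoadHtml(html);
                    var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
                    if (anchors == null) continue;

                    foreach (var a in anchors)
                    {
                        // Resolve relative links against the current page.
                        Uri next;
                        if (Uri.TryCreate(new Uri(url), a.GetAttributeValue("href", ""), out next)
                            && next.Scheme.StartsWith("http")
                            && seen.Add(next.AbsoluteUri))
                        {
                            queue.Enqueue(next.AbsoluteUri);
                        }
                    }
                }
            }
        }
    }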
A while ago I also wanted to write a custom web crawler, and found this document:
Web Crawler
It has some great info, and is very well written IMO.