So I have a web scraping project where one of the pages has all the necessary content in JSON format inside a set of <script> tags.
Here's an example of the <script> tags:
<script>
window.postData = {}
window.postData["content"] = [json content]
</script>
I've used the HtmlAgilityPack to get to the particular <script> tags, but I am not sure how to grab just the JSON content from them. I can parse the JSON with JSON.net or another library, so I'm not worried about that part. I'm just stuck on extracting the JSON itself. Is there a JavaScript parsing library I can use for this, or is there another way to accomplish it?
Any help would be greatly appreciated!
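Check out Jint. Note that window is not defined in Jint, so declare it before running the script, and read the result back through a global variable:
var postDataJSON = new Engine()
    .Execute("var window = {};")
    .Execute("window.postData = {}; window.postData['content'] = [json content]")
    .Execute("var result = JSON.stringify(window.postData['content']);")
    .GetValue("result")
    .AsString();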
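If running a full JavaScript engine is more than you need, plain string handling may be enough. Here is a minimal sketch, written in JavaScript for brevity (the same steps should port to C# with Regex and JSON.net). It assumes the content assignment is the last statement in the script and that the assigned value is strict JSON. extractPostDataContent is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: pulls the JSON assigned to window.postData["content"]
// out of the raw text of a <script> tag.
// Assumption: the assignment is the last statement in the script.
function extractPostDataContent(scriptText) {
  const marker = /window\.postData\[["']content["']\]\s*=\s*/;
  const match = marker.exec(scriptText);
  if (!match) {
    return null; // the script does not contain the expected assignment
  }
  // Keep everything after the "=" and drop an optional trailing semicolon.
  const raw = scriptText
    .slice(match.index + match[0].length)
    .trim()
    .replace(/;$/, "");
  return JSON.parse(raw); // throws if the value is not strict JSON
}

// Usage with made-up sample data:
const scriptText =
  'window.postData = {}\nwindow.postData["content"] = [{"id": 1, "title": "a"}]\n';
const posts = extractPostDataContent(scriptText);
console.log(posts);
```

If the page puts more statements after the assignment, the slice would need a real end marker, and at that point a JavaScript engine is probably the safer route.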
My API selects some filtered data from the database.
My TypeScript:
getItems(fromId: string, os_params: any) {
// var encodedJSON =encodeURIComponent(JSON.stringify(os_params))
var httpOptions = null;
if (os_params) {
httpOptions = {
headers: new HttpHeaders({
// 'Content-Type': 'application/json;charset=utf-8',
// 'Accept': 'application/json,text/*;q=0.99',
'OS_Params': JSON.stringify(os_params) //encodedJSON
//,'Authorizationx': 'os-auth-token'
})
};
}
return this.http.get(`${environment.apiUrl}/api/quate?key=${fromId}`, httpOptions);
}
os_params is a JSON parameter holding the filters, for example { "first_name":"عميل 1" }.
The code throws an exception (Unexpected end of input).
When I change the value to English it works fine, so I have to use encoding, but for that I would also have to change my C# code, which I need somewhere else.
So I'm really stuck.
Any help, please?
Looks like you need to add this to your HTML page header section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
While searching the web I found that I cannot send non-English characters in a header without encoding them, so I had to decode the value in my back-end code.
I used HttpUtility.UrlDecode() in C# and then deserialized the result.
If anyone has another answer, please tell me.
Thanks
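For reference, the client side of that encode/decode round trip can be sketched as follows. This is only an illustration: encodeHeaderJson and decodeHeaderJson are hypothetical helper names, and the decode function stands in for what HttpUtility.UrlDecode() plus deserialization do on the server.

```javascript
// Hypothetical helpers illustrating the round trip.
// encodeURIComponent percent-encodes the UTF-8 bytes, so the header value
// ends up containing ASCII characters only.
function encodeHeaderJson(params) {
  return encodeURIComponent(JSON.stringify(params));
}

// Stand-in for the server side (UrlDecode, then deserialize).
function decodeHeaderJson(headerValue) {
  return JSON.parse(decodeURIComponent(headerValue));
}

const encoded = encodeHeaderJson({ first_name: "عميل 1" });
console.log(encoded);                   // percent-encoded, ASCII-only string
console.log(decodeHeaderJson(encoded)); // the original filter object
```

In the Angular code above, this would correspond to passing the encodedJSON value (currently commented out) as the OS_Params header instead of the raw JSON.stringify output.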
Is there a way to get the fully rendered html of a web page using WebClient instead of the page source? I'm trying to scrape some data from the page's html. My current code is like this:
WebClient client = new WebClient();
var result = client.DownloadString("https://somepageoutthere.com/");
//using CsQuery
CQ dom = result;
var someElementHtml = dom["body > main"];
WebClient will only return the content at the URL you requested. It will not run any JavaScript on the page (which runs on the client), so if JavaScript changes the page DOM in any way, you will not get that through WebClient.
You are better off using other tools. Look for ones that render the HTML and run the JavaScript in the page.
I don't know what you mean by "fully rendered", but if you mean "with all data loaded by Ajax calls", the answer is: no, you can't.
Data that is not present in the initial HTML page is loaded by JavaScript running in the browser. WebClient has no idea what JavaScript is and cannot interpret it; only browsers can.
To get this kind of data, you need to identify these calls (if you don't know the URL of the data web service, you can use a tool like Fiddler), replay them from your application, and then extract the data from the response. That is easy if the data comes as JSON and trickier if it comes as HTML.
Better to use the Html Agility Pack: http://html-agility-pack.net
It has the functionality needed to scrape web data, and the site has good documentation.
Can anyone help me get the data details from a JSON model?
I am using a WCF service which returns JSON data. I am sure it runs well because I tried it from WebClient.
But I want to show the data in my HTML site. I am using the following code, but it doesn't work.
success: function (msg) {
    var result = eval("("+msg+")");
    $.each(result.UserLoginResult.d, function (i, item) {
        alert(item.name);
    });
}
This has been really frustrating.
I have searched Google for hours and could not find an example that helps, so any help would be appreciated. :(
Thank you all. I finally found the problem and fixed it.
jQuery already returns a parsed JSON object, not a string, so eval() is not needed at all.
Just use msg.d[index][index]!
Happy coding,
Rocky
Did you try the JSON.parse(msg) method?
Then you can simply console.log the answer and find out exactly what to do next.
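Since the response may arrive either as a JSON string or as an object that jQuery has already parsed, a small guard can handle both cases before you inspect it. A minimal sketch, assuming those are the only two shapes you receive; toObject is a hypothetical helper name and the sample data is made up.

```javascript
// Hypothetical helper: parse only when the response is still a string.
function toObject(msg) {
  return typeof msg === "string" ? JSON.parse(msg) : msg;
}

// Both calls yield the same structure:
const fromString = toObject('{"d": [{"name": "Rocky"}]}');
const fromObject = toObject({ d: [{ name: "Rocky" }] });
console.log(fromString.d[0].name, fromObject.d[0].name);
```

Calling JSON.parse on an object that is already parsed would throw, which is why the type check comes first.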
Here is a sample showing how to do this:
$(document).ready(function() {
var jsonp = '[{"Lang":"jQuery","ID":"1"},{"Lang":"C#","ID":"2"}]';
var lang = '';
var obj = $.parseJSON(jsonp);
$.each(obj, function() {
lang += this['Lang'] + "<br/>";
});
$('span').html(lang);
});
Output: jQuery C#
or you can use the $.getJSON method:
I'm doing a project where I take half of an image from one source and the other half from another source, then merge them together.
In C# it works like this:
HttpWebRequest request1 = (HttpWebRequest)WebRequest.Create("URL");
request1.AddRange(0, 9999); // ranges are inclusive, so stop before byte 10000 to avoid overlapping request2
HttpWebRequest request2 = (HttpWebRequest)WebRequest.Create("URL2");
request2.AddRange(10000, 20000);
Then I read the streams, merge them into a buffer, and write the buffer to a file.
Now I have to create a plugin that does the same thing.
As far as I know, I can create an extension for Firefox with JavaScript.
Do you think it is possible to do the same thing in JavaScript, or should I look for another method? I don't know yet how to create a plugin, so I don't know which programming languages I can use (maybe I can even use C# or Java to create a Firefox plugin directly).
Can you give me some tips? Thanks a lot.
Yes, you can definitely do it with Ajax.
Here is an example:
$(function() {
$.ajax({
url: 'range-test.txt',
headers: {Range: "bytes=618-647"},
success: function( data ) { $('#results').html( data ); }
});
});
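For the merging half of the problem, here is a rough sketch of the two pieces that do not depend on any particular Ajax library: building the Range header value and joining two binary chunks. It assumes both servers honor range requests and that the responses are read as binary (for example as Uint8Array), not as text; rangeHeader and mergeChunks are hypothetical helper names.

```javascript
// Hypothetical helper: HTTP byte ranges are inclusive at both ends,
// so the first 10000 bytes are "bytes=0-9999".
function rangeHeader(firstByte, lastByte) {
  return "bytes=" + firstByte + "-" + lastByte;
}

// Hypothetical helper: join two binary chunks into one buffer.
function mergeChunks(firstHalf, secondHalf) {
  const merged = new Uint8Array(firstHalf.length + secondHalf.length);
  merged.set(firstHalf, 0);                 // copy the first chunk to the start
  merged.set(secondHalf, firstHalf.length); // append the second chunk after it
  return merged;
}

// Usage with made-up bytes standing in for the two downloaded halves:
const merged = mergeChunks(Uint8Array.of(1, 2, 3), Uint8Array.of(4, 5));
console.log(rangeHeader(0, 9999), rangeHeader(10000, 19999), merged.length);
```

Reading the response as text, as the jQuery snippet above does, is fine for a .txt file but would likely corrupt image bytes, so a binary response type is the safer assumption for this use case.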
I am working on a C# console application using the Nancy Framework and the Spark view engine, and I am trying to replicate something from another project. However, I am very inexperienced with both Javascript and JSON. To call a chat function in my C# code from my HTML, right now I simply use something like the following...
HTML:
http://localhost:1234/sendchat?message="this is a test message"
C# Code:
Get["/sendchat"] = x =>
{
string message = Request.Query.message;
string message2 = message.Replace("\"", "");
Console.WriteLine(message2);
return View["console.spark"];
};
The problem is that this causes the page to reload. In the project I am using as a reference, they use JavaScript/JSON to call the same type of function without a page reload. I understand all of it except the $.getJSON line, because I don't understand what dataSource is...
$(document).ready(function () {
    $("#typechat").keypress(function (event) {
        if (event.which === 13) { // Enter key
            event.preventDefault();
            // .val() reads the current text; encodeURIComponent replaces the deprecated escape()
            var message = encodeURIComponent($("#typechat").val());
            $.getJSON(dataSource + "?req=sendchat&message=" + message);
            $("#typechat").val("");
        }
    });
});
dataSource is just an http domain like http://yourserver.com/possibly/with/a/path. It'll be a string defined somewhere in the code.
JSON resources are fetched just like regular HTML pages, with a normal GET request over HTTP. The only difference is the content is JSON not HTML. Try this in your browser for example to see the JSON returned by the SO api:
http://api.stackoverflow.com/1.1/users/183579
(If you don't have a browser plugin to format/highlight JSON nicely it might just look like a long messy string)
dataSource is probably the URL of some web page, for example:
dataSource = "http://somepage.com/someaction";
which renders its response as JSON text. The response is fetched and then parsed into a JavaScript object.
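To make the URL construction concrete, here is a small sketch of how the request address is assembled from dataSource and the typed message. The dataSource value and the buildChatUrl helper are assumptions for illustration only; the real project defines its own dataSource somewhere.

```javascript
// Assumed value for illustration; the real project defines its own dataSource.
const dataSource = "http://localhost:1234/api";

// Hypothetical helper: builds the URL that $.getJSON would request.
function buildChatUrl(base, message) {
  // encodeURIComponent makes spaces, quotes and non-ASCII text safe in a query string
  return base + "?req=sendchat&message=" + encodeURIComponent(message);
}

console.log(buildChatUrl(dataSource, "this is a test message"));
// → http://localhost:1234/api?req=sendchat&message=this%20is%20a%20test%20message
```

Because the request is made in the background, the server route only needs to return a small JSON response; the browser stays on the current page.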