Trying to download a string and can't search its contents - C#

I am using WebClient to download an HTML string from a website, and then I try to manipulate the string using Substring and IndexOf.
When I use Substring, IndexOf, or Contains, a strange thing happens:
sometimes it shows text (HTML code) and sometimes it doesn't show anything at all.
using (WebClient client = new WebClient())
{
htmlCode = client.DownloadString("https://www.google.com");
}
This is my code for getting the HTML code from a web site.
Now, for example, on this site I want to get the source of an image - a specific img tag (or another attribute).
using (StringReader reader = new StringReader(htmlCode))
{
string inputLine;
while ((inputLine = reader.ReadLine()) != null)
{
if (inputLine.Contains("img"))
{
RichTextBox.Text += inputLine;
}
}
}
There may be some syntax problems, but please ignore them; they are not important.
Do you have an alternative or better way to get the HTML source code of a page and work with it? It has to work with HTTPS sites, and I would like a good explanation of it.
Sorry for the noob question.
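One alternative, sketched under the assumption that the HtmlAgilityPack NuGet package is available: fetch the page with HttpClient (which handles HTTPS transparently) and query the parsed document instead of scanning lines for "img".
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // GetStringAsync works for both HTTP and HTTPS URLs.
            string htmlCode = await client.GetStringAsync("https://www.google.com");

            // Parse the markup instead of searching the raw string.
            var doc = new HtmlDocument();
            doc.LoadHtml(htmlCode);

            // SelectNodes returns null when nothing matches, so guard against that.
            var images = doc.DocumentNode.SelectNodes("//img[@src]");
            if (images != null)
            {
                foreach (var img in images)
                    Console.WriteLine(img.GetAttributeValue("src", ""));
            }
        }
    }
}
Because the document is parsed into a tree, attribute values come back whole, which avoids the hit-or-miss results of Substring and IndexOf on raw markup.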

Related

How to load an XML file in C#

Can anyone explain why I am unable to get updated XML content from a URL? I have an XML file which updates frequently, but in my application I get the old data until I restart the application.
Here is the code that I have tried:
XmlDocument doc = new XmlDocument();
string str;
using (var wc = new WebClient())
{
str = wc.DownloadString(location.AbsoluteUri);
}
doc.LoadXml(str);
I also tried the code below:
WebRequest req = WebRequest.Create("url");
using (Stream stream = req.GetResponse().GetResponseStream())
{
doc.Load(stream);
}
I got to know that raw GitHub content takes time to update across all of its servers, so that is why it was taking time to update. You can use other web services to get the result you want.
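If the stale data comes from the HTTP cache rather than the server, disabling caching on the request usually helps. A minimal sketch (location is the Uri variable from the question):
using System.Net;
using System.Net.Cache;
using System.Xml;

XmlDocument doc = new XmlDocument();
string str;
using (var wc = new WebClient())
{
    // Ask the request pipeline to neither serve nor store cached copies.
    wc.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
    str = wc.DownloadString(location.AbsoluteUri);
}
doc.LoadXml(str);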

How to execute all http-links from a website?

I have a task to write a C# program which finds all HTTP links on a website. So far I have written this function:
static async Task DownloadWebPage(string url)
{
using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = await client.GetAsync(url))
using (HttpContent content = response.Content)
{
string[] resArr;
string result = await content.ReadAsStringAsync();
resArr = result.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries); // splitting
// here must be some code which finds all necessary http-links in resArr
Console.WriteLine("Main page of " + url + " size = " + result.Length.ToString());
}
}
Using this function I load the web page content into a string, then I parse the string and write the results to an array, using "href" as the splitter. Then I check every array element for the "href" substring, so I can get the strings that contain HTTP links. The problem starts when the string is split, because then it is impossible to find the HTTP links; I think this is due to the content format of the string. How can I fix it?
I once did something similar. My solution was to change the HTML so that it fit the XML rules.
(Here could be the problem with this solution: I believe my HTML was in some way predefined, so I only had to change a few things which I knew were not XML-conformant in the HTML.)
After this you could simply search the "a" nodes and read the href attribute.
Unfortunately, I can't find my code anymore; it was too long ago.
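Rather than hand-converting the HTML to valid XML, a parser such as HtmlAgilityPack tolerates real-world markup. A minimal sketch of extracting the links that way (the method name is illustrative):
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

static async Task<List<string>> GetHttpLinksAsync(string url)
{
    var links = new List<string>();
    using (var client = new HttpClient())
    {
        string html = await client.GetStringAsync(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Walk the anchor nodes and keep absolute http(s) hrefs.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var a in anchors)
            {
                string href = a.GetAttributeValue("href", "");
                if (href.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                    links.Add(href);
            }
        }
    }
    return links;
}
This sidesteps the splitting problem entirely: the parser finds the href attributes regardless of how the raw string is formatted.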

How to open a txt file on localhost and change its content

I want to open a CSS file using C# 4.5 and change only one line at a time.
Doing it like this gives me the exception "URI formats are not supported".
What is the most effective way to do it?
Can I find the line and replace it without reading the whole file?
Can I find the line I am looking for and then insert text until the cursor points at some character?
public void ChangeColor()
{
string text = File.ReadAllText("http://localhost:8080/game/Css/style.css");
text = text.Replace("class='replace'", "new value");
File.WriteAllText("D://p.htm", text);
}
I believe File.ReadAllText is expecting a file path, not a URL.
No, you cannot search/replace sections of a text file without reading and re-writing the whole file. It's just a text file, not a database.
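For a file the server process can reach through the file system, the whole-file read-modify-rewrite pattern looks like this (the local path is hypothetical):
using System.IO;

// Read the whole file, change it in memory, and write it back.
string css = File.ReadAllText(@"C:\inetpub\game\Css\style.css");
css = css.Replace("class='replace'", "new value");
File.WriteAllText(@"C:\inetpub\game\Css\style.css", css);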
The most effective way is to declare any control whose CSS you want to alter as runat="server" and then modify its CssClass property. There is no good alternative for modifying the CSS file directly; any other approach is just that - a hack, and a very inefficient way to do it.
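A minimal sketch of that approach, assuming an ASP.NET Web Forms page with a server-side control (control and class names are hypothetical):
<%-- In the .aspx markup --%>
<asp:Panel ID="gamePanel" runat="server" CssClass="old-style" />

// In the code-behind
protected void ChangeColor()
{
    // Swap the class on the server control instead of editing style.css on disk.
    gamePanel.CssClass = "new-style";
}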
As mentioned before, File.ReadAllText does not support URLs. The following is a working example with WebRequest:
{
Uri uri = new Uri("http://localhost:8080/game/Css/style.css");
WebRequest req = WebRequest.Create(uri);
string content = string.Empty;
using (WebResponse web = req.GetResponse())
using (Stream stream = web.GetResponseStream())
using (StreamReader sr = new StreamReader(stream))
{
content = sr.ReadToEnd();
}
// Replace returns a new string, so the result must be assigned back
content = content.Replace("class='replace'", "new value");
using (StreamWriter sw = new StreamWriter("D://p.htm"))
{
sw.Write(content);
sw.Flush();
}
}

Retrieve HTML from links on page

I am using the following method to retrieve the source code from my website-
class WorkerClass1
{
public static string getSourceCode(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
return sr.ReadToEnd();
}
}
}
And then implement the WorkerClass1 as so-
private void button1_Click(object sender, EventArgs e)
{
string url = textBox1.Text;
string sourceCode = WorkerClass1.getSourceCode(url);
using (StreamWriter sw = new StreamWriter(@"path"))
{
sw.Write(sourceCode);
}
}
This works great and retrieves the HTML from my home page; however, there are links at the bottom of the page which I want to follow once the first page has been retrieved.
Is there a way I could modify my current code to do this?
Yes, of course.
What I would do is read the HTML with a regular expression looking for links. For each match, I would put the link in a queue or similar data structure, and then fetch each queued page with the same method.
Consider looking at HtmlAgilityPack for the parsing; it might be easier, even though a regex for finding links should be quite simple to find using Google.
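A minimal sketch of that queue-based crawl, using HtmlAgilityPack for the link extraction instead of a regex (getSourceCode is the method from the question; the page limit is an assumption to keep the crawl bounded):
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static void Crawl(string startUrl, int maxPages)
{
    var queue = new Queue<string>();
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    queue.Enqueue(startUrl);

    while (queue.Count > 0 && seen.Count < maxPages)
    {
        string url = queue.Dequeue();
        if (!seen.Add(url))
            continue; // already fetched this page

        string sourceCode = WorkerClass1.getSourceCode(url);

        var doc = new HtmlDocument();
        doc.LoadHtml(sourceCode);

        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            continue;

        foreach (var a in anchors)
        {
            string href = a.GetAttributeValue("href", "");
            if (href.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                queue.Enqueue(href); // follow the link on a later iteration
        }
    }
}
The seen set stops the crawler from re-fetching pages that link to each other, which is the usual trap with a naive queue.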

How could I download a file like .doc or .pdf from the internet to my hard drive using C#

How could I download a file like a .doc or .pdf from the internet to my hard drive using C#?
using (var client = new System.Net.WebClient())
{
client.DownloadFile( "url", "localFilename");
}
The simplest way is to use WebClient.DownloadFile.
use the WebClient class:
using(WebClient wc = new WebClient())
wc.DownloadFile("http://a.com/foo.pdf", @"D:\foo.pdf");
Edit based on comments:
Based on your comments, I think what you are trying to do is download e.g. PDF files that are linked to from an HTML page. In that case you can:
1. Download the page (with WebClient, see above)
2. Use HtmlAgilityPack to find all the links within the page that point to PDF files
3. Download the PDF files
I am developing a crawler where, if I specify a keyword (e.g. SHA algorithm) and select the option .pdf or .doc, the crawler should download files of the selected format into a targeted folder.
Based on your clarification, here is a solution that uses Google to get the results of the search:
DownloadSearchHits("SHA", "pdf");
...
public static void DownloadSearchHits(string searchTerm, string fileType)
{
using (WebClient wc = new WebClient())
{
string html = wc.DownloadString(string.Format("http://www.google.com/search?q={0}+filetype%3A{1}", searchTerm, fileType));
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var pdfLinks = doc.DocumentNode
.SelectNodes("//a")
.Where(link => link.Attributes["href"] != null
&& link.Attributes["href"].Value.EndsWith("." + fileType))
.Select(link => link.Attributes["href"].Value)
.ToList();
int index = 0;
foreach (string pdfUrl in pdfLinks)
{
wc.DownloadFile(pdfUrl,
string.Format(#"C:\download\{0}.{1}",
index++,
fileType));
}
}
}
In general, though, you should ask a question about a particular problem you have with an implementation you already have; based on your question, you are very far from being able to implement a standalone crawler.
Use WebClient.DownloadFile() from System.Net
Using WebClient.DownloadFile
http://msdn.microsoft.com/en-us/library/system.net.webclient.downloadfile.aspx
using (var client = new WebClient())
{
client.DownloadFile(url, filename); // DownloadFile writes straight to disk and returns void
}
