Find then save web page to Drive Using C#

I want to find a specific string in a web page and then save the page in which the string was found.
I am using Firefox as my web browser.
Problem:
1. I open a page (containing a random word).
2. My C# program then searches the page. If the word is found, the program automatically saves the page to the drive. If not, the program clicks the Next button on the page and searches again.
Is that possible?

It sounds like you want to do something like the following.
You can use WebClient to load the response from a URL into a string:
string s;
using (WebClient client = new WebClient())
{
    s = client.DownloadString(your_url);
}
You can then search "s" for an occurrence of the string you are looking for using IndexOf:
if (s.IndexOf("string you are searching for") > -1)
{
    // s contains "string you are searching for"
}
Then you can save "s" to disk using a StreamWriter:
using (StreamWriter sw = new StreamWriter("file name"))
{
    sw.WriteLine(s);
}
As for clicking the "Next" button: if you know the URLs of the pages in advance, you can define them as a list of strings and iterate over them, running the code above for each one.
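Putting the three steps together, a minimal sketch might look like the following. The names PageFinder, FindAndSave, and the download parameter are made up for illustration and are not part of any library. The download hook is there so the loop can be tried offline; for real pages you would pass something like url => client.DownloadString(url). The case-insensitive comparison is my own choice, so drop it if you need an exact match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PageFinder
{
    // Walks the URLs in order. The first page whose content contains the term
    // (ignoring case) is written to outputPath and its URL is returned.
    // Returns null when no page contains the term.
    // 'download' is a hypothetical hook that maps a URL to the page content.
    public static string FindAndSave(IEnumerable<string> urls, string term,
                                     string outputPath, Func<string, string> download)
    {
        foreach (string url in urls)
        {
            string html = download(url);
            if (html.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                File.WriteAllText(outputPath, html);
                return url;
            }
        }
        return null;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Stand-in pages (invented for this example) so the sketch runs without a network.
        string[] urls = { "http://example.com/page1", "http://example.com/page2" };
        var pages = new Dictionary<string, string>
        {
            ["http://example.com/page1"] = "<html><body>nothing here</body></html>",
            ["http://example.com/page2"] = "<html><body>The Magic Word</body></html>",
        };

        string output = Path.Combine(Path.GetTempPath(), "found-page.html");
        string hit = PageFinder.FindAndSave(urls, "magic word", output, url => pages[url]);
        Console.WriteLine(hit ?? "no match");
    }
}
```

This only works when the "Next" pages have predictable URLs. If the next page can only be reached by an actual click, a browser automation tool is needed instead.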

Related

How do I download a page with Selenium

I could not find a way to download a whole web page.
All I want is to navigate to https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/ManuelNotManni?
and download it. Is it possible to download the page with Selenium?
I used the following code to navigate to the page:
var options = new ChromeOptions();
using (var driver = new ChromeDriver(".", options))
{
    driver.Navigate().GoToUrl("https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/ManuelNotManni?");
}

You can retrieve the page source with the driver.PageSource property and save it to a file:
var options = new ChromeOptions();
using (var driver = new ChromeDriver(".", options))
{
    driver.Navigate().GoToUrl("https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/ManuelNotManni?");
    await File.WriteAllTextAsync("PageSource.html", driver.PageSource);
}
This works well for downloading JSON.
For HTML pages, however, note this caveat from the PageSource documentation:
If the page has been modified after loading (for example, by JavaScript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server.
References
https://www.selenium.dev/selenium/docs/api/dotnet/html/P_OpenQA_Selenium_IWebDriver_PageSource.htm
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/file-system/how-to-write-to-a-text-file

Get Information from website with C# after Js

I'm working on a small fun project (keep in mind I'm a beginner). I want to grab the info for the songs that have played on this radio channel:
ilikeradio (sorry, the site is in Swedish).
I simply want to put that in a textBox.
I have tried:
WebClient web = new WebClient();
string htmlContent = web.DownloadString(URL);
But this only gave me the page source, not the rendered markup with the list items for artist, song, etc.
Any help is appreciated.

The URL you provided returns HTML, but if you compare the HTML you get with what is rendered in the browser (right-click the web page and inspect the HTML), you will see that what you get differs from what is finally rendered. The reason is that the website uses Ajax to load the song list. In other words, when you call DownloadString(), you get the response from the web server before any JavaScript has run and updated the page.
It is not easy to get the final rendered HTML. But you are in luck!
Go to that website, open the developer tools in Chrome, and click the Network tab. Next, sort the requests by Method so that the GET requests are at the top. Among those GET requests is the one you are looking for:
https://unison.mtgradio.se/api/v2/timeline?channel_id=6&client_id=6690709&to=2018-10-02T08%3A00%3A50&from=2018-10-02T07%3A00%3A50&limit=40
This URL returns JSON, which the page's JavaScript loads and renders for you to see as a "song list".
The JSON returned is a list of songs with some metadata. You will need to parse this JSON to extract and display the list of songs in your own application. I suspect that you can view the source code of that website and find the JavaScript that does this ;)
Newtonsoft's Json.NET (the JsonConvert class and the Newtonsoft.Json.Linq types) is a popular library for parsing JSON.
If you want to view the JSON with the song list, copy the URL above, paste it into your browser's address bar, and hit Enter. Next, copy the JSON result and open it in a JSON viewer that has a Text tab and a Viewer tab. Paste the JSON into the Text tab and then click the Viewer tab. You will see that the first element is the current song, while the other elements make up the song list. Also note that each element has a child element called song, which contains the title.
To get you going, try this:
using System;
using System.Net;
using Newtonsoft.Json.Linq;
public class Program
{
    public static void Main()
    {
        using (WebClient wc = new WebClient())
        {
            var json = wc.DownloadString("https://unison.mtgradio.se/api/v2/timeline?channel_id=6&client_id=6690709&to=2018-10-02T08%3A00%3A50&from=2018-10-02T07%3A00%3A50&limit=40");
            // The response is a JSON array; element 0 is the current song.
            dynamic stuff = JArray.Parse(json);
            string name = stuff[1].song.title;
            Console.WriteLine(name);
        }
    }
}
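If you want every title rather than just one, a sketch along these lines may help. Note that it swaps in System.Text.Json (built into recent .NET versions) instead of Newtonsoft so that it has no external dependency. The class name TimelineParser and the sample JSON are invented for illustration. The only structure assumed is what is described above: a JSON array whose elements have a song child with a title.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class TimelineParser
{
    // Collects song.title from every array element that has one.
    // Elements without a "song" object or a string "title" are skipped.
    public static List<string> GetSongTitles(string json)
    {
        var titles = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                if (item.ValueKind == JsonValueKind.Object &&
                    item.TryGetProperty("song", out JsonElement song) &&
                    song.ValueKind == JsonValueKind.Object &&
                    song.TryGetProperty("title", out JsonElement title) &&
                    title.ValueKind == JsonValueKind.String)
                {
                    titles.Add(title.GetString());
                }
            }
        }
        return titles;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up sample in the assumed shape; the real response has more fields.
        string sample = "[{\"song\":{\"title\":\"Current\"}},{\"song\":{\"title\":\"Earlier\"}},{\"other\":1}]";
        foreach (string t in TimelineParser.GetSongTitles(sample))
        {
            Console.WriteLine(t);
        }
    }
}
```

In a Windows Forms app you could then join the list with Environment.NewLine and assign the result to the textBox.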
NOTE
By the time you try this, the song name printed to the console will probably no longer appear in the list on the web page. This is because the JSON URL above contains query parameters, two of which (from and to) are dates and times. You will need to modify the URL accordingly to get the most recent playlist (the one displayed on the website right now).
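One way to rebuild the URL for a given moment is sketched below. TimelineUrl.Build is a hypothetical helper, not something the site provides. The channel_id, client_id, and limit defaults are simply copied from the URL above, and I am assuming the server accepts any from/to pair in the same timestamp format. Whether it expects local time or UTC is something you would have to check against the live site.

```csharp
using System;
using System.Globalization;

public static class TimelineUrl
{
    // Builds the timeline URL for the window [to - window, to].
    // Timestamps use the same yyyy-MM-ddTHH:mm:ss form as the original URL,
    // and Uri.EscapeDataString turns each ':' into %3A.
    public static string Build(DateTime to, TimeSpan window,
                               int channelId = 6, int clientId = 6690709, int limit = 40)
    {
        const string format = "yyyy-MM-dd'T'HH:mm:ss";
        string toText = to.ToString(format, CultureInfo.InvariantCulture);
        string fromText = (to - window).ToString(format, CultureInfo.InvariantCulture);

        return "https://unison.mtgradio.se/api/v2/timeline"
            + "?channel_id=" + channelId
            + "&client_id=" + clientId
            + "&to=" + Uri.EscapeDataString(toText)
            + "&from=" + Uri.EscapeDataString(fromText)
            + "&limit=" + limit;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Last hour, using the local clock (an assumption about what the server expects).
        Console.WriteLine(TimelineUrl.Build(DateTime.Now, TimeSpan.FromHours(1)));
    }
}
```

Calling Build with 2018-10-02 08:00:50 and a one-hour window reproduces the URL shown above, which is a quick way to check the formatting.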

Downloading data from a hyperlink on a webpage

I wish to download some data on a daily basis. I can get the data manually by loading the web page with the data and clicking a link near the top right-hand corner called 'History Download'. This link opens an Excel file with the data I require.
Using either C# or VBA, is there any way to automate this process, and if so, how?
Edit
Here is the code I currently have. It downloads a text file containing HTML, although the HTML looks like that of the home page rather than the page I requested. I was hoping this link would download the data as an Excel file. I originally saved the file with an .xlsx extension, but that did not work, so the code below saves it as .txt.
using System.Net;
class Program
{
    static void Main(string[] args)
    {
        string path = "http://www.ishares.com/uk/institutional/en/products/251382/ishares-msci-world-minimum-volatility-ucits-etf/1393511975017.ajax?fileType=xls&fileName=iShares-MSCI-World-Minimum-Volatility-UCITS-ETF";
        string pathSave = @"C:\MyFolder\test.txt";
        WebClient wc = new WebClient();
        wc.DownloadFile(path, pathSave);
    }
}

Open PDF file in a specific page using pdfbox

I have a program that searches for text, for example a sentence, in all PDF files in a folder.
It works perfectly.
But I would like to add a feature that opens the file at the exact page containing that sentence.
I looked through the PDFBox documentation and could not find anything specific for this.
I may have overlooked something, so if somebody could enlighten me I would be very grateful.
Thank you

I read your question earlier this week. At the time, I didn't have an answer for you. Then I stumbled on the methods setStartPage() and setEndPage() in the PDFBox documentation for the PDFTextStripper class, which made me think of your question and led to this answer. It's been about 4 months since you asked the question, but maybe this will help someone. I know I learned a thing or two while writing it.
When you search a PDF file, you can search a range of pages. The functions setStartPage() and setEndPage() set the range of pages you are searching. If we set the start and end page to the same page number, then we will know which page the search term was found on.
In the code below, I am using a Windows Forms application, but you can adapt my code to fit your application.
using System;
using System.Windows.Forms;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;
//The Diagnostics namespace is needed to specify PDF open parameters. More on them later.
using System.Diagnostics;
//specify the string you are searching for (lowercase, because the page text is lowercased below)
string searchTerm = "golden";
//I am using a static file path
string pdfFilePath = @"F:\myFile.pdf";
//load the document
PDDocument document = PDDocument.load(pdfFilePath);
//get the number of pages
int numberOfPages = document.getNumberOfPages();
//create an instance of text stripper to get text from pdf document
PDFTextStripper stripper = new PDFTextStripper();
//loop through all the pages. We will search page by page
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++)
{
    //set the start page
    stripper.setStartPage(pageNumber);
    //set the end page
    stripper.setEndPage(pageNumber);
    //get the text from the page range we set above.
    //in this case we are searching one page.
    //I used the ToLower method to make all the text lowercase
    string pdfText = stripper.getText(document).ToLower();
    //just for fun, display the text on each page in a messagebox. My pdf file only has two pages. But this might be annoying to you if you have more.
    MessageBox.Show(pdfText);
    //search the pdfText for the search term
    if (pdfText.Contains(searchTerm))
    {
        //just for fun, display the page number on which we found the search term
        MessageBox.Show("Found the search term on page " + pageNumber);
        //create a process. We will be opening the pdf document to a specific page number
        Process myProcess = new Process();
        //I specified Adobe Acrobat as the program to open
        myProcess.StartInfo.FileName = "Acrobat.exe";
        //see link below for info on PDF document open parameters
        //the file path is a separate, quoted argument after the open parameters
        myProcess.StartInfo.Arguments = "/A \"page=" + pageNumber + "=OpenActions\" \"" + pdfFilePath + "\"";
        //Start the process
        myProcess.Start();
        //break out of the loop. we found our search term and we opened the PDF file
        break;
    }
}
//close the document we opened.
document.close();
Check out this Adobe pdf document on setting opening parameters of the PDF file:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
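The loop above stops at the first hit. If you would rather collect every page that contains the term, the same page-by-page idea can be sketched as below. PageSearch, FindPages, and the getPageText parameter are hypothetical names of my own. With PDFBox you would pass a lambda that calls setStartPage(p), setEndPage(p), and getText(document) for page p. The sample pages in Main are invented so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;

public static class PageSearch
{
    // Returns the 1-based numbers of all pages whose text contains searchTerm,
    // ignoring case. 'getPageText' is a hypothetical hook that returns the
    // text of a single page, for example via a text stripper limited to that page.
    public static List<int> FindPages(int numberOfPages, string searchTerm,
                                      Func<int, string> getPageText)
    {
        var hits = new List<int>();
        for (int page = 1; page <= numberOfPages; page++)
        {
            string text = getPageText(page);
            if (text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(page);
            }
        }
        return hits;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Invented page texts standing in for a real document.
        string[] pages = { "intro", "the Golden gate", "appendix", "golden rule" };
        List<int> hits = PageSearch.FindPages(pages.Length, "golden", p => pages[p - 1]);
        Console.WriteLine(string.Join(", ", hits));
    }
}
```

Comparing with OrdinalIgnoreCase avoids having to lowercase both the page text and the search term.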

asp.net C# get final page source of a webpage

I'm trying to get the FINAL source of a web page. I am using the WebClient OpenRead method, but it only returns the initial page source. After the source downloads, JavaScript runs and rewrites the data I need into a different format, so my method ends up looking for content that has been completely changed.
What I am talking about is exactly the difference between:
1. Right-clicking on a web page and selecting View Source.
2. Opening the developer tools and inspecting the page.
Look at this site to see what I mean: http://www.augsburg.edu/history/fac_listing.html and compare how any of the email addresses is displayed with each option. I think the first shows the initial load of the page, while the second shows the final page HTML. WebClient only lets me do option #1.
Here is the code, which only returns option #1. I need to do this from a console application. Thank you!
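private static string GetReader(string site)
{
    try
    {
        // using blocks dispose the client and streams even if reading fails
        using (WebClient client = new WebClient())
        using (Stream data = client.OpenRead(site))
        using (StreamReader reader = new StreamReader(data))
        {
            return reader.ReadToEnd();
        }
    }
    catch
    {
        return "";
    }
}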

I've found a solution to my problem.
I ended up using the Selenium WebDriver PageSource property. It worked beautifully!
Selenium and WebDriver are easy to learn, and they are useful both for testing and for tasks like this one.
