Google url scraping not looping lines in list box - c#

I am trying to scrape results from a Google search; the tool needs to browse the result pages one by one. However, it does not process every entry in the list box. It only works for the first line of the list box.
Start button code:
foreach (string url in urlList.Items)
{
webBrowser1.Navigate("https://www.google.com/search?q=" + url);
await PageLoad(30, 5);
MessageBox.Show("sdsaD3");
string pageSource = webBrowser1.DocumentText;
Scrape(pageSource);
}
--
Scrape Method
private async void Scrape(string pageSource)
{
string regexExpression = "(?<=><div class=\"rc\"><div class=\"r\"><a href=\")(.*?)(?=\" onmousedown=)";
Regex match = new Regex(regexExpression, RegexOptions.Singleline);
MatchCollection collection = Regex.Matches(pageSource, regexExpression);
for (int i = 0; i < collection.Count; i++)
{
CommonCodes.WriteToTxt(collection[i].ToString(), "googlescrapedurls.txt");
if (i == collection.Count - 1)
{
var elementid = webBrowser1.Document.GetElementById("pnnext");
if (elementid != null)
{
for (int w = 0; w < 1; w++)
{
BackgroundWorker worker = new BackgroundWorker();
worker.DoWork += new DoWorkEventHandler(backgroundWorker1_DoWork);
worker.RunWorkerAsync(w);
}
}
else if(webBrowser1.Document.GetElementById("pnnext") == null)
{
for(int pg=0; pg< urlList.Items.Count; pg++)
{
webBrowser1.Navigate("https://www.google.com/search?q=" + urlList.Items[pg+1]);
CommonCodes.WaitXSeconds(10);
//await PageLoad(30, 5);
Scrape(webBrowser1.DocumentText);
}
}
}
}
--
Background worker code:
BackgroundWorker backgroundWorker = sender as BackgroundWorker;
webBrowser1.Invoke(new Action(() => { gCaptcha(); }));
webBrowser1.Invoke(new Action(() => { webBrowser1.Document.GetElementById("pnnext").InvokeMember("Click"); }));
await PageLoad(30, 5);
webBrowser1.Invoke(new Action(() => { Scrape(webBrowser1.DocumentText); }));
PageLoad code:
try
{
TaskCompletionSource<bool> PageLoaded = null;
PageLoaded = new TaskCompletionSource<bool>();
int TimeElapsed = 0;
webBrowser1.DocumentCompleted += (s, e) =>
{
if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;
if (PageLoaded.Task.IsCompleted) return; PageLoaded.SetResult(true);
};
//
while (PageLoaded.Task.Status != TaskStatus.RanToCompletion)
{
await Task.Delay(delay * 1000);//interval of 10 ms worked good for me
TimeElapsed++;
if (TimeElapsed >= TimeOut * 100) PageLoaded.TrySetResult(true);
}
}
catch (Exception ex)
{
CommonCodes.WriteLog(ex.ToString());
MessageBox.Show(ex.Message);
}
--
The main problem is that when I have 5 lines in the list box, it goes through every page and scrapes the URLs for the first line only; for the other lines it does not work properly. I don't understand what is wrong with the code. Somehow the line
MessageBox.Show("sdsaD3");
executes multiple times (with 5 lines in the list box, this message box pops up 5 times). Thanks for the help.
Edit: I found that the issue seems to be with await PageLoad(30, 5); but I am not sure how to invoke an async method. Does anyone have an idea?
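One thing that stands out in the PageLoad code above is that every call adds another DocumentCompleted handler and never removes it, and the result is then polled in a loop. A minimal sketch of awaiting a single "loaded" event with a timeout follows. FakeBrowser, Loaded, WaitForLoadAsync and ScrapeAllAsync are hypothetical names that stand in for the WebBrowser control and its DocumentCompleted event so that the sketch runs as a console program; it is an illustration under those assumptions, not a drop-in fix.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a browser control: raises Loaded some time after Navigate.
public sealed class FakeBrowser
{
    public event EventHandler Loaded;

    public void Navigate(string url, int loadMilliseconds)
    {
        // Simulate a page load that finishes later on another thread.
        Task.Delay(loadMilliseconds).ContinueWith(_ => Loaded?.Invoke(this, EventArgs.Empty));
    }
}

public static class PageLoadHelper
{
    // Returns true if Loaded fired before the timeout, false otherwise.
    // The handler is attached when the method is called and always removed again.
    public static async Task<bool> WaitForLoadAsync(FakeBrowser browser, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler handler = (s, e) => tcs.TrySetResult(true);

        browser.Loaded += handler;
        try
        {
            Task finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout));
            return finished == tcs.Task;
        }
        finally
        {
            browser.Loaded -= handler;
        }
    }
}

public static class ScrapeDemo
{
    // Visits one query after another and counts how many pages reported "loaded".
    public static async Task<int> ScrapeAllAsync(string[] queries)
    {
        var browser = new FakeBrowser();
        int loadedCount = 0;
        foreach (string query in queries)
        {
            // Subscribe first, then navigate, so the event cannot be missed.
            Task<bool> wait = PageLoadHelper.WaitForLoadAsync(browser, TimeSpan.FromSeconds(5));
            browser.Navigate("https://www.google.com/search?q=" + Uri.EscapeDataString(query), 50);
            if (await wait) loadedCount++;
        }
        return loadedCount;
    }

    public static void Main()
    {
        int loaded = ScrapeAllAsync(new[] { "first", "second", "third" }).GetAwaiter().GetResult();
        Console.WriteLine(loaded + " pages loaded");
    }
}
```

In a WinForms click handler the loop would be awaited directly instead of blocking with GetAwaiter().GetResult(), and Scrape would return Task rather than being async void so that the caller can wait for it.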

Related

Cefsharp Offscreen EvaluateScriptAsync

I was using CefSharp WinForms, and recently I've been trying to switch to OffScreen. Everything works just fine, except that now my code doesn't wait for EvaluateScriptAsync to complete before it returns the page's source.
Or maybe I just don't quite understand how tasks work. Here is my progress so far:
private static void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
var browser = (CefSharp.OffScreen.ChromiumWebBrowser)sender;
if (e.Frame.IsMain)
{
browser.FrameLoadEnd -= WebBrowserFrameLoadEnded;
var x = browser.EvaluateScriptAsync("/* some javascript codes */");
if (x.IsCompleted && x.Result.Success)
{
x.ContinueWith(a =>
{
var task = browser.GetSourceAsync();
task.ContinueWith(d =>
{
if (d.IsCompleted)
{
globalRtnVal = d.Result;
}
}).ConfigureAwait(false);
});
}
}
}
And my main code is like this:
/* some codes */
CefSharp.OffScreen.ChromiumWebBrowser asd = new CefSharp.OffScreen.ChromiumWebBrowser(/* url */);
asd.BrowserSettings.Javascript = CefSharp.CefState.Enabled;
asd.BrowserSettings.WebSecurity = CefSharp.CefState.Disabled;
asd.FrameLoadEnd += WebBrowserFrameLoadEnded;
int tryCount = 0;
do
{
Thread.Sleep(3000);
RtnHtml = globalRtnVal;
if (String.IsNullOrEmpty(RtnHtml))
tryCount++;
if (tryCount == 10 && String.IsNullOrEmpty(RtnHtml))
{
/* some codes */
return null;
}
}
while (String.IsNullOrEmpty(RtnHtml));
/* some codes */
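A likely reason the handler above never reaches GetSourceAsync is that x.IsCompleted is tested immediately after the task is started, when it has normally not finished yet. The sketch below contrasts that check with awaiting each step in order. EvaluateScriptStubAsync and GetSourceStubAsync are hypothetical stand-ins for the CefSharp calls named in the question, so this shows only the task pattern and is not CefSharp code.

```csharp
using System;
using System.Threading.Tasks;

public static class ChainingSketch
{
    // Hypothetical stand-in for browser.EvaluateScriptAsync(...): succeeds after a delay.
    private static async Task<bool> EvaluateScriptStubAsync()
    {
        await Task.Delay(100);
        return true;
    }

    // Hypothetical stand-in for browser.GetSourceAsync().
    private static async Task<string> GetSourceStubAsync()
    {
        await Task.Delay(100);
        return "<html>rendered</html>";
    }

    // Mirrors the question: the completion check runs right after the task starts.
    public static string CheckImmediately()
    {
        Task<bool> script = EvaluateScriptStubAsync();
        // The task is still waiting on its delay here, so the follow-up work is skipped.
        return script.IsCompleted ? "continued" : "skipped";
    }

    // Awaits the script first and only then asks for the page source.
    public static async Task<string> AwaitInOrderAsync()
    {
        bool success = await EvaluateScriptStubAsync();
        if (!success) return null;
        return await GetSourceStubAsync();
    }

    public static void Main()
    {
        Console.WriteLine(CheckImmediately());
        Console.WriteLine(AwaitInOrderAsync().GetAwaiter().GetResult());
    }
}
```

With this shape the caller can await the returned task, which would also remove the need for the Thread.Sleep polling loop around globalRtnVal.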

UI Freezing and Computation Really Slow

I'm writing a program that should replace or remove some entries from a logfile.txt.
The code works fine (at least for small log files). If I use a big file (around 27 MB), it gets very slow and the UI freezes. I can't click anything.
On button click I execute this method:
private string delete_Lines(string[] lines, string searchString)
{
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].Contains(searchString))
{
rtbLog.Text += "Deleting(row " + (i + 1) + "):\n" + lines[i] + "\n";
progressBar1.Value += 1;
if (cbDB == true)
{
while (is_next_line_block(lines, i) == true)
{
i++;
rtbLog.Text += lines[i] + "\n";
progressBar1.Value += 1;
}
}
}
else
{
res += lines[i]+"\n";
progressBar1.Value += 1;
}
}
tssLbl.Text = "Done!";
rtbLog.Text += "...Deleting finished\n";
return res;
}
lines is the array of log file lines I am trying to clean up; every entry is a single row. tssLbl is a notification label and rtbLog is a RichTextBox in which I track which rows I am deleting.
is_next_line_block is just another method, which checks whether the next line is part of the block I want to delete. Its parameters are the whole lines array and the current line position.
private bool is_next_line_block(string[] lines, int curIndex)
{
if (curIndex < lines.Length-1)
{
if (lines[curIndex + 1].StartsWith(" "))
{
return true;
}
else
{
return false;
}
}
else
{
return false;
}
}
Does anybody have an idea what is causing the freezes and slowing down the program? I know that I could speed my code up by parallelizing it, but I can't imagine that it takes so long to check a 27 MB text file without parallelism.
You have several issues here:
You are reading the whole file into a buffer (an array of strings); I am guessing you are calling File.ReadAllLines(). Reading big files into a buffer will slow you down and, in extreme cases, run you out of memory.
You are using the += operation on your rich text box's Text property. That is a time-consuming operation, as the UI has to render the whole rich text box every time you update the Text property that way. A better option is to use a StringBuilder to collect the text and update the rich text box periodically.
To fix this, you need to read the file as a stream. Progress can be monitored based on bytes read instead of line position. You can run the read operation asynchronously and monitor progress on a timer, as shown in the example below.
private void RunFileOperation(string inputFile, string search)
{
Timer t = new Timer();
int progress = 0;
StringBuilder sb = new StringBuilder();
// Filesize serves as max value to check progress
progressBar1.Maximum = (int)(new FileInfo(inputFile).Length);
t.Tick += (s, e) =>
{
rtbLog.Text = sb.ToString();
progressBar1.Value = progress;
if (progress == progressBar1.Maximum)
{
t.Enabled = false;
tssLbl.Text = "done";
}
};
//update every 0.5 second
t.Interval = 500;
t.Enabled = true;
// Start async file read operation
System.Threading.Tasks.Task.Factory.StartNew(() => delete_Lines(inputFile, search, ref progress, ref sb));
}
private void delete_Lines(string fileName, string searchString, ref int progress, ref StringBuilder sb)
{
using (var file = File.OpenText(fileName))
{
int i = 0;
while (!file.EndOfStream)
{
var line = file.ReadLine();
progress = (int)file.BaseStream.Position;
if (line.Contains(searchString))
{
sb.AppendFormat("Deleting(row {0}):\n{1}", (i + 1), line);
// Change this algorithm for nextline check
// Do this when it is next line, i.e. in this line.
// "If" check above can check if (line.startswith(" "))...
// instead of having to do it nextline next here.
/*if (cbDB == true)
{
while (is_next_line_block(lines, i) == true)
{
i++;
rtbLog.Text += lines[i] + "\n";
progressBar1.Value += 1;
}
}*/
}
}
}
sb.AppendLine("...Deleting finished\n");
}
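The timer-based version above shares progress and sb between two threads without synchronization. As a hedged alternative, the sketch below does the same streaming pass but returns its results and reports progress through IProgress<int>. LogCleaner, RemoveMatches and the removeBlocks flag (playing the role of cbDB) are names invented for this illustration, and a StringReader replaces the log file so that the sketch runs as a console program.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class LogCleaner
{
    // Streams the input line by line. Lines containing searchString (and, when
    // removeBlocks is true, the space-indented lines directly after them) are
    // written to the log; everything else is returned as the cleaned text.
    public static string RemoveMatches(TextReader input, string searchString, bool removeBlocks,
                                       StringBuilder log, IProgress<int> progress)
    {
        var kept = new StringBuilder();
        bool inBlock = false;
        int row = 0;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            row++;
            if (line.Contains(searchString))
            {
                log.AppendFormat("Deleting(row {0}):\n{1}\n", row, line);
                inBlock = true;
            }
            else if (removeBlocks && inBlock && line.StartsWith(" "))
            {
                log.Append(line).Append('\n');
            }
            else
            {
                inBlock = false;
                kept.Append(line).Append('\n');
            }

            // Report only now and then so the receiver is not flooded with updates.
            if (progress != null && row % 1000 == 0) progress.Report(row);
        }
        return kept.ToString();
    }

    public static void Main()
    {
        var log = new StringBuilder();
        using (var reader = new StringReader("keep 1\nERROR boom\n  at Foo()\nkeep 2\n"))
        {
            // Task.Run moves the loop off the calling thread; a WinForms handler
            // would await this task instead of blocking on it.
            string kept = Task.Run(() => RemoveMatches(reader, "ERROR", true, log, null))
                              .GetAwaiter().GetResult();
            Console.Write(kept);
            Console.Write(log.ToString());
        }
    }
}
```

In a form, passing new Progress<int>(v => progressBar1.Value = v) created on the UI thread lets the callback run on that thread, and rtbLog.Text can be assigned once from the finished log after the await.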
As a follow-up to your question on Task.Factory.StartNew() usage, it's done this way (generally):
// you might need to wrap this in a Dispatcher.BeginInvoke (see below)
// if you are not calling from the main UI thread
CallSomeMethodToSetVisualCuesIfYouHaveOne();
Task.Factory.StartNew(() =>
{
// code in this block will run in a background thread...
})
.ContinueWith(task =>
{
// if you called the task from the UI thread, you're probably
// ok if you decide not to wrap the optional method call below
// in a dispatcher begininvoke...
Application.Current.Dispatcher.BeginInvoke(new Action(()=>
{
CallSomeMethodToUnsetYourVisualCuesIfYouHaveAnyLOL();
}));
});
Hope this helps!
Thanks to everybody for the help, especially loopedcode. This is the working version (I took loopedcode's code and made some edits):
private void RunFileOperation(string inputFile, string search)
{
Timer t = new Timer();
StringBuilder sb = new StringBuilder();
{
rtbLog.Text = "Start Deleting...\n";
}
// Filesize serves as max value to check progress
progressBar1.Maximum = (int)(new FileInfo(inputFile).Length);
t.Tick += (s, e) =>
{
rtbLog.Text += sb.ToString();
progressBar1.Value = progress;
if (progress == progressBar1.Maximum)
{
t.Enabled = false;
tssLbl.Text = "done";
}
};
//update every 0.5 second
t.Interval = 500;
t.Enabled = true;
// Start async file read operation
if (rbtnDelete.Checked)
{
if (cbDelete.Checked)
{
System.Threading.Tasks.Task.Factory.StartNew(() => delete_Lines(inputFile, search, ref progress, ref sb, ref res1));
}
}
else
{
//..do something
}
}
private void delete_Lines(string fileName, string searchString, ref int progress, ref StringBuilder sb, ref StringBuilder res1)
{
bool checkNextLine=false;
using (var file = File.OpenText(fileName))
{
int i = 0;
while (!file.EndOfStream)
{
i++;
var line = file.ReadLine();
progress = (int)file.BaseStream.Position;
if (line.Contains(searchString))
{
sb.AppendFormat("Deleting(row {0}):\n{1}\n", (i), line);
checkNextLine = true;
}
else
{
if (cbDB && checkNextLine && line.StartsWith(" "))
{
sb.AppendFormat("{0}\n", line);
}
else
{
checkNextLine = false;
res1.AppendLine(line);
}
}
}
}
sb.AppendLine("\n...Deleting finished!");
}

Why can't other events be fired while a "for loop" is in progress?

I have a for loop, and while the loop is running I can't access any other function or event; for example, clicking a button doesn't work until the for loop ends. Is there any way to overcome this issue? I hope I can get an answer soon.
for (int i = 0; i < sizes - 2; i++)
{
if (pictureBox1.Image != null)
{
trackBar1.Value = trackBar1.Value + 1;
DisplayImage(_image);
}
}
Thanks in advance.
Hi, if you are using .NET Framework 4.5,
you can do the following:
Task.Run(() =>
{
for (int i = 0; i < sizes - 2; i++)
{
if (pictureBox1.Image != null)
{
trackBar1.Value = trackBar1.Value + 1;
DisplayImage(_image);
}
}
});
If not, you can try this using a thread:
Thread thread = new Thread(NewMethod);
thread.Start();
private void NewMethod()
{
for (int i = 0; i < sizes - 2; i++)
{
if (pictureBox1.Image != null)
{
trackBar1.Value = trackBar1.Value + 1;
DisplayImage(_image);
}
}
}
If you get a cross-thread operation error when updating the UI, you need to do the update through a delegate. Try this:
Create a delegate for a void function:
delegate void Function();
Then, in your for loop, do this:
Invoke(new Function(delegate()
{
label.Text = "some text";
}));
This example shows how to create a new thread in .NET Framework. First, create a new ThreadStart delegate. The delegate points to a method that will be executed by the new thread. Pass this delegate as a parameter when creating a new Thread instance. Finally, call the Thread.Start method to run your method (in this case WorkThreadFunction) in the background.
using System.Threading;
Thread thread = new Thread(new ThreadStart(WorkThreadFunction));
thread.Start();
The WorkThreadFunction could be defined as follows.
public void WorkThreadFunction()
{
try
{
// do any background work
}
catch (Exception ex)
{
// log errors
}
}

AutoResetEvent in Windows Phone project - can't invoke handler

This is from a Windows Phone project. I am trying to invoke a few handlers, one after another, to receive information about a GPS / reverse geocoding position. I wonder why it won't run correctly.
When I set up only 1 coordinate, it's OK: I get a message with the street etc. But when there are more coordinates, my handler isn't invoked.
private async void SimulationResults()
{
done = new AutoResetEvent(true);
Geolocator geolocator = new Geolocator();
geolocator.DesiredAccuracy = PositionAccuracy.High;
myCoordinate = new GeoCoordinate(51.751985, 19.426515);
if (myMap.Layers.Count() > 0) myMap.Layers.Clear();
mySimulation = new List<SimulationItem>();
mySimulation = Simulation.SimulationProcess(myCoordinate, 120); // Distance
for(int i = 0; i<2; i++)
{
done.WaitOne();
if (mySimulation.ElementAt(i).Id == 1 | mySimulation.ElementAt(i).Id == -1)
{
// Waiting: because the event is signaled, we enter
// the critical section right away
AddMapLayer(mySimulation.ElementAt(i).Coordinate, Colors.Yellow, false);
myReverseGeocodeQuery_1 = new ReverseGeocodeQuery();
myReverseGeocodeQuery_1.GeoCoordinate = mySimulation.ElementAt(i).Coordinate;
myReverseGeocodeQuery_1.QueryCompleted += ReverseGeocodeQuery_QueryCompleted_1;
// Critical section
done.Reset(); // Hey I'm working, wait!
myReverseGeocodeQuery_1.QueryAsync();
}
}
MessageBox.Show("Skonczylem");
}
private void ReverseGeocodeQuery_QueryCompleted_1(object sender, QueryCompletedEventArgs<IList<MapLocation>> e)
{
done.Set();
if (e.Error == null)
{
if (e.Result.Count > 0)
{
MapAddress address = e.Result[0].Information.Address;
MessageBox.Show("Wykonano "+address.Street);
}
}
}
What's happening here is that you are blocking on your AutoResetEvent on the UI thread, but that's the same thread that the ReverseGeocodeQuery is trying to run on. Since it's blocked it can't run and it also can't invoke your callback.
A very quick fix that doesn't change your flow too much and assumes some sort of "on everything complete do X" requirement is below. I triggered the whole thing on a background thread with:
new Thread(new ThreadStart(() =>
{
SimulationResults();
})).Start();
Since all of the below is on a background thread I needed to add some Dispatcher.BeginInvoke() calls around anything that called into the UI thread, but this way the thread that is blocked is your background thread and not your UI thread.
AutoResetEvent done;
int remaining;
private async void SimulationResults()
{
done = new AutoResetEvent(true);
Geolocator geolocator = new Geolocator();
geolocator.DesiredAccuracy = PositionAccuracy.High;
var myCoordinate = new GeoCoordinate(51.751985, 19.426515);
var mySimulation = new List<GeoCoordinate>()
{
new GeoCoordinate(51.751985, 19.426515),
new GeoCoordinate(2, 2)
};
//mySimulation = Simulation.SimulationProcess(myCoordinate, 120); // Distance
remaining = mySimulation.Count;
for (int i = 0; i < mySimulation.Count; i++)
{
done.WaitOne();
//if (mySimulation.ElementAt(i).Id == 1 | mySimulation.ElementAt(i).Id == -1)
//{
// Waiting: because the event is signaled, we enter
// the critical section right away
//AddMapLayer(mySimulation.ElementAt(i).Coordinate, Colors.Yellow, false);
var tempI = i;
Dispatcher.BeginInvoke(() =>
{
var myReverseGeocodeQuery_1 = new ReverseGeocodeQuery();
myReverseGeocodeQuery_1.GeoCoordinate = mySimulation.ElementAt(tempI);
myReverseGeocodeQuery_1.QueryCompleted += ReverseGeocodeQuery_QueryCompleted_1;
// Critical section
done.Reset(); // Hey I'm working, wait!
myReverseGeocodeQuery_1.QueryAsync();
});
//}
}
}
private void ReverseGeocodeQuery_QueryCompleted_1(object sender, QueryCompletedEventArgs<IList<MapLocation>> e)
{
done.Set();
remaining--;
if (e.Error == null)
{
if (e.Result.Count > 0)
{
MapAddress address = e.Result[0].Information.Address;
Dispatcher.BeginInvoke(() =>
{
MessageBox.Show("Wykonano " + address.Street);
});
}
}
if (remaining == 0)
{
// Do all done code
Dispatcher.BeginInvoke(() =>
{
MessageBox.Show("Skonczylem");
});
}
}
Alternatively you could also make this a proper async method that awaits on different events to progress.
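The "proper async method" alternative could look roughly like the sketch below: each event-based query is wrapped in a Task with TaskCompletionSource, and the queries are awaited one after another, so no thread blocks on an AutoResetEvent. FakeGeocodeQuery, its Place field and its Action<string> event are hypothetical stand-ins for ReverseGeocodeQuery and QueryCompleted, which lets the sketch run as a console program; the real event and argument types differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for ReverseGeocodeQuery: QueryAsync() returns at once
// and QueryCompleted fires later with the result.
public sealed class FakeGeocodeQuery
{
    public string Place;
    public event Action<string> QueryCompleted;

    public void QueryAsync()
    {
        string street = "Street for " + Place;
        Task.Delay(50).ContinueWith(_ => QueryCompleted?.Invoke(street));
    }
}

public static class GeocodeSketch
{
    // Wraps one event-based query in a Task so that the caller can await it.
    public static Task<string> QueryStreetAsync(string place)
    {
        var tcs = new TaskCompletionSource<string>();
        var query = new FakeGeocodeQuery { Place = place };
        query.QueryCompleted += street => tcs.TrySetResult(street);
        query.QueryAsync();
        return tcs.Task;
    }

    // Runs the queries strictly one after another without blocking a thread.
    public static async Task<List<string>> QueryAllAsync(IEnumerable<string> places)
    {
        var streets = new List<string>();
        foreach (string place in places)
        {
            streets.Add(await QueryStreetAsync(place));
        }
        return streets;
    }

    public static void Main()
    {
        List<string> streets = QueryAllAsync(new[] { "A", "B" }).GetAwaiter().GetResult();
        foreach (string street in streets) Console.WriteLine(street);
    }
}
```

Because the loop resumes only after each query completes, the "all done" code can simply follow the foreach, and the remaining counter is no longer needed.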

How to update button text after event

I'm trying to let a program post a bunch of text. The user enters text, the amount of messages and how fast these must be delivered. While the program is busy, the button text needs to be "Stop" instead of "Start". When you press the button to force it to stop after you've initially launched it, the text changes back to "Start", but this doesn't happen when the program stops after the given amount of messages are delivered, even though the code is in place and doesn't generate an error.
I have a feeling that this is because of the text not updating for some reason. I've tried to flush it with Invalidate() and Update(), but this isn't working. How to fix this?
Here is the code:
private void button1_Click(object sender, EventArgs e)
{
if (button1.Text == "Start")
{
isEvil = true;
button1.Text = "Stop";
Thread t = new Thread(StartTyping);
t.Start(textBox1.Text);
}
else
{
isEvil = false;
button1.Text = "Start";
}
}
private void StartTyping(object obj)
{
string message = obj.ToString();
int amount = (int)numericUpDown2.Value;
Thread.Sleep(3000);
for (int i = 0; i < amount; i++)
{
if (isEvil == false)
{
//////This does NOT work
//button1.Text = "Start";
//button1.Invalidate();
//button1.Update();
//button1.Refresh();
//Application.DoEvents();
break;
}
SendKeys.SendWait(message + "{ENTER}");
int j = (int)numericUpDown1.Value * 10;
Thread.Sleep(j);
}
}
You have four answers telling you to update UI stuff from the UI thread, but none of them address the logic flow problem with your code.
The reason why it doesn't happen is because it only happens in the for-loop when isEvil is false. When does isEvil get set to false? Only when you click "Stop", and nowhere else.
If you want the button to go back to "Start" after the thread finishes, without clicking "Stop", then you need to add code after the loop to do that, independent of the value of isEvil: (piggybacking off of VoidMain's answer)
private void StartTyping(object obj)
{
string message = obj.ToString();
int amount = (int)numericUpDown2.Value;
Thread.Sleep(3000);
for (int i = 0; i < amount; i++)
{
if (isEvil == false)
{
if (button1.InvokeRequired)
{
button1.BeginInvoke( new Action(() => { button1.Text = "Start"; }) );
}
else
{
button1.Text = "Start";
}
break;
}
SendKeys.SendWait(message + "{ENTER}");
int j = (int)numericUpDown1.Value * 10;
Thread.Sleep(j);
}
if (button1.InvokeRequired)
{
button1.BeginInvoke( new Action(() => { button1.Text = "Start"; }) );
}
else
{
button1.Text = "Start";
}
}
Now you have duplicated code, so you might want to split it off into a separate method.
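One possible way to split the duplicated block off is sketched below: a small helper does the InvokeRequired check, and a finally block resets the caption in a single place. UiHelper.RunOn, FakeControl and TypingDemo are hypothetical names for this illustration. The helper takes ISynchronizeInvoke, which WinForms controls implement, so button1 could be passed to it directly; FakeControl exists only to make the sketch runnable without a form.

```csharp
using System;
using System.ComponentModel;
using System.Threading.Tasks;

public static class UiHelper
{
    // Runs the action through the target when marshalling is required,
    // otherwise calls it directly.
    public static void RunOn(ISynchronizeInvoke target, Action action)
    {
        if (target.InvokeRequired)
            target.BeginInvoke(action, null);
        else
            action();
    }
}

// Hypothetical stand-in for a button. It always asks for marshalling and, for
// simplicity, runs the delegate inline; a real control queues it to the UI thread.
public sealed class FakeControl : ISynchronizeInvoke
{
    public string Text = "Stop";

    public bool InvokeRequired { get { return true; } }

    public IAsyncResult BeginInvoke(Delegate method, object[] args)
    {
        return Task.FromResult(method.DynamicInvoke(args));
    }

    public object EndInvoke(IAsyncResult result)
    {
        return ((Task<object>)result).Result;
    }

    public object Invoke(Delegate method, object[] args)
    {
        return method.DynamicInvoke(args);
    }
}

public static class TypingDemo
{
    public static void StartTyping(FakeControl button, int amount, Func<bool> shouldStop)
    {
        try
        {
            for (int i = 0; i < amount; i++)
            {
                if (shouldStop()) break;
                // ... send the message and sleep here ...
            }
        }
        finally
        {
            // The only place that resets the caption, whether the loop finished,
            // was stopped early, or threw an exception.
            UiHelper.RunOn(button, () => button.Text = "Start");
        }
    }

    public static void Main()
    {
        var button = new FakeControl();
        StartTyping(button, 3, () => false);
        Console.WriteLine(button.Text);
    }
}
```

The finally block is a design choice rather than a requirement; calling the helper once after the loop has the same effect when no exception is thrown.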
You need to be on the UI thread to update the UI.
Try something called the SynchronizationContext. There are plenty of examples when you google it.
If you're in WPF or Silverlight, you could use the Dispatcher. Again, lots of examples if you search those keywords in google or StackOverflow.
You must update your controls from the UI thread. This is how you would do it for winforms.
for (int i = 0; i < amount; i++)
{
if (isEvil == false)
{
button1.Invoke(new Action(() => button1.Text = "Start"));
break;
}
SendKeys.SendWait(message + "{ENTER}");
int j = (int)numericUpDown1.Value * 10;
Thread.Sleep(j);
}
This will block until button1 gets its text updated. If you don't want it to block, replace Invoke with BeginInvoke.
Your best bet is to use a BackgroundWorker. It's a bit too unwieldy to add a concise example here, but there's a decent tutorial from O'Reilly.
Something like this (not tested) should work:
private void StartTyping(object obj)
{
string message = obj.ToString();
int amount = (int)numericUpDown2.Value;
Thread.Sleep(3000);
for (int i = 0; i < amount; i++)
{
if (isEvil == false)
{
if(button1.InvokeRequired)
{
button1.BeginInvoke( new Action(() => { button1.Text = "Start"; }) );
}
else
{
button1.Text = "Start";
}
break;
}
SendKeys.SendWait(message + "{ENTER}");
int j = (int)numericUpDown1.Value * 10;
Thread.Sleep(j);
}
}
