I'm trying to test the accuracy of Apple's forward geocoding service to cover the rare case where the latitude/longitude on our data model is missing or invalid but we still know the physical address. MonoTouch provides CLGeocoder.GeocodeAddress(String, CLGeocodeCompletionHandler) and GeocodeAddressAsync(String), but when I call them the completion handler is never called and the async method never returns.
Nothing in my application log indicates a problem and enclosing the call in a try-catch block didn't turn up any exceptions. Maps integration is enabled in the project options. Short of capturing network traffic (which is probably SSL anyway) I'm out of ideas.
Here's the code which loads placemarks and tries to geocode addresses:
private void ReloadPlacemarks()
{
// list to hold any placemarks which come back with empty/invalid coordinates
List<ServiceCallWrapper> geoList = new List<ServiceCallWrapper> ();
mapView.ClearPlacemarks ();
List<MKPlacemark> placemarks = new List<MKPlacemark>();
if (serviceCallViewModel.ActiveServiceCall != null) {
var serviceCall = serviceCallViewModel.ActiveServiceCall;
if (serviceCall.dblLatitude != 0 && serviceCall.dblLongitude != 0) {
placemarks.Add (serviceCall.ToPlacemark ());
} else {
// add it to the geocode list
geoList.Add (serviceCall);
}
}
foreach (var serviceCall in serviceCallViewModel.ServiceCalls) {
if (serviceCall.dblLatitude != 0 && serviceCall.dblLongitude != 0) {
placemarks.Add (serviceCall.ToPlacemark ());
} else {
//add it to the geocode list
geoList.Add (serviceCall);
}
}
if (placemarks.Count > 0) {
mapView.AddPlacemarks (placemarks.ToArray ());
}
if (geoList.Count > 0) {
// attempt to forward-geocode the street address
foreach (ServiceCallWrapper s in geoList) {
ServiceCallWrapper serviceCall = GeocodeServiceCallAddressAsync (s).Result;
mapView.AddPlacemark (serviceCall.ToPlacemark());
}
}
}
private async Task<ServiceCallWrapper> GeocodeServiceCallAddressAsync(ServiceCallWrapper s)
{
CLGeocoder geo = new CLGeocoder ();
String addr = s.address + " " + s.city + " " + s.state + " " + s.zip;
Console.WriteLine ("Attempting forward geocode for service call UID: " + s.call_uid + " with address: " + addr);
//app hangs on this
CLPlacemark[] result = await geo.GeocodeAddressAsync(addr);
//code updating latitude and longitude (omitted)
return s;
}
Your problem is this line:
ServiceCallWrapper serviceCall = GeocodeServiceCallAddressAsync (s).Result;
By calling Task<T>.Result, you are causing a deadlock. I explain this fully on my blog, but the gist of it is that await will (by default) capture a "context" when it yields control, and will use that context to complete the async method. In this case, the "context" is the UI context. So, the UI thread is blocked (waiting on Result) when the response comes in, and the async method cannot continue because it's waiting to run on the UI thread.
The solution is to use async all the way. In other words, replace every Task<T>.Result and Task.Wait with await:
private async Task ReloadPlacemarksAsync()
{
...
ServiceCallWrapper serviceCall = await GeocodeServiceCallAddressAsync (s);
...
}
Note that your void ReloadPlacemarks is now Task ReloadPlacemarksAsync, so this change affects your callers. async will "grow" through the codebase, and this is normal. For more information, see my MSDN article on async best practices.
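A minimal console sketch of what "async all the way" can look like end to end. FakeGeocodeAsync is a hypothetical stand-in for CLGeocoder.GeocodeAddressAsync (it only delays and returns made-up coordinates), and ServiceCall is a simplified placeholder for ServiceCallWrapper; neither is a real API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified placeholder for the question's ServiceCallWrapper.
public class ServiceCall
{
    public string Address { get; set; } = "";
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class GeocodeSketch
{
    // Hypothetical stand-in for the real geocoder: waits briefly, then
    // returns made-up coordinates derived from the address length.
    public static async Task<(double Lat, double Lon)> FakeGeocodeAsync(string address)
    {
        await Task.Delay(10);
        return (address.Length, -address.Length);
    }

    // Awaits the geocoder instead of blocking on .Result.
    public static async Task<ServiceCall> GeocodeServiceCallAsync(ServiceCall s)
    {
        var (lat, lon) = await FakeGeocodeAsync(s.Address);
        s.Latitude = lat;
        s.Longitude = lon;
        return s;
    }

    // The caller is async as well, so no thread is ever blocked waiting.
    public static async Task<List<ServiceCall>> ReloadAsync(IEnumerable<ServiceCall> calls)
    {
        var resolved = new List<ServiceCall>();
        foreach (var call in calls)
        {
            resolved.Add(await GeocodeServiceCallAsync(call));
        }
        return resolved;
    }

    public static async Task Main()
    {
        var calls = new[] { new ServiceCall { Address = "1 Main St" } };
        foreach (var c in await ReloadAsync(calls))
        {
            Console.WriteLine($"{c.Address}: {c.Latitude}, {c.Longitude}");
        }
    }
}
```

In a UI app the outermost caller is usually an event handler, which is the one place where async void is generally considered acceptable.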
I'm new to C# .Net and Visual Studio 2022 - What I'm trying to achieve is to have a timer running every second to check that a website url is valid/is up. If the url IS reachable and the current WebView2 is not showing that website, then it should navigate to it. If it's already showing that website, it should do nothing else. If it was showing that website, but now it's no longer valid, the WebView should navigate to my custom error page. If whilst on the custom error page the website becomes available again, it should (re)load the website.
In my particular scenario I'm making a WebView load localhost (127.0.0.1) for now. I want to continuously check that the website is up; if it goes down, show the custom error page, and if it comes back, show the website.
Not sure I'm explaining that very well. From the research I have done, I believe I need Task and also await inside an async method.
Here's my current timer and CheckURL code, as well as the NavigationStarting and NavigationCompleted handlers:
private void webView_NavigationStarting(object sender, CoreWebView2NavigationStartingEventArgs e)
{
timerCheckRSLCDURL.Enabled = false;
}
private void webView_NavigationCompleted(object sender, Microsoft.Web.WebView2.Core.CoreWebView2NavigationCompletedEventArgs e)
{
if (e.IsSuccess)
{
Debug.WriteLine("JT:IsSuccess");
((Microsoft.Web.WebView2.WinForms.WebView2) sender).ExecuteScriptAsync("document.querySelector('body').style.overflow='hidden'");
}
else if (!e.IsSuccess)
{
Debug.WriteLine("JT:IsNOTSuccess");
webView.DefaultBackgroundColor = Color.Blue;
//webView.CoreWebView2.NavigateToString(Program.htmlString);
}
timerCheckRSLCDURL.Enabled = true;
}
private void timerCheckRSLCDURL_Tick(object sender, EventArgs e)
{
Debug.WriteLine("Timer Fired! Timer.Enabled = " + timerCheckRSLCDURL.Enabled);
CheckURL(Properties.Settings.Default.URL, Properties.Settings.Default.Port);
}
private async void CheckURL(string url, decimal port)
{
timerCheckRSLCDURL.Enabled = false;
Program.isWebSiteUp = false;
string webViewURL = BuildURL();
Debug.WriteLine("Checking URL: " + webViewURL);
try
{
var request = WebRequest.Create(webViewURL);
request.Method = "HEAD";
var response = (HttpWebResponse)await Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
if (response.StatusCode == HttpStatusCode.OK)
{
Program.isWebSiteUp = true;
}
}
catch (System.Net.WebException exception)
{
Debug.WriteLine("WebException: " + exception.Message);
if (exception.Message.Contains("(401) Unauthorized"))
{
Program.isWebSiteUp = false;
}
else
{
Program.isWebSiteUp = false;
} // This little block is unfinished atm as it doesn't really affect me right now
}
catch (Exception exception)
{
Debug.WriteLine("Exception: " + exception.Message);
Program.isWebSiteUp = false;
}
if (Program.isWebSiteUp == true && webView.Source.ToString().Equals("about:blank"))
{
Debug.WriteLine("JT:1");
Debug.WriteLine("isWebSiteUp = true, webView.Source = about:blank");
webView.CoreWebView2.Navigate(webViewURL);
}
else if (Program.isWebSiteUp == true && !webView.Source.ToString().Equals(webViewURL))
{
Debug.WriteLine("JT:2");
Debug.WriteLine("isWebSiteUp = true\nwebView.Source = " + webView.Source.ToString() + "\nwebViewURL = " + webViewURL + "\nWebView Source == webViewURL: " + webView.Source.ToString().Equals(webViewURL) + "\n");
webView.CoreWebView2.Navigate(webViewURL);
}
else if (Program.isWebSiteUp == false && !webView.Source.ToString().Equals("about:blank"))
{
Debug.WriteLine("JT:3");
Debug.WriteLine("This SHOULD be reloading the BSOD page!");
webView.CoreWebView2.NavigateToString(Program.htmlString);
}
}
private string BuildURL()
{
string webViewURL;
string stringURL = Properties.Settings.Default.URL;
string stringPort = Properties.Settings.Default.Port.ToString();
string stringURLPORT = $"{stringURL}:{stringPort}";
if (stringPort.Equals("80"))
{
webViewURL = stringURL;
}
else
{
webViewURL = stringURLPORT;
}
if (!webViewURL.EndsWith("/"))
{
webViewURL += "/";
}
//For now, the URL will always be at root, so don't need to worry about accidentally
//making an invalid url like http://example.com/subfolder/:port
//although potentially will need to address this at a later stage
Debug.WriteLine("BuildURL returns: " + webViewURL);
return webViewURL;
}
So the timer is fired every 1000ms (1 second) because I need to actively check the URL is still alive. I think the way I'm controlling the timer is wrong - and I imagine there's a better way of doing it, but what I want to do is this...
Check website URL every 1 second
To avoid repeating the same async task, I'm trying to disable the timer so it does not fire a second time whilst the async CheckURL is running
Once the async/await task of checking the URL has finished, the timer should be re-enabled to continue monitoring whether the website URL is still up
If the website is down, it should show my custom error page (referred to as BSOD), which is some super basic HTML loaded from resources and 'stored' in Program.htmlString
If the website is down and the WebView is already showing the BSOD, the WebView should do nothing. The timer should continue to monitor the URL.
If the website is up and the WebView is showing the BSOD, then it should navigate to the checked URL that is up. If the website is up and the WebView is already showing the website, then the WebView should do nothing. The timer should continue to monitor the URL.
From other research, I'm aware I shouldn't be using private async void - eg shouldn't be using it as a void. But I've not yet figured out / understood the correct way to do this
In the Immediate Window, it appears that webView_NavigationCompleted is being fired twice (or sometimes even a few times) instantly as the immediate window output will show JT:IsSuccess or JT:IsNOTSuccess a few times repeated in quick succession. Is that normal? I'm assuming something isn't correct there.
The main problem appears to be due to the timer interval being only 1 second. If I change the timer to fire every 30 seconds, for example, it seems to work OK, but when it's every second (I may even need it shorter than that at some point) it doesn't work as expected. Sometimes the BSOD doesn't load at all, for example, and webView_NavigationCompleted is fired multiple times in quick succession.
Could someone pretty please help me make this code better and correct.
I've searched countless websites etc and whilst there is some good info, some of it seems overwhelming / too technical so to speak. I had to lookup what "antecedent" meant earlier as it's a completely new word to me! :facepalm:
Many thanks in advance
This answer will focus on the Task timer loop to answer the specific part of your question "check a url is valid every second". There are lots of answers about how to perform the actual Ping (like How do you check if a website is online in C#) and here's the Microsoft documentation for Ping if you choose to go that route.
Since it's not uncommon to set a timeout value of 120 seconds for a ping request, it calls into question whether it would have any value to do this on a steady tick of one second. My suggestion is that it would make more sense to:
Make a background thread
Perform a synchronous ping (wait for the result) on the background thread.
Marshal the ping result onto the UI thread to perform the other tasks you have laid out.
Synchronously wait a Task.Delay on the background thread before performing the next ping.
Here is how I personally go about doing that in my own production code:
void execPing()
{
Task.Run(() =>
{
while (!DisposePing.IsCancellationRequested)
{
var pingSender = new Ping();
var pingOptions = new PingOptions
{
DontFragment = true,
};
// https://learn.microsoft.com/en-us/dotnet/api/system.net.networkinformation.ping?view=net-6.0#examples
// Create a buffer of 32 bytes of data to be transmitted.
string data = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
byte[] buffer = Encoding.ASCII.GetBytes(data);
int timeout = 120;
try
{
// https://stackoverflow.com/a/25654227/5438626
if (Uri.TryCreate(textBoxUri.Text, UriKind.Absolute, out Uri? uri)
&& (uri.Scheme == Uri.UriSchemeHttp ||
uri.Scheme == Uri.UriSchemeHttps))
{
PingReply reply = pingSender.Send(
uri.Host,
timeout, buffer,
pingOptions);
switch (reply.Status)
{
case IPStatus.Success:
Invoke(() => onPingSuccess());
break;
default:
Invoke(() => onPingFailed(reply.Status));
break;
}
}
else
{
Invoke(() => labelStatus.Text =
$"{DateTime.Now}: Invalid URI: try 'http://");
}
}
catch (Exception ex)
{
// https://stackoverflow.com/a/60827505/5438626
if (ex.InnerException == null)
{
Invoke(() => labelStatus.Text = ex.Message);
}
else
{
Invoke(() => labelStatus.Text = ex.InnerException.Message);
}
}
Task.Delay(1000).Wait();
}
});
}
What works for me is initializing it when the main window handle is created:
protected override void OnHandleCreated(EventArgs e)
{
base.OnHandleCreated(e);
if (!(DesignMode || _isHandleInitialized))
{
_isHandleInitialized = true;
execPing();
}
}
bool _isHandleInitialized = false;
Where:
private void onPingSuccess()
{
labelStatus.Text = $"{DateTime.Now}: {IPStatus.Success}";
// Up to you what you do here
}
private void onPingFailed(IPStatus status)
{
labelStatus.Text = $"{DateTime.Now}: {status}";
// Up to you what you do here
}
public CancellationTokenSource DisposePing { get; } = new CancellationTokenSource();
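For comparison, the same loop shape can also be written with await instead of blocking waits. This is only a sketch under stated assumptions: checkAsync and onResult are hypothetical parameters standing in for whatever probe and UI update you choose, and nothing here is tied to Ping or WebView2.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingSketch
{
    // Runs checkAsync, reports the result, waits for the interval, repeats.
    // Each pass is awaited, so a slow check can never overlap the next one.
    // Returns the number of completed passes.
    public static async Task<int> PollAsync(
        Func<Task<bool>> checkAsync,
        Action<bool> onResult,
        TimeSpan interval,
        CancellationToken token)
    {
        int passes = 0;
        while (!token.IsCancellationRequested)
        {
            bool isUp;
            try
            {
                isUp = await checkAsync();
            }
            catch (Exception)
            {
                isUp = false; // treat any probe failure as "down"
            }

            onResult(isUp);
            passes++;

            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancelled while waiting for the next pass
            }
        }
        return passes;
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        await PollAsync(
            () => Task.FromResult(true),              // placeholder probe
            up => { if (++seen == 3) cts.Cancel(); }, // stop after three results
            TimeSpan.FromMilliseconds(10),
            cts.Token);
        Console.WriteLine($"Stopped after {seen} checks");
    }
}
```

If PollAsync is started from the UI thread of a WinForms app, the code after each await normally resumes on that thread, so onResult could touch controls without Invoke. That is a property of the WinForms synchronization context rather than of this sketch, so verify it in your own setup.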
Example 404: [image not included]
I'm trying to create a Class to manage my Cloud Firestore requests (like any SQLiteHelper Class). However, firebase uses async calls and I'm not able to return a value to other scripts.
Here an example (bool return):
public bool CheckIfIsFullyRegistered(string idUtente)
{
DocumentReference docRef = db.Collection("Utenti").Document(idUtente);
docRef.GetSnapshotAsync().ContinueWithOnMainThread(task =>
{
DocumentSnapshot snapshot = task.Result;
if (snapshot.Exists)
{
Debug.Log(String.Format("Document {0} exist!", snapshot.Id));
return true; //Error here
}
else
{
Debug.Log(String.Format("Document {0} does not exist!", snapshot.Id));
}
});
}
Unfortunately, since Firestore is acting as a frontend for some slow running I/O (disk access or a web request), any interactions you have with it will need to be asynchronous. You'll also want to avoid blocking your game loop if at all possible while performing this access. That is to say, there won't be a synchronous call to GetSnapshotAsync.
Now there are two options you have for writing code that feels synchronous (if you're like me, it's easier to think like this than with callbacks or reactive structures).
First is that GetSnapshotAsync returns a task. You can opt to await on that task in an async function:
public async Task<bool> CheckIfIsFullyRegistered(string idUtente)
{
DocumentReference docRef = db.Collection("Utenti").Document(idUtente);
// this is equivalent to `task.Result` in the continuation code
DocumentSnapshot snapshot = await docRef.GetSnapshotAsync();
return snapshot.Exists;
}
The catch with this is that async/await makes some assumptions about C# object lifecycle that aren't guaranteed in the Unity context (more information in my related blog post and video). If you're a long-time Unity developer, or just want to avoid this == null ever being true, you may opt to wrap your async call in a WaitUntil block:
private IEnumerator CheckIfIsFullyRegisteredInCoroutine() {
string idUtente;
// set idUtente somewhere here
var isFullyRegisteredTask = CheckIfIsFullyRegistered(idUtente);
yield return new WaitUntil(() => isFullyRegisteredTask.IsCompleted);
if (isFullyRegisteredTask.Exception != null) {
// do something with the exception here
yield break;
}
bool isFullyRegistered = isFullyRegisteredTask.Result;
}
One other pattern I like to employ is to use listeners instead of just retrieving a snapshot. I would populate some Unity-side class with whatever the latest data is from Firestore (or RTDB) and have all my Unity objects ping that MonoBehaviour. This fits especially well with Unity's new ECS architecture or any time you're querying your data on a per-frame basis.
I hope that all helps!
This is how I got it to work, but excuse my ignorance, as I've only been using FBDB for about a week.
Here is a snippet, I hope it helps someone.
Create a thread task extension to our login event
static Task DI = new System.Threading.Tasks.Task(LoginAnon);
Logging in anon
DI = FirebaseAuth.DefaultInstance.SignInAnonymouslyAsync().ContinueWith(result =>
{
Debug.Log("LOGIN [ID: " + result.Result.UserId + "]");
userID = result.Result.UserId;
FirebaseDatabase.DefaultInstance.GetReference("GlobalMsgs/").ChildAdded += HandleNewsAdded;
FirebaseDatabase.DefaultInstance.GetReference("Users/" + userID + "/infodata/nickname/").ValueChanged += HandleNameChanged;
FirebaseDatabase.DefaultInstance.GetReference("Users/" + userID + "/staticdata/status/").ValueChanged += HandleStatusChanged;
FirebaseDatabase.DefaultInstance.GetReference("Lobbies/").ChildAdded += HandleLobbyAdded;
FirebaseDatabase.DefaultInstance.GetReference("Lobbies/").ChildRemoved += HandleLobbyRemoved;
loggedIn = true;
});
And then get values.
DI.ContinueWith(Task =>
{
FirebaseDatabase.DefaultInstance.GetReference("Lobbies/" + selectedLobbyID + "/players/").GetValueAsync().ContinueWith((result) =>
{
DataSnapshot snap2 = result.Result;
Debug.Log("Their nickname is! -> " + snap2.Child("nickname").Value.ToString());
Debug.Log("Their uID is! -> " + snap2.Key.ToString());
//Add the user ID to the lobby list we have
foreach (List<string> lobbyData in onlineLobbies)
{
Debug.Log("Searching for lobby:" + lobbyData[0]);
if (selectedLobbyID == lobbyData[0].ToString()) //This is the id of the user hosting the lobby
{
Debug.Log("FOUND HOSTS LOBBY ->" + lobbyData[0]);
foreach (DataSnapshot snap3 in snap2.Children)
{
//add the user key to the lobby
lobbyData.Add(snap3.Key.ToString());
Debug.Log("Added " + snap3.Child("nickname").Value.ToString() + " with ID: " + snap3.Key.ToString() + " to local lobby.");
currentUsers++;
}
return;
}
}
});
});
Obviously you can alter it however you like. It doesn't really need the loops, but I'm using them to test before I compact the code into something less readable and more intuitive.
I'm having issues creating an asynchronous web service using the Task Parallel Library with ASP.NET Web API 2. I make an asynchronous call to a method StartAsyncTest and create a cancellation token to abort the method. I store the token globally and then retrieve it and call it from a second method CancelAsyncTest. Here is the code:
// Private Global Dictionary to hold text search tokens
private static Dictionary<string, CancellationTokenSource> TextSearchLookup
= new Dictionary<string, CancellationTokenSource>();
/// <summary>
/// Performs an asynchronous test using a Cancellation Token
/// </summary>
[Route("StartAsyncTest")]
[HttpGet]
public async Task<WsResult<long>> StartAsyncTest(string sSearchId)
{
Log.Debug("Method: StartAsyncTest; ID: " + sSearchId + "; Message: Entering...");
WsResult<long> rWsResult = new WsResult<long>
{
Records = -1
};
try
{
var rCancellationTokenSource = new CancellationTokenSource();
{
var rCancellationToken = rCancellationTokenSource.Token;
// Set token right away in TextSearchLookup
TextSearchLookup.Add("SyncTest-" + sSearchId, rCancellationTokenSource);
HttpContext.Current.Session["SyncTest-" + sSearchId] =
rCancellationTokenSource;
try
{
// Start a New Task which has the ability to be cancelled
var rHttpContext = (HttpContext)HttpContext.Current;
await Task.Factory.StartNew(() =>
{
HttpContext.Current = rHttpContext;
int? nCurrentId = Task.CurrentId;
StartSyncTest(sSearchId, rCancellationToken);
}, TaskCreationOptions.LongRunning);
}
catch (OperationCanceledException e)
{
Log.Debug("Method: StartAsyncText; ID: " + sSearchId
+ "; Message: Cancelled!");
}
}
}
catch (Exception ex)
{
rWsResult.Result = "ERROR";
if (string.IsNullOrEmpty(ex.Message) == false)
{
rWsResult.Message = ex.Message;
}
}
// Remove token from Dictionary
TextSearchLookup.Remove(sSearchId);
HttpContext.Current.Session[sSearchId] = null;
return rWsResult;
}
private void StartSyncTest(string sSearchId, CancellationToken rCancellationToken)
{
// Spin for 1100 seconds
for (var i = 0; i < 1100; i++)
{
if (rCancellationToken.IsCancellationRequested)
{
rCancellationToken.ThrowIfCancellationRequested();
}
Log.Debug("Method: StartSyncTest; ID: " + sSearchId
+ "; Message: Wait Pass #" + i + ";");
Thread.Sleep(1000);
}
TextSearchLookup.Remove("SyncTest-" + sSearchId);
HttpContext.Current.Session.Remove("SyncTest-" + sSearchId);
}
[Route("CancelAsyncTest")]
[HttpGet]
public WsResult<bool> CancelAsyncTest(string sSearchId)
{
Log.Debug("Method: CancelAsyncTest; ID: " + sSearchId
+ "; Message: Cancelling...");
WsResult<bool> rWsResult = new WsResult<bool>
{
Records = false
};
CancellationTokenSource rCancellationTokenSource =
(CancellationTokenSource)HttpContext.Current.Session["SyncTest-" + sSearchId];
// Session doesn't always persist values. Use TextSearchLookup as backup
if (rCancellationTokenSource == null)
{
rCancellationTokenSource = TextSearchLookup["SyncTest-" + sSearchId];
}
if (rCancellationTokenSource != null)
{
rCancellationTokenSource.Cancel();
TextSearchLookup.Remove("SyncTest-" + sSearchId);
HttpContext.Current.Session.Remove("SyncTest-" + sSearchId);
rWsResult.Result = "OK";
rWsResult.Message = "Cancel delivered successfully!";
}
else
{
rWsResult.Result = "ERROR";
rWsResult.Message = "Reference unavailable to cancel task"
+ " (if it is still running)";
}
return rWsResult;
}
After I deploy this to IIS, the first time I call StartAsyncTest and then CancelAsyncTest (via the REST endpoints), both requests go through and it cancels as expected. However, the second time, the CancelAsyncTest request just hangs and the method is only called after StartAsyncTest completes (after 1100 seconds). I don't know why this occurs. StartAsyncTest seems to hijack all threads after it's called once. I appreciate any help anyone can provide!
I store the token globally and then retrieve it and call it from a second method CancelAsyncTest.
This is probably not a great idea. You can store these tokens "globally", but that's only "global" to a single server. This approach would break as soon as a second server enters the picture.
That said, HttpContext.Current shouldn't be assigned to, ever. This is most likely the cause of the odd behavior you're seeing. Also, if your real code is more complex than ThrowIfCancellationRequested - i.e., if it's actually listening to the CancellationToken - then the call to Cancel can execute the remainder of StartSyncTest from within the call to Cancel, which would cause considerable confusion over the value of HttpContext.Current.
To summarize:
I recommend doing away with this approach completely; it won't work at all on web farms. Instead, keep your "task state" in an external storage system like a database.
Don't pass HttpContext across threads.
A colleague offered an alternative call to Task.Factory.StartNew (within StartAsyncTest):
await Task.Factory.StartNew(() =>
{
StartSyncTest(sSearchId, rCancellationToken);
},
rCancellationToken,
TaskCreationOptions.LongRunning,
TaskScheduler.FromCurrentSynchronizationContext());
This implementation seemed to solve the asynchronous issue. Now future calls to CancelAsyncTest succeed and cancel the task as intended.
I have been working on a webscraping project.
I am having two issues. One is presenting the number of URLs processed as a percentage, but a far larger issue is that I cannot figure out how to know when all the threads I am creating are totally finished.
NOTE: I am aware that a Parallel.ForEach moves on once it is done, BUT this is within a recursive method.
My code below:
public async Task Scrape(string url)
{
var page = string.Empty;
try
{
page = await _service.Get(url);
if (page != string.Empty)
{
if (regex.IsMatch(page))
{
Parallel.For(0, regex.Matches(page).Count,
index =>
{
try
{
if (regex.Matches(page)[index].Groups[1].Value.StartsWith("/"))
{
var match = regex.Matches(page)[index].Groups[1].Value.ToLower();
if (!links.Contains(BaseUrl + match) && !Visitedlinks.Contains(BaseUrl + match))
{
Uri ValidUri = WebPageValidator.GetUrl(match);
if (ValidUri != null && HostUrls.Contains(ValidUri.Host))
links.Enqueue(match.Replace(".html", ""));
else
links.Enqueue(BaseUrl + match.Replace(".html", ""));
}
}
}
catch (Exception e)
{
log.Error("Error occured: " + e.Message);
Console.WriteLine("Error occured, check log for further details."); ;
}
});
WebPageInternalHandler.SavePage(page, url);
var context = CustomSynchronizationContext.GetSynchronizationContext();
Parallel.ForEach(links, new ParallelOptions { MaxDegreeOfParallelism = 25 },
webpage =>
{
try
{
if (WebPageValidator.ValidUrl(webpage))
{
string linkToProcess = webpage;
if (links.TryDequeue(out linkToProcess) && !Visitedlinks.Contains(linkToProcess))
{
ShowPercentProgress();
Thread.Sleep(15);
Visitedlinks.Enqueue(linkToProcess);
Task d = Scrape(linkToProcess);
Console.Clear();
}
}
}
catch (Exception e)
{
log.Error("Error occured: " + e.Message);
Console.WriteLine("Error occured, check log for further details.");
}
});
Console.WriteLine("parallel finished");
}
}
catch (Exception e)
{
log.Error("Error occured: " + e.Message);
Console.WriteLine("Error occured, check log for further details.");
}
}
NOTE that Scrape gets called multiple times (recursively).
I call the method like this:
public Task ExecuteScrape()
{
var context = CustomSynchronizationContext.GetSynchronizationContext();
Scrape(BaseUrl).ContinueWith(x => {
Visitedlinks.Enqueue(BaseUrl);
}, context).Wait();
return null;
}
which in turn gets called like so:
static void Main(string[] args)
{
RunScrapper();
Console.ReadLine();
}
public static void RunScrapper()
{
try
{
_scrapper.ExecuteScrape();
}
catch (Exception e)
{
Console.WriteLine(e);
throw;
}
}
my result:
How do I solve this?
(Is it ethical for me to answer a question about web page scraping?)
Don't call Scrape recursively. Place the list of urls you want to scrape in a ConcurrentQueue and begin processing that queue. As the process of scraping a page returns more urls, just add them into the same queue.
I wouldn't use just a string, either. I recommend creating a class like
public class UrlToScrape //because naming things is hard
{
public string Url { get; set; }
public int Depth { get; set; }
}
Regardless of how you execute this it's recursive, so you have to somehow keep track of how many levels deep you are. A website could deliberately generate URLs that send you into infinite recursion. (If they did this then they don't want you scraping their site. Does anybody want people scraping their site?)
When your queue is empty that doesn't mean you're done. The queue could be empty, but the process of scraping the last url dequeued could still add more items back into that queue, so you need a way to account for that.
You could use a thread safe counter (int using Interlocked.Increment/Decrement) that you increment when you start processing a url and decrement when you finish. You're done when the queue is empty and the count of in-process urls is zero.
This is a very rough model to illustrate the concept, not what I'd call a refined solution. For example, you still need to account for exception handling, and I have no idea where the results go, etc.
public class UrlScraper
{
private readonly ConcurrentQueue<UrlToScrape> _queue = new ConcurrentQueue<UrlToScrape>();
private int _inProcessUrlCounter;
private readonly List<string> _processedUrls = new List<string>();
public UrlScraper(IEnumerable<string> urls)
{
foreach (var url in urls)
{
_queue.Enqueue(new UrlToScrape {Url = url, Depth = 1});
}
}
public void ScrapeUrls()
{
while (_queue.TryDequeue(out var dequeuedUrl) || _inProcessUrlCounter > 0)
{
if (dequeuedUrl != null)
{
// Make sure you don't go more levels deep than you want to.
if (dequeuedUrl.Depth > 5) continue;
if (_processedUrls.Contains(dequeuedUrl.Url)) continue;
_processedUrls.Add(dequeuedUrl.Url);
Interlocked.Increment(ref _inProcessUrlCounter);
var url = dequeuedUrl;
Task.Run(() => ProcessUrl(url));
}
}
}
private void ProcessUrl(UrlToScrape url)
{
try
{
// As the process discovers more urls to scrape,
// pretend that this is one of those new urls.
var someNewUrl = "http://discovered";
_queue.Enqueue(new UrlToScrape { Url = someNewUrl, Depth = url.Depth + 1 });
}
catch (Exception ex)
{
// whatever you want to do with this
}
finally
{
Interlocked.Decrement(ref _inProcessUrlCounter);
}
}
}
If I was doing this for real the ProcessUrl method would be its own class, and it would take HTML, not a URL. In this form it's difficult to unit test. If it were in a separate class then you could pass in HTML, verify that it outputs results somewhere, and that it calls a method to enqueue new URLs it finds.
It's also not a bad idea to maintain the queue as a database table instead. Otherwise if you're processing a bunch of urls and you have to stop, you'd have to start all over again.
Can't you add all tasks Task d to some type of concurrent collection you thread through all recursive calls (via method argument) and then simply call Task.WhenAll(tasks).Wait()?
You'd need an intermediate method (makes it cleaner) that launches the base Scrape call and passes in the empty task collection. When the base call returns you have in hand all tasks and you simply wait them out.
public async Task Scrape (
string url) {
var tasks = new ConcurrentQueue<Task>();
//call your implementation but
//change it so that you add
//all launched tasks d to tasks
Scrape(url, tasks);
//1st option: Wait().
//This will block caller
//until all tasks finish
Task.WhenAll(tasks).Wait();
//or 2nd option: await
//this won't block and will return to caller.
//Once all tasks are finished method
//will resume in WriteLine
await Task.WhenAll(tasks);
Console.WriteLine("Finished!"); }
Simple rule: if you want to know when something finishes, the first step is to keep track of it. In your current implementation you are essentially firing and forgetting all launched tasks...
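To make the "keep track of it" rule concrete, here is a minimal sketch of a variation on the idea: instead of threading one shared task collection through the calls, every recursive call keeps its own child tasks and awaits them, so the root task only completes once everything below it has completed. The FakeLinks graph and the GetLinksAsync helper are invented stand-ins for real page fetching, and there is no depth limit or error handling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CrawlSketch
{
    // Made-up site structure standing in for real pages and their links.
    private static readonly Dictionary<string, string[]> FakeLinks =
        new Dictionary<string, string[]>
        {
            ["/"] = new[] { "/a", "/b" },
            ["/a"] = new[] { "/b", "/c" },
            ["/b"] = new[] { "/" },
            ["/c"] = new string[0],
        };

    // Hypothetical fetch: pretends to download a page and extract its links.
    private static async Task<string[]> GetLinksAsync(string url)
    {
        await Task.Delay(5);
        return FakeLinks.TryGetValue(url, out var links) ? links : new string[0];
    }

    // Completes only after this page and everything reachable from it is done.
    public static async Task ScrapeAsync(string url, ConcurrentDictionary<string, bool> visited)
    {
        // TryAdd is atomic, so each url is claimed by exactly one caller.
        if (!visited.TryAdd(url, true)) return;

        var links = await GetLinksAsync(url);

        // Keep every child task instead of firing and forgetting it.
        var children = links.Select(link => ScrapeAsync(link, visited)).ToList();
        await Task.WhenAll(children);
    }

    public static async Task Main()
    {
        var visited = new ConcurrentDictionary<string, bool>();
        await ScrapeAsync("/", visited);
        Console.WriteLine($"Finished, visited {visited.Count} pages");
    }
}
```

Because each call awaits its own children, a single await on the root call is enough to know the whole crawl has finished, and no task can be added to a collection after someone has already started waiting on it.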
I am developing a UWP MVVM application which resolves an address from a location:
private async void ResolveAddress()
{
//TODO : Manage cancel
Debug.WriteLine("Resolving adress ...");
var result = await MapLocationFinder.FindLocationsAtAsync(SelectedLocation);
if (result.Status == MapLocationFinderStatus.Success)
{
if (result.Locations.Count > 0)
{
Debug.WriteLine("Adress resolved : " + Address);
Address = result.Locations[0].Address.FormattedAddress;
}
}
Debug.WriteLine("Resolve fail");
}
This call can occur very often (based on the location selected by the user), so the method may not have finished running when another call is made.
//Binding property
public Geopoint SelectedLocation
{
get { return _selectedLocation; }
set
{
Debug.WriteLine("Location change");
_selectedLocation = value;
ResolveAddress();
RaisePropertyChanged();
}
}
The Address field is also a binding property.
I encounter 2 problems with this implementation:
I am not sure the Address field will hold the last selected location (call N-1 can finish after call N).
The Address field progressively shows every resolved address; I only want the last one.
I found a way to cancel an async task:
https://msdn.microsoft.com/fr-fr/library/jj155759.aspx
But MapLocationFinder.FindLocationsAtAsync does not take a cancellation token as a parameter.
What is the best way to accomplish this?
Thanks.
I am not sure the Address field will hold the last selected location (call N-1 can finish after call N).
You can solve this with an asynchronous callback token. I describe it in more detail in a very old blog post where I call them "asynchronous callback contexts". Given how overloaded the term "context" is today, I now prefer the term "token".
private object _addressCallbackToken;
private async void ResolveAddress()
{
Debug.WriteLine("Resolving adress ...");
var token = _addressCallbackToken = new object();
var result = await MapLocationFinder.FindLocationsAtAsync(SelectedLocation);
if (token != _addressCallbackToken)
return;
if (result.Status == MapLocationFinderStatus.Success)
{
if (result.Locations.Count > 0)
{
Debug.WriteLine("Adress resolved : " + Address);
Address = result.Locations[0].Address.FormattedAddress;
}
}
Debug.WriteLine("Resolve fail");
}
However, I strongly recommend not using async void in this fashion. You may find my article on async MVVM data binding useful.
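As a rough illustration of combining the callback token with a Task-returning method instead of async void, here is a self-contained console sketch. FakeLookupAsync and its delayMs parameter are hypothetical stand-ins for MapLocationFinder.FindLocationsAtAsync, used only to force the first call to finish after the second one.

```csharp
using System;
using System.Threading.Tasks;

public class AddressResolver
{
    // Identifies the most recent request; older requests compare unequal.
    private object _callbackToken;

    public string Address { get; private set; } = "";

    // Hypothetical lookup: delayMs simulates how long the service takes.
    private static async Task<string> FakeLookupAsync(string location, int delayMs)
    {
        await Task.Delay(delayMs);
        return "Address of " + location;
    }

    // Returns Task (not void) so callers can await it and observe exceptions.
    public async Task ResolveAsync(string location, int delayMs)
    {
        var token = _callbackToken = new object();
        var result = await FakeLookupAsync(location, delayMs);

        // A newer call replaced the token while we were waiting: drop this result.
        if (!ReferenceEquals(token, _callbackToken)) return;

        Address = result;
    }

    public static async Task Main()
    {
        var resolver = new AddressResolver();
        var slowFirst = resolver.ResolveAsync("A", 200); // started first, finishes last
        var fastSecond = resolver.ResolveAsync("B", 20); // started last, finishes first
        await Task.WhenAll(slowFirst, fastSecond);
        Console.WriteLine(resolver.Address); // prints "Address of B"
    }
}
```

In a UI app the continuations would normally all run on the UI thread, so the token field needs no locking there. In this console sketch they run on thread-pool threads, and the large gap between the two delays is what keeps the outcome predictable.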