There are a lot of threads about this, but none of them were clear and none of the ones I tried actually worked.
What is the code to get the contents of the entire web browser control (even that which is off screen)?
The existing answers seem to boil down to:
webBrowser1.DrawToBitmap(); // but it's unsupported and doesn't work
3rd-party APIs - not wasting my time
.DrawToBitmap and non-answer links
100 wrong answers
approaches that just take a screenshot of the visible area
Make sure you are calling the method from the DocumentCompleted event handler:
webBrowser1.Width = webBrowser1.Document.Body.ScrollRectangle.Width;
webBrowser1.Height = webBrowser1.Document.Body.ScrollRectangle.Height;
Bitmap bitmap = new Bitmap(webBrowser1.Width, webBrowser1.Height);
webBrowser1.DrawToBitmap(bitmap, new Rectangle(0, 0, webBrowser1.Width, webBrowser1.Height));
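Putting that together, a minimal sketch might look like this (it assumes a WinForms form with a WebBrowser control named webBrowser1, plus url and savePath values you supply; resizing the control to the ScrollRectangle before DrawToBitmap is what captures the off-screen part):
webBrowser1.ScrollBarsEnabled = false;
webBrowser1.DocumentCompleted += (s, e) =>
{
    // Grow the control to the full document size so nothing is clipped off screen.
    webBrowser1.Width = webBrowser1.Document.Body.ScrollRectangle.Width;
    webBrowser1.Height = webBrowser1.Document.Body.ScrollRectangle.Height;

    using (var bitmap = new Bitmap(webBrowser1.Width, webBrowser1.Height))
    {
        webBrowser1.DrawToBitmap(bitmap, new Rectangle(0, 0, webBrowser1.Width, webBrowser1.Height));
        bitmap.Save(savePath);
    }
};
webBrowser1.Navigate(url);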
I was working on a similar function in my project last week and read a few posts on this topic, including your links. I'd like to share my experience:
The key part of this function is System.Windows.Forms.WebBrowser.DrawToBitmap method.
but its unsupported and doesnt work
It is supported and it does work, but it doesn't always work well. In some circumstances you will get a blank screenshot (in my experience, the more complex the HTML being loaded, the more likely it is to fail; in my project only very simple, well-formatted HTML is loaded into the WebBrowser control, so I never get blank images).
Anyway, I don't have a 100% reliable solution either. Here is part of my core code; I hope it helps (it works in an ASP.NET MVC 3 application).
using (var browser = new System.Windows.Forms.WebBrowser())
{
    browser.DocumentCompleted += delegate
    {
        using (var pic = new Bitmap(browser.Width, browser.Height))
        {
            browser.DrawToBitmap(pic, new Rectangle(0, 0, pic.Width, pic.Height));
            pic.Save(imagePath);
        }
    };
    browser.Navigate(Server.MapPath("~") + htmlPath); // a file or a url
    browser.ScrollBarsEnabled = false;
    while (browser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.Application.DoEvents();
    }
}
I'm loading a huge amount of data from JSON (with images) into my ListView, and it causes my app to stop responding while scrolling. If I press "Wait" on my emulator I can scroll, but the app becomes very laggy. However, I have a detail activity in my application, also populated from my JSON, and after clicking an item in my ListView and proceeding to the detail activity, the lag goes away. Is there any workaround for this? I put the link to my JSON file below. I think my list view is having a hard time loading the images. From my research, it sounds like I should memory-cache the images while converting them to bytes, but how?
Here's how I convert my images to bytes:
using System.Net;
using Android.Graphics;

public class ImageHelper
{
    // Downloads the image bytes synchronously and decodes them into a Bitmap.
    public static Bitmap GetImageBitmapFromUrl(string url)
    {
        Bitmap imageBitmap = null;
        using (var webClient = new WebClient())
        {
            webClient.Headers.Add(System.Net.HttpRequestHeader.UserAgent, "Others");
            var imageBytes = webClient.DownloadData(url);
            if (imageBytes != null && imageBytes.Length > 0)
            {
                imageBitmap = BitmapFactory.DecodeByteArray(imageBytes, 0, imageBytes.Length);
            }
        }
        return imageBitmap;
    }
}
My json file.
You can use Picasso (https://components.xamarin.com/view/square.picasso) for asynchronous image downloading and caching. It works pretty well.
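For example, a rough sketch of a helper using the Xamarin Picasso binding (the Com.Squareup.Picasso namespace is assumed from that component; call this from your adapter's GetView):
using Android.Content;
using Android.Widget;
using Com.Squareup.Picasso;

public static class ImageLoader
{
    // Picasso downloads the image on a background thread, caches it in
    // memory/on disk, and sets it on the ImageView when it is ready.
    public static void Load(Context context, string url, ImageView imageView)
    {
        Picasso.With(context)
               .Load(url)
               .Into(imageView);
    }
}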
Let FFImageLoading handle it for you. It's an awesome well-known library for image caching and compression.
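A comparable sketch with FFImageLoading (API names taken from its documentation; ImageViewAsync is the view type its Into() extension expects on Android):
using FFImageLoading;
using FFImageLoading.Views;

public static class RowImageLoader
{
    // Downloads, downsamples and caches the image, then sets it on the view.
    public static void Load(string url, ImageViewAsync imageView)
    {
        ImageService.Instance
            .LoadUrl(url)
            .DownSampleInDip(80, 80)   // decode a smaller bitmap to save memory
            .Into(imageView);
    }
}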
Here are a few tips:
Reduce the size of the images, or compress them.
Use pagination, i.e. load the data in chunks and load more as the user scrolls down (see the sketch after this list).
Open the profiler to see at what point memory usage crosses into the danger zone.
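For the pagination tip, here is a rough Xamarin.Android sketch; isLoading, currentPage, pageSize, LoadNextPageAsync, items (the List backing the adapter) and adapter are hypothetical members of your own activity:
listView.Scroll += async (sender, e) =>
{
    // Load the next chunk when the user is within a few rows of the bottom.
    bool nearBottom = e.FirstVisibleItem + e.VisibleItemCount >= e.TotalItemCount - 5;
    if (nearBottom && !isLoading && e.TotalItemCount > 0)
    {
        isLoading = true;
        var nextItems = await LoadNextPageAsync(currentPage++, pageSize);
        items.AddRange(nextItems);       // append to the adapter's backing list
        adapter.NotifyDataSetChanged();  // tell the ListView to redraw
        isLoading = false;
    }
};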
I am unable to use the drag-and-drop functionality within DotNetNuke version 7.1.
The drag-and-drop functionality of the Telerik RadEditor takes the browser's Base64 input and encases it in an img tag where the source is the data. E.g., src="data:image/jpeg;base64,[base64data]".
When using drag/drop to a RadEditor within the HTML Module and then saving the HTML content, that src definition is changed to a URI request by prepending the relative path for the DNN portal. E.g., src="/mysite/portals/0/data:image/jpeg;base64,[base64data]".
This converts what started out as a perfectly valid embedded image tag into a request and thereby causes the browser to request this "image" from the server. The server then returns a 414 error (URI too long).
Example without prepended relative path: http://jsfiddle.net/GGGH/27Tbb/2/
<img src="data:image/jpeg;base64,[stuff]>
Example with prepended relative path (won't display): http://jsfiddle.net/GGGH/NL85G/2/
<img src="mysite/portals/0/data:image/jpeg;base64,[stuff]>
Is there some configuration that I've missed? Prepending relative paths is OK for src="/somephysicalpath" but not for src="data:image...".
I ended up solving the problem prior to posting the question but wanted to add this knowledge to SO in case someone else encountered the same problem (has no one noticed this yet?). Also, perhaps, DNN or the community can improve upon my solution and that fix can make it into a new DNN build.
I've looked at the source code for RadEditor, RadEditorProvider and then finally the HTML module itself. It seems the problem is in EditHtml.ascx.cs: its FormatContent() method calls the HtmlTextController's ManageRelativePaths() method, and it is that method that runs over every "src" (and "background") attribute in the HTML content string. It post-processes the HTML string that comes out of the RadEditor to add in that relative path, which is not appropriate when the content contains an embedded Base64 image that was dragged into the editor.
To fix this, while still allowing the standard functionality the manufacturer originally intended, ManageRelativePaths (in HtmlTextController.cs, which the DotNetNuke.Modules.Html EditHtml.ascx.cs code path ends up calling) needs to be modified to make an exception when the URI begins with "data:image". Line 488 (as of version 7.1.0) is a suitable spot. I added the following code (advancing P as appropriate, positioned after the URI length is determined; I'm sure there's a better way, but this works fine):
// line 483, HtmlTextController.cs, DNN code included for positioning
while (P != -1)
{
    sbBuff.Append(strHTML.Substring(S, P - S + tLen)); // keep characters left of URL

    // added code
    bool skipThisToken = false;
    if (strHTML.Substring(P + tLen, 10) == "data:image") // check for base64 image
    {
        skipThisToken = true;
    }
    // end added code - back to standard DNN

    S = P + tLen;                 // save startpos of URL
    R = strHTML.IndexOf("\"", S); // end of URL
    if (R >= 0)
    {
        strURL = strHTML.Substring(S, R - S).ToLower();
    }
    else
    {
        strURL = strHTML.Substring(S).ToLower();
    }

    // added code to continue the while loop after the indexes have been updated
    if (skipThisToken)
    {
        P = strHTML.IndexOf(strToken + "=\"", S + strURL.Length + 2, StringComparison.InvariantCultureIgnoreCase);
        continue;
    }
    // end added code -- the method continues from here (not reproduced)
This is probably not the best solution, as it searches for a hard-coded value. It would be better to have functionality that lets developers add tags later. (But, then again, EditHtml.ascx.cs and HtmlTextController both hard-code the two tags they intend to post-process.)
So, after making this small change, recompiling the DotNetNuke.Modules.Html.dll and deploying, drag-and-drop should be functional. Obviously this increases the complexity of an upgrade -- it would be better if this were fixed by DNN themselves. I verified that as of v7.2.2 this issue still exists.
UPDATE: Fixed in DNN Community Version 7.4.0
I'm trying to get the FINAL source of a webpage. I am using the WebClient OpenRead method, but it only returns the initial page source. After the source downloads, JavaScript runs and collects the data I need in a different format, so my method ends up looking for markup that has been completely changed.
What I am talking about is exactly like the difference between:
right-click on a webpage -> select view source
access the developer tools
Look at this site to see what I am talking about: http://www.augsburg.edu/history/fac_listing.html and compare how the email addresses are displayed with each option. I think what's happening is that the first shows you the initial load of the page, while the second shows you the final page HTML. WebClient only lets me do option #1.
Here is the code that only returns option #1. Oh, and I need to do this from a console application. Thank you!
private static string GetReader(string site)
{
    WebClient client = new WebClient();
    Stream data;
    StreamReader reader;
    try
    {
        // OpenRead only returns the raw HTML the server sends; no scripts are run.
        data = client.OpenRead(site);
        reader = new StreamReader(data);
    }
    catch
    {
        return "";
    }
    return reader.ReadToEnd();
}
I've found a solution to my problem.
I ended up using the Selenium WebDriver PageSource property. It worked beautifully!
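For reference, a minimal console sketch of that approach (it assumes the Selenium WebDriver and ChromeDriver NuGet packages; any driver would do):
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("http://www.augsburg.edu/history/fac_listing.html");
            // PageSource reflects the DOM after the page's scripts have run,
            // unlike WebClient, which only sees the HTML the server sent.
            string finalHtml = driver.PageSource;
            Console.WriteLine(finalHtml.Length);
        }
    }
}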
Learn about Selenium and WebDriver. It's an easy thing to learn, it helps with testing, and it solves this problem too!
It seems that I'm encountering quite a few problems in a simple attempt to parse some HTML. As practice, I'm writing a multi-threaded web crawler that starts with a list of sites to crawl. This gets handed down through a few classes, which should eventually return the content of the sites back to my system. This seems rather straightforward, but I've had no luck with either of the following tasks:
A. Convert the content of a website (in string format, from an HttpWebRequest stream) to an HtmlDocument (you can't create a new instance of an HtmlDocument? That doesn't make much sense...) by using the HtmlDocument.Write() method.
or
B. Collect an HtmlDocument via a WebBrowser instance.
Here is my code as it exists; any advice would be great...
public void Start()
{
    if (this.RunningThread == null)
    {
        Console.WriteLine("Executing SiteCrawler for " + SiteRoot.DnsSafeHost);
        this.RunningThread = new Thread(this.Start);
        this.RunningThread.SetApartmentState(ApartmentState.STA);
        this.RunningThread.Start();
    }
    else
    {
        try
        {
            WebBrowser BrowserEmulator = new WebBrowser();
            BrowserEmulator.Navigate(this.SiteRoot);
            HtmlElementCollection LinkCollection = BrowserEmulator.Document.GetElementsByTagName("a");
            List<PageCrawler> PageCrawlerList = new List<PageCrawler>();
            foreach (HtmlElement Link in LinkCollection)
            {
                PageCrawlerList.Add(new PageCrawler(Link.GetAttribute("href"), true));
                continue;
            }
            return;
        }
        catch (Exception e)
        {
            throw new Exception("Exception encountered in SiteCrawler: " + e.Message);
        }
    }
}
This code seems to do nothing when it passes over the Navigate method. I've attempted allowing it to open in a new window, which pops up a new instance of IE and navigates to the specified address, but not before my program has stepped over the Navigate call. I've tried waiting for the browser to be 'not busy', but it never seems to set the busy attribute anyway. I've tried creating a new document via Browser.Document.OpenNew() so that I could populate it with data from a WebRequest stream, but, as I'm sure you can guess, I get a null reference exception when I try to reach through the Document property in that statement. I've done some research and this appears to be the only way to create a new HtmlDocument.
As you can see, this method is intended to kick off a 'PageCrawler' for every link in a specified page. I am sure that I could parse through the HTML character by character to find all of the links, after using an HttpWebRequest and collecting the data from the stream, but this is far more work than should be necessary to complete this.
If anyone has any advice it would be greatly appreciated. Thank you.
If this is a console application, then it will not work since the console application doesn't have a message pump (which is required for the WebBrowser to process messages).
If you run this in a Windows Forms application, then you should handle the DocumentCompleted event:
WebBrowser browserEmulator = new WebBrowser();
browserEmulator.DocumentCompleted += OnDocumentCompleted;
browserEmulator.Navigate(this.SiteRoot);
Then implement the method that handles the event:
private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;
    if (wb.Document != null)
    {
        List<string> links = new List<string>();
        foreach (HtmlElement element in wb.Document.GetElementsByTagName("a"))
        {
            links.Add(element.GetAttribute("href"));
        }
        foreach (string link in links)
        {
            Console.WriteLine(link);
        }
    }
}
If you want to run this in a console application, then you need to use a different method for downloading pages. I would recommend that you use the WebRequest/WebResponse and then use the HtmlAgilityPack to parse the HTML. The HtmlAgilityPack will generate an HtmlDocument for you and you can get the links from there.
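For instance, a small console sketch of that approach (it assumes the HtmlAgilityPack NuGet package; the URL is just a placeholder):
using System;
using System.Net;
using HtmlAgilityPack;

class LinkDumper
{
    static void Main()
    {
        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString("http://example.com/");
        }

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard against that.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var anchor in anchors)
            {
                Console.WriteLine(anchor.GetAttributeValue("href", string.Empty));
            }
        }
    }
}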
Additionally, if you're interested in learning more about building scalable web crawlers, then check out the following links:
How to crawl billions of pages?
Designing a web crawler
Good luck!
I'm trying to load an image from a URL in Silverlight and have followed the steps at this site but to no avail.
My code is as follows:
imageUri = new Uri("http://php.scripts.psu.edu/dept/iit/hbg/philanthropy/Images/BlueSkyLarge.jpg", UriKind.Absolute);
System.Windows.Media.Imaging.BitmapImage bi = new System.Windows.Media.Imaging.BitmapImage();
bi.UriSource = imageUri;
m_Image.Source = bi;
m_Image.ImageOpened += new EventHandler<RoutedEventArgs>(Image_Opened);
The callback function (Image_Opened) is never called either.
Is your Silverlight application running from the domain php.scripts.psu.edu? If not, Silverlight will block access to it because it will not allow TCP requests made to any domain other than the one the application was loaded from.
See here for network restrictions in Silverlight.
EDIT: commenter is right. It's the cross-zone issue you're seeing now. Here's a link with a table indicating what an Image (among others) can and can't do.
Another thing I would fix in your code is that you attach the handler at the end. In theory the event handler may never be called if the image loads really fast. I suppose it wouldn't happen in most cases, but with caching/object reuse/etc., who knows; I would attach the handler right after instantiating the object and be safe.
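For example, here is the original snippet reordered that way (same objects, just with the handler hooked up before the source is assigned):
Uri imageUri = new Uri("http://php.scripts.psu.edu/dept/iit/hbg/philanthropy/Images/BlueSkyLarge.jpg", UriKind.Absolute);
System.Windows.Media.Imaging.BitmapImage bi = new System.Windows.Media.Imaging.BitmapImage();

m_Image.ImageOpened += new EventHandler<RoutedEventArgs>(Image_Opened); // attach the handler first
bi.UriSource = imageUri;  // start the download
m_Image.Source = bi;      // then hand the bitmap to the Image element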