I am trying to write an application that takes a URL, finds all HTML5 canvas advertisements (I found that iframes are the best HtmlElement to look for), and takes a screenshot and a video capture of each element.
I have to render the page (because it uses JavaScript calls to build the final page) in an in-memory WebBrowser control, and then I find all iframes in the document of the WebBrowser control (objDoc in the sample).
var aColl = from HtmlElement p in objDoc.Body.All
where Configuration.lstCanvas.Contains(p.TagName.ToUpper())
select p;
For each element I have to capture a screenshot and a video of the content of the iframe, or even an HTML file that can reproduce the video later.
To capture the image I use the following code as an extension method of the WebBrowser control (if I pass null as the HtmlElement, I get a capture of the whole page body).
public static Image GetElementImage(this WebBrowser web, HtmlElement elm = null)
{
try
{
IHTMLDocument2 idoc2 = web.Document.DomDocument as IHTMLDocument2;
if (idoc2 == null)
{
return null;
}
IHTMLElementCollection elementCollection = idoc2.all;
if (elementCollection == null)
{
return null;
}
IHTMLElementRender iHTMLElementRender;
if (elm != null)
{
iHTMLElementRender = (IHTMLElementRender)elm.DomElement;
if (iHTMLElementRender == null)
{
return null;
}
}
else {
iHTMLElementRender = idoc2.body as IHTMLElementRender;
if (iHTMLElementRender == null)
{
return null;
}
}
// get the location and size of the element and create a bitmap
// need a different interface
IHTMLElement element = iHTMLElementRender as IHTMLElement;
if (element == null)
{
return null;
}
int elementWidth = element.offsetWidth;
int elementHeight = element.offsetHeight;
// Create a bitmap and render the element to it.
Bitmap memoryBitmap = new Bitmap(elementWidth, elementHeight, PixelFormat.Format24bppRgb);
Graphics memGraphics = Graphics.FromImage(memoryBitmap);
IntPtr memDC = memGraphics.GetHdc();
iHTMLElementRender.DrawToDC(memDC);
memGraphics.ReleaseHdc(memDC);
Image r = (Image)memoryBitmap.Clone();
memGraphics.Dispose();
memoryBitmap.Dispose();
return r;
}
catch // (Exception ex)
{
return null;
}
}
I haven't found a way to take a video capture of the element.
I tried to grab the content of the iframe as HTML (in order to save it and use it later), but I ran into some issues:
The URL is not always an attribute of the iframe (JavaScript changes it).
Cross-domain scripting restrictions prevent me from grabbing the HTML.
The file that I download does not always reproduce the content of the iframe.
Do you have any suggestions for getting either the HTML code or the video?
Thanks in advance
I'm trying to get the selected line of a PDF document that I've displayed in a WebBrowser control (C# WinForms).
This is my code
IHTMLDocument2 htmlDocument = webBrowser1.Document.DomDocument as IHTMLDocument2;
IHTMLSelectionObject currentSelection = htmlDocument.selection;
if (currentSelection != null)
{
IHTMLTxtRange range = currentSelection.createRange() as IHTMLTxtRange;
if (range != null)
{
MessageBox.Show(range.text);
}
}
It doesn't seem to work: I get an empty message box. I've read online that in order to edit or highlight text, I might have to use an SDK.
I'd really appreciate any kind of help or guidance here!
I want to get all the links of a HTML document. This isn't a problem, but apparently it puts all the links in an alphabetic order before storing them in an array one by one. I want to have the links in original order (not in alphabetic).
So is there any possibility to get the first found link, store it, then the second one,...? I already tried using HtmlAgilityPack and the Webbrowser-Control methods, but both order them alphabetically. The original order is important for later purposes.
I heard that it might be possible with Regex, but I've found enough answers, where they say that you shouldn't use it for HTML parsing. So how can I do it?
Here's the WebBrowser control code I tried to use to get the links and store them in an array:
private void btnGet_Click(object sender, EventArgs e)
{
HtmlWindow mainFrame = webFl.Document.Window.Frames["mainFrame"];
HtmlElementCollection links = mainFrame.Document.Links;
foreach (HtmlElement link in links)
{
string linkText = link.OuterHtml;
if (linkText.Contains("puzzle"))
{
arr[i] = linkText;
i++;
}
}
}
Thank you in advance,
Opak
You can get the correct order by walking the DOM tree using HTML DOM API. The following code does this. Note, I use dynamic to access DOM API. That's because WebBrowser's HtmlElement.FirstChild/HtmlElement.NextSibling don't work for this purpose, as they return null for DOM text nodes.
private void btnGet_Click(object sender, EventArgs e)
{
Action<object> walkTheDom = null;
var links = new List<object>();
// element.FirstChild / NextSibling don't work as they stop at DOM text nodes
walkTheDom = (element) =>
{
dynamic domElement = element;
if (domElement.tagName == "A")
links.Add(domElement);
for (dynamic child = domElement.firstChild; child != null; child = child.nextSibling)
{
if (child.nodeType == 1) // Element node?
walkTheDom(child);
}
};
walkTheDom(this.webBrowser.Document.Body.DomElement);
string html = links.Aggregate(String.Empty, (a, b) => a + ((dynamic)b).outerHtml + Environment.NewLine);
MessageBox.Show(html);
}
[UPDATE] If you really need to get a list of HtmlElement objects for <A> tags, instead of dynamic native elements, that's still possible with a little trick using GetElementById:
private void btnGet_Click(object sender, EventArgs e)
{
// element.FirstChild / NextSibling don't work because they stop on DOM text nodes
var links = new List<HtmlElement>();
var document = this.webBrowser.Document;
dynamic domDocument = document.DomDocument;
Action<dynamic> walkTheDom = null;
walkTheDom = (domElement) =>
{
if (domElement.tagName == "A")
{
// get HtmlElement for the found <A> tag
string savedId = domElement.id;
string uniqueId = domDocument.uniqueID;
domElement.id = uniqueId;
links.Add(document.GetElementById(uniqueId));
if (savedId != null)
domElement.id = savedId;
else
domElement.removeAttribute("id");
}
for (var child = domElement.firstChild; child != null; child = child.nextSibling)
{
if (child.nodeType == 1) // is an Element node?
walkTheDom(child);
}
};
// walk the DOM for <A> tags
walkTheDom(domDocument.body);
// show the found tags
string combinedHtml = links.Aggregate(String.Empty, (html, element) => html + element.OuterHtml + Environment.NewLine);
MessageBox.Show(combinedHtml);
}
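If the markup is available as a string (for example from the frame's Body.OuterHtml), parsing it with HtmlAgilityPack may be an alternative to walking the live DOM. The following is only a minimal sketch, assuming the HtmlAgilityPack package is referenced; GetLinksInDocumentOrder is a hypothetical helper name, and the "puzzle" filter is taken from the question. Descendants performs a depth-first traversal, so the matches should come back in document order rather than sorted.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the outer HTML of every <a> element
    // containing "puzzle", in the order the elements appear in the markup.
    public static List<string> GetLinksInDocumentOrder(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants("a")                             // depth-first traversal
                  .Where(a => a.OuterHtml.Contains("puzzle"))   // same filter as in the question
                  .Select(a => a.OuterHtml)
                  .ToList();
    }
}
```

Because the result is a List<string>, there is no need to track an index into a fixed-size array as in the original handler.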
Hello fellow developers,
Being a newbie with the OpenXML SDK I cannot figure out how to retrieve a graph part that I've put in a rich text control (with a specific tag name).
For the moment I retrieve the graph part by using the mainDocumentPart.ChartParts collection. But a ChartPart object does not seem to know where it's located in the document: chartPart.GetParentParts() only contains the mainDocumentPart.
I have multiple graphs in my document, so how can I distinguish them?
I have put my graphs in rich text controls, so I thought I could access them like that, but I cannot figure out how to do this. Retrieving the rich text control works, but how to find the graph within it?
foreach (SdtProperties sdtProp in mainDocumentPart.Document.Body.Descendants<SdtProperties>())
{
Tag tag = sdtProp.GetFirstChild<Tag>();
if (tag != null && tag.Val != null)
{
if (tag.Val == "containerX")
{
SdtProperties sdtPropTestResults = sdtProp;
// How to retrieve the graph part??
// sdtPropTestResults.Descendants<ChartPart> does not seem to work
}
}
}
Thanks a lot for your help.
Found a solution myself. I don't use the parent container now. Instead I gave the chart space an "Alt Title". Now my code looks for a drawing having a docProperties with the given title.
Here it is:
// Find our graphs by looping all drawings in the document and comparing their "alt title" property
foreach (Drawing drawing in mainDocumentPart.Document.Body.Descendants<Drawing>())
{
DocProperties docProperties = drawing.Descendants<DocumentFormat.OpenXml.Drawing.Wordprocessing.DocProperties>().FirstOrDefault();
if (docProperties != null && docProperties.Title != null)
{
if (docProperties.Title.Value == AltTitleChartBlack || docProperties.Title.Value == AltTitleChartRed)
{
LineChartData lineChartData = null;
switch (docProperties.Title.Value)
{
case AltTitleChartBlack:
lineChartData = this.chartDataBlack;
break;
case AltTitleChartRed:
lineChartData = this.chartDataRed;
break;
}
ChartReference chartRef = drawing.Descendants<ChartReference>().FirstOrDefault();
if (chartRef != null && chartRef.Id != null)
{
ChartPart chartPart = (ChartPart)mainDocumentPart.GetPartById(chartRef.Id);
if (chartPart != null)
{
Chart chart = chartPart.ChartSpace.Elements<Chart>().FirstOrDefault();
if (chart != null)
{
LineChart lineChart = chart.Descendants<LineChart>().FirstOrDefault();
if (lineChart != null)
{
LineChartEx chartEx = new LineChartEx(chartPart, lineChartData);
chartEx.Refresh();
chartPart.ChartSpace.Save();
}
}
}
}
}
}
}
I have a Panel filled with a lot of controls for users to fill in. These include textboxes, checkboxes, radio buttons and so on. It is a long form, so the controls are in a scrollable panel. What I need is to save the whole panel as a PDF. I think PDFsharp is a good library for saving any text or image as a PDF file, but I don't want to write code for every single control inside the panel.
I once wrote a class to create a PDF file from a Control object. It iterated over all inner controls of the given control (and their inner controls, until none were left) and wrote their Text property (yes/no for checkable controls) to the PDF using their Location and Size properties. I can't find it now, but I remember it had issues with some of the DevExpress controls I use, so I didn't bother writing it again. (Edit: I had to, you can find it below.) I think taking a screenshot and saving that image as a PDF would be nice, but I couldn't find out how to achieve it. This question seems similar, but there is no satisfying answer to it.
So, screenshot or not, I'm open to any advice. There must be many occasions where users have to fill in long forms and need to keep them as PDFs. Again, any advice or workaround would be appreciated. (I have thought about creating the form in HTML, displaying it in a WebBrowser control and using an HTML-to-PDF library, but I really prefer using my existing form.)
Many Thanks.
Edit:
In the end I had to write something that iterates over the inner controls of a container control (like a panel) and writes every inner control to a PDF using its Location, Size and Font properties. I don't recommend using it (at least as it is) for these reasons:
It sets the page size to the given control's size and uses only one (usually huge) PDF page. You can add logic to split it into pages if you need to. (I didn't, but I guess you'll probably need your PDF to be more printer friendly.)
Cheeso's method (using a FlowDocument) is a much more "legitimate" way to do a task like this. I would prefer that over this, but I didn't have a choice in this instance.
I used PDFsharp in this code. You can find it on its homepage or its CodePlex page.
PdfReport class:
private PdfDocument Document;
public Control Control { get; private set; }
public PdfReport(Control control) { Control = control; }
public PdfDocument CreatePdf(PdfDocument document = null)
{
Document = document != null ? document : new PdfDocument();
PdfPage page = Document.AddPage();
page.Height = Control.Height;
page.Width = Control.Width;
XGraphics gfx = XGraphics.FromPdfPage(page);
foreach (PdfItem item in CreatePdf(new Point(0, 0), Control.Controls))
{
XStringFormat format = item.IsContainer ? XStringFormats.TopLeft : item.TextAlign == ContentAlignment.BottomCenter ? XStringFormats.BottomCenter : item.TextAlign == ContentAlignment.TopLeft ? XStringFormats.TopLeft : item.TextAlign == ContentAlignment.TopCenter ? XStringFormats.TopCenter : XStringFormats.Center;
gfx.DrawString(item.Text, item.Font, item.Brush, new XRect(item.Location, item.Size), format);
}
return Document;
}
private IEnumerable<PdfItem> CreatePdf(Point location, Control.ControlCollection controls)
{
List<PdfItem> items = new List<PdfItem>();
foreach (Control control in controls)
{
if (control.Controls.Count > 0)
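// Accumulate offsets so controls nested more than one level deep keep their absolute position.
items.AddRange(CreatePdf(new Point(location.X + control.Location.X, location.Y + control.Location.Y), control.Controls));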
items.Add(new PdfItem(control, location));
}
return items;
}
public void SaveAsPdf(string path, bool open = false)
{
CreatePdf().Save(path);
if (open)
Process.Start(path);
}
PdfItem class:
public string Text { get; set; }
public Point Location { get; set; }
public Size Size { get; set; }
public Font Font { get; set; }
public bool IsContainer { get; set; }
public ContentAlignment TextAlign { get; set; }
public Color ForeColor { get; set; }
public XBrush Brush { get { return new SolidBrush(ForeColor); } }
public PdfItem() { }
public PdfItem(string text, Point location, Font font, Color foreColor, Size size, bool isContainer = false, ContentAlignment alignment = ContentAlignment.MiddleCenter)
{
Text = text;
Location = location;
Size = size;
Font = new Font(font.FontFamily, font.Size, font.Style, GraphicsUnit.World);
TextAlign = alignment;
ForeColor = foreColor;
IsContainer = isContainer;
}
public PdfItem(string text, Point location, Size size)
: this(text, location, new Font("Calibri", 12), Color.Black, size) { }
public PdfItem(Control control, Point parentLocation)
: this(control.Text, control.Location, control.Font, control.ForeColor, control.Size, control.Controls.Count > 0)
{
Location = new Point(Location.X + parentLocation.X, Location.Y + parentLocation.Y);
IEnumerable<PropertyInfo> properties = control.GetType().GetProperties();
if (properties.FirstOrDefault(p => p.Name == "TextAlign" && p.PropertyType == typeof(ContentAlignment)) != null)
TextAlign = (control as dynamic).TextAlign;
if (properties.FirstOrDefault(p => p.Name == "Checked" && p.PropertyType == typeof(bool)) != null)
{
string title = control.Text != null && control.Text.Length > 0 ? string.Format("{0}: ", control.Text) : string.Empty;
Text = string.Format("{0}{1}", title, (control as dynamic).Checked ? "Yes" : "No");
}
}
Regarding:
"I think taking a screenshot and save that image as pdf would be nice but I couldn't find out how to achieve it."
There is a tool called "cropper" available on codeplex.com. It is designed to be used as a user tool that can take screenshots. It is managed code, open source.
I can imagine embedding some of the cropper magic into your app so that you could take that screenshot. I can also imagine this would be useful for collecting a diagnostic image of the screen at the time of a problem.
On the other hand... if you are interested in producing a printed form that reproduces the content on the screen, then I think you should be using WPF, in which case doing what you want is pretty easy. For example, this question describes how to do a print-preview for a FlowDocument. From that point your user can print to PDF (if he has a PDF printer installed) or print to XPS, or print to a physical output device, and so on.
I don't know if this would help you or not, but DocRaptor.com's pdf api could be built in so it would do it for you, no matter what the user inputs. It uses basic html.
You can use the code below :)
YourPanel.AutoSize = true;
int width = YourPanel.Size.Width;
int height = YourPanel.Size.Height;
Bitmap bm = new Bitmap(width, height);
YourPanel.DrawToBitmap(bm, new Rectangle(0, 0, width, height));
string outputFileName = @"C:\YourDirectory\myimage.bmp";
using (MemoryStream memory = new MemoryStream())
{
using (FileStream fs = new FileStream(outputFileName, FileMode.Create, FileAccess.ReadWrite))
{
bm.Save(memory, ImageFormat.Bmp);
Clipboard.SetImage(bm);
byte[] bytes = memory.ToArray();
fs.Write(bytes, 0, bytes.Length);
}
}
YourPanel.AutoSize = false;
Clipboard.SetImage sends bm to the clipboard, so you can paste it into your PDF form or any other document.
The code also saves the bitmap as an image file, if you want that.
The trick here is AutoSize for your panel. It needs to be set to true so the panel resizes itself until its whole area is visible. Right after you do your work you can set it back to false so it uses scrollbars again on the user's screen (you may see it flash for half a second, but this code does work).
For saving it to a PDF, I personally prefer to paste it from the clipboard, or you can write the bytes. iTextSharp is also a great library to work with for this!
I really hope this helps.
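To go from the saved bitmap to a PDF file, one option is PDFsharp, which the question already mentions. This is only a minimal sketch, assuming the PDFsharp package is referenced and the panel image has already been written to disk as above; SaveImageAsPdf and both path parameters are hypothetical names.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class PanelPdfExport
{
    // Hypothetical helper: places one image on a single PDF page
    // that is sized to match the image.
    public static void SaveImageAsPdf(string imagePath, string pdfPath)
    {
        using (PdfDocument document = new PdfDocument())
        using (XImage image = XImage.FromFile(imagePath))
        {
            PdfPage page = document.AddPage();

            // Size the page to the image (in points) so nothing is cropped.
            page.Width = XUnit.FromPoint(image.PointWidth);
            page.Height = XUnit.FromPoint(image.PointHeight);

            using (XGraphics gfx = XGraphics.FromPdfPage(page))
            {
                gfx.DrawImage(image, 0, 0, image.PointWidth, image.PointHeight);
            }

            document.Save(pdfPath);
        }
    }
}
```

A call such as PanelPdfExport.SaveImageAsPdf(outputFileName, @"C:\YourDirectory\myform.pdf") would produce one (possibly very tall) page; splitting a long form across several pages would need extra logic.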
I need to place the captcha image into a PictureBox on my form, because I need to zoom the captcha image for visually impaired users.
It appears to be a simple task: just take the image from the web page and put it into a PictureBox. But it is turning out to be not so simple.
I have a WebBrowser control in a form, and for registration on one of the sites I need the captcha image in a PictureBox. The problem is that the captcha image is generated by JavaScript: when the script runs, it returns the URL of the captcha image, but every time the script runs, the captcha image changes. I just want the captcha image that is on the page currently shown in the WebBrowser control.
Any help would be greatly appreciated.
Here is my code.
public void FacebookRegistration()
{
HTMLDoc = (mshtml.HTMLDocument)WBrowser.Document.DomDocument;
iHTMLCol = HTMLDoc.getElementsByTagName("input");
foreach (IHTMLElement iHTMLEle in iHTMLCol)
{
if (iHTMLEle.getAttribute("name", 0) != null)
{
strAttriName = iHTMLEle.getAttribute("name", 0).ToString();
if (strAttriName == "firstname")
{
iHTMLEle.setAttribute("value", FirstName, 0);
continue;
}
if (strAttriName == "lastname")
{
iHTMLEle.setAttribute("value", LastName, 0);
continue;
}
if (strAttriName == "reg_email__")
{
iHTMLEle.setAttribute("value", EmailID, 0);
continue;
}
if (strAttriName == "reg_passwd__")
{
string s = GetRandomString();
Random ran = new Random();
iHTMLEle.setAttribute("value", s+ran.Next(1111,9999), 0);
break;
}
}
}
iHTMLCol = HTMLDoc.getElementsByTagName("option");
foreach (IHTMLElement iHTMLEle in iHTMLCol)
{
try
{
if (iHTMLEle.innerText.Contains("Male"))
{
iHTMLEle.setAttribute("selected", "selected",0);
}
if (iHTMLEle.innerText.Contains("Jun"))
{
iHTMLEle.setAttribute("selected", "selected", 0);
}
Random ran = new Random();
if (iHTMLEle.innerText.Contains("4"))
{
iHTMLEle.setAttribute("selected", "selected", 0);
}
Random ran1 = new Random();
if (iHTMLEle.innerText.Contains(ran1.Next(1920,1985).ToString()))
{
iHTMLEle.setAttribute("selected", "selected", 0);
}
}
catch { }
}
iHTMLCol = HTMLDoc.getElementsByTagName("input");
int i = 0;
foreach (IHTMLElement iHTMLEle in iHTMLCol)
{
string s = iHTMLEle.className;
if (iHTMLEle.className == "UIButton_Text" && iHTMLEle.getAttribute("value", 0).ToString() == "Sign Up")
{
if (i != 0)
{
iHTMLEle.click();
break;
}
i++;
}
}
}
private void WBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (CurrentSocial == "facebook")
{
FacebookRegistration();
}
}
On the registration page of facebook.com there is a captcha, and if you look at the page source you will see only this:
<input type="hidden" id="captcha_persist_data" name="captcha_persist_data" value="AAAAAQAQiCw5zhFGOsVF6TbDBX8d_wAAAGvqENqFy5KkvMip5AIv3QSF7BS7goiHfAC7fTkzr8hW61cq3s1d23Tw7m-WAi-21Uzt1l3frkLf4obBEuZZMwga_hbcUhnWXu4P382QsJ7J0WtAbo5USXWuVjzv_KD1SMyTWhf34AGorQd27dFqZc0a" /?;
In this input tag I found the URL of the JavaScript that returns the captcha URL.
javascript url: http://api.recaptcha.net/challenge?k=6LezHAAAAAAAADqVjseQ3ctG3ocfQs2Elo1FTa_a&ajax=1&xcachestop=0.31520985781374&authp=nonce.tt.time.new_audio_default&psig=H48rD9d3_QogBfxxKAmzFZ7CG10&nonce=hl77BQn58EsYsPpPwQ2TIA&tt=r3zaWETv27-0igoIw5ndwnHt_W4&time=1256413208&new_audio_default=1
If you browse to this URL you will get the captcha data, like this:
var RecaptchaState = {
challenge : '02UflxsCli4nYg-oG48n5bNDm6ywMlvE62UwXQssF__eJAfSiv2TXuac-1tbu2FThwakgH65IdExWDy9qyr1sYbRuwyQFZD7Dk1eE_fXuoSn9tliqnYeMq__LEF6-GTEm0H6TChOtvpwL2G3C1BsBriw8FFaKqkaTwbNoJeAfzI_j9qYnPaqtHJYillevhRsxyaQVYfLvqai7p0Sfu3849BFpamlbfE3to3KTXi5cZ0xlmuGkMkuZhvq_GyK_z-ZXq9z_Ls8xZlywN0jlIOsSEvI9QJq_69X-X3Moq9lFBcmqWYaKbf7faRQt19aJGB4DdBC1PqQIC',
timeout : 25000,
server : 'http://api.recaptcha.net/',
site : '6LezHAAAAAAAADqVjseQ3ctG3ocfQs2Elo1FTa_a',
error_message : '',
programming_error : '',
is_incorrect : false
};
Recaptcha.challenge_callback();
The actual captcha URL looks like this:
http://api.recaptcha.net/image?c=027CxC4LbBbzVJKy-1xX_wRBf7Gmi4AvgikDVaKeYjBCmiX4XBzGymWC7XRfWx4LLQgfscKnfeB7U305MhlVN0X4vAkrK84ac3jybRJ3UJPUQ8rnlJOS7lqNqpRpolYSd6WBxMShhrzqbx-5ScL0JAsN7cJRMLMqeQsPHg1QB7g4kp4KxKO1aEONsUibahnCC8baLHGSIYJ5Q1Gcr1MPvJ9i_a5qQCilT1tWXwAKE_fkVGi31_un3OxHbNm9UmMemRp7IZ9C9ZLU4IjMApxVJOWXMYqjt588z_ZVcYG2dtY6Dh0b4R1aAQcp0UXFTggdWtsjPw7wIC
Then you will get a captcha, but it is not what I want, because the JavaScript changes the captcha image every time. So I don't get the captcha that is currently being shown in the WebBrowser control.
You can copy the image to the clipboard and read it from there. Alternatively, you can parse the page to get the image's URL and see if you can dig the image file out of the cache. See:
How To Programmatically Copy an IMG Element to the Clipboard
Would it be possible to use reCaptcha.net instead? You register for free, add the script to your page and then they do the rest. They may not have zooming for the visually impaired, but they do have text-to-speech. No sense in reinventing the wheel if you don't have to. Of course you'd need access to the internet on your page so this could be a problem if your site is an intranet or private network of some sort. Hope that helps.