I wrote a small application that monitors the clipboard and pastes text directly into a WebBrowser component.
...
DocumentWysiwyg = ClipBoardWebBrowser.Document.DomDocument as IHTMLDocument2;
DocumentWysiwyg.designMode = "On";
...
When the clipboard changes, I paste the clipboard content into the WebBrowser using:
ClipBoardWebBrowser.Document.Write(Clipboard.GetText(TextDataFormat.Html));
Now, when pasting any content copied from the web as HTML, I get this inside the HTML:
<span class="Apple-converted-space">Â </span>
which does not belong to the HTML I copied from the website.
What are those, and how can I get rid of them?
Any help is really appreciated.
Here is the HTML code for google.de as an example:
http://pastie.org/pastes/3706386/text
How would I make sure that the pasted content is exactly the same as the copied data in the clipboard?
On Mac, runs of multiple regular spaces " " are converted so that every other regular space is replaced with a non-breaking space character, aka &nbsp;. This allows the text to still break at every other character while preserving such runs of spaces.
The reason for this is that without this trick HTML would collapse multiple spaces into a single one. Having only &nbsp; characters would disable line wrapping because, as their name states, they are non-breaking.
In addition to "Apple-converted-space" there was also "Apple-style-span", but that was eliminated from WebKit in 2011: https://www.webkit.org/blog/1737/apple-style-span-is-gone/
So to answer your question: since WebKit is filling the pasteboard, you cannot do anything to prevent such behavior.
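The copy side cannot be changed, but the spans can be removed after reading the clipboard. Below is a minimal sketch of that idea, not a definitive fix: the class and method names are hypothetical, and it assumes the span always appears with exactly the class="Apple-converted-space" attribute shown in the question. The stray Â is probably an encoding mismatch (a UTF-8 non-breaking space read with a single-byte code page), which this sketch sidesteps by replacing the whole span with a regular space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class ClipboardHtmlCleaner
{
    // Matches <span class="Apple-converted-space">...</span> including its content.
    private static readonly Regex AppleSpace = new Regex(
        "<span\\s+class=\"Apple-converted-space\"\\s*>.*?</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces every Apple-converted-space span with one regular space.
    public static string StripAppleConvertedSpaces(string html)
    {
        return AppleSpace.Replace(html, " ");
    }
}
```

In the application from the question, the cleaned string would be passed to ClipBoardWebBrowser.Document.Write instead of the raw clipboard text.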
So right now I'm using something like this in C#:
Response.Write(Server.HtmlEncode(line));
This will grab a line from a text file and display the stuff on screen - it works. My goal was to have the lines from the text file show something like this (HTML formatting in lines of text file):
<a href='http://www.url.com' target='_blank'>Link</a><br><br><!--info-->
Unfortunately, for some reason the < and > are being converted to &lt; and &gt; before it gets turned into HTML, which exposes all the HTML code as text and simply doesn't work. I have tried looking into the WebBrowser control and stuff but I don't understand what I'm supposed to do. For example, I tried this from a site:
WebBrowser browser = new WebBrowser();
browser.Navigate("about:blank");
browser.Document.Write(Server.HtmlEncode(line));
...but it didn't work. I'm looking for something very simple and with a clear example I can mimic, if possible. Thanks! I will make sure to select an answer too.
Other than that you are explicitly encoding the < and > brackets into &lt; and its counterpart, I don't see any error.
Drop the Server.HtmlEncode() part and it should work.
Response.Write(line);
Server.HtmlEncode() will explicitly encode everything into so-called "entities", which are those &lt; codes. It is usually used to protect yourself from user-entered raw strings that should not modify your site's behaviour.
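To make the difference concrete, here is a small sketch that runs outside ASP.NET. It uses System.Net.WebUtility.HtmlEncode as a stand-in for Server.HtmlEncode (an assumption made so the example is self-contained), and the class and method names are hypothetical.

```csharp
using System.Net;

// Hypothetical demo class; WebUtility stands in for Server.HtmlEncode here.
public static class EncodingDemo
{
    // Encoded: the browser shows the markup as visible text.
    public static string AsVisibleText(string line)
    {
        return WebUtility.HtmlEncode(line);
    }

    // Not encoded: the browser interprets the markup as HTML.
    public static string AsMarkup(string line)
    {
        return line;
    }
}
```

Passing "<br>" to AsVisibleText yields "&lt;br&gt;", which the browser prints literally, while AsMarkup returns the tag unchanged so it renders as a line break.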
My issue is that I have a designer that will create a custom aspx page but without any .NET controls. I need a way of adding the controls dynamically. So far the only types of controls will be textboxes and a button, but there are 30 variations of what the textboxes can be (name, phone #, email, etc). Also the textboxes may or may not need to be required. Once the textboxes are added the form will be submitted to a db.
My first thought was to have the designer place something like [name] and then replace that with a user control that has a name textbox and a required field validator. In order to determine if the validator should be enabled I was thinking that the placeholder could look like this: [name;val] or [name;noval]. I could either replace the placeholders in code dynamically or set up a tool where the user pastes their HTML into a textbox and clicks a button, which then spits out the necessary code to create the aspx page.
I'm sure there must be a better way to do this but it's a fairly unique problem so I haven't been able to find any alternatives. Does anyone have any ideas?
Thanks,
Kirk
If your designer gives you HTML pages, just create a new website. Copy and paste all the HTML pages, with the image folders and everything, into your project. Then for every HTML page create an aspx page with the same name. Copy and paste the HTML page's tags which are between <head></head> into the aspx page's <head>, and for the body copy and paste the HTML page's tags which are between <body></body> into the <body> of the aspx page.
Now you have your aspx page, exactly the same as the HTML page.
Sounds like an attempt to over-engineer a solution to what should be a non-issue.
As @Alessandro mentioned in a comment above, why can't the designer provide you with pages that have the control markup? As it stands right now, the designer isn't providing you with "a custom aspx" so much as "a custom html page." If the designer is promising ASPX but delivering only HTML, that's a misinterpretation somewhere in the business requirements.
However, even if the designer is rightfully providing only HTML, there shouldn't be a problem with that. At worst, you can set each element you need on the server to runat="server" to access them on the server-side. Or, probably better, would be to simply replace them with the ASPX control markup for the relevant controls.
Write a simple parser that will recognize the [...] tags and replace them with the corresponding controls. It's pretty easy to do and I've often done this. The tag I use is usually $$(..); but that doesn't matter as long as your parser knows your tags.
Such a parser consists of a simple state machine that can be in two states: text-mode or tag-mode. Loop through the whole page text, char by char. As long as you're in text-mode you keep appending each char to a temporary buffer. As soon as you get into tag-mode you create a LiteralControl with the content of the temporary buffer, add it to the bottom of your control tree, and empty the buffer.
Now you still keep adding each char to the buffer, but when you hit text-mode again, you analyze the content of the buffer and create the correct control, which could be a simple switch case statement. Add the control to the bottom of your control tree and keep looping through the rest of the chars until you reach the end, switching back and forth between text-mode and tag-mode and adding LiteralControls and concrete controls.
Simple example of such a parser... written in Notepad in 4 minutes, but you should get the idea.
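// Assumes: text is the page markup, buffer is a StringBuilder,
// and mode is an enum variable of type Mode { Text, Tag } starting at Mode.Text.
foreach (var c in text)
{
    if (c == '[' && mode == Mode.Text)
    {
        // Flush the literal text collected so far, then start a tag.
        Controls.Add(new LiteralControl(buffer.ToString()));
        buffer.Clear();
        mode = Mode.Tag;
    }
    buffer.Append(c);
    if (c == ']' && mode == Mode.Tag)
    {
        // The buffer now holds a complete tag such as "[name]".
        switch (buffer.ToString())
        {
            case "[name]":
                Controls.Add(new NameControl());
                break;
            // ... the rest of possible tags
        }
        buffer.Clear();
        mode = Mode.Text;
    }
}
// Flush any literal text that follows the last tag.
Controls.Add(new LiteralControl(buffer.ToString()));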
I need to create a data index of HTML pages provided to a service by essentially grabbing all text on them and putting them in a string to go into a storage system.
If this were GUI based, I would simply Ctrl+A on the HTML page, copy it, then go to Notepad and Ctrl+V. Simples. If I can do it via good old point n' click, then surely there must be a way to do it programmatically, but I'm struggling to find anything useful.
The HTML docs in question are currently being loaded for rendering using the System.Windows.Controls.WebBrowser class, so I wonder if it's somehow possible to grab the data from there?
I'm going to keep hunting, but any pointers would be very appreciated.
Note: We don't want the HTML source code, and would also really rather not have to parse all the source code to get the text unless we absolutely have to.
If I understand your problem correctly, you will have to do a bit of work to get the data.
WebBrowser browser = new WebBrowser(); // This is what you have
HtmlDocument doc = browser.Document; // This gives you the browser contents
String content = ((mshtml.HTMLDocumentClass)doc.DomDocument).documentElement.innerText;
That last line is the browser's view of the rendered content.
This looks like it might be quite helpful.
C# doesn't want to put Unicode characters on buttons. If I put \u2129 in the Text attribute of the button, the button displays the \u2129, not the Unicode character, (example - I chose 2129 because I could see it in the font currently active on the machine).
I saw this question before, link text, but the question isn't really answered, just worked around. I am working on applications which are going all over the world, and I don't want to install all the fonts. More than "don't want": there are so many that I doubt the machine I am working on has sufficient disk space. Our overseas sales agents supply the Unicode character "numbers". Is there another way forward with this?
As an aside, (curiosity), why does it not work?
The issue is:
C# will let you put Unicode in, like button1.Text = "Hello \u2129";, no problem
but the Visual Studio Forms designer will not recognize '\u2129' as anything special. By design.
So just paste in the '℩' in the Properties Window or use code.
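Since the sales agents supply the character "numbers", one option is to turn them into strings in code. This is only a sketch under assumptions: the helper name is hypothetical, and it assumes the numbers arrive as hexadecimal code points such as "2129".

```csharp
using System;
using System.Globalization;

// Hypothetical helper for turning a hex code point into a displayable string.
public static class UnicodeText
{
    public static string FromHex(string hexCodePoint)
    {
        // Parse the hex digits, e.g. "2129" becomes 0x2129.
        int codePoint = int.Parse(hexCodePoint, NumberStyles.HexNumber,
                                  CultureInfo.InvariantCulture);
        // Handles code points above U+FFFF as surrogate pairs.
        return char.ConvertFromUtf32(codePoint);
    }
}
```

With this, button1.Text = "Hello " + UnicodeText.FromHex("2129"); sets the same text as the "Hello \u2129" literal. The escape is processed by the C# compiler only, which is why text typed into the designer's Properties window keeps the backslash sequence as six literal characters.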
Change the Font of the button to a font that supports \u2129 (from Google: Arial Unicode MS). It may help you.
Have you tried entering the characters manually? Also, have you tried using a literal string with @"blahblahblah"?
I was trying to include the copyright symbol (\u00a9) in the form title. Using escape characters or changing fonts didn't work for me. I simply copy-pasted the symbol from a text editor.
I have a requirement that the user can input HTML tags in an ASP.NET TextBox. The value of the textbox will be saved in the database, and then on some other page we need to show
what he entered. To do so I set ValidateRequest="false" in the Page directive.
Now the problem is that when the user inputs something like:
<script> window.location = 'http://www.xyz.com'; </script>
the value is saved in the database, but when I show it on some other page it redirects me to "http://www.xyz.com", which is obvious
as the JavaScript is executed. But I need to find a solution, as I need to show exactly what he entered.
I am thinking of Server.HtmlEncode. Can you point me in the right direction for my requirement?
Always always always encode the input from the user and then and only then persist in your database. You can achieve this easily by doing
Server.HtmlEncode(userinput)
Now, when it comes time to display the content to the user, decode the user input and put it on the screen:
Server.HtmlDecode(userinput)
You need to encode all of the input before you output it back to the user and you could consider implementing a whitelist based approach to what kind of HTML you allow a user to submit.
I suggest a whitelist approach because it's much easier to write rules to allow p,br,em,strong,a (for example) rather than to try and identify every kind of malicious input and blacklist them.
Possibly consider using something like MarkDown (as used on StackOverflow) instead of allowing plain HTML?
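One way to sketch the whitelist idea is to encode everything first and then re-enable only a few attribute-free tags. This is an illustration under assumptions, not a vetted sanitizer: the class name is hypothetical, System.Net.WebUtility stands in for Server.HtmlEncode, and the a tag from the example list is left out because its href attribute would need separate validation.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: encode all input, then restore a small tag whitelist.
public static class WhitelistEncoder
{
    // Matches encoded forms of <p>, </p>, <br>, <br />, <em>, <strong>, etc.
    // Tags carrying attributes never match, so they stay encoded.
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(p|br|em|strong)\s*(/?)&gt;",
        RegexOptions.IgnoreCase);

    public static string Encode(string input)
    {
        // Step 1: neutralize all markup.
        string encoded = WebUtility.HtmlEncode(input);
        // Step 2: turn whitelisted tags back into real tags.
        return AllowedTag.Replace(encoded, "<$1$2$3>");
    }
}
```

For production use, a maintained sanitizer library is likely a safer choice than a hand-written pattern.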
You need to escape some characters when generating the HTML: '<' -> &lt;, '>' -> &gt;, '&' -> &amp;. This way exactly what the user entered gets displayed; otherwise the HTML parser would possibly recognize HTML tags and execute them.
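A minimal sketch of those three replacements follows; the class name is hypothetical. The order matters: the ampersand has to be replaced first, otherwise the ampersands introduced by &lt; and &gt; would be escaped a second time.

```csharp
// Hypothetical helper performing the three replacements described above.
public static class HtmlEscaper
{
    public static string Escape(string input)
    {
        return input
            .Replace("&", "&amp;")   // must come first
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");
    }
}
```

Escaping at output time, for example Response.Write(HtmlEscaper.Escape(storedValue)) where storedValue is a hypothetical variable holding the database value, shows the script tag from the question as plain text instead of running it.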
Have you tried using HTMLEncode on all of your inputs? I personally use the Telerik RadEditor that escapes the characters before submitting them... that way the system doesn't barf on exceptions.
Here's an SO question along the same lines.
You should have a look at the HTML tags you do not want to support because of vulnerabilities as the one you described, such as
script
img
iframe
applet
object
embed
form, button, input
and replace the leading "<" by "&lt;".
Also replace </body> and </html>
HTML editors such as CKEditor allow you to require well-formed XHTML, and define tags to be excluded from input.