jQuery + C# code to resolve URLs from user-supplied text - c#

I'd like to add some kind of simple URL resolution and formatting to my C# and jQuery-based ASP.NET web application. I currently allow users to add simple text-based descriptions to items and leave simple comments ('simple' as in I only allow plain text).
What I need to support is the ability for a user to enter something like:
Check out this cool link: http://www.really-cool-site.com
...and have the URL above automatically turned into a clickable link, much like the editor in Stack Overflow does. We don't want to support BBCode or any of its variants, though. The user experience would be closer to the way Facebook resolves user-generated URLs.
What are some jQuery + C# solutions I should consider?

There's another question with a solution that might help you. It uses a regex in pure JS.
Personally though, I would do it server-side when the user submits it. That way, you only need to do it once, rather than every time you display that text. You could use a similar regex in C#.

I ended up using server-side C# code to do the linkification. I use an AJAX-jQuery wrapper to call into a PageMethod that does the work.
The PageMethod both linkifies and sanitizes the user-supplied string, then returns the result.
I use the Microsoft Anti-Cross Site Scripting Library (AntiXSS) to sanitize:
http://www.microsoft.com/download/en/details.aspx?id=5242
And I use C# code I found here and there to resolve and shorten links using good old string parsing and regular expressions.
My method is not as cool as the way Facebook does it in real time, but at least now my users can add links to their descriptions and comments.
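A minimal sketch of server-side linkification along these lines, not the code actually used above. It assumes plain-text input, uses System.Net.WebUtility.HtmlEncode as a stand-in for the AntiXSS encoder, and matches URLs with a deliberately simple pattern. The Linkifier class name and the rel="nofollow" attribute are illustrative choices.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Simplified assumption: a URL is http(s):// followed by anything that is
    // not whitespace, a quote, or an angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s<>""']+", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Linkify(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder();
        int last = 0;
        foreach (Match m in UrlPattern.Matches(input))
        {
            // Leave trailing sentence punctuation outside the link.
            string url = m.Value.TrimEnd('.', ',', ';', ':', '!', '?');

            // Encode the plain text between the previous URL and this one.
            sb.Append(WebUtility.HtmlEncode(input.Substring(last, m.Index - last)));

            // Encode the URL itself before placing it in the attribute and the link text.
            string safe = WebUtility.HtmlEncode(url);
            sb.Append("<a href=\"").Append(safe).Append("\" rel=\"nofollow\">")
              .Append(safe).Append("</a>");

            last = m.Index + url.Length;
        }
        // Encode whatever follows the last URL.
        sb.Append(WebUtility.HtmlEncode(input.Substring(last)));
        return sb.ToString();
    }
}
```

Encoding each piece separately means any markup the user typed is displayed as text, and only the anchors generated here end up as real HTML. A PageMethod could return the result of Linkifier.Linkify(text) directly.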

Related

C# data scraping from websites

Hi, I am pretty new to C#. I have been working in PHP and JavaScript since the beginning of this year. I want to scrape posts and comments from a blog. The site is http://www.somewhereinblog.net
What I want to do is:
1. Log in from my own program
2. Then download the HTML
3. Then use regular expressions, XPath, or whatever comes in handy to separate the contents of posts and comments
I have been searching all over and understood very little, though I am quite sure I need to use 'htmlagilitypack'. I don't know how to add a library to a C# console or forms application. Can someone give me some help? I badly need this. I have only been using C# for a week, so I would be grateful for detailed information. Waiting eagerly.
Thanks in advance brothers.
Using WebClient you can log in and download the HTML.
Instead of html-agility-pack I like CsQuery, because it lets you use jQuery syntax in C# code: you download the HTML into a string, then search and manipulate it much as you would with jQuery on an HTML page.
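One thing to watch with the WebClient route: it does not keep cookies between requests by default, so the login session would be lost before the second request. A common workaround, sketched here under assumptions, is a small subclass that attaches a shared CookieContainer to every request. The class names, the login URL, and the form field names are all hypothetical and depend on the actual site's login form.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// WebClient subclass that reuses one cookie container for all requests.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies { get { return cookies; } }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null) http.CookieContainer = cookies; // carry the session cookie along
        return request;
    }
}

public static class BlogDownloader
{
    // loginForm holds the field names and values the site's login form expects
    // (hypothetical example: "username" and "password").
    public static string DownloadAfterLogin(string loginUrl, NameValueCollection loginForm, string pageUrl)
    {
        using (var client = new CookieAwareWebClient())
        {
            client.UploadValues(loginUrl, loginForm); // POST the login form
            return client.DownloadString(pageUrl);    // fetch a page within the same session
        }
    }
}
```

The returned string can then be handed to whichever parser you prefer (Html Agility Pack or CsQuery) to pull out the posts and comments.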

How to read a Google calculation query result using .net/c#?

I want to submit Google queries like these:
http://www.google.ch/search?q=100+eur+to+chf
http://www.google.ch/search?q=1.5*17.5
...from a C# console application and capture the result reported back by Google (and ignore any links to other sites). Is there a specific Google API that helps me with this task?
I got this idea from the tool Launchy (launchy.net). Its GCalc plugin does this, and I found the source file for that module:
http://launchy.svn.sourceforge.net/viewvc/launchy/tags/2.5/plugins/gcalc/gcalc.cpp?revision=614&view=markup
It looks like GCalc does not use any Google API at all. But I've got no clue how to do the same in C#, and I would prefer to use a proper API. But if there isn't one, I could use some help/pointers on how to copy the GCalc functionality to C# (.net libraries/classes...?)
Google calculator results don't show up when using the API. So if you want them, you'll have to scrape the page. Be careful doing so, as it's against Google's terms of service, and your IP may be banned if you send requests too frequently.
Once you've got the results page, use an html parser. The result is in a <b> tag (e.g. <b>1 + 1 = 2</b>; if it's not present, then you have no calculator result). Be careful of <sup> tags within the result (e.g. <b>(1 (m^2) kg) / 2 = 0.5 m<sup>2</sup> kg</b>). You might also want to decode the html entities.
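A rough sketch of that extraction step, assuming the page layout described above (which may well have changed since). To keep it self-contained, a regular expression stands in for the HTML parser this answer recommends, and the check for " = " inside the <b> element is my own heuristic based on the two examples. The CalculatorResult class name is illustrative.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CalculatorResult
{
    // Returns the first <b>...</b> content that looks like "expression = result",
    // or null when the page has no calculator result.
    public static string Extract(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        var options = RegexOptions.Singleline | RegexOptions.IgnoreCase;
        foreach (Match m in Regex.Matches(html, @"<b>(.*?)</b>", options))
        {
            string inner = m.Groups[1].Value;
            if (!inner.Contains(" = ")) continue; // heuristic, see the note above

            // Render superscripts as ^n so that "m<sup>2</sup>" becomes "m^2".
            inner = Regex.Replace(inner, @"<sup>(.*?)</sup>", "^$1", options);
            // Drop any other tags left inside the result.
            inner = Regex.Replace(inner, @"<[^>]+>", string.Empty);
            // Decode HTML entities such as &amp; or &#215;.
            return WebUtility.HtmlDecode(inner).Trim();
        }
        return null;
    }
}
```

For real pages, a proper HTML parser is the more robust choice. The same three steps (find the element, handle <sup>, decode entities) apply either way.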
You can use WebClient.DownloadString(String url). This way you get the page (HTML) as a string.
You have to parse the result, but that shouldn't be hard. HTML Agility Pack is a good C# HTML parser that uses XPath for data retrieval.
Why not use HttpWebRequest and then parse the result as macrog stated in his answer?

Grab details from web page

I need to write C# code for grabbing the contents of a web page. The steps look like this:
Browse to the login page.
I have a user name and a password; provide them programmatically and log in.
That lands on the detail page.
Get some information there (product ID, description, etc.).
Then click (by code) on Detail View.
Get the price for that product from there.
Now it is done, so we can write a detail line into a text file like this:
ABC Printer::225519::285.00
Please help me with this. (Even VB.NET code is OK; I can convert it to C#.)
The WatiN library is probably what you want, then. Basically, it controls a web browser (native support for IE and Firefox, I believe, though they may have added more since I last used it) and provides an easy syntax for programmatically interacting with page elements within that browser. All you'll need are the names and/or IDs of those elements, or some unique way to identify them on the page.
You should be able to achieve this using the WebRequest class to retrieve pages, and the HTML Agility Pack to extract elements from HTML source.
Yes, I downloaded that library. Nice one.
Thanks for sharing it with me. But I have an issue with it: the site I want to get data from has a CAPTCHA on the login page.
I can enter that value myself if the library can show the image and wait for my input.
Can we achieve that with this library? If so, I would like to have a sample.
You should be able to achieve this by using two classes in C#, HttpWebRequest (to request the web pages) and perhaps XmlTextReader (to parse the HTML/XML response).
If you do not wish to use XmlTextReader, then I'd advise looking into regular expressions, as they are fantastically useful for extracting information from large bodies of text in which patterns exist.
How to: Send Data Using the WebRequest Class
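For the last step in the question, writing the detail line to a text file, a small sketch could look like this. The "::" separator and the two-decimal price come from the sample line in the question. The DetailLineWriter class and its method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DetailLineWriter
{
    // Builds a line such as "ABC Printer::225519::285.00".
    public static string FormatLine(string name, string productId, decimal price)
    {
        // InvariantCulture keeps the decimal point a '.' regardless of the machine's locale.
        string priceText = price.ToString("F2", CultureInfo.InvariantCulture);
        return string.Join("::", name, productId, priceText);
    }

    // Appends one detail line to the given text file, creating it if needed.
    public static void Append(string path, string name, string productId, decimal price)
    {
        File.AppendAllText(path, FormatLine(name, productId, price) + Environment.NewLine);
    }
}
```

Calling DetailLineWriter.Append(path, "ABC Printer", "225519", 285.00m) once per scraped product builds up the file line by line.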

Detect HTML in ASP.NET

(clarification: this is an old question that has been tweaked for admin purposes)
There have been a fair number of questions on this site about parsing HTML from textareas and whatnot, or not allowing HTML in textboxes. My question is similar: How would I detect if HTML is present in the textbox? Would I need to run it through a regular expression of all known HTML tags? Is there a current library for .NET that has the ability to detect when HTML is inserted into a textarea?
Edit: Similarly, is there a JavaScript Library that does this?
Edit #2: Due to the way that the web app works (it validates textarea text on asynchronous postback using the Validate method of ASP.NET), it bombs before it can get back to the code-behind to use HtmlEncode. My concern was trying to find another way of handling HTML in those instances.
Not really an answer, but why do you need it at all? You need to sanitize HTML input only if you are going to output it without modifications, i.e. if you want your users to actually be able to use HTML. And if you want that, you do not have to "detect" HTML; you just need to make sure that you handle it safely. Jeff Atwood has a good routine for this.
If you want to prevent HTML output altogether, you can take whatever the user inputs without any checks. Just take care to HtmlEncode it, and store it that way. Then your output will not contain any "real" HTML from what the user wrote.
Yes, a regular expression is probably the easiest way to do that.
One regex would be: <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
You can run that in both ASP.NET and JavaScript (use the case-insensitive option, since the pattern only lists upper-case letters). The .NET Framework class you use is System.Text.RegularExpressions.Regex
Hope that helps!
bool containsHtml = Regex.IsMatch(MyTextbox.Text, @"<(.|\n)*?>");
As far as I know, you cannot post HTML from a TextArea and have it work automatically, at least in .NET 2.0: ASP.NET's request validation rejects such input by default. You need to set the ValidateRequest page directive to false (if I remember correctly).
If you want to allow HTML tags and want to pick from a possible list of tags, I suggest you lookup 'Markdown' and this Jeff Atwood Post.
+1 Sunny. “detecting” HTML is a fool's errand. You need to escape it on output, and as long as you're doing that you're safe. If you're not escaping it, sanitisation hacks aren't going to make you secure, they're just going to obfuscate the problem.
"Due to the way that the web app works (it validates textarea text on asynchronous postback using the Validate method of ASP.NET)"
Yeah, you'll want to stop doing that. ASP.NET's “request validation” is utterly bogus and needs to be turned off if you want to have any chance of processing uploaded content consistently.
Well, in HTML you can't do a lot without a less than symbol "<".
So, I would look for a less than symbol followed by some characters followed by a greater than symbol. If you find that, you can pretty much be assured that it is HTML.
I don't think you have to look for specific tags, as HTML will ignore invalid tags as part of the specification and it would still be considered HTML.
EDIT: Oops! Almost forgot... the ampersand character! If you see one in the text, you MIGHT have HTML, since it is used to specify special characters (like &copy; for ©). This can be dangerous because the user could specify &lt; for <, so it might turn into HTML later...
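Putting the two checks from this answer together, a heuristic detector might look like the sketch below. It is only a guess at "looks like HTML", not a validator: plain text such as "a < b > c" will also match. The HtmlDetector class name and both patterns are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlDetector
{
    // A '<', then at least one character that is not '>', then '>'.
    private static readonly Regex TagLike =
        new Regex(@"<[^>]+>", RegexOptions.Compiled);

    // Named, decimal, or hexadecimal character references, e.g. &copy; &#169; &#xA9;
    private static readonly Regex EntityLike =
        new Regex(@"&(#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    public static bool LooksLikeHtml(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;
        return TagLike.IsMatch(text) || EntityLike.IsMatch(text);
    }
}
```

As the other answers point out, a check like this is at best a hint for the user interface. Encoding on output is still what keeps the page safe.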

How to allow simple HTML tags in comments or anywhere?

In my web application I am developing a comment feature where users can leave comments. The problem I am facing is that I want to allow simple HTML tags in the comment box: tags like <b>, <strong>, <i>, <em>, <u>, etc., that are normally allowed in a comment box. I also want line breaks entered by the user to be converted automatically into <br /> tags and stored in the database, so that when I display the comments on the web page they look the way the user entered them.
Can you please tell me how to parse the user's input so that only the allowed set of HTML tags gets through, how to convert line breaks into <br /> tags, and how to store the result in the database?
Or does anyone have a better idea or suggestion for implementing this kind of functionality? I am using ASP.NET 2.0 (C#).
I noticed that StackOverflow.com does the same thing in profile editing: when we edit our profile, the line "basic HTML allowed" appears below the "About Me" field. I want almost the same functionality.
I don't have a C# specific answer for you, but you can go about it a few different ways. One is to let the user input whatever they want, then you run a filter over it to strip out the "bad" html. There are numerous open source filters that do this for PHP, Python, etc. In general, it's a pretty difficult problem, and it's best to let some well developed 3rd party code do this rather than write it yourself.
Another way to handle it is to allow the user to enter comments in some kind of simpler markup language like BBCode, Textile, or Markdown (Stack Overflow is using Markdown), perhaps in conjunction with a nice JavaScript editor. You then run the user's text through a processor for one of these markup languages to get the HTML. You can usually obtain implementations of these processors for whatever language you are using. These processors usually strip out the "bad" HTML.
It's rather "simple" to do that in PHP and Python thanks to the large number of built-in functions. I am still learning C#, lol, and haven't come across the function yet. Chances are that it exists and all you need to do is search for it. I mean a function that can take the user input, search for the allowed tags (which are in an array, of course), and replace the <> with something else like [], then use a function to escape the other HTML tags. In PHP we use htmlentities().
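Something like:
<code>
$txt = $_POST['comment'];
// Temporarily swap the allowed tags for placeholder tokens
$txt = str_replace(array('<b>', '</b>'), array('[b]', '[/b]'), $txt);
// Escape everything that is left
$securetxt = htmlentities($txt);
// Turn the placeholders back into the allowed tags
$finaltxt = str_replace(array('[b]', '[/b]'), array('<b>', '</b>'), $securetxt);
// Now save to DB
</code>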
I'm not sure, but I think you have to escape HTML characters when inserting into the database and echo them unescaped when retrieving, so the browser sees them as HTML.
I don't know ASP.NET, but in PHP there's an easy function, strip_tags, that lets you add exceptions (in your case, b, em, etc.). If there's nothing like that in C#, you can write a regular expression that strips out all tags except the allowed ones, but chances are that such an expression already exists, so it should be easy to find.
Replacing \n (or something similar) with <br /> shouldn't be a problem either, with a simple search and replace.
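In C#, one way to sketch the "escape everything, then re-enable a whitelist" idea is shown below. This is an illustration under assumptions, not a vetted sanitizer: it only re-enables bare tags without attributes, it does not check that tags are balanced, and the CommentFormatter name and tag list are made up for the example. System.Net.WebUtility needs a newer framework than ASP.NET 2.0, where HttpUtility.HtmlEncode plays the same role.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CommentFormatter
{
    // Matches an already-encoded opening or closing tag from the whitelist,
    // e.g. "&lt;b&gt;" or "&lt;/em&gt;". Tags with attributes never match.
    private static readonly Regex AllowedTag =
        new Regex(@"&lt;(/?)(b|strong|i|em|u)&gt;", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Format(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        // 1. Escape everything, so no user-supplied markup survives as HTML.
        string encoded = WebUtility.HtmlEncode(input);

        // 2. Turn only the whitelisted tags back into real tags.
        string withTags = AllowedTag.Replace(encoded, "<$1$2>");

        // 3. Convert line breaks (Windows, old Mac, Unix) into <br /> tags.
        return Regex.Replace(withTags, @"\r\n|\r|\n", "<br />");
    }
}
```

Because the whitelist is applied after encoding, anything not on the list, including attributes such as onclick, stays escaped. An unclosed <b> can still affect the rest of the page, so balancing tags would have to be handled separately.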
This is a dangerous road to go down. You might think you can write some awesome regexes, or find someone who can help you with them, but sanitizing SOME markup and leaving the rest is just crazy talk.
I highly recommend you look into BBCode or another token system. Even something untokenized such as what SO uses, is probably a much better solution.
