multiline textbox to string - c#

I have a multiline textbox that I wish to convert to a string.
I found this:
string textBoxValue = textBox1.Text.Replace(Environment.NewLine,"TOKEN");
But I don't understand TOKEN. What is TOKEN? Whitespace, or a \n newline?
If this is the wrong approach, please let me know the correct way of doing this.
Thanks

In the code snippet you gave, "TOKEN" is any value you wish to insert, such as an HTML <br /> tag, more Environment.NewLines for formatting, or just some random delimiter that will later allow you to split the text on it.
A very simple example:
string text = textBox1.Text.Replace(Environment.NewLine, "^"); // a random token
string[] lines = text.Split( '^' ); // split on the same token
If you are handling input from a textbox available on the web, you also need to take into account XSS (http://en.wikipedia.org/wiki/Cross-site_scripting). Also, in a real scenario I would split on a more complex token and make sure to handle multiple carriage returns in the input value.
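If the goal is only to get the individual lines, a placeholder token may not be needed at all. The following is a minimal sketch, assuming the text may contain a mix of "\r\n" and bare "\n" line breaks and that empty lines should be dropped; the class and method names are made up for illustration:

```csharp
using System;

static class TextBoxLines
{
    // Splits raw textbox text into lines and drops empty ones.
    // "\r\n" is listed first so a Windows line break is treated as one separator.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Stand-in for textBox1.Text
        string sample = "first line\r\n\r\nsecond line\nthird line";
        foreach (string line in SplitLines(sample))
            Console.WriteLine(line);
    }
}
```

Because no token is inserted, nothing the user types can collide with it, and repeated line breaks are handled by RemoveEmptyEntries.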
EDIT: now that I see your actual requirements, this code may do what you need:
// replace newlines with a single whitespace
string text = textBox1.Text.Replace(Environment.NewLine, " ");
EDIT #2:
further I need to enter this data into SQLite and rewrite his whole application, The company does not wish to have information from the previous application inputted to the new database, there are hyperlinks etc embedded in the content, so if there is a way I can make the text box only accept RAW data this would be the best.
Regular Expressions are the way to go for something like this, unless the data is structured enough to load into an XML or HTML DOM and process. You can build regular expressions in a variety of tools (do a Google search for a free online tester and you will find many). Once you have determined the expressions you need, you can use the Regex object in C# to match, replace, etc.
http://msdn.microsoft.com/en-us/library/ms228595(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=VS.100).aspx

As it stands, "TOKEN" is just a meaningless string, unless it is defined elsewhere in your code. You can replace "TOKEN" with any text you like.
Edit:
Okay, so you say you're removing newlines from your client's text. In that case, paste their text into a textbox called textBox2, then use the following:
textBox2.Text = textBox2.Text.Replace(Environment.NewLine, string.Empty);

Related

HttpRequestValidationException workaround

I have a textbox that when user inputs a string such as "<daily" (to signify less than daily) it throws a HttpRequestValidationException. However if there is a space between the less than symbol and the string, it works fine such as "< daily".
I have had it change the value that is submitted in the code behind by using the replace function. For example:
string s = "This is a <test";
if(s.Contains("<")){
s = s.Replace("<", "< "); // I have also tried HTML entities such as "&lt;"
}
However, I still get the exception because in the textbox it is still showing it as "<daily". I am wondering if there is a way that if the focus is off the textbox to dynamically add a space to the string?
I understand that the HttpRequestValidationException is not supposed to allow those characters, but it seems to allow if there are spaces. Any thoughts?
It would help to know how you use the string in the HttpRequest. Depending on how and where you use it, we could come up with some ideas.

C# Regex Replace ignore specific string

Since this is my first question here on Stack Overflow, I hope it is asked correctly.
Basically I have a normal .txt file which contains any text like:
car accident
people died
cat without owner
<!-- Text added at 6/29/2011 9:20:38 AM -->
Some addintional Text
other Text added
add Text
I have a write/append function which allows the user to append some text and set a little timestamp.
So my problem is: With another function, you can search and replace text in the textfile, but as you can guess if someone wants to replace the word "Text" it will be replaced in the xml-stylish comment(timestamp) as well.
My result until now is
content = Regex.Replace(content,"[^<+.*"+input+".*>+]*", replace);
//content = content of the .txt file, input = search term, replace = string to replace
But this fails miserably, as some regex pros will see without executing it.
Now I hope that some regex pro could help me out here and provide me with a search pattern which replaces the normal text but ignores the timestamp.
I'm not really familiar with the logic of regex yet, but I do understand the individual expressions, so this would be a hook for me to understand regex properly.
Thanks in advance.
If I understand your question correctly, you want to replace every instance of "Text" except for the one(s) inside the comment.
The easiest way is to use a negative lookbehind (fantastic description here) as below:
content = Regex.Replace(content, @"(?<!<!--.*?)" + input, replace);
Your original pattern, by contrast, is a single negated character class: it attempts to replace a run of any length of characters that are NOT <, +, ., *, > or a character contained in input with the value in replace.
If you're going to be working a lot with Regex, I would HIGHLY recommend giving the website above a good read. It's hands down the best intro to Regex that I've found, the time spent now will save you lots of headaches later!
Edit
Updated to add flexibility thanks to @stema
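To see the lookbehind in action, here is a small sketch. It assumes each timestamp comment sits on a single line (the pattern relies on '.' not matching a newline) and it adds Regex.Escape around the search term, which is not part of the answer above; the class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentAwareReplace
{
    // Replaces every occurrence of 'input' unless "<!--" appears earlier on the same line.
    public static string ReplaceOutsideComments(string content, string input, string replace)
    {
        // Regex.Escape makes characters such as '.' or '+' in the search term literal.
        string pattern = @"(?<!<!--.*?)" + Regex.Escape(input);
        return Regex.Replace(content, pattern, replace);
    }

    static void Main()
    {
        string content = "Some additional Text\n<!-- Text added at 6/29/2011 9:20:38 AM -->\nadd Text";
        Console.WriteLine(ReplaceOutsideComments(content, "Text", "Word"));
    }
}
```

With this sample, the first and third lines have "Text" replaced while the timestamp line is left untouched. Note that '$' has a special meaning in the replacement string of Regex.Replace, so a replacement that may contain '$' would need extra care.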

HTMLencode HTMLdecode

I have a text area and I want to store the text entered by user in database with html formatting like paragraph break, numbered list. I am using HTMLencode and HTMLdecode for this.
Sample of my code is like this:
string str1 = Server.HtmlEncode(TextBox1.Text);
Response.Write(Server.HtmlDecode(str1));
If the user enters text with 2 paragraphs, str1 shows the characters \r\n\r\n between the paragraphs, but when it is written to the screen the 2nd paragraph is simply appended to the 1st. Since I'm decoding it, why doesn't it print 2 paragraphs?
The simple solution would be to do:
string str1 = Server.HtmlEncode(TextBox1.Text).Replace("\r\n", "<br />");
This is assuming that you only care about getting the right <br /> tags in place. If you want a real formatter you will need a library like Aaronaught suggested.
That's not what HtmlEncode and HtmlDecode do. Not even close.
Those methods are for "escaping" HTML. < becomes &lt;, > becomes &gt;, and so on. You use these to escape user entered input in order to avoid Cross-Site Scripting attacks and related issues.
If you want to be able to take plain-text input and transform it into HTML, consider a formatting tool like Markdown (I believe that Stack Overflow uses MarkdownSharp).
If all you want are line breaks, you can use text.Replace("\r\n", "<br/>"), but handling more complex structures like ordered lists is difficult, and there are already existing tools to handle it.
HTML doesn't recognize \r\n as a line break. Convert them to "p" or "br" tags.
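Putting the answers above together, one possible sketch is to encode first and convert line breaks afterwards, so that the inserted <br /> tags are not escaped themselves. This uses System.Net.WebUtility.HtmlEncode instead of Server.HtmlEncode so that it runs outside ASP.NET; the class and method names are made up for illustration:

```csharp
using System;
using System.Net;

static class PlainTextToHtml
{
    // Escapes user text for HTML, then turns line breaks into <br /> tags.
    public static string ToHtml(string text)
    {
        string encoded = WebUtility.HtmlEncode(text);
        // Handle Windows line breaks first, then any remaining bare "\n".
        return encoded.Replace("\r\n", "<br />").Replace("\n", "<br />");
    }

    static void Main()
    {
        // Stand-in for TextBox1.Text
        string input = "First paragraph with a < b\r\n\r\nSecond paragraph";
        Console.WriteLine(ToHtml(input));
    }
}
```

The order matters: encoding after inserting the tags would turn each <br /> into visible text.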

How to split a user-generated string which may contain the delimiter?

I'd like to String.Split() the following string using a comma as the delimiter:
John,Smith,123 Main Street,212-555-1212
The above content is entered by a user. If they enter a comma in their address, the resulting string would cause problems to String.Split() since you now have 5 fields instead of 4:
John,Smith,123 Main Street, Apt 101,212-555-1212
I can use String.Replace() on all user input to replace commas with something else, and then use String.Replace() again to convert things back to commas:
value = value.Replace(",", "*");
However, this can still be fooled if a user happens to use the placeholder delimiter "*" in their input. Then you'd end up with extra commas and no asterisks in the result.
I see solutions online for dealing with escaped delimiters, but I haven't found a solution for this seemingly common situation. What am I missing?
EDIT: This is called delimiter collision.
This is a common scenario — you have some arbitrary string values that you would like to compose into a structure, which is itself a string, but without allowing the values to interfere with the delimiters in structure around them.
You have several options:
Input restriction: If it is acceptable for your scenario, the simplest solution is to restrict the use of delimiters in the values. In your specific case, this means disallow commas.
Encoding: If input restriction is not appropriate, the next easiest option would be to encode the entire input value. Choose an encoding that does not have delimiters in its range of possible outputs (e.g. Base64 does not feature commas in its encoded output)
Escaping delimiters: A slightly more complex option is to come up with a convention for escaping delimiters. If you're working with something mainstream like CSV it is likely that the problem of escaping is already solved, and there's a standard library that you can use. If not, then it will take some thought to come up with a complete escaping system, and implement it.
If you have the flexibility to not use CSV for your data representation this would open up a host of other options. (e.g. Consider the way in which parameterised SQL queries sidestep the complexity of input escaping by storing the parameter values separately from the query string.)
This may not be an option for you, but would it not be easier to use a very uncommon character, say a pipe |, as your delimiter and not allow this character to be entered in the first place?
If this is CSV, the address should be surrounded by quotes. CSV parsers are widely available that take this into account when parsing the text.
John,Smith,"123 Main Street, Apt. 6",212-555-1212
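For illustration only, here is a minimal sketch of how a quote-aware split of a single line could work. It assumes double quotes as the quoting character and "" as an escaped quote inside a quoted field, and it does not handle line breaks inside fields; a real CSV library covers more cases. The names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SimpleCsv
{
    // Splits one CSV line. Commas inside double quotes are kept as data,
    // and "" inside a quoted field is read as a literal quote.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field
        return fields;
    }

    static void Main()
    {
        foreach (string field in SplitLine("John,Smith,\"123 Main Street, Apt. 6\",212-555-1212"))
            Console.WriteLine(field);
    }
}
```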
One foolproof solution would be to convert the user input to base64 and then delimit with a comma. It will mean that you will have to convert back after parsing.
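A rough sketch of that idea, assuming UTF-8 text and the standard Base64 alphabet (which contains no comma); the class and method names are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text;

static class Base64Fields
{
    // Encodes each field as Base64 and joins the results with commas.
    public static string Join(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Convert.ToBase64String(Encoding.UTF8.GetBytes(f))));
    }

    // Splits on commas and decodes each piece back to the original text.
    public static string[] Split(string joined)
    {
        return joined.Split(',')
                     .Select(p => Encoding.UTF8.GetString(Convert.FromBase64String(p)))
                     .ToArray();
    }

    static void Main()
    {
        string joined = Join("John", "Smith", "123 Main Street, Apt 101", "212-555-1212");
        Console.WriteLine(joined);
        foreach (string field in Split(joined))
            Console.WriteLine(field);
    }
}
```

The trade-off is that the stored string is no longer human-readable and is roughly a third longer than the original text.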
You could try putting quotes, or some other begin and end delimiters, around each of the user inputs, and ignore any special character between a set of quotes.
This really comes down to a situation of cleansing user inputs. You should only allow desired characters in the user input and reject/strip invalid inputs from the user. This way you could use your asterisk delimiter.
The best solution is to define valid characters and reject invalid characters somehow, then use an invalid character (which will not appear in the input, since it is "banned") as your delimiter.
Don't allow the user to enter the character which you are using as a delimiter. I personally feel this is the best way.
Funny solution (works if the address is the only field with a comma):
Split the string by comma. The first two pieces will be the first name and last name; the last piece is the telephone number. Take those away, then join the rest back together with commas, and that is the address ;)
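As a sketch of that approach, assuming the record always has exactly these four fields in this order and that only the address can contain commas (the names are made up for illustration):

```csharp
using System;
using System.Linq;

static class AddressRecord
{
    // Returns { first name, last name, address, phone }.
    // Everything between the second piece and the last piece is treated as the address.
    public static string[] Parse(string value)
    {
        string[] parts = value.Split(',');
        if (parts.Length < 4)
            throw new FormatException("Expected at least 4 comma-separated pieces.");

        // Rejoin the middle pieces with the commas that Split removed.
        string address = string.Join(",", parts.Skip(2).Take(parts.Length - 3));
        return new[] { parts[0], parts[1], address, parts[parts.Length - 1] };
    }

    static void Main()
    {
        foreach (string field in Parse("John,Smith,123 Main Street, Apt 101,212-555-1212"))
            Console.WriteLine(field);
    }
}
```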
In a sense, the user is already "escaping" the comma with the space afterward.
So, try this:
string[] values = Regex.Split(value, ",(?![ ])");
The user can still break this if they don't put a space, and there is a more foolproof method (using the standard CSV method of quoting values that contain commas), but this will do the trick for the use case you've presented.
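Here is the same pattern as a small self-contained sketch, run against the example input from the question; the wrapper class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class SpaceAwareSplit
{
    // Splits on commas that are NOT followed by a space.
    public static string[] Split(string value)
    {
        return Regex.Split(value, ",(?![ ])");
    }

    static void Main()
    {
        foreach (string field in Split("John,Smith,123 Main Street, Apt 101,212-555-1212"))
            Console.WriteLine(field);
    }
}
```

The negative lookahead (?![ ]) only checks the character after the comma and does not consume it, so the space stays part of the address field.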
One more solution: provide an "Address 2" field, which is where things like apartment numbers would traditionally go. The user can still break it if they are lazy, though what they'll actually break is the fields after Address 2.
Politely remind your users that properly-formed street addresses in the United States and Canada should NEVER contain any punctuation whatsoever, perhaps?
The process of automatically converting corrupted data into useful data is non-trivial without heuristic logic. You could try to outsource the parsing by calling a third-party address-formatting library to apply the USPS formatting rules.
Even USPS requires the user to perform much of the work, by having components of the address entered into distinct fields on their address "canonicalizer" page (http://zip4.usps.com/zip4/welcome.jsp).

Removing <div>'s from text file?

I've made a small program in C#.NET which doesn't really serve much of a purpose: it tells you the chance of your DOOM based on today's news, lol. It takes an RSS feed on load from the BBC website and will then look for keywords which either increment or decrease the percentage chance of DOOM.
Crazy little project, but maybe one day the classes will come in handy again for something more important.
I receive the RSS in an XML format, but it contains a lot of div tags and formatting characters which I don't really want to be in the database of keywords.
What is the best way of removing these unwanted characters and divs?
Thanks,
Ash
If you want to remove the DIV tags WITH content as well:
string start = "<div>";
string end = "</div>";
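// A lazy .*? is used for the content; a negated character class built from "</div>" would exclude single characters (such as 'd' or 'i'), not the closing tag as a whole
string txt = Regex.Replace(htmlString, Regex.Escape(start) + "(?<data>.*?)" + Regex.Escape(end), string.Empty, RegexOptions.Singleline);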
Input: <xml><div>junk</div>XXX<div>junk2</div></xml>
Output: <xml>XXX</xml>
IMHO the easiest way is to use regular expressions. Something like:
string txt = Regex.Replace(htmlString, @"<(.|\n)*?>", string.Empty);
Depending on which tags and characters you want to remove you will modify the regex, of course. You will find a lot of material on this and other methods if you do a web search for 'strip html C#'.
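A minimal sketch of that approach, which strips the tags but keeps the text between them. The extra WebUtility.HtmlDecode step is an assumption about what "formatting characters" means here (entities such as &amp;amp;), and the names are made up for illustration:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Removes anything that looks like a tag, then decodes HTML entities.
    public static string StripTags(string html)
    {
        string withoutTags = Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }

    static void Main()
    {
        // Stand-in for one item description from the RSS feed
        string description = "<div><p>Markets &amp; news</p></div>";
        Console.WriteLine(StripTags(description));
    }
}
```

A regex like this is only a rough filter; for badly formed markup an HTML or XML parser is usually the more robust choice.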
SO question Render or convert Html to ‘formatted’ Text (.NET) might help you, too.
Stripping HTML tags from a given string is a common requirement and you can probably find many resources online that do it for you.
The accepted method, however, is to use a Regular expression based Search and Replace. This article provides a good sample along with benchmarks. Another point worth mentioning is that you would require separate Regex based lookups for the different kinds of unwanted characters you are seeing. (Perhaps showing us an example of the HTML you receive would help)
Note that your requirements may vary based on which tags you want to remove. In your question, you only mention DIV tags. If that is the only tag you need to replace, a simple string search and replace should suffice.
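A regular expression such as this:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
would match each paired HTML element (opening tag, content and matching closing tag). Use RegexOptions.IgnoreCase so that lowercase tag names match too.
Use this to remove them from your data.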
