Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 days ago.
This post was edited and submitted for review 4 days ago.
I have some text from which I need to remove all tags and tag-like entities, while keeping the <p> and </p> tags.
Example:
<p> English texts for beginners to practice reading and comprehension online and for free. <span>Practicing your comprehension of written <English> will both improve your vocabulary and understanding of grammar and word order.<\span> The texts <below> are designed to help you develop while giving you an instant evaluation of your progress. </p>
Expected result:
<p> English texts for beginners to practice reading and comprehension online and for free.Practicing your comprehension of written will both improve your vocabulary and understanding of grammar and word order.The texts are designed to help you develop while giving you an instant evaluation of your progress. </p>
Pattern used: (<[^<>]+>)
Result:
English texts for beginners to practice reading and comprehension online and for free.Practicing your comprehension of written will both improve your vocabulary and understanding of grammar and word order.The texts are designed to help you develop while giving you an instant evaluation of your progress.
How can I add a condition to this pattern so that it does not match the <p> and </p> tags?
Would something like this work? (<([^<>]+)(?!<p>)>)
Update:
This pattern works great.
(?=(<[^<>]+>))(?!(<p>|</p>))
Thanks @GoodNightNerdPride for the link to regex101.
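Why the update's pattern only appears to work: it consists entirely of lookaheads, so every match is zero-width. regex101 highlights the tag because it lands in capture group 1, but Regex.Replace has nothing to remove. A minimal C# sketch that keeps the exclusion in a lookahead and then consumes the tag, assuming the <p> tags carry no attributes:

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"<p> English texts for beginners to practice reading and comprehension online and for free. <span>Practicing your comprehension of written <English> will both improve your vocabulary and understanding of grammar and word order.<\span> The texts <below> are designed to help you develop while giving you an instant evaluation of your progress. </p>";

// The update's pattern is lookarounds only, so every match is empty
// and Replace has nothing to remove.
string unchanged = Regex.Replace(input, @"(?=(<[^<>]+>))(?!(<p>|</p>))", "");

// (?!</?p>) rejects exactly <p> or </p> at this position; <[^<>]+> then
// consumes any other tag-like entity so that Replace can drop it.
string kept = Regex.Replace(input, @"(?!</?p>)<[^<>]+>", "");

Console.WriteLine(unchanged == input); // True
Console.WriteLine(kept);
```

Only the outer <p> and </p> survive. The spaces around the removed tags are left in place, so the output can contain doubled spaces where the expected text above shows single ones. A tag with attributes, such as <p class="x">, would be removed too; a lookahead like (?!</?p\b[^<>]*>) would keep those as well.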
My suggestion would be:
1. Read and store the first tag from the string.
2. Replace all the HTML tags using a regular expression (this should be simple).
3. Add the tag read in step 1 back to this string.
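The three steps can be sketched in C# as follows. This is one reading of step 3 (the stored tag goes back at the start and a matching closing tag is appended), and it assumes the text begins with a simple opening tag such as <p>; StripInnerTags is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"<p> English texts for beginners to practice reading and comprehension online and for free. <span>Practicing your comprehension of written <English> will both improve your vocabulary and understanding of grammar and word order.<\span> The texts <below> are designed to help you develop while giving you an instant evaluation of your progress. </p>";

Console.WriteLine(StripInnerTags(input));

// Hypothetical helper following the three steps.
static string StripInnerTags(string html)
{
    // Step 1: read and store the first tag (e.g. "<p>") and its name ("p").
    Match first = Regex.Match(html, @"^\s*<(\w+)[^<>]*>");

    // Step 2: replace every tag-like entity with nothing.
    string text = Regex.Replace(html, @"<[^<>]+>", "");

    if (!first.Success)
        return text; // no leading tag to put back

    // Step 3: put the stored tag back, with a matching closing tag at the end.
    return first.Value.TrimStart() + text + "</" + first.Groups[1].Value + ">";
}
```

Unlike a lookahead-based pattern, this also drops any <p> tags inside the text and keeps only the outermost pair.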
You can get the text between two HTML tags using a regex, but that doesn't ensure that all the tags inside that text are removed.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to filter some bad words like 'asshole', but you can bypass the filter just by writing 'asssssshole' or 'asshoooole'.
Here is my current code:
string Word = "asshole";
if (Comment.Contains(Word))
//block the comment from being posted
How can I check the message for extra repeated letters added to a banned word, without creating hundreds of different rules for each way you can spell 'asshole'?
You might have some partial success using Soundex. Try putting your variations into https://www.functions-online.com/soundex.html: they all return the same value, A240, even a*s*h*o*l*e. Unfortunately it won't work with a$$hole, but you might be able to come up with some simple substitutions to run before testing.
It should be pretty easy to find a C# implementation online.
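For reference, a minimal sketch of American Soundex in C#, written for this example rather than taken from a library. Each letter after the first maps to a digit class, repeated codes collapse, and vowels, H and W are dropped, which is why the stretched spellings all land on A240. Stripping non-letters first is an assumption of this sketch: it makes a*s*h*o*l*e code as A240 too, while a$$hole becomes A400.

```csharp
using System;
using System.Linq;
using System.Text;

string banned = Soundex("asshole"); // "A240"
string comment = "you are an asssssshole";

// Compare the code of every word in the comment with the banned word's code.
bool blocked = comment
    .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
    .Any(w => Soundex(w) == banned);
Console.WriteLine(blocked);

// American Soundex: first letter plus three digits, e.g. "Robert" -> "R163".
static string Soundex(string word)
{
    // Keep letters only (so "a*s*h*o*l*e" becomes "ASHOLE"), upper-cased.
    string letters = new string(word.Where(c => char.IsLetter(c))
                                    .Select(c => char.ToUpperInvariant(c))
                                    .ToArray());
    if (letters.Length == 0) return "";

    var code = new StringBuilder().Append(letters[0]);
    char last = Digit(letters[0]);
    for (int i = 1; i < letters.Length && code.Length < 4; i++)
    {
        char d = Digit(letters[i]);
        if (d != '0' && d != last) code.Append(d);
        // H and W do not separate equal codes; other dropped letters do.
        if (letters[i] != 'H' && letters[i] != 'W') last = d;
    }
    return code.ToString().PadRight(4, '0');
}

static char Digit(char c) => c switch
{
    'B' or 'F' or 'P' or 'V' => '1',
    'C' or 'G' or 'J' or 'K' or 'Q' or 'S' or 'X' or 'Z' => '2',
    'D' or 'T' => '3',
    'L' => '4',
    'M' or 'N' => '5',
    'R' => '6',
    _ => '0', // vowels, H, W, Y and anything else
};
```

Soundex is coarse: an unrelated word such as axle also codes as A240, so a Soundex hit is probably better treated as "flag for review" than as an automatic block.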
Make a regex out of it and check for a match:
//using System.Text.RegularExpressions;
string word = "asshole";
//join the letters with '+' so each letter may repeat any number of times
string pattern = string.Join("+", word.ToCharArray());
//pattern is "a+s+s+h+o+l+e"
if (Regex.IsMatch(comment, pattern, RegexOptions.IgnoreCase))
{
    //do something, e.g. block the comment
}
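A self-contained variant of the same idea, as a sketch: each letter is passed through Regex.Escape (a no-op for plain letters, but safe if a banned word ever contains regex metacharacters) and followed by +, and the search ignores case so ASSHOOOOLE is caught as well. StretchPattern and ContainsStretched are hypothetical helper names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: "asshole" -> "a+s+s+h+o+l+e+", so every letter may repeat.
static string StretchPattern(string word) =>
    string.Concat(word.Select(c => Regex.Escape(c.ToString()) + "+"));

// Hypothetical helper: case-insensitive search for the stretched word.
static bool ContainsStretched(string text, string word) =>
    Regex.IsMatch(text, StretchPattern(word), RegexOptions.IgnoreCase);

foreach (string comment in new[] { "asssssshole", "what an ASSHOOOOLE", "a$$hole", "hello" })
    Console.WriteLine($"{comment}: {ContainsStretched(comment, "asshole")}");
```

Substitutions such as $ for s still get through, and, like Contains, the pattern also matches inside longer words.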
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I need a regex pattern that can detect whether a given text is in English or not, while allowing the following:
Allowing spaces
Allowing numbers and words
Allowing multiple lines and tabs
Allowing all special characters !@#$%^&*()_-+={}|/<>~`':";[]
Allowing URLs, emails
If the given text contains any character other than English ones, it should be considered non-English text. This applies if the text contains Arabic letters or words like "ا ب ت ... etc.", French letters like "é, â ... etc.", and likewise letters from all other languages.
In brief, I need to know whether a given text, in any format, is in English or not. I have tried a lot of patterns without success, and I don't want to use a language detector, as the application will be used offline.
Samples of texts that should not be accepted:
Hello! ... é
مرحبا بك
للتحميل اضغط هنا ... http://www.google.com
So, if the text contains a non-English letter, it should be considered non-English text.
I think I found it: I tried the Basic Latin Unicode block, and it works fine so far. I used:
"^[\u0000-\u007F]+$"
The idea is to check that the given text is written using English letters only, while still allowing special characters. So a text like "I met my friend in a café" is considered non-English, because the text should contain only English letters and no other letters, even in a name, a place, etc. This is exactly what I need.
Thank you all.
Resources:
http://kourge.net/projects/regexp-unicode-block
https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Regular expression to match non-English characters?
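The self-answer's check as a small C# sketch (IsBasicLatinOnly is a hypothetical name). The pattern is anchored at both ends and uses +, so the whole text must be Basic Latin and an empty string is rejected. .NET also accepts the named block \p{IsBasicLatin}, which covers the same U+0000 to U+007F range.

```csharp
using System;
using System.Text.RegularExpressions;

// True when every character is in Basic Latin (U+0000..U+007F), i.e. ASCII.
static bool IsBasicLatinOnly(string text) =>
    Regex.IsMatch(text, @"^[\u0000-\u007F]+$");

string[] samples =
{
    "Hello! ... é",
    "مرحبا بك",
    "للتحميل اضغط هنا ... http://www.google.com",
    "I met my friend in a café",
    "Hello, world!\n\tMail me at someone@example.com or visit http://www.google.com",
};

foreach (string s in samples)
    Console.WriteLine($"{IsBasicLatinOnly(s)}: {s}");
```

This tests which characters are used, not which language the text is written in.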
In theory it is possible, if the regex contained every word in the English dictionary.
You can create a regex that detects non-English characters. That will detect text that is definitely not English, but won't be able to confirm it definitely is.
This should work:
@"[^\t\w\d\s$-/:-?{-~!""^_`\[\]]+"
If there is a match, there ARE non-English letters/characters.
BTW, you are only testing whether the text contains the characters an English-speaking person would normally use, NOT what language it is in.
To detect a language you need something like natural language processing, NOT a regex.
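One caveat if this pattern is used in .NET: \w, \d and \s are Unicode-aware by default, so é and Arabic letters count as word characters and are excluded by the negated class. With default options the pattern therefore finds no match in the question's samples; RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9] and fixes that. The class also leaves out @, # and \, so an e-mail address is flagged either way. A sketch comparing it with the simpler non-ASCII class [^\x00-\x7F]:

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern as a C# verbatim string ("" is an escaped quote).
const string answer = @"[^\t\w\d\s$-/:-?{-~!""^_`\[\]]+";
// Simpler alternative: any character outside U+0000..U+007F.
const string nonAscii = @"[^\x00-\x7F]";

foreach (string text in new[] { "Hello! ... é", "مرحبا بك", "mail me@example.com" })
{
    bool byDefault = Regex.IsMatch(text, answer);                       // Unicode \w
    bool byEcma = Regex.IsMatch(text, answer, RegexOptions.ECMAScript); // ASCII-only \w
    bool bySimple = Regex.IsMatch(text, nonAscii);
    Console.WriteLine($"{text}: default={byDefault}, ECMAScript={byEcma}, non-ASCII class={bySimple}");
}
```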
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I need to extract all the hashtags (#hashtag), mentions (@user) and links from a string.
Right now I'm using this one:
@"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|@|#|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)";
But it doesn't recognize users that start with _, such as "@_me", and links like this one (https://blogs.windows.com/windowsexperience/2015/12/03/whats-new-for-windows-10-iot-this-fall/#.VmB1q2NPg2A.twitter) are only partially recognized.
How can I improve my regex to get all the possible cases?
Try this pattern (remember to turn on the RegexOptions.IgnorePatternWhitespace option):
(?'tag'(@|\#)(\w|_)+)
|
(?'link'((https?://)|(www\.))[\w$-_.+!*'(),]+)
For this string:
My name is @_dave from #chicago. Visit my city at www.choosechicago.com/things-to-do/ Have a nice day!
It makes 3 captures: 2 under the tag group (@_dave and #chicago) and one under the link group (www.choosechicago.com/things-to-do/).
You can check it with a regex tester like Regex Storm
Explanation
RegexOptions.IgnorePatternWhitespace allows you to break your pattern into multiple lines for easier readability. Instead of this:
(?'tag'(@|#)(\w|_)+)|(?'link'www\.[\w$-_.+!*'(),]+)
You can write this when you turn on the option:
(?'tag'(@|\#)(\w|_)+) # capture @ and # tags into the tag group
|
(?'link'www\.[\w$-_.+!*'(),]+) # capture hyperlinks, must begin with www
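(?'tag'...) defines a capture group named tag, so you can refer to it by name, Groups["tag"], rather than by its position. (In .NET, named groups are numbered after all unnamed groups, so in this pattern the tag group is not Groups[1].)
[\w$-_.+!*'(),]+ defines the list of characters allowed in a URL, which I got from this question. I haven't checked the RFC specs, so don't burn me if I missed a few. Note that $-_ inside the brackets is a range from $ to _, so the class also admits characters such as /, :, ? and @.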
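The answer's pattern in a runnable C# sketch. Because of IgnorePatternWhitespace, the line breaks are ignored and everything after an unescaped # is a comment, which is why the hashtag sign has to be written \#. Each match is sorted by which named group succeeded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

const string pattern = @"
    (?'tag'(@|\#)(\w|_)+)                           # @mentions and #hashtags
    |
    (?'link'((https?://)|(www\.))[\w$-_.+!*'(),]+)  # links starting with http(s):// or www.
";

string input = "My name is @_dave from #chicago. Visit my city at www.choosechicago.com/things-to-do/ Have a nice day!";

var tags = new List<string>();
var links = new List<string>();
foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace))
{
    if (m.Groups["tag"].Success) tags.Add(m.Groups["tag"].Value);
    else if (m.Groups["link"].Success) links.Add(m.Groups["link"].Value);
}

Console.WriteLine("tags:  " + string.Join(", ", tags));
Console.WriteLine("links: " + string.Join(", ", links));
```

For the blogs.windows.com link in the question, the match stops just before #.VmB1q2NPg2A.twitter, because # is not in the link's character class.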
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm kinda rusty with regular expressions. I need a REGEX that will validate values formatted like the following:
123.00
123,00
1324,00
1234.00
123
1213.0
I tried ^\d.\d{2}$, but it does not seem to match all values.
Appreciate any assistance.
You can use the following:
\d+[.,]?\d+
Good luck!
\d+[,.]?\d*
I would strongly advise against mixing cultures, especially for persistence or transport.
The regex you're likely looking for is something like @"\d+([,.]\d+)?"
It specifies "some number of digits, optionally followed by a . or , and at least one digit", so it would not match all of 123. (a separator with nothing after it). Add ^ and $ anchors if the whole value must match.
If you want to match culture-specific strings, however, I'd recommend using NumberFormatInfo.CurrencyDecimalSeparator and then looking for that separator specifically.
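A sketch of both suggestions in C#. The ^ and $ anchors make the pattern validate the whole value; without them IsMatch("123.") is true, because the "123" part alone matches. The second helper takes the separator from a NumberFormatInfo, as recommended above. IsAmount and IsAmountFor are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Either separator (123.00, 123,00, 123); the whole string must match.
static bool IsAmount(string value) => Regex.IsMatch(value, @"^\d+([,.]\d+)?$");

// Only the currency decimal separator of the given number format.
// Regex.Escape turns "." into "\." so it is matched literally.
static bool IsAmountFor(string value, NumberFormatInfo format) =>
    Regex.IsMatch(value, @"^\d+(" + Regex.Escape(format.CurrencyDecimalSeparator) + @"\d+)?$");

foreach (string s in new[] { "123.00", "123,00", "1324,00", "1234.00", "123", "1213.0", "123.", "1.2.3" })
    Console.WriteLine($"{s}: {IsAmount(s)}");

// The invariant culture uses ".", so "123,00" is rejected here.
Console.WriteLine(IsAmountFor("123,00", NumberFormatInfo.InvariantInfo)); // False
```

In .NET, $ also matches before a trailing newline and \d matches any Unicode decimal digit; use \z and [0-9] if those matter.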
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I've got the following input text:
10,"ABASTECEDORA NAVAL Y INDUSTRIAL, S.A.",-0- ,"CUBA"
I need a String[] result with
result[0] == "10"
result[1] == "ABASTECEDORA NAVAL Y INDUSTRIAL, S.A."
result[2] == "-0-"
result[3] == "CUBA"
Please help me with a regex pattern that splits the input into the result above.
It looks like you are reading a CSV file with optional quotations and you want to parse a single line. Take a look at this excellent .NET CSV reader API:
http://www.codeproject.com/KB/database/CsvReader.aspx
It appears that you may be reading a CSV file; if so, you should use a CSV parsing API for .NET.
However, if you really want to use a regex, you can use the Regex.Split() method to split your input into a String[].
http://msdn.microsoft.com/en-us/library/8yttk7sy.aspx
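If you do go the Regex.Split route, a common sketch is to split only on commas that are followed by an even number of quote characters, i.e. commas outside quoted fields. It assumes balanced quotes and does not unescape doubled quotes (""). The groups in the pattern are non-capturing on purpose: Regex.Split adds any captured text to the result array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = @"10,""ABASTECEDORA NAVAL Y INDUSTRIAL, S.A."",-0- ,""CUBA""";

// Split on a comma only if an even number of " characters follows it,
// i.e. the comma is not inside a quoted field.
string[] result = Regex.Split(line, @",(?=(?:[^""]*""[^""]*"")*[^""]*$)")
    .Select(f => f.Trim().Trim('"')) // drop padding, then the surrounding quotes
    .ToArray();

foreach (string field in result)
    Console.WriteLine(field);
```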
Something like this will work with your specific example:
(\d+),"([^"]+)",([^,]+),"([^"]+)"
However, it looks like you're really parsing CSV, so I'd use an appropriate CSV library for this. The pattern I provided won't account for embedded and escaped quotations / commas within the String, etc.
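Applied to the sample line in C#, the pattern above does match, but the third group comes back as "-0- ", with the trailing space from the input, so a Trim() is needed to get result[2] == "-0-". A sketch:

```csharp
using System;
using System.Text.RegularExpressions;

string line = @"10,""ABASTECEDORA NAVAL Y INDUSTRIAL, S.A."",-0- ,""CUBA""";

// The answer's pattern; "" is an escaped quote inside a C# verbatim string.
Match m = Regex.Match(line, @"(\d+),""([^""]+)"",([^,]+),""([^""]+)""");

string[] result = m.Success
    ? new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value.Trim(), m.Groups[4].Value }
    : Array.Empty<string>();

foreach (string field in result)
    Console.WriteLine(field);
```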