Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
I have a silly problem with an ASP.NET project that I have been working on for more than 5 years.
Today the Trim() function suddenly stopped working.
Notes:
I updated the project framework from 4.7.2 to 4.8 and the problem still happens.
I tried TrimStart() and TrimEnd(); they have the same problem as Trim().
Replace() still works fine and is not affected, but using it everywhere is not a good solution for me.
Trim() works fine in new projects.
I also checked the char code of the space; it is 32.
Any idea?
This is the code:
string val = "abo    \u200e"; // four spaces (U+0020) followed by U+200E, the left-to-right mark
string userName = val.Trim();
You can reproduce the problem at this link.
Update:
Thank you all for the comments. I also found a simple way to check the end of the string: when you put the cursor at the end and press Backspace once, nothing appears to happen; the second press starts deleting. That is because of the \x200e character at the end.
Any idea how to trim hidden characters from the left and right and treat them just like spaces?
String.Trim works. If it didn't, hundreds of thousands of developers would have noticed 16 years ago.
The string ends with a formatting character, specifically \x200e, the Left-to-Right Mark. That's definitely not whitespace. Calling Char.GetUnicodeCategory('\u200e') returns Format. I suspect the input came from mixed Arabic and Latin text, perhaps something copied by a user from a longer string?
One way to handle this is to use String.Trim(char[]), specifying the LTR mark along with the other characters. That's not quite the same as String.Trim() though, which removes any character for which Char.IsWhiteSpace() returns true:
var userName = val.Trim(' ', '\t', '\r', '\n', '\u200e');
Another option would be to use a regular expression that trims both whitespace (\s) and characters in the Format Unicode category (Cf), only from the start (^[\p{Cf}\s]+) or the end ([\p{Cf}\s]+$):
string userName = Regex.Replace(val, @"(^[\p{Cf}\s]+)|([\p{Cf}\s]+$)", "");
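A minimal sketch of a third approach, assuming a hypothetical helper named TrimHidden (it is not part of .NET). It walks in from both ends and skips every character that is either whitespace or in the Format (Cf) category, so it behaves like Trim() but also drops marks such as \u200e:

```csharp
using System;
using System.Globalization;

public static class HiddenCharTrimmer
{
    // True for whitespace and for Unicode "Format" (Cf) characters such as U+200E.
    private static bool IsTrimmable(char c) =>
        char.IsWhiteSpace(c) ||
        char.GetUnicodeCategory(c) == UnicodeCategory.Format;

    // Hypothetical helper: trims whitespace and format characters from both ends.
    public static string TrimHidden(string value)
    {
        int start = 0;
        int end = value.Length - 1;
        while (start <= end && IsTrimmable(value[start])) start++;
        while (end >= start && IsTrimmable(value[end])) end--;
        return value.Substring(start, end - start + 1);
    }
}
```

Calling HiddenCharTrimmer.TrimHidden(val) on the string from the question should leave just "abo". Characters in the middle of the string are left untouched.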
Perhaps a better option would be to prevent unexpected characters using input validation, and require that the input TextBox contains only letter or letter and digit characters. After all, the user could paste some other unexpected non-printable character. It's better to warn the user than try to handle all possible bad data.
Usernames are typically letter and number combinations without whitespace. All ASP.NET stacks allow validation. Modern browsers allow regular expression validation in the input element directly, so we could come up with a regex that allows only valid characters, eg :
<input type="text" required pattern="[A-Za-z0-9]+" ..../>
The Unicode categories L (letters) and Nd (decimal digits) could be used to match letters and numbers in any language, just as Cf is used to match format characters.
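The same validation idea can be sketched on the server side. This assumes a hypothetical UserNameValidator class and assumes that a valid username consists only of letters (\p{L}) and decimal digits (\p{Nd}) in any script; the actual rule depends on the application:

```csharp
using System.Text.RegularExpressions;

public static class UserNameValidator
{
    // \z anchors at the very end of the input, so a trailing newline is rejected too.
    private static readonly Regex LettersAndDigits =
        new Regex(@"^[\p{L}\p{Nd}]+\z");

    // Hypothetical helper: true only when every character is a letter or a decimal digit.
    public static bool IsValid(string userName) =>
        LettersAndDigits.IsMatch(userName);
}
```

With this rule, input such as the string from the question is rejected outright and the user can be warned, since it contains spaces and a format character.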
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
I am new to regex in C# and I am trying to figure out a way to pull data from a user input string. So far I have tried Regex.Matches and Regex.Split, but no matter what I try I can't work out how to write a regular expression that finds what I want. Here is an example of the input string:
-new -task:my task 1 -body:this is the body for task one -priority:1
I would like to split this so that I get everything between the : (colon) and the -
So, for example, I would like one of my matches/split results to be: my task 1
and another match to be: this is the body for task one
and so on. Thank you.
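You can use Regex.Match in C#:
using System;
using System.Text.RegularExpressions;

string input = "-new -task:my task 1 -body:this is the body for task one -priority:1";
// Capture the text after each ':' lazily, up to the next '-' or the end of the input.
// The trailing \s* keeps the space before the next '-' out of the captured value.
string pattern = @":(.*?)\s*(?:-|$)";
Match match = Regex.Match(input, pattern);
while (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    match = match.NextMatch();
}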
This is probably parsable with regex. But regex, while nifty, has the downside of a heavy cognitive load for anyone trying to understand it. Only very little information is being input here (not a huge file with a lot of cases you need to sift through),
and I believe you have full control over the format that is passed into the program. So I'd advise you not to use regex. Just do a string.Split on '-' and interpret each argument as you walk over the array.
This should be much easier to maintain in the long run. If you have to ask about the regex online now, think about what will happen when you have to maintain the code again in the future.
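A rough sketch of that Split-based approach, assuming a hypothetical ArgumentParser class and assuming that values never contain a '-' themselves. Each piece is split once more on its first ':' into a name and a value:

```csharp
using System;
using System.Collections.Generic;

public static class ArgumentParser
{
    // Hypothetical helper: splits on '-' and then on the first ':' of each piece.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        string[] parts = input.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            // Limit to 2 pieces so a ':' inside the value is preserved.
            string[] pair = part.Split(new[] { ':' }, 2);
            string name = pair[0].Trim();
            if (name.Length == 0) continue;
            // Flags such as "new" carry no value; store an empty string for them.
            result[name] = pair.Length > 1 ? pair[1].Trim() : "";
        }
        return result;
    }
}
```

For the input from the question, Parse(input)["task"] should be "my task 1" and Parse(input)["body"] should be "this is the body for task one".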
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 5 years ago.
Having a hard time getting my regex to work correctly. Essentially, all I need is a valid number regex that just allows for one comma. Here's what I have tried:
[0-9]*[,]\\d
(This was when I thought I might have a number with multiple commas, not the case anymore)
[0-9][,]\\d
and
^\d+(?:[\,]\d+)?$ (http://regexr.com/3ggn5)
The latter seemed to work best; however, when I input 1,23134 it doesn't break the rule. How can I improve it so that an invalid number such as 1,23232 fails, while a valid number such as 1,232 passes?
UPDATE
This is the code surrounding, just using a RegularExpression annotation:
[RegularExpression(@"^\d+(?:[\,]\d+)?$", ErrorMessage = ...)]
UPDATE 2
By valid number I simply mean a number that is correctly formatted to United States standards. Example of valid numbers:
1
10
100
1,000
1000
10,000
10000
100,000
100000
..etc
In the United States, we either have a comma after every third digit (counting from the right) or no commas at all, so the leading group can be shorter than three digits (1,000 is valid). If you use a comma, you typically use one at every third digit, so I would assume a number like 1,00000000 isn't valid.
Examples of invalid numbers:
1,1
1,00
12,12
Basically if anywhere else in the world uses a comma in a place that isn't after the third digit, this would be invalid for what I need. Simply just numbers that may or may not have a comma.
This regex will validate a number in many valid formats:
^-?(\d+|\d{1,3}(?:,\d{3})+)?(\.\d+)?$
It will reject too many digits after a comma
and wrong dot notation;
numbers with no comma will pass.
If you need neither negative nor decimal numbers, you can simplify it:
^(?:\d+|\d{1,3}(?:,\d{3})+)$
And if you don't want numbers without a comma either (e.g. 1345):
^\d{1,3}(?:,\d{3})+$
P.S.: For users coming from a non-English-speaking country, you can replace the comma with a space in all of these regexes and they will work the same way.
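To check the simplified pattern against the valid and invalid samples from the question, a small sketch (the UsNumberValidator class name is an assumption made for illustration):

```csharp
using System.Text.RegularExpressions;

public static class UsNumberValidator
{
    // Either digits with no commas at all, or 1-3 leading digits followed by
    // one or more groups of exactly three digits, each preceded by a comma.
    private static readonly Regex Pattern =
        new Regex(@"^(?:\d+|\d{1,3}(?:,\d{3})+)$");

    public static bool IsValid(string text) => Pattern.IsMatch(text);
}
```

Inputs such as 1,000 and 10000 should pass, while 1,1, 1,00, 12,12 and 1,23134 should fail. The same pattern string could be passed to the RegularExpression annotation shown in the question.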
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I need a regex pattern which can detect if the given text is in English or not, but I want to include the following:
Allowing spaces
Allowing numbers and words
Allowing multiple lines and tabs
Allowing all special characters !@#$%^&*()_-+={}|/<>~`':";[]
Allowing URLs, emails
If the given text contains any character other than English ones, it should be considered non-English text. This applies if the text contains Arabic letters/words like "ا ب ت ... etc.", French letters like "é, â ... etc.", and likewise for all other languages.
In brief, I need to know if the given text, any text with any format, is in English or not. I tried a lot of patterns but I didn't get it, and actually, I don't need to use any language detector as the application will be used offline.
Samples of the texts which should not be accepted:
Hello! ... é
مرحبا بك
للتحميل اضغط هنا ... http://www.google.com
So, if the text contains non-English letter, it should be considered non-English text.
I think I found it. I tried the Basic Latin Unicode block, and it works fine so far. I used:
"^[\u0000-\u007F]+$"
The idea is to check that the given text is written using English letters only, while still allowing special characters. So a text like "I met my friend in a café" is considered non-English, because the text must contain only English letters and no others, even in the name of a person or place. This was exactly what I needed.
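A small sketch of how that pattern can be wrapped and tried against the samples above (the AsciiTextCheck class name is an assumption made for illustration):

```csharp
using System.Text.RegularExpressions;

public static class AsciiTextCheck
{
    // Matches only when every character lies in the Basic Latin block (U+0000 to U+007F).
    // Newlines and tabs are inside that range, so multi-line text is accepted.
    private static readonly Regex BasicLatinOnly =
        new Regex(@"^[\u0000-\u007F]+$");

    public static bool IsBasicLatin(string text) => BasicLatinOnly.IsMatch(text);
}
```

A text containing a URL, tabs and line breaks should pass, while any text with "é" or Arabic letters should fail. An empty string also fails, because the pattern requires at least one character.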
Thank you all.
Resources:
http://kourge.net/projects/regexp-unicode-block
https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Regular expression to match non-English characters?
In theory it is possible, if the regex contained every word from an English dictionary.
You can create a regex that detects non-English characters. That will detect text that is definitely not English, but won't be able to confirm it definitely is.
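This should work (in .NET, \w and \d also match non-ASCII letters and digits, so the ASCII ranges are spelled out):
@"[^\tA-Za-z0-9\s$-/:-?{-~!""^_`\[\]]+"
If there is a match, there ARE non-English letters/characters.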
BTW, you are just testing whether the text contains only characters that an English-speaking person would normally use, NOT what language it is in.
To detect a language you need something like natural language processing, NOT regex.
This question already has an answer here:
.NET Regex Negative Lookahead - what am I doing wrong?
(1 answer)
Closed 6 years ago.
I asked this previously and used what I believe to be entirely too simple of a construct, so I'm trying again...
Assuming that I have:
This is a random bit of information from 0 to 1.
This is a non-random bit of information I do NOT want to match
This is the end of this bit
This is a random bit of information from 0 to 1.
This is a non-random bit of information I do want to match
This is the end of this bit
And (attempting) the following regex:
/This is a random bit(?:(?!NOT).)*?This is the end/g
Why is this not matching?
Regexr.com link: http://regexr.com/3db8m
What I'm looking to accomplish:
1) Determine a match based on a partial string of a line
2) Determine a match that ends with a partial string of a line
3) NOT capture based on some random string inside that start/end of a match.
edit
The patterns suggested in the original question were entirely too complicated for my meager understanding of Regex. Further, the suggestion of (?s) was throwing errors on regexr.com (ERROR: Invalid target for quantifier), so I reconstructed the question here.
If, indeed, there is a method to edit a question once asked, I apologize for not finding the edit link. I did find this edit link, as this question was marked as a 'duplicate' and 'previously answered'.
Respectfully, an answer not understood is no answer. As the author and seeker of the information contained in both this, and my previous question, I state that Maximilian Gerhardt's answer was the more correct (for me, at least).
Also, no idea if this is what was expected of this edit? I usually resort to StackOverflow when I've left a large enough dent on my desk. If I'm mis-using the site, again, I apologize :)
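Don't use . with text that has newlines in it: by default . does not match a newline, so the match can never cross a line break. Use a class such as [\s\S] in its place, or enable single-line (dotall) mode where the engine supports it.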
Working example:
http://regexr.com/3db92
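In .NET the same idea can be sketched with RegexOptions.Singleline, which lets . match newlines. This is an illustration only; the BlockMatcher class name is an assumption, and the pattern is the one from the question:

```csharp
using System.Text.RegularExpressions;

public static class BlockMatcher
{
    // Singleline makes '.' match '\n', so a block may span several lines.
    // (?:(?!NOT).)*? consumes one character at a time, but never steps onto "NOT".
    private static readonly Regex Pattern = new Regex(
        @"This is a random bit(?:(?!NOT).)*?This is the end",
        RegexOptions.Singleline);

    public static MatchCollection FindBlocks(string text) => Pattern.Matches(text);
}
```

For the six sample lines in the question, this should find only the second block, the one without NOT.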
This question already has answers here:
What is a regular expression which will match a valid domain name without a subdomain?
(23 answers)
Closed 7 years ago.
I will be using the expression with
Regex.Replace();
to replace the rest with "".
inputs:
http://therealzenstar.blogspot.fr
output:
blogspot.fr
Just to reiterate Jens' comment, we have to guess: what is your expected output when additional information appears, e.g. http://therealzenstar.blogspot.fr/somedata.html? Is it still blogspot.fr? Do such examples need to be addressed?
You said you want to replace "everything else" with "". Replace() replaces everything that is matched with what you supply. So, to replace it with "", you'd need to match everything that you do not want. That's possible; however, it's much easier to capture what you DO want and replace the whole match with $1.
Assuming you always want only the domain.xx, even if more information appears. Something like this will work: ^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*), as seen: https://regex101.com/r/hN8iQ7/1
A problem arises if your domains also include those with multiple extensions, e.g. domain.co.uk. You'd need to address them specifically (by naming them), as it is very hard to generalize a way to distinguish these items.
^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*) - with .co.uk option added. https://regex101.com/r/hN8iQ7/2 .
yourregex.Replace(yourstring, "$1") may do what you need.
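Put together, a minimal sketch of the capture-and-replace approach using the second pattern above (the DomainExtractor class name is an assumption, and only .co.uk is handled as a multi-part extension):

```csharp
using System.Text.RegularExpressions;

public static class DomainExtractor
{
    // Group 1 captures "name.tld" (or "name.co.uk"); the rest of the URL is matched
    // but not captured, so replacing the whole match with $1 leaves only the domain.
    private static readonly Regex Pattern = new Regex(
        @"^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*)");

    public static string Extract(string url) => Pattern.Replace(url, "$1");
}
```

For http://therealzenstar.blogspot.fr this should give blogspot.fr, with or without a trailing path. An input that the pattern does not match is returned unchanged.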