Regex in between characters - c#

I'm trying to create a regex that will match the ASCII characters in a string so that they can be converted with hex afterwards. The string is received as follows: <<<441234567895,ASCII,4,54657379>>> so I am looking to match everything between the third comma and the >>> characters at the end of the string, like so:
<<<441234567895,ASCII,4,54657379>>>
So far I have managed to create this regex (/([^,]*,[^,]*)*([^;]*)>>>/) for it, but the third comma is picked up as well, which I don't want. What do I need to do to remove it from the match?
Thanks, Callum

(?<=,)[^,]+(?=>>>)
This should do it. See demo.
https://regex101.com/r/sJ9gM7/79
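For reference, here is a minimal C# sketch of how that pattern might be applied. The class and method names (PayloadExtractor, ExtractLastField) are made up for illustration and do not come from the question.

```csharp
using System.Text.RegularExpressions;

public static class PayloadExtractor
{
    // (?<=,)  : the match must be preceded by a comma
    // [^,]+   : one or more characters that are not commas
    // (?=>>>) : the match must be followed by ">>>"
    private static readonly Regex LastField = new Regex(@"(?<=,)[^,]+(?=>>>)");

    // Returns the comma-free text that sits directly before ">>>",
    // or an empty string when nothing matches.
    public static string ExtractLastField(string input)
    {
        Match m = LastField.Match(input);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Calling PayloadExtractor.ExtractLastField("<<<441234567895,ASCII,4,54657379>>>") should give back 54657379 without the comma.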

Do you need to use Regex?
string input = "<<<441234567895,ASCII,4,54657379>>>";
string match = input.Substring(3, input.Length - 6).Split(',')[3];
You can also use further splits on the beginning and ending padding strings or check their lengths if you want something safer than the Substring magic.
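A rough sketch of the "something safer" idea, assuming the message always has exactly four comma-separated fields wrapped in <<< and >>>. That field count and the names FrameParser and TryGetFourthField are my assumptions, not something stated in the question.

```csharp
using System;

public static class FrameParser
{
    // Hypothetical helper: returns true and sets 'field' to the fourth
    // comma-separated value when the input is wrapped in "<<<" and ">>>".
    public static bool TryGetFourthField(string input, out string field)
    {
        const string prefix = "<<<";
        const string suffix = ">>>";
        field = string.Empty;

        // Check the padding explicitly instead of trusting fixed offsets.
        if (input == null
            || input.Length < prefix.Length + suffix.Length
            || !input.StartsWith(prefix, StringComparison.Ordinal)
            || !input.EndsWith(suffix, StringComparison.Ordinal))
        {
            return false;
        }

        string body = input.Substring(prefix.Length, input.Length - prefix.Length - suffix.Length);
        string[] parts = body.Split(',');

        // Assumption: a well-formed message has exactly four fields.
        if (parts.Length != 4)
        {
            return false;
        }

        field = parts[3];
        return true;
    }
}
```

Unlike the one-liner, this rejects input with missing padding or the wrong number of fields instead of throwing or returning the wrong piece.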

Related

C# Regex contain []

I am working on a regex to validate that a string input contains letters, has a length of 6 to 20, and has square brackets containing an integer somewhere inside it.
Something like:
ABC[15]WHATEVER
I get that it would be better to check the string length without the use of regex, but I'm still wondering about the brackets with the integer.
What I managed to do:
\[([0-9\-]+)]
which works when I test it in a .NET regex tester.
Is this the appropriate solution though?
Any help is welcome
I'd go with
^(?=[\w\[\]]{6,20}$)[A-Z]+\[\d+\][A-Z]+$
see this working example on Regex101
Explanation
^ Beginning of string
(?=[\w\[\]]{6,20}$) Followed by a string between 6 and 20 characters long ({6,20}) containing only word characters (\w: letters, digits and underscore) and brackets (\[\]), followed by the end of the string ($)
[A-Z]+\[\d+\][A-Z]+ the actual pattern - 1 or more digits in brackets, with 1 or more uppercase letters on each side
$ end of string
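A small sketch of how the pattern above could be used from C#. The class name BracketedInputValidator is hypothetical. Note that the pattern as written only accepts uppercase letters; passing RegexOptions.IgnoreCase would relax that.

```csharp
using System.Text.RegularExpressions;

public static class BracketedInputValidator
{
    // The lookahead enforces the 6-20 length limit; the rest enforces
    // uppercase letters, one bracketed integer, then uppercase letters.
    private static readonly Regex Pattern =
        new Regex(@"^(?=[\w\[\]]{6,20}$)[A-Z]+\[\d+\][A-Z]+$");

    public static bool IsValid(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

With this, ABC[15]WHATEVER passes, while A[1]B (too short) and ABC[15][16]DEF (two bracketed integers) are rejected.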
I'd comment but I don't have the rep. Consider this an expansion on Jeroen's comment.
A length check and regex to match the bracketed integer won't necessarily guarantee a legitimate string. What if there are two bracketed integers? What if the 'letters' prior to the brackets contain an invalid character? I'd recommend a string length check followed by something like:
^[a-zA-Z]+\[[0-9]+\][a-zA-Z]+$
Which will further constrain the match. Add capturing groups as needed.
I think you should use string.Length for the length check and a regex (^[a-zA-Z]+\[-?\d+\][a-zA-Z]+$) to validate the rest of the format.
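The "length check first, then regex" approach from the last two answers might look roughly like this. The 6 and 20 limits come from the question; the class name LengthThenRegexValidator is made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class LengthThenRegexValidator
{
    // Letters, one bracketed integer (optionally negative), then letters.
    private static readonly Regex Shape =
        new Regex(@"^[a-zA-Z]+\[-?\d+\][a-zA-Z]+$");

    public static bool IsValid(string input)
    {
        // Plain length check, no regex needed for this part.
        if (input == null || input.Length < 6 || input.Length > 20)
        {
            return false;
        }

        return Shape.IsMatch(input);
    }
}
```

Keeping the length check out of the regex makes both rules easier to read and to change independently.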

Match up everything before STRING or STRING

I've searched for hours and already tried tons of different patterns - there's a simple thing I want to achieve with regex, but somehow it just won't do what I want:
Possible Strings
String1
This is some text \0"§%lfsdrlsrblabla\0\0\0}dfglpdfgl
String2
This is some text
String3
This is some text \0
Desired Match/Result
This is some text
I simply want to match everything up to, but not including, the \0 - resulting in only 1 match (everything before the \0).
What matters in my case is that it matches every time, even when the \0 is not present.
Thanks for your help!
You can try with this pattern:
@"^(?:[^\\]+|\\(?!0))+"
In other words: match runs of characters other than backslashes, or a backslash that is not followed by 0.
I like
@"^((?!\\0).)*"
Because it's very easy to adapt to any arbitrary string. The basic trick is the negative lookahead, which asserts that the string starting at this point doesn't match the regular expression inside. We follow this with a wildcard to mean "literally any character that is not the start of my delimiter string". If your string should change, this is an easy update - just
@"^((?!--STRING--).)*"
As long as you properly escape that string. Heck, with this pattern, you're merely a regex escape function (Regex.Escape in .NET) away from supporting any delimiter string.
Bonus: using * instead of + will return a blank string as a valid match when your string starts with your delimiter.
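To illustrate the "any delimiter string" point, here is a hedged sketch that builds the pattern with Regex.Escape. The names PrefixMatcher and TextBefore are invented for this example, and RegexOptions.Singleline is my addition so that the wildcard also crosses line breaks.

```csharp
using System.Text.RegularExpressions;

public static class PrefixMatcher
{
    // Returns everything before the first occurrence of 'delimiter',
    // or the whole input when the delimiter does not occur.
    public static string TextBefore(string input, string delimiter)
    {
        // Regex.Escape turns the delimiter into a literal pattern,
        // e.g. a backslash followed by 0 becomes \\0.
        string pattern = "^(?:(?!" + Regex.Escape(delimiter) + ").)*";

        // Because of the * quantifier this always matches, possibly with an empty value.
        return Regex.Match(input, pattern, RegexOptions.Singleline).Value;
    }
}
```

For example, PrefixMatcher.TextBefore(@"This is some text \0abc", @"\0") keeps the text up to the backslash, including the trailing space, so call Trim() on the result if that space is unwanted.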

Why is regex not matching on unicode character

So I am trying to write a regex in c# (.NET) to match on a range of unicode characters that could potentially be found in a string. As a simple test, I attempted to match on a single unicode character, \u8221, which is the character ”. If I use the regex string "”", I get a match against my test string that contains this character. If, however, I change my regex to "\u8221", I don't get a match. Anyone know why this could be and how to get it to work? I have been pulling my hair out over this one. Thanks in advance.
You are not matching the correct character. \u requires a character code in hexadecimal. Try \u201D instead.
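A quick way to see the difference, as a sketch only (the class and method names are made up). The decimal value 8221 is 0x201D in hexadecimal, which is why \u201D matches and \u8221 does not.

```csharp
using System.Text.RegularExpressions;

public static class QuoteMatcher
{
    // The regex engine reads the four digits after \u as hexadecimal,
    // so \u201D is the right double quotation mark (decimal 8221).
    public static bool ContainsRightDoubleQuote(string text)
    {
        return Regex.IsMatch(text, @"\u201D");
    }

    // The pattern from the question: \u8221 means code point 0x8221,
    // which is a different character entirely.
    public static bool ContainsCodePointHex8221(string text)
    {
        return Regex.IsMatch(text, @"\u8221");
    }
}
```

Given a test string built as "say " + (char)8221 + "hi", the first method reports a match and the second does not.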

Regex - Match a string only when it contains any alphabetic characters

example strings
785*()&!~`a
##$%$~2343
455frt&*&*
I want to capture the first and the third but not the second, since it doesn't contain any alphabetic character. Please help.
In fact, I think [a-zA-Z] might suffice to match your strings.
To capture the whole thing, try: ^.*[a-zA-Z].*$
Here is one possible way:
.*[a-zA-Z]+
You should maybe clarify a bit what you mean by 'capturing': do you want the whole string or just the ASCII bits?
Also, you don't say if it should match just plain Roman alphabet (A to Z) or if it should also match Unicode chars to match strings in other languages.
If you just need to test your string, in C# you would do:
bool matching = Regex.IsMatch(myString, "[a-zA-Z]");
You wouldn't need anything else, since just one letter anywhere in the myString string will match (according to your definition).
This is my favorite RegEx testing site: Javascript Regexp Tester and Cheat Sheet
If you want to match all letters (including non-ASCII ones), use \p{L} instead of [a-zA-Z]. See Unicode categories.
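Putting the two variants side by side, as a minimal sketch (LetterCheck and its method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class LetterCheck
{
    // True when the string contains at least one letter A-Z or a-z.
    public static bool HasAsciiLetter(string s)
    {
        return Regex.IsMatch(s, "[a-zA-Z]");
    }

    // True when the string contains at least one letter from any script
    // (Unicode category L).
    public static bool HasAnyLetter(string s)
    {
        return Regex.IsMatch(s, @"\p{L}");
    }
}
```

With the three example strings from the question, HasAsciiLetter accepts the first and third and rejects the second. A string such as "123\u00E9" (digits followed by an accented e) is only accepted by HasAnyLetter.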

Replacing numbers in strings with C#

I thought I'd do a regex replace
Regex r = new Regex("[0-9]");
return r.Replace(sz, "#");
on a file named aa514a3a.4s5. It works exactly as I expect: it replaces all the numbers, including the numbers in the extension. How do I make it NOT replace the numbers in the extension? I tried numerous regex strings, but I am beginning to think that it's an all-or-nothing pattern, so I can't do this. Do I need to separate the extension from the string, or can I use regex?
This one does it for me:
(?<!\.[0-9a-z]*)[0-9]
This does a negative lookbehind (the string must not occur before the matched string) on a period, followed by zero or more alphanumeric characters. This ensures only numbers are matched that are not in your extension.
Obviously, the [0-9a-z] must be replaced by whichever characters you expect in your extension.
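Here is roughly how that lookbehind could be wired into the code from the question. This is a sketch; DigitMasker and MaskDigits are names I made up. It relies on .NET allowing variable-length lookbehind, which many other regex engines do not.

```csharp
using System.Text.RegularExpressions;

public static class DigitMasker
{
    // A digit is replaced only if it is NOT preceded by a period
    // followed by zero or more lowercase alphanumeric characters.
    private static readonly Regex DigitOutsideExtension =
        new Regex(@"(?<!\.[0-9a-z]*)[0-9]");

    public static string MaskDigits(string fileName)
    {
        return DigitOutsideExtension.Replace(fileName, "#");
    }
}
```

For the file name from the question, DigitMasker.MaskDigits("aa514a3a.4s5") should return aa###a#a.4s5.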
I don't think you can do that with a single regular expression.
Probably best to split the original string into base and extension; do the replace on the base; then join them back up.
Yes, I think you'd be better off separating the extension.
If you are sure there is always a 3-character extension at the end of your string, the easiest, most readable/maintainable solution would be to only perform the replace on
yourString.Substring(0, yourString.Length - 4)
...and then append
yourString.Substring(yourString.Length - 4, 4)
Why not run the regex on the substring?
String filename = "aa514a3a.4s5";
String nameonly = filename.Substring(0,filename.Length-4);
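If the extension is not always three characters long, one possible variation is to let System.IO.Path find it. This is only a sketch of the split-replace-join idea from the answers above; the class and method names are invented.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class BaseNameMasker
{
    // Replaces digits in the base name only and re-attaches the extension untouched.
    public static string MaskDigitsInBaseName(string fileName)
    {
        // Path.GetExtension includes the leading period, or returns an
        // empty string when the name has no extension.
        string extension = Path.GetExtension(fileName);
        string baseName = fileName.Substring(0, fileName.Length - extension.Length);

        return Regex.Replace(baseName, "[0-9]", "#") + extension;
    }
}
```

BaseNameMasker.MaskDigitsInBaseName("aa514a3a.4s5") should give aa###a#a.4s5, and a name without an extension simply has all its digits replaced.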
