Remove characters before any special characters in C#

I'm trying to remove the characters in a string PRIOR to ANY non-alphanumeric character. For instance, given the name "James Ebanks-Blake", I can split it into an array by using:
var s = "James Ebanks-Blake".Split(' ');
Even if there is more than one space, it just produces more array elements.
So what I need to do is loop through the array, find the elements that contain a special character, then remove everything before the special character along with the special character itself.
Can anyone assist me?

This works here:
[-^$#](.*)
Just add whatever you consider special characters inside the character class.
The string that you want will be in group 1:
resultString = Regex.Match(subjectString, "[-^$#](.*)", RegexOptions.Singleline).Groups[1].Value;

[-'](.*)
That should grab anything after a - or a '. If you want, you can add more characters in the [ ] section. Just make sure to escape the ones that are special in regex.
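A minimal sketch of the second pattern applied to the name from the question. The class and method names (`AfterSpecialCharDemo`, `TextAfterFirstSpecial`) are hypothetical, and returning the input unchanged when no special character is found is an assumption that neither answer addresses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterSpecialCharDemo
{
    // Returns the text that follows the first '-' or '\'' in the input.
    // Assumption: if neither character occurs, the input is returned unchanged.
    public static string TextAfterFirstSpecial(string input)
    {
        // Singleline lets '.' match newlines too, as in the first answer.
        Match m = Regex.Match(input, @"[-'](.*)", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : input;
    }

    public static void Main()
    {
        Console.WriteLine(TextAfterFirstSpecial("James Ebanks-Blake")); // Blake
        Console.WriteLine(TextAfterFirstSpecial("James"));              // James
    }
}
```

Because the match starts at the first special character, a string containing several of them keeps everything after the first one, including any later special characters.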

Related

RegEx to find non-existence of white space prefix but not include the character in the match?

So I have the following regex for finding and adding whitespace:
(\S)(\()
For a string like "SomeText(Somemoretext)" I want to produce "SomeText (Somemoretext)". The pattern matches "t(", so my replacement removes the "t" from the string, which is not what I want. I also do not know what the preceding character could be; I'm only trying to detect the absence of whitespace.
Is there a better expression to use, or a way to exclude the found character from the match so that I can replace safely without touching characters I do not want to replace?
Thanks
I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.
If you need to detect the absence of a space before a specific character (such as a bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect words (made of [a-zA-Z0-9_]) that are followed by a bracket without a space.
If I understood your problem correctly, you can replace the full match with " (" (a space followed by the bracket) and get exactly what you need.
In case you need to look for the absence of spaces before a symbol (like a bracket) in any kind of text (the text may be non-word, such as punctuation), you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
When using this, your result will be in group 1, instead of just full match (which now contains the whole line, if a line is matched).
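For comparison with the substitution approach, here is a small sketch of the lookaround variant the question asks about: a lookbehind checks the preceding character without making it part of the match. The class and method names are hypothetical, and the pattern `(?<=\S)\(` is one possible formulation rather than the one any answer above proposes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceBeforeParenDemo
{
    // Inserts a space before every '(' that directly follows a
    // non-whitespace character. The lookbehind (?<=\S) inspects the
    // preceding character without consuming it, so only '(' is replaced.
    public static string AddSpaceBeforeParen(string input)
    {
        return Regex.Replace(input, @"(?<=\S)\(", " (");
    }

    public static void Main()
    {
        Console.WriteLine(AddSpaceBeforeParen("SomeText(Somemoretext)")); // SomeText (Somemoretext)
        Console.WriteLine(AddSpaceBeforeParen("Some (already spaced)"));  // Some (already spaced)
    }
}
```

Since the preceding character never enters the match, the replacement string does not need a `$1` back-reference.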

Regex in between characters

I'm trying to create a regex that will match the ASCII characters in a string so that they can be converted with hex afterwards. The string is received as follows: <<<441234567895,ASCII,4,54657379>>> so I am looking to match everything between the third comma and the >>> characters at the end of the string, like so.
<<<441234567895,ASCII,4,54657379>>>
So far I have managed to create this regex (/([^,]*,[^,]*)*([^;]*)>>>/) for it but the third comma is picked up as well which I don't want. What do I need to do to remove it from the match?
thanks Callum
(?<=,)[^,]+(?=>>>)
This should do it. See demo:
https://regex101.com/r/sJ9gM7/79
Do you need to use Regex?
string input = "<<<441234567895,ASCII,4,54657379>>>";
string match = input.Substring(3, input.Length - 6).Split(',')[3];
You can also use further splits on the beginning and ending padding strings or check their lengths if you want something safer than the Substring magic.
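The two approaches above can be sketched side by side. The names (`LastFieldDemo`, `WithRegex`, `WithSplit`) are hypothetical, and the checks on the `<<<` / `>>>` padding and on the field count are an assumed interpretation of the "something safer" remark, not code from either answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastFieldDemo
{
    // Regex approach: text preceded by a comma, containing no comma,
    // and followed by ">>>". Returns "" when nothing matches.
    public static string WithRegex(string input)
    {
        return Regex.Match(input, @"(?<=,)[^,]+(?=>>>)").Value;
    }

    // Split approach with explicit checks instead of fixed offsets.
    // Assumption: a valid message has exactly four comma-separated fields.
    public static string WithSplit(string input)
    {
        const string prefix = "<<<";
        const string suffix = ">>>";
        if (input.Length < prefix.Length + suffix.Length ||
            !input.StartsWith(prefix, StringComparison.Ordinal) ||
            !input.EndsWith(suffix, StringComparison.Ordinal))
        {
            throw new FormatException("Expected <<<...>>> framing.");
        }

        string body = input.Substring(prefix.Length,
                                      input.Length - prefix.Length - suffix.Length);
        string[] fields = body.Split(',');
        if (fields.Length != 4)
        {
            throw new FormatException("Expected four comma-separated fields.");
        }
        return fields[3];
    }

    public static void Main()
    {
        string input = "<<<441234567895,ASCII,4,54657379>>>";
        Console.WriteLine(WithRegex(input)); // 54657379
        Console.WriteLine(WithSplit(input)); // 54657379
    }
}
```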

finding and replacing special characters

Have some imported data which is leaving me with little invalid character symbols such as:
Caf�
Just wondering what's the easiest way to find/replace these in string content?
var newString = yourString.Replace("�", "");
where yourString is Caf�.
The special character can be used in the Replace statement. It should be as simple as that.
This may help you. Results depend on what type of text you want to keep or remove...
MSDN: How to: Strip Invalid Characters from a String.
This will remove every non-alphanumeric character (leaving punctuation and spaces intact):
string result = Regex.Replace(textBox1.Text, @"[^\w(\p{P}) ]+", "");
If you want only the letters and numbers and want to clear punctuation as well, remove (\p{P}) from the expression.
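A short sketch of both suggestions, assuming the invalid symbol is U+FFFD (the Unicode replacement character); if the imported data contains a different code point, the literal would need to change. The class and method names are hypothetical. The parentheses are left out of the character class here because inside `[]` they are literal characters that `\p{P}` already covers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvalidCharDemo
{
    // Removes only the replacement character U+FFFD (assumed to be the
    // symbol shown in the question).
    public static string RemoveReplacementChar(string input)
    {
        return input.Replace("\uFFFD", "");
    }

    // Keeps word characters, punctuation and spaces; removes everything else.
    public static string KeepWordsPunctuationSpaces(string input)
    {
        return Regex.Replace(input, @"[^\w\p{P} ]+", "");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveReplacementChar("Caf\uFFFD"));           // Caf
        Console.WriteLine(KeepWordsPunctuationSpaces("Caf\uFFFD, ok!")); // Caf, ok!
    }
}
```

Note that both variants delete the damaged character rather than restoring the original letter; recovering it would require re-reading the data with the correct encoding.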

Regex.Split on "() " and "?"

string myString = "First?Second Third";
String[] result = Regex.Split(myString, @"( )\?");
Should result in:
First,
Second,
Third
What am I missing? (I also need brackets to split on for something else)
I guess with ( ), you meant whitespace. You don't need any capturing group there. Just use alternation, or a character class:
String[] result = Regex.Split(myString, @"\s|\?");
// OR
String[] result = Regex.Split(myString, @"[\s?]");
Using string methods:
myString= "First?Second Third";
String[] result = myString.Split(' ','?');
I'm not quite sure what you are trying to do with the quotes. Remember that in C# parentheses denote a logical group in your regular expression; they do not escape a space. What you want is to split on an explicit set of characters, which is denoted by brackets []. You should use the following pattern to split:
String[] result = Regex.Split(myString, @"[\?\s]");
Note that \? is an escaped question mark (as you had in your original). White-space characters are matched by \s. Thus, my solution splits the string on any of the characters listed inside the []: ? (escaped as \?) and white space (written as \s).
EDIT AFTER MORE INFO FROM OP:
I also saw, after answering this post, that you edited the top comment to say you wanted a logical grouping for the white-space, in which case I would go with:
String[] result = Regex.Split(myString, @"[\?(\s)]");
Note that inside [] the parentheses are literal characters, so this pattern also splits on ( and ).
You need to put those chars inside [] to create a character class: [\s\?] This will split on:
a space
?
You can use \s to handle "any" whitespace char.
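A minimal sketch contrasting the pattern from the question with the character-class version. The class and method names are hypothetical. The original pattern `( )\?` only matches a space immediately followed by a question mark, which never occurs in the sample string, so no split happens at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits on any single whitespace character or '?'.
    public static string[] SplitOnWhitespaceOrQuestionMark(string input)
    {
        return Regex.Split(input, @"[\s?]");
    }

    // The pattern from the question: a space followed by '?'.
    public static string[] SplitWithOriginalPattern(string input)
    {
        return Regex.Split(input, @"( )\?");
    }

    public static void Main()
    {
        string myString = "First?Second Third";

        // Prints: First,Second,Third
        Console.WriteLine(string.Join(",", SplitOnWhitespaceOrQuestionMark(myString)));

        // Prints: 1 (the whole string comes back as a single element)
        Console.WriteLine(SplitWithOriginalPattern(myString).Length);
    }
}
```

Keep in mind that `Regex.Split` also returns the text of any capturing groups in the pattern, which is another reason to avoid parentheses that are not needed.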

How to split string preserving spaces and any number of \n characters

I want to split the string and create a collection, with the following rules:
The string should be split into words.
1) If the string contains '\n', it should be treated as a separate '\n' word.
2) If the string contains more than one '\n', each should be treated as its own '\n' word.
3) No space should be removed from the string. The only exception is that a space between two \n characters can be ignored.
PS: I tried a lot with string split: I first split on the \n characters and created a collection. The downside is that if I have two consecutive \n characters, I'm unable to add two dummy words to the collection. Any help would be greatly appreciated.
Is there any way to do this using regex?
Split with a regex like this:
(?<=[\S\n])(?=\s)
Something like:
var substrings = Regex.Split(input, @"(?<=[\S\n])(?=\s)");
This will not remove any spaces at all, but that was not required so should be fine.
If you really want the spaces between \ns to be removed, you could split with something like:
(?<=[\S\n])(?=\s)(?:[ \t]+(?=\n))?
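A small sketch of the first pattern, to show what a zero-width split does in practice. The class and method names and the sample input are hypothetical. Because the pattern consumes no characters, the pieces concatenate back to the original string; each piece keeps its leading whitespace, and a newline that sits directly before a word stays attached to that word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LosslessSplitDemo
{
    // Splits at every position that follows a non-whitespace character
    // or a newline and precedes a whitespace character. Both assertions
    // are zero-width, so no characters are removed.
    public static string[] SplitKeepingWhitespace(string input)
    {
        return Regex.Split(input, @"(?<=[\S\n])(?=\s)");
    }

    public static void Main()
    {
        string input = "a b\n\nc";
        foreach (string piece in SplitKeepingWhitespace(input))
        {
            // Show newlines as the two characters \n to keep output readable.
            Console.WriteLine("[" + piece.Replace("\n", "\\n") + "]");
        }
        // Prints [a], [ b], [\n], [\nc] on separate lines.
    }
}
```

If every '\n' must become its own element, the pieces would still need a further pass, for example splitting off a leading newline from pieces such as the last one above.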
Looks like homework. As such, read up on \b.
Should set you in the right direction.
Read up on the zero-width assertions. With them you can define a split position between e.g. \s and \S without actually matching either adjacent character.
edit:
Here's another question where the OP asked about those constructs.
