I am having trouble doing the following search and replace:
// Consider this string - Note that it may be much more complicated, with many more matches
string output = "\"$type\": \"SomeType\",";
// The variable to search and replace - In this case is set to "SomeType"
string myType = "SomeType";
// The non-working regex
output = Regex.Replace(output, #"\"\$type\"\:\s\"" + myType + #"\",", "NewType");
I would expect the following output:
"$type": "NewType",
instead of:
"$type": "SomeType",
I think there are 2 problems here, but I cannot figure out the syntax. One is the use of the string "type", the other one is that I am not using capture groups so that only myType gets replaced by "NewType" in the output string.
You want to use
output = Regex.Replace(output, $#"(?<=""\$type"":\s""){Regex.Escape(myType)}(?="",)", "NewType");
See the C# demo.
NOTES:
" must be escaped with another " in a verbatim string literal
Escape the variables inside regex patterns
Use lookarounds to wrap the pattern parts you need to keep after substitution.
The resulting regex looks like (?<="\$type":\s")NewType(?=",) and matches
(?<="\$type":\s") - a location that is immediately preceded with "$type": a whitespace and "
NewType - some text
(?=",) - a location that is immediately followed with ",.
I'm new to regular expressions and would appreciate your help. I'm trying to put together an expression that will split the example string using all spaces that are not surrounded by single or double quotes. My last attempt looks like this: (?!") and isn't quite working. It's splitting on the space before the quote.
Example input:
This is a string that "will be" highlighted when your 'regular expression' matches something.
Desired output:
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something.
Note that "will be" and 'regular expression' retain the space between the words.
I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:
[^\s"']+|"([^"]*)"|'([^']*)'
I added the capturing groups because you don't want the quotes in the list.
This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
if (regexMatcher.group(1) != null) {
// Add double-quoted string without the quotes
matchList.add(regexMatcher.group(1));
} else if (regexMatcher.group(2) != null) {
// Add single-quoted string without the quotes
matchList.add(regexMatcher.group(2));
} else {
// Add unquoted word
matchList.add(regexMatcher.group());
}
}
If you don't mind having the quotes in the returned list, you can use much simpler code:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
There are several questions on StackOverflow that cover this same question in various contexts using regular expressions. For instance:
parsings strings: extracting words and phrases
Best way to parse Space Separated Text
UPDATE: Sample regex to handle single and double quoted strings. Ref: How can I split on a string except when inside quotes?
m/('.*?'|".*?"|\S+)/g
Tested this with a quick Perl snippet and the output was as reproduced below. Also works for empty strings or whitespace-only strings if they are between quotes (not sure if that's desired or not).
This
is
a
string
that
"will be"
highlighted
when
your
'regular expression'
matches
something.
Note that this does include the quote characters themselves in the matched values, though you can remove that with a string replace, or modify the regex to not include them. I'll leave that as an exercise for the reader or another poster for now, as 2am is way too late to be messing with regular expressions anymore ;)
If you want to allow escaped quotes inside the string, you can use something like this:
(?:(['"])(.*?)(?<!\\)(?>\\\\)*\1|([^\s]+))
Quoted strings will be group 2, single unquoted words will be group 3.
You can try it on various strings here: http://www.fileformat.info/tool/regex.htm or http://gskinner.com/RegExr/
The regex from Jan Goyvaerts is the best solution I found so far, but creates also empty (null) matches, which he excludes in his program. These empty matches also appear from regex testers (e.g. rubular.com).
If you turn the searches arround (first look for the quoted parts and than the space separed words) then you might do it in once with:
("[^"]*"|'[^']*'|[\S]+)+
(?<!\G".{0,99999})\s|(?<=\G".{0,99999}")\s
This will match the spaces not surrounded by double quotes.
I have to use min,max {0,99999} because Java doesn't support * and + in lookbehind.
It'll probably be easier to search the string, grabbing each part, vs. split it.
Reason being, you can have it split at the spaces before and after "will be". But, I can't think of any way to specify ignoring the space between inside a split.
(not actual Java)
string = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
regex = "\"(\\\"|(?!\\\").)+\"|[^ ]+"; // search for a quoted or non-spaced group
final = new Array();
while (string.length > 0) {
string = string.trim();
if (Regex(regex).test(string)) {
final.push(Regex(regex).match(string)[0]);
string = string.replace(regex, ""); // progress to next "word"
}
}
Also, capturing single quotes could lead to issues:
"Foo's Bar 'n Grill"
//=>
"Foo"
"s Bar "
"n"
"Grill"
String.split() is not helpful here because there is no way to distinguish between spaces within quotes (don't split) and those outside (split). Matcher.lookingAt() is probably what you need:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|('[^']+?')|([^\\s]+?))\\s++").matcher(str);
for (int i = 0; i < len; i++)
{
m.region(i, len);
if (m.lookingAt())
{
String s = m.group(1);
if ((s.startsWith("\"") && s.endsWith("\"")) ||
(s.startsWith("'") && s.endsWith("'")))
{
s = s.substring(1, s.length() - 1);
}
System.out.println(i + ": \"" + s + "\"");
i += (m.group(0).length() - 1);
}
}
which produces the following output:
0: "This"
5: "is"
8: "a"
10: "string"
17: "that"
22: "will be"
32: "highlighted"
44: "when"
49: "your"
54: "regular expression"
75: "matches"
83: "something."
I liked Marcus's approach, however, I modified it so that I could allow text near the quotes, and support both " and ' quote characters. For example, I needed a="some value" to not split it into [a=, "some value"].
(?<!\\G\\S{0,99999}[\"'].{0,99999})\\s|(?<=\\G\\S{0,99999}\".{0,99999}\"\\S{0,99999})\\s|(?<=\\G\\S{0,99999}'.{0,99999}'\\S{0,99999})\\s"
Jan's approach is great but here's another one for the record.
If you actually wanted to split as mentioned in the title, keeping the quotes in "will be" and 'regular expression', then you could use this method which is straight out of Match (or replace) a pattern except in situations s1, s2, s3 etc
The regex:
'[^']*'|\"[^\"]*\"|( )
The two left alternations match complete 'quoted strings' and "double-quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expressions on the left. We replace those with SplitHere then split on SplitHere. Again, this is for a true split case where you want "will be", not will be.
Here is a full working implementation (see the results on the online demo).
import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) throws java.lang.Exception {
String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
Pattern regex = Pattern.compile("\'[^']*'|\"[^\"]*\"|( )");
Matcher m = regex.matcher(subject);
StringBuffer b= new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "SplitHere");
else m.appendReplacement(b, m.group(0));
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("SplitHere");
for (String split : splits) System.out.println(split);
} // end main
} // end Program
If you are using c#, you can use
string input= "This is a string that \"will be\" highlighted when your 'regular expression' matches <something random>";
List<string> list1 =
Regex.Matches(input, #"(?<match>\w+)|\""(?<match>[\w\s]*)""|'(?<match>[\w\s]*)'|<(?<match>[\w\s]*)>").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();
foreach(var v in list1)
Console.WriteLine(v);
I have specifically added "|<(?[\w\s]*)>" to highlight that you can specify any char to group phrases. (In this case I am using < > to group.
Output is :
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something random
1st one-liner using String.split()
String s = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
String[] split = s.split( "(?<!(\"|').{0,255}) | (?!.*\\1.*)" );
[This, is, a, string, that, "will be", highlighted, when, your, 'regular expression', matches, something.]
don't split at the blank, if the blank is surrounded by single or double quotes
split at the blank when the 255 characters to the left and all characters to the right of the blank are neither single nor double quotes
adapted from original post (handles only double quotes)
I'm reasonably certain this is not possible using regular expressions alone. Checking whether something is contained inside some other tag is a parsing operation. This seems like the same problem as trying to parse XML with a regex -- it can't be done correctly. You may be able to get your desired outcome by repeatedly applying a non-greedy, non-global regex that matches the quoted strings, then once you can't find anything else, split it at the spaces... that has a number of problems, including keeping track of the original order of all the substrings. Your best bet is to just write a really simple function that iterates over the string and pulls out the tokens you want.
A couple hopefully helpful tweaks on Jan's accepted answer:
(['"])((?:\\\1|.)+?)\1|([^\s"']+)
Allows escaped quotes within quoted strings
Avoids repeating the pattern for the single and double quote; this also simplifies adding more quoting symbols if needed (at the expense of one more capturing group)
You can also try this:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something";
String ss[] = str.split("\"|\'");
for (int i = 0; i < ss.length; i++) {
if ((i % 2) == 0) {//even
String[] part1 = ss[i].split(" ");
for (String pp1 : part1) {
System.out.println("" + pp1);
}
} else {//odd
System.out.println("" + ss[i]);
}
}
The following returns an array of arguments. Arguments are the variable 'command' split on spaces, unless included in single or double quotes. The matches are then modified to remove the single and double quotes.
using System.Text.RegularExpressions;
var args = Regex.Matches(command, "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'").Cast<Match>
().Select(iMatch => iMatch.Value.Replace("\"", "").Replace("'", "")).ToArray();
When you come across this pattern like this :
String str = "2022-11-10 08:35:00,470 RAV=REQ YIP=02.8.5.1 CMID=caonaustr CMN=\"Some Value Pyt Ltd\"";
//this helped
String[] str1= str.split("\\s(?=(([^\"]*\"){2})*[^\"]*$)\\s*");
System.out.println("Value of split string is "+ Arrays.toString(str1));
This results in :[2022-11-10, 08:35:00,470, PLV=REQ, YIP=02.8.5.1, CMID=caonaustr, CMN="Some Value Pyt Ltd"]
This regex matches spaces ONLY if it is followed by even number of double quotes.
i have the following sample cases :
1) "Sample"
2) "[10,25]"
I want to form a(only one) regular expression pattern, to which the above examples are passed returns me "Sample" and "10,25".
Note: Input strings do not include Quotes.
I came up with the following expression (?<=\[)(.*?)(?=\]), this satisfies the second case and retreives me only "10,25" but when the first case is matched it returns me blank. I want "Sample" to be returned? can anyone help me.
C#.
here you go, a small regex using a positive lookbehind, sometime these are very handy
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
if " is included in your original string, use this regex, this will look for " mark as well, you may choose to remove ^| from lookup if " mark is always included or you may choose to leave it as it is if your text has combination of with and without " marks
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
As far as I can tell, the below regex should help:
Regex regex = new Regex(#"^\w+|[[](\w)+\,(\w)+[]]$");
This will match multiple words, or 2 words (alphanumeric) separated by commas and inside square brackets.
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
if(mMod.group(1) != null) {
System.out.println(mMod.group(1));
}
if(mMod.group(2)!=null) {
System.out.println(mMod.group(2));
}
}
if input is "[hello&bye,25|35]", then the output is hello&bye,25|35
I have the following Regex that I use to translate fields in SQL strings
string posLookBehind = #"(?<=\p{P}*)\b";
string posLookAhead = #"\b(?=\p{P}*)";
string keyword = "FieldA";
string translatedKeyword = "FldA";
string strSql = "SELECT [FieldA] FROM SomeTableA;";
strSql = Regex.Replace(
strSql,
baseRegexLeft + keyword + baseRegexRight,
translatedKeyword,
RegexOptions.IgnoreCase);
For some keyword FieldA, the above regex would replace 'FieldA' with 'FldA' in each of the following: [FieldA], [FieldA + FieldB] et al. However, I now wish to restrict the regex. I do not want to match 'FieldA' note the single apostrophe.
So, I have changed the regex using regex subtraction to remove ' from the punctuation set:
string posLookBehind = #"(?<=[\p{P}-[']]*)\b";
string posLookAhead = #"\b(?=[\p{P}-[']]*)";
where the other code is the same. But this is still matching 'FieldA'. What am I doing wrong here?
Thanks for your time.
I think the reason behind this is the the star * in the look behind and the look ahead, because it matches zero or more, so after matching FieldA it self it then checks behind for zero or more punctuations that are not apostrophes but since there is an apostrophe it just matches zero times.
You can fix this by changing the star to a plus +:
string posLookBehind = #"(?<=[\p{P}-[']]+)\b";
string posLookAhead = #"\b(?=[\p{P}-[']]+)";
Or if there is only a single surrounding character all the time then:
string posLookBehind = #"(?<=[\p{P}-[']])\b";
string posLookAhead = #"\b(?=[\p{P}-[']])";
I am trying to remove all the vowels from a string except for the first and last character. I have tried with 2 expressions and using 2 ways but in vain. I have described them below. Does anybody has a regular expression for this?
e.g.
original string -- source = apeaple
after regex -- source_modified = apple (this is what is expected)
I tried the expression ([a-zA-Z])[aeiouAEIOU]([a-zA-Z]) but this expression is removing repeated character as well. So the following is happening when i apply the above expression
code used --
Regex reg = new Regex("([a-zA-Z])[aeiouAEIOU]([a-zA-Z])");
string source_modified = reg.Replace(source, "");
original string -- source = apeaple
after code execution -- source_modified = aple (repeating character removed)
code used -- string source_modified = Regex.Replace(source, "([a-zA-Z])[aeiouAEIOU]([a-zA-Z])", "$1" + "$2");
original string -- source = apeaple
after code execution -- source_modified = apaple (just 1 vowel gets removed)
i also tried ([a-zA-Z])[aeiouAEIOU]*([a-zA-Z]) but this is removing just 1 vowel and not all. So the following is happening when i apply the above expression
code used --
Regex reg = new Regex("([a-zA-Z])[aeiouAEIOU]*([a-zA-Z])");
string source_modified = reg.Replace(source, "");
original string -- source = apeaple
after code execution -- source_modified = "" (all characters are removed)
code used -- string source_modified = Regex.Replace(source, "([a-zA-Z])[aeiouAEIOU]*([a-zA-Z])", "$1" + "$2");
original string -- source = apeaple
after code execution -- source_modified = apeple
You need some lookaround like so
(?<!^)[aouieyAOUIEY](?!$)
C# supports it and it's very powerful
string resultString = null;
try {
resultString = Regex.Replace(subjectString, "(?<!^)[aeui](?!$)", "");
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Update 1
T.W.R.Cole informs me that there is a special rule in the English language ("this doesn't work for words like "Anyanka" where an inner 'y' is used as a consonant")
The following change should do this, using the technique of negative lookahead:
(?<!^)([aouie]|y(?![aouie]))(?!$)
This time enable the regex modifier that matches case insensitive, it makes the regex simpler than the original
if a y followed by another y still means that the y is a consonant (euh... is there such a word) and thus should not disappear than a y must be listed in the last character class as well :
(?<!^)([aouie]|y(?![aouiey]))(?!$)
I repeat that I used C# as my regex dialect which has good support for lookaround techniques.
If so, why not remove the 1st and last character, remove vowels, and then stitch up again?
string sWord = "apeaple";
char cFirst = sWord[0], cLast = sWord[sWord.length-1];
sWord = sWord.substring(1, sWord.length -2);
sWord = cFirst.ToString() +
Regex.Replace(sWord , "[aouiyeAOUIYE]", String.Empty) +
cLast.ToString();
You need to start the string with at least one character, find a vowel and then end the string with at least one character. Try:
(.+)[aeiouAEIOU](.+)
In case you ever want to apply that to individual words in strings that consist of more than one word, \B[AEIOUaeiou]\B might be worth a try. \B is a non-word-boundary, i.e. any location where the two adjacent characters are either both word characters or both non-word characters. The latter case is obviously not possible if there's a vowel between the two locations.
Needless to say it also works for strings consisting only of a single word.