Hey there!
I'm not really into regular expressions, but I need a simple regex for replacing all of the following characters:
\ / : * ? " < > |
Thanks ^^
There you have it:
[\\/:*?"<>|]
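As a rough illustration, here is how that character class might be used from C#. The question does not name a language, so C# is an assumption here, and `FileNameCleaner`, `Clean` and the underscore default are hypothetical names and choices made only for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameCleaner
{
    // Character class from the answer above; inside a verbatim string the quote is doubled.
    private static readonly Regex Forbidden = new Regex(@"[\\/:*?""<>|]");

    // Replaces every forbidden character with the given replacement text.
    public static string Clean(string input, string replacement = "_")
    {
        return Forbidden.Replace(input, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(Clean("report: draft?.txt")); // report_ draft_.txt
        Console.WriteLine(Clean(@"a\b/c*d|e", ""));      // abcde
    }
}
```

Inside a character class only the backslash needs escaping, which is why the other characters appear as-is.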
Here is an example that may be helpful to you:
CAP*!(ITAL)!('CINE)!(CAP [A-Z])
It finds CAP, but not when it is followed by ITAL or 'CINE, or by a space and a word beginning with a capital letter.
I'm trying to replace '&' inside quotes.
Input
"I & my friends are stuck here", & we can't resolve
Output
"I and my friends are stuck here", & we can't resolve
I want to replace '&' with 'and', but only inside quotes. Could you please help?
By far the quickest way is to use the \G construct and do it with a single regex.
C# code
var str =
"\"I & my friends are stuck here & we can't get up\", & we can't resolve\n" +
"=> \"I and my friends are stuck here and we can't get up\", & we can't resolve\n";
var rx = @"((?:""(?=[^""]*"")|(?<!""|^)\G)[^""&]*)(?:(&)|(""))";
var res = Regex.Replace(str, rx, m =>
// Replace the ampersands inside double quotes with 'and'
m.Groups[1].Value + (m.Groups[2].Value.Length > 0 ? "and" : m.Groups[3].Value));
Console.WriteLine(res);
Output
"I and my friends are stuck here and we can't get up", & we can't resolve
=> "I and my friends are stuck here and we can't get up", & we can't resolve
Regex ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
https://regex101.com/r/db8VkQ/1
Explained
(                          # (1 start), Preamble
    (?:                    # Block
        "                  # Begin of quote
        (?= [^"]* " )      # One-time check for close quote
      |                    # or,
        (?<! " | ^ )       # If not a quote behind or BOS
        \G                 # Start match where last left off
    )
    [^"&]*                 # Many non-quote, non-ampersand
)                          # (1 end)
(?:                        # Body
    ( & )                  # (2), Ampersand, replace with 'and'
  |                        # or,
    ( " )                  # (3), End of quote, just put back "
)
Benchmark
Regex1: ((?:"(?=[^"]*")|(?<!"|^)\G)[^"&]*)(?:(&)|("))
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 10
Elapsed Time: 2.21 s, 2209.03 ms, 2209035 µs
Matches per sec: 226,343
Use
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, @"\B&\B", "and"))
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var s = "\"I & my friends are stuck here\", & we can't resolve";
Console.WriteLine(
Regex.Replace(s, "\"[^\"]*\"", m => Regex.Replace(m.Value, @"\B&\B", "and"))
);
}
}
Output: "I and my friends are stuck here", & we can't resolve
I'm working on a project where I have an HTML fragment that needs to be cleaned up. The HTML has been removed and, as a result of a table being removed, there are some stray line ends where they shouldn't be :-)
The characters as they appear are:
a space at the beginning of a line
a colon, carriage return and line feed at the end of a line, which need to be replaced simply with the colon.
I am presently using regex as follows:
s = Regex.Replace(s, @"(:[\r\n])", ":", RegexOptions.Multiline | RegexOptions.IgnoreCase);
// gets rid of the leading space
s = Regex.Replace(s, @"(^[( )])", "", RegexOptions.Multiline | RegexOptions.IgnoreCase);
Example of what I am dealing with:
Tomas Adams
Solicitor
APLawyers
p:
1800 995 718
f:
07 3102 9135
a:
22 Fultam Street
PO Box 132, Booboobawah QLD 4113
which should look like:
Tomas Adams
Solicitor
APLawyers
p:1800 995 718
f:07 3102 9135
a:22 Fultam Street
PO Box 132, Booboobawah QLD 4113
The code above is my attempt to clean the string, but the result is far from perfect. Can someone help me correct the error and achieve my goal?
[EDIT]
the offending characters
f:\r\n07 3102 9135\r\na:\r\n22
the combination of :\r\n should be replaced by a single colon.
MTIA
Darrin
You may use
var result = Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
See the .NET regex demo.
Details
(?m) - equal to RegexOptions.Multiline - makes ^ match the start of any line here
^ - start of a line
\s+ - 1+ whitespaces
| - or
(?<=:)(?:\r?\n)+ - a position that is immediately preceded with : (matched with (?<=:) positive lookbehind) followed with 1+ occurrences of an optional CR and LF (those are removed)
| - or
(\r?\n){2,} - two or more consecutive occurrences of an optional CR followed with an LF symbol. Only the last occurrence is saved in Group 1 memory buffer, thus the $1 replacement pattern inserts that last, single, occurrence.
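To make the three alternatives concrete, here is a small sketch that runs the pattern over a shortened version of the sample. The sample string assumes CRLF line endings and one leading space per line, as described in the question; `FragmentCleaner` and `Clean` are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragmentCleaner
{
    public static string Clean(string s)
    {
        // Same pattern as above: strip leading whitespace, drop line breaks
        // that directly follow a colon, and collapse runs of line breaks into one.
        return Regex.Replace(s, @"(?m)^\s+|(?<=:)(?:\r?\n)+|(\r?\n){2,}", "$1");
    }

    public static void Main()
    {
        // Assumed sample: CRLF line endings, one leading space on every line.
        var s = " Tomas Adams\r\n Solicitor\r\n p:\r\n 1800 995 718\r\n f:\r\n 07 3102 9135";

        // Prints four lines:
        // Tomas Adams
        // Solicitor
        // p:1800 995 718
        // f:07 3102 9135
        Console.WriteLine(Clean(s));
    }
}
```

When the first or second alternative matches, Group 1 does not participate, so `$1` expands to an empty string and the match is simply removed.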
A basic solution without Regex:
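// Split on both CRLF and LF so no stray '\r' is left at the end of a line
var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var output = new StringBuilder();
for (var i = 0; i < lines.Length; i++)
{
    // A line ending with ":" is merged into the line that follows it, if there is one
    if (lines[i].EndsWith(":") && i + 1 < lines.Length) // feel free to also check for the size
    {
        lines[i + 1] = lines[i].Trim() + lines[i + 1].Trim();
        continue;
    }
    output.AppendLine(lines[i].Trim()); // remove space before or after a line
}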
I tried to use your regular expression. I was able to replace "\n" and ":" with the following regular expression, which removes ":" and "\n" at the end of the line.
@"([:\r\n])"
A Linq solution without Regex:
var tmp = string.Empty;
// Split on both CRLF and LF so no stray '\r' is left at the end of a line
var output = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries).Aggregate(new StringBuilder(), (a, b) => {
    if (b.EndsWith(":")) { // feel free to also check for the size
        tmp = b.Trim(); // hold the label until the next line arrives
    }
    else {
        a.AppendLine(tmp + b.Trim()); // remove space before or after a line
        tmp = string.Empty;
    }
    return a;
});
I tried to build a regular expression using an online tool but did not succeed. Here is the string I need to match:
27R4FF^27R4FF Text until end
always starts with alphanumeric (case-insensitive)
then always caret sign ^ (no space before & after)
then alphanumeric string
then always one white space
then string until end.
Here is the regular expression that is not working for me:-
((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))
C# code:
string txt = "784SFS^784SFS Value is here";
var regs = @"((?:[a-z][a-z]*[0-9]+[a-z0-9]*))(\^)((?:[a-z][a-z]*[0-9]+[a-z0-9]*)).*?((?:[a-z][a-z]+))";
Regex r = new Regex(regs, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
Console.Write(m.Success ? "matched" : "didn't match");
Console.ReadLine();
Help appreciated. Thanks
Verbatim ^[^\W_]+\^[^\W_]+[ ].*$
^ # BOS
[^\W_]+ # Alphanum
\^ # Caret
[^\W_]+ # Alphanum
[ ] # Space
.* # Anything
$ # EOS
Output
** Grp 0 - ( pos 0 , len 28 )
27R4FF^27R4FF Text until end
I didn't get whether the string 'until end' should be matched.
This works for
27R4FF^27R4FF Text
^\w+\^\w+\s\w+$
If the text after the space contains more spaces, try
^\w+\^\w+\s[\w\s]+$
Try this: https://regex101.com/r/hD0hV0/2
^[\da-z]+\^[\da-z]+\s.*$
...or commented (assumes RegexOptions.IgnorePatternWhitespace if you're using the format in code):
^ # always starts...
[\da-z]+ # ...with alphanumeric (case-insensitive)
\^ # then always caret sign ^ (no space before & after)
[\da-z]+ # then alphanumeric string
\s # then always one white space
.* # then string...
$ # ...until end.
The other answers don't actually match what you describe (at the time of this writing) because \w matches underscore and you didn't mention any limitations on "the string at the end".
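For what it's worth, here is a minimal sketch of that pattern applied to the inputs from the question, plus two inputs that should be rejected. `CaretFormat` and `IsValid` are names made up for this example. One likely reason the pattern in the question fails on "784SFS^784SFS Value is here" is that each of its tokens must start with a letter that is later followed by a digit, and "784SFS" starts with a digit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretFormat
{
    // Alphanumeric, caret, alphanumeric, one whitespace character, then the rest of the line.
    private static readonly Regex Pattern =
        new Regex(@"^[\da-z]+\^[\da-z]+\s.*$", RegexOptions.IgnoreCase);

    public static bool IsValid(string text)
    {
        return Pattern.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("784SFS^784SFS Value is here"));  // True
        Console.WriteLine(IsValid("27R4FF^27R4FF Text until end")); // True
        Console.WriteLine(IsValid("27R4FF ^27R4FF Text"));          // False: space before the caret
        Console.WriteLine(IsValid("27R_FF^27R4FF Text"));           // False: underscore is not alphanumeric
    }
}
```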
The strings the service receives from Uri.EscapeUriString and from Microsoft.JScript.GlobalObject.escape are different; when I use Microsoft.JScript.GlobalObject.escape to handle the URL, it works.
What is the difference between Microsoft.JScript.GlobalObject.escape and Uri.EscapeUriString in C#?
Although Uri.EscapeUriString is available to use in C# out of the box, it can not convert all the characters exactly the same way as JavaScript escape function does.
For example let's say the original string is: "Some String's /Hello".
Uri.EscapeUriString("Some String's /Hello")
output:
"Some%20String's%20/Hello"
Microsoft.JScript.GlobalObject.escape("Some String's /Hello")
output:
"Some%20String%27s%20/Hello"
Note how the Uri.EscapeUriString did not escape the '.
That being said, let's look at a more extreme example. Suppose we have this string "& / \ # , + ( ) $ ~ % .. ' " : * ? < > { }". Let's see what escaping it with both methods gives us.
Microsoft.JScript.GlobalObject.escape("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "%26%20/%20%5C%20%23%20%2C%20+%20%28%20%29%20%24%20%7E%20%25%20..%20%27%20%22%20%3A%20*%20%3F%20%3C%20%3E%20%7B%20%7D"
Uri.EscapeUriString("& / \\ # , + ( ) $ ~ % .. ' \" : * ? < > { }")
output: "&%20/%20%5C%20#%20,%20+%20(%20)%20$%20~%20%25%20..%20'%20%22%20:%20*%20?%20%3C%20%3E%20%7B%20%7D"
Notice that Microsoft.JScript.GlobalObject.escape escaped all characters except +, /, * and ., even those that are valid in a URI. For example, the ? and & were escaped even though they are valid in a query string.
So it all depends on where and when you wish to escape your URI and what type of URI you are creating/escaping.
I have the following kind of string sets in a text file:
<< /ImageType 1
/Width 986 /Height 1
/BitsPerComponent 8
/Decode [0 1 0 1 0 1]
/ImageMatrix [986 0 0 -1 0 1]
/DataSource <
803fe0503824160d0784426150b864361d0f8844625138a4562d178c466351b8e4763d1f904864523924964d27944a6552b964b65d2f984c665339a4d66d379c4e6753b9e4f67d3fa05068543a25168d47a4526954ba648202
> /LZWDecode filter >> image } def
There are 100s of Images defined like above.
I need to find all such images defined in the document.
Here is my code -
string txtFile = @"text file path";
string fileContents = File.ReadAllText(txtFile);
string pattern = @"<< /ImageType 1.*(\n|\r|\r\n)*image } def"; //match any number of characters between `<< /ImageType 1` and `image } def`
MatchCollection matchCollection = Regex.Matches(fileContents, pattern, RegexOptions.Singleline);
int count = matchCollection.Count; // returns 1
However, I am getting just one match, whereas there are around 600 images defined.
It seems they are all matched as one because of the 'newline' character used in the pattern.
Can anyone please advise what I need to change so that the regex returns the correct 600 matches?
The reason is that regex quantifiers are greedy by default, i.e. a match is always as long as possible. Thus, every intermediate image } def is consumed by the .*, and the match only ends at the last one. I think the best approach here would be to perform two separate regex queries, one for << /ImageType 1 and one for image } def. Every match of the first pattern corresponds to exactly one match of the second one, and as these matches carry their indices in the original string, you can reconstruct each image by taking the appropriate substring.
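The two-query idea could be sketched roughly as follows. This assumes the start and end markers strictly alternate in the file, which the answer implies but the question does not guarantee. `ImageBlockPairs`, `Extract` and the shortened sample block are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageBlockPairs
{
    public static List<string> Extract(string contents)
    {
        // Two independent queries: one for the start marker, one for the end marker.
        var starts = Regex.Matches(contents, Regex.Escape("<< /ImageType 1"));
        var ends = Regex.Matches(contents, Regex.Escape("image } def"));

        var blocks = new List<string>();
        // Assumption: markers alternate (start, end, start, end, ...),
        // so the i-th start belongs to the i-th end.
        for (var i = 0; i < starts.Count && i < ends.Count; i++)
        {
            var from = starts[i].Index;
            var to = ends[i].Index + ends[i].Length;
            if (to > from)
            {
                blocks.Add(contents.Substring(from, to - from));
            }
        }
        return blocks;
    }

    public static void Main()
    {
        // Hypothetical, shortened stand-in for one image definition.
        var block = "<< /ImageType 1\n/Width 986\n> /LZWDecode filter >> image } def\n";
        var blocks = Extract(block + block);

        Console.WriteLine(blocks.Count); // 2
    }
}
```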
Instead of .* you should use the non-greedy quantifier .*?:
string pattern = @"<< /ImageType 1.*?image } def";
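A small sketch comparing the greedy and lazy versions on three made-up blocks. It keeps RegexOptions.Singleline from the question so that . can cross line breaks. `ImageBlockCounter`, `Count` and the sample block are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageBlockCounter
{
    public static int Count(string contents, bool lazy)
    {
        // .*? stops at the first "image } def"; .* runs on to the last one.
        var pattern = lazy
            ? @"<< /ImageType 1.*?image } def"
            : @"<< /ImageType 1.*image } def";

        // Singleline lets . match line breaks inside a block.
        return Regex.Matches(contents, pattern, RegexOptions.Singleline).Count;
    }

    public static void Main()
    {
        // Hypothetical, shortened stand-in for one image definition.
        var block = "<< /ImageType 1\n/Width 986 /Height 1\n/DataSource <\n803fe0\n> /LZWDecode filter >> image } def\n";
        var contents = block + block + block;

        Console.WriteLine(Count(contents, lazy: false)); // 1
        Console.WriteLine(Count(contents, lazy: true));  // 3
    }
}
```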
Here is a site I use that can help you out with regex: http://webcheatsheet.com/php/regular_expressions.php
// The delimiter "/" must not appear unescaped inside the pattern
if (preg_match('/^[a-z]/i', $string, $matches)) {
    echo "Match was found <br />";
    echo $matches[0];
}