Regex to remove text between two chars in c#

I have the following string, from which I need to remove everything between =select and the following } character.
Example:
Enter Type:=select top 10 type from cable}
The end result should be a string variable containing just Enter Type:
I was looking for a way to do this with Regex, but I'm open to other methods as well. Thanks in advance for the help.

string input = "Enter Type:=select top 10 type from cable}";
// Capture everything before "=select" so the trailing colon is kept.
System.Text.RegularExpressions.Regex regExPattern = new System.Text.RegularExpressions.Regex("(.*)=select.*}");
System.Text.RegularExpressions.Match match = regExPattern.Match(input);
string output = String.Empty;
if (match.Success)
{
    output = match.Groups[1].Value;
}
Console.WriteLine("Output = " + output);
The 'output' variable will hold the text found before the "=select" segment of the input string. If you need to pull out additional pieces of the input string, surround each one with parentheses and the matches found will be added to the match.Groups collection. By the way, match.Groups[0].Value is the entire matched text, which here is the whole original string.

var rx = new Regex("=select[^}]*}");
Console.WriteLine(rx.Replace("Enter Type:=select top 10 type from cable}", ""));
The Regex.Replace(string input, string replacement) method replaces every substring that matches the regex with the replacement string. The first line defines a regex that matches everything from =select up to and including the next }.
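
The same idea can be wrapped in a small helper. This is only a sketch: the class name SelectStripper and the method Strip are made up for illustration, and it assumes the text to remove always starts with a literal =select and ends at the first following }.

```csharp
using System;
using System.Text.RegularExpressions;

static class SelectStripper
{
    // "=select", then any run of characters that are not '}', then the closing '}'.
    private static readonly Regex SelectClause = new Regex(@"=select[^}]*\}");

    // Hypothetical helper: removes every "=select ... }" span from the input.
    public static string Strip(string input) => SelectClause.Replace(input, "");

    static void Main()
    {
        Console.WriteLine(Strip("Enter Type:=select top 10 type from cable}")); // Enter Type:
        Console.WriteLine(Strip("A:=select x} B:=select y}"));                  // A: B:
    }
}
```

Because [^}]* cannot run past a closing brace, each =select ... } span is removed separately when the input contains more than one.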

Related

Regular expression split string, extract string value before and numeric value between square brackets

I need to parse a string that looks like "Abc[123]". The numerical value between the brackets is needed, as well as the string value before the brackets.
Most of the examples I tested work fine for "normal" input, but fail to parse some special cases. This is the code I have so far:
var pattern = @"\[(.*[0-9])\]";
var query = "Abc[123]";
var numVal = Regex.Matches(query, pattern).Cast<Match>().Select(m => m.Groups[1].Value).FirstOrDefault();
var stringVal = Regex.Split(query, pattern)
.Select(x => x.Trim())
.FirstOrDefault();
How should the code be adjusted to also handle the special cases?
For instance, for the string "Abc[]" the parser should return "Abc" as the string value and indicate an empty numeric value (which could default to 0).
For the string "Abc[xy33]" the parser should return "Abc" as the string value and indicate an invalid numeric value.
For the string "Abc" the parser should return "Abc" as the string value and indicate a missing numeric value. Blanks before, after, or inside the brackets should be trimmed, as in "Abc [ 123 ] ".

Try this pattern: ^([^\[]+)\[([^\]]*)\]
Explanation of a pattern:
^ - match beginning of a string
([^\[]+) - match one or more of any character except [ and store it inside the first capturing group
\[ - match [ literally
([^\]]*) - match zero or more of any character except ] and store inside second capturing group
\] - match ] literally
Here's tested code:
var pattern = @"^([^\[]+)\[([^\]]*)\]";
var queries = new string[]{ "Abc[123]", "Abc[xy33]", "Abc[]", "Abc[ 33 ]", "Abc" };
foreach (var query in queries)
{
    string beforeBrackets;
    string insideBrackets;
    var match = Regex.Match(query, pattern);
    if (match.Success)
    {
        beforeBrackets = match.Groups[1].Value;
        insideBrackets = match.Groups[2].Value.Trim();
        if (insideBrackets == "")
            insideBrackets = "0";
        else if (!int.TryParse(insideBrackets, out int i))
            insideBrackets = "incorrect value!";
    }
    else
    {
        beforeBrackets = query;
        insideBrackets = "no value";
    }
    Console.WriteLine($"Input string {query} : before brackets: {beforeBrackets}, inside brackets: {insideBrackets}");
}
Console.ReadKey();
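Output:
Input string Abc[123] : before brackets: Abc, inside brackets: 123
Input string Abc[xy33] : before brackets: Abc, inside brackets: incorrect value!
Input string Abc[] : before brackets: Abc, inside brackets: 0
Input string Abc[ 33 ] : before brackets: Abc, inside brackets: 33
Input string Abc : before brackets: Abc, inside brackets: no value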

We can try doing a regex replacement on the input, for a one-liner solution:
string input = "Abc[123]";
string letters = Regex.Replace(input, "\\[.*\\]", "");
string numbers = Regex.Replace("Abc[123]", ".*\\[(\\d+)\\]", "$1");
Console.WriteLine(letters);
Console.WriteLine(numbers);
This prints:
Abc
123

There are probably language-level techniques for this that I don't know of, but with a regular expression we could capture everything using capturing groups and then check the pieces one by one, maybe:
^([A-Za-z]+)\s*(\[?)\s*([A-Za-z]*)(\d*)\s*(\]?)\s*$
If you wish to explore, simplify, or modify the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.

You can achieve that easily without using regex
string temp = "Abc[123]";
string[] arr = temp.Split('[');
string name = arr[0];
string value = arr[1].TrimEnd(']');
Output: name = Abc and value = 123

Why does this Regular Expression match nothing?

I want to replace all instances of all consecutive non-lowercase-alphabet-letters with a single space for each instance. This works, but why does it inject spaces in between the alphabet letters?
const string pattern = @"[^a-z]*";
const string replacement = @" ";
var reg = new Regex(pattern);
string a = "the --fat- cat";
string b = reg.Replace(a, replacement); // b = " t h e f a t c a t " should be "the fat cat"

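Because of the * quantifier, which repeats the previous token zero or more times. The pattern can therefore match the empty string, and an empty string exists at every position between characters, so a space is inserted at each of them. Use +, which requires at least one character:
const string pattern = @"[^a-z]+";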
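
To see the difference side by side, here is a small sketch. The class name EmptyMatchDemo and the helper Collapse are invented for this illustration; only Regex.Replace and Regex.Matches come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmptyMatchDemo
{
    // Hypothetical helper: replaces every match of the pattern with a single space.
    public static string Collapse(string input, string pattern) =>
        Regex.Replace(input, pattern, " ");

    static void Main()
    {
        const string input = "the --fat- cat";

        // '+' needs at least one non-letter, so only the real gaps are replaced.
        Console.WriteLine("[" + Collapse(input, @"[^a-z]+") + "]"); // [the fat cat]

        // '*' also matches the empty string, so extra spaces appear between letters.
        Console.WriteLine("[" + Collapse(input, @"[^a-z]*") + "]");

        // Counting the matches shows how many more places '*' matches than '+'.
        Console.WriteLine(Regex.Matches(input, @"[^a-z]+").Count); // 2
        Console.WriteLine(Regex.Matches(input, @"[^a-z]*").Count);
    }
}
```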

You don't need regex if you simply want to remove non-lowercase letters:
string a = "the --fat- cat";
string res = String.Join("", a.Where(c => Char.IsLower(c) || Char.IsWhiteSpace(c)));
Console.WriteLine(res); // the fat cat

Just a follow-up answer that might turn out useful: if you need to match any character except Unicode lowercase letters, you may use
var res = Regex.Replace(str, @"\P{Ll}+", " ");
// "моя НЕ знает" > "моя знает"
The \P{Ll} construct matches any character that is not a lowercase letter in the whole Unicode table. The + quantifier matches one or more occurrences and will not cause the issue in the OP.
As for the current problem caused by [^a-z]*: Regex.Replace finds an empty-string match at every position between two letters, and each of those empty matches is replaced with a space.
A rule of thumb: avoid unanchored patterns that may match empty strings!

Omit unnecessary parts in string array

In C#, I have a string that comes from a file in this format:
Type="Data"><Path.Style><Style
or maybe
Type="Program"><Rectangle.Style><Style
and so on. Now I want to extract only the Data or Program part of the Type element. For that, I used the following code:
string output;
var pair = inputKeyValue.Split('=');
if (pair[0] == "Type")
{
output = pair[1].Trim('"');
}
But it gives me this result:
output=Data><Path.Style><Style
What I want is:
output=Data
How to do that?

This code example takes an input string, splits by double quotes, and takes only the first 2 items, then joins them together to create your final string.
string input = "Type=\"Data\"><Path.Style><Style";
var parts = input
.Split('"')
.Take(2);
string output = string.Join("", parts); //note: .net 4 or higher
This will make output have the value:
Type=Data
If you only want output to be "Data", then do
var parts = input
.Split('"')
.Skip(1)
.Take(1);
or
var output = input
.Split('"')[1];

What you can do is use a very simple regular expression to parse out the bits that you want. In your case you want something that looks like this, and then grab the two groups that interest you:
(Type)="(\w+)"
This returns the value Type in group 1 and the word characters contained between the double quotes in group 2.
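
A minimal sketch of how that pattern could be used from C#. The class AttributeReader and the method ReadType are hypothetical names, and returning null when nothing matches is just one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

static class AttributeReader
{
    // Hypothetical helper: returns the quoted value of the Type attribute,
    // or null when the input does not contain Type="...".
    public static string ReadType(string input)
    {
        // In a verbatim string a double quote is written as two double quotes.
        Match m = Regex.Match(input, @"(Type)=""(\w+)""");
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(ReadType("Type=\"Data\"><Path.Style><Style"));         // Data
        Console.WriteLine(ReadType("Type=\"Program\"><Rectangle.Style><Style")); // Program
    }
}
```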

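Instead of doing many splits, why don't you just use Regex? Keep the quotes in the input, escape the backslash in the pattern, and read the capture group rather than the whole match:
output = Regex.Match(pair[1], "\"(\\w*)\"").Groups[1].Value;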

Maybe I missed something, but what about this:
var str = "Type=\"Program\"><Rectangle.Style><Style";
var splitted = str.Split('"');
var type = splitted[1]; // i.e. Data or Program
But you will need some error handling as well.

How about a regex?
var regex = new Regex("(?<=^Type=\").*?(?=\")");
var output = regex.Match(input).Value;
Explanation of the regex:
(?<=^Type=\") A lookbehind (prefix match). It is not included in the result, but the match only succeeds if the string starts with Type="
.*? A non-greedy match. It matches as few characters as possible until the suffix below matches.
(?=\") A lookahead (suffix match). It is not included in the result, but the match only succeeds if the next character is "

Given your specified format:
Type="Program"><Rectangle.Style><Style
It seems logical to me to include the quote mark (") when splitting the strings; then you just have to detect the closing quote mark and extract the contents before it. You can use LINQ to do this:
string code = "Type=\"Program\"><Rectangle.Style><Style";
string[] parts = code.Split(new string[] { "=\"" }, StringSplitOptions.None);
string[] wantedParts = parts.Where(p => p.Contains("\"")).
Select(p => p.Substring(0, p.IndexOf("\""))).ToArray();

C# regular expression trouble

Problem!
I have the following input (rules) from a flat file (talking about numeric input):
Input might be a natural number (below 1000): 1, 10, 100, 999, ...
Input might be a comma-separated number surrounded by quotes (above 1000): "1,000", "2,000", "3,000", "10,000", ...
I have the following regular expression to validate the input: (?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22). For an input like 10 I'm expecting 10 in the first matching group, which is exactly what I get. But for an input like "10,000" I'm expecting 10,000 in the first matching group, and it is stored in the second matching group instead.
Example
string text1 = "\"" + "10,000" + "\"";
string text2 = "50";
string pattern = @"(\d+)|\x22([0-9]+(?:,[0-9]+){0,})\x22";
Match match1 = Regex.Match(text1, pattern);
Match match2 = Regex.Match(text2, pattern);
if (match1.Success)
{
    Console.WriteLine("Match#1 Group#1: " + match1.Groups[1].Value);
    Console.WriteLine("Match#1 Group#2: " + match1.Groups[2].Value);
    // Outputs
    // Match#1 Group#1:
    // Match#1 Group#2: 10,000
}
if (match2.Success)
{
    Console.WriteLine("Match#2 Group#1: " + match2.Groups[1].Value);
    Console.WriteLine("Match#2 Group#2: " + match2.Groups[2].Value);
    // Outputs
    // Match#2 Group#1: 50
    // Match#2 Group#2:
}
Expected Result
Both results on the same matching group, in this case 1
Questions?
What am I doing wrong? I'm just getting bad grouping from the regular expression matches.
Also, I'm using FileHelpers .NET to parse the file. Is there any other way to resolve this problem? Actually, I'm trying to implement a custom converter.
Object File
[FieldConverter(typeof(OOR_Quantity))]
public Int32 Quantity;
OOR_Quantity
internal class OOR_Quantity : ConverterBase
{
    public override object StringToField(string from)
    {
        string pattern = @"(?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22)";
        Regex regex = new Regex(pattern);
        if (regex.IsMatch(from))
        {
            Match match = regex.Match(from);
            return int.Parse(match.Groups[1].Value);
        }
        throw new ...
    }
}

Group numbers are assigned purely on the basis of their positions in the regex, specifically the relative position of the opening parenthesis, (. In your regex, (\d+) is the first group and ([0-9]+(?:,[0-9]+)*) is the second.
If you want to refer to them both with the same identifier, use named groups and give them both the same name:
@"(?:(?<NUMBER>\d+)|\x22(?<NUMBER>[0-9]+(?:,[0-9]+)*)\x22)"
Now you can retrieve the captured value as match.Groups["NUMBER"].Value.
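
Putting that together, a sketch of the parsing step might look like the following. The class name QuantityParser is made up, the ^ and $ anchors are an addition that is not in the original pattern, and the use of NumberStyles.AllowThousands with the invariant culture is an assumption about how the comma-grouped values should be converted (a plain int.Parse would reject "10,000").

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class QuantityParser
{
    // Both alternatives capture into the same named group, NUMBER.
    // The ^ and $ anchors are an addition so that partial matches are rejected.
    private static readonly Regex Pattern =
        new Regex(@"^(?:(?<NUMBER>\d+)|\x22(?<NUMBER>[0-9]+(?:,[0-9]+)*)\x22)$");

    // Hypothetical helper: converts 50 or "10,000" (with quotes) to an int.
    public static int Parse(string from)
    {
        Match match = Pattern.Match(from);
        if (!match.Success)
            throw new FormatException("Not a recognised quantity: " + from);

        // AllowThousands lets int.Parse accept the ',' group separators.
        return int.Parse(match.Groups["NUMBER"].Value,
                         NumberStyles.AllowThousands,
                         CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(Parse("50"));         // 50
        Console.WriteLine(Parse("\"10,000\"")); // 10000
    }
}
```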

I tested the regex below with Ruby:
text1 = "\"10,000\""
text2 = "50"
regex = /"?([0-9]+(?:,[0-9]+){0,})"?/
text1 =~ regex
puts "#$1"
text2 =~ regex
puts "#$1"
The result is:
10,000
50
I think you can rewrite in C#. Isn't it enough for you?

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and the end of the string.
The result should be: "i am a string"
How can I do this in c#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In that case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
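
The difference between the two overloads can be seen in a short sketch. The class TrimDemo and its two helper methods are invented for illustration, and the sample string with a tab and a no-break space (U+00A0) is an assumed input, not one from the question.

```csharp
using System;

static class TrimDemo
{
    // Default overload: trims every character for which Char.IsWhiteSpace is true.
    public static string TrimDefault(string s) => s.Trim();

    // Explicit overload: trims only the characters passed in.
    public static string TrimOnly(string s, params char[] chars) => s.Trim(chars);

    static void Main()
    {
        // Assumed sample: leading tab and space, trailing space and no-break space.
        string txt = "\t i am a string \u00A0";

        Console.WriteLine("[" + TrimDefault(txt) + "]"); // [i am a string]

        // Trimming only ' ' leaves the string unchanged here, because the
        // outermost characters are a tab and a no-break space.
        Console.WriteLine(TrimOnly(txt, ' ') == txt);    // True
    }
}
```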

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT: to remove every space, including the ones inside the string:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.

txt = txt.Trim();

Or you can split your string into a string array, splitting on spaces, and then append every item of the array to an empty string.
Maybe this is not the best or fastest method, but you can try it if the other answers aren't what you want.

Use txt.Trim():
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

static void Main()
{
    // A. Example strings containing runs of whitespace.
    string s1 = "He saw a cute\tdog.";
    string s2 = "There\n\twas another sentence.";
    // B. Create the Regex.
    Regex r = new Regex(@"\s+");
    // C. Collapse each run of whitespace in s1 into a single space.
    string s3 = r.Replace(s1, @" ");
    Console.WriteLine(s3);
    // D. Do the same for s2.
    string s4 = r.Replace(s2, @" ");
    Console.WriteLine(s4);
    Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.

You can use:
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"
