Regular expression help - ignoring parenthesis, ands, ors and whitespace again

Consider the following English phrase:
FRIEND AND COLLEAGUE AND (FRIEND OR COLLEAGUE AND (COLLEAGUE AND FRIEND AND FRIEND))
I want to be able to programmatically change arbitrary phrases, such as the one above, into something like:
SELECT * FROM RelationTable R1 JOIN RelationTable R2 ON R2.RelationName etc etc WHERE
R2.RelationName = FRIEND AND R2.RelationName = Colleague AND (R3.RelationName = FRIEND,
etc. etc.
My question is: how do I take the initial string, strip it of the words and symbols AND, OR, ( and ), change each remaining word, and build a new string?
I can do most of it, but my main problem is that if I do a string.Split and keep only the words I care about, I can't really replace them in the original string because I no longer have their original indexes. Let me explain with a smaller example:
string input = "A AND (B AND C)";
Splitting the string on spaces, parentheses, etc. gives: A, B, C
input.Replace("A", "MyRandomPhrase");
But there is also an A in AND.
So I moved on to trying to create a regular expression that matches exact words, post split, and replaces them. It started to look like this:
"(\(|\s|\))*" + itemOfInterest + "(\(|\s|\))+"
Am I on the right track, or am I overcomplicating things? Thanks!

You can try using Regex.Replace with the \b word-boundary anchor:
string input = "A AND B AND (A OR B AND (B AND A AND A))";
string pattern = "\\bA\\b";
string replacement = "MyRandomPhrase";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
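
The snippet above replaces one word at a time. As a minimal sketch, assuming every word other than AND and OR should be rewritten in the same way, a single Regex.Replace call with a match evaluator can handle all of them in one pass. The class name PhraseRewriter, the method name Rewrite and the R.RelationName = '...' output format are placeholders chosen for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseRewriter
{
    // \b\w+\b matches each whole word; parentheses and spaces are left untouched.
    private static readonly Regex Word = new Regex(@"\b\w+\b");

    public static string Rewrite(string input)
    {
        return Word.Replace(input, match =>
        {
            string word = match.Value;
            // Keep the logical operators exactly as they are.
            if (word == "AND" || word == "OR")
                return word;
            // Hypothetical output format, used only for illustration.
            return "R.RelationName = '" + word + "'";
        });
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("A AND (B OR C)"));
        // R.RelationName = 'A' AND (R.RelationName = 'B' OR R.RelationName = 'C')
    }
}
```

Because the evaluator sees whole words only, the A inside AND is never touched, and the replacement text is not rescanned for further matches.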

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        string text = "A AND (B AND C)";
        // Each item is either a string token or a nested List<object> for a bracketed group.
        List<object> result = ParseBlock(text);
        Console.ReadLine();
    }

    private static List<object> ParseBlock(string text)
    {
        List<object> result = new List<object>();
        int bracketsCount = 0;
        int lastIndex = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '(')
                bracketsCount++;
            else if (c == ')')
                bracketsCount--;
            // Only split on spaces (or at the end of the text) outside all brackets.
            if (bracketsCount == 0 && (c == ' ' || i == text.Length - 1))
            {
                string substring = text.Substring(lastIndex, i + 1 - lastIndex).Trim();
                lastIndex = i;
                if (substring.Length == 0)
                    continue; // Skip empty tokens produced by repeated spaces.
                object itm = substring;
                if (substring[0] == '(')
                    itm = ParseBlock(substring.Substring(1, substring.Length - 2));
                result.Add(itm);
            }
        }
        return result;
    }
}

Related

Editing string in C#

Given a string of words separated by spaces, how would you go about merging two words if one of them consists of a single character? An example should clarify:
"a bcd tttt" => "abcd tttt"
"abc d hhhh" => "abcd hhhh"
I would like to merge the single-character word with the word on its left, unless it is the first word in the string, in which case I would like to merge it with the word on its right.
I am trying to loop through the string and build some logic, but it turned out to be more complex than I was expecting.

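Try the approach in the program below:
using System;
using System.Text;

public class Program
{
    public static void Main()
    {
        var delimiter = new char[] { ' ' };
        var stringToMerge = "abc d hhhh";
        var splitArray = stringToMerge.Split(delimiter);
        var stringBuilder = new StringBuilder();
        for (int wordIndex = 0; wordIndex < splitArray.Length; wordIndex++)
        {
            var word = splitArray[wordIndex];
            // A single-character first word merges with the word on its right.
            bool followsSingleCharFirstWord = wordIndex == 1 && splitArray[0].Length == 1;
            if (wordIndex != 0 && word.Length > 1 && !followsSingleCharFirstWord)
            {
                stringBuilder.Append(" ");
            }
            stringBuilder.Append(word);
        }
        Console.WriteLine(stringBuilder.ToString());
    }
}
Basically, you split the string into words and then use a StringBuilder to build a new string, inserting a space before a word only if that word is longer than one character. A single-character first word is the exception: it is merged with the word on its right, so no space is inserted before the second word in that case.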

One way to approach this is to first use string.Split(' ') to get an array of words, which is easier to deal with.
Then you can loop through the words, handling single-character words by concatenating them with the previous word, with special handling for the first word.
One such approach:
public static void Main()
{
    string data = "abcd hhhh";
    var words = data.Split(' ');
    var sb = new StringBuilder();
    for (int i = 0; i < words.Length; ++i)
    {
        var word = words[i];
        if (word.Length == 1)
        {
            sb.Append(word);
            if (i == 0 && i < words.Length - 1) // Single character first word is special case: Merge with next word.
                sb.Append(words[++i]); // Note the "++i" to increment the loop counter, skipping the next word.
        }
        else
        {
            sb.Append(' ' + word);
        }
    }
    var result = sb.ToString();
    Console.WriteLine(result);
}
Note that this will concatenate multiple instances of single-letter words, so that "a b c d e" will result in "abcde" and "ab c d e fg" will result in "abcde fg". You don't actually specify what should happen in this case.

If you want to do it with a plain for loop and string walking:
using System;
using System.Text;

public class Program
{
    public static void Main()
    {
        Console.WriteLine(MergeOrphant("bcd a tttt") == "bcda tttt");
        Console.WriteLine(MergeOrphant("bcd a tttt a") == "bcda tttta");
        Console.WriteLine(MergeOrphant("a bcd tttt") == "abcd tttt");
        Console.WriteLine(MergeOrphant("a b") == "ab");
    }

    private static string MergeOrphant(string source)
    {
        var stringBuilder = new StringBuilder();
        for (var i = 0; i < source.Length; i++)
        {
            // A single letter at the very start merges with the word on its right: skip the space.
            if (i == 1 && i + 1 < source.Length && char.IsWhiteSpace(source[i]) && char.IsLetter(source[i - 1]))
            {
                i++;
            }
            // A single letter elsewhere merges with the word on its left: skip the space before it.
            // The i + 1 < source.Length check guards against a trailing space.
            if (i > 0 && i + 1 < source.Length && char.IsWhiteSpace(source[i]) && char.IsLetter(source[i - 1]) && char.IsLetter(source[i + 1]) && (i + 2 == source.Length || char.IsWhiteSpace(source[i + 2])))
            {
                i++;
            }
            stringBuilder.Append(source[i]);
        }
        return stringBuilder.ToString();
    }
}

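Quite short with Regex:
string foo = "a bcd b tttt";
foo = Regex.Replace(foo, @"^(\w) (\w{2,})", "$1$2");
foo = Regex.Replace(foo, @"(\w{2,}) (\w)\b", "$1$2");
Be aware that in .NET \w matches Unicode letters, decimal digits and connector punctuation such as the underscore by default; it is equivalent to [a-zA-Z0-9_] only with RegexOptions.ECMAScript. If you need a different definition, you have to define your own character class.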

My answer may not be best practice, but it works for your second case. You should still clarify the letter-merging rules.
public static void Main()
{
    Console.WriteLine(Edit("abc d hhhh") == "abcd hhhh");
    Console.WriteLine(Edit("abc d hhhh a") == "abcd hhhha");
    Console.WriteLine(Edit("abc d hhhh a b") == "abcd hhhhab");
    Console.WriteLine(Edit("abc d hhhh a def g") == "abcd hhhha defg");
}

public static string Edit(string str)
{
    var result = string.Empty;
    var split = str.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < split.Length; i++)
    {
        if (i == 0)
        {
            result += split[i];
        }
        else if (split[i].Length == 1)
        {
            // A single-character word is merged with the word on its left.
            result += split[i];
        }
        else
        {
            result += $" {split[i]}";
        }
    }
    return result;
}
As mentioned above, this does not handle your first case: Edit("a bcd") does not produce "abcd".

Expanding on Matthew's answer: if you don't want the extra leading space in the output, you can change the last line to:
Console.WriteLine(result.TrimStart(' '));

Clean string to have only numbers c#

I want to keep only the digits from a string. I have tried this:
string phoneNumber = txtPhoneNumber.Text;
string cleanPhoneNumber = string.Empty;
foreach (char c in phoneNumber)
{
    if (c.Equals('0') || c.Equals('1') || c.Equals('2') ||
        c.Equals('3') || c.Equals('4') || c.Equals('5') ||
        c.Equals('6') || c.Equals('7') || c.Equals('8') ||
        c.Equals('9'))
        cleanPhoneNumber += Convert.ToString(c);
}
The solution above works, but I want to know if there is a more efficient way.

string b = string.Empty;
for (int i = 0; i < a.Length; i++)
{
    if (Char.IsDigit(a[i]))
        b += a[i];
}
Or use Regex (note that Regex.Match returns only the first run of consecutive digits):
resultString = Regex.Match(subjectString, @"\d+").Value;

Since you probably want digits in the 0..9 range only, not all Unicode digits (which include Persian, Indian and other digits), char.IsDigit and the \d regular expression are not exact solutions.
LINQ:
string cleanPhoneNumber = string.Concat(phoneNumber.Where(c => c >= '0' && c <= '9'));
Regex:
either Sami's or integer's code, or
resultString = Regex.Match(subjectString, @"\d+", RegexOptions.ECMAScript).Value;
which is Krystian Borysewicz's solution with the ECMAScript option added to be on the safe side.
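
To make the difference concrete, here is a small sketch comparing char.IsDigit with an explicit '0'..'9' range check. The class name DigitDemo, the helper AsciiDigitsOnly and the sample string are made up for illustration; the sample contains U+0663 (ARABIC-INDIC DIGIT THREE), which is a Unicode decimal digit outside the ASCII range.

```csharp
using System;
using System.Linq;

public static class DigitDemo
{
    // Keeps only the ASCII digits '0'..'9'.
    public static string AsciiDigitsOnly(string text)
    {
        return string.Concat(text.Where(c => c >= '0' && c <= '9'));
    }

    public static void Main()
    {
        // Hypothetical input: two ASCII digits, one Arabic-Indic digit, a dash, two ASCII digits.
        string sample = "12\u0663-45";

        Console.WriteLine(char.IsDigit('\u0663'));                            // True
        Console.WriteLine(AsciiDigitsOnly(sample));                           // 1245
        Console.WriteLine(string.Concat(sample.Where(char.IsDigit)).Length);  // 5
    }
}
```

The char.IsDigit filter keeps five characters because it accepts the Arabic-Indic digit, while the range check keeps only the four ASCII digits.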

string phoneNumber = txtPhoneNumber.Text;
// Remove every character that is not a digit 0-9
Regex nonDigitsRegex = new Regex("[^0-9]");
var cleanPhoneNumber = nonDigitsRegex.Replace(phoneNumber, "");

If you're looking to be efficient in terms of time, you should avoid regex, as the Regex class needs to parse your expression before applying it to the phone number.
The code below avoids regex and keeps memory allocations to a minimum. It allocates only twice: once for a buffer to store the digits, and once more at the end to create the string containing the valid digits.
string Clean(string text)
{
    var validCharacters = new char[text.Length];
    var next = 0;
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (char.IsDigit(c))
        {
            validCharacters[next++] = c;
        }
    }
    return new string(validCharacters, 0, next);
}

Using LINQ:
string cleanPhoneNumber = new String(phoneNumber.Where(Char.IsDigit).ToArray());

c# get the first ';' after parentheses

I feel dumb for asking what is most likely a silly question.
I am helping someone get the results he wants from his custom compiler, which reads all the lines of an XML file into one string. He also wants it to support referencing variables inside an array, so the worst-case input would look like this:
"Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"
What I need is to find the first ";" that comes after a matching "[" and "]" pair and split there, so that I end up with this:
"Var1 = [5,4,3,2];"
It also has to support nested "[" and "]", for example:
"Var2 = [5,Var1,[4],2];"
EDIT: There may also be data between the last "]" and the ";", for example:
"Var2 = [5,[4],2]Var1;"
What can I do here? I'm kind of stuck.

You can try regular expressions, e.g.
string source = "Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];";
// 1. The final (or only) chunk doesn't necessarily end with '];':
//    "abc" -> "abc"
// 2. Each chunk contains at least one character before the closing '];'
string pattern = ".+?(][a-zA-Z0-9]*;|$)";
var items = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value.Trim()) // drop the space left over between statements
    .ToArray();
Console.Write(string.Join(Environment.NewLine, items));
Outcome:
Var1 = [5,4,3,2];
Var2 = [2,8,6,Var1;4];

^([^;]+);
This regex should work for the examples shown.
You can use it like this:
string[] lines =
{
    "Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];",
    "Var2 = [5,[4],2]Var1; Var2 = [2,8,6,Var1;4];"
};
Regex pattern = new Regex(@"^([^;]+);");
foreach (string s in lines)
{
    Match match = pattern.Match(s);
    if (match.Success)
    {
        Console.WriteLine(match.Value);
    }
}
The explanation is:
^ anchors the match at the start of the string
[^;] matches any character except a semicolon
+ repeats the preceding item one or more times
; matches the semicolon that ends the statement
This will find Var1 = [5,4,3,2]; as well as Var2 = [5,[4],2]Var1;
Note that the match always ends at the first semicolon in the string, even if that semicolon sits inside brackets.

public static string Extract(string str, char splitOn)
{
    var split = false;     // true once a top-level ']' has been seen
    var count = 0;         // number of characters consumed so far
    var bracketCount = 0;  // current nesting depth of '[' ... ']'
    foreach (char c in str)
    {
        count++;
        if (split && c == splitOn)
            return str.Substring(0, count);
        if (c == '[')
        {
            bracketCount++;
            split = false;
        }
        else if (c == ']')
        {
            bracketCount--;
            if (bracketCount == 0)
            {
                split = true;
            }
            else if (bracketCount < 0)
                throw new FormatException(); //?
        }
    }
    return str;
}
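
The same bracket-counting idea can be extended to return every statement rather than only the first. The following is a sketch under stated assumptions, not the answer's own code: the class StatementSplitter and its Split method are hypothetical names, it splits on any ';' found at bracket depth zero (even before a bracket has been seen, unlike Extract above), and it silently drops any text after the last ';'.

```csharp
using System;
using System.Collections.Generic;

public static class StatementSplitter
{
    // Splits on every ';' that sits outside all square brackets.
    public static List<string> Split(string source)
    {
        var statements = new List<string>();
        int depth = 0;   // current nesting level of '[' ... ']'
        int start = 0;   // index where the current statement begins
        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '[') depth++;
            else if (c == ']') depth--;
            else if (c == ';' && depth == 0)
            {
                // Include the ';' itself and trim the space between statements.
                statements.Add(source.Substring(start, i - start + 1).Trim());
                start = i + 1;
            }
        }
        return statements;
    }

    public static void Main()
    {
        foreach (string s in Split("Var1 = [5,4,3,2]; Var2 = [2,8,6,Var1;4];"))
            Console.WriteLine(s);
        // Var1 = [5,4,3,2];
        // Var2 = [2,8,6,Var1;4];
    }
}
```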

How to insert spaces between the characters of a string

Is there an easy method to insert spaces between the characters of a string? I'm using the code below, which takes a string (for example ( UI$.EmployeeHours * UI.DailySalary ) / ( Month ) ). As this information comes from an Excel sheet, I need to wrap each column name in []. The issue occurs if the user leaves out the spaces after a parenthesis or around an operator. Can anyone help?
text = e.Expression.Split(Splitter);
string expressionString = null;
for (int temp = 0; temp < text.Length; temp++)
{
    string str = null;
    str = text[temp];
    if (str.Length != 1 && str != "")
    {
        expressionString = expressionString + "[" + text[temp].TrimEnd() + "]";
    }
    else
        expressionString = expressionString + str;
}
The user might enter something like (UI$.SlNo-UI+UI$.Task)-(UI$.Responsible_Person*UI$.StartDate) while my desired output is ( [UI$.SlNo-UI] + [UI$.Task] ) - ([UI$.Responsible_Person] * [UI$.StartDate] )

Here is a short way to insert spaces after every single character in a string (which I know isn't exactly what you were asking for):
var withSpaces = withoutSpaces.Aggregate(string.Empty, (c, i) => c + i + ' ');
This generates a string the same as the first, except with a space after each character (including the last character).

You can do that with regular expressions:
using System.Text.RegularExpressions;

class Program {
    static void Main() {
        string expression = "(UI$.SlNo-UI+UI$.Task)-(UI$.Responsible_Person*UI$.StartDate) ";
        string replaced = Regex.Replace(expression, @"([\w\$\.]+)", " [ $1 ] ");
    }
}
If you are not familiar with regular expressions this might look rather cryptic, but they are a powerful tool and worth learning. If you need to, you can read up on how regular expressions work and use a tool like Expresso to test them.
Hope this helps...

Here is an algorithm that does not use regular expressions.
// Inserts a single space between each pair of adjacent characters
public static string DoubleSpace(string s)
{
    if (string.IsNullOrEmpty(s))
    {
        return string.Empty;
    }
    char[] a = s.ToCharArray();
    char[] b = new char[(a.Length * 2) - 1];
    int bIndex = 0;
    for (int i = 0; i < a.Length; i++)
    {
        b[bIndex++] = a[i];
        // Insert a space after every character except the last
        if (i < (a.Length - 1))
        {
            b[bIndex++] = ' ';
        }
    }
    return new string(b);
}

Well, you can do this by using regular expressions: search for specific patterns and add brackets where needed. You could also simply replace every parenthesis with the same parenthesis surrounded by a space on each side.
I would also advise you to use StringBuilder instead of appending to an existing string. Appending creates a new string for each manipulation, whereas StringBuilder has a smaller memory footprint for this kind of work.
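
As a rough sketch of that suggestion, the method below pads each parenthesis with a space on both sides while building the result in a single StringBuilder. The names Spacer and PadParentheses are hypothetical, and the sketch deliberately leaves operators and column names alone, since the question does not say how a '-' inside a column name should be told apart from a minus operator.

```csharp
using System;
using System.Text;

public static class Spacer
{
    // Surrounds every parenthesis with a space on each side, using one StringBuilder.
    public static string PadParentheses(string expression)
    {
        var sb = new StringBuilder(expression.Length * 2);
        foreach (char c in expression)
        {
            if (c == '(' || c == ')')
                sb.Append(' ').Append(c).Append(' ');
            else
                sb.Append(c);
        }
        // Remove the padding added at the very start and end.
        return sb.ToString().Trim();
    }

    public static void Main()
    {
        Console.WriteLine(PadParentheses("(a+b)-(c*d)"));
        // ( a+b ) - ( c*d )
    }
}
```

Directly adjacent parentheses such as "((" end up with two spaces between them; collapsing those would need an extra pass.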

regex/linq to replace consecutive characters with count

I have the following method (written in C#/.NET). The input text consists only of letters (no digits). The returned value is another text in which groups of more than two consecutive identical characters are replaced with a single copy of the character preceded by the number of repetitions.
Ex.: aAAbbbcccc -> aAA3b4c
public static string Pack(string text)
{
    if (string.IsNullOrEmpty(text)) return text;
    StringBuilder sb = new StringBuilder(text.Length);
    char prevChar = text[0];
    int prevCharCount = 1;
    for (int i = 1; i < text.Length; i++)
    {
        char c = text[i];
        if (c == prevChar) prevCharCount++;
        else
        {
            // Flush the finished run: a count for 3+, the character twice for exactly 2.
            if (prevCharCount > 2) sb.Append(prevCharCount);
            else if (prevCharCount == 2) sb.Append(prevChar);
            sb.Append(prevChar);
            prevChar = c;
            prevCharCount = 1;
        }
    }
    // Flush the last run.
    if (prevCharCount > 2) sb.Append(prevCharCount);
    else if (prevCharCount == 2) sb.Append(prevChar);
    sb.Append(prevChar);
    return sb.ToString();
}
The method is not too long, but does anyone have an idea how to do this more concisely using regex? Or LINQ?

How about:
static readonly Regex re = new Regex(@"(\w)(\1){2,}", RegexOptions.Compiled);
static void Main() {
    string result = re.Replace("aAAbbbcccc",
        match => match.Length.ToString() + match.Value[0]);
}
The regex is a word character followed by the same character (a back-reference) at least twice; the lambda takes the length of the match (match.Length) and appends the first character (match.Value[0]).
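
For comparison with the original Pack method, here is a sketch that wraps the same regex idea in a method with the same signature. The class name Packer is made up for illustration, and the pattern drops the redundant group around \1, which should not change what it matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Packer
{
    // One word character followed by at least two more copies of itself (a run of 3+).
    private static readonly Regex Run = new Regex(@"(\w)\1{2,}", RegexOptions.Compiled);

    public static string Pack(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        // Replace each run with its length followed by the repeated character.
        return Run.Replace(text, m => m.Length.ToString() + m.Value[0]);
    }

    public static void Main()
    {
        Console.WriteLine(Pack("aAAbbbcccc")); // aAA3b4c
        Console.WriteLine(Pack("aabb"));       // aabb
    }
}
```

The match is case-sensitive, so "aAA" is left alone: the lowercase a and the two uppercase A characters never form a run of three identical characters.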
