Regular expression to extract characters in between other characters

I have a string of the form //{characters}\n.
I need a regular expression to extract the characters between // and \n.

Regular expressions are nice and all, but why not use Substring?
string input = "//{characters}\n";
string result = input.Split('\n')[0].Substring(2);
or
string result = input.Substring(2, input.Length - 3);
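Both one-liners above assume the string really starts with // and ends with \n; on other input Substring either throws or silently cuts off real characters. A minimal sketch that checks the delimiters first, assuming C# 9 or later (top-level statements). The helper name ExtractBetween and the choice to return null when the delimiters are missing are illustrative assumptions, not part of the original answer:

```csharp
using System;

// Returns the text between a leading "//" and a trailing "\n",
// or null when the string does not have both delimiters.
string? ExtractBetween(string input)
{
    const string prefix = "//";
    const string suffix = "\n";
    // Ordinal comparison: match the delimiter characters exactly, ignoring culture.
    if (input.Length < prefix.Length + suffix.Length
        || !input.StartsWith(prefix, StringComparison.Ordinal)
        || !input.EndsWith(suffix, StringComparison.Ordinal))
    {
        return null;
    }
    return input.Substring(prefix.Length, input.Length - prefix.Length - suffix.Length);
}

Console.WriteLine(ExtractBetween("//{characters}\n")); // {characters}
```

Because only the delimiters at the two ends are stripped, an input such as "//a//b\n" gives "a//b".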

Using RegEx:
Regex g = new Regex("//(.*)\n"); // if the text contains only word characters (letters, digits, _), you can replace .* with \w*
Match m = g.Match(input);
string output = null;
if (m.Success)
    output = m.Groups[1].Value;
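If the text contains several such lines, the same pattern can be run through Regex.Matches to collect every captured group. A sketch assuming each item is written as // followed by its text and a newline (ExtractAll is a hypothetical helper name). Because . does not match \n, each match stops at the end of its own line:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collects the text between every "//" and the "\n" that follows it.
List<string> ExtractAll(string text)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, "//(.*)\n"))
    {
        results.Add(m.Groups[1].Value); // group 1 is the part inside the parentheses
    }
    return results;
}

foreach (string item in ExtractAll("//first\n//second\n"))
{
    Console.WriteLine(item); // prints "first", then "second"
}
```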

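This should work, provided // and \n don't also occur inside the text you want to keep (Replace removes every occurrence, not just the ones at the ends):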
string s1 = "//{characters}\n";
string final = (s1.Replace("//", "").Replace("\n", ""));

Related

Removing non-ASCII characters from string

I am trying to strip non-ASCII characters from strings I am reading from a text file, and I can't get it to work. I checked the suggestions in several SO posts and on other sites, all to no avail.
This is what I have and what I have tried:
String in text file:
2021-03-26 10:00:16:648|2021-03-26 10:00:14:682|MPE->IDC|[10.20.30.40:41148]|203, ? ?'F?~?^?W?|?8wL?i??{?=kb ? Y R?
String read from the file:
"2021-03-26 10:00:16:648|2021-03-26 10:00:14:682|[10.20.30.40:41148]|203,\u0016\u0003\u0001\0?\u0001\0\0?\u0003\u0001'F?\u001e~\u0018?^?W\u0013?|?8wL\v?i??{?=kb\t?\tY\u0005\0\0R?"
Methods to get rid of non-ASCII characters:
Regex reAsciiPattern = new Regex(@"[^\u0000-\u007F]+"); // Non-ASCII characters
sLine = reAsciiPattern.Replace(sLine, ""); // remove non-ASCII chars
Regex reAsciiPattern2 = new Regex(@"[^\x00-\x7F]+"); // Non-ASCII characters
sLine = reAsciiPattern2.Replace(sLine, ""); // remove non-ASCII chars
string asAscii = Encoding.ASCII.GetString(
    Encoding.Convert(
        Encoding.UTF8,
        Encoding.GetEncoding(
            Encoding.ASCII.EncodingName,
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback()
        ),
        Encoding.UTF8.GetBytes(sLine)
    )
);
What am I missing?
Thanks.

This can be done without a Regex using a loop and a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in line) {
    // printable ASCII range (space through '~')
    if (ch >= 32 && ch < 127) {
        sb.Append(ch);
    }
}
line = sb.ToString();
Or you can use some LINQ:
line = string.Concat(
line.Where(ch => ch >= 32 && ch < 127)
);
If you must do this with Regex, then the following should suffice (again, this keeps printable ASCII only):
line = Regex.Replace(line, @"[^\u0020-\u007e]", "");
If you want all ASCII characters (including non-printable ones), change the tests to:
ch <= 127 // for the loops
@"[^\u0000-\u007f]" // for the regex
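Many of the unwanted characters in the question's sample (\u0016, \u0003, \u001e, \v, \t and so on) are ASCII control characters, which is why the question's "non-ASCII" regexes leave them in place; only a printable-ASCII filter removes them. A small sketch comparing the two filters on a shortened, made-up input in the same spirit as the question's data:

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, made-up sample: ASCII text mixed with ASCII control characters.
string line = "203,\u0016\u0003\u0001\0'F\u001e~\u0018\tY";

// Removes only characters above U+007F, so the control characters survive.
string nonAsciiRemoved = Regex.Replace(line, @"[^\u0000-\u007F]+", "");

// Keeps only printable ASCII (space through '~'), so control characters go too.
string printableOnly = Regex.Replace(line, @"[^\u0020-\u007E]+", "");

Console.WriteLine(nonAsciiRemoved == line); // True: nothing was removed
Console.WriteLine(printableOnly);           // 203,'F~Y
```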

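You can use the following regular expression to get rid of everything except printable ASCII characters (it removes control characters such as \u0016 as well as all non-ASCII characters):
sLine = Regex.Replace(sLine, @"[^\u0020-\u007E]+", string.Empty);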

This is what worked for me, based on a post here. Note that it skips any line that contains a non-printable character rather than removing the characters:
using System.Text.RegularExpressions;
...
Regex reAsciiNonPrintable = new Regex(@"\p{C}+"); // Unicode "Other" category: control, format, etc.
string sLine;
using (StreamReader sr = File.OpenText(Path.Combine(Folder, FileName)))
{
    while (!sr.EndOfStream)
    {
        sLine = sr.ReadLine().Trim();
        if (!string.IsNullOrEmpty(sLine))
        {
            Match match = reAsciiNonPrintable.Match(sLine);
            if (match.Success)
                continue; // skip the line
            ...
        }
        ...
    }
    ....
}
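If you would rather keep such lines and only drop the offending characters, the same \p{C} class (the Unicode "Other" categories: control, format, surrogate, private-use and unassigned) can be passed to Regex.Replace instead of being used to skip lines. A minimal sketch with a hypothetical helper name; unlike the printable-ASCII filters, it keeps printable non-ASCII letters such as é:

```csharp
using System;
using System.Text.RegularExpressions;

// Removes every run of characters in Unicode category "Other" (\p{C}),
// e.g. control characters such as \u0016 or \t.
string StripControlChars(string s) => Regex.Replace(s, @"\p{C}+", string.Empty);

Console.WriteLine(StripControlChars("203,\u0016\u0003'F\tY")); // 203,'FY
```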

Since a string is an IEnumerable<char> where each char represents one UTF-16 code unit (possibly a surrogate), you can also do:
var ascii = new string(sLine.Where(x => x <= sbyte.MaxValue).ToArray());
Or if you want only printable ASCII:
var asciiPrintable = new string(sLine.Where(x => ' ' <= x && x <= '~').ToArray());
I realize now that this is mostly a duplicate of pinkfloydx33's answer, so go and upvote that.
If the string contains accented letters, the result can depend on the normalization, so compare:
var sLine1 = "olé";
var sLine2 = sLine1.Normalize(NormalizationForm.FormD);
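Running both strings through the <= 127 filter shows the difference: the precomposed é (U+00E9) is dropped entirely, while the decomposed form is e followed by a combining acute accent (U+0301), so only the accent is dropped. A small sketch; the literal uses a \u escape so the result doesn't depend on how the source file itself is normalized, and KeepAscii is just an illustrative name:

```csharp
using System;
using System.Linq;
using System.Text;

// Keeps only UTF-16 code units in the ASCII range (0..127).
string KeepAscii(string s) => new string(s.Where(x => x <= sbyte.MaxValue).ToArray());

string sLine1 = "ol\u00e9";                                  // precomposed é
string sLine2 = sLine1.Normalize(NormalizationForm.FormD);  // "ole" + U+0301

Console.WriteLine(KeepAscii(sLine1)); // ol
Console.WriteLine(KeepAscii(sLine2)); // ole
```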

How to remove a portion of string

I want to remove the word Test or Leaf from the beginning of the string only, not from the end, so the string Test_AA_234_6874_Test should become AA_234_6874_Test. But when I use .Replace, it replaces the word Test everywhere, which I don't want. How can I do it?
This is the code I have so far:
string st = "Test_AA_234_6874_Test";
st = st.Replace("Test_","");

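You could use a regex to do this. The third argument of the Regex.Replace instance method specifies the maximum number of replacements, so only the first match is replaced (note that the first match is not necessarily at the start of the string):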
string st = "Test_AA_234_6874_Test";
var regex = new Regex("(Test|Leaf)_");
var value = regex.Replace(st, "", 1);
Or, if the text to remove only ever occurs at the start, just use ^, which asserts the position at the start of the string:
string st = "Test_AA_234_6874_Test";
var regex = new Regex("^(Test|Leaf)_");
var value = regex.Replace(st, "");
If you know that you always have to remove the first 5 characters, you can also use Substring, which is more efficient:
string st = "Test_AA_234_6874_Test";
var value = st.Substring(5, st.Length - 5);
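A regex isn't strictly needed to remove a prefix only at the start: StartsWith plus Substring does the same thing without assuming a fixed length. A sketch that tries each of the two prefixes from the question in turn; the helper name RemoveLeadingPrefix is hypothetical, and ordinal comparison is used so the check doesn't depend on the current culture:

```csharp
using System;

// Removes the first matching prefix, if any, from the start of s only.
string RemoveLeadingPrefix(string s, params string[] prefixes)
{
    foreach (string prefix in prefixes)
    {
        if (s.StartsWith(prefix, StringComparison.Ordinal))
        {
            return s.Substring(prefix.Length);
        }
    }
    return s; // no prefix at the start: leave the string unchanged
}

Console.WriteLine(RemoveLeadingPrefix("Test_AA_234_6874_Test", "Test_", "Leaf_")); // AA_234_6874_Test
```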

The simplest way to do this is with a regular expression, like so:
using System;
using System.Text.RegularExpressions;
namespace RegExTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "Test_AA_234_6874_Test";
            var matchText = "Test_"; // include the underscore so it is removed too
            var replacement = String.Empty;
            var regex = new Regex("^" + Regex.Escape(matchText));
            var output = regex.Replace(input, replacement);
            Console.WriteLine("Converted String: {0}", output);
            Console.ReadKey();
        }
    }
}
The ^ will match text at the beginning of the string.

Consider checking whether the string starts with the prefix and/or ends with the suffix, work out the start index and length of the part you want to keep, and then use the Substring method to get only that portion.
public string Normalize(string input, string prefix, string suffix)
{
    // Validation
    int length = input.Length;
    int startIndex = 0;
    if (input.StartsWith(prefix))
    {
        startIndex = prefix.Length;
        length -= prefix.Length;
    }
    // Only strip the suffix if it doesn't overlap the prefix already removed
    if (input.EndsWith(suffix) && length >= suffix.Length)
    {
        length -= suffix.Length;
    }
    return input.Substring(startIndex, length);
}
Hope this helps.

string wordToRemoveFromBeginning = "Test_";
int index = st.IndexOf(wordToRemoveFromBeginning, StringComparison.Ordinal);
// Only remove the word if it was found at the very beginning (index 0)
string cleanPath = (index == 0) ? st.Remove(0, wordToRemoveFromBeginning.Length) : st;

Use a regular expression.
var str1 = "Test_AA_234_6874_Test";
var str2 = "Leaf_AA_234_6874_Test";
str1 = Regex.Replace(str1, "^Test_", "");
str2 = Regex.Replace(str2, "^Leaf_", "");
The Regex.Replace parameters are your input string (str1), the pattern you want to match, and what to replace it with, in this case an empty string. The ^ character anchors the match to the start of the string, so something like "MyTest_AA_234_6874_Test" would still return "MyTest_AA_234_6874_Test".

I'm going to use some very simple code here:
string str = "Test_AA_234_6874_Test";
if (str.Length >= 5)
{
    string prefix = str.Substring(0, 5);
    if (prefix == "Test_" || prefix == "Leaf_")
    {
        str = str.Remove(0, 5);
    }
}

Change part of a string, except the part where there are numbers

For example, I have this string:
ex250-r-ninja-08-10r_
How could I change it to this string?
ex250 r ninja 08-10r_
As you can see, I changed every - to a space, except in the XX-XX part. How could I do this kind of string replacement in C#? (The string could also be a different length.)
For the hyphens I do this:
string correctString = errString.Replace("-", " ");
but how do I leave the - in place where there is a number pattern XX-XX?

You can use regular expressions to only perform substitutions in certain cases. In this case, you want to perform a substitution if either side of the dash is a non-digit. That's not quite as simple as it might be, but you can use:
string ReplaceSomeHyphens(string input)
{
    string result = Regex.Replace(input, @"(\D)-", "${1} ");
    result = Regex.Replace(result, @"-(\D)", " ${1}");
    return result;
}
It's possible that there's a more cunning way to do this in a single regular expression, but I suspect that it would be more complicated too :)
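For reference, one way to do it in a single regular expression is with lookarounds: replace a hyphen if the character before it is a non-digit or the character after it is a non-digit. A sketch (ReplaceHyphensOutsideNumbers is a made-up name); on the sample string it gives the same result as the two-pass version, though edge cases such as consecutive hyphens can come out differently:

```csharp
using System;
using System.Text.RegularExpressions;

// A hyphen is replaced when the character before it is a non-digit
// (lookbehind) or the character after it is a non-digit (lookahead).
string ReplaceHyphensOutsideNumbers(string input) =>
    Regex.Replace(input, @"(?<=\D)-|-(?=\D)", " ");

Console.WriteLine(ReplaceHyphensOutsideNumbers("ex250-r-ninja-08-10r_")); // ex250 r ninja 08-10r_
```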

A very uncool approach using a StringBuilder. It replaces each - with a space unless the two characters before it and the two characters after it are all digits (a - within two characters of either end is always replaced).
StringBuilder sb = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
    bool replace = false;
    char c = text[i];
    if (c == '-')
    {
        if (i < 2 || i >= text.Length - 2) replace = true;
        else
        {
            bool leftDigit = text.Substring(i - 2, 2).All(Char.IsDigit); // All() needs System.Linq
            bool rightDigit = text.Substring(i + 1, 2).All(Char.IsDigit);
            replace = !leftDigit || !rightDigit;
        }
    }
    if (replace)
        sb.Append(' ');
    else
        sb.Append(c);
}
string result = sb.ToString();

Since you say you won't have hyphens at the start of your string, you need to match every - that is preceded by at least one letter followed by zero or more digits. To achieve this, use a positive lookbehind in your regex.
string strRegex = @"(?<=[a-z]+[0-9]*)-";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase | RegexOptions.Multiline);
string strTargetString = @"ex250-r-ninja-08-10r_";
string strReplace = @" ";
return myRegex.Replace(strTargetString, strReplace);
For the sample string, this returns ex250 r ninja 08-10r_.

Taking a piece of a REGEX and setting it to a String

Is there any way to take a part out of a regex? Let's say I have a match for this
\s*(string)\s*(.*\()\s*(\d*)\)\s*;?(.*)
and I want to change it like this
Regex.Replace(line, @"\s*(string)\s*(.*\()\s*(\d*)\)\s*;?(.*)", "$1 $2($3) // $4", RegexOptions.IgnoreCase);
Is there any way I can grab the $4 by itself and set it equal to some string variable?
Let's say the regex match is: string (55) ;comment
In this case I'd like to get the word comment only and set it to a string without going through the String.Split function. Ultimately, though, I'd just like to get the digits between the parentheses.

There's an overload for the Replace method which takes a MatchEvaluator delegate:
string pattern = "...";
string result = Regex.Replace(line, pattern, m =>
{
    int digits = 0;
    string comment = m.Groups[4].Value; // $4
    int.TryParse(m.Groups[3].Value, out digits); // $3
    return string.Format("{0} {1}({2}) // {3}",
        m.Groups[1].Value, m.Groups[2].Value, digits, comment);
}, RegexOptions.IgnoreCase);
Hope this helps.

Yes, if I understand the question correctly:
var re = new Regex(@"\s*(string)\s*(.*\()\s*(\d*)\)\s*;?(.*)");
var match = re.Match(input);
if (match.Success)
{
    int i = match.Groups[4].Index;
    int n = match.Groups[4].Length;
    input = input.Substring(0, i) + replacementString + input.Substring(i + n);
}
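If the goal is just to read $3 and $4 into variables rather than rewrite the line, Regex.Match with the same pattern is enough: each numbered entry in the Groups collection holds what $1 to $4 refer to in a replacement string. A sketch using the sample input from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string line = "string (55) ;comment";
Match m = Regex.Match(line, @"\s*(string)\s*(.*\()\s*(\d*)\)\s*;?(.*)", RegexOptions.IgnoreCase);

string comment = "";
int digits = 0;
if (m.Success)
{
    comment = m.Groups[4].Value;                 // the text $4 refers to
    int.TryParse(m.Groups[3].Value, out digits); // the text $3 refers to; stays 0 if empty
}
Console.WriteLine($"{digits} / {comment}");      // 55 / comment
```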

C# String replacement using Regex

I currently have the code below to replace characters in a string, but I now need to replace characters only within the first X (in this case 3) characters and leave the rest of the string alone. In my example below I have 51115, but I need to replace any 5 within the first 3 characters, so I should end up with 61115.
My current code:
int value = 51115;
string oldString = "5";
string newString = "6";
string result = Regex.Replace(value.ToString(), oldString, newString, RegexOptions.IgnoreCase);
result is now 61116. What would you suggest I do to query just the first x characters?
Thanks

Not particularly fancy, but only give the regex the data it should be replacing: send in just the range of characters that may be replaced, then append the rest of the string unchanged.
string s = value.ToString();
int n = Math.Min(x, s.Length);
result = Regex.Replace(s.Substring(0, n), oldString, newString, RegexOptions.IgnoreCase) + s.Substring(n);

If you're only replacing a single character, you could just write the replacement code yourself. It'd be faster than taking a substring and then running a Regex replace (a regex is overkill for a single-character replacement anyway).
string input = value.ToString();
StringBuilder sb = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++) {
    char c = input[i];
    // only replace within the first x characters
    if (i < x && c == replaceFrom) { c = replaceTo; }
    sb.Append(c);
}
return sb.ToString();

I think the character-by-character option mentioned here is probably clearer, but if you really want a regex:
string result = "";
int value = 55555;
string oldString = "5";
string newString = "6";
// group 1: the first (up to) 3 digits; group 2: whatever digits follow
var match = new Regex(@"(\d{1,3})(\d*)").Match(value.ToString());
if (match.Success)
    result = match.Groups[1].Value.Replace(oldString, newString) + match.Groups[2].Value;
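Another regex-only option is the MatchEvaluator overload of Regex.Replace: the evaluator is called for every match, and Match.Index tells you where the match starts, so matches beyond the first x characters can be returned unchanged. A sketch with a hypothetical helper name; Regex.Escape keeps characters in oldValue from being read as regex syntax. A multi-character match that starts inside the first x characters but runs past them is still replaced, which may or may not be what you want:

```csharp
using System;
using System.Text.RegularExpressions;

// Replaces oldValue with newValue only where the match starts within the first x characters.
string ReplaceInFirst(string input, int x, string oldValue, string newValue) =>
    Regex.Replace(input, Regex.Escape(oldValue), m => m.Index < x ? newValue : m.Value);

Console.WriteLine(ReplaceInFirst("51115", 3, "5", "6")); // 61115
```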

I love RegEx, but in this case I would just use .Replace:
string value = "51115";
int iLenToLook = 3;
string oldString = "5";
string newString = "6";
// take the part that must not be touched first
string rest = value.Length > iLenToLook ? value.Substring(iLenToLook) : "";
string result = value.Substring(0, Math.Min(iLenToLook, value.Length)).Replace(oldString, newString) + rest;
EDIT: I changed it to take the non-replaced portion first, in case the replacement string has a different length from the original.

Every time someone in the .NET world has a question about regex, I recommend Expresso. It's a great tool for working in the confusing and thorny world of regular expressions.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
