Special characters Regex

Special characters Regex - c#

Hello, I'm trying to remove special characters from user input.
public void fd()
{
string output = "";
string input = Console.ReadLine();
char[] charArray = input.ToCharArray();
foreach (var item in charArray)
{
if (!Char.IsLetterOrDigit(item))
{
// CODE HERE
}
}
output = new string(trimmedChars);
Console.WriteLine(output);
}
At the end I turn it back into a string. My code only removes one special character from the string. Does anyone have a suggestion for an easier way to do this?

Your implementation is fine, but consider the following code. It is only slightly shorter, but it works at a slightly higher level of abstraction:
var input = " th#ere's! ";
Func<char, bool> isSpecialChar = ch => !char.IsLetter(ch) && !char.IsDigit(ch);
for (int i = 1; i < input.Length - 1; i++)
{
//if current character is a special symbol
if(isSpecialChar(input[i]))
{
//if previous or next character are special symbols
if(isSpecialChar(input[i-1]) || isSpecialChar(input[i+1]))
{
//remove that character
input = input.Remove(i, 1);
//decrease counter, since we removed one char
i--;
}
}
}
Console.WriteLine(input); //prints " th#ere's "
Note that a new string is created each time you call Remove. Use a StringBuilder for a more memory-efficient solution.
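As a rough sketch of that StringBuilder suggestion, the same loop could look like the code below. The class and method names (SpecialCharCleaner, RemoveAdjacentSpecialChars) are made up for illustration, and the rule is assumed to be the same as in the loop above: a special character is dropped when one of its neighbours is also special, and the first and last characters are never examined.

```csharp
using System;
using System.Text;

public static class SpecialCharCleaner
{
    // Hypothetical helper: mirrors the loop above, but edits a StringBuilder
    // in place instead of allocating a new string on every Remove call.
    public static string RemoveAdjacentSpecialChars(string input)
    {
        var sb = new StringBuilder(input);
        for (int i = 1; i < sb.Length - 1; i++)
        {
            // Current character is special and so is a neighbour
            if (IsSpecial(sb[i]) && (IsSpecial(sb[i - 1]) || IsSpecial(sb[i + 1])))
            {
                sb.Remove(i, 1);
                i--; // re-check the character that moved into this position
            }
        }
        return sb.ToString();
    }

    private static bool IsSpecial(char ch) => !char.IsLetterOrDigit(ch);
}
```

Calling SpecialCharCleaner.RemoveAdjacentSpecialChars(" th#ere's! ") returns " th#ere's ", the same result as the string-based loop.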

The problem with your code is that you are taking the data from charArray and putting the result in trimmedChars for each change that you make, so each change will ignore all previous changes and work with the original. At the end you only have the last change.
Another problem with the code is that you are using IndexOf to get the index of a character, but that returns the index of the first occurrence of that character, not the index where you found it. For example, when you are at the second ! in the string "foo!bar!" you will get the index of the first one.
You don't need to turn the string into an array to work with the characters in the string. You can just loop through the index of the characters in the string.
Note that you should also check the value of the index when you are looking at the characters before and after, so that you don't try to look at characters that are outside the string.
public void fd() {
string input = Console.ReadLine();
int index = 0;
while (index < input.Length) {
if (!Char.IsLetterOrDigit(input, index) && ((index == 0 || !Char.IsLetterOrDigit(input, index - 1)) || (index == input.Length - 1 || !Char.IsLetterOrDigit(input, index + 1)))) {
input = input.Remove(index, 1);
} else {
index++;
}
}
Console.WriteLine(input);
}

It's been a while since I've touched C#, but a regex might be helpful:
string input = string.Format("{0}! ", Console.ReadLine());
Regex rgx = new Regex("(?i:[^a-z]?)[.](?i:[^a-z]?)");
string output = rgx.Replace(input, "$1$2");
The regex looks for a character with a non-alpha character on left or right and replaces it with nothing.

Related

Check for specific character between two characters

To illustrate:
Take the input string "Yesterday I ate two bu rgers" (space intentional)
I want to check the input string to see if a space (or any other pre-defined character) " " exists between the two characters (in this case) "u" and "r". And if it exists delete this character.
First I came up with this:
string someString = "Yesterday I ate two bu rgers";
string charA = "u", charB = "r";
if (someString.Contains(charA) &&
someString.Substring(someString.IndexOf(charA) + 1).Equals(" ") &&
someString.Substring(someString.IndexOf(charA) + 2).Equals(charB))
//delete the space
However, not only does this feel (and look) inefficient as heck, it also fails if the sentence is "Yesterday you ate two bu rgers", since it will take the index of the first "u". So I would have to do an additional check for multiple instances of charA.
Another solution I thought of is to split the sentence on every space, and see if the last character of the split matches charA and the first character of the next split matches charB. And if it does join the two together.
string[] splitString = someString.Split(null);
for (int i = 0; i < splitString.Length -1; i++)
{
string lastChar = splitString[i].Substring(splitString[i].Length - 1);
string firstChar = splitString[i + 1].Substring(0, 1);
if(lastChar.Equals(charA) && firstChar.Equals(charB))
{
string joined = splitString[i] + splitString[i + 1];
}
}
However, this method is also flawed, as it breaks when, for example, two spaces are present in the input.
Is there a way to do this without needing a bunch of if statements or loops? (unless there really is no other way I would really like to not use regex)

A string can be indexed like an array of characters. Loop through it and compare the characters (with charA and charB declared as char):
for (int i = 2; i < someString.Length; i++) {
if (someString[i] == charB && someString[i - 2] == charA) {
//TODO: delete the char in between.
break;
}
}
If you start at index = 2 and test for the second character, you can simply go back by 2 positions to inspect the first one.
But of course you could also look ahead like this:
for (int i = 0; i < someString.Length - 2; i++) {
if (someString[i] == charA && someString[i + 2] == charB) {
//TODO: delete the char in between.
break;
}
}
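To fill in the TODO, one possible sketch builds the result in a StringBuilder and skips the unwanted character whenever it sits directly between the two given characters. The names here (CharBetween, RemoveBetween, unwanted) are invented for the example. Unlike the loops above, it also checks that the middle character really is the unwanted one, and it handles every occurrence rather than stopping at the first.

```csharp
using System;
using System.Text;

public static class CharBetween
{
    // Hypothetical helper: drops `unwanted` wherever it appears
    // immediately between `before` and `after`.
    public static string RemoveBetween(string text, char before, char after, char unwanted)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            bool isTarget = text[i] == unwanted
                && i > 0 && i < text.Length - 1   // needs a neighbour on both sides
                && text[i - 1] == before
                && text[i + 1] == after;
            if (!isTarget)
            {
                sb.Append(text[i]);
            }
        }
        return sb.ToString();
    }
}
```

With this, CharBetween.RemoveBetween("Yesterday you ate two bu rgers", 'u', 'r', ' ') returns "Yesterday you ate two burgers"; the "u" in "you" is left alone because it is not followed by a space and an "r".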

Going to the end of a substring in c#

See the comment // go to end in the code below: I can't figure out how to cleanly take the substring up to the end of the string :(
Is there a simpler way to go to the end of the string than calculating the length myself? For more complex strings that would be too hard.
string word = Console.ReadLine();
string[] lines = File.ReadAllLines(file);
using (var far = File.CreateText(resultfile))
{
foreach (string line in lines)
{
StringBuilder NewL = new StringBuilder();
int ind = line.IndexOf(word);
if (ind >= 0)
{
if (ind == 0)
{
NewL.Append(line.Substring(ind+ word.Length +1, // go to end);
}else{
NewL.Append(line.Substring(0, ind - 1));
NewL.Append(line.Substring(ind + word.Length + 1, // go to end));}
far.WriteLine(NewL);
}
else
{
far.WriteLine(line);
}
}
I don't know what more details the stackoverflow wants, anyone who can answer this pretty sure can clearly understand this simple code anyways.

You can use the String.Substring(int) overload, which automatically continues to the end of the source string:
NewL.Append(line.Substring(ind + word.Length + 1));
From the documentation: "Retrieves a substring from this instance. The substring starts at a specified character position and continues to the end of the string."
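Putting that overload to work, here is a small sketch of the whole removal step. LineTools and RemoveWordAndNextChar are hypothetical names, and I'm assuming the intent is to drop the first occurrence of the word plus the single character that follows it (the "+ 1" in the question). The start index is clamped so that a word at the very end of the line does not throw.

```csharp
using System;

public static class LineTools
{
    // Hypothetical helper: removes the first occurrence of `word`
    // and the one character after it (assumed to be a separator).
    public static string RemoveWordAndNextChar(string line, string word)
    {
        int ind = line.IndexOf(word, StringComparison.Ordinal);
        if (ind < 0)
        {
            return line; // word not present, nothing to remove
        }

        // Clamp so Substring never starts past the end of the line
        int tailStart = Math.Min(ind + word.Length + 1, line.Length);

        string head = line.Substring(0, ind);
        string tail = line.Substring(tailStart); // runs to the end of the string
        return head + tail;
    }
}
```

For example, LineTools.RemoveWordAndNextChar("foo bar baz", "bar") returns "foo baz".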

It seems to me that you are just trying to remove a certain word from the loaded lines. If this is your task, then you can simply replace the word with an empty string:
foreach (string line in lines)
{
string newLine = line.Replace(word, "");
far.WriteLine(newLine);
}
Or even without an explicit loop, with a bit of LINQ:
var result = lines.Select(x => x.Replace(word, ""));
File.WriteAllLines("yourFile.txt", result);
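Or, given the requirement to match an additional character after the word, you can solve it with Regex (Regex.Escape keeps any special characters in the word from being treated as pattern syntax):
Regex r = new Regex(Regex.Escape(word) + ".");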
var result = lines.Select(x => r.Replace(x, ""));
File.WriteAllLines("yourFile.txt", result);

How to remove certain characters

I'm trying to remove single vowels from a string, but not when the same vowel appears twice in a row.
For example string
"I am keeping a foobar"
should print out as
"m keepng foobr"
I have tried everything but didn't come up with a solution so far.

Try:
Regex.Replace(input, @"([aeiou])\1|[aeiou]", m => m.Length == 2 ? m.Value : "", RegexOptions.IgnoreCase);
The pattern matches a doubled vowel first and puts it back unchanged; any other vowel is matched on its own and removed.
For "I am keeping a foobar" this gives " m keepng  foobr", which differs from your required "m keepng foobr", because you've stripped spaces out of your required result too.
If you want to remove the extraneous spaces, then it's a three-step operation: remove vowels; remove leading/trailing spaces; collapse double spaces.
var raw = Regex.Replace(input, @"([aeiou])\1|[aeiou]", m => m.Length == 2 ? m.Value : "", RegexOptions.IgnoreCase);
var trimmed = raw.Trim();
var final = trimmed.Replace("  ", " ");

You could try this logic:
loop through the string and check the characters two by two
if (isBothVowelsAndEqual()) do nothing; else removeFirstChar();
EDIT:
public List<char> vowels = "AEIOUaeiou".ToList();
public bool isBothVowelsAndEqual(char first, char second)
{
return (first == second && vowels.Contains(first));
}
const string s = "I am keeeping a foobar";
string output=String.Empty;
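for (int i = 0; i < s.Length; i++)
{
// Only look ahead when a next character exists, so the last character is not skipped
if (i + 1 < s.Length && isBothVowelsAndEqual(s[i], s[i + 1]))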
{
output = output + s[i] + s[i+1];
i++;
}
else
{
if (!vowels.Contains(s[i])) {
output += s[i];
}
}
}
Console.WriteLine(output.Trim());

Regular expression for pipe delimited and double quoted string

I have a string something like this:
"2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112"
I would like to split by pipe apart from anything wrapped in double quotes so I have something like (similar to how csv is done):
[0] => 2014-01-23 09:13:45
[1] => 10002112|TR0859657|25-DEC-2013>0000000000000001
[2] => 10002112
I would like to know if there is a regular expression that can do this?

I think you may need to write your own parser.
You will need:
- a custom collection to keep the results
- a boolean flag to track whether a pipe is inside or outside quotation marks
- a string (or StringBuilder) to keep the current word
The idea is that you read string char by char. Each char is appended to the word. If there is a pipe outside quotation marks you add the word to your result collection. If there is a quote you switch a flag so you don't treat the pipe as a divider anymore but you append it as a part of the word. Then if there is another quotation you switch the flag back again. So next pipe will result in adding the whole word (with pipes within quotation marks) to the collection. I tested the code below on your example and it worked.
private static List<string> ParseLine(string yourString)
{
bool ignorePipe = false;
string word = string.Empty;
List<string> divided = new List<string>();
foreach (char c in yourString)
{
if (c == '|' &&
!ignorePipe)
{
divided.Add(word);
word = string.Empty;
}
else if (c == '"')
{
ignorePipe = !ignorePipe;
}
else
{
word += c;
}
}
divided.Add(word);
return divided;
}

How about this Regular Expression:
/((["|]).*\2)/g
It looks like it could be used as valid split expression.
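If you would rather stay with a regex in C#, one option is to match the fields instead of splitting on the delimiter. The sketch below is an assumption-laden illustration, not a full CSV-style parser: PipeSplitter and Split are made-up names, quoted fields are assumed never to contain escaped quotes, and empty unquoted fields (two pipes in a row) are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeSplitter
{
    // Either a double-quoted run (group 1 holds the text without the quotes)
    // or a run of one or more non-pipe characters.
    private static readonly Regex FieldPattern = new Regex("\"([^\"]*)\"|[^|]+");

    // Hypothetical helper: returns the fields of a pipe-delimited line,
    // keeping pipes that appear inside double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            // Prefer the unquoted capture when the quoted alternative matched
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Value);
        }
        return fields;
    }
}
```

On the sample string from the question this yields three fields, with the quoted middle field returned intact and without its quotes.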

I'm going to blatantly ignore the fact that you want a RegEx, because I think that making your own IEnumerable will be easier. Plus, you get instant access to Linq.
var line = "2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112";
var data = GetPartsFromLine(line).ToList();
private static IEnumerable<string> GetPartsFromLine(string line)
{
int position = -1;
while (position < line.Length - 1)
{
position++;
if (line[position] == '"')
{
//go find the next "
int endQuote = line.IndexOf('"', position + 1);
yield return line.Substring(position + 1, endQuote - position - 1);
position = endQuote;
if (position + 1 < line.Length && line[position + 1] == '|')
{
position++;
}
}
else
{
//go find the next |
int pipe = line.IndexOf('|', position);
if (pipe == -1)
{
//hit the end of the line
yield return line.Substring(position);
position = line.Length;
}
else
{
yield return line.Substring(position, pipe - position);
position = pipe;
}
}
}
}
This hasn't been fully tested (for example, it does not handle an unterminated quote), but it works with your example.

c# getting a string within another string

I have a string like this:
some_string = "A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n\r\nOK\r\n"
I'm coming from VB.NET and I need to know how, in C#, to get "3216" out of there if I know the position of CMGW.
I know that my start should be the position of CMGW + 6, but how do I make it stop as soon as it finds "\r"?
Again, my end result should be 3216.
thank you!

Find the index of \r from the start of where you're interested in, and use the Substring overload which takes a length:
// Production code: add validation here.
// (Check for each index being -1, meaning "not found")
int cmgwIndex = text.IndexOf("CMGW: ");
// Just a helper variable; makes the code below slightly prettier
int startIndex = cmgwIndex + 6;
int crIndex = text.IndexOf("\r", startIndex);
string middlePart = text.Substring(startIndex, crIndex - startIndex);
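For the validation mentioned in the comments, a Try-style wrapper is one way to go. This is only a sketch: MarkerExtractor and TryExtractAfter are invented names, and treating a missing "\r" as "take the rest of the string" is my assumption rather than something the question specifies.

```csharp
using System;

public static class MarkerExtractor
{
    // Hypothetical helper: returns the text between `marker` and the next '\r'.
    // Returns false when the marker is not found.
    public static bool TryExtractAfter(string text, string marker, out string value)
    {
        value = string.Empty;

        int markerIndex = text.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            return false; // marker not found
        }

        int startIndex = markerIndex + marker.Length;
        int crIndex = text.IndexOf('\r', startIndex);
        if (crIndex < 0)
        {
            crIndex = text.Length; // assumption: no '\r' means "up to the end"
        }

        value = text.Substring(startIndex, crIndex - startIndex);
        return true;
    }
}
```

Called with the string from the question and the marker "CMGW: ", it sets value to "3216" and returns true.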

If you know the position of 3216 then you can just do the following
string inner = some_string.Substring(positionOfCmgw+6,4);
This code will take the substring of some_string starting at the given position and only taking 4 characters.
If you want to be more general you could do the following
int start = positionOfCmgw+6;
int endIndex = some_string.IndexOf('\r', start);
int length = endIndex - start;
string inner = some_string.Substring(start, length);

One option would be to start from your known index and read characters until you hit a non-numeric value. Not the most robust solution, but it will work if you know your input's always going to look like this (i.e., no decimal points or other non-numeric characters within the numeric part of the string).
Something like this:
public static int GetNumberAtIndex(this string text, int index)
{
if (index < 0 || index >= text.Length)
throw new ArgumentOutOfRangeException("index");
var sb = new StringBuilder();
for (int i = index; i < text.Length; ++i)
{
char c = text[i];
if (!char.IsDigit(c))
break;
sb.Append(c);
}
if (sb.Length > 0)
return int.Parse(sb.ToString());
else
throw new ArgumentException("Unable to read number at the specified index.");
}
Usage in your case would look like:
string some_string = @"A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n...";
int index = some_string.IndexOf("CMGW") + 6;
int value = some_string.GetNumberAtIndex(index);
Console.WriteLine(value);
Output:
3216

If you're looking to extract the number portion of 'CMGW: 3216' then a more reliable method would be to use regular expressions. That way you can look for the entire pattern, and not just the header.
var some_string = "A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n\r\nOK\r\n";
var match = Regex.Match(some_string, @"CMGW\: (?<number>[0-9]+)", RegexOptions.Multiline);
var number = match.Groups["number"].Value;

More generally, if you don't know the start position of CMGW but the structure remains as before:
string s = some_string;
char[] separators = {'\r'};
var parts = s.Split(separators);
// Take the characters after the last space of the CMGW line
string number = new string(parts.Single(part => part.Contains("CMGW")).Reverse().TakeWhile(c => c != ' ').Reverse().ToArray());
