String cleaning and formatting


I have a URL formatter in my application but the problem is that the customer wants to be able to enter special characters like:
: | / - “ ‘ & * # #
I have a string:
string myCrazyString = @":|/-\“‘&*##";
I have a function where another string is being passed:
public void CleanMyString(string myStr)
{
}
How can I compare the string being passed, "myStr", to "myCrazyString" and, if "myStr" contains any of the characters in "myCrazyString", remove them?
So if I pass to my function:
"this ' is a" cra#zy: me|ssage/ an-d I& want#to clea*n it"
It should return:
"this is a crazy message and I want to clean it"
How can I do this in my CleanMyString function?

Use a regular expression for that, like:
string pattern = @"(:|\||\/|\-|\\|\“|\‘|\&|\*|\#|\#)";
string result = System.Text.RegularExpressions.Regex.Replace(inputString, pattern, string.Empty);
Separate each string you want to remove with |.
To remove a special character such as | itself, escape it with \, so \| is treated as a literal character.
Test:
inputString = @"H\I t&he|r#e!";
// result is: HI there!
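The same idea can be written with a single character class instead of an alternation, which avoids escaping most of the characters. This is only a sketch: the class name UrlCleaner and the method name Clean are made up for illustration, and including the straight quotes " and ' is an assumption based on the sample input in the question.

```csharp
using System.Text.RegularExpressions;

public static class UrlCleaner
{
    // One character class instead of an alternation. Inside [...] the hyphen
    // and the backslash are escaped; \u201C and \u2018 are the curly quotes
    // from the question. The straight quotes " and ' are an added assumption.
    private static readonly Regex Unwanted =
        new Regex(@"[:|/\-\\\u201C\u2018&*#""']");

    public static string Clean(string input)
    {
        // A null input is treated as an empty string.
        return Unwanted.Replace(input ?? string.Empty, string.Empty);
    }
}
```

With this sketch, UrlCleaner.Clean(@"H\I t&he|r#e!") returns "HI there!". Note that it only deletes characters; it does not collapse the double spaces that deletion can leave behind.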

A solution without regular expressions, for completeness:
static string clear(string input)
{
    // Verbatim string, so the backslash is a literal character.
    string charsToBeCleared = @":|/-\“‘&*##";
    string output = "";
    foreach (char c in input)
    {
        if (charsToBeCleared.IndexOf(c) < 0)
        {
            output += c;
        }
    }
    return output;
}

You can use Regex as others mentioned, or code like this:
char[] myCrazyChars = "\"\':|/-\\“‘&*##".ToCharArray();
string myCrazyString = "this ' is a\" cra#zy: me|ssage/ an-d I& want#to clea*n it";
string[] splittedCrazyString = myCrazyString.Split(myCrazyChars);
string notCrazyStringAtAll = string.Join("", splittedCrazyString);

Try using a Regular Expression.

Here's a fairly straightforward way to do it. Split the string on all of the characters in your "crazy string" and then join the pieces back together without the bad characters.
string myCrazyString = @":|/-\“‘&*##";
string str = @"this ' is a"" cra#zy: me|ssage/ an-d I& want#to clea*n it";
string[] arr = str.Split(myCrazyString.ToCharArray(), StringSplitOptions.None);
str = string.Join(string.Empty, arr);

Another possible solution:
namespace RemoveChars
{
    class Program
    {
        static string str = @"this ' is a\“ cra#zy: me|ssage/ an-d I& want#to clea*n it";
        static void Main(string[] args)
        {
            System.Console.WriteLine(CleanMyString(str));
        }
        public static string CleanMyString(string myStr)
        {
            string myCrazyString = @":|/-\“‘&*##";
            var result = "";
            foreach (char c in myStr)
            {
                var t = true; // t will remain true if c is not a crazy char
                foreach (char ch in myCrazyString)
                    if (c == ch)
                    {
                        t = false;
                        break;
                    }
                if (t)
                    result += c;
            }
            return result;
        }
    }
}

You could try an if statement that checks whether a character is present and, if so, reports the craziness (this only detects the character, it does not remove it):
if (myCrazyString.Contains("#"))
{
Console.WriteLine("This string is out of control!");
}
Regex is also a good idea (maybe better).

Try this:
1. Define a StringBuilder.
2. Iterate through the characters of the string to be cleaned.
3. Append every character you want to keep to the StringBuilder and skip the special characters with an if condition.
4. Return the StringBuilder's contents as a string.
Or try using a regular expression.
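The StringBuilder steps above could be sketched roughly as follows. The names StringCleaner, Unwanted, and Clean are hypothetical, and the set of unwanted characters is copied from the question; using a HashSet<char> for the lookup is a design choice of this sketch, not something the answer prescribes.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringCleaner
{
    // Characters to drop, taken from the question's list (hypothetical holder).
    private static readonly HashSet<char> Unwanted =
        new HashSet<char>(":|/-\\\u201C\u2018&*#");

    public static string Clean(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        // Step 1: define a StringBuilder sized for the worst case.
        var builder = new StringBuilder(input.Length);

        // Steps 2 and 3: keep every character that is not in the unwanted set.
        foreach (char c in input)
        {
            if (!Unwanted.Contains(c))
            {
                builder.Append(c);
            }
        }

        // Step 4: return the accumulated text.
        return builder.ToString();
    }
}
```

Compared with the `output += c` loops in the other answers, appending to a StringBuilder avoids allocating a new string for every kept character.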

Related

Problem with splitting string input in c# in a very specific way

string input = "\"Hello, World!\" ! \"Some other string\"";
Hello, I am having a problem with finding a solution to this. You see, I want to Split the string in half by the ! separating the two "fake strings" inside the string. I am aware that I can use String.Split(), but what if there is an exclamation mark inside the "fake string"?
Would appreciate if anyone could help.

You could use a regex: "(.*)" ! "(.*)"
https://regex101.com/r/4uBspp/1
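In C#, that pattern could be applied along these lines. This is a sketch only: QuotedPairSplitter and TrySplit are illustrative names, and the ^ and $ anchors are an addition that assumes the input contains exactly two quoted parts and nothing else.

```csharp
using System.Text.RegularExpressions;

public static class QuotedPairSplitter
{
    // Splits input of the form "left" ! "right" into its two quoted parts.
    // Returns false (and empty strings) when the input has a different shape.
    public static bool TrySplit(string input, out string left, out string right)
    {
        Match match = Regex.Match(input, "^\"(.*)\" ! \"(.*)\"$");
        left = match.Success ? match.Groups[1].Value : string.Empty;
        right = match.Success ? match.Groups[2].Value : string.Empty;
        return match.Success;
    }
}
```

For the input from the question, left becomes Hello, World! and right becomes Some other string; the exclamation mark inside the first part is kept because only the full `" ! "` sequence separates the groups.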

Assuming the separator will always be the string \" ! \", you can use the Split overload that takes a string array and pass that string as its only element.
string input = "\"Hello, World!\" ! \"Some other string\"";
var data = input.Split(new string[] { "\" ! \"" }, StringSplitOptions.None);

An alternative approach to the others is to use a state machine-style approach - if your start and end string delimiters were different (e.g. if they were < and > instead of ") then this would work better to support nested strings than a regex approach.
void Main()
{
var stateMachine = new StringSplitStateMachine();
stateMachine.Split("hi"); // hi
stateMachine.Split("hi!lo"); // hi, lo
stateMachine.Split("\"hi!lo\""); // "hi!lo"
stateMachine.Split("\"hi\"!lo"); // "hi", lo
}
public class StringSplitStateMachine
{
private readonly char _splitCharacter;
private readonly char _stringDelimiter;
public StringSplitStateMachine(char splitCharacter = '!', char stringDelimiter = '"')
{
_splitCharacter = splitCharacter;
_stringDelimiter = stringDelimiter;
}
public IEnumerable<string> Split(string input)
{
bool insideString =false;
var currentString = new StringBuilder();
foreach(var character in input)
{
if (character == _splitCharacter && !insideString)
{
yield return currentString.ToString();
currentString.Clear();
}
else
{
if (character == _stringDelimiter)
{
insideString = !insideString;
}
currentString.Append(character);
}
}
if (currentString.Length > 0)
{
yield return currentString.ToString();
}
}
}

How to correctly convert letters to numbers?

I have a string that contains many letters. I used the following code to convert it to numbers, but the new string t still is not what I expect.
For example:
tung2003 -> -1-1-1-12003
What I expected: 1161171101032003 (116 is the ASCII code of t, 117 is the ASCII code of u, and so on)
string t=null;
foreach (char c in Properties.Settings.Default.password)
{
int ascii = (int)Char.GetNumericValue(c);
int counter=0;
counter = ascii;
t = t + Convert.ToString(counter);
}
The problem is the - character. I want my new string to consist only of digits.

It looks like you do not want the ASCII values of the numbers based on your expected output. In that case you can just do something like this:
string input = "tung2003";
string output = string.Empty;
foreach(char c in input)
{
if(char.IsNumber(c))
{
output += c;
}
else
{
output += ((byte)c).ToString();
}
}
//output is now: 1161171101032003

Also added as a Linq expression for a short hand solution.
// Method 1 Linq
string output = string.Concat(("tung2003".ToCharArray()
.Select(s=> char.IsDigit(s) ? s.ToString() : ((int)s).ToString())));
// Method 2
string input = "tung2003";
string output = string.Empty;
foreach (char c in input)
{
if (Char.IsDigit(c)) output += c.ToString();
else output += ((int)c).ToString();
}

Extrapolating your output it looks like you want two different things. You want to tally each ascii character as long as it is a letter and extract the numeric values to append. The following provides three options, the first is to tally the ascii values from letters and the other two are ways to extract only digits. Because your code example uses a Password I am assuming you are trying to do some sort of custom hashing and if that is the case you should use a Hash implementation from the Cryptography namespace or some other package.
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApp5
{
class Program
{
static void Main(string[] args)
{
var combined = OnlyLettersToAscii("tung2003") + OnlyNumbers("tung2003");
Console.WriteLine($"Input: tung2003 Output: {OnlyNumbers("tung2003")}");
Console.WriteLine($"Input: tung2003 Output Regex: {OnlyNumbersWithRegex("tung2003")}");
Console.ReadKey();
}
private static string OnlyLettersToAscii(string originalString)
{
if (string.IsNullOrWhiteSpace(originalString)) return originalString;
return string.Join(string.Empty, originalString.ToArray()
.Where(w => char.IsLetter(w))
.Select(s => ((int)s).ToString()));
}
private static string OnlyNumbers(string originalString)
{
if (string.IsNullOrWhiteSpace(originalString)) return originalString;
return new string(originalString.Where(w => char.IsDigit(w)).ToArray());
}
public static string OnlyNumbersWithRegex(string originalString)
{
return Regex.Replace(originalString, @"[^\d]", string.Empty);
}
}
}
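If the goal really is some form of hashing, the Cryptography namespace mentioned above could be used roughly like this. This is a minimal sketch assuming a plain SHA-256 digest is what is wanted; HashSketch and Sha256Hex are made-up names, and for storing passwords a salted key-derivation function would normally be the better fit.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashSketch
{
    // Returns the SHA-256 digest of the UTF-8 bytes of input as lowercase hex.
    public static string Sha256Hex(string input)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            // BitConverter.ToString yields "BA-78-16-..."; strip the dashes.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The result is always 64 hexadecimal characters, regardless of the input length.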

string t = "";
foreach (char c in Properties.Settings.Default.password)
{
    // Keep digits as they are; replace every other character with its character code.
    if (char.IsDigit(c)) t += c.ToString();
    else
    {
        t += System.Convert.ToInt32(c).ToString();
    }
}
Moreover, if you just want to get rid of '-', use this code: t = t.Replace("-", "");

How to remove a portion of string

I want to remove the words Test and Leaf only from the beginning of the specified string, not from the end, so the string Test_AA_234_6874_Test should become AA_234_6874_Test. But when I use .Replace, it replaces the word Test everywhere, which I don't want. How can I do this?
This is the code I have so far:
string st = "Test_AA_234_6874_Test";
st = st.Replace("Test_","");

You could use a regex to do this. The third argument of the instance Regex.Replace method specifies how many times you want to replace.
string st = "Test_AA_234_6874_Test";
var regex = new Regex("(Test|Leaf)_");
var value = regex.Replace(st, "", 1);
Or, if the string to replace only occurs at the start, just use ^, which asserts the position at the start of the string.
string st = "Test_AA_234_6874_Test";
var regex = new Regex("^(Test|Leaf)_");
var value = regex.Replace(st, "");
If you know that you always have to remove the first 5 characters, you can also use Substring, which performs better.
string st = "Test_AA_234_6874_Test";
var value = st.Substring(5, st.Length - 5);

The simplest way to do this is by using a Regular Expression like so.
using System;
using System.Text.RegularExpressions;
using System.Text;
namespace RegExTest
{
class Program
{
static void Main(string[] args)
{
var input = "Test_AA_234_6874_Test";
var matchText = "Test_";
var replacement = String.Empty;
var regex = new Regex("^" + matchText);
var output = regex.Replace(input, replacement);
Console.WriteLine("Converted String: {0}", output);
Console.ReadKey();
}
}
}
The ^ will match text at the beginning of the string.
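When the prefix comes from a variable, as matchText does here, it may be worth escaping it before building the pattern so that any regex metacharacters in it are matched literally. A sketch under that assumption follows; PrefixRemover and RemovePrefix are illustrative names, and the trailing underscore is assumed from the question's sample string.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixRemover
{
    // Removes one of the given prefixes, plus the underscore that follows it,
    // but only when it appears at the very start of the input.
    public static string RemovePrefix(string input, params string[] prefixes)
    {
        // Regex.Escape makes each prefix match literally.
        string alternatives = string.Join("|", prefixes.Select(Regex.Escape));
        return Regex.Replace(input, "^(?:" + alternatives + ")_", string.Empty);
    }
}
```

For example, PrefixRemover.RemovePrefix("Test_AA_234_6874_Test", "Test", "Leaf") returns "AA_234_6874_Test", while a string such as "MyTest_AA" is returned unchanged because the match is anchored to the start.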

Consider checking whether the string starts with the prefix and/or ends with the suffix you want to remove, and decide the start position and length you'd like to keep. Then use the Substring method to get only the portion you need.
public string Normalize(string input, string prefix, string suffix)
{
// Validation
int length = input.Length;
int startIndex = 0;
if(input.StartsWith(prefix))
{
startIndex = prefix.Length;
length -= prefix.Length;
}
if (input.EndsWith (suffix))
{
length -= suffix.Length;
}
return input.Substring(startIndex, length);
}
Hope this helps.

string wordToRemoveFromBeginning = "Test_";
// StartsWith ensures that only a match at the very beginning is removed.
string cleanPath = st.StartsWith(wordToRemoveFromBeginning, StringComparison.Ordinal)
    ? st.Substring(wordToRemoveFromBeginning.Length)
    : st;

Use a regular expression.
var str1 = "Test_AA_234_6874_Test";
var str2 = "Leaf_AA_234_6874_Test";
str1 = Regex.Replace(str1, "^Test_", "");
str2 = Regex.Replace(str2, "^Leaf_", "");
Regex.Replace parameters are your input string (str1), the pattern you want to match, and what to replace it with, in this case an empty string. The ^ character anchors the match to the start of the string, so something like "MyTest_AA_234_6874_Test" would be returned unchanged.

I am gonna use some very simple code here
string str = "Test_AA_234_6874_Test";
string substring = str.Substring(0, 4);
if (substring == "Test" || substring == "Leaf")
{
str= str.Remove(0, 5);
}

Sprache parser and characters escaping

I haven't found an example of how to handle character escaping. I have found this code example:
static void Main(string[] args)
{
string text = "'test \\\' text'";
var result = Grammar.QuotedText.End().Parse(text);
}
public static class Grammar
{
private static readonly Parser<char> QuoteEscape = Parse.Char('\\');
private static Parser<T> Escaped<T>(Parser<T> following)
{
return from escape in QuoteEscape
from f in following
select f;
}
private static readonly Parser<char> QuotedTextDelimiter = Parse.Char('\'');
private static readonly Parser<char> QuotedContent =
Parse.AnyChar.Except(QuotedTextDelimiter).Or(Escaped(QuotedTextDelimiter));
public static Parser<string> QuotedText = (
from lquot in QuotedTextDelimiter
from content in QuotedContent.Many().Text()
from rquot in QuotedTextDelimiter
select content
).Token();
}
It parses the text successfully if it contains no escaped characters, but it fails on text that contains escaped characters.

I had a similar problem, parsing strings using " as delimiter and \ as escape character. I wrote a simple parser for this (may not be the most elegant solution) and it seems to work nicely.
You should be able to adapt it, since the only difference appears to be the delimiter.
var escapedDelimiter = Parse.String("\\\"").Text().Named("Escaped delimiter");
var singleEscape = Parse.String("\\").Text().Named("Single escape character");
var doubleEscape = Parse.String("\\\\").Text().Named("Escaped escape character");
var delimiter = Parse.Char('"').Named("Delimiter");
var simpleLiteral = Parse.AnyChar.Except(singleEscape).Except(delimiter).Many().Text().Named("Literal without escape/delimiter character");
var stringLiteral = (from start in delimiter
from v in escapedDelimiter.Or(doubleEscape).Or(singleEscape).Or(simpleLiteral).Many()
from end in delimiter
select string.Concat(start) + string.Concat(v) + string.Concat(end));
The key part is from v in .... It searches for escaped delimiters first, then for double escape characters and then for single escape characters before trying to parse it as a "simpleLiteral" w/o any escape or delimiter characters. Changing the order here would result in parse errors (e.g if you would try to parse single escape before escaped delimiters, you would never find the latter, same for double escapes and single escape).
This step is repeated many times, until an unescaped delimiter occurs (from v in ... does not handle unescaped delimiters, but from end in delimiter does, of course).

I had a requirement to parse string literals that can be denoted with single-quote or double-quotes, and moreover also support escaping of those.
The method generating the string literal parser:
private readonly StringBuilder _reusableStringBuilder = new StringBuilder();
private Parser<string> BuildStringLiteralParser(char delimiterChar)
{
var escapeChar = '\\';
var delimiter = Sprache.Parse.Char(delimiterChar);
var escape = Sprache.Parse.Char(escapeChar);
var escapedDelimiter = Sprache.Parse.String($"{escapeChar}{delimiterChar}");
var splitByEscape = Sprache.Parse.AnyChar
.Except(delimiter.Or(escape))
.Many()
.Text()
.DelimitedBy(escapedDelimiter);
string BuildStr(IEnumerable<IEnumerable<string>> splittedByEscape)
{
_reusableStringBuilder.Clear();
var i = 0;
foreach (var splittedByEscapedDelimiter in splittedByEscape)
{
if (i > 0)
{
_reusableStringBuilder.Append(escapeChar);
}
var j = 0;
foreach (var str in splittedByEscapedDelimiter)
{
if (j > 0)
{
_reusableStringBuilder.Append(delimiterChar);
}
_reusableStringBuilder.Append(str);
j++;
}
i++;
}
return _reusableStringBuilder.ToString();
}
return (from ln in delimiter
from splittedByEscape in splitByEscape.DelimitedBy(escape)
from rn in delimiter
select BuildStr(splittedByEscape)).Named("string");
}
Usage:
var stringParser = BuildStringLiteralParser('\"').Or(BuildStringLiteralParser('\''));
var str1 = stringParser.Parse("\"'Hello' \\\"John\\\"\"");
Console.WriteLine(str1);
var str2 = stringParser.Parse("\'\\'Hello\\' \"John\"\'");
Console.WriteLine(str2);
Output:
'Hello' "John"
'Hello' "John"
Check the working demo:
https://dotnetfiddle.net/8wFNbj

Trim a string in c# after special character

I want to trim a string after a special character.
Let's say the string is str="arjunmenon.uking". I want to get the characters after the . and ignore the rest, i.e. the resulting string must be restr="uking".

How about:
string foo = str.EverythingAfter('.');
using:
public static string EverythingAfter(this string value, char c)
{
if(string.IsNullOrEmpty(value)) return value;
int idx = value.IndexOf(c);
return idx < 0 ? "" : value.Substring(idx + 1);
}

You can use something like:
string input = "arjunmenon.uking";
int index = input.LastIndexOf(".");
input = input.Substring(index + 1);

Use Split function
Try this
string[] restr = str.Split('.');
//restr[0] contains arjunmenon
//restr[1] contains uking

char special = '.';
var restr = str.Substring(str.IndexOf(special) + 1).Trim();

Try Regular Expression Language
using System.IO;
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "arjunmenon.uking";
string pattern = #"[a-zA-Z0-9].*\.([a-zA-Z0-9].*)";
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
if (match.Groups.Count > 1)
for (int ctr = 1; ctr < match.Groups.Count; ctr++)
Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
}
}
}
Result:
arjunmenon.uking
Group 1: uking

Personally, I wouldn't split and then hard-code index [1] of the resulting array: that only works if you already know the part you want sits at that position.
After you split, just take the last item in the array.
char separator = '.';
string text = "my.string.is.evil";
string[] parts = text.Split(separator);
string restr = parts[parts.Length - 1];
The variable restr will be "evil".

string str = "arjunmenon.uking";
string[] splitStr = str.Split('.');
string restr = splitStr[1];

Unlike the index-based methods, this one needs no checks for an empty string or for the presence of your special character, and it will not throw when the string is empty or does not contain the special character (Last() requires using System.Linq):
string str = "arjunmenon.uking";
string restr = str.Split('.').Last();
You may find all the info you need here : http://msdn.microsoft.com/fr-fr/library/b873y76a(v=vs.110).aspx
cheers

I think the simplest way will be this:
string restr, str = "arjunmenon.uking";
restr = str.Substring(str.LastIndexOf('.') + 1);
