String Cleaning in C#

I am trying to write a function that takes a string of words as input, removes all single-character words, and returns the new string without them.
E.g.:
string news = FunctionName("This is a test");
//'news' here should be "This is test".
Can you please help?

Obligatory LINQ one-liner:
string.Join(" ", "This is a test".Split(' ').Where(x => x.Length != 1).ToArray())
Or as a nicer extension method:
void Main()
{
var output = "This is a test".WithoutSingleCharacterWords();
}
public static class StringExtensions
{
public static string WithoutSingleCharacterWords(this string input)
{
var longerWords = input.Split(' ').Where(x => x.Length != 1).ToArray();
return string.Join(" ", longerWords);
}
}

I'm sure there's a nicer answer using regex, but you could do the following:
string[] words = news.Split(' ');
StringBuilder builder = new StringBuilder();
foreach (string word in words)
{
if (word.Length > 1)
{
if (builder.Length == 0)
{
builder.Append(word);
}
else
{
builder.Append(" " + word);
}
}
}
string result = builder.ToString();

The interesting thing about this question is that presumably you also want to remove one of the spaces surrounding the single-letter word.
string[] oldText = {"This is a test", "a test", "test a"};
foreach (string s in oldText) {
string newText = Regex.Replace(s, @"\s\w\b|\b\w\s", string.Empty);
Console.WriteLine("'" + s + "' --> '" + newText + "'");
}
Output...
'This is a test' --> 'This is test'
'a test' --> 'test'
'test a' --> 'test'

With Linq syntax, you could do something like
return string.Join(" ", from word in input.Split(' ') where word.Length > 1 select word);

string str = "This is a test.";
var result = str.Split(' ').Where(s => s.Length > 1).Aggregate((s, next) => s + " " + next);
Update: the same thing as an extension method:
public static string RemoveSingleChars(this string str)
{
return str.Split(' ').Where(s => s.Length > 1).Aggregate((s, next) => s + " " + next);
}
//----------Usage----------//
var str = "This is a test.";
var result = str.RemoveSingleChars();
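
A note on the answers above: the Split(' ') variants assume words are separated by exactly one space, and the Aggregate variants throw an InvalidOperationException when no word survives the filter (for example, for the input "a"). The following is a minimal sketch of one way around both, not a definitive implementation. The name RemoveSingleCharWords is made up for illustration, and the code is written as a C# 9+ top-level program so it can be run as-is.

```csharp
using System;
using System.Linq;

// Hypothetical helper (the name is illustrative): drops one-character words.
static string RemoveSingleCharWords(string input)
{
    // RemoveEmptyEntries discards the empty strings produced by runs of separators.
    var words = input
        .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(w => w.Length > 1);

    // string.Join returns "" for an empty sequence, whereas Aggregate would throw.
    return string.Join(" ", words);
}

Console.WriteLine(RemoveSingleCharWords("This is a test"));
Console.WriteLine(RemoveSingleCharWords("a"));
```

Because the words are re-joined with a single space, any extra whitespace in the input is normalized as a side effect, which may or may not be what you want.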

Related

Text editing small program

The program should return the edited text, with each of the separators " - ", ": ", "; ", ", " and " " replaced by "\t".
The problem is the result:
Input: Китай: 1405023000; 24.08.2020; 17.99%
Expected Китай 1405023000 24.08.2020 17.99%
Mine Китай: 1405023000; 24.08.2020; 17.99%
For some reason the punctuation is not removed. I suspect it has something to do with the order of the `stringSeparators` elements, and that is the part I would like to understand.
public static string ReplaceIncorrectSeparators(string text)
{
string populationEdited = "";
string[] stringSeparators = new string[] {" - ", ": ", "; ", ", ", " "};
for (int i = 0; i < stringSeparators.Length; i++)
{
populationEdited = text.Replace(stringSeparators[i], "\t");
}
return populationEdited;
}
I've already solved the problem in another way but I want to solve it with separators.

The main problem in your code is that it doesn't store the result of Replace properly. This should do the trick:
public static string ReplaceIncorrectSeparators(string text)
{
string populationEdited = text; // You need to start with the original
string[] stringSeparators = new string[] {" - ", ": ", "; ", ", ", " "};
for (int i = 0; i < stringSeparators.Length; i++)
{
// And here instead of text.Replace you do populationEdited.Replace
populationEdited = populationEdited.Replace(stringSeparators[i], "\t");
}
return populationEdited;
}

You could use Regex as an alternative. It would make your code shorter (and, in my opinion, more readable).
public static string ReplaceIncorrectSeparators(string text)
{
Regex regex = new Regex(@" - |: |; |, | ");
return regex.Replace(text, "\t");
}
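
To illustrate why the order of the separators matters in the loop-based fix, here is a small sketch under the assumption that each Replace result is fed into the next one. The helper name ReplaceSeparators is hypothetical, and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Linq;

// Hypothetical helper: applies the separators in order, passing each result to the next Replace.
static string ReplaceSeparators(string text, params string[] separators)
{
    return separators.Aggregate(text, (current, sep) => current.Replace(sep, "\t"));
}

string input = "Китай: 1405023000; 24.08.2020; 17.99%";

// Longer separators first: ": " and "; " are consumed before the bare space.
string good = ReplaceSeparators(input, " - ", ": ", "; ", ", ", " ");

// Bare space first: the punctuation stays, because ": " no longer exists once every " " is gone.
string bad = ReplaceSeparators(input, " ", " - ", ": ", "; ", ", ");

// Print with visible tabs so the difference is easy to see.
Console.WriteLine(good.Replace("\t", "\\t"));
Console.WriteLine(bad.Replace("\t", "\\t"));
```

The same ordering concern applies to the regex alternation: the single space has to be the last alternative, otherwise it can match before the longer separators get a chance.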

Split and Append "AND" between values

How can I split the value below and append AND between the values?
I cannot split on spaces because there are spaces inside the quoted names.
"\"Mark John\" \"Tina Roy\""
as
"\"Mark John\" AND \"Tina Roy\""
In the end it should look like -
"Mark John" AND "Tina Roy"
Any help is appreciated.
string operatorValue = " AND ";
if (!string.IsNullOrEmpty(operatorValue))
{
foreach (string searchVal in SearchRequest.Text.Split(' '))
{
if (!string.IsNullOrEmpty(searchVal))
searchValue += searchVal + operatorValue;
}
}
int index = searchValue.LastIndexOf(operatorValue);
if (index != -1)
{
outputSearchValue = searchValue.Substring(0, index);
}

Try
var result = str.Replace("\" \"","\" And \"");
If you have more than one name, or there is a possibility that you could have more than one whitespace between two names, you could opt for Regex.
var result = Regex.Replace(str,"\"\\s+\"","\" And \"");
Example,
var str = "\"Mark John\" \"Tina Roy\" \"Anu Viswan\"";
var result = Regex.Replace(str,"\"\\s+\"","\" And \"");
Output
"Mark John" And "Tina Roy" And "Anu Viswan"

Or use Regular Expressions:
var test = "\"John Smith\" \"Bill jones\" \"Bob Norman\"";
Console.WriteLine(Regex.Replace(test, "\" \"", "\" AND \""));

Instead of splitting, replace the " " between the names with " AND ", keeping the surrounding quotes:
var test = "\"Mark John\" \"Tina Roy\"";
var new_string = test.Replace("\" \"", "\" AND \"");
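
The Replace-based answers depend on the exact whitespace between a closing and an opening quote. As an alternative sketch, assuming every search term is wrapped in double quotes, the quoted phrases can be extracted with Regex.Matches and joined. The helper name JoinQuotedWithAnd is made up for illustration, and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every "..." phrase and joins the phrases with " AND ".
static string JoinQuotedWithAnd(string search)
{
    // The pattern matches an opening quote, any run of non-quote characters, and the closing quote.
    var phrases = Regex.Matches(search, "\"[^\"]*\"")
        .Cast<Match>()
        .Select(m => m.Value);

    return string.Join(" AND ", phrases);
}

Console.WriteLine(JoinQuotedWithAnd("\"Mark John\"   \"Tina Roy\""));
```

Note that anything outside quotes is silently dropped by this approach, so it only fits inputs that consist entirely of quoted terms.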

Making first 3 characters to uppercase using regex

Consider the following helper method
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
Now when called as:
static void Main(string[] args)
{
string a = "HelloWorld";
Console.WriteLine(CultureInfo.CurrentCulture.TextInfo.ToTitleCase(a.ToSentenceCase()));
}
This will output Hello World which works great.
Using this method, I'm trying to change the first 3 characters to uppercase if the string starts with RMA. Is there a way to achieve this using a regex, or would I have to create another method and call it on the string returned from ToSentenceCase()?
So if I had a string rmainfo I would want RMA Info

You may use:
public static string ToSentenceCase(this string str)
{
var temp = Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
return Regex.Replace(temp, "^rma.", m => m.Value.Substring(0, 3).ToUpper() + " " + char.ToUpper(m.Value[3]), RegexOptions.IgnoreCase);
}

If you insist on doing it with a regular expression, this could work:
var str = "rmaHelloWorld";
var str1 = Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
var str2 = Regex.Replace(str1, "^rma", m => m.Value.ToUpper());
Console.WriteLine(str2);
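
For reference, the two steps can be combined so that the prefix is uppercased and separated whether or not a camel-case boundary follows it. This is only a sketch, assuming that any input starting with "rma" (in any casing) should be treated as the RMA prefix. It is written as a plain static function rather than an extension method so that it runs as a C# 9+ top-level program.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static string ToSentenceCase(string str)
{
    // Step 1: split camel-case boundaries, e.g. "helloWorld" -> "hello world".
    string spaced = Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));

    // Step 2: uppercase a leading "rma" and follow it with exactly one space.
    // TrimEnd removes that space again when nothing comes after the prefix.
    return Regex.Replace(spaced, @"^rma\s*", "RMA ", RegexOptions.IgnoreCase).TrimEnd();
}

// ToTitleCase is applied afterwards, as in the question.
Console.WriteLine(CultureInfo.InvariantCulture.TextInfo.ToTitleCase(ToSentenceCase("rmainfo")));
Console.WriteLine(CultureInfo.InvariantCulture.TextInfo.ToTitleCase(ToSentenceCase("HelloWorld")));
```

Be aware that this splits every word that merely begins with those three letters, so a stricter check may be needed if such words can occur in your data.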

Removing multiple instances of words from a single set in a string

I have a string street that may contain:
street= "Siegfriedst strasse st 16.";
street= "Frontos strasse s .";
I want to remove the extra "st", "strasse" and "s".
I used:
street= street.Replace("(", "").Replace(")", "").Replace(".", "").
Replace("-", "").Replace("strasse","").
Replace("st","").Replace("s","");
But I don't want to remove "st" from "Siegfriedst" and "s" from "Frontos".

Perhaps this is what you want, it's not clear if you only want to remove duplicate words or sub-strings:
public static string RemoveDuplicates(string input, params string[] wordsToCheck)
{
var wordSet = new HashSet<string>(wordsToCheck);
int taken = 0;
var newWords = input.Split()
.Where(w => !wordSet.Contains(w) || ++taken == 1); // keep only the first listed word, drop the rest
return string.Join(" ", newWords);
}
Usage:
string text = RemoveDuplicates("Siegfriedst strasse st 16.", "st", "strasse", "s");
Result: Siegfriedst strasse 16.

street = street.Replace(".", " ") //To better enable pattern matching
.Replace(" strasse ", " ") //Replace with a space so the neighbouring words are not joined
.Replace(" st ", " ")
.Replace(" s ", " ")
.Replace("  ", " ") //Collapse double spaces
.Trim(); //Trim() removes the leading and trailing whitespaces

street = street.Replace("(", "")
.Replace(")", "")
.Replace(".", "")
.Replace("-", "")
.Replace(" strasse "," ")
.Replace(" st "," ")
.Replace(" s "," ");

You can write a helper method like:
public static string Cleanup(string text, string[] exclude)
{
string[] parts = text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
List<string> words = new List<string>();
foreach(string part in parts)
{
if (!exclude.Contains(part))
{
words.Add(part);
}
}
return string.Join(" ", words.ToArray());
}
and then use it like:
string street = Cleanup("Siegfriedst strasse st 16.", new string[] { "strasse", "st", "s", " " });

You can use regular expressions. This will match whole words "strasse" "st" or "s", but not in parts of words:
using System.Text.RegularExpressions;
Regex rgx = new Regex(@"\b(strasse|st|s)\b|\(|\)|\.");
street = rgx.Replace(street, "");

Based on the fact that you used concatenated Replace operations (thus not wanting any of the extra strings to appear) I would suggest the following LINQ query:
street = street.Split(' ').Where(s => s != "strasse" && s != "st" && s != "s").Aggregate((x, y) => x + " " + y);
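
Several of the answers above leave doubled spaces behind, or rely on a space on both sides of the word. A minimal sketch that combines whole-word matching with a whitespace cleanup pass is shown here. The name CleanStreet is hypothetical, punctuation is deliberately left untouched, and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes the filler tokens, then tidies the whitespace.
static string CleanStreet(string street)
{
    // \b ensures the tokens are removed only when they stand alone as whole words.
    string withoutFillers = Regex.Replace(street, @"\b(?:strasse|st|s)\b", "");

    // Collapse the whitespace left behind and trim both ends.
    return Regex.Replace(withoutFillers, @"\s+", " ").Trim();
}

Console.WriteLine(CleanStreet("Siegfriedst strasse st 16."));
Console.WriteLine(CleanStreet("Frontos strasse s ."));
```

The match is case-sensitive as written. Passing RegexOptions.IgnoreCase to the first Replace would also catch "Strasse", if that is wanted.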

Checking for and removing any characters in a string

I am wondering what would be the best way to specify an array of characters like,
{
}
[
]
and then check a string for these and if they are there, to completely remove them.
if (compiler.Parser.GetErrors().Count == 0)
{
AstNode root = compiler.Parse(phrase.ToLower());
if (compiler.Parser.GetErrors().Count == 0)
{
try
{
fTextSearch = SearchGrammar.ConvertQuery(root, SearchGrammar.TermType.Inflectional);
}
catch
{
fTextSearch = phrase;
}
}
else
{
fTextSearch = phrase;
}
}
else
{
fTextSearch = phrase;
}
string[] brackets = new string[]
{
"{",
"}",
"[",
"]"
};
string[] errorChars = new string[]
{
"'",
"&"
};
StringBuilder sb = new StringBuilder();
string[] splitString = fTextSearch.Split(errorChars, StringSplitOptions.None);
int numNewCharactersAdded = 0;
foreach (string itm in splitString)
{
sb.Append(itm); //append string
if (fTextSearch.Length > (sb.Length - numNewCharactersAdded))
{
sb.Append(fTextSearch[sb.Length - numNewCharactersAdded]); //append splitting character
sb.Append(fTextSearch[sb.Length - numNewCharactersAdded - 1]); //append it again
numNewCharactersAdded++;
}
}
string newString = sb.ToString();

A regular expression can do this far more easily:
var result = Regex.Replace(input, @"[[\]()]", "");
This uses a character set ([...]) to match any one of the characters in it and replaces each match with nothing. Regex.Replace replaces all matches. Add { and } to the set if you want to remove braces as well.

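Another concise way is using Enumerable.Except to get the set difference of the chars (assuming brackets is a collection of chars). Be aware that Except is a set operation, so it also removes repeated characters from the string; use it only if that is acceptable:
String newString = new String(oldString.Except(brackets).ToArray());
To keep duplicates, filter with Where instead:
String newString = new String(oldString.Where(c => !brackets.Contains(c)).ToArray());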

string str = "faslkjnro(fjrmn){ferqwe}{{";
char[] separators = new []{'[', ']','{','}' };
var sb = new StringBuilder();
foreach (var c in str)
{
if (!separators.Contains(c))
{
sb.Append(c);
}
}
return sb.ToString();

How about this:
string myString = "a12{drr[ferr]vgb}rtg";
myString = myString.Replace("[", "").Replace("{", "").Replace("]", "").Replace("}", "");
You end up with:
a12drrferrvgbrtg

I don't know if I understand your problem, but you can solve your problem with this:
string toRemove = "{}[]";
string result = your_string_to_be_searched;
foreach(char c in toRemove)
result = result.Replace(c.ToString(), "");
or with an extension method
static class Extensions
{
public static string RemoveAll(this string src, string chars)
{
foreach(char c in chars)
src= src.Replace(c.ToString(), "");
return src;
}
}
With this you can use string result = your_string_to_be_searched.RemoveAll("{}[]");

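string charsToRemove = @"[]{}";
// Regex.Escape does not escape ']', which would end the character class early, so escape it explicitly
string pattern = string.Format("[{0}]", Regex.Escape(charsToRemove).Replace("]", @"\]"));
var result = Regex.Replace(input, pattern, "");
The primary advantage of this over some of the other similar answers is that you mostly don't have to work out which characters need escaping in a regex; the library takes care of that. The exceptions are characters that are only special inside a character class, such as ] (handled above) and -, which Regex.Escape leaves alone.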

You can do this in a pretty compact fashion like this:
string s = "ab{c[d]}";
char[] ca = new char[] {'{', '}', '[', ']'};
Array.ForEach(ca, e => s = s.Replace(e.ToString(), ""));
Or this:
StringBuilder s = new StringBuilder("ab{c[d]}");
char[] ca = new char[] {'{', '}', '[', ']'};
Array.ForEach(ca, e => s.Replace(e.ToString(), ""));

Taken from this answer: https://stackoverflow.com/a/12800424/1498669
Just use .Split() with the char[] of your desired removeables and recapture it with .Join() or .Concat()
char[] delChars = "[]{}<>()".ToCharArray();
string input = "some (crazy) string with brac[et]s in{si}de";
string output = string.Join(string.Empty, input.Split(delChars));
//or
string output = string.Concat(input.Split(delChars));
References:
https://learn.microsoft.com/en-us/dotnet/csharp/how-to/parse-strings-using-split
https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings#code-try-4
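
As a regex-free alternative to the approaches above, a single pass with a HashSet<char> avoids creating an intermediate string for every character removed. This is a sketch only; the name RemoveChars is made up for illustration, and the code is written as a C# 9+ top-level program. The last line shows, for contrast, how the Enumerable.Except approach treats repeated characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps every character that is not in the removal set.
static string RemoveChars(string source, string charsToRemove)
{
    var unwanted = new HashSet<char>(charsToRemove);
    return new string(source.Where(c => !unwanted.Contains(c)).ToArray());
}

Console.WriteLine(RemoveChars("a12{drr[ferr]vgb}rtg", "{}[]"));

// For contrast: Except is a set difference, so it also drops the second 'l'.
Console.WriteLine(new string("hello".Except("x").ToArray()));
```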
