How to Fix Malformed Whitespace Separator using C#


I have the following string:
string testString = ",,,The,,,boy,,,kicked,,,the,,ball";
I want to remove the unwanted commas and print the sentence to the console like this:
The boy kicked the ball
I tried the below code:
string testString = ",,,The,,,boy,,,kicked,,,the,,ball";
string manipulatedString = testString.Replace(",", " "); //line 2
Console.WriteLine(manipulatedString.Trim() + "\n");
string result = Regex.Replace(manipulatedString, "  ", " ");
Console.WriteLine(result.TrimStart());
However, I end up with double spaces between the words:
The  boy  kicked  the ball
I can see why I get this output: in line 2 I replace every comma (,) with a space, and it does that for every occurrence, so each run of commas becomes a run of spaces.
What's the best way to solve this?

This is a simple solution using Split and Join
string testString = ",,,The,,,boy,,,kicked,,,the,,ball";
var splitted = testString.Split(new char[] {','}, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(" ", splitted);
Console.WriteLine(result);

You could use regex to replace the pattern ,+ (one or more occurrences of a comma) by a whitespace.
var replacedString = Regex.Replace(testString, ",+", " ").Trim();
I added Trim to remove the leftover spaces at the beginning and end, as I assume you want them removed.
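Both answers can be checked side by side in one small program. This is a sketch assuming C# 9 or later (top-level statements); the helper names ViaSplitJoin and ViaRegex are hypothetical and only exist for this example.

```csharp
using System;
using System.Text.RegularExpressions;

string testString = ",,,The,,,boy,,,kicked,,,the,,ball";

Console.WriteLine(ViaSplitJoin(testString)); // The boy kicked the ball
Console.WriteLine(ViaRegex(testString));     // The boy kicked the ball

// Answer 1: split on commas, drop the empty pieces, then join the words with one space.
static string ViaSplitJoin(string input) =>
    string.Join(" ", input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));

// Answer 2: turn each run of commas into one space, then trim the space left at either end.
static string ViaRegex(string input) =>
    Regex.Replace(input, ",+", " ").Trim();
```

The two agree on this input. They can differ if the input also contains other whitespace at the ends, because Trim() strips tabs and newlines too, while the Split/Join version only treats commas as separators.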

Related

Smarter string replace based on a pattern

I have a string that looks like this:
string1 + \t\t\t\t\t\t\t + string2
string1 can be anything and string2 can be one of the following: Display, Search, Fee. The number of escaped tab characters varies: sometimes I get 10, sometimes 5, sometimes some other number N. I only expect one \t character between string1 and string2.
What I have so far:
string newLine0 = line.Replace("\t\t\t\t\t\t\t\t\t\t\t\t\t\tDisplay", "\tDisplay");
string newline1 = newLine0.Replace("\t\tFee", "\tFee");
string newLine2 = newline1.Replace("\t\tSearch", "\tSearch");
string newLine3 = newLine2.Replace("\t\t\t\t\t\t\t\t\t\t\t\tDisplay", "\tDisplay");
string newLine4 = newLine3.Replace("\t\tDisplay", "\tDisplay");
Is there a better way to do this with cleaner code and fewer variables?

It seems like you could simply replace instances of more than one \t with a single \t:
string newLine = Regex.Replace(line, @"\t{2,}", "\t");
If you only want to remove extra tabs when one of the words Display, Fee or Search follows, use:
string newLine = Regex.Replace(line, @"\t{2,}(?=Display|Fee|Search)", "\t");
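A quick way to see the difference between the two patterns is to run them on a line where the tabs precede a keyword and on one where they do not. A minimal sketch; the sample text "Some text" and the helper names CollapseTabs and CollapseTabsBeforeKeyword are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "Some text\t\t\t\t\tDisplay";

// Collapse every run of two or more tabs, wherever it occurs.
Console.WriteLine(CollapseTabs(line) == "Some text\tDisplay");              // True
// Collapse a run only when Display, Fee or Search follows it.
Console.WriteLine(CollapseTabsBeforeKeyword(line) == "Some text\tDisplay"); // True

static string CollapseTabs(string s) => Regex.Replace(s, @"\t{2,}", "\t");

// The lookahead (?=...) checks the following word without consuming it,
// so the keyword itself is kept in the output.
static string CollapseTabsBeforeKeyword(string s) =>
    Regex.Replace(s, @"\t{2,}(?=Display|Fee|Search)", "\t");
```

With an input such as "a\t\t\tOther", the first helper still collapses the tabs, while the second leaves the line unchanged.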

If N tabs precede a word, make N be 1:
string newLine = Regex.Replace(line, @"\b(\t+)(\t\w)", "$+");
\b - starting from a word boundary (the end of string1)
(\t+) - match one or more tabs (first group)
(\t\w) - followed by exactly one more tab and the first character of the word (second group)
$+ - replace the whole match with the last captured group, i.e. a single tab followed by that character
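The same rule can also be written without capture groups: match a run of tabs only when a word character comes next (lookahead (?=\w)) and replace the run with one tab. This is a swapped-in variant, not the pattern from the answer above, and NormalizeTabs is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace any run of tabs that is directly followed by a word character with a single tab.
// Tabs followed by anything else (or by the end of the string) are left alone.
static string NormalizeTabs(string s) => Regex.Replace(s, @"\t+(?=\w)", "\t");

Console.WriteLine(NormalizeTabs("string1\t\t\t\t\t\t\tSearch") == "string1\tSearch"); // True
Console.WriteLine(NormalizeTabs("string1\tFee") == "string1\tFee");                   // True
```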

Regex Split - escape the same character I use for split

I have a string:
string data = "SEQUENCE $FIRST$ THEN $SECOND$ AND FINALLY \\$12345";
I want to split it with Regex on the "$" character, but I want to use \ as an escape character.
string[] sComponents = Regex.Split(data, "(\\$)", RegexOptions.ExplicitCapture);
By running the code above I would get:
sComponents[0] = "SEQUENCE "
sComponents[1] = "FIRST"
sComponents[2] = " THEN "
sComponents[3] = "SECOND"
sComponents[4] = " AND FINALLY "
sComponents[5] = "12345"
But I want sComponents[4] to contain the $ such as " AND FINALLY $12345"
What is the best way to achieve this? Does Regex have some kind of escape character when splitting, or do I have to handle this manually with my own logic before I call Regex.Split?
Basically, if Regex sees "$" it should split, but if it sees "\\$" it should ignore it and not split at that position.

Just split the input string with the regex below, which uses a negative lookahead:
\$(?!\d)
Code:
string value = "SEQUENCE $FIRST$ THEN $SECOND$ AND FINALLY $12345";
string[] lines = Regex.Split(value, @"\$(?!\d)");
foreach (string line in lines) {
Console.WriteLine(line);
}
Update:
Use the regex below to split the input on every $ symbol that is not preceded by two backslashes.
(?<!\\\\)\$
Code:
string value = "SEQUENCE $FIRST$ THEN $SECOND$ AND FINALLY \\\\$12345";
string[] lines = Regex.Split(value, @"(?<!\\\\)\$");
foreach (string line in lines) {
Console.WriteLine(line);
}
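To get exactly the output the question asks for (" AND FINALLY $12345", with the escape removed), the split can be followed by an unescape step. A sketch assuming the question's original input, where "\\$" in a regular C# literal is one backslash followed by '$'; the helper name SplitOnUnescapedDollar is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string data = "SEQUENCE $FIRST$ THEN $SECOND$ AND FINALLY \\$12345";

foreach (string part in SplitOnUnescapedDollar(data))
{
    Console.WriteLine($"[{part}]");
}
// [SEQUENCE ]
// [FIRST]
// [ THEN ]
// [SECOND]
// [ AND FINALLY $12345]

// Split on every '$' that is NOT preceded by a backslash (negative lookbehind),
// then turn each remaining "\$" escape back into a plain "$".
static string[] SplitOnUnescapedDollar(string input) =>
    Regex.Split(input, @"(?<!\\)\$")
         .Select(piece => piece.Replace(@"\$", "$"))
         .ToArray();
```

This sketch does not handle an escaped backslash in front of a delimiter (a literal "\\" followed by a real "$"); supporting that would need a tokenizer rather than a single lookbehind.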

Replace SubString on Partial match of word

I have two strings:
String S1 = "This is my\r\n string.";
String S2 = "This is my\n self.";
I want to have a generic method to replace any existence of "\n" to "\r\n". But it should not replace any part of the string if it already has "\r\n".

Use a regular expression with a negative lookbehind:
string result = Regex.Replace(input, @"(?<!\r)\n", "\r\n");
It matches all \n which are not preceded by \r.
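Applied to the two strings from the question, the lookbehind leaves S1 untouched and fixes S2. A minimal sketch; the lower-case variable names and the helper ToCrLf are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string s1 = "This is my\r\n string.";
string s2 = "This is my\n self.";

// Only a bare "\n" (one not preceded by '\r') is rewritten, so existing "\r\n" pairs stay as they are.
Console.WriteLine(ToCrLf(s1) == "This is my\r\n string."); // True
Console.WriteLine(ToCrLf(s2) == "This is my\r\n self.");   // True

static string ToCrLf(string input) => Regex.Replace(input, @"(?<!\r)\n", "\r\n");
```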

Try something like this (it assumes the placeholder character § never occurs in your input):
var unused = "§";
S2 =
S2
.Replace("\r\n", unused)
.Replace("\n", unused)
.Replace(unused, "\r\n");

Assuming you have well-behaved standard input text, i.e. no consecutive \r, you can simply use:
var result = S1.Replace("\n", "\r\n").Replace("\r\r", "\r");
This won't work in general, obviously.
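To make that caveat concrete: the chained Replace calls also collapse a "\r\r" pair that has nothing to do with line feeds, while the lookbehind regex from the first answer leaves it alone. The input below is a made-up example.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input with two consecutive carriage returns and no line feed at all.
string input = "a\r\rb";

string chained = input.Replace("\n", "\r\n").Replace("\r\r", "\r");
string lookbehind = Regex.Replace(input, @"(?<!\r)\n", "\r\n");

Console.WriteLine(chained == "a\rb");      // True: one '\r' was silently removed
Console.WriteLine(lookbehind == "a\r\rb"); // True: unchanged, since there is no '\n' to fix
```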

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and the end of the string.
The result should be: "i am a string"
How can I do this in C#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4, .Trim() removes any character for which Char.IsWhiteSpace returns true, so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
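A small sketch of that two-step approach. The zero-width space (U+200B) serves as an example of a character that Char.IsWhiteSpace does not report as white space; the .NET documentation notes that Trim() in .NET Framework 4 and later no longer removes it. The extraChars array is a hypothetical stand-in for the configurable list described above.

```csharp
using System;

// Example input: ordinary spaces/tabs on the outside, zero-width spaces just inside them.
string txt = " \t\u200Bi am a string\u200B \t";

// Hypothetical list of extra characters; in practice this could come from configuration.
char[] extraChars = { '\u200B' };

string defaultOnly = txt.Trim();           // removes the spaces and tabs, stops at U+200B
string both = txt.Trim().Trim(extraChars); // then strips the listed extra characters

Console.WriteLine(defaultOnly.Length); // 15
Console.WriteLine(both);               // i am a string
```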

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString"
It's not rocket science.

txt = txt.Trim();

Or you can split your string into a string array on spaces and then add every item of the array back to an empty string, separated by single spaces.
This may not be the best or fastest method, but you can try it if the other answers aren't what you want.
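That split-and-rejoin idea can be sketched with StringSplitOptions.RemoveEmptyEntries and string.Join:

```csharp
using System;

string txt = " i am a string ";

// Split on spaces and discard the empty entries produced by leading,
// trailing, or repeated spaces; then rejoin the words with single spaces.
string[] words = txt.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(" ", words);

Console.WriteLine($"[{result}]"); // [i am a string]
```

Unlike Trim(), this also collapses runs of spaces inside the string, which may or may not be what you want.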

txt.Trim() is what you should use:
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex: \s+ matches any run of whitespace (spaces, tabs, newlines).
Regex r = new Regex(@"\s+");
// C.
// Collapse each whitespace run in s1 into a single space.
string s3 = r.Replace(s1, " ");
Console.WriteLine(s3);
// D.
// Do the same for s2.
string s4 = r.Replace(s2, " ");
Console.WriteLine(s4);
Console.ReadLine();
}
}
OUTPUT:
He saw a cute dog.
There was another sentence.

You can use:
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

Regex removing double/triple comma in string

I need to parse a string so that the output looks like this:
"abc,def,ghi,klm,nop"
But the string I receive could look more like this:
",,,abc,,def,ghi,,,,,,,,,klm,,,nop"
The point is, I don't know in advance how many commas separates the words.
Is there a regex I could use in C# that could help me resolve this problem?

You can use the ,{2,} expression to match any occurrences of 2 or more commas, and then replace them with a single comma.
You'll probably need a Trim call in there too, to remove any leading or trailing commas left over from the Regex.Replace call. (It's possible that there's some way to do this with just a regex replace, but nothing springs immediately to mind.)
string goodString = Regex.Replace(badString, ",{2,}", ",").Trim(',');

Search for ,,+ and replace all with ,.
So in C# that could look like
resultString = Regex.Replace(subjectString, ",,+", ",");
,,+ means "match all occurrences of two commas or more", so single commas won't be touched. This can also be written as ,{2,}.

A simple solution without regular expressions:
string[] items = inputString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
string result = String.Join(",", items);

Actually, you can do it without any Trim calls.
text = Regex.Replace(text, "^,+|,+$|(?<=,),+", "");
should do the trick.
The idea behind the regex is to match only what we want to remove. The first alternative matches any run of consecutive commas at the start of the input string, the second matches any run of commas at the end, and the last matches any run of commas that follows a comma.
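The three working approaches from this thread can be run side by side on the question's sample input; all of them should produce the same result. A sketch, with variable names chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

string badString = ",,,abc,,def,ghi,,,,,,,,,klm,,,nop";

// 1. Collapse runs of 2+ commas, then trim the commas left at either end.
string viaCollapseAndTrim = Regex.Replace(badString, ",{2,}", ",").Trim(',');

// 2. Split on commas, drop the empty entries, rejoin with single commas.
string viaSplitJoin = string.Join(",",
    badString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));

// 3. Remove only the unwanted commas: leading ones, trailing ones,
//    and any comma that directly follows another comma.
string viaSingleRegex = Regex.Replace(badString, "^,+|,+$|(?<=,),+", "");

Console.WriteLine(viaCollapseAndTrim); // abc,def,ghi,klm,nop
Console.WriteLine(viaSplitJoin);       // abc,def,ghi,klm,nop
Console.WriteLine(viaSingleRegex);     // abc,def,ghi,klm,nop
```

Using ",,+" (or ",{2,}") on its own, without the Trim(','), would leave the leading comma in place and give ",abc,def,ghi,klm,nop" for this input.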

Here is my effort:
//Below is the test string
string test = "YK 002 10 23 30 5 TDP_XYZ ";
private static string return_with_comma(string line)
{
// Drop trailing spaces so they don't become a trailing comma
line = line.TrimEnd();
// Turn every space into a comma
line = line.Replace(" ", ",");
// Collapse runs of commas (from repeated spaces) into a single comma
line = Regex.Replace(line, ",,+", ",");
line += "\r\n";
return line;
}
string result = return_with_comma(test);
//Output is
//YK,002,10,23,30,5,TDP_XYZ
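For comparison, the Split/Join idea from the earlier answers gives the same comma-separated line without an intermediate comma-collapsing step. This is a sketch swapped in for illustration, not the poster's method, and unlike return_with_comma it does not append "\r\n".

```csharp
using System;

string test = "YK 002 10 23 30 5 TDP_XYZ ";

// Split on spaces, ignoring the empty entries from repeated or trailing spaces,
// then join the remaining fields with commas.
string result = string.Join(",", test.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

Console.WriteLine(result); // YK,002,10,23,30,5,TDP_XYZ
```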
