How to remove leading and trailing spaces from a string - c#

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and end of the string.
The result should be: "i am a string"
How can I do this in c#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
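As a rough sketch of that advice: the character list below is only an assumption for illustration (a zero-width space and a byte order mark, which Char.IsWhiteSpace does not treat as white space), and TrimAll is a made-up helper name. The final Trim() is there because removing a custom character can expose more ordinary white space.

```csharp
using System;

public static class TrimExample
{
    // Hypothetical list of extra characters that the default Trim() leaves alone.
    private static readonly char[] ExtraChars = { '\u200B', '\uFEFF' };

    public static string TrimAll(string input)
    {
        // Default Trim() first, then the custom list, then Trim() again
        // in case removing a custom character exposed more white space.
        return input.Trim().Trim(ExtraChars).Trim();
    }

    public static void Main()
    {
        Console.WriteLine("[" + TrimAll(" \u200B i am a string \t") + "]"); // [i am a string]
    }
}
```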

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.

txt = txt.Trim();

Or you can split your string into a string array, splitting by space, and then add every item of the array to an empty string.
Maybe this is not the best or fastest method, but you can try it if the other answers aren't what you want.
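A minimal sketch of the split-and-rebuild idea, assuming the pieces should be rejoined with a single space (the answer does not say). Normalize is a hypothetical helper name. Note that, unlike Trim(), this also collapses runs of spaces inside the string.

```csharp
using System;

public static class SplitJoinExample
{
    public static string Normalize(string input)
    {
        // RemoveEmptyEntries drops the empty items produced by leading,
        // trailing, and repeated spaces.
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    public static void Main()
    {
        Console.WriteLine("[" + Normalize("  i am   a string ") + "]"); // [i am a string]
    }
}
```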

txt.Trim() is what you should use:
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex.
Regex r = new Regex(@"\s+");
// C.
// Strip multiple spaces.
string s3 = r.Replace(s1, @" ");
Console.WriteLine(s3);
// D.
// Strip multiple spaces.
string s4 = r.Replace(s2, @" ");
Console.WriteLine(s4);
Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.

You can use:
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"
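For comparison, here is a small sketch of what each method does on its own. The brackets are only there to make the remaining spaces visible; chaining TrimStart() and TrimEnd() gives the same result as a single Trim().

```csharp
using System;

public static class TrimSidesExample
{
    public static void Main()
    {
        string txt = " i am a string ";
        Console.WriteLine("[" + txt.TrimStart() + "]"); // [i am a string ]
        Console.WriteLine("[" + txt.TrimEnd() + "]");   // [ i am a string]
        Console.WriteLine("[" + txt.Trim() + "]");      // [i am a string]
    }
}
```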

Related

C# - Split a string separated by ':'

I'm trying to split this string:
PublishDate: "2011-03-18T11:08:07.983"
I tried Split method but it's not successful.
str.Split(new[] { ':', ' ' }, StringSplitOptions.RemoveEmptyEntries)
As a result I get PublishDate 2011-03-18T11 08 07.983
But correct result is PublishDate 2011-03-18T11:08:07.983
What do I need to do?
Split(String, Int32, StringSplitOptions)
Splits a string into a maximum number of substrings based on a specified delimiting string and, optionally, options.
str.Split(':', 2, StringSplitOptions.RemoveEmptyEntries)
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=net-6.0#system-string-split(system-string-system-int32-system-stringsplitoptions)
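A sketch of how the count-limited split might be used on the input from the question. It uses the Split(char[], int) overload, which I believe is available on both .NET Framework and newer runtimes; SplitProperty is a hypothetical helper name, and trimming the quotes afterwards is my own addition.

```csharp
using System;

public static class SplitOnceExample
{
    public static string[] SplitProperty(string input)
    {
        // Limit the split to 2 parts so the colons inside the time stay intact.
        string[] parts = input.Split(new[] { ':' }, 2);
        // Strip the surrounding spaces and double quotes from each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim(' ', '"');
        }
        return parts;
    }

    public static void Main()
    {
        string[] parts = SplitProperty("PublishDate: \"2011-03-18T11:08:07.983\"");
        Console.WriteLine(string.Join(" ", parts)); // PublishDate 2011-03-18T11:08:07.983
    }
}
```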
I would solve like this:
locate the index of the first :. The property name will be all the characters before this, which you can extract with Substring and Trim to remove whitespace before the colon, if present.
locate the index of the first " and last ". Characters between the first and last quotes are the property value.
string input = "PublishDate: \"2011-03-18T11:08:07.983\"";
int iColon = input.IndexOf(':');
int iOpenQuote = input.IndexOf('"', iColon);
int iCloseQuote = input.LastIndexOf('"');
string propertyName = input.Substring(0, iColon).Trim();
string propertyValue = input.Substring(iOpenQuote + 1, iCloseQuote - iOpenQuote - 1);
This does not handle escaped characters within the property value (for example, to embed a literal quote or newline using a typical escape sequence like \" or \n). But it's likely good enough to extract a date/time string, and permits all characters because of the use of LastIndexOf. However, this is not robust against malformed input, so you will want to add checks for a missing colon, a missing quote, or a missing close quote (in which case the start and end quote have the same index).
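The checks mentioned above could look roughly like this. TryParse and PropertyParser are hypothetical names, and returning false on malformed input (rather than throwing) is just one possible design.

```csharp
using System;

public static class PropertyParser
{
    // Hypothetical helper: returns false instead of throwing on malformed input.
    public static bool TryParse(string input, out string name, out string value)
    {
        name = null;
        value = null;
        if (input == null) return false;

        int iColon = input.IndexOf(':');
        if (iColon < 0) return false;                 // missing colon

        int iOpenQuote = input.IndexOf('"', iColon);
        if (iOpenQuote < 0) return false;             // missing opening quote

        int iCloseQuote = input.LastIndexOf('"');
        if (iCloseQuote == iOpenQuote) return false;  // only one quote found

        name = input.Substring(0, iColon).Trim();
        value = input.Substring(iOpenQuote + 1, iCloseQuote - iOpenQuote - 1);
        return true;
    }

    public static void Main()
    {
        string name, value;
        if (TryParse("PublishDate: \"2011-03-18T11:08:07.983\"", out name, out value))
        {
            Console.WriteLine(name + " " + value); // PublishDate 2011-03-18T11:08:07.983
        }
    }
}
```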
So if I got you right, you want as a result: PublishDate 2011-03-18T11:08:07.983.
Then I would recommend you to use the string.Replace method.
using System;
public class HelloWorld
{
public static void Main(string[] args)
{
string yourData = "PublishDate: \"2011-03-18T11:08:07.983\"";
// First replace the colon and the space after the PublishDate with a space
// then replace the quotes from the timestamp -> "2011-03-18T11:08:07.983"
yourData = yourData.Replace(": ", " ").Replace("\"", "");
// Output the result -> PublishDate 2011-03-18T11:08:07.983
Console.WriteLine(yourData);
}
}

Remove character and space from a string

I am trying to remove all the characters appearing in one string from another. Ideally the resulting string will not contain two spaces next to each other; at the very least, removed characters must not be replaced with spaces (or any other invisible characters).
I came up with the following code, but some sort of space is left behind if I do so (in addition to having multiple sequential spaces instead of " a "). There is a Remove method as well, but it requires an index and hence would complicate the solution.
String s1="aeiou";
String s2="This is a test string which could be any text";
Console.WriteLine(s2);
for (int i=0; i<s1.Length; i++)
{
if(s2.Contains(s1[i]))
{
s2= s2.Replace(s1[i],'\0');
}
}
Console.WriteLine(s2);
Output:
Expected Output:
Ths s tst strng whch cld b ny txt
I used '\0' as string.Replace() is expecting characters only and for version with the second argument to be string.Empty first argument must be string too (which requires conversion - shown as "variant 1" later).
I already looked at these related/suggested duplicate posts (Remove characters from C# string, Remove '\' char from string c#) and did not find any approach that completely satisfies me.
Variant 1 (based on the most voted answer). This version requires converting each character I want to replace to a string, which I don't like:
String s1="aeiou";
String s2="This is a test string which could be any text";
Console.WriteLine(s2);
foreach(var c in s1)
{
s2 = s2.Replace(c.ToString(), string.Empty);
}
Console.WriteLine(s2);
Variant 2 - String.Join with String.Split (answer). Requires converting my source replace string into array when I'd prefer to avoid that.
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = String.Join("", s2.Split(s1.ToCharArray()));
Variant 3 - Regex.Replace (answer) - this is even more complicated than variant 2 as I need to convert my replace string into proper regular expression, potentially being totally broken for something like "^!" as string to replace (also not needed in this particular case):
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = Regex.Replace(s2, "["+s1+"]", String.Empty);
Console.WriteLine(s2);
Variant 4 - using Linq and constructing the string from the resulting char array (answer). Requires converting the resulting sequence into an array before constructing the string (which ideally should be avoided):
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = new string(s2.Where(c => !s1.Contains(c)).ToArray());
Console.WriteLine(s2);
Variant 5 - using String.Concat (answer), which so far looks the best but uses Linq (I'd prefer not to, although maybe there is no good reason to be concerned about using Linq here):
String s1="aeiou";
String s2="This is a test string which could be any text";
s2 = string.Concat(s2.Where(c => !s1.Contains(c)));
Console.WriteLine(s2);
None of the solutions I came up with removes duplicate spaces. All the variants above remove characters just fine but have some issues for my case. The ideal answer will not create too many extra strings, will not use Linq, and will not need extra conversions to arrays.
Assuming you want to exclude chars in a string, and replace multiple white spaces with a single space afterwards, you can easily use regex in 2 steps:
string input = "This is a test string which could be any text";
string exclude = "aeiou";
var stripped = Regex.Replace(input, $"[{exclude}]", ""); // exclude chars
var cleaned = Regex.Replace(stripped, "[ ]{2,}", " "); // replace multiple spaces
Console.WriteLine(stripped);
Console.WriteLine(cleaned);
Output
Ths s  tst strng whch cld b ny txt
Ths s tst strng whch cld b ny txt
Note: if your string can contain characters that need to be escaped in regex use Regex.Escape as shown in following answer - $"[{Regex.Escape(exclude)}]".
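To illustrate why the escaping matters, here is a small sketch using the "^!" case from the question. RemoveChars is a hypothetical helper. As far as I know, Regex.Escape does not escape ']', so an exclude list containing that character would need extra handling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeExample
{
    public static string RemoveChars(string input, string exclude)
    {
        // Regex.Escape turns "^!" into "\^!", so '^' is matched literally
        // instead of negating the character class.
        return Regex.Replace(input, "[" + Regex.Escape(exclude) + "]", "");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveChars("a^b!c", "^!")); // abc
        // Without escaping, "[^!]" means "anything except '!'".
        Console.WriteLine(Regex.Replace("a^b!c", "[^!]", "")); // !
    }
}
```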
In your situation use StringBuilder, to build your result from s2:
String s1 = "aeiou";
String s2 = "This is a test string which could be any text";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s2.Length; i++)
{
// Check if current char is not contained in s1,
// then add it to sb
if (!s1.Contains(s2[i]))
{
sb.Append(s2[i]);
}
}
string result = sb.ToString();
Edit:
To collapse the repeated spaces in the result, you can do:
string result = string.Join(" ", sb.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
Output:
Ths s tst strng whch cld b ny txt
Also, here is LINQ solution for that:
var result = string.Concat(s2.Where(c => !s1.Contains(c)));
Also for this one, if you want to collapse the repeated spaces between words (you can create an extension method for that):
var raw = string.Concat(s2.Where(c => !s1.Contains(c)));
var result = string.Join(" ", raw.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
References: Enumerable.Where Method, String.Contains Method, String.Concat Method
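The two steps can also be folded into one pass with the same StringBuilder, which avoids Linq and the intermediate array from Split. This is only a sketch: RemoveAndCollapse is a hypothetical name, and it does not trim a leading or trailing space.

```csharp
using System;
using System.Text;

public static class CharRemover
{
    public static string RemoveAndCollapse(string source, string charsToRemove)
    {
        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            // Skip any character that appears in the removal list.
            if (charsToRemove.IndexOf(c) >= 0) continue;
            // Skip a space that would directly follow another space.
            if (c == ' ' && sb.Length > 0 && sb[sb.Length - 1] == ' ') continue;
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string s2 = "This is a test string which could be any text";
        Console.WriteLine(RemoveAndCollapse(s2, "aeiou")); // Ths s tst strng whch cld b ny txt
    }
}
```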

Remove substring that starts with SOT and ends EOT, from string

I have a program that reads certain strings from memory. The strings contain, for the most part, recognizable characters. At random points in the strings however, "weird" characters appear. Characters I did not recognize. By going to a site that allows me to paste in Unicode characters to see what they are, I found that a selection of the "weird" characters were these:
\x{1} SOH, "start of heading", ctrl-a
\x{2} SOT, "start of text"
\x{3} EOT, "end of text"
\x{7} BEL, bell, ctrl-g
\x{13} dc3, device control three, ctrl-s
\x{11} dc1, device control one, ctrl-q
\x{14} dc4, device control four, ctrl-t
\x{1A} sub, substitute, ctrl-z
\x{6} ack, acknowledge, ctrl-f
I wanted to parse my strings to remove these characters. What I found out though, by looking at the strings, was that all the unwanted characters were always surrounded by the SOT and EOT, respectively.
Therefore, I am thinking that my question is: How can I remove, from a string, all occurrences of substrings that starts with SOT and ends with EOT?
Edit: Attempt at Solution
Using ideas from @RagingCain I made the following method:
private static string RemoveInvalidCharacters(string input)
{
while (true)
{
var start = input.IndexOf('\u0002');
// Stop before searching for the end marker: IndexOf throws if start is -1.
if (start == -1) break;
var end = input.IndexOf('\u0003', start);
if (end == -1) break;
Console.WriteLine("Start: " + start + ". End: " + end);
// Remove the whole block, including the end marker itself.
input = input.Remove(start, end - start + 1);
}
return input;
}
It does the trick, thanks again.
A regex would be your solution and should work fine. You would put these characters in the pattern and use Match, or simply Replace them with a space " ", or cut them from the string altogether by using "".
Regex.Replace: https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx
Regex.Match: https://msdn.microsoft.com/en-us/library/bk1x0726(v=vs.110).aspx
Regex example:
public static void Main()
{
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);
}
I know the difficulty of not being able to "see" them, though, so you should assign them to char variables by their Unicode values and add them to the pattern for the replace.
Char Variables: https://msdn.microsoft.com/en-us/library/x9h8tsay.aspx
Unicode for Start of Text:
http://www.fileformat.info/info/unicode/char/0002/index.htm
Unicode for End of Text:
http://www.fileformat.info/info/unicode/char/0003/index.htm
To apply to your solution:
Does string contain SOT, EOT.
If true, remove entire string/sub-string/SOT or EOT.
It may be easier to split the original string into a string[], then go line by line... it's difficult to parse through your string without knowing what it looks like, so hopefully I provided something that helps ^.^
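Applied to the question, the regex approach might look something like the sketch below. It assumes every block starts with U+0002 and ends with the next U+0003, as the question describes; ControlBlockRemover and Strip are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ControlBlockRemover
{
    // Matches U+0002, then any run of characters other than U+0003, then U+0003.
    private static readonly Regex Block = new Regex(@"\u0002[^\u0003]*\u0003");

    public static string Strip(string input)
    {
        return Block.Replace(input, "");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("abc\u0002\u0007\u0011\u0003def")); // abcdef
    }
}
```

An unterminated block (a start marker with no end marker after it) is left untouched by this pattern.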

Search for a newline Character C#.net

How do I search a string for a newline character? Both of the below seem to be returning -1....!
theJ = line.IndexOf(Environment.NewLine);
OR
theJ = line.IndexOf('\n');
The string it's searching is "yo\n"
the string i'm parsing contains this "printf("yo\n");"
the string i see contained during the comparison is this: "\tprintf(\"yo\n\");"
"yo\n" // output as "yo" + newline
"yo\n".IndexOf('\n') // returns 2
"yo\\n" // output as "yo\n"
"yo\\n".IndexOf('\n') // returns -1
Are you sure you're searching yo\n and not yo\\n?
Edit
Based on your update, I can see that I guessed correctly. If your string says:
printf("yo\n");
... then this does not contain a newline character. If it did, it would look like this:
printf("yo
");
What it actually has is an escaped newline character, or in other words, a backslash character followed by an 'n'. That's why the string you're seeing when you debug is "\tprintf(\"yo\\n\");". If you want to find this character combination, you can use:
line.IndexOf("\\n")
For example:
"\tprintf(\"yo\\n\");" // output as " printf("yo\n");"
"\tprintf(\"yo\\n\");".IndexOf("\\n") // returns 11
Looks like your line does not contain a newline.
If you are using File.ReadAllLines or string.Split on newline, then each line in the returned array will not contain the newline. If you are using StreamReader or one of the classes inheriting from it, the ReadLine method will return the string without the newline.
string lotsOfLines = @"one
two
three";
string[] lines = lotsOfLines.Split('\n');
foreach(string line in lines)
{
Console.WriteLine(line.IndexOf('\n')); // prints -1 three times
}
That should work, although on Windows you'll have to search for "\r\n".
-1 simply means that no newline was found.
It depends on what you are trying to do. The two may not be identical on some platforms.
Environment.NewLine returns:
A string containing "\r\n" for non-Unix platforms, or a string
containing "\n" for Unix platforms.
Also:
If you want to search for the \n char (new line on Unix), use \n
If you want to search for the \r\n chars (new line on Windows), use \r\n
If your search depends on the current platform, use Environment.NewLine
If it returns -1 in both cases you mentioned, then you don't have a new line in your string.
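If you don't know which convention the text uses, one option is to look for either character. This is only a sketch; IndexOfLineBreak is a hypothetical helper name.

```csharp
using System;

public static class NewlineSearch
{
    // Returns the index of the first CR or LF, or -1 if there is none.
    public static int IndexOfLineBreak(string text)
    {
        return text.IndexOfAny(new[] { '\r', '\n' });
    }

    public static void Main()
    {
        Console.WriteLine(IndexOfLineBreak("yo\n"));   // 2
        Console.WriteLine(IndexOfLineBreak("yo\r\n")); // 2
        Console.WriteLine(IndexOfLineBreak("yo\\n"));  // -1 (backslash followed by 'n', not a newline)
    }
}
```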
When I was in college I wrote a WebForms application to sort references.
The line break/carriage return was what I used to separate one reference from the next.
//Text from TextBox
var text = textBox1.Text;
//Create an array with the text between the carriage returns
var references = text.Split(new string[] { "\r\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
//Simple OrderBy(Alphabetical)
var ordered = references.ToList<string>().OrderBy(ff => ff);
//Return the entry text in alphabetical order, with a line break between every result
var valueToReturn = String.Join(Environment.NewLine, ordered);
textBox1.Text = valueToReturn;
Environment.NewLine is not necessarily the same as \n. On Windows it is a CRLF (\r\n). However, I did try with the \n using IndexOf and my test did find the value. Are you sure what you're searching for is a \n rather than a \r? View your text in hexadecimal format and see what the hex value is.

Replace Line Breaks in a String C#

How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text");
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environment's implementation of new line control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, @"\r\n?|\n", replacementString);
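A small sketch of the difference, using "<br/>\n" as an arbitrary replacement that itself contains a line break. Chained and SinglePass are hypothetical helper names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakReplace
{
    // Each later Replace call also sees the text inserted by the earlier ones.
    public static string Chained(string input, string replacement)
    {
        return input.Replace("\r\n", replacement)
                    .Replace("\n", replacement)
                    .Replace("\r", replacement);
    }

    // The regex scans the original input once, so inserted text is never re-scanned.
    public static string SinglePass(string input, string replacement)
    {
        return Regex.Replace(input, @"\r\n?|\n", replacement);
    }

    public static void Main()
    {
        string input = "a\r\nb";
        string replacement = "<br/>\n";
        Console.WriteLine(Chained(input, replacement) == "a<br/><br/>\nb"); // True
        Console.WriteLine(SinglePass(input, replacement) == "a<br/>\nb");   // True
    }
}
```

One caveat: Regex.Replace treats '$' in the replacement as a substitution marker, so a replacement containing '$' would need "$$".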
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline for a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use the method introduced in .NET 6:
myString = myString.ReplaceLineEndings();
Replaces ALL newline sequences in the current string.
Documentation:
ReplaceLineEndings
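A short sketch of both overloads, assuming a .NET 6 or later runtime. The parameterless call normalizes every line ending to Environment.NewLine; the other takes the replacement text.

```csharp
using System;

public static class ReplaceLineEndingsExample
{
    public static void Main()
    {
        string mixed = "AAA\r\nBBB\nCCC\rDDD";

        // Replace every line ending with a custom string.
        Console.WriteLine(mixed.ReplaceLineEndings("-")); // AAA-BBB-CCC-DDD

        // Normalize every line ending to the current platform's newline.
        string expected = string.Join(Environment.NewLine, "AAA", "BBB", "CCC", "DDD");
        Console.WriteLine(mixed.ReplaceLineEndings() == expected); // True
    }
}
```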
Don't forget that Replace doesn't do the replacement in the string, but returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible ways of line breaks (Windows, Mac and Unix) are replaced you should use:
myString = myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
and in this order, so that combinations of line-ending characters do not produce extra line breaks.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace Windows new lines
data = data.replace(/\r\n/g, "\n");
//replace old Mac new lines
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
Best way to replace linebreaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n") //handling mac linebreaks
That should produce a string with only \n (i.e. line feed) as line breaks.
This code is useful for fixing mixed line breaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
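Roughly, the StringReader approach could be sketched like this. JoinLines is a hypothetical helper; it appends a caller-supplied separator rather than calling AppendLine, so the same loop can be used to replace the breaks with anything.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderExample
{
    public static string JoinLines(string input, string separator)
    {
        var sb = new StringBuilder();
        using (var reader = new StringReader(input))
        {
            string line;
            bool first = true;
            // ReadLine treats "\r", "\n", and "\r\n" as line terminators.
            while ((line = reader.ReadLine()) != null)
            {
                if (!first) sb.Append(separator);
                sb.Append(line);
                first = false;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinLines("a\r\nb\nc\rd", " | ")); // a | b | c | d
    }
}
```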
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = @"sdfhlusdkuidfs\r\ndfgdgfd";
var match = @"[\\s]+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long-winded one-liner, but it is the only solution I found to work if you cannot use special character escapes like "\r", "\n", \x0d, and \u000D, or System.Environment.NewLine, as parameters to the replace() method:
MyStr.replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat off-topic, but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up as shown below.
The Visual Studio XML --> .NET environment just would not accept special character escapes like "\r", "\n", \x0d, and \u000D, or System.Environment.NewLine, as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers' answer, and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n, \n and \r, preferring the longest match, and collapses multiple occurrences into one.
