Is it possible to pass over a string, finding the white spaces?
For example a data set of:
string myString = "aa bbb cccc dd";
How could I loop through and detect each white space, and manipulate that space?
I need to do this in the most efficient way possible.
Thanks.
UPDATE:
I need to manipulate the space by widening it to a given integer count, for instance replacing each single space with 3 spaces. I'd like to go through every white space in one loop; is there a method for this already in .NET? By white space I mean a ' '.
You can use the Regex.Replace method. This will replace any group of white space characters with a dash:
myString = Regex.Replace(myString, @"(\s+)", m => "-");
Update:
This will find groups of space characters and replace each with triple the number of spaces:
myString = Regex.Replace(
myString,
"( +)",
m => new String(' ', m.Groups[1].Value.Length * 3)
);
However, that's a bit too simple to make use of regular expressions. You can do the same with a regular replace:
myString = myString.Replace(" ", "   ");
This replaces each space individually instead of replacing groups of spaces, but a plain Replace is much simpler than Regex.Replace, so it should be at least as fast, and the code is simpler.
If you want to replace all spaces in one swoop, you can do:
// changes all spaces to dashes
myString = myString.Replace(' ', '-');
If you want to go case by case (that is, not just a mass replace), you can loop through IndexOf():
int pos = myString.IndexOf(' ');
while (pos >= 0)
{
// do whatever you want with myString at pos
// find next
pos = myString.IndexOf(' ', pos + 1);
}
UPDATE
As per your update, you could replace single spaces with the number of spaces specified by a variable (such as numSpaces) as follows:
myString = myString.Replace(" ", new String(' ', numSpaces));
If you just want to replace all spaces with some other character:
myString = myString.Replace(' ', 'x');
If you need the possibility of doing something different to each:
foreach(char c in myString)
{
if (c == ' ')
{
// do something
}
}
Edit:
Per your comment clarifying your question:
To change each space to three spaces, you can do this:
myString = myString.Replace(" ", "   ");
Note, however, that this widens every space individually, so a run of two spaces in the input becomes six. If the input may already contain runs of spaces and every gap should end up exactly three wide, you will want to use a regex.
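To make the difference concrete, here is a minimal sketch contrasting the two behaviours. The class and method names (SpaceWidth, Widen, Normalize) are made up for illustration; only String.Replace and Regex.Replace are real .NET calls.

```csharp
using System.Text.RegularExpressions;

public static class SpaceWidth
{
    // Widens every individual space, so a run of n spaces becomes n * width spaces.
    public static string Widen(string input, int width)
    {
        return input.Replace(" ", new string(' ', width));
    }

    // Treats each run of one or more spaces as a single gap
    // and makes every gap exactly `width` spaces wide.
    public static string Normalize(string input, int width)
    {
        return Regex.Replace(input, " +", new string(' ', width));
    }
}
```

With a width of 3, Widen turns a double space into six spaces, while Normalize turns it into three, the same as a single space.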
Depending on what you're trying to do:
for(int k = 0; k < myString.Length; k++)
{
if(char.IsWhiteSpace(myString[k]))
{
// do something with it
}
}
The above is a single pass through the string, so it's O(n). You can't really get more efficient than that.
However, if you want to manipulate the original string, your best bet is to use a StringBuilder to process the changes:
StringBuilder sb = new StringBuilder(myString);
for(int k = 0; k < myString.Length; k++)
{
if(char.IsWhiteSpace(myString[k]))
{
// do something with sb
}
}
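One possible way to fill in the "do something with sb" step, as a sketch only. The loop above reads positions from myString while editing sb, so after the first insertion the two no longer line up. Walking the StringBuilder backwards sidesteps that. SpaceExpander and Expand are hypothetical names, and count is assumed to be at least 1.

```csharp
using System;
using System.Text;

public static class SpaceExpander
{
    // Makes every ' ' in the input `count` spaces wide by editing a StringBuilder in place.
    public static string Expand(string input, int count)
    {
        if (count < 1)
            throw new ArgumentOutOfRangeException("count");

        var sb = new StringBuilder(input);
        // Go from the end to the start so that insertions never shift
        // the positions that are still waiting to be visited.
        for (int k = sb.Length - 1; k >= 0; k--)
        {
            if (sb[k] == ' ')
                sb.Insert(k, " ", count - 1); // add the extra spaces next to the existing one
        }
        return sb.ToString();
    }
}
```

Each Insert has to shift the tail of the buffer, so for very long strings with many spaces an append-only pass into a fresh StringBuilder is likely to be cheaper.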
Finally, don't forget about Regular Expressions. It may not always be the most efficient method in terms of code run-time complexity but as far as efficiency of coding it may be a good trade-off.
For instance, here's a way to match all white spaces:
var rex = new System.Text.RegularExpressions.Regex("[^\\s](\\s+)[^\\s]");
var m = rex.Match(myString);
while(m.Success)
{
// process the match here..
m = m.NextMatch(); // advance to the next match; without the assignment the loop never ends
}
And here's a way to replace all white spaces with an arbitrary string:
var rex = new System.Text.RegularExpressions.Regex("\\s+");
String replacement = "[white_space]";
// replaces all occurrences of white space with the string [white_space]
String result = rex.Replace(myString, replacement);
Use string.Replace().
string newString = myString.Replace(" ", "   ");
The LINQ query below returns a sequence of anonymous-type items with two properties: "symbol" is a white space character and "index" is its index in the input sequence. Once you have every whitespace character together with its position, you can do what you want with them.
string myString = "aa bbb cccc dd";
var res = myString.Select((c, i) => new { symbol = c, index = i })
.Where(c => Char.IsWhiteSpace(c.symbol));
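As a small follow-up sketch, the same query can be wrapped so that it returns only the positions. WhitespaceIndex and Find are names invented for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WhitespaceIndex
{
    // Returns the index of every whitespace character in the input, in order.
    public static List<int> Find(string input)
    {
        return input
            .Select((c, i) => new { symbol = c, index = i })
            .Where(x => char.IsWhiteSpace(x.symbol))
            .Select(x => x.index)
            .ToList();
    }
}
```

For "aa bbb cccc dd" this yields the positions 2, 6 and 11.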
EDIT: For educational purposes, below is the implementation you are looking for, but obviously in a real system use the built-in string constructor and String.Replace() as shown in other answers.
string myString = "aa bbb cccc dd";
var result = this.GetCharacters(myString, 5);
string output = new string(result.ToArray());
public IEnumerable<char> GetCharacters(string input, int coeff)
{
foreach (char c in input)
{
if (Char.IsWhiteSpace(c))
{
int counter = coeff;
while (counter-- > 0)
{
yield return c;
}
}
else
{
yield return c;
}
}
}
var result = new StringBuilder();
foreach(Char c in myString)
{
if (Char.IsWhiteSpace(c))
{
// you can do what you wish here. strings are immutable, so you can only make a copy with the results you want... hence the "result" var.
result.Append('_'); // for example, replace space with _
}
else result.Append(c);
}
myString = result.ToString();
If you want to replace the white space with, e.g., '_', you can use String.Replace.
Example:
string myString = "aa bbb cccc dd";
string newString = myString.Replace(" ", "_"); // gives aa_bbb_cccc_dd
In case you want to left/right justify your string:
int N = 10;
string newstring = String.Join(
"",
myString.Split(' ').Select(s => s.PadRight(N))); // PadRight takes the total width, so each word fills N columns
Related
I have string like:
/api/agencies/{AgencyGuid}/contacts/{ContactGuid}
I need to change the text in { } to camelCase:
/api/agencies/{agencyGuid}/contacts/{contactGuid}
How can I do that? What is the best way to do that? Please help
I have no experience with Regex. So, I have tried so far:
string str1 = "/api/agencies/{AgencyGuid}/contacts/{ContactGuid}";
string str3 = "";
int i = 0;
while(i < str1.Length)
{
if (str1[i] == '{')
{
str3 += "{" + char.ToLower(str1[i + 1]);
i = i + 2;
} else
{
str3 += str1[i];
i++;
}
}
You can do it with regex of course.
But you can do it also with LINQ like this:
var result = String.Join("/{",
str1.Split(new string[1] { "/{" }, StringSplitOptions.RemoveEmptyEntries)
.Select(k => k.StartsWith("/") ? k : Char.ToLowerInvariant(k[0]) + k.Substring(1)));
What is done here: the string is split on "/{" into 3 parts:
"/api/agencies"
"AgencyGuid}/contactpersons"
"ContactPersonGuid}"
Then each element is projected as follows: if it starts with "/", it is the first element and is returned untouched. Otherwise, take the first char (k[0]), change it to lowercase (Char.ToLowerInvariant()) and concatenate it with the rest.
At the end, Join puts those three strings (one unchanged and two changed) back together with "/{" between them.
With Regex you can do it as:
var regex = new Regex(@"\/{(\w)");
var result = regex.Replace(str1, m => m.ToString().ToLower());
The regex searches for the pattern "/{\w", meaning "/{" followed by one word character (\w). The parentheses capture that character in a group, but the replacement lambda simply lowercases the whole match with m.ToString().ToLower(), which is safe because "/{" has no case.
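For comparison, here is a loop-based sketch that needs neither regex nor LINQ and avoids repeated string concatenation. RouteCasing and LowerFirstInBraces are hypothetical names. It assumes that every '{' is directly followed by the first letter of the placeholder.

```csharp
using System.Text;

public static class RouteCasing
{
    // Lowercases the character that directly follows each '{'.
    public static string LowerFirstInBraces(string route)
    {
        var sb = new StringBuilder(route.Length);
        bool afterOpenBrace = false;
        foreach (char c in route)
        {
            sb.Append(afterOpenBrace ? char.ToLowerInvariant(c) : c);
            afterOpenBrace = (c == '{'); // remember whether the next character starts a placeholder
        }
        return sb.ToString();
    }
}
```

Unlike the indexing loop in the question, this version cannot run past the end of the string when the input ends with '{'.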
I probably wouldn't use regex, but since you asked
Regex.Replace(
"/api/agencies/{AgencyGuid}/contactpersons/{ContactPersonGuid}",
@"\{[^\}]+\}",
m =>
$"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}",
RegexOptions.ExplicitCapture
)
This assumes string interpolation in C# 6, but you can do the same thing by concatenating.
Explanation:
{[^}]+} - grab all letters that follow an open mustache that are not a close mustache and then the close mustache
m => ... - A lambda to run on each match
"{{{m.Value[1].ToString().ToLower()}{m.Value.Substring(2, m.Value.Length-3)}}}" - return a new string made of an open mustache, the first letter lowercased, then the rest of the match, then a close mustache.
I have a string in the format:
abc def ghi    xyz
I would like to end with it in format:
abcdefghi xyz
What is the best way to do this? In this particular case, I could just strip off the last three characters, remove spaces, and then add them back at the end, but this won't work for cases in which the multiple spaces are in the middle of the string.
In short, I want to remove all single whitespaces, and then replace all multiple whitespaces with a single one. Each of those steps is easy enough by itself, but combining them seems a bit less straightforward.
I'm willing to use regular expressions, but I would prefer not to.
This approach uses regular expressions, but hopefully in a way that's still fairly readable. First, split your input string on multiple spaces:
var pattern = @" {2,}"; // match two or more spaces
var groups = Regex.Split(input, pattern);
Next, remove the (individual) spaces from each token:
var tokens = groups.Select(group => group.Replace(" ", String.Empty));
Finally, join your tokens with single spaces:
var result = String.Join(" ", tokens);
This example uses a literal space character rather than 'whitespace' (which includes tabs, linefeeds, etc.) - substitute \s for ' ' if you need to split on multiple whitespace characters rather than actual spaces.
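Put together, the three steps could look like the sketch below. SpaceCollapser and Collapse are names chosen for this example, and the pattern " {2,}" is just one way of writing "two or more spaces".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceCollapser
{
    // Removes single spaces and turns each run of two or more spaces into one space.
    public static string Collapse(string input)
    {
        var tokens = Regex.Split(input, " {2,}")                // 1. split on runs of 2+ spaces
            .Select(token => token.Replace(" ", string.Empty)); // 2. drop the single spaces
        return string.Join(" ", tokens);                        // 3. rejoin with one space
    }
}
```

Note that a run of spaces at the very start or end of the input produces an empty first or last token, so the result then begins or ends with a single space; add a Trim() if that is unwanted.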
Well, Regular Expressions would probably be the fastest here, but you could implement some algorithm that uses a lookahead for single spaces and then replaces multiple spaces in a loop:
// Remove every space that is directly followed by a non-space:
// this deletes single spaces and shortens each longer run by one
for (int i = 0; i < sourceString.Length; i++)
{
if (sourceString[i] == ' ')
{
if (i < sourceString.Length - 1 && sourceString[i+1] != ' ')
sourceString = sourceString.Remove(i, 1);
}
}
// Collapse the remaining runs of spaces
while (sourceString.Contains("  ")) // Two spaces here!
sourceString = sourceString.Replace("  ", " ");
But hey, that code is pretty ugly and slow compared to a proper regular expression...
For a Non-REGEX option you can use:
string str = "abc def ghi    xyz";
var result = str.Split(); //This will remove single spaces from the result
StringBuilder sb = new StringBuilder();
bool ifMultipleSpacesFound = false;
for (int i = 0; i < result.Length;i++)
{
if (!String.IsNullOrWhiteSpace(result[i]))
{
sb.Append(result[i]);
ifMultipleSpacesFound = false;
}
else
{
if (!ifMultipleSpacesFound)
{
ifMultipleSpacesFound = true;
sb.Append(" ");
}
}
}
string output = sb.ToString();
The output would be:
output = "abcdefghi xyz"
Here's an approach which uses some fairly subtle logic:
public static string RemoveUnwantedSpaces(string text)
{
var sb = new StringBuilder();
char lhs = '\0';
char mid = '\0';
foreach (char rhs in text)
{
if (rhs != ' ' || (mid == ' ' && lhs != ' '))
sb.Append(rhs);
lhs = mid;
mid = rhs;
}
return sb.ToString().Trim();
}
How it works:
We will examine each possible three-character subsequence linearly across the string (in a kind of three-character sliding window). These three characters will be represented, in order, by the variables lhs, mid and rhs.
For each rhs character in the string:
If it's not a space we should output it.
If it is a space, and the previous character was also space but the one before that isn't, then this is the second in a sequence of at least two spaces, and therefore we should output one space.
Otherwise, don't output a space because this is either the first or the third (or later) space in a sequence of two or more spaces and in either case we don't want to output a space: If this happens to be the first in a sequence of two or more spaces, a space will be output when the second space comes along. If this is the third or later, we've already output a space for it.
The subtlety here is that I've avoided special casing the beginning of the sequence by initialising the lhs and mid variables with non-space characters. It doesn't matter what those values are, as long as they are not spaces, but I made them \0 to indicate that they are special values.
On second thought, here is a one-line regex solution:
Regex.Replace("abc def ghi    xyz", "( )( )*([^ ])", "$2$3")
The result of this is "abcdefghi xyz".
ORIGINAL ANSWER:
A two-line regex solution:
var tmp = Regex.Replace("abc def ghi    xyz", "( )([^ ])", "$2");
tmp is "abcdefghi   xyz"
then:
var result = Regex.Replace(tmp, "( )+", " ");
result is "abcdefghi xyz"
Explanation:
The first line of code removes single whitespaces and removes one whitespace from each run of multiple whitespaces (so there are 3 spaces in tmp between the letters i and x).
The second line just replaces multiple whitespaces with one.
In-depth explanation of first line:
We match the input string against a regex that matches one space and the non-space character next to it. We also put these two characters in separate groups (we use ( ) for anonymous grouping).
So for the "abc def ghi    xyz" string we have these matches and groups:
match: " d" group1: " " group2: "d"
match: " g" group1: " " group2: "g"
match: " x" group1: " " group2: "x"
We are using substitution syntax for Regex.Replace method to replace match with the content of second group (which is non-whitespace character)
I found it inefficient to iterate through the parts of a string split by the space character, extract the numeric parts, apply
UInt64.Parse(Regex.Match(numericPart, @"\d+").Value)
and then concatenate them together to form the string with the numbers grouped.
Is there a better, more efficient way to apply 3-digit grouping to all numbers in a string containing other characters?
I am pretty sure the most efficient way (CPU-wise, with just a single pass over the string) is the basic foreach loop, along these lines
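var sb = new StringBuilder();
int digits = 0; // length of the current run of digits
foreach (char c in inputString)
{
if (char.IsDigit(c))
{
// three digits have been written since the last break: insert a "."
if (digits > 0 && digits % 3 == 0) sb.Append('.');
digits++;
}
else digits = 0; // not a digit: reset the counter
sb.Append(c);
}
return sb.ToString();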
This will produce 123.456.7
If you want 1.234.567 you'll need an additional buffer for digit-sequences
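A sketch of the buffered variant, under the assumption that "number" simply means an unbroken run of digits. DigitGrouping, Group and Flush are invented names, and the separator is passed in rather than taken from the current culture.

```csharp
using System.Text;

public static class DigitGrouping
{
    // Groups every run of digits in threes, counted from the right: 1234567 -> 1.234.567
    public static string Group(string input, char separator)
    {
        var output = new StringBuilder(input.Length + input.Length / 3);
        var digits = new StringBuilder(); // buffer for the digit run currently being read
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                digits.Append(c);
                continue;
            }
            Flush(digits, output, separator); // the run (if any) has ended
            output.Append(c);
        }
        Flush(digits, output, separator); // a run may end at the end of the string
        return output.ToString();
    }

    private static void Flush(StringBuilder digits, StringBuilder output, char separator)
    {
        for (int i = 0; i < digits.Length; i++)
        {
            // Place a separator wherever the number of digits still to be written is a multiple of 3.
            if (i > 0 && (digits.Length - i) % 3 == 0)
                output.Append(separator);
            output.Append(digits[i]);
        }
        digits.Clear();
    }
}
```

This is still one pass over the input. The buffer is needed only because the group boundaries depend on the total length of the run, which is unknown until the run ends.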
So you want to replace all longs in a string with the same long but with a number-group-separator of the current culture? .... Yes
string[] words = input.Split();
var newWords = words.Select(w =>
{
long l;
bool isLong = System.Int64.TryParse(w.Trim(), out l);
if(isLong)
return l.ToString("N0");
else
return w;
});
string result = string.Join(" ", newWords);
With the input from your comment:
string input = "hello 134443 in the 33 when 88763 then";
You get the expected result: "hello 134,443 in the 33 when 88,763 then", if your current culture uses comma as number-group-separator.
I will post my regex-based example. I believe regex does not have to be too slow, especially once it is compiled and is declared with static and readonly.
// Declare the regex
private static readonly Regex regex = new Regex(@"(\d)(?=(\d{3})+(?!\d))", RegexOptions.Compiled);
// Then, somewhere inside a method
var replacement = string.Format("$1{0}", System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator); // Get the system digit grouping separator
var strn = "Hello 34234456 where 3334 is it?"; // Just a sample string
// Somewhere (?:inside a loop)?
var res = regex.Replace(strn, replacement);
Output (if , is a system digit grouping separator):
Hello 34,234,456 where 3,334 is it?
Split or Regex.Split is used to extract the words in a sentence and store them in an array. I would instead like to extract the spaces in a sentence and store them in an array (the sentence may contain runs of multiple spaces). Is there an easy way of doing it? I first tried to split it normally and then use string.split(theSplittedStrings, StringSplitOptions.RemoveEmptyEntries); however, that did not preserve the number of spaces.
---------- EDIT -------------
For example, if there is the sentence "This is a test",
I would like to make an array of strings { " ", " ", " " }.
---------- EDIT END ---------
Any help is appreciated.
Thank you.
EDIT:
Based on your edited question, I believe you can do that with simple iteration like:
string str = "This is a test";
List<string> spaceList = new List<string>();
List<char> charList = new List<char>();
foreach (char c in str)
{
if (c == ' ')
{
charList.Add(c);
}
if (charList.Any() && c != ' ')
{
// a run of spaces just ended: store it and start a new one
spaceList.Add(new string(charList.ToArray()));
charList = new List<char>();
}
}
// store a run of spaces at the very end of the string, if there is one
if (charList.Any())
spaceList.Add(new string(charList.ToArray()));
That would give you spaces in different elements of List<string>, if you need an array back then you can call ToArray
(Old Answer)
You don't need string.Split. You can count the spaces in the string and then create array like:
int spaceCount = str.Count(r => r == ' ');
char[] array = Enumerable.Repeat<char>(' ', spaceCount).ToArray();
If you want to consider White-Space (Space, LineBreak, Tabs) as space then you can use:
int whiteSpaceCount = str.Count(char.IsWhiteSpace);
This code matches all spaces in the input string and outputs their indexes:
const string sentence = "This is a test sentence.";
MatchCollection matches = Regex.Matches(sentence, @"\s");
foreach (Match match in matches)
{
Console.WriteLine("Space at character {0}", match.Index);
}
This code retrieves all space groups as an array:
const string sentence = "This is a test sentence.";
string[] spaceGroups = Regex.Matches(sentence, @"\s+").Cast<Match>().Select(arg => arg.Value).ToArray();
In either case, you can look at the Match instances' Index property values to get the location of the space/space group in the string.
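To illustrate that last point, here is a sketch that returns each space group together with its starting index. SpaceGroups and Find are hypothetical names, and the pattern " +" matches literal spaces only (swap in \s+ for any whitespace).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceGroups
{
    // Returns (start index, run of spaces) pairs for every group of spaces in the sentence.
    public static List<KeyValuePair<int, string>> Find(string sentence)
    {
        return Regex.Matches(sentence, " +")
            .Cast<Match>()
            .Select(m => new KeyValuePair<int, string>(m.Index, m.Value))
            .ToList();
    }
}
```

The Value of each pair keeps the original width of the gap, which is what the question asks to preserve, and the Key says where in the sentence the gap starts.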
I have string in my c# code
a,b,c,d,"e,f",g,h
I want to replace "e,f" with "e f" i.e. ',' which is inside inverted comma should be replaced by space.
I tried using string.split but it is not working for me.
OK, I can't be bothered to think of a regex approach so I am going to offer an old fashioned loop approach which will work:
string DoReplace(string input)
{
bool isInner = false;//flag to detect if we are in the inner string or not
string result = "";//result to return
foreach(char c in input)//loop each character in the input string
{
if(isInner && c == ',')//if we are in an inner string and it is a comma, append space
result += " ";
else//otherwise append the character
result += c;
if(c == '"')//if we have hit an inner quote, toggle the flag
isInner = !isInner;
}
return result;
}
NOTE: This solution assumes that there can only be one level of inner quotes, for example you cannot have "a,b,c,"d,e,"f,g",h",i,j" - because that's just plain madness!
For the scenario where you only need to match one pair of letters, the following regex will work:
string source = "a,b,c,d,\"e,f\",g,h";
string pattern = "\"([\\w]),([\\w])\"";
string replace = "\"$1 $2\"";
string result = Regex.Replace(source, pattern, replace);
Console.WriteLine(result); // a,b,c,d,"e f",g,h
Breaking apart the pattern, it is matching any instance where there is a "X,X" sequence where X is any letter, and is replacing it with the very same sequence, with a space in between the letters instead of a comma.
You could easily extend this, if needed, to match more than one letter, etc.
For the case where you can have multiple letters separated by commas within quotes that need to be replaced, the following can do it for you. Sample text is a,b,c,d,"e,f,a",g,h:
string source = "a,b,c,d,\"e,f,a\",g,h";
string pattern = "\"([ ,\\w]+),([ ,\\w]+)\"";
string replace = "\"$1 $2\"";
string result = source;
while (Regex.IsMatch(result, pattern)) {
result = Regex.Replace(result, pattern, replace);
}
Console.WriteLine(result); // a,b,c,d,"e f a",g,h
This works much like the first one, but replaces any comma that is sandwiched between letters inside quotes with a space, and repeats until no such comma is left.
Here's a somewhat fragile but simple solution:
string.Join("\"", line.Split('"').Select((s, i) => i % 2 == 0 ? s : s.Replace(",", " ")))
It's fragile because it doesn't handle flavors of CSV that escape double-quotes inside double-quotes.
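If the flavor in question escapes quotes with a backslash (that is an assumption here; the answer above does not say which flavor it has in mind), a character loop can skip the escaped quotes. This is only a sketch, and QuotedCommas and Replace are made-up names.

```csharp
using System.Text;

public static class QuotedCommas
{
    // Replaces commas inside double-quoted sections with spaces.
    // A quote preceded by a backslash does not open or close a section.
    public static string Replace(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        bool escaped = false; // true when the previous character was an unescaped backslash
        foreach (char c in line)
        {
            if (escaped)
            {
                sb.Append(c); // copy the escaped character untouched
                escaped = false;
                continue;
            }
            if (c == '\\')
            {
                sb.Append(c);
                escaped = true;
                continue;
            }
            if (c == '"')
                inQuotes = !inQuotes;
            sb.Append(inQuotes && c == ',' ? ' ' : c);
        }
        return sb.ToString();
    }
}
```

For input without any backslashes this behaves like the Split/Join one-liner above.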
Use the following code:
string str = "a,b,c,d,\"e,f\",g,h";
string[] str2 = str.Split('\"');
var str3 = str2.Select(p => ((p.StartsWith(",") || p.EndsWith(",")) ? p : p.Replace(',', ' '))).ToList();
str = string.Join("", str3);
Use Split() and Join():
string input = "a,b,c,d,\"e,f\",g,h";
string[] pieces = input.Split('"');
for ( int i = 1; i < pieces.Length; i += 2 )
{
pieces[i] = string.Join(" ", pieces[i].Split(','));
}
string output = string.Join("\"", pieces);
Console.WriteLine(output);
// output: a,b,c,d,"e f",g,h