I need to read a string, character by character, and build a new string as the output.
What's the best approach to do this in C#?
Use a StringBuilder? Use some writer/stream?
Note that there will be no I/O operations--this is strictly an in-memory transformation.
If the size of the string cannot be determined at compile time and it may also be relatively large, you should use a StringBuilder for concatenation as it acts like a mutable string.
var input = SomeLongString;
// may as well initialize the capacity as well
// as the length will be 1 to 1 with the unprocessed input.
var sb = new StringBuilder( input.Length );
foreach( char c in input )
{
sb.Append( Process( c ) );
}
If it's just one string, you can use a collection to hold your characters and then create the string using the constructor that takes a char[]:
IEnumerable<char> myChars = ...;
string result = new string(myChars.ToArray()); // ToArray() requires using System.Linq
Using LINQ and a helper method ProcessChar(char c) that transforms each character to its output value, this can be a single query. Note that string has no constructor that takes an IEnumerable<char>, so the sequence has to be materialized with ToArray() first:
string result = new string(sourceString.Select(c => ProcessChar(c)).ToArray());
For typical inputs this should perform comparably to a StringBuilder (ToArray() buffers the characters before the string is built), and it is much more readable in my opinion.
StringBuilder is usually a pretty good bet. I've written lots of JavaScript in web pages using it.
A StringBuilder is a good idea for building your new string, because you can efficiently append new values to it. As for reading the characters from the input string, a StringReader would be a sufficient choice.
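A minimal sketch of that combination, assuming a hypothetical Process method that upper-cases each character (the class and method names here are illustrative, not from the question):

```csharp
using System;
using System.IO;
using System.Text;

public static class CharTransformSketch
{
    // Hypothetical per-character transformation; replace with your own logic.
    public static char Process(char c)
    {
        return char.ToUpperInvariant(c);
    }

    public static string Transform(string input)
    {
        // Pre-size the builder: the output has one character per input character.
        var sb = new StringBuilder(input.Length);
        using (var reader = new StringReader(input))
        {
            int next;
            // Read() returns -1 once the end of the string is reached.
            while ((next = reader.Read()) != -1)
            {
                sb.Append(Process((char)next));
            }
        }
        return sb.ToString();
    }
}
```

For a string that is already in memory, a plain foreach over the string does the same job; the reader mainly helps if the same code should later accept other TextReader sources.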
void Main()
{
string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(x => x != ';'));
transformedTString.Dump();
}
If you have more complicated logic, you can move your validation to a separate predicate method:
void Main()
{
string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(MyPredicate));
transformedTString.Dump();
}
public bool MyPredicate(char c)
{
return c != ';';
}
What's the difference between read string and output string? I mean why do you have to read char by char?
I use this method for reading string
string str = "some stuff";
string newStr = ToNewString(str);
string ToNewString(string arg)
{
string r = string.Empty;
foreach (char c in arg)
r += DoWork(c);
return r;
}
char DoWork(char arg)
{
// What do you want to do here? Returning the character unchanged as a placeholder.
return arg;
}
I've been using C# String.Format for formatting numbers before like this (in this example I simply want to insert a space):
String.Format("{0:### ###}", 123456);
output:
"123 456"
In this particular case, the number is a string. My first thought was to simply parse it to a number, but it makes no sense in the context, and there must be a prettier way.
The following does not work, because # is a placeholder for digits:
String.Format("{0:### ###}", "123456");
output:
"123456"
What is the string equivalent to # when formatting? The awesomeness of String.Format is still fairly new to me.
You have to parse the string to a number first.
int number = int.Parse("123456");
String.Format("{0:### ###}", number);
of course you could also use string methods but that's not as reliable and less safe:
string strNumber = "123456";
String.Format("{0} {1}", strNumber.Remove(3), strNumber.Substring(3));
As Heinzi pointed out, you can not have format specifier for string arguments.
So, instead of String.Format, you may use following:
string myNum="123456";
myNum=myNum.Insert(3," ");
Not very beautiful, and the extra work might outweigh the gains, but if the input is a string on that format, you could do:
var str = "123456";
var result = String.Format("{0} {1}", str.Substring(0,3), str.Substring(3));
string does not implement IFormattable:
Console.WriteLine("123456" is IFormattable); // False
Console.WriteLine(21321 is IFormattable); // True
There is no point in supplying a format if the argument is not IFormattable; the only way is to convert your string to int or long.
We're doing string manipulation, so we could always use a regex.
Adapted slightly from here:
class MyClass
{
static void Main(string[] args)
{
string sInput, sRegex;
// The string to search.
sInput = "123456789";
// The regular expression.
sRegex = "[0-9][0-9][0-9]";
Regex r = new Regex(sRegex);
MyClass c = new MyClass();
// Assign the replace method to the MatchEvaluator delegate.
MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceNums);
// Replace matched characters using the delegate method.
sInput = r.Replace(sInput, myEvaluator);
// Write out the modified string.
Console.WriteLine(sInput);
}
public string ReplaceNums(Match m)
// Replace each Regex match with match + " "
{
return m.ToString()+" ";
}
}
How's that?
It's been ages since I used C# and I can't test, but this may work as a one-liner which may be "neater" if you only need it once:
sInput = Regex.Replace(sInput, "[0-9][0-9][0-9]", m => m.Value + " ");
There is no way to do what you want unless you parse the string first.
Based on your comments, you only really need simple formatting, so you are better off just implementing a small helper method and that's it. (IMHO it's not really a good idea to parse the string if it isn't logically a number; you can't be sure that a future input string will be a number at all.)
I'd go for something similar to:
public static string Group(this string s, int groupSize = 3, char groupSeparator = ' ')
{
var formattedIdentifierBuilder = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
if (i != 0 && (s.Length - i) % groupSize == 0)
{
formattedIdentifierBuilder.Append(groupSeparator);
}
formattedIdentifierBuilder.Append(s[i]);
}
return formattedIdentifierBuilder.ToString();
}
EDIT: Generalized to generic grouping size and group separator.
The problem is that # is a Digit placeholder and it is specific to numeric formatting only. Hence, you can't use this on strings.
Either parse the string to a numeric, so the formatting rules apply, or use other methods to split the string in two.
string.Format("{0:### ###}", int.Parse("123456"));
I have two strings
string str1 = "Hello World !"; // the position of W character is 6
string str2 = "peace";
//...
string result = "Hello peace !"; // str2 is written to str1 from position 6
Is there a function like this:
string result = str1.Rewrite(str2, 6); // (string, position)
EDITED
This "Hello World !" is just an example, I don't know whether there is "W" character in this string, what I only know are: str1, str2, position (int)
There is not, but you could create one using an extension method.
public static class StringExtensions
{
public static string Rewrite(this string input, string replacement, int index)
{
var output = new System.Text.StringBuilder();
output.Append(input.Substring(0, index));
output.Append(replacement);
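// Keep the tail only if the replacement does not run past the end of the input.
if (index + replacement.Length < input.Length)
output.Append(input.Substring(index + replacement.Length));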
return output.ToString();
}
}
Then, the code you posted in your original question would work:
string result = str1.Rewrite(str2, 6); // (string, position)
@danludwig's answer is better from a code understandability perspective; however, this version is a tad faster. Your explanation that you are dealing with binary data in string format (wtf bbq btw :) ) does mean that speed might be of the essence. Although using a byte array or something might be better than using a string :)
public static string RewriteChar(this string input, string replacement, int index)
{
// Get the array implementation
var chars = input.ToCharArray();
// Copy the replacement into the new array at given index
// TODO take care of the case where the replacement is too long (CopyTo throws if it does not fit)
replacement.ToCharArray().CopyTo(chars, index);
// Wrap the array in a string representation
return new string(chars);
}
There are many ways to do this...
Because I'm a lazy ass, I would go:
result = str1.Substring(0, 6) + str2 + str1.Substring(11);
or
result = str1.Replace("World", str2);
My advice would be, in Visual Studio, right click on "string" and select "Go To Definition". You will see all the methods available to the string "class".
Suppose I am given a following text (in a string array)
engine.STEPCONTROL("00000000","02000001","02000043","02000002","02000007","02000003","02000008","02000004","02000009","02000005","02000010","02000006","02000011");
if("02000001" == 1){
dimlevel = 1;
}
if("02000001" == 2){
dimlevel = 3;
}
I'd like to extract the strings that are between the quotation marks and put them in a separate string array. For instance, string[] extracted would contain 00000000, 02000001, 02000043....
What is the best approach for this? Should I use regular expression to somehow parse those lines and split it?
Personally I don't think a regular expression is necessary. If you can be sure that the input string is always as described and will not have any escape sequences in it or vary in any other way, you could use something like this:
public static string[] ExtractNumbers(string[] originalCodeLines)
{
List<string> extractedNumbers = new List<string>();
string[] codeLineElements = originalCodeLines[0].Split('"');
foreach (string element in codeLineElements)
{
int result = 0;
if (int.TryParse(element, out result))
{
extractedNumbers.Add(element);
}
}
return extractedNumbers.ToArray();
}
It's not necessarily the most efficient implementation, but it's quite short and it's easy to see what it does. Note that it only looks at the first line of the array.
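For comparison, here is a hedged sketch of the regular-expression route the question asks about. It assumes, like the answer above, that the quoted values contain no escaped quotes; the class and method names are made up for illustration. Unlike the Split version, it scans every line and keeps non-numeric quoted text too.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStringExtractor
{
    // Matches a double quote, then captures everything up to the next double quote.
    private static readonly Regex Quoted = new Regex("\"([^\"]*)\"");

    public static string[] Extract(IEnumerable<string> lines)
    {
        var found = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in Quoted.Matches(line))
            {
                // Group 1 is the text between the quotes, without the quotes.
                found.Add(m.Groups[1].Value);
            }
        }
        return found.ToArray();
    }
}
```

Because all lines are scanned, the "02000001" inside the if statements is returned again; pass only the first line if that is not wanted.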
that could be
string data = "\"00000000\",\"02000001\",\"02000043\"".Replace("\"", string.Empty);
string[] myArray = data.Split(',');
or in 1 line
string[] data = "\"00000000\",\"02000001\",\"02000043\"".Replace("\"", string.Empty).Split(',');
I think I've come across this requirement a dozen times, but I could never find a satisfying solution. For instance, there is a collection of strings which I want to serialize (to disk or over the network) through a channel where only a plain string is allowed.
I almost always end up using "split" and "join" with a ridiculous separator like
":::==--==:::".
like this:
public static string encode(System.Collections.Generic.List<string> data)
{
return string.Join(" :::==--==::: ", data.ToArray());
}
public static string[] decode(string encoded)
{
return encoded.Split(new string[] { " :::==--==::: " }, StringSplitOptions.None);
}
But this simple solution apparently has some flaws. The strings cannot contain the separator string, and consequently the encoded string cannot be encoded again.
AFAIK, a comprehensive solution should involve escaping the separator on encoding and unescaping it on decoding. While the problem sounds simple, I believe the complete solution can take a significant amount of code. I wonder if there is any trick that would allow me to build an encoder and decoder in very few lines of code?
Add a reference and using to System.Web, and then:
public static string Encode(IEnumerable<string> strings)
{
return string.Join("&", strings.Select(s => HttpUtility.UrlEncode(s)).ToArray());
}
public static IEnumerable<string> Decode(string list)
{
return list.Split('&').Select(s => HttpUtility.UrlDecode(s));
}
Most languages have a pair of utility functions that do Url "percent" encoding, and this is ideal for reuse in this kind of situation.
You could call the .ToArray() method on the List<> and then serialize the array; that could then be dumped to disk or network, and reconstituted with a deserialization on the other end.
Not too much code, and you get to use the serialization techniques already tested and coded in the .net framework.
You might like to look at the way CSV files are formatted.
escape all instances of a delimiter, e.g. " in the string
wrap each item in the list in "item"
join using a simple separator like ,
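The three steps above can be sketched roughly as follows. This assumes the common CSV convention of escaping a quote by doubling it; the class name and the decoder's leniency (it ignores anything between items) are choices made for this illustration, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvStyleCodec
{
    public static string Encode(IEnumerable<string> items)
    {
        // Double embedded quotes, wrap each item in quotes, join with commas.
        return string.Join(",", items.Select(s => "\"" + s.Replace("\"", "\"\"") + "\"").ToArray());
    }

    public static List<string> Decode(string encoded)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (!inQuotes)
            {
                // Outside an item only the opening quote matters; commas are skipped.
                if (c == '"') inQuotes = true;
            }
            else if (c == '"' && i + 1 < encoded.Length && encoded[i + 1] == '"')
            {
                // A doubled quote inside an item stands for one literal quote.
                current.Append('"');
                i++;
            }
            else if (c == '"')
            {
                // A single quote closes the item.
                items.Add(current.ToString());
                current.Length = 0;
                inQuotes = false;
            }
            else
            {
                current.Append(c);
            }
        }
        return items;
    }
}
```

Since every item is quoted, an empty list (empty output) and a list holding one empty string (a pair of quotes) stay distinguishable, and the output can itself be encoded again.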
I don't believe there is a silver bullet solution to this problem.
Here's an old-school technique that might be suitable -
Serialise by storing the length of each string as a fixed-width prefix in front of it.
So
string[0]="abc"
string[1]="defg"
string[2]=" :::==--==::: "
becomes
0003abc0004defg0014 :::==--==:::
...where the size of the prefix is large enough to cater for the string maximum length
You could use an XmlDocument to handle the serialization. That will handle the encoding for you.
public static string encode(System.Collections.Generic.List<string> data)
{
var xml = new XmlDocument();
xml.AppendChild(xml.CreateElement("data"));
foreach (var item in data)
{
var xmlItem = (XmlElement)xml.DocumentElement.AppendChild(xml.CreateElement("item"));
xmlItem.InnerText = item;
}
return xml.OuterXml;
}
public static string[] decode(string encoded)
{
var items = new System.Collections.Generic.List<string>();
var xml = new XmlDocument();
xml.LoadXml(encoded);
foreach (XmlElement xmlItem in xml.SelectNodes("/data/item"))
items.Add(xmlItem.InnerText);
return items.ToArray();
}
I would just prefix every string with its length and a terminator indicating the end of the length.
abc
defg
hijk
xyz
546
4.X
becomes
3: abc 4: defg 4: hijk 3: xyz 3: 546 3: 4.X
No restriction or limitations at all and quite simple.
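A rough sketch of this length-prefix idea, assuming ':' as the terminator and no padding spaces between items (the example above shows spaces for readability); the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class LengthPrefixCodec
{
    public static string Encode(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
        {
            // Write "<length>:" followed by the raw item; no escaping is needed.
            sb.Append(item.Length.ToString(CultureInfo.InvariantCulture)).Append(':').Append(item);
        }
        return sb.ToString();
    }

    public static List<string> Decode(string encoded)
    {
        var items = new List<string>();
        int pos = 0;
        while (pos < encoded.Length)
        {
            // The first ':' after pos ends the length; everything after it is counted, not scanned.
            int colon = encoded.IndexOf(':', pos);
            if (colon < 0) throw new FormatException("Missing length terminator.");
            int length = int.Parse(encoded.Substring(pos, colon - pos), CultureInfo.InvariantCulture);
            items.Add(encoded.Substring(colon + 1, length));
            pos = colon + 1 + length;
        }
        return items;
    }
}
```

Because the decoder counts characters instead of searching for a separator, items may contain ':' or digits freely. One limitation of this particular sketch: an empty list and an empty encoded string are the same thing, which is usually what you want.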
Json.NET is a very easy way to serialize about any object you can imagine. JSON keeps things compact and can be faster than XML.
List<string> foo = new List<string>() { "1", "2" };
string output = JsonConvert.SerializeObject(foo);
List<string> fooToo = (List<string>)JsonConvert.DeserializeObject(output, typeof(List<string>));
It can be done much more simply if you are willing to use a separator that is 2 characters long.
In Java code:
StringBuilder builder = new StringBuilder();
for(String s : list) {
if(builder.length() != 0) {
builder.append("||");
}
builder.append(s.replace("|", "|p"));
}
And back:
// split() takes a regular expression, so the pipe characters must be escaped
for(String item : encodedList.split("\\|\\|")) {
list.add(item.replace("|p", "|"));
}
You shouldn't need to do this manually. As the other answers have pointed out, there are plenty of ways, built-in or otherwise, to serialize/deserialize.
However, if you did decide to do the work yourself, it doesn't require that much code:
public static string CreateDelimitedString(IEnumerable<string> items)
{
StringBuilder sb = new StringBuilder();
foreach (string item in items)
{
sb.Append(item.Replace("\\", "\\\\").Replace(",", "\\,"));
sb.Append(",");
}
return (sb.Length > 0) ? sb.ToString(0, sb.Length - 1) : string.Empty;
}
This will delimit the items with a comma (,). Any existing commas will be escaped with a backslash (\) and any existing backslashes will also be escaped.
public static IEnumerable<string> GetItemsFromDelimitedString(string s)
{
bool escaped = false;
StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
if ((c == '\\') && !escaped)
{
escaped = true;
}
else if ((c == ',') && !escaped)
{
yield return sb.ToString();
sb.Length = 0;
}
else
{
sb.Append(c);
escaped = false;
}
}
yield return sb.ToString();
}
Why not use Xstream to serialise it, rather than reinventing your own serialisation format?
It's pretty simple:
new XStream().toXML(yourobject)
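Add a using directive for System.Linq and change your functions to this:
public static string encode(System.Collections.Generic.List<string> data, out string delimiter)
{
delimiter = ":";
// Grow the delimiter until no item contains it as a substring.
// Caveat: an item that starts or ends with ':' can still blur into the delimiter once joined.
while (data.Any(s => s.Contains(delimiter))) delimiter += ":";
return string.Join(delimiter, data.ToArray());
}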
public static string[] decode(string encoded, string delimiter)
{
return encoded.Split(new string[] { delimiter }, StringSplitOptions.None);
}
There are loads of textual markup languages out there, and any of them would work.
Given the simplicity of your input, many would work trivially; it all depends on:
how human readable you want the encoding to be
how resilient to API changes it should be
how easy it is to parse
how easy it is to write or get a parser for it
If the last one is the most important then just use the existing xml libraries MS supply for you:
class TrivialStringEncoder
{
private readonly XmlSerializer ser = new XmlSerializer(typeof(string[]));
public string Encode(IEnumerable<string> input)
{
using (var s = new StringWriter())
{
ser.Serialize(s, input.ToArray());
return s.ToString();
}
}
public IEnumerable<string> Decode(string input)
{
using (var s = new StringReader(input))
{
return (string[])ser.Deserialize(s);
}
}
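public static void Main(string[] args)
{
// Encode and Decode are instance methods, so create an encoder first.
var encoder = new TrivialStringEncoder();
var encoded = encoder.Encode(args);
Console.WriteLine(encoded);
var decoded = encoder.Decode(encoded);
foreach(var x in decoded)
Console.WriteLine(x);
}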
}
running on the inputs "A", "<", ">" you get (edited for formatting):
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>A</string>
<string>&lt;</string>
<string>&gt;</string>
</ArrayOfString>
A
<
>
Verbose, slow but extremely simple and requires no additional libraries
What is the PHP preg_replace in C#?
I have an array of strings that I would like to replace with another array of strings. Here is an example in PHP. How can I do something like that in C# without using .Replace("old","new")?
$patterns[0] = '/=C0/';
$patterns[1] = '/=E9/';
$patterns[2] = '/=C9/';
$replacements[0] = 'à';
$replacements[1] = 'é';
$replacements[2] = 'é';
return preg_replace($patterns, $replacements, $text);
Real men use regular expressions, but here is an extension method that adds it to String if you wanted it:
public static class ExtensionMethods
{
public static String PregReplace(this String input, string[] pattern, string[] replacements)
{
if (replacements.Length != pattern.Length)
throw new ArgumentException("Replacement and Pattern Arrays must be balanced");
for (var i = 0; i < pattern.Length; i++)
{
input = Regex.Replace(input, pattern[i], replacements[i]);
}
return input;
}
}
You use it like this:
class Program
{
static void Main(string[] args)
{
String[] pattern = new String[4];
String[] replacement = new String[4];
pattern[0] = "Quick";
pattern[1] = "Fox";
pattern[2] = "Jumped";
pattern[3] = "Lazy";
replacement[0] = "Slow";
replacement[1] = "Turtle";
replacement[2] = "Crawled";
replacement[3] = "Dead";
String DemoText = "The Quick Brown Fox Jumped Over the Lazy Dog";
Console.WriteLine(DemoText.PregReplace(pattern, replacement));
}
}
You can use .Select() (in .NET 3.5 and C# 3) to ease applying functions to members of a collection.
stringsList.Select( s => replacementsList.Select( r => s.Replace(s,r) ) );
You don't need regexp support, you just want an easy way to iterate over the arrays.
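As a sketch of what iterating over the arrays could look like without regular expressions, assuming the patterns are literal text such as "=C0" (the class and method names are hypothetical):

```csharp
using System;

public static class PlainReplace
{
    public static string ReplaceAll(string input, string[] oldValues, string[] newValues)
    {
        if (oldValues.Length != newValues.Length)
            throw new ArgumentException("oldValues and newValues must have the same length.");
        // Apply each literal replacement in order; no pattern syntax is interpreted.
        for (int i = 0; i < oldValues.Length; i++)
        {
            input = input.Replace(oldValues[i], newValues[i]);
        }
        return input;
    }
}
```

The replacements run one after another, so text produced by an earlier pair can be matched by a later one; order the pairs accordingly if that matters.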
public static class StringManipulation
{
public static string PregReplace(string input, string[] pattern, string[] replacements)
{
if (replacements.Length != pattern.Length)
throw new ArgumentException("Replacement and Pattern Arrays must be balanced");
for (int i = 0; i < pattern.Length; i++)
{
input = Regex.Replace(input, pattern[i], replacements[i]);
}
return input;
}
}
Here is what I will use. Some code of Jonathan Holland but not in C#3.5 but in C#2.0 :)
Thx all.
You are looking for System.Text.RegularExpressions;
using System.Text.RegularExpressions;
Regex r = new Regex("=C0");
string output = r.Replace(text, "à");
To get PHP's array behaviour the way you have it, you need multiple instances of Regex, one per pattern.
However, in your example, you'd be much better served by .Replace(old, new), it's much faster than compiling state machines.
Edit: Uhg I just realized this question was for 2.0, but I'll leave it in case you do have access to 3.5.
Just another take on the LINQ thing. I used List<Char> instead of Char[], but that's just to make it look a little cleaner. There is no instance IndexOf method on arrays (only the static Array.IndexOf), but there is one on List. Why did I need this? Well, from what I am guessing, there is no direct correlation between the replacement list and the list of ones to be replaced. Just the index.
So with that in mind, you can do this with Char[] just fine. But when you see the IndexOf method, you have to add in a .ToList() before it.
Like this: someArray.ToList().IndexOf
String text;
List<Char> patternsToReplace;
List<Char> patternsToUse;
patternsToReplace = new List<Char>();
patternsToReplace.Add('a');
patternsToReplace.Add('c');
patternsToUse = new List<Char>();
patternsToUse.Add('X');
patternsToUse.Add('Z');
text = "This is a thing to replace stuff with";
var allAsAndCs = text.ToCharArray()
.Select
(
currentItem => patternsToReplace.Contains(currentItem)
? patternsToUse[patternsToReplace.IndexOf(currentItem)]
: currentItem
)
.ToArray();
text = new String(allAsAndCs);
This just converts the text to a character array, selects through each one. If the current character is not in the replacement list, just send back the character as is. If it is in the replacement list, return the character in the same index of the replacement characters list. Last thing is to create a string from the character array.
using System;
using System.Collections.Generic;
using System.Linq;