I recently came across a php code where a CSV string was split into two variables:
list($this->field_one, $this->field_two) = explode(",", $fields);
I turned this into:
string[] tmp = s_fields.Split(',');
field_one = tmp[0];
field_two = tmp[1];
Is there a C# equivalent without creating a temporary array?
Jon Skeet said the right thing. GC will do the thing, don't you worry.
But if you like the syntax so much (which is pretty considerable), you can use this, I guess.
public static class MyPhpStyleExtension
{
public void SplitInTwo(this string str, char splitBy, out string first, out string second)
{
var tempArray = str.Split(splitBy);
if (tempArray.length != 2) {
throw new NotSoPhpResultAsIExpectedException(tempArray.length);
}
first = tempArray[0];
second = tempArray[1];
}
}
I almost feel guilty by writing this code. Hope this will do the thing.
The answer to your quite narrow question is no. C# does not provide a 'multi-assignment' capability, so you cannot extract an arbitrary set of values from anything (such as Split()) and break them out into individual named variables.
There is a workaround for a specific number of variables, by writing a parameter with out arguments. See #vlad for an answer based on that.
But why would you want to? C# provides an impressive range of features that will allow you to take apart strings and deal them with the parts in a such a wide range of different ways that the lack of 'multi-assignment' should barely be noticed.
Parsing strings usually involves other operations such as dealing with formatting errors, trimming white space, case folding. There could be less than 2 strings, or more. Requirements could change over time. When you are ready for a more capable string parser, C# will be waiting.
You may use the following approach:
-Create a class that inherits from dynamic object.
-Create a local dictionary that will store your variables values in the inherited class.
-Override the TryGetMember and TrySetMember functions to get and set values from and into the dictionary.
-Now, you can split your string and put it in the dictionary, then access your variable like:
dynamicObject.var1
Related
We have a requirement to transform a string containing a date in dd/mm/yyyy format to ddmmyyyy format (In case you want to know why I am storing dates in a string, my software processes bulk transactions files, which is a line based textual file format used by a bank).
And I am currently doing this:
string oldFormat = "01/01/2014";
string newFormat = oldFormat.Replace("/", "");
Sure enough, this converts "01/01/2014" to "01012014". But my question is, does the replace happen in one step, or does it create an intermediate string (e.g.: "0101/2014" or "01/012014")?
Here's the reason why I am asking this:
I am processing transaction files ranging in size from few kilobytes to hundreds of megabytes. So far I have not had a performance/memory problem, because I am still testing with very small files. But when it comes to megabytes I am not sure if I will have problems with these additional strings. I suspect that would be the case because strings are immutable. With millions of records this additional memory consumption will build up considerably.
I am already using StringBuilders for output file creation. And I also know that the discarded strings will be garbage collected (at some point before the end of the time). I was wondering if there is a better, more efficient way of replacing all occurrences of a specific character/substring in a string, that does not additionally create an string.
Sure enough, this converts "01/01/2014" to "01012014". But my question
is, does the replace happen in one step, or does it create an
intermediate string (e.g.: "0101/2014" or "01/012014")?
No, it doesn't create intermediate strings for each replacement. But it does create new string, because, as you already know, strings are immutable.
Why?
There is no reason to a create new string on each replacement - it's very simple to avoid it, and it will give huge performance boost.
If you are very interested, referencesource.microsoft.com and SSCLI2.0 source code will demonstrate this(how-to-see-code-of-method-which-marked-as-methodimploptions-internalcall):
FCIMPL3(Object*, COMString::ReplaceString, StringObject* thisRefUNSAFE,
StringObject* oldValueUNSAFE, StringObject* newValueUNSAFE)
{
// unnecessary code ommited
while (((index=COMStringBuffer::LocalIndexOfString(thisBuffer,oldBuffer,
thisLength,oldLength,index))>-1) && (index<=endIndex-oldLength))
{
replaceIndex[replaceCount++] = index;
index+=oldLength;
}
if (replaceCount != 0)
{
//Calculate the new length of the string and ensure that we have
// sufficent room.
INT64 retValBuffLength = thisLength -
((oldLength - newLength) * (INT64)replaceCount);
gc.retValString = COMString::NewString((INT32)retValBuffLength);
// unnecessary code ommited
}
}
as you can see, retValBuffLength is calculated, which knows the amount of replaceCount's. The real implementation can be a bit different for .NET 4.0(SSCLI 4.0 is not released), but I assure you it's not doing anything silly :-).
I was wondering if there is a better, more efficient way of replacing
all occurrences of a specific character/substring in a string, that
does not additionally create an string.
Yes. Reusable StringBuilder that has capacity of ~2000 characters. Avoid any memory allocation. This is only true if the the replacement lengths are equal, and can get you a nice performance gain if you're in tight loop.
Before writing anything, run benchmarks with big files, and see if the performance is enough for you. If performance is enough - don't do anything.
Well, I'm not a .NET development team member (unfortunately), but I'll try to answer your question.
Microsoft has a great site of .NET Reference Source code, and according to it, String.Replace calls an external method that does the job. I wouldn't argue about how it is implemented, but there's a small comment to this method that may answer your question:
// This method contains the same functionality as StringBuilder Replace. The only difference is that
// a new String has to be allocated since Strings are immutable
Now, if we'll follow to StringBuilder.Replace implementation, we'll see what it actually does inside.
A little more on a string objects:
Although String is immutable in .NET, this is not some kind of limitation, it's a contract. String is actually a reference type, and what it includes is the length of the actual string + the buffer of characters. You can actually get an unsafe pointer to this buffer and change it "on the fly", but I wouldn't recommend doing this.
Now, the StringBuilder class also holds a character array, and when you pass the string to its constructor it actually copies the string's buffer to his own (see Reference Source). What it doesn't have, though, is the contract of immutability, so when you modify a string using StringBuilder you are actually working with the char array. Note that when you call ToString() on a StringBuilder, it creates a new "immutable" string any copies his buffer there.
So, if you need a fast and memory efficient way to make changes in a string, StringBuilder is definitely your choice. Especially regarding that Microsoft explicitly recommends to use StringBuilder if you "perform repeated modifications to a string".
I haven't found any sources but i strongly doubt that the implementation creates always new strings. I'd implement it also with a StringBuilder internally. Then String.Replace is absolutely fine if you want to replace once a huge string. But if you have to replace it many times you should consider to use StringBuilder.Replace because every call of Replace creates a new string.
So you can use StringBuilder.Replace since you're already using a StringBuilder.
Is StringBuilder.Replace() more efficient than String.Replace?
String.Replace() vs. StringBuilder.Replace()
There is no string method for that. You are own your own. But you can try something like this:
oldFormat="dd/mm/yyyy";
string[] dt = oldFormat.Split('/');
string newFormat = string.Format("{0}{1}/{2}", dt[0], dt[1], dt[2]);
or
StringBuilder sb = new StringBuilder(dt[0]);
sb.AppendFormat("{0}/{1}", dt[1], dt[2]);
I know there is a rule about strings in C# that says:
When we create a textual string of type string, we can never change its value! When putting different value for a string variable thje first string will stay in memory and variable (which is kind of reference type) just gets the address of the new string.
So doing something like this:
string a = "aaa";
a = a.Trim(); // Creates a new string
is not recommended.
But what if I need to do some actions on the string according to user preferences, like so:
string a = "aaa";
if (doTrim)
a = a.Trim();
if (doSubstring)
a = a.Substring(...);
etc...
How can I do it without creating new strings on every action ?
I thougt about sending the string to a function by ref, like so:
void DoTrim(ref string value)
{
value = value.Trim(); // also creates new string
}
But this also creates a new string...
Can someone please tell me if there is a way of doing it without wasteing memory on each action ?
You are correct in that the operations you're performing are creating new strings, and not mutating a single string.
You are incorrect in that this is generally problematic or something to be avoided.
If your strings are hundreds of thousands of characters, then sure, copying all of those just to remove a few leading spaces, or to add a few characters to the end of it (repeatedly, in a loop, in particular) can actually be a problem.
If your strings aren't large, and you're not performing many (an in thousands of) operations on the string, then you almost certainly don't have a problem.
Now there are a handful of contexts, generally rather rare, that do run into problems with string manipulation. Probably the most common of the problematic contexts is appending a bunch of strings together, as doing so means copying all of the previously appended data for each new addition. If you're in that situation consider using something like a StringBuilder or a single call to string.Concat (the overload accepting a sequence of strings to concat) to perform this operation.
Other contexts are, for example, programs dealing with processing DNA strands. They'll often be taking strings of millions of characters and creating hundreds of thousands of many thousand character long substrings of that string. Using standard C# string operations would therefore result in a lot of unnecessary copying. People writing such programs end up creating objects that can represent a substring of another string without copying the data and instead referring to the existing string's underlying data source with an offset.
Sticking my neck out here a bit so I'll preface with saying in most cases Servy's answer is the correct answer. However, if you really do need lower level access and less string allocations, you could consider creating a character buffer (simple array for instance) that is big enough to fit your processed string and allow you direct manipulation of the characters. There are some significant downfalls to this, though. Including that you'll probably have to write your own Substring() and Trim() modifiers, and your buffer will likely be bigger than your input strings in many cases to accommodate unexpected string sizes. Once you are done manipulating your buffer, you could then package the character array up as a String. Since all of your manipulations are done on a single buffer, you should save a lot of allocations.
I would seriously consider if the above is worth the hassle, but if you really need the performance, this is the best solution I can think of.
How can I do it without creating new strings on every action?
You should only worry about that if you're handling big strings or if you're doing many string operations in a short period of time.
Even then, the performance loss due to creating more references is minimal.
The Garbage Collector has to collect all the unused string variables, but hey - that only really matters if you're doing MANY string operations.
So rather focus on readability in your code, rather than trying to optimize its performance in the first place.
If you really have to keep the same reference of string, you can simply use a StringBuilder.
Why do you feel uncomfortable creating new strings? There is a reason for the string API to be designed this way. For example, immutable objects are thread-safe (and they allow for a more functional programming style).
If you replace your simple string code by stringbuilders, your code might be more error-prone in multithreading scenarios (which is quite normal in a web application for example).
StringBuilders are used for concatenating strings, inserting characters, removing characters, etc. But they will need to reallocate and copy their internal characters arrays every now and then, too.
When you speak about memory consumption you have started to micro-optimize your code. Don't.
BTW: Have a look at the LINQ API. What does each operation do? Rats - it creates a new enumerator! A query like foos.Where(bar).Select(baz).FirstOrDefault() could certainly be memory-optimized by just creating a single enumerator object and modifying the criteria it applies when enumerating. </irony>
It will depend on what your exact use case is, but you might want to explore using the StringBuilder class which you can use to build and modify strings.
Okay I know this question is painfully simple, and I'll admit that I am pretty new to C# as well. But the title doesn't describe the entire situation here so hear me out.
I need to alter a URL string which is being created in a C# code behind, removing the substring ".aspx" from the end of the string. So basically I know that my URL, coming into this class, will be something like "Blah.aspx" and I want to get rid of the ".aspx" part of that string. I assume this is quite easy to do by just finding that substring, and removing it if it exists (or some similar strategy, would appreciate if someone has an elegant solution for it if they've thought done it before). Here is the problem:
"Because strings are immutable, it is not possible (without using unsafe code) to modify the value of a string object after it has been created." This is from the MSDN official website. So I'm wondering now, if strings are truly immutable, then I simply can't (shouldn't) alter the string after it has been made. So how can I make sure that what I'm planning to do is safe?
You don't change the string, you change the variable. Instead of that variable referring to a string such as "foo.aspx", alter it to point to a new string that has the value "foo".
As an analogy, adding one to the number two doesn't change the number two. Two is still just the same as it always way, you have changed a variable from referring to one number to refer to another.
As for your specific case, EndsWith and Remove make it easy enough:
if (url.EndsWith(".aspx"))
url = url.Remove(url.Length - ".aspx".Length);
Note here that Remove is taking one string, an integer, and giving us a brand new string, which we need to assign back to our variable. It doesn't change the string itself.
Also note that there is a URI class that you can use for parsing URLs, and it will be able to handle all of the complex situations that can arise, including hashes, query parameters, etc. You should use that to parse out the aspects of a URL that you are interested in.
String immutability is not a problem for normal usage -- it just means that member functions like "Replace", instead of modifying the existing string object, return a new one. In practical terms that usually just means you have to remember to copy the change back to the original, like:
string x = "Blah.aspx";
x.Replace(".aspx", ""); // still "Blah.aspx"
x = x.Replace(".aspx", ""); // now "Blah"
The weirdness around strings comes from the fact that System.String inherits System.Object, yet, because of its immutability, behaves like a value type rather than an object. For example, if you pass a string into a function, there's no way to modify it, unless you pass it by reference:
void Test(string y)
{
y = "bar";
}
void Test(ref string z)
{
z = "baz";
}
string x = "foo";
Test(x); // x is still "foo"
Test(ref x); // x is now "baz"
A String in C# is immutable, as you say. Meaning that this would create multiple String objects in memory:
String s = "String of numbers 0";
s += "1";
s += "2";
So, while the variable s would return to you the value String of numbers 012, internally it required the creation of three strings in memory to accomplish.
In your particular case, the solution is quite simple:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
myPath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Again, this appears as if myPath has changed, but it really has not. An internal copy and assign took place and you get to keep using the same variable.
Also, if you must preserve the original variable, you could simply make a new variable:
String myPath = "C:\\folder1\\folder2\\myFile.aspx";
String thePath = Path.Combine(Path.GetDirectoryName(myPath), Path.GetFileNameWithoutExtension(myPath));
Either way, you end up with a variable you can use.
Note that the use of the Path methods ensures you get proper path operations, and not blind String replacements that could have unintended side-effects.
String.Replace() will not modify the string. It will create a new one. So the following code:
String myUrl = #"http://mypath.aspx";
String withoutExtension = myUrl.Replace(".aspx", "");
will create a brand-new string which is assigned to withoutExtension.
At the moment I maintain a quirky codebase, and came across the following same pattern more than 100 times:
string NotMySqlQuery = ""; //why initialize the string with "", only to overwrite it on the next line?
NotMySqlQuery = "The query to be executed";
Since I came across this so often, I now doubt my own good judgement.
Is this a trick to optimize the compiler or does it bring any other advantages?
It reminds me a bit of the old times when I did write some code in C++, but it still doesn't look like proper dealing with strings to me.
Why would someone write code like that?
There is no performance advantage of that syntax. It is even slightly worse than not initializing it at all, since the strings are immutable in c# and this way 2 separate strings are allocated.
For your simple case, it is better to save the 2 lines into one, there is no point to assign it an empty string, and immediately assign another value to it.
string NotMySqlQuery = "The query to be executed";
This is clearer.
Below is a code in C# for rss reader, why this code is bad? this class generates a list of the 5 most recent posts, sorted by title. What do you use to analyze code in C#?
static Story[] Parse(string content)
{
var items = new List<string>();
int start = 0;
while (true)
{
var nextItemStart = content.IndexOf("<item>", start);
var nextItemEnd = content.IndexOf("</item>", nextItemStart);
if (nextItemStart < 0 || nextItemEnd < 0) break;
String nextItem = content.Substring(nextItemStart, nextItemEnd + 7 - nextItemStart);
items.Add(nextItem);
start = nextItemEnd;
}
var stories = new List<Story>();
for (byte i = 0; i < items.Count; i++)
{
stories.Add(new Story()
{
title = Regex.Match(items[i], "(?<=<title>).*(?=</title>)").Value,
link = Regex.Match(items[i], "(?<=<link>).*(?=</link>)").Value,
date = Regex.Match(items[i], "(?<=<pubDate>).*(?=</pubdate>)").Value
});
}
return stories.ToArray();
}
why don't use XmlReader or XmlDocument or LINQ to Xml?
It is bad because it's using string parsing when there are excellent classes in the framework for parsing XML. Even better, there are classes to deal with RSS feeds.
ETA:
Sorry to not have answered your second question earlier. There are a great number of tools to analyze correctness and quality of C# code. There's probably a huge list compiled somewhere, but here's a few I use on a daily basis to help ensure quality code:
StyleCop (code formatting standards)
Resharper (idiomatic programming, gotcha catching)
FxCop (code correctness, standards adherence, idiomatic programming)
Pex (white box testing)
Nitriq (code quality metrics)
NUnit (unit testing)
You shouldn't parse XML with string functions and regular expressions. XML can get very complicated and be formatted many ways that a real XML parser like XmlReader can handle, but will break your simple string parsing code.
Basically: don't try and reinvent the wheel (an xml parser), especially when you don't realize how complicated that wheel actually is.
I think the worst thing for the code is the performance issue. You should parse the xml string into a XDocument(or similar structure) instead of parse it again and again using regex.
simply because you are reinventing a xml parser , use Linq to xml instead, it is very simple and clean.I am sure that you can do all the above in three lines of code if you use Linq to XML , your code uses a lot of magic numbers (ex: 7-n ..), the thing that make it unstable and non generic
For starters, it uses byte as a indexer instead of int (what if items has more items in it than a byte can represent?). It doesn't use idiomatic C# (see user1645569's response). It also unnecessarily uses var instead of specific data types (that's more stylistic though, but for me I prefer not to, hence by my metric it's not ideal (and you've given no other metric)).
Let me clarify what I am saying about "unnecessarily using var": var in and of itself is not bad, and I am not suggesting that. I am (mostly) suggesting the usage here isn't very consistent. For example, explicitly declaring start as an int, but then declaring nextItemEnd as var (which will deduce to int) and assigning nextItemEnd to start seems (to me) like a weird mixture between wanting to automatically deduce a variable's type and explicitly declaring it. I think it's good that var was not used in start's declaration (because then it's not exactly clear if the intent is an integer or a floating point number), but I (personally) don't think it helps any to declare nextItemStart and nextItemEnd as var. I tend to prefer to use var for more complex/longer data types (similar to how I use auto in C++ for iterators, but not for "simpler" data types).