Unexpected behaviour while using extension methods - c#

Why does this happen? Please observe the following code:
static class StringExtension
{
public static string Remove(this string s, char c)
{
return s.Replace(c.ToString(), "");
}
public static string Remove(this string s, char[] a)
{
foreach (char c in a)
{
s = s.Remove((char)c); // <---- ArgumentOutOfRange Exception here
}
return s;
}
}
class Program
{
static void Main(string[] args)
{
char[] a = new char[] { '.', ',' };
string testString = "Clean.this,string.from,periods.and,commas.";
Console.WriteLine(testString.Remove(a));
}
}
When I run this code, I get an ArgumentOutOfRangeException at the indicated line. It turns out that even though I have a specific Remove(this string, char) extension and I explicitly cast the argument to char (although there should be no reason to), the compiler ignores my extension and calls the original string.Remove(int) method.
Am I doing something wrong, or is this a bug in C#?
P.S. I use VS2010.

This line:
s.Remove((char) c);
is calling string.Remove(int) - the compiler will always use an applicable instance method instead of an extension method if it can. It's applicable due to the implicit conversion from char to int. That's the method that's throwing the exception, because the argument you're passing it is out of range. (In fact, you're lucky - in a worse situation it would be in range, and returning entirely unexpected results.)
In general I would strongly advise you not to create extension methods which have the same names as instance methods on the extended type, particularly if they've got the same number of parameters. Working out overloading is hard enough in general without adding extension methods to the mix. Don't forget that whenever you can't easily work out what your code is doing, someone reading the code in a year's time is going to have a ten times harder job.
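One way to act on that advice is to give the extension methods names that string does not already use. The sketch below assumes the hypothetical names RemoveChar and RemoveChars (they are not in the original code); with them, the call can no longer bind to string.Remove(int):

```csharp
using System;

static class StringExtension
{
    // Hypothetical name: string has no instance method called RemoveChar,
    // so the compiler has nothing to prefer over this extension.
    public static string RemoveChar(this string s, char c)
    {
        return s.Replace(c.ToString(), "");
    }

    // Hypothetical name for the char[] variant.
    public static string RemoveChars(this string s, char[] chars)
    {
        foreach (char c in chars)
        {
            s = s.RemoveChar(c); // binds to the extension above
        }
        return s;
    }
}

class Program
{
    static void Main()
    {
        char[] a = { '.', ',' };
        string testString = "Clean.this,string.from,periods.and,commas.";
        Console.WriteLine(testString.RemoveChars(a)); // prints "Cleanthisstringfromperiodsandcommas"
    }
}
```

If renaming is not an option, invoking the original extension as a plain static method, StringExtension.Remove(s, c), should also avoid the instance method, because no instance-method lookup is involved in that call.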

Related

declare variables in argument list

In C# 7 it's possible to declare out variables in the argument list:
if (int.TryParse(input, out int result))
WriteLine(result);
Is it possible to declare a non-out variable in the argument list? Like this:
if (!string.IsNullOrEmpty(string result=FuncGetStr()))
WriteLine(result);
You can't do it in the argument list, no.
You could use pattern matching for this, but I wouldn't advise it:
if (FuncGetStr() is string result && !string.IsNullOrEmpty(result))
That keeps the declaration within the source code of the if, but the scope of result is still the enclosing block, so I think it would be much simpler just to separate it out:
// Mostly equivalent, and easier to read
string result = FuncGetStr();
if (!string.IsNullOrEmpty(result))
{
...
}
There are two differences I can think of:
result isn't definitely assigned after the if statement in the first version
string.IsNullOrEmpty isn't even called in the first version if FuncGetStr() returns null, as the is pattern won't match. You could therefore write it as:
if (FuncGetStr() is string result && result != "")
To be utterly horrible, you could do it, with a helper method to let you use out parameters. Here's a complete example. Please note that I am not suggesting this as something to do.
// EVIL CODE: DO NOT USE
using System;
public class Test
{
static void Main(string[] args)
{
if (!string.IsNullOrEmpty(Call(FuncGetStr, out string result)))
{
Console.WriteLine(result);
}
}
static string FuncGetStr() => "foo";
static T Call<T>(Func<T> func, out T x) => x = func();
}
You can assign variables inside expressions, but the variables must be declared outside of them. You can't combine the two (other than with out variables and pattern matching, as you already indicated in your question).
bool b;
string a;
if (b = string.IsNullOrEmpty(a = "a")){ }
On the why this behavior is different than with out, etc, Damien_The_Unbeliever's comment might be interesting:
The ability to declare out variables inline arises from the awkwardness that it a) has to be a variable rather than a value and b) there's often nothing too useful to do with the value if you declare it before the function is called. I don't see the same motivations for other such uses.

How to ensure compilation error on signature change where 'params' keyword is used

I have a method like this:
public void Foo(params string[] args) {
bar(args[0]);
bar(args[1]);
}
The new requirements lead to a change like this:
public void Foo(string baz, params string[] args) {
if("do bar".Equals(baz)) {
bar(args[0]);
bar(args[1]);
}
}
The problem is that even though I've changed the method signature, no compilation errors occur, which is correct of course, but I want there to be compilation errors for every call to Foo method where the argument baz has not been specified. That is, if a call to Foo before the change was this one:
Foo(p1,p2); //where p1 and p2 are strings
it now needs to be this one:
Foo(baz,p1,p2);
If the call were not changed in this way, p1 would be bound to baz, the params array args would have length 1, and args[1] would throw an IndexOutOfRangeException.
What's the best way to change the signature and ensure that all the calling code is updated accordingly? (The real scenario is where Foo lives in an assembly shared by many projects automatically built on a build server. A compilation error would thus be an easy way to detect all the code that needs to be touched to accommodate the change.)
Edit:
As Daniel Mann and others pointed out, the example above suggests that I should not use params at all. So I should explain that in my real world example it's not always the case that args needs to have two elements, as far as the logic in Foo is concerned args can contain any number of elements. So let's say this is Foo:
public void Foo(string baz, params string[] args) {
if("do bar".Equals(baz)) {
int x = GetANumberDynamically();
for(int i = 0; i<x; i++)
bar(args[i]);
}
}
Here's the solution: do not change the original method's signature; just mark it with the Obsolete attribute, specifying both arguments.
[Obsolete("Use Foo(string, params string[]) version instead of this", true)]
public void Foo(params string[] args) {
bar(args[0]);
bar(args[1]);
}
Then create a new method with a new signature.
public void Foo(string baz, params string[] args) {
if("do bar".Equals(baz)) {
bar(args[0]);
bar(args[1]);
}
}
The second argument of the Obsolete attribute ensures a compilation error. Without it, the attribute only produces a compilation warning. More info about the attribute is available on MSDN.
EDIT:
Based on discussion in comments below, Daniel Mann came up with an interesting problem.
That wouldn't solve the problem. What about if you call Foo("a", "b")? In that case, it will still call the non-obsolete method with only two arguments, and cause the same problem.
I would advise checking that more than one argument was passed through args before calling bar.
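A minimal sketch of such a check, assuming a stand-in Bar that simply records what it receives (the real bar is not shown in the question). Note that this turns the problem into an explicit runtime error rather than the compile-time error the question asks for:

```csharp
using System;
using System.Collections.Generic;

public class FooDemo
{
    // Stand-in for the question's bar(): records what it was given.
    public static readonly List<string> Seen = new List<string>();

    static void Bar(string s) => Seen.Add(s);

    public static void Foo(string baz, params string[] args)
    {
        // Fail fast with a descriptive error instead of an IndexOutOfRangeException.
        if (args == null || args.Length < 2)
        {
            throw new ArgumentException("Expected at least two values after baz.", nameof(args));
        }

        if ("do bar".Equals(baz))
        {
            Bar(args[0]);
            Bar(args[1]);
        }
    }

    public static void Main()
    {
        Foo("do bar", "a", "b");
        Console.WriteLine(string.Join(",", Seen)); // prints "a,b"

        try
        {
            Foo("a", "b"); // old-style call: "a" binds to baz, args has one element
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.ParamName); // prints "args"
        }
    }
}
```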
The easiest solution is to not use the params keyword if you have required parameters.
Obviously, you're expecting args to contain at least two parameters. It's safe to say that those are required. Why not have a method signature like this?
public void Foo(string baz, string requiredArgument1, string requiredArgument2, params string[] optionalArguments)
That removes the ambiguity: It will always require at least 3 arguments.
Another option I hadn't even thought of for some reason is to use named parameters. Obviously, all of your code would have to explicitly do so, but you could do this:
Foo(baz: "bar", args: new [] {"a", "b", "c"});

Handling null in extension method

I have a simple extension method for string class which will strip all non numeric characters from a string. So if I have a string like for example a phone number such as "(555) 215-4444" it will convert it to "5552154444". It looks like this:
public static string ToDigitsOnly(this string input)
{
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(input, String.Empty);
}
I am just wondering what the most elegant way to handle a null value is here. Is there a typical pattern to follow in these cases, such as returning null if null is passed in? Since I'm extending the string class here, it seems I may want to allow null values and not throw an argument exception (since I'm not really passing in an argument when I use this...). But some might argue I should throw an exception like a 'normal' method would. What's the best practice you are using here?
Thanks!
You can follow the principle of least surprise and use the pattern implemented in LINQ:
public static string ToDigitsOnly(this string input)
{
if(input == null)
throw new ArgumentNullException("input");
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(input, String.Empty);
}
You can use the ThrowIfNull method proposed by Jon Skeet. It reduces your check to simply
input.ThrowIfNull("input");
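A minimal sketch of what such a helper could look like; this is an assumption for illustration, not necessarily the exact implementation Jon Skeet proposed. The class names NullCheckExtensions and DigitExtensions are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NullCheckExtensions
{
    // Throws ArgumentNullException carrying the caller-supplied parameter name.
    public static void ThrowIfNull<T>(this T value, string paramName) where T : class
    {
        if (value == null)
        {
            throw new ArgumentNullException(paramName);
        }
    }
}

public static class DigitExtensions
{
    public static string ToDigitsOnly(this string input)
    {
        input.ThrowIfNull("input"); // legal even when input is null: it is a static call
        return Regex.Replace(input, @"[^\d]", string.Empty);
    }
}

public class Program
{
    public static void Main()
    {
        Console.WriteLine("(555) 215-4444".ToDigitsOnly()); // prints "5552154444"
    }
}
```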
Also Jon has a good section 10.2.4 Calling a method on a null reference in C# in Depth, quote:
CHECKING FOR NULLITY As a conscientious developer, I’m sure that your
production methods always check their arguments’ validity before
proceeding. One question that naturally arises from this quirky
feature of extension methods is what exception to throw when the first
argument is null (assuming it’s not meant to be). Should it be
ArgumentNullException, as if it were a normal argument, or should it
be NullReferenceException, which is what would’ve happened if the
extension method had been an instance method to start with? I
recommend the former: it’s still an argument, even if the extension
method syntax doesn’t make that obvious.
I read this recommendation (and it matches my personal experience) as: it's always better to check for null, especially in static methods, and not to rely on null values. The only exception is when handling null is the exact purpose of your method, for example ThrowIfNull or IsNullOrEmpty extension methods.
It doesn't really matter as long as you communicate the behavior well (so that the end-user knows what to expect).
Consider using the built-in XML Documentation Comments to communicate expected behavior.
/// <exception cref="ArgumentNullException">argument is null.</exception>
public string Example( string argument )
{
if ( argument == null )
throw new ArgumentNullException("argument");
return argument.ToString();
}
See MSDN documentation for many examples:
DateTime.ParseExact Method (String, String, IFormatProvider)
Uri.FromHex Method
Suppose I have this:
class A
{
public void F()
{
//do stuff
}
}
If I then run the following code, what happens?
A a = null;
a.F();
You get a NullReferenceException. So I would say the proper way to write an equivalent extension method would be as follows.
class A
{
}
static class AExtensions
{
public static void F(this A a)
{
if (a == null)
{
throw new NullReferenceException();
}
//do stuff
}
}
However, .NET disagrees with me on this. The convention in .NET is to throw an ArgumentNullException (a kind of ArgumentException) instead, so it's probably best to do that.
Simple: create another extension method for string, say IsInValid():
public static bool IsInValid(this string s)
{
return (s == null) || (s.Length == 0);
}
Use it wherever you want to check.
Furthermore, you can use this extension anywhere.

In C#, when sending a parameter to a method, when should we use "ref" and when "out" and when without any of them?

In C#, when sending a parameter to a method, when should we use "ref", when "out", and when neither of them?
In general, you should avoid using ref and out, if possible.
That being said, use ref when the method might need to modify the value. Use out when the method always should assign something to the value.
The difference between ref and out is that with out, the compiler enforces the rule that the method must assign something to the out parameter before returning. With ref, the caller must assign a value to the variable before passing it as a ref argument.
Obviously, the above applies when you are writing your own methods. If you need to call methods that were declared with the ref or out modifiers on their parameters, you should use the same modifier before your argument when calling the method.
Also remember that C# passes references to reference-type objects (classes) by value. So if you give a method a reference type as a parameter, the method can modify the object's data even without ref or out, but it cannot change the caller's reference itself (that is, which object the caller's variable refers to).
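A small sketch of that distinction, using a List<string> as the reference type. The class and method names here are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

public class ReferenceDemo
{
    // Mutates the object that the caller's variable also points to.
    public static void AddItem(List<string> list)
    {
        list.Add("added");
    }

    // Rebinds only the method's local copy of the reference.
    public static void Replace(List<string> list)
    {
        list = new List<string> { "replaced" };
    }

    // With ref, the caller's variable itself is rebound.
    public static void ReplaceByRef(ref List<string> list)
    {
        list = new List<string> { "replaced" };
    }

    public static void Main()
    {
        var items = new List<string>();

        AddItem(items);
        Console.WriteLine(items.Count); // prints 1

        Replace(items);
        Console.WriteLine(items[0]);    // prints "added"

        ReplaceByRef(ref items);
        Console.WriteLine(items[0]);    // prints "replaced"
    }
}
```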
They are used mainly to obtain multiple return values from a method call. Personally, I tend to not use them. If I want multiple return values from a method then I'll create a small class to hold them.
ref and out are used when you want something back from the method in that parameter. As I recall, they both actually compile down to the same IL, but C# puts in place some extra stuff so you have to be specific.
Here are some examples:
static void Main(string[] args)
{
string myString;
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "Hello";
}
The above won't compile because myString is never initialised. If myString is initialised to string.Empty, the program outputs an empty line, because all MyMethod0 does is assign a new string to its local parameter param1.
static void Main(string[] args)
{
string myString;
MyMethod1(out myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod1(out string param1)
{
param1 = "Hello";
}
myString is not initialised in the Main method, yet, the program outputs "Hello". This is because the myString reference in the Main method is being updated from MyMethod1. MyMethod1 does not expect param1 to already contain anything, so it can be left uninitialised. However, the method should be assigning something.
static void Main(string[] args)
{
string myString;
MyMethod2(ref myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod2(ref string param1)
{
param1 = "Hello";
}
This, again, will not compile. This is because ref demands that myString in the Main method is initialised to something first. But, if the Main method is changed so that myString is initialised to string.Empty then the code will compile and the output will be Hello.
So, the difference is that out can be used with an uninitialised variable, whereas ref must be passed an initialised one. And if you pass an object without either, the caller's reference to it cannot be replaced.
Just to be clear: If the object being passed is a reference type already then the method can update the object and the updates are reflected in the calling code, however the reference to the object cannot be changed. So if I write code like this:
static void Main(string[] args)
{
string myString = "Hello";
MyMethod0(myString);
Console.WriteLine(myString);
Console.ReadLine();
}
public static void MyMethod0(string param1)
{
param1 = "World";
}
The output from the program will be Hello, and not World because the method only changed its local copy of the reference, not the reference that was passed in.
I hope this makes sense. My general rule of thumb is simply not to use them. I feel it is a throw back to pre-OO days. (But, that's just my opinion)
(this is supplemental to the existing answers - a few extra considerations)
There is another scenario for using ref with C#, more commonly seen in things like XNA... Normally, when you pass a value-type (struct) around, it gets cloned. This uses stack-space and a few CPU cycles, and has the side-effect that any modifications to the struct in the invoked method are lost.
(aside: normally structs should be immutable, but mutable structs aren't uncommon in XNA)
To get around this, it is quite common to see ref in such programs.
But in most programs (i.e. where you are using classes as the default), you can normally just pass the reference "by value" (i.e. no ref/out).
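As a rough illustration of the struct case, the sketch below uses a hypothetical mutable struct Point2 (not an XNA type) to show that changes made to a by-value copy are lost, while changes made through ref are kept:

```csharp
using System;

// Hypothetical mutable struct, for illustration only.
public struct Point2
{
    public int X;
    public int Y;
}

public class StructDemo
{
    // Receives a copy: the caller's struct is untouched.
    public static void MoveCopy(Point2 p)
    {
        p.X += 10;
    }

    // Receives the caller's storage location: no copy, changes are visible.
    public static void MoveInPlace(ref Point2 p)
    {
        p.X += 10;
    }

    public static void Main()
    {
        var p = new Point2 { X = 1, Y = 2 };

        MoveCopy(p);
        Console.WriteLine(p.X); // prints 1

        MoveInPlace(ref p);
        Console.WriteLine(p.X); // prints 11
    }
}
```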
Another very common use-case of out is the Try* pattern, for example:
string s = Console.ReadLine();
int i;
if(int.TryParse(s, out i)) {
Console.WriteLine("You entered a valid int: " + i);
}
Or similarly, TryGetValue on a dictionary.
This could use a tuple instead, but it is such a common pattern that it is reasonably understood, even by people who struggle with too much ref/out.
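A short sketch of both ideas: Dictionary.TryGetValue following the Try* pattern, and a hypothetical tuple-returning wrapper (ParseInt is an invented name, and the tuple syntax assumes C# 7 or later):

```csharp
using System;
using System.Collections.Generic;

public class TryPatternDemo
{
    // Hypothetical tuple-based alternative to the out-parameter form.
    public static (bool Success, int Value) ParseInt(string s)
    {
        bool ok = int.TryParse(s, out int value);
        return (ok, value);
    }

    public static void Main()
    {
        var ages = new Dictionary<string, int> { { "alice", 30 } };

        if (ages.TryGetValue("alice", out int age))
        {
            Console.WriteLine(age); // prints 30
        }

        if (!ages.TryGetValue("bob", out int missing))
        {
            Console.WriteLine(missing); // prints 0, the default for int
        }

        var result = ParseInt("42");
        Console.WriteLine(result.Success + " " + result.Value); // prints "True 42"
    }
}
```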
Very simple really. You use exactly the same keyword that the parameter was originally declared with in the method. If it was declared as out, you have to use out. If it was declared as ref, you have to use ref.
In addition to Colin's detailed answer, you could also use out parameters to return multiple values from one method call. See for example the method below which returns 3 values.
static void AssignSomeValues(out int first, out bool second, out string third)
{
first = 12 + 12;
second = false;
third = "Output parameters are okay";
}
You could use it like so
static void Main(string[] args) {
int i;
string s;
bool b;
AssignSomeValues(out i, out b, out s);
Console.WriteLine("Int value: {0}", i);
Console.WriteLine("Bool value: {0}", b);
Console.WriteLine("String value: {0}", s);
//wait for enter key to terminate program
Console.ReadLine();
}
Just make sure that you assign a valid value to each out parameter to avoid getting an error.
Try to avoid using ref. Out is okay, because you know what will happen, the old value will be gone and a new value will be in your variable even if the function failed. However, just by looking at the function you have no idea what will happen to a ref parameter. It may be the same, modified, or an entirely new object.
Whenever I see ref, I get nervous.
ref is to be avoided (I believe there is an FxCop rule for this too); however, use ref when the reference itself may be changed. If you see the ref keyword, you know that the variable may no longer reference the same object after the method is called.

Why is params 'less performant' than a regular array?

If you go right now and type string.Format into your IDE, you'll see that there are 4 different overloads: one taking a string and object, another taking a string and two objects, then one taking three objects, and finally one that uses params. According to this answer, this is because params generates 'overhead', and some other languages may not support it.
My question is, why can't a method call like this:
void Foo()
{
Bar(1, 2, 3);
}
void Bar(params int[] args)
{
// use args...
}
Be essentially transformed at compile time to
void Foo()
{
Bar(new[] { 1, 2, 3 });
}
void Bar(int[] args)
{
// use args...
}
? Then it wouldn't create any overhead except for the array creation (which was necessary anyway), and would be fully compatible with other languages.
The number of arguments is already known at compile-time, so what's preventing the C# compiler from doing some kind of string substitution and making the first scenario essentially syntactic sugar for the second? Why did we have to implement hidden language features specifically to support variadic arguments?
The title makes an incorrect assumption.
Both params and non-params methods take an array; the difference is that the compiler emits the IL to create the array implicitly when making a params method call. An array is passed to both methods, as a single argument.
This can be seen in this .NET Fiddle (view "Tidy Up -> View IL").
using System;
public class Program
{
public static void Main()
{
var a1 = 1;
var a2 = 2;
var a3 = 3;
with_params(a1,a2,a3);
no_params(new [] {a1,a2,a3});
}
public static void with_params(params int[] x) {}
public static void no_params(int[] x) {}
}
In both cases the IL is identical; a new array is created, it is populated, and the array is supplied to the invoked method.
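The same point can be observed from inside the method without reading IL. In this sketch (Count and ZeroFirst are made-up helper names), the params method sees an ordinary int[] either way, and an explicitly supplied array is passed through as-is rather than copied:

```csharp
using System;

public class ParamsDemo
{
    public static int Count(params int[] x) => x.Length;

    // Writes through the array it was given.
    public static void ZeroFirst(params int[] x)
    {
        if (x.Length > 0)
        {
            x[0] = 0;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Count(1, 2, 3));            // prints 3 (compiler builds the array)
        Console.WriteLine(Count(new[] { 1, 2, 3 }));  // prints 3 (caller builds the array)
        Console.WriteLine(Count());                   // prints 0 (an empty array, not null)

        int[] existing = { 4, 5 };
        ZeroFirst(existing);                          // the same array instance is passed
        Console.WriteLine(existing[0]);               // prints 0
    }
}
```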
There is an "exception" to this identical IL generation in that the compiler can move out constant-valued arrays when used in the non-parameter form and use 'dup' initialization, as seen here. However, a new array is supplied as the argument in both cases.
