I am currently developing a C# P/Invoke wrapper for a DLL that is part of my product. I have no experience with C# and this is the first significant C# coding I have done. I am acutely aware that I am lacking a lot of knowledge of the finer points and idioms of the language.
My question concerns the unit tests that I am writing for which I am using NUnit. I have a need to compare the values of double[] variables. If I use Assert.AreEqual(...) to do this then the values are compared for exact equality. However, I would like to compare up to a tolerance. There are AreEqual() overloads for scalar real values that admit a delta parameter. However, I have not been able to find an equivalent for arrays. Have I missed something obvious?
At the moment I have solved the problem with the following code:
class Assert : NUnit.Framework.Assert
{
    public static void AreEqual(double[] expected, double[] actual, double delta)
    {
        AreEqual(expected.Length, actual.Length);
        for (int i = 0; i < expected.Length; i++)
        {
            AreEqual(expected[i], actual[i], delta);
        }
    }
}
Whilst this appears to work, I wonder if there is a cleaner solution available. In particular, I am concerned that reusing the name Assert for my derived class is not only poor style but could also lead to unanticipated problems down the road.
I would have liked to use extension methods, but I understand they are only viable when there is an instance of the class under extension, and I only ever call static methods on the Assert class.
I'm sorry if this seems a bit nebulous, but my instincts tell me that I'm not doing this the best way and I would like to know how to do it right.
Since the introduction of the fluent assertion syntax in NUnit, the Within() method has been available for this purpose:
double actualValue = 1.989;
double expectedValue = 1.9890;
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(0.00001));
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(1).Ulps);
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(0.1).Percent);
For collections, the default behaviour of Is.EqualTo() is to compare the collections' members individually, with these individual comparisons being modified by Within(). Hence, you can compare two arrays of doubles like so:
var actualDoubles = new double[] {1.0 / 3.0, 0.7, 9.981};
var expectedDoubles = new double[] { 1.1 / 3.3, 0.7, 9.9810};
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(0.00001));
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(1).Ulps);
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(0.1).Percent);
This will compare each element of actualDoubles to the corresponding element in expectedDoubles using the specified tolerance, and will fail if any are not sufficiently close.
I had a need to create a custom assert. In your case there was an alternative provided by the framework, but that didn't work when I wanted a completely custom assert. I solved this by adding a new static class that calls into NUnit:
public static class FooAssert
{
    public static void CountEquals(int expected, FooConsumer consumer)
    {
        int actualCount = 0;
        while (consumer.ConsumeItem() != null)
            actualCount++;
        NUnit.Framework.Assert.AreEqual(expected, actualCount);
    }
}
Then in a test:
[Test]
public void BarTest()
{
    // Arrange
    ...
    // Act
    ...
    // Assert
    FooAssert.CountEquals(1, fooConsumer);
}
I know I am a little bit late to the party, but it might still be useful for somebody.
"Better" is always a matter of taste. In this case, I would say yes: you should make your own assert, without subclassing the NUnit Assert class. NUnit already has multiple assert classes with different specific assertions, like CollectionAssert. Otherwise, your method is fine.
I think what I would have done is simply define a function somewhere in your test harness
public static bool AreEqual(double[] expected, double[] actual, double delta)
that does the comparison and returns true or false appropriately. In your test you simply write:
Assert.IsTrue(AreEqual(expected, result, delta));
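For what it's worth, a minimal sketch of such a helper (the class name TestUtil is just an example; it assumes both arrays are non-null):
public static class TestUtil
{
    // True when both arrays have the same length and every pair of
    // corresponding elements differs by no more than delta.
    public static bool AreEqual(double[] expected, double[] actual, double delta)
    {
        if (expected.Length != actual.Length)
            return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (Math.Abs(expected[i] - actual[i]) > delta)
                return false;
        }
        return true;
    }
}
One trade-off of the Assert.IsTrue(...) approach is that a failure only reports "Expected: True" without saying which element differed; the fluent Is.EqualTo(...).Within(...) form described above produces much more informative failure messages.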
Let's say I have a simple function:
public static int NewNumber(int lowestValue, int highestValue) {}
I would like to have some compiler check that the parameters are correct. For example, in this case, a developer could mistakenly (or on purpose) call the method like this:
NewNumber(5, -5);
which is wrong in this case: the developer lied.
Sure, I could make a simple check inside the method:
public static int NewNumber(int lowestValue, int highestValue)
{
    if (highestValue <= lowestValue)
    {
        // Error
    }
}
... and it would work just perfectly. However, I'm curious if there is anything the developer can do in this case to restrict such behavior without additional checking in the method itself.
EDIT: I found a solution, but it is not related to C#.
Since I'm working in Unity, I ended up writing a custom inspector so values can be entered correctly in the Unity Inspector itself, thus eliminating unnecessary checks (which would hurt performance) when calling the method many times per second.
I don't believe this is possible. Consider this situation:
NewNumber(x, y);
What are x and y? The compiler doesn't necessarily know what the input is (e.g. x = Int32.Parse(Console.ReadLine());).
You gave hard-coded examples, and perhaps you might only use the function with hard-coded values, but the compiler only knows that 5 and -5 are integers, and an integer can be a literal (5, -5, etc.) or a variable (var a = 5;).
I don't think there is any compiler-level argument check. It is always better to have your parameter check within your method (it is the method's responsibility to take care of its parameters) and to document it well, so the caller knows what data it should pass to your method.
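To illustrate, a minimal sketch of such an in-method guard (the message text is just an example):
public static int NewNumber(int lowestValue, int highestValue)
{
    // Fail fast with a descriptive exception instead of silently accepting bad input.
    if (highestValue <= lowestValue)
        throw new ArgumentOutOfRangeException(nameof(highestValue),
            "highestValue must be greater than lowestValue.");

    // ... compute and return the number ...
    return lowestValue; // stand-in for the real implementation
}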
Why is it not good practice to compare two objects by serializing them and then comparing the strings, as in the following example?
public class Obj
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
}

public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}

Obj o1 = new Obj { Prop1 = 1, Prop2 = "1" };
Obj o2 = new Obj { Prop1 = 1, Prop2 = "2" };

bool result = new Comparator<Obj>().Equals(o1, o2);
I have tested it and it works; it is generic, so it could work for a great diversity of objects. What I am asking is: what are the downsides of this approach for comparing objects?
I have seen it suggested in this question, and it received some upvotes, but I can't figure out why it is not considered the best way if somebody wants to compare just the values of the properties of two objects.
EDIT: I am strictly talking about JSON serialization, not XML.
I am asking this because I want to create a simple and generic comparer for a unit test project, so the performance of the comparison does not bother me much, as I know this may be one of the biggest downsides. Also, the type-less problem can be handled, in the case of Newtonsoft.Json, by setting the TypeNameHandling property to All.
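For reference, a sketch of the Newtonsoft.Json setting mentioned above; with TypeNameHandling.All, the serializer embeds "$type" metadata in the output, so objects of different runtime types no longer serialize to identical strings:
var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };

// The serialized string now carries type information along with the values.
string json = JsonConvert.SerializeObject(o1, settings);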
The primary problem is that it is inefficient.
As an example, imagine this Equals function:
public bool Equals(T x, T y)
{
    return x.Prop1 == y.Prop1
        && x.Prop2 == y.Prop2
        && x.Prop3 == y.Prop3
        && x.Prop4 == y.Prop4
        && x.Prop5 == y.Prop5
        && x.Prop6 == y.Prop6;
}
If the Prop1 values are not the same, the other five comparisons never need to be checked. If you did this with JSON, you would have to convert the entire object into a JSON string and then compare the strings every time; this is on top of serialization being an expensive task all on its own.
Then the next problem is that serialization is designed for communication, e.g. from memory to a file, across a network, etc. If you have leveraged serialization for comparison, you can degrade your ability to use it for its normal purpose: you can't ignore fields not required for transmission, because ignoring them might break your comparer.
Next, JSON specifically is type-less, which means that values that are not in any way, shape, or form equal may be mistaken for equal because they serialize to the same string; on the flip side, values that are equal may fail to compare as equal due to formatting differences. This is again unsafe and unstable.
The only upside to this technique is that it requires little effort for the programmer to implement.
You are probably going to keep adding a bounty to the question until somebody tells you that it is just fine to do this. So there you go: don't hesitate to take advantage of the Newtonsoft.Json library to keep the code simple. You just need some good arguments to defend your decision if your code is ever reviewed or if somebody else takes over the maintenance of the code.
Some of the objections they may raise, and their counter-arguments:
This is very inefficient code!
It certainly is; in particular, GetHashCode() can make your code brutally slow if you ever use the object in a Dictionary or HashSet.
The best counter-argument is to note that efficiency is of little concern in a unit test. The most typical unit test takes longer to get started than to actually execute, and whether it takes 1 millisecond or 1 second is not relevant. And it is a problem you are likely to discover very early.
You are unit-testing a library you did not write!
That is certainly a valid concern: you are in effect testing Newtonsoft.Json's ability to generate a consistent string representation of an object. There is cause to be alarmed about this; in particular, floating point values (float and double) are always a potential problem. There is also some evidence that the library author is unsure how to do it correctly.
The best counter-argument is that the library is widely used and well maintained; the author has released many updates over the years. Floating point consistency concerns can be reasoned away when you make sure that the exact same program with the exact same runtime environment generates both strings (i.e. don't store them), and when you make sure the unit test is built with optimization disabled.
You are not unit-testing the code that needs to be tested!
Yes, you would only write this code if the class itself provides no way to compare objects; in other words, it does not itself override Equals/GetHashCode and does not expose a comparer. So testing for equality in your unit test exercises a feature that the to-be-tested code does not actually support. That is something a unit test should never do; you can't write a bug report when the test fails.
The counter-argument is to reason that you need to test for equality in order to test another feature of the class, like the constructor or property setters. A simple comment in the code is enough to document this.
By serializing your objects to JSON, you are basically converting all of your objects to another data type, and so everything that applies to your JSON library will have an impact on your results.
So if there is an attribute like [ScriptIgnore] on one of the properties, your code will simply ignore it, since it has been omitted from your data.
Also, the string results can be the same for objects that are not the same, as in this example:
static void Main(string[] args)
{
    Xb x1 = new X1()
    {
        y1 = 1,
        y2 = 2
    };

    Xb x2 = new X2()
    {
        y1 = 1,
        y2 = 2
    };

    bool result = new Comparator<Xb>().Equals(x1, x2);
}

class Xb
{
    public int y1 { get; set; }
}

class X1 : Xb
{
    public short y2 { get; set; }
}

class X2 : Xb
{
    public long y2 { get; set; }
}
So as you see, x1 has a different type from x2, and even the data type of y2 is different for those two, but the JSON results will be the same.
Other than that, since both x1 and x2 are of type Xb, I can call your comparer without any problems.
First, I would like to correct the GetHashCode implementation shown at the beginning:
public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}
Okay, next, let's discuss the problems with this method.
First, it just won't work for types with circular references. If you have a property linkage as simple as A -> B -> A, it fails. Unfortunately, this is very common in lists or maps that link to each other. Worse, there is hardly an efficient generic cycle-detection mechanism.
Second, comparison via serialization is just inefficient. JSON serialization needs reflection and a lot of type inspection before it can produce its result. Therefore, your comparer will become a serious bottleneck in any algorithm. Even in cases of only thousands of records, JSON serialization is usually considered slow.
Third, JSON has to walk every property. This becomes a disaster if your object links to any big object. What if your object links to a big file?
As a result, C# simply leaves the implementation to the user. One has to know one's class thoroughly before creating a comparer. Comparison requires good cycle detection, early termination, and efficiency considerations. A generic solution simply does not exist.
These are some of the downsides:
a) Performance will be increasingly bad the deeper your object tree is.
b) new Obj { Prop1 = 1 } equals new Obj { Prop1 = "1" } equals new Obj { Prop1 = 1.0 }
c) new Obj { Prop1 = 1.0, Prop2 = 2.0 } does not equal new Obj { Prop2 = 2.0, Prop1 = 1.0 }
First, I notice that you say "serialize them and then compare the strings." In general, ordinary string comparison will not work for comparing XML or JSON strings; you have to be a little more sophisticated than that. As a counterexample to string comparison, consider the following XML strings:
<abc></abc>
<abc/>
They are clearly not string equal but they definitely "mean" the same thing. While this example might seem contrived, it turns out that there are quite a few cases where string comparison doesn't work. For example, whitespace and indentation are significant in string comparison but may not be significant in XML.
The situation isn't all that much better for JSON. You can do similar counterexamples for that.
{ abc : "def" }
{
abc : "def"
}
Again, clearly these mean the same thing, but they're not string-equal.
Essentially, if you're doing string comparison you're trusting the serializer to always serialize a particular object in exactly the same way (without any added whitespace, etc), which ends up being remarkably fragile, especially given that most libraries do not, to my knowledge, provide any such guarantee. This is particularly problematic if you update the serialization libraries at some point and there's a subtle difference in how they do the serialization; in this case, if you try to compare a saved object that was serialized with the previous version of the library with one that was serialized with the current version then it wouldn't work.
Also, just as a quick note on your code itself: in general, the "==" operator tests reference equality for objects, not value equality. It happens to work here only because string overloads "==" to compare contents; for arbitrary object types, "==" would not be the proper way to compare values.
One more quick digression on hash algorithms: how reliable they are as a means of equality testing depends on how collision resistant they are. In other words, given two different, non-equal objects, what's the probability that they'll hash to the same value? Conversely, if two objects hash to the same value, what are the odds that they're actually equal? A lot of people take it for granted that their hash algorithms are 100% collision resistant (i.e. two objects will hash to the same value if, and only if, they're equal) but this isn't necessarily true. (A particularly well-known example of this is the MD5 cryptographic hash function, whose relatively poor collision resistance has rendered it unsuitable for further use). For a properly-implemented hash function, in most cases the probability that two objects that hash to the same value are actually equal is sufficiently high to be suitable as a means of equality testing but it's not guaranteed.
Object comparison by serializing and then comparing the string representations is not effective in the following cases:
When a property of type DateTime exists in the types that need to be compared
public class Obj
{
    public DateTime Date { get; set; }
}

Obj o1 = new Obj { Date = DateTime.Now };
Obj o2 = new Obj { Date = DateTime.Now };

bool result = new Comparator<Obj>().Equals(o1, o2);
It will return false even for objects created very close together in time, unless they share exactly the same Date value.
For objects that have double or decimal values which need to be compared with an epsilon to verify whether they are very close to each other:
public class Obj
{
    public double Double { get; set; }
}

Obj o1 = new Obj { Double = 22222222222222.22222222222 };
Obj o2 = new Obj { Double = 22222222222222.22222222221 };

bool result = new Comparator<Obj>().Equals(o1, o2);
This will also return false even though the double values are really close to each other. In programs that involve calculation this becomes a real problem, because of the loss of precision after multiple divide and multiply operations, and serialization does not offer the flexibility to handle such cases.
Also, considering the above cases, if one does not want to compare a particular property, one has to add a serialization attribute to the actual class even though it is not otherwise necessary, which leads to code pollution, or to problems when that type actually does have to be serialized.
Note: These are some of the actual problems of this approach, but I am looking forward to finding others.
For unit tests you don't need to write your own comparer. :)
Just use a modern framework. For example, try the FluentAssertions library:
o1.ShouldBeEquivalentTo(o2);
Serialization was made for storing an object or sending it over a pipe (network) outside of the current execution context, not for doing something inside the execution context.
Some serialized values might not be considered equal when in fact they are: decimal "1.0" and integer "1", for instance.
Sure, you can do it, just like you can eat with a shovel, but you don't, because you might break your teeth!
You can use the System.Reflection namespace to get all the properties of the instance, as in this answer. With reflection you can compare not only public properties or fields (as with JSON serialization) but also private, protected, etc. ones, which can speed up the calculation. And of course you don't have to compare all properties or fields of an instance if the two objects differ early (except in the case where only the last property or field of the object differs).
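As a rough illustration of that idea, here is a minimal reflection-based comparer over public properties (the class name is mine; it assumes non-null arguments, no indexers, and no circular references):
public class ReflectionComparer<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        foreach (var prop in typeof(T).GetProperties())
        {
            // Early exit on the first mismatch, unlike the serialize-everything approach.
            if (!object.Equals(prop.GetValue(x), prop.GetValue(y)))
                return false;
        }
        return true;
    }

    public int GetHashCode(T obj)
    {
        int hash = 17;
        foreach (var prop in typeof(T).GetProperties())
            hash = hash * 31 + (prop.GetValue(obj)?.GetHashCode() ?? 0);
        return hash;
    }
}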
If we want to get a value from a method, we can use either a return value, like this:
public int GetValue();
or:
public void GetValue(out int x);
I don't really understand the differences between them, and so I don't know which one is better. Can you explain this to me?
Thank you.
Return values are almost always the right choice when the method doesn't have anything else to return. (In fact, I can't think of any cases where I'd ever want a void method with an out parameter, if I had the choice. C# 7's Deconstruct methods for language-supported deconstruction acts as a very, very rare exception to this rule.)
Aside from anything else, it stops the caller from having to declare the variable separately:
int foo;
GetValue(out foo);
vs
int foo = GetValue();
Out parameters also prevent method chaining like this (which a return value makes possible):
Console.WriteLine(GetValue().ToString("g"));
(Indeed, that's one of the problems with property setters as well, and it's why the builder pattern uses methods which return the builder, e.g. myStringBuilder.Append(xxx).Append(yyy).)
Additionally, out parameters are slightly harder to use with reflection and usually make testing harder too. (More effort is usually put into making it easy to mock return values than out parameters). Basically there's nothing I can think of that they make easier...
Return values FTW.
EDIT: In terms of what's going on...
Basically when you pass in an argument for an "out" parameter, you have to pass in a variable. (Array elements are classified as variables too.) The method you call doesn't have a "new" variable on its stack for the parameter - it uses your variable for storage. Any changes in the variable are immediately visible. Here's an example showing the difference:
using System;

class Test
{
    static int value;

    static void ShowValue(string description)
    {
        Console.WriteLine(description + value);
    }

    static void Main()
    {
        Console.WriteLine("Return value test...");
        value = 5;
        value = ReturnValue();
        ShowValue("Value after ReturnValue(): ");

        value = 5;
        Console.WriteLine("Out parameter test...");
        OutParameter(out value);
        ShowValue("Value after OutParameter(): ");
    }

    static int ReturnValue()
    {
        ShowValue("ReturnValue (pre): ");
        int tmp = 10;
        ShowValue("ReturnValue (post): ");
        return tmp;
    }

    static void OutParameter(out int tmp)
    {
        ShowValue("OutParameter (pre): ");
        tmp = 10;
        ShowValue("OutParameter (post): ");
    }
}
Results:
Return value test...
ReturnValue (pre): 5
ReturnValue (post): 5
Value after ReturnValue(): 10
Out parameter test...
OutParameter (pre): 5
OutParameter (post): 10
Value after OutParameter(): 10
The difference is at the "post" step - i.e. after the local variable or parameter has been changed. In the ReturnValue test, this makes no difference to the static value variable. In the OutParameter test, the value variable is changed by the line tmp = 10;
What's better depends on your particular situation. One of the reasons out exists is to facilitate returning multiple values from one method call:
public int ReturnMultiple(int input, out int output1, out int output2)
{
    output1 = input + 1;
    output2 = input + 2;
    return input;
}
So one is not by definition better than the other. But usually you'd want to use a simple return, unless you have the above situation for example.
EDIT:
This is a sample demonstrating one of the reasons the keyword exists. The above is in no way to be considered a best practice.
You should generally prefer a return value over an out param. Out params are a necessary evil if you find yourself writing code that needs to do 2 things. A good example of this is the Try pattern (such as Int32.TryParse).
Let's consider what the caller of your two methods would have to do. For the first example I can write this...
int foo = GetValue();
Notice that I can declare a variable and assign it via your method in one line. For the second example it looks like this...
int foo;
GetValue(out foo);
I'm now forced to declare my variable up front and write my code over two lines.
update
A good place to look when asking these types of questions is the .NET Framework Design Guidelines. If you have the book version, you can see the annotations by Anders Hejlsberg and others on this subject (pages 184-185), but the online version is here...
http://msdn.microsoft.com/en-us/library/ms182131(VS.80).aspx
If you find yourself needing to return two things from an API then wrapping them up in a struct/class would be better than an out param.
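As an illustration of that last point, a small sketch of replacing a bool-plus-out pair with a result type (the names here are mine, not framework types):
public readonly struct ParseOutcome
{
    public bool Success { get; }
    public int Value { get; }
    public ParseOutcome(bool success, int value) { Success = success; Value = value; }
}

// Instead of: bool TryParse(string s, out int value)
public static ParseOutcome Parse(string s)
{
    return int.TryParse(s, out int value)
        ? new ParseOutcome(true, value)
        : new ParseOutcome(false, 0);
}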
There's one reason to use an out param which has not already been mentioned: the calling method is obliged to receive it. If your method produces a value which the caller should not discard, making it an out forces the caller to specifically accept it:
Method1(); // Return values can be discarded quite easily, even accidentally
int resultCode;
Method2(out resultCode); // Out params are a little harder to ignore
Of course the caller can still ignore the value in an out param, but you've called their attention to it.
This is a rare need; more often, you should use an exception for a genuine problem or return an object with state information for an "FYI", but there could be circumstances where this is important.
It's mainly a matter of preference.
I prefer return values, and if you have multiple values to return you can wrap them in a result DTO:
public class Result
{
    public Person Person { get; set; }
    public int Sum { get; set; }
}
You should almost always use a return value. 'out' parameters add a bit of friction to a lot of APIs, compositionality, etc.
The most noteworthy exception that springs to mind is when you want to return multiple values (the .NET Framework didn't have tuples until 4.0), as with the TryParse pattern.
You can only have one return value whereas you can have multiple out parameters.
You only need to consider out parameters in those cases.
However, if you need to return more than one value from your method, you probably want to look at what you're returning from an OO approach and consider whether you're better off returning an object or a struct with these values. Then you're back to a return value again.
I would prefer the following instead of either of those in this simple example.
public int Value
{
    get;
    private set;
}
But they are all very much the same. Usually, one would only use 'out' when needing to pass multiple values back from the method. If you want to send a value in and out of the method, one would choose 'ref'. My method is best if you are only returning a value, but if you want to pass a parameter and get a value back, one would likely choose your first choice.
I think one of the few scenarios where it would be useful would be when working with unmanaged memory, and you want to make it obvious that the "returned" value should be disposed of manually, rather than expecting it to be disposed of on its own.
Additionally, return values are compatible with asynchronous design paradigms.
You cannot designate a function "async" if it uses ref or out parameters.
In summary, return values allow method chaining and cleaner syntax (by eliminating the need for the caller to declare additional variables), and they allow for asynchronous designs without the need for substantial modification later.
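A minimal sketch of that async-friendly shape, returning a tuple instead of using an out parameter (the method name is mine):
// 'out' parameters are not allowed in async methods, so return a tuple instead.
public static async Task<(bool ok, int value)> TryGetValueAsync()
{
    await Task.Delay(10); // stand-in for real asynchronous work
    return (true, 42);
}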
As others have said: return value, not out param.
May I recommend to you the book "Framework Design Guidelines" (2nd ed)? Pages 184-185 cover the reasons for avoiding out params. The whole book will steer you in the right direction on all sorts of .NET coding issues.
Allied with Framework Design Guidelines is the use of the static analysis tool, FxCop. You'll find this on Microsoft's sites as a free download. Run this on your compiled code and see what it says. If it complains about hundreds and hundreds of things... don't panic! Look calmly and carefully at what it says about each and every case. Don't rush to fix things ASAP. Learn from what it is telling you. You will be put on the road to mastery.
Using the out keyword with a return type of bool can sometimes reduce code bloat and increase readability. (Primarily when the extra info in the out param is often ignored.) For instance:
bool success = false;
var result = DoThing();
if (result.Success)
{
    result = DoOtherThing();
    if (result.Success)
    {
        result = DoFinalThing();
        if (result.Success)
        {
            success = true;
        }
    }
}
vs:
bool success = false;
Result result; // the type returned via out; 'var' cannot be used without an initializer
if (DoThing(out result))
{
    if (DoOtherThing(out result))
    {
        if (DoFinalThing(out result))
        {
            success = true;
        }
    }
}
There is no real difference. out parameters exist in C# to allow a method to return more than one value, that's all.
However, there are some slight differences, though none of them are really important:
Using an out parameter forces you to use two lines, like:
int n;
GetValue(out n);
while using a return value lets you do it in one line:
int n = GetValue();
Another difference (true only for value types, and only if C# doesn't inline the function) is that using a return value will necessarily make a copy of the value when the function returns, while using an out parameter will not necessarily do so.
Please avoid using out parameters.
Although they can make sense in certain situations (for example, when implementing the Try-Parse pattern), they are very hard to grasp.
The chance of introducing bugs or side effects yourself (unless you are very experienced with the concept) or through other developers (who either use your API or may inherit your code) is very high.
According to Microsoft's quality rule CA1021:
Although return values are commonplace and heavily used, the correct application of out and ref parameters requires intermediate design and coding skills. Library architects who design for a general audience should not expect users to master working with out or ref parameters.
Therefore, if there is not a very good reason, please just don't use out or ref.
See also:
Is using "out" bad practice
https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1021
Both of them have a different purpose and are not treated the same by the compiler. If your method needs to return a value, then you must use return. out is used where your method needs to return multiple values.
If you use return, the data is first written to the method's stack and then to the calling method's. With out, it is written directly to the calling method's stack. I'm not sure whether there are any more differences.
out is more useful when you are trying to return an object that you declare inside the method.
Example
public BookList Find(string key)
{
    BookList book;                     // BookList is a model class
    _books.TryGetValue(key, out book); // _books is a concurrent dictionary;
                                       // TryGetValue finds the item with the matching key and returns it into book
    return book;
}
A return value is the normal value returned by your method.
An out parameter is different: out and ref are two C# keywords that allow variables to be passed by reference.
The big difference between ref and out is that a ref argument must be initialized before the call, while an out argument need not be.
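A tiny example of that difference (the method names are mine):
static void Increment(ref int x) { x++; }
static void GetValue(out int x) { x = 42; } // out parameters must be assigned before returning

static void Demo()
{
    int a = 5;        // a ref argument must be initialized before the call
    Increment(ref a); // a is now 6

    int b;            // an out argument may be left unassigned
    GetValue(out b);  // b is definitely assigned after the call
}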
I suspect I'm not going to get a look-in on this question, but I am a very experienced programmer, and I hope some of the more open-minded readers will pay attention.
I believe that it suits object-oriented programming languages better for their value-returning procedures (VRPs) to be deterministic and pure.
'VRP' is the modern academic name for a function that is called as part of an expression, and has a return value that notionally replaces the call during evaluation of the expression. E.g. in a statement such as x = 1 + f(y) the function f is serving as a VRP.
'Deterministic' means that the result of the function depends only on the values of its parameters. If you call it again with the same parameter values, you are certain to get the same result.
'Pure' means no side-effects: calling the function does nothing except computing the result. This can be interpreted to mean no important side-effects, in practice, so if the VRP outputs a debugging message every time it is called, for example, that can probably be ignored.
Thus, if, in C#, your function is not deterministic and pure, I say you should make it a void function (in other words, not a VRP), and any value it needs to return should be returned in either an out or a ref parameter.
For example, if you have a function to delete some rows from a database table, and you want it to return the number of rows it deleted, you should declare it something like this:
public void DeleteBasketItems(BasketItemCategory category, out int count);
If you sometimes want to call this function but not get the count, you could always declare an overload, as sketched below.
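For instance, a sketch of such an overload (a simple forwarder that discards the count):
// Convenience overload for callers that don't care about the count.
public void DeleteBasketItems(BasketItemCategory category)
{
    int ignored;
    DeleteBasketItems(category, out ignored);
}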
You might want to know why this style suits object-oriented programming better. Broadly, it fits into a style of programming that could be (a little imprecisely) termed 'procedural programming', and it is a procedural programming style that fits object-oriented programming better.
Why? The classical model of objects is that they have properties (aka attributes), and you interrogate and manipulate the object (mainly) through reading and updating those properties. A procedural programming style tends to make it easier to do this, because you can execute arbitrary code in between operations that get and set properties.
The downside of procedural programming is that, because you can execute arbitrary code all over the place, you can get some very obtuse and bug-vulnerable interactions via global variables and side-effects.
So, quite simply, it is good practice to signal to someone reading your code that a function could have side-effects by making it non-value returning.
Introduction
As a developer, I'm involved in writing a lot of mathematical code every day, and I'd like to add a little syntactic sugar to the C# language to ease code writing and reviewing.
I've already read this thread and this other one for possible solutions, and would simply like to know which direction is best to go and how much effort it might take to solve only the three following syntactical issues*.
*: I can survive without the described syntactic sugar, but if it isn't too much work, and doesn't turn a simple compilation process into a Rube Goldberg design, it may be interesting to investigate further.
1. Multiple output arguments
I'd like to write:
[double x, int i] = foo(z);
Instead of:
double x;
int i;
foo(out x, out i, z);
NB: out parameters are placed first, and foo is declared as usual (or using the same kind of syntax).
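Worth noting: C# 7 later added tuples and deconstruction, which cover much of this wish. A minimal sketch (foo's body is just an example):
// foo returns both values as a tuple instead of using out parameters.
static (double x, int i) foo(double z) => (z * 2.0, (int)z);

// Call site: deconstruct directly into two new variables.
(double x, int i) = foo(3.5);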
2. Additional operators
I'd like to have a few new unary/binary operators. I don't know much about how to define them (and it seems quite complex to avoid introducing ambiguity when parsing sources); anyway, I would like to have something like:
namespace Foo
{
    using binary operator "\" as "MyMath.LeftDivide";
    using unary operator "'" as "MyMath.ConjugateTranspose";

    public class Test
    {
        public void Example()
        {
            var y = x';
            var z = x \ y;
        }
    }
}
Instead of:
namespace Foo
{
    public class Test
    {
        public void Example()
        {
            var y = MyMath.ConjugateTranspose(x);
            var z = MyMath.LeftDivide(x, y);
        }
    }
}
3. Automatic name insertion for static classes
It's extremely unappealing to endlessly repeat Math.BlaBlaBla() everywhere in computation code instead of writing simply BlaBlaBla.
For sure, this can be solved by adding local methods that wrap Math.BlaBlaBla inside the computation class. Anyway, it would be better, when there's no ambiguity at all, or when ambiguity can be resolved with some sort of implicit keyword, to have class names inserted automatically where required.
For instance:
using System;
using implicit MySystem; // Defines 'MySystem.Math.Bessel'

public class Example
{
    public void Foo()
    {
        var y = 3.0 * Cos(12.0);
        var z = 3.0 * Bessel(42);
    }

    // Local definition of the 'Bessel' function again
    public static double Bessel(double x)
    {
        ...
    }
}
Would become:
using System;
using MySystem; // Defines 'Math.Bessel'

public class Example
{
    public void Foo()
    {
        var y = 3.0 * System.Math.Cos(12.0);    // No ambiguity at all
        var z = 3.0 * MySystem.Math.Bessel(42); // Resolved via the `implicit` keyword
    }

    // Local definition of the 'Bessel' function again
    public static double Bessel(double x)
    {
        ...
    }
}
* The compiler may simply generate a warning to indicate that it solved the ambiguity because an implicit solution has been defined.
NB: solving 3) would be satisfying enough to address 2) as well.
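Worth noting for issue 3: C# 6 later added the using static directive, which imports a static class's members so they can be called without the class-name prefix. A minimal sketch:
using static System.Math;

public class Example
{
    public double Foo()
    {
        // Cos resolves to System.Math.Cos, with no Math. prefix needed.
        return 3.0 * Cos(12.0);
    }
}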
Have you considered using F# instead of C#? In my experience, I have found F# a great fit for scientific/mathematical oriented code, more so than C#.
Your first scenario is covered by tuples; you can write a function like let f a b = (a+b, a-b), which returns a tuple of two values directly. You can fairly easily overload operators or add your own, and modules may help you with the third point. F# also interoperates fairly smoothly with C#, so you can pick F# for the parts where it's practical and keep the rest of your code in C#. There are other niceties that work great for scientific code (units of measure, for instance) as well...
I'd be one of the first people to support a mainstream C# that can be extended by users. However - for several reasons (design, time or cost-benefit) I cannot see C# being as extensible as you (or I) want. A C#-derived language I've found that is VERY good for meta-programming and extending functionality/syntax is Nemerle.
Boo is another .NET language with good meta-programming features. However, it is so far removed from C# that I didn't consider it an appropriate answer to this question (but added it for completeness, anyway).
I'm currently involved in a project where I have very large image volumes. These volumes have to be processed very fast (adding, subtracting, thresholding, and so on). Additionally, most of the volumes are so large that they don't even fit into the memory of the system.
For that reason I have created an abstract volume class (VoxelVolume) that hosts the volume and image data and overloads the operators so that it's possible to perform the regular mathematical operations on volumes. This opened up two more questions, which I will post to Stack Overflow in two additional threads.
Here is my first question. My volume is implemented in a way that it can only contain float array data, but most of the incoming data is from a UInt16 image source. Only operations on the volume can create float array images.
When I started implementing such a volume, the class looked like the following:
public abstract class VoxelVolume<T>
{
    ...
}
but then I realized that overloading the operators or return values would get more complicated. An example would be:
public abstract class VoxelVolume<T>
{
    ...
    public static VoxelVolume<T> Import<T>(params string[] files)
    {
    }
}
also adding two overloaded operators would be more complicated:
...
public static VoxelVolume<T> operator+(VoxelVolume<T> A, VoxelVolume<T> B)
{
    ...
}
Let's assume I can overcome the problems described above. Nevertheless, I have different types of arrays containing the image data. Since I have fixed the type in the volumes to float, there is no problem, and I can do an unsafe operation when adding the contents of two image volume arrays. I have read a few threads here and had a look around the web, but found no really good explanation of what to do when I want to add two arrays of different types in a fast way. Unfortunately, math operations on generics are not possible, since C# has no way of knowing the arithmetic behavior of the underlying data type. Of course there might be a way around this problem by using C++/CLI, but currently everything I have done so far runs in 32-bit and 64-bit without my having to do a thing. Switching to C++/CLI seemed to me (please correct me if I'm wrong) to bind me to a certain platform (32-bit), requiring two assemblies to be compiled if the application is to run on another platform (64-bit). Is this true?
So, asked briefly: how is it possible to add two arrays of two different types in a fast way? Is it true that the designers of C# haven't thought about this? Switching to a different language (C# -> C++) is not an option.
I realize that simply performing this operation
float[] A = new float[] { 1, 2, 3 };
byte[] B = new byte[] { 1, 2, 3 };
float[] C = A + B;
is not possible (and arguably unnecessary), although it would be nice if it worked. The solution I was trying is the following:
public static class ArrayExt
{
    public static unsafe TResult[] Add<T1, T2, TResult>(T1[] A, T2[] B)
    {
        // Assume the length of both arrays is equal
        TResult[] result = new TResult[A.Length];
        GCHandle h1 = GCHandle.Alloc(A, GCHandleType.Pinned);
        GCHandle h2 = GCHandle.Alloc(B, GCHandleType.Pinned);
        GCHandle hR = GCHandle.Alloc(result, GCHandleType.Pinned);
        void* ptrA = h1.AddrOfPinnedObject().ToPointer();
        void* ptrB = h2.AddrOfPinnedObject().ToPointer();
        void* ptrR = hR.AddrOfPinnedObject().ToPointer();
        for (int i = 0; i < A.Length; i++)
        {
            // This line will not compile: '+' is not defined for the generic type
            // parameters (and the pointers would also need to advance per element).
            *((TResult*)ptrR) = (TResult)((T1)*ptrA + (T2)*ptrB);
        }
        h1.Free();
        h2.Free();
        hR.Free();
        return result;
    }
}
Please excuse it if the code above is not quite correct; I wrote it without a C# editor. Is a solution such as the one shown above thinkable? Please feel free to ask if I made a mistake or described some things incompletely.
Thanks for your help
Martin
This seems to be a (complicated) version of the "why don't we have an INumeric interface" question.
The short answer to the last question is: no, going to unsafe pointers is no solution; the compiler still can't figure out the + in ((T1)*ptrA + (T2)*ptrB).
If you have only a few types like float and UInt16, provide all the needed conversion functions, for example from VoxelVolume<UInt16> to VoxelVolume<float>, and do the math on VoxelVolume<float> (a sketch follows below). That should be fast enough for most practical cases. You could even provide a generic conversion function from VoxelVolume<T1> to VoxelVolume<T2> (if T1 is convertible to T2). On the other hand, if you really need a
public static VoxelVolume<T2> operator+(VoxelVolume<T1> A,VoxelVolume<T2> B)
with type conversion from T1 to T2 for each array element, what hinders you from writing such operators?
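For illustration, a sketch of the suggested conversion at the raw-array level (the method name is mine; widening UInt16 to float is lossless):
// Convert the UInt16 image data to float once, then do all math on floats.
public static float[] ToFloats(ushort[] source)
{
    var result = new float[source.Length];
    for (int i = 0; i < source.Length; i++)
        result[i] = source[i]; // ushort widens implicitly to float
    return result;
}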
Import, being a member of a generic class, probably doesn't also need to be itself generic. If it does, you definitely shouldn't use the same name T for both the generic parameter to the class and the generic parameter to the function.
What you're probably looking for is Marc Gravell's Generic Operators.
As for your questions about C++/CLI: yes, this could help if you use templates instead of generics, because then all the possible values for typename T are resolved at compile time and the compiler looks up the operators for each one. Also, you can use /clr:pure or /clr:safe, in which case your code will be MSIL, runnable on AnyCPU just like C#.
Admittedly, I didn't read the whole question (it's quite a bit too long), but:
VoxelVolume<T> where T : ISummand ... T a; a.Add(b)
static float Sum (this VoxelVolume<float> self, VoxelVolume<float> other) {...}
To add a float to a byte in any meaningful sense you have to convert the byte to a float. So convert the array of bytes to an array of floats and then add them; you only lose some memory.
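A minimal sketch of that suggestion (Array.ConvertAll is a standard helper; the variable names are mine):
float[] a = { 1, 2, 3 };
byte[] b = { 1, 2, 3 };

// Widen the byte array to float once, then add element-wise.
float[] bf = Array.ConvertAll(b, v => (float)v);
float[] c = new float[a.Length];
for (int i = 0; i < a.Length; i++)
    c[i] = a[i] + bf[i];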