I'm currently doing a project for my C# classes. Our teacher gave us some code metrics limits that we have to abide by, and one of them is cyclomatic complexity. Right now the complexity of the method below is 5, but it needs to be 4. Is there any way to improve that?
The method I was talking about:
private bool MethodName()
{
    int counter = 0;
    for (int k = 0; k < 8; k++)
    {
        for (int j = 0; j < 3; j++)
        {
            if (class1.GetBoard()[array1[k, j, 0], array1[k, j, 1]] == player.WhichPlayer()) counter++;
        }
        if (counter == 3) return true;
        else counter = 0;
    }
    return false;
}
You can wrap the conditions to reduce it. For example:
private bool MethodName()
{
    for (int k = 0; k < 8; k++)
    {
        bool b = true;
        for (int j = 0; j < 3; j++)
        {
            b &= class1.GetBoard()[array1[k, j, 0], array1[k, j, 1]] == player.WhichPlayer();
        }
        if (b) return true;
    }
    return false;
}
For the OP (who seems to be just starting out in programming):
It is nice that you got an assignment to reduce the cyclomatic complexity of a method, and it is good both to know what it is and to keep it low as a general practice.
However, try not to get too zealous with this kind of metric. There is more value in having code that is as straightforward as possible, easy to reason about and quick to understand; only worry about metrics after profiling the app and learning where they matter most.
For more experienced coders:
This simple problem reminded me of a very famous discussion from 1968 between Dijkstra and other folks in the ACM's periodical. Although I tend to align with him on this matter, there was one answer from Frank Rubin that is very reasonable.
Frank basically argues that "elegance" can often arise from code clarity rather than from any other metric or practice. Back then, the discussion was about the over-use of the goto statement in the popular languages of the time. Today, the discussion revolves around cyclomatic complexity, terseness, OOP or whatever.
The bottom line is, in my opinion:
know your tools
code with clarity in mind
try to write efficient code on the first pass, but don't overthink
profile your code and decide where it's worth spending more time
Back to the question
The implementation presented in the question got the following scores in my Visual Studio Analyzer:
Cycl. Compl: 5; Maintainability: 67
The snippet presented by @Boris got this:
Cycl. Compl: 4; Maintainability: 68
Even though the cyclomatic complexity improved, the maintainability index stays basically the same. Personally, I consider the latter metric more valuable most of the time.
Just for fun, let's see what a solution akin to the one presented by Frank Rubin, using the dreaded goto statement, would look like:
private bool MethodName() {
    for (int k = 0; k < 8; k++) {
        for (int j = 0; j < 3; j++) {
            if (whateverTestCondition(k, j) is false) goto reject;
        }
        // condition is true for all items in this row
        return true;
        // if condition is false for any item, go straight to this line
        reject:;
    }
    return false;
}
Honestly, I think this is the clearest, simplest and most performant implementation for this. Do I recommend goto in general as a language feature? NO. Does it fit perfectly and smoothly in this specific case? YES. And what about the metrics?
Cycl. Compl: 4; Maintainability: 70
Bonus
Just because I wouldn't be able to sleep if I didn't mention it, this is how you would implement this in real life:
obj.Any(row => row.All(whateverTestCondition));
Cycl. Compl: 1; Maintainability: 80
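For completeness, here is a hypothetical expansion of that one-liner onto the structures from the question (class1, array1 and player are the asker's names; the 8 and 3 ranges come from the original loops, and System.Linq is assumed to be imported):
private bool MethodName()
{
    var board = class1.GetBoard();
    var me = player.WhichPlayer();

    // Any row (k) where all three referenced cells (j) belong to the current player.
    return Enumerable.Range(0, 8).Any(k =>
        Enumerable.Range(0, 3).All(j =>
            board[array1[k, j, 0], array1[k, j, 1]] == me));
}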
I was doing a Codewars kata and it's working, but I'm timing out.
I searched online for solutions, for some kind of reference, but they were all in JavaScript.
Here is the kata: https://i.stack.imgur.com/yGLmw.png
Here is my code:
public static int DblLinear(int n)
{
    if (n > 0)
    {
        var list = new List<int>();
        int[] next_two = new int[2];
        list.Add(1);
        for (int i = 0; i < n; i++)
        {
            for (int m = 0; m < next_two.Length; m++)
            {
                next_two[m] = ((m + 2) * list[i]) + 1;
            }
            if (list.Contains(next_two[0]))
            {
                list.Add(next_two[1]);
            }
            else if (list.Contains(next_two[1]))
            {
                list.Add(next_two[0]);
            }
            else
                list.AddRange(next_two);
            list.Sort();
        }
        return list[n];
    }
    return 1;
}
It's a really slow solution, but that's what seems to be working for me.
The first rule of performance optimization is to measure. Ideally using a profiler that can tell you where most of the time is spent, but for simple cases using some stopwatches can be sufficient.
I would guess that most of the time is spent in list.Contains, since that is a linear lookup and it sits in the innermost loop. So one approach would be to change the list to a HashSet<int> to provide better lookup performance, skip the .Sort call, and return the maximum value in the hash set. As far as I can tell, that should give the same result.
You might also consider using some specialized data structure that fits the problem better than the general containers provided in .Net.
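As a hypothetical illustration of such a structure (this is not code from the answer above), a SortedSet<int> can act as a queue of pending candidates, so neither Contains nor Sort is needed:
using System.Collections.Generic;

public static int DblLinear(int n)
{
    // The smallest not-yet-processed value is always u(i): pop it n times,
    // feeding 2x + 1 and 3x + 1 back in. SortedSet de-duplicates and keeps order.
    var candidates = new SortedSet<int> { 1 };
    for (int i = 0; i < n; i++)
    {
        int x = candidates.Min;
        candidates.Remove(x);
        candidates.Add(2 * x + 1);
        candidates.Add(3 * x + 1);
    }
    return candidates.Min; // u(n)
}
Each step costs O(log m) instead of a linear Contains plus a full Sort, which is where the time savings would come from.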
I was solving a question in which I have to create a unique array from a sorted array which can have duplicate elements.
I solved the solution using the following code:
for (int i = 0; i < sorted.Length - 1; i++)
{
    if (sorted[i] == sorted[i + 1])
    {
        uniqueList.Add(sorted[i]);
        int j = i + 1;
        while (j < sorted.Length)
        {
            if (sorted[i] != sorted[j])
            {
                break;
            }
            j++;
            i++;
        }
    }
    else
    {
        uniqueList.Add(sorted[i]);
    }
}
Now, I want to know the complexity of this solution.
Some people say it is N, but some say it is N^2, so I figured I would ask the same question on Stack Overflow to get a better understanding of it.
Worst case is O(N).
It's a bit of a nasty one, but given that both i and j are incremented on every iteration of the while loop, there is effectively no nested looping.
The algorithm doesn't allow more iterations than sorted.Length.
Interestingly, this indicates that the while loop could be replaced with an if statement (it might not be a simple one), which would be a nice exercise.
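For illustration (this is not the original poster's code), a minimal single-pass sketch over the same sorted input shows why one scan is enough, which is where the O(N) bound comes from:
var uniqueList = new List<int>();
for (int i = 0; i < sorted.Length; i++)
{
    // Because the input is sorted, duplicates are adjacent: an element only
    // needs to be compared with the last value that was added.
    if (uniqueList.Count == 0 || uniqueList[uniqueList.Count - 1] != sorted[i])
    {
        uniqueList.Add(sorted[i]);
    }
}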
I've stumbled upon this effect when debugging an application - see the repro code below.
It gives me the following results:
Data init, count: 100,000 x 10,000, 4.6133365 secs
Perf test 0 (False): 5.8289565 secs
Perf test 0 (True): 5.8485172 secs
Perf test 1 (False): 32.3222312 secs
Perf test 1 (True): 217.0089923 secs
As far as I understand, the array store operations shouldn't normally have such a drastic performance effect (32 vs 217 seconds). I wonder if anyone understands what effects are at play here?
UPD: extra test added; Perf 0 shows the results as expected, Perf 1 shows the performance anomaly.
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        var data = InitData();
        TestPerf0(data, false);
        TestPerf0(data, true);
        TestPerf1(data, false);
        TestPerf1(data, true);
        if (Debugger.IsAttached)
            Console.ReadKey();
    }

    private static string[] InitData()
    {
        var watch = Stopwatch.StartNew();
        var data = new string[100_000];
        var maxString = 10_000;
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = new string('-', maxString);
        }
        watch.Stop();
        Console.WriteLine($"Data init, count: {data.Length:n0} x {maxString:n0}, {watch.Elapsed.TotalSeconds} secs");
        return data;
    }

    private static void TestPerf1(string[] vals, bool testStore)
    {
        var watch = Stopwatch.StartNew();
        var counters = new int[char.MaxValue];
        int tmp = 0;
        for (var j = 0; ; j++)
        {
            var allEmpty = true;
            for (var i = 0; i < vals.Length; i++)
            {
                var val = vals[i];
                if (j < val.Length)
                {
                    allEmpty = false;
                    var ch = val[j];
                    var count = counters[ch];
                    tmp ^= count;
                    if (testStore)
                        counters[ch] = count + 1;
                }
            }
            if (allEmpty)
                break;
        }
        // prevent the compiler from optimizing away our computations
        tmp.GetHashCode();
        watch.Stop();
        Console.WriteLine($"Perf test 1 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
    }

    private static void TestPerf0(string[] vals, bool testStore)
    {
        var watch = Stopwatch.StartNew();
        var counters = new int[65536];
        int tmp = 0;
        for (var i = 0; i < 1_000_000_000; i++)
        {
            var j = i % counters.Length;
            var count = counters[j];
            tmp ^= count;
            if (testStore)
                counters[j] = count + 1;
        }
        // prevent the compiler from optimizing away our computations
        tmp.GetHashCode();
        watch.Stop();
        Console.WriteLine($"Perf test 0 ({testStore}): {watch.Elapsed.TotalSeconds} secs");
    }
}
After testing your code for quite some time my best guess is, as already said in the comments, that you experience a lot of cache-misses with your current solution. The line:
if (testStore)
    counters[ch] = count + 1;
might force the compiler to load a completely new cache-line into memory and displace the current content. There might also be some problems with branch prediction in this scenario. This is highly hardware-dependent, and I'm not aware of a really good way to test it in a managed language (it's also quite hard in compiled languages where the hardware is fixed and well known).
After going through the disassembly, you can clearly see that you also introduce a whole bunch of new instructions, which might make the aforementioned problems even worse.
Overall I'd advise you to rewrite the complete algorithm, as there are better places to improve performance than picking at this one little assignment. These are the optimizations I'd suggest (they also improve readability):
Invert your i and j loop. This will remove the allEmpty variable completely.
Cast ch to int with var ch = (int) val[j]; because you ALWAYS use it as an index.
Think about why this might be a problem at all. You introduce a new instruction, and any instruction comes at a cost. If this is really the primary "hot spot" of your code, you can start thinking about better solutions (remember: "premature optimization is the root of all evil").
As this is a "test setting", as the name suggests, is it important at all? Just remove it.
EDIT: Why did I suggest inverting the loops? With this little rearrangement of the code:
foreach (var val in vals)
{
    foreach (int ch in val)
    {
        var count = counters[ch];
        tmp ^= count;
        if (testStore)
        {
            counters[ch] = count + 1;
        }
    }
}
I went from the original runtimes to dramatically lower ones (the before/after timing screenshots are in the original post).
Do you still think it's not worth a try? I saved some orders of magnitude here and nearly eliminated the effect of the if (to be clear - all optimizations are disabled in the settings). If there are special reasons not to do this you should tell us more about the context in which this code will be used.
EDIT2: For the in-depth answer: my best explanation for why this problem occurs is that you cross-reference your cache-lines. In the lines:
for (var i = 0; i < vals.Length; i++)
{
    var val = vals[i];
you load a really massive dataset, one that is by far bigger than a cache-line itself. So it will most likely need to be loaded fresh from memory into a new cache-line on every iteration (displacing the old content). This is also known as "cache thrashing", if I remember correctly. Thanks to @mjwills for pointing this out in his comment.
In my suggested solution, on the other hand, the content of a cache-line can stay alive as long as the inner loop does not exceed its boundaries (which happens a lot less often with this direction of memory access).
This is the closest explanation for why my code runs that much faster, and it also supports the assumption that you have serious caching problems with your code.
I had a look and couldn't see anything quite answering my question.
I'm not exactly the best at creating accurate 'real life' tests, so I'm not sure if that's the problem here. Basically, I want to create a few simple neural networks to build something to the effect of Gridworld. Performance of these neural networks will be critical, and I want the hidden layer to be as little of a bottleneck as possible.
I would rather use more memory and be faster, so I opted to use arrays instead of lists (due to lists having an extra bounds check over arrays). The arrays aren't always full, but because the if statement (check if the element is null) is the same until the end, it can be predicted and there is no performance drop from that at all.
My question comes from how I store the data for the network to process. I figured that because 2D arrays store all the data together, they would be better cache-wise and would run faster. But my mock-up test shows that an array of arrays performs much better in this scenario.
Some Code:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        for (int j = 0; j < testArray[i].Length; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < data[i].weights.Length; k++) {
                inputTotal += testArray[i][k];
            }
        }
    }
}
private void Run2DArrayTest(float[,] testArray, Data[] data, int maxI, int maxJ)
{
    for (int i = 0; i < maxI; i++) {
        for (int j = 0; j < maxJ; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < maxJ; k++) {
                inputTotal += testArray[i, k];
            }
        }
    }
}
These are the two functions that are timed. Each 'creature' has its own network (the first for loop), each network has hidden nodes (the second for loop), and I need to find the sum of the weights for each input (the third loop). In my test I stripped it down, so it's not exactly what I am doing in my actual code, but the same number of loops happen (the data variable would have its own 2D array, but I didn't want to possibly skew the results). From this I was trying to get a feel for which one is faster, and to my surprise the array of arrays was.
Code to start the tests:
// Array of Array test
Stopwatch timer = Stopwatch.StartNew();
RunArrayOfArrayTest(arrayOfArrays, dataArrays);
timer.Stop();
Console.WriteLine("Array of Arrays finished in: " + timer.ElapsedTicks);
// 2D Array test
timer = Stopwatch.StartNew();
Run2DArrayTest(array2D, dataArrays, NumberOfNetworks, NumberOfInputNeurons);
timer.Stop();
Console.WriteLine("2D Array finished in: " + timer.ElapsedTicks);
Just wanted to show how I was testing it. The results from this in release mode give me values like:
Array of Arrays finished in: 8972
2D Array finished in: 16376
Can someone explain to me what I'm doing wrong? Why is an array of arrays faster in this situation by so much? Isn't a 2D array all stored together, meaning it would be more cache friendly?
Note: I really do need this to be fast, as it needs to sum up hundreds of thousands to millions of numbers per frame, and like I said, I don't want this to be a problem. I know this can be multi-threaded quite easily in the future, because each network is completely separate and even each node is completely separate.
Last question, I suppose: would something like this be possible to run on the GPU instead? I figure a GPU would not struggle with much larger numbers of networks and much larger numbers of input/hidden neurons.
In the CLR, there are two different types of array:
Vectors, which are zero-based, single-dimensional arrays
Arrays, which can have non-zero bases and multiple dimensions
Your "array of arrays" is a "vector of vectors" in CLR terms.
Vectors are significantly faster than arrays, basically. It's possible that arrays could be optimized further in later CLR versions, but I doubt that they'll get the same amount of love as vectors, as they're relatively rarely used. There's not a lot you can do to make CLR arrays faster. As you say, they'll be more cache friendly, but they have this CLR penalty.
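As a small illustration (not part of the original answer) of that distinction, here is how the different array kinds show up in code and in their runtime type names:
using System;

class ArrayKinds
{
    static void Main()
    {
        int[] vector = new int[10];            // zero-based, single-dimensional: a "vector"
        int[,] rectangular = new int[10, 10];  // multi-dimensional array

        // Single-dimensional but with lower bound 1, so it is not a vector.
        Array nonZeroBased = Array.CreateInstance(typeof(int), new[] { 10 }, new[] { 1 });

        Console.WriteLine(vector.GetType());        // System.Int32[]
        Console.WriteLine(nonZeroBased.GetType());  // System.Int32[*]
        Console.WriteLine(rectangular.GetType());   // System.Int32[,]
    }
}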
You can improve your array-of-arrays code already, however, by only performing the first indexing operation once per row:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        // These don't change in the loop below, so extract them
        var row = testArray[i];
        var inputTotal = data[i].bias;
        var weightLength = data[i].weights.Length;
        for (int j = 0; j < row.Length; j++) {
            for (int k = 0; k < weightLength; k++) {
                inputTotal += row[k];
            }
        }
    }
}
If you want to get the cache friendliness and still use a vector, you could have a single float[] and perform the indexing yourself... but I'd probably start off with the array-of-arrays approach.
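A hypothetical sketch of that single-float[] idea (the Data type and the loop shape are borrowed from the question; the row-by-row flattened layout with maxJ as the row width is my assumption):
// Element (i, k) of the conceptual 2D data lives at index i * maxJ + k.
private void RunFlatArrayTest(float[] testArray, Data[] data, int maxI, int maxJ)
{
    for (int i = 0; i < maxI; i++) {
        var rowOffset = i * maxJ;                       // start of row i
        var weightLength = data[i].weights.Length;
        for (int j = 0; j < maxJ; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < weightLength; k++) {
                inputTotal += testArray[rowOffset + k]; // vector indexing on a contiguous layout
            }
        }
    }
}
This keeps the contiguous memory layout of the 2D array while paying only the cheaper vector indexing.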
I'm facing a strange issue that I can't explain and I would like to know if some of you have the answer I'm lacking.
I have a small test app for testing multithreading modifications I'm making to a much larger code base. In this app I've set up two functions, one that does a loop sequentially and one that uses Parallel.For. The two of them print out the time and the final elements generated. What I'm seeing is that the function that executes the Parallel.For is generating fewer items than the sequential loop, and this is a huge problem for the real app (it's messing with some final results). So, my question is whether someone has any idea why this could be happening and, if so, whether there's any way to fix it.
Here is the code for the function that uses Parallel.For in my test app:
static bool[] values = new bool[52];
static List<int[]> combinations = new List<int[]>();

static void ParallelLoop()
{
    combinations.Clear();
    Parallel.For(0, 48, i =>
    {
        if (values[i])
        {
            for (int j = i + 1; j < 49; j++)
                if (values[j])
                {
                    for (int k = j + 1; k < 50; k++)
                    {
                        if (values[k])
                        {
                            for (int l = k + 1; l < 51; l++)
                            {
                                if (values[l])
                                {
                                    for (int m = l + 1; m < 52; m++)
                                    {
                                        if (values[m])
                                        {
                                            int[] combination = { i, j, k, l, m };
                                            combinations.Add(combination);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
        }
    }); // Parallel.For
}
And here is the app output:
Executing sequential loop...
Number of elements generated: 1,712,304
Executing parallel loop...
Number of elements generated: 1,464,871
Thanks in advance and if you need some clarifications I'll do my best to explain in further detail.
You can't just add items to your list from multiple threads at the same time without any synchronization mechanism. List<T>.Add() actually does some non-trivial internal work (buffers, etc.), so adding an item is not an atomic, thread-safe operation.
Either:
Provide a way to synchronize your writes
Use a collection that supports concurrent writes (see the System.Collections.Concurrent namespace; a sketch follows after this list)
Don't use multi-threading at all
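As a hedged sketch of the second option (it mirrors the question's fields and loop bounds, but it is not the original poster's code), swapping the List<int[]> for a ConcurrentBag<int[]> makes the concurrent Add calls safe:
using System.Collections.Concurrent;
using System.Threading.Tasks;

static bool[] values = new bool[52];

// ConcurrentBag.Add is safe to call from many Parallel.For iterations at once.
static ConcurrentBag<int[]> combinations = new ConcurrentBag<int[]>();

static void ParallelLoop()
{
    combinations = new ConcurrentBag<int[]>(); // re-create instead of Clear()
    Parallel.For(0, 48, i =>
    {
        if (!values[i]) return;
        for (int j = i + 1; j < 49; j++)
        {
            if (!values[j]) continue;
            for (int k = j + 1; k < 50; k++)
            {
                if (!values[k]) continue;
                for (int l = k + 1; l < 51; l++)
                {
                    if (!values[l]) continue;
                    for (int m = l + 1; m < 52; m++)
                    {
                        if (values[m])
                            combinations.Add(new[] { i, j, k, l, m });
                    }
                }
            }
        }
    });
}
Alternatively, keep the List<int[]> and wrap every Add in a lock (option 1), or collect results into a thread-local list and merge them afterwards; the essential point is that no two threads touch the list without synchronization.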