Is there a pattern (refactoring) to improve this piece of code?
int phase = 0;
foreach (var some in arrayOfSome)
{
    if (phase == 0)
    {
        bool result = DoSomething_0(some);
        if (result) phase = 1;
    }
    else if (phase == 1)
    {
        bool result = DoSomething_1(some);
        if (result) phase = 0;
        result = DoSomething_1_0(some);
        if (result) phase = 2;
    }
    else if (phase == 2)
    {
        bool result = DoSomething_2(some);
        if (result) break;
    }
}
First of all, I want to reduce the number of conditional operators and make the code more readable.
An elegant approach is to eliminate the if/then logic altogether. You can do this by creating an array of delegates indexed by state (phase). Each slot in the array points at a function (or lambda) that does whatever work that state requires and returns the new state (phase).
In these kinds of designs, you often end up with a loop that contains a single statement like this:
foreach (var x in data)
{
    state = handler[state](x);
}
I've coded this kind of FSM many times and they are very generic when done correctly and very reliable. Bugs almost always boil down to small errors in handlers or incorrect handlers being defined for a given state/input combination.
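For example, applied to the code in the question, a minimal sketch might look like this (SomeType stands in for whatever the element type of arrayOfSome actually is, and -1 is a made-up "stop" value; the DoSomething_* methods are the ones from the question):

Func<SomeType, int>[] handlers =
{
    // phase 0
    some => DoSomething_0(some) ? 1 : 0,
    // phase 1
    some =>
    {
        int next = DoSomething_1(some) ? 0 : 1;
        return DoSomething_1_0(some) ? 2 : next;
    },
    // phase 2
    some => DoSomething_2(some) ? -1 : 2
};

int phase = 0;
foreach (var some in arrayOfSome)
{
    phase = handlers[phase](some);
    if (phase == -1) break; // -1 signals "done"
}

Each handler owns the logic for one phase, and the loop body never changes as states are added.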
A more general and powerful design has this pattern:
foreach (var x in data)
{
    state = handler[state,x](x);
}
Where both the state and datum are used to select the handler delegate. The big benefit of this pattern is that it removes the temptation to add extra flags and conditions inside the loop as the code or functionality grows over time.
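As a purely hypothetical illustration of that two-dimensional selection (the input is reduced to a char symbol here so it can form part of the lookup key; requires C# 7 tuples):

var handler = new Dictionary<(int state, char symbol), Func<char, int>>
{
    [(0, 'a')] = c => 1,   // in state 0, on 'a', go to state 1
    [(1, 'a')] = c => 0,
    [(1, 'b')] = c => 2,
    [(2, 'b')] = c => -1,  // -1 signals "done"
};

int state = 0;
foreach (var c in "aab")
{
    state = handler[(state, c)](c);
    if (state == -1) break;
}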
The code you posted is OK, but if over time you end up with ten different states, umpteen conditions, and many handlers, then very subtle bugs will creep in that can be almost impossible to track down.
I don't see the need for refactoring here. Maybe you could just reduce some lines and write it like this:
int phase = 0;
foreach (var some in arrayOfSome)
{
    if (phase == 0)
    {
        if (DoSomething_0(some)) phase = 1;
    }
    else if (phase == 1)
    {
        if (DoSomething_1(some)) phase = 0;
        if (DoSomething_1_0(some)) phase = 2;
    }
    else if (phase == 2)
    {
        if (DoSomething_2(some)) break;
    }
}
If you are concerned with too many ifs within a loop (again, it is readable in your case), you can have a look at Flattening Arrow Code.
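For reference, the core idea of flattening arrow code is to replace nesting with early exits (guard clauses). A tiny illustrative sketch, not specific to the code above:

// nested ("arrow") form
if (a != null)
{
    if (a.IsValid)
    {
        Process(a);
    }
}

// flattened form with guard clauses
if (a == null) return;
if (!a.IsValid) return;
Process(a);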
Is there a classy way to do:
foreach (item i in ListItems1)
    do ...
foreach (item i in ListItems2)
    do ...
foreach (item i in ListItems3)
    do ...
...
in a single foreach (using LINQ, I suppose?) in C#, without hurting performance (especially on the memory side)?
The best way to manage this is to factor it into a function that you call 3 times:
public void ProcessList(List<myListType> theList)
{
    // Do some cool stuff here...
}
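And call it once per list (assuming all three lists are List<myListType>):

ProcessList(ListItems1);
ProcessList(ListItems2);
ProcessList(ListItems3);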
But it's borderline whether you get much maintainability benefit from this. If you don't want to increase memory usage, then this is probably the best refactoring to make. Unless your lists are huge, the memory differences are likely to be negligible anyway.
You can always do
foreach (item i in ListItems1.Concat(ListItems2).Concat(ListItems3))
{
    // do things
}
There's also the similar .Union(), which removes duplicate items. However, since it has to detect those duplicates, it's less performant.
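To illustrate the difference with plain ints (just an example, not the question's types):

var a = new List<int> { 1, 2, 3 };
var b = new List<int> { 3, 4 };

var concat = a.Concat(b); // 1, 2, 3, 3, 4  (keeps duplicates)
var union  = a.Union(b);  // 1, 2, 3, 4     (duplicates removed)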
For the sake of answering your question and being a little creative, this here "technically" would work, but I wouldn't advise using it.
int listItemTotalItemCount = ListItems1.Count() + ListItems2.Count() + ListItems3.Count();
for (int i = 0; i < listItemTotalItemCount; i++)
{
    int listIndex = i;
    item retrievedItem;
    if (listIndex >= ListItems1.Count())
    {
        if (i - ListItems1.Count() >= ListItems2.Count())
        {
            retrievedItem = ListItems3[i - ListItems1.Count() - ListItems2.Count()];
        }
        else
        {
            retrievedItem = ListItems2[i - ListItems1.Count()];
        }
    }
    else
    {
        retrievedItem = ListItems1[listIndex];
    }
    retrievedItem.DoSomethingSpecial();
}
This could probably be improved with an array of the lists you want to iterate over, but, as I said before, I would not recommend this approach; I'm guessing it is significantly worse than your original method.
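If you do want to see what that array-of-lists version might look like (still assuming the lists share an element type with a DoSomethingSpecial method, as in the question):

foreach (var list in new[] { ListItems1, ListItems2, ListItems3 })
{
    foreach (var retrievedItem in list)
    {
        retrievedItem.DoSomethingSpecial();
    }
}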
I'm looping through the rows of a dataset and inserting them into an ActiveSpaces environment (by TIBCO; it's an in-memory DB). This is how I'm doing it.
Is there a faster way to go about this?
I was thinking of partitioning the rows and then parallelizing each partition, but I have no clue whether that will make it faster.
System.Threading.Tasks.Parallel.ForEach(
    dataSet.Tables[0].Rows,
    currRow =>
    {
        var tuple = Com.Tibco.As.Space.Tuple.Create();
        for (int i = 0; i < currRow.Values.Length; i++)
        {
            if (currRow.Values[i] != null)
            {
                var k = ConvertToAny(currRow.Values[i].ToString());
                if (k.GetType().IsEquivalentTo(typeof(DateTime)))
                {
                    tuple.Put(dataSet.Tables[0].ColumnNames[i], (DateTime)k);
                }
                else if (k.GetType().IsEquivalentTo(typeof(double)))
                {
                    tuple.Put(dataSet.Tables[0].ColumnNames[i], (double)k);
                }
                else
                {
                    tuple.Put(dataSet.Tables[0].ColumnNames[i], k.ToString());
                }
            }
        }
        try
        {
            inSpace_.Put(tuple);
        }
        catch (Exception e)
        {
        }
    }
I'm thinking of batching it at around 1000 at a time, if someone can please help :(
EDIT:
List<Com.Tibco.As.Space.Tuple> tuplesToAdd = new List<Com.Tibco.As.Space.Tuple>();
for (int i = 0; i < dataSet.Tables[0].Rows.Length; i++)
{
    var tuple = Com.Tibco.As.Space.Tuple.Create();
    for (int j = 0; j < dataSet.Tables[0].Rows[i].Values.Length; j++)
    {
        if (dataSet.Tables[0].Rows[i].Values[j] != null)
        {
            var k = ConvertToAny(dataSet.Tables[0].Rows[i].Values[j].ToString());
            if (k is DateTime)
            {
                tuple.Put(dataSet.Tables[0].ColumnNames[j], (DateTime)k);
            }
            else if (k is Double)
            {
                tuple.Put(dataSet.Tables[0].ColumnNames[j], (Double)k);
            }
            else
            {
                tuple.Put(dataSet.Tables[0].ColumnNames[j], k.ToString());
            }
        }
    }
    tuplesToAdd.Add(tuple);
    if (i % 100000 == 0 || i == dataSet.Tables[0].Rows.Length - 1)
    {
        ThreadStart TUPLE_WORKER = delegate
        {
            inSpace_.PutAll(tuplesToAdd);
        };
        new Thread(TUPLE_WORKER).Start();
        tuplesToAdd.Clear();
    }
}
That's my new way of trying to do it (by batching).
I'm not certain, but it looks like you could avoid the ToString in your conversion code. That is, this:
var k = ConvertToAny(currRow.Values[i].ToString());
if (k.GetType().IsEquivalentTo(typeof(DateTime)))
can be replaced by:
var k = currRow.Values[i];
if (k is DateTime)
{
    tuple.Put(dataSet.Tables[0].ColumnNames[i], (DateTime)k);
}
That should save you converting to string and then back.
Added in response to comments
First, your ConvertToAny is unnecessary. The item in currRow.Values[i] is already the right type. You just don't know what type it is. Unless you're saying that it could be the string representation of a DateTime or a Double. If the type is already a double, for example, then there's no reason to convert to string, parse, and then convert back. That is, the following two bits of code do the same thing:
object o = 3.14;
var k = ConvertToAny(o.ToString());
if (k.GetType().IsEquivalentTo(typeof(double)))
and
object o = 3.14;
if (o is double)
The only difference is that the second will be much faster.
However, if you have
object o = "3.14";
and you want that to be converted to double, then you'll have to do the conversion.
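A minimal sketch of what that explicit conversion could look like (using double.TryParse; DateTime.TryParse works analogously for dates):

object o = "3.14";
string s = o as string;
double d;
if (s != null && double.TryParse(s, out d))
{
    // o really was the string form of a double; use d here
}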
Your code that batches things has to lock the list when adding and updating. Otherwise you will corrupt it. I would suggest:
lock (tuplesToAdd)
{
    tuplesToAdd.Add(tuple);
    if ((tuplesToAdd.Count % 10000) == 0)
    {
        // push them all to the database.
        inSpace_.PutAll(tuplesToAdd);
        tuplesToAdd.Clear();
    }
}
And when you're all done (i.e. the Parallel.Foreach is done):
if (tuplesToAdd.Count > 0)
{
    // push the remaining items
    inSpace_.PutAll(tuplesToAdd);
}
Now, if you want to avoid blocking all of the threads during the update, you can get a little creative.
First, create two objects that you can lock on:
private object lockObject = new Object();
private object pushLock = new Object();
Create that right after you create the tuplesToAdd list. Then, when you want to add an item:
Monitor.Enter(lockObject); // acquires the lock
tuplesToAdd.Add(tuple);
if (tuplesToAdd.Count == 100000)
{
    var tuplesToPush = tuplesToAdd;
    tuplesToAdd = new List<Com.Tibco.As.Space.Tuple>(100000);
    Monitor.Exit(lockObject); // releases the lock so other threads can process
    lock (pushLock) // prevent multiple threads from pushing at the same time
    {
        inSpace_.PutAll(tuplesToPush);
    }
}
else
{
    Monitor.Exit(lockObject);
}
That way, while one thread is updating the database, the others can be filling the list for the next time around.
And after I think about it a bit more, you probably don't even need to use parallel processing for this task. It's likely that the vast majority of your time was being spent by threads waiting on the Put call. Using a single thread to batch these and write them in bulk will probably execute much faster than your original solution. The parallel version you decided on will be faster, but I doubt that it will be hugely faster.
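For what it's worth, a single-threaded batched version might look roughly like the sketch below. It reuses the TIBCO types and members shown in the question (inSpace_, Tuple.Create, PutAll, Rows, Values, ColumnNames) and assumes, as discussed above, that Values[i] already holds typed values rather than strings:

var table = dataSet.Tables[0];
var batch = new List<Com.Tibco.As.Space.Tuple>();
for (int r = 0; r < table.Rows.Length; r++)
{
    var row = table.Rows[r];
    var tuple = Com.Tibco.As.Space.Tuple.Create();
    for (int c = 0; c < row.Values.Length; c++)
    {
        var k = row.Values[c];
        if (k == null) continue;
        if (k is DateTime) tuple.Put(table.ColumnNames[c], (DateTime)k);
        else if (k is double) tuple.Put(table.ColumnNames[c], (double)k);
        else tuple.Put(table.ColumnNames[c], k.ToString());
    }
    batch.Add(tuple);
    if (batch.Count == 100000) // flush in bulk, not row by row
    {
        inSpace_.PutAll(batch);
        batch.Clear();
    }
}
if (batch.Count > 0)
    inSpace_.PutAll(batch); // push the remainder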
JavaScript supports a goto-like syntax (labelled break) for breaking out of nested loops. It's not a great idea in general, but it's considered acceptable practice. C# does not directly support the break labelName syntax... but it does support the infamous goto.
I believe the equivalent can be achieved in C#:
int i = 0;
while (i <= 10)
{
    Debug.WriteLine(i);
    i++;
    for (int j = 0; j < 3; j++)
        if (i > 5)
        {
            goto Break; // break out of all loops
        }
}
Break: ;
By the same logic as JavaScript, is the nested-loop scenario an acceptable use of goto? Otherwise, the only way I am aware of to achieve this functionality is by setting a bool with the appropriate scope.
My opinion: complex code flows with nested loops are hard to reason about; branching around, whether it is with goto or break, just makes it harder. Rather than writing the goto, I would first think really hard about whether there is a way to eliminate the nested loops.
A couple of useful techniques:
First technique: Refactor the inner loop to a method. Have the method return whether or not to break out of the outer loop. So:
for (outer blah blah blah)
{
    for (inner blah blah blah)
    {
        if (whatever)
        {
            goto leaveloop;
        }
    }
}
leaveloop:
...
becomes
for (outer blah blah blah)
{
    if (Inner(blah blah blah))
        break;
}
...
bool Inner(blah blah blah)
{
    for (inner blah blah blah)
    {
        if (whatever)
        {
            return true;
        }
    }
    return false;
}
Second technique: if the loops do not have side effects, use LINQ.
// fulfill the first unfulfilled order over $100
foreach (var customer in customers)
{
    foreach (var order in customer.Orders)
    {
        if (!order.Filled && order.Total >= 100.00m)
        {
            Fill(order);
            goto leaveloop;
        }
    }
}
leaveloop:
instead, write:
var orders = from customer in customers
             from order in customer.Orders
             where !order.Filled
             where order.Total >= 100.00m
             select order;
var orderToFill = orders.FirstOrDefault();
if (orderToFill != null) Fill(orderToFill);
No loops, so no breaking out required.
Alternatively, as configurator points out in a comment, you could write the code in this form:
var orderToFill = customers
    .SelectMany(customer => customer.Orders)
    .Where(order => !order.Filled)
    .Where(order => order.Total >= 100.00m)
    .FirstOrDefault();
if (orderToFill != null) Fill(orderToFill);
The moral of the story: loops emphasize control flow at the expense of business logic. Rather than trying to pile more and more complex control flow on top of each other, try refactoring the code so that the business logic is clear.
I would personally try to avoid using goto here by simply putting the loop into a different method - while you can't easily break out of a particular level of loop, you can easily return from a method at any point.
In my experience this approach has usually led to simpler and more readable code with shorter methods (doing one particular job) in general.
Let's get one thing straight: there is nothing fundamentally wrong with using the goto statement; it isn't evil, it is just one more tool in the toolbox. It is how you use it that really matters, and it is easily misused.
Breaking out of a nested loop of some description can be a valid use of the statement, although you should first look to see if it can be redesigned. Can your loop exit expressions be rewritten? Are you using the appropriate type of loop? Can you filter the list of data you may be iterating over so that you don't need to exit early? Should you refactor some loop code into a separate function?
IMO it is acceptable in languages that do not support break n;, where n specifies the number of loops it should break out of.
At least it's much more readable than setting a variable that is then checked in the outer loop.
I believe the 'goto' is acceptable in this situation. Unfortunately, C# does not offer any nifty way to break out of nested loops.
It's a bit of an unacceptable practice in C#. If there's no way your design can avoid it, well, you've gotta use it. But do exhaust all other alternatives first. It will make for better readability and maintainability. For your example, I've crafted one such potential refactoring:
void Original()
{
    int i = 0;
    while (i <= 10)
    {
        Debug.WriteLine(i);
        i++;
        if (Process(i))
        {
            break;
        }
    }
}

bool Process(int i)
{
    for (int j = 0; j < 3; j++)
        if (i > 5)
        {
            return true;
        }
    return false;
}
I recommend using continue if you want to skip that one item, and break if you want to exit the loop. For deeper nesting, put it in a method and use return. I personally would rather use a status bool than a goto; use goto only as a last resort.
anonymous functions
You can almost always bust out the inner loop to an anonymous function or lambda. In the code below you can see where an anonymous function has replaced what used to be an inner loop, where I would otherwise have had to use goto.
private void CopyFormPropertiesAndValues()
{
    MergeOperationsContext context = new MergeOperationsContext() { GroupRoot = _groupRoot, FormMerged = MergedItem };
    // set up the filter-function caller
    Func<string, string, bool> CheckFilters = (string key, string value) =>
    {
        foreach (var FieldFilter in MergeOperationsFieldFilters)
        {
            if (!FieldFilter(key, value, context))
                return false;
        }
        return true;
    };
    // Copy values from form to FormMerged
    foreach (var key in _form.ValueList.Keys)
    {
        var MyValue = _form.ValueList[key];
        if (CheckFilters(key, MyValue))
            MergedItem.ValueList[key] = MyValue;
    }
}
This often occurs when searching for multiple items in a dataset manually, as well. Sad to say the proper use of goto is better than Booleans/flags, from a clarity standpoint, but this is more clear than either and avoids the taunts of your co-workers.
For high-performance situations a goto would be fitting, but only by 1%, let's be honest here...
int i = 0;
while (i <= 10)
{
    Debug.WriteLine(i);
    i++;
    for (int j = 0; j < 3 && i <= 5; j++)
    {
        // Whatever you want to do
    }
}
Unacceptable in C#.
Just wrap the loop in a function and use return.
EDIT: On SO, downvoting is meant for incorrect answers, not for answers you disagree with. As the OP explicitly asked "is it acceptable?", answering "unacceptable" is not incorrect (although you might disagree).
Possible Duplicates:
Breaking out of a nested loop
How to break out of 2 loops without a flag variable in C#?
Hello, I have a function that has nested loops. Once the condition has been met, I want to break out of all the nested loops. The code is something like this:
foreach (EmpowerTaxView taxView in taxViews)
{
    foreach (PayrollEmployee payrollEmployee in payrollEmployees)
    {
        //PayStub payStub = payrollEmployee.InternalPayStub;
        IReadOnlyList<PayrollWorkLocation> payrollWorkLocations = payrollEmployee.PayrollWorkLocations;
        foreach (PayrollWorkLocation payrollWorkLocation in payrollWorkLocations)
        {
            Tax tax = GetTaxEntity(payrollWorkLocation, taxView.BSITypeCode, taxView.BSIAuthorityCode,
                                   paidbyEr, resCode);
            if (tax != null && tax.Rate.HasValue)
            {
                taxRate = tax.Rate.Value;
                break;
            }
        }
    }
}
Unfortunately, break only gets me out of the innermost loop. I want to break out of the whole thing. I know some people have suggested the goto statement; I am wondering whether there is any other way around it, such as writing a LINQ query to the same effect.
Any ideas and suggestions are greatly appreciated!
Two options suggest themselves as ways of getting out without having an extra flag variable to indicate "you should break out of the inner loop too". (I really dislike having such variables, personally.)
One option is to pull all of this code into a separate method - then you can just return from the method. This would probably improve your code readability anyway - this really feels like it's doing enough to warrant extracting into a separate method.
The other obvious option is to use LINQ. Here's an example which I think would work:
var taxRate = (from taxView in taxViews
               from employee in payrollEmployees
               from location in employee.PayrollWorkLocations
               let tax = GetTaxEntity(location, taxView.BSITypeCode,
                                      taxView.BSIAuthorityCode,
                                      paidbyEr, resCode)
               where tax != null && tax.Rate.HasValue
               select tax.Rate).FirstOrDefault();
That looks considerably cleaner than lots of foreach loops to me.
Note that I haven't selected tax.Rate.Value - just tax.Rate. That means the result will be a "null" decimal? (or whatever type tax.Rate is) if no matching rates are found, or the rate otherwise. So you'd then have:
if (taxRate != null)
{
    // Use taxRate.Value here
}
Well, you could use the dreaded goto, refactor your code, or this:
// anon-method
Action work = delegate
{
    for (int x = 0; x < 100; x++)
    {
        for (int y = 0; y < 100; y++)
        {
            return; // exits anon-method
        }
    }
};
work(); // execute anon-method
You could use a flag variable.
bool doMainBreak = false;
foreach (EmpowerTaxView taxView in taxViews)
{
    if (doMainBreak) break;
    foreach (PayrollEmployee payrollEmployee in payrollEmployees)
    {
        if (doMainBreak) break;
        //PayStub payStub = payrollEmployee.InternalPayStub;
        IReadOnlyList<PayrollWorkLocation> payrollWorkLocations = payrollEmployee.PayrollWorkLocations;
        foreach (PayrollWorkLocation payrollWorkLocation in payrollWorkLocations)
        {
            Tax tax = GetTaxEntity(payrollWorkLocation, taxView.BSITypeCode, taxView.BSIAuthorityCode,
                                   paidbyEr, resCode);
            if (tax != null && tax.Rate.HasValue)
            {
                taxRate = tax.Rate.Value;
                doMainBreak = true;
                break;
            }
        }
    }
}
for (var keyValue = 0; keyValue < dwhSessionDto.KeyValues.Count; keyValue++)
{...}
var count = dwhSessionDto.KeyValues.Count;
for (var keyValue = 0; keyValue < count; keyValue++)
{...}
I know there's a difference between the two, but is one of them faster than the other? I would think the second is faster.
Yes, the first version is much slower. After all, I'm assuming you're dealing with types like this:
public class SlowCountProvider
{
    public int Count
    {
        get
        {
            Thread.Sleep(1000);
            return 10;
        }
    }
}

public class KeyValuesWithSlowCountProvider
{
    public SlowCountProvider KeyValues
    {
        get { return new SlowCountProvider(); }
    }
}
Here, your first loop will take ~10 seconds, whereas your second loop will take ~1 second.
Of course, you might argue that the assumption that you're using this code is unjustified - but my point is that the right answer will depend on the types involved, and the question doesn't state what those types are.
Now if you're actually dealing with a type where accessing KeyValues and Count is cheap (which is quite likely) I wouldn't expect there to be much difference. Mind you, I'd almost always prefer to use foreach where possible:
foreach (var pair in dwhSessionDto.KeyValues)
{
    // Use pair here
}
That way you never need the count. But then, you haven't said what you're trying to do inside the loop either. (Hint: to get more useful answers, provide more information.)
It depends how difficult it is to compute dwhSessionDto.KeyValues.Count. If it's just a pointer to an int, then the speed of each version will be the same. However, if the Count value needs to be calculated, then it will be calculated every time, and will therefore impede performance.
EDIT -- here's some code to demonstrate that the condition is always re-evaluated:
public class Temp
{
    public int Count { get; set; }
}

static void Main(string[] args)
{
    var t = new Temp() { Count = 5 };
    for (int i = 0; i < t.Count; i++)
    {
        Console.WriteLine(i);
        t.Count--;
    }
    Console.ReadLine();
}
The output is 0, 1, 2 only!
See comments for reasons why this answer is wrong.
If there is a difference, it's the other way round: indeed, the first one might be faster. That's because the compiler recognizes that you are iterating from 0 to the end of the array, and it can therefore elide bounds checks within the loop (i.e. when you access dwhSessionDto.KeyValues[i]).
However, I believe the compiler only applies this optimization to arrays, so there probably will be no difference here.
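For reference, the pattern the JIT recognizes is roughly this (a sketch; values and GetValues are hypothetical, the point is that the loop bound is the array's own Length):

int[] values = GetValues(); // hypothetical array
int sum = 0;
for (int i = 0; i < values.Length; i++)
{
    sum += values[i]; // bounds check can be elided because the bound is values.Length
}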
It is impossible to say without knowing the implementation of dwhSessionDto.KeyValues.Count and the loop body.
Assume a global variable bool foo = false; and then following implementations:
/* Loop body... */
{
    if (foo) Thread.Sleep(1000);
}
/* ... */
public int Count
{
    get
    {
        foo = !foo;
        return 10;
    }
}
/* ... */
Now, the first loop will perform approximately twice as fast as the second ;D
However, assuming non-moronic implementation, the second one is indeed more likely to be faster.
No. There is no performance difference between these two loops. With JIT and Code Optimization, it does not make any difference.
There is no difference, but why do you think there is a difference? Can you please post your findings?
If you look at the implementation of inserting an item into Dictionary using Reflector:
private void Insert(TKey key, TValue value, bool add)
{
    int freeList;
    if (key == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (this.buckets == null)
    {
        this.Initialize(0);
    }
    int num = this.comparer.GetHashCode(key) & 0x7fffffff;
    int index = num % this.buckets.Length;
    for (int i = this.buckets[index]; i >= 0; i = this.entries[i].next)
    {
        if ((this.entries[i].hashCode == num) && this.comparer.Equals(this.entries[i].key, key))
        {
            if (add)
            {
                ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
            }
            this.entries[i].value = value;
            this.version++;
            return;
        }
    }
    if (this.freeCount > 0)
    {
        freeList = this.freeList;
        this.freeList = this.entries[freeList].next;
        this.freeCount--;
    }
    else
    {
        if (this.count == this.entries.Length)
        {
            this.Resize();
            index = num % this.buckets.Length;
        }
        freeList = this.count;
        this.count++;
    }
    this.entries[freeList].hashCode = num;
    this.entries[freeList].next = this.buckets[index];
    this.entries[freeList].key = key;
    this.entries[freeList].value = value;
    this.buckets[index] = freeList;
    this.version++;
}
Count is an internal member of this class, which is incremented each time you insert an item into the dictionary,
so I believe that there is no difference at all.
The second version can be faster, sometimes. The point is that the condition is re-evaluated after every iteration, so if, e.g., the getter of Count actually counts the elements in an IEnumerable, or interrogates a database, etc., this will slow things down.
So I'd say that if you don't affect the value of Count inside the for loop, the second version is safer.