I'm building an application that is seriously slower than it should be (a process takes 4 seconds when it should take only .1 seconds, that's my goal at least).
I have a bunch of methods that pass an array from one to the other. This has kept my code nice and organized, but I'm worried that it's killing the efficiency of my code.
Can anyone confirm if this is the case?
Also, I have all of my code contained in a class separate from my UI. Is this going make things run significantly slower than if I had my code contained in the Form1.cs file?
Edit: There are about 95000 points that need to be calculated, each point goes through 7 methods that does additional calculations.
Have you tried any profiling or performance tools to narrow down why the slowdown occurs?
It might show you ways that you could use to refactor your code and improve performance.
This question asked by another user has several options that you can choose from:
Good .Net Profilers
No. This is not what is killing your code speed, unless many methods means like a million or something. You probably have more things iterating through your array than you need or realize, and the array itself may have a larger memory footprint than you realize.
Perhaps you should look into a design where instead of passing the array to 7 methods, you iterate the array once, passing the members to 7 methods, this will minimize the number of times you're iterating through 95000 members.
In general, function calls are basic enough to be highly optimized by any interpreter (or compiler). Therefore these do not produce to much blow-up in run time. In fact, if wrap your problem to, say, some fancy iterative solution, you save handling the stack, but instead have to handle some iteration variables, which will not be to hard.
I know, there have been programmers who wondered why their recursive algorithms have been so slow, until someone told them not to pass array entries by value.
You should provide some sample code. In general, you should for other bottlenecks, or find another algorithm.
Just need to run it against a good profiling tool. I've got some stuff I wished only took 4 seconds - works with upwards of a hundred million records in a pass.
An Array is a reference type not a value type. Therefore you never pass the array. You are actually passing the pointer to the array in memory. So passing the array isn't your issue. Most likely you have an issue with what you do with your array. You need to do what Jamie Keeling said and run it through a profiler or even just debug it and see if you get stuck in some big loops.
Why are you loading them all into an array and doing each method in turn rather than iterating through them as loaded?
If you can obtain them (from whatever input source) deal with them and output them (whether to screen, file our wherever) this will inevitably use less memory and reduce start-up time, at the very least.
If this answer is applicable to your situation, start by changing your methods to deal with enumerations rather than arrays (non-breaking change, since arrays are enumerations), then change your input method to yield return items as loaded rather than loading an entire array.
Sorry for posting an old link (.NET 1.1) but it was contained in VS2010 article, so:
Here you can read about method costs. (Initial link)
Then, if you start your code from VS (no matters, even in Release mode) the VS debugger connects to your code and slow it down.
I know that for this advise I will be minused but... The max performance will be achieved with unsafe operations with arrays (yes it's UNSAFE, but when there is performance deal, so...)
And the last - refactor your code to use minimum of methods which working with your arrays. It will improve the performance.
Related
Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }.
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It get real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxReuslts {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
why not test it?
just set up 2 console apps make it look 10 million times and compare the results ... remember to run them as properly released apps that have been installed properly or else you cannot gurantee that you are not just running the msil.
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. they clearly do not write routines that need to be as fast as possible before being readable (eg games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance max results could be an overridden method and so you may need to discuss virtual method calls.
Really the ONLY way to answer any of these questions is to figure out is this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test it.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.
I have done some performance testing in C# on using the For Loop and the While Loop for an ArrayList to do comparison search.
It seems to be having quadratic time consumption.
However, if I use LastIndexOf or IndexOf to search through the list, it gains "faster than anticipated" speed.
Does anyone know the reason?
I don't know any C# and yet I can put forward the likely answer.
Any programming language method are generally written in ways that take advantage of shortcuts made available by the processor that they run on - which your own written code will not (eg you'll have to declare local variables which will have to be kept on the stack, requiring slower lookup times, instead of just being a temporary register variable). Thus anything the language does natively will generally be quicker than your own code.
Use ILSpy and take a look at the internals of LastIndexOf/IndexOf methods. There lies your answer as to why they are faster.
I have a hunch that the List internally uses a B-tree or some other tree, which has a lookup of log(n). What you are doing with for/foreach is performing a linear lookup with some extra overhead. If you remember your maths class then you'd know that log(n) is flatter than a linear line, thus having faster lookup...
I have written an application in C# that generates all the words that can be existed in the combination of alphabets, numbers and few special characters.
The problem is that it isn't memory efficient as it is adapting Recursion and also some collection like List.
Is there any way I can make it to run in limited memory environment?
Umair
Convert it to an iterative function.
Unfortunately C# compiler does not perform tail call optimization, which is something that you want to happen in this case. CLR supports it, kinda, but you shouldn't rely on it.
Perhaps left of field, but maybe you can write the recursive part of your program in F#? This way you can leverage guaranteed tail call optimization and reuse bits of your C# code. Whilst a steep learning curve, F# is a more suitable language for these combinatorial tasks.
Well...I am not sure whom with I go amongst you but I got the solution. I am using more than one process one that is interacting with user and other for finding the words combination. The other process finds 5000 words, save them and quit. Communication is being achieved through WCF. This looks pretty fine as when process quits = frees memory.
Well, you obviously cannot store the intermediate results in memory (unless you've got some sort of absurd computer at your disposal); you will have to write the results to disk.
The recursion depth isn't a result of the number of considered characters - its determined by what the maximum string length you're willing to consider.
For instance, my install of python 2.6.2 has it's default recursion limit set to 1000. Arguable, I should be able to generate all possible 1-1000 length strings given a character set within this limitation (now, I think the recursion limit applies to total stack depth, so the actual limit may be less than 1000).
Edit (added python sample):
The following python snippet will produce what you're asking for (limiting itself to the given runtime stack limits):
from string import ascii_lowercase
def generate(base="", charset=ascii_lowercase):
for c in charset:
next = base + c
yield next
try:
for s in generate(next, charset):
yield s
except:
continue
for s in generate():
print s
One could produce essentially the same in C# by try/catching on StackOverflowException. As I'm typing this update, the script is running, chewing up one of my cores. However, memory usage is constant at less than 7MB. Now, I'm only print to stdout since I'm not interested in capturing the result, but I think it proves the point above. ;)
Addendum to the example:
Interesting note: Looking closer at running processes, python is actually I/O bound with the above example. It's only using 7% of my CPU, while the rest of the core is bound rending the results in my command window. Minimizing the window allows python to climb to 40% of total CPU usage, this is on a 2 core machine.
One more consideration: When you concatenate or use some other method to generate a string in C#, it occupies its own memory and may stick around for a while. If you are generating millions of strings, you are likely to notice some performance drag.
If you don't need to keep your many strings around, I would see if there's away to avoid generating the strings. For example, maybe you have a character array that you keep updating as you move through the character combinations, and if you're outputting them to a file, you would output them one character at a time so you don't have to build the string.
Given a case where I have an object that may be in one or more true/false states, I've always been a little fuzzy on why programmers frequently use flags+bitmasks instead of just using several boolean values.
It's all over the .NET framework. Not sure if this is the best example, but the .NET framework has the following:
public enum AnchorStyles
{
None = 0,
Top = 1,
Bottom = 2,
Left = 4,
Right = 8
}
So given an anchor style, we can use bitmasks to figure out which of the states are selected. However, it seems like you could accomplish the same thing with an AnchorStyle class/struct with bool properties defined for each possible value, or an array of individual enum values.
Of course the main reason for my question is that I'm wondering if I should follow a similar practice with my own code.
So, why use this approach?
Less memory consumption? (it doesn't seem like it would consume less than an array/struct of bools)
Better stack/heap performance than a struct or array?
Faster compare operations? Faster value addition/removal?
More convenient for the developer who wrote it?
It was traditionally a way of reducing memory usage. So, yes, its quite obsolete in C# :-)
As a programming technique, it may be obsolete in today's systems, and you'd be quite alright to use an array of bools, but...
It is fast to compare values stored as a bitmask. Use the AND and OR logic operators and compare the resulting 2 ints.
It uses considerably less memory. Putting all 4 of your example values in a bitmask would use half a byte. Using an array of bools, most likely would use a few bytes for the array object plus a long word for each bool. If you have to store a million values, you'll see exactly why a bitmask version is superior.
It is easier to manage, you only have to deal with a single integer value, whereas an array of bools would store quite differently in, say a database.
And, because of the memory layout, much faster in every aspect than an array. It's nearly as fast as using a single 32-bit integer. We all know that is as fast as you can get for operations on data.
Easy setting multiple flags in any order.
Easy to save and get a serie of 0101011 to the database.
Among other things, its easier to add new bit meanings to a bitfield than to add new boolean values to a class. Its also easier to copy a bitfield from one instance to another than a series of booleans.
It can also make Methods clearer. Imagine a Method with 10 bools vs. 1 Bitmask.
Actually, it can have a better performance, mainly if your enum derives from an byte.
In that extreme case, each enum value would be represented by a byte, containing all the combinations, up to 256. Having so many possible combinations with booleans would lead to 256 bytes.
But, even then, I don't think that is the real reason. The reason I prefer those is the power C# gives me to handle those enums. I can add several values with a single expression. I can remove them also. I can even compare several values at once with a single expression using the enum. With booleans, code can become, let's say, more verbose.
From a domain Model perspective, it just models reality better in some situations. If you have three booleans like AccountIsInDefault and IsPreferredCustomer and RequiresSalesTaxState, then it doesnn't make sense to add them to a single Flags decorated enumeration, cause they are not three distinct values for the same domain model element.
But if you have a set of booleans like:
[Flags] enum AccountStatus {AccountIsInDefault=1,
AccountOverdue=2 and AccountFrozen=4}
or
[Flags] enum CargoState {ExceedsWeightLimit=1,
ContainsDangerousCargo=2, IsFlammableCargo=4,
ContainsRadioactive=8}
Then it is useful to be able to store the total state of the Account, (or the cargo) in ONE variable... that represents ONE Domain Element whose value can represent any possible combination of states.
Raymond Chen has a blog post on this subject.
Sure, bitfields save data memory, but
you have to balance it against the
cost in code size, debuggability, and
reduced multithreading.
As others have said, its time is largely past. It's tempting to still do it, cause bit fiddling is fun and cool-looking, but it's no longer more efficient, it has serious drawbacks in terms of maintenance, it doesn't play nicely with databases, and unless you're working in an embedded world, you have enough memory.
I would suggest never using enum flags unless you are dealing with some pretty serious memory limitations (not likely). You should always write code optimized for maintenance.
Having several boolean properties makes it easier to read and understand the code, change the values, and provide Intellisense comments not to mention reduce the likelihood of bugs. If necessary, you can always use an enum flag field internally, just make sure you expose the setting/getting of the values with boolean properties.
Space efficiency - 1 bit
Time efficiency - bit comparisons are handled quickly by hardware.
Language independence - where the data may be handled by a number of different programs you don't need to worry about the implementation of booleans across different languages/platforms.
Most of the time, these are not worth the tradeoff in terms of maintance. However, there are times when it is useful:
Network protocols - there will be a big saving in reduced size of messages
Legacy software - once I had to add some information for tracing into some legacy software.
Cost to modify the header: millions of dollars and years of effort.
Cost to shoehorn the information into 2 bytes in the header that weren't being used: 0.
Of course, there was the additional cost in the code that accessed and manipulated this information, but these were done by functions anyways so once you had the accessors defined it was no less maintainable than using Booleans.
I have seen answers like Time efficiency and compatibility. those are The Reasons, but I do not think it is explained why these are sometime necessary in times like ours. from all answers and experience of chatting with other engineers I have seen it pictured as some sort of quirky old time way of doing things that should just die because new way to do things are better.
Yes, in very rare case you may want to do it the "old way" for performance sake like if you have the classic million times loop. but I say that is the wrong perspective of putting things.
While it is true that you should NOT care at all and use whatever C# language throws at you as the new right-way™ to do things (enforced by some fancy AI code analysis slaping you whenever you do not meet their code style), you should understand deeply that low level strategies aren't there randomly and even more, it is in many cases the only way to solve things when you have no help from a fancy framework. your OS, drivers, and even more the .NET itself(especially the garbage collector) are built using bitfields and transactional instructions. your CPU instruction set itself is a very complex bitfield, so JIT compilers will encode their output using complex bit processing and few hardcoded bitfields so that the CPU can execute them correctly.
When we talk about performance things have a much larger impact than people imagine, today more then ever especially when you start considering multicores.
when multicore systems started to become more common all CPU manufacturer started to mitigate the issues of SMP with the addition of dedicated transactional memory access instructions while these were made specifically to mitigate the near impossible task to make multiple CPUs to cooperate at kernel level without a huge drop in perfomrance it actually provides additional benefits like an OS independent way to boost low level part of most programs. basically your program can use CPU assisted instructions to perform memory changes to integers sized memory locations, that is, a read-modify-write where the "modify" part can be anything you want but most common patterns are a combination of set/clear/increment.
usually the CPU simply monitors if there is any other CPU accessing the same address location and if a contention happens it usually stops the operation to be committed to memory and signals the event to the application within the same instruction. this seems trivial task but superscaler CPU (each core has multiple ALUs allowing instruction parallelism), multi-level cache (some private to each core, some shared on a cluster of CPU) and Non-Uniform-Memory-Access systems (check threadripper CPU) makes things difficult to keep coherent, luckily the smartest people in the world work to boost performance and keep all these things happening correctly. todays CPU have a large amount of transistor dedicated to this task so that caches and our read-modify-write transactions work correctly.
C# allows you to use the most common transactional memory access patterns using Interlocked class (it is only a limited set for example a very useful clear mask and increment is missing, but you can always use CompareExchange instead which gets very close to the same performance).
To achieve the same result using a array of booleans you must use some sort of lock and in case of contention the lock is several orders of magnitude less permorming compared to the atomic instructions.
here are some examples of highly appreciated HW assisted transaction access using bitfields which would require a completely different strategy without them of course these are not part of C# scope:
assume a DMA peripheral that has a set of DMA channels, let say 20 (but any number up to the maximum number of bits of the interlock integer will do). When any peripheral's interrupt that might execute at any time, including your beloved OS and from any core of your 32-core latest gen wants a DMA channel you want to allocate a DMA channel (assign it to the peripheral) and use it. a bitfield will cover all those requirements and will use just a dozen of instructions to perform the allocation, which are inlineable within the requesting code. basically you cannot go faster then this and your code is just few functions, basically we delegate the hard part to the HW to solve the problem, constraints: bitfield only
assume a peripheral that to perform its duty requires some working space in normal RAM memory. for example assume a high speed I/O peripheral that uses scatter-gather DMA, in short it uses a fixed-size block of RAM populated with the description (btw the descriptor is itself made of bitfields) of the next transfer and chained one to each other creating a FIFO queue of transfers in RAM. the application prepares the descriptors first and then it chains with the tail of the current transfers without ever pausing the controller (not even disabling the interrupts). the allocation/deallocation of such descriptors can be made using bitfield and transactional instructions so when it is shared between diffent CPUs and between the driver interrupt and the kernel all will still work without conflicts. one usage case would be the kernel allocates atomically descriptors without stopping or disabling interrupts and without additional locks (the bitfield itself is the lock), the interrupt deallocates when the transfer completes.
most old strategies were to preallocate the resources and force the application to free after usage.
If you ever need to use multitask on steriods C# allows you to use either Threads + Interlocked, but lately C# introduced lightweight Tasks, guess how it is made? transactional memory access using Interlocked class. So you likely do not need to reinvent the wheel any of the low level part is already covered and well engineered.
so the idea is, let smart people (not me, I am a common developer like you) solve the hard part for you and just enjoy general purpose computing platform like C#. if you still see some remnants of these parts is because someone may still need to interface with worlds outside .NET and access some driver or system calls for example requiring you to know how to build a descriptor and put each bit in the right place. do not being mad at those people, they made our jobs possible.
In short : Interlocked + bitfields. incredibly powerful, don't use it
It is for speed and efficiency. Essentially all you are working with is a single int.
if ((flags & AnchorStyles.Top) == AnchorStyles.Top)
{
//Do stuff
}
I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this but I hesitate to use this approach, mostly I think for reasons of aesthetics and psycholgical bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU intensive code. CPU cycles matter.
Is there a more elegant C#2 of C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bits implementations of .NET (and the upcomming 64-bits 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably to small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.