Should you use Int32 in places where you know the value is not going to be higher than 32,767?
I'd like to keep memory usage down; however, using casts everywhere just to perform simple arithmetic is getting annoying.
short a = 1;
short result = a + 1; // Error
short result = (short)(a + 1); // works, but looks ugly when done lots of times
What would be better for overall application performance?
As far as I know, it is good practice to use int whenever possible. The size of int equals the word size on many architectures, so I think there may be a slight performance degradation when using short in some arithmetic operations.
If you are creating large arrays, then it can save a considerable amount of memory to use narrower types (fewer bytes), as the size of the array will be "type width" * "number of elements" + overhead.
However, I'm pretty sure that by default, fields in classes and structs are aligned on whole-word boundaries, e.g. 32 bit = 4 bytes; a short will still be packed into a 4-byte slot.
You can, however, manually configure packing in structs/classes by using structure layout:
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute(VS.71).aspx
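For illustration only, here is a minimal sketch of controlling packing with that attribute (the type names are made up; Marshal.SizeOf reports the marshaled size, and whether the in-memory managed layout shrinks as well is a runtime detail, so measure):
using System;
using System.Runtime.InteropServices;

// With default packing the byte is padded so the short starts on a
// 2-byte boundary (4 bytes total); Pack = 1 removes that padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedSample
{
    public byte Flag;    // 1 byte
    public short Value;  // 2 bytes -> 3 bytes marshaled instead of 4
}

class PackingDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(PackedSample))); // prints 3
    }
}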
As with any performance related issue: "don't think - measure".
From an API perspective, it can be majorly annoying to have to keep casting from shorts to ints, etc., as you will find that most APIs use ints, for example.
The 3 reasons for me to use an integer datatype smaller than Int32:
A system with severe memory constraints.
Huge arrays or similar.
When it makes the purpose of the code easier to read and understand.
I mostly do normal Windows apps, so the only reason of those that normally matters to me is the 3rd one.
Unless you're creating hundreds of thousands of members, the space savings of a handful of bytes here and there won't really matter on any modern machine. I think the maxim of premature optimization applies here. It might make you feel better, but you're not getting anything particularly measurable out of it. IMO - only optimize memory usage if you're actually using a lot of memory.
As stated in the title, I am evaluating the cost of implementing a BitArray over byte[] (I have understood that the native BitArray is pretty slow) instead of using a string representation of bits (e.g. "001001001"), but I am open to any suggestions that are more effective.
The length of the array is not known at design time, but I suppose it may be between 200 and 500 bits per array.
Memory is not a concern, so using a lot of memory to represent the array is not an issue; what matters is speed when the array is created and manipulated (they will be manipulated a lot).
Thanks in advance for your consideration and suggestions on the topic.
A few suggestions:
1) Computers don't process individual bits, so even an int or long will work at the same speed
2) To gain speed, you can consider writing it with unsafe code
3) new is expensive. If the objects are created a lot, you can do the following: create a bulk of 10K objects at a time and serve them from a method when required. Once the cache runs out you can recreate them. Have another method so that, once processing of an object completes, you clean it up and return it to the cache
4) Make sure your manipulation is optimal (see the sketch below)
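A rough sketch of what I mean, with made-up names, storing the bits in ulong words as per point 1 (200-500 bits fit in 4-8 ulongs, so the whole structure stays in cache):
public sealed class FastBits
{
    private readonly ulong[] _words;

    public FastBits(int bitCount)
    {
        // 64 bits per word, rounded up.
        _words = new ulong[(bitCount + 63) / 64];
    }

    public void Set(int index, bool value)
    {
        ulong mask = 1UL << (index & 63);
        if (value) _words[index >> 6] |= mask;
        else       _words[index >> 6] &= ~mask;
    }

    public bool Get(int index)
    {
        return (_words[index >> 6] & (1UL << (index & 63))) != 0;
    }
}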
I was recently looking at a config file that was saving some cryptic values. I happened to have the source available, so I took a look at what it was doing, and it was saving a bunch of different values and bit shifting them into each other. It mystified me why someone would do that. My question, then, is: is there an obvious advantage to storing numeric data in this way? I can see how it might make for a slightly smaller value to store, byte-wise, but it seems like a lot of work to save a couple of bytes of storage. It also seems like it would be significantly slower.
The other possibility that occurred to me is that it was used for obfuscation purposes. Is this a common usage of bit shifting?
This is one of the common usages of bit shifting. There are several benefits:
1) Bit-shift operations are fast.
2) You can store multiple flags in a single value.
If you have an app that has several features, but you only want certain (configurable) ones enabled, you could do something like:
[Flags]
public enum Features
{
    Profile = 1,
    Messaging = 1 << 1,
    Signing = 1 << 2,
    Advanced = 1 << 3
}
And your single value to enable Messaging and Advanced would be:
(1 << 1) + (1 << 3) = 2 + 8 = 10
<add name="EnabledFeatures" value="10" />
And then to figure out if a given feature is enabled, you just perform some simple bitwise math:
var AdvancedEnabled =
    (EnabledFeatures & Features.Advanced) == Features.Advanced;
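To round that out, a small sketch of going from the stored number back to the enum (I'm assuming the value has already been read out of the config file as a string; "10" matches the example above):
// Parse the raw config value back into the [Flags] enum.
var enabledFeatures = (Features)int.Parse("10");

// A flag is on when ANDing with it leaves exactly that flag behind.
bool messagingOn = (enabledFeatures & Features.Messaging) == Features.Messaging; // true
bool signingOn   = (enabledFeatures & Features.Signing) == Features.Signing;     // false
// On .NET 4 and later, enabledFeatures.HasFlag(Features.Advanced) reads the same way.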
Bit shifting seems more common in systems-level languages like C, C++ and assembly, but I've seen it here and there in C# too. It's not often used so much to save space, though, as it is for one (or both) of two typical reasons:
You're talking to an existing system (or using an established protocol, or generating a file in a known format) that requires stuff to be very precisely laid out; and/or
The combination of bits comprise a value that is itself useful for testing.
Anyone who uses it in a high-level language solely to save space or obfuscate their code is almost always prematurely optimizing (and/or is an idiot). The space savings rarely justify the added complexity, and bit shifts really aren't complex enough to stop someone determined to understand your code.
I have a project that stores day/hour matrix of available hours during one week. So it has 24x7 values that have to be stored somehow.
I chose to store it as 7 ints, so each day is represented with one integer, and each hour in it as one bit. Just an example where it can be useful.
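Roughly like this (names invented for the example; each hour maps to one bit of that day's int):
// days[0] = Monday ... days[6] = Sunday; bit h set = hour h is available.
int[] days = new int[7];

// Mark Monday 09:00 as available.
days[0] |= 1 << 9;

// Check whether Monday 09:00 is available.
bool freeAtNine = (days[0] & (1 << 9)) != 0;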
Sometimes (especially in older Windows programming), there will be information encoded in the "high order" and "low order" bits of a value... and it's sometimes necessary to shift to get the information out. I'm not 100% sure of the reasons behind that, aside from the fact that it can be convenient to return a 64-bit value with two 32-bit values encoded in it (so you can handle it with a single return value from a method call).
I agree with your assertions about needless complexity though, I don't see a lot of applications outside of really heavy number-crunching/cryptography stuff where this would be necessary.
I have a console application that allows the users to specify variables to process. These variables come in three flavors: string, double and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like, and in whatever order, so my system has to be able to handle that. To this end, in my application I had been storing these as object and then casting/uncasting them as required. For example:
public class UnitResponse
{
    public object Value { get; set; }
}
My understanding was that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.
My question is: would it be more efficient to use the dynamic keyword to store these values? It might get around the boxing/unboxing issue, and if it is more efficient how would this impact performance?
EDIT
To provide some context and prevent the "are you sure you're using enough RAM to worry about this" in my worst case I have 420,000,000 datapoints to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data I keep about each variable (including a few booleans, etc.). So reducing memory does have a HUGE impact.
OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"
Several thoughts:
You are absolutely right to hate and fear boxing. Boxing has big costs. First, yes, boxed objects take up extra memory. Second, boxed objects get stored on the heap, not on the stack or in registers. Third, they are garbage collected; every single one of those objects has to be interrogated at GC time to see if it contains a reference to another object, which it never will, and that's a lot of time on the GC thread. You almost certainly need to do something to avoid boxing.
Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).
It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really really gross and not at all safe but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials.
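A hedged sketch of what that union-style struct can look like (names are made up; the long and double overlap in the same 8 bytes, while a string value would stay a normal reference field, because object references may not overlap value-type fields):
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct NumericVariant
{
    [FieldOffset(0)] public long AsLong;     // shares bytes 0-7
    [FieldOffset(0)] public double AsDouble; // shares bytes 0-7
    [FieldOffset(8)] public byte Kind;       // e.g. 0 = long, 1 = double
}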
Long, double or string, really? Can't be int, float or string? Is the data really either in excess of several billion in magnitude or accurate to 15 decimal places? Wouldn't int and float do the job for 99% of the cases? They're half the size.
Normally I don't recommend using float over double because it's a false economy; people often economise this way when they have ONE number, as if the saving of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles, though, is considerable.
Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records, there are only 100000 actual values for, say, each long, 100000 values for each double, and 100000 values for each string. In that case, you make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index, and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else.
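A loose sketch of that scheme, with everything here invented just to show the packing (30 bits of index is far more than the 100000 distinct values assumed above):
// Top two bits of each record say which pool holds the value;
// the low 30 bits index into that pool.
const int TagShift = 30;
const int IndexMask = (1 << TagShift) - 1;
const int LongTag = 0, DoubleTag = 1, StringTag = 2;

double[] doublePool = new double[100000]; // the distinct double values

// Pack a reference to the 42nd distinct double:
int record = (DoubleTag << TagShift) | 42;

// Unpack it again:
int tag = (int)((uint)record >> TagShift);
double value = tag == DoubleTag ? doublePool[record & IndexMask] : 0.0;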
Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes that way.
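For example (the flag names here are invented), two booleans packed into one byte instead of two fields:
public struct RecordFlags
{
    private byte _bits;

    public bool IsMissing
    {
        get { return (_bits & 1) != 0; }
        set { _bits = value ? (byte)(_bits | 1) : (byte)(_bits & ~1); }
    }

    public bool IsEstimated
    {
        get { return (_bits & 2) != 0; }
        set { _bits = value ? (byte)(_bits | 2) : (byte)(_bits & ~2); }
    }
}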
Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.
I think you might be looking at the wrong thing here. Remember what dynamic does. It starts the compiler again, in process, at runtime. It loads hundreds of thousands of bytes of code for the compiler, and then at every call site it emits caches that contain the results of the freshly-emitted IL for each dynamic operation. You're spending a few hundred thousand bytes in order to save eight. That seems like a bad idea.
And of course, you don't save anything. "dynamic" is just "object" with a fancy hat on. "Dynamic" objects are still boxed.
No. dynamic has to do with how operations on the object are performed, not how the object itself is stored. In this particular context, value types would still be boxed.
Also, is all of this effort really worth 12 bytes per object? Surely there's a better use for your time than saving a few kilobytes (if that) of RAM? Have you proved that RAM usage by your program is actually an issue?
No. Dynamic will simply store it as an Object.
Chances are this is a micro optimization that will provide little to no benefit. If this really does become an issue then there are other mechanisms you can use (generics) to speed things up.
Yes, I am using a profiler (ANTS). But at the micro-level it cannot tell you how to fix your problem. And I'm at a micro-optimization stage right now. For example, I was profiling this:
for (int x = 0; x < Width; x++)
{
    for (int y = 0; y < Height; y++)
    {
        packedCells.Add(Data[x, y].HasCar);
        packedCells.Add(Data[x, y].RoadState);
        packedCells.Add(Data[x, y].Population);
    }
}
ANTS showed that the y-loop-line was taking a lot of time. I thought it was because it has to constantly call the Height getter. So I created a local int height = Height; before the loops, and made the inner loop check for y < height. That actually made the performance worse! ANTS now told me the x-loop-line was a problem. Huh? That's supposed to be insignificant, it's the outer loop!
Eventually I had a revelation - maybe using a property for the outer-loop-bound and a local for the inner-loop-bound made the CLR jump often between a "locals" cache and a "this-pointer" cache (I'm used to thinking in terms of CPU cache). So I made a local for Width as well, and that fixed it.
From there, it was clear that I should make a local for Data as well - even though Data was not even a property (it was a field). And indeed that bought me some more performance.
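For reference, the loop after those changes looks roughly like this (same code as above, just with the two bounds and the field hoisted into locals):
int width = Width;    // hoisted property
int height = Height;  // hoisted property
var data = Data;      // hoisted field

for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)
    {
        packedCells.Add(data[x, y].HasCar);
        packedCells.Add(data[x, y].RoadState);
        packedCells.Add(data[x, y].Population);
    }
}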
Bafflingly, though, reordering the x and y loops (to improve cache usage) made zero difference, even though the array is huge (3000x3000).
Now, I want to learn why the stuff I did improved the performance. What book do you suggest I read?
CLR via C# by Jeffrey Richter.
It is such a great book that someone stole it from my library, together with C# in Depth.
The CLR is not involved at all here; this should all be translated to straight machine code without calls into the CLR. The JIT compiler is responsible for generating that machine code, and it has an optimizer that tries to come up with the most efficient code. It has limitations: it cannot spend a large amount of time on it.
One of the important things it does is figuring out which local variables should be stored in the CPU registers. That's something that changed when you put the Height property in a local variable. It possibly decided to store that variable in a register. But now there's one less register available to store another variable - like the x or y variable, one that's critical for speed. Yes, that will slow it down.
You got a bad diagnostic about the outer loop. That could possibly be caused by the JIT optimizer re-arranging the loop code, giving the profiler a harder time mapping the machine code back to the corresponding C# statement.
Similarly, the optimizer might have decided that you were using the array inefficiently and switched the indexing order back. Not so sure it actually does that, but not impossible.
Anyhoo, the only way you can get some insight here is by looking at the generated machine code. There are many decent books about x86 assembly code, although they might be a bit hard to find these days. Your starting point is Debug + Windows + Disassembly.
Keep in mind however that even the machine code is not a very good predictor of how efficiently code is going to run. Modern CPU cores are enormously complicated and the machine code is no longer representative of what actually happens inside the core. The only tried and true way is what you've already been doing: trial and error.
Albin - no. Honestly I didn't think that running outside a profiler would change the performance difference, so I didn't bother. You think I should have? Has that been a problem for you before? (I am compiling with optimizations on though)
Running under a debugger changes the performance: when it's being run under a debugger, the just-in-time compiler automatically disables optimizations (to make it easier to debug)!
If you must, use the debugger to attach to an already-running already-JITted process.
One thing you should know about working with Arrays is that the CLR will always make sure that array-indices are not out-of-bounds. It has an optimization for 1-dimensional arrays but not for 2+ dimensions.
Knowing this, you may want to benchmark MyCell[][] Data instead of MyCell[,] Data.
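In other words, benchmark something along these lines (allocation only shown for shape; 3000x3000 is the size mentioned above):
// Jagged array: each row is a separate 1-D array, so the JIT can
// eliminate the bounds check on the inner index.
MyCell[][] data = new MyCell[3000][];
for (int x = 0; x < data.Length; x++)
{
    data[x] = new MyCell[3000];
}
// Access becomes data[x][y] instead of data[x, y].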
Hm, I don't think that the loop unrolling is the real problem.
1. I'd try to avoid accessing the array Data three times per inner loop.
2. I'd also recommend re-thinking the three Add statements: you are apparently accessing a collection three times to add some trivial data. Make it only one access per iteration and add a data type containing the three entries:
for (int y = 0; ... ) {
    var tTemp = Data[x, y];
    packedCells.Add(new {
        tTemp.HasCar, tTemp.RoadState, tTemp.Population
    });
}
Another look reveals that you are basically vectorizing a matrix by copying it into an array (or some other sequential collection)... Is that necessary at all? Why don't you just define a specialized indexer which simulates that linear access? Even better, if you only need to enumerate the entries (in that example you do, no random access required), why don't you use an adequate LINQ expression?
Point 1) Educated guesses are not the way to do performance tuning. In this case I can guess about as well as most, but guessing is the wrong way to do it.
Point 2) Profilers need to be well understood before you know what they're actually telling you. Here's a discussion of the issues. For example, what many profilers do is tell you "where the program spends its time", i.e. where the program counter spends its time, so they are almost completely blind to time spent in function calls, which is what your inner loop seems to consist of.
I do a lot of performance tuning, and here is what I do. I cycle between two activities:
Overall time measurement. This doesn't require special tools. I'm not trying to measure individual routines.
"Bottleneck" location. This does not require running the code at any kind of speed, because I'm not measuring. What I'm doing is locating lines of code that are responsible for a significant percent of time. I know which lines they are because they are on the stack for that percent, and stack samples easily find them.
Once I find a "bottleneck" and fix it, I go back to the first step, measure what percent of time I saved, and do it all again on the next "bottleneck", typically from 2 to 6 times. I am helped by the "magnification effect", in which a fixed problem magnifies the percentage used by remaining problems. It works for both macro and micro optimization.
(Sorry if I can't write "bottleneck" without quotes, because I don't think I've ever found a performance problem that resembled the neck of a bottle. Rather they were all simply doing things that didn't really need to be done.)
Since the comment might be overlooked, I repeat myself: it is quite cumbersome to optimize code which is per se superfluous. You do not really need to explicitly linearize your matrix at all; see the comment above: define a linearizing adapter which implements IEnumerable<MyCell> and feed it into the formatter.
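A rough sketch of such an adapter (the method name is invented; it enumerates the matrix lazily instead of copying it):
using System.Collections.Generic;

static IEnumerable<MyCell> Linearize(MyCell[,] data)
{
    for (int x = 0; x < data.GetLength(0); x++)
    {
        for (int y = 0; y < data.GetLength(1); y++)
        {
            yield return data[x, y];
        }
    }
}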
I am getting a warning when I try to add another answer, so I am going to recycle this one.. :) After reading Steve's comments and thinking about it for a while, I suggest the following:
If serializing a multi-dimensional array is too slow (I haven't tried it, I just believe you...) don't use it at all! It appears that your matrix is not sparse and has fixed dimensions, so define the structure holding your cells as a simple linear array with an indexer:
[Serializable()]
class CellMatrix {
    Cell[] mCells;

    public int Rows { get; private set; }
    public int Columns { get; private set; }

    public Cell this[int i, int j] {
        get {
            return mCells[i + Rows * j];
        }
        // setter...
    }

    // constructor taking rows/cols...
}
A thing like this should serialize as fast as a native array does... I don't recommend hard-coding the layout of Cell in order to save a few bytes there...
Cheers,
Paul
What is it about the way strings are implemented that makes them so expensive to manipulate?
Is it impossible to make a "cheap" string implementation?
or am I completely wrong in my understanding?
Thanks
Which language?
Strings are typically immutable, meaning that any change to the data results in a new copy of the string being created. This can have a performance impact with large strings.
This is an important feature, however, because it allows for optimizations such as interning. Interning reduces the size of text data by pointing identical strings to the same copy of data.
If you are concerned about performance with strings, use a StringBuilder (available in C# and Java) or another construct that works with mutable text data.
If you are working with a large amount of text data and need a powerful string solution while still saving space, look into using ropes.
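In C#, for example, the difference looks roughly like this (the loop count is arbitrary):
using System.Text;

// Naive concatenation: every += allocates a brand-new string.
string s = "";
for (int i = 0; i < 10000; i++)
{
    s += i.ToString();
}

// StringBuilder: appends into a growable buffer, one final allocation.
var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
{
    sb.Append(i);
}
string result = sb.ToString();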
The problem with strings is that they are not primitive types. They are arrays.
Therefore, they suffer the same speed and memory problems as arrays (with a few optimizations, perhaps).
Now, "cheap" implementations would require lots of stuff: concatenation, indexOf, etc.
There are many ways to do this. You can improve the implementation, but there are some limits. Because strings are not "natural" for computers, they need more memory and are slower to manipulate... ALWAYS. You'll never get a string concatenation algorithm faster than any decent integer sum algorithm.
Since Java creates a new copy of the String object every time it is modified, it's advisable to use StringBuffer.
Syntax
StringBuffer strBuff = new StringBuffer();
strBuff.append("StringBuffer");
strBuff.append("is");
strBuff.append("more");
strBuff.append("economical");
strBuff.append("than");
strBuff.append("String");
String string = strBuff.toString();
Many of the points here are well taken. In isolated cases you may be able to cheat and do things like using a 64-bit int to compare 8 bytes at a time in a string, but there are not a lot of generalized cases where you can optimize operations. If you have a "Pascal-style" string with a numeric length field, compares can be short-circuited: if the lengths differ there is no need to check the rest of the string. Other operations typically require you to handle the characters a byte at a time or completely copy them when you use them.
i.e. concatenation => get length of string 1, get length of string 2, allocate memory, copy string 1, copy string 2. It would be possible to do operations like this using a DMA controller in a string library, but the overhead of setting it up for small strings would outweigh the benefits.
Pete
It depends entirely on what you're trying to do with it. Mostly it's that it usually requires at least 1 new array allocation unless it's replacing a single character in a direct seek. At the simplest level a string is an array of chars. So just about anything you want to do involves iterating, removing, or inserting new things into an array.
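For instance, in C# (where strings are immutable) even a one-character change means copying the whole array - a sketch:
string original = "immutable";

// No in-place write is possible, so copy to a char[], edit, rebuild.
char[] buffer = original.ToCharArray(); // full copy #1
buffer[0] = 'I';
string changed = new string(buffer);    // full copy #2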
Look into mutable strings, immutable strings, and ropes, and think about how you would implement common operations in a low-level language (say, C). Consider:
Concatenation.
Slicing.
Getting a character at an index.
Changing a character at an index.
Locating the index of a character.
Traversing the string.
Coming up with algorithms for these situations will give you a feel for when each type of storage would be appropriate.
If you want a universal string that works in every condition, you have to sacrifice efficiency in some cases. This is a classic tradeoff: making one thing fast makes another slower. So... either you use a "standard" string that works properly (but not optimally), or a string implementation which is very fast in some cases and cumbersome in others.
Sometimes you need immutability, sometimes random access, sometimes quick insertions/deletions...
Changes and copying of strings tends to involve memory management.
Memory management is not good for performance since it tends to require some kind of global mutex that makes your code scale poorly to multiple cores.
You want to read this Joel Spolsky article:
http://www.joelonsoftware.com/articles/fog0000000319.html
Me, I'm disappointed .NET doesn't have a native type called F***edString.