C# byte vs int for short integer values [duplicate] - c#

This question already has answers here:
Should I use byte or int?
(6 answers)
Closed 5 years ago.
This question is about the physical memory use of a C# program. As we know, a byte variable consumes 1 byte of memory, while an int (32-bit) variable consumes 4 bytes. So when we need a variable that only holds small values (such as a counter i that iterates a loop 100 times), which one should we use in the for loop below: byte or int?
for(byte i=0; i<100; ++i)
Kindly give your opinion with reasons and share your knowledge. I shall be glad and thankful to you :-)
Note: I use byte instead of int in such cases. But I have seen that many experienced programmers use int even when the expected values are less than 255. Please let me know if I am wrong. :-)

In most cases, you won't get any benefit from using byte instead of int. The reason is:
If the loop variable is stored in a CPU register: modern CPUs have registers that are at least 32 bits wide, and you can't use only a quarter of a register, so the resulting code is pretty much the same either way.
If the loop variable is not stored in a CPU register, then it will most likely be stored on the stack. Compilers try to align memory locations at addresses that are multiples of 4 for performance reasons, so the compiler will reserve 4 bytes on the stack for your byte variable as well.
Depending on the details of your code, the compiler may even have to emit extra instructions to ensure that the value (on the stack or in a register) never exceeds 255, which makes the byte version slightly slower rather than faster.
It's a totally different story with 8-bit microcontrollers like those from Atmel and Microchip; there your approach would make sense.
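To illustrate (a minimal sketch of my own, not from the original answer): the two loops below end up generating essentially the same machine code, and the byte version carries an extra pitfall if the bound ever grows.

// Both counters occupy a full register or a 4-byte-aligned stack slot.
for (int i = 0; i < 100; ++i) { /* work */ }
for (byte i = 0; i < 100; ++i) { /* work */ }   // may add a truncation back to 8 bits

// Pitfall: byte arithmetic wraps at 255, so this loop would never terminate.
// for (byte i = 0; i <= 255; ++i) { /* work */ }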

Related

Advantages/Disadvantages of using int16 over int32 [duplicate]

This question already has answers here:
.NET Integer vs Int16?
(10 answers)
Closed 8 years ago.
So, is there really any advantage to using int16? I can see how you might use it if you know your number is never going to use more than 16 bits. However, the compiler (as I understand it) optimizes for int (aka "int32") anyway. Also, it seems more common practice to use int in the first place.
So, why would one use int16?
There are mainly two reasons that you might want to use Int16 rather than Int32:
You have a large array of them, and want to save some memory.
You are interfacing with something else that expects an Int16.
In normal code there is no good reason to prefer Int16. Using an Int32 for example as a loop counter is faster, both in x86 mode and x64 mode. (An Int32 is also faster than an Int64 even in x64 mode.)
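A hedged sketch of those two cases (the type and field names below are my own invention, purely for illustration):

using System.Runtime.InteropServices;

class Int16Uses
{
    // Case 1: a large array, where the element size actually shows up.
    // Ten million elements: roughly 20 MB as short[] versus roughly 40 MB as int[].
    static short[] samples = new short[10_000_000];

    // Case 2: interop with something that genuinely uses 16-bit fields
    // (hypothetical layout, not a real API).
    [StructLayout(LayoutKind.Sequential)]
    struct NativeHeader
    {
        public short Version;
        public short Flags;
    }
}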
You use an Int16 over an Int32 when the numbers you are going to represent and calculate with can be represented as an Int16. In other words, it all depends on your data.
For instance, I have seen many people represent a 10-digit phone number as an int, which is an Int32. We usually don't do any calculations with a phone number, so conceptually it would be better to use a string (or a char[10]) for this purpose instead of an int. However, had they done so, they would not have seen any significant change.
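As a small illustration of that last point (my own snippet): a string keeps leading zeros and formatting, which an integer representation cannot.

string phoneNumber = "0471234567";        // leading zero preserved
// int parsed = int.Parse(phoneNumber);   // would silently become 471234567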

Why does everyone use 2^n numbers for allocation? -> new StringBuilder(256)

15 years ago, while programming in Pascal, I learned why to use powers of two for memory allocation. But this still seems to be the state of the art.
C# Examples:
new StringBuilder(256);
new byte[1024];
int bufferSize = 1 << 12;
I still see this thousands of times, I use it myself, and I'm still wondering:
Do we need this in modern programming languages and modern hardware?
I guess it's good practice, but what's the reason?
EDIT
For example, for a byte[] array, as stated by the answers here, a power of 2 makes no sense: the array object itself will use 16 bytes (?), so would it make sense to use 240 (= 256 - 16) as the size, so that the total fits in 256 bytes?
Do we need this in modern programming languages and modern hardware? I guess it's good practice, but what's the reason?
It depends. There are two things to consider here:
For sizes less than the memory page size, there's no appreciable difference between a power-of-two and an arbitrary number to allocate space;
You mostly use managed data structures with C#, so you won't even know how many bytes are really allocated underneath.
Assuming you're doing low-level allocation with malloc(), using multiples of the page size would be considered a good idea, e.g. 4096 or 8192, because this allows for more efficient memory management.
My advice would be to just allocate what you need and let C# handle the memory management and allocation for you.
Sadly, this falls apart if you want to keep a block of memory within a single 4k memory page... and people don't even know it :-) (I didn't until 10 minutes ago... I only had a hunch). An example... it's unsafe code and implementation dependent (using .NET 4.5 at 32/64 bits):
byte[] arr = new byte[4096];
unsafe
{
    fixed (byte* p = arr)
    {
        // The array's Length field is stored just before the first element.
        int size = ((int*)p)[IntPtr.Size == 4 ? -1 : -2];
    }
}
So the CLR has allocated at least 4096 + (1 or 2) * sizeof(int) bytes... so it has spilled over a single 4k memory page. This is logical... it has to keep the size of the array somewhere, and keeping it together with the array data is the most sensible choice (for those who know what Pascal strings and BSTRs are: yes, it's the same principle).
I'll add that every object in .NET has a sync block number and a RuntimeType reference... each is at least an int, if not an IntPtr, so that's a total of between 8 and 16 bytes per object (this is explained in various places... try searching for ".net object header" if you are interested).
It still makes sense in certain cases, but I would prefer to analyze case-by-case whether I need that kind of specification or not, rather than blindly use it as good practice.
For example, there might be cases where you want to use exactly 8 bits of information (1 byte) to address a table.
In that case, I would let the table have the size of 2^8.
Object[] table = new Object[256];
This way, you will be able to address any element of the table using only one byte.
Even if the table is actually smaller and doesn't use all 256 places, you still have the guarantee of bidirectional mapping from table to index and from index to table, which could prevent errors that would appear, for example, if you had:
Object[] table = new Object[100];
And then someone (probably someone else) accesses it with a byte value that is outside the table's range.
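For example (a snippet of my own, illustrating the hazard):

object[] table = new object[100];
byte index = 200;                 // a perfectly valid byte, but outside the table
object item = table[index];       // throws IndexOutOfRangeException at run time

With a 256-element table, no byte value can ever be out of range.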
Maybe this kind of bijective behavior is good, or maybe you have other ways to guarantee your constraints.
Given how much smarter current compilers have become, it is probably not the only good practice anymore.
IMHO, arithmetic that works on exact powers of two is like a fast track: low-level arithmetic on a power of two takes fewer operations and bit manipulations, while other numbers need extra work from the CPU.
I also found this possible duplicate: Is it better to allocate memory in the power of two?
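One concrete example of that fast track (a sketch under my own assumptions, not from either answer): wrapping an index into a power-of-two-sized buffer can be done with a single bit mask instead of a division.

class RingBuffer
{
    const int Capacity = 1 << 12;              // 4096, a power of two
    readonly int[] slots = new int[Capacity];

    // Equivalent to index % Capacity for non-negative indexes, but compiles to a
    // single AND; the trick only works because Capacity is a power of two.
    static int Wrap(int index) => index & (Capacity - 1);
}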
Yes, it's good practice, and it has at least one reason.
Modern processors have an L1 cache line size of 64 bytes, and if you use a buffer size of 2^n (for example 1024, 4096, ...), you fill whole cache lines without wasted space.
In some cases, this also helps prevent the false sharing problem (http://en.wikipedia.org/wiki/False_sharing).
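For the false-sharing point specifically, here is a hedged sketch of my own (assuming 64-byte cache lines): padding each hot counter out to a full cache line keeps two threads from fighting over the same line.

using System.Runtime.InteropServices;
using System.Threading.Tasks;

class FalseSharingDemo
{
    // Each counter is padded to 64 bytes so the two hot fields sit on different cache lines.
    [StructLayout(LayoutKind.Explicit, Size = 64)]
    struct PaddedCounter
    {
        [FieldOffset(0)] public long Value;
    }

    static PaddedCounter[] counters = new PaddedCounter[2];

    static void Main()
    {
        Parallel.For(0, 2, c =>
        {
            for (int i = 0; i < 100_000_000; i++)
                counters[c].Value++;   // without padding, both threads would hammer the same cache line
        });
    }
}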

How variables address RAM?

I'm pretty new to this, so if the question doesn't make sense, I apologize ahead of time.
An int in C# is 4 bytes, if I am correct. If I have the statement:
int x;
I would assume this takes up 4 bytes of memory. If each memory address refers to 1 byte, then this would take up four address slots? If so, how does x map to those four address locations?
If I have the statement int x; I would assume this is taking up 4 bytes of memory. How does x map to the address of the four bytes?
First off, Mike is correct. C# has been designed specifically so that you do not need to worry about this stuff. Let the memory manager take care of it for you; it does a good job.
Assuming you do want to see how the sausage is made for your own edification: your assumption is not warranted. This statement does not need to cause any memory to be consumed. If it does cause memory to be consumed, the int consumes four bytes of memory.
There are two ways in which the local variable (*) can consume no memory. The first is that it is never used:
void M()
{
int x;
}
The compiler can be smart enough to know that x is never written to or read from, and it can be legally elided entirely. Obviously it then takes up no memory.
The second way that it can take up no memory is if the jitter chooses to enregister the local. It may assign a machine register specifically to that local variable. The variable then has no address associated with it because obviously registers do not have an address. (**)
Assuming that the local does take up memory, the jitter is responsible for keeping track of the location of that memory.
If the local is a perfectly normal local then the jitter will bump the stack pointer by four bytes, thereby reserving four bytes on the stack. It will then associate those four bytes with the local.
If the local is a closed-over outer local of an anonymous function, a local of an iterator block, or a local of an async method then the C# compiler will generate the local as a field of a class; the jitter asks the garbage collector to allocate the class instance and the jitter associates the local with a particular offset from the beginning of the memory buffer associated with that instance by the garbage collector.
All of this is implementation detail subject to change at any time; do not rely upon it.
(*) We know it is a local variable because you said it was a statement. A field declaration is not a statement.
(**) If unsafe code takes the address of a local, obviously it cannot be enregistered.
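A hedged sketch of the closed-over case mentioned above (the class shown in the comment is illustrative only; the real compiler-generated name is unspeakable):

using System;

class ClosureDemo
{
    static Func<int> M()
    {
        int x = 42;        // declared as a local...
        return () => x;    // ...but captured by the lambda, so the compiler hoists it
                           // into a field of a generated class that lives on the heap.
    }

    // Roughly what the compiler generates (illustrative names only):
    // sealed class DisplayClass { public int x; public int Lambda() { return this.x; } }
}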
There's a lot (and I mean a LOT) that can be said about this. Various topics you're hitting on are things like the stack, the symbol table, memory management, the memory hierarchy, ... I could go on.
BUT, since you're new, I'll try to give an easier answer:
When you create a variable in a program (such as an int), you are telling the compiler to reserve a space in memory for that data. An int is 4 bytes, so 4 consecutive bytes are reserved. The memory location you were referring to only points to the beginning. It is known afterwards that the length is 4 bytes.
Now that memory location (in the case you provided) is not really saved in the same way that a variable would be. Every time there is a command that needs x, the command is instead replaced with a command that explicitly grabs that memory location. In other words, the address is saved in the "code" section of your program, not the "data" section.
This is just a really, REALLY high overview. Hopefully it helps.
You really should not need to worry about these things, since there is no way in C# that you could write code that would make use of this information.
But if you must know, at the machine-code level when we instruct the CPU to access the contents of x, it will be referred to using the address of the first one of those four bytes. The machine instruction that will do this will also contain information about how many bytes to be accessed, in this case four.
If the int x; is declared within a function, then the variable will be allocated on the stack, rather than the heap or global memory. The address of x in the compiler's symbol table will refer to the first byte of the four-byte integer. However, since it is on the stack, the remembered address will be that of the offset on the stack, rather than a physical address. The variable will then be referenced via an instruction using that offset from the current stack pointer.
Assuming a 32-bit run-time, the offset on the stack will be aligned so the address is a multiple of 4 bytes, i.e. the offset will end in either 0, 4, 8 or 0x0c.
Furthermore because the 80x86 family is little-endian, the first byte of the integer will be the least significant, and the fourth byte will be the most significant, e.g. the decimal value 1,000,000 would be stored as the four bytes 0x40 0x42 0x0f 0x00.
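You can observe that byte order from C# itself; here is a small sketch of my own, assuming you run it on a little-endian machine (x86/x64):

using System;

class Endianness
{
    static void Main()
    {
        byte[] bytes = BitConverter.GetBytes(1000000);
        Console.WriteLine(BitConverter.ToString(bytes));   // 40-42-0F-00 on little-endian CPUs
        Console.WriteLine(BitConverter.IsLittleEndian);    // True on x86/x64
    }
}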

Why should I use int instead of a byte or short in C#

I have found a few threads regarding this issue. Most people appear to favor using int in their C# code across the board, even if a byte or smallint would handle the data, unless it is a mobile app. I don't understand why. Doesn't it make more sense to define your C# data type as the same data type that would be in your data storage solution?
My Premise:
If I am using a typed dataset, Linq2SQL classes, or POCOs, one way or another I will run into compiler data type conversion issues if I don't keep my data types in sync across my tiers. I don't really like doing System.Convert all the time just because it was easier to use int across the board in C# code. I have always used the smallest data type needed to handle the data, in the database as well as in code, to keep my interface to the database clean. So I would bet 75% of my C# code is using byte or short as opposed to int, because that is what is in the database.
Possibilities:
Does this mean that most people who just use int for everything in code also use the int data type for their SQL storage data types and couldn't care less about the overall size of their database, or do they do System.Convert in code wherever applicable?
Why I care: I have worked on my own forever and I just want to be familiar with best practices and standard coding conventions.
Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.
Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.
To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.
Space-wise, of course, you save a few bytes by using smaller data types. So if you're building a table with a few million rows, then shorter data types may be worth considering. (And the same might be a good reason to use smaller data types in your database.)
And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?
Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.
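One everyday cost of the smaller types, as a small sketch of my own: arithmetic on byte and short is performed as int anyway, so the result has to be cast back.

class Promotion
{
    static void Main()
    {
        byte a = 100, b = 100;
        // byte c = a + b;          // compile error: byte + byte yields an int
        byte c = (byte)(a + b);     // explicit cast required; 200 still fits in a byte
        int d = a + b;              // with int there is nothing to convert
        System.Console.WriteLine($"{c} {d}");
    }
}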
I am only 6 years late but maybe I can help someone else.
Here are some guidelines I would use:
If there is a possibility the data will not fit in the future then use the larger int type.
If the variable is used as a struct/class field then by default it will be padded to take up the whole 32 bits anyway, so using byte/Int16 will not save memory.
If the variable is short lived (like inside a function) then the smaller data types will not help much.
"byte" or "char" can sometimes describe the data better and can do compile time checking to make sure larger values are not assigned to it on accident. e.g. If storing the day of the month(1-31) using a byte and try to assign 1000 to it then it will cause an error.
If the variable is used in an array of roughly 100 or more I would use the smaller data type as long as it makes sense.
byte and int16 arrays are not as thread safe as an int (a primitive).
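A tiny sketch of that day-of-month point (my own snippet):

byte dayOfMonth = 20;        // fine
// dayOfMonth = 1000;        // compile-time error: constant 1000 cannot be converted to byte
int looseDay = 1000;         // compiles silently, even though no month has 1000 days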
One topic that no one brought up is the limited CPU cache. Smaller programs execute faster than larger ones because the CPU can fit more of the program in the faster L1/L2/L3 caches.
Using the int type can result in fewer CPU instructions; however, it also forces a higher percentage of the data to fall out of the CPU cache. Instructions are cheap to execute: modern CPU cores can execute 3-7 instructions per clock cycle, whereas a single cache miss can cost 1000-2000 clock cycles because it has to go all the way to RAM.
When memory is conserved it also results in the rest of the application performing better because it is not squeezed out of the cache.
I did a quick sum test with accessing random data in random order using both a byte array and an int array.
var r = new Random();             // random source, not shown in the original snippet
const int SIZE = 10000000, LOOPS = 80000;
// Enumerable.* requires "using System.Linq;"
byte[] array = Enumerable.Repeat(0, SIZE).Select(i => (byte)r.Next(10)).ToArray();
int[] visitOrder = Enumerable.Repeat(0, LOOPS).Select(i => r.Next(SIZE)).ToArray();
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int sum = 0;
foreach (int v in visitOrder)
    sum += array[v];
sw.Stop();
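The int[] half of the comparison is not shown above; presumably it looked roughly like this (my reconstruction, reusing r, SIZE, visitOrder, sum and sw from the snippet above):

int[] intArray = Enumerable.Repeat(0, SIZE).Select(i => r.Next(10)).ToArray();
sw.Restart();
sum = 0;
foreach (int v in visitOrder)
    sum += intArray[v];
sw.Stop();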
Here are the results in time (ticks) on my machine (x86, release mode, without debugger, .NET 4.5, i7-3930K; smaller is better):
              ____________ Array Size ____________
         10   100    1K   10K  100K    1M    10M
byte:   549   559   552   552   568   632   3041
int :   549   566   552   562   590  1803   4206
Accessing 1M items randomly, byte was almost three times as fast as int on my CPU!
Anything under 10,000 was hardly noticeable.
int was never faster than byte for this basic sum test.
These values will vary with different CPUs with different cache sizes.
One final note: sometimes I look at the now open-source .NET Framework to see what Microsoft's experts do. The .NET Framework uses byte/Int16 surprisingly little; I could hardly find any uses, actually.
You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage capacity. Let's say you have three columns, and instead of using a byte-equivalent database type, you use an int-equivalent.
That gives us 3 (columns) x 3 (bytes extra) per row, or 9 bytes per row.
This means, for "a few million rows" (let's say three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately, as we're no longer living in the 1970s, you shouldn't have to worry about this :)
As said above, stop micro-optimising: the performance hit of converting to/from different integer-like numeric types is going to hit you much, much harder than the bandwidth/disk-space costs, unless you are dealing with very, very, very large datasets.
For the most part, 'No'.
Unless you know upfront that you are going to be dealing with hundreds of millions of rows, it's a micro-optimisation.
Do what fits the domain model best. Later, if you have performance problems, benchmark and profile to pinpoint where they are occurring.
Not that I didn't believe Jon Grant and others, but I had to see for myself with our "million row table". The table has 1,018,000 rows. I converted 11 tinyint columns and 6 smallint columns to int; there were already 5 int and 3 smalldatetime columns. Four different indexes used a combination of the various data types, but obviously the new indexes now all use int columns.
Making the changes cost me only 40 MB in base-table disk usage with no indexes. When I added the indexes back in, the overall difference was only 30 MB. So I was surprised, because I thought the index size would grow more.
So is 30 MB worth the hassle of using all the different data types? No way! I am off to INT land. Thanks, everyone, for setting this anal-retentive programmer back on the straight and blissfully happy path of no more integer conversions... yippee!
The .NET runtime is optimised for Int32. See previous discussion at .NET Integer vs Int16?
If int is used everywhere, no casting or conversions are required. That is a bigger bang for the buck than the memory you will save by using multiple integer sizes.
It just makes life simpler.

Why is Array.Length an int, and not a uint [duplicate]

This question already has answers here:
Why does .NET use int instead of uint in certain classes?
(7 answers)
Closed 9 years ago.
Why is Array.Length an int and not a uint? This bothers me (just a bit) because a length value can never be negative.
This also forced me to use an int for a length property on my own class, because a uint value
needs to be cast explicitly...
So the ultimate question is: is there any use for an unsigned int (uint)? Even Microsoft seems not to use them.
Unsigned int isn't CLS compliant and would therefore restrict usage of the property to those languages that do implement a UInt.
See here:
Framework 1.1: Introduction to the .NET Framework Class Library
Framework 2.0: .NET Framework Class Library Overview
Many reasons:
uint is not CLS-compliant, so making a built-in type (array) depend on it would have been problematic.
The runtime, as originally designed, prohibits any object on the heap from occupying more than 2GB of memory. Since the largest array that fits within this limit would be new byte[int.MaxValue], it would be puzzling to be able to specify positive but illegal array lengths.
Note that this limitation has been somewhat relaxed in the 4.5 release, though the standard Length remains an int.
Historically, C# inherits much of its syntax and conventions from C and C++. In those languages, arrays are simply pointer arithmetic, so negative array indexing was possible (though normally illegal and dangerous). Since much existing code assumes that the array index is signed, this would have been a factor.
On a related note, the use of signed integers for array indexes in C/C++ means that interop with those languages and with unmanaged functions would require the use of ints in those circumstances anyway, which could confuse due to the inconsistency.
The BinarySearch implementation (a very useful component of many algorithms) relies on being able to use the negative range of the int to indicate that the value was not found, and the location at which such a value should be inserted to maintain sorting (see the sketch after this list).
When operating on an array, it is likely that you would want to take a negative offset from an existing index. If an offset took you past the start of the array with a uint, the wrap-around behaviour would make your index possibly legal (in that it is positive). With an int the result would be illegal (but safe, since the runtime guards against reading invalid memory).
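A small sketch of the BinarySearch point from the list above (my own example): the negative return value encodes the insertion point as a bitwise complement, which only works because lengths and indexes are signed.

int[] sorted = { 1, 3, 5, 7 };
int found    = Array.BinarySearch(sorted, 5);   // 2
int missing  = Array.BinarySearch(sorted, 4);   // negative; ~missing == 2, the insertion index
int insertAt = ~missing;                        // 2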
I think it also might have to do with simplifying things at a lower level: Array.Length will of course be added to a negative number at some point, and if Array.Length were unsigned and were added to a negative int (two's complement), the results could get messy.
Looks like nobody provided an answer to "the ultimate question".
I believe primary use of unsigned ints is to provide easier interfacing with external systems (P/Invoke and the like) and to cover needs of various languages being ported to .NET.
Typically, integer values are signed, unless you explicitly need an unsigned value. It's just the way they are used. I may not agree with that choice, but that's just the way it is.
For the time being, with today's typical memory constraints, if your array or similar data structure needs a UInt32 length, you should consider other data structures.
With an array of bytes, an Int32 length will give you 2GB of values.
