Thursday, May 31, 2012

byte + byte = int… why?


Looking at this C# code...




byte x = 1;
byte y = 2;
byte z = x + y; // ERROR: Cannot implicitly convert type 'int' to 'byte'



The result of any arithmetic on byte (or short) operands is implicitly an int. The solution is to explicitly cast the result back to a byte, so...




byte z = (byte)(x + y); // works
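
As a small sketch of two related points (the class and method names here are made up for illustration): the sum of two bytes is typed as int, and compound assignment applies the narrowing cast for you, so `z += y` compiles where `z = z + y` does not.

```csharp
using System;

public static class ByteAddDemo
{
    // Returns the runtime type of byte + byte.
    public static Type SumType()
    {
        byte x = 1;
        byte y = 2;
        var sum = x + y;   // 'var' infers int here, not byte
        return sum.GetType();
    }

    // Compound assignment needs no explicit cast:
    // "z += y" is evaluated as "z = (byte)(z + y)".
    public static byte AddInPlace(byte z, byte y)
    {
        z += y;
        return z;
    }

    public static void Main()
    {
        Console.WriteLine(SumType());        // System.Int32
        Console.WriteLine(AddInPlace(1, 2)); // 3
    }
}
```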



What I am wondering is why? Is it architectural? Philosophical?



We have:



  • int + int = int

  • long + long = long

  • float + float = float

  • double + double = double



So why not:



  • byte + byte = byte

  • short + short = short ?



A bit of background:



I am performing a long list of calculations on "small numbers" (i.e. < 8) and storing the intermediate results in a large array. Using a byte array (instead of an int array) is faster (because of cache hits). But the extensive byte-casts spread through the code make it that much more unreadable.
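
One possible way to keep the casts contained, sketched under the assumption that intermediate values can live in int locals and only be narrowed when they are stored. `SmallNumberStore` and `StoreSum` are hypothetical names, not part of any library.

```csharp
using System;

public static class SmallNumberStore
{
    // Hypothetical helper: adds two small values and stores the result.
    // All arithmetic happens in int; the only cast is at the store.
    public static void StoreSum(byte[] results, int index, byte a, byte b)
    {
        int sum = a + b;            // byte operands are promoted to int
        results[index] = (byte)sum; // single narrowing cast
    }

    public static void Main()
    {
        var results = new byte[4];
        StoreSum(results, 0, 3, 4);
        Console.WriteLine(results[0]); // 7
    }
}
```

The array stays a byte array, so the memory footprint is unchanged, and the cast appears in one place instead of at every expression.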


Source: Tips4all

15 comments:

  1. The third line of your code snippet:

    byte z = x + y;


    actually means

    byte z = (int) x + (int) y;


    So there is no + operation on bytes: the bytes are first cast to integers, and the result of adding two integers is a (32-bit) integer.

  2. I believe it's basically for the sake of performance. (In terms of "why it happens at all" it's because there aren't any operators defined by C# for arithmetic with byte, sbyte, short or ushort, just as others have said. This answer is about why those operators aren't defined.)

    Processors have native operations to do arithmetic with 32 bits very quickly. Doing the conversion back from the result to a byte automatically could be done, but would result in performance penalties in the case where you don't actually want that behaviour.

    I think this is mentioned in one of the annotated C# standards. Looking...

    EDIT: Annoyingly, I've now looked through the annotated ECMA C# 2 spec, the annotated MS C# 3 spec and the annotated CLI spec, and none of them mention this as far as I can see. I'm sure I've seen the reason given above, but I'm blowed if I know where. Apologies, reference fans :(

  3. I thought I had seen this somewhere before. From this article, The Old New Thing:


    Suppose we lived in a fantasy world
    where operations on 'byte' resulted in
    'byte'.


    byte b = 32;
    byte c = 240;
    int i = b + c; // what is i?



    In this fantasy world, the value of i
    would be 16! Why? Because the two
    operands to the + operator are both
    bytes, so the sum "b+c" is computed as
    a byte, which results in 16 due to
    integer overflow. (And, as I noted
    earlier, integer overflow is the new
    security attack vector.)


    EDIT: Raymond is defending, essentially, the approach C and C++ took originally. In the comments, he defends the fact that C# takes the same approach, on the grounds of language backward compatibility.
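
    The quoted example can be reproduced in real C# by forcing the truncation with a cast. This is only a sketch; `FantasyWorldDemo` and its methods are illustrative names.

    ```csharp
    using System;

    public static class FantasyWorldDemo
    {
        // What C# actually does: both operands are promoted to int.
        public static int RealSum(byte b, byte c)
        {
            return b + c;
        }

        // What byte-typed addition would give: the sum truncated to 8 bits.
        public static int FantasySum(byte b, byte c)
        {
            return unchecked((byte)(b + c));
        }

        public static void Main()
        {
            Console.WriteLine(RealSum(32, 240));    // 272
            Console.WriteLine(FantasySum(32, 240)); // 16
        }
    }
    ```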

  4. C#

    ECMA-334 states that addition is only defined as legal on int+int, uint+uint, long+long and ulong+ulong (ECMA-334 14.7.4). As such, these are the candidate operations to be considered with respect to 14.4.2. Because there are implicit casts from byte to int, uint, long and ulong, all the addition function members are applicable function members under 14.4.2.1. We have to find the best implicit cast by the rules in 14.4.2.3:

    Casting(C1) to int(T1) is better than casting(C2) to uint(T2) or ulong(T2) because:


    If T1 is int and T2 is uint, or ulong, C1 is the better conversion.


    Casting(C1) to int(T1) is better than casting(C2) to long(T2) because there is an implicit cast from int to long:


    If an implicit conversion from T1 to T2 exists, and no implicit conversion from T2 to T1 exists, C1 is the better conversion.


    Hence the int+int function is used, which returns an int.

    Which is all a very long way to say that it's buried very deep in the C# specification.

    CLI

    The CLI operates only on 6 types (int32, native int, int64, F, O, and &). (ECMA-335 partition 3 section 1.5)

    Byte (int8) is not one of those types, and is automatically coerced to an int32 before the addition. (ECMA-335 partition 3 section 1.6)

  5. The answers indicating some inefficiency adding bytes and truncating the result back to a byte are incorrect. x86 processors have instructions specifically designed for integer operation on 8-bit quantities.

    In fact, for x86/64 processors, performing 32-bit or 16-bit operations is less efficient than performing 64-bit or 8-bit operations, due to the operand prefix byte that has to be decoded. On 32-bit machines, performing 16-bit operations entails the same penalty, but there are still dedicated opcodes for 8-bit operations.

    Many RISC architectures have similarly efficient native word/byte instructions. Those that don't generally have a store-and-convert-to-signed-value-of-some-bit-length instruction.

    In other words, this decision must have been based on perception of what the byte type is for, not due to underlying inefficiencies of hardware.

  6. I remember once reading something from Jon Skeet (can't find it now, I'll keep looking) about how byte doesn't actually overload the + operator. In fact, when adding two bytes like in your sample, each byte is actually being implicitly converted to an int. The result of that is obviously an int. Now as to WHY this was designed this way, I'll wait for Jon Skeet himself to post :)

    EDIT: Found it! Great info about this very topic here.

  7. From the C# language spec, 1.6.7.5 7.2.6.2 Binary numeric promotions: both operands are converted to int if they don't fit into any of the other categories. My guess is that they didn't overload the + operator to take byte as a parameter but wanted it to act somewhat normally, so they just use the int data type.

    C# language Spec

  8. This is because of overflow and carries.

    If you add two 8-bit numbers, they might overflow into the 9th bit.

    Example:

      1111 1111
    + 0000 0001
    -----------
    1 0000 0000


    I don't know for sure, but I assume that ints, longs, and doubles are given more space because they are pretty large as it is. Also, their sizes are multiples of 4 bytes, which are more efficient for computers to handle because the internal data bus is 4 bytes, or 32 bits, wide (64 bits is becoming more prevalent now). Byte and short are a little less efficient, but they can save space.

  9. This was probably a practical decision on the part of the language designers. After all, an int is an Int32, a 32-bit signed integer. Whenever you do an integer operation on a type smaller than int, it's going to be converted to a 32 bit signed int by most any 32 bit CPU anyway. That, combined with the likelihood of overflowing small integers, probably sealed the deal. It saves you from the chore of continuously checking for over/under-flow, and when the final result of an expression on bytes would be in range, despite the fact that at some intermediate stage it would be out of range, you get a correct result.

    Another thought: The over/under-flow on these types would have to be simulated, since it wouldn't occur naturally on the most likely target CPUs. Why bother?
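
    A sketch of the "intermediate stage out of range" case, using an average as the example (the names are illustrative). The sum 200 + 100 does not fit in a byte, but because it is computed as an int, the final result is still right.

    ```csharp
    using System;

    public static class IntermediateRangeDemo
    {
        // The intermediate sum is an int, so it can exceed 255 safely.
        public static byte Average(byte a, byte b)
        {
            return (byte)((a + b) / 2);
        }

        // Simulates byte-typed addition: the sum is truncated to 8 bits
        // before the division.
        public static byte AverageWithByteSum(byte a, byte b)
        {
            byte sum = unchecked((byte)(a + b));
            return (byte)(sum / 2);
        }

        public static void Main()
        {
            Console.WriteLine(Average(200, 100));            // 150
            Console.WriteLine(AverageWithByteSum(200, 100)); // 22
        }
    }
    ```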

  10. This is for the most part my answer that pertains to this topic, submitted first to a similar question here.

    All operations with integral numbers smaller than Int32 are widened to 32 bits before calculation by default. The reason why the result is Int32 is simply to leave it as it is after calculation. If you check the MSIL arithmetic opcodes, the only integral numeric types they operate with are Int32 and Int64. It's "by design".

    If you desire the result back in Int16 format, it is irrelevant whether you perform the cast in code or the compiler (hypothetically) emits the conversion "under the hood".

    For example, to do Int16 arithmetic:

    short a = 2, b = 3;

    short c = (short) (a + b);


    The two numbers would expand to 32 bits, get added, then truncated back to 16 bits, which is how MS intended it to be.

    The advantage of using short (or byte) is primarily storage in cases where you have massive amounts of data (graphical data, streaming, etc.)
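
    To make the expand, add, truncate sequence concrete, here is a sketch with one sum that fits in 16 bits and one that does not (`ShortAddDemo` is an illustrative name):

    ```csharp
    using System;

    public static class ShortAddDemo
    {
        // Operands are widened to int, added, then truncated to 16 bits.
        public static short AddShorts(short a, short b)
        {
            return unchecked((short)(a + b));
        }

        public static void Main()
        {
            Console.WriteLine(AddShorts(2, 3));         // 5
            Console.WriteLine(AddShorts(30000, 30000)); // -5536
        }
    }
    ```

    The second call wraps because 60000 does not fit in a signed 16-bit value, so the truncated result is 60000 - 65536.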

  11. My suspicion is that C# is actually calling the operator+ defined on int (which returns an int; a checked block only changes whether overflow throws), and implicitly casting both of your bytes/shorts to ints. That's why the behavior appears inconsistent.
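
    A sketch of how a checked context interacts with this, with made-up names. The int addition itself does not overflow for byte operands; it is the narrowing cast back to byte that throws.

    ```csharp
    using System;

    public static class CheckedCastDemo
    {
        // Returns false if x + y does not fit in a byte.
        public static bool TryAdd(byte x, byte y, out byte sum)
        {
            try
            {
                // The cast is checked, so an out-of-range value throws.
                sum = checked((byte)(x + y));
                return true;
            }
            catch (OverflowException)
            {
                sum = 0;
                return false;
            }
        }

        public static void Main()
        {
            byte sum;
            Console.WriteLine(TryAdd(1, 2, out sum));     // True
            Console.WriteLine(TryAdd(200, 100, out sum)); // False
        }
    }
    ```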

  12. Addition is not defined for bytes, so they are cast to int for the addition. This is true for most math operations on bytes. (Note that this is how it used to be in older languages; I am assuming that it holds true today.)

  13. From .NET Framework code:

    // bytes
    private static object AddByte(byte Left, byte Right)
    {
        short num = (short) (Left + Right);
        if (num > 0xff)
        {
            return num;
        }
        return (byte) num;
    }

    // shorts (int16)
    private static object AddInt16(short Left, short Right)
    {
        int num = Left + Right;
        if ((num <= 0x7fff) && (num >= -32768))
        {
            return (short) num;
        }
        return num;
    }


    Simplify with .NET 3.5 and above:

    public static class Extensions
    {
        public static byte Add(this byte a, byte b)
        {
            return (byte)(a + b);
        }
    }


    now you can do:



    byte a = 1, b = 2, c;
    c = a.Add(b);

  14. I think it's a design decision about which operation was more common. If byte + byte = byte, maybe many more people would be bothered by having to cast to int when an int is required as the result.

  15. In addition to all the other great comments, I thought I would add one little tidbit. A lot of comments have wondered why int, long, and pretty much any other numeric type doesn't also follow this rule: return a "bigger" type in response to arithmetic.

    A lot of answers have had to do with performance (well, 32 bits is faster than 8 bits). In reality, an 8-bit number is still a 32-bit number to a 32-bit CPU. Even if you add two bytes, the chunk of data the CPU operates on is going to be 32 bits regardless, so adding ints is not going to be any "faster" than adding two bytes; it's all the same to the CPU. Now, adding two ints WILL be faster than adding two longs on a 32-bit processor, because adding two longs requires more micro-ops since you're working with numbers wider than the processor's word.

    I think the fundamental reason for causing byte arithmetic to result in ints is pretty clear and straightforward: 8 bits just doesn't go very far! :D With 8 bits, you have an unsigned range of 0-255. That's not a whole lot of room to work with; the likelihood that you are going to run into a byte's limitations is VERY high when using them in arithmetic. However, the chance that you're going to run out of bits when working with ints, or longs, or doubles, etc. is significantly lower: low enough that we very rarely encounter the need for more.

    Automatic conversion from byte to int is logical because the scale of a byte is so small. Automatic conversion from int to long, float to double, etc. is not logical because those numbers have significant scale.
