String.intern aka “I never knew that…”

Ok, so I’m trying to fill in any gaps in my Java knowledge. The first step is working through my fairly comprehensive SCJP 1.5 study book. It is mainly common stuff I’ve used plenty of, but the odd thing is of interest. Today I found the ‘intern’ method on Strings. I’ll explain.

The typical Java developer is aware of the following:

  • Strings are immutable.
  • String literals live in a String pool.
  • Strings should be checked for equality using the equals method, not ‘==’.

You can create a String object that is not the pooled instance using the following:

String foo = new String("bar");

What we typically want, though, is String reuse via the String pool. The ‘intern’ method is an interesting way to make sure we get the String instance from the String pool. Here is the Javadoc, lifted from the Java SE 7 documentation:

Returns a canonical representation for the string object.

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of the The Java™ Language Specification.

Here is a JUnit test class to demonstrate this. It uses the System.identityHashCode(Object) method to show which object is being referenced (distinct objects usually, though not necessarily, have different identity hash codes).

import org.junit.Assert;
import org.junit.Test;

public class StringTest {
	// A literal, so this field refers to the pooled "test" instance.
	final String test = "test";

	@Test
	public void testStrings() {

		final String stackTest = "test";                       // the same pooled instance as the field
		final String stackTest2 = new String("test");          // a new object, not the pooled one
		final String stackTest2Interned = stackTest2.intern(); // returns the pooled instance

		Assert.assertEquals("test", test);
		Assert.assertTrue("test" == test);
		Assert.assertTrue(stackTest == test);
		Assert.assertFalse(stackTest2 == test);
		Assert.assertTrue(stackTest2Interned == test);

		// Matching identity hash codes indicate the same object.
		System.out.println("id of constant = " + System.identityHashCode(test));
		System.out.println("id of stackTest = " + System.identityHashCode(stackTest));
		System.out.println("id of stackTest2 = " + System.identityHashCode(stackTest2));
		System.out.println("id of stackTest2Interned = " + System.identityHashCode(stackTest2Interned));
	}
}
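The last paragraph of the Javadoc is worth a closer look. Only literals and constant expressions are interned automatically, and a String built at runtime is a new object until you call intern() on it. Here is a small sketch of that. The class and variable names are only for illustration.

```java
public class InternDemo {

    // Returns four comparisons against the literal "test".
    static boolean[] compare() {
        final String constPart = "te"; // final + constant initializer: a constant variable
        String runtimePart = "te";     // not declared final, so not a constant variable

        String fromConstant = constPart + "st";  // constant expression: interned at compile time
        String fromRuntime = runtimePart + "st"; // evaluated at runtime: a new String object

        return new boolean[] {
            fromConstant == "test",         // true
            fromRuntime == "test",          // false
            fromRuntime.intern() == "test", // true
            fromRuntime.equals("test")      // true
        };
    }

    public static void main(String[] args) {
        for (boolean result : compare()) {
            System.out.println(result);
        }
    }
}
```

The comparison with equals is true in every case. Only the == checks depend on whether we are holding the pooled instance.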

Use the eight primitive Java types

Java has 8 primitive data types.

Here is a list of them with their size and example use:

boolean – a true/false value (one bit of information).
A bit can have 1 of 2 values – 0 or 1. In languages such as C, 0 means false and any other number means true, but that relationship does not exist in Java. A boolean has to be true or false and cannot be converted to or from a number. The JVM specification does not fix how much memory a boolean actually occupies. This rule does not hold for the Boolean wrapper object, which can also be null.
An example of its use would be where an object has an ‘is’ or ‘has’ attribute, e.g. person.isDeceased(), person.isEligible().
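The gap between boolean and Boolean shows up once null is possible. Here is a small sketch. The describe method and its return strings are hypothetical.

```java
public class BooleanDemo {

    // A primitive boolean is only ever true or false; a Boolean may also be null.
    static String describe(Boolean eligible) {
        if (eligible == null) {
            return "unknown"; // unboxing a null Boolean would throw NullPointerException
        }
        return eligible ? "eligible" : "not eligible";
    }

    public static void main(String[] args) {
        boolean deceased = false;
        // int flag = 1; if (flag) { }  // does not compile: an int is not a boolean in Java
        // boolean b = (boolean) 1;     // does not compile either: no cast between them
        System.out.println(deceased);       // false
        System.out.println(describe(null)); // unknown
        System.out.println(describe(true)); // eligible
    }
}
```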

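byte – an 8-bit number.
This is the smallest integer type. The minimum value is -128 and the maximum is 127. It is mostly seen as byte[] when working with raw binary data, e.g. reading files or network streams.

short – a 16-bit number.
Of the 16 bits used, 1 of the bits is used to determine whether the value is positive or negative. This means the minimum value represented is -32768 and the maximum is 32767.
It is rarely used, because an int is typically used for counting things and a long (or its wrapper, Long) is typically used for the id field of an entity.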

char – a 16-bit character representation.
When used directly it represents a character (strictly, a UTF-16 code unit). It can, however, be used as a number. It is an unsigned type ranging from 0 to 65535, and its wrapper, Character, has MIN_VALUE and MAX_VALUE constants just like Short, Integer and Long. It also means a char can be used with the normal mathematical operators such as +, - and %.
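Here is a short sketch of char used as a number. The helper names next and range are only for illustration.

```java
public class CharDemo {

    // Arithmetic on a char promotes it to int, so cast back to get a char.
    static char next(char c) {
        return (char) (c + 1);
    }

    // ++ works directly on a char variable, so chars can drive a loop.
    // (This loop would never end if 'to' were Character.MAX_VALUE.)
    static String range(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char c = from; c <= to; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println('a' + 1);                   // 98: 'a' is 97 and the result is an int
        System.out.println(next('a'));                 // b
        System.out.println(range('a', 'e'));           // abcde
        System.out.println((int) Character.MAX_VALUE); // 65535: char is unsigned
    }
}
```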

int – 32-bit integer numbers.
Always occupies 32 bits, regardless of processor, Java version etc.
A typical usage would be counting the elements in a list or a set of returned results. The minimum value is -2147483648 and the maximum is 2147483647. An int is often used even when the expected value would fit into a short. I think this is just force of habit.
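The min and max values matter because Java integer arithmetic does not fail when it goes past them. It silently wraps around. Here is a small sketch. The addOne helper is hypothetical, and Math.addExact needs Java 8 or later.

```java
public class OverflowDemo {

    // int arithmetic wraps around silently on overflow; no exception is thrown.
    static int addOne(int value) {
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println(addOne(Integer.MAX_VALUE)); // -2147483648

        // Narrowing an out-of-range int to short wraps around the same way.
        short tooBig = (short) (Short.MAX_VALUE + 1);
        System.out.println(tooBig); // -32768

        // Java 8+: Math.addExact throws ArithmeticException instead of wrapping.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```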

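float – 32-bit floating point.
This is used for calculating values with numbers after the decimal point, e.g. 10f / 3 (plain 10 / 3 is integer division and gives 3). The smallest positive value is 1.4E-45 and the largest is 3.4028235E38. It shouldn’t be used for currency because it deals with approximations. Running the following (these literals are actually doubles, which have the same problem):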

System.out.println(1.03 - 0.42);

will output

0.6100000000000001

long – 64-bit integer
This is typically used for very large numbers – the min is -9223372036854775808 and the max is 9223372036854775807. A long is typically used for the id field of an entity persisted in the database.

double – 64-bit floating point number.
The smallest positive value is 4.9E-324 and the largest is 1.7976931348623157E308. I never use doubles, never see them used and doubt I will. I don’t do a lot of mathematical computation, though. I typically write CRUD apps and have code around the workflow of CRUD, e.g. validation, converting to and from DTOs, persistence.

Hope this helps give an idea of the different types. I think the main concern is whether or not you need whole numbers. If you do, the next concern is how much space you need to represent the number. Generally, the smaller the better but… shorts are rarely used even when appropriate.
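You can check the sizes and ranges quoted above against the constants on the wrapper classes. Here is a quick sketch (Float.SIZE and Double.SIZE need Java 6 or later).

```java
public class PrimitiveRanges {

    public static void main(String[] args) {
        // Integral types: size in bits plus minimum and maximum values.
        System.out.println("byte:   " + Byte.SIZE + " bits, " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short:  " + Short.SIZE + " bits, " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("char:   " + Character.SIZE + " bits, "
                + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);
        System.out.println("int:    " + Integer.SIZE + " bits, " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:   " + Long.SIZE + " bits, " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        // Floating-point types: MIN_VALUE is the smallest positive value, not the most negative.
        System.out.println("float:  " + Float.SIZE + " bits, " + Float.MIN_VALUE + " to " + Float.MAX_VALUE);
        System.out.println("double: " + Double.SIZE + " bits, " + Double.MIN_VALUE + " to " + Double.MAX_VALUE);

        // boolean has no SIZE constant: how much memory it uses is up to the JVM.
    }
}
```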

Using private and public as appropriate for encapsulation

Ok, well, this part of the SCJP syllabus overlaps with what I’ve written on encapsulation, but I can reuse my class, so this should be a cheap post 🙂

In simple terms, if a field or method is public, other classes can happily depend on it being there. Private fields and methods are not available to other classes. As long as the public methods retain their functionality, other classes will be unaffected by changes to the private ones. This is what we want from encapsulation… and also how we achieve it.

Here is our Person class from my post on encapsulation:

public class Person {
  // in this implementation we store both fields as is.
  private final String firstName;
  private final String lastName;
  
  public Person(final String firstName, final String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }
 
  public String getFullName() {
     return firstName + " " + lastName;
  }
}

What do other classes see? Well, they will not see the private fields. To them this class is like a black box (they cannot see or affect its inner workings) that looks like this:

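// The public view only: method bodies are omitted, so this is an outline rather than compilable code.
public class Person {
  public Person(final String firstName, final String lastName);
  public String getFullName();
}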

If we change how it works without changing the public-facing functionality, we do not need to worry about the other classes.
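For example, here is a sketch of an alternative Person that stores the full name once instead of the two parts. The private fields change, but the public constructor and getFullName() do not, so the PersonTest below should pass unchanged against either version.

```java
public class Person {
  // Alternative implementation: build the full name once and store it.
  // Only the private internals differ from the original class.
  private final String fullName;

  public Person(final String firstName, final String lastName) {
    this.fullName = firstName + " " + lastName;
  }

  public String getFullName() {
    return fullName;
  }
}
```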

Unit test
Q: How do we ensure that our class continues to provide the same functionality?
A: Unit tests.

Here is what the test for this class might look like:

import org.junit.Assert;
import org.junit.Test;

public class PersonTest {
	@Test
	public void testHappyPath() {
		final Person p = new Person("Vincent", "Fleetwood");
		Assert.assertEquals("Vincent Fleetwood", p.getFullName());
	}
}

Java SE – Packages

An understanding of packages was part of the requirements of becoming an Oracle Certified Java Programmer. Here I’ll cover what packages mean.

A source folder
When compiling Java, the folders should match the packages. javac relies on this when searching a source path, and the class loader relies on it when finding compiled classes. For example, a class called com.vff.Runner would need to be in a folder called vff inside another folder called com. If the class is public it needs to be in a file called Runner.java. On Windows the path would be com\vff\Runner.java, and on Unix/Linux it would be com/vff/Runner.java.

A logical grouping of classes
An example would be if we were writing a program called mycrm for a company called bling. Imagine we had the following classes: CustomerService, CustomerValidator, ValidationRule, Customer, CustomerAddress, CustomerContact and CustomerDatabaseWriter. We could group these with the following packages and class names:
package: com.bling.mycrm.service classes: CustomerService
package: com.bling.mycrm.validation classes: CustomerValidator, ValidationRule
package: com.bling.mycrm.model classes: Customer, CustomerAddress, CustomerContact
package: com.bling.mycrm.database classes: CustomerDatabaseWriter.

A way of limiting access to other classes’ methods
Imagine that the ValidationRule class should only be visible to the other validation classes, i.e. CustomerValidator. We can enforce this by giving the class default (package-private) visibility. A top-level class can only be public or package-private, so if we leave off the public modifier, only other classes in the same package (com.bling.mycrm.validation) can see it. Alternatively, we could use the same technique to give individual methods of the rule default visibility.
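Here is a minimal sketch of the ValidationRule idea. It assumes both classes live in one file, CustomerValidator.java, under com/bling/mycrm/validation. The isValid and isSatisfiedBy methods are hypothetical, and a plain String stands in for the Customer class.

```java
package com.bling.mycrm.validation;

// Public: classes in any package (e.g. com.bling.mycrm.service) can use it.
public class CustomerValidator {

    // ValidationRule is package-private; CustomerValidator can use it
    // only because both classes are in com.bling.mycrm.validation.
    private final ValidationRule nameRule = new ValidationRule();

    // Hypothetical API: a customer's name stands in for a Customer object.
    public boolean isValid(String customerName) {
        return nameRule.isSatisfiedBy(customerName);
    }
}

// No access modifier: default (package-private) visibility.
// A class in com.bling.mycrm.service that referred to ValidationRule
// would fail to compile.
class ValidationRule {

    // Also package-private: callable only from within this package.
    boolean isSatisfiedBy(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```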

Namespaces
In some other languages packages are referred to as namespaces.

n.b.
When a class refers to other classes in the same package, or in the special java.lang package, the class referred to does not need its package specified (or to be imported).

There is a more extensive explanation of the concepts on Wikipedia here: http://en.wikipedia.org/wiki/Java_package

OO Concepts – Encapsulation

Encapsulation, one of the building blocks of OOP, can have either of 2 meanings (thanks, Wikipedia):

1. “A language mechanism for restricting access to some of the object’s components.”
2. “A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.”

Information Hiding
Generally, when we talk about encapsulation in OOP we are talking about Information hiding – closely related to abstraction. As an example, we may want to get a value from an object without worrying about its internal representation.

Here is an example:

public class Person {
  private final String firstName;
  private final String lastName;
 
  public Person(final String firstName, final String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }

  public String getFullName() {
     return firstName + " " + lastName;
  }
}

In this example we can get a full name for the person regardless of how the first and last names are stored internally inside the Person object. This is the OO version of encapsulation.

In a nutshell, we are protected from changes in the internal representation of the data because it is hidden from clients of the Person class. Hence the private modifier.

For an alternative take on encapsulation – storing data with the methods working on it – you can look here: http://www.javaworld.com/article/2075271/core-java/encapsulation-is-not-information-hiding.html

OO Concepts – Abstraction

Abstraction is one of the key ideas behind Object Oriented Programming, but the word has a few meanings even just within the field of OOP. The general concept is something I refer to as not having to worry about details. Here are some of the meanings of abstraction in OOP:

Abstraction and polymorphism
Some conversations are about levels of abstraction. An example may be that we want our class to log messages but without worrying about how or where to. A Logger class may provide us with a level of abstraction: we can call methods on our Logger without worrying about its implementation. Being able to work with different Logger implementations using only their interface is a form of polymorphism, and an example of how the key parts of OO interact.
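Here is a small sketch of that idea. MessageLogger is a hypothetical interface standing in for the Logger above (it is not java.util.logging.Logger or any logging library). processOrder only sees the interface, so any implementation can be passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class LoggingDemo {

    // Hypothetical abstraction: callers only know they can log a message.
    interface MessageLogger {
        void log(String message);
    }

    // One implementation writes to the console...
    static class ConsoleLogger implements MessageLogger {
        public void log(String message) {
            System.out.println(message);
        }
    }

    // ...another keeps messages in memory, which is handy in tests.
    static class InMemoryLogger implements MessageLogger {
        final List<String> messages = new ArrayList<String>();

        public void log(String message) {
            messages.add(message);
        }
    }

    // Works with any MessageLogger without knowing how or where messages go.
    static void processOrder(String orderId, MessageLogger logger) {
        logger.log("processing order " + orderId);
    }

    public static void main(String[] args) {
        processOrder("42", new ConsoleLogger()); // prints: processing order 42
        InMemoryLogger memory = new InMemoryLogger();
        processOrder("43", memory);
        System.out.println(memory.messages);     // [processing order 43]
    }
}
```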

Abstraction and inheritance
When we extend a base class and provide a specialisation, we typically get some functionality for free. We don’t worry about how this functionality works; we just get it. The most common example is java.lang.Object. If we don’t explicitly extend another class, we extend this one by default. It provides our class with the toString, hashCode and equals methods, among others (getClass, for example).
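Here is a small sketch of what a class gets from java.lang.Object without writing any code. Point is a hypothetical class with no extends clause and no methods of its own.

```java
public class InheritedFromObject {

    // No "extends" clause, so Point implicitly extends java.lang.Object.
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(a)); // true: the inherited equals compares identity (==)
        System.out.println(a.equals(b)); // false: same values, but different objects
        // Inherited toString: class name, '@', then the hash code in hex (varies per run).
        System.out.println(a.toString());
        System.out.println(a.getClass().getName()); // InheritedFromObject$Point
    }
}
```

Inheriting equals and hashCode is only a starting point. You override them when two objects with the same values should count as equal.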

Levels of Abstraction
In OO we typically layer our applications and deal in abstractions. This enables our code to work at a particular level of abstraction. Imagine code to validate a customer and, if valid, persist it. If it fails it will raise an alert. What ‘level’ of abstraction do we want to work at? I would imagine a method call for each of those tasks. Any more and maybe we don’t have enough abstraction. Any less and maybe there is too much – a problem that I personally don’t think gets enough attention. Anyway, here are some examples of implementing this class with differing levels of abstraction:

/* 
 * BAD - TOO MUCH ABSTRACTION - 
 * here is an example where we get that the service will 
 * try to persist the Customer but how and whether there is
 * validation or not is lost. We would need to dig around - starting with
 * the call to super.perform - this could be doing anything.
 */
class CustomerService extends BaseService { // too much abstraction.... (BaseService: a hypothetical parent class)
  public void persist(Customer c) {
     super.perform(c, getDataStore()); // who knows what this does?!?
  }
}

Here we have an appropriate level of abstraction. There is not much digging to do, and we even know how errors are handled.

/* 
 * GOOD - we haven't hidden what the purpose of the method is. We can see what it will do
 * just as clearly as if we had commented it.
 */
class CustomerService { // right level of abstraction
  public void persist(Customer c) {
     try {
       validator.validate(c);
       customerDao.persist(c);
     } catch (Exception e) {
       log.error("error in persist: " + c, e);
       alertService.alert("error persisting customer: " + c, e);
     }
  }
}

Finally, here is an example with not enough abstraction. It is full of details we don’t want to be working with:

/* 
 * BAD - not enough abstraction - 
 * we are working at too low a level, the opposite of the first example. we specify 
 * every little detail, micro-managing each step. 
 */
class CustomerService { // not enough abstraction
  public void persist(Customer c) {
     DataStore ds = ServiceLocator.getDataStore();
     PersistService ps = new PersistService(ds);
     try {
       Validator validator = new Validator();
       validator.setMode(STRICT);
       validator.setTarget(c);
       validator.validate();
       if (validator.hasErrors()) {
         throw new ValidationException(validator.getErrors());
       }
       ps.beginTransaction();
       ps.persist(c);
       ps.commit();
     } catch (Exception e) {
       Logger log = LoggerFactory.getLogger();
       log.error("error persisting " + c, e);
       if (ps.inTransaction()) {
          ps.performRollback();
       }
       AlertService alertService = new AlertService(getInitialContext());
       alertService.alert("error persisting customer: " + c, e);
     }
  }
}

It’s easy to write code like this, and I’ve written it myself. The problem is that when I go back to this code my eyes glaze over, and I just don’t want to take in all of that complexity. Do I care here which persistence mechanism is used? No. This method should just give out simple orders: validate and persist, or log and alert. How those orders are carried out is the responsibility of the Validator, PersistService, Logger and AlertService respectively, and I don’t want to worry about their implementations here.
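To make the GOOD CustomerService concrete, here is a runnable sketch in which its collaborators are passed in through the constructor. Validator, CustomerDao and AlertService are hypothetical single-method interfaces, a String stands in for Customer, and the log call is left out for brevity. The lambdas in main need Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomerServiceSketch {

    // Hypothetical collaborators, each hiding its details behind one method.
    interface Validator { void validate(String customer); }
    interface CustomerDao { void persist(String customer); }
    interface AlertService { void alert(String message, Exception e); }

    static class CustomerService {
        private final Validator validator;
        private final CustomerDao customerDao;
        private final AlertService alertService;

        CustomerService(Validator validator, CustomerDao customerDao, AlertService alertService) {
            this.validator = validator;
            this.customerDao = customerDao;
            this.alertService = alertService;
        }

        // Reads as a list of orders: validate, persist, or alert on failure.
        void persist(String customer) {
            try {
                validator.validate(customer);
                customerDao.persist(customer);
            } catch (Exception e) {
                alertService.alert("error persisting customer: " + customer, e);
            }
        }
    }

    public static void main(String[] args) {
        final List<String> saved = new ArrayList<String>();
        final List<String> alerts = new ArrayList<String>();
        CustomerService service = new CustomerService(
                c -> { if (c.isEmpty()) throw new IllegalArgumentException("empty customer"); },
                saved::add,
                (message, e) -> alerts.add(message));

        service.persist("Vincent Fleetwood");
        service.persist(""); // fails validation, so an alert is raised instead
        System.out.println(saved);  // [Vincent Fleetwood]
        System.out.println(alerts); // [error persisting customer: ]
    }
}
```

Because each collaborator is an interface, the details of how validation, persistence and alerting work stay in their own classes. That is the point the GOOD example makes.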

Guard conditions in Java

So, my impetus for writing this post is the ‘require’ method in Scala. It works a bit like the ‘assert’ keyword that came to Java in 1.4 but never really took off (assertions are disabled at runtime unless the JVM is started with -ea). Scala’s ‘require’ is used to write guard conditions, similar to the assertions you may make in JUnit tests. Here is an example:

object Shuffler {
  
  def shuffle(deck:Deck) = {
    require(deck.isNotEmpty, "deck cannot be empty")
    // code omitted...
  }

}

In Java I’ve seen the following:

public void shuffle(Deck deck) {
  // guard condition
  if (deck.isEmpty()) {  
    throw new IllegalArgumentException("Cannot shuffle an empty deck");
  }
  // code omitted...
}

and also

public void shuffle(Deck deck) {
  // guard condition
  if (deck.isNotEmpty()) {  
    // code omitted...
  } else {
    throw new IllegalArgumentException("Cannot shuffle an empty deck");
  }
}

The second style in particular can quickly lead to multiple nested ifs and make the code less readable. An old adage said “no more than one return point in a method”, but this rule has not stood the test of time. As Kent Beck writes, this “was to prevent the confusion possible when jumping into and out of many locations in the same routine. It made good sense when applied to FORTRAN or assembly language programs written with lots of global data where even understanding which statements were executed was hard work … with small methods and mostly local data, it is needlessly conservative.”

Option 1: Roll-your-own Guard Condition

We fail fast, throwing an error as soon as we find a condition we cannot handle. Note that our happy-path code (omitted) is now not inside an if or else. We can write multiple guard conditions and still recognise them for what they are, because they sit above the part of the method that effects change.

public void shuffle(Deck deck) {
  // guard condition
  if (deck.isEmpty()) {  
    throw new IllegalArgumentException("Cannot shuffle an empty deck");
  } 
  // code omitted...
}

Option 2: External Libraries

I think we can improve on this using a static method similar to the Assert class we use in JUnit, and classes that do this already exist. Our options include Validate (Apache Commons Lang), Assert (Spring Framework) and Preconditions (Google Guava).

Here is an example using Preconditions:

import com.google.common.base.Preconditions;

public void shuffle(Deck deck) {
  // guard condition
  Preconditions.checkArgument(deck.isNotEmpty(), "Cannot shuffle an empty deck");
  
  // code omitted...
}

To me, this is much simpler and easier to read. As a rule of thumb, I believe the amount of code serving a function’s primary purpose should generally be equal to the amount of code doing the following tasks: logging, auditing, error handling and guard conditions.
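If adding a library isn’t an option, a hand-rolled static helper modelled on Scala’s require can give the same shape. Here is a minimal sketch. Guards and require are hypothetical names, a List<String> stands in for Deck, and the null check uses java.util.Objects.requireNonNull (JDK 7+), which throws NullPointerException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class Guards {

    // Hypothetical helper modelled on Scala's require: throws if the condition is false.
    public static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Guard conditions first, then the happy path with no surrounding if/else.
    public static List<String> shuffle(List<String> deck) {
        Objects.requireNonNull(deck, "deck cannot be null");
        require(!deck.isEmpty(), "Cannot shuffle an empty deck");

        List<String> shuffled = new ArrayList<String>(deck); // leave the caller's list untouched
        Collections.shuffle(shuffled);
        return shuffled;
    }

    public static void main(String[] args) {
        System.out.println(shuffle(Arrays.asList("A", "K", "Q", "J")));
        try {
            shuffle(new ArrayList<String>());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Cannot shuffle an empty deck
        }
    }
}
```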

A good comparison of the options is available here: http://www.sw-engineering-candies.com/blog-1/comparison-of-ways-to-check-preconditions-in-java