Hibernate performance

Hibernate is a great Object Relational Mapping (ORM) library for accessing data in a relational database. It is one of the leading implementations of Java Persistence (formerly known as JPA). Using JPQL (based on HQL, which is still a nice syntax guide) one can access and update the data in the database through your ORM objects in a simplified SQL way. However, when implemented incorrectly one can run into serious performance problems. They can all be avoided though. Here are some important things to consider to make your queries performant.

Below you will find two examples of common cases that may run into performance problems, and how to fix them. We use a simple Company-Employee data model as depicted below.

Figure 1. Data model

Eager vs. Lazy

Here is an example of implementing a 1:N relation.

@MappedSuperclass
public abstract BaseEntity {

    @Id
    @GeneratedValue
    @Type(type = "uuid-char")
    @Column(length = 36)
    private UUID id;

    public UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (obj instanceof BaseEntity) {
            BaseEntity other = (BaseEntity) obj;
            if (other.id == null) return false;
            return other.id.equals(id);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return id != null ? id.hashCode() : 0;
    }
}

@Entity(name = "Company")
public class Company extends BaseEntity{

    @Column(length = 100, nullable = false)
    public String name;

    @OneToMany(mappedBy = "company", targetEntity = Employee.class) // default Lazy
    public List<Employee> employees;
}

@Entity(name = "Employee")
@NamedQueries({
    @NamedQuery(name = "findEmployeesSimple",
                query = "from Employee where yearOfBirth >= :yearOfBirth"),
    @NamedQuery(name = "findEmployeesPerformant",
                query = "select em, co from Employee em join em.company co" +
                        " where em.yearOfBirth >= :yearOfBirth")
})
public class Employee extends BaseEntity {

    @Column(length = 100, nullable = false)
    public String firstName;

    @Column(length = 100, nullable = false)
    public String lastName;

    @Column(length = 100, nullable = false)
    public String email;

    @Column(precision = 4)
    @Index(name = "ix_year_of_birth")
    public int yearOfBirth;

    @ManyToOne(fetch = FetchType.LAZY) // NB Lazy, this is important!
    @JoinColumn(foreignKey = @ForeignKey(name = "fk_employee_company"))
    public Company company;
}

Now notice the @ManyToOne(fetch = FetchType.LAZY). By default this is set to EAGER, which makes that the associated company is loaded when the employee is loaded. However, when doing bulk queries like findEmployeesSimple one runs the risk of bad performance. The ORM library may then execute an individual query to load the company for each employee found. So when N employee records are found, N + 1 queries are executed.

When setting fetch = FetchType.LAZY and using the findEmployeesPerformant query this will not happen. It will then execute just one query. Notice you now end up with a result list of type Object[] objects, containing Employee and Company consecutively. It is your job to make it a list of type List<Employee> with its company fields set, ad vice versa company.employee set as well. However, this should be much faster than running N + 1 queries. Here is an example on how to do that.

List<Object[]> emcos = query.list();
List<Employee> employees = new ArrayList<>();
for (Object[] emco : emcos) {
    var em = (Employee) emco[0];
    var co = (Company) emco[1];
    em.company = co;
    if (co.employees == null) {
        co.employees = new ArrayList<>();
    }
    co.employees.add(em);
}

NB 1. Newer versions of the ORM library may have better optimizers so that findEmployeesSimple does execute just one query. But for more complex bulk queries it probably never will. So to ensure good performance one would better choose the findEmployeesPerformant approach pre-emptively.

NB 2. Do not forget to invoke Employee.company before the transaction has ended for a single employee record loaded from the database. Else you will get a LazyInitializationException when accessing Employee.company later on.

Big IN clause

Consider the scenario in which you have a large list of names in a List<String> names object. And now you want to find all employees with a last name that is in that list. The following query would the be convenient:

    @NamedQuery(name = "findEmployeesByName",
                query = "from Employee where lastName in (:names)")

Invoking this query with names as a parameter will work. But when the list becomes larger than some database specific max (or is it an ORM library max?), usually somewhere around 1000, performance problems may arise. It turns out that when the list is smaller than this max it will be executed as just one query. But when it is larger it may result in one query for each individual name in the list, so we get N queries, with N being the size of the list. To prevent this one can chunk the list into chunks with a size smaller than the max, eg. max - 1. Then we have N / (max - 1) queries. Next concatenating the separate result lists should not be a problem.