Hibernate performance
Hibernate is a great Object Relational Mapping (ORM) library for accessing data in a relational database. It is one of the leading implementations of Java Persistence (formerly known as JPA). Using JPQL (based on HQL, which is still a nice syntax guide) one can access and update the data in the database through your ORM objects in a simplified SQL way. However, when implemented incorrectly one can run into serious performance problems. They can all be avoided though. Here are some important things to consider to make your queries performant.
Below you will find two examples of common cases that may run into performance problems, and how to fix them. We use a simple Company-Employee data model as depicted below.
Eager vs. Lazy
Here is an example of implementing a 1:N relation.
@MappedSuperclass
public abstract BaseEntity {
@Id
@GeneratedValue
@Type(type = "uuid-char")
@Column(length = 36)
private UUID id;
public UUID getId() {
return id;
}
@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null) return false;
if (obj instanceof BaseEntity) {
BaseEntity other = (BaseEntity) obj;
if (other.id == null) return false;
return other.id.equals(id);
}
return false;
}
@Override
public int hashCode() {
return id != null ? id.hashCode() : 0;
}
}
@Entity(name = "Company")
public class Company extends BaseEntity{
@Column(length = 100, nullable = false)
public String name;
@OneToMany(mappedBy = "company", targetEntity = Employee.class) // default Lazy
public List<Employee> employees;
}
@Entity(name = "Employee")
@NamedQueries({
@NamedQuery(name = "findEmployeesSimple",
query = "from Employee where yearOfBirth >= :yearOfBirth"),
@NamedQuery(name = "findEmployeesPerformant",
query = "select em, co from Employee em join em.company co" +
" where em.yearOfBirth >= :yearOfBirth")
})
public class Employee extends BaseEntity {
@Column(length = 100, nullable = false)
public String firstName;
@Column(length = 100, nullable = false)
public String lastName;
@Column(length = 100, nullable = false)
public String email;
@Column(precision = 4)
@Index(name = "ix_year_of_birth")
public int yearOfBirth;
@ManyToOne(fetch = FetchType.LAZY) // NB Lazy, this is important!
@JoinColumn(foreignKey = @ForeignKey(name = "fk_employee_company"))
public Company company;
}
Now notice the @ManyToOne(fetch = FetchType.LAZY)
. By default this is set to EAGER
, which makes that the associated company
is loaded when the employee
is loaded.
However, when doing bulk queries like findEmployeesSimple
one runs the risk of bad performance.
The ORM library may then execute an individual query to load the company
for each employee
found.
So when N employee
records are found, N + 1 queries are executed.
When setting fetch = FetchType.LAZY
and using the findEmployeesPerformant
query this will not happen.
It will then execute just one query.
Notice you now end up with a result list of type Object[]
objects, containing Employee
and Company
consecutively.
It is your job to make it a list of type List<Employee>
with its company
fields set, ad vice versa company.employee
set as well.
However, this should be much faster than running N + 1 queries. Here is an example on how to do that.
List<Object[]> emcos = query.list();
List<Employee> employees = new ArrayList<>();
for (Object[] emco : emcos) {
var em = (Employee) emco[0];
var co = (Company) emco[1];
em.company = co;
if (co.employees == null) {
co.employees = new ArrayList<>();
}
co.employees.add(em);
}
NB 1. Newer versions of the ORM library may have better optimizers so that findEmployeesSimple
does execute just one query.
But for more complex bulk queries it probably never will.
So to ensure good performance one would better choose the findEmployeesPerformant
approach pre-emptively.
NB 2. Do not forget to invoke Employee.company
before the transaction has ended for a single employee
record loaded from the database.
Else you will get a LazyInitializationException
when accessing Employee.company
later on.
Big IN clause
Consider the scenario in which you have a large list of names in a List<String> names
object.
And now you want to find all employees with a last name that is in that list.
The following query would the be convenient:
@NamedQuery(name = "findEmployeesByName",
query = "from Employee where lastName in (:names)")
Invoking this query with names
as a parameter will work.
But when the list becomes larger than some database specific max (or is it an ORM library max?), usually somewhere around 1000, performance problems may arise.
It turns out that when the list is smaller than this max it will be executed as just one query.
But when it is larger it may result in one query for each individual name in the list, so we get N queries, with N being the size of the list.
To prevent this one can chunk the list into chunks with a size smaller than the max, eg. max - 1.
Then we have N / (max - 1) queries. Next concatenating the separate result lists should not be a problem.