Friday, April 23, 2010

Note on allocationSize parameter of @SequenceGenerator while using JPA

I ran into what I thought as an issue while I was using the sequence ID generation strategy in JPA. The JPA provider I am using is Hibernate. I think sharing my experience will save someone some time.

To use a sequence (For e.g. Oracle sequence) to generate values and assign them to the ID field of your persistence object, you will following something like this:

@Id
@Column(name = "ITEM_ID")
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator="ItemIdSeqGenerator")
@SequenceGenerator(name="ItemIdSeqGenerator", sequenceName="ITEM_ID_SEQ", allocationSize=1)
private long itemId;
This means the following things:
  1. The @Id annotation says that the field itemId is a primary key.
  2. The @Column annotation says that the corresponding column in the database is ITEM_ID.
  3. The @GeneratedValue says that the value that needs to be populated in the itemId should be generated, while that object is persisted. The strategy to generate the value is to make use of a sequence. The details of the sequence are provided in the following annotation, that has the generator name "ItemIdSeqGenerator".
  4. The @SequenceGenerator annotation describes the sequence generator named "ItemIdSeqGenerator". The sequence in the database that needs to be used is called ITEM_ID_SEQ. 
So far so good. So roughly it implies that every time you try to persist the object, the itemId will be populated using a SQL command (in case of Oracle) "SELECT ITEM_ID_SEQ.NEXTVAL FROM DUAL". So to persist a new object, you will go to the database twice (once to get the NEXTVAL and once to insert the new row).

Imagine a situation where performance is critical and you are going to do a lot of inserts and a lot of IDs need to be generated. To help in such cases, the allocationSize comes to our help and that's exactly what this blog post is about.

allocationSize=N means that "Go and fetch the next value from the database once in every N persist calls. And locally increment the value by 1 in between.".

Let me give an example. Assume that the sequence is set up like this in the database:
CREATE SEQUENCE ITEM_ID_SEQ START WITH 1 INCREMENT BY 10;
And assume that all the values starting with 1 are legal (1, 2, 3, 4, ...) for the ITEM_ID column. In such case you will set the allocationSize=10. So the first persist call will go and fetch the ITEM_ID_SEQ.NEXTVAL from database. The subsequent persist calls will not go to the database, rather they will return last value+1 locally until the value reaches 10. That saves 9 database reads.

What if there are two entity managers that try to do the same thing? When the first entity manager calls the ITEM_ID_SEQ.NEXTVAL it will get 1 and the second one will get 11. Hence the first one will go on like 1,2,3,...10 and the second one will go on like 11,12,13...20, before fetching the next ITEM_ID_SEQ.NEXTVAL.

What if the allowed values are like 1, 11, 21, 31, ... and still you would like to make use of the allocationSize? So far I could not find how to do this! It would have been possible if we can specify an increment, like "allocationSize=100, increment=10" and set up the sequence as follows in the database:
CREATE SEQUENCE ITEM_ID_SEQ START WITH 1 INCREMENT BY 100;
As of now, if you have such requirement, you will have to set the value of allocationSize=1 and suffer the cost of going to the database every time when an object is persisted.

Remember another thing in mind. The J2EE documentation for the allocationSize says  an ambiguous thing which is quite different from the one I described above:
The amount to increment by when allocating sequence numbers from the sequence.
As per my experiments, the increments were always 1 :-(. This parameter just controlled how frequent to go to the database to fetch NEXTVAL.


If you are using an allocationSize greater than 1, it is important to remember that you should not assume that lower ID means earlier to invoke the NEXTVAL from the sequence. Don't try to interpret the ID in anyway other than just an ID to identify the row.