Grails Cookbook - A collection of tutorials and examples

Improving Performance of Large Batch Insert

Sometimes we need to insert large number of rows to the database. (For example: transferring data from external source to our database). It is preferable that performance is fast and the expected time to finish is proportional to the number of rows to be inserted.
A week ago, I was programming a screen that on click of a button, the application should download a file from an external source and insert the data into the database. I noticed that when the number of rows in my data set doubled, it took the application 6 times more to finish the task. It lead me to investigate on how to improve the situation.

Simple Test

To benchmark, I did a small experiment using a simple domain that contains a single number:

package asia.grails
class TestNumber {
	int number
}

And a simple controller action to run a batch insert, to serve as a baseline.

package asia.grails
class TestInsertController {
    def firstTest() { 
        for (int i=0; i<1000; i++) {
            for (int j=0; j<500; j++) {
                new TestNumber(number:j).save()
            }
        }
        render "Done"
    }
}

This test will try to insert 500,000 records. On my machine, it took more than 7 minutes to finish the operation.
After trying many iterations and logging, it seems the operation becomes slower and slower as you insert more records. For example, when you are within the first few thousands, it could insert almost 2,000 records per second. But when there are significantly more records already created (e.g. few hundred thousands) it takes a few seconds to insert 500 records.

In fact, when we try to modify the example to insert 1,000,000 records, the time to finish is more than 1 hour!

Solution

The problem is that all the operations are within the same hibernate session. And the number of objects are piling up inside the session. Hence, the performance degrades more and more over time.
The hibernate session will only flush or clear when the page you requested has rendered (I believe by default Grails uses OpenSessionInViewFilter). The solution to improve performance is to clear the session every now and then to free up the session. For example:
package asia.grails
class TestInsertController {
    def secondTest() { 
        for (int i=0; i<1000; i++) {
            TestNumber.withNewTransaction {
                for (int j=0; j<500; j++) {
                    new TestNumber(number:j).save(flush:true)
                }
            }
            sessionFactory.currentSession.clear()
        }
        render "Done"
    }
}
And with this, performance is linearly proportional. E.g. 500,000 records can be inserted in 5 minutes and a million records in 10 minutes.
Tags: batch insert, insert, session, transaction