This implementation is acceptable only for small datasets, because all data is read by a single query and the whole result list is kept in memory. It is also not thread-safe.
When loading large volumes of data:
- in an environment with limited memory, it can lead to an out-of-memory error
- it can lead to performance problems, because we have to wait until thousands of records are loaded from the DB in a single call
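To illustrate the alternative to loading everything at once, here is a minimal sketch of page-by-page reading in plain Java. It is a simplified model, not a Spring Batch class: PagedReader, PagedReaderDemo, and loadPage are hypothetical names, and an in-memory list stands in for the database table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Simplified, hypothetical model of a paging reader (not a Spring Batch class).
 * It keeps at most one page in memory and fetches the next page on demand.
 */
class PagedReader<T> {
    private final BiFunction<Integer, Integer, List<T>> pageLoader; // (offset, limit) -> rows
    private final int pageSize;
    private List<T> currentPage = new ArrayList<>();
    private int indexInPage = 0;
    private int offset = 0;
    private boolean exhausted = false;

    PagedReader(BiFunction<Integer, Integer, List<T>> pageLoader, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be greater than 0");
        }
        this.pageLoader = pageLoader;
        this.pageSize = pageSize;
    }

    /**
     * Returns the next item, or null when the data is exhausted.
     * Synchronized so that two threads never advance the position at the same time.
     */
    synchronized T read() {
        if (indexInPage >= currentPage.size()) {
            if (exhausted) {
                return null;
            }
            currentPage = pageLoader.apply(offset, pageSize); // load only one page
            offset += currentPage.size();
            indexInPage = 0;
            if (currentPage.size() < pageSize) {
                exhausted = true; // a short page means there is nothing left after it
            }
            if (currentPage.isEmpty()) {
                return null;
            }
        }
        return currentPage.get(indexInPage++);
    }
}

class PagedReaderDemo {
    /** Stand-in for a database table; a real reader would run a query with an offset and a limit. */
    static List<Integer> loadPage(List<Integer> table, int offset, int limit) {
        int end = Math.min(offset + limit, table.size());
        return new ArrayList<>(table.subList(Math.min(offset, end), end));
    }

    /** Reads every row through the paged reader and returns how many rows were read. */
    static int countRows(int totalRows, int pageSize) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            table.add(i);
        }
        PagedReader<Integer> reader =
                new PagedReader<Integer>((offset, limit) -> loadPage(table, offset, limit), pageSize);
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRows(1050, 100)); // prints 1050
    }
}
```

The reader never holds more than pageSize rows at a time, so memory use is bounded by the page size rather than by the size of the table.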
Solution 1, org.springframework.batch.item.database.JpaCursorItemReader
A similar implementation is available out of the box in Spring Batch: JpaCursorItemReader
The main difference is that this implementation works only with a specific JPQL query instead of a repository, and uses JPA's Query.getResultStream() method to get the query results.
Implementation of JpaCursorItemReader:
protected void doOpen() throws Exception {
    ...
    Query query = createQuery();
    if (this.parameterValues != null) {
        this.parameterValues.forEach(query::setParameter);
    }
    this.iterator = query.getResultStream().iterator();
}
Hibernate, for example, introduced the Query.getResultStream() method in version 5.2. It uses Hibernate's ScrollableResults implementation to move through the result set and to fetch the records in batches. That prevents you from loading all records of the result set at once and allows you to process them more efficiently.
Example of creation:
// entityManagerFactory is the application's configured EntityManagerFactory (for example, injected by Spring).
// A freshly constructed LocalContainerEntityManagerFactoryBean is not initialized, so its getObject() would return null.
protected ItemReader<Foo> getItemReader(EntityManagerFactory entityManagerFactory) throws Exception {
    String jpqlQuery = "from Foo";
    JpaCursorItemReader<Foo> itemReader = new JpaCursorItemReader<>();
    itemReader.setQueryString(jpqlQuery);
    itemReader.setEntityManagerFactory(entityManagerFactory);
    itemReader.setSaveState(true);
    itemReader.afterPropertiesSet(); // validate once all properties are set
    return itemReader;
}
Solution 2, org.springframework.batch.item.database.JpaPagingItemReader
It is a more flexible solution for JPQL queries than JpaCursorItemReader. The ItemReader loads and stores data in pages, and it is thread-safe.
According to the documentation:
ItemReader for reading database records built on top of JPA.
It executes the JPQL setQueryString(String) to retrieve requested
data. The query is executed using paged requests of a size specified
in AbstractPagingItemReader.setPageSize(int). Additional pages are
requested when needed as
AbstractItemCountingItemStreamItemReader.read() method is called,
returning an object corresponding to current position.
The performance of the paging depends on the JPA implementation and
its use of database specific features to limit the number of returned
rows.
Setting a fairly large page size and using a commit interval that
matches the page size should provide better performance.
In order to reduce the memory usage for large results the persistence
context is flushed and cleared after each page is read. This causes
any entities read to be detached. If you make changes to the entities
and want the changes persisted then you must explicitly merge the
entities.
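The implementation is thread-safe in between calls to
AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but
remember to use saveState=false if used in a multi-threaded client (no
restart available).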
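Example of creation via builder. This is a configuration sketch, not a definitive setup: it assumes Spring Batch 5 (jakarta.persistence; Spring Batch 4.x uses javax.persistence), an EntityManagerFactory bean provided by the application, and an id property on Foo. The class name FooReaderConfig, the bean name, and the page size are hypothetical.

```java
import jakarta.persistence.EntityManagerFactory; // javax.persistence in Spring Batch 4.x

import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FooReaderConfig { // hypothetical configuration class

    @Bean
    public JpaPagingItemReader<Foo> fooPagingReader(EntityManagerFactory entityManagerFactory) {
        return new JpaPagingItemReaderBuilder<Foo>()
                .name("fooPagingReader")                           // used as the key prefix when state is saved
                .entityManagerFactory(entityManagerFactory)        // assumed to be configured elsewhere
                .queryString("select f from Foo f order by f.id") // assumes Foo has an id property
                .pageSize(100)                                     // illustrative value, match it to the commit interval
                .build();
    }
}
```

The order by clause is there because paged queries generally need a stable ordering, otherwise rows may be skipped or repeated between pages.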
Solution 3, org.springframework.batch.item.data.RepositoryItemReader
It is a more efficient solution than loading everything in one call. It works with the repository, loads and stores data in chunks (pages), and it is thread-safe.
According to the documentation:
A ItemReader that reads records utilizing a
PagingAndSortingRepository.
Performance of the reader is dependent on the repository
implementation, however setting a reasonably large page size and
matching that to the commit interval should yield better performance.
The reader must be configured with a PagingAndSortingRepository, a
Sort, and a pageSize greater than 0.
This implementation is thread-safe between calls to
AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but
remember to use saveState=false if used in a multi-threaded client (no
restart available).
Example of creation:
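// fooRepository is an injected Spring Data repository; it is an interface, so it cannot be created with new
PagingAndSortingRepository<Foo, Long> repository = fooRepository;
RepositoryItemReader<Foo> reader = new RepositoryItemReader<>();
reader.setRepository(repository); // The PagingAndSortingRepository implementation used to read input from.
reader.setMethodName("findByName"); // Specifies what method on the repository to call; it must take a Pageable as its last argument.
reader.setArguments(arguments); // Arguments to be passed to the data providing method (the Pageable is added by the reader).
reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC)); // A Sort is required; "id" is an assumed property of Foo.
reader.setPageSize(100); // Must be greater than 0; the value here is illustrative.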
Creation via builder:
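// fooRepository is an injected Spring Data repository; it is an interface, so it cannot be created with new
PagingAndSortingRepository<Foo, Long> repository = fooRepository;
RepositoryItemReader<Foo> reader = new RepositoryItemReaderBuilder<Foo>()
        .name("fooItemReader") // required while saveState is true (the default); the name is illustrative
        .repository(repository)
        .methodName("findByName")
        .arguments(new ArrayList<>())
        .sorts(Collections.singletonMap("id", Sort.Direction.ASC)) // required; "id" is an assumed property of Foo
        .pageSize(100) // illustrative value
        .build();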
More examples of usage: RepositoryItemReaderTests and RepositoryItemReaderIntegrationTests
Summary:
Your implementation is good only for simple use cases.
I recommend using the out-of-the-box solutions.