I would like to know whether this is a recommended way to implement a Spring Batch reader with JPA, or whether it is better to look for another solution. If this approach is not recommended, where can I find information on a better option?

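    import java.util.Iterator;

    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.beans.factory.annotation.Autowired;

    public class CreditCardItemReader implements ItemReader<CreditCard> {

        @Autowired
        private CreditCardRepository repository;

        private Iterator<CreditCard> usersIterator;

        // Called once before the step starts
        @BeforeStep
        public void before(StepExecution stepExecution) {
            usersIterator = repository.someQuery().iterator();
        }

        @Override
        public CreditCard read() {
            // Returning null tells Spring Batch that there is no more input
            if (usersIterator != null && usersIterator.hasNext()) {
                return usersIterator.next();
            }
            return null;
        }
    }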

Recommended answer

This implementation is acceptable only for small datasets: the data is read by a single query, and the whole result list is kept in memory. It is also not thread-safe.
When loading large volumes of data, it:

  • can lead to an OutOfMemoryError in environments with limited memory
  • can cause performance problems, because the step has to wait until thousands of records are loaded from the database in a single call


Solution 1, org.springframework.batch.item.database.JpaCursorItemReader
Spring Batch provides a similar implementation out of the box: JpaCursorItemReader (available since Spring Batch 4.3).
The main difference is that it works with a JPQL query instead of a repository, and it uses JPA's Query.getResultStream() method to iterate over the query results.
Excerpt from the implementation of JpaCursorItemReader:

    protected void doOpen() throws Exception {
        ...
        Query query = createQuery();
        if (this.parameterValues != null) {
            this.parameterValues.forEach(query::setParameter);
        }
        this.iterator = query.getResultStream().iterator();
    }

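Query.getResultStream() was added in JPA 2.2. Its default implementation simply calls getResultList().stream(), so whether the results are really streamed depends on the JPA provider. Hibernate, for example, overrides it (from version 5.3, building on the Query.stream() method it introduced in 5.2) to use its ScrollableResults implementation, which moves through the result set and fetches the records in batches. That prevents loading all records of the result set at once and lets you process them more efficiently.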
Example of creation:

    // entityManagerFactory: the application's configured EntityManagerFactory (e.g. injected)
    protected ItemReader<Foo> getItemReader(EntityManagerFactory entityManagerFactory) throws Exception {
        String jpqlQuery = "select f from Foo f";
        JpaCursorItemReader<Foo> itemReader = new JpaCursorItemReader<>();
        itemReader.setName("fooReader");     // key prefix for the reader's state in the ExecutionContext
        itemReader.setQueryString(jpqlQuery);
        itemReader.setEntityManagerFactory(entityManagerFactory);
        itemReader.setSaveState(true);
        itemReader.afterPropertiesSet();     // validates the configuration
        return itemReader;
    }
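The JDK-only sketch below illustrates the streaming idea without Spring or a database. StreamBackedReader is a hypothetical class that pulls items one at a time from a lazily evaluated java.util.stream.Stream, much as JpaCursorItemReader iterates over Query.getResultStream(). The simulated query (simulatedQuery, an assumption standing in for a real JPA query) counts how many rows it has produced, showing that reading three items materializes only three rows. With a real JPA provider this holds only if the provider actually streams the result instead of building a list first.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamingReaderDemo {

    // Hypothetical reader that pulls items lazily from a Stream, the way
    // JpaCursorItemReader iterates over Query.getResultStream().iterator().
    static final class StreamBackedReader<T> implements AutoCloseable {
        private final Stream<T> stream;
        private final Iterator<T> iterator;

        StreamBackedReader(Stream<T> stream) {
            this.stream = stream;
            this.iterator = stream.iterator();
        }

        // Same contract as Spring Batch's ItemReader.read(): null means "no more input".
        T read() {
            return iterator.hasNext() ? iterator.next() : null;
        }

        @Override
        public void close() {
            // For a real query, closing the stream releases the underlying cursor.
            stream.close();
        }
    }

    // Counts how many "rows" the simulated query has produced so far.
    static final AtomicInteger produced = new AtomicInteger();

    // Assumption: a lazily evaluated stream of one million rows stands in for a JPA query.
    static Stream<String> simulatedQuery() {
        return IntStream.range(0, 1_000_000)
                .peek(i -> produced.incrementAndGet())
                .mapToObj(i -> "row-" + i);
    }

    public static void main(String[] args) {
        try (StreamBackedReader<String> reader = new StreamBackedReader<>(simulatedQuery())) {
            for (int i = 0; i < 3; i++) {
                System.out.println(reader.read());
            }
        }
        System.out.println("rows produced: " + produced.get());
    }
}
```

Running main prints the first three rows and then the number of rows produced so far, which stays at the number of rows actually read. In a real reader the open stream holds a database cursor, so it has to be closed when the step ends; Spring Batch's ItemStream open/close callbacks exist for this kind of resource handling.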

Solution 2, org.springframework.batch.item.database.JpaPagingItemReader
It also executes a JPQL query, but instead of keeping a cursor open it reads the results page by page, so only the current page is held in memory. Per its documentation, it is thread-safe between calls to open(ExecutionContext).
According to documentation:

ItemReader for reading database records built on top of JPA.

It executes the JPQL setQueryString(String) to retrieve requested data. The query is executed using paged requests of a size specified in AbstractPagingItemReader.setPageSize(int). Additional pages are requested when needed as AbstractItemCountingItemStreamItemReader.read() method is called, returning an object corresponding to current position.

The performance of the paging depends on the JPA implementation and its use of database specific features to limit the number of returned rows.

Setting a fairly large page size and using a commit interval that matches the page size should provide better performance.

In order to reduce the memory usage for large results the persistence context is flushed and cleared after each page is read. This causes any entities read to be detached. If you make changes to the entities and want the changes persisted then you must explicitly merge the entities.

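The implementation is thread-safe in between calls to AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but remember to use saveState=false if used in a multi-threaded client (no restart available).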
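A rough JDK-only sketch of page-by-page reading, in the spirit of JpaPagingItemReader and RepositoryItemReader. PagingReader and its (pageNumber, pageSize) fetch function are hypothetical: the function stands in for a JPQL query limited with setFirstResult/setMaxResults, or for a repository method that takes a Pageable. Only one page is held in memory at a time, and a page shorter than pageSize marks the end of the data.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagingReaderDemo {

    // Hypothetical page-by-page reader: only the current page is kept in memory.
    static final class PagingReader<T> {
        // (pageNumber, pageSize) -> items of that page; stands in for a paged query.
        private final BiFunction<Integer, Integer, List<T>> pageFetcher;
        private final int pageSize;
        private List<T> currentPage = Collections.emptyList();
        private int indexInPage = 0;
        private int nextPage = 0;
        private boolean lastPageFetched = false;
        int queries = 0; // number of page queries issued, for illustration

        PagingReader(BiFunction<Integer, Integer, List<T>> pageFetcher, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be greater than 0");
            }
            this.pageFetcher = pageFetcher;
            this.pageSize = pageSize;
        }

        // Returns the next item, or null when the input is exhausted (ItemReader contract).
        T read() {
            if (indexInPage >= currentPage.size()) {
                if (lastPageFetched) {
                    return null;
                }
                // Fetch the next page; the previous page can now be garbage-collected.
                currentPage = pageFetcher.apply(nextPage++, pageSize);
                queries++;
                indexInPage = 0;
                // A page shorter than pageSize means there is nothing after it.
                lastPageFetched = currentPage.size() < pageSize;
                if (currentPage.isEmpty()) {
                    return null;
                }
            }
            return currentPage.get(indexInPage++);
        }
    }

    public static void main(String[] args) {
        // Simulated table of 250 rows, already in a stable (sorted) order.
        List<Integer> table = IntStream.range(0, 250).boxed().collect(Collectors.toList());
        BiFunction<Integer, Integer, List<Integer>> fetcher = (page, size) -> {
            int from = Math.min(page * size, table.size());
            int to = Math.min(from + size, table.size());
            return table.subList(from, to);
        };

        PagingReader<Integer> reader = new PagingReader<>(fetcher, 100);
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        System.out.println(count + " items, " + reader.queries + " queries");
    }
}
```

With 250 rows and a page size of 100, main prints "250 items, 3 queries". If the commit interval also equals 100, each chunk corresponds to one page query, which is the pairing the documentation recommends. Offset-based paging like this assumes a stable sort order; if the order is not deterministic, pages can overlap or skip rows.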

Solution 3, org.springframework.batch.item.data.RepositoryItemReader
This is the closest to your current approach: it works with a Spring Data repository, but it loads data page by page instead of all at once, and it is thread-safe between calls to open(ExecutionContext).
According to documentation:

A ItemReader that reads records utilizing a PagingAndSortingRepository.

Performance of the reader is dependent on the repository implementation, however setting a reasonably large page size and matching that to the commit interval should yield better performance.

The reader must be configured with a PagingAndSortingRepository, a Sort, and a pageSize greater than 0.

This implementation is thread-safe between calls to AbstractItemCountingItemStreamItemReader.open(ExecutionContext), but remember to use saveState=false if used in a multi-threaded client (no restart available).

Example of creation:

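    // repository: your FooRepository bean (e.g. injected); it must extend PagingAndSortingRepository<Foo, Long>
    RepositoryItemReader<Foo> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);                           // the PagingAndSortingRepository used to read input from
    reader.setMethodName("findByName");                         // method to call, e.g. Page<Foo> findByName(String name, Pageable pageable)
    reader.setArguments(Collections.singletonList("someName")); // arguments for that method ("someName" is a placeholder); the reader appends the Pageable
    reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC)); // a Sort is required
    reader.setPageSize(100);                                    // ideally equal to the commit interval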

Creation via builder:

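    // repository: your FooRepository bean; it must extend PagingAndSortingRepository<Foo, Long>
    RepositoryItemReader<Foo> reader = new RepositoryItemReaderBuilder<Foo>()
            .name("fooReader")                                  // required when saveState is true (the default)
            .repository(repository)
            .methodName("findByName")
            .arguments(Collections.singletonList("someName"))   // "someName" is a placeholder argument
            .sorts(Collections.singletonMap("id", Sort.Direction.ASC))
            .pageSize(100)
            .build();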

More examples of usage: RepositoryItemReaderTests and RepositoryItemReaderIntegrationTests
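The thread-safety remarks in this answer (including that the reader in the question is not thread-safe) can be illustrated with plain Java. SimpleItemReader, IteratorReader and SynchronizedReader below are hypothetical stand-ins, not Spring Batch classes. IteratorReader uses the same hasNext()/next() pattern as CreditCardItemReader, where two threads can both see hasNext() == true and then race on next(); SynchronizedReader wraps any reader so that each read() runs under a lock. For real multi-threaded steps, Spring Batch ships a comparable wrapper, SynchronizedItemStreamReader; as the quoted documentation notes, restart state is not meaningful in that setup, so saveState should be false.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedReaderDemo {

    // Hypothetical minimal stand-in for Spring Batch's ItemReader interface.
    interface SimpleItemReader<T> {
        T read();
    }

    // Not thread-safe: the same pattern as the reader in the question.
    static final class IteratorReader<T> implements SimpleItemReader<T> {
        private final Iterator<T> iterator;

        IteratorReader(Iterator<T> iterator) {
            this.iterator = iterator;
        }

        @Override
        public T read() {
            // Two threads can both pass hasNext() and then race on next().
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    // Serializes access so that hasNext() and next() run atomically per read() call.
    static final class SynchronizedReader<T> implements SimpleItemReader<T> {
        private final SimpleItemReader<T> delegate;

        SynchronizedReader(SimpleItemReader<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T read() {
            return delegate.read();
        }
    }

    // Drains the reader with several threads and collects everything they read.
    static List<Integer> readConcurrently(SimpleItemReader<Integer> reader, int threads) {
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    results.add(item);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(i);
        }
        SimpleItemReader<Integer> reader = new SynchronizedReader<>(new IteratorReader<>(data.iterator()));
        List<Integer> results = readConcurrently(reader, 4);
        System.out.println(results.size() + " items read");
    }
}
```

With the synchronized wrapper, four threads together return every one of the 10,000 items exactly once. The lock serializes reading, so any speed-up in a multi-threaded step comes from processing and writing in parallel, not from the read itself.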

Summary:
Your implementation is fine only for simple use cases with small datasets.
I recommend using the out-of-the-box readers.
