JavaEE Sample::Batch::Batch Chunk CSV Database

BatchCSVDatabaseTest

The Batch specification provides a chunk-oriented processing style. In this style, a set of read, process and write operations, performed by an ItemReader, an ItemProcessor and an ItemWriter, is enclosed in a transaction. Items are read one at a time, processed and aggregated. The transaction is committed when the defined checkpoint policy is triggered.

<?xml version="1.0" encoding="UTF-8"?>
<job id="myJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="myStep" >
        <chunk item-count="3">
            <reader ref="myItemReader"/>
            <processor ref="myItemProcessor"/>
            <writer ref="myItemWriter"/>
        </chunk>
    </step>
</job>

A very simple job is defined in the myJob.xml file: a single step with a reader, a processor and a writer. The item-count="3" attribute sets a checkpoint after every 3 items.
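
As a rough illustration of the loop this job configures, the plain-Java sketch below reads items one at a time, processes them, buffers them, and "commits" whenever the buffer reaches the item count or the input ends. It is a simplification and not the batch runtime's implementation: the class ChunkLoopSketch, the run helper and the sample letters are hypothetical, and trim() merely stands in for an ItemProcessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {

    /** Runs a simplified chunk loop and returns the size of each committed chunk. */
    static List<Integer> run(List<String> input, int itemCount) {
        Iterator<String> reader = input.iterator();       // stand-in for ItemReader
        List<Integer> committedChunkSizes = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        boolean endOfInput = false;

        while (!endOfInput) {
            // A real runtime would begin a transaction here.
            buffer.clear();
            while (buffer.size() < itemCount) {
                // A null item signals the end of the input.
                String item = reader.hasNext() ? reader.next() : null;
                if (item == null) {
                    endOfInput = true;
                    break;
                }
                buffer.add(item.trim());                   // stand-in for ItemProcessor
            }
            if (!buffer.isEmpty()) {
                // Stand-in for ItemWriter.writeItems(buffer) followed by a commit.
                committedChunkSizes.add(buffer.size());
            }
        }
        return committedChunkSizes;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(run(lines, 3)); // prints [3, 3, 1]
    }
}
```

With 7 items and an item count of 3, the sketch commits three chunks of sizes 3, 3 and 1.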

The job reads a CSV file from the classpath:

@Override
public void open(Serializable checkpoint) throws Exception {
    reader = new BufferedReader(
            new InputStreamReader(
                    Thread.currentThread().getContextClassLoader().getResourceAsStream("/META-INF/mydata.csv")));
}

@Override
public String readItem() {
    try {
        return reader.readLine();
    } catch (IOException ex) {
        Logger.getLogger(MyItemReader.class.getName()).log(Level.SEVERE, null, ex);
    }
    return null;
}
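
The reader above relies on readLine() returning null once the file is exhausted; a null from readItem() is what tells the batch runtime that there is no more input. A minimal, container-free sketch of that read-until-null pattern follows. The class name, the readAll helper and the two data lines are hypothetical and used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadUntilNullSketch {

    /** Collects every line from the reader, stopping at the first null. */
    static List<String> readAll(BufferedReader reader) {
        List<String> items = new ArrayList<>();
        try {
            String line;
            // readLine() returns null at end of input, which ends the loop.
            while ((line = reader.readLine()) != null) {
                items.add(line);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return items;
    }

    public static void main(String[] args) {
        String csv = "Ada,12/01/12\nGrace,03/18/09\n"; // hypothetical data
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        System.out.println(readAll(reader)); // prints [Ada,12/01/12, Grace,03/18/09]
    }
}
```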

It then processes each line by transforming it into a Person object:

@Override
public Person processItem(Object t) {
    System.out.println("processItem: " + t);

    StringTokenizer tokens = new StringTokenizer((String)t, ",");

    String name = tokens.nextToken();
    String date;

    try {
        date = tokens.nextToken();
        format.setLenient(false);
        format.parse(date);
    } catch (ParseException e) {
        return null;
    }

    return new Person(id++, name, date);
}
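
The validation step in the processor can be tried outside the container. The sketch below splits a line into a name and a date and checks the date with a non-lenient SimpleDateFormat, returning null for lines that should be skipped (in a batch job, a null from processItem() filters the item out so that it never reaches the writer). The date pattern M/dd/yy is an assumption, since the sample's format field is not shown here; the class name, the parse helper and the sample lines are likewise hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LineParsingSketch {

    // Assumed pattern; the sample's actual pattern is not shown in this page.
    private static final String DATE_PATTERN = "M/dd/yy";

    /** Returns {name, date} for a valid line, or null when the line should be skipped. */
    static String[] parse(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length < 2) {
            return null; // no date column present
        }
        String name = parts[0].trim();
        String date = parts[1].trim();

        SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN, Locale.ROOT);
        format.setLenient(false); // reject impossible dates such as month 13
        try {
            format.parse(date);
        } catch (ParseException e) {
            return null; // invalid date: skip this line
        }
        return new String[] {name, date};
    }

    public static void main(String[] args) {
        System.out.println(parse("Ada,12/01/12") != null); // valid line
        System.out.println(parse("Bob,13/45/12") != null); // invalid date
    }
}
```

Creating the SimpleDateFormat inside the method avoids sharing a non-thread-safe formatter between calls, at the cost of a small allocation per line.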

Finally, it writes the data to a database using JPA:

@Override
public void writeItems(List list) {
    System.out.println("writeItems: " + list);
    for (Object person : list) {
        em.persist(person);
    }
}

We’re just going to deploy the application as a web archive. Note the inclusion of the following files:

/META-INF/batch-jobs/myJob.xml
/META-INF/persistence.xml
/META-INF/create.sql
/META-INF/drop.sql
/META-INF/mydata.csv

The myJob.xml file is needed for running the batch definition.
The persistence.xml file is needed for the JPA configuration, including schema creation, data loading and schema drop.
The create.sql file has the necessary database schema for the data.
The drop.sql file has the required commands to drop the database schema created.
The mydata.csv file has the data to load into the database.

@Deployment
public static WebArchive createDeployment() {
    WebArchive war = ShrinkWrap.create(WebArchive.class)
            .addClass(BatchTestHelper.class)
            .addPackage("org.javaee7.batch.chunk.csv.database")
            .addAsWebInfResource(EmptyAsset.INSTANCE, ArchivePaths.create("beans.xml"))
            .addAsResource("META-INF/batch-jobs/myJob.xml")
            .addAsResource("META-INF/persistence.xml")
            .addAsResource("META-INF/create.sql")
            .addAsResource("META-INF/drop.sql")
            .addAsResource("META-INF/mydata.csv");
    System.out.println(war.toString(true));
    return war;
}

In the test, we simply start the batch execution and wait for it to complete. To validate the expected behaviour, we query the Metric objects available in the step execution.

The batch process itself will read and write 7 elements of type Person. A commit is executed after every 3 elements read.

@SuppressWarnings("unchecked")
@Test
public void testBatchCSVDatabase() throws Exception {
    JobOperator jobOperator = BatchRuntime.getJobOperator();
    Long executionId = jobOperator.start("myJob", new Properties());
    JobExecution jobExecution = jobOperator.getJobExecution(executionId);

    jobExecution = BatchTestHelper.keepTestAlive(jobExecution);

    List<StepExecution> stepExecutions = jobOperator.getStepExecutions(executionId);
    for (StepExecution stepExecution : stepExecutions) {
        if (stepExecution.getStepName().equals("myStep")) {
            Map<Metric.MetricType, Long> metricsMap = BatchTestHelper.getMetricsMap(stepExecution.getMetrics());

            // (1)
            assertEquals(7L, metricsMap.get(Metric.MetricType.READ_COUNT).longValue());
            // (2)
            assertEquals(7L, metricsMap.get(Metric.MetricType.WRITE_COUNT).longValue());
            // (3)
            assertEquals(3L, metricsMap.get(Metric.MetricType.COMMIT_COUNT).longValue());
        }
    }

    Query query = entityManager.createNamedQuery("Person.findAll");
    List<Person> persons = query.getResultList();

    // (4)
    assertEquals(7L, persons.size());
    // (5) expected value first, actual value second
    assertEquals(BatchStatus.COMPLETED, jobExecution.getBatchStatus());
}

(1) The read count should be 7 elements. Check MyItemReader.
(2) The write count should be the same 7 read elements.
(3) The commit count should be 3. A checkpoint is taken after every 3rd read, so the 7 elements are committed in chunks of 3, 3 and 1.
(4) Confirm that the elements were actually persisted into the database.
(5) The job should be completed.
