Batch Chunk CSV Database

Run
How to run the sample
The source code for this sample is in the javaee7-samples GitHub repository. First, get the source by cloning the repository, then change into the sample's folder:
git clone https://github.com/javaee-samples/javaee7-samples.git
cd javaee7-samples/batch/chunk-csv-database/
Now we are ready to start testing. You can run all the tests in this sample by executing:
mvn test
Or you can run an individual test by executing:
mvn test -Dtest=BatchCSVDatabaseTest

Chunk Processing - Read, Process, Write to a Database

BatchCSVDatabaseTest

The Batch specification provides a chunk-oriented processing style. It encloses a set of read, process and write operations, performed via ItemReader, ItemProcessor and ItemWriter, in a transaction. Items are read one at a time, processed and aggregated. The transaction is committed when the defined checkpoint-policy is triggered.

<?xml version="1.0" encoding="UTF-8"?>
<job id="myJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="myStep" >
        <chunk item-count="3">
            <reader ref="myItemReader"/>
            <processor ref="myItemProcessor"/>
            <writer ref="myItemWriter"/>
        </chunk>
    </step>
</job>

A very simple job is defined in the myJob.xml file: a single step with a reader, a processor and a writer. The item-count of 3 means a checkpoint is taken after every 3 items.
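
To make the chunk loop concrete, here is a minimal plain-Java sketch of what the runtime does with a reader, a processor and a writer. It does not use the Batch API: the class name ChunkLoopSketch, the run method and the trimming "processor" are hypothetical stand-ins, and filtering, skipping and real transactions are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of a chunk-oriented step (hypothetical; not the Batch API).
final class ChunkLoopSketch {

    /** Runs the read/process/write loop and returns the chunks that were "committed". */
    static List<List<String>> run(String csv, int itemCount) {
        List<List<String>> committed = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            // Reader: one item at a time; null signals the end of the input.
            while ((line = reader.readLine()) != null) {
                // Processor: transform the item (here it is only trimmed).
                buffer.add(line.trim());
                // Checkpoint: once itemCount items are buffered, write and commit them.
                if (buffer.size() == itemCount) {
                    committed.add(new ArrayList<>(buffer));
                    buffer.clear();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The final, partially filled chunk is written and committed as well.
        if (!buffer.isEmpty()) {
            committed.add(new ArrayList<>(buffer));
        }
        return committed;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = run("a\nb\nc\nd\ne\nf\ng", 3);
        // prints: 3 commits: [[a, b, c], [d, e, f], [g]]
        System.out.println(chunks.size() + " commits: " + chunks);
    }
}
```

With 7 input lines and an item count of 3, the sketch commits chunks of 3, 3 and 1 items.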

This job will read a file from the system in CSV format:

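@Override
public void open(Serializable checkpoint) throws Exception {
    // Load the CSV file from the classpath using the thread context class loader.
    reader = new BufferedReader(
            new InputStreamReader(
                    Thread.currentThread().getContextClassLoader().getResourceAsStream("/META-INF/mydata.csv")));
}

@Override
public String readItem() {
    try {
        // readLine() returns null at end of file, which signals that there are no more items.
        return reader.readLine();
    } catch (IOException ex) {
        Logger.getLogger(MyItemReader.class.getName()).log(Level.SEVERE, null, ex);
    }
    // A read failure is logged and also ends the input.
    return null;
}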

Process the data by transforming it into a Person object:

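@Override
public Person processItem(Object t) {
    System.out.println("processItem: " + t);

    // Each line is split on commas: the name first, then the date.
    StringTokenizer tokens = new StringTokenizer((String)t, ",");

    String name = tokens.nextToken();
    String date;

    try {
        date = tokens.nextToken();
        // Strict parsing: reject out-of-range values such as a month of 13.
        format.setLenient(false);
        format.parse(date);
    } catch (ParseException e) {
        // Returning null filters the item out; it is not passed to the writer.
        return null;
    }

    return new Person(id++, name, date);
}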
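
The effect of setLenient(false) in the processor can be seen in isolation with the following sketch. The class StrictDateSketch, its method names and the "M/dd/yy" pattern are assumptions made for illustration; the pattern the sample actually uses is not shown above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper showing lenient versus strict date validation.
final class StrictDateSketch {

    // Assumed pattern for illustration only.
    private static final String PATTERN = "M/dd/yy";

    /** Returns true if the text parses as a date under the given leniency. */
    static boolean isValid(String text, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setLenient(lenient);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    /** Splits "name,date" and returns the name, or null when the line should be filtered out. */
    static String nameIfValid(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2 || !isValid(fields[1].trim(), false)) {
            return null;
        }
        return fields[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(isValid("13/45/12", true));   // lenient parsing rolls the values over
        System.out.println(isValid("13/45/12", false));  // strict parsing rejects month 13
        System.out.println(nameIfValid("Penny,12/1/12"));
    }
}
```

A lenient format silently accepts a month of 13 by rolling it into the next year, which is why the processor switches leniency off before validating.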

And finally write the data using JPA to a database:

@Override
public void writeItems(List list) {
    System.out.println("writeItems: " + list);
    for (Object person : list) {
        em.persist(person);
    }
}

We’re just going to deploy the application as a web archive. Note the inclusion of the following files:

/META-INF/batch-jobs/myJob.xml
/META-INF/persistence.xml
/META-INF/create.sql
/META-INF/drop.sql
/META-INF/mydata.csv
  • The myJob.xml file is needed for running the batch definition.

  • The persistence.xml file configures JPA, including schema creation, data loading and schema removal.

  • The create.sql file has the necessary database schema for the data.

  • The drop.sql file has the required commands to drop the database schema created.

  • The mydata.csv file has the data to load into the database.

@Deployment
public static WebArchive createDeployment() {
    WebArchive war = ShrinkWrap.create(WebArchive.class)
            .addClass(BatchTestHelper.class)
            .addPackage("org.javaee7.batch.chunk.csv.database")
            .addAsWebInfResource(EmptyAsset.INSTANCE, ArchivePaths.create("beans.xml"))
            .addAsResource("META-INF/batch-jobs/myJob.xml")
            .addAsResource("META-INF/persistence.xml")
            .addAsResource("META-INF/create.sql")
            .addAsResource("META-INF/drop.sql")
            .addAsResource("META-INF/mydata.csv");
    System.out.println(war.toString(true));
    return war;
}

In the test, we just invoke the batch execution and wait for completion. To validate the expected behaviour, we query the Metric objects available in the step execution.

The batch process itself will read and write 7 elements of type Person. A commit is executed after every 3 elements read.

@SuppressWarnings("unchecked")
@Test
public void testBatchCSVDatabase() throws Exception {
    JobOperator jobOperator = BatchRuntime.getJobOperator();
    Long executionId = jobOperator.start("myJob", new Properties());
    JobExecution jobExecution = jobOperator.getJobExecution(executionId);

    jobExecution = BatchTestHelper.keepTestAlive(jobExecution);

    List<StepExecution> stepExecutions = jobOperator.getStepExecutions(executionId);
    for (StepExecution stepExecution : stepExecutions) {
        if (stepExecution.getStepName().equals("myStep")) {
            Map<Metric.MetricType, Long> metricsMap = BatchTestHelper.getMetricsMap(stepExecution.getMetrics());

            // (1)
            assertEquals(7L, metricsMap.get(Metric.MetricType.READ_COUNT).longValue());
            // (2)
            assertEquals(7L, metricsMap.get(Metric.MetricType.WRITE_COUNT).longValue());
            // (3)
            assertEquals(3L, metricsMap.get(Metric.MetricType.COMMIT_COUNT).longValue());
        }
    }

    Query query = entityManager.createNamedQuery("Person.findAll");
    List<Person> persons = query.getResultList();

    // (4)
    assertEquals(7L, persons.size());
    // (5)
    assertEquals(BatchStatus.COMPLETED, jobExecution.getBatchStatus());
}
  1. The read count should be 7 elements. Check MyItemReader.

  2. The write count should be the same 7 read elements.

  3. The commit count should be 3. A checkpoint is taken on every 3rd read, so the 7 elements are committed in chunks of 3, 3 and 1.

  4. Confirm that the elements were actually persisted into the database.

  5. Job should be completed.
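
The test reads its metrics through BatchTestHelper.getMetricsMap, whose implementation is not shown here. The sketch below is one plausible shape for such a helper, assuming it simply indexes the metrics array by type. MetricsMapSketch, and the nested Metric and MetricType stand-ins for the javax.batch.runtime types, are hypothetical so that the example compiles without the Batch API.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: turns an array of metrics into a map keyed by metric type.
final class MetricsMapSketch {

    // Stand-in for javax.batch.runtime.Metric.MetricType (only three of its values).
    enum MetricType { READ_COUNT, WRITE_COUNT, COMMIT_COUNT }

    // Stand-in for javax.batch.runtime.Metric, which exposes getType() and getValue().
    static final class Metric {
        private final MetricType type;
        private final long value;

        Metric(MetricType type, long value) {
            this.type = type;
            this.value = value;
        }

        MetricType getType() { return type; }

        long getValue() { return value; }
    }

    /** Indexes the metrics by type so that a test can look up a single counter. */
    static Map<MetricType, Long> toMap(Metric[] metrics) {
        Map<MetricType, Long> map = new EnumMap<>(MetricType.class);
        for (Metric metric : metrics) {
            map.put(metric.getType(), metric.getValue());
        }
        return map;
    }

    public static void main(String[] args) {
        Metric[] metrics = {
            new Metric(MetricType.READ_COUNT, 7L),
            new Metric(MetricType.WRITE_COUNT, 7L),
            new Metric(MetricType.COMMIT_COUNT, 3L)
        };
        System.out.println(toMap(metrics));
    }
}
```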


Help Improve

Find a bug in the sample? Something missing? You can fix it by editing the source, making the correction and sending a pull request. Or report the problem to the issue tracker.

Recent Changelog

  • Dec 14, 2014: Switch from polling on jobexecution (for job completion) to polling with joboperator and executionid by Scott Kurz
  • Oct 29, 2014: Fixed #269. npe loading the csv file from the glassfish embedded regular classloader. changed to use the thread context classloader by Roberto Cortez
  • Jul 05, 2014: Removed header license for batch xml files by Roberto Cortez
  • Jun 22, 2014: Removed header license. the licensing is now referenced in the license file in the root of the project by Roberto Cortez
  • Jun 20, 2014: Added fqn to java ee api references to generate direct links to javadocs by radcortez
  • Jun 19, 2014: Documentation clarifications and typos by radcortez
  • Jun 18, 2014: Added documentation to chunk-csv-database project by radcortez
  • Dec 31, 2013: Code style issues by Roberto Cortez
  • Dec 31, 2013: Removed servlets and jsp's by Roberto Cortez
  • Dec 24, 2013: Changed person entity id generation to be manually assigned by Roberto Cortez
How to help improve this sample
Get the source as described under "How to run the sample" above, make the changes you see fit and send a pull request!

Good Luck!