On Java Development

All things related to Java development, from the perspective of a caveman.

Parsing XML Data Elements Using Apache Digester


Introduction

This post presents Apache’s Digester project, which is built on top of the SAX API. While Digester is not the fastest option available, it is faster than DOM-based parsers and consumes less memory. Digester is also very flexible: data elements can be mapped to almost any data object, e.g. ArrayLists, Maps, entities, etc.
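Unlike DOM, SAX does not build an in-memory tree. It reports start-element, character, and end-element events as it streams through the document. Digester listens to those events and fires its rules when the current element path (a slash-separated list of nested element names) matches a rule's pattern. The minimal sketch below uses only the JDK's SAX parser, not Digester, to show how such a path can be tracked from the events. The element names in the usage example are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Plain-JDK SAX sketch (not Digester): records the slash-separated element
// path at every startElement event, the kind of path Digester matches rules against.
class ElementPaths {

    static List<String> of(String xml) {
        final List<String> paths = new ArrayList<>();
        final Deque<String> stack = new ArrayDeque<>(); // open elements, root first

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                stack.addLast(qName);                  // entering a nested element
                paths.add(String.join("/", stack));    // e.g. "a/b/c"
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.removeLast();                    // leaving the current element
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return paths;
    }
}
```

For example, `ElementPaths.of("<prescribers><prescriber-text><prescriber-id>1</prescriber-id></prescriber-text></prescribers>")` returns `[prescribers, prescribers/prescriber-text, prescribers/prescriber-text/prescriber-id]`. Only the current path is kept in memory, which is why SAX-based parsing stays lean on large files.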

Presented below are two examples showing how Digester can be used to isolate data that will be persisted as entities in a database.

 

The XML

Presented below is the XML segment to be processed. The information relates to doctors (prescribers of medication) and their related specialties.

The goal of the XML parsing is to isolate the information needed to create both prescriber and specialty records that can be correctly identified using the prescriber-id element. The prescriber record will contain the name and address information, while each specialty record contains only the specialty (along with the prescriber-id that links it to its prescriber).

Looking at each of the three groups of prescriber-id data, we see there is one specialty for the first group and two each for the second and third groups. At the end of the XML parsing, there should therefore be three prescriber records and five specialty records, each containing its related prescriber-id.
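The counting above can be checked against a small sample. The sketch below builds a hypothetical XML string that mirrors the grouping just described: one specialty in the first group and two each in the second and third. Only prescriber-text and prescriber-id come from the post. The root element prescribers, the specialty element, and all values are assumptions. The JDK's DOM parser is used here purely as a quick sanity check on a tiny input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sanity check of the expected record counts against a HYPOTHETICAL sample.
// Assumed names: root "prescribers", child "specialty"; values are placeholders.
class ExpectedCounts {

    static final String SAMPLE =
          "<prescribers>"
        + "<prescriber-text><prescriber-id>P1</prescriber-id>"
        + "<specialty>Specialty A</specialty></prescriber-text>"
        + "<prescriber-text><prescriber-id>P2</prescriber-id>"
        + "<specialty>Specialty B</specialty><specialty>Specialty C</specialty></prescriber-text>"
        + "<prescriber-text><prescriber-id>P3</prescriber-id>"
        + "<specialty>Specialty D</specialty><specialty>Specialty E</specialty></prescriber-text>"
        + "</prescribers>";

    // Counts every element with the given name anywhere in the document.
    static int count(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With this sample, `ExpectedCounts.count(ExpectedCounts.SAMPLE, "prescriber-text")` returns 3, one per prescriber record. `ExpectedCounts.count(ExpectedCounts.SAMPLE, "specialty")` returns 5 (1 + 2 + 2), one per specialty record.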

 

Parsing XML for the Prescriber Records

Presented below is the DigesterPrescriberParser class, which parses the name and address data elements that are to be persisted to the Prescriber file (INGPSP) using Hibernate.

The logic reads like this:

  • (Line 38) – Create an instance of the Digester class so that parsing rules can be established, allowing it to store each data element for the prescriber record.
  • (Line 44) – As the first rule, when the first record of the XML file is read, create an instance of the DigesterPrescriberParser. Using Java Reflection, this gives the digester object access to the methods within it, namely, the methods addRecordToList and getListOfRecords. How they will be used is shown later.
  • (Line 47) – When the element for prescriber-text is read, instruct the digester object to create an instance of class INGPSP, which is the entity class for the prescriber file of the same name.
  • (Line 50-58) – For each element representing field data for the prescriber file, call the appropriate setter method of INGPSP.
  • (Line 62) – Finally, when the closing prescriber-text element is read, instruct the digester object to call the addRecordToList method of class DigesterPrescriberParser. This adds a fully populated INGPSP entity object to an ArrayList that will be read when processing of the XML completes.
  • (Line 73-74) – This is where the rules are actually processed and the INGPSP objects are created and stored. The ArrayList listOfRecords will contain these objects.

 
These are the resulting Prescriber records (truncated for display).
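To make the mechanics of those rules visible, here is a sketch that uses only the JDK's SAX parser, without Digester. It creates an object when prescriber-text starts, which is roughly the job of Digester's addObjectCreate rule. It calls a setter as each field element ends, as addBeanPropertySetter or addCallMethod would. When prescriber-text ends, it hands the finished object to addRecordToList, as addSetNext would. This is a minimal sketch under stated assumptions, not the post's DigesterPrescriberParser. The Prescriber class is a hypothetical stand-in for the INGPSP entity, and the name and address element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the INGPSP entity; field names are assumptions.
class Prescriber {
    String prescriberId;
    String name;
    String address;
}

// Plain-JDK SAX sketch of the prescriber rules (not Digester itself).
class PrescriberParserSketch extends DefaultHandler {

    private final List<Prescriber> listOfRecords = new ArrayList<>();
    private final Map<String, BiConsumer<Prescriber, String>> setters = new HashMap<>();
    private final StringBuilder text = new StringBuilder();
    private Prescriber current;

    PrescriberParserSketch() {
        // Element name -> setter, similar in spirit to addBeanPropertySetter.
        setters.put("prescriber-id", (p, v) -> p.prescriberId = v);
        setters.put("name", (p, v) -> p.name = v);       // hypothetical element
        setters.put("address", (p, v) -> p.address = v); // hypothetical element
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if ("prescriber-text".equals(qName)) {
            current = new Prescriber();                  // like addObjectCreate
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                  // collect element text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("prescriber-text".equals(qName)) {
            addRecordToList(current);                    // like addSetNext
            current = null;
        } else if (current != null && setters.containsKey(qName)) {
            setters.get(qName).accept(current, text.toString().trim());
        }
        text.setLength(0);
    }

    void addRecordToList(Prescriber record) {
        listOfRecords.add(record);
    }

    List<Prescriber> getListOfRecords() {
        return listOfRecords;
    }

    static List<Prescriber> parse(String xml) {
        PrescriberParserSketch handler = new PrescriberParserSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.getListOfRecords();
    }
}
```

Elements with no registered setter (for example a specialty element) are simply ignored, which mirrors how Digester only acts on patterns that have rules.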

 

Parsing the XML for the Specialty Records

Presented below is the DigesterSpecialtyParser class, which parses the data elements to be placed into each field of the Specialty file (INGSPP). To create each specialty record, the prescriber-id must be captured once and included in a field of every specialty record that is to be persisted to the Specialty file using Hibernate.

The logic reads like this:

  • (Line 28) – Create an instance of the Digester class so that parsing rules can be established allowing it to store each data element for the specialty record.
  • (Line 35) – As the first rule, when the first record of the XML file is read, create an instance of the DigesterSpecialtyParser. Using Java Reflection, this gives the digester object access to the methods within it, namely addRecordToList and getListOfRecords. How they will be used is shown later.
  • (Line 38) – When the element for prescriber-text is read, instruct the digester object to create an instance of class ArrayList. This ArrayList will hold other ArrayLists, each containing the prescriber ID in element zero followed by the related specialty descriptions. Once the specialty descriptions have been accumulated into one of these inner ArrayLists, it is added to the ArrayList created here.
  • (Line 38) – Create the second ArrayList to hold the prescriber ID and the specialty descriptions described above.
  • (Line 42-43) – This code calls the add method of the ArrayList to add the prescriber ID to the second ArrayList.
  • (Line 50) – This adds the ArrayList object containing the related Prescriber IDs and Specialty Descriptions to the first ArrayList.
  • (Line 53) – Call the method to save this ArrayList so that it will be available to the caller when needed.

Shown below is an image from a debugging session. It shows the relationship between the first ArrayList, prescriberIDList, and the three ArrayList objects it contains, which represent the three groups of specialty descriptions. Element zero of each of these ArrayLists contains the prescriber ID.


 
This shows the data elements of each ArrayList.
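The nested structure in the debugging session can be reproduced with a short sketch. It builds an outer list (named prescriberIDList, as in the post) with one inner list per prescriber-text group. The prescriber ID is always in element zero, followed by that group's specialty descriptions. This is a plain-JDK SAX illustration, not the post's DigesterSpecialtyParser, and the specialty element name is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Plain-JDK SAX sketch (not Digester): outer list of inner lists, one per
// prescriber-text group; inner list = [prescriberId, specialty, specialty, ...].
// The "specialty" element name is an assumption.
class SpecialtyParserSketch extends DefaultHandler {

    private final List<List<String>> prescriberIDList = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private List<String> currentGroup;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if ("prescriber-text".equals(qName)) {
            currentGroup = new ArrayList<>();          // inner list for this prescriber
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (currentGroup != null) {
            String value = text.toString().trim();
            if ("prescriber-id".equals(qName)) {
                currentGroup.add(0, value);            // keep the ID in element zero
            } else if ("specialty".equals(qName)) {
                currentGroup.add(value);               // append a specialty description
            } else if ("prescriber-text".equals(qName)) {
                prescriberIDList.add(currentGroup);    // inner list joins the outer list
                currentGroup = null;
            }
        }
        text.setLength(0);
    }

    static List<List<String>> parse(String xml) {
        SpecialtyParserSketch handler = new SpecialtyParserSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.prescriberIDList;
    }
}
```

Inserting the ID at index 0, rather than appending it, keeps the "element zero is the prescriber ID" convention even if a specialty element happened to appear before prescriber-id within a group.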

Now that the data elements for each specialty record have been isolated and associated with the prescriber ID, it is easy to transfer them into the correct Hibernate entity and write them out to the file. Shown below is the list of records as returned by an SQL SELECT statement. The field SPPRID contains the prescriber ID, and the field SPSPCL contains the specialty.

 
These are the resulting Specialty records (truncated for display).
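Turning the isolated groups into one row per specialty is a simple flattening step. Element zero of each inner list supplies SPPRID, and every following element supplies one SPSPCL value. The sketch below assumes the List<List<String>> shape shown in the debugging session. SpecialtyRecord is a hypothetical stand-in for the INGSPP entity (only the column names SPPRID and SPSPCL come from the post), and the Hibernate persistence step is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the INGSPP entity: spprid = prescriber ID,
// spspcl = specialty (column names from the post; the class itself is assumed).
class SpecialtyRecord {
    final String spprid;
    final String spspcl;

    SpecialtyRecord(String spprid, String spspcl) {
        this.spprid = spprid;
        this.spspcl = spspcl;
    }

    @Override
    public String toString() {
        return spprid + " | " + spspcl;
    }
}

class SpecialtyRecordBuilder {

    // Each inner list: element zero = prescriber ID, remaining elements = specialties.
    static List<SpecialtyRecord> toRecords(List<List<String>> prescriberIDList) {
        List<SpecialtyRecord> records = new ArrayList<>();
        for (List<String> group : prescriberIDList) {
            if (group.isEmpty()) {
                continue; // no prescriber ID captured, so nothing to attach specialties to
            }
            String prescriberId = group.get(0);
            for (String specialty : group.subList(1, group.size())) {
                records.add(new SpecialtyRecord(prescriberId, specialty));
            }
        }
        return records;
    }
}
```

Applied to three groups holding one, two, and two specialties, toRecords returns five records. Each record carries its own prescriber ID, matching the five specialty records expected at the start of the post.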

 
Summary
This post presented Apache’s Digester, a flexible XML parsing engine, and showed how it can be used to isolate element data and create entity objects to be persisted to a database. For additional information about XML anatomy, see this post.

Written by admin

August 15th, 2014 at 11:56 am

Posted in XML
