With Set functions, you can use the Data Mapper to sort, filter and manipulate your mapped data in various ways. Some of the Set functions vary considerably depending on the Node and Context inputs you provide. In this section we will go through an example in which we use a few of the more complex Set functions, outlining how to get effective use out of the Context and Node inputs.
Look at the following input source data, from which complies with the books_source.xsd schema:
Source XML |
Copy Code
|
---|---|
<library> <section name="Fiction"> <book> <title>War and Peace</title> <author>Leo Tolstoy</author> <publication>1869</publication> <isbn>1234567890</isbn> <note>Abridged</note> <note>Harper Collins</note> <note>Classics</note> <room>342</room> <room>12</room> </book> <book> <title>Pride and Prejudice</title> <author>Jane Austen</author> <publication>1813</publication> <isbn>2345678901</isbn> <note>Penguin</note> <room>45</room> <room>109</room> <room>45</room> </book> <book> <title>Animal Farm</title> <author>George Orwell</author> <publication>1945</publication> <isbn>3456789012</isbn> <note>1st Edition</note> <note>Random House</note> <room>50</room> <room>120</room> </book> </section> <section name="Non-Fiction"> <book> <title>Testament of Youth</title> <author>Vera Brittain</author> <publication>1933</publication> <isbn>0987654321</isbn> <note>Biography</note> <room>12</room> </book> <book> <title>Meditations on First Philosophy</title> <author>Rene Descartes</author> <publication>1641</publication> <isbn>9876543210</isbn> <note>Philosophy</note> <note>Latin</note> <room>43</room> <room>67</room> </book> <book> <title>The Origin of Species</title> <author>Charles Darwin</author> <publication>1872</publication> <isbn>3456789012</isbn> <note>6th Edition</note> <note>Science</note> <room>11</room> </book> </section> </library> |
Notice that the source has multiple nested elements. We are going to use this data to demonstrate the Set functions in the Data Mapper, with particular focus on Filter, Sort and Distinct, although we will cover some of the other options along the way. Our target schema is books_target.xsd:
The data models library books for a school or college. Here is the Mapper with source and target imported:
First let's filter the data while we map it. Let's say that we only want to map items representing books published since 1800. Drag a Filter component onto the Mapper.
The Filter function takes two inputs. The Nodes input represents the items you want to filter, and the Bool input represents a boolean indicating the result of a test. Since we want to only map those books published since 1800, we need to create a conditional. Each time the Mapper iterates through the data, when it encounters a "book" element, we want it to check the "publication" value to determine whether it is at least 1800. To accomplish this, drag a Greater Than or Equal component from the Comparator section onto the Mapper.
Now we need to represent the value of 1800, so drag a Constant Value from the Data Type section onto the Mapper.
Right-click the Constant and choose Show Properties. Choose an integer type from the Data Type drop-down list and enter "1800" as the Value.
Connect the "publication" item in the XML Reader to the Value 1 input in the Greater Than or Equal function, then the Constant output to the Value 2 input.
Now the output of the conditional Greater Than or Equal test will be true if the publication date for a book is at least 1800, false if not. Connect the output of the Greater Than or Equal function to the Bool input of the Filter function. Remember what it is that we want to filter depending on the date: the "book" element. Connect the "book" output in the XML Reader to the Nodes input in the Filter function. Notice once the Filters Nodes input is connected an output node is created which mirrors the input type.
Now the output of the Filter function is the set of "book" elements with "publication" values greater than or equal to 1800 - those elements for which the conditional test returned a true value. Each time the Mapper encounters a "book" element, it carries out the conditional test. If the test returns true for a particular item, the Filter function will pass it forward as output. If a test returns false, that item will not be mapped as output from the Filter function.
We want to carry out further transformations on the element before feeding it into the XML Writer, so don't connect its output just yet.
Let's now look at sorting the data. Assume we want to sort the "book" elements once they have been filtered. We want to sort them on the "title" item value. Drag a Sort function from the Component Palette onto the Mapper.
The Sort function takes two inputs. The Nodes input represents the nodes you want to sort, and the Sort Key represents the item you want to sort these nodes on. Since we want to sort on the title, connect the Sort Key input to the "title" output in the XML Reader. Connect the Nodes input to the Filter function output so that we sort the "book" items already filtered. We want each "title" value to determine the ordering of its parent "book" element within each "section" - but we are only going to be mapping those "book" elements we have already filtered.
Now connect the Sort output to the "volume" input in the XML Writer.
What we have done so far is mapped the "book" element in the source to the "volume" element in the target, filtering and sorting it at the same time. Although we have used the "title" and "publication" items, we have not yet mapped them. They are purely being used as reference points for the parent "book" element at this stage.
Let's now incorporate the Distinct function. If you look back at the source XML, particularly the second "book" element, you will notice that the list of "room" elements contains duplicate values. Let's assume that this is a consequence of the source application, perhaps due to the data being corrupted in some way, but that we don't want to map any duplicates into our target data. Drag a Distinct function onto the Mapper.
The Distinct function takes two inputs. The Value input represents the item you want to map only unique instances of. The Context input represents the context in which the Value must be unique.
We want to make sure we only map values of the "room" element that are unique within each parent "book" element. In other words, the same "room" element can appear more than once within the data, as long as it doesn't appear more than once for each "book". Connect the "book" output in the XML Reader to the Context input in the Distinct function, since this is the context that the item must be distinct within. Connect the "room" item in the XML Reader to the Value item in the Distinct function.
Now connect the Distinct output to the "used_in" input in the XML Writer, so that only unique values will be included in each "volume" item.
Some of the Set functions allow you to limit the range or number of values that are mapped. Let's assume that we want to impose a limit on the number of "note" items we map. If we want to map only one item, we can use either the First or Last function - these map the first and last items encountered respectively. However, let's use the slightly more complex Top function here. With Top, you can map the first n items. Drag a Top function onto the Mapper.
We need to specify how many items we want to map. Drag a Constant onto the Mapper and right-click it, choosing Show Properties. Enter an integer Data Type from the drop-down list and the number 2 as the Value.
Connect the Constant output to the Count input of the Top function, and the Nodes input to the "note" output in the XML Reader.
Connect the Top output to the "info" input in the XML Writer.
Now we are ready to map the remaining connections. Connect "library" to "dept_library", "section" to "department", "name" to "name", "title" to "title", "author" to "writer", "publication" to "year", and "isbn" to "ref".
We can now execute the transform by pressing Shift-F5 or the Execute button (). The transform is applied and the file we selected as output opens in the editor:
Output XML |
Copy Code
|
---|---|
<dept_library> <department> <name>Fiction</name> <volume title="Animal Farm" writer="George Orwell"> <year>1945</year> <ref>3456789012</ref> <info>1st Edition</info> <info>Random House</info> <used_in>50</used_in> <used_in>120</used_in> </volume> <volume title="Pride and Prejudice" writer="Jane Austen"> <year>1813</year> <ref>2345678901</ref> <info>Penguin</info> <used_in>45</used_in> <used_in>109</used_in> </volume> <volume title="War and Peace" writer="Leo Tolstoy"> <year>1869</year> <ref>1234567890</ref> <info>Abridged</info> <info>Harper Collins</info> <used_in>342</used_in> <used_in>12</used_in> </volume> </department> <department> <name>Non-Fiction</name> <volume title="Testament of Youth" writer="Vera Brittain"> <year>1933</year> <ref>987654321</ref> <info>Biography</info> <used_in>12</used_in> </volume> <volume title="The Origin of Species" writer="Charles Darwin"> <year>1872</year> <ref>3456789012</ref> <info>6th Edition</info> <info>Science</info> <used_in>11</used_in> </volume> </department> </dept_library> |
The Set functions have all been applied as part of the mapping process. Only those "volume" elements whose publication date is at least 1800 are included (Filter function). A maximum of 2 "info" elements are included per "volume" (Top function). Within each "department", the "volume" elements are sorted alphabetically by "title" (Sort function). Finally, the "used_in" elements are all unique within their parent elements (Distinct function).