Friday, September 7, 2018

Adventures In Mapping Multivalues - Complex Data Parsing

Splitting Fields

A common situation in Pick is a compound key in one file that links to another file.  As in the case of the control file, you may have a record with a key like C*RED.  Somewhere in your data, you may store that option value in a multivalue, and you may wish to display the fact that the colour "RED" was selected.  To do that, you could reference a dictionary that does a group extract on that field. The group extract "G1*1" says skip past the first "*" and extract one "*"-delimited group; in our case it would give you the value "RED".
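
As a sketch, a dictionary item built that way might look like this. The item name and attribute number are hypothetical, and whether the extract goes on line 7 or 8 varies by Pick flavour; here it sits on line 8 as a correlative:

COLOUR.CODE
001 A
002 5
003 Colour
...
008 G1*1
009 L
010 10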

With this structure, our mapping tool would let you pick a dictionary that gave you the whole value of the foreign key, and also a dictionary that used the group extract to show you the extracted, or computed, value.

Sometimes the logic for doing this would get quite convoluted, and while I have seen massive correlatives in dictionaries, these are hard to maintain and very hard to create properly.  In these cases, our Select Method subroutines offered a simple way of doing this type of processing.  To illustrate, let's look at this example Pick record:

0001^18486^ABC Company^WIDGET]GIDGET^C*GOLD]C*RED\SZ*SMALL^2]1^11595]22075

Note that everything up to the first attribute mark (^) is attribute 0, the item-id (attributes are zero-based); "]" signifies a value mark, and "\" signifies a sub-value mark.
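
Broken out attribute by attribute, the record reads:

Attribute 0 (item-id): 0001
Attribute 1: 18486 (the internal date for 11 Aug 2018)
Attribute 2: ABC Company
Attribute 3: WIDGET]GIDGET
Attribute 4: C*GOLD]C*RED\SZ*SMALL
Attribute 5: 2]1
Attribute 6: 11595]22075 (prices with two implied decimals)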

In order to process this option data, we want SQL to see a Colour field and a Size field. No such attributes exist in Pick, so we could start by choosing two attribute positions that are not used by real data.  Let's say we look at the file and find there are never more than 23 attributes, and no valid dictionaries reference anything higher than 23. We could leave some room for growth and pick attributes 31 and 32 as dummy attributes.

Next we'd create a dummy dictionary for attribute 31, called Colour, and one for attribute 32, called Size. We'd use the comment field of each dictionary to note that it's used by the Liberty mapping for a computed value.  We need the dictionaries so the mapping wizard can create the SQL column mappings.

COLOUR
001 A
002 31
003 Colour
...
009 L
010 10
011 Liberty Mapping Virtual Field

SIZE
001 A
002 32
003 Size
...
009 L
010 10
011 Liberty Mapping Virtual Field


Then we'd create this subroutine:

SUBROUTINE SELECT.METHOD.INVOICE(ID, ITEM, PROCEED)
EQU VM TO CHAR(253)
EQU SVM TO CHAR(252)
PROCEED=1 ;* always process the record; this filter only enriches it
* Run through multivalues - invoice lines
FOR M=1 TO DCOUNT(ITEM<4>,VM)
   * Run through subvalues in the current multivalue
   * these are the option values for that invoice line
   FOR S=1 TO DCOUNT(ITEM<4,M>,SVM)
      OPTION=ITEM<4,M,S>
      TYPE=FIELD(OPTION,"*",1)
      BEGIN CASE
      CASE TYPE EQ "C"
         * the multivalue position must match
         * but there will only be one Colour option
         ITEM<31,M>=FIELD(OPTION,"*",2)
      CASE TYPE EQ "SZ"
         * the multivalue position must match
         * but there will only be one Size option
         ITEM<32,M>=FIELD(OPTION,"*",2)
      END CASE
   NEXT S
NEXT M
RETURN
END

The SQL might look like this:

SELECT * FROM INVOICE_MASTER
Invoice_Id Invoice_Date Customer...........................
      0001  11 Aug 2018 ABC Company

SELECT * FROM INVOICE_LINES
Invoice_Id Line_Id Product Quantity Price.... Colour. Size.
      0001       1 WIDGET         2    115.95 GOLD    NULL
      0001       2 GIDGET         1    220.75 RED     SMALL

The benefit of this is that you don't need a third view for the options, since in our hypothetical case we know that any invoice line will only ever have one option of each type assigned to it.
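
For contrast, without the virtual fields the options would have needed their own subvalue-level view. A hypothetical INVOICE_OPTIONS might have looked like this:

SELECT * FROM INVOICE_OPTIONS
Invoice_Id Line_Id Option_Id Type Value
      0001       1         1 C    GOLD
      0001       2         1 C    RED
      0001       2         2 SZ   SMALL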

Mixed Types

One of our customers that did deliveries had a very unusual delivery field.  You could put a date and time in, or you could put free text in.  The value might be 2018-09-12 15:30:00, or it might be "Tomorrow, any time before noon".  There was value in being able to sort the entries that were real dates, but you also had to deal with the text. We simply did a format check and created two virtual fields: one held the properly formatted dates and times, the other held the text.
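
A sketch of that format check might look like the following. The attribute numbers are hypothetical (say the raw delivery field is attribute 9, with dummy attributes 33 and 34 as the two virtual fields), and the "D4-" conversion assumes the dates were keyed as YYYY-MM-DD:

SUBROUTINE SPLIT.DELIVERY(ID, ITEM, PROCEED)
PROCEED=1
RAW=ITEM<9>
* ICONV returns null when the string won't convert as a date,
* so it acts as our format check
IF RAW NE "" AND ICONV(FIELD(RAW," ",1),"D4-") NE "" THEN
   * a real date (and time) - goes to the sortable virtual field
   ITEM<33>=RAW
END ELSE
   * free text like "Tomorrow, any time before noon"
   ITEM<34>=RAW
END
RETURN
END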

As you can see, Select Methods were an incredibly powerful tool for mapping multivalues.

Sunday, September 2, 2018

Adventures in Mapping Multivalues - Data Cleansing on the Fly

In my last post, I talked about how we used a powerful feature, which we alternately called Select Methods or simply Filters, to solve a problem with control files.  In this post, I'll take you through one of the powerful tricks that Select Methods let you play to control your data.

Select Methods, or Filters, are simply Pick/BASIC subroutines that get called for each record read from the file, before any processing is done on that record.  They have the following signature:

SUBROUTINE SUB.NAME(ID, ITEM, PROCEED)

It is generally recommended that you not change ID, but we won't stop you. ITEM is the item body, and as you will see, changing it is quite acceptable. In the last post, we showed how you would use PROCEED to indicate whether a particular record should be processed for the current view or skipped.
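
To make the contract concrete, the simplest possible Select Method is a pass-through that accepts every record unchanged:

SUBROUTINE PASS.THROUGH(ID, ITEM, PROCEED)
* accept every record, leaving the item body untouched
PROCEED=1
RETURN
END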

Data Cleansing on the Fly

To understand the importance of this feature, you need to understand something we've mentioned in previous posts. You can define dictionary items for a file if you wish, but they are merely suggestive. Your BASIC programs, and any number of TCL commands (TCL, the Terminal Control Language, was the command line of Pick systems), could write data in any format into the file.  For example: a programmer fixing bad data accidentally copies the corrected record into the wrong file, in the wrong format for that file; a BASIC program with a bug writes a text string into a numeric field; or some kind of corruption occurs and the data in a record becomes gibberish.

In the SQL world you just can't do that, so a query would crash if it encountered bad data.  We used to have customers insist that their data was good, so we would run our validator on their files, and it would start kicking out reports of bad data. In all the years that I worked with Pick customers, I NEVER ran into one that did not have bad data in at least a few of their key files. In most cases, every file with more than 10,000 records had some bad data.

Customers had two options. One was to fix the bad data our validator found. This was great, except that we would always ask how the bad data got there in the first place: the next time bad data arrived, the query would crash again.  In cases where data was being corrupted by some unknown program or process, there were a couple of simple fixes that could be done using Select Methods:

The first option would be to use a Select Method as a filter. Let's say you have two fields, attributes 5 and 7, that need to be numeric.  The following program detects when that data is not numeric and tells the engine to skip the record. As noted in the comments and the omitted code, you can also write a message with the item-id into a log file so you can fix the record later.  This is the better option when you think the record may be complete garbage and you really don't want to play with it until you've had a look.

Here is what that subroutine would look like:

SUBROUTINE VALIDATE.MYFILEDATA(ID, ITEM, PROCEED)
IF NOT(NUM(ITEM<5>)) OR NOT(NUM(ITEM<7>)) THEN
   PROCEED=0
   * Optionally add code here to write the ID to an error log
   * file so someone can review it
END ELSE
   PROCEED=1
END
RETURN
END
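
If you do want the log, the omitted code might look something like this sketch. ERROR.LOG is a hypothetical file, and opening it on every call is kept simple here rather than efficient:

OPEN "ERROR.LOG" TO LOG.FILE THEN
   * record which item had the bad data so someone can review it
   WRITE "Non-numeric data in attribute 5 or 7" ON LOG.FILE, ID
END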

The other option is to cleanse the data on the fly. This is what that might look like:

SUBROUTINE FIX.MYFILEDATA(ID, ITEM, PROCEED)
PROCEED=1
IF NOT(NUM(ITEM<5>)) THEN
   ITEM<5>=0
END
IF NOT(NUM(ITEM<7>)) THEN
   ITEM<7>=0
END
RETURN
END

This option makes more sense if you know which attributes are getting the bad data but have reasonable confidence that the rest of the record is good.  This latter approach was one of the most common uses of our Select Methods.