The ProblemIn a previous post, I talked about what a Pick database was. Pick had a way of representing complex data that was intuitive and matched the structure of common documents that it was modeling very closely. It was a very human-friendly way to represent that data. Note that this is much the same argument for XML data (that's for another blog post...)
In this post, I'm going to talk about how this might look in a relational database, then I'll talk about two different approaches that Pick uses to represent this kind of data:
- Correlated Multivalues
Our Document DataSo, let's consider the example we gave of our Invoice. Here is how it might look in a printed document, with no consideration of how it actually is stored in the database:
Invoice 0001 Date 11 Aug 2018 Customer ABC Company
Product Options Quantity Price............ Extended...
WIDGET GOLD 2 $115.95 $231.90
GIDGET RED 1 $220.75 $220.75
Note that you have an invoice number, which should be a unique key, a date, and customer, that are all fields that have a single value. Then you have a bunch of fields that have more than one value: Product, Options, Quantity, Price. We also have two calculated fields, one being a subtotal, and the other being the extended price (Quantity times Price), which has multiple values, one for each product line. Note also that the Options value can have multiple values for each product value, so this is a sub-multivalued field, or subvalued for short.
Relational (SQL) Model
SELECT * FROM INVOICE_MASTER
Invoice_Id Invoice_date Customer...........................
0001 11 Aug 2018 ABC Company
SELECT * FROM INVOICE_LINES
Invoice_Id Line_Id Product Quantity Price....
0001 1 WIDGET 2 115.95
0001 2 GIDGET 1 220.75
SELECT * FROM INVOICE_OPTIONS
Invoice_Id Line_Id Option_Id Option
0001 1 1 GOLD
0002 1 1 RED
0002 2 1 SMALL
Let's talk about some points here. In the INVOICE_LINES table, we've added a Line_Id field, which combined with the Invoice_Id would be the primary key (unique identifier). If you knew for sure that you would never repeat a product in another line of your invoice, that any product would be unique within any invoice, you could skip that Line_Id field and use the Product value with the Invoice_Id as the primary key. That assumption would probably bite you. (In a multivalued database, you never have to create artificial keys like this.) The same thing happens twice over in the INVOICE_OPTIONS table. This mess of decisions, keys, and additional values creates code entropy, as programmers are required to do multiple reads, then navigate multiple data sets to process the data. One is generally safer to create a new key in these cases, but then you have a task of managing these keys when you write the data back, or insert (or delete) new invoice lines or options.
The next two fields are the invoice date (number of days since Dec 31, 1967) and the customer name. They both have just one value in them.
The next 4 fields contain multivalues. The first value of each of these fields is the data for line 1 of the invoice. The second value of each of these fields is the data for line 2 of the invoice. For the Options field, the first value contains just that value. The second value contains two subvalues. The first one is the first option line for the second invoice line, the second is the second option line for the second invoice line.
For a Pick BASIC programmer who has read this record into a variable called X, he would access the first invoice line's product with this code:
The second line's product value would be this:
SubfilesA less common, but still often used approach to representing this data is called a subfile. In a subfile, all the detail line data is contained in a single field. Here is how it would be represented:
All of your detail lines are contained in attribute 3. The first multivalue is all of invoice line 1's data. The second multivalue is invoice line 2's data. The first subvalue of each multivalue is always the product, the second subvalue is the options, the third is the quantity and the fourth subvalue of each multivalue is the price.
Note that we ran out of delimiters, so we had to use a comma separator two delimit the second line's two option values.
So here is what a Pick BASIC programmer would do to get the first invoice line's product:
You need to specify that you only want that first subvalue, or you'll get the whole invoice line. That may be handy in itself, but then you'd have to do a second line to get that data out.
The second line's product value would be this:
This line gets a bit more involved. The angle brackets only work with system delimiters, so it retrieves both option values, with the comma separator. The FIELD statement says to pull the 2nd comma delimited value out.
Again, your record is really the entire document. Note also, that the BASIC implementation has data access and handling functionality built in. You don't have to call out to an API as it's all there!
Liberty Mapping TerminologyLiberty ODBC's SQL Mapping layer handled both of these structures very simply.
The Correlated Multivalues were termed a "Group Floating" table. If you were dealing with Correlated Subvalues, you had a "Group Floating within Group Floating" table.
The subfiles were called a "Group Positional" table. By using these structures, you would get the equivalent to the above. You could define SQL columns that were the multivalue or subvalue offsets, effectively giving you the Line_Id and Option_Id fields.
When joining two tables where one was actually contained inside the other, our engine would optimize and read the Pick record only once for efficiency sake.