Sunday, July 1, 2018

ODBC for PICK - Why You Shouldn't

Fundamental Mismatches

There were a number of very key, fundamental mismatches between PICK databases, and the ODBC model.  Enough in number and severity that many would say it's impossible, or "don't even try", but the group of pioneers that assailed this opportunity were not deterred by a few minor challenges.  Here are some of the things that we had to address.

Two Main Parts of ODBC

ODBC consisted of two main components. One was a SQL syntax, the other was an implementation of a Call Level Interface (CLI), which is much the same as an Application Programming Interface (API).  As an implementer of a driver, you could declare yourself as supporting one of three levels of each.  In our case we implemented the middle level of both.

Structured Query Language (SQL)

This was the biggest one, and I'll address it better in a separate post, but will touch on it here.  ODBC assumed SQL. While there were ways to bypass that, most tools that worked with ODBC would assume a certain level of SQL, and if you reported that you didn't support it, they'd just give up. So, if you wanted to support ODBC, you had no choice. You had to support SQL.  At the time that ODBC was released, while it was gaining popularity rapidly, both ISAM and Multivalued databases still outnumbered SQL databases.

SQL demanded certain norms, including that it was strongly typed and that it abhorred multivalues.  Multivalue databases like PICK and MUMPS (which was the other popular multivalued system, especially in health care) were, as the name implies, designed to work with multivalues. Again, I'll cover this in more detail elsewhere.  In short, the data models supported by each were very widely different.

Strong vs. Weak Data Typing

In PICK, everything is a string.  If anyone is familiar with XML, there are some definite similarities.  In XML you can have numbers (but they are simply strings containing numeric characters), and you can even have a schema that declares a certain XML element to be numeric, but in the end, if you don't enforce the schema in code, you can put what you want in that XML element. 

PICK had similar structures. In PICK, much like an XML schema, you had dictionaries. It was quite common over time, for PICK dictionaries to collect garbage. You'd have several conflicting dictionaries pointing at the same attribute (the PICK name for a field), and only one of them was really correct, or perhaps none of them were correct!  And it was not uncommon for a file's dictionary level to be empty.  Dictionaries were not enforced, but were useful for doing LIST, SORT, SELECT or SSELECT commands.  LIST and SORT were for creating user reports and SELECT or SSELECT were for activating a list of unique primary keys (called item-ids in PICK terminology.)

The CLI portion of ODBC, meanwhile, was designed for programming languages designed to produce machine code, and perform machine-level calls (hence "Call Level Interface").  This included Assembler, C, C++, and COBOL among other languages.  These languages were all strongly typed.  You would declare a variable as being a string (null terminated - no 16-bit Unicode in the early days - that came later) or an integer. The CLI had its own data types and would even declare the size of the storage for an integer.  The CLI had details of the sizing and precision of each data type, which you had to map to your machine's (or compiler's) data types.  This guaranteed that regardless whether your machine called a 32-bit integer a small int or a long int, you knew how big the data coming back from, or going into, the CLI had to be, and therefore could allocate enough space for it.  This allowed for interoperability between machine types, which was a key requirement for the standard.

Multi-threaded vs. Single User

PICK was a legacy system, originally built on the premise that a user was connecting to an expensive user license through a single, predefined serial port that ran through the building right to their terminal (or PC with a terminal emulator, like PK Harmony or Wintegrate.)  There was no multi-tasking and definitely no multi-threading in that system.

While ODBC didn't require you to support multi-threading, most applications using it expected to open multiple connections to do their work. Now, if you had two serial ports, you could open two connections, but this cost a lot, and required you to run another serial cable to your PC.  Even when we went to networked connections, the license cost for an extra PICK user was prohibitive, so this was another issue we needed to resolve.

Authentication and Authorization

In a pre-public-internet, pre-malicious-hacker age, we were very much aware of the security issues that we were likely to raise, and took these very seriously.  Although it was only a reality for a small number of users, network connectivity was already a reality for some, and we knew it was only a matter of time before it became the norm. We already had people dialing up systems over the telephone network, and I had seen people using dial-up, Prime-Net and then Internet (pre-commercial), and get connected to the wrong Prime Information system. This system was a NORAD system and had no password on the SYSPROG account (the PICK equivalent of root!) We logged off very quickly!

PICK's authentication and authorization model at that time was very weak and not designed for a broad network-connected world.

Performance

We were very much aware that we were likely to run into interesting performance problems for a number of reasons:

Data Architecture Mismatch

We were forcing round pegs into square holes. There were fundamental differences in how SQL and Multivalue databases architected their data and trying fool PICK into thinking it was SQL was likely to present some interesting challenges for performance.

Data transfer speed

When we started this, most PICK systems still used Serial I/O for their connectivity. There were solutions from Netware that allowed you to use the network, but the last bit of the connection was through a serial port concentrator, so you were still limited to the maximum speed of the serial I/O.  This was exacerbated by the fact that most PICK I/O channels designed for terminals would fall over if you burst data into them too quickly.

Challenge Accepted!

I've never found a challenge like this to be discouraging, but rather invigorating, and I've been blessed to work with many people who share that enthusiasm for solving difficult problems in elegant ways, so we got to work addressing the issues.  Challenge Accepted!

No comments:

Post a Comment