Friday, June 22, 2018

ODBC and SQL

Data Driver Difficulties

Round about 1990, my brother's friend John had a company called Paradigm that was writing drivers for Microsoft.  Structured Query Language (SQL) was becoming very popular, with Oracle the clear leader.  There was a problem, however.  It seems that every time you came up with a new application that needed to access data, you had to write a new driver for every database you wanted it to access. If you had, say, 5 applications that needed access to data and 5 databases that you wanted them to access, you had to write 25 drivers.  If the numbers became 10 and 10, you needed 100 drivers. This simply would not scale!

SQL Access Group (SAG) Call Level Interface (CLI) - Alias ODBC

Microsoft was not alone in struggling with this. In 1989, the X/Open group had formed a specialized standards body called the SQL Access Group, with the initial members being Oracle Corporation, Informix, Ingres, DEC, Tandem, Sun and HP.  Microsoft joined round about 1990.

This group came up with a call level interface (CLI) that they defined for SQL databases.  It was intended to allow programming languages to access a single driver written by a database vendor.  With this approach, if you had the 5 apps and 5 DBs listed above, each app would talk to the drivers through one common interface, so you would need just 5 drivers. 5 drivers instead of 25. 10 drivers instead of 100.
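
To give a flavor of what programming against a call level interface looks like, here is a minimal sketch using the function names ODBC eventually standardized (the modern ODBC 3.x entry points).  The data source name "SalesDB", the credentials, and the query are made-up examples, and error checking is omitted:

    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>

    int main()
    {
        SQLHENV env;
        SQLHDBC dbc;
        SQLHSTMT stmt;

        // The same set of calls works no matter whose driver sits underneath.
        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
        SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

        // "SalesDB" is a hypothetical data source name; the driver manager
        // routes these calls to whichever driver is registered for it.
        SQLConnect(dbc, (SQLCHAR*)"SalesDB", SQL_NTS,
                   (SQLCHAR*)"user", SQL_NTS, (SQLCHAR*)"secret", SQL_NTS);

        SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
        SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM SALES", SQL_NTS);

        while (SQLFetch(stmt) == SQL_SUCCESS) {
            // SQLGetData() would pull each column out of the current row here.
        }

        SQLFreeHandle(SQL_HANDLE_STMT, stmt);
        SQLDisconnect(dbc);
        SQLFreeHandle(SQL_HANDLE_DBC, dbc);
        SQLFreeHandle(SQL_HANDLE_ENV, env);
        return 0;
    }

Swap the data source name and the same program reads a different vendor's database. That was the whole point.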

The SQL Access Group finally came up with a name for their standard. They called it Open DataBase Connectivity, or ODBC.  In addition to the CLI, the standard expected you to support some level of SQL grammar: Minimum, Core or Extended. We implemented Core.

John's company was contracted to write drivers for all of Microsoft's supported ISAM databases, including comma-separated text files, dBase, Excel, and others.  This meant they had to write a full SQL engine, as comma-separated text files had no SQL processing capability of their own. John was working with a gentleman named Jim, who took on the task and did an excellent job.

In an interesting twist, when a company called Fincentric wanted to convert their application development environment from ISAM to SQL Server, Jim also wrote the inverse, taking ISAM commands and converting them into efficient SQL statements.

ODBC for PICK

In 1991, before ODBC even had a name, a number of us met at the India Gate restaurant on Robson Street in downtown Vancouver and discussed the idea of writing a SQL engine and driver for PICK. (We actually recorded notes on a napkin!)  Because of my involvement with Synex and PK Harmony, I was the communications expert.  John was the C/C++ expert, with Jim as a very helpful resource on writing SQL engines for non-SQL environments, and my brother and I were the PICK experts.

We decided that my brother Antoon would take on a project working with SQL Server for Vancouver General Hospital (VGH) in order to better understand SQL, so we could determine the feasibility of what we were contemplating.

Laboratory Information Management System (LIMS)

The project was code-named LIMS. The funny thing was that Paradigm was assigned a phone number by VGH that had formerly belonged to some other department there.  Occasionally, the programmers would get a call from someone who didn't realize the number had been reassigned. They'd answer it with "LIMS" and there'd be a long pause at the other end while the caller tried to figure out what department at VGH was called "Limbs"!

As a result of his work on this project, Antoon came up with an initial architecture and design for an ANSI SQL-92 compliant SQL engine prototype, which he began coding in PICK BASIC.

It didn't take too long to get to where "SELECT * FROM SALES" returned a meaningful result set. That was the easy part.

ODBC vs. IDAPI - Oddball vs. Diaper

Borland created their own interface called the Integrated Database Application Programming Interface (IDAPI), which never took off beyond their own databases.  If you ran Microsoft Word's spell checker on IDAPI, you got "diaper". If you ran it against ODBC you got "oddball".

Microsoft ODBC Driver Developer Kit

We were one of the first to get a copy of Microsoft's ODBC Driver Developer Kit. In order to boost productivity, John took their API and re-wrote the header files to support C++ while still exposing the interfaces as C. I began writing the driver, drawing on my experience at Synex to develop a transfer protocol, based on a variation of the Motorola IXO transfer protocol, that would work within the serial I/O limitation of most PICK systems at that time.
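
The header trick is worth showing, since it's still in common use today. Here is a minimal sketch of the idea, with hypothetical function names rather than the actual Paradigm headers: wrap the declarations so that C++ callers see them with plain C linkage.

    // mydriver.h -- hypothetical example, not the actual Paradigm header.
    #ifdef __cplusplus
    extern "C" {        // C++ translation units keep plain C linkage
    #endif

    typedef void* DriverHandle;

    DriverHandle DrvConnect(const char* connectString);
    int          DrvExecute(DriverHandle h, const char* sql);
    void         DrvDisconnect(DriverHandle h);

    #ifdef __cplusplus
    }
    #endif

With that in place, the driver internals could be written in C++ while the published interface remained callable from plain C.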

Microsoft had 3 levels of conformance to their API: Core, Level 1, and Level 2. We implemented Level 1 conformance.

Microsoft Releases ODBC 1.0 in 1992

Finally, in 1992, Microsoft released ODBC 1.0. Sort of... For that first version, they implemented their driver manager so that it would only work with their own drivers.  We had been ready to release alongside them, but found out only weeks before the release that they had decided to implement this limitation.  I scrambled to write an Excel add-in that would allow us to use our driver to load Excel spreadsheets, so our beta customers would be able to use it.

We still managed to get a number of customers up and running with our beta version. And so our ODBC journey was launched.

We Failed the Bozo Test for ODBC

One last funny story for this post: it was an industry inside joke that if you could spell the letters of ODBC in the right order, you probably knew what it was.  The unknowing would always refer to it as OBDC.

We put together a marketing brochure, and went through probably about 50 edits.  Somehow, on the very last edit, someone reversed the D and the B and we were so bug-eyed we missed it.  Our brochure went out with OBDC and we couldn't correct anyone who phoned in!

We failed our own Bozo Test!

Sunday, June 17, 2018

The Changing Face of Connectivity

In the Beginning...

One of my earliest customers was First City Trust. While they had started on Microdata, they had begun to move their applications to an IBM mainframe computer.  This was an interesting exercise, as they discovered that it was faster to move data back to the Microdata and run reports off it there than to ask for them on the IBM. They had an army of people (an entire floor) focused on "fixing" data that was broken by the application running on the IBM, and getting anything new up and running took forever.

On the other hand, they had less than half a dozen programmers, including myself, who were doing these reports and keeping the applications that were still on the Microdata running.  It used to be said that regardless of price "no one ever got fired for hiring IBM".  We'd just shake our heads...

In those days, you had two options for moving data over. One was to print a report and hire someone from the local office temp pool to re-enter it. The other was to write it to a 9-track tape in a compatible format, then read it on the other system.  I did quite a bit of work interpreting COBOL programs that were used to write to the tapes, reading them, converting from EBCDIC to ASCII, parsing them, and then writing them into the appropriate places in the Microdata. Then we'd write a report and usually have it out in under a day.

Let There be Serial Data Transfer Protocols

With the advent of PC Harmony and PK Harmony, we had a new way to move data around.  In the minicomputer world, you could hook up an IBM PC through an RS232 port (also called a Serial Input/Output port, Serial I/O port, or just serial port for short).  These ports would let you push data through a modem, or over a direct cable to a terminal or PC.  Directly connected, you could run at 9600 baud. In really rough numbers, with start and stop bits, each 8-bit character cost 10 to 12 baud, so you were getting under 1K bytes per second.  Over a modem, you'd be lucky to get 1200 baud.  A good typist was faster than a modem.

Over time, the quality of the serial ports, terminals and modems all improved, until you could finally get 56K baud modems and 19,200 baud terminals.

The IBM PC came with a serial port interrupt handler in the BIOS that could barely handle 1200 baud on a good day.  In order to get any more speed, terminal emulator writers had to override and replace that interrupt handler with their own that was optimized.

Serial ports had another problem. They were unreliable. Even when you had hooked the interrupt handler, there were times when enough other interrupts took over the machine that you would drop characters. Electrical interference or poor line quality on a modem link could also corrupt data.  There were parity bits in the low-level protocol that attempted to help you determine whether corruption had happened, but they were typically just even or odd parity. If your corruption flipped two bits, the parity bit would look right, but the character would be bad.
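
It's easy to demonstrate just how weak a single parity bit is. In the sketch below (assuming a 7-data-bit link with even parity), flipping two bits of a character leaves the parity unchanged, so the corruption sails through the check:

    #include <stdio.h>

    // Even parity across the low 7 bits, as a 7-E-1 serial link computes it.
    static int parity7(unsigned char c)
    {
        int ones = 0;
        for (int bit = 0; bit < 7; ++bit)
            ones += (c >> bit) & 1;
        return ones & 1;    // 0 means an even number of one-bits
    }

    int main()
    {
        unsigned char good = 'A';          // 0x41
        unsigned char bad  = good ^ 0x06;  // line noise flips two bits: 'G' (0x47)

        // Both characters yield the same parity bit, so the receiver
        // has no idea the character was corrupted in transit.
        printf("'%c' parity=%d, '%c' parity=%d\n",
               good, parity7(good), bad, parity7(bad));
        return 0;
    }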

So, in order to ensure that data got through, we developed a proprietary error correcting data transfer protocol.  We checksummed the whole data package, put a length prefix on it, and then checked the checksum at the other end.  We also ensured that every character was echoed before we sent the next one, and would time out and retry.
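In rough terms, the framing looked like the sketch below: a length prefix, the payload, and a trailing checksum that the receiver recomputes. This is a simplified illustration, not the exact PK Harmony wire format; the echo-per-character handshake and the timeout/retry logic are left out.

    #include <stddef.h>

    // Simple additive checksum, modulo 256.
    static unsigned char checksum(const unsigned char* data, size_t len)
    {
        unsigned char sum = 0;
        for (size_t i = 0; i < len; ++i)
            sum += data[i];
        return sum;
    }

    // Frame layout: 2-byte length prefix, payload bytes, 1-byte checksum.
    // The receiver recomputes the checksum over the payload; on a
    // mismatch it requests a resend.
    static size_t build_frame(const unsigned char* payload, size_t len,
                              unsigned char* out)
    {
        out[0] = (unsigned char)(len >> 8);
        out[1] = (unsigned char)(len & 0xFF);
        for (size_t i = 0; i < len; ++i)
            out[2 + i] = payload[i];
        out[2 + len] = checksum(payload, len);
        return len + 3;    // total bytes to push out the serial port
    }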

This was a lot of work, for a relatively slow process, but it beat having someone re-key the data, so it was used by many Business BASIC and Pick BASIC users for many years.

Then There was Netware

For a long time, Novell's Netware was king in the networking world.  I remember when they released a new feature: you could send data over the Netware network to a serial port concentrator that was hooked up to your minicomputer.

That last hop typically had the same 9600 or maybe 19,200 baud limitation, so it wasn't blindingly fast, and it still had the reliability issues that the minicomputer brought to the table, so you still needed the error-correcting protocol. But it had one significant benefit.

Prior to this, if you had a factory, and you needed to put a new terminal in it, you had to run a Serial Cable all the way from your minicomputer to where the terminal needed to be.  It didn't matter how many cables you had already run all that way, you still had to do it.  And Serial I/O was only certified for a certain distance, so unless you paid for expensive repeaters, you were not going to have very good reliability.

Now, you could run a network out to the factory floor, and by dropping a line off the network to a new PC running terminal emulator software, you could add a terminal with very little cabling and very little impact on reliability.  In order to use this network stuff, however, you had to implement a protocol called NetWare Asynchronous Communication Services (NACS).

We had a few customers who used this interface, but networking was still fairly new, and required deep pockets to implement. Most Pick users didn't have those deep pockets.

An Explosion of Networks

With Novell initially leading the pack, networking continued to advance. When implementing networks in the early days, you would look at ARCNET, Token Ring, or Ethernet, and then you would have to consider topologies. You could choose between Star, Bus, Ring or Tree.  While all this was going on, the early internet was being developed.  It used Ethernet and a mesh topology.

Nowadays, the internet uses a combination of Ethernet, Wireless (WiFi), and cellular data technologies. The latter two are gradually outstripping the connected Ethernet for end-user devices.

In the mid-1990s at Liberty, we hired a student called Pardeep to assist us with support. In speaking with him, he suggested we consider using the Winsock libraries to write a TCP/IP socket driver.  Many of the minicomputers we were working with supported TCP/IP-based Telnet services, and since he was familiar with Berkeley Sockets, which Winsock was based on, we asked him to write us a socket driver. Our LWinSock.dll dynamic link library is used in our ODBC driver to this day at a number of companies, including two Fortune 500 companies.  We also ported the same driver to JDBC, and used a variation of the code in our OLEDB driver and managed provider.  That little socket driver has seen a lot of use over the years!

The beauty of this was that we were able to completely drop the error-correcting protocol and simply acknowledge receipt back to the sender.  We could also send large blocks, and they would generally be handled fine provided we didn't overrun the buffers at the target end.
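
The core of such a driver is small. Here is a minimal sketch of the Winsock pattern it would have been built around; the host address, port, and command are made-up examples, and LWinSock.dll itself is of course more involved:

    #include <winsock2.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);          // initialize Winsock

        SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

        struct sockaddr_in addr = {};
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(23);                  // host's Telnet port
        addr.sin_addr.s_addr = inet_addr("192.0.2.10");    // example address

        if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) == 0) {
            // TCP already guarantees delivery and ordering, so a simple
            // application-level acknowledgement replaces the old
            // checksum-and-retry protocol.
            const char req[] = "LIST SALES\r\n";           // example command
            send(s, req, sizeof(req) - 1, 0);

            char buf[4096];
            int  got = recv(s, buf, sizeof(buf), 0);       // read the reply
            (void)got;
        }

        closesocket(s);
        WSACleanup();
        return 0;
    }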

Was That a Bagle or a Nagle?

One thing we noticed after a while with our socket driver was that it would perform very slowly when smaller packets were being sent from the PC to the server. I remember doing careful analysis and concluding that somewhere between us requesting a send and the server receiving it, something was introducing a delay of precisely 200 milliseconds.  It took a lot of digging to unearth the culprit, but we finally did.

When the Telnet protocol was first introduced, the implementers discovered that if you had a lot of Telnet users typing characters, there was, in computer terms, a large gap between keystrokes, so Telnet would send each character in its own packet. The overhead of a packet carrying 1 byte was huge and wasteful, so a gentleman with the last name of Nagle came up with a strategy. Simply put, as the human typed, the client end of the Telnet program would gather up typed characters and not send them until either a certain number of characters had been gathered, or 200 milliseconds had elapsed since the first character was typed. This was a bit like holding the elevator door and poking your head out to see if anyone was coming before letting the elevator go.

This was called the Nagle algorithm and turned out to be the culprit.

It seems the implementers assumed that if you were opening a socket to a Telnet server, you would want the Nagle algorithm turned on. We turned it off, and the small packets flew fast and furious!  Lesson learned.
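
The fix itself was essentially a one-liner. On a connected socket, disabling Nagle looks like this (Winsock shown; the Berkeley Sockets version differs only in the option variable's type):

    #include <winsock2.h>    // setsockopt, IPPROTO_TCP, TCP_NODELAY

    // Turn the Nagle algorithm off: small packets go out immediately
    // instead of waiting up to 200 ms to be coalesced with their successors.
    int disable_nagle(SOCKET s)
    {
        BOOL on = TRUE;
        return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                          (const char*)&on, sizeof(on));
    }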

Telus PureFibre™

Today I have Telus PureFibre, and get well over 150 megabits per second. That's roughly 150,000,000 baud!  I can connect wirelessly or with a physical cable, and generally, the other end is the limiting factor for speed.  Reliable data transfer is pretty much a given. If your wireless connection goes down, you will get an error and may have to restart your transfer, but in general, if your network stays up, the data will get there, reliably and fast.

Hotels often provide internet connectivity, but then have hundreds of guests sharing the same point of presence, and it's generally fat enough for a family of 5 with 2 watching Netflix. It doesn't handle hundreds of users at once.  That said, if you can find a decent Starbucks or other point of presence, you can generally get good, fast, connectivity today.

I know that new protocols and new technologies are in the works.  I can't wait to see what the future holds!