Wednesday, December 16, 2015

MongoDB's BI Connector and pushdown of SQL WHERE clauses

[EDIT 05-Apr-2018: MongoDB BI Connector version 2+ uses a much more rich and powerful approach for pushing down SQL clauses to the database - for more info see here]

In previous posts I showed how to use SQL & ODBC to query data from MongoDB, via Windows clients and Linux clients. In this post, I want to explore what happens to the SQL statement when it is sent to the BI Connector and onto the MongoDB database. For example, is the SQL WHERE clause pushed down to the database to resolve?

Again, for these tests, I've used the same MOT UK car test results data set as my last two posts.

As I wanted to get a better insight into what MongoDB is doing under the covers to process SQL queries, I used the Mongo Shell to enable profiling for the MongoDB database holding the MOT data.


Then on my Linux desktop client, using the ODBC settings I'd configured in my last post, I fired up isql ready to start issuing queries against the MOT data set, via the BI Connector.

I submitted a SQL statement to query all cars with a recorded mileage of over 500,000 miles, selecting specific columns only.

> SELECT make, model, test_mileage FROM testresults WHERE test_mileage > 500000;


The results were correctly returned in isql, but I was more interested to see what MongoDB was asked to do on the server-side, to fulfil this request. So using the Mongo Shell I queried the system profile collection to show the last recorded entry, displaying the exact request that MongoDB had received as a result of the translated SQL query.

> db.system.profile.find({ns: "mot:testresults"}).sort({$natural: -1}).limit(1).pretty()


As you can see in the output, the BI Connector has indeed pushed down to the database, the WHERE clause, plus the projection to return only specific fields. The profiler output shows that this has been achieved by the BI Connector, by assembling an Aggregation Pipeline.

This is great to see. Most of the work to process the SQL query is being done at the database level, reducing the amount of data (less rows and less columns) that is returned to the ODBC client for final processing, and also enabling database indexes to be leveraged for maximum performance.

Song for today: End Come Too Soon by Wild Beasts

Tuesday, December 15, 2015

Accessing MongoDB data using SQL / ODBC on Linux

[EDIT 05-Apr-2018: MongoDB BI Connector version 2+ uses a different mechanism for connecting to (not using a PostgreSQL driver) - for more info see here]

In my previous post I showed how generic Windows clients can use ODBC to query data from MongoDB using the new BI Connector. I'm not really a Windows type person though and feel more at home with a Linux desktop. Therefore, in this post, I will show how easy it is to use ODBC from Linux (Ubuntu 14.04 in my case) to query MongoDB data using SQL. I've used the same MOT UK car test results data set as my last post.

First of all I needed to install the Linux ODBC packages plus the PostgreSQL ODBC driver from the package repository.

$ sudo apt-get install unixodbc-bin unixodbc odbc-postgresql

Then I googled how to use ODBC and the PostgreSQL driver to query an ODBC data source. Surprisingly, I didn't find much quality information out there. However, looking at the contents of the "odbc-postgresql" package....

$ apt-file list odbc-postgresql

...it showed some bundled documents including...

/usr/share/doc/odbc-postgresql/README.Debian

Upon opening this text file I found pretty much everything I needed to know, to get going. I really should learn to RTFM more often!

The next step, as documented in that README, was to register the PostgreSQL ANSI & Unicode ODBC drivers in the /etc/odbcinst.ini file, using a pre-supplied template file that contained the common settings.

$ sudo odbcinst -i -d -f /usr/share/psqlodbc/odbcinst.ini.template

Then I needed to create the /etc/odbc.ini file where data sources can be registered. Here I used a skeleton template again.

$ sudo cat /usr/share/doc/odbc-postgresql/examples/odbc.ini.template >> ~/.odbc.ini
$ sudo mv .odbc.ini /etc/odbc.ini

However, this particular template only includes dummy configuration settings, so I needed to edit this file to register the remote MongoDB BI Connector's details properly.

$ sudo vi /etc/odbc.ini

$ cat /etc/odbc.ini 
[mot]
Description         = MOT Test Data
Driver              = PostgreSQL Unicode
Trace               = No
TraceFile           = /tmp/psqlodbc.log
Database            = mot 
Servername          = 192.168.43.173
UserName            = mot
Password            = mot
Port                = 27032
ReadOnly            = Yes
RowVersioning       = No
ShowSystemTables    = No
ShowOidColumn       = No
FakeOidIndex        = No
ConnSettings        =

( forgive my poorly secure username/password of "mot/mot". ;) )

Now I was ready to launch the "isql" (Interactive SQL) tool, that is bundled with the Linux/UNIX ODBC package, to be able to issue SQL queries against my remote MongoDB BI Connector data source.

$ isql mot mot mot

( first param is data source name, second param is username, third param is password )

As you can see in the screenshot below, using "isql" I was then able to easily issue arbitrary SQL commands against MongoDB and see the results.



And that's it. Very simple, and I now have a powerful command line SQL tool at my disposal for further experiments in the future.  :-)


Song for today: Swallowtail by Wolf Alice


Friday, December 11, 2015

Accessing MongoDB data from SQL / ODBC on Windows, using the new BI Connector

[EDIT 05-Apr-2018: MongoDB BI Connector version 2+ uses a different mechanism for connecting to (not using a PostgreSQL driver) - for more info see here]

The latest enterprise version of MongoDB (3.2) includes a new BI Connector to enable business intelligence, analytics and reporting tools, that only "speak" SQL, to access data in a MongoDB database, using ODBC. Most of the examples published so far show how to achieve this using rich graphical tools, like Tableau. Therefore, I thought it would be useful to show here that the data is accessible from any type of tool, that is capable of issuing SQL commands via an ODBC driver. Even from Microsoft's venerable Excel spreadsheet application. Believe it or not, I still come across organisations out there that are using Excel to report on the state of their business!

For my example, I loaded a MongoDB database with the anonymised MOT tests results data, that the UK government makes freely available to use. As an explanation for non-residents of the UK, MOT tests are the annual inspections that all UK road-going cars and other vehicles have to go through, to be legal and safe. There are millions of these car test records recorded every year, and they give a fascinating insight into the types and ages of cars people choose to drive in the UK.

First I loaded the CSV file based MOT data sets into a MongoDB 3.2 database, using a small Python script I wrote, with each document representing a test result for a specific owner's car for a specific year. Below is an example of what one test result document looks like in MongoDB:


I then followed the online MongoDB BI Connector documentation to configure a BI Connector server to listen for ODBC requests for the "mot" database and translate these to calls to the underlying MongoDB "testresults" collection. I just used the default DRDL ("Document Relational Definition Language") schema file that was automatically generated by the "mongodrdl" command line utility (a utility bundled with the BI Connector).

Then, on a separate desktop virtual machine running Windows 10, I downloaded the latest PostgreSQL ODBC driver installer for Windows and installed it.



With the ODBC driver installed, I then proceeded to define a Windows ODBC Data Source to reference the MOT database that I was exposing via by the BI Connector (running on a remote machine).





By default the BI Connector (running on a machine with IP address 192.168.1.174, in my case), listens on port 27032. Before hitting the Save button, I hit the Test button to ensure that the Windows client could make a successful ODBC connection to the BI Connector.



With the new ODBC Data Source now configured (shown in screenshot above), I then launched Microsoft Excel so that I could use this new data source to explore the MOT test data.



Excel's standard query wizard was able to use the ODBC data source to discover the MongoDB collection's "schema". I chose to include all the "fields" in the query.



I thought it would be useful to ask for the MOT test results to be ordered by Test Year, followed by Car Make, followed by Car Model.




Finally, upon pressing OK, Excel presented me with the results of the SQL/ODBC query, run directly against the MOT test data, sourced from the MongoDB collection.


Excel, then gave me many options, including settings to say whether to periodically refresh the data from source, and how often. I was also able to open Microsoft's built-in Query Builder tool to modify the query and execute again


That's pretty much it. It's straight forward to configure a Windows client to access data from MongoDB, via MongoDB's new BI Connector, using ODBC.


Song for today: Dust and Disquiet by Caspian