Quick Way to Join Across Two Databases

A recent hack / sketchy 1st product draft required datasets that live in two different database to be joined together to do some analysis on. Initially thoughts of denormalization or key/value lookup crossed peoples minds but wanting a quick win to try and get the idea validated as quickly as possible, I kept on digging.

As always stackoverflow provided the start of the journey of discovery with something called dblink.

dblink to the rescue!

dblink is a hidden gem of a extension that made doing a join across two databases (weather there local or remote) a total breeze. Firstly the extension needs to be installed on the database with the following command.

1
  CREATE EXTENSION dblink

Once added to your database using it requires three pieces of information

1
2
3
  SELECT * FROM dblink('CONNECTION STRING',
    'QUERY STRING')
  AS t( CASTING RESTULS )

Connection String

I personally use this format of string as it allows easy access to both local and remote DBs.

postgres://PG_USER:PG_PASSWORD@HOST:PORT/DATABSE_NAME'

eg

postgres://pguser:abcd@127.0.0.1:5432/test_db'

Query String

This is query you wish to run on the remote machine. It can be any valid SQL.

1
'SELECT title, price, category FROM products WHERE shop_id = 99'

Casting the results

When the results from the query are returned to local machine postgres doesn’t know the column types of the received result set and thus needs to be told about them

1
AS t(title varchar, price int, category varchar)

Full call example

So putting those three things together the final call would be

1
2
3
  SELECT * FROM dblink('postgres://pguser:abcd@127.0.0.1:5432/test_db',
    'SELECT title, price, category FROM products WHERE shop_id = 99')
   AS t(title varchar, price int, category varchar)

Allowing you to manipulate this results just as it they were from a local database table.

Comments