Connecting to Databases

Getting external data from a database from the model.predict function is straight forward. To ensure that you are able to connect to a database and successfully query it, we recommend you break down your connection and requests into a series of functions:

  • A function to connect to the DB
  • A function to automatically reconnect to the DB in case the connection is dropped
  • A function to ensure that the query is successful and is not empty

The code below shows an example of how to do this for a PostgreSQL DB. Connections to other types of databases should be similar to below.

We'll use RPostgreSQL for R and psycopg2 for python to connect to an external database:

library(RPostgreSQL)

#function to connect to a database
connectToDB <- function() {
  write("reconnecting to database...", stderr())
  dbConnect(PostgreSQL(), user='username', password='password',host='hostname.com',dbname='my_db',port=5432)
}

# this function will automatically reconnect to the database if the R
# connection drops for some reason
pg.con <- NULL
safeQuery <- function(q) {
  tryCatch({
    dbGetQuery(pg.con, q)
  }, error = function(e) {
    write(paste("error querying postgresql:", e), stderr())
    pg.con <<- connectToDB()
    dbGetQuery(pg.con, q)
  })
}

# this will run a query and then inspect the results (or lack of results)
# and return an appropriate response
queryDB <- function(q) {
  result <- safeQuery(q)

  # if result is null, then there was a very bad error
  # else if result was empty, return no data, else return the result
  if (is.null(result)) {
    "error running query"
  } else if (nrow(result) == 0) {
    "no data returned from query"
  } else {
    result
  }
}

# these are 3 samples that should illustrate how the functions above work
queryDB("select * from not-a-real-table;")
queryDB("select * from users where 1 = 2;")
queryDB("select * from users limit 1;")

Python

import psycopg2

#create a DB function
def get_db_data():
    #try connecting to the DB; this statements will timeout after 3 seconds
    try:
        conn = psycopg2.connect("
                                dbname='db'
                                user='username'
                                host='db.company.com'
                                password='password'
                                options='-c statement_timeout=3000'")
    except:
        print "Unable to connect to the database"

    q = """select * from metrics limit 1;"""
    cur = conn.cursor()
    try:
        #as an example of the timeout, the "pg_sleep" statement will fail
        #cur.execute("select pg_sleep(2000)")
        cur.execute(q)
        data = cur.fetchall()
    except:
        print "failed to execute query"
    # close our connection
    conn.close()
    return data

##continue doing model stuff

results matching ""

    No results matching ""