Saturday, August 8, 2009

Ruby and Postgres

I've been trying to learn Ruby recently (rails comes later), and as an exercise I decided to write an ETL connector to load transaction data into my financial database. getting it to work with Postgres was a few steps so I thought I'd post it here as it's a pretty self-contained example.

First do these steps
  • install postgres-server-dev
  • sudo gem install postgres
  • sudo gem install pg
  • gem contents pg will allow you to view examples



  • Here's my code that converts the csv file and loads it up. I could not figure out how to turn of autocommit (we all know how I feel about autocommit) so I just start a transaction asap. I also couldn't figure out how to see how many rows are inserted with each insert statement like in perl, so I just used the PK. Enjoy!



    require 'rubygems'
    require 'pg'
    raise 'usage: ruby usbank.rb csvfile' if not ARGV[0]

    @conn = PGconn.new({:host=>'localhost',:user=>'DBUSER',:dbname=>'DBNAME'})
    @conn.setnonblocking(true)
    @conn.exec('BEGIN')

    def getid
    @conn.exec('select max(txn_id::numeric)+1 as txn_id from txn')[0]['txn_id']
    end

    file = File.new(ARGV[0], "r")
    file.gets

    begin_id = getid()

    while (line = file.gets)
    @array = line.split(",")
    date = @array[0]
    item = "#{@array[1]} #{@array[2]} #{@array[3]}"
    amount = @array[4]
    amount.chomp!
    puts "#{amount} #{date} #{item}"

    query = "insert into txn (date, amount, cat_id, item, source_id, txn_id)
    select
    \'#{date}\',
    #{amount},
    0,
    \'#{item}\',
    5,
    ( select nextval('txn_id_seq'))
    where not exists (
    select 1 from txn where date= \'#{date}\' and amount=#{amount}
    );\n
    "
    @conn.exec(query)
    end

    end_id = getid()
    puts "#{Integer(end_id) - Integer(begin_id)} updated."

    @conn.exec('COMMIT')
    @conn.finish
    file.close

    turn off autocommit for psql

    Every time I fire up a new instance of postgres, I forget how to turn off autocommit for psql. Searching on it, mostly I found a series of people saying "just begin a transaction." This makes NO sense! All it takes is one time of forgetting to begin a transaction, then one forgotten or ill-formed where clause on your delete statement and you're sitting on the floor crying. I am not sure why this is the default. So here is a post that will hopefully bubble above the People Not Answering The Question.

    Anyways, it's pretty easy. Just

    cd ~
    vim .psqlrc
    and make it look like this
    SET AUTOCOMMIT OFF
    viola! Now you may enjoy the wonders of Postgres (including transactional ddl, my favorite), without having to keep looking over your shoulder.