Updating GCD Data

So you have loaded the Grand Comics Database into a local PostgreSQL instance and written some code that uses the data. They just released a new data dump. Now how do you update your copy of the data?

Prep the data

Follow the "create mysql clean up script" and "dump data to tab separated value files" steps from "How to convert data in a MySQL database to Postgresql".

Now copy this python script:

#!/usr/bin/env python

"""
update gcd data that's prep'ed in /tmp/gcd_dump
"""

import os, glob
from pprint import pprint
import psycopg2, psycopg2.extras

table_names = [os.path.splitext(os.path.basename(fp))[0] for fp in glob.glob('/tmp/gcd_dump/*.txt')]

conn = psycopg2.connect("dbname='gcd' user='postgres'")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def sql_logger(sql):
    # echo each statement before running it
    print(sql)
    cur.execute(sql)

constraints = []
for ii in table_names:
    sql_logger("""
select t.constraint_name, t.table_name, t.constraint_type,
  c.table_name as c_table_name, c.column_name as c_column_name, k.column_name as k_column_name
from information_schema.table_constraints t,
  information_schema.constraint_column_usage c,
  information_schema.key_column_usage k

  where t.constraint_name = c.constraint_name
    and t.constraint_name = k.constraint_name
    and t.constraint_type = 'FOREIGN KEY'
    and c.table_name = '%s'
  """ % ii)
    for row in cur:
        constraints.append(dict(row))

sql_logger('begin')
for ii in constraints:
    sql_logger('alter table %s drop constraint %s;' % (ii['table_name'], ii['constraint_name'],))

for table_name in table_names:
    sql_logger("DELETE FROM %(table_name)s" % locals())
    sql_logger("COPY %(table_name)s FROM '/tmp/gcd_dump/%(table_name)s.txt'" % locals())

for ii in constraints:
    sql_logger("""
ALTER TABLE ONLY %(table_name)s 
  ADD CONSTRAINT %(constraint_name)s 
    FOREIGN KEY (%(k_column_name)s) REFERENCES %(c_table_name)s(%(c_column_name)s) DEFERRABLE INITIALLY DEFERRED;
""" % ii)

sql_logger('commit')

You'll have to run this as the postgres user, just as before. It records the foreign key constraints, drops them, deletes the old data, copies in the new, and recreates the constraints, all in one transaction! Eat that, MySQL.
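The drop and re-create steps both work from the same row of constraint metadata. As a minimal sketch (the helper names `drop_constraint_sql` and `add_constraint_sql` and the sample row are made up for illustration; the keys match the columns selected in the script above), the two statements can be built and inspected without touching the database:

```python
def drop_constraint_sql(row):
    """Build the statement that drops one foreign key constraint."""
    return 'ALTER TABLE %(table_name)s DROP CONSTRAINT %(constraint_name)s;' % row


def add_constraint_sql(row):
    """Build the statement that re-creates the same constraint, deferred."""
    return (
        'ALTER TABLE ONLY %(table_name)s '
        'ADD CONSTRAINT %(constraint_name)s '
        'FOREIGN KEY (%(k_column_name)s) '
        'REFERENCES %(c_table_name)s(%(c_column_name)s) '
        'DEFERRABLE INITIALLY DEFERRED;'
    ) % row


# Hypothetical row; real rows come from the information_schema query.
example = {
    'table_name': 'gcd_issue',
    'constraint_name': 'gcd_issue_series_id_fkey',
    'constraint_type': 'FOREIGN KEY',
    'c_table_name': 'gcd_series',
    'c_column_name': 'id',
    'k_column_name': 'series_id',
}

print(drop_constraint_sql(example))
print(add_constraint_sql(example))
```

Printing the statements first is a cheap way to check what will run before pointing the real script at your data.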

Feb. 2, 2010 (1 month ago) | Tags: django,python

django_loader.py

I got tired of putting

import os, sys
sys.path.append(<django project parent dir>)
sys.path.append(<django project dir>)
os.environ['DJANGO_SETTINGS_MODULE']='<django project name>.settings'

at the top of all my scripts that do command line things with my django models. So I share with you 'django_loader.py'. Note the use of traceback to figure out what file is importing 'django_loader.py'.

""" 
Put this in your python path.  At the top of your script put 'import
django_loader'.  This will start with the directory your file is in and
search through it and its parent directories until it finds a file named
'settings.py'.  It will then add that directory and its parent to your
sys.path, and set the DJANGO_SETTINGS_MODULE env var.
"""

import os, sys, traceback

class CouldNotFindSettings(Exception):
    pass


def find_settings(current_dir):
    # give up once the search reaches the filesystem root
    if current_dir == '/':
        raise CouldNotFindSettings
    if 'settings.py' in os.listdir(current_dir):
        return current_dir
    return find_settings(os.path.dirname(current_dir))


def load(filepath):
    django_project_dir = find_settings(os.path.dirname(filepath))
    django_project_name = os.path.basename(django_project_dir)

    sys.path.append(os.path.dirname(django_project_dir))
    sys.path.append(django_project_dir)
    os.environ['DJANGO_SETTINGS_MODULE']='%s.settings' % (django_project_name,)

current_filepath = os.path.normpath(os.path.join(os.getcwd(), traceback.extract_stack(limit=2)[0][0]))
load(current_filepath)
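The upward search can also be written as a loop that stops when `os.path.dirname` no longer changes the path, so it does not depend on the root being spelled '/'. This is only a sketch: `find_settings_dir` is a hypothetical name, and unlike the version above it returns None instead of raising and also checks the root directory itself. The demo builds a throwaway directory tree so it can run anywhere.

```python
import os
import tempfile


def find_settings_dir(start_dir):
    """Walk upward from start_dir until a directory contains settings.py."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, 'settings.py')):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


# Demo tree: <tmp>/proj/settings.py and <tmp>/proj/app/scripts/
root = os.path.realpath(tempfile.mkdtemp())
project = os.path.join(root, 'proj')
scripts = os.path.join(project, 'app', 'scripts')
os.makedirs(scripts)
open(os.path.join(project, 'settings.py'), 'w').close()

found = find_settings_dir(scripts)
print(found == project)
```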

Jan. 21, 2010 (1 month, 2 weeks ago) | Tags: django,python

Now playing in 2010

XBox 360

  • Dead Rising - I've been playing this off and on since 2008. This last weekend I made a huge amount of progress, but I might have to start over due to saving when I was almost out of time for a mission. Punishing difficulty, but it's more fun that way.
  • Dragon Age - There's no way I'll finish before Mass Effect 2 comes out this month, and I like the Mass Effect story more. I suspect this one will be around for a while.
  • Left 4 Dead 2 - It's intense, but difficult to get the right people for a good game at expert. Clearly I need a second TV and xbox to put in the family room.

D&D

Just finished a 3 year once a week campaign. My character destroyed the universe. Sorry about that guys.

Started a new campaign as a Binder/Bard based way too closely on Eddie Riggs from Brütal Legend. I know it's cheap to copy, but it's fun. Since the world he lives in looks like a heavy metal album cover from this world, his album covers will have folks in offices, staring at computers.

Jan. 11, 2010 (1 month, 3 weeks ago) | Tags: video-games

Why Django? Why Postgres?

I use Django and Postgres at home because I use Rails and MySQL all day at work. Working with totally different solutions to the same problem keeps you fresh.

Jan. 11, 2010 (1 month, 3 weeks ago)

How to convert data in a MySQL database to Postgresql

To do this you need both MySQL and PostgreSQL running on a local computer. You probably want this to be a local workstation that you have superuser access to. We are going to use features in MySQL and Postgres that make the database daemon read and write local files.

We'll use Django's schema format to deal with the differences between PostgreSQL and MySQL. We'll use tab separated value (TSV) data files as the interchange format between the databases. MySQL has a different idea of how to escape newlines and carriage returns than PostgreSQL, so we'll use a quick and dirty Python script to clean that up.

While this should work on many different OSes, I did this on Ubuntu, so the details might be a bit different.

Let's start with the most recent data dump from the Grand Comics Database.

load data into mysql

mysqladmin -uroot create gcd
mysql -uroot gcd < pub_dec21_schema_innodb.sql
unzip pub_dec21_data.zip
mysql -uroot gcd < pub_dec21_data.sql

create django project

django-admin.py startproject grandcomicdb
cd grandcomicdb
chmod +x manage.py
./manage.py startapp gcd
# edit settings.py to add gcd to INSTALLED_APPS
# edit settings.py to set up connection to mysql
./manage.py inspectdb > gcd/models.py
# edit gcd/models.py to quote the ForeignKey targets, and add related_name's
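Quoting the foreign key targets by hand gets tedious on a big schema. Here is a rough sketch of doing it with a regular expression, assuming inspectdb emitted bare class names such as models.ForeignKey(Series). The function name `quote_fk_targets` and the sample lines are hypothetical, and the related_name arguments still have to be added by hand.

```python
import re

# Matches ForeignKey( followed by a bare class name; targets that are
# already quoted start with a quote character and are left alone.
FK_PATTERN = re.compile(r"ForeignKey\((\w+)")


def quote_fk_targets(source):
    """Return the models source with bare ForeignKey targets quoted."""
    return FK_PATTERN.sub(r"ForeignKey('\1'", source)


# Hypothetical lines in the style of inspectdb output.
sample = (
    "    series = models.ForeignKey(Series)\n"
    "    parent = models.ForeignKey('self', null=True)\n"
    "    publisher = models.ForeignKey(Publisher, db_column='publisher_id')\n"
)

print(quote_fk_targets(sample))
```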

create mysql clean up script

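cat > fix_mysql_tsv.py << 'EOF'
#!/usr/bin/env python

# Reads the whole file into memory, so this will not work for very big files.

import sys

ff = open(sys.argv[1], 'r').read()
# escape raw carriage returns
ff = ff.replace('\r', '\\r')
# mysql writes an embedded newline as backslash + newline; postgres wants \n
ff = ff.replace('\\\n', '\\n')

open(sys.argv[1], 'w').write(ff)
EOF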
chmod +x fix_mysql_tsv.py
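The clean up script reads each file whole, which is why it struggles with very big files. A line-at-a-time variant is sketched below under a few assumptions: the function name `fix_mysql_tsv_stream` is made up, the streams are opened in binary mode so lines end only at a real newline, and the sample bytes are invented to show one embedded newline and one carriage return. It applies the same two replacements as the script above.

```python
import io


def fix_mysql_tsv_stream(src, dst):
    """Copy src to dst, rewriting MySQL's escapes one line at a time."""
    for line in src:  # binary mode: a line ends at b'\n' only
        line = line.replace(b'\r', b'\\r')
        if line.endswith(b'\\\n'):     # backslash followed by a real newline
            line = line[:-2] + b'\\n'  # becomes the two characters \ and n
        dst.write(line)


# Invented sample: row 1 has an embedded newline and a carriage return.
src = io.BytesIO(b'1\tfirst\\\nsecond\r\n2\tplain\n')
dst = io.BytesIO()
fix_mysql_tsv_stream(src, dst)
print(repr(dst.getvalue()))
```

For real files, pass objects opened with mode 'rb' and 'wb' and write to a new file rather than rewriting in place.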

dump data to tab separated value files

mkdir /tmp/gcd_dump
chmod 777 /tmp/gcd_dump
mysqldump -uroot -t --tab /tmp/gcd_dump gcd
find /tmp/gcd_dump -type f -exec ~/web/grandcomicsdb/fix_mysql_tsv.py \{\} \;

create postgres database with schema derived from the mysql database

sudo -s -u postgres
createuser gcd --pwprompt --no-createrole --no-createdb
createdb gcd -O gcd
exit
# edit settings.py to set up connection to postgresql
./manage.py syncdb

load data into postgres

sudo -s -u postgres
psql
BEGIN;
COPY gcd_language FROM '/tmp/gcd_dump/gcd_language.txt';
COPY gcd_country FROM '/tmp/gcd_dump/gcd_country.txt';
COPY gcd_brand FROM '/tmp/gcd_dump/gcd_brand.txt';
COPY gcd_publisher FROM '/tmp/gcd_dump/gcd_publisher.txt';
COPY gcd_indicia_publisher FROM '/tmp/gcd_dump/gcd_indicia_publisher.txt';
COPY gcd_story_type FROM '/tmp/gcd_dump/gcd_story_type.txt';
COPY gcd_series FROM '/tmp/gcd_dump/gcd_series.txt';
COPY gcd_issue FROM '/tmp/gcd_dump/gcd_issue.txt';
COPY gcd_story FROM '/tmp/gcd_dump/gcd_story.txt';
COMMIT;

Jan. 11, 2010 (1 month, 3 weeks ago) | Tags: django,python

Using Ruby's SVN bindings

I couldn't find a simple example of using Ruby's SVN bindings. Here's something simple: get the info on a file in a local working directory.

require 'svn/client'

ctx = Svn::Client::Context.new
ctx.add_simple_provider
ctx.info('some file in your svn working dir') do |path,info|
  p path
  p info.last_changed_rev
end

This page is also useful.

Sept. 25, 2008 (1 year, 5 months ago) | Tags: ruby

Rosetta Stone Project Caribbean

This should give you an idea of what it’s like to work at Rosetta Stone… the CEO just emailed this to everyone:

Sept. 20, 2008 (1 year, 5 months ago)

Currently Playing

Like you care…

XBox 360

  • Dead Rising (love it)
  • GTA IV
  • Mass Effect
  • Beautiful Katamari
  • Half Life 2 (replaying for the achievements)
  • Castle Crashers (need to try online coop)
  • Braid (hurting my brain)

PS2

  • Champions of Norrath 2 (waiting to visit Jason and hack our way thru)
  • God of War (stuck in that room of blades on a grid that needs timing that I can’t do)

Wii

  • Mario Galaxy

Sept. 12, 2008 (1 year, 5 months ago) | Tags: video-games

Validating with an XSD in ruby

    require 'tempfile'
    require 'xml'

    xml = generate_xml
    Tempfile.open(self.class.to_s) do |tmp|
      tmp.write(xml)
      tmp.close  # flush to disk so libxml can read the file
      document = XML::Document.file(tmp.path)
      schema_doc = XML::Document.file("some.xsd")
      schema = XML::Schema.document(schema_doc)
      # validate_schema checks against an XSD; validate is for DTDs
      assert document.validate_schema(schema), "the xml isn't valid.  look above for error."
    end

Sept. 6, 2008 (1 year, 6 months ago)

A review of ruby s3 libraries

gem install s3sync

Has the right idea about how to store files, but the lib it uses does not abstract things enough. It does not use HTTP keep-alive to avoid TCP's slow startup. It does not have an easy way to iterate over all keys in a bucket, but this helper does it:

  def each_object(bucket_name)
    next_marker = nil
    while true do
      response = CONN.list_bucket(bucket_name, {'marker' => next_marker, 'max-keys' => 10})
      response.entries.each do |s3obj|  
        yield s3obj
      end
      break unless response.properties.is_truncated
      next_marker = response.entries.last.key
    end
  end
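The helper above is marker-based pagination: ask for a page, yield its entries, and use the last key as the marker for the next request until the listing is no longer truncated. The same loop is sketched here in Python against an in-memory stand-in, so nothing in it is a real S3 client call. `each_key`, `list_page`, and `fake_list_page` are hypothetical names.

```python
def each_key(list_page, page_size=10):
    """Yield every key, fetching one page at a time.

    list_page(marker, max_keys) is a hypothetical callable returning
    (keys, is_truncated) for the keys that sort after marker.
    """
    marker = None
    while True:
        keys, is_truncated = list_page(marker, page_size)
        for key in keys:
            yield key
        if not is_truncated or not keys:
            break
        marker = keys[-1]  # resume after the last key we saw


# In-memory stand-in for a bucket listing, for demonstration only.
ALL_KEYS = sorted('photo%02d.jpg' % n for n in range(25))


def fake_list_page(marker, max_keys):
    remaining = [k for k in ALL_KEYS if marker is None or k > marker]
    return remaining[:max_keys], len(remaining) > max_keys


result = list(each_key(fake_list_page))
print(len(result))
```

Because it is a generator, the caller can start working on the first page while later pages are still unfetched, which is the property the review is looking for in these libraries.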

gem install aws-s3

Again no built in way to iterate over all keys. Has problems with '/' at the start of file names.

gem install right_aws

Uses the same http connection, can iterate over all keys with a single method (though, it makes a full array of all the keys rather than allowing you to supply a block). Here's my thumbnailer:

  require 'rubygems'
  require 'right_aws'    
  require 'RMagick'
  require 'pp'

  s3 = RightAws::S3.new(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

  picture_bucket = s3.bucket('OurPictures')
  thumbnail_bucket = s3.bucket('OurPicturesThumbnails')

  picture_bucket.keys.each do |key|
    thumbnail_key = RightAws::S3::Key.create(thumbnail_bucket, key.name)
    next if key.name !~ /\.jpg$/i
    next if thumbnail_key.exists?
    image = Magick::Image.from_blob(key.data).first
    image.change_geometry!('256x256>') do |cols, rows, img|
      img.thumbnail!(cols, rows)
    end
    thumbnail_key.put(image.to_blob, 'private')
    p thumbnail_key.full_name
    image = nil
    GC.start
  end

May 31, 2008 (1 year, 9 months ago) | Tags: ruby

RailsConf 2008: Saturday Early Morning

(mostly Randy here again. I was putting up my first ec2 instance. :)

Saturday Keynotes

  • funny video from the RailsEnvy guys about testing
DHH introducing Jeremy Kemper
  • Jeremy has done a buttload of work lately in the Rails core
  • dhh on a coding vacation?
  • some people think about one part of rails Jeremy thinks about the whole thing
Jeremy’s talk about Rails 2… where it’s going, etc
  • “it’s all about resources”
  • “we shed a lot of fat” – split things off into plugins
  • “we gained speed” – “i’m not concerned about rails being super super quick” — huh???
  • 1600 patches… it’s hard to read all that code! so we embraced git and lighthouse instead of svn and trac
  • Rails 2.1
    • refactoring
    • documentation
    • thinner + faster
    • (he looks exhausted… too much free five runs beer and cheese pies)
    • he wants rails to look pretty.
    • use rubyprof
    • merging migrations
    • making timezones fitter, happier, more productive
    • migrations now have change_table block for modifying tables (just like create_table)
    • gem dependencies
    • improved memcache… making it a first-class citizen in rails. memcache-client now bundled with rails
    • much of what he’s covering Myers read in this tutorial on rails 2.1
    • dirty – AR now knows when an attribute has changed
      • message.body_changed?
      • message.body_was
      • message.changed?
      • enables partial updates
    • smarter :include — it’s not as eager to join on the first query. this seems like it will be slower, but benchmarking supposedly proves that querying the join table on an as-needed basis is faster
    • named_scope
      • so you can say user.messages.recent instead of user.recent_messages by saying named_scope :recent, :order => 'created_at desc' in the Message model
    • Message.scoped(:limit => 10)
    • jruby -S jetty_rails (run rails on jruby)
    • rbx script/server (run rails on rubinius)
    • rails now runs on ruby 1.9
  • rails 2.1 out today (a gem update at 10:14am didn’t do anything tho… it will be later… so not a Steve Jobs “and you can buy it right now”)

May 31, 2008 (1 year, 9 months ago) | Tags: railsconf2008

RailsConf 2008: Friday Night Keynote

DHH keynote

  • surplus of productivity
  • company’s needs that special in the problems we encounter
  • We ceded flexibility
  • “people like choices a lot more than they like to choose” – dhh, just now
  • why is this not in the framework, someone make this choice for me
  • we decided tech matters
  • “great people rarely fail because of poor technology” – but come on, do you want to just “not fail”?
  • we cared about us
  • “ruby is designed to make programmers happy” – matz
  • the surplus will not last forever
    • why? the mainstream will copy rails (dhh doesn’t think so)
    • dramatic alternative arrives (dhh doesn’t think so)
    • rails becomes the mainstream
  • business as usual – which is blowing your surplus, running at 110% all the time
  • another day, another fire
  • Lost in the mechanics – when running at 110% means you can’t see anything else
  • fatigued, disinterested, passionless
  • it’s just a job – most depressing statement
  • one place to invest that will pay back: you
  • 1:10 programmer productivity
  • no one is born a rock star programmer, you become it
  • recharge tangentially – or some other word, do something else besides sit in front of a computer all day: making spoons, play the banjo, fly a plane
  • can’t just focus on one muscle
  • speaking about taking people to the next level: sleep more (applause)
  • stop to read paper
  • suggested reading
    • my job went to india (and all i got was this lousy book)
    • implementation patterns
    • innovator’s dilemma
    • tufte’s envisioning information
  • all this helps you judge what is valuable or not
  • is this lack of a seam on the iphone what matters.
  • program less
  • when you have only 10 hours to program a week, you know what matters
  • sometimes good to start a project from scratch
  • share: you benefit from the sharing
  • “the purpose of playing this game well is to be able to get the best position of the next game” – alistair cockburn, 1999
  • 4 day work week means more focus
  • summary: this surplus is not going to last forever don’t blow it all on hookers and fur coats

May 31, 2008 (1 year, 9 months ago) | Tags: railsconf2008

RailsConf 2008: Friday Early Afternoon

Dialogue Concerning the Two Chief Modeling Systems

  • it’s a Play

(again you should read the slides)

  • choosing the right name will make the dev think about this model and give it the right property
  • Jim keeps saying we’re skipping layers
  • using story cards
  • jammed up over reoccurring events
  • objects = data + behavior, so you can’t just talk about the data (rows/tables), you must look at behavior
  • pull out the Class, Responsibilities, and Collaborator cards
  • the cards are only useful for organizing your thoughts; no need to fill them all out
  • “temporal expressions”
  • CJ Date: apparently wrote a lot about data modeling
  • Code Smells in Refactoring by Martin Fowler

Flexible Scaling: How to Handle 1 Billion Pageviews – TJ

  • WoW: you see all kinds of human behaviors: Mafia, Philanthropists
  • Building games on Facebook
  • He’s the author of Warbook: Rise of the Infernals
  • w/i a week he had to fix stuff
  • w/i 3 weeks had to rewrite
  • started with firebug’s net tab
  • look at your logs: pl_analyze
  • iostat
  • you need tools, but you also need strategies
  • don’t need it? ditch it.
  • slowing it down? simplify it.
  • logging it? stop
  • selecting it? cache it.
  • memcache
  • put sessions in cache
  • no-select design
    • use memcache
    • cache_fu already does this
  • using ec2
  • 1 db box
  • 1 memcache box
  • 1 static file
  • X mongrel boxes
  • The Hard Part
    • Scale Everything Else
    • Scale your deployment
      • use capistrano
    • Scaling your support
    • community management
      • give them updates every day
  • server cost $2000 a month
  • you need money
  • warbook makes $100,000/month
  • 1.5 million users
  • 16 million page views
Q&A
  • remove transactional saves
  • save per fields
  • how to solve the persistence problem on ec2? db not on amazon
  • which facebook lib? started with rfacebook, facebooker, bebo

Slides

May 30, 2008 (1 year, 9 months ago) | Tags: railsconf2008

RailsConf 2008: Friday Late Morning

Entrepreneurs On Rails – Dan Benjamin

  • Successful biz fills a need
  • What do you need
    • paper work important, expressing the idea is better, being flexible on that idea
    • energy
  • so much you have to do before hand, why are you doing this?
  • set goals – we need to do x by y, if we don’t get there we’ll change things
  • have a path to cash
  • have an exit strategy
  • why create a company
    • Liability – legal protection from your bugs :)
    • Taxes – more buying power, talk to an accountant
    • working with other companies
    • ease of working with other companies
    • Ownership – knowing who gets what
    • perception of credibility – but don’t try to be what you are not (cd of office noises for phone calls)
  • types of business
  • Fictitious Name
  • LLC is a good idea
  • You don’t have to do this in Delaware
  • hire lawyer/accountant
  • your website should be very clear about what you wanted
  • we build things we need, but if you look at say moms with kids at home you have a much bigger audience
  • marketing. peepcode logo (myers: I’ve always thought it looked like a lingerie ad)
    • you will spend 40% or more of your time marketing
  • making it work is the hard part not the (unknown) paper work
  • hard part: adjusting to the lack of stability
  • common to be in a feast or famine situation
  • co-working is a way to get an office with others
  • creativity zone
  • biz deals go like this
    • NDA – worth it for a 10k deal to have your lawyer look at it
    • proposal – 40 pages shortest he ever did
    • contract – you should write the contract, here’s what I’ll do, here’s what I’m liable and not liable for
    • Functionality Outline – evolution of a statement of work
    • getting the money – net 30, net 60, net 15, net 0 – put it in your contract, also “there is a 2% fee for late payments”
  • products: it’s about liability
  • TOS/privacy policy – be clear, be up front
  • ways to get money (see slides)
  • on hivelogic he’ll post a sample SOW

Surviving the Big Rewrite: Moving YELLOWPAGES.COM to Rails

  • biggest website at&t runs
  • all rails
  • 1 year ago 1/2 as big
  • (long slides)
  • why a big rewrite? – it’s a great bundler
  • no automated tests, new features really hard
  • lots of code being replaced with site redesign
  • hard to leverage
  • java: get around this web thing with design patterns so you get to the real business of talking to middleware
  • devs accept that not everything they want will get done
  • core team never more than 4 – trying to keep it small
  • they looked at django and ejb3/jboss
  • no to django
    • better automated testing integration (hear hear)
    • more platform maturity
    • clearer path to C if necessary for performance (I don’t agree with that. Python + C is easy, plus you have psyco and ctypes that let you get C performance or use C libs with pure Python)
    • developer comfort and experience
  • not convinced anyone needs MOM
  • only one developer that knew rails at the start
  • project got stuck
  • project lead appointed to make decision-making and communication with executive team, or at least the appearance
    • sometimes she decided in private with her bosses
  • freeze current site
  • if it’s not simple to decide how to change a current site behavior, don’t change it. save it for a later phase.

(it’s worth it to read the slides, even for a non-rewrite project. The slides probably have too much on them, but that’s good for you)

  • they spent an amazing amount of time communicating what was changing.
  • F5 Load Balancers
  • switched to erubis in web tier

May 30, 2008 (1 year, 9 months ago) | Tags: railsconf2008

RailsConf 2008: Friday Early Morning

(notes from mostly Randy and a little from me)

Chad Fowler

  • Reflecting on the history of Ruby/Rails conferences of old
  • Lost Last Ep. apparently glitched last night, thus everyone downloading Lost

DHH

  • actually saying something nice about Joel
  • used to take Joel’s columns to meetings to say “this is what we should be doing”

Joel Spolsky

  • slide of Angelina Jolie… the only slide… trying to get a higher evaluation score
  • now showing a Brad Pitt picture… so he’s not a male chauvinist pig
  • slide of ian somerhalder… another good looking guy, but brad beats him in a google fight
  • he lies about the number slides
  • brad is ipod, ian is a zune
  • blue chip vs off brand
  • blue chip chair is herman miller… lots of knockoffs… like the ones we have at work…
  • Angelina is blue chip… Uma Thurman is off brand
  • Brad is blue chip… Ian is off brand
  • Great software
    • makes people happy
    • obsesses over aesthetics
    • observes the culture code
  • Joel does lolcats
  • goes through a scenario about how it sucks to install updates on windows…
  • the ui sucks
  • the progress bars are misleading
  • the system now doesn’t recognize one of your devices.
    • You can’t unplug the device. Tell us First. Fuckhead.
    • this experience sucks… makes people unhappy
    • Learned helplessness is a book about how lack-of-control makes people unhappy
  • just do something that makes you feel in control
    • Abercrombie & Fitch checkout example
  • doesn’t give you a choice over how you checkout… it is a 4 stage process, and you have to do it in their order
    • Amazon example
  • you can change your information, you can do whatever you want in any order… you are in control
    • Tips for making people happy
  • Put the user in control
  • Positive feedback
    • Tips for obsessing over aesthetics
  • iphone is way more popular than samsung blackjack
    • but iphone is slower, has a smudge screen rather than a keyboard, doesn’t have exchange compatibility,
    • but the iphone is beautiful… “if you accidentally swallowed it, it would go right down”
  • in paris, they don’t have fire escapes, because they are ugly… it’s more important to look good than to survive a fire
  • it works… people prefer something that looks good over something that is better
    • Tips for observing the culture code
  • the ford explorer has 88 deaths per year
  • the toyota camry has 41 deaths per year… it’s not safer to be in an SUV, actually… even though people THINK they feel safer
  • SUVs have soft corners, airbags everywhere, cupholders, and are up high. they trick us into feeling better and safer.
    • web 2.0 don’t have visions, they go to parties
    • Misattribution: when you have coffee, you enjoy the movie more… it’s not the movie, it’s the coffee!
    • ends the talk playing “Sweet Home Alabama”... that was a good talk!

(it was interesting, because he didn’t have a final point. At the end he compared some blog posts that DHH and _why wrote about ruby, which had all these good words (passion, love, ...). He swapped them for their antonyms and labeled the result Java. I expected him to follow by telling us that there was a common misattribution with rails and these feelings, and that we could find the same elsewhere)

May 30, 2008 (1 year, 9 months ago) | Tags: railsconf2008