CouchDB + Sinatra

Irritatingly, I can’t get on to the machine that has my nice, populated CouchDB install, and the script for importing from CSV. So I’m a bit stymied for today.

But I did find useful things:

  • A Ruby library for CouchDB which looks like exactly the right thing (Also, its API docs)
  • A bunch of other libraries on the same site which could come in handy
  • And, by just trying it out, I learned that it’s really easy to use regexps with Sinatra:
    get %r{/data/(.*)} do |record|
      "I'm fetching #{record}"
    end

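To see what ends up in the block parameter, the same capture group can be tried in plain Ruby without Sinatra running. This is only a rough sketch: `DATA_ROUTE` and `record_from_path` are names I made up for illustration, and the pattern is anchored by hand here, which may not match exactly how a given Sinatra version anchors regexp routes.

```ruby
# Hypothetical stand-in for the route pattern, anchored for standalone use.
DATA_ROUTE = %r{\A/data/(.*)\z}

# Return the text captured after /data/, or nil if the path does not match.
def record_from_path(path)
  match = DATA_ROUTE.match(path)
  match && match[1]
end
```

Because the group is `(.*)`, it also swallows any further slashes, so a path like /data/abc/123 yields the whole of abc/123 as the record.
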
Making an API for a dataset using CouchDB

I’ve been working on a project to republish some interesting data (more later…) and had planned to use MySQL for storage. However, after figuring out the (somewhat horrid) data structure, I realised this would require >10 tables, and be generally horrible.

So, I wondered if there might be a better option. I started by asking the lazyweb if there was an easy, no-effort option. Unsurprisingly there wasn’t, so I didn’t make much progress. But then we started using CouchDB for a work project, and it occurred to me that it might be a good fit: queryable and efficient for a reasonably large dataset (>140k records), but with a flat, schema-free structure, and supporting revisions natively. Winning.

I installed CouchDB and managed to add some data to it, and view it. I also had a play with design documents, or views, or whatever the right term is (the nomenclature is a bit confusing). I understand MapReduce but I’m not sure how it’ll apply to this, if at all. In any case, this is my todo list:

  1. Display all the records in a gigantic paginated list
  2. Allow all the data to be downloaded in a giant, horrible CSV file
  3. Launch version 1!
  4. Allow records to be searched simply (keyword search over all the fields)
  5. Allow records to be searched complicatedly (with multiple conditions operating on fields separately)
  6. Allow records which are matched to be exported to a slightly less giant but still quite horrible CSV file
  7. [Maybe] Add other data formats
  8. [Maybe] Add email alerts and other useful bits to notify people of changes to things they care about
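
For the giant, horrible CSV file in item 2, something like the following might do, assuming the records have already been fetched from CouchDB and parsed into flat Ruby hashes. The `records_to_csv` name and the sample field names are invented for illustration. Since the documents are schema-free, the header row is built from the union of all keys seen.

```ruby
require 'csv'

# Turn an array of flat record hashes into a single CSV string.
# Fields missing from a record are left as empty cells.
def records_to_csv(records)
  headers = records.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers
    records.each { |record| csv << headers.map { |h| record[h] } }
  end
end
```

The same function would work for item 6 too, just fed with the matched records instead of everything.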

All of this should be easy, but since I’m finding CouchDB a bit confusing at the moment, I’m not sure it will be, at least not until I’ve gotten over the initial hump of learning something new. This is as far as I got towards making some actual progress. The lightbulb moment was that I can pass ?key=, ?startkey= and ?endkey= straight into a query on this view without having to do anything special:

  {
    "_id" : "_design/example",
    "views" : {
      "by_gender" : {
        "map" : "function(doc) { emit(doc._id, doc.gender); }"
      }
    }
  }

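Here is a rough sketch of building the URL path for querying that view from Ruby. The database name `mydata` and the helper name `view_path` are made up for illustration. The one thing to watch is that CouchDB expects the values of key, startkey and endkey to be JSON, so a string key needs its quotes, and those then need URL-encoding. The keys are compared against the first argument to emit, which in the view above is doc._id.

```ruby
require 'json'
require 'erb'

# Build the path (plus query string) for a CouchDB view request.
# Each parameter value is JSON-encoded, then percent-encoded.
def view_path(db, design, view, params = {})
  query = params.map do |name, value|
    "#{name}=#{ERB::Util.url_encode(value.to_json)}"
  end.join('&')
  path = "/#{db}/_design/#{design}/_view/#{view}"
  query.empty? ? path : "#{path}?#{query}"
end

# Hypothetical database name; design doc and view names as above.
puts view_path('mydata', 'example', 'by_gender', key: 'abc')
# prints /mydata/_design/example/_view/by_gender?key=%22abc%22
```

The same helper should cover the paginated list as well, since views also accept limit and skip, for example `view_path('mydata', 'example', 'by_gender', limit: 50, skip: 100)`.
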
I did find some useful resources for CouchDB: