I’ve been working on a project to republish some interesting data (more later…) and had planned to use MySQL for storage. However, after figuring out the (somewhat horrid) data structure, I realised this would require >10 tables, and be generally horrible.
So, I wondered if there might be a better option. I started by asking the lazyweb if there was an easy, no-effort option. Unsurprisingly there wasn’t, so I didn’t make much progress. But then we started using CouchDB for a work project, and it occurred to me that it might be a good fit: queryable and efficient for a reasonably large dataset (>140k records), but with a flat, schema-free structure and native support for revisions. Winning.
I installed CouchDB and managed to add some data to it, and view it. I also had a play with design documents, or views, or whatever the right term is (the nomenclature is a bit confusing). I understand MapReduce but I’m not sure how it’ll apply to this, if at all. In any case, this is my todo list:
- Display all the records in a gigantic paginated list
- Allow all the data to be downloaded in a giant, horrible CSV file
- Launch version 1!
- Allow records to be searched simply (keyword search over all the fields)
- Allow records to be searched complicatedly (with multiple conditions operating on fields separately)
- Allow records which are matched to be exported to a slightly less giant but still quite horrible CSV file
- [Maybe] Add other data formats
- [Maybe] Add email alerts and other useful bits to notify people of changes to things they care about
All of this should be easy, but since I’m finding CouchDB a bit confusing at the moment, I’m not sure it will be, at least not until I’ve gotten over the initial hump of learning something new. This is as far as I got towards making some actual progress. The lightbulb moment was that I can pass ?key=, ?startkey= and ?endkey= straight into a query on this view without having to do anything special:
{
  "_id": "_design/example",
  "views": {
    "by_gender": {
      "map": "function(doc) { emit(doc._id, doc.gender)}"
    }
  }
}
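For the record, here is a rough sketch of how those query parameters end up in the view URL. The helper name `viewUrl`, the database name `records`, the local server address and the example keys are all made up for illustration. The one thing I am fairly sure of is that CouchDB wants `key`, `startkey` and `endkey` as JSON-encoded values, so a string key is sent with its double quotes. Note also that these parameters match against whatever the map function emits as the key, which in the view above is `doc._id`; to filter by gender, the map would presumably need to emit `doc.gender` as the key instead.

```javascript
// Hypothetical helper (not part of CouchDB): builds the URL for a view query.
// key, startkey and endkey are JSON-encoded before being URL-encoded,
// because CouchDB parses them as JSON values.
function viewUrl(base, db, designDoc, view, params = {}) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const query = Object.entries(params)
    .map(([name, value]) => {
      const raw = jsonParams.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(raw)}`;
    })
    .join("&");
  const path =
    `${base}/${encodeURIComponent(db)}` +
    `/_design/${encodeURIComponent(designDoc)}` +
    `/_view/${encodeURIComponent(view)}`;
  return query ? `${path}?${query}` : path;
}

// Assumed values: a local server, a database called "records",
// and document ids that happen to be short strings.
const base = "http://localhost:5984";

// Exact match on the emitted key.
const exact = viewUrl(base, "records", "example", "by_gender", { key: "abc123" });

// Range over emitted keys, capped at ten rows.
const range = viewUrl(base, "records", "example", "by_gender", {
  startkey: "a",
  endkey: "b",
  limit: 10,
});

console.log(exact);
console.log(range);
```

The same helper would cover the paginated list on the todo list, since `limit` and `startkey` together are enough to step through a view one page at a time.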
I did find some useful resources for CouchDB: