Thursday, July 14, 2011

pymongo and index creation

For my current project, I'm working with a pretty big document database: 100 million documents so far, and I'm considering scaling it to 25 billion over the next few weeks.

The indexes on the document collections take several minutes to create or ensure. By default, MongoDB locks the collection while the index is being "ensured". For smaller collections, that's no big deal. But for a rebuild that takes up to an hour, a blocking operation is a big problem. It turns out MongoDB supports background index builds, but it wasn't clear that the PyMongo driver exposed them. PyMongo's documentation for collection.ensure_index didn't mention backgrounding. However, the driver passes extra keyword arguments through as index options, so based on MongoDB's documentation for background indexing, I just tried passing background=True. Seems to work!
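For reference, here is a minimal sketch of what that call can look like. The helper name build_index_in_background, the database and collection names, and the user_id field are all made up for illustration. The sketch uses create_index because ensure_index was deprecated and later removed from PyMongo; on an older driver the same background=True keyword would go to ensure_index instead.

```python
def build_index_in_background(collection, field):
    """Request an ascending index on `field`, built in the background.

    `collection` is assumed to be a pymongo Collection (anything with a
    compatible create_index method works). The helper itself is
    hypothetical and not part of PyMongo.
    """
    # 1 means ascending; it is the same value as pymongo.ASCENDING.
    keys = [(field, 1)]
    # background=True is passed along as an index option, asking the
    # server not to block the collection while the index is built.
    return collection.create_index(keys, background=True)


# Example usage (assumes pymongo is installed and a server is running
# locally; "mydb", "docs" and "user_id" are placeholder names):
#
#   from pymongo import MongoClient
#   docs = MongoClient()["mydb"]["docs"]
#   build_index_in_background(docs, "user_id")
```

Whether the background option has any effect depends on the server version, so it is worth checking the MongoDB documentation for the release you run.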

