When creating models for your data (in, for example, Django), make sure you have a good idea of how the data is going to be used. For example, I recently rewrote a model that was being used to store multiple key/value pairs for a given user.
The model looked something like:
class Pair(models.Model):
    user = models.ForeignKey(User)
    key = models.CharField(max_length=255)     # max_length values here are illustrative
    value = models.CharField(max_length=255)
There are multiple “Pairs” per user (say around 8). In this case, the client loaded all of a user's Pairs at the beginning of a session:
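Roughly something like this (the variable names are just for illustration):

# At session start, load every stored pair for the user and build a prefs dict
user_pairs = Pair.objects.filter(user=user)
prefs = {p.key: p.value for p in user_pairs}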
The client never asked for a specific k/v pair. In my opinion, the model was over-designed: it could support querying individual keys if that were ever needed in the future, but that flexibility came with some negative consequences.
When the client stores a user's prefs, it sends in all of the k/v pairs at once (e.g. 8 of them), and we would have to store them with a for loop:
for pair in pairs:
    p = Pair(user=user, key=pair['key'], value=pair['value'])
    p.save()    # one INSERT per pair
This is not good, because that is potentially a lot of writes. Combine it with a large number of concurrent users and you get an inefficient, unnecessary number of writes to the DB, and you tie up DB connections if you're trying to be conservative with them.
This is where Mongo comes in. Mongo is probably a better choice here, since the client makes up its own k/v pairs and the backend doesn't really care what the keys and values are; it just needs to store them for the user. And Mongo lets us do that.
Your model could look something like:

Pairs:
    user = int    # plus whatever arbitrary key/value fields the client sends
Don’t forget to index by user and make sure schemaless is enabled.
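Here's a minimal sketch of what that can look like with plain pymongo (the database name, collection name, and helper functions are just illustrative, not the exact code; with raw pymongo the documents are schemaless by default, so the main thing you add is the index on user):

from pymongo import ASCENDING, MongoClient

client = MongoClient()                  # assumes a local mongod
pairs_coll = client.myapp.pairs         # hypothetical db/collection names

# One document per user, indexed (and unique) on the user id
pairs_coll.create_index([('user', ASCENDING)], unique=True)

def save_prefs(user_id, prefs):
    # prefs is the dict of arbitrary k/v pairs the client sent;
    # the whole set goes in as one document with a single upsert
    pairs_coll.replace_one({'user': user_id}, dict(prefs, user=user_id), upsert=True)

def load_prefs(user_id):
    # Session-start read: one query instead of pulling ~8 rows
    return pairs_coll.find_one({'user': user_id}, {'_id': 0, 'user': 0}) or {}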
Now, if you have a lot of data, you may not want to migrate everything all at once. You could do a lazy migration: instead of using just your standard RDB calls, you combine them with Mongo calls that check whether a user's data has already been migrated, migrate it if it hasn't been, and otherwise use the data already in Mongo. This works for users who use your application after this code goes in, but it doesn't guarantee that inactive users get their data migrated to Mongo. You could also write a data migration to be run via South, but you might want to put that off if you have lots of unmigrated data and need low downtime during production deploys. Another idea I had was a periodic celery task that slowly migrates the unmigrated data. What do you think?
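To make the lazy-migration idea concrete, here's a sketch that builds on the pymongo example above (the function name and flow are just for illustration; the Pair model is the original relational one):

def get_prefs(user):
    # Already migrated? Serve straight from Mongo.
    doc = pairs_coll.find_one({'user': user.id}, {'_id': 0, 'user': 0})
    if doc is not None:
        return doc

    # Not migrated yet: read the relational rows, copy them into Mongo,
    # and serve from Mongo on every request after this one.
    prefs = {p.key: p.value for p in Pair.objects.filter(user=user)}
    pairs_coll.replace_one({'user': user.id}, dict(prefs, user=user.id), upsert=True)
    return prefs

The periodic celery task version would just walk the users who don't have a document in pairs_coll yet and push them through the same path in batches.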