So GridFS, MongoEngine & Django walk into a bar …
October 18th, 2010
Unfortunately this post is not an absurd nerdy joke about three buzzwords in a public house. Although I could tell one about sin(x), cos(x), and e^x ...
MongoEngine v0.4
So, MongoEngine 0.4 has just been released and I'd like to write a bit about one new feature in particular. The latest and greatest release of MongoEngine includes support for MongoDB's GridFS storage engine. GridFS is an exciting technology which allows the storage of files directly within a MongoDB database. This means your files get the benefits of replication and sharding, just like the rest of your data. Once the files are in the database, serving them via Nginx is simple with Mike Dirolf's nginx-gridfs module.
This functionality is implemented as a field called FileField. This new field behaves as a file-like object, allowing for natural use with other Python code. Reading and writing data to this new field is as easy as reading and writing from a regular file:
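    from datetime import datetime
    from mongoengine import Document, StringField, DateTimeField, FileField

    class Painting(Document):
        artist = StringField()
        date = DateTimeField(default=datetime.now)
        photo = FileField()
        thumb = FileField()

    my_painting = Painting(artist='Steve')
    # Image data is binary, so open the files in binary mode
    my_painting.photo = open('my_painting.jpg', 'rb')
    my_painting.thumb = open('my_painting_thumbnail.jpg', 'rb').read()
    my_painting.save()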
The great thing about these FileFields is that since they are file-like objects, we can pass them to (almost) anything that accepts files. In the above example I used a separate image as the thumbnail. I could actually just generate the thumbnail from the original photo using PIL and save directly to another FileField:
    from PIL import Image

    # Open and save original image (binary mode)
    filename = 'my_painting.jpg'
    photo = open(filename, 'rb')
    my_painting.photo = photo

    # Use original image to create thumbnail
    pil_image = Image.open(filename)
    pil_image.thumbnail((80, 80), Image.ANTIALIAS)  # Image.LANCZOS on Pillow 10+

    # Stream new thumbnail into the thumb FileField
    my_painting.thumb.new_file()
    pil_image.save(my_painting.thumb, 'jpeg', quality=85)
    my_painting.thumb.close()
    my_painting.save()

    # Remove the stored photo from GridFS
    my_painting.photo.delete()
Deletion of files is just as you would expect. The delete() method can be called on a FileField to remove any stored object. It is important to note that the FileField actually only stores the ID of a file in a separate GridFS collection. This means that deleting a document with a defined FileField does not actually delete the file. You must be careful to delete any files in a document as above before deleting the document itself.
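The cleanup order described above can be sketched as a small helper. This is only an illustration: `delete_with_files` and its list of field names are hypothetical, not part of MongoEngine. It assumes nothing more than what the text states, namely that each file field has a delete() method and that the document itself can be deleted afterwards.

```python
def delete_with_files(document, file_field_names):
    """Delete the stored files first, then the document that references them.

    Hypothetical helper: it only relies on each named attribute having a
    delete() method and on the document having one as well.
    """
    for name in file_field_names:
        file_field = getattr(document, name)
        # Remove the object held in GridFS before the reference is lost
        file_field.delete()
    # Only once the files are gone is the referencing document dropped
    document.delete()
```

With the Painting example this would be called as delete_with_files(my_painting, ['photo', 'thumb']), so that no orphaned files are left behind in GridFS.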
The FileField also allows for storage of arbitrary metadata such as content_type or filename. The put() method allows for metadata to be stored using the same call as the file:
    # Storage
    my_painting.photo.put(photo, filename=filename, content_type='image/jpeg')

    # Retrieval (avoid shadowing the built-in type())
    content_type = my_painting.photo.content_type
    name = my_painting.photo.filename
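Rather than hard-coding the content type, it can usually be guessed from the filename. Here is a minimal sketch using the standard library's mimetypes module; `file_metadata` and its fallback type are hypothetical names chosen for illustration, and the result is simply a dict meant to be passed on as keyword arguments.

```python
import mimetypes
import os


def file_metadata(path, default_type='application/octet-stream'):
    """Build filename/content_type keyword arguments from a file path.

    Hypothetical helper: the content type is guessed from the extension
    and falls back to default_type when the extension is unknown.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return {
        'filename': os.path.basename(path),
        'content_type': content_type or default_type,
    }
```

The metadata could then be stored in the same call as the file, along the lines of my_painting.photo.put(photo, **file_metadata(filename)).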
Files can be replaced with the replace() method. This works just like the put() method so even metadata can (and should) be replaced:
    another_painting = open('another_painting.png', 'rb')
    my_painting.photo.replace(another_painting, content_type='image/png')
Integration with Django
Since many people will be using this functionality with Django, it was a natural extension to complement the FileField with a custom storage backend. It's called GridFSStorage and works like this:
    # Create a GridFS based filesystem
    fs = mongoengine.django.GridFSStorage()

    # Attempt to save a new file called hello.txt
    filename = fs.save('hello.txt', 'Hello, World!')
Just like the default Django storage backends, the save() method will try to save your file with the specified filename, and if it can't, the file will be saved under a different name, which is then returned. For this reason, it is important to save the returned filename and use it to refer to saved files later on.
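To make the renaming behaviour concrete, here is a rough sketch of how a storage backend might pick a free name. The numbered-suffix scheme and the `available_name` helper are assumptions for illustration only; Django and GridFSStorage may well choose alternative names differently, which is exactly why the returned filename should be kept.

```python
import itertools
import os


def available_name(name, exists):
    """Return name, or a numbered variant that exists() reports as free.

    Illustrative only: `exists` is any callable taking a filename and
    returning True when that name is already taken.
    """
    root, ext = os.path.splitext(name)
    counter = itertools.count(1)
    candidate = name
    while exists(candidate):
        # Try hello_1.txt, hello_2.txt, ... until a free name turns up
        candidate = '%s_%d%s' % (root, next(counter), ext)
    return candidate
```

Any exists-style callable can be passed in, for example the exists() method of a storage object.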
GridFSStorage implements all of the current relevant calls in the Django File Storage API.
    >>> fs.exists('hello.txt')
    True
    >>> fs.open('hello.txt').read()
    'Hello, World!'
    >>> fs.size('hello.txt')
    13
    >>> fs.url('hello.txt')
    'http://your_media_url/hello.txt'
    >>> fs.open('hello.txt').name
    'hello.txt'
    >>> fs.listdir()
    ([], [u'hello.txt'])
Serving GridFS files
So once you've got your files into MongoDB you'll likely want to get them back out again as quickly as possible. There are a number of ways to do this but the simplest is to use Nginx with the nginx-gridfs module. Like all Nginx modules, this must be compiled in when Nginx is built. A simple configuration to serve files from the paintings_db database would go something like this:
    location /gridfs/ {
        gridfs paintings_db field=filename type=string;
    }
There are several benchmarks floating around that compare the different methods for serving GridFS files, but it really comes down to balancing simplicity against speed, and for most purposes I think the above will do nicely. If you have anything that is getting particularly high traffic then you'll want to look into offloading some of the work to a dedicated CDN anyway.
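To picture what the Nginx location block is doing, the lookup can be modelled in a few lines of Python. This is a sketch under an assumption, not a description of the module's internals: it assumes that the part of the request path after the location prefix is used as the value of the configured field. `gridfs_lookup` is a hypothetical name.

```python
def gridfs_lookup(request_path, location='/gridfs/', field='filename'):
    """Translate a request path into the GridFS query it is assumed to trigger.

    Assumption: whatever follows the location prefix is matched against
    the configured field (filename in the configuration shown earlier).
    """
    if not request_path.startswith(location):
        return None  # not handled by this location block
    value = request_path[len(location):]
    return {field: value}
```

Under that assumption, a request for /gridfs/my_painting.jpg would look up the stored file whose filename is my_painting.jpg.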
Documentation
That's pretty much all there is to the GridFS functionality in MongoEngine so far. You'll find the documentation over at mongoengine.org and as always, the code is on GitHub for you to hack around with. I look forward to hearing what you do with this and to the improvements that will inevitably be submitted.
MongoEngine 0.4 also includes a bunch of other good stuff including a completely rewritten Q-objects implementation, geospatial support, and new queryset operators.
It's on PyPI so you can upgrade with pip install -U mongoengine and try out some of these new features right now. Get it while it's hot!
Discussion
Comments on this post have now been closed.
Hi,
Is it possible to serve gridfs files using the django default webserver?
In normal django setup, a file associated to a FileField can be reached as such:
    class MyPhoto(models.Model):
        pic = FileField(upload_to='user/photo')
http://localhost:8000/media/user/photo/myphoto.jpg
How do I serve mongoengine FileField in similar fashion?
Thanks!
Well it would be hideously inefficient and it is recommended that you only have Django serve media during development. But if you insist, you could serve a GridFS file in the same fashion as django.contrib.staticfiles does by constructing your own response object. Since FileField is a file-like object you can substitute the relevant line with something like this:
If you construct your own serve view then you should be able to use it just like the default with the following in your urls.py:
Hi Steve,
How do I use FileField together with GridFSStorage in Django? I tried to do something like this (http://docs.djangoproject.com/en/dev/topics/files/#the-built-in-filesystem-storage-class) by passing fs to FileField but ended up with an error.
fs = mongoengine.django.GridFSStorage()
class Car(models.Model):
Could you provide an example of how to use GridFSStorage with FileField inside a Django model?
Thanks, DV
Hi Steve, great work here.
I'm trying to use the GridFSStorage with django and seem to be having an issue. The files will upload and save just fine, but when I delete them using the django admin, it seems to jack up mongo until I manually remove the FileDocument.objects.
I posted this issue on stackoverflow, but haven't gotten any responses: http://stackoverflow.com/questions/5041996/django-with-pluggable-mongodb-storage-troubles
If you have the time, would you see if I'm just doing something stupid? Thanks!
Thank you, I come back