If your goal is to make a reliable and secure web app, you're going to spend
some time thinking about data integrity. It's a truism that if your users can
put bad data in your database, they will. Django comes with a fair bit of
support out of the box for data validation. A ForeignKey
will create a
database level constraint and the unique_together
attribute gives you the
ability to add some more complex database level constraints, but often you'll
have real world constraints on your data that aren't neatly captured by
Django's built in constraints, but could be enforced by your relational
database.
While you can (and should!) write model and form validators to catch and handle
errors before they become 500 errors, it's a mistake to rely on them to
guarantee data integrity. Django may not be the only application modifying your
database, and even if it is (and you've got no admins with a tendency towards
'fixing' problems using the database console), it's very hard to really
guarantee data integrity with just Django. Model validation provides only very
limited guarantees: it works fairly well for data entry via a ModelForm
, but
validators aren't run on save
, or on queryset update
, or through any of the
other mechanisms Django provides for putting data in the database. Models are
updated and modified through dozens of different code paths, and it's just
plain bad design to try to catch every instance and make sure validation
happens. The database itself is where you should validate data.
Here's the good news: if you're not afraid of writing a line or two of SQL, Django makes it very easy to impose any database constraint you want with just a few minutes work.
So lets suppose we're writing an online liquor store and want to store some user profile information. For legal reasons we'll ask users their date of birth, and only want to allow users who are at least 21 years old to register, and we want to impose a database constraint to guarantee our condition.
We write our usual profile model as usual. Something like:
1 2 3 4 5 6 7 8 9 | # accounts/models.py
from django.conf import settings
from django.db import models
class UserProfile(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL)
date_of_birth = models.DateField(validators=[validate_over_twenty_one])
...
|
Note that we're still doing model validation so our ModelForms will work
smoothly. Next, we generate our migrations; the operations
section should
looks something like:
1 2 3 4 5 6 7 8 9 10 11 12 | # accounts/migrations/0001_initial.py
...
operations = [
migrations.CreateModel(
name='UserProfile',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('date_of_birth', models.DateField(validators=[accounts.models.validate_over_twenty_one])),
('user', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to=settings.AUTH_USER_MODEL)),
],
),
]
|
We add our custom SQL after the CreateModel
operation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | # accounts/migrations/0001_initial.py
...
operations = [
...
migrations.RunSQL(
"""
ALTER TABLE accounts_userprofile ADD CONSTRAINT is_over_21 CHECK (
date_of_birth <= now() - interval '21 years'
);
""",
"""
ALTER TABLE accounts_userprofile DROP CONSTRAINT is_over_21;
"""
),
]
|
Now we run our migrations and we're done! All it took was 10 lines of extra code, and now we can be completely confident that no user profiles with an age under 21 years can be created. Note that we could write this as a separate migration, but by doing it in this migration we ensure that the database is never in a state where bad data can get in. For this particular example I've used some postgres specific syntax just because it's what I'm familiar with, but something similar should work for most databases. This particular example is a little contrived to keep it simple: having an account with an age under 21 years isn't really an issue with data integrity, but situations where real data integrity concerns exist come up constantly, and this basic technique works most of the time.
The beautiful part about this approach is that Django doesn't need to know
anything about the constraint. If you try to add bad data, you'll get an
IntegrityError
, but that's better than running into a confusing bug down the
line when the bad data bubbles back to the surface. If you can anticipate the
error, you can catch and handle it cleanly, and if you miss the error, at least
you just get a 500 error for the relevant request. Once you get used to doing
this, adding a constraint is 15 minutes of work, and makes debugging and
reasoning about your code much easier.
There's a natural tendency when working in a framework, especially a 'batteries included' framework like Django, to avoid doing anything outside the framework itself. Often that's a good idea. But I would argue that part of the power of Django is that it provides easy low level access to your database when you need it. Django may eventually support more sophisticated constraints out of the box. As I write this there's an 6 year old bug in the Django bug tracker to "Add the ability to use database-level CHECK CONSTRAINTS" that switched from "Someday/Maybe" to "Accepted" six months ago! But we don't have that support yet, and I suspect that a large part of the reason the Django devs have been so slow to add this feature is that it's already there.
If you want to use more sophisticated constraints in your Django database, you can, and if you're not already, you really should be!