If you’ve ever written a server-side service before, you’ve probably realized you have to repeat certain patterns a lot. On every route, maybe you need to pull a user ID or validate that records exist, and return a specific response if any of that fails.

Rather than do that over and over, we can instead use models. This gives us built-in methods to validate data and make our application more secure, our errors more meaningful, and our code more scalable.

In this post, we’ll go over why we use models in our codebases and give example usages for Python using Pydantic. By model, we simply mean a data structure with embedded constraints. A Customer model, for example, defines which fields map to a customer, and their properties, such as whether or not they’re optional. Models also allow us to write more complicated validation logic, such as checking the validity of a phone number, all while centralizing that logic inside the model instead of having it spread around in functions.

We will start by discussing a set of issues faced in Python backend projects, like repeating the same logic over and over in your code, returning unhelpful API error messages, and having undocumented dictionary arguments. Then we’ll walk through a solution: using models to simplify everything.

Improving Validation Issues

Let’s start by considering how many of our routes work. If you’ve worked with Python backends before, you’ve probably manually parsed request arguments at some point. This is what typical web server frameworks recommend out of the box. You might think of writing a route handler like:

@app.get("/contact/all")
def route_handler(request):
    filter = request.args.get("filter")
    count = request.args.get("count", 10)
    offset = request.args.get("offset", 0)
    
    return get_contacts(filter, count, offset)

In a basic tutorial, this is simple enough to demonstrate the framework’s design patterns without overcomplicating the task. However, this code does not scale well.

Make Validation Scale

In particular, it means manually validating the data in every route. Sometimes, the same validation is even required in multiple places. An example of this might be a get_users route that also takes in a count and offset (which are pretty commonly used for pagination) as well as a filter (which could be different here).

@app.get("/user/all")
def route_handler(request):
    filter = request.args.get("filter")
    count = request.args.get("count", 10)
    offset = request.args.get("offset", 0)
    
    return get_users(filter, count, offset)

This results in a lot of duplicated code and sometimes trivial validations. For example, consider a more robust version of the above route:

@app.get("/contact/all")
def route_handler(request):
    filter = request.args.get("filter")  # dict
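    try:
        # Query-string values arrive as strings, so convert them before comparing
        count = int(request.args.get("count", 10))
        offset = int(request.args.get("offset", 0))
    except ValueError:
        return ValidationError("count and offset must be integers")

    # The default offset of 0 is valid, so only negative offsets are rejected
    if count <= 0 or offset < 0:
        return ValidationError("count must be positive and offset must not be negative")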

    if validate_filter(filter):
        return get_contacts(filter, count, offset)
    else:
        return ValidationError("Malformed filter")

All we’ve done is some simple validation: making sure types are logical and parameters are sensible. Even if this were extracted into separate functions, it would still be unwieldy to write validators manually for each simple data type we expect. Notice how in the following example, we still have repeated code, even though it is less complex.

@app.get("/contact/all")
def contacts_handler(request):
    filter = validate_filter(request.args.get("filter"))  # dict
    count = validate_int(request.args.get("count", 10), gt=0)
    offset = validate_int(request.args.get("offset", 0), ge=0)  # 0 is a valid offset

    return get_contacts(filter, count, offset)

@app.get("/user/all")
def users_handler(request):
    filter = validate_filter(request.args.get("filter"))  # dict
    count = validate_int(request.args.get("count", 10), gt=0)
    offset = validate_int(request.args.get("offset", 0), ge=0)  # 0 is a valid offset

    return get_users(filter, count, offset)

Now imagine we have more arguments, or even a JSON body for POST requests. Strings, ints, booleans, dates, lists, dicts. Route handlers will grow out of control, and it doesn’t really make sense to extract the route validation into a function of its own. That logic is specific to this route and won’t be reused anywhere else. For this reason, having a model for each route, with centralized validation based on the type of each arg, really cleans things up.

Clearer Validation Errors

With manual data validation, it can be complicated to throw meaningful errors consistently. That is particularly true for complex objects. For example, APIs often expect JSON bodies like this:

{
   "name": "John Doe",
   "address": {
       "street_name": "152 Test Ave.",
       "zip_code": "12345",
       "state": "Example state",
       "country": "Example country"
   },
   "phone_number": "123-555-1234"
}

Throwing detailed validation errors on what went wrong gets harder with nested data. A quick glance at the body above shows why: even if all you wanted to do was check that the fields were non-empty, that’s seven fields to check, with the added complication that four of them are nested inside another. Checking optional vs. required fields is even harder, and having more complex validation on top of that is harder still. Using a validation framework with models makes this much more manageable.
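To make the cost concrete, here is a rough sketch of a hand-rolled check for just the non-empty rule on the body above. The helper name find_empty_fields and the CUSTOMER_FIELDS layout are made up for this illustration; they are not from any library.

```python
def find_empty_fields(data, required, prefix=""):
    """Return dotted paths of required fields that are missing or empty."""
    problems = []
    for key, children in required.items():
        path = f"{prefix}{key}"
        value = data.get(key) if isinstance(data, dict) else None
        if children is None:
            # Leaf field: must be a non-empty string
            if not isinstance(value, str) or not value.strip():
                problems.append(path)
        elif not isinstance(value, dict):
            # The nested object itself is missing or has the wrong type
            problems.append(path)
        else:
            # Recurse into the nested object, extending the dotted path
            problems.extend(find_empty_fields(value, children, prefix=f"{path}."))
    return problems


# Hypothetical description of the expected body: None marks a leaf field
CUSTOMER_FIELDS = {
    "name": None,
    "address": {"street_name": None, "zip_code": None, "state": None, "country": None},
    "phone_number": None,
}

body = {"name": "John Doe", "address": {"street_name": "", "zip_code": "12345"}}
missing = find_empty_fields(body, CUSTOMER_FIELDS)
print(missing)
```

That is already a recursive helper plus a separate schema description, and it only covers one rule. Optional fields, type checks, and format checks would each add more branches.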

The Perils of Dict Function Arguments

People often try to overcome these validation issues by passing dictionaries as function arguments. It makes sense on one level: the dynamic nature of Python makes that easy. While it may be convenient, it’s bad practice for several reasons:

  • The dictionary is not structured by definition, so callers can pass fewer or more arguments than expected
  • Documenting a changing number of arguments is hard to do when these arguments are part of a dict that anyone can augment from anywhere
  • Documenting dictionary arguments is not centralized and results in duplicated, hard-to-maintain docstrings
  • Validation becomes a problem when dict arguments must satisfy some constraints, which results in duplicated validation logic
  • Validation logic that’s spread around the project means that a small change to the argument list will most probably result in bugs due to old validation logic in some places.
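
As a small illustration of the points above, consider this sketch. The function create_contact and its key names are invented for the example.

```python
def create_contact(contact):
    """Takes a dict; nothing documents or enforces which keys it needs."""
    # Hypothetical key names, chosen only for this illustration
    return f"{contact['name']} <{contact['email']}>"


print(create_contact({"name": "John Doe", "email": "john@example.com"}))

# A caller elsewhere in the project spells a key differently ...
try:
    create_contact({"name": "John Doe", "e_mail": "john@example.com"})
except KeyError as exc:
    print(f"fails only at runtime, inside the function: missing key {exc}")

# ... while an unexpected (here, misspelled) extra key is silently accepted
create_contact({"name": "John Doe", "email": "john@example.com", "emial_verified": True})
```

Nothing in the signature tells a caller which keys are required, and nothing flags the extra key in the last call.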

I’m sure you can think of more evil ramifications of using dicts as function arguments. But how then do we overcome the problem and validate without incredibly messy or unsafe code?

Enter Pydantic

If you’re familiar with FastAPI, then you’ve used Pydantic to build models for organizing your routes. We use Pydantic with Sanic to validate our backend routes using models and to pass around structured objects in our code.

This has several benefits over using plain-old classes. For example, Pydantic models provide built-in validation on construction and return meaningful errors when something is invalid. If we declare a field as numeric, Pydantic takes care of the details, such as safely converting a string input from the request to a numeric type and, when we declare a constraint on the field, checking that the value is non-negative. It then gives the requestor feedback when validation fails.
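
As a minimal sketch of that behavior, assume a Pagination model (a name chosen here for illustration) holding the count and offset arguments from the earlier routes. Field with gt and ge is Pydantic's way of declaring numeric bounds.

```python
from pydantic import BaseModel, Field, ValidationError


class Pagination(BaseModel):
    # gt/ge are Pydantic's numeric constraints: count > 0, offset >= 0
    count: int = Field(default=10, gt=0)
    offset: int = Field(default=0, ge=0)


# Query-string values arrive as strings; Pydantic converts them to int
page = Pagination(count="25", offset="50")
print(page.count, page.offset)

try:
    Pagination(count="-3", offset="abc")
except ValidationError as exc:
    # One entry per failing field, each with its location and a message
    for error in exc.errors():
        print(error["loc"], error["msg"])
```

The exact wording of the messages differs between Pydantic versions, so it is safer to rely on the loc entries than on the message text.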

With Pydantic we can pass around structured objects with built-in validation and centralized docs instead of dicts that make it easy to miss a key detail. And our route validation logic is extracted into a model that can be made of other reusable models.
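
One way to get that reuse, sketched here with hypothetical model and field names, is to put the shared pagination arguments in a base model and let each route's model add only what is unique to it. The filter is simplified to an optional string for the sake of the example.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Pagination(BaseModel):
    """Pagination arguments shared by every list route."""
    count: int = Field(default=10, gt=0)
    offset: int = Field(default=0, ge=0)


class ContactQuery(Pagination):
    # Hypothetical filter field; each route declares only what is unique to it
    filter: Optional[str] = None


class UserQuery(Pagination):
    filter: Optional[str] = None
    include_inactive: bool = False  # hypothetical user-only flag


def parse_contact_query(args: dict) -> ContactQuery:
    """Stand-in for a route handler: args plays the role of request.args."""
    return ContactQuery(**args)


query = parse_contact_query({"count": "5", "filter": "name"})
print(query.count, query.offset, query.filter)
```

The count and offset rules now live in one place, so changing them updates every route that builds on Pagination.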

Example Usage

The customer object that’s being decoded from a request body now looks like:

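from typing import Optional
from pydantic import BaseModel, validator

class Address(BaseModel):
    street_name: str
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    country: Optional[str] = None

class Customer(BaseModel):
    name: str
    address: Address
    phone_number: str

    # Pydantic v1-style decorator; Pydantic v2 offers field_validator instead
    @validator("phone_number")
    def validate_phonenumber(cls, v):
        # Validation logic goes here; raise ValueError to reject the value
        return v  # a validator must return the (possibly normalized) value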

Fields can be specified to be optional or required, as well as have a specific type like str or int. Models can have fields that are Pydantic models themselves, such as the address field above. This simplifies otherwise complicated validation logic by dividing and conquering.
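
Here is a runnable sketch of the same idea, with the address fields named after the JSON body shown earlier. The phone number format is an assumption made for the example, and the decorator is the Pydantic v1-style validator (Pydantic v2 still accepts it but recommends field_validator).

```python
import re
from typing import Optional

from pydantic import BaseModel, ValidationError, validator


class Address(BaseModel):
    street_name: str
    zip_code: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None


class Customer(BaseModel):
    name: str
    address: Address
    phone_number: str

    @validator("phone_number")  # Pydantic v1-style decorator
    def validate_phone_number(cls, v):
        # Assumed format for this sketch only: NNN-NNN-NNNN
        if not re.fullmatch(r"\d{3}-\d{3}-\d{4}", v):
            raise ValueError("phone number must look like 123-555-1234")
        return v


body = {
    "name": "John Doe",
    "address": {"street_name": "152 Test Ave.", "zip_code": "12345"},
    "phone_number": "123-555-1234",
}
customer = Customer(**body)
# The nested dict has become an Address instance with attribute access
print(customer.address.street_name)
```

The Address model validates its own fields, so Customer only has to say that it contains one.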

You can now see how easy it is to reuse parts of this and get meaningful error messages like:

    2 validation errors for Customer
    name
      field required (type=value_error.missing)
    phone_number
      str type expected (type=type_error.str)

These errors are useful for developers as well as customers interfacing with your API. Instead of receiving a “Validation Error” and then having to manually check the docs and try to figure out what went wrong, users can spend their time actually fixing the issue. This has been very useful to our frontend team, as it cuts down on a lot of otherwise necessary back and forth to figure out what’s going on.
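
If you want to hand these errors back to API clients yourself, something like the following sketch could work. The payload shape and the helper name validation_error_payload are assumptions for illustration; only ValidationError.errors() and its loc and msg keys come from Pydantic.

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street_name: str


class Customer(BaseModel):
    name: str
    address: Address


def validation_error_payload(exc: ValidationError) -> dict:
    """Flatten Pydantic's error list into a JSON-friendly dict (hypothetical shape)."""
    return {
        "error": "validation_error",
        "fields": {
            # loc is a tuple such as ("address", "street_name")
            ".".join(str(part) for part in err["loc"]): err["msg"]
            for err in exc.errors()
        },
    }


try:
    Customer(address={})  # name and address.street_name are both missing
except ValidationError as exc:
    payload = validation_error_payload(exc)
    print(payload)
```

Each key in fields is the dotted path to the offending value, so a nested problem such as address.street_name is reported as precisely as a top-level one.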

On the development side, a misspelled or missing key no longer surfaces as a KeyError somewhere deep in the code after a refactor. The mistake is caught when the model is constructed, and it comes with a readable error message.

To see how easy this becomes to use, look at the following example from FastAPI:

@app.put("/customer/create")
async def create_customer_handler(customer: Customer):
    # create_customer is the application's own function (not shown here)
    return create_customer(customer)

Using type hints, the validation happens behind the scenes. By defining the model, we know exactly what’s inside the customer object we’ll be receiving. And on top of all this, if any user of our API sends bad data, they’ll get a detailed error message describing what went wrong!

Conclusion

When you’ve got structured data that has to satisfy some constraints, you’re much better off using a model-based validation framework. Skip the plain old dictionaries or classes. While Python has added dataclasses as a step in this direction, Pydantic remains king (it even has its own dataclasses!).

Using models allows us to scalably validate routes and throw meaningful errors. They also allow us to pass around structured data instead of unstructured dictionaries, which makes our code more secure, robust, and easier to fix if it breaks.

If you’re interested in learning more about how to integrate Pydantic models into your Python web framework, take a look at FastAPI, which is built on Pydantic. From there, writing a set of decorators that provide similar functionality for another framework should be straightforward.
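
As a starting point, such a decorator might look like the sketch below. It is framework-agnostic and rests on several assumptions: the request object exposes args as a plain dict of strings, handlers receive the parsed model as a second argument, and the names validate_args and list_contacts are invented for the example. A real integration would return the framework's own response type.

```python
from functools import wraps
from types import SimpleNamespace

from pydantic import BaseModel, Field, ValidationError


class Pagination(BaseModel):
    count: int = Field(default=10, gt=0)
    offset: int = Field(default=0, ge=0)


def validate_args(model):
    """Decorator sketch: build `model` from request.args before calling the handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request):
            try:
                query = model(**request.args)
            except ValidationError as exc:
                # A real framework would turn this into a 4xx response
                return {"status": 400, "errors": exc.errors()}
            return handler(request, query)
        return wrapper
    return decorator


@validate_args(Pagination)
def list_contacts(request, query):
    return {"status": 200, "count": query.count, "offset": query.offset}


# SimpleNamespace stands in for a framework request object
ok = list_contacts(SimpleNamespace(args={"count": "5"}))
bad = list_contacts(SimpleNamespace(args={"count": "0"}))
print(ok["status"], bad["status"])
```

The handler body never sees unvalidated input, which is the same property FastAPI provides through type hints.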