
Compositional Elixir—Ecto Query Patterns

When I was a consultant, one of the opportunities I was most grateful for was the chance to get knee-deep in a dizzying variety of Elixir codebases. A direct result of this was that I honed in on a set of Elixir patterns and conventions that I end up pulling into just about every single new project I get my hands on.

I've written high-level posts about patterns I particularly like, such as railway-oriented programming and ways to make your Elixir code more composable, but today I felt like going on a little bit of a deep-dive into getting the most out of one of the most commonly used Elixir libraries out there: Ecto!

ELI5—Ecto

Obviously, for a much more fully-featured introduction to Ecto, I recommend jumping right in and reading the official docs/guide as well as Programming Ecto, but a brief and abridged overview I can do.

Ecto is effectively the equivalent of an ORM that you'll find used in, or bundled with, popular web frameworks. While Ecto isn't strictly an ORM, you can think of it as doing the same basic job: you define schemas and queries which allow you to read/write records stored in your database of choice (most commonly Postgres) using standard Elixir structs.

On a high level, Ecto is a collection of three different bits and pieces: Ecto.Schema, Ecto.Query, and Ecto.Changeset; providing you with the means of mapping between Elixir structs and your database schema, an extremely robust query builder, and a data validation library, respectively. Hopefully, that clears up the title of this post—today we'll be focusing on ways to make the most out of Ecto.Query specifically 💪
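To make that concrete, here's a minimal sketch showing all three pieces side by side (the module and fields are made up purely for illustration):

defmodule MyApp.Accounts.User do
  use Ecto.Schema

  import Ecto.Changeset
  import Ecto.Query, only: [from: 2]

  # Ecto.Schema: maps the "users" table onto an Elixir struct
  schema "users" do
    field :email, :string
    field :age, :integer

    timestamps()
  end

  # Ecto.Changeset: casts/validates data before it gets written
  def changeset(user, attrs) do
    user
    |> cast(attrs, [:email, :age])
    |> validate_required([:email])
  end

  # Ecto.Query: a composable description of a SQL query
  def adults(query \\ __MODULE__) do
    from u in query, where: u.age >= 18
  end
end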

A Common Problem

One of the things I optimize for as a developer is separating my concerns whenever possible and creating boundaries/domains to isolate different "ideas" and "concepts" in whatever codebase I'm working on.

Unfortunately, on most projects I've worked on, Ecto code tends to be strewn all across the codebase with little regard for how/why it's being strewn about. When projects use libraries such as Absinthe (especially with Dataloader), problems usually end up compounding even more.

For (a surprisingly realistic) example's sake, imagine we're working on a standard Elixir project which uses Phoenix and Absinthe; it'll usually end up having a directory structure not very different from this:

├── my_project/
│   ├── my_context/my_entity.ex
│   └── my_context.ex
└── my_project_web/
    ├── controllers/my_entity_controller.ex
    ├── resolvers/my_entity_resolver.ex
    ├── types/my_entity_types.ex
    └── my_loader.ex

And while organization-wise this is what we've come to expect when following standard Elixir conventions, the shocking thing is that every file I've listed here usually contains Ecto code. To make things even worse, these files don't just contain any old Ecto code, but they go as far as to define new Ecto.Querys! In my mind, this is a huge no-no.

Before I get ahead of myself, I'll briefly explain how I interpret the different roles of the files shown in the directory structure example:

  • Everything within my_project/ is what I'd define as "application logic". This is the real meat and potatoes of our app; this is where we would read/write stuff to/from the database, where we'd write code to actually do stuff (as nebulous as that sounds).
    • my_context.ex is a standard Phoenix-style context, which in other words, is just a simple Elixir module that we treat as the public API for doing "things" concerning that context. A better name for my_context.ex would be something like accounts.ex, which exposes functions such as Accounts.register_new_user/N, Accounts.create_login_token_for_user/N, and Accounts.send_forgot_password_email/N.
    • my_entity.ex is a standard Ecto.Schema, something that maps records in your database into standard Elixir structs. A more realistic schema name would be user.ex, or admin.ex.
  • Everything within my_project_web/ is what I'd define as a "consumer application". Typically, I don't expect very much real business logic to live in this directory. Modules defined in these files will simply wrap one or more functions defined by contexts in my_project/, maybe doing some mapping/rendering/converting of different data formats.
    • controllers/my_entity_controller.ex is a standard Phoenix controller which lets us trigger code in response to certain REST-like actions.
    • resolvers/my_entity_resolver.ex and types/my_entity_types.ex are relatively normal Absinthe files which are effectively the equivalent of our Phoenix controller, but for GraphQL.
    • my_loader.ex is a Dataloader specific concept; think of it as a way of batching GraphQL requests to avoid the N+1 problem.

Hopefully, then, you can see why having quite literally every single file here create Ecto.Querys is a code smell? To re-iterate, in case it isn't clear, Ecto.Querys are basically a way for you to write SQL queries to read/write from your database. Now, in reality, you'll likely have a bunch of different resolvers, controllers, types, schemas, contexts; so your problems can only get worse from here... 😅

The issues described here are all effectively different flavors of the same DRY violation: I've seen many maintenance nightmares where there is no source of truth for how something is meant to work; instead, you get hundreds of duplicated, one-off Ecto.Querys built all over the place, doing similar but slightly different things. Not only is this the antithesis of comprehensibility, but it's also absolutely unmaintainable: how can anyone sleep at night knowing the next day they'll need to find and update every single query dealing with SomeEntity?!

Much like how I describe error handling and propagation, Ecto.Querys are something I want to push deep into one layer of my codebase. In my mind, Ecto.Querys are an extremely low-level implementation detail that is tightly coupled to your Ecto.Schema and database, and we should treat them as such.

Thankfully, with some elbow grease and hard work (and hopefully, a comprehensive test suite 🤞), we can gradually refactor codebases from the state I've just described into something a little more "onion-like" in shape: having a dedicated database-layer (containing all your Ecto code in one centralized place), a business logic layer (Pretty much just contexts and loader.ex, consumers of your database layer), and one or more consumer layers (Phoenix controllers, Oban workers, Absinthe resolvers, etc.).

One Small Step...

Continuing along with the example directory structure above, let's imagine that our context currently looks like this:

defmodule MyProject.MyContext do
  alias MyProject.MyContext.MyEntity

  alias MyProject.Repo

  import Ecto.Query

  def does_entity_exist?(entity_id) do
    from(
      e in MyEntity,
      where: e.id == ^entity_id,
      where: coalesce(e.deleted, false) != true,
      limit: 1
    )
    |> Repo.exists?()
  end

  def count_entities do
    from(e in MyEntity, where: coalesce(e.deleted, false) != true)
    |> Repo.aggregate(:count)
  end

  def count_entities(org_id) do
    from(
      e in MyEntity,
      where: e.org_id == ^org_id,
      where: coalesce(e.deleted, false) != true
    )
    |> Repo.aggregate(:count)
  end
end

This looks okay so far, right? I'd go as far as to call this pretty comfortable since this is the state a lot of Elixir projects end up in. It works well and is easy to grok in isolation, but we can already foresee some issues that might come up (even with this one module):

  1. Note the repeated query expressions (where: coalesce(e.deleted, false) != true), which are present in every query. This makes sense since we don't want deleted entities to "exist" or be "counted", but it definitely makes future refactorings in this file a little daunting.

    What if we were asked to amend this where: coalesce(e.deleted, false) clause to also consider something like e.updated_at so that we can effectively ignore the existence of entities that haven't been updated in over some time period? We'd need to go and carefully update every function and manually implement this check ourselves. Not exactly safe, and definitely error-prone.

  2. Since we're building queries on the fly, how is anyone supposed to know what fields are available in our schema? Someone who needs to maintain this file will need to know, say, that MyEntity has an org_id field or a deleted flag. IDE autocompletion gets us part of the way there, but it's flaky at best (at least in Vim).

    This might be fine right now, as we're working with a single schema, but imagine we need to start joining on different entities and remembering what each entity's respective fields are. This requires a great amount of additional mental overhead and exponentially increases the surface area for things to go wrong.

For me, both of these issues highlight the fact that we aren't working at the right level of abstraction for our contexts. The first issue shows us that we need to centralize our queries into a single source of truth. The second issue shows that we're leaking database-level implementation details all over the place (extrapolate what this means when you have queries being built this way across your entire codebase...)

The first thing I usually do when confronted with a codebase that looks like this is to look at all the individual query expressions being used and refactor those into functions. I like putting these functions directly in whatever schema the query's primary entity is (in our case, it'd be my_entity.ex). That way, we can compose our Ecto.Query the same way as before, but leveraging functions instead:

defmodule MyProject.MyContext do
  alias MyProject.MyContext.MyEntity

  alias MyProject.Repo

  # Ecto.Query.limit/2 is a macro, so it must be required before use
  require Ecto.Query

  def does_entity_exist?(entity_id) do
    MyEntity
    |> MyEntity.where_not_deleted()
    |> MyEntity.where_id(entity_id)
    |> Ecto.Query.limit(1) 
    |> Repo.exists?()
  end

  def count_entities do
    MyEntity
    |> MyEntity.where_not_deleted()
    |> Repo.aggregate(:count)
  end

  def count_entities(org_id) do
    MyEntity
    |> MyEntity.where_not_deleted()
    |> MyEntity.where_org_id(org_id)
    |> Repo.aggregate(:count)
  end
end

defmodule MyProject.MyContext.MyEntity do
  ...

  def where_not_deleted(query) do
    from e in query,
      where: coalesce(e.deleted, false) != true
  end

  def where_id(query, id) do
    from e in query,
      where: e.id == ^id
  end

  def where_org_id(query, org_id) do
    from e in query,
      where: e.org_id == ^org_id
  end

  ...
end

Hopefully, it's clear that we're already starting to address the issues I described. Thankfully, Ecto makes it dead easy to make queries composable out-of-the-box, so this refactor ends up not being very much work.

The first issue ends up being resolved quite directly, whereas the second one is resolved by... you got me: IDE autocompletion! Why is this an acceptable fix now when it wasn't before? Because we're now working at a higher level of abstraction—it doesn't matter, say, if where_org_id needs to join some other entities to do its job; those are just database-level implementation details that are hidden from you. You're composing natural-language statements in far more of a declarative way than if you were to just write the Ecto.Querys by hand.

Going back to the bits of added complexity I mentioned when describing the problems above, imagine we really did want to implement additional filters such as this where_not_outdated check (perhaps where_not_archived would be a better name?): all we would need to do is implement that as a function in my_entity.ex and call it as part of whichever pipeline needs the addition!
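As a rough sketch (the reliance on :updated_at and the 90-day cutoff are purely illustrative assumptions, and this assumes my_entity.ex already has import Ecto.Query in scope), such a function could look something like this:

def where_not_archived(query) do
  # Hypothetical filter: treat entities untouched for 90+ days as archived
  from e in query,
    where: e.updated_at > ago(90, "day")
end

Any pipeline that needs it just gains one extra |> MyEntity.where_not_archived() step, and nothing else has to change.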

Not only do we minimize code duplication in our contexts; because these new composable query functions can be used elsewhere, we can also deduplicate the queries strewn all over the codebase (loader.ex, I'm looking at you 👀). The ultimate goal is to have the source of truth for anything database-related (insofar as MyEntity is concerned) be contained within my_entity.ex itself.

Down the Rabbit Hole

We're not done yet: iterative software development throws another spanner in your works! While the code we've been left with looks to be in a much better place, imagine we need to revisit how we've implemented the MyEntity.where_org_id/2 check above: instead of being able to store an org_id field directly on the entity's schema, we need to join the owner of our entity, and then join the org of that owner.

Jumping right in, if we try to solve this problem naturally and iteratively using the pattern we've just described, we should end up with something quite similar to the following:

defmodule MyProject.MyContext.MyEntity do
  ...

  def where_org_id(query, org_id) do
    from e in query,
      join: c in assoc(e, :owner),
      join: o in assoc(c, :org),
      where: o.id == ^org_id
  end

  ...
end

Surprisingly, this teeny three-line change does the trick! Before we celebrate, however, I'll point out the Pandora's box we just unleashed into our codebase 😉

The new issue we've introduced is that MyEntity.where_org_id/2 implements a check on data it doesn't own! This might sound fine on such a small scale. Still, I've seen this kind of thing escalate to ridiculous levels—it's easy to see how this regresses into the same problem we were trying to solve at the beginning of all of this: if multiple entities all need to check their org_id, at this rate, we'd be reimplementing this same check in literally every schema in our project!

Ideally, we'd just be able to join on our owner struct and call Owner.where_org_id(owner, ^org_id) on it; however, this isn't something that can be expressed in Ecto.Query's DSL of course—the owner struct isn't realized here yet and just represents an expression in a yet-to-be-executed SQL query.

We can express something roughly analogous, however! Let's keep going down the rabbit hole of breaking the query we arrived at into smaller and smaller pieces, and see if anything clicks and becomes self-evident:

def where_org_id(query, org_id) do
  query
  |> join_owner()
  |> where_owner_org_id(org_id)
end

defp join_owner(query) do
  from e in query,
    join: c in assoc(e, :owner)
end

defp where_owner_org_id(query, org_id) do
  from [..., c] in query,
    where: c.org_id == ^org_id
end

This is a little closer to where we want to be: it encapsulates all of the concerns we have into their own functions, which is good, but we still have yet to solve our issue. Let's take a look at what exactly is insufficient with these three functions.

  1. We still can't move this where_owner_org_id/2 check into the Owner schema. We've decomposed our query function into atomic pieces that we literally can't split anymore, but we still have a big problem.

    Queries thus far have been composable because we're always querying against the main entity (from here on out, we'll call it the first binding) of a query. After we do a join_owner/1, we need to keep in mind that we have an owner binding in our query as well. In addition to this, if we want to move our where_owner_org_id/2 to Owner.where_org_id/2, we need to be somehow able to handle the fact that owner isn't the first binding of this query.

  2. Even if we could somehow determine that owner is the second binding of a query in some cases, how do we get this to work in general? What if we expose multiple join_blah/1 functions? We would need to micromanage the order entities are joined all over the place to try and be consistent. This isn't maintainable or anywhere near the idea of a fun time...

  3. If multiple queries end up calling join_owner/1 internally, Ecto.Query will end up generating SQL with duplicated joins. This might cause issues when running your final SQL query—it isn't optimal, that's for sure! This is likely(?) optimized away by some databases, but maybe not. Either way, it's something to avoid if we can help it.

Thankfully, Ecto provides us with the tool we need to solve all of these issues in one fell swoop by unlocking a new composition superpower: named bindings!

Named bindings work as they sound: they let you name bindings such that you can refer to them later in the query by a given name rather than position. This is in contrast to Ecto's default behavior, which is positional binding: the order in which you bind entities into your query determines how you use them later.

You can create a named binding as simply as using the as keyword following a from or join keyword like so:

defp join_owner(query) do
  from e in query,
    join: c in assoc(e, :owner),
    as: :owner
end

Since naming is hard, I follow a simple convention: if I'm joining on a module named SomeSchema, I name its binding :some_schema. This convention is nice because I can reuse any queries that work for :some_schema, no matter the source of the query.

Once a named binding has been created, you can query against said named binding with the syntax below:

defp where_owner_org_id(query, org_id) do
  from [owner: owner] in query,
    where: owner.org_id == ^org_id
end

This, if you noticed, is pretty much exactly what we want. This query is not coupled to anything but the fact that owner is a named binding in our query, which naturally leads us to my next big point (and quite likely my most contrarian one): I opt for never using positional bindings; instead, I tend to use named bindings for quite literally everything.

If we use named bindings (especially if they're consistently named), that means we can freely compose literally any queries between any schemas we've defined in our application! Talk about reaching the end of the rabbit hole 💪

(Consistently named) named bindings also let us avoid Ecto generating duplicate joins where none are needed: Ecto.Query provides the predicate function Ecto.Query.has_named_binding?/2, which returns true if the given Ecto.Query has a named binding with the given name. This means we can update our join_owner/1 function to be a no-op if we already have the required join in place:

defmodule MyProject.MyContext.MyEntity do
  alias MyProject.MyContext.Owner
  ...

  def where_org_id(query, org_id) do
    query
    |> join_owner()
    |> Owner.where_org_id(org_id)
  end

  defp join_owner(query) do
    if has_named_binding?(query, :owner) do
      query
    else
      from [my_entity: my_entity] in query,
        join: _owner in assoc(my_entity, :owner),
        as: :owner
    end
  end

  ...
end

defmodule MyProject.MyContext.Owner do
  ...

  def where_org_id(query, org_id) do
    from [owner: owner] in query,
      where: owner.org_id == ^org_id
  end

  ...
end

As a side note, I raised an issue on the Ecto mailing list about implementing a guard for this check (since all the data is available by inspecting the Ecto.Query struct).

It won't be available any time soon since it requires the latest version of Elixir. Still, hopefully, one day, we can replace the if check by pattern matching in a function head with when has_named_binding(query, binding) instead! How neat would that be?!
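Purely as a thought experiment, you could approximate such a guard yourself today by peeking at the %Ecto.Query{} struct's internal aliases map—an implementation detail that may change between Ecto versions, so treat this as a hedged sketch rather than a recommendation:

defmodule MyProject.QueryGuards do
  # Relies on the internal :aliases map of %Ecto.Query{} (binding name => index);
  # it also only works once the query is an %Ecto.Query{} struct, not a bare schema.
  defguard has_named_binding(query, binding)
           when is_map_key(:erlang.map_get(:aliases, query), binding)
end

# Hypothetically (assuming `import MyProject.QueryGuards` in the schema module),
# join_owner/1 could then become:
defp join_owner(query) when has_named_binding(query, :owner), do: query

defp join_owner(query) do
  from [my_entity: my_entity] in query,
    join: _owner in assoc(my_entity, :owner),
    as: :owner
end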

Going the extra mile (and to make automating our named binding generation easier since we're going to want to do it every time we query something), instead of having query pipelines in our context start with a schema module (i.e. MyEntity), I typically define base_query/0 functions in all of my schemas.

These base_query/0 functions serve two purposes: the first being a place to hide the implementation detail of making a named binding, and the second being a convenient place to group queries we always want to do in one place (i.e., the where_not_deleted/1 and where_not_archived/1 checks we talked about above). This equates to the following changes:

# Instead of this...
defmodule MyProject.MyContext do
  ...

  def count_entities do
    MyEntity
    |> MyEntity.where_not_deleted()
    |> Repo.aggregate(:count)
  end

  ...
end

# We do this...
defmodule MyProject.MyContext do
  ...

  def count_entities do
    MyEntity.base_query()
    |> Repo.aggregate(:count)
  end

  ...
end

defmodule MyProject.MyContext.MyEntity do
  ...

  def base_query() do
    query = from e in __MODULE__, as: :my_entity

    query
    |> where_not_deleted()
  end

  ...
end

With this pattern in hand, you should find yourself empowered to weave, stitch, and compose even the most complex of Ecto queries together! Even so, there is much further we can go—Ecto.Query provides other ways of extending/composing queries beyond the simple expression chaining we've discussed.

If you're interested in this, I definitely suggest reading into fragments, dynamic queries, and subqueries. You can truly build some amazing things with these tools, but dedicated deep-dives for all of these will be saved for another time as they'll prove to be quite the detour.
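As a tiny taste (and very much a sketch rather than a deep-dive—this assumes the MyEntity.base_query/0 from above and an import Ecto.Query in scope), dynamic/2 lets you build up where clauses as plain data before splicing them into a query:

def list_entities(opts \\ []) do
  # Start from a neutral filter and accumulate conditions as data...
  filters = dynamic(true)

  filters =
    if owner_id = opts[:owner_id],
      do: dynamic([my_entity: e], e.owner_id == ^owner_id and ^filters),
      else: filters

  # ...then splice the accumulated filter into the query in one go.
  MyEntity.base_query()
  |> where(^filters)
  |> Repo.all()
end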

Compose all the things!

There's still a little bit more we can do to eke out the most composability in our application layer. Now that our Ecto.Querys are written with composability in mind, we can go ahead and refactor our application layer to be dynamic rather than static: what I mean by this is that, until now, we've written our queries as standard, static Elixir pipelines.

There isn't a reason to limit ourselves to this, since Ecto.Query composition is ultimately just accumulating expressions in a struct; it's something we can accumulate over! Since our example was framed as a Phoenix/Absinthe app at the beginning (though we don't really care too much about those details), our app naturally exposes some public API through which users can pass us arbitrary query parameters that we need to handle.

Tangentially: while I recognize the necessity of a project like Dataloader, it's annoying to me that Dataloader effectively ends up forcing us to duplicate functions we've already defined in our contexts.

For the sake of simplicity, I'm going to ignore Dataloader's existence for the following example.

If your project does use Dataloader, at least with compositional queries, you can try to re-use them between your loader.ex and contexts.
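For what it's worth, Dataloader.Ecto gives you one hook where this reuse can happen: its :query option. A sketch of a loader source that leans on our composable functions might look something like this (the exact wiring depends on how your Absinthe context is set up):

defmodule MyProjectWeb.MyLoader do
  alias MyProject.MyContext.MyEntity
  alias MyProject.Repo

  def source do
    Dataloader.Ecto.new(Repo, query: &query/2)
  end

  # Reuse the same composable query functions our contexts use,
  # instead of rebuilding ad-hoc Ecto.Querys inside the loader.
  defp query(MyEntity, _params), do: MyEntity.base_query()
  defp query(queryable, _params), do: queryable
end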

If we assume our app exposes a GraphQL endpoint that ends up looking like this:

object :entity_queries do
  field :entities, non_null(list_of(non_null(:entity))) do
    description("Retrieve a list of all patients at your org")

    arg(:owner_id, :id, description: "Include only entities belonging to the given owner")

    arg(:include_deleted, :boolean, description: "Include deleted entities in results")
    arg(:include_archived, :boolean, description: "Include archived entities in results")

    arg(:limit, :integer, description: "Limit the number of results")
    arg(:offset, :integer, description: "For pagination, offset the return values by a number")

    resolve(fn args, %{context: %{current_user: %{org: %Org{} = org}}} ->
      # somehow implements this endpoint
    end)
  end
end

We now have the problem of needing to query data from our database based on whatever combination of args gets passed in by our API user.

One way of dealing with this need is to build the query in our resolver function itself dynamically (this is typically what gets done by Dataloader or Relay), but I don't want to conflate application layer concerns with consumer layer concerns; I don't want to go this route.

We already have a perfectly operational function in our contexts, so more often than not, my resolver functions end up being very lightweight wrappers around context functions (again, everything in my_project_web/ lives in our consumer layer, so this is to be expected). Usually I end up writing something analogous to:

# My resolver function in my Absinthe schema
resolve(fn args, %{context: %{current_user: %{org: %Org{} = org}}} ->
  MyContext.list_entities(org, args)
end)

# Corresponding in context
defmodule MyProject.MyContext do
  ...
  
  def list_entities(%Org{} = org, args) when is_map(args) do
    MyEntity.base_query()
    |> MyEntity.where_org_id(org.id)
    |> MyEntity.where_owner_id(args[:owner_id])
    |> Ecto.Query.limit(args.limit)
    |> Ecto.Query.offset(args.offset)
    |> Repo.all()
  end

  ...
end

# Corresponding in schema
defmodule MyProject.MyContext.MyEntity do
  ...
  
  def where_owner_id(query, nil), do: query

  def where_owner_id(query, owner_id) do
    from [my_entity: entity] in query,
      where: entity.owner_id == ^owner_id
  end

  ...
end

As you can see, simply leveraging pattern matching in our composable query expression functions lets us define quite clearly what possible filters a given function can apply, while not explicitly requiring any particular pieces of data.

For particularly gnarly or complex, outside-world-dependent queries (i.e., if a single arg ends up chaining multiple query expressions), I tend to prefer to write in the following pattern instead:

defmodule MyProject.MyContext do
  ...
  
  def list_entities(%Org{} = org, args) when is_map(args) do
    base_query =
      MyEntity.base_query()
      |> MyEntity.where_org_id(org.id)

    query =
      Enum.reduce(args, base_query, fn
        {:limit, limit}, query ->
          Ecto.Query.limit(query, limit)

        {:offset, offset}, query ->
          Ecto.Query.offset(query, offset)

        {:owner_id, owner_id}, query ->
          MyEntity.where_owner_id(query, owner_id)

        {:something_complex, complex}, query ->
          query
          |> MyEntity.do_something()
          |> MyEntity.something_else()
          |> MyEntity.where_complex_thing(complex)

        _unsupported_option, query ->
          query
      end)

    Repo.all(query)
  end

  ...
end

Alternatively, you can define MyEntity.where_complex_thing/2 to be a function that internally combines multiple other, small queries.
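That alternative might look roughly like the following, where where_complex_thing/2 wraps the same hypothetical helpers from the example above (all of these names are made up for illustration):

defmodule MyProject.MyContext.MyEntity do
  ...

  # One public entry point that internally chains smaller query helpers,
  # so the context's reduce clause shrinks to a single call.
  def where_complex_thing(query, complex) do
    query
    |> do_something()
    |> something_else()
    |> where_matches_complex(complex)
  end

  defp where_matches_complex(query, complex) do
    # :some_field is a made-up column for the sake of the example
    from [my_entity: e] in query,
      where: e.some_field == ^complex
  end

  ...
end

With that in place, the {:something_complex, complex} clause in the Enum.reduce/3 above collapses to a single MyEntity.where_complex_thing(query, complex) call.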

It can always be a delicate balance between creating hundreds of these one-off queries which map one-to-one with API options and having super low-level query functions which don't do anything on their own. Still, much like the difficulty of naming things, this is where we just need to make judgment calls and refactor as we start seeing problems.

Thankfully, once we've reached this stage of API design and query composition, refactoring and reorganizing things tends to be much less daunting a task, so it's not too big of an ordeal.

Conclusion

Hopefully, I've said and shown enough to convince you that many projects use Ecto sub-optimally (or at the very least, that there are ways to make Ecto work harder for you!).

Ecto really does prove to be one of the nicest ORM-like pieces of technology I've ever used, and I'd love to see more people embrace the power that Ecto has to offer.

One of the huge benefits of composing queries like this, beyond centralizing the source of truth for querying the database, is the increase in confidence it gives us. I'm not personally a TDD zealot (or a hardcore believer in any development dogma), but I do aim to test the majority of the code I write, at the very least with unit tests.

The nice thing about unit testing context functions made up of these composable query functions is that every single test compounds the confidence of my underlying query expressions more and more. Since I'm extensively testing the bare building blocks of all the business logic in my app, I'm really compounding what I get out of my test suite—definitely by a few orders of magnitude compared to projects which don't centralize their queries in one place.
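To give a flavor of that compounding, here's a minimal sketch of a unit test for count_entities/0—every assertion also exercises base_query/0 and where_not_deleted/1 underneath for free (this assumes a standard Phoenix-style DataCase and an insert_entity/1 fixture helper, both of which you'd have to supply yourself):

defmodule MyProject.MyContextTest do
  use MyProject.DataCase, async: true

  alias MyProject.MyContext

  test "count_entities/0 ignores soft-deleted entities" do
    # insert_entity/1 is a hypothetical fixture helper
    insert_entity(deleted: false)
    insert_entity(deleted: true)

    assert MyContext.count_entities() == 1
  end
end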

There is still much, much more that could be said about Ecto—we've only really scratched the surface. I want to eventually talk about how you can make good use of Ecto.Query.fragment/N (has anyone else noticed it's a variadic macro?! How?! 🤯), subqueries, dynamic queries, etc. Until then, if you enjoyed this (admittedly opinionated) post, I've written another Ecto-related post on the shortcomings of, and solutions to, Ecto's virtual fields.

I hope this has helped! Until next time.