Easy Ergonomic Telemetry in Elixir with Sibyl
Feb 02, 2023
10 min. read
Elixir
Project
Refining Ecto Query Composition
Nov 08, 2022
13 min. read
Elixir
Absinthe
Ecto
Compositional Elixir—Ecto Query Patterns
Jun 10, 2021
18 min. read
Elixir
Ecto
Stuff I use
May 28, 2021
13 min. read
General
A minimal Nix development environment on WSL
Apr 04, 2021
13 min. read
WSL
Nix/NixOS
Elixir pet peeves—Ecto virtual fields
Dec 15, 2020
5 min. read
Elixir
Ecto
My usage of Nix, and Lorri + Direnv
Nov 26, 2020
5 min. read
Nix/NixOS
Build you a `:telemetry` for such learn!
Aug 26, 2020
19 min. read
Elixir
Erlang
Project
Drinking the NixOS kool aid
Jun 30, 2020
9 min. read
Nix/NixOS
Testing with tracing on the BEAM
May 20, 2020
8 min. read
Elixir
Compositional Elixir—Contexts, Error Handling, and, Middleware
Mar 06, 2020
13 min. read
Elixir
GraphQL
Phoenix
Absinthe
Taming my Vim configuration
Jan 14, 2020
7 min. read
(Neo)Vim
Integrating GitHub Issues as a Blogging Platform
Nov 06, 2019
6 min. read
Elixir
GraphQL
About Me
Nov 05, 2019
1 min. read
General

Querying Recursive Data Structures with SQL and Ecto

In one of the projects I've worked on in the past, we were tasked to implement an attribute based access control microservice for managing permissions for different types of entities across different resources shared between different organizations.

Problem Overview & Background

For the purposes of this post, we'll simplify greatly and use the following description of the system:

  • One organization can have many subordinate organizations.
  • One organization can own many resources.
  • One organization can have many admins.
  • Admins in an organization have access to resources belonging to that organization, or any subordinate organizations.
  • Organizations can be arbitrarily deeply nested

We modelled this with the following database schema:

from_id relation to_id
org_1 superior_of org_2
org_1 superior_of org_3
admin_1 manages org_1
admin_2 manages org_3

Using this structure to store relations, we're able to store arbitrary relations between arbitrary entities, allowing us to do complex queries like getting all entities in all organizations that can be managed by an admin at a given tier of this hierarchy—in the example above, admin_1 would be able to manage all assets of all organizations and sub-organizations of org_1.

During the MVP phase of the project, this microservice was thrown together pretty quickly and generally it worked well. It was built for simplicity with not much special focus on optimality however so after using this service in production for 14 months we started to notice that for particularly large organization hierarchies queries would end up taking quite awhile.

A common query we performed was to get a list of all organizations, and recursively get all of the subordinate organizations for those organizations. The original implementation looked a little like this:

def traverse_hierarchy(organization_id) do
  {:ok, children} = get_children(organization_id)
  {organization_id, Enum.map(children, &traverse_hierarchy/1)}
end

defp get_children(organization_id) do
  children =
    App.Repo.all(
      from r in Relation,
      where: r.from_id == ^organization_id,
      where: r.relation == @superior_org_relation,
      select: r.to_id
    )

  {:ok, children}
end

This implementation of traverse_hierarchy/1 essentially finds children via a depth first search, which is exhaustive and correct, simple and small; however it also performs N+1 queries because we'd be making a call to get the children for the current node we were working on, and N calls to get the children of each child node. As our hierarchies grew large, the number of queries this would end up doing would increase exponentially.

Building a solution

After doing some research, we discovered that SQL has a mechanism which allows us to recursively query data called Common Table Expressions (CTEs). These expressions act somewhat like closures in that they allow us to run 'functions' on temporary results inside an existing SQL query, allowing you to further query based on the result of the previous. This essentially reduce our multiple SQL queries into a single one following the mantra of letting your database do the heavy lifting 😉

Writing the query in SQL was pretty simple since there are plenty of tutorials on recursive CTEs. I ended up with something like:

CTEs are detailed and documented quite thoroughly, so it was really easy to come up with an initial CTE that would do what we wanted:

WITH RECURSIVE tree(depth, from_id, relation, to_id) AS (
  SELECT 0, from_id, relation, to_id FROM relations WHERE from_id = 'root_org_id' AND relation = 'is_superior_org'
  UNION SELECT
    depth + 1,
    relations.from_id,
    relations.relation,
    relations.to_id
  FROM
    relations JOIN tree
    ON relations.from_id = tree.to_id
)
SELECT depth, from_id as parent, to_id as child
  FROM tree ORDER BY depth

When the above SQL query is executed, the output returned is pretty much exactly what we want:

depth parent child
0 root_org_id sub_org_a
0 root_org_id sub_org_b
1 sub_org_a sub_sub_org_a
1 sub_org_a sub_sub_org_b
1 sub_org_b sub_sub_org_c
2 sub_sub_org_c sub_sub_sub_org_a

We could do some light postprocessing on this returned dataset and we would quite easily build a response which matched the return of the function snippet above.

Unfortunately Ecto doesn't provide any API over querying using CTEs. There are a few ways of calling out raw SQL queries and getting Ecto's database adapter to execute them but this practice is generally frowned upon for good reason. Thankfully we could hackily use a fragment to do what we wanted, like so:

defp traverse_hierarchy(organization_id) do
  hierarchy =
    App.Repo.all(
      from(r in Relation,
        left_join:
          cte in fragment(
            """
            (
              WITH RECURSIVE tree(depth, from_id, relation, to_id) AS (
                SELECT 0, from_id, relation, to_id FROM relations WHERE from_id = ? AND rel = 'is_superior_org'
                UNION SELECT
                  depth + 1,
                  relations.from_id,
                  relations.relation,
                  relations.to_id
                FROM
                  relations JOIN tree
                  ON relations.from_id = tree.to_id
              )
              SELECT * FROM tree ORDER BY depth
            )
            """,
            ^organization_id
          ),
        where: r.from_id == cte.from_id and r.to_id == cte.to_id and r.rel == cte.rel
      )
    )

  # For brevity I won't actually include code to postprocess these results
  # but it would be quite simply to do so here.
  process_data(hierarchy)
end

This implementation performed much faster than the original although admittedly it is quite ugly! The original implementation was significantly more readable, but at the end of the day this approach worked out quite well and solved the problem we were trying to solve.

I'm not sure if this is the best way of dealing with such recursive data structures, or if there is a more idiomatic way of doing it, but it was an interesting dive about leveraging SQL instead of programming logic whenever we can.

I'm not sure why Ecto doesn't provide any capability to execute CTEs but maybe one could hack together a PR or a library to do just that 🙂

Edit:
Looks like between when this was implemented on the project I was working on and the time I wrote this article, Ecto now has includes an API for using CTEs! You can check out the documentation here.