Chris Bailey

Querying Recursive Data Structures with SQL and Ecto

In one of the projects I've worked on in the past, we were tasked to implement an attribute based access control microservice for managing permissions for different types of entities across different resources shared between different organizations.

For the purposes of this post, we'll simplify greatly and use the following description of the system:

We modelled this with the following database schema:

from_id relation to_id
org_1 superior_of org_2
org_1 superior_of org_3
admin_1 manages org_1
admin_2 manages org_3

Using this structure to store relations, we're able to store arbitrary relations between arbitrary entities, allowing us to do complex queries like getting all entities in all organizations that can be managed by an admin at a given tier of this hierarchy—in the example above, admin_1 would be able to manage all assets of all organizations and sub-organizations of org_1.

During the MVP phase of the project, this microservice was thrown together pretty quickly and generally it worked well. It was built for simplicity with not much special focus on optimality however so after using this service in production for 14 months we started to notice that for particularly large organization hierarchies queries would end up taking quite awhile.

A common query we performed was to get a list of all organizations, and recursively get all of the subordinate organizations for those organizations. The original implementation looked a little like this:

def traverse_hierarchy(organization_id) do
  {:ok, children} = get_children(organization_id)
  {organization_id, Enum.map(children, &traverse_hierarchy/1)}
end

defp get_children(organization_id) do
  children =
    App.Repo.all(
      from r in Relation,
      where: r.from_id == ^organization_id,
      where: r.relation == @superior_org_relation,
      select: r.to_id
    )

  {:ok, children}
end

This implementation of traverse_hierarchy/1 essentially finds children via a depth first search, which is exhaustive and correct, simple and small; however it also performs N+1 queries because we'd be making a call to get the children for the current node we were working on, and N calls to get the children of each child node. As our hierarchies grew large, the number of queries this would end up doing would increase exponentially.

After doing some research, we discovered that SQL has a mechanism which allows us to recursively query data called Common Table Expressions (CTEs). These expressions act somewhat like closures in that they allow us to run 'functions' on temporary results inside an existing SQL query, allowing you to further query based on the result of the previous. This essentially reduce our multiple SQL queries into a single one following the mantra of letting your database do the heavy lifting 😉

Writing the query in SQL was pretty simple since there are plenty of tutorials on recursive CTEs. I ended up with something like:

CTEs are detailed and documented quite thoroughly, so it was really easy to come up with an initial CTE that would do what we wanted:

WITH RECURSIVE tree(depth, from_id, relation, to_id) AS (
  SELECT 0, from_id, relation, to_id FROM relations WHERE from_id = 'root_org_id' AND relation = 'is_superior_org'
  UNION SELECT
    depth + 1,
    relations.from_id,
    relations.relation,
    relations.to_id
  FROM
    relations JOIN tree
    ON relations.from_id = tree.to_id
)
SELECT depth, from_id as parent, to_id as child
  FROM tree ORDER BY depth

When the above SQL query is executed, the output returned is pretty much exactly what we want:

depth parent child
0 root_org_id sub_org_a
0 root_org_id sub_org_b
1 sub_org_a sub_sub_org_a
1 sub_org_a sub_sub_org_b
1 sub_org_b sub_sub_org_c
2 sub_sub_org_c sub_sub_sub_org_a

We could do some light postprocessing on this returned dataset and we would quite easily build a response which matched the return of the function snippet above.

Unfortunately Ecto doesn't provide any API over querying using CTEs. There are a few ways of calling out raw SQL queries and getting Ecto's database adapter to execute them but this practice is generally frowned upon for good reason. Thankfully we could hackily use a fragment to do what we wanted, like so:

defp traverse_hierarchy(organization_id) do
  hierarchy =
    App.Repo.all(
      from(r in Relation,
        left_join:
          cte in fragment(
            """
            (
              WITH RECURSIVE tree(depth, from_id, relation, to_id) AS (
                SELECT 0, from_id, relation, to_id FROM relations WHERE from_id = ? AND rel = 'is_superior_org'
                UNION SELECT
                  depth + 1,
                  relations.from_id,
                  relations.relation,
                  relations.to_id
                FROM
                  relations JOIN tree
                  ON relations.from_id = tree.to_id
              )
              SELECT * FROM tree ORDER BY depth
            )
            """,
            ^organization_id
          ),
        where: r.from_id == cte.from_id and r.to_id == cte.to_id and r.rel == cte.rel
      )
    )

  # For brevity I won't actually include code to postprocess these results
  # but it would be quite simply to do so here.
  process_data(hierarchy)
end

This implementation performed much faster than the original although admittedly it is quite ugly! The original implementation was significantly more readable, but at the end of the day this approach worked out quite well and solved the problem we were trying to solve.

I'm not sure if this is the best way of dealing with such recursive data structures, or if there is a more idiomatic way of doing it, but it was an interesting dive about leveraging SQL instead of programming logic whenever we can.

I'm not sure why Ecto doesn't provide any capability to execute CTEs but maybe one could hack together a PR or a library to do just that 🙂

Edit:
Looks like between when this was implemented on the project I was working on and the time I wrote this article, Ecto now has includes an API for using CTEs! You can check out the documentation here.


Return to Posts →