Rails: insert_all and upsert_all
4 min read

Rails: insert_all and upsert_all

TIL something new about Rails. You can now use insert_all and upsert_all for efficient writes. But that's not all. Read on for what "just worked" and what didn't.

I can't tell you how many times I've either dropped to GitHub::SQL or added some external gem to do bulk writes (activerecord-import or upsert). Often we worry about N+1's and inefficient reads, but completely forget about writes.

Why do we forget about writes? I think the main reason is because writes are so rare in comparison to reads that our apps can usually get away with being inefficient in that area.

Background

I've been working on adding webhooks to Flipper Cloud – effectively switching synchronization of Cloud and your app from pull (poll for local sync) to push (webhook for local sync).

Since Cloud is making a request to your app anyway, the webhook response seemed like a good time to sync up some other information – like groups.

Flipper Groups

Flipper has the concept of a group. A group is a named dynamic block of ruby code that returns true or false.

# register a group
Flipper.register(:admins) do |actor|
  actor.respond_to?(:admin?) && actor.admin?
end

# enable admins group for stats feature
Flipper.enable_group :stats, :admins

# check if enabled stats is enabled for user
# since group is enabled for stats, this will 
# return true if person.admin? is true, else false
Flipper.enabled?(:stats, user) 

Cloud allows you to add groups manually:

But all of the groups are registered in your application, so why make people do by hand what computers can do?

The Upsert Need

Webhooks make requests to your application. Your application has the groups registered. Why not use that response body to return your group information so Cloud can slurp them in? Sweet, right?

Let's say our webhook response body looks a bit like this (array of hashes so we can sync more attributes down the road if necessary):

{
  "groups": [
    {
      "name": "admins"
    },
    {
      "name": "basic"
    },
    {
      "name": "plus"
    },
    {
      "name": "premium"
    }
  ]
}

The next step is to upsert those groups. Any groups that have already been created can be ignored, but the rest should magically appear in the Cloud UI.

Lately, I've been reaching for GitHub::Result quite often, so this might look a bit odd to you, but I think it reads well considering the level of defensiveness required here (imagine the alternative begin/rescues that would be necessary):

def create_groups(response_body)
  GitHub::Result.new {
    JSON.parse(response_body)
  }.map { |data|
    if groups = data["groups"]
      if groups.is_a?(Array)
        registered_groups = groups.map { |row| row.is_a?(Hash) ? row["name"] : nil }.compact.uniq
        CreateGroups.call(webhook.environment, registered_groups)
      end
    end
  }.rescue { |exception|
    Rails.logger.info "SyncWebhook exception: #{exception.inspect}"
    Raven.capture_exception(exception)
    GitHub::Result.error(exception)
  }
end

Once the webhook request is made, the response body is passed to this method.

The goal here is for nothing to cause a failure that would prevent the creation of a WebhookResponse or incorrectly mark the webhook response as an error (for something that errored on our side of things and not in your app).

  1. The body is parsed since it is JSON.
  2. If that succeeds, we verify the types. People can put in whatever webhook URL they want and that URL doesn't have to use our fancy, automatic middleware.
  3. Because the response could be literally anything, we need to be careful not to assume JSON or even the structure of the JSON.
  4. Once we have the names, we try to create any that are missing.
  5. If anything goes wrong, we track the exception. But we don't let it cause problems with our original intent of making a webhook request and recording the result.

TIL

Finally, we are to the TIL moment. You'll notice a call to CreateGroups in there. I really didn't want to do multiple creates or a find and (possibly) fewer creates.

It has been a while since I've researched upserts with Rails, so I decided to spend some time on google. Google led me to a post by Justin Searls, who I know and trust (I'm sure he'll give me a hard time about admitting this).

I whipped a test case together and quickly had CreateGroups working as I wanted.

class CreateGroups
  def self.call(environment, names)
    raise ArgumentError, "environment is blank" if environment.blank?

    GitHub::Result.new {
      names = Array(names).reject(&:blank?).map(&:to_s).uniq
      attributes = names.map { |name|
        {
          environment_id: environment.id,
          name: name,
          created_at: Time.zone.now,
          updated_at: Time.zone.now,
        }
      }

      if attributes.present?
        Group.upsert_all(attributes, unique_by: [:environment_id, :name])
      end
    }
  end
end

The key bit here is line 17 – Group.upsert_all. As of Rails 6, insert_all and upsert_all were added.

A couple things I learned about upsert_all:

  • You need to have a unique index to work with. I already had one on [:environment_id, :name].
  • I had to add the timestamps to the attributes hash.
  • I tried using environment.groups.upsert_all, but that did not add environment_id as I hoped it would.
  • When I didn't use unique_by, an ActiveRecord::RecordNotUnique or similar error was raised which I did not expect. I'm fine with some groups not being created and just wanted this to create any new ones. unique_by solved my use case.

I'm a log tailer (and proud) because I always want to see what SQL is actually happening when I use fancy methods like this. For the curious, the result was:

  Group Bulk Upsert (0.4ms)  
  INSERT INTO "groups" ("environment_id","name","created_at","updated_at") 
  VALUES (506429097, 'admins', '2020-12-16 16:24:48.089080', '2020-12-16 16:24:48.089119'), (506429097, 'foo', '2020-12-16 16:24:48.089123', '2020-12-16 16:24:48.089125'), (506429097, 'bar', '2020-12-16 16:24:48.089127', '2020-12-16 16:24:48.089128'), (506429097, 'baz', '2020-12-16 16:24:48.089134', '2020-12-16 16:24:48.089135') 
  ON CONFLICT ("environment_id","name") 
  DO UPDATE SET "created_at"=excluded."created_at","updated_at"=excluded."updated_at" 
  RETURNING "id"

Now, when features get enabled and disabled, Cloud makes a webhook request to your app. Your app syncs feature data from Cloud and provides the groups registered. Then, Cloud ensures that those groups exist on our end for you to use in our UI.

I'm glad to see Rails has baked upsert_all in. Go forth and write more efficiently!

If you enjoyed this post,
you should subscribe for more.