Better bulk filtering for Gmail

Fri 07 July 2017 by Lars Kellogg-Stedman Tags gmail

I use Gmail extensively for my personal email, and recently my workplace has been migrated over to Gmail as well. I find that for my work email I rely much more extensively on filters and labels to organize things (like zillions of internal and upstream mailing lists), and that has posed some challenges. While Gmail is in general fairly snappy, attempting to apply an action to thousands of messages (for example, trying to mark 16000 messages as "read", or applying a new filter to all your existing messages) results in a very poor experience: it is not possible to interact with Gmail (in the same tab) while the action is running, and frequently actions will timeout.

Fortunately, we can take advantage of Gmail's IMAP interface to overcome most of these obstacles. The naive approach won't work: If you attempt to perform an IMAP action against thousands of messages you will encounter the same timeouts you see with the browser. The big advantage to IMAP is that it makes it easy, with a little bit of code, to split a large operation up into smaller chunks. Gmail provides a few [IMAP extensions][] that provide us with a mechanism for accessing Gmail-specific features, such as the rich search syntax and support for arbitrary labels.

I've written a small tool to take care of this; you can find it in my [gmailfilters][] repository on GitHub. The project provides two commands, the gmf bulk-filter command, which I will discuss here, and the gmf manage-filters command, which can translate between a simpler YAML syntax and the XML syntax used by Gmail's filter import/export. I may write about that in a future post.

Installing gmf

The gmailfilters project is a standard Python package. You can install it directly from GitHub like this:

pip install git+https://github.com/larsks/gmailfilters

This will install the gmf command, which provides the following subcommands:

  • bulk-filter -- a command line tool for applying bulk actions to Gmail messages.
  • manage-filters -- translate between a YAML filter syntax and the XML syntax used by Gmail for filter import/export.
  • apply-filters -- read the YAML filters file and apply (a supported subset of) the actions to your mail.

This article is going to focus on the gmf bulk-filter command.

Configuring gmf

Once installed, you will need to create a configuration file. By default, gmf will look for a file named gmailfilters.yml located in your current directory or in your $XDG_CONFIG_HOME directory, typically $HOME/.config. The configuration file looks something like this:

accounts:
  default:
    host: imap.gmail.com
    ssl: true
    username: username@example.com
    password: secret-password

You can have multiple accounts in the file; gmf will use the one named default by default.

Using the bulk-filter command

I have a Gmail filter that applies the label topic/containers to any mail matching the search {docker container kubernetes lxc runc}. I want to apply this filter to all my existing messages, and I want to mark all matching messages as read so that I can identity new matches. I can use the bulk-filter tool to accomplish this task using the --label and --flag options:

gmf bulk-filter --query '{docker container kubernetes lxc runc}' \
  --label topic/containers --flag seen

It turns out I had close to 15000 freecycle messages gathering dust in my account. The messages were already labeled with the freecycle label. We can weed those out like this:

gmf bulk-filter --query 'label:freecycle' --trash

The --trash option acts like Gmail's "move to trash" option. You can also use --delete, which will use an IMAP delete operation, but the behavior of an IMAP delete is controlled by your Gmail configuration (it may simply archive a message, or it may delete it completely).

The bulk-filter tool can also be used to remove labels by preceding them with a -. For example, if I want to find all messages labeled fedora-devel-list and modify them so that they are labeled list, list/fedora, and list/fedora/devel I can run:

gmf bulk-filter --query 'label:fedora-devel-list' \
  --label list --label list/fedora --label list/fedora-devel
  --label='-fedora-devel-list'

This exposes a quirk of Python's argparse argument parser: if the argument to an option starts with -, argparse assumes that you've made a mistake unless you explicitly attach it to the option with =.

By default, the bulk-filter tool operates on the [Gmail]/All mail folder, which contains all of your messages. You can limit it to specific folders instead by providing an optional list of folders (that may contain wildcards). For example, if I want to perform the above labelling operation only on internal company mailing lists, I could limit it like this:

gmf bulk-filter --query '{docker container kubernetes lxc runc}' \
  --label topic/containers list/internal/*

This assumes, of course, that you have filters in place that apply labels nested under list/internal to internal company mailing lists.

The default behavior of bulk-filter is to operate on 200 messages at a time. You can change this using the --size parameter, for example:

gmf bulk-filter --size 50 ...

Comments