
An Introduction To The dbt Plugin API

28 Aug 2023

In dbt-core 1.6.0, Michelle Ark made an exciting addition: a plugin system! This system lets third-party code integrate models and artifacts into dbt Core’s compilation process, opening new possibilities for dbt users. Initially proposed by Kshitij Aranke in March 2023 as part of a discussion around creating a Python SDK for dbt, the plugin system first came to fruition to support dbt Labs’ proprietary multi-project collaboration product.


Despite its quiet release, the dbt Core plugin system brings great benefits to the OSS dbt community: it allows native cross-project references without importing projects as packages, lets you craft synthetic model nodes, and supports custom artifacts. The plugin interface is undocumented at present, so this post aims to unravel it, guiding you through creating your own plugins and tapping into the plugin system’s potential.

What is the dbt plugin system?

The dbt plugin system is a common interface that defines hooks, which are executed at different points in the dbt execution lifecycle. During execution, a global PluginManager is instantiated to find and initialize Python modules containing dbt plugins. Then, at points such as parsing models and writing artifacts, the PluginManager calls the corresponding hook on each plugin to trigger that plugin’s functionality.

Let’s look at an example! Here’s a sequence diagram for how the ManifestLoader uses a PluginManager to inject external nodes from a plugin into a Manifest.

sequenceDiagram
    ManifestLoader->>plugins.__init__:get_plugin_manager(project_name)
    plugins.__init__ ->> plugins.__init__:setup_plugin_manager(project_name)
    plugins.__init__ ->> PluginManager: PluginManager.from_modules(project_name)
    PluginManager ->> PluginManager: Find python modules with dbt plugins
    loop [Each plugin]
        PluginManager ->> Plugin: initialize()
        Plugin ->> PluginManager: self
    end
    PluginManager ->> plugins.__init__: self
    plugins.__init__ ->> ManifestLoader: PluginManager

    Note right of ManifestLoader: inject_external_nodes()
    ManifestLoader ->> PluginManager: plugin_manager.get_nodes()
    loop [Each plugin]
        PluginManager ->> Plugin: Get nodes
        Plugin ->> PluginManager: ModelNodeArgs
    end
    PluginManager ->> ManifestLoader: PluginNodes

Of particular interest is the initialization step for each plugin. When this happens, the plugin’s initialize() method is called. This is the first opportunity when arbitrary code can be executed by the Plugin.
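
To make the flow in the diagram concrete, here is a minimal, self-contained sketch of the two phases: initializing each plugin, then collecting nodes from each one. This is not dbt's implementation. ToyNodes, ToyPlugin, ToyPluginManager, and UpstreamPlugin are hypothetical stand-ins I made up for illustration, and the model names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class ToyNodes:
    """Hypothetical stand-in for the container a plugin returns from get_nodes()."""

    models: Dict[str, str] = field(default_factory=dict)

    def update(self, other: "ToyNodes") -> None:
        self.models.update(other.models)


class ToyPlugin:
    """Hypothetical base class: initialize() runs once, when the plugin is constructed."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        self.initialize()

    def initialize(self) -> None:
        pass

    def get_nodes(self) -> ToyNodes:
        return ToyNodes()


class UpstreamPlugin(ToyPlugin):
    def initialize(self) -> None:
        # First point where plugin code runs: load files, open connections, etc.
        self.loaded = {"stg_orders": "upstream_project"}

    def get_nodes(self) -> ToyNodes:
        return ToyNodes(models=dict(self.loaded))


class ToyPluginManager:
    """Hypothetical manager: builds every plugin, then fans hook calls out to them."""

    def __init__(self, plugin_classes: List[Type[ToyPlugin]], project_name: str) -> None:
        self.plugins = [cls(project_name) for cls in plugin_classes]

    def get_nodes(self) -> ToyNodes:
        all_nodes = ToyNodes()
        for plugin in self.plugins:
            all_nodes.update(plugin.get_nodes())
        return all_nodes


manager = ToyPluginManager([UpstreamPlugin], project_name="my_project")
nodes = manager.get_nodes()
print(nodes.models)
```

The point of the sketch is the ordering: by the time the loader asks for nodes, every plugin has already had a chance to do its setup work in initialize().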

How do we create a new plugin?

There are a few requirements when making a new dbt Core plugin:

  1. The plugin must be in a Python module that follows the dbt_* naming scheme.
  2. The plugin module must define a plugins variable containing a list of plugin classes to be initialized.
  3. The plugin itself must be a subclass of dbtPlugin.
  4. Each hook method on our plugin must have the is_dbt_hook attribute set to True. This can be accomplished by using the provided @dbt_hook decorator.

It is my understanding that these requirements exist to ensure that the PluginManager does not accidentally import non-plugin code, and to prevent unintended code execution if a method on a plugin class happens to share a name with a hook.
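
The fourth requirement is easier to see in code. Below is a small sketch of how a marker attribute can separate real hooks from methods that merely share a hook's name. The names toy_hook, ToyPlugin, and registered_hooks are hypothetical and written for this post; only the is_dbt_hook attribute name comes from the requirements above, and dbt's own lookup logic may well differ.

```python
from typing import Callable, Dict, List


def toy_hook(func: Callable) -> Callable:
    """Mark a method as a hook by setting an attribute on it (assumed mechanism)."""
    setattr(func, "is_dbt_hook", True)
    return func


class ToyPlugin:
    @toy_hook
    def get_nodes(self) -> List[str]:
        return ["model_a"]

    def get_manifest_artifacts(self) -> Dict[str, str]:
        # Same name as a hook, but NOT decorated, so it should be ignored.
        return {"unexpected.json": "{}"}


def registered_hooks(plugin: object, hook_names: List[str]) -> List[str]:
    """Return only the hook names whose methods carry the marker attribute."""
    found = []
    for name in hook_names:
        method = getattr(plugin, name, None)
        if callable(method) and getattr(method, "is_dbt_hook", False):
            found.append(name)
    return found


hooks = registered_hooks(ToyPlugin(), ["get_nodes", "get_manifest_artifacts"])
print(hooks)  # ['get_nodes']
```

Because the undecorated method is skipped, a helper that happens to be called get_manifest_artifacts never gets invoked as a hook by accident.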

Here is a bare-bones __init__.py file for a hypothetical dbt_example_plugin module.

from dbt.plugins.manager import dbt_hook, dbtPlugin
from dbt.plugins.manifest import PluginNodes
from dbt.plugins.contracts import PluginArtifacts

class ExamplePlugin(dbtPlugin):
    """A demonstration plugin for dbt-core 1.6.x."""

    def initialize(self) -> None:
        """
        Initialize the plugin. This is where you'd setup connections,
        load files, or perform other I/O.
        """
        print('Initializing ExamplePlugin')

    @dbt_hook
    def get_manifest_artifacts(self, manifest) -> PluginArtifacts:
        """
        Return PluginArtifacts to dbt for writing to the file system.
        """
        return PluginArtifacts()

    @dbt_hook
    def get_nodes(self) -> PluginNodes:
        """
        Return PluginNodes to dbt for injection into dbt's DAG.
        """
        return PluginNodes()


plugins = [ExamplePlugin]

Now, if this example module exists within the same Python environment as dbt-core, the PluginManager will detect the dbt_example_plugin module during model loading, register and initialize the plugin, and call its get_nodes method.
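
If a plugin does not seem to load, a quick sanity check is to confirm that its module is visible in the current environment under the dbt_* naming scheme. The sketch below uses only the standard library. The function find_prefixed_modules is a hypothetical helper of mine, and I am assuming, rather than asserting, that dbt's own discovery works along similar lines.

```python
import pkgutil
from typing import Iterable, List, Optional


def find_prefixed_modules(
    prefix: str = "dbt_", names: Optional[Iterable[str]] = None
) -> List[str]:
    """List importable top-level modules whose names start with the prefix.

    If `names` is omitted, module names are read from the current environment.
    """
    if names is None:
        names = (module.name for module in pkgutil.iter_modules())
    return sorted(name for name in names if name.startswith(prefix))


# Run inside the same virtual environment as dbt-core.
print(find_prefixed_modules())
```

If dbt_example_plugin is missing from the printed list, the module is probably installed in a different environment or named without the dbt_ prefix.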

Thoughts for the future

I am delighted that this plugin interface exists! For the first time, we can start extending dbt Core’s functionality without wrapping the CLI. I’ve already used this functionality to create dbt-loom, an open-source dbt Core plugin that enables multi-project deployments. With the functionality that already exists, I can see many areas for development:

  • Creating a SQLite version of the project manifest. Combined with HTTP range requests, this should allow for an improved dbt-docs experience by avoiding the need to load large manifest.json files.
  • During the initialize() method call, we should be able to access imported dbt constructs and add our telemetry instrumentation.
  • Depending on how difficult it is to monkeypatch, it may be possible to inject custom methods into a Jinja context.

More importantly, I’m excited for other hooks that could be made in the future! Given that dbt Core is still open source, I hope to see many new PRs open up that extend this new interface further.