An Introduction To The dbt Plugin API
In dbt-core 1.6.0, Michelle Ark made an exciting addition: a plugin system! This system lets third-party code seamlessly integrate models and artifacts into dbt Core’s compilation process, offering new possibilities for dbt users. Initially proposed by Kshitij Aranke in March 2023 as part of a discussion around creating a Python SDK for dbt, the plugin system first came to fruition to support dbt Labs’ proprietary multi-project collaboration product.
Despite its quiet release, the dbt Core plugin system brings great benefits to the OSS dbt community: native cross-project references without importing projects as packages, synthetic model nodes, and custom artifacts. The plugin interface is undocumented at present, so this post aims to unravel it, guiding you through creating your own plugins and tapping into the dbt plugin system’s potential.
What is the dbt plugin system?
The dbt plugin system is a common interface that defines hooks that are executed during different parts of the dbt execution lifecycle. During execution, a global PluginManager is instantiated to find and initialize Python modules containing dbt plugins. Then, at points in the execution lifecycle such as parsing models and writing artifacts, the PluginManager executes the dedicated hook on each plugin to trigger that plugin’s functionality.
Let’s look at an example! Here’s a sequence diagram for how the ManifestLoader uses a PluginManager to inject external nodes from a plugin into a Manifest.
sequenceDiagram
ManifestLoader->>plugins.__init__:get_plugin_manager(project_name)
plugins.__init__ ->> plugins.__init__:setup_plugin_manager(project_name)
plugins.__init__ ->> PluginManager: PluginManager.from_modules(project_name)
PluginManager ->> PluginManager: Find python modules with dbt plugins
loop [Each plugin]
PluginManager ->> Plugin: initialize()
Plugin ->> PluginManager: self
end
PluginManager ->> plugins.__init__: self
plugins.__init__ ->> ManifestLoader: PluginManager
Note right of ManifestLoader: inject_external_nodes()
ManifestLoader ->> PluginManager: plugin_manager.get_nodes()
loop [Each plugin]
PluginManager ->> Plugin: Get nodes
Plugin ->> PluginManager: ModelNodeArgs
end
PluginManager ->> ManifestLoader: PluginNodes
Of particular interest is the initialization step for each plugin. When this happens, the plugin’s initialize() method is called. This is the first opportunity for the plugin to execute arbitrary code.
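The flow in the diagram can be approximated with a small, self-contained model. The sketch below does not import dbt: ToyPlugin, ToyPluginManager, UpstreamModels, and the dict-shaped nodes are hypothetical stand-ins, and it assumes that initialize() runs when the plugin object is constructed. It is only meant to show the order of operations, where each plugin is initialized once and get_nodes() is later fanned out to every plugin.

```python
from typing import Dict, List, Type


class ToyPlugin:
    """Hypothetical stand-in for a plugin base class; not dbt's real dbtPlugin."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        # Assumption: initialize() runs as part of construction, so this is
        # the first point at which plugin-authored code executes.
        self.initialize()

    def initialize(self) -> None:
        """Override to set up connections or load files."""

    def get_nodes(self) -> Dict[str, dict]:
        """Override to return nodes keyed by unique id."""
        return {}


class ToyPluginManager:
    """Hypothetical manager: initializes plugins, then fans hook calls out."""

    def __init__(self, plugin_classes: List[Type[ToyPlugin]], project_name: str) -> None:
        # Instantiating each class triggers its initialize() method.
        self.plugins = [plugin_cls(project_name) for plugin_cls in plugin_classes]

    def get_nodes(self) -> Dict[str, dict]:
        # Ask every plugin for nodes and merge the results into one mapping.
        merged: Dict[str, dict] = {}
        for plugin in self.plugins:
            merged.update(plugin.get_nodes())
        return merged


class UpstreamModels(ToyPlugin):
    """Example plugin that loads a made-up catalog during initialization."""

    def initialize(self) -> None:
        self.catalog = {
            "model.upstream.orders": {"name": "orders", "schema": "analytics"}
        }

    def get_nodes(self) -> Dict[str, dict]:
        return dict(self.catalog)


manager = ToyPluginManager([UpstreamModels], project_name="my_project")
external_nodes = manager.get_nodes()
```

In this toy version the caller plays the role of the ManifestLoader: it builds the manager once and then asks it for nodes, without knowing which plugins supplied them.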
How do we create a new plugin?
There are a few requirements when making a new dbt Core plugin:
- The plugin must be in a Python module that follows the dbt_* naming scheme.
- The plugin module must define a plugins variable that contains a list of plugin classes to be initialized.
- The plugin itself must be a subclass of dbtPlugin.
- Each hook method on the plugin must have the is_dbt_hook attribute set to True. This can be accomplished by using the provided @dbt_hook decorator.
It is my understanding that these requirements exist to ensure that the PluginManager does not accidentally import non-plugin code, and to prevent unintended code execution if there is a hook naming collision in the plugin classes.
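To make those rules concrete, here is a minimal sketch of how such checks could be applied. It is not dbt's implementation: toy_hook, ToyBasePlugin, discover_plugin_classes, and collect_hooks are hypothetical names, and the modules are built in memory with types.ModuleType rather than found on the import path.

```python
import types
from typing import Callable, Dict, List


def toy_hook(func: Callable) -> Callable:
    """Hypothetical decorator that marks a method as a hook."""
    func.is_dbt_hook = True
    return func


class ToyBasePlugin:
    """Hypothetical base class standing in for dbtPlugin."""


def discover_plugin_classes(modules: Dict[str, types.ModuleType]) -> List[type]:
    """Apply the naming, plugins-variable, and subclass rules to candidate modules."""
    found: List[type] = []
    for name, module in modules.items():
        if not name.startswith("dbt_"):
            continue  # Rule 1: module name must follow the dbt_* scheme.
        for candidate in getattr(module, "plugins", []):  # Rule 2: plugins variable.
            # Rule 3: only subclasses of the base plugin class are accepted.
            if isinstance(candidate, type) and issubclass(candidate, ToyBasePlugin):
                found.append(candidate)
    return found


def collect_hooks(plugin: ToyBasePlugin) -> Dict[str, Callable]:
    """Rule 4: register only public methods that carry the is_dbt_hook marker."""
    hooks: Dict[str, Callable] = {}
    for attr_name in dir(plugin):
        if attr_name.startswith("_"):
            continue
        attr = getattr(plugin, attr_name)
        if callable(attr) and getattr(attr, "is_dbt_hook", False):
            hooks[attr_name] = attr
    return hooks


class GoodPlugin(ToyBasePlugin):
    @toy_hook
    def get_nodes(self) -> dict:
        return {"model.upstream.orders": {}}

    def get_manifest_artifacts(self, manifest: dict) -> dict:
        # Not decorated, so it is never registered as a hook.
        return {}


good_module = types.ModuleType("dbt_example_plugin")
good_module.plugins = [GoodPlugin]

ignored_module = types.ModuleType("example_plugin")  # Wrong prefix: skipped.
ignored_module.plugins = [GoodPlugin]

modules = {m.__name__: m for m in (good_module, ignored_module)}
plugin_classes = discover_plugin_classes(modules)
registered_hooks = collect_hooks(plugin_classes[0]())
```

The undecorated get_manifest_artifacts method shares a name with a hook but is skipped, which is the kind of accidental collision the marker attribute guards against.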
Here is a bare-bones __init__.py file for a hypothetical dbt_example_plugin module.
from dbt.plugins.manager import dbt_hook, dbtPlugin
from dbt.plugins.manifest import PluginNodes
from dbt.plugins.contracts import PluginArtifacts


class ExamplePlugin(dbtPlugin):
    """A demonstration plugin for dbt-core 1.6.x."""

    def initialize(self) -> None:
        """
        Initialize the plugin. This is where you'd set up connections,
        load files, or perform other I/O.
        """
        print('Initializing ExamplePlugin')

    @dbt_hook
    def get_manifest_artifacts(self, manifest) -> PluginArtifacts:
        """
        Return PluginArtifacts to dbt for writing to the file system.
        """
        return PluginArtifacts()

    @dbt_hook
    def get_nodes(self) -> PluginNodes:
        """
        Return PluginNodes to dbt for injection into dbt's DAG.
        """
        return PluginNodes()


# The PluginManager reads this module-level list to find plugin classes.
plugins = [ExamplePlugin]
Now, if this example module exists within the same Python environment as dbt-core, then during model loading the PluginManager will detect the dbt_example_plugin module, register and initialize the plugin, and call its get_nodes method.
Thoughts for the future
I am delighted that this plugin interface exists! For the first time, we can start extending dbt Core’s functionality without wrapping the CLI. I’ve already used this functionality to create dbt-loom, an open-source dbt Core plugin that enables multi-project deployments. With the functionality that already exists, I can see many areas for development:
- Creating a sqlite version of the project manifest. Combined with HTTP range requests, this should allow for an improved dbt-docs experience by bypassing loading large manifest.json files.
- During the initialize() method call, we should be able to access imported dbt constructs and add our own telemetry instrumentation.
- Depending on how difficult it is to monkeypatch, it may be possible to inject custom methods into a Jinja context.
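As a rough illustration of the first idea, the sketch below copies the nodes section of a manifest-shaped dictionary into a SQLite table using only the standard library. The manifest_to_sqlite function, the nodes table layout, and the sample data are all assumptions made for this example; wiring it into a plugin hook and serving the file over HTTP range requests are left out.

```python
import json
import sqlite3
from typing import Any, Dict


def manifest_to_sqlite(manifest: Dict[str, Any], conn: sqlite3.Connection) -> int:
    """Write each manifest node as one row; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "unique_id TEXT PRIMARY KEY, resource_type TEXT, name TEXT, payload TEXT)"
    )
    rows = [
        # Keep a few searchable columns and store the full node as JSON text.
        (unique_id, node.get("resource_type"), node.get("name"), json.dumps(node))
        for unique_id, node in manifest.get("nodes", {}).items()
    ]
    conn.executemany("INSERT OR REPLACE INTO nodes VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample shaped like the "nodes" section of a manifest.json file.
sample_manifest = {
    "nodes": {
        "model.jaffle.orders": {"resource_type": "model", "name": "orders"},
        "model.jaffle.customers": {"resource_type": "model", "name": "customers"},
        "seed.jaffle.raw_orders": {"resource_type": "seed", "name": "raw_orders"},
    }
}

conn = sqlite3.connect(":memory:")
inserted = manifest_to_sqlite(sample_manifest, conn)
model_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM nodes WHERE resource_type = 'model' ORDER BY name"
    )
]
```

With the nodes in a table, a docs front end could query for just the rows it needs instead of parsing the whole JSON document up front.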
More importantly, I’m excited for other hooks that could be made in the future! Given that dbt Core is still open source, I hope to see many new PRs open up that extend this new interface further.