An Introduction To The dbt Plugin API
In dbt-core 1.6.0, Michelle Ark made an exciting addition: a plugin system! This system lets third-party code seamlessly integrate models and artifacts into dbt Core’s compilation process, offering new possibilities for dbt users. Initially proposed by Kshitij Aranke in March 2023 as part of a discussion around creating a Python SDK for dbt, the plugin system first came to fruition to support dbt Labs’ proprietary multi-project collaboration product.
Despite its quiet release, the dbt Core plugin system brings great benefits to the OSS dbt community: it allows native cross-project references without importing projects as packages, the crafting of synthetic model nodes, and the creation of custom artifacts. The plugin interface is undocumented at present, so this post aims to unravel it, guiding you through creating your own plugins and tapping into the dbt plugin system’s potential.
What is the dbt plugin system?
The dbt plugin system is a common interface that defines hooks executed at different points in the dbt execution lifecycle. During execution, a global PluginManager is instantiated to find and initialize Python modules containing dbt plugins. Then, at points such as parsing models and writing artifacts, the PluginManager executes the dedicated hook on each plugin to trigger that plugin’s functionality.
Let’s look at an example! Here’s a sequence diagram for how the ManifestLoader uses a PluginManager to inject external nodes from a plugin into the manifest:
sequenceDiagram
    ManifestLoader->>plugins.__init__: get_plugin_manager(project_name)
    plugins.__init__->>plugins.__init__: setup_plugin_manager(project_name)
    plugins.__init__->>PluginManager: PluginManager.from_modules(project_name)
    PluginManager->>PluginManager: Find python modules with dbt plugins
    loop [Each plugin]
        PluginManager->>Plugin: initialize()
        Plugin->>PluginManager: self
    end
    PluginManager->>plugins.__init__: self
    plugins.__init__->>ManifestLoader: PluginManager
    Note right of ManifestLoader: inject_external_nodes()
    ManifestLoader->>PluginManager: plugin_manager.get_nodes()
    loop [Each plugin]
        PluginManager->>Plugin: Get nodes
        Plugin->>PluginManager: ModelNodeArgs
    end
    PluginManager->>ManifestLoader: PluginNodes
Of particular interest is the initialization step for each plugin. When this happens, the plugin’s initialize() method is called. This is the first opportunity for the plugin to execute arbitrary code.
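To make the ordering concrete, here is a toy model of that lifecycle. It is only a sketch and does not use dbt's real classes: ToyPlugin, ToyPluginManager, ToyNodes, and UpstreamModels are hypothetical stand-ins, and the assumption that initialize() runs when the plugin object is constructed is mine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class ToyNodes:
    """Hypothetical stand-in for PluginNodes: model names keyed by unique id."""
    models: Dict[str, str] = field(default_factory=dict)


class ToyPlugin:
    """Hypothetical plugin base class; initialize() runs at construction."""

    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        self.initialize()  # first point at which plugin code runs

    def initialize(self) -> None:
        pass

    def get_nodes(self) -> ToyNodes:
        return ToyNodes()


class ToyPluginManager:
    """Instantiates every plugin once, then fans hook calls out to each one."""

    def __init__(self, plugin_classes: List[Type[ToyPlugin]], project_name: str) -> None:
        self.plugins = [cls(project_name) for cls in plugin_classes]

    def get_nodes(self) -> ToyNodes:
        merged = ToyNodes()
        for plugin in self.plugins:
            merged.models.update(plugin.get_nodes().models)
        return merged


class UpstreamModels(ToyPlugin):
    def initialize(self) -> None:
        # A real plugin might read an upstream manifest here; hard-coded for the sketch.
        self.available = {"model.upstream.orders": "orders"}

    def get_nodes(self) -> ToyNodes:
        return ToyNodes(models=dict(self.available))


manager = ToyPluginManager([UpstreamModels], project_name="my_project")
nodes = manager.get_nodes()
```

Because initialize() has already run by the time get_nodes() is called, any state a plugin loads during initialization is available to its hooks.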
How do we create a new plugin?
There are a few requirements when making a new dbt Core plugin:
- The plugin must be in a Python module whose name follows the dbt_* naming convention (for example, dbt_example_plugin).
- The plugin module must have a plugins variable defined that contains a list of plugin classes to be initialized.
- The plugin itself must be a subclass of dbtPlugin.
- Each hook method on our plugin must have the is_dbt_hook attribute set to True. This can be accomplished by using the provided dbt_hook decorator.
It is my understanding that these requirements exist to ensure that the PluginManager does not accidentally import non-plugin code, and to prevent unintended code execution if a method on a plugin class happens to share a name with a hook.
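The effect of these rules can be sketched without dbt installed. The snippet below is an illustration of the general marker-attribute technique, not dbt's actual implementation: toy_hook, plugin_classes_from, collect_hooks, and the "dbt_" prefix constant are all assumptions made for the example.

```python
import types
from typing import Any, Callable, Dict, List

PREFIX = "dbt_"  # assumed module-name prefix for this sketch


def toy_hook(func: Callable) -> Callable:
    """Mark a method as a hook by setting the is_dbt_hook attribute."""
    setattr(func, "is_dbt_hook", True)
    return func


def plugin_classes_from(module: types.ModuleType) -> List[type]:
    """Return the classes in the module's `plugins` list, if the name qualifies."""
    if not module.__name__.startswith(PREFIX):
        return []
    return list(getattr(module, "plugins", []))


def collect_hooks(plugin: Any) -> Dict[str, Callable]:
    """Find public methods on a plugin instance that carry the hook marker."""
    hooks = {}
    for name in dir(plugin):
        if name.startswith("_"):
            continue
        member = getattr(plugin, name)
        if callable(member) and getattr(member, "is_dbt_hook", False):
            hooks[name] = member
    return hooks


class Example:
    @toy_hook
    def get_nodes(self) -> list:
        return ["model.example.a"]

    def get_manifest_artifacts(self) -> dict:
        # Not decorated, so it is never registered as a hook.
        return {}


# In-memory modules stand in for installed packages.
fake_module = types.ModuleType("dbt_example_plugin")
fake_module.plugins = [Example]
unrelated = types.ModuleType("requests_helper")
unrelated.plugins = [Example]

found = plugin_classes_from(fake_module)
hooks = collect_hooks(found[0]())
```

Here the unrelated module is skipped even though it defines a plugins list, and get_manifest_artifacts is ignored even though its name matches a hook, because it lacks the marker.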
Here is a bare-bones __init__.py file for a hypothetical dbt_example_plugin module:
    from dbt.plugins.manager import dbt_hook, dbtPlugin
    from dbt.plugins.manifest import PluginNodes
    from dbt.plugins.contracts import PluginArtifacts


    class ExamplePlugin(dbtPlugin):
        """A demonstration plugin for dbt-core 1.6.x."""

        def initialize(self) -> None:
            """
            Initialize the plugin. This is where you'd set up connections,
            load files, or perform other I/O.
            """
            print("Initializing ExamplePlugin")

        @dbt_hook
        def get_manifest_artifacts(self, manifest) -> PluginArtifacts:
            """Return PluginArtifacts to dbt for writing to the file system."""
            return PluginArtifacts()

        @dbt_hook
        def get_nodes(self) -> PluginNodes:
            """Return PluginNodes to dbt for injection into dbt's DAG."""
            return PluginNodes()


    # Module-level list of plugin classes for dbt to initialize.
    plugins = [ExamplePlugin]
Now, if this example module exists within the same Python environment as dbt-core, during model loading the PluginManager will detect the dbt_example_plugin module, register and initialize the plugin, and call the plugin’s hooks at the relevant points in the run.
Thoughts for the future
I am delighted that this plugin interface exists! For the first time, we can start extending dbt Core’s functionality without wrapping the CLI. I’ve already used this functionality to create dbt-loom, an open-source dbt Core plugin that enables multi-project deployments. With the functionality that already exists, I can see many areas for development:
- Creating a sqlite version of the project manifest. Combined with HTTP range requests, this should allow for an improved dbt-docs experience by bypassing loading large manifest.json files.
- During the initialize() method call, we should be able to access imported dbt constructs and add our own telemetry instrumentation.
- Depending on how difficult it is to monkeypatch, it may be possible to inject custom methods into a Jinja context.
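As a rough illustration of the first idea, the sketch below writes a manifest-like dictionary into SQLite using only the standard library. The manifest data is made up and heavily trimmed, the table layout and the write_manifest_db helper are hypothetical, and how a plugin would obtain the manifest or publish the resulting file is left open.

```python
import json
import sqlite3

# Made-up, heavily trimmed stand-in for the "nodes" section of manifest.json.
manifest = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model", "name": "orders"},
        "model.shop.customers": {"resource_type": "model", "name": "customers"},
        "seed.shop.countries": {"resource_type": "seed", "name": "countries"},
    }
}


def write_manifest_db(manifest: dict, path: str = ":memory:") -> sqlite3.Connection:
    """Store one row per node, keeping the full node as a JSON payload."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE nodes ("
        "unique_id TEXT PRIMARY KEY, resource_type TEXT, name TEXT, payload TEXT)"
    )
    rows = [
        (uid, node["resource_type"], node["name"], json.dumps(node))
        for uid, node in manifest["nodes"].items()
    ]
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = write_manifest_db(manifest)

# A docs site could now query for just the rows it needs.
model_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM nodes WHERE resource_type = 'model' ORDER BY name"
    )
]
```

Indexing by unique_id and resource_type means a reader only touches the rows it asks for, which is the property that would make range requests over a hosted database file worthwhile.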
More importantly, I’m excited for other hooks that could be made in the future! Given that dbt Core is still open source, I hope to see many new PRs open up that extend this new interface further.