Take chances, make mistakes, and get mesh-y
18 Oct 2023
Earlier this year, dbt Core 1.5 introduced new model governance features including model versions, model contracts, groups, and access, to make it easier to manage the shape of models, the ownership of models, and access controls determining which upstream models can be leveraged for downstream models. When used well, these features allow analytics engineers to more closely manage their expectations for the data they consume, and manage who their downstream consumers are.
Like many dbt features, however, these are all managed in complex YAML configurations that can be challenging to launch. To that end, Grace Goheen, Dave Connors and I created dbt-meshify, a CLI tool that programmatically manages model versions, model contracts, groups and access controls. Additionally, dbt-meshify can manage splitting projects and reflow-ing references for teams leveraging multi-project deployments via dbt Cloud or via dbt Core and dbt-loom.
This hard work culminated in a whimsical skit presentation at Coalesce 2023, during which we presented how dbt-meshify can dramatically improve the quality of life for analytics engineers in truly massive organizations.
I’m particularly bullish about the future for dbt Core’s model governance and multi-project deployment features. As someone who works in a very large dbt environment, I’m excited to be working with teams to roll out groups, access controls, contracts, and versions. As we continue to mature as a group, I suspect we will investigate whether or not multiple projects make sense for our use cases, and when the time comes we’ll have a tool in place that will make the split as easy as possible.
Slides
Narrator: Welcome to story hour - today, we’re going to delve into the classic tale - “The Enchanted Learning Vehicle: meshifies a dbt project”
Narrator: This is a three part story about the data team at Mega Corp Big Co Inc, a massive conglomerate with lots of data, and even more stakeholders.
Narrator: Our cast of characters includes Barnold, an analytics engineer who works on the core platform team, and Ralphie, who specializes in modeling data for the sales team.
Narrator: Ever present is Janet, a data manager who needs her data YESTERDAY
Narrator: Our analysts find themselves faced with a set of problems that may sound familiar for those of you who develop in large dbt projects with lots of contributors across a diverse organization.
Ralphie and Barnold, whether they know it or not, are dealing with:
- Undefined ownership of models
- Unclear interfaces between teams
- Simply Too Many Models slowing down your development
Narrator: Our story will follow these heroes as they use dbt-meshify, a brand-new command line tool, to add model governance and cross-project lineage features to their dbt project, solving these challenges along the way.
Narrator: Our story unfolds in three chapters:
- CHAPTER I - An Unexpected Journey (into data contracts). Our heroes, faced with a breakage, learn how to protect their dbt projects from unexpected changes
- CHAPTER II - A Line in the Sand. Our heroes define their interfaces, and commit to maintaining their boundaries
- CHAPTER III - A Project All My Own. Our heroes migrate to a multi project architecture
Narrator: Chapter One: An Unexpected Journey (into data contracts)
Narrator: As our action begins, Barnold and Ralphie have settled in at their desks, ready for another day of generating actionable insights, when suddenly...
Ralphie: Hey uh Barnold I hate to bother you, but did you happen to make any changes to any of our Revenue related models? I'm seeing some jobs failing and I know I didn't make any changes.
Barnold: Revenue related? I thought you handled that. I just handle our core entities. I can check my git history here. I did do a couple cleanup changes recently and ... oops.
Ralphie: What do you mean by "oops"?
Barnold: Yeah Ralphie, I'm sorry. I maybe sort of kind of accidentally removed the is preferred customer column from my customers model. I'm looking at the DAG and I can see your transaction model is right downstream of my changes. That wasn't like an important column, was it?
Ralphie: Yeah, I mean that kind of powers all of our revenue metrics and models. It's in like every dashboard.
Barnold: Well, how the heck was I supposed to know that? The marketing team told me to clean up unnecessary fields, and I thought that would be an easy change.
Slack notification sound
Ralphie: Oh no! Janet must have seen the failing jobs, and now she's not happy. We better give this a read.
Barnold (Reading Janet): Hey team, seriously frustrated over here! 😡 The latest of the project is a total disaster. Changes to Upstream models are breaking our downstream test -- it's like nobody cares about stability. And those core entities? Who's even in charge? It's like a game of hot potato and it is not fun. The lack of organization is driving me up a wall! 💢
Narrator: While Janet's message may seem harsh, I'm sure many of us can empathize. It can be challenging for one data team to trust another when:
- Changes to upstream models are breaking downstream jobs
- No one knows who owns the core entities in their project, and
- The relationships between teams' models are too complex to safely make changes
Barnold: But, like that's not my fault, right?
Ralphie (Reading Janet): This dbt project has become an absolute disaster zone, and we can't rely on it anymore. It has become a breeding ground for chaos, and guess who's at the center of it all? You got it, @barnold. The fact that we're drowning in this tangled web of confusion is a testament to your complete lack of responsibility. We're wasting valuable time and resources trying to salvage your mess.
Ralphie: You probably should've worked from home today.
Barnold: I'm going to get fired. What am I supposed to do? This dbt project is a giant web of confusion, and our teams never communicate because no one knows who owns what. It's impossible to keep developing on this project without accidentally breaking something and unleashing Janet's wrath.
Ralphie: I know what you mean. When we were a smaller team and this was a smaller project, it was much easier to work together. But now we're all walking on eggshells, and nobody can get anything done.
Barnold: I just wish there was a way to untangle this without starting totally from scratch.
Ms. Fizzle: Hello Barnold! Hello Ralphie!
Barnold: Who are you?
Ralphie: And how do you know our names?
Ms. Fizzle: My name is Ms. Fizzle, but you can call me "The Fizz". I think it might be time to take a little field trip.
Barnold: A field trip to where?
Ms. Fizzle: Well, inside your dbt project of course!
Ralphie: We're not in school anymore. But that's beside the point. Our dbt project is a complete mess. We'll get lost in an instant.
Ms. Fizzle: Not with the expert navigation capabilities of the Enchanted Learning Vehicle. Come on, once we're inside I think we’ll be able to find some pretty neat ways to improve the organization, governance, and reliability of your dbt models. Let’s take chances! Make mistakes! Get mesh-y!
Ralphie: Well, given the current state of things, I guess we have nothing left to lose.
Barnold: Janet's not going to be able to fire me if she can't find me, so I guess I might as well go!
Everyone shrinks, and the Enchanted Learning Vehicle enters the dbt project.
Ms. Fizzle: Welcome to your dbt project. Right now, we’re traveling through the project lineage - where we can see the dependencies from one model to another. Let’s take a look at that broken relationship that caused your production jobs to fail… Vehicle, do your stuff!
Barnold: That thing is looking rough!
Ralphie: Yikes, that's pretty bad.
Barnold: All that damage just cuz I removed one measly column?
Ralphie: I mean it does power Janet's favorite 3D donut chart
Barnold: Janet does love to play favorites, and I'm starting to get the feeling I'm not one of them.
Ms. Fizzle: It seems like this relationship could definitely use some TLC. It sounds like there's a lot riding on this connection.
Barnold: I had no idea it was so critical to you Ralphie. It was just a lot easier to know who was using what when our team was small and our project was tidy. I never want to let you down again.
Ralphie: I know, Barnold. I just really wish we had a way we could help build up some trust again now that we have this complex project.
Ms. Fizzle: Not to intrude here, but it sounds like you need a way to reliably expect what the customers model is going to look like.
Ralphie: That's exactly right!
Ms. Fizzle: And Barnold, it sounds like you need a way to safely make changes to your models while still meeting Ralphie's expectations, no matter how high they might be.
Barnold: I couldn't agree more.
Ms. Fizzle: I've got an idea! It sounds like we should add a model contract to this relationship.
Ralphie: What's a model contract?
Narrator: A model contract is a set of upfront guidelines that define the shape of your model. You can enforce a model contract by providing the column names, column types, and any additional warehouse constraints that you want to add. A model contract affects the DDL that dbt generates when you do a dbt run. So, with an enforced contract, dbt will include your defined column names and data types in the create table statement. If you later make a breaking change to a contracted model, dbt will throw an error. This will prevent you from accidentally introducing breaking changes to important models in your dbt project.
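For readers following along at home, here is a minimal sketch of what an enforced contract looks like in a model's YAML properties. The column names and data types below are hypothetical stand-ins (the story only mentions an "is preferred customer" column), so treat this as an illustration of the shape rather than the actual file from the skit.

```yaml
version: 2  # properties file format version, unrelated to model versions

models:
  - name: customers
    config:
      contract:
        enforced: true  # dbt checks the model's output against the columns below
    columns:
      # Hypothetical columns for illustration only.
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null  # optional warehouse constraint
      - name: is_preferred_customer
        data_type: boolean
```

With something like this in place, a build in which `customers` no longer returns `is_preferred_customer` would be expected to fail in Barnold's own run instead of surprising Ralphie downstream.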
Barnold: That sounds amazing! Then I'll know exactly what I'm building, and exactly how my changes are going to impact everything downstream.
Ralphie: Knowing that Barnold will be able to make changes safely means that I'll be able to work on my own part of the project without worrying about things breaking, too! 🤔 You know, I think I've heard about model contracts somewhere. I believe Jeremy Cohen was saying something esoteric about them in yesterday's product session. If I recall correctly, they seemed like a lot of work to put together.
Barnold: Ralphie makes a really good point. If I'm going to have to define every single column in all my models and their data types, I'm going to be drowning in work! As if I'm not already...
Ms. Fizzle: Well, Barnold, you're not alone! Pals, meet dbt-meshify. dbt-meshify is a brand new command line tool to help you add model governance and cross-project lineage features to your dbt project by automating the creation of your configurations.
Ralphie: And it's also a wrench?
Ms. Fizzle: And! It's also a wrench. The dbt-meshify package will let us add these features to our project with a few simple commands using the node selection syntax that you know and love. We want to add a new contract to our model, so let's use the dbt-meshify operation add-contract command. Mind if I pop on into your dbt project?
Ms. Fizzle: I'm going to go ahead and run the dbt-meshify operation add-contract command, and I'm going to go ahead and select your customers model, since that's the model that we want to add a contract to. Meshify, do your stuff!
Command executes, and logs are output.
Ms. Fizzle: All right! So, we'll be able to see the logs that tell us what took place. I can also take a look at the changes to my YAML file. We can see that I now have a brand new enforced contract on my customers model, with all of the column names and data types that we would expect. So, we've got a brand new model contract with no bespoke YAML writing needed!
Barnold: Holy cow! Look at how clean it is, Ralphie. It's great that dbt is going to enforce the contracts during building, and Ralphie can be confident about what to expect ... but this is one of my most popular models. I do a lot of development work here. What happens when I actually need to make some changes that are going to break our agreement? How do I manage that change with Ralphie?
Ralphie: Ooh ... This sounds familiar. Jeremy Cohen was waxing poetic about something ... model versions! That's the feature, right?
Ms. Fizzle: That's exactly right, Ralphie!
Narrator: Model versions allow you to define brand new logic for your dbt model, and produce two outputs that consumers can choose between. It's a pathway to upgrade your logic without breaking anyone's work, and gives consumers time to migrate to the new logic as they're ready.
Ms. Fizzle: All right! I'm going to hop on back into this project if that's all right with y'all.
Barnold: That sounds awesome! Let's do it.
Ms. Fizzle: Excellent! I'm going to try out the dbt-meshify version command, and I'm going to --select the customers model again because I want to add a new version to the customers model. Meshify, do your stuff!
Command runs and logs are generated.
Ms. Fizzle: All right! Again, we'll get the logs that tell us what's happening under the hood. If I take a look at my YAML, I can see that my latest version for customers is one, but I've now got version one and version two of my customers model. So, we've got version one of customers, which can maintain our old logic, and then, Barnold, if you want to make a change to this model you could do that in customers version two. Make any change you want and then you can provide Ralphie some time to migrate to the newest version.
Barnold: Seems easy to me! What do you think, Ralphie?
Ralphie: This is absolutely fantastic! I think we should definitely be adding this to our project. I'm feeling good about these changes.
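The versioned YAML that Ms. Fizzle describes could look roughly like the sketch below. It follows dbt's documented `latest_version` and `versions` properties; the exact layout that dbt-meshify writes may differ, and the contract's columns are left out here for brevity.

```yaml
version: 2  # properties file format version, unrelated to model versions

models:
  - name: customers
    latest_version: 1  # unpinned ref('customers') calls keep resolving to v1
    versions:
      - v: 1  # the existing logic; by default dbt looks for customers_v1.sql
      - v: 2  # the new logic; by default dbt looks for customers_v2.sql

# A consumer can pin a version explicitly in a model's SQL, for example:
#   {{ ref('customers', v=1) }}
```

Keeping `latest_version` at 1 is what gives Ralphie a migration window: nothing downstream changes until consumers opt in to v2 or the latest version is bumped.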
Narrator: Chapter Two: A Line in the Sand
Ralphie: This helps a lot, but I wouldn't have known who owned all of this business logic if the dbt job hadn't failed and if we hadn't dug into our project.
Ralphie: If we zoom out and look at the entire DAG, I'm not really sure who owns what just from looking at it. I mean, I know my part of the project: It is the transaction model, and everything upstream from that with the exception of the stores and customers models that Barnold maintains. If only there was some way we could keep track of who owns which part of our project.
Ms. Fizzle: It sounds like it's time that we divided and conquered this DAG. Let's sort your nodes into groups to better understand what these models do and who should be on the receiving end of Janet's slack messages when things get messy.
Barnold: Probably still me... 😔
Narrator: Groups are collections of model nodes that are defined in the yaml in your dbt project. Groups allow you to define the owner of that set of nodes, including their contact information, and more finely control how consumers reference models in that group by setting access levels according to the interfaces of that group.
Barnold: Uh... Hang on, Fizz. What the heck are access levels?
Ms. Fizzle: Barnold, I'm so glad you asked!
Narrator: Model access in dbt allows you to enforce which models are allowed to reference a given model. There are three types of access in dbt: We have "private", where your model can only be referenced by other models in the same group, "protected" where your model can be referenced by other models in the same project -- this is the default -- and then "public", where your model can be referenced by any models regardless of the group or project that they're in. We'll talk more about when that's helpful a bit later.
Ms. Fizzle: We want to add a new group to this project so we can use the dbt-meshify group command to accomplish this. Now, Ralphie, do you want to give it a spin?
Ralphie: Would I ever!
Ralphie: All righty. Let's hop into the terminal again. The Fizz said I'm going to use the dbt-meshify group command, and since this group pertains to me and the folks that I work with, I'm going to call this group sales_analytics. It's important to know who owns what, so I'm going to set the --owner-name to Yours Truly, "Ralphie", and because we're using our node selection syntax I'm going to --select all the models I care about, which includes all of our sales unioned together and their upstream models, all of our returns unioned together and their upstream models, and of course my pride and joy transactions.
Barnold: You love that model.
Ralphie: I do! ❤️ Meshify, do your stuff!
Command runs and logs are generated.
Ralphie: Wow, look at those logs go. That was a lot of action. Let's go see what changed in our files. Let's look at the new _group.yml file. I see that we have a new group defined called sales_analytics and the owner is yours truly -- Ralphie. Now let's look at my pride and joy transactions. I see that transactions is part of the sales analytics group and has a config field that enables contract enforcement. I also see that Meshify automatically created all of the columns and data types for the columns in this model. If I look at my intermediate models, I see they're part of the group too and ... Oh that's interesting! I see the intermediate models and staging models have had their access levels set to "private". Hey Fizz, what's up with all the access level changes on these models?
Ms. Fizzle: dbt-meshify understands how your models are being used, and then sets the access levels accordingly. So, for models in your project that are only going to be referenced by other models in the same group, their access is going to be private. But for models that are referenced by models outside of your group and for leaf models in your project, those are going to be set to protected.
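As a rough sketch of the configuration Ralphie and Ms. Fizzle are describing, a group definition and the matching model properties might look like this. The intermediate model name is hypothetical, the two sections would normally live in separate YAML files, and the contract columns are omitted so the example stays focused on groups and access.

```yaml
version: 2

# Group definition, as it might appear in the group file from the story.
groups:
  - name: sales_analytics
    owner:
      name: Ralphie

models:
  # Leaf model: other models in the project may still reference it.
  - name: transactions
    group: sales_analytics
    access: protected

  # Hypothetical intermediate model, only referenced inside the group.
  - name: int_sales_unioned
    group: sales_analytics
    access: private

# The third level, access: public, additionally allows references
# from other projects.
```

A reference to a private model from outside `sales_analytics` should then fail when dbt parses the project, which is the kind of boundary the two teams were missing.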
Ralphie: Wow this is truly wild. You know, what if Barnold got to have his own group, too? What do you think, Barnold?
Barnold: I'm ready to have my own group, too. Let's do it!
Barnold: I'm going to follow some really similar steps, since you made it look real easy. I'm going to run dbt-meshify group and I'll call my group core_data_models. Easy enough! The owner is me this time, not you, Ralphie, and then rather than selecting the nodes, in this case I know that what's not yours is mine. Generally we share everything, but in this case we actually want to draw a line here. I'll use the exclude flag from dbt-meshify to exclude your group and select everything else into my core_data_models.
Command runs and logs are generated.
Barnold: Let's check our files. The _group.yml file has a new entry for core_data_models, with myself set as the owner. My customers model is also protected now, and doesn't have a new contract since we did that in the first step. But, since you're also using my stores model, and the call center model is the end of my DAG where people definitely query it for their BI use cases, I'm glad to see a contract on both of those. Also, both are set to protected, and my staging models are in the group and set to private. Looks awesome!
Ralphie: I'm really curious about what this has done to our DAG.
Barnold: Let's take a look!
Ralphie: How beautiful! It's a little garden now.
Barnold: I could get used to this.
Narrator: Chapter Three: A Project All My Own
Ralphie: I'm really happy with what we've done so far. By having contracts and groups in place, everything's a lot cleaner. But, I can't help but notice that I still have to squint my eyes in order to really see what's going on here. I wish that things were a little bit cleaner. I wish that my project would parse faster, run faster ... You know, what if we took all the revenue models out and did something else with them 👉👈
Barnold: Hang on. Are you breaking up with my models?
Ralphie: No! ... I mean is that even possible? It doesn't matter. The bottom line is we definitely still have a relationship here. My models are intrinsically related to your models, Barnold, but I'd like things to be a little bit cleaner. To have a little more space.
Ms. Fizzle: There comes a time in every analytics engineer's life where they need a dbt project of their own. When you two are ready to take that leap, you can use the dbt-meshify split command to split your one project into two.
Narrator: Multi-project deployments in dbt allow you to manage interfaces between contributors seamlessly. So no more stepping on each other's toes in a monolithic project. You can reduce manifest size and parse times by reducing the size of each component project, and enable cross-project lineage without having to duplicate nodes via sources.
Ralphie: This sounds exactly like what we need to do with our project in order to make it easier for us to work together! What do you think Barnold?
Barnold: I'm just going to miss you and your models so much.
Ralphie: Hey! We're not breaking up forever. It's important for us to maintain our trust, and for us to come together as a team to make this decision. You know what? As a sign of goodwill, how about you run this command, Barnold. I'll even let you name our new project!
Barnold: You mean it? You want me to name your project?
Ralphie: You bet, Barnold. It's important that we have a good mesh here.
Barnold: Well okay. If you're feeling ready, let's go ahead and do it!
Barnold: Just like The Fizz said, we can use the dbt-meshify split command. How's this sound: big_corp_go_to_market?
Ralphie: I was thinking dolla_dolla_bills, but big_corp_go_to_market will do!
Barnold: We're in agreement there! I'll go ahead and --select your sales_analytics group, and everything upstream, so you have all your sources and everything you need. I'll also make sure to --exclude my core_data_models group and all of its sources so that we have that same line that we saw in our beautiful garden, but split into two projects instead. Meshify, do your stuff!
Command executes, and logs are output.
Barnold: That's an uncanny amount of changes. I'm going to look at my file tree because it might be a little bit easier to absorb what has changed. There's a new folder in my repository there called big_corp_go_to_market, and I see a dbt_project.yml file inside. That must be your new project! And, it has a dependency directly against the old project, mega_corp_big_co_inc.
Ralphie: That's incredible!
Barnold: I see all your models, and -- there it is! Transactions. Your pride and joy. Oh, check this out! Your ref() statement to my model has been transformed into a cross-project reference with the project name first and the model name second.
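To make the dependency and the two-argument reference a little more concrete, here is a hedged sketch of how a downstream project can declare an upstream project in dbt Cloud's multi-project setup. The file name `dependencies.yml` comes from dbt's documentation; whether it matches exactly what dbt-meshify generated in the skit is an assumption.

```yaml
# big_corp_go_to_market/dependencies.yml (assumed location)
projects:
  - name: mega_corp_big_co_inc  # the upstream project that owns customers

# In a model's SQL, a cross-project reference puts the project name first
# and the model name second:
#   {{ ref('mega_corp_big_co_inc', 'customers') }}
```

A reference like this only resolves when the upstream model's access is set to public.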
Ralphie: See we still have our relationships in place!
Barnold: I see your sources, your staging models... everything you need, in a project all your own!
Ralphie: It's all here! What happened to your project Barnold?
Barnold: Well, it looks like dbt-meshify moved some stuff around, but the biggest change that I see is that the customers model just got flipped to "public" access. Fizz, that's what you were saying enables access across projects, right?
Ms. Fizzle: That's exactly right!
Barnold: I'd note that my models still have the contracts from before, so our interface is rather clean. You can feel confident selecting any one of these public models!
Ralphie: I wonder what's happened to our DAG as a result of splitting our single project into two component projects.
Barnold: My word! That's so simple. Even I can handle a two-node DAG.
Ralphie: And, I don't have to get new glasses to see what's going on. This is amazing!
Ms. Fizzle: Consider your dbt-project "meshified" 😎. All right. I think it's time for us to head back home.
The Enchanted Learning Vehicle leaves the computer, flies through the air, and lands on the ground.
Barnold: I'm going to be honest. I was starting to get a little queasy on that bus.
Ralphie: I'm pretty happy to be home.
Slack notification sound
Barnold: Oh no... 😥
Ralphie: Barnold, I think you're going to like this one!
Ralphie (Reading Janet): Hey @channel, check out @barnold and @ralph's latest updates to the company dbt project! The project is much easier to follow now that the GTM models have been partitioned into their own project. Major kudos for adding contracts and versions, too! Let's find more opportunities to adopt this technique across the department.
Barnold: I've never read a kinder message from Janet before. Thank goodness dbt-meshify helped us clean this mess of a project! In record time, we added contracts to our most important models so we can confidently build on them. We learned how to add versions to manage our changes when our expectations need to change. We grouped our models together, and automatically set their access levels to the appropriate levels for downstream use. And most importantly, we broke our monolithic project into two clean component projects. Ms. Fizzle, this meshify tool is out of this world! Have you thought about sharing it with the world?
Ms. Fizzle: Well, it's up to us to spread the word. Let's write a story about our journey, and share it at the Coalesce Conference!
Barnold: In San Diego?!
Ms. Fizzle: Yep!
Ralphie: In 2023?!?
Ms. Fizzle: In 2023. And we did exactly that!
Ms. Fizzle: dbt-meshify is ready to be pip-installed anywhere that you run python. For more documentation check out the dbt-meshify GitHub page. If you want to start your team's adoption of model governance or cross-project lineage features, try out this package. If you run into any bugs, or have ideas for new features, feel free to open up an issue on GitHub and one of us will take a look.
Ms. Fizzle: Remember 🫵
All: Take chances, make mistakes, and get mesh-y!