Data (Documentation) Modelling with AI

Data modelling is often treated as a specialist discipline. It requires SQL proficiency, familiarity with a transformation framework, and an understanding of conventions that are rarely fully documented. As engineering organisations grow, this lack of shared context can limit who is able to contribute to the data layer.

Connecting an AI assistant, such as Claude Code, directly to a data warehouse can expand access, but the real enabler is documentation. Clear, practical documentation is what makes broader contribution possible.


The Problem: Data Modelling as a Bottleneck


In many organisations, the data team is small while the engineering organisation continues to grow. Engineers are expected to deliver end-to-end solutions, yet data modelling still becomes a handoff to a specialist team. Instead of producing the final data models themselves, engineers often stop at the application layer and pass the rest on.

This is rarely a capability issue. Most engineers can write SQL and reason about data. What's missing is project-specific context. A dbt project comes with rules for naming, testing, and composition that are not obvious unless you already know them. Business logic is embedded in SQL and Jinja macros, often in subtle ways.

Without that context, engineers hesitate. Even when they could implement the change, uncertainty about expectations turns progress into a ticket and a wait.


Documentation That Explains Intent


AI assistants can query warehouses, read dbt models, and generate SQL. Without guidance, they tend to produce code that runs but does not fit the project: wrong layers, inconsistent naming, or incorrect assumptions about how fields relate.

Improving this does not require a better model. It requires documentation that explains what things are supposed to mean.

Schema files should be treated as the source of truth for meaning, not just structure. Column descriptions should explain the business logic behind a field rather than restating its name. When intent is written down clearly, both humans and AI can understand a model without digging through the SQL.

Good descriptions let the AI explain and extend models accurately. Poor descriptions force it to guess.
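One way to make the difference between good and poor descriptions concrete is a small check that flags columns whose description is missing or only restates the column name. The sketch below is illustrative rather than a feature of dbt. It assumes the schema file has already been parsed into a dictionary with the usual models, columns, name, and description keys. The model and column names (fct_orders, net_revenue, and so on) are hypothetical.

```python
import re


def normalise(text: str) -> str:
    """Lower-case the text and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def weak_descriptions(schema: dict) -> list[tuple[str, str, str]]:
    """Return (model, column, reason) for descriptions that carry no intent."""
    findings = []
    for model in schema.get("models", []):
        for column in model.get("columns", []):
            name = column["name"]
            description = (column.get("description") or "").strip()
            if not description:
                findings.append((model["name"], name, "missing"))
            elif normalise(description) == normalise(name):
                # "Order ID" for order_id says nothing the name did not.
                findings.append((model["name"], name, "restates name"))
    return findings


# Hypothetical content, shaped like a parsed dbt schema file.
schema = {
    "models": [
        {
            "name": "fct_orders",
            "columns": [
                {"name": "order_id", "description": "Order ID"},
                {
                    "name": "net_revenue",
                    "description": (
                        "Order value after refunds and discounts, excluding tax. "
                        "Used for finance reporting."
                    ),
                },
                {"name": "is_active"},
            ],
        }
    ]
}

for model, column, reason in weak_descriptions(schema):
    print(f"{model}.{column}: {reason}")
```

A check like this only catches the most obvious cases. Whether a description actually explains the business logic still needs a human reviewer.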


Using CLAUDE.md for Project Context


Column-level documentation helps with individual models, but working effectively in a dbt project requires understanding how the whole thing is organised. Project instruction files such as CLAUDE.md can provide that context.

A root CLAUDE.md can describe global conventions: how to run dbt, what each layer is for, naming rules, and testing expectations. This gives engineers a clear picture of how the project is structured before they open a single SQL file.

Additional CLAUDE.md files placed in subdirectories can explain local patterns. For example:
- A file in a directory where every model comes from the same source can explain the intricacies of that source to the AI.
- A directory of exports to third parties might require column names in a specific format. That rule can live in the directory's own file, so the context is provided only when someone is building in that directory.

This structure matters because context is picked up as someone moves through the project. High-level guidance explains the overall shape, while local files explain the rules that apply in a specific area.
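The layering of general and local guidance can be pictured with a short sketch. It is a simplified model of the idea, not a description of how Claude Code itself loads instruction files. It assumes that the guidance relevant to a directory is every CLAUDE.md found on the path from the project root down to that directory, most general first. The function name context_files and the models/exports layout are hypothetical.

```python
import tempfile
from pathlib import Path


def context_files(project_root: Path, target_dir: Path,
                  filename: str = "CLAUDE.md") -> list[Path]:
    """List context files from the project root down to target_dir."""
    root = project_root.resolve()
    # relative_to raises ValueError if target_dir is outside the project.
    parts = target_dir.resolve().relative_to(root).parts
    # Visit the root first, then each directory on the way to the target.
    directories = [root] + [root.joinpath(*parts[:i])
                            for i in range(1, len(parts) + 1)]
    return [d / filename for d in directories if (d / filename).is_file()]


# Build a throwaway project with a root file and one local file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    exports = root / "models" / "exports"
    exports.mkdir(parents=True)
    (root / "CLAUDE.md").write_text("Global conventions\n")
    (exports / "CLAUDE.md").write_text("Column format for third-party exports\n")

    applicable = [p.relative_to(root.resolve()).as_posix()
                  for p in context_files(root, exports)]
    print(applicable)
```

In this model, working in models/exports picks up both the global conventions and the export-specific rule, while working elsewhere in the project picks up only the root file.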


Documentation as a Practical Feedback Cycle


When AI is used directly against the data warehouse, gaps in documentation show up immediately. Missing or unclear explanations lead to vague or incorrect answers. Each mistake points to an assumption that was never written down.

Fixing those gaps improves the next interaction. Over time, more edge cases get documented, conventions become explicit, and less knowledge lives only in someone's head.

Documentation stops being something you write once and forget. It becomes something the team relies on and keeps up to date because the cost of getting it wrong is obvious.

Conclusion


The problem with contributing to a data warehouse isn't skill. It's not knowing the rules of the project.

Engineers can write SQL and understand data models, but without clear conventions and written intent, it's hard to work confidently. Good documentation, combined with well-placed project context files, makes those rules visible.

AI can then guide engineers toward the correct patterns instead of guessing. The rules are written down rather than passed around informally, and contributing to the data layer is no longer limited to a small group of specialists.
