Your YAML factory is a DSL. Most teams find out the hard way.
Most pipeline YAML schemas grow into accidental domain-specific languages, with no spec, no design review, and a 2 a.m. on-call engineer debugging a templated `unless` block. The teams that escape design the language on purpose and treat the schema like a public API.
The first version of our pipeline schema had nine fields. Eight months later it had thirty-one, four custom Jinja macros, and an `if` block nobody on the team could fully reason about anymore. What we had built was a domain-specific language (DSL), with no design review and no spec, and we had built it by accident.
That is the moment most data platforms find out their YAML factory is a DSL. Usually around 2 a.m., when the on-call engineer is trying to figure out why a templated unless block is producing malformed configs for three tenants in a row.
The contrarian read on DAG sprawl is that DAG count is downstream of pattern count. Hire two more data engineers into a platform with twelve patterns and you do not get more capacity. You get patterns thirteen and fourteen. A recent engagement collapsed 430 DAGs into 12 patterns and pulled about $40K a month of compute back. The numbers are not the point of this post. They are evidence for a different point. The collapse stuck because the schema was designed as a language. Most data platforms never do that step, and that is why most consolidations sprawl back inside two years.
Pattern count is the metric DAG count is hiding
DAG count is a number leadership can read on a dashboard, which is most of why it gets used. It happens to move in the same direction as pattern count and headcount for a while, so for a year or two you can pretend it is the right metric.
The right metric is pattern count. The unit of work on a data platform is the pattern, the reusable shape that thirty pipelines share. A platform with 12 patterns and 430 instances has the same operational surface as a platform with 12 patterns and 12 instances. The 431st DAG that follows an existing pattern is nearly free. The 13th pattern is what costs you a code path on every primitive, every alert route, and every audit query.
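No orchestrator hands you this number, but a rough version is cheap to compute. The sketch below assumes each pipeline config has already been loaded into a dict and that a pattern can be approximated by the ordered stage types it uses. The field names (`source`, `transform`, `sink`, `type`) and the sample pipelines are hypothetical; a real platform would need its own definition of "same shape".

```python
from collections import Counter

# Hypothetical parsed pipeline configs (what a YAML loader would return).
# Field names are illustrative, not taken from any real schema.
pipelines = [
    {"name": "orders_daily", "source": {"type": "s3"},
     "transform": {"type": "dedupe"}, "sink": {"type": "warehouse"}},
    {"name": "refunds_daily", "source": {"type": "s3"},
     "transform": {"type": "dedupe"}, "sink": {"type": "warehouse"}},
    {"name": "clicks_hourly", "source": {"type": "s3"},
     "transform": {"type": "dedupe"}, "sink": {"type": "warehouse"}},
    {"name": "ledger_sync", "source": {"type": "postgres"},
     "sink": {"type": "warehouse"}},
]


def pattern_fingerprint(config):
    """Reduce a config to its shape: stage types only, ignoring names and parameters."""
    return tuple(
        (stage, config[stage]["type"])
        for stage in ("source", "transform", "sink")
        if stage in config
    )


def pattern_counts(configs):
    """Count how many pipelines share each fingerprint."""
    return Counter(pattern_fingerprint(c) for c in configs)


counts = pattern_counts(pipelines)
print(f"{len(pipelines)} DAGs, {len(counts)} patterns")
```

The fingerprint is deliberately coarse. Tightening it (for example, including which optional fields are set) will raise the pattern count, and that argument is worth having out loud.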
This is why headcount never fixes sprawl. A new engineer landing in a platform with 12 patterns does not produce a 13th instance of pattern 3. They produce pattern 13. Their copy of partition pruning is six percent different from the canonical one because they were not in the room when the canonical version got argued. Six months in, you have 14. Brooks's Law applies one layer up.
What executives miss is that DAG count is the visible part. Pattern count is the part that drives MTTR, audit prep, and on-call attrition. It is also the part nobody is measuring, because no orchestrator ships a pattern-count metric and most teams have not named their patterns out loud.
A YAML factory is a DSL, whether you call it one or not
Once you commit to the pattern layer, the next decision happens whether you make it on purpose or not. Pipelines start landing in YAML. A parser somewhere reads those YAML files and turns them into orchestrator objects. The schema of that YAML, the fields it accepts, the way they compose, the implicit defaults, is now a DSL. You designed it. You may not have noticed.
The Kubernetes ecosystem is the cautionary precedent. YAML alone could not express what teams needed, so a layer of templating and config-generation tools grew up around it: Helm, kustomize, jsonnet, ytt, cue. Each one exists because of the same wall: the moment your config language needs branching, looping, or shared structure, plain YAML stops being enough, and the tool you reach for becomes the actual language your team writes in. By the time most platforms realize this, they own a Helm chart with three layers of templating that nobody can debug at 2 a.m.
Terraform took the other path on purpose. HashiCorp built HCL because they knew YAML would not hold up under config logic, and a full programming language was too much. HCL is a designed DSL with a type system, an interpolation syntax, a versioning policy, and a deprecation process. The reason Terraform stays readable at scale is that the language was a deliberate artifact, not an accumulation.
The accidental version is what most data platforms ship. Pipeline YAML grows fields when somebody needs them. Templating gets bolted on because two pipelines were almost identical except for a tenant filter. By week 18 a condition_when field has appeared. Six months later the schema has thirty-one fields, half the team disagrees on what three of them do, and the on-call engineer is grepping parser source to figure out the order of evaluation. The factory still ships pipelines. The language inside it was never designed, and it shows.
Where the DSL ends and Python begins
The honest version of this is a single line. YAML the spec has no logic. No conditionals, no loops, no expressions. The moment a pipeline needs to branch, somebody is choosing where the branch lives. Three real options.
A templating engine is the first. Jinja, Go templates, something similar. The branch lives in template syntax that renders down to YAML. Cheap to add, easy to abuse, and the failure mode is the one Helm taught everyone. Once your templates can call macros that call other macros, the rendered output stops being something you can mentally execute.
Option two: a custom parser. Your code reads the YAML into a dict and branches based on the values. This is what most YAML factories actually do, and it is the option I have shipped. It works, with a cost most teams underestimate. Every branch in the parser is a feature in your DSL. Every feature has a contract, a versioning story, and a deprecation cost.
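A minimal sketch of what that looks like in practice, assuming the YAML has already been loaded into a dict. The field names (`source`, `dedupe`, `load_mode`) and the step strings are made up for illustration; the point is that each `if` is something pipeline authors will come to depend on.

```python
def compile_pipeline(config):
    """Turn a parsed YAML dict into a list of step names.

    The step names stand in for real orchestrator tasks.
    """
    steps = [f"extract:{config['source']}"]

    # Branch 1: an optional flag changes the task graph. This is now a DSL feature.
    if config.get("dedupe", False):
        steps.append("dedupe")

    # Branch 2: a value-dependent branch. The implicit default is part of the contract.
    mode = config.get("load_mode", "append")
    if mode == "append":
        steps.append("load:append")
    elif mode == "overwrite":
        steps.extend(["truncate", "load:append"])
    else:
        # Rejecting unknown values keeps the language closed instead of silently permissive.
        raise ValueError(f"unknown load_mode: {mode!r}")

    return steps


print(compile_pipeline({"source": "orders"}))
print(compile_pipeline({"source": "orders", "dedupe": True, "load_mode": "overwrite"}))
```

Two fields, three behaviors, one implicit default. Changing the default from `append` to anything else is a breaking change for every pipeline that never set the field.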
The schema is a public API to the engineers writing pipelines. You treat it that way or you do not.
Or you stay declarative in YAML and route the hard pipelines through an escape hatch. A pipeline that genuinely needs branching gets written in Python against the same primitive APIs the YAML compiles to. The factory does not try to absorb every shape. It owns the 80 percent that is configuration and concedes the 20 percent that is real logic.
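One way to picture the escape hatch, as a sketch under stated assumptions: the primitives below (`extract`, `validate`, `load`) are hypothetical stand-ins that return plain dicts, and the rule and table names are invented. What matters is that the declarative path and the hand-written path call the same functions.

```python
# Hypothetical primitives owned by the platform. Both paths below call these.
def extract(table):
    return {"op": "extract", "table": table}


def validate(task, rule):
    return {"op": "validate", "rule": rule, "upstream": task["op"]}


def load(task, target):
    return {"op": "load", "target": target, "upstream": task["op"]}


def compile_declarative(config):
    """Factory path: a fixed shape, driven only by configuration values."""
    task = extract(config["table"])
    task = validate(task, config["rule"])
    return load(task, config["target"])


def regional_events_pipeline(region):
    """Escape hatch: real branching, written in Python against the same primitives."""
    task = extract("events")
    if region == "eu":
        # Hypothetical rule name, for illustration only.
        task = validate(task, "gdpr_redaction")
    return load(task, f"events_{region}")


print(compile_declarative({"table": "orders", "rule": "not_null", "target": "dw.orders"}))
print(regional_events_pipeline("eu"))
print(regional_events_pipeline("us"))
```

The branch on `region` lives in Python, where it can be read, tested, and stepped through. The schema never has to grow a conditional to support it.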
Where this fails is when teams pick the templating option, do not realize they have, and never write down the schema as a contract. The DSL grows by accident, the deprecation story does not exist, and the version 2 schema arrives as a forced migration nobody asked for.
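Writing the contract down can start small. The sketch below assumes a `schema_version` field on every pipeline config and a table of deprecated fields; both are conventions invented here, and `run_if` is a hypothetical replacement for the `condition_when` field mentioned earlier.

```python
import warnings

SUPPORTED_VERSIONS = {1, 2}

# Hypothetical deprecation table: old field -> (replacement, version where it is removed).
DEPRECATED_FIELDS = {"condition_when": ("run_if", 2)}


def check_contract(config):
    """Validate the schema version and surface deprecations before compiling."""
    version = config.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")

    for old, (new, removed_in) in DEPRECATED_FIELDS.items():
        if old not in config:
            continue
        if version >= removed_in:
            raise ValueError(f"{old!r} was removed in v{removed_in}; use {new!r}")
        # Still accepted on older versions, but the author gets told.
        warnings.warn(
            f"{old!r} is deprecated; use {new!r}",
            DeprecationWarning,
            stacklevel=2,
        )

    return version
```

A v1 pipeline that still uses `condition_when` keeps working and emits a warning. The same field under `schema_version: 2` fails loudly, so moving to v2 is something a pipeline author opts into instead of something that happens to them.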
Where the consolidation actually breaks
Pattern consolidation is sold as inevitable once you start. It is not. Two failure modes show up consistently.
One is a domain with more irreducible variance than the platform team accounted for. You promised 12 patterns. The honest answer for the business you are in is 18, because three lines of business have genuinely different validation contracts and a fourth has a regulatory requirement nobody can compress. Forcing those into the existing 12 produces a primitive with so many optional fields that it becomes a different language for each caller, which is the failure you were trying to avoid. The fix is to take the six extras seriously and design them as primitives, not pretend the 12 number was sacred.
The other failure is over-consolidation. You hit 12 and stopped, when two of those 12 are doing the work of what should have been four. A transform primitive that quietly handles eight wildly different shapes of join logic stops being a primitive. It becomes a god object that grew a YAML interface. The MTTR on that primitive will tell you, eventually, by being three times the MTTR of every other one.
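You do not have to wait for the rotation to notice. A rough check, assuming you can export MTTR per primitive from your incident tooling: flag any primitive whose MTTR is at least three times the median of the others. The figures below are made up, and comparing against the median is my choice for the sketch, not an established rule.

```python
from statistics import median

# Made-up MTTR figures in minutes per primitive, for illustration only.
mttr_minutes = {"extract": 30, "validate": 35, "transform": 105, "load": 28}


def god_object_candidates(mttr, ratio=3.0):
    """Flag primitives whose MTTR is at least `ratio` times the median of the others."""
    flagged = []
    for name, value in mttr.items():
        others = [v for k, v in mttr.items() if k != name]
        if others and value >= ratio * median(others):
            flagged.append(name)
    return flagged


print(god_object_candidates(mttr_minutes))
```

A flagged primitive is a candidate for a split, not a verdict. The next step is the schema review: look at how many distinct shapes its callers are actually asking it to handle.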
Both failures share the same root cause. Nobody runs the schema review. The DSL evolves under load, with the on-call rotation as the implicit feedback mechanism. That is a slow signal. By the time the rotation starts complaining, the consolidation has already drifted, and the cost of pulling it back in is higher than the cost of designing it deliberately would have been.
Where this lands
The pattern most teams have today is a config file that grew teeth. The teams that escape it are the ones that wrote down the language, gave it a version number, and treated the schema like a public API the platform owes its users.
If your platform is past 200 DAGs and you can feel the YAML schema starting to grow conditionals, I do a half-day audit of the factory, the schema, and the escape hatch. No pitch attached.