Kendrick Horeftis

Most data engineering orgs are stopping at the wrong AI investment

Rolling Copilot out to your engineers is the lowest-leverage AI move on the board. Three other rungs compound much harder, and most DE orgs are leaving them on the table.

Most data engineering orgs adopting AI in 2026 are stopping at the same move. Roll out Copilot, Cursor, or Claude Code to the engineers, declare AI adoption done, move on. That is the lowest-leverage play on the board.

The IDE rung is real. AI in the editor measurably speeds up dbt models, SQL refactors, and test scaffolding. The trap is treating it as the destination when it is actually the floor. Per-engineer leverage scales only with headcount. Platform leverage compounds against everything the platform touches.

The three rungs above the IDE share that property. Each one changes how the platform itself behaves, which is why each one keeps paying out long after the engineer has logged off. They are also the rungs most vendors will not sell you, because they require investment in your own data, your own catalog, and your own incident history rather than a license per seat.

This post lays out the three platform-level AI investments that compound, where each one fits, and how to triage which one your org should attack first.

Rung 1: AI-assisted legacy migration

The single highest-ROI AI investment most DE orgs are not making is also the one finance has been waiting for. Migrating off legacy ETL platforms (SSIS, Teradata, Oracle stored procs, hand-rolled Python DAGs from a decade ago) has historically taken multi-year programs and tens of millions of dollars, almost all of it spent on engineers manually rewriting proprietary business logic that nobody documented at the time it was written.

Frontier models can now translate that logic. They will not produce a clean migration unsupervised, and they will not handle every edge case, but they will produce something close enough that the human engineering effort moves from typing translated code to reviewing, testing, and reconciling it. The pattern that ships reliably is straightforward. AI drafts the modernized version. A senior data engineer owns the input contract, the output contract, and the parity test. Reconciliation between legacy and new runs in dual-write for a defined window. Cutover happens when the parity test sustains.
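To make the parity test and the cutover rule concrete, here is a minimal sketch in Python. It assumes both the legacy and the modernized pipelines can be dumped as lists of row dicts sharing a primary key. The names parity_check, ParityReport, and ready_for_cutover are hypothetical, chosen for illustration, and the "sustained" rule (N consecutive passing runs) is one reasonable reading of the pattern rather than a prescribed one.

```python
from dataclasses import dataclass, field


@dataclass
class ParityReport:
    """Result of comparing one legacy run against one modernized run."""
    missing_in_new: list = field(default_factory=list)
    extra_in_new: list = field(default_factory=list)
    mismatched: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not (self.missing_in_new or self.extra_in_new or self.mismatched)


def parity_check(legacy_rows, new_rows, key, tolerance=0.0):
    """Compare two row sets (lists of dicts) keyed on `key`.

    Only columns present in the legacy rows are compared, since the
    legacy output is treated as the contract. Floats are compared
    within `tolerance`; everything else must be exactly equal.
    """
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}

    report = ParityReport()
    report.missing_in_new = sorted(legacy.keys() - new.keys())
    report.extra_in_new = sorted(new.keys() - legacy.keys())

    for k in sorted(legacy.keys() & new.keys()):
        for col, old_val in legacy[k].items():
            new_val = new[k].get(col)
            if isinstance(old_val, float) and isinstance(new_val, float):
                same = abs(old_val - new_val) <= tolerance
            else:
                same = old_val == new_val
            if not same:
                report.mismatched.append((k, col, old_val, new_val))
    return report


def ready_for_cutover(reports, window):
    """True only if the last `window` dual-write runs all passed parity."""
    recent = reports[-window:]
    return len(recent) == window and all(r.passed for r in recent)
```

In practice the comparison would run inside the warehouse rather than in memory, but the shape is the same: the senior engineer owns the key, the tolerance, and the window, and the AI-drafted code has to earn its cutover against them.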

On a recent healthcare engagement, the heaviest line item on an Oracle-to-BigQuery migration plan was the stored-proc and view translation work. AI took the first pass on the bulk of the PL/SQL. Senior engineers spent their time on parity tests and the regulated edge cases instead of retyping logic that nobody had touched in years. Months of typing collapsed into weeks of review. Numbers obviously vary. The shape of the change is consistent.

This rung does not just save engineering hours. It unlocks budgets that have been frozen for a decade because the original migration estimate was unfundable. A vendor dependency that has been quietly capping the platform's options goes with it. The legacy system finance, security, and audit are all separately uncomfortable with comes off the books.

Where this rung breaks. Heavily regulated business logic with side effects you cannot reproduce in a parity test. Legacy systems with undocumented dependencies into operational tools nobody has touched in five years. The honest answer in those cases is the same as it always was, which is patient archaeology, but the AI can take the first pass at the archaeology too.

Rung 2: AI-enriched active metadata

Catalogs and lineage as static documentation are dead. Static tables of column descriptions that nobody updates. Lineage diagrams that the platform team renders quarterly and then forgets. The thing that has changed in the last year is that catalog and lineage have become the substrate AI agents read to do their work, and the quality of that substrate now caps the quality of every other AI play in the org.

The move is to put AI on the read path as well as the write path. Read path: pipe query history, lineage events, schema drift, and access patterns into a system that summarizes them. Write path: have AI propose column descriptions from observed query patterns, infer entity relationships, classify PII, and write a plain-language summary of the lineage impact for any proposed schema change. Humans approve. The catalog starts updating itself in the rough direction of correctness, which is what static catalogs never managed.
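The "humans approve" step is the part worth pinning down, so here is a minimal sketch of an approval-gated write path. Everything in it is an assumption for illustration: Catalog, Proposal, and Status are hypothetical names, not the API of any real catalog product, and the AI step that generates the proposal is left out entirely. The point is only that a proposal never touches the catalog until a named reviewer accepts it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """An AI-drafted metadata change waiting for human review."""
    table: str
    column: str
    description: str
    is_pii: bool
    status: Status = Status.PENDING


class Catalog:
    """In-memory stand-in for a catalog with a review queue (hypothetical)."""

    def __init__(self):
        self.entries = {}  # (table, column) -> approved metadata
        self.queue = []    # every proposal ever submitted, for audit

    def propose(self, proposal):
        # The write path for the AI ends here: it can only enqueue.
        self.queue.append(proposal)
        return proposal

    def review(self, proposal, approve, reviewer):
        # Only a human review moves a proposal into the catalog.
        if proposal.status is not Status.PENDING:
            raise ValueError("proposal already reviewed")
        proposal.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.entries[(proposal.table, proposal.column)] = {
                "description": proposal.description,
                "is_pii": proposal.is_pii,
                "approved_by": reviewer,
            }
```

Keeping the queue separate from the entries is the design choice that matters. It gives security an audit trail of what the model suggested, and it means the model's write permissions never extend past the queue.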

Skip this rung and the rest of the AI investment carries a tax it never recovers from.

The compounding part is what makes this rung the most under-invested. Every other AI initiative in the org gets better the moment the metadata is grounded. The IDE assistant stops hallucinating column names because it has the actual schema and the actual usage. Rung 1's migration agent gets source-of-truth contracts to test against. By the time Rung 3's incident agent walks lineage in the middle of a page, the lineage is current.

Finance hears this as: we stop paying for the same investigation twice. Security cares because PII classification stays current as the schema evolves. The CTO has been promised AI-readiness for two years, and this is the rung that makes it a measurable property of the platform instead of a slide.

Rung 3: Agentic on-call and incident triage

On-call rotations are the thinnest, most expensive part of a data platform. Every senior engineer who burns out on a midnight pager rotation is a leverage point the org just lost. AI on this rung will not eliminate the rotation. It will change what the human on it actually does.

The pattern that ships today is triage and root-cause drafting, not autonomous remediation. An incident agent reads the failed run logs, walks the lineage to identify probable upstream causes, surfaces the last three deploys that touched any related primitive, drafts the remediation steps, and pre-fills the postmortem template. The on-call engineer goes from a 45-minute archaeology dive to a 5-minute confirm-or-redirect. The hard judgment, the production change, and the customer communication still belong to the human. The 40 minutes of finding the right tab to open is what the agent buys back.
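The deterministic core of that triage step is small, and it is worth seeing how little of it needs a model at all. The sketch below assumes lineage is available as a mapping from each asset to its direct upstream parents, and that deploys are records carrying the set of assets they touched. The function names and record fields are hypothetical. The log reading, remediation drafting, and postmortem pre-fill would sit on top of this and are not shown.

```python
from collections import deque


def upstream_assets(lineage, failed_asset):
    """Breadth-first walk upstream from the failed asset.

    `lineage` maps an asset name to a list of its direct parents.
    Returns ancestors nearest-first, excluding the failed asset itself.
    """
    ordered = []
    visited = {failed_asset}
    queue = deque([failed_asset])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in visited:
                visited.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


def recent_deploys(deploys, assets, limit=3):
    """Most recent deploys that touched any of `assets`, newest first.

    Each deploy is a dict with 'id', 'deployed_at' (any sortable value),
    and 'touched' (an iterable of asset names).
    """
    scope = set(assets)
    related = [d for d in deploys if set(d["touched"]) & scope]
    related.sort(key=lambda d: d["deployed_at"], reverse=True)
    return related[:limit]


def draft_triage(lineage, deploys, failed_asset):
    """Assemble the facts the on-call engineer confirms or redirects."""
    suspects = upstream_assets(lineage, failed_asset)
    scope = [failed_asset] + suspects
    return {
        "failed_asset": failed_asset,
        "probable_upstream": suspects,
        "recent_deploys": [d["id"] for d in recent_deploys(deploys, scope)],
    }
```

The output is only as good as the lineage mapping passed in. A stale graph produces a confident list of the wrong suspects.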

This is where the rung gets honest. Full self-healing is not there for most stacks, and pretending otherwise is how teams ship agents that escalate noise faster than they resolve it. Drift is also real. An incident agent trained on last quarter's incident shapes will quietly degrade as the platform changes. The same active metadata from Rung 2 is what keeps it from rotting.

This rung directly attacks MTTR, on-call attrition, and the audit-prep tax that eats two weeks of senior engineer time every quarter. Pattern discipline sits upstream of it. Once the platform has fewer pattern primitives to maintain, AI triage works dramatically better, because the agent has a small canonical set of shapes to reason over instead of inheriting whatever sprawl the platform already had. The two investments compound on each other.

Choosing the right rung for your org

Sequencing matters more than ambition here. The temptation is to pick the rung that is most fashionable inside the org or in the market that quarter, and the failure mode is consistent. A migration program kicks off before the metadata is grounded enough to write parity tests. An incident agent gets deployed against an alerting surface that fires 200 times a week across 30 Slack channels, and the agent inherits the noise instead of triaging it. Either way the investment underperforms and the next round of AI funding gets harder to defend.

The triage I run with leadership teams looks roughly like this. Heavy legacy weight, a migration plan parked for years, finance asking why the modernization line item never moves: Rung 1 first. Thin catalog, partial lineage, AI initiatives elsewhere stalling because the agents have nothing solid to read: Rung 2 is the prerequisite. When senior on-call is the bottleneck and MTTR is climbing, Rung 3 buys back the most senior bench you have.

Acknowledge the failure modes out loud. Hallucinations on regulated logic show up at Rung 1. Rung 2 has metadata drift and over-permissioned write paths to manage. Rung 3's failure mode is agents that escalate noise faster than they resolve it. Anyone who pitches you frictionless wins on any of these is selling the demo. Production looks different.

The IDE rung still belongs in the mix. It is the floor every engineer ends up using anyway, and it needs the least coordination, which is why it tends to get done first. Coordination is the unlock for the others.

Where this lands

The pattern under all of this is the same pattern under most platform decisions. AI inside a DE org compounds when it changes how the platform behaves. The keystroke-accelerator rung is real, and it is the floor. The compounding lives a few rungs above it.

If you're staring at the AI investment lineup for next year and not sure which rung fits where your platform actually is, I'd be glad to compare notes for an hour. Not a pitch, no deck, no follow-up sales motion.
