Still Swinging Shovels
What Frederick Taylor can teach us about the future of data engineering
The Speed of Thought, The Weight of History
We have engines that move data at the speed of thought.
DuckDB, Arrow, Polars, Ray, dbt - technical miracles hiding in plain sight.
And yet, we’re still duct-taping pipelines, debugging tribal wisdom, hoping the batch run finishes before the weekend.
We’re still in the 1800s of data work.
When Frederick Winslow Taylor started timing shovel loads at Bethlehem Steel in the late 1800s, he wasn’t trying to start a movement. He just wanted to know why ten men with the same tools produced wildly different results.
The answer wasn’t the shovel. It was the process.
Most workers were operating on instinct - a rough approximation of what worked, passed down from whoever trained them last. Taylor called it “rule of thumb.” Then he replaced it with science. Measure, test, standardize, repeat. And suddenly, the same shovel moved twice as much steel.
Fast forward a hundred and twenty years.
We have tools that would’ve looked like magic to Taylor.
But watch closely, and the habits haven’t changed.
We’re still moving data by instinct.
Still relying on folklore.
Still assembling brittle flows from mismatched parts.
We’re in the 1800s again.
The Machines Have Arrived
DuckDB runs warehouse-grade queries straight from your laptop.
Arrow moves data across systems without flinching - zero copy, no translation.
Polars chews through millions of rows with single-machine grace, using every core it can find.
Ray scales Python like it’s local, even when it’s not.
dbt brings version control and lineage to the heart of SQL.
These aren’t just tools. They’re turbines.
And yet, we still run them like it’s a backyard forge.
Data Work by Rule of Thumb
We tune queries to the millisecond.
Ask how long a pipeline should take, and you’ll get a shrug.
There’s no baseline for data work. No common unit for measuring what it takes to do any of these:
- Shipping a pipeline
- Training a model
- Validating a table
- Debugging a failure
- Onboarding an engineer
Every team reinvents the wheel.
Every architecture diagram is a one-off.
Every definition of “done” is personal.
We obsess over tools, not workflows.
We optimize queries, not process.
We’ve industrialized the machines, but not the motion.
It’s like handing CNC machines to blacksmiths.
You’ll get sparks. But not precision. Not scale. Not yet.
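A baseline does not need much machinery to get started. The sketch below is one possible stopwatch for pipeline steps, not a standard: the names `timed`, `baseline`, `is_outlier`, and the in-memory `history` store are all hypothetical, and the median and the 2x threshold are assumptions chosen for illustration.

```python
import statistics
import time
from contextlib import contextmanager

# Hypothetical in-memory store: step name -> observed durations in seconds.
# A real team would persist this somewhere shared.
history: dict[str, list[float]] = {}


@contextmanager
def timed(step: str):
    """Record how long the wrapped block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        history.setdefault(step, []).append(time.perf_counter() - start)


def baseline(step: str) -> float:
    """Median duration of every recorded run of a step."""
    return statistics.median(history[step])


def is_outlier(step: str, duration: float, factor: float = 2.0) -> bool:
    """Flag a run that took more than `factor` times the baseline (assumed threshold)."""
    return duration > factor * baseline(step)


if __name__ == "__main__":
    for _ in range(3):
        with timed("validate_table"):
            sum(range(10_000))  # stand-in for real work
    print(f"validate_table baseline: {baseline('validate_table'):.6f}s")
```

With even this much in place, "how long should this take?" has an answer that is not a shrug, and a run that drifts far from it becomes visible.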
Writing as a Factory Floor
Writing is my factory floor.
Every article begins with a question that won’t leave me alone - why does this feel brittle, and that feel right?
Some days, I build. Some days, I observe.
But always, I’m studying the flow.
It’s where I inspect the system.
It’s where I debug the factory.
Taylor Had Stopwatches. We Have Git
Taylor had stopwatches.
We have version control.
Commits trace our thoughts.
Issues reveal where logic breaks down.
Lineage shows where ideas spread or stall.
The evidence is there. But only if we’re curious enough to read it.
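Reading that evidence can start small. As a rough sketch, the functions below parse the text produced by `git log --name-only --pretty=format:@%at` (author timestamps prefixed with `@`, followed by the paths each commit touched). The function names and the `@` marker are my own choices for illustration, and the parser assumes no file path begins with `@`.

```python
from collections import Counter


def churn_by_file(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Lines starting with "@" mark a new commit; other non-blank lines are paths.
    """
    counts: Counter = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("@"):
            continue
        counts[line] += 1
    return counts


def commit_gaps(log_text: str) -> list[int]:
    """Seconds between consecutive commits, given git's newest-first ordering."""
    stamps = [
        int(line.strip()[1:])
        for line in log_text.splitlines()
        if line.strip().startswith("@")
    ]
    return [newer - older for newer, older in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # Made-up sample in the expected format; in practice, capture the real
    # command output, for example with subprocess.run(..., capture_output=True).
    sample = """@1700000300
models/orders.sql
models/customers.sql

@1700000200
models/orders.sql

@1700000000
models/orders.sql
README.md
"""
    print(churn_by_file(sample).most_common(1))
    print(commit_gaps(sample))
```

Files that keep getting touched are candidates for where logic breaks down; long gaps hint at where work stalled. Neither number proves anything on its own, but both are measurements rather than folklore.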
From Workshop to Factory
The next leap in data engineering won’t come from faster engines or smarter compilers. It will come from better process:
- Shared metrics for pipeline complexity
- Baselines for change impact across systems
- Libraries that capture not just what to do, but how to think
- Mental models for common operations
- Metrics that go beyond execution time
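To make the first two items concrete, here is one hedged sketch of what a shared complexity metric and a change-impact baseline could look like. It treats a pipeline as a plain dictionary mapping each node to its direct downstream nodes. The metric set (nodes, edges, depth, max fan-in) and all names, including the example tables, are assumptions for illustration, not an established standard, and the graph is assumed to be acyclic.

```python
from collections import Counter


def downstream(dag: dict[str, list[str]], node: str) -> set[str]:
    """Every node reachable from `node`: what a change there could affect."""
    seen: set[str] = set()
    stack = list(dag.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, []))
    return seen


def depth(dag: dict[str, list[str]], node: str) -> int:
    """Longest path starting at `node`, counted in edges (assumes no cycles)."""
    children = dag.get(node, [])
    return 0 if not children else 1 + max(depth(dag, c) for c in children)


def complexity(dag: dict[str, list[str]]) -> dict[str, int]:
    """A small, comparable summary of a pipeline's shape."""
    nodes = set(dag) | {c for children in dag.values() for c in children}
    fan_in = Counter(c for children in dag.values() for c in children)
    return {
        "nodes": len(nodes),
        "edges": sum(len(children) for children in dag.values()),
        "depth": max((depth(dag, n) for n in nodes), default=0),
        "max_fan_in": max(fan_in.values(), default=0),
    }


if __name__ == "__main__":
    # Hypothetical pipeline: node -> direct downstream nodes.
    pipeline = {
        "raw_orders": ["stg_orders"],
        "raw_customers": ["stg_customers"],
        "stg_orders": ["orders_enriched"],
        "stg_customers": ["orders_enriched"],
        "orders_enriched": ["revenue_report"],
    }
    print(complexity(pipeline))
    print(sorted(downstream(pipeline, "stg_orders")))
```

The particular numbers matter less than the fact that two teams could compute them the same way and compare.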
But this isn’t just about standardization. The industrial revolution didn’t stop at parts; it changed how we think about work.
Data engineering needs the same:
- Educational models that marry theory and practice
- Certifications that actually mean something
- Standards for quality, not just compliance
- Better ways to share what we’ve learned
- Tools that enforce wisdom without killing creativity
The same way Taylor turned shoveling into a science, we need to make data work observable, measurable, and teachable.
Until then, we’re still guessing.