Every field-service company has the same problem. A homeowner writes "AC making a loud grinding noise." The dispatcher books an hour and moves on. The technician shows up and it's a seized compressor — three and a half hours. The next two appointments cascade late. Customer satisfaction drops. Overtime kicks in.
The fix isn't to book more time for every job — that kills utilization. The fix is to know which jobs are unpredictable and hedge accordingly.
Why Point Estimates Are Wrong
"This job will take 90 minutes" answers the wrong question. The right question: what's the full probability distribution? The mean is one number. The variance tells you whether to hedge.
"Thermostat not responding" — almost always a settings issue or dead battery, 30–60 minutes, low variance. "AC grinding noise" — could be a bearing, compressor, or fan motor, 45 minutes to 4+ hours. A regular scheduler treats them identically.
The Model
The system takes unstructured text plus structured metadata and outputs a LogNormal distribution over job duration:

T | x ~ LogNormal(μ(x), σ(x))
Both the mean and the spread are functions of the input — the model learns that some job types are inherently more unpredictable, not just longer. Text is featurized via TF-IDF or sentence-transformer embeddings, supplemented with domain keyword flags (noise symptoms, leak symptoms, system type, urgency modifiers).
For production, NGBoost (Natural Gradient Boosting) learns the LogNormal's μ and σ as boosted functions of the features, giving the same distributional output as the Bayesian model with sub-millisecond inference and a scikit-learn-compatible API:
```python
from ngboost import NGBRegressor
from ngboost.distns import LogNormal

ngb = NGBRegressor(Dist=LogNormal, n_estimators=500)
ngb.fit(X_train, y_train)

dist = ngb.pred_dist(X_new)
mean = dist.mean()    # expected duration
std = dist.std()      # spread / unpredictability
q90 = dist.ppf(0.90)  # 90th percentile
```
Variance-Aware Scheduling
Once you have a distribution per job, the calendar slot is:

slot = E[T] + λ · σ[T]

where λ is the buffer coefficient.
The buffer coefficient isn't a constant — it's dynamically adjusted. Premium SLA customers get more buffer. When the calendar is nearly full, buffers shrink to squeeze in one more job. The last job of the day gets extra padding because overrun means overtime.
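As a sketch, assuming a slot rule of E[T] + λ·σ[T], the adjustments above might look like this (the multipliers and thresholds are illustrative, not tuned values):

```python
def buffer_coefficient(base=1.0, premium_sla=False,
                       calendar_load=0.0, last_job_of_day=False):
    """Dynamic buffer coefficient; all multipliers are hypothetical."""
    lam = base
    if premium_sla:
        lam *= 1.2   # premium customers get more buffer
    if calendar_load >= 0.95:
        lam *= 0.7   # nearly full calendar: shrink buffers
    if last_job_of_day:
        lam *= 1.3   # overrun on the last job means overtime
    return lam

def slot_minutes(mean, std, **kwargs):
    # slot = E[T] + lambda * sigma[T], rounded to whole minutes
    return round(mean + buffer_coefficient(**kwargs) * std)

slot_minutes(45, 10)                      # 55
slot_minutes(90, 35)                      # 125
slot_minutes(90, 35, premium_sla=True)    # 132
slot_minutes(90, 35, calendar_load=0.97)  # 114
```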
What This Looks Like in Practice
| Job | E[T] | σ[T] | λ | Slot |
|---|---|---|---|---|
| Thermostat not responding | 45 min | 10 min | 1.0 | 55 min |
| AC grinding noise | 90 min | 35 min | 1.0 | 125 min |
| AC grinding (premium SLA) | 90 min | 35 min | 1.2 | 132 min |
| AC grinding (calendar 95% full) | 90 min | 35 min | 0.7 | 114 min |
The thermostat gets 10 minutes of buffer; the grinding-noise job gets 35. A point estimate can't make that distinction, and a fixed buffer rule books them the same.
Reclaiming Dead Time
The key insight: buffers are options, not sunk costs. A 125-minute AC job finishes in 80 minutes — you have 45 minutes back. Three options: tighten remaining slots, pull forward the next appointment, or insert a short waitlisted job from the same-day queue. The system picks whichever maximizes revenue per hour.
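A toy version of that choice, assuming each candidate action carries a time cost and an expected revenue (the names and dollar figures are hypothetical):

```python
def pick_action(freed_minutes, candidates):
    """candidates: list of (name, minutes_needed, expected_revenue_usd).
    Returns the feasible action with the highest revenue per hour,
    or falls back to tightening the remaining slots."""
    feasible = [(revenue / (minutes / 60), name)   # revenue per hour
                for name, minutes, revenue in candidates
                if minutes <= freed_minutes]
    return max(feasible)[1] if feasible else "tighten_remaining_slots"

# A 125-minute AC job finishes in 80: 45 minutes reclaimed.
pick_action(45, [
    ("pull_forward_next_appt", 45, 60.0),
    ("insert_waitlist_filter_change", 30, 45.0),
])
```

Here the 30-minute waitlist job wins (45/0.5 h = $90/h beats 60/0.75 h = $80/h) even though it earns less in absolute terms.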
Without distribution-based scheduling, you'd never know you had the slot.
Calibration Over Accuracy
A model that predicts the right mean but wrong variance is worse than useless for scheduling. If you predict an 85th-percentile estimate, roughly 85% of actual durations should fall below it. Also track sharpness — how wide the intervals are. A model that says "0 to 8 hours" for every job is calibrated but useless.
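Both properties are easy to check on held-out jobs. A sketch, assuming you store each job's predicted quantiles alongside its actual duration:

```python
import numpy as np

def calibration_error(predicted_quantiles, actuals):
    """Max |empirical coverage - nominal level| across quantile levels.
    predicted_quantiles: {level: array of per-job quantile predictions}."""
    actuals = np.asarray(actuals)
    return max(
        abs(float(np.mean(actuals <= np.asarray(preds))) - level)
        for level, preds in predicted_quantiles.items()
    )

def sharpness(lo, hi):
    """Mean width of the central prediction interval, in minutes."""
    return float(np.mean(np.asarray(hi) - np.asarray(lo)))
```

A perfectly calibrated model scores 0; the KPI table below targets keeping the max deviation under 3%. Track sharpness alongside it so calibration isn't bought with uselessly wide intervals.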
| KPI | Target | How to measure |
|---|---|---|
| On-time completion | ≥ 85% | actual ≤ scheduled_slot |
| Buffer utilization | ≥ 60% | mean(actual / scheduled_slot) |
| Overtime incidents/week | ≤ 2 | Last job overruns shift end |
| Same-day fill rate | ≥ 30% | Waitlist jobs inserted into reclaimed slots |
| Calibration error | ≤ 3% | Max deviation across quantile levels |
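A minimal sketch of computing the first two KPIs from completed-job records (the field names are assumptions, not a real schema):

```python
def weekly_kpis(jobs):
    """jobs: list of dicts with 'actual_min' and 'slot_min' per completed job."""
    n = len(jobs)
    on_time = sum(j["actual_min"] <= j["slot_min"] for j in jobs) / n
    utilization = sum(j["actual_min"] / j["slot_min"] for j in jobs) / n
    return {"on_time_completion": on_time, "buffer_utilization": utilization}

weekly_kpis([
    {"actual_min": 50, "slot_min": 55},    # finished inside the slot
    {"actual_min": 130, "slot_min": 125},  # overran by 5 minutes
])
```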
The system simultaneously increases throughput — by tightening slots on predictable jobs — and reduces overruns — by expanding slots on unpredictable ones. It's not scheduling better; it's scheduling with the right information.