Every field-service company has the same problem. A homeowner writes "AC making a loud grinding noise." The dispatcher books an hour and moves on. The technician shows up and it's a seized compressor — three and a half hours. The next two appointments cascade late. Customer satisfaction drops. Overtime kicks in.
The fix isn't to book more time for every job — that kills utilization. The fix is to know which jobs are unpredictable and hedge accordingly.
Why Point Estimates Are Wrong
"This job will take 90 minutes" answers the wrong question. The right question: what's the full probability distribution? The mean is one number. The variance tells you whether to hedge.
"Thermostat not responding" — almost always a settings issue or dead battery, 30–60 minutes, low variance. "AC grinding noise" — could be a bearing, compressor, or fan motor, 45 minutes to 4+ hours. A regular scheduler treats them identically.
The Model
The system takes unstructured text plus structured metadata and outputs a LogNormal distribution over job duration:

T | x ~ LogNormal(μ(x), σ(x))
Both the mean and the spread are functions of the input — the model learns that some job types are inherently more unpredictable, not just longer. Text is featurized via TF-IDF or sentence-transformer embeddings, supplemented with domain keyword flags (noise symptoms, leak symptoms, system type, urgency modifiers).
For production, NGBoost (Natural Gradient Boosting) learns the LogNormal's μ and σ as boosted functions of the features, giving the same distributional output as the Bayesian model with sub-millisecond inference and a scikit-learn-compatible API:
```python
from ngboost import NGBRegressor
from ngboost.distns import LogNormal

ngb = NGBRegressor(Dist=LogNormal, n_estimators=500)
ngb.fit(X_train, y_train)

dist = ngb.pred_dist(X_new)
mean = dist.mean()    # expected duration
std = dist.std()      # spread / unpredictability
q90 = dist.ppf(0.90)  # 90th percentile
```
Variance-Aware Scheduling
Once you have a distribution per job, the calendar slot is:

slot = E[T] + λ · σ[T]

where λ is the buffer coefficient.
The buffer coefficient isn't a constant — it's dynamically adjusted. Premium SLA customers get more buffer. When the calendar is nearly full, buffers shrink to squeeze in one more job. The last job of the day gets extra padding because overrun means overtime.
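As a sketch, assuming a slot rule of E[T] + λ·σ[T], the adjustments above might look like this (the multipliers and thresholds are illustrative, not tuned values):

```python
def buffer_coefficient(base=1.0, premium_sla=False,
                       calendar_load=0.0, last_job_of_day=False):
    """Dynamic buffer coefficient; all multipliers are hypothetical."""
    lam = base
    if premium_sla:
        lam *= 1.2   # premium customers get more buffer
    if calendar_load >= 0.95:
        lam *= 0.7   # nearly full calendar: shrink buffers
    if last_job_of_day:
        lam *= 1.3   # overrun on the last job means overtime
    return lam

def slot_minutes(mean, std, **kwargs):
    # slot = E[T] + lambda * sigma[T], rounded to whole minutes
    return round(mean + buffer_coefficient(**kwargs) * std)

slot_minutes(45, 10)                      # 55
slot_minutes(90, 35)                      # 125
slot_minutes(90, 35, premium_sla=True)    # 132
slot_minutes(90, 35, calendar_load=0.97)  # 114
```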
What This Looks Like in Practice
| Job | E[T] | σ[T] | λ | Slot |
|---|---|---|---|---|
| Thermostat not responding | 45 min | 10 min | 1.0 | 55 min |
| AC grinding noise | 90 min | 35 min | 1.0 | 125 min |
| AC grinding (premium SLA) | 90 min | 35 min | 1.2 | 132 min |
| AC grinding (calendar 95% full) | 90 min | 35 min | 0.7 | 114 min |
The thermostat gets 10 minutes of buffer; the grinding-noise job gets 35. A point estimate can't make that distinction, and a fixed buffer rule books them the same.
Reclaiming Dead Time
The key insight: buffers are options, not sunk costs. A 125-minute AC job finishes in 80 minutes — you have 45 minutes back. Three options: tighten remaining slots, pull forward the next appointment, or insert a short waitlisted job from the same-day queue. The system picks whichever maximizes revenue per hour.
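A toy version of that choice, assuming each candidate action carries a time cost and an expected revenue (the names and dollar figures are hypothetical):

```python
def pick_action(freed_minutes, candidates):
    """candidates: list of (name, minutes_needed, expected_revenue_usd).
    Returns the feasible action with the highest revenue per hour,
    or falls back to tightening the remaining slots."""
    feasible = [(revenue / (minutes / 60), name)   # revenue per hour
                for name, minutes, revenue in candidates
                if minutes <= freed_minutes]
    return max(feasible)[1] if feasible else "tighten_remaining_slots"

# A 125-minute AC job finishes in 80: 45 minutes reclaimed.
pick_action(45, [
    ("pull_forward_next_appt", 45, 60.0),
    ("insert_waitlist_filter_change", 30, 45.0),
])
```

Here the 30-minute waitlist job wins (45/0.5 h = $90/h beats 60/0.75 h = $80/h) even though it earns less in absolute terms.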
Without distribution-based scheduling, you'd never know you had the slot.
Calibration Over Accuracy
A model that predicts the right mean but wrong variance is worse than useless for scheduling. If you predict an 85th-percentile estimate, roughly 85% of actual durations should fall below it. Also track sharpness — how wide the intervals are. A model that says "0 to 8 hours" for every job is calibrated but useless.
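Both properties are easy to check on held-out jobs. A sketch, assuming you store each job's predicted quantiles alongside its actual duration:

```python
import numpy as np

def calibration_error(predicted_quantiles, actuals):
    """Max |empirical coverage - nominal level| across quantile levels.
    predicted_quantiles: {level: array of per-job quantile predictions}."""
    actuals = np.asarray(actuals)
    return max(
        abs(float(np.mean(actuals <= np.asarray(preds))) - level)
        for level, preds in predicted_quantiles.items()
    )

def sharpness(lo, hi):
    """Mean width of the central prediction interval, in minutes."""
    return float(np.mean(np.asarray(hi) - np.asarray(lo)))
```

A perfectly calibrated model scores 0; the KPI table below targets keeping the max deviation under 3%. Track sharpness alongside it so calibration isn't bought with uselessly wide intervals.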
| KPI | Target | How to measure |
|---|---|---|
| On-time completion | ≥ 85% | actual ≤ scheduled_slot |
| Buffer utilization | ≥ 60% | mean(actual / scheduled_slot) |
| Overtime incidents/week | ≤ 2 | Last job overruns shift end |
| Same-day fill rate | ≥ 30% | Waitlist jobs inserted into reclaimed slots |
| Calibration error | ≤ 3% | Max deviation across quantile levels |
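A minimal sketch of computing the first two KPIs from completed-job records (the field names are assumptions, not a real schema):

```python
def weekly_kpis(jobs):
    """jobs: list of dicts with 'actual_min' and 'slot_min' per completed job."""
    n = len(jobs)
    on_time = sum(j["actual_min"] <= j["slot_min"] for j in jobs) / n
    utilization = sum(j["actual_min"] / j["slot_min"] for j in jobs) / n
    return {"on_time_completion": on_time, "buffer_utilization": utilization}

weekly_kpis([
    {"actual_min": 50, "slot_min": 55},    # finished inside the slot
    {"actual_min": 130, "slot_min": 125},  # overran by 5 minutes
])
```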
The system simultaneously increases throughput — by tightening slots on predictable jobs — and reduces overruns — by expanding slots on unpredictable ones. It's not scheduling better; it's scheduling with the right information.