DevZen

How It Works — Burnout Risk Detection

A detailed look at the science, algorithm, and design decisions behind DevZen's burnout risk score.


1. Why Burnout Matters for Developers

Burnout is not just "feeling tired." The World Health Organization includes burn-out in ICD-11 (2019) as an occupational phenomenon: a syndrome resulting from chronic workplace stress, characterised by three dimensions: emotional exhaustion, depersonalisation (mental distance from or cynicism toward work), and reduced professional efficacy [1]. Software developers face a disproportionately high risk.

The challenge is that burnout is gradual. People rarely notice the slide from "a tough week" to "I can't look at another pull request." DevZen tackles this by computing a daily, repeatable risk signal from data the user already tracks (mood, stress, sleep, hydration, and work hours), turning subjective feelings into an actionable early warning.


2. Research Foundations

Our model draws on three bodies of established research:

2.1 Maslach Burnout Inventory (MBI)

The gold-standard burnout assessment since 1981, the MBI measures emotional exhaustion, depersonalisation, and personal accomplishment via a 22-item questionnaire [5]. While clinically validated, a 22-item survey is too heavy for daily tracking. DevZen operationalises the MBI's core dimensions through proxy signals:

| MBI Dimension | DevZen Proxy Signal |
|---|---|
| Emotional Exhaustion | Low mood trend + high stress |
| Depersonalisation | Declining engagement (mood + habit completion) |
| Reduced Efficacy | Elevated work hours without corresponding recovery |

2.2 Job Demands-Resources (JD-R) Model

Bakker & Demerouti's JD-R model (2007) frames burnout as the result of an imbalance between demands (workload, time pressure) and resources (recovery, social support, autonomy) [6]. DevZen applies the same framing: when demands consistently exceed resources, burnout risk rises.

2.3 Allostatic Load Theory

McEwen's concept of allostatic load (1998) describes the cumulative physiological cost of chronic stress [7]. Sleep deprivation, dehydration, and sustained high cortisol (which DevZen approximates with user-reported stress) all increase allostatic load. Our algorithm captures this through multiple biological and behavioural channels rather than relying on any single metric.

2.4 Developer-Specific Research

Research on developers' working conditions informed our decision to weight recovery signals (sleep, hydration) alongside psychological signals (mood, stress).


3. The Algorithm

3.1 Overview

DevZen computes a daily Burnout Risk Score (0–100) using a weighted linear model. The score is deterministic (the same inputs always produce the same output) and is computed entirely on-device, so it works offline and no data leaves the phone.

Burnout Score = Σ (weight_i × normalised_component_i) × 100

Higher score = higher burnout risk.

3.2 Input Signals

The algorithm consumes a 7-day rolling window of user-logged data:

| Signal | Source | Raw Range | Aggregation |
|---|---|---|---|
| Mood | Daily mood logs | 1–5 (awful → great) | Mean over last 7 days |
| Stress | Daily stress logs | 1–10 (calm → extreme) | Mean over last 7 days |
| Sleep | Apple Health / manual | Hours per night | Mean over last 7 days |
| Work Hours | User profile (onboarding) | 4–16 hours/day | Static value |
| Hydration | Water log entries | Glasses per day | Mean over days with logs |

Why 7 days? A 7-day window captures short-term trends (a bad week) without being diluted by older data. It also aligns with standard epidemiological practice for self-reported wellbeing measures [11]. The score is recomputed on each dashboard load, so it always reflects the latest logs.
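The aggregation column above can be illustrated with a minimal sketch. The `LogEntry` shape, the function names, and the choice to bucket water entries by UTC day are assumptions made for illustration; the app's actual aggregation code is not shown in this document.

```typescript
// Hypothetical log shape (not taken from the DevZen schema).
interface LogEntry {
  loggedAt: number; // epoch milliseconds
  value: number;    // e.g. mood 1-5, stress 1-10, or glasses of water
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean of every entry logged within the last `windowDays`; null if nothing was logged.
function rollingMean(entries: LogEntry[], now: number, windowDays = 7): number | null {
  const cutoff = now - windowDays * DAY_MS;
  const recent = entries.filter((e) => e.loggedAt > cutoff && e.loggedAt <= now);
  if (recent.length === 0) return null;
  const total = recent.reduce((sum, e) => sum + e.value, 0);
  return total / recent.length;
}

// "Mean over days with logs": sum entries per day, then average only the days
// that have at least one entry. Days are bucketed by UTC day (an assumption).
function meanPerLoggedDay(entries: LogEntry[], now: number, windowDays = 7): number | null {
  const cutoff = now - windowDays * DAY_MS;
  const totalsByDay: Record<string, number> = {};
  for (const e of entries) {
    if (e.loggedAt <= cutoff || e.loggedAt > now) continue;
    const dayKey = String(Math.floor(e.loggedAt / DAY_MS));
    totalsByDay[dayKey] = (totalsByDay[dayKey] || 0) + e.value;
  }
  const days = Object.keys(totalsByDay);
  if (days.length === 0) return null;
  const total = days.reduce((sum, k) => sum + (totalsByDay[k] || 0), 0);
  return total / days.length;
}
```

Returning null rather than 0 for an empty window keeps "no data" distinguishable from "bad data", which matters for the missing-data handling in section 6.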

3.3 Normalisation

Each signal is normalised to a 0–1 scale where 1 = worst case (maximum burnout contribution) and 0 = best case (no burnout contribution):

| Component | Formula | Worst Case (→ 1) | Best Case (→ 0) |
|---|---|---|---|
| Mood | (5 - moodAvg) / 4 | Mood = 1 (awful) | Mood = 5 (great) |
| Stress | (stressAvg - 1) / 9 | Stress = 10 (extreme) | Stress = 1 (calm) |
| Work | (workHours - 4) / 12 | 16 hours/day | 4 hours/day |
| Sleep | 1 - (sleepAvg - 5) / 4 | ≤ 5 hours/night | ≥ 9 hours/night |
| Hydration | 1 - (waterAvg / waterGoal) | 0 glasses | Meeting/exceeding goal |

All values are clamped to [0, 1] after computation.
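As a sketch, the five formulas and the clamp could be written as small pure functions. The function names are illustrative, and the guard for a non-positive `waterGoal` is an assumption that the document does not specify.

```typescript
// Clamp any value into the [0, 1] range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// 1 = worst case, 0 = best case, matching the normalisation table.
function normaliseMood(moodAvg: number): number {
  return clamp01((5 - moodAvg) / 4);
}

function normaliseStress(stressAvg: number): number {
  return clamp01((stressAvg - 1) / 9);
}

function normaliseWork(workHours: number): number {
  return clamp01((workHours - 4) / 12);
}

function normaliseSleep(sleepAvg: number): number {
  return clamp01(1 - (sleepAvg - 5) / 4);
}

function normaliseHydration(waterAvg: number, waterGoal: number): number {
  // Assumption: with no positive goal there is nothing to fall short of.
  if (waterGoal <= 0) return 0;
  return clamp01(1 - waterAvg / waterGoal);
}
```

The clamp is what turns "≤ 5 hours" and "≥ 9 hours" of sleep into exactly 1 and 0: a 4-hour average would otherwise normalise to 1.25.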

3.4 Weights

┌───────────────────────────────────────────────────────┐
│                 Weight Distribution                    │
│                                                       │
│   ███████████████  Mood          30 %                 │
│   ███████████████  Stress        30 %                 │
│   ██████████       Work Hours    20 %                 │
│   █████            Sleep         10 %                 │
│   █████            Hydration     10 %                 │
│                                                       │
│   Psychological ─────── 60 %                          │
│   Behavioural / Physical 40 %                         │
└───────────────────────────────────────────────────────┘

Rationale for the 30-30-20-10-10 distribution:

3.5 Final Score Computation

score = (mood_weight × mood_norm
       + stress_weight × stress_norm
       + work_weight × work_norm
       + sleep_weight × sleep_norm
       + hydration_weight × hydration_norm) × 100

Result is rounded to the nearest integer and clamped to [0, 100].
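A minimal sketch of this step, assuming the components have already been normalised to [0, 1]. The names `NormalisedComponents`, `WEIGHTS`, and `weightedScore` are illustrative and are not taken from src/lib/burnout.ts.

```typescript
interface NormalisedComponents {
  mood: number;
  stress: number;
  work: number;
  sleep: number;
  hydration: number;
}

// The 30-30-20-10-10 distribution from section 3.4.
const WEIGHTS: NormalisedComponents = {
  mood: 0.30,
  stress: 0.30,
  work: 0.20,
  sleep: 0.10,
  hydration: 0.10,
};

// Weighted sum of normalised components, scaled to 0-100,
// rounded to the nearest integer and clamped.
function weightedScore(c: NormalisedComponents): number {
  const weightedSum =
    WEIGHTS.mood * c.mood +
    WEIGHTS.stress * c.stress +
    WEIGHTS.work * c.work +
    WEIGHTS.sleep * c.sleep +
    WEIGHTS.hydration * c.hydration;
  const rounded = Math.round(weightedSum * 100);
  return Math.min(100, Math.max(0, rounded));
}
```

Because the weights sum to 1 and every component lies in [0, 1], the final clamp only acts as a safety net against out-of-range inputs.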

3.6 Worked Examples

Example A — Healthy Developer

| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 4.2 / 5 | (5 - 4.2) / 4 = 0.20 |
| Stress avg | 3.0 / 10 | (3.0 - 1) / 9 = 0.22 |
| Work hours | 8 h | (8 - 4) / 12 = 0.33 |
| Sleep avg | 7.5 h | 1 - (7.5 - 5) / 4 = 0.375 |
| Hydration avg | 7 / 8 goal | 1 - 7/8 = 0.125 |

Score = (0.30 × 0.20 + 0.30 × 0.22 + 0.20 × 0.33 + 0.10 × 0.375 + 0.10 × 0.125) × 100
      = (0.060 + 0.066 + 0.066 + 0.0375 + 0.0125) × 100
      = 0.242 × 100
      = 24.2, rounded to 24   → Low Risk ✅

Example B — Developer Approaching Burnout

| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 2.1 / 5 | (5 - 2.1) / 4 = 0.725 |
| Stress avg | 7.5 / 10 | (7.5 - 1) / 9 = 0.722 |
| Work hours | 12 h | (12 - 4) / 12 = 0.667 |
| Sleep avg | 5.5 h | 1 - (5.5 - 5) / 4 = 0.875 |
| Hydration avg | 3 / 8 goal | 1 - 3/8 = 0.625 |

Score = (0.30 × 0.725 + 0.30 × 0.722 + 0.20 × 0.667 + 0.10 × 0.875 + 0.10 × 0.625) × 100
      = (0.2175 + 0.2166 + 0.1334 + 0.0875 + 0.0625) × 100
      = 0.7175 × 100
      = 71.75, rounded to 72   → High Risk 🔴

Primary driver: Mood (highest weighted contribution at 0.2175).


4. Risk Levels

The continuous 0–100 score is mapped to three risk levels for user-facing communication:

| Score Range | Risk Level | Colour | User Message |
|---|---|---|---|
| 0 – 39 | Low | Teal (#18B89A) | "Great work — keep up your healthy routines." |
| 40 – 69 | Moderate | Amber (#E8940A) | "Watch your stress and sleep this week." |
| 70 – 100 | High | Red (#E84848) | "Take a break. Your burnout risk is elevated." |

Why three buckets, not five or a continuous gradient?

Risk-communication research suggests that people act more readily on categorical risk labels than on precise numbers (Weinstein, 1999) [15]. Three levels are actionable: "you're fine", "pay attention", or "intervene now." More granularity creates decision paralysis; fewer levels lose the "watch out" middle ground.
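The mapping from score to level is a simple threshold lookup. The sketch below uses the cutoffs and colours from the table above; the `RiskBand` type and the `riskBand` function name are hypothetical.

```typescript
type RiskLevel = "Low" | "Moderate" | "High";

interface RiskBand {
  level: RiskLevel;
  colour: string; // hex colour from the risk-level table
}

// Map an integer score in 0-100 to its risk band.
function riskBand(score: number): RiskBand {
  if (score >= 70) return { level: "High", colour: "#E84848" };
  if (score >= 40) return { level: "Moderate", colour: "#E8940A" };
  return { level: "Low", colour: "#18B89A" };
}
```

Checking the highest threshold first means each score matches exactly one band, and the boundary values 40 and 70 fall into the higher band, as in the table.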


5. Primary Driver Identification

Beyond the aggregate score, the algorithm identifies the primary driver — the single component contributing the most weighted risk. This is the component with the highest value of (weight_i × normalised_component_i).

This powers the actionable insight: "Your burnout risk is Moderate. Main driver: stress. Consider taking a 10-minute walk between meetings."

When all inputs are null (new user, no data yet), primaryDriver is null and the UI shows a prompt to start logging.
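One way to sketch the driver selection is below. Two details are assumptions rather than documented behaviour: components that are null are skipped when ranking, and ties go to the first component in a fixed order. The names are illustrative.

```typescript
type Component = "mood" | "stress" | "work" | "sleep" | "hydration";

const COMPONENTS: Component[] = ["mood", "stress", "work", "sleep", "hydration"];

const DRIVER_WEIGHTS: Record<Component, number> = {
  mood: 0.30,
  stress: 0.30,
  work: 0.20,
  sleep: 0.10,
  hydration: 0.10,
};

// Return the component with the largest weight × normalised value,
// or null when every component is missing.
function primaryDriver(normalised: Record<Component, number | null>): Component | null {
  let best: Component | null = null;
  let bestContribution = -Infinity;
  for (const key of COMPONENTS) {
    const value = normalised[key];
    if (value === null) continue; // assumption: missing inputs cannot be the driver
    const contribution = DRIVER_WEIGHTS[key] * value;
    if (contribution > bestContribution) {
      best = key;
      bestContribution = contribution;
    }
  }
  return best;
}
```

Ranking by weighted contribution rather than raw normalised value is why Example B reports mood (0.2175) and not sleep, even though sleep has the highest normalised value (0.875).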


6. Handling Missing Data

Real users don't log perfectly. The algorithm gracefully handles null (missing) inputs:

| Missing Signal | Default Normalised Value | Rationale |
|---|---|---|
| Mood | 0.4 | Slightly pessimistic; nudges user to log |
| Stress | 0.3 | Conservative; assumes some baseline stress |
| Sleep | 0.3 | Conservative; assumes some sleep deficit |
| Hydration | 0.3 | Conservative; assumes mild underhydration |

Design principle: "absent data is not good data." We deliberately avoid defaulting to zero (which would imply "everything is fine") because a user who stops logging may be disengaging — itself a burnout signal. The conservative defaults produce a mildly elevated score that encourages the user to resume logging without triggering false alarms.
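The substitution of defaults can be sketched as follows, using the values from the table above. The type and function names are hypothetical.

```typescript
type LoggedSignal = "mood" | "stress" | "sleep" | "hydration";

// Default normalised values for signals the user has not logged.
const MISSING_DEFAULTS: Record<LoggedSignal, number> = {
  mood: 0.4,
  stress: 0.3,
  sleep: 0.3,
  hydration: 0.3,
};

function orDefault(value: number | null, fallback: number): number {
  // Only null counts as missing; a genuine 0 (best case) is kept.
  return value === null ? fallback : value;
}

// Replace every missing normalised signal with its conservative default.
function fillMissing(normalised: Record<LoggedSignal, number | null>): Record<LoggedSignal, number> {
  return {
    mood: orDefault(normalised.mood, MISSING_DEFAULTS.mood),
    stress: orDefault(normalised.stress, MISSING_DEFAULTS.stress),
    sleep: orDefault(normalised.sleep, MISSING_DEFAULTS.sleep),
    hydration: orDefault(normalised.hydration, MISSING_DEFAULTS.hydration),
  };
}
```

With these defaults and the weights from section 3.4, a user with no logs and an 8-hour workday would score roughly 34, which stays inside the Low band.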


7. Data Pipeline

┌──────────────┐    ┌──────────────┐    ┌───────────────┐    ┌───────────────┐
│  User Logs   │    │  7-Day Agg   │    │  Algorithm    │    │  Dashboard    │
│  (mood,      │───▶│  (averages   │───▶│  (normalise,  │───▶│  (score,      │
│  stress,     │    │  per signal) │    │  weight, sum) │    │  risk label,  │
│  water)      │    │              │    │               │    │  driver)      │
└──────────────┘    └──────────────┘    └───────────────┘    └───────────────┘
       │                                       │                     │
       ▼                                       ▼                     ▼
  WatermelonDB                          Pure function           UI Card +
  (mood_logs,                           (src/lib/               Wellbeing
   stress_logs,                          burnout.ts)             Ring
   water_logs)                                │
                                              ▼
                                     burnout_scores table
                                     (daily snapshot for
                                      trend analysis)
  1. Collection: User logs mood (1–5), stress (1–10), and water (glasses) throughout the day.
  2. Aggregation: The useTodayData() hook queries WatermelonDB for the last 7 days and computes means.
  3. Computation: calculateBurnoutScore() — a pure function — normalises, weights, and sums.
  4. Persistence: Result is stored in burnout_scores table with all five normalised components, enabling historical trend charts.
  5. Display: The Today dashboard renders a colour-coded card with score, risk label, contextual message, and a wellbeing progress ring (inverted score).
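Steps 4 and 5 can be pictured with a small sketch. The `BurnoutSnapshot` field names are hypothetical (the real burnout_scores columns are not listed here), and reading "inverted score" as 100 minus the score is an assumption.

```typescript
interface SnapshotComponents {
  mood: number;
  stress: number;
  work: number;
  sleep: number;
  hydration: number;
}

// Hypothetical shape of one daily row in the burnout_scores table.
interface BurnoutSnapshot {
  date: string; // local calendar day, YYYY-MM-DD
  score: number;
  components: SnapshotComponents;
}

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Build the daily snapshot that would be persisted for trend charts.
function toSnapshot(day: Date, score: number, components: SnapshotComponents): BurnoutSnapshot {
  const date = day.getFullYear() + "-" + pad2(day.getMonth() + 1) + "-" + pad2(day.getDate());
  return { date, score, components };
}

// Wellbeing ring value: higher is better, so the risk score is inverted.
function wellbeingRingValue(score: number): number {
  return 100 - score;
}
```

Storing the five normalised components next to the score lets a trend chart show which component moved, not only that the total changed.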

8. Limitations & Future Improvements

Current Limitations

| Limitation | Impact | Planned Mitigation |
|---|---|---|
| Sleep data is null | 10 % of the model runs on a default value | Apple Health integration (Sprint 4) |
| Work hours are static | Doesn't capture overtime spikes | Calendar/time-tracking API integration |
| Linear model | Cannot capture nonlinear interactions (e.g., low sleep + high stress compounds worse than the sum) | TFLite on-device ML model (v2) |
| Self-reported data | Subject to mood-congruent bias (stressed people rate stress higher) | Passive sensors (HRV, screen time) |
| No temporal weighting | Day 1 and day 7 contribute equally | Exponential decay weighting |

Roadmap (v2 — ML-Based)

The current rule-based model is a deliberate v1 choice: interpretable, debuggable, and privacy-safe. The v2 upgrade path involves:

  1. On-device TFLite model trained on anonymised, opt-in data.
  2. HRV (Heart Rate Variability) from Apple Watch / Google Health as a physiological stress indicator.
  3. Screen time as a passive work-hours proxy.
  4. Temporal weighting: Recent days count more (exponential decay with λ = 0.85).
  5. Interaction terms: Sleep × Stress interaction to capture compounding effects.
  6. Personalised baselines: Individual thresholds learned from each user's historical data, rather than fixed cutoffs.
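Item 4 of the roadmap can be illustrated with a short sketch. It assumes λ = 0.85 is a per-day retention factor, so a value that is k days old receives weight λ^k; the roadmap does not state this interpretation, and the function name is hypothetical.

```typescript
// Exponentially decayed mean. `valuesNewestFirst[0]` is today's value,
// `valuesNewestFirst[1]` is yesterday's, and so on.
function decayWeightedMean(valuesNewestFirst: number[], lambda = 0.85): number | null {
  if (valuesNewestFirst.length === 0) return null;
  let weightedTotal = 0;
  let weightTotal = 0;
  let weight = 1; // lambda^0 for the newest value
  for (const v of valuesNewestFirst) {
    weightedTotal += weight * v;
    weightTotal += weight;
    weight *= lambda; // each older day counts lambda times as much
  }
  return weightedTotal / weightTotal;
}
```

Under this reading, the oldest day in a 7-day window carries 0.85^6 ≈ 0.38 of the newest day's weight, and setting lambda to 1 recovers the plain mean used in v1.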

9. Privacy & Ethics


10. Validation & Testing

The algorithm is covered by a comprehensive test suite (src/lib/__tests__/burnout.test.ts).


References

[1] World Health Organization. (2019). International Classification of Diseases 11th Revision (ICD-11). Burn-out (QD85). https://icd.who.int/browse/2024-01/mms/en#129180281

[2] GitLab. (2024). 2024 Global DevSecOps Report. https://about.gitlab.com/developer-survey/

[3] Haystack Analytics. (2023). Developer burnout and its impact on software security.

[4] Maslach, C., & Leiter, M. P. (2016). Understanding the burnout experience: Recent research and its implications for psychiatry. World Psychiatry, 15(2), 103–111.

[5] Maslach, C., Jackson, S. E., & Leiter, M. P. (1996). Maslach Burnout Inventory Manual (3rd ed.). Consulting Psychologists Press.

[6] Bakker, A. B., & Demerouti, E. (2007). The Job Demands-Resources model: State of the art. Journal of Managerial Psychology, 22(3), 309–328.

[7] McEwen, B. S. (1998). Stress, adaptation, and disease: Allostasis and allostatic load. Annals of the New York Academy of Sciences, 840(1), 33–44.

[8] Thorp, A. A., et al. (2011). Sedentary behaviors and subsequent health outcomes in adults. American Journal of Preventive Medicine, 41(2), 207–215.

[9] Mark, G., Gudith, D., & Klocke, U. (2008). The cost of interrupted work: More speed and stress. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 107–110.

[10] Perlow, L. A., & Porter, J. L. (2009). Making time off predictable — and required. Harvard Business Review, 87(10), 102–109.

[11] Stone, A. A., et al. (1998). A comparison of coping assessed by ecological momentary assessment and retrospective recall. Journal of Personality and Social Psychology, 74(6), 1670–1680.

[12] Kodz, J., et al. (2003). Working Long Hours: A Review of the Evidence. UK Department of Trade and Industry.

[13] Walker, M. (2017). Why We Sleep: Unlocking the Power of Sleep and Dreams. Scribner.

[14] Ganio, M. S., et al. (2011). Mild dehydration impairs cognitive performance and mood of men. British Journal of Nutrition, 106(10), 1535–1543.

[15] Weinstein, N. D. (1999). What does it mean to understand a risk? Evaluating risk comprehension. Journal of the National Cancer Institute Monographs, 25, 15–20.