How does DevZen measure burnout risk?

DevZen computes a daily 0–100 burnout risk score from five signals: mood (30%), stress (30%), work hours (20%), sleep (10%), and hydration (10%). The score updates each day from a 7-day rolling window of your logs.

Is the burnout score computed on-device?

Yes. The entire calculation runs locally on your phone using a deterministic formula. No data is sent to a server to compute the score, and no machine-learning inference happens off-device.

What signals contribute to the burnout score?

Mood and stress logs, work hours from your profile, sleep duration, and hydration. Psychological signals (mood + stress) account for 60% of the score; behavioural and physical signals (work hours, sleep, hydration) account for the remaining 40%.

Is my health data shared with my employer?

No. DevZen is a personal tool — there is no team dashboard, manager view, or data sharing. All logs stay on your device unless you opt into encrypted iCloud or Google Drive backup.

How It Works — Burnout Risk Detection

A detailed look at the science, algorithm, and design decisions behind DevZen's burnout risk score.

1. Why Burnout Matters for Developers

Burnout is not just "feeling tired." The World Health Organization included burnout in ICD-11 (2019) as an occupational phenomenon rather than a medical condition, describing it as a syndrome with three dimensions: energy depletion or exhaustion, increased mental distance from or cynicism toward one's job (depersonalisation), and reduced professional efficacy [1]. Software developers face a disproportionately high risk:

40 % of developers reported experiencing burnout in 2024 (GitLab DevSecOps Survey) [2].
83 % of developers cited burnout as a contributing factor to security incidents (Haystack Analytics) [3].
Developer burnout costs companies an estimated $6,200 – $12,000 per employee per year in lost productivity, turnover, and errors [4].

The challenge is that burnout is gradual. People rarely notice the slide from "a tough week" to "I can't look at another pull request." DevZen tackles this by computing a daily, objective risk signal from data the user already tracks — mood, stress, sleep, hydration, and work hours — turning subjective feelings into an actionable early warning.

2. Research Foundations

Our model draws on three established research frameworks and a set of developer-specific studies:

2.1 Maslach Burnout Inventory (MBI)

The gold-standard burnout assessment since 1981, the MBI measures emotional exhaustion, depersonalisation, and personal accomplishment via a 22-item questionnaire [5]. While clinically validated, a 22-item survey is too heavy for daily tracking. DevZen operationalises the MBI's core dimensions through proxy signals:

MBI Dimension	DevZen Proxy Signal
Emotional Exhaustion	Low mood trend + high stress
Depersonalisation	Declining engagement (mood + habit completion)
Reduced Efficacy	Elevated work hours without corresponding recovery

2.2 Job Demands-Resources (JD-R) Model

Bakker & Demerouti's JD-R model (2007) frames burnout as the result of an imbalance between demands (workload, time pressure) and resources (recovery, social support, autonomy) [6]. DevZen maps this directly:

Demands: Stress level, work hours
Resources: Sleep duration, hydration, mood (as a proxy for psychological resource availability)

When demands consistently exceed resources, burnout risk rises.

2.3 Allostatic Load Theory

McEwen's concept of allostatic load (1998) describes the cumulative physiological cost of chronic stress [7]. Sleep deprivation, dehydration, and sustained high cortisol (indicated by user-reported stress) all increase allostatic load. Our algorithm captures this through multiple biological and behavioural channels rather than relying on any single metric.

2.4 Developer-Specific Research

Prolonged sitting + screen time compounds cognitive fatigue (Thorp et al., 2011) [8].
Flow state disruption from notifications and context switching creates hidden stress spikes (Mark et al., 2008) [9].
"Always-on" culture in software teams blurs recovery boundaries (Perlow & Porter, 2009) [10].

These findings informed our decision to weight recovery signals (sleep, hydration) alongside psychological signals (mood, stress).

3. The Algorithm

3.1 Overview

DevZen computes a Burnout Risk Score (0–100) using a weighted linear model. The score is recomputed on each dashboard load and saved as a daily snapshot. It is deterministic (the same inputs always produce the same output) and runs entirely on-device, so it works offline and the calculation needs no network access.

Burnout Score = Σ (weight_i × normalised_component_i) × 100

Higher score = higher burnout risk.

3.2 Input Signals

The algorithm consumes a 7-day rolling window of user-logged data:

Signal	Source	Raw Range	Aggregation
Mood	Daily mood logs	1–5 (awful → great)	Mean over last 7 days
Stress	Daily stress logs	1–10 (calm → extreme)	Mean over last 7 days
Sleep	Apple Health (integration pending) / manual	Hours per night	Mean over last 7 days
Work Hours	User profile (onboarding)	4–16 hours/day	Static value
Hydration	Water log entries	Glasses per day	Mean over days with logs

Why 7 days? A 7-day window captures short-term trends (a bad week) without being diluted by older data. It also aligns with standard epidemiological practice for self-reported wellbeing measures [11]. Recomputation happens on each dashboard load, so the score is always current.
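
The windowed mean described above can be sketched as follows. This is a minimal illustration and not the code behind DevZen's hook: the `LogEntry` shape and the `daysBefore` and `rollingMean` helpers are hypothetical names, and the sketch assumes every log carries an ISO calendar date so that plain string comparison orders entries correctly. Hydration would additionally need glasses summed per day before averaging, which is not shown.

```typescript
// Hypothetical log shape; the real WatermelonDB schema may differ.
interface LogEntry {
  date: string;  // ISO calendar date, e.g. "2025-03-14"
  value: number; // mood 1-5, stress 1-10, or sleep hours
}

// Returns the ISO date that lies `days` days before `isoDate` (UTC arithmetic).
function daysBefore(isoDate: string, days: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() - days);
  return d.toISOString().slice(0, 10);
}

// Mean of all entries in the window ending on `today` (inclusive).
// Returns null when the window holds no logs, so the caller can apply a default.
function rollingMean(logs: LogEntry[], today: string, windowDays = 7): number | null {
  const start = daysBefore(today, windowDays - 1);
  const inWindow = logs.filter((l) => l.date >= start && l.date <= today);
  if (inWindow.length === 0) return null;
  return inWindow.reduce((sum, l) => sum + l.value, 0) / inWindow.length;
}

const moodLogs: LogEntry[] = [
  { date: "2025-03-01", value: 1 }, // older than 7 days, ignored
  { date: "2025-03-10", value: 4 },
  { date: "2025-03-12", value: 3 },
  { date: "2025-03-14", value: 5 },
];
const moodAvg = rollingMean(moodLogs, "2025-03-14"); // (4 + 3 + 5) / 3 = 4
```

Returning null instead of 0 for an empty window keeps "no data" distinguishable from "a very low average".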

3.3 Normalisation

Each signal is normalised to a 0–1 scale where 1 = worst case (maximum burnout contribution) and 0 = best case (no burnout contribution):

Component	Formula	Worst Case (→ 1)	Best Case (→ 0)
Mood	`(5 - moodAvg) / 4`	Mood = 1 (awful)	Mood = 5 (great)
Stress	`(stressAvg - 1) / 9`	Stress = 10 (extreme)	Stress = 1 (calm)
Work	`(workHours - 4) / 12`	16 hours/day	4 hours/day
Sleep	`1 - (sleepAvg - 5) / 4`	≤ 5 hours/night	≥ 9 hours/night
Hydration	`1 - (waterAvg / waterGoal)`	0 glasses	Meeting/exceeding goal

All values are clamped to [0, 1] after computation.
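
The five formulas in the table translate directly into small pure functions. The sketch below is illustrative only: the function names are hypothetical, and it assumes `waterGoal` is greater than zero.

```typescript
// Restrict a value to the closed interval [0, 1].
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Each function maps a raw signal to [0, 1], where 1 is the worst case.
const normaliseMood = (moodAvg: number): number => clamp01((5 - moodAvg) / 4);
const normaliseStress = (stressAvg: number): number => clamp01((stressAvg - 1) / 9);
const normaliseWork = (workHours: number): number => clamp01((workHours - 4) / 12);
const normaliseSleep = (sleepAvg: number): number => clamp01(1 - (sleepAvg - 5) / 4);
// Assumes waterGoal > 0; a zero goal would divide by zero.
const normaliseHydration = (waterAvg: number, waterGoal: number): number =>
  clamp01(1 - waterAvg / waterGoal);

// Clamping matters once inputs leave the nominal ranges:
const shortSleep = normaliseSleep(4);       // raw 1.25, clamped to 1
const longSleep = normaliseSleep(10);       // raw -0.25, clamped to 0
const overGoal = normaliseHydration(10, 8); // raw -0.25, clamped to 0
```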

3.4 Weights

┌───────────────────────────────────────────────────────┐
│                 Weight Distribution                    │
│                                                       │
│   ███████████████  Mood          30 %                 │
│   ███████████████  Stress        30 %                 │
│   ██████████       Work Hours    20 %                 │
│   █████            Sleep         10 %                 │
│   █████            Hydration     10 %                 │
│                                                       │
│   Psychological ─────── 60 %                          │
│   Behavioural / Physical 40 %                         │
└───────────────────────────────────────────────────────┘

Rationale for the 30-30-20-10-10 distribution:

Mood (30 %) and Stress (30 %): Psychological signals are the strongest predictors of burnout per MBI and JD-R literature. Mood captures the exhaustion axis; stress captures the demands axis. Together they account for 60 % of the score, reflecting that burnout is fundamentally a psychological phenomenon.
Work Hours (20 %): The single strongest objective demand signal. Research consistently links >50 hours/week with elevated burnout risk (Kodz et al., 2003) [12]. Weighted lower than mood/stress because perception of workload matters more than raw hours — some developers thrive on long hours if the work is engaging.
Sleep (10 %): Sleep deprivation impairs emotional regulation and amplifies perceived stress (Walker, 2017) [13]. Weighted at 10 % because the current input is coarse (hours only, no quality metrics) and Apple Health integration is pending.
Hydration (10 %): Even mild dehydration (1–2 % body mass loss) impairs cognitive performance and mood (Ganio et al., 2011) [14]. Weighted lowest because the relationship to burnout is indirect — it's a recovery/self-care proxy more than a direct burnout driver.

3.5 Final Score Computation

score = (mood_weight × mood_norm
       + stress_weight × stress_norm
       + work_weight × work_norm
       + sleep_weight × sleep_norm
       + hydration_weight × hydration_norm) × 100

Result is rounded to the nearest integer and clamped to [0, 100].
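
As a sketch of this step, the weighted sum, rounding, and clamping fit in one pure function. The names `Components`, `WEIGHTS`, and `burnoutScore` are hypothetical and may not match the app's real code; the two sample inputs are the normalised values from the worked examples in section 3.6.

```typescript
// Normalised components, each already in [0, 1] (1 = worst case).
interface Components {
  mood: number;
  stress: number;
  work: number;
  sleep: number;
  hydration: number;
}

// The 30-30-20-10-10 weights described in section 3.4.
const WEIGHTS: Components = { mood: 0.3, stress: 0.3, work: 0.2, sleep: 0.1, hydration: 0.1 };

// Weighted sum scaled to 0-100, rounded to an integer, then clamped.
function burnoutScore(c: Components): number {
  const raw =
    (WEIGHTS.mood * c.mood +
      WEIGHTS.stress * c.stress +
      WEIGHTS.work * c.work +
      WEIGHTS.sleep * c.sleep +
      WEIGHTS.hydration * c.hydration) *
    100;
  return Math.min(100, Math.max(0, Math.round(raw)));
}

const exampleA = burnoutScore({
  mood: 0.2, stress: 2 / 9, work: 1 / 3, sleep: 0.375, hydration: 0.125,
}); // 24
const exampleB = burnoutScore({
  mood: 0.725, stress: 6.5 / 9, work: 2 / 3, sleep: 0.875, hydration: 0.625,
}); // 72
```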

3.6 Worked Examples

Example A — Healthy Developer

Signal	Value	Normalised
Mood avg	4.2 / 5	(5 - 4.2) / 4 = 0.20
Stress avg	3.0 / 10	(3.0 - 1) / 9 = 0.22
Work hours	8 h	(8 - 4) / 12 = 0.33
Sleep avg	7.5 h	1 - (7.5 - 5) / 4 = 0.375
Hydration avg	7 / 8 goal	1 - 7/8 = 0.125

Score = (0.30 × 0.20 + 0.30 × 0.22 + 0.20 × 0.33 + 0.10 × 0.375 + 0.10 × 0.125) × 100
      = (0.060 + 0.066 + 0.066 + 0.0375 + 0.0125) × 100
      = 0.242 × 100
      = 24   → Low Risk ✅

Example B — Developer Approaching Burnout

Signal	Value	Normalised
Mood avg	2.1 / 5	(5 - 2.1) / 4 = 0.725
Stress avg	7.5 / 10	(7.5 - 1) / 9 = 0.722
Work hours	12 h	(12 - 4) / 12 = 0.667
Sleep avg	5.5 h	1 - (5.5 - 5) / 4 = 0.875
Hydration avg	3 / 8 goal	1 - 3/8 = 0.625

Score = (0.30 × 0.725 + 0.30 × 0.722 + 0.20 × 0.667 + 0.10 × 0.875 + 0.10 × 0.625) × 100
      = (0.2175 + 0.2166 + 0.1334 + 0.0875 + 0.0625) × 100
      = 0.7175 × 100
      = 72   → High Risk 🔴

Primary driver: Mood (highest weighted contribution at 0.2175).

4. Risk Levels

The continuous 0–100 score is mapped to three risk levels for user-facing communication:

Score Range	Risk Level	Colour	User Message
0 – 39	Low	Teal (#18B89A)	"Great work — keep up your healthy routines."
40 – 69	Moderate	Amber (#E8940A)	"Watch your stress and sleep this week."
70 – 100	High	Red (#E84848)	"Take a break. Your burnout risk is elevated."
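
The mapping in the table amounts to two threshold checks. A minimal sketch, with hypothetical type and function names and lowercase level labels chosen for illustration:

```typescript
type RiskLevel = "low" | "moderate" | "high";

interface RiskBand {
  level: RiskLevel;
  colour: string; // hex colour from the table above
}

// Maps an integer score in [0, 100] to its risk band.
function riskBand(score: number): RiskBand {
  if (score >= 70) return { level: "high", colour: "#E84848" };
  if (score >= 40) return { level: "moderate", colour: "#E8940A" };
  return { level: "low", colour: "#18B89A" };
}

const atBoundary = riskBand(40).level; // "moderate"
```

Because the score is rounded to an integer before this step, 39 and 40 are adjacent inputs and no score falls between two bands.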

Why three buckets, not five or a continuous gradient?

Risk-communication research suggests that, for behaviour change, people respond better to categorical risk labels than to precise numbers (Weinstein, 1999) [15]. Three levels are actionable: "you're fine", "pay attention", or "intervene now." More levels create decision paralysis; fewer levels lose the "watch out" middle ground.

5. Primary Driver Identification

Beyond the aggregate score, the algorithm identifies the primary driver — the single component contributing the most weighted risk. This is the component with the highest value of (weight_i × normalised_component_i).

This powers the actionable insight: "Your burnout risk is Moderate. Main driver: stress. Consider taking a 10-minute walk between meetings."

When all inputs are null (new user, no data yet), primaryDriver is null and the UI shows a prompt to start logging.
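
Selecting the driver is an argmax over the weighted contributions. The sketch below is one possible reading, not the confirmed implementation: the names are hypothetical, it assumes a signal with no data is never reported as the driver, and it assumes ties go to the signal listed first.

```typescript
type Signal = "mood" | "stress" | "work" | "sleep" | "hydration";

const WEIGHTS: Record<Signal, number> = {
  mood: 0.3, stress: 0.3, work: 0.2, sleep: 0.1, hydration: 0.1,
};

// Normalised values in [0, 1]; null marks a signal with no data.
type Inputs = Record<Signal, number | null>;

// Returns the signal with the largest weight × normalised value, or null if
// every input is null.
function primaryDriver(inputs: Inputs): Signal | null {
  let best: Signal | null = null;
  let bestContribution = -Infinity;
  for (const signal of Object.keys(WEIGHTS) as Signal[]) {
    const value = inputs[signal];
    if (value === null) continue; // assumption: missing signals are skipped
    const contribution = WEIGHTS[signal] * value;
    if (contribution > bestContribution) { // strict ">" keeps the first signal on ties
      best = signal;
      bestContribution = contribution;
    }
  }
  return best;
}

// Example B from section 3.6: mood (0.2175) narrowly beats stress (about 0.2167).
const driverB = primaryDriver({
  mood: 0.725, stress: 6.5 / 9, work: 2 / 3, sleep: 0.875, hydration: 0.625,
}); // "mood"
```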

6. Handling Missing Data

Real users don't log perfectly. The algorithm gracefully handles null (missing) inputs:

Missing Signal	Default Normalised Value	Rationale
Mood	0.4	Slightly pessimistic — nudges user to log
Stress	0.3	Conservative — assumes some baseline stress
Sleep	0.3	Conservative — assumes some sleep deficit
Hydration	0.3	Conservative — assumes mild underhydration

Design principle: "absent data is not good data." We deliberately avoid defaulting to zero (which would imply "everything is fine") because a user who stops logging may be disengaging — itself a burnout signal. The conservative defaults produce a mildly elevated score that encourages the user to resume logging without triggering false alarms.
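
To illustrate the effect of these defaults, the sketch below scores a brand-new user who has logged nothing. The helper names are hypothetical, and the 8-hour workday is an assumed profile value chosen for the example.

```typescript
type OptionalSignal = "mood" | "stress" | "sleep" | "hydration";

// Default normalised values from the table above.
const DEFAULT_NORMALISED: Record<OptionalSignal, number> = {
  mood: 0.4, stress: 0.3, sleep: 0.3, hydration: 0.3,
};

// Uses the logged value when present, otherwise the conservative default.
function withDefault(signal: OptionalSignal, normalised: number | null): number {
  return normalised ?? DEFAULT_NORMALISED[signal];
}

// New user: no logs yet, work hours taken from an assumed 8 h/day profile.
const workNorm = (8 - 4) / 12;
const newUserScore = Math.round(
  (0.3 * withDefault("mood", null) +
    0.3 * withDefault("stress", null) +
    0.2 * workNorm +
    0.1 * withDefault("sleep", null) +
    0.1 * withDefault("hydration", null)) *
    100,
); // 34
```

A score of 34 sits in the Low band but well above zero, which matches the stated intent: mildly elevated, without a false alarm.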

7. Data Pipeline

┌─────────────┐    ┌──────────────┐    ┌───────────────┐    ┌───────────────┐
│  User Logs  │    │  7-Day Agg   │    │  Algorithm    │    │  Dashboard    │
│  (mood,     │───▶│  (averages   │───▶│  (normalise,  │───▶│  (score,      │
│  stress,    │    │  per signal) │    │  weight, sum) │    │  risk label,  │
│  water)     │    │              │    │               │    │  driver)      │
└─────────────┘    └──────────────┘    └───────────────┘    └───────────────┘
       │                                       │                     │
       ▼                                       ▼                     ▼
  WatermelonDB                          Pure function           UI Card +
  (mood_logs,                           (src/lib/               Wellbeing
   stress_logs,                          burnout.ts)             Ring
   water_logs)                                │
                                              ▼
                                     burnout_scores table
                                     (daily snapshot for
                                      trend analysis)

Collection: User logs mood (1–5), stress (1–10), and water (glasses) throughout the day.
Aggregation: The useTodayData() hook queries WatermelonDB for the last 7 days and computes means.
Computation: calculateBurnoutScore() — a pure function — normalises, weights, and sums.
Persistence: Result is stored in burnout_scores table with all five normalised components, enabling historical trend charts.
Display: The Today dashboard renders a colour-coded card with score, risk label, contextual message, and a wellbeing progress ring (inverted score).

8. Limitations & Future Improvements

Current Limitations

Limitation	Impact	Planned Mitigation
Sleep data is null	10 % of the model runs on a default value	Apple Health integration (Sprint 4)
Work hours are static	Doesn't capture overtime spikes	Calendar/time-tracking API integration
Linear model	Cannot capture nonlinear interactions (e.g., low sleep + high stress compounds worse than the sum)	TFLite on-device ML model (v2)
Self-reported data	Subject to mood-congruent bias (stressed people rate stress higher)	Passive sensors (HRV, screen time)
No temporal weighting	Day 1 and day 7 contribute equally	Exponential decay weighting

Roadmap (v2 — ML-Based)

The current rule-based model is a deliberate v1 choice: interpretable, debuggable, and privacy-safe. The v2 upgrade path involves:

On-device TFLite model trained on anonymised, opt-in data.
HRV (Heart Rate Variability) from Apple Watch / Google Health as a physiological stress indicator.
Screen time as a passive work-hours proxy.
Temporal weighting: Recent days count more (exponential decay with λ = 0.85).
Interaction terms: Sleep × Stress interaction to capture compounding effects.
Personalised baselines: Individual thresholds learned from each user's historical data, rather than fixed cutoffs.
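
The temporal-weighting item can be illustrated with a decayed mean. This is a sketch of one plausible interpretation, not a committed design: it assumes λ is a per-day decay factor, so a value logged k days ago gets weight λ^k, and `decayedMean` is a hypothetical name.

```typescript
// values[0] is today, values[1] is yesterday, and so on.
// Returns null for an empty series.
function decayedMean(values: number[], lambda = 0.85): number | null {
  if (values.length === 0) return null;
  let weightedSum = 0;
  let weightTotal = 0;
  values.forEach((value, daysAgo) => {
    const weight = Math.pow(lambda, daysAgo); // 1, 0.85, 0.7225, ...
    weightedSum += weight * value;
    weightTotal += weight;
  });
  return weightedSum / weightTotal;
}

// Stress 8 today and 2 yesterday: the plain mean is 5, the decayed mean is
// (8 + 0.85 * 2) / 1.85, about 5.24, so the recent spike counts for more.
const recentSpike = decayedMean([8, 2]);
```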

9. Privacy & Ethics

All computation is on-device. No health data leaves the phone unless you opt into encrypted backup. The burnout score is computed locally from logs stored in WatermelonDB (SQLite on iOS/Android, LokiJS on web).
No external ML inference. The rule-based model requires zero network access.
User controls all data. Logs can be deleted at any time; the score recomputes from available data.
No employer visibility. DevZen is a personal tool — there is no team dashboard, manager view, or data sharing.
Conservative defaults. Missing data nudges the score up slightly (encouraging logging) rather than down (falsely reassuring).

10. Validation & Testing

The algorithm is covered by a comprehensive test suite (src/lib/__tests__/burnout.test.ts):

Boundary tests: Worst-case inputs → score ≥ 70; best-case inputs → score < 10.
Range validation: Score always in [0, 100]; components always in [0, 1].
Risk level mapping: Verified at boundary values (39 → low, 40 → moderate, 69 → moderate, 70 → high).
Sensitivity tests: Each component independently raises the score when worsened.
Determinism: Same inputs → identical output (pure function, no side effects).
Null handling: All-null inputs → valid result with null primaryDriver; partial nulls → valid score with identified driver.
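
A few of these properties can be expressed as plain assertions. The sketch below is not taken from the real test file: `score` is a stand-in for the function under test, `check` is a hypothetical helper, and no test framework is assumed.

```typescript
type Signal = "mood" | "stress" | "work" | "sleep" | "hydration";
type Components = Record<Signal, number>;

const WEIGHTS: Components = { mood: 0.3, stress: 0.3, work: 0.2, sleep: 0.1, hydration: 0.1 };
const SIGNALS = Object.keys(WEIGHTS) as Signal[];

// Stand-in for the scoring function under test.
function score(c: Components): number {
  const sum = SIGNALS.reduce((acc, s) => acc + WEIGHTS[s] * c[s], 0);
  return Math.min(100, Math.max(0, Math.round(sum * 100)));
}

// Throws with a readable label when a property does not hold.
function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`failed: ${label}`);
}

const best: Components = { mood: 0, stress: 0, work: 0, sleep: 0, hydration: 0 };
const worst: Components = { mood: 1, stress: 1, work: 1, sleep: 1, hydration: 1 };

// Boundary tests.
check(score(worst) >= 70, "worst case is high risk");
check(score(best) < 10, "best case is near zero");

// Sensitivity: worsening any single component must raise the score.
for (const s of SIGNALS) {
  check(score({ ...best, [s]: 1 }) > score(best), `${s} raises the score`);
}

// Determinism: repeated calls with the same input agree.
check(score(worst) === score(worst), "deterministic");
```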

References

[1] World Health Organization. (2019). International Classification of Diseases 11th Revision (ICD-11). Burn-out (QD85). https://icd.who.int/browse/2024-01/mms/en#129180281

[2] GitLab. (2024). 2024 Global DevSecOps Report. https://about.gitlab.com/developer-survey/

[3] Haystack Analytics. (2023). Developer burnout and its impact on software security.

[4] Maslach, C., & Leiter, M. P. (2016). Understanding the burnout experience: Recent research and its implications for psychiatry. World Psychiatry, 15(2), 103–111.

[5] Maslach, C., Jackson, S. E., & Leiter, M. P. (1996). Maslach Burnout Inventory Manual (3rd ed.). Consulting Psychologists Press.

[6] Bakker, A. B., & Demerouti, E. (2007). The Job Demands-Resources model: State of the art. Journal of Managerial Psychology, 22(3), 309–328.

[7] McEwen, B. S. (1998). Stress, adaptation, and disease: Allostasis and allostatic load. Annals of the New York Academy of Sciences, 840(1), 33–44.

[8] Thorp, A. A., et al. (2011). Sedentary behaviors and subsequent health outcomes in adults. American Journal of Preventive Medicine, 41(2), 207–215.

[9] Mark, G., Gudith, D., & Klocke, U. (2008). The cost of interrupted work: More speed and stress. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 107–110.

[10] Perlow, L. A., & Porter, J. L. (2009). Making time off predictable — and required. Harvard Business Review, 87(10), 102–109.

[11] Stone, A. A., et al. (1998). A comparison of coping assessed by ecological momentary assessment and retrospective recall. Journal of Personality and Social Psychology, 74(6), 1670–1680.

[12] Kodz, J., et al. (2003). Working Long Hours: A Review of the Evidence. UK Department of Trade and Industry.

[13] Walker, M. (2017). Why We Sleep: Unlocking the Power of Sleep and Dreams. Scribner.

[14] Ganio, M. S., et al. (2011). Mild dehydration impairs cognitive performance and mood of men. British Journal of Nutrition, 106(10), 1535–1543.

[15] Weinstein, N. D. (1999). What does it mean to understand a risk? Evaluating risk comprehension. Journal of the National Cancer Institute Monographs, 25, 15–20.