
Sleep disruption visible in messaging timestamps: what we can and cannot infer

Smartphone behavioral signals are, increasingly, a serviceable proxy for actigraphy in healthy young adults — the population they have been validated in. Their generalization to clinical populations and older adults is incomplete, and the gap between "rough sleep window estimate" and "clinically usable circadian phenotyping" is wider than many digital phenotyping enthusiasts acknowledge.

Sleep disruption is a clinical workhorse. It is a precipitant of mood episodes, a symptom of nearly every major psychiatric disorder, a side effect of most psychotropics, and one of the most actionable targets in psychiatric care. Yet the standard clinical assessment of sleep is a question on a checklist ("How are you sleeping?"), typically answered with the same casual estimate the patient gives at the dentist's office.

Actigraphy — wrist-worn accelerometry that infers sleep windows from movement — has been the research-grade alternative for thirty years. It works, but it requires the patient to wear and charge a device, return it after a study period, and tolerate the obvious novelty effects. In routine clinical practice it is rare.

Smartphone-derived behavioral signals offer a third path. The phone is already on the patient, already passively logging timestamps of the patient's interactions with it. If those timestamps can be turned into a defensible estimate of sleep windows, the clinical utility is real: not as a sleep study substitute, but as a continuous, longitudinal, cheap, unobtrusive trace of the patient's circadian rhythm across years.

What the validation literature shows

The most-cited validation studies of smartphone-inferred sleep estimates were done in healthy college student populations — the convenient sample of digital-phenotyping research. In those samples, several papers have reported that simple algorithms applied to phone-screen-on times, or to last-message and first-message times of day, recover sleep onset and offset within ~30 minutes of actigraphy estimates on most nights.
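
As a minimal sketch of the kind of simple algorithm described above, one can take the sleep window for a night to be the longest gap between consecutive messages inside a noon-to-noon window. The function name, the noon-to-noon convention, and the sample timestamps are our own illustrative assumptions, not the method of any particular validation study.

```python
from datetime import datetime, timedelta

def longest_gap_sleep_window(timestamps, night_start):
    """Return (onset, offset) as the longest message-free gap in the
    24 hours starting at night_start (assumed here to be noon).
    Returns None when there are too few messages to form a gap."""
    night_end = night_start + timedelta(hours=24)
    in_window = sorted(t for t in timestamps if night_start <= t < night_end)
    if len(in_window) < 2:
        return None
    # Pair each message with the next one and keep the widest pair.
    gaps = zip(in_window, in_window[1:])
    onset, offset = max(gaps, key=lambda pair: pair[1] - pair[0])
    return onset, offset

# Hypothetical outgoing-message timestamps for one night.
msgs = [
    datetime(2024, 3, 4, 18, 5),
    datetime(2024, 3, 4, 23, 40),
    datetime(2024, 3, 5, 7, 15),
    datetime(2024, 3, 5, 9, 30),
]
window = longest_gap_sleep_window(msgs, datetime(2024, 3, 4, 12, 0))
print(window)  # the 23:40 -> 07:15 gap is the longest
```

Note that the estimate is only a candidate window: the gap is bounded by message activity, not by sleep itself.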

Group-level circadian metrics — mean sleep midpoint, day-to-day regularity (the Sleep Regularity Index, SRI), social jetlag — track actigraphic equivalents reasonably well, with correlations typically in the 0.6–0.8 range.
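
For reference, the SRI is the probability of being in the same sleep/wake state at two time points 24 hours apart, rescaled to run from -100 to 100. A minimal sketch follows, assuming the inferred sleep windows have already been converted to a flat list of binary epochs; the function name and the hourly epoch length in the example are our assumptions (published work uses much finer epochs).

```python
def sleep_regularity_index(states, epochs_per_day):
    """states: flat list of 0/1 (wake/sleep), one entry per epoch,
    covering consecutive days. Returns a score from -100 to 100."""
    # Compare every epoch with the epoch exactly 24 hours later.
    pairs = list(zip(states, states[epochs_per_day:]))
    if not pairs:
        raise ValueError("need more than one day of epochs")
    same = sum(1 for a, b in pairs if a == b)
    return -100 + 200 * same / len(pairs)

# Hourly epochs starting at midnight: asleep 00:00-08:00 on day 1,
# asleep 02:00-10:00 on day 2 (a two-hour delay).
day1 = [1] * 8 + [0] * 16
day2 = [0] * 2 + [1] * 8 + [0] * 14

print(sleep_regularity_index(day1 + day1, 24))  # identical days -> 100.0
print(sleep_regularity_index(day1 + day2, 24))  # 20 of 24 epochs match
```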

These are real findings. They support the use of smartphone-derived signals as a rough but useful population-level proxy. They do not support tight individual sleep-stage estimation, microarousal counting, or anything resembling polysomnographic precision.

What the data supports

For healthy young adults: rough estimates of sleep onset, sleep offset, sleep midpoint, day-to-day regularity, and weekday/weekend social jetlag. Useful for longitudinal trend visualization, not for diagnostic claims.

What messaging timestamps add (and where they fall short of phone-screen data)

Pratibmb operates on messaging archives, not raw phone-screen logs. The difference matters. A messaging archive captures every outgoing message, with precise timestamps, across years, potentially including many years preceding the first clinical contact. This is a strength: actigraphy gives you 1–2 weeks of data; a 10-year messaging archive gives you roughly 3,650 days.

The weaknesses are also real. Messaging timestamps capture only the moments when the patient was actively typing and sending. Long stretches of phone use (reading, scrolling, watching video, browsing) leave no message-trace and so appear, to the algorithm, as candidate sleep windows. The signal is sparser and noisier than phone-screen data.

Practical implication: messaging-derived sleep estimates likely underestimate wakefulness (anyone awake but not texting looks asleep) and so likely overestimate total sleep time. Calibration is feasible at the individual level if anchor information is available (e.g. the patient confirms a few sleep windows manually); without anchors, the estimates should be assumed unreliable.
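
One simple way such anchor calibration could work is sketched below, assuming a constant per-patient lag between the last message and actual bedtime. The median-lag approach, the function names, and the numbers are illustrative assumptions, not a validated procedure.

```python
from statistics import median

def onset_bias_minutes(anchors):
    """anchors: list of (last_message_minute, confirmed_bedtime_minute),
    both expressed as minutes after noon so that times past midnight
    stay in order. Returns the median lag from last message to bed,
    or None when no anchors exist."""
    if not anchors:
        return None
    return median(confirmed - last_msg for last_msg, confirmed in anchors)

def calibrated_onset(last_message_minute, bias):
    """Shift a raw last-message time by the patient's own lag.
    With no bias available the raw value is returned unchanged and
    should be treated as uncalibrated."""
    if bias is None:
        return last_message_minute
    return last_message_minute + bias

# Three hypothetical nights where the patient confirmed a bedtime.
anchors = [(690, 735), (700, 730), (660, 720)]  # lags of 45, 30, 60 min
bias = onset_bias_minutes(anchors)
print(bias)                         # 45
print(calibrated_onset(680, bias))  # 725 minutes after noon, i.e. 00:05
```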

What is reasonably extractable

From timestamp data alone, the following are defensible signals that can be surfaced for clinician review — with appropriate uncertainty disclosed:

  1. Last-message time of day, by week. The latest a patient sends a message each night is a serviceable proxy for sleep onset, with the caveat above. Tracking this over years reveals long-term phase shifts.
  2. First-message time of day, by week. Sleep offset proxy. More reliable than onset because morning routines tend to involve early phone checks.
  3. Late-night message frequency. Number of messages sent in the 02:00–05:00 window per week. This signal is especially relevant clinically, as nocturnal awakenings and reduced sleep need are diagnostic features of mood disorder episodes.
  4. Day-to-day regularity. Variance in last-message and first-message times across days. A rough analogue of the SRI, relevant to the literature on sleep regularity as a mood-disorder risk factor.
  5. Weekday vs. weekend phase shift. "Social jetlag" computed from messaging windows. Has independent associations with mood, metabolic health, and chronotype.
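
The five signals above can be sketched as follows. This assumes per-night (onset, offset) pairs have already been derived from the archive; the helper names, the minutes-after-noon convention, and the choice of Friday and Saturday evenings as "free" nights are all our assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

def clock_minutes(t):
    """Minutes after noon, so 23:30 -> 690 and 00:30 -> 750. Keeps
    times on either side of midnight adjacent for averaging."""
    return ((t.hour - 12) % 24) * 60 + t.minute

def night_of(t):
    """Calendar date of the evening that a late-night or morning
    timestamp belongs to."""
    return (t - timedelta(hours=12)).date()

def weekly_mean_clock(times):
    """Signals 1 and 2: mean clock time per ISO (year, week)."""
    by_week = defaultdict(list)
    for t in times:
        year, week, _ = night_of(t).isocalendar()
        by_week[(year, week)].append(clock_minutes(t))
    return {wk: mean(vals) for wk, vals in by_week.items()}

def late_night_count(timestamps):
    """Signal 3: messages sent between 02:00 and 05:00."""
    return sum(1 for t in timestamps if 2 <= t.hour < 5)

def regularity_minutes(times):
    """Signal 4: standard deviation of a clock time across nights."""
    return pstdev([clock_minutes(t) for t in times])

def social_jetlag_minutes(windows):
    """Signal 5: mean sleep midpoint on free nights minus mean
    midpoint on work nights, in minutes. Returns None if either
    group is empty."""
    free, work = [], []
    for onset, offset in windows:
        midpoint = onset + (offset - onset) / 2
        is_free = night_of(onset).weekday() in (4, 5)  # Fri, Sat evening
        (free if is_free else work).append(clock_minutes(midpoint))
    if not free or not work:
        return None
    return mean(free) - mean(work)

# Hypothetical windows: two work nights and one Friday night.
windows = [
    (datetime(2024, 3, 4, 23, 30), datetime(2024, 3, 5, 7, 0)),
    (datetime(2024, 3, 5, 23, 30), datetime(2024, 3, 6, 7, 0)),
    (datetime(2024, 3, 9, 1, 0), datetime(2024, 3, 9, 10, 0)),
]
onsets = [w[0] for w in windows]
offsets = [w[1] for w in windows]
print(weekly_mean_clock(onsets))       # {(2024, 10): 720}, i.e. midnight
print(regularity_minutes(onsets))      # about 42.4 minutes
print(social_jetlag_minutes(windows))  # 135 minutes
```

Plain means and standard deviations are used here only because the minutes-after-noon convention keeps typical bedtimes away from the wrap-around point; circular statistics would be the more robust choice for patients whose timing drifts across noon.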

What the data cannot support

It is worth being explicit about claims that messaging-timestamp data do not support, no matter how appealing the inference:

  • Sleep stage estimation. No combination of timestamp features can reliably distinguish REM from N3. Any tool that suggests otherwise is overclaiming.
  • Total sleep time at clinical precision. Agreement to within ~30 minutes per night is realistic in young adults; the error is wider in older, clinical, or shift-working populations.
  • Detection of segmented sleep. The algorithm sees a single gap; it cannot tell whether the gap reflects continuous sleep or a sleep-wake-sleep pattern with no messaging during the wakeful period.
  • Insomnia subtype classification. Sleep-onset, sleep-maintenance, and early-morning awakening insomnia all produce overlapping timestamp patterns. Disentangling them requires patient-reported anchors.
  • Any inference in shift workers (night shift included) or time-zone travelers without explicit modeling of that context. The default algorithm assumes a circadian-synchronized 24-hour pattern.

The clinical case for tracking it anyway

Despite these limits, longitudinal sleep-window estimates from messaging archives have a clinical value that other measures cannot match: they are retrospective. A patient who arrives at a first appointment with three years of messaging data can hand the clinician a multi-year sleep-pattern chart. No other tool generates that.

For mood disorder differential diagnosis, retrospective circadian patterns are especially valuable. Distinguishing major depressive disorder from bipolar II often turns on whether the patient has had hypomanic episodes in the past — episodes characterized by, among other features, decreased need for sleep. A patient who reliably maintained 4–5 hour sleep windows during a period they describe as "very productive" is a different diagnostic picture than one who maintained 8–9 hours throughout. Messaging timestamps may document the difference where memory cannot.

Similarly, identifying anniversary reactions, seasonal patterns, postpartum sleep changes, and medication-related sleep changes depends on comparing the present to a baseline the patient cannot reliably recall. Smartphone behavioral data give us the baseline.

Conservative usage recommendations

For a clinical tool surfacing this data, our current thinking on responsible defaults:

  • Display sleep-window estimates as shaded ranges, not point estimates, with the uncertainty width visible.
  • Mark any week with fewer than ~30 outgoing messages as "low confidence" and render it visually distinct.
  • Surface long-term trends (months, years) more prominently than short-term fluctuations (single weeks). The signal-to-noise ratio is much better at longer time scales.
  • Allow the patient to add anchor data ("I went to bed at midnight last Tuesday") to calibrate the algorithm to their personal patterns.
  • Never display a numerical "sleep quality score." There is no defensible basis for one from this data.
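
The first two defaults above can be sketched as a per-week display record that carries a range and a confidence flag rather than a single number. Using the interquartile range as the shaded band is our assumption, as are the function and field names; the 30-message threshold is the one stated in the list.

```python
from statistics import quantiles

LOW_CONFIDENCE_THRESHOLD = 30  # outgoing messages per week

def weekly_display_record(onset_minutes, message_count):
    """Build what the chart should draw for one week.
    onset_minutes: nightly onset estimates in minutes after noon.
    Returns a (low, high) range instead of a point estimate, plus a
    low-confidence flag for sparse weeks."""
    if len(onset_minutes) < 2:
        # Not enough nights to show any spread at all.
        return {"range": None, "low_confidence": True}
    q1, _, q3 = quantiles(onset_minutes, n=4)
    return {
        "range": (q1, q3),
        "low_confidence": message_count < LOW_CONFIDENCE_THRESHOLD,
    }

# A hypothetical week with seven nightly estimates.
nights = [680, 690, 700, 710, 720, 730, 740]
sparse = weekly_display_record(nights, message_count=12)
dense = weekly_display_record(nights, message_count=45)
print(sparse["low_confidence"], dense["low_confidence"])  # True False
```

No summary score is computed anywhere in the record, in line with the last recommendation.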

Where this leaves us

Messaging-timestamp analysis offers something clinical sleep assessment has rarely had: a continuous, retrospective, multi-year window into the patient's circadian behavior, derived from data the patient already owns. Used cautiously — with explicit uncertainty, with anchor calibration, with appropriate humility about clinical populations — it can complement (never replace) sleep history-taking, and may surface patterns the patient cannot recall and actigraphy cannot reach back far enough to capture.

Used incautiously, it is just another digital-phenotyping tool overpromising on a thin evidence base. We intend the former.

Selected references

  1. Wirz-Justice, A. (2006). Biological rhythm disturbances in mood disorders. International Clinical Psychopharmacology, 21, S11–S15.
  2. Wehr, T. A. (1991). Sleep-loss as a possible mediator of diverse causes of mania. British Journal of Psychiatry, 159(4), 576–578.
  3. Phillips, A. J. K., Clerx, W. M., O'Brien, C. S., et al. (2017). Irregular sleep/wake patterns are associated with poorer academic performance and delayed circadian and sleep/wake timing. Scientific Reports, 7(1), 3216.
  4. Wittmann, M., Dinich, J., Merrow, M., & Roenneberg, T. (2006). Social jetlag: misalignment of biological and social time. Chronobiology International, 23(1–2), 497–509.
  5. Saeb, S., Lattie, E. G., Schueller, S. M., Kording, K. P., & Mohr, D. C. (2016). The relationship between mobile phone location sensor data and depressive symptom severity. PeerJ, 4, e2537.
  6. Aledavood, T., Lehmann, S., & Saramäki, J. (2015). Digital daily cycles of individuals. Frontiers in Physics, 3, 73.
  7. Mohr, D. C., Zhang, M., & Schueller, S. M. (2017). Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annual Review of Clinical Psychology, 13, 23–47.