In our research in digital phenotyping, I’ve long advocated for collecting raw data from smartphones and wearables instead of using the pre-packaged proprietary data summaries available through Google’s and Apple’s software development kits (SDKs). There are many reasons for this. First, it is important to use the same algorithms for users of both Android and iOS phones. Second, the proprietary algorithms have often not been validated, which leads to challenges in reproducibility. Third, these algorithms change over time, which can invalidate even within-person comparisons over time, let alone comparisons across persons. This post deals with that last issue. One would assume that downloading data for the same period twice should return the same data. After all, nothing about the underlying behavior has changed for the historical data; it just happened to be downloaded at two different points in time. In the experiment below, we find that downloading data for the same time period at two different points in time yields strikingly different data sets.
The specific data of interest here is wearable data on heart rate variability (HRV). Cardiac autonomic dysregulation is associated with psychiatric disorders, among other conditions, and it can be assessed through HRV. RR intervals are the inter-beat intervals between successive heartbeats, and in the time domain HRV is commonly characterized using either the standard deviation of RR intervals (SDRR) or the root mean square of successive RR interval differences (RMSSD). Stated differently, identify the peaks in the heartbeat signal and calculate the peak-to-peak distances, which gives a time series of consecutive time intervals. You can then characterize the scale of the distribution of this time series by its standard deviation (as in SDRR), or you can take the first difference of the time series and characterize its root mean square (as in RMSSD).
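To make the two definitions concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm Apple uses: the beat times are made-up values, the function names `sdrr` and `rmssd` are my own, and I assume the sample (n - 1) standard deviation for SDRR, although conventions differ.

```python
import math
import statistics


def sdrr(rr_ms):
    """Standard deviation of RR intervals, in ms.

    Assumption: the sample (n - 1) standard deviation; some tools
    use the population version instead.
    """
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical peak (heartbeat) times in ms, for illustration only.
peak_times_ms = [0, 800, 1620, 2400, 3250, 4040]

# Peak-to-peak distances give the RR interval series.
rr = [b - a for a, b in zip(peak_times_ms, peak_times_ms[1:])]

print("RR intervals (ms):", rr)
print("SDRR (ms):", round(sdrr(rr), 2))
print("RMSSD (ms):", round(rmssd(rr), 2))
```

Note that SDRR depends only on the distribution of the intervals, whereas RMSSD depends on their order, since it is built from beat-to-beat changes.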
Our test subject started collecting HRV data using an Apple Watch on December 6, 2018. Using Health Auto Export (https://www.healthexportapp.com/), the HealthKit HRV data were exported into CSV files and were not processed in any way. The data were exported twice: the first export was on September 5, 2020, and the second on April 15, 2021. Each export covered the period from the start date until the export date, meaning that we now have two copies of the data for the period from December 6, 2018, to September 5, 2020. The two exports have 640 days in common, and data are missing for only 18 of those days, so the data are 97% complete. Over this common period the two exported files should be identical, but we decided to look and confirm.
The first plot shows the daily mean HRV (measured in ms) over time, where time on the horizontal axis is measured in days since the start date. The two time series correspond to the data obtained in the first and second exports. To be clear, these data cover the same date range, so they should be identical. Their means are indeed very similar, 52 ms vs. 55 ms for the first and second export, respectively, but their variances are very different: 1240 ms² vs. 572 ms². To get some further insight into this, I made a scatter plot of the values of one time series against the other. The dashed identity line is where all the points would fall if the two series were identical, as we’d hope. Instead, there’s a lot of scatter in the data, and their Pearson linear correlation coefficient is just 0.67. For two copies of what is supposed to be the same data, where we would expect a correlation of 1, that’s not a very high correlation.
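For readers who want to run the same kind of check on their own exports, here is a minimal sketch of the comparison. It is not the code used for the plots: the function names are my own, and the daily values are made up for illustration (they are not the test subject’s data). I assume each export has already been reduced to a mapping from day index to daily mean HRV in ms.

```python
import math
import statistics


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_exports(first, second):
    """Compare two {day: daily mean HRV} mappings on the days they share."""
    days = sorted(set(first) & set(second))
    x = [first[d] for d in days]
    y = [second[d] for d in days]
    return {
        "n_days": len(days),
        "mean": (statistics.mean(x), statistics.mean(y)),
        "variance": (statistics.variance(x), statistics.variance(y)),
        "pearson_r": pearson(x, y),
        "n_identical": sum(a == b for a, b in zip(x, y)),
    }


# Made-up daily mean HRV values (ms), keyed by days since the start date.
first_export = {0: 40.0, 1: 95.0, 2: 30.0, 3: 60.0, 5: 35.0}
second_export = {0: 48.0, 1: 70.0, 2: 45.0, 3: 60.0, 4: 52.0}

summary = compare_exports(first_export, second_export)
for name, value in summary.items():
    print(name, value)
```

If the two exports really were identical, `n_identical` would equal `n_days`, the two variances would match, and the correlation would be exactly 1.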
Many commercial wearables don’t allow researchers to access raw data, and most smartphone applications don’t collect raw data either. This is a problem for reproducible science. The take-home message: if we’d like research in this space to be reproducible, we need to work with raw data.
These data were collected, extracted, and generously shared by Mr. Hassan Y. Dawood; I’m grateful for his support.