Reliability and criterion validity of two applications of the iPhone™ to measure cervical range of motion in healthy participants

Summary of background data
Recent smartphones, such as the iPhone, are often equipped with an accelerometer and magnetometer, which, through software applications, can perform various inclinometric functions. Although these applications are intended for recreational use, they have the potential to measure and quantify range of motion. The purpose of this study was to estimate the intra- and inter-rater reliability as well as the criterion validity of the clinometer and compass applications of the iPhone in the assessment of cervical range of motion in healthy participants.
Methods
The sample consisted of 28 healthy participants. Two examiners measured the cervical range of motion of each participant twice using the iPhone (for the estimation of intra- and inter-rater reliability) and once with the CROM (for the estimation of criterion validity). Estimates of reliability and validity were then established using the intraclass correlation coefficient (ICC).
Results
We observed a moderate intra-rater reliability for each movement (ICC = 0.65-0.85) but a poor inter-rater reliability (ICC < 0.60). For the criterion validity, the ICCs were moderate (>0.50) to good (>0.65) for movements of flexion, extension, lateral flexions and right rotation, but poor (<0.50) for left rotation.
Conclusion
We found good intra-rater reliability and lower inter-rater reliability. When compared to the gold standard, these applications showed moderate to good validity. However, before using the iPhone as an outcome measure in clinical settings, studies should be done on patients presenting with cervical problems.


Background
Cervical disorders are major health problems in our society and an important source of disability [1]. The mean prevalence of neck pain in the general population is 23.1%, with a higher incidence noted in office and computer workers [2]. It is also one of the most common reasons to visit a health care professional [2]. Consequences of cervical disorders are multiple and include deficits such as pain and decreased range of motion (ROM) [3], which may reduce social participation and even lead to sick leave [4].
Assessment of ROM is a significant part of the physical therapist's role when evaluating a patient presenting with cervical disorders. Indeed, it helps to establish the clinical diagnosis and the prognosis, and also helps to elaborate an individualized treatment plan [5]. ROM is also an objective measure, which is essential to monitor the patient's evolution throughout therapy. For these reasons, valid and reliable assessment tools are necessary.
Recent smartphones are often equipped with an accelerometer (gravity sensor) and magnetometer (digital compass), which, through software applications, can perform various inclinometric functions. These applications are intended for recreational use, but have the potential to measure and quantify range of motion in many articulations, such as the cervical spine. For instance, previous studies have demonstrated the potential use of some applications in rehabilitation [23,24] and in ROM measurement [25]. The iPhone is easy to use and requires minimal training. Moreover, this instrument could allow the examiner (therapist) to obtain valid cervical ROM measurements, which can detect deficits in cervical ROM. Considering the potential use of smartphones in rehabilitation and the favourable results obtained with digital inclinometers [13], the current study proposes to examine the psychometric properties of two applications (clinometer and compass) of the iPhone. The specific objectives are to determine the intra- and inter-rater reliability of these two applications in the assessment of cervical ROM, as well as the criterion validity using the CROM as the gold standard.

Design of the study
In this study, we used a descriptive correlational design to determine the reliability of the iPhone using intra and inter-rater reliability. For exploring the validity of these applications, we used criterion validity using the CROM as the gold standard. Because of the absence of any study on the reliability or validity of the iPhone for the measurement of cervical ROM, the population used in this study is composed of healthy participants (without neck pain and/or ROM deficits).

Participants
Our sample consisted of 28 healthy volunteers (9 men and 19 women) aged 19 to 43 years (mean ± SD: 23 ± 6). Participants were included if they were 18 years of age or older and had neither a cervical spine problem nor neck pain. We excluded persons with cervical pathology (e.g., painful diagnosis of arthritis or whiplash during the past year), psychiatric condition (e.g., dementia, amnesia, delirium) or neurological disease (e.g., multiple sclerosis, Lou Gehrig's disease). The population included in this protocol was a convenience sample, recruited by purposive and snowball sampling. All volunteers consented to participate in the study and did not receive monetary rewards or compensation for their time and participation. The study was conducted in accordance with the Helsinki Declaration after approval from the ethics review board of the Centre hospitalier universitaire de Sherbrooke (project #10-199). All participants read the protocol, and written consent was obtained in agreement with local ethics guidelines. The study took place at the School of Rehabilitation of the Université de Sherbrooke. Considering the novel aspect related to the use of smartphones to measure ROM and the fact that we wanted to explore the psychometric properties of these applications, we opted for the recruitment of healthy subjects. All were assessed with the same instruments and by the same observers.

Instruments
iPhone applications
The iPhone is a smartphone with many possible applications. The application used to measure cervical ROM in the frontal and sagittal planes is Clinometer (Peter Breitling, Version 3.3, http://www.plaincode.com/products), an application designed using the three inbuilt accelerometers (LIS302DL accelerometer). This application uses the internal three-axis linear accelerometer to measure the direction of gravity's pull. For this, the gyroscope stays in one position, no matter the orientation. When placed against a solid surface, the inclinometer compares the angle of the object to the gyroscope, and displays the results using the software interface.
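As a rough illustration of how an inclination angle can be derived from a three-axis accelerometer that senses the direction of gravity, the following Python sketch converts two in-plane acceleration components into a tilt angle. The function name and the axis convention are assumptions made for this example; it is not the Clinometer application's actual implementation.

```python
import math

def tilt_angle_deg(ax, ay):
    """Tilt of the device (degrees) within its screen plane, relative to gravity.

    ax, ay: accelerometer readings along the screen's x and y axes, in any
    consistent unit. Hypothetical convention: when the device is held upright,
    gravity reads (ax, ay) = (0, -1), which gives an angle of 0 degrees.
    """
    return math.degrees(math.atan2(ax, -ay))

# Device upright: no tilt.
print(tilt_angle_deg(0.0, -1.0))
# Device tilted by 30 degrees: gravity is split between the two axes.
print(tilt_angle_deg(math.sin(math.radians(30)), -math.cos(math.radians(30))))
```

Because only the direction of gravity is used, such a computation cannot capture rotation about the vertical axis, which is consistent with the use of a separate compass application for the horizontal plane.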
Flexion/extension measures were taken with the iPhone placed on the left side of the head, aligned with the ear (see Figure 1). Left and right side flexion were measured with the iPhone on the contralateral side of the head, with the level aligned with the eyes (see Figure 2).
The application used to measure cervical ROM in the horizontal plane is Compass, software already integrated in the iPhone. To indicate the orientation of the iPhone, the application uses the built-in magnetometer, which senses the orientation relative to the Earth's magnetic field using the Hall effect (http://www.memsjournal.com/2011/02/motion-sensing-in-the-iphone-4-electronic-compass.html). The chip (AKM AK8975) senses the field in three directions, and from that can determine the direction of magnetic north. Moreover, it also uses the accelerometer, which tracks the movement of the device, to measure changes in orientation. We chose magnetic north to obtain our results. Rotation measures were taken with the iPhone placed on the participant's head with the arrow aligned with the nose (see Figure 3).
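One practical detail when deriving a rotation range from compass readings is that headings wrap around at 360°. The following Python sketch shows one possible way to compute the angle between a starting and a final heading; the function name is hypothetical, and the approach assumes the true rotation is smaller than 180°, which is reasonable for cervical rotation.

```python
def rotation_range_deg(start_heading, end_heading):
    """Smallest absolute angle (degrees) between two compass headings.

    Headings are expected in degrees, e.g. in [0, 360). The modulo step
    handles the wrap-around at north (360 -> 0).
    """
    diff = (end_heading - start_heading + 180.0) % 360.0 - 180.0
    return abs(diff)

# Starting at 350 degrees and ending at 60 degrees crosses north.
print(rotation_range_deg(350, 60))  # 70.0
```

A plain subtraction would report 290° for this example, so handling the wrap-around matters whenever the participant happens to face close to magnetic north.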

Cervical Range of Motion Device (CROM)
The CROM was used for the measurement of cervical flexion, extension, lateral flexions and rotations. This eyeglasses-like instrument has three inclinometers placed at three different positions: one near the left ear for flexion/extension (sagittal plane) and another on the forehead for the lateral flexions (frontal plane); both are gravity dependent. The third, on the top of the head (horizontal plane), is used for the measurement of rotations; it is magnet dependent, so a magnetic brace must be placed around the neck. This instrument was used as our gold standard considering that its reliability and validity have been studied extensively [10,18,26,27].

Clinical procedures
For the purpose of this study, participants were simply asked to perform maximal (end-range) neck flexion, extension, left side flexion, right side flexion, left head rotation and right head rotation. Each participant was asked to perform the neck movements at his/her own pace without going too fast.

Selection of examiners for the reliability study
Four students in physical therapy received three hours of training to adequately manipulate the CROM device. As part of their training, they also taught other classmates how to use the device during a two-hour session to enhance their competence in using the CROM. Following their training session, they determined which anatomical points of reference should be used with the iPhone and trained for an hour to make sure their method was standardized. They then measured their own cervical ROM with the CROM and the iPhone (each student was measured twice). The intra-rater reliability was calculated for each student with the intraclass correlation coefficient (ICC). The two students with the highest ICCs (ICC = 0.79 and 0.81) were assigned as examiners for the reliability part of the study. These two practiced their techniques in another two-hour session with four volunteers to standardize the procedure. Overall, the examiners had eight hours of training with the two instruments. This was done in order to minimize the error originating from examiners.

Selection of the examiner for the validity study
Between the two examiners, the one with the highest intra-rater reliability (highest ICCs) was chosen to undertake the validity study. This was done in order to minimize the error originating from examiners.

Data collection
During all data collection sessions, the participants were instructed on the procedures. They were then asked to warm up with five repetitions of all cervical movements. Afterwards, stabilizing straps were installed to prevent any trunk and shoulder movements during the execution of the movements (the same procedure was used during the selection of the examiners).
All measures were taken in the same order: flexion, extension, right and left lateral flexions and right then left rotations. This was done in order to minimize the possible bias induced by thixotropy [28].
For the sagittal and frontal planes, measures always corresponded to the total range (in degrees): the difference between the final and initial measures. For example, a starting position of 5° slightly in extension and an end-range of 65° in flexion give a total flexion of 70° (65° − (−5°) = 70°). Although the procedures for the CROM indicate to take only the final measure (angle at end-range), we could not use this method for the iPhone since our landmark was not necessarily at 0° (i.e., iPhone aligned with the ear), whereas it is always at 0° for the CROM. The total range (in degrees) was also used for the rotation movements.
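The total-range calculation described above can be written as a short Python sketch. The function name and the sign convention (readings on the side of the tested movement are positive, readings on the opposite side are negative) are assumptions made for this illustration.

```python
def total_range_deg(initial_deg, final_deg):
    """Total range of motion (degrees) between a starting and an end-range reading.

    Hypothetical sign convention: readings on the side of the tested movement
    are positive and readings on the opposite side are negative.
    """
    return abs(final_deg - initial_deg)

# Example from the text: start 5 degrees in extension (-5), end-range 65 degrees in flexion.
print(total_range_deg(-5, 65))  # 70
```

Taking the difference rather than the final reading alone removes the dependence on where the device happens to sit at the starting position.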

Procedures for the reliability study
Two examiners entered two different rooms with a paired observer. They took all cervical ROM measures (flexion, extension, right then left lateral flexions and right then left rotations) with the iPhone while their paired observer wrote down the measures.
The examiners then changed rooms in a clockwise rotation until they had taken two measures of each movement for each participant with the iPhone. This allowed us to measure both intra- and inter-rater reliability (see Figure 4). Cervical ROM was also measured once with the CROM; this allowed us to establish the criterion validity of the iPhone in comparison to this gold standard.

Data analysis
Sample size calculation
To detect a minimal significant ICC value of 0.60 (1-β = 0.80; α = 0.05), a minimum of 20 subjects was required.
For the reliability part of the study, intra- and inter-rater reliability were estimated with the intraclass correlation coefficient (ICC). The ICC is a statistic designed to measure the size and direction of the association between two variables [29]. The values vary between −1 (perfect negative association) and +1 (perfect positive association). Different guidelines exist for the interpretation of the ICC, but one reasonable scale is that an ICC value of less than 0.4 indicates poor reproducibility; ICC values in the range of 0.4 to 0.75 indicate fair to good reproducibility, and an ICC value of greater than 0.75 shows excellent reproducibility [30].
To estimate the criterion validity, we used the ICC and Pearson's correlation coefficient. Some consider the ICC to be more accurate than Pearson's correlation coefficient. For example, if one examiner always measures 5° more than another examiner, Pearson's correlation coefficient would still be high. The ICC has the advantage of accounting for this bias; in the latter example, the ICC would be lower since it verifies whether the values are the same and not only associated. For the validity part of the study, the ICCs were interpreted as follows: an ICC value of less than 0.5 indicates poor validity; ICC values in the range of 0.5 to 0.65 indicate moderate to good validity, and an ICC value of greater than 0.65 shows good validity. Reference values [30] are reported in Table 1. Thereafter, 95% confidence intervals (95% CI) were constructed around the point estimates to account for sampling variation. Finally, descriptive statistics for measures of ROM (degrees) for each movement are reported for the iPhone and the CROM using mean and standard deviation.
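The contrast between the two coefficients can be illustrated with a small Python sketch. The text does not state which form of the ICC was computed, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is used here purely as an assumption; the function name and the readings are invented for the example and are not study data.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array of shape (n_subjects, k_raters). This form is an assumption
    for illustration; the study does not specify which ICC form was used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Invented readings (degrees): rater B always measures 5 degrees more than rater A.
rater_a = np.array([40.0, 45.0, 50.0, 55.0, 60.0])
rater_b = rater_a + 5.0

pearson_r = np.corrcoef(rater_a, rater_b)[0, 1]
icc = icc_2_1(np.column_stack([rater_a, rater_b]))
print(f"Pearson r = {pearson_r:.2f}, ICC(2,1) = {icc:.2f}")
```

With these invented numbers, Pearson's r equals 1 because the two series are perfectly associated, whereas the ICC comes out near 0.83 because the systematic 5° offset counts as disagreement.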

Intra-rater reliability
The highest ICCs were observed for examiner 1, ranging from 0.66 to 0.84; lower ICCs were found for examiner 2, ranging from 0.17 to 0.68. Except for rotations, all movements had good to excellent reliability; side flexions demonstrated the best ICCs and rotations the lowest. Table 2 shows the ROM obtained and the reliability coefficients (ICCs) for each movement.

Inter-rater reliability
To calculate the inter-rater reliability, we compared the average ROM value for each movement between both examiners. We found a moderate inter-rater reliability for movements in the sagittal (ICCs = 0.48-0.49) and frontal planes (ICCs = 0.40-0.54), but a poor inter-rater reliability in the transverse plane (ICCs = 0.07-0.09). The complete results are presented in Table 3.
We observed moderate validity for the movement of extension (ICC = 0.58; r = 0.56, p = 0.002) and right   Table 4 shows the complete results of the criterion validity (ICCs as well as the Pearson's correlation coefficient).

Discussion
This study is the first to examine the predictive value of two applications of the iPhone, which have the capability to measure cervical ROM, using the CROM as the accepted gold standard. Although a few studies have already examined the validity of digital devices for the measurement of cervical ROM, no previous study has examined the digital inclinometer and/or the compass of the iPhone for this purpose.
Finally, in the transverse plane, we found moderate to good intra-rater reliability (ICCs = 0.66-0.74; 95% CI: 0.39-0.87), while Prushansky et al. [22] observed higher ICCs (ICCs = 0.84-0.92; 95% CI: 0.68-0.96). This might be explained by the fact that they took their measurements with the inclinometer while the subjects were in the supine position, whereas we used the compass rather than the inclinometer of the iPhone. Since the compass is not influenced by gravity, but rather by the orientation of the iPhone, it has more potential sources of error than the inclinometer, which could easily have influenced the intra- and inter-rater ICCs. Furthermore, the magnetometer, which serves as the hardware for the compass application, is more sensitive to the presence of electromagnetic fields, which is another factor that could have contributed to the lower ICCs for the measurements of neck rotation.

Inter-rater reliability
When the ROM measured by two independent examiners was compared, our ICCs were moderate for movements in the sagittal plane (ICCs = 0.48-0.49; 95% CI: 0.14-0.72) and in the frontal plane (ICCs = 0.40-0.54; 95% CI: 0.04-0.75). When we looked closely at our results, we found that examiner 2, who used an iPhone 3GS, always had higher ROM measures than examiner 1, who used an iPhone 4. Considering that Apple uses an LIS302DL accelerometer for both the iPhone 4 and the iPhone 3GS, and that the two generations of iPhone had the same operating system (iOS 4), factors related to the positioning of the iPhone might explain this observation. We also found poor correlation in the transverse plane (ICCs = 0.07-0.09; 95% CI: -0.30 to 0.44), which again might be explained by the presence of electromagnetic fields that could influence the measure. On the other hand, it could also be attributed to the examiner, since examiner 2 showed lower intra-rater reliability.

Validity
Cervical ROM measured with the iPhone presented comparable results (moderate to good validity) when compared to the ROM measured with the CROM for all cervical movements, except for the movement of left rotation (ICC = 0.43). On the basis of this relation, the validity of the iPhone can be considered good for these movements for the same examiner, except for rotation. The poor results observed for the movements of rotation (ICC < 0.60) may partly be explained by the fact that rotation was measured by an application that is very sensitive to electromagnetic fields. This can lessen the accuracy of the measurement. It could also be explained by the movement and/or positioning of the iPhone during the measurement of cervical rotation.
To our knowledge, no study examining the validity of the iPhone for assessing cervical ROM has been published. However, a recent article on the reliability and validity of a relatively inexpensive digital inclinometer reported results that were similar to our findings in the sagittal and frontal planes: a good reliability (ICCs = 0.82-0.94) but lower validity (r = 0.62-0.83). Results were different for the rotation movements: a good reliability (ICCs = 0.84-0.92) and poor validity (results not reported). The better results they obtained for the reliability of rotations might be explained by the fact that rotations were measured in the supine position [22].
Our results show that measures of extension and right rotation had poor inter-rater reliability, which undermined the validity of these measures. This discrepancy may be attributed to the data collection procedures or the placement of the iPhone on the top of the participant's head. Special efforts were made in this study to minimize this type of error, but we suggest that future measurements of rotation movements be done with the iPhone on the top of the forehead while the person is lying supine, as done by Prushansky [22].

Strengths and limitations
First, the two examiners' initial preparation (training) with the CROM represents a strength. The assessment of the examiners' skills showed that they were competent (ICC > 0.65) in the use of the method and the device (E1: ICC = 0.81; 95% CI: 0.56-0.92. E2: ICC = 0.79; 95% CI: 0.52-0.91) (see Table 1 for ICC reference values). For the validity study, we purposely chose examiner 1 in order to minimize the source of error coming from the examiner.
Second, standardization of the procedures also helped minimize random errors. To achieve this, all participants were stabilized in order to avoid compensation. Also, the research assistant always gave the same instructions before each measurement for all participants, and the environment was identical throughout the data collection process: same rooms, same orientation of the participants (facing east), same chairs, etc.
Third, measures taken with the iPhone and the CROM were always taken in the same order. Thus, if the cervical ROM increased with repetitions, the pattern would be the same for all participants and would not influence our results.
Finally, the iPhone measures were always taken before the CROM measure to prevent an information bias. Due to the numerous measurements taken with the iPhone, we considered that it would have been impossible for the examiner to remember all the results and let them influence his or her readings with the CROM. Therefore, we think that this helped minimize an information bias.
This study also had limitations. First, data were collected on a sample of healthy participants, which limits the external validity. Second, although we tried to minimize bias affecting the internal validity, the fact that examiner 1 had higher intra-rater ICCs than examiner 2 might partly explain the modest results for the inter-rater reliability.

Conclusion
Implications of this study relate to the use of the iPhone to measure the cervical ROM in patients without neck dysfunction. The iPhone is a popular device and has good potential for clinical use. This instrument is easy to use and requires minimum training. Moreover, this instrument could allow the examiner (therapist) to obtain valid cervical ROM measurements, which can detect deficits in cervical ROM.
In the current non-probabilistic sample of healthy participants, we found that the iPhone had good intra-rater reliability but lower inter-rater reliability. When compared to a gold standard (CROM), the iPhone showed moderate to good validity for movements in the sagittal and frontal planes, but poor validity for rotation movements. At this stage, we cannot recommend the use of the iPhone to measure cervical range of motion in all directions. Moreover, before using the iPhone as an outcome measure in clinical settings, we should focus on finding a better positioning method for the measurement of cervical rotation and, more importantly, studies should be done on patients presenting with cervical problems.