CATCH-IT Blog: Health Informatics Journal Club

“CATCH-IT Reports” are Critically Appraised Topics in Communication, Health Informatics, and Technology, discussing recently published eHealth research. We hope these reports will draw attention to important work published in journals, provide a platform for discussion around results and methodological issues in eHealth research, and help to develop a framework for evidence-based eHealth. CATCH-IT Reports arise from “journal club”-like sessions founded in February 2003 by Gunther Eysenbach.

Monday, December 14, 2009

Final CATCH-IT Report: Syndromic Surveillance Using Ambulatory Electronic Health Records

Hripcsak G, Soulakis ND, Li L, Morrison FP, Lai AM, Friedman C, Calman NS, Mostashari F. Syndromic surveillance using ambulatory electronic health records. Journal of the American Medical Informatics Association 2009;16(3):354-61. Epub 2009 Mar 4.

Abstract & Blog Comments
Draft
Slideshow - not able to upload

Introduction
Syndromic surveillance is a type of surveillance that uses health-related data to predict or detect disease outbreaks or bioterrorism events. Much of the work in this area of research has been conducted on structured health data (1)(2). However, these systems typically need to be tailored to a particular IT system and institution due to a lack of available data standards. Engineering syndromic surveillance systems in this way is time consuming and yields systems that work only locally. On the other hand, utilizing narrative records for syndromic surveillance brings unique challenges, as these data require natural language processing. Several studies have successfully experimented with this approach (3)(4). With the increasing adoption of electronic health records (EHRs), there is an abundance of clinically relevant data that could potentially be used for syndromic surveillance. As a result, creating a generic syndromic surveillance system that could be broadly applied and disseminated across institutions is attractive. This CATCH-IT report is a critique of a research paper detailing an approach to creating such a system (5).

Objectives
The aim of this study was to develop and assess the performance of a syndromic surveillance system using both structured and narrative ambulatory EHR data. The evaluation methodology suggests that the authors may be trying to assess the system’s performance based on its concurrent validity with other existing surveillance systems.

Hypothesis
Not explicitly stated. Implicitly, the authors appear to expect the signals from the test systems and the ED data to occur at the same time (no lag).

Methodology

Setting
The Institute for Family Health (IFH) served as the data source for testing the surveillance systems. IFH comprises 13 community health centres in New York, all of which use the Epic EHR system.

Query Development
The authors took two different approaches to developing their syndromic surveillance system: a tailored approach on structured data and a generic approach on narrative data. The two syndromes of interest, influenza-like illnesses (ILI) and gastrointestinal infectious diseases (GIID), were defined by two physicians. Both sets of queries were developed based on these definitions. The tailored queries were created specifically for IFH by mapping key terms to available system data and using past influenza season data. The performance of the tailored queries for ILI and GIID was not thoroughly evaluated. The MedLEE natural language processing (NLP) system (6) was used to create the generic queries for narrative data. Generic queries were tested on internal medicine ambulatory notes from the Columbia University Medical Center (CUMC). These queries were evaluated against a gold standard produced by a manual review of a subset of clinical notes, and were then selected based on their ROC performance.
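To make the selection step concrete, below is a minimal sketch of how candidate keyword queries might be ranked against a manually labeled gold standard. This is an illustration only, not the authors' MedLEE pipeline; the queries, notes, and scoring criterion (Youden's J) are assumptions for the example.

```python
# Illustrative sketch only (not the authors' MedLEE pipeline): ranking
# candidate keyword queries against a manually labeled gold standard by a
# simple ROC-style criterion, Youden's J = sensitivity + specificity - 1.
# The queries, notes, and labels are hypothetical.

CANDIDATE_QUERIES = {
    "ili_narrow": {"fever", "cough"},
    "ili_broad": {"fever", "cough", "myalgia", "sore throat"},
}

def query_matches(note_text: str, keywords: set[str]) -> bool:
    """A query 'fires' on a note if any of its keywords appears."""
    text = note_text.lower()
    return any(kw in text for kw in keywords)

def rank_queries(notes: list[str], gold_labels: list[bool]):
    """Score each candidate query against the gold standard and rank by J."""
    scored = []
    for name, keywords in CANDIDATE_QUERIES.items():
        preds = [query_matches(n, keywords) for n in notes]
        tp = sum(p and g for p, g in zip(preds, gold_labels))
        fn = sum(not p and g for p, g in zip(preds, gold_labels))
        fp = sum(p and not g for p, g in zip(preds, gold_labels))
        tn = sum(not p and not g for p, g in zip(preds, gold_labels))
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        scored.append((name, sensitivity, specificity,
                       sensitivity + specificity - 1))
    return sorted(scored, key=lambda row: row[3], reverse=True)
```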

Evaluation
The resulting queries were tested on 2004-2005 data from IFH. All structured notes with a recorded temperature (124,568) and de-identified narrative notes (277,963) were analyzed. The results of the two test systems were compared with two existing sources, the New York City Emergency Department chief complaint syndromic surveillance system (NYC ED) (7) and the New York World Health Organization (WHO) influenza isolates, using the lagged cross-correlation technique. The NYC ED served as the only comparison source for GIID.
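For readers unfamiliar with the comparison technique, here is a minimal sketch of a lagged cross-correlation: one daily count series is shifted against the other and the Pearson correlation is recorded at each lag. The inputs are hypothetical stand-ins for the IFH and comparison signals.

```python
# A minimal sketch of the lagged cross-correlation technique used for the
# comparisons: shift one daily count series against the other and record
# the Pearson correlation at each lag. The inputs are hypothetical stand-ins
# for the IFH and comparison (e.g., NYC ED) daily syndrome counts.
import numpy as np

def lagged_cross_correlation(x: np.ndarray, y: np.ndarray, max_lag: int):
    """Return (lag, r) pairs for integer lags in [-max_lag, max_lag].

    A peak at lag 0 means the two signals rise and fall together; a peak
    at a nonzero lag means one signal leads the other by that many days.
    """
    results = []
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = x[:lag], y[-lag:]
        elif lag > 0:
            a, b = x[lag:], y[:-lag]
        else:
            a, b = x, y
        results.append((lag, np.corrcoef(a, b)[0, 1]))
    return results
```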

Results
The ILI lagged cross-correlations for the IFH structured and narrative signals showed both a strong correlation with the NYC ED data (0.93 and 0.88, respectively) and with one another (0.88). The correlations with the WHO isolates were also high (0.89 structured, 0.84 narrative), although less precise, and produced an asymmetric lagged cross-correlation shape that hindered interpretation of the true lag.

GIID results were more ambiguous. While the IFH structured data correlated relatively well with the NYC ED data (0.81), the IFH narrative data correlated poorly with both the IFH structured data and the NYC ED data (0.51 and 0.47, respectively). This result indicated a particular problem with the generic narrative approach on GIID data. However, across all GIID comparisons there was a lack of precision in the correlations (wide confidence intervals) and a lack of clarity in interpreting the true lag.

Authors’ Conclusions
The authors concluded that the tailored structured EHR system correlated well with validated measures with respect to both syndromes. While the narrative EHR data performed well only on ILI data, the authors believe this approach is feasible and has the potential for broad dissemination.

Methodological Issues & Questions

Query Development
Both sets of queries were based on syndrome definitions created by two domain experts. While the definitions of ILI and GIID are fairly well established, it is not clear why the authors decided to employ expert opinion for their definitions rather than using a standard definition from the CDC or the WHO. Although the definitions used in this study are valid and likely do not represent a major source of error, any contentions on this issue could have easily been avoided.

The bigger methodological issue with respect to the structured query development is the lack of query evaluation. A crude measure of sensitivity was calculated for the ILI query, but no manual review was undertaken to produce measures of specificity or predictive value. Of even more concern, the performance of the GIID query was not assessed at all. There does not appear to be any logical reason for these omissions, and this flaw calls into question the validity of the analyses of the structured query.
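For reference, the measures such a review would yield follow directly from a 2x2 table of query output against manual review. The counts below are hypothetical, purely to illustrate the arithmetic:

$$\text{sensitivity} = \frac{TP}{TP+FN}, \qquad \text{specificity} = \frac{TN}{TN+FP}, \qquad \text{PPV} = \frac{TP}{TP+FP}$$

For instance, with TP = 45, FN = 5, FP = 20, and TN = 930, sensitivity = 45/50 = 0.90, specificity = 930/950 ≈ 0.98, and PPV = 45/65 ≈ 0.69: a query can look excellent on sensitivity alone while still flagging many false positives.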

While the narrative query development was described in more detail than the structured query development, it is not without its flaws. CUMC notes were used for testing in this phase. However, the lack of context surrounding CUMC makes it difficult to determine the robustness and generalizability of the query. For instance, it is not mentioned what EHR system is used, what kind of patient population is seen at this institution, or why ambulatory internal medicine notes were used for query testing. It seems counter-intuitive to use notes from a medical specialty to create a query for primary care ambulatory notes. Additionally, only notes generated by physicians who used the EHR were included, and we are not told how many physicians made up this population.

To their credit, the authors did conduct a manual review of a subset of the notes to produce a gold standard, but this process is not clearly described. It is unknown how many notes were reviewed or why only one reviewer undertook this task. Ideally, at least two reviewers would perform the review and a measure of inter-rater reliability (kappa) would be reported.
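As a pointer to what such reporting involves, here is a minimal sketch of Cohen's kappa for two reviewers assigning the same binary label to the same subset of notes; the label lists are hypothetical.

```python
# A minimal sketch of the suggested inter-rater reliability measure:
# Cohen's kappa for two reviewers assigning the same binary label
# (syndrome present / absent) to the same subset of notes. The label
# lists are hypothetical.
def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion labeled positive by reviewer A
    p_b = sum(rater_b) / n  # proportion labeled positive by reviewer B
    # Agreement expected by chance if the two reviewers labeled independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```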

Comparison Data
A large part of this study’s value rests on the comparisons made between the test IFH systems and the established systems (NYC ED and WHO). However, the fundamental question is whether these comparisons are valid and appropriate, as the patient populations and geographic coverage may differ greatly.

The NYC ED system utilizes chief complaints, which are timely and produce good agreement for respiratory and GI illness (7). However, patients with GIID or mild respiratory symptoms may not go to the ED. This limitation has come to light as the system has failed in the past to detect GI outbreaks (7), indicating that the NYC ED data may not represent a gold standard. Additionally, the authors of the current study propose that the poor performance of the GIID narrative query may be due to the fact that the NYC ED covers a much broader geographical area. The implication is that GIID outbreaks are often localized and therefore may not be captured by local IFH community clinics. These distinctions between the NYC ED and IFH may account for some of the ambiguous results obtained and raise doubts about the decision to use the NYC ED as a comparison. However, it is likely that no alternative comparison source exists and that the NYC ED therefore represented the best available data source.

While the hypothesis is not explicitly stated, it appears that the authors would expect that their surveillance system would produce signals concurrent with the emergency signals. However, this assumption may not be valid as there is nothing to suggest that primary care signals would behave in this manner.

WHO isolates are used as the second ILI comparison in this study but are not described in any noteworthy detail. This hinders the reader in understanding the appropriateness of this source and thereby the meaningfulness of the cross-correlation results. A brief internet search revealed that the WHO has a National Influenza Centre (NIC) in New York which is part of the larger WHO Global Influenza Surveillance Network. The NIC samples patients with ILI and submits their biomedical isolates to the WHO for analysis; the WHO in turn uses the information for pandemic planning. The amount of testing these centres conduct depends on the phase of the flu season, with more testing occurring at the start of the season in order to confirm influenza and less at the peak of the season due to practicalities. Because the WHO NIC is very different in motivation, operation, and scale from the NYC ED and IFH syndromic surveillance systems, its appropriateness as a comparison is questionable. In their study of the NYC ED, Heffernan et al. (2004) include the WHO isolates as a visual reference to provide context for their own signals but do not use them as a correlation metric. Given the aforementioned reasons, this may be the most appropriate way to use the WHO isolates.

Syndrome Keywords
The last area of discussion concerns the keywords used to define the syndromes. A breakdown of the keywords used in each study found that only three terms (fever, viral syndrome, and bronchitis) were shared between the NYC ED and IFH queries. Interestingly, four terms used in the IFH queries (cold, congestion of the nose, sneezing, and sniffles) were exclusion criteria for ILI in the NYC ED. If the queries used across the studies are dissimilar, they may not be identifying the same patients, which would raise further concerns about the meaningfulness of the cross-correlation results.

Surprisingly, the GIID keywords for both NYC ED and IFH were similar. This complicates the interpretation of the GIID results, but may suggest that the issue here is with the geographical range and patient population.

Discussion
Due to the methodological issues discussed above, it is difficult to determine the validity and salience of the conclusions reached by the authors. As this project was exploratory in nature, it would be beneficial for the authors to step back, carefully review their queries, and revise them after a thorough evaluation of each query’s sensitivity, specificity, and predictive value. The most pressing priority should be to establish internal consistency and reliability between the two IFH systems before comparing their performance with external measures that may or may not be appropriately matched.

Overall, while the study operated under practical limitations in its choice of comparison data sources, the authors presented an interesting idea that will likely be of use in the future and that should be developed and evaluated more carefully.

Q’s for authors

1) How were abbreviations, misspellings, acronyms, and localized language dealt with in the narrative query development?
2) Why were internal medicine notes chosen for the IFH system if one is trying to create a general approach?
3) Can you please provide more detail about the WHO isolates and why they were chosen as comparison data?

References

1) Cochrane DG, Allegra JR, Chen JH, Chang HG. Investigating syndromic peaks using remotely available electronic medical records. Adv Dis Surveill 2007;4:48.
2) Thompson MW. Correlation between alerts generated from electronic medical record (EMR) data sources and traditional sources. Adv Dis Surveill 2007;4:268.
3) South BR, Gundlapalli AV, Phansalkar S, et al. Automated detection of GI syndrome using structured and non-structured data from the VA EMR. Adv Dis Surveill 2007;4:62.
4) Chapman WW, Dowling JN, Wagner MM. Fever detection from free-text clinical records for biosurveillance. J Biomed Inform 2004 Apr;37(2):120-7.
5) Hripcsak G, Soulakis ND, Li L, Morrison FP, Lai AM, Friedman C, Calman NS, Mostashari F. Syndromic surveillance using ambulatory electronic health records. J Am Med Inform Assoc 2009;16(3):354-61. Epub 2009 Mar 4.
6) Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004;11(5):392-402.
7) Heffernan R, Mostashari F, Das D, Karpati A, Kulldorff M, Weiss D. Syndromic surveillance in public health practice, New York City. Emerg Infect Dis 2004;10(5):858-864.

Monday, December 7, 2009

CATCH-IT Final Report: Web-based weight loss in primary care: A RCT

Paper: Bennett GG, Herring SJ, Puleo E, Stein EK, Emmons KM and Gillman MW. Web-based Weight Loss in Primary Care: A Randomized Controlled Trial. Obesity (2009) Advance online publication, 20 August 2009. DOI:10.1038/oby.2009.242

Abstract: click here
Slide presentation: click here
Draft report: click here

Introduction
The purpose of this paper is to review the study by Bennett et al.(1) on their web-based behavior modification intervention for weight loss. With rising obesity rates around the world,(2) there is a need for weight loss interventions that are accessible to a larger number of individuals. Behavior therapy can significantly enhance comprehensive weight loss strategies,(3) but access to lifestyle interventions is limited by the costs and availability of counseling services. The authors present a web-based tool with the potential for wide-scale implementation at low cost.

Objective
The objective of the study was to evaluate the short-term (12-week) efficacy of a web-based intervention in primary care patients with obesity (BMI 30 to 40 kg/m2) and hypertension.

Methods
A total of 101 obese, hypertensive patients were randomized to receive either the web-based intervention (n=51) or usual care (n=50). Intervention participants had access to the comprehensive weight loss website for 3 months, and four counseling sessions (two in-person sessions and two telephone sessions). Counseling was provided by a health coach (registered dietician) trained to use principles of motivational interviewing. The health coach provided counseling on “obesogenic” behavior goals (determined at the start of the intervention). Participants could select new goals at week 6. The primary purpose of the website was to facilitate daily self-monitoring of adherence to behavior change goals.

Participants in the usual care group received the standard care offered by the outpatient clinic. They were also given a copy of the “Aim for a Healthy Weight” document published by the National Heart Lung and Blood Institute.(4)

At baseline and at the 3-month follow-up, participants completed a web-based survey, followed by anthropometric and blood pressure assessments. Participants were offered $25 for attending each assessment.

Results
Primary outcome: Greater weight loss was reported in the intervention group (-2.28 +/- 3.21 kg) than in the usual care group (+0.28 +/- 1.87 kg); mean difference -2.56 kg (95% CI -3.60, -1.53). Intervention participants lost a greater percentage of baseline body weight (-2.6% +/- 3.3%) than usual care participants (+0.39% +/- 2.16%); mean difference -3.04% (95% CI -4.26, -1.83). About a quarter of intervention participants (25.6%) lost >5% of their initial body weight at 12 weeks. None of the usual care participants lost >5% of body weight in the study period.

Secondary outcomes: A reduction in BMI was observed in the intervention group (-0.94 +/- 1.16 kg/m2) compared to an increase in BMI in the usual care group (+0.13 +/- 0.75 kg/m2); mean difference -1.07 kg/m2 (95% CI -1.49, -0.64). No statistically significant differences were found for waist circumference, systolic blood pressure, or diastolic blood pressure.
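As a quick consistency check (and the basis for reading the BMI mean difference as -1.07 rather than -0.07), the between-group mean difference is the intervention group's mean change minus the usual care group's mean change:

$$-2.28 - 0.28 = -2.56\ \text{kg}, \qquad -0.94 - 0.13 = -1.07\ \text{kg/m}^2$$

Both values sit at the centre of their reported 95% confidence intervals.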

Participants meeting the login goal (3 times per week) for at least 6 weeks had greater weight loss (-3.30 +/- 3.78 kg) than those who met the login goal for less than 6 weeks (-0.42 +/- 1.78 kg); mean difference -2.88 kg (95% CI -4.60, -1.56). Those who met the login goal for 10 weeks (83% of study weeks) demonstrated much greater weight loss (-4.50 +/- 3.29 kg) than those who did not (-0.60 +/- 1.87 kg); mean difference -3.90 kg (95% CI -5.36, -2.43).

No association was found between participation in four coaching sessions and weight loss.

Discussion
Bennett et al. reported short-term weight loss of -2.28 +/- 3.21 kg using a web-based behavioral intervention. While this result is statistically significant, the small amount of weight loss is of limited clinical significance. Obesity guidelines suggest losing 10% of initial body weight (e.g., 10 kg for a 100 kg person) for clinical benefits.(5,6)

The weight loss reported in the study was somewhat lower than that reported for other internet-based weight loss interventions.(7) The authors suggested that dietary restrictions would be necessary to achieve results of larger magnitude. However, the rationale for treating obese patients with behavior therapy alone is not clear. Canadian obesity guidelines recommend diet and physical activity as the first-line treatment of obesity; behavior interventions are considered an adjunct to other interventions.(3)

The relatively short study period was also somewhat unusual, since long-term weight loss is the main challenge in obesity. The trial period of 12 weeks is much shorter than the study periods reported by other studies.(6)

According to the authors, the intervention was developed to overcome the challenge of long-term adherence, which often wanes over time. Unfortunately, adherence to the web-based intervention also decreased over the study period, with 78% of participants meeting the login goal at week 1 versus 43.1% at week 12.

The authors suggested that the web-based intervention can be implemented without a health coach. However, success without coach support may be lower, since contact from a research assistant (here, the health coach) can act as a “push” factor that lowers attrition rates.(8)

Some of the references cited in the paper do not support the text. For example, on page 2 the paper states that “research staff subsequently collected anthropometric measures and blood pressure using established procedures (reference 20).” However, reference 20 refers to a 24-page NHANES food questionnaire.(9)

Some of the numbers reported in the paper need revisiting. For example, Table 1 (page 2) indicates that intervention participants (n=51) had a higher body weight (101.0 +/- 15.4 kg) than usual care participants (n=50; 97.3 +/- 10.9 kg), yet the body weight of all participants (n=101) was also reported as 97.3 +/- 10.9 kg.

Despite all the weaknesses noted above, the study was well designed. The paper included most items on the CONSORT(10,11) and STARE-HI(12) checklists. Although the authors did not explicitly state that this was a pilot study, it appears that they have already begun other trials using their iOTA approach (see website).

The web intervention included many interesting features, such as the display of the “average performance for other program participants,” regular updates to the behavioral skills needed to adhere to the obesogenic behavior change goals, a social networking forum, and recipes.

The obesogenic behavior change goals are very short and simple ("Walk 10,000 steps every day," "Watch 2 h or less of TV every day," "Avoid sugar-sweetened beverages," "Avoid fast food," "Eat breakfast every day," and "No late night meals and snacks"). Christensen (13) suggests that “shorter interventions” could be the primary role of the internet in disease prevention, instead of the delivery of lengthy therapy that requires hours of online work.

In terms of future research, it would be interesting to compare the web based intervention with other weight loss interventions. Although comparison with “usual care” may be common practice in randomized controlled trials, a “head-to-head” trial with an alternative intervention may contribute more knowledge. For instance, a comparison of the web based intervention with a paper based monitoring tool could be very informative, since it would allow analysis of web enabled features.

Questions
1. In the usual care group, what was standard care? How many visits did the usual care group make to the primary care provider for weight reduction?
2. Of the 124 ineligible participants, what were the reasons for ineligibility?
3. What was the web based survey completed by participants at baseline and at 3 months follow-up? Was the NHANES food questionnaire used as the web based survey? What were the results of the survey?
4. Table 3 (page 4) excluded the one participant who did not login once. Given that the range of logins from week 1 to week 12 includes “0”, why was this data omitted?
5. Intervention participants received “two 20-min motivational coaching sessions in person (baseline and week 6), and two, 20-min biweekly sessions via telephone (week 3 and 9)”. Were there biweekly telephone coaching sessions in addition to the two telephone sessions at week 3 and week 9? What was the impact of the “message feature that allowed for direct communication with the coach”? Was there extended access to the health coach beyond the four sessions throughout the 12 week study period?
6. With regards to the web-based intervention, what was being tracked by the web-based intervention (number of times eat out, number of stairs walked)? How many minutes on average did each session take? What behaviour skills were presented on the website and updated biweekly? What was the impact of the social networking forum?

References
1. Bennett GG, Herring SJ, Puleo E, Stein EK, Emmons KM and Gillman MW. Web-based Weight Loss in Primary Care: A Randomized Controlled Trial. Obesity (2009) Advance online publication, 20 August 2009. DOI:10.1038/oby.2009.242
2. OECD Health Data 2009: How Does Canada Compare. http://www.oecd.org/dataoecd/46/33/38979719.pdf
3. Lau DCW, Douketis JD, Morrison KM, Hramiak IM, Sharma AM, Canadian clinical practice guidelines on the management and prevention of obesity in adults and children. CMAJ 2007;176(8 suppl):Online-1–117 www.cmaj.ca/cgi/content/full/176/8/S1/DC1
4. National Heart, Lung, and Blood Institute, National Institutes of Health. Aim for a Healthy Weight. Washington DC: US Department of Health and Human Services, 2005. http://www.nhlbi.nih.gov/health/public/heart/obesity/aim_hwt.pdf
5. Wadden TA, Butryn ML, Wilson C. Lifestyle modification for the management of obesity. Gastroenterology. 2007 May;132(6):2226-38.
6. Sarwer DB, von Sydow Green A, Vetter ML, Wadden TA. Behavior therapy for obesity: where are we now? Curr Opin Endocrinol Diabetes Obes. 2009 Oct;16(5):347-52. DOI: 10.1097/MED.0b013e32832f5a79
7. Neve M, Morgan PJ, Jones PR, Collins CE. Effectiveness of web-based interventions in achieving weight loss and weight loss maintenance in overweight and obese adults: a systematic review with meta-analysis. Obesity Reviews 2009 Sep 14. DOI: 10.1111/j.1467-789X.2009.00646.x
8. Eysenbach G. The Law of Attrition. J Med Internet Res 2005;7(1):e11. doi:10.2196/jmir.7.1.e11. http://www.jmir.org/2005/1/e11/v7e11
9. National Heart, Lung, and Blood Institute, National Institutes of Health. Aim for a Healthy Weight. Washington DC: US Department of Health and Human Services, 2005. http://www.nhlbi.nih.gov/health/public/heart/obesity/aim_hwt.pdf
10. Moher D, Schulz KF, Altman DG for the CONSORT Group. The CONSORT Statement: Revised Recommendations for Improving the Quality of Reports of Parallel-Group Randomized Trials. Ann Intern Med. 2001;134:657-662. (http://www.consort-statement.org)
11. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gøtzsche PC, Lang T for the CONSORT Group. The Revised CONSORT Statement for Reporting Randomized Trials: Explanation and Elaboration. Ann Intern Med. 2001;134:663-694. (http://www.consort-statement.org)
12. Talmon J, Ammenwerth E, Brender J, de Keizer N, Nykänen P, Rigby M. STARE-HI--Statement on reporting of evaluation studies in Health Informatics. Int J Med Inform. 2009 Jan;78(1):1-9.
13. Christensen H, Mackinnon A. The Law of Attrition Revisited. J Med Internet Res 2006;8(3):e20. doi:10.2196/jmir.8.3.e20. http://www.jmir.org/2006/3/e20/

Friday, December 4, 2009

(Final) CATCH-IT Report: Effectiveness of Active-Online, An Individually Tailored Physical Activity Intervention, in a Real-life Setting: RCT

Wanner M., Martin-Diener E., Braun-Fahrländer C., Bauer G., Martin B.W. (2009). Effectiveness of Active-Online, an Individually Tailored Physical Activity Intervention, in a Real-Life Setting: Randomized Controlled Trial. J Med Internet Res, 11 (3): e23.


Original Post - Abstract Only - Full Text - Slideshow - Draft Report

Introduction

This report is a summary and analysis of the study conducted by Wanner et al. (2009) entitled Effectiveness of Active-Online, an Individually Tailored Physical Activity Intervention, in a Real-Life Setting: Randomized Controlled Trial. The focus of the study was a web-based physical activity intervention called Active-Online, which provides users with customized advice on increasing their physical activity levels. The study compared the effectiveness of Active-Online to a non-tailored website in changing physical activity behaviour when delivered in a real-life setting.

The authors found that the tailored web-based intervention was not more effective than a non-tailored website when deployed in an uncontrolled setting.


Objective

The study aimed to answer the following three questions:

  1. What is the effectiveness of Active-Online, compared to a non-tailored website, in increasing self-reported and objectively measured physical activity levels in the general population when delivered in a real-life setting?
  2. Do participants of the randomized study differ from spontaneous users of Active-Online, and how does effectiveness differ among these groups?
  3. What is the impact of frequency and duration of use of Active-Online on changes in physical activity behaviour?


Methods

A randomized controlled trial (RCT) was used to answer the questions posed by the authors. Three groups of participants were observed during the trial: the control group (CG), the intervention group (IG), and the spontaneous users (SU) group. CG and IG participants were recruited through media advertisements and randomized into their respective groups using a computer-based random number generator. Participants in the SU group were recruited directly from the Active-Online website, being redirected to the study website if they chose to participate in the study. A sub-group of participants volunteered to wear an accelerometer so that their physical activity levels could be objectively measured during the study.
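As an illustration of the assignment step, here is a minimal sketch (not the authors' actual procedure) of randomizing recruited participants to CG or IG with a seeded pseudo-random number generator:

```python
# A minimal sketch (not the authors' actual procedure) of computer-based
# randomization of recruited participants into the control group (CG) and
# intervention group (IG); the participant IDs and seed are hypothetical.
import random

def randomize(participant_ids: list[str], seed: int = 2009) -> dict[str, str]:
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    return {pid: rng.choice(["CG", "IG"]) for pid in participant_ids}

# Example: randomize(["p001", "p002", "p003"]) -> {"p001": "IG", ...}
```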

Participants in the IG and SU groups visited the Active-Online website and answered diagnostic questions about their physical activity behaviour to receive customized feedback on how to improve it. Those in the CG visited a static website with generic tips on physical activity and health.

All groups were followed up via email at 6 weeks (FU1), 6 months (FU2) and 13 months (FU3) after the baseline assessment. There was no face-to-face component in the study.


Measures

Three types of data were collected in the study:

  1. Self-reported subjective measurements of physical activity levels obtained through follow-up questionnaires presented to all groups
  2. Objective measurements obtained from accelerometers worn by the subgroup
  3. Frequency and duration of visits to Active-Online, obtained from the Active-Online user database, which recorded each log-in to the website (a sketch of this derivation follows the list)
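Below is a minimal sketch of how the third measure might be derived from such a log-in table; the record layout (user ID plus login and logout timestamps) is an assumption for illustration.

```python
# An illustrative sketch of deriving the third measure from a log-in table:
# frequency (number of visits) and total duration of use per participant.
# The record layout (user ID, login time, logout time) is an assumption.
from collections import defaultdict
from datetime import datetime

def usage_summary(sessions: list[tuple[str, datetime, datetime]]):
    """sessions: one (user_id, login_time, logout_time) tuple per visit."""
    visits: dict[str, int] = defaultdict(int)
    minutes: dict[str, float] = defaultdict(float)
    for user_id, login, logout in sessions:
        visits[user_id] += 1
        minutes[user_id] += (logout - login).total_seconds() / 60
    return {u: {"visits": visits[u], "minutes": round(minutes[u], 1)}
            for u in visits}
```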


Results

There was a significant increase in subjectively measured levels of physical activity in all groups from baseline to FU3, but no significant differences between the randomized groups. However, the differences were more pronounced in the SU group. As for the objective measurements of physical activity obtained from accelerometer readings, there was no increase from baseline values in any of the groups. Measurements of frequency and duration of use of Active-Online showed an increase in self-reported total minutes of physical activity with increasing duration of use. However, this result was no longer significant when adjusted for stages of behaviour change (a concept based on the seven-stage behaviour change model described by Martin-Diener et al. (2004)).


Limitations

The inclusion of SU as an additional study arm may be seen by some readers as an interesting exploratory endeavour. However, others may find that it detracts from the clarity of the study. The fact that the SU group was not randomized, was not homogeneous with the two other groups, and represented only 7.4% of all visitors to Active-Online may cause some readers to wonder why it was included in the study at all. Moreover, the measurements obtained from this group were explicitly discounted in the “Discussion” section of the paper. It is suggested that the authors alert readers to the exploratory nature of the SU group early in the “Methods” section, when this group is first introduced. Readers would thus be aware that the SU group does not count towards the results of the study and was included only to add another dimension of interest.

It is not clear from the report whether the authors had a set of eligibility criteria for participants, although, this being a web-based study with no face-to-face component, it would have been difficult to enforce any eligibility criteria at all. Furthermore, it is not known from the report how the authors ensured the uniqueness of participants. Participants were identified using unique email addresses, but a single participant may well have registered for the study multiple times using several different email addresses. This could seriously affect the study results if the same user was assigned to more than one group as a result of using multiple email addresses.

In addition to the eligibility criteria, it is recommended that the authors provide a sample of the advertisement used to recruit participants and the questionnaire used at each follow-up, to comply with the CONSORT standards for reporting RCTs (CONSORT, 2009).

One other limitation of the study is that most participants already had high levels of physical activity at baseline, leading to a ceiling effect. Because participants were drawn from the upper end of the activity distribution, the observed changes are also consistent with regression towards the mean population activity level.
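A small simulation (invented numbers, not from the paper) illustrates this point: when a noisy measure selects already active participants at baseline, their follow-up scores drift back toward the population mean even with no real change.

```python
# A small simulation (invented numbers, not from the paper) of regression
# toward the mean: participants selected for high *observed* baseline
# activity show lower scores on noisy re-measurement even with no change.
import random

random.seed(0)
POP_MEAN, TRUE_SD, NOISE_SD = 150.0, 40.0, 30.0  # weekly active minutes

true_levels = [random.gauss(POP_MEAN, TRUE_SD) for _ in range(10_000)]
baseline = [t + random.gauss(0, NOISE_SD) for t in true_levels]
followup = [t + random.gauss(0, NOISE_SD) for t in true_levels]

# Keep only participants who looked "already active" at baseline.
active = [(b, f) for b, f in zip(baseline, followup) if b > POP_MEAN]
mean_base = sum(b for b, _ in active) / len(active)
mean_follow = sum(f for _, f in active) / len(active)
print(f"baseline mean {mean_base:.0f}, follow-up mean {mean_follow:.0f}")
# The follow-up mean drifts back toward POP_MEAN although nothing changed.
```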


Discussion

Overall, the study was very well presented, with sufficient background information, clear writing, and appropriate use of tables and figures. The authors took a bold step in conducting a web-based intervention in an uncontrolled setting over a very long period of time. It is commendable that the authors frankly reported the limited effectiveness of their intervention when some researchers may have hesitated to do so. They did not go beyond their evidence to draw conclusions.

The study clearly answered the three questions set forth in the objective:

  1. The study found significant increase in physical activity levels between baseline and last follow-up (FU3) in all groups; however, there was no difference in the results between the randomized groups.
  2. Spontaneous users differed from randomized users in baseline characteristics, and also showed a significant increase in physical activity levels after using the intervention, compared to the randomized groups.
  3. The impact of frequency and duration of use of Active-Online on changes in physical activity levels of participants is not clear after the study.

The results from this study resonate with those of similar studies investigating web-based physical activity interventions (Spittaels et al., 2007). The study adds to the existing evidence that the effectiveness of a web-based physical activity intervention may be difficult to demonstrate when delivered in an uncontrolled setting. It highlights some of the key issues pertaining to web-based studies in real-life settings, including attrition and contamination of the control group. High attrition rates have been recognized as a common problem in Internet-based studies (Eysenbach, 2005) and this was evident in the present study as well. It was also acknowledged by the authors that some members of the control group were familiar with and had used Active-Online at least once during the course of the study. This may have caused a bias towards the null.

Results from this study will be particularly useful for researchers in the field of healthcare and sports medicine. Further research could include the delivery of web-based physical activity interventions within wider health promotion contexts such as primary care or workplace settings.


Questions to the Authors

  1. How do you account for the contamination of CG in Internet-based studies such as this one?
  2. Could members of CG have accessed Active-Online as SU (using a different email address)?
  3. What were the technical difficulties causing 38 participants to be omitted from the study?
  4. How did you validate the uniqueness of the participants?
  5. What was the reason for not measuring the usage of the non-tailored website?


References

CONSORT. (2009). The CONSORT Group. Retrieved November 14, 2009, from The CONSORT Group: http://www.consort-statement.org/

Eysenbach, G. (2005). The Law of Attrition. Journal of Medical Internet Research, 7(1), e11.

Martin-Diener, E., Thuring, N., Melges, T., & Martin, B. (2004). The Stages of Change in three stage concepts and two modes of physical activity: a comparison of stage distributions and practical implications. Health Education Research, 19(4), 406-417.

Spittaels, H., De Bourdeaudhuij, I., & Vandelanotte, C. (2007). Evaluation of a website-delivered computer-tailored intervention for increasing physical activity in the general population. Preventive Medicine, 44(3), 209-217.

Spittaels, H., De Bourdeaudhuij, I., Brug, J., & Vandelanotte, C. (2007). Effectiveness of an online computer-tailored physical activity intervention in a real-life setting. Health Education Research, 22(3), 385-396.

Wanner, M., Martin-Diener, E., Braun-Fahrländer, C., Bauer, G., & Martin, B. (2009). Effectiveness of Active-Online, an individually tailored physical activity intervention, in a real-life setting: randomized controlled trial. Journal of Medical Internet Research, 11(3), e23.

Saturday, November 28, 2009

CATCH-IT Final Report: 'Acceptability of a Personally Controlled Health Record in a Community-Based Setting: Implications for Policy and Design'

Weitzman ER, Kaci L, Mandl KD. Acceptability of a Personally Controlled Health Record in a Community-Based Setting: Implications for Policy and Design. J.Med.Internet Res. 2009 Apr 29;11(2):e14.

doi:10.2196/jmir.1187

CATCH-IT Draft Report - Abstract - Full Text - Slideshow

Keywords
Medical records; medical records systems, computerized; personally controlled health records (PCHR); personal health records; electronic health record; human factors; research design; user-centric design; public health informatics

Introduction
The field of consumer health informatics (CHI) stems from an ideology of empowering consumers, whereby CHI innovations have the potential to support the knowing participation of patients/consumers in healthcare practices (1),(2). The Personally Controlled Health Record (PCHR), a web-based collection of a patient’s medical history, can be considered one such innovation in the field of CHI and is anticipated to result in better self-care, reduced errors, and improved health (3). To this end, the paper by Weitzman et al., Acceptability of a Personally Controlled Health Record: Implications for Policy and Design (4), is the first published report studying an electronic platform that puts users in control of the personal health information in an electronic medical record to which they are subscribed. The paper provides an account of the pre-deployment assessment, usability and pilot testing, and full demonstration of a typical PCHR, using the Dossia Consortium’s Indivo PCHR as the medium of analysis.

The authors of this paper are pioneers in the field of public and consumer health informatics (5), are affiliated with the Harvard School of Public Health, and have published numerous papers in peer-reviewed journals. Although this is the first of their papers addressing the PCHR in a community-based setting, it is expected to help determine the changes that need to be made and the policies that must be put in place before widespread adoption of PCHRs in wider settings (6).

Objectives
The purpose of this study was to understand the acceptability, early impacts, policy, and design requirements of PCHRs in a community-based setting.

Weitzman and colleagues (7) describe the intervention as a citizen- or patient-centered health record system that interoperates with, but is not tethered to, a provider system, representing a fundamental change from current approaches to health information management.

Indivo is an open-source PCHR platform that has served as a model for the proliferation of the PCHR movement, as cited in earlier papers by Kenneth Mandl (3),(8). This study aimed to explore beliefs about and reactions to the Indivo PCHR. The paper therefore had two sets of objectives: a primary aim of learning about the acceptability of PCHRs and the barriers and facilitators to their adoption, and a secondary aim of identifying policy and design issues before refining the system for deployment.

Methods
The research was a formative evaluation of beliefs, attitudes, and preferences involving observational assessments over a two-year period from May/June 2006 to April 2008. Study participants were affiliated with a university health maintenance organization in a community-based setting. More than 300 participants took part in the study across three phases: (i) pre-deployment (n=20 clinicians, administrators, and institutional stakeholders; ages 35-60 years), (ii) usability testing (n=12 community members) and pilot demonstration (n=40 community members; ages 25-65 years), and (iii) full demonstration (n=250 participants; ages 18-83 years; 81 email communications). No description was provided of how the participants were recruited.

The abstract mentions a shared rubric of a priori defined major constructs, which may be interpreted as a framework analysis approach to the qualitative study. Whether it actually followed framework analysis methodology is not clearly stated. Also, no reference is cited for the formal usability protocol.

In terms of the strength of the analyses, the analytic approach section describes a process of independent review of email communications by individual reviewers, which was a positive aspect. Moreover, the triangulation of data across time, space, and persons (9) at different stages of the formative evaluation adds to the credibility, reliability, and validity of the results of this qualitative study.

However, a concern remains regarding the predefined themes: awareness of PHRs/PCHRs, privacy of personal health information, and autonomy. One may infer that the themes were formed as part of an earlier, related study and that the same set was applied in the context of this qualitative analysis. How the major constructs emerged and how they were operationally defined is unclear and needs concrete explanation to make the methodology section transparent.

Results
Of the three preset themes, the level of awareness of PCHRs was found to be low, paired with high levels of expectation and autonomy. Moderate levels of privacy concern were identified, with younger users possessing a limited understanding of the harmful consequences of sharing information on the web. Additionally, barriers and facilitators to the adoption of a PCHR appeared to exist at the institutional, interpersonal, and individual levels.

Judged against the STARE-HI (10) recommendations, the results section lacked specific information on outcome data, such as unexpected events during the study and unexpected observations, if any.

Some very interesting topics such as literacy issues, guidelines on risk and safety mechanisms, creation of family-focused health record, and protocols and models in human subject participation in PCHR evaluations were discussed in the implications for policy and practice.

Limitations
In qualitative research, much of the analysis begins during data collection (11). In this case, the researchers already had a priori themes, but it is not clear where the themes originated or what the three major codes are based upon. Moreover, in the presence of preset themes, the analysis does not discuss at what point the researchers reached data saturation. An appendix providing background reference pieces would help provide context to readers.

The sampling technique used to recruit participants was not stated in the paper. Although the fact that participants were already familiar with and trained in the health field was not a major concern, stating the inclusion and exclusion criteria would have further clarified the study methodology.

There was no mention of medico-legal concerns; the question of who owns the records was not raised by the clinicians interviewed. Also, in light of privacy issues and Indivo’s open-source nature, it would be relevant to consider the long-term implications of Indivo’s relationship with the Dossia consortium.

Discussion
According to Pope et al. (11), the framework approach is recommended for policy-relevant qualitative research. Since having a priori defined major codes implies that the study reflected characteristics of a framework analysis, a clear statement of the approach used is crucial.
This study was a labor-intensive and time-consuming undertaking requiring trained and, crucially, experienced researchers (12), as reflected in Table 1. However, providing a figure or table with an overview of the three research activities, summarizing key information such as the activity, the number and characteristics of participants, and the questions/instructions given to them, would enable readers to assess the study’s merit while providing transparency and facilitating future research in the field.

Since the area of innovation studied is relatively unexplored compared with PHRs in general, the qualitative approach was a particularly useful means of analyzing and exploring stakeholder reactions on which to base policy decisions (13).

Despite its limitations, the results and discussion were well defined and reported, but the methodology appeared to lack sufficient detail to justify the findings. Overall, the study concept was well conceived, apart from a few transition gaps between the methodology and results sections that disrupt a smooth reading. It would be interesting to compare the results with follow-up studies in different settings.

Questions for Authors
· How were the participants recruited? How many participated in the focus groups and the one-to-one interviews? Was participant cross-over permitted; for example, if someone took part in a focus group, could they also have participated in usability testing?
· What qualitative approach (ethnography, grounded theory, framework analysis, or any other method) was used?
· How was the coding performed?
· Did the study make use of any computer package to analyze data?
· What potential biases were faced and how were they eliminated?
· How was the sharing of passwords and account information by participants dealt with in the analysis? What level of privacy education was provided to the participants? What consent choices were offered?
· How many users are represented in the email communications? Did the other users have any technical or other issues?
· Did the IRB approval state how the research data were to be disposed of?
· Why did participants include non-clinical information? Was there a visible privacy policy on the PCHR application?
· Was individual record encryption applied in both directions, between patient and PCHR?

Acknowledgements
The author would like to thank the members of the CATCH-IT Journal Club from the HAD 5726 course 'Design and Evaluation in eHealth Innovation and Information Management', and the course instructors Dr. Gunther Eysenbach and Ms. Nancy Martin-Ronson of the Department of Health Policy, Management and Evaluation (HPME), University of Toronto, for their insightful comments and constructive criticism in the creation of this report.

Multimedia appendix
http://www.slideshare.net/secret/NU2Rtnj2IU9FMd

References
(1) Eysenbach G. Consumer health informatics. BMJ 2000 Jun 24;320(7251):1713-1716.
(2) Lewis D, Eysenbach G, Kukafka R, Stavri PZ, Jimison H. Consumer Health Informatics- Informing Consumers and Improving Health Care. New York: Springer; 2005.
(3) Mandl KD, Simons WW, Crawford WC, Abbett JM. Indivo: a personally controlled health record for health information exchange and communication. BMC Med.Inform.Decis.Mak. 2007 Sep 12;7:25.
(4) Weitzman ER, Kaci L, Mandl KD. Acceptability of a personally controlled health record in a community-based setting: implications for policy and design. J.Med.Internet Res. 2009 Apr 29;11(2):e14.
(5) Harvard Medical School. Personally Controlled Health Record Infrastructure. 2007; Available at: http://www.pchri.org/2007/pages/organizers. Accessed October 31, 2009.
(6) Children's Hospital Boston. Available at: http://www.childrenshospital.org/newsroom/Site1339/mainpageS1339P1sublevel526.html. Accessed October 31, 2009.
(7) Weitzman ER, Kaci L, Mandl KD. Acceptability of a personally controlled health record in a community-based setting: implications for policy and design. J.Med.Internet Res. 2009 Apr 29;11(2):e14.
(8) Mandl KD, Kohane IS. Tectonic shifts in the health information economy. N.Engl.J.Med. 2008 Apr 17;358(16):1732-1737.
(9) Denzin N, editor. Sociological Methods: A Sourcebook. 5th ed. Aldine Transaction; 2006.
(10) Talmon J, Ammenwerth E, Brender J, de Keizer N, Nykanen P, Rigby M. STARE-HI -Statement on Reporting of Evaluation Studies in Health Informatics. Yearb.Med.Inform. 2009:23-31.
(11) Pope C, Ziebland S, Mays N. Qualitative research in health care: Analysing qualitative data. BMJ 2000 Jan 8;320:114-116.
(12) Dingwall R, Murphy E, Watson P, Greatbatch D, Parker S. Catching goldfish: quality in qualitative research. J.Health Serv.Res.Policy 1998 Jul;3(3):167-172.
(13) Woolf K. Surely this can't be proper research? Experiences of a novice qualitative researcher. The Clinical Teacher 2006 Mar;3(1):19-22.

Monday, November 23, 2009

Final CATCH-IT Report: Clinical Decision Support capabilities of Commercially-available Clinical Information Systems

Wright A, Sittig DF, Ash JS, Sharma S, Pang JE, Middleton B. (2009). Clinical Decision Support capabilities of Commercially-available Clinical Information Systems. Journal of the American Medical Informatics Association, 16(5), 637-644.

Links: Abstract, Full Text, Presentation, Draft Report

Introduction

Clinical information systems (CISs) are producing and maintaining an increasing amount of health information that can be used for clinical decision support (CDS) (1). Recent studies have reported that CDS applications built in-house produce the best results (2). However, little research has examined the CDS capabilities of commercially available CISs (2). The paper by Wright et al. (2), “Clinical Decision Support Capabilities of Commercially-available Clinical Information Systems”, aims to fill this gap by evaluating the CDS capabilities of 9 commercially available, CCHIT-certified EHR systems against a 42-element functional taxonomy. The study findings suggest that while some CDS capabilities are commonly available in the evaluated systems, others are poorly covered across most systems (2).

This report is based on an evaluation of the study in the CATCH-IT Journal Club (3). It presents the key methodological issues identified in the CATCH-IT analysis. These issues are discussed below, in the expectation that consideration of this evaluation will enhance the quality of research performed by the research community.

A majority of the researchers involved with this study are affiliated with the Partners Healthcare System (4,5), with one being a candidate for a Master's degree at the Oregon Health & Science University (6). The key authors have numerous publications in the area of CDS (7) and appear to have a sizable research network, with the primary author having first published in 2007 (7).

Study Background

The authors claimed that, since no other functional taxonomy existed for CDS evaluation, the study had to utilize a self-developed functional taxonomy. The taxonomy, developed at the Partners Healthcare System, describes CDS capabilities along four axes (triggers, input data elements, interventions, and offered choices) and was used to evaluate the CDS capabilities in this research. In order to establish a baseline, the study used CCHIT-certified EHR systems to ensure that the selected systems met a particular quality standard and had comparable features.
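
To make the structure of such a taxonomy concrete, here is a minimal sketch in Python; the four axis names come from the study, but the example elements are hypothetical illustrations, not the paper's actual 42 elements:

```python
# A hypothetical, heavily abridged representation of a four-axis CDS
# taxonomy; the real taxonomy in the study contains 42 elements.
taxonomy = {
    "triggers": ["medication ordered", "lab result stored"],
    "input data elements": ["laboratory results", "patient drug list"],
    "interventions": ["notify clinician", "show guideline"],
    "offered choices": ["override and continue", "cancel order"],
}

# Systems would be assessed element by element along each axis.
total_elements = sum(len(elements) for elements in taxonomy.values())
print("Elements in this sketch:", total_elements)  # 8 here; 42 in the paper
```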

A background study was conducted to determine the availability of any other functional taxonomies for CDS evaluation; it failed to identify any other functional taxonomy that could be applied to this research. In addition, although functional features such as workflow capability are considered among the most important for the success of a CDS (1), the selected taxonomy does not use them in any of its axes.

The background study identified that even though the taxonomy research (8) (titled “A Description and Functional Taxonomy of Rule-Based Decision Support Content at a Large Integrated Delivery Network”) was published in 2007, to date only 5 journal articles reference it (9). Four of these articles are self-authored (9). The only reference from a neutral authorship makes no comment on, or use of, the taxonomy itself (10). Thus, the survey failed to identify any neutral opinion about the developed taxonomy, raising the concern that the authors relied on a self-developed taxonomy that lacks apparent acceptance in the research community.

It must be noted that CCHIT uses a matrix of requirements based on the domain of use (such as ambulatory care and inpatient care) and the aspect of use (such as EMR storage and CDS) (3). The report is unclear about the steps taken to ensure the alignment of the selected taxonomy with the CDS-specific CCHIT requirements.

Methods

A preliminary set of CCHIT-certified EHR systems was identified based on figures from KLAS and HIMSS Analytics. The vendors involved with the development of these systems and the customers of these systems were then contacted, and on this basis a sample of 9 systems was selected for the study. Three of the authors interviewed ‘knowledgeable individuals’ (2) within the vendor organizations, and the results from these interviews were used to evaluate the CDS capabilities of each system against the 42-element taxonomy. If there were any doubts about the availability of a particular feature, the authors contacted other members of the vendor organization, read product manuals, and conducted hands-on evaluations to determine whether the feature was available.

The methods do not provide details about the sample size (the number of systems originally reviewed), the nature of the communications with the vendors and customers, or the short-listing criteria. Because the reader cannot find such basic information about the study, it remains unclear whether the study followed a sound selection procedure free of external influence or bias. It is also difficult for readers to follow the procedure that resulted in the short-list of 9 systems.

The report has omitted key details on the data collection mechanisms, and inefficient data collection procedures can be a cause of unreliable study data. For example, it is not known whether the authors used one-on-one or panel interviews; how many interviews were conducted with the same interviewee, or with representatives from the same vendor; the style of the interviews (such as open-ended or closed-ended); the number and types of questions asked; the follow-up procedure; the interview preparation procedure; and whether the interviewers had to reach a consensus or made their own individual decisions.

The report is also missing details on the interviewees and on who the authors referred to as the ‘knowledgeable individuals’. It is unlikely that every member of a vendor organization could answer the technical questions the authors may have had, which raises concerns of bias and of a lack of knowledge among the interviewees. This leads to the question of what method the authors used to ensure that the information provided by the interviewees was valid. In addition, the report is not clear about the strategy the authors used to validate the interviewees’ responses when in doubt (2). Such omissions can leave the audience unable to assess the credibility of the method used and to determine whether the study results can be trusted.

Results

The study results are presented pseudonymously (to respect the software vendors’ privacy) in tabular form for each axis of the taxonomy. A binary-style evaluation was used, marking each feature as yes (available) or no (unavailable) for applicable systems, or as N/A (not applicable). The final result was represented as a count of unavailable features for each system. In the authors’ view, the system with the fewest unavailable features is the best system.
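
As a minimal sketch of this scoring scheme (system names and feature ratings below are hypothetical, not the paper's data), the tally could be implemented as follows:

```python
# Hypothetical yes/no/n-a ratings for two pseudonymous systems against
# four taxonomy elements, mirroring the binary-style scheme described above.
ratings = {
    "System A": {"trigger: med ordered": "yes", "input: lab results": "no",
                 "intervention: notify": "yes", "choice: override": "n/a"},
    "System B": {"trigger: med ordered": "yes", "input: lab results": "yes",
                 "intervention: notify": "no", "choice: override": "no"},
}

# The paper's final metric, as described above: count unavailable features.
for system, features in ratings.items():
    unavailable = sum(1 for value in features.values() if value == "no")
    print(system, "unavailable features:", unavailable)
```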

Based on the description of the study methods, there seems to be a mismatch between the collection and the usage of data in this research, perhaps owing to the missing details about the steps the researchers took to reach a conclusion about a particular feature from the collected data. This raises concerns about potential bias in the evaluation of feature availability, since the completeness, usefulness, and applicability of CDS features were not considered in evaluating each system. These criteria can have a great impact on the proper adoption of a CDS application (1). As the selected taxonomy does not incorporate them, the validity of the study results is in question.

The representation of the final results by tallying the number of unavailable features provides limited context to the reader. With the pseudonymous representation of the systems, the audience is left unclear about which features are lacking in each system; and if this is not the conclusion the authors intended to reach, the report remains unclear about the actual intent of the research. In addition, the authors’ selection of the best system as the one with the fewest unavailable features fails to assess the impact that each unavailable feature could have on the success of the system. It is possible that the absence of five infrequently used features is preferable to the absence of one crucial feature. As a result, the authors’ decision not to weight the importance of features makes the approach unsuitable for such comparative evaluations.
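
To make this critique concrete, a weighted penalty would let one missing crucial feature outweigh several missing minor ones; the weights below are invented for illustration and do not come from the study:

```python
# Hypothetical importance weights per taxonomy element; unlisted
# elements default to a weight of 1.0.
weights = {
    "input: lab results": 5.0,    # assumed crucial for this illustration
    "intervention: notify": 1.0,  # assumed minor
    "choice: override": 1.0,
}

def weighted_penalty(features: dict[str, str]) -> float:
    """Sum the weights of all unavailable ('no') features."""
    return sum(weights.get(name, 1.0)
               for name, value in features.items() if value == "no")

# One missing crucial feature (5.0) now outweighs two missing minor ones (2.0).
print(weighted_penalty({"input: lab results": "no"}))
print(weighted_penalty({"intervention: notify": "no", "choice: override": "no"}))
```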

Conclusion

The discussion presented in this report highlights a few important issues with the study. The report has left out important details, such as the identification of the systems in the results, a detailed description of the study methods, and the appropriate usage of the data collected. Had the report included these details, the issues may have been minimized or eliminated. In its current form, the study adds little value to the research community: its pseudonymous results cannot be used for ongoing CDS research, and its evaluation methodology, built on an untested taxonomy, cannot reliably be applied to evaluating commercially available CDS-enabled EHR systems.

The use of a comprehensively developed CDS taxonomy, with features weighted by their importance in different healthcare settings, may have helped immensely in making the study results more reliable.

Questions for the authors

  1. What was the reason behind the use of a taxonomy that is not yet well received in the research community? Do you feel that other taxonomies could have been used with minor modifications?
  2. Why were both inpatient and outpatient systems, with potentially different capabilities, selected for the study? What was done to ensure that the selected taxonomy and the two types of systems were aligned in terms of their features?
  3. Why were the vendors of the systems contacted to evaluate the systems? What was done to ensure that the vendors provided unbiased information?
  4. What methodological procedure, including the details of each step, was undertaken to validate the information collected from the vendors and customers?
  5. What evaluation procedure was undertaken to turn the qualitative information gathered from the interviewees into conclusions about the availability of a feature in a system? Why was this detail not included in the report?
  6. What was the reason behind counting the number of unavailable features rather than available ones? Was it to avoid the complexity of working with N/A entries, or was it a workaround for dealing with the two inherently different types of systems (inpatient and outpatient)?
  7. What led to the uniform treatment of the capabilities, without any discussion of their importance or usefulness? What impact do you feel the inclusion of such details would have had on the study results?
  8. What was the actual objective of the study? Was it to demonstrate a CDS evaluation methodology or was it to identify the best CDS-enabled EHR system currently in the market?

Acknowledgements

The author thanks all the members of the CATCH-IT Journal Club, including Professor Gunther Eysenbach and Professor Nancy Martin-Ronson, for their insightful comments that helped with the evaluation of the study.

Bibliography

  1. Sittig DF, Wright A, Osheroff JA, Middleton B, Teich JM, Ash JS, Campbell E, Bates DW. Grand challenges in clinical decision support. Journal of Biomedical Informatics. 2008; 41(2): p. 387-392.

  2. Wright A, Sittig DF, Ash JS, Sharma S, Pang JE, Middleton B. Clinical Decision Support capabilities of Commercially-available Clinical Information Systems. Journal of the American Medical Informatics Association. 2009; 16(5): p. 637-644.

  3. CCHIT. Concise Guide to CCHIT Certification Criteria. [Online]. 2009 [cited 2009 October 10]. Available from: http://www.cchit.org/sites/all/files/ConciseGuideToCCHIT_CertificationCriteria_May_29_2009.pdf.

  4. Clinical and Quality Analysis, Information Systems, Partners HealthCare System. Clinical and Quality Analysis Staff. [Online]. [cited 2009 October 18]. Available from: http://www.partners.org/cqa/Staff.htm.

  5. Partners HealthCare. What is Partners? [Online]. [cited 2009 October 20]. Available from: http://www.partners.org/about/about_whatis.html.

  6. OHSU. DMICE: People – Students, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University. [Online]. [cited 2009 October 20]. Available from: http://www.ohsu.edu/ohsuedu/academic/som/dmice/people/students/index.cfm.

  7. BioMed Experts. [Online]. 2009 [cited 2009 October 15]. Available from: http://www.biomedexperts.com.

  8. Wright A, Goldberg H, Hongsermeier T, Middleton B. A Description and Functional Taxonomy of Rule-Based Decision Support Content at a Large Integrated Delivery Network. Journal of the American Medical Informatics Association. 2007; 14(4): p. 489-496.

  9. Scopus. Scopus Journal Search. [Online]. 2009 [cited 2009 October 22]. Available from: http://simplelink.library.utoronto.ca/url.cfm/54186.

  10. Chused AE, Kuperman GJ, Stetson PD. Alert Override Reasons: A Failure to Communicate. American Medical Informatics Association Annual Symposium Proceedings 2008. 2008: p. 111-115.