PEER REVIEW HISTORY

BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf) and are provided with free text boxes to elaborate on their assessment. These free text comments are reproduced below.

ARTICLE DETAILS

TITLE (PROVISIONAL): Combining online community reinforcement and family training (CRAFT) with a parent training program for parents with partners suffering from alcohol use disorder: Study protocol for a randomized controlled trial

AUTHORS: Lindner, Philip; Siljeholm, Ola; Johansson, Magnus; Forster, Martin; Andreasson, Sven; Hammarberg, Anders

VERSION 1 – REVIEW

REVIEWER: Gallus Bischof, University of Luebeck, Germany
REVIEW RETURNED: 20-Dec-2017

GENERAL COMMENTS: This study addresses an important topic by combining interventions for partners of individuals with Alcohol Use Disorders (AUDs) that aim to promote treatment entry of their drinking partners (Community Reinforcement and Family Training, CRAFT) and to improve parenting skills for shared children (Parent Training, PT). Since both concepts rely on similar principles, the rationale of the interventions is appealing. On the other hand, I had some concerns regarding the feasibility of the study in terms of recruitment and attrition, given the targeted sample size of 300 participants with highly specific inclusion criteria.

Specific remarks:

Introduction/target group: The numbers mentioned for hazardous alcohol consumption in Sweden should not be taken as a crude measure for AUDs, since in all western societies the number of individuals with hazardous (or even harmful) alcohol consumption far exceeds the number of individuals suffering from AUDs; DSM-5 is not an appropriate reference to state that hazardous consumption is "indicative of AUDs".

In-/exclusion criteria: In Table 1, inclusion and exclusion criteria are mentioned; however, for some of these it remains unclear how they are measured (especially since all contact with participants is made online), e.g. family violence: does this include psychological violence such as threats? How do you measure it? How do you measure language skills or AUD criteria? Is there an option to identify individuals with multiple registrations (e.g. due to being unsatisfied with the intervention they will receive after randomisation)?

Attrition: One problem in online intervention tools is a high attrition rate. What attrition rate is expected, and are there any strategies to improve access to the online tool for participants who do not actively work through the program?

Minor issues:

Language: The ms. should be corrected for typos, e.g. p. 6: "This in includes learning how undesired, (…)".

Figure 1 appears twice (on p. 27 and p. 28).

REVIEWER: Professor Simon Coulton, University of Kent, UK
REVIEW RETURNED: 24-Jan-2018

GENERAL COMMENTS:
1. Overall the paper would benefit from proofreading. There are some instances where the meaning is confused because of the language used.
2. On page 6 we are informed about a 'low-powered' study of self-help and bibliotherapy; was this 'low-powered' or 'under-powered'?
3. The aims and hypotheses might be better split into aims and hypotheses, with the hypothesis stated as a null hypothesis, as it is this you will be testing. It should state that there will be no difference in the child's mental health state assessed by the SDQ, and when it will be assessed. Details of the methods need to be removed from here and placed in the methods section.
4. In the procedure section it states that parents will complete a screening battery; detail of what is included in this battery is required, as are details of when consent will be taken. The details of the eligibility criteria raise some serious safeguarding issues: you will exclude participants with children experiencing serious distress and exposed to family violence, and since you ascertain this information via screening, this may be the first time it has been identified. How will you respond to this population from an ethical perspective?
5. It is apparent from the methods section that you are running the study as a waiting-list control study, whereby participants allocated to the control become eligible for the intervention after 8 weeks. This would suggest that as a research team you lack equipoise. The stated reason, to explore the differences in staff involvement, cannot be answered using this method and is invalid. The control population recruited at baseline is similar to the intervention population recruited at baseline, but once they have received an active intervention you cannot just analyse them as if you had just recruited them. This is not the correct design to answer this additional question.
6. The sample size should be in the sample section of the methods rather than the analysis section. It should provide a justification of the clinical importance of the effect size of 0.4, the power, the alpha, the fact that your analysis will use a two-sided test, and any adjustment for attrition in the study between randomisation and the primary outcome point.
7. In the intervention section, additional detail on the active control is required so readers can distinguish between the control and the intervention. You note participants will also have access to a discussion forum; will that not have the potential to introduce contamination between the groups?
8. I found the outcome assessment section difficult to understand. I assume the primary end-point for the study is the post-intervention assessment; it cannot be later because the control group would have crossed over and be receiving the intervention after this point. This raises a methodological issue: for those allocated to the intervention who complete a module each week, the primary outcome is 4 weeks after randomisation; for those who do not, it is 8 weeks post-randomisation. You have two different outcome assessment times for two distinctly different groups, which has the potential to bias your analysis.
9. Your primary outcome is the SDQ; I assume this is the parent-assessed version, and this needs to be clear. I have never seen a great deal of evidence that the SDQ is responsive to change; is this available? I also had a concern about how you will manage families with more than one child. The use of the AUDIT-C to assess alcohol consumption at follow-up is potentially problematic: if participants are being followed up at 4 weeks, the AUDIT-C is difficult to assess, as it uses "monthly" and "monthly or less" as rating categories.
10. The analysis section should address how the primary outcome at the primary end-point will be assessed and how this concurs with the stated hypothesis. I could not quite appreciate the use of linear mixed models for the primary outcome; the data have too few data points (three) to be considered a repeated measure. It may be more rigorous to analyse the primary outcome using an analysis of covariance adjusting for baseline values. I was not convinced by the longer-term modelling approach: after 8 weeks the control group ceases to be a control group and becomes an intervention group. You cannot compare changes observed in this group with the intervention group, because it is not similar at baseline and is in essence a self-selected rather than randomised group.

VERSION 1 – AUTHOR RESPONSE

Response to reviewer comments

Editorial request:

Along with your revised manuscript, please provide a completed copy of the SPIRIT checklist (http://www.spirit-statement.org/). Please remember to include the relevant page number(s) from the manuscript next to each reporting item or state 'n/a' next to items that are not applicable to your study.

A SPIRIT checklist has now been included in the submission, as requested.

Comments by Reviewer #1:

Introduction/target group: The numbers mentioned for hazardous alcohol consumption in Sweden should not be taken as a crude measure for AUDs, since in all western societies the number of individuals with hazardous (or even harmful) alcohol consumption far exceeds the number of individuals suffering from AUDs; DSM-5 is not an appropriate reference to state that hazardous consumption is "indicative of AUDs".

We have revised this sentence to distinguish between hazardous alcohol consumption and AUD.


In-/exclusion criteria: In Table 1, inclusion and exclusion criteria are mentioned; however, for some of these it remains unclear how they are measured (especially since all contact with participants is made online), e.g. family violence: does this include psychological violence such as threats? How do you measure it? How do you measure language skills or AUD criteria? Is there an option to identify individuals with multiple registrations (e.g. due to being unsatisfied with the intervention they will receive after randomisation)?

Thank you for bringing to our attention the need to clarify these aspects. All measures are self-reported (by the participant/CSO) using either validated instruments (e.g. AUDIT-C) or tailored questionnaires (e.g. exposure to violence). This has now been clarified in the manuscript. We have also clarified that we include only physical violence and that a sufficient grasp of Swedish is defined as being able to follow procedure instructions, complete the screening battery, and provide comprehensible, coherent answers.

As to whether it is possible to identify individuals with multiple registrations, the only pragmatic countermeasure that can be employed continuously and automatically is to allow only one account per email address, which is already implemented (see the sketch below). This of course does not exclude the possibility of individuals registering multiple accounts under different email addresses, but since both study arms receive an intervention and are blinded to the extent and content of the other, we believe that multiple registrations will not present a significant issue.
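For illustration, this countermeasure amounts to a uniqueness constraint on the email field at registration, along the lines of the following minimal sketch (illustrative only, not the platform's actual code):

```python
# Minimal sketch of a one-account-per-email rule via a database
# uniqueness constraint (hypothetical schema, not the study platform's).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE   -- enforces one account per email address
    )
""")
conn.execute("INSERT INTO accounts (email) VALUES ('cso@example.com')")
try:
    conn.execute("INSERT INTO accounts (email) VALUES ('cso@example.com')")
except sqlite3.IntegrityError:
    print("Registration rejected: email already in use")
```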

Attrition: One problem in online intervention tools is a high attrition rate. What attrition rate is expected, and are there any strategies to improve access to the online tool for participants who do not actively work through the program?

Open, low-intensity interventions like the one studied in the current trial do indeed typically show high attrition rates. Based purely on meta-analytic findings on similar interventions, we expect approximately 25% of participants to complete all modules, 25% to drop out after completing the first module, and the remainder to complete at least two, but not all, modules. However, motivation may very well be higher in our target population than in those of other internet interventions, since help is harder for our target group to find elsewhere. For this reason, we see no value in speculating about expected attrition rates in the manuscript, beyond the issues pertaining to the power analysis.

Importantly, the expected high rate of attrition is one of the primary reasons for including a mid-treatment assessment and modeling outcomes as a linear function of program completion and/or time, allowing maximum likelihood estimation of missing data (if missing at random). Attrition and engagement metrics will be described in the outcome study, and we will perform per-protocol analyses to complement the ITT analyses. This is now mentioned in the study protocol.

Language: The ms. should be corrected for typos, e.g. p. 6: "This in includes learning how undesired, (…)".


Thank you. We have carefully reviewed the manuscript for typos.

Figure 1 appears twice (on p. 27 and p. 28).

This was an artefact of the submission system, since we attached the figure both as part of the manuscript file and as a separate, vector-based image. The former was done for the convenience of reviewers, to show the image in context, but has now been omitted for the sake of clarity.

Comments by Reviewer #2:

Overall the paper would benefit from proofreading. There are some instances where the meaning is confused because of the language used.

Thank you. We have carefully reviewed the manuscript for typos.

On page 6 we are informed about a 'low-powered' study of self-help and bibliotherapy; was this 'low-powered' or 'under-powered'?

The term "under-powered" describes a situation in which there is insufficient power to test the hypothesis. We deliberately chose the term "low-powered" to describe the study in question, since we are not in a position to challenge the authors' hypothesis, but we nonetheless believe it important to convey that the non-significant difference reported may simply be due to a lack of power to detect anything but very large differences.

The aims and hypotheses might be better split into aims and hypotheses, with the hypothesis stated as a null hypothesis, as it is this you will be testing. It should state that there will be no difference in the child's mental health state assessed by the SDQ, and when it will be assessed. Details of the methods need to be removed from here and placed in the methods section.


We wish to comply with journal standards and keep the heading "Aims and Hypotheses". These two aspects are clearly separated in the text. Reporting the statistical null hypothesis as the study hypothesis would deviate from scientific practice: for example, at the time of writing, none of the ten most recently published trial protocols in BMJ Open report study hypotheses in terms of statistical null hypotheses. We believe that such a practice would at best be unnecessary, since any reader with a basic understanding of statistical inference knows that it is, strictly speaking, the statistical null hypothesis that is being tested, and that the null hypothesis is derived from the stated study hypothesis. At worst, such a practice risks confusing readers. As to the idea of moving methods-describing sentences out of the "Aims and Hypotheses" section, we believe that a brief summary of the study design is necessary for readers to understand the stated hypotheses.

In the procedure section it states that parents will complete a screening battery; detail of what is included in this battery is required, as are details of when consent will be taken. The details of the eligibility criteria raise some serious safeguarding issues: you will exclude participants with children experiencing serious distress and exposed to family violence, and since you ascertain this information via screening, this may be the first time it has been identified. How will you respond to this population from an ethical perspective?

We explicitly state that the screening battery serves as the pre-intervention measure, i.e. it includes all outcome measures. As screening batteries typically do, it also includes all questionnaires and rating scales required to inform inclusion and exclusion. Information about when consent is collected was already provided in the Ethics section; we now mention this in the Procedure section as well.

Since submitting this study protocol and opening recruitment, we have seen a greater percentage than expected of potential participants excluded due to exposure to violence, own drinking problems, co-parent drug problems, and similar issues that require more help. We have thus applied for and received ethical approval to provide similar interventions to these individuals as well, albeit in separate, parallel studies, while also prompting them to seek help elsewhere. This is now described in the manuscript.

It is apparent from the methods section that you are running the study as a waiting-list control study, whereby participants allocated to the control become eligible for the intervention after 8 weeks. This would suggest that as a research team you lack equipoise. The stated reason, to explore the differences in staff involvement, cannot be answered using this method and is invalid. The control population recruited at baseline is similar to the intervention population recruited at baseline, but once they have received an active intervention you cannot just analyse them as if you had just recruited them. This is not the correct design to answer this additional question.

The reviewer appears to have misunderstood a key aspect of our study: nowhere in the manuscript do we state or imply that this is a waiting list-controlled study. On the contrary, we state explicitly in the "Aims and Hypotheses" section and in the very first paragraph of the Methods that we will evaluate intervention efficacy by comparison to an active control intervention that does not include any of the presumed psychotherapeutic components but is believed to have some effect. The choice of an active control intervention rather than a waiting list was due to: (a) children being involved (and time thus being more sensitive); (b) the ability to blind participants; (c) the need to isolate the effects of completing the behavioral exercises assumed to promote change in well-being; and (d) the desire to reduce the possible biasing impact of being engaged in an online intervention, regardless of type.

The reviewer also appears to have misunderstood our analytic strategy pertaining to the comparison group receiving the full intervention after first completing the comparison intervention. Our intent was never to directly contrast the two full intervention periods "as if we had just recruited them"; we are in full agreement with the reviewer that doing so would be inappropriate. As was stated in the manuscript, albeit seemingly not clearly enough, our intent is to model the within-group change in the comparison arm across the two treatment periods in a separate, piecewise model that does not even include the other arm. This will allow us to compare achieved effect sizes without directly contrasting the two arms. Comparing effect sizes in this way is akin to comparing effect sizes across different studies that feature the same intervention.

We are thankful to the reviewer for pointing out the need to clarify our analytic strategy. To avoid further misunderstandings, we have now omitted mention of the piecewise analyses from the manuscript, stating only that we can compare within-group effect sizes (without a statistical contrast) to get a cautious, preliminary indication of whether therapist support is associated with better outcomes. We hope that the reviewer is happy with this revision and is willing to leave it to the reviewers of the future study results manuscript to evaluate the pros and cons of this type of comparison.

The sample size should be in the sample section of the methods rather than the analysis section. It should provide a justification of the clinical importance of the effect size of 0.4, the power, the alpha, the fact that your analysis will use a two-sided test, and any adjustment for attrition in the study between randomisation and the primary outcome point.

We have moved the power description to the Sample section of the Methods and added the requested details and rationale. As stated, we will analyze data according to the intention-to-treat (ITT) principle, whereby missing data are estimated. The described ITT effect size, being a function of means and standard deviations and estimated from an ITT effect size reported in a previous study, is thus already adjusted for assumed attrition and the correspondingly lower treatment effects, which we expect to be similar to the previous study.
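To make the arithmetic concrete, a minimal power-calculation sketch under the parameters discussed in the review (Cohen's d = 0.4, two-sided α = 0.05, 80% power) might look as follows; this is an illustration, not the exact study computation:

```python
# Illustrative power calculation (assumed parameters: d = 0.4,
# two-sided alpha = 0.05, power = 0.80); not the study's own computation.
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(
    effect_size=0.4,          # Cohen's d judged clinically meaningful
    alpha=0.05,               # two-sided significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(round(n_per_arm))  # ~99 per arm, ~200 in total
```

Against the targeted sample of 300 participants mentioned by Reviewer 1, a requirement of roughly 200 would correspond to an implicit allowance of about one third for attrition.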

In the intervention section, additional detail on the active control is required so readers can distinguish between the control and the intervention. You note participants will also have access to a discussion forum; will that not have the potential to introduce contamination between the groups?

We have revised Table 2 to also feature the content of the control intervention, in order to provide more details (as requested) and allow side-by-side comparisons.


The discussion forum is not part of the interventions being tested but rather the primary feature of the website from which participants are recruited. The forum is an open resource for both registered users and visitors to the site. It is moderated and not structured in a way that allows participants to share program content.

I found the outcome assessment section difficult to understand. I assume the primary end-point for the study is the post-intervention assessment; it cannot be later because the control group would have crossed over and be receiving the intervention after this point. This raises a methodological issue: for those allocated to the intervention who complete a module each week, the primary outcome is 4 weeks after randomisation; for those who do not, it is 8 weeks post-randomisation. You have two different outcome assessment times for two distinctly different groups, which has the potential to bias your analysis.

We believe that this issue is more complex than the reviewer suggests. Since our study features self-paced interventions, albeit with recommended rates of completion and measurement interval limits, attempting to capture mid-treatment change inevitably means choosing between either elapsed time or program completion as the definition of mid-treatment. We would argue that program (i.e. exercise) completion is the prime driver of change in this intervention, not elapsed time itself. Thus, it makes sense to define mid-treatment as actually having completed half the program. Since we do not want the variation in elapsed time to be too large, we also impose an elapsed-time limit of two weeks beyond the recommended pace. This is certainly no greater than the variation in elapsed time around a time-specified measurement typically seen in online interventions, which is seldom reported or taken into account in analyses. In any case, if our collected data show that elapsed time is a better predictor of change than treatment completion, we can simply make use of the timestamps of the measurements, along with the inherent flexibility of mixed effects models to include time-varying predictors, to take variation in elapsed time between outcome measurements into account, or even model change as a function of numeric time rather than treatment completion.
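As an illustration of this flexibility, a linear mixed model can take either program completion or elapsed time as its growth metric, since both enter as ordinary person-varying predictors. The sketch below (Python/statsmodels, on simulated data with illustrative variable names, not the study's analysis script) shows both variants:

```python
# Illustrative sketch (simulated data; variable names are hypothetical):
# the same mixed model accepts either definition of "time".
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 50
long = pd.DataFrame({
    "participant_id": np.repeat(np.arange(n), 3),
    "completion": np.tile([0.0, 0.5, 1.0], n),     # none / half / all modules done
    # Self-paced, so elapsed time at each assessment varies by participant:
    "weeks": np.tile([0.0, 2.0, 4.0], n) + rng.uniform(0.0, 2.0, 3 * n),
    "group": np.repeat(rng.integers(0, 2, n), 3),  # 0 = control, 1 = intervention
})
long["sdq"] = 20 - 3 * long["completion"] * long["group"] + rng.normal(0, 2, 3 * n)

# Change modeled as a linear function of program completion ...
m_completion = smf.mixedlm("sdq ~ completion * group", data=long,
                           groups="participant_id").fit(reml=False)
# ... or, if elapsed time predicts change better, as a function of numeric
# time reconstructed from the measurement timestamps.
m_elapsed = smf.mixedlm("sdq ~ weeks * group", data=long,
                        groups="participant_id").fit(reml=False)
```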

Your primary outcome is the SDQ; I assume this is the parent-assessed version, and this needs to be clear. I have never seen a great deal of evidence that the SDQ is responsive to change; is this available? I also had a concern about how you will manage families with more than one child. The use of the AUDIT-C to assess alcohol consumption at follow-up is potentially problematic: if participants are being followed up at 4 weeks, the AUDIT-C is difficult to assess, as it uses "monthly" and "monthly or less" as rating categories.

We already clearly state that we will use the parent-rated version of the SDQ. Thank you for raising the need to clarify how measurement is done when there is more than one child: the manuscript now mentions that participants are instructed to assess the child believed to be worst off at study inclusion and to keep assessing the same child in all subsequent assessments.

We are aware of at least two studies that have shown that the SDQ is responsive to change and have added these two references to the manuscript.


Concerning the use of the AUDIT-C, we recognize that the two items measuring drinking frequency can each lose one point of sensitivity to change when measurements are taken less than one month apart and drinking frequency has suddenly dropped so low that the response alternatives describing week-level frequencies no longer apply. Since AUDIT-C scores of ≥4/5 are required for inclusion, sensitivity to change is lost only in the event of a sudden, dramatic drop in scores, which may not be uncommon in interventions for individuals with AUD but is unheard of in interventions for CSOs. Thus, we believe that this presents only a minor potential measurement error.
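For readers unfamiliar with the instrument, the two frequency items and their standard 0-4 scoring are sketched below, purely to illustrate the granularity argument (standard AUDIT-C wording):

```python
# Response options and scores for the two AUDIT-C frequency items
# (standard AUDIT-C wording; shown only to illustrate the argument above).
item1 = {  # "How often do you have a drink containing alcohol?"
    "Never": 0, "Monthly or less": 1, "2-4 times a month": 2,
    "2-3 times a week": 3, "4 or more times a week": 4,
}
item3 = {  # "How often do you have six or more drinks on one occasion?"
    "Never": 0, "Less than monthly": 1, "Monthly": 2,
    "Weekly": 3, "Daily or almost daily": 4,
}
# Within a 4-week follow-up window, "Monthly or less" (item 1) and
# "Monthly" vs. "Less than monthly" (item 3) cannot be distinguished,
# so each frequency item can lose at most one point of sensitivity to
# change: the minor potential measurement error referred to above.
```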

The analysis section should address how the primary outcome at the primary end-point will be assessed and how this concurs with the stated hypothesis. I could not quite appreciate the use of linear mixed models for the primary outcome; the data have too few data points (three) to be considered a repeated measure. It may be more rigorous to analyse the primary outcome using an analysis of covariance adjusting for baseline values. I was not convinced by the longer-term modelling approach: after 8 weeks the control group ceases to be a control group and becomes an intervention group. You cannot compare changes observed in this group with the intervention group, because it is not similar at baseline and is in essence a self-selected rather than randomised group.

We disagree with the reviewer: the primary advantage of modeling difference in change over difference in end-states is that missing data, which tend to be extensive in low-intensity internet interventions, can be estimated much more robustly in the former. Change models are also less susceptible to unreliability of measurement and are able to handle time-varying predictors. Our understanding is that three measurements is a perfectly acceptable lower limit for modeling change in a linear fashion, even at an individual (random) level, and we supply a modern statistical reference (Hesser, 2015) for this statement, which in turn contains more technical references. Having three measurements, as compared to two, allows error variance to be separated from individual heterogeneity and goodness-of-fit to be estimated, and such models thus suffice for outcome modeling. (As a general recommendation, Hesser (2015) suggests a minimum of four measurements, since the "flexibility in the modeling approach is increased substantially when more than three time points are available"; yet we have no need for this added flexibility, and thus three measurements are sufficient.)
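The contrast between the two analytic options can be sketched as follows (Python/statsmodels on simulated data with illustrative variable names, not the study's analysis code). The ANCOVA necessarily discards participants missing the post assessment, whereas the mixed model retains their remaining observations:

```python
# Simulated contrast of ANCOVA-on-endpoints vs. a linear growth model
# (hypothetical variable names; illustration only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 40
long = pd.DataFrame({
    "participant_id": np.repeat(np.arange(n), 3),
    "time": np.tile([0, 1, 2], n),                 # pre / mid / post
    "group": np.repeat(rng.integers(0, 2, n), 3),  # 0 = control, 1 = intervention
})
long["sdq"] = 20 - long["time"] * long["group"] + rng.normal(0, 2, len(long))
long.loc[rng.random(len(long)) < 0.2, "sdq"] = np.nan   # ~20% missing at random

# Option A (reviewer): ANCOVA on post scores adjusted for baseline.
# Complete-case by construction: anyone missing pre or post is dropped.
wide = long.pivot(index="participant_id", columns="time", values="sdq")
wide.columns = ["sdq_pre", "sdq_mid", "sdq_post"]
wide["group"] = long.groupby("participant_id")["group"].first()
ancova = smf.ols("sdq_post ~ sdq_pre + group", data=wide).fit()

# Option B (authors): linear mixed model over all three assessments.
# ML estimation (reml=False) retains participants with partial data
# under a missing-at-random assumption (cf. Hesser, 2015).
growth = smf.mixedlm("sdq ~ time * group", data=long.dropna(),
                     groups="participant_id").fit(reml=False)
print(ancova.nobs, growth.nobs)  # the mixed model uses more of the data
```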

As explained above, our intent was never to equate and contrast the two intervention types (one started immediately after randomization, with guidance; the other after first completing the comparison intervention, with no guidance). And as explained above, we no longer mention the planned piecewise modeling of the comparison group's two intervention periods, so as not to risk confusing readers.

VERSION 2 – REVIEW

REVIEWER: Gallus Bischof, University of Luebeck, Dpt. of Psychiatry, Germany
REVIEW RETURNED: 03-Apr-2018

GENERAL COMMENTS: The ms. has been clarified and is now clearly improved. I still find the power analysis somewhat difficult to follow: although I understand that attrition rates cannot be determined a priori, I wonder whether the estimated effect size (which is based on an intention-to-treat analysis) takes attrition into account; it would make sense to add which attrition rate is expected on the grounds of the power analysis. In addition, I assume the impact of the forum that was raised by reviewer #2 might be a bit trickier than suggested by the authors in their response, since forums usually tend to include debates about practical strategies and personal experiences for dealing with problems. It might at least be reasonable to include a brief assessment of participants' use of the forum at follow-up, to be included as a moderator in the analysis.

REVIEWER: Professor Simon Coulton, University of Kent, UK
REVIEW RETURNED: 28-Mar-2018

GENERAL COMMENTS: The revised manuscript addresses my concerns raised with the previous version. My only concern with the revision is the use of English; there are some instances where the meaning is difficult to disentangle because of the language. In the first paragraph, for example, the first sentence "As much as..." might better start with "In Sweden...", "...these present low to moderate..." may better read "...these present as low to moderate...", and I could not understand what the phrase "community children" means; do you mean "community-dwelling children"?

VERSION 2 – AUTHOR RESPONSE

Comments by Reviewer #1:

The ms. has been clarified and is now clearly improved. I still find the power analysis somewhat difficult to follow: although I understand that attrition rates cannot be determined a priori, I wonder whether the estimated effect size (which is based on an intention-to-treat analysis) takes attrition into account; it would make sense to add which attrition rate is expected on the grounds of the power analysis.

The manuscript now includes an estimated attrition rate based on previous studies, and a statement that missing data will be handled using maximum likelihood estimation.

In addition, I assume the impact of the forum that was raised by reviewer #2 might be a bit trickier than suggested by the authors in their response, since forums usually tend to include debates about practical strategies and personal experiences for dealing with problems. It might at least be reasonable to include a brief assessment of participants' use of the forum at follow-up, to be included as a moderator in the analysis.


We have now added a statement to the Methods section, where program engagement (including forum activity) is described, stating that these metrics will be explored as potential moderators of treatment outcomes.
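Such a moderator analysis would typically enter forum activity as an interaction term, along the lines of the following sketch (simulated data and hypothetical variable names; not the pre-specified analysis code):

```python
# Illustrative moderator analysis: forum use entering as an interaction
# term (simulated data; variable names are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.integers(0, 2, 120),   # 0 = control, 1 = intervention
    "forum_use": rng.poisson(3, 120),   # self-reported forum activity at follow-up
})
df["sdq_change"] = -2 * df["group"] + rng.normal(0, 3, 120)

# The group:forum_use coefficient is the moderation test.
moderation = smf.ols("sdq_change ~ group * forum_use", data=df).fit()
print(moderation.params)
```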

Comments by Reviewer #2:

The revised manuscript addresses my concerns raised with the previous version. My only concern with the revision is the use of English; there are some instances where the meaning is difficult to disentangle because of the language. In the first paragraph, for example, the first sentence "As much as..." might better start with "In Sweden...", "...these present low to moderate..." may better read "...these present as low to moderate...", and I could not understand what the phrase "community children" means; do you mean "community-dwelling children"?

A native English speaker has now reviewed the manuscript for language issues.

VERSION 3 – REVIEW

REVIEWER: Gallus Bischof, University of Luebeck, Germany
REVIEW RETURNED: 23-May-2018

GENERAL COMMENTS: My minor concerns about the latest revision have been adequately addressed.
