Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71(2), 165-179.
Experimental research provides the most credible evidence of the effectiveness of a practice.
  • Instead of using groups to make comparisons, single subject participants provide their own comparison.
  • Baseline and intervention conditions. Essential features: To establish whether a functional relationship exists between a practice and student outcomes at the level of individual participants. Each participant’s behavior is compared to his or her own behavior across multiple conditions.
  • Three key components: 1) target behaviors must be assessed repeatedly; 2) the intervention is systematically introduced and withdrawn; and 3) the effects across baseline and intervention conditions must be analyzed for each participant
  • Get a stable baseline (need several data points to establish the current level)
  • Single subject research requires that at least one replication of the functional relationship is included within the design
  • Reversal: ABAB
  • Multiple Baseline: AB across participants
  • Mean performance calculated and compared for each condition in the design to determine impact of intervention
  • Level: the change in performance that occurs just after the intervention is implemented or withdrawn—referred to as a change in level
  • Researchers look at the immediacy and magnitude of change as indicators of the strength of intervention’s effect.
  • Trend: the trendline of the treatment data
  • Results are always graphed
  • See Horner for Checklist

Gersten, R., Fuchs, L. S., Compton, D., Coyne, M., Greenwood, C., & Innocenti, M. S. (2005). Quality indicators for group experimental and quasi-experimental research in special education. Exceptional Children, 71(2), 149-164.

Experiments using randomized trials are underutilized in educational research.

What Works Clearinghouse

-useful to researchers as they think through the design of a study

-Critical to present the conceptualization and design—this happens in the literature review:

-describes prior research that leads to current conceptualizations

-makes case for the proposed research

-should reflect both recent and “seminal” studies

-make reader aware of how these studies relate to the proposed study

-Concise but complete summary of knowledge base for area in which the problem exists, should illuminate areas of consensus, and areas that require further investigation

-create theoretical understanding for why study addresses an important topic not fully addressed in the past

Compelling case for the importance of the research is made.

-Incidence validity: degree to which a particular piece addresses a topic that affects large numbers of people.

-Impact validity: degree to which a problem is perceived to have serious and enduring consequences.

-Sympathetic validity: tendency to judge significance based on the degree to which a topic generates feelings of sympathy for individuals.

-Salience validity: degree to which people are aware of the problem or topic.

Important to characterize the intervention and place it within a context that makes sense to the reader. Rationale based on facts presented in the literature review

If the intervention is applied to new participants, settings, or contexts, there should be clear links based on argument and/or research for this new use.

Comparison groups: Most common reason—comparing an innovative approach to a traditional one. “Designs that are more elegant include multiple groups, some involving subtle contrasts (e.g., identical intervention with or without progress monitoring).”

-Research question stated clearly: A succinct statement of the key questions addressed is critical. Linked in an integrated fashion to the purpose statement.

-Hypothesized results should be specified. But should be candid if there is no specific prediction—“reviewers need to be aware that it is equally appropriate to frame clear research questions without a specified hypothesis”

-Describing participants: Must move beyond school-district-provided labels. Provide a definition of the disability and include assessment results documenting that individuals in the study met the definition’s requirements.

-Procedures used to increase probability that participants were comparable across conditions.

-Random Assignment: In some situations this is impossible. Have to describe HOW participants were assigned to the study conditions.

If can’t randomly assign participants, may be able to randomly assign teachers or interventions

Attrition must be documented

Supply information about intervention providers (training, etc.)

-Intervention is clearly described and specified: conceptual underpinnings, detailed procedures, teacher actions and language, and materials

Precision in operationalizing the independent variable is important for replication

Fidelity of implementation discussed

Have to observe on a regular basis

Include some measure of interobserver reliability
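A common interobserver reliability index is point-by-point (interval-by-interval) percent agreement; the article does not prescribe a formula, so the following is a minimal sketch with hypothetical observer records:

```python
def percent_agreement(observer_a, observer_b):
    """Interval-by-interval agreement between two observers' records."""
    if len(observer_a) != len(observer_b):
        raise ValueError("records must cover the same intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)

# Hypothetical interval records (1 = behavior observed, 0 = not)
obs_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
obs_b = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
print(percent_agreement(obs_a, obs_b))  # 90.0
```

In behavioral research, agreement around 80% or higher is often treated as acceptable, though the appropriate floor depends on the measurement system.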

“Far too often, the weakest part of an intervention study is the quality of the measures used to evaluate the impact of the intervention”… a good part of the effort must be on development of dependent measures. Evidence for validity and reliability must be provided, too.

Data Analysis:

-Key: the appropriate unit of analysis for a given research question

-Define the unit of analysis used!

-Multi-level analysis

-Post-intervention procedures are used to adjust for differences among groups of more than 0.25 SD on salient pretest measures. Appropriate secondary analyses are conducted.

-Effect sizes in the .40 or larger range are considered minimum levels for educational or clinical significance
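The .40 benchmark refers to standardized mean differences such as Cohen's d. A minimal sketch with hypothetical posttest scores — the pooled-SD formula below is the common textbook form, not one mandated by Gersten et al.:

```python
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled

# Hypothetical posttest scores for two groups
treat = [78, 82, 85, 88, 90, 84]
ctrl = [70, 75, 80, 77, 73, 79]
d = cohens_d(treat, ctrl)
print(round(d, 2))
```

An effect this large clears the .40 floor easily; the point of the benchmark is that small but statistically significant effects may still lack educational significance.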

Determining when a practice is “evidence based”: There are at least 4 acceptable quality studies, or two high-quality studies that support the practice AND the weighted effect size is significantly greater than zero.
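The “weighted effect size” aggregates effects across the supporting studies. The weighting scheme below (by sample size) is an assumption for illustration — meta-analyses more often weight by inverse variance — and the study values are hypothetical:

```python
# Hypothetical (d, n) pairs from several studies of one practice.
# Weighting by n is an illustrative choice, not the article's prescription.
def weighted_mean_es(effect_sizes, sample_sizes):
    total_n = sum(sample_sizes)
    return sum(es * n for es, n in zip(effect_sizes, sample_sizes)) / total_n

studies = [(0.55, 40), (0.62, 25), (0.35, 60)]  # (effect size, sample size)
es_bar = weighted_mean_es([s[0] for s in studies], [s[1] for s in studies])
print(round(es_bar, 3))  # 0.468
```

A formal determination would also test whether the weighted estimate is significantly greater than zero, which requires standard errors for each study.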

Angell, M. E., Stoner, J. B., & Shelden, D. L. (2009). Trust in education professionals: Perspectives of mothers of children with disabilities. Remedial and Special Education, 30(3), 160-176.

Face-to-face interviews: Trust identified as a first step in creating collaborative relationships. Importance of trust is critical: “most critical thing for advancing education and welfare of children.” Found to be a factor that de-escalates conflict within due process.

Establish Theoretical Basis: Factors that contribute to trust or distrust:

-reliability and responsiveness
State Purpose: “There is a growing body of research on the importance of trust”…

State Problem:

“To date, however, researchers have not fully examined the nature of trust between parents of children with disabilities and schools”…

-has emerged as important in examining the collaboration between parents of children with disabilities and school, but trust has not been the central issue examined

Then, tell what they do:

-Examined perspectives on trust of 16 mothers of children with disabilities

-Questions: Clearly state the research questions

* How do mothers of children with disabilities describe their trust in educational personnel?

* What factors do mothers of children with disabilities identify as contributing to or detracting from their trust in educational personnel?

They set specific eligibility criteria for participants and described them in detail.

-interviewed mothers

-wanted subjects to represent multiple areas, grade levels and disabilities

-Mailed letters to administrators

-Asked them to identify 15 mothers

This approach was minimally effective.

Second phase: asked individual education professionals to assist in recruiting. The participants distributed recruitment materials to their friends. “snowballing method”

-Qualitative—because of the nature of their research questions

-Collective case study method—these involve multiple sites or personalized stories of several similar individuals

-Cases are similar in that they are all mothers of children with disabilities

-Conducted semi-structured interviews—allowed to request clarification

-checked demographics of participants

-conducted cross-case analysis to study each case as a whole and comparative analysis to study the cases against one another

-transcribed the interviews

-Organized the data using a multiple coding approach (coded line-by-line each transcribed interview)

-Used Nvivo7 to manage data

-combined line-by-line codes

-met frequently to discuss the emerging categories

-Discussed categories and entered into Nvivo7

-Constant comparative method

-Achieved triangulation, respondent validation (process by which researchers ask participants to check accuracy of the findings in the areas of descriptors, themes, and interpretations), and member checking (providing participants with the opportunity to review material—shared transcripts with participants)

-Then, go through categories that emerged and provide multiple quotes that illustrate their points.

-Discussion section:

-In this study, after analysis of the categories, the researchers placed the results within a Family Systems framework.

-Limitations and scope:

Lack of relationship with participants

Interviewed only once

Generalizability is an issue

“recruitment through school personnel may also limit generalizability in that school personnel may not be comfortable identifying mothers with whom they or others in the school had negative relationships or incidents of distrust”

Brantlinger, E., Jimenez, R., Klingner, J., Pugach, M., & Richardson, V. (2005). Qualitative studies in special education. Exceptional Children, 71(2), 195-207.

Definition—Qualitative research is a systematic approach to understanding qualities of a phenomenon within a particular context.

· Empiricism—knowledge derived from sense experience and observation

· Specific research skills and tools

· Evidence

· Answering questions about what is happening and why or how it is happening

· Can be inductive or deductive

· To document or discover phenomena

· Typically collect their own data by observing in the field and or interviewing participants

· Finally, “tell the story”

· Data collection is most productively done in creative ways, e.g., using the interview protocol flexibly

· Course or even purpose of study may change mid-stream

· Leeway in gathering and reporting

· Objectivity and subjectivity

Quantitative researchers are on the more positivist end of a qualitative-to-quantitative continuum—they see subjectivity as a problem that interferes with validity. Most qualitative researchers recommend being explicit about personal positions, perspectives, etc.

Example: Itard’s observations in The Wild Boy of Aveyron (1806). Action research: overlaps with single subject design in that the researcher systematically experiments with various types of interventions

Example: Burton Blatt’s Christmas in Purgatory (1966)

Example: Mercer’s work on IQ tests & African American children---requirement for an adaptive scale

Personal histories are prominent forms. Qualitative typically include an insider-to-phenomenon perspective (in contrast to quantitative)

Interpretation: Studies are interpretive when they contain a critical element that entails intense interrogation of the meanings that undergird daily life occurrences, common sense assumptions, trends in the field, power imbalances in institutions, etc.

Establishing credibility: Validity and reliability:

· Triangulation—consistency among evidence from multiple data sources

· Disconfirming evidence—looking for evidence inconsistent with themes

· Researcher Reflexivity—Researchers self-disclose

· Member checks—having participants review accuracy of interview transcriptions or field notes

· Collaborative work—multiple researchers design the study

· External auditors

· Peer debriefing

· Audit trail—keeping track of interviews conducted and/or specific times and dates spent observing as well as who was observed

· Prolonged field engagements

· Thick, detailed descriptions

· Particularizability—documenting cases with thick description so that readers can determine the degree of transferability to their own situations

Quality Indicators

· Appropriate participants

· Interview questions (e.g., not leading)

· Recording and transcription

· Confidentiality and sensitivity

· Appropriate settings

· Time in field

· Researcher fits into site

· Minimal impact on the setting

· Field notes systematic

· Documents described and cited

· Results sorted and coded in systematic meaningful way

· Rationale for what is included/omitted

· Documentation of credibility

· Researcher’s personal position is reflected

· Conclusions substantiated by sufficient quotes from participants, observations, and evidence

· Connections with related research

Not done for generalization but to produce evidence based on exploration of specific contexts & particular individuals

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71(2), 137-148.
Concern about the quality of scientific research in education
Some agencies: What Works Clearinghouse—have focused on the “gold standard”—randomized experimental group designs (2003)
CEC-task force (2003) identified 4 types of research a) experimental group b) correlational c) single subject d) qualitative
Methodology matched to question arising from different points in continuum
National Academy of Sciences—proposed most research questions in education can be grouped into 3 types: a) description (what is happening?) b) cause (is there a systematic effect?) c) process or mechanism (why or how is it happening?)
Complexity of sp ed as a field
-extreme variability of participants
Greater ethnic and linguistic diversity due to overrepresentation
-educational context more complex and varied
Specify for whom and in what context practice is effective
Can’t always have random assignment to a non-treatment group due to ethical issues. Students with disabilities often clustered in classrooms and in experimental group design, the classroom rather than the student becomes the unit on which researchers base random assignment, data analysis, etc.
Focus research on question of effectiveness
RCT-random assignment to experimental groups is 1 indicator of high quality group design
Descriptive & process oriented research may require qualitative
Value of mixing methods
WWC Design and Implementation Assessment Device—a rater can conduct a very thorough evaluation of an article
Problems of deciding what studies to include in a research synthesis and how to determine ES
Levin et al. (2003) suggest program of educational research might be thought of as occurring in 4 stages: 1) observation—exploration and flexible methodology which qualitative and correlational methods allow 2) controlled lab or classroom experiments, observational studies of classroom, teacher-researcher collaborative experiments 3) design well-documented intervention and prove effectiveness through RCT studies implemented in classrooms or naturalistic settings by the natural participants 4) determine factors that lead to adoptions of effective practices in typical school systems under naturally existing conditions.

Tankersley, M., Harjusola-Webb, S., & Landrum, T. J. (2008). Using single-subject research to establish the evidence base of special education. Intervention in School and Clinic, 44(2), 83-90.

To establish that an instructional practice is effective, researchers seek to find a systematic or functional relationship between variables
Experimental research is regarded as providing the most credible evidence of the effectiveness of a practice. Two types of methodological approaches: group experimental and single subject.
Group: a) Use meaningful comparison groups b) actively and systematically implement the instructional practice being tested.
Single Subject: allow researchers to draw causal conclusions
Differs from group experimental: 1) Instead of using groups to make comparisons, participants provide their own comparison. 2) Their performance is compared across conditions in which they are and are not participating in the intervention under study
· Baseline and intervention conditions
· Observations occur frequently to determine reliable baseline
· Once baseline is stable & predictable the intervention is introduced and performance is again measured frequently while it is in place during a period of time
· Comparisons of data across baseline and intervention provide the basis for determining whether there is a functional relationship between the independent and dependent variables
· Horner et al. (2005) recommend that at least three demonstrations of a functional relationship should be included
· Purpose of Single Subject is to establish whether a functional relationship exists between a practice and student outcomes at level of individual participants
· Each participant’s behavior or performance is compared to h/her own behavior or performance across multiple conditions
3 Key Components:
1) Target behaviors must be assessed repeatedly using trustworthy measures
2) Interventions must be systematically introduced and withdrawn
3) Effects across baseline and intervention conditions must be analyzed for each participant

· Progress monitoring model—frequent and repeated measurement over time
· Interrater agreement—2 independent observers record same behaviors
· Operational definitions of targets
· Use at least two conditions
Requires at least one replication of the functional relationship included in the design
I) ABAB: One of the most powerful because it can clearly show relationship between implementation and changes in the target behavior
II) Multiple Baseline: used when there are concerns about withdrawing the treatment (e.g., with self-injurious behavior) or when something (such as learning) cannot be reversed. Incorporates a baseline and an intervention (AB) across participants, behaviors (more than one target), or settings (more than one environment). The intervention is introduced in a staggered sequence for each participant, behavior, or setting; sequential introduction continues until the intervention has been introduced in succession for each participant, behavior, or setting
Analysis of Effects:
Within-participant changes in performance are typically evaluated according to the strength or magnitude of the target behavior (mean and level) across conditions and the rate of those changes.
· Mean: mean performance calculated and compared for each condition to determine impact—if mean performance during intervention is meaningfully better than baseline, the intervention shows evidence of effectiveness. Graphed
· Level: The change in performance that occurs just after intervention is implemented or withdrawn is referred to as a change in level. Target behaviors that show immediate or abrupt changes demonstrate strong reaction (immediacy and magnitude)
· Trend: Change in trend or direction of the data as the intervention is applied and withdrawn
· Latency: The quickness with which the behavior changes at the termination of one condition (baseline or intervention) and onset of another. The shorter the timeframe the clearer the effect
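The mean, level, and trend comparisons above can be sketched numerically. The data and the least-squares slope used for trend are illustrative choices, not prescriptions from the article — visual analysis of graphed data remains the primary method:

```python
from statistics import mean

def slope(ys):
    """Least-squares slope of the data over session number (trend)."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

baseline = [2, 3, 2, 3, 2]        # hypothetical stable baseline
intervention = [6, 8, 9, 11, 12]  # hypothetical treatment-phase data

mean_change = mean(intervention) - mean(baseline)   # condition means compared
level_change = intervention[0] - baseline[-1]       # change at the condition boundary
trend = slope(intervention)                         # direction of the treatment data
print(mean_change, level_change, round(trend, 2))
```

Here the large mean change, the abrupt level change at the phase boundary, and the positive trend would all point toward a functional relationship, pending replication.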
Practice of Sp Ed in schools lends itself to single subject—Hallmark of which is the assessment of individual performance and behavior change over time
Results must be replicated across several studies before generalization can be made
· Single Subject—appropriate for answering research questions that allow comparisons to be made within participants and for which interventions can be introduced and withdrawn systematically
· Target behaviors must be assessed repeatedly using trustworthy measures
· Researchers must measure target same way on repeated occasions
· Measurement of target behavior must occur multiple times during each baseline and intervention condition
· Interventions must be systematically introduced and withdrawn repeatedly
· Effects of intervention are evaluated when conditions change
· The effectiveness of the intervention evaluated with participants
· Changes between baseline and intervention conditions are assessed within individuals rather than between individuals or groups.
· Changes in a participant’s performance across conditions are evaluated according to the mean, level, trend, and latency of the changes in h/her observed behavior
· Designs include explicit evaluation of causality and thus can be used to determine whether a practice is evidence-based
· Given typically small sample sizes, multiple single subject studies are needed to determine definitively whether a practice is effective.

Thompson, B., Diamond, K. E., McWilliam, R., Snyder, P., & Snyder, S. (2005). Evaluating the quality of evidence from correlational research for evidence-based practice. Exceptional Children, 71(2), 181-194.
Correlational studies are quantitative multi-subject designs in which participants have not been randomly assigned to treatment conditions
Analytic methods commonly applied are multiple regression analysis, canonical correlation analysis, hierarchical linear modeling, and structural equation modeling
Crucial to match research questions and designs
Questions involving school or classroom cultures may require qualitative
Clinical trials may raise ethical questions regarding denial of needed services to control groups
Correlational designs do not provide the best evidence re: causal mechanisms. There are two ways to strengthen causal inferences:
1) Statistical Testing of Rival Causal Models
Structural equation modeling—factor analysis
Within the structural model, analysts may test whether 1) two latent constructs (x and y) covary or are correlated, 2) x causes y, 3) y causes x, or 4) x and y reciprocally cause each other. Rival models can be tested, and if there is only one model that fits the data, then there is some evidence bearing on causality
2) Logically Based Exclusion Methods: investigate pre-intervention differences between the students on everything considered relevant (e.g., pre-intervention reading scores). Confirm there are no extraneous contaminants of treatment influences
Limitations of nonexperimental research:
-Model testing assumes all of the correct variables are included—the true model is seldom known
-“Stepwise analyses” do not correctly identify the best subset of predictors
-Yield results that tend to be non-replicable
-temptation toward Type I errors
Effect sizes are reported in fewer than half of published articles
Quality Indicators:
· Measurement: reliability—are the scores consistent and dependable? Most articles don’t even mention reliability. Should report coefficients for the instruments used
“It is unacceptable to induct the score reliability coefficients from prior studies or test manuals if there is no explicit evidence presented that the sample compositions and standard deviations from the prior study and a current study are both reasonably comparable.”
· Score reliability coefficients are reported for all measured variables
· Score reliability coefficients reported for all measured variables based on analysis of the data in hand in the particular study
· …or test manual that suggests scores are valid for the inferences being made in the study
· Score validity is empirically evaluated based on data generated within the study
· The influences of score reliability and validity on study interpretations are explicitly considered in reasonable detail
Practical significance: evaluated by quantifying the degree to which sample results diverge from the null hypothesis. These quantifications are referred to as effect sizes. Standardized differences (e.g., Cohen’s d) are most common.
Clinical significance: extent to which intervention recipients no longer meet diagnostic criteria and thus no longer require intervention.
They mention Cohen’s Power Analysis, in which he provided benchmarks for small, medium, and large effect sizes.
Common Mistakes:
Effect size reporting is rare. Some report but do not interpret. Often they fail to identify which effect size is being reported.
Quality Indicators:
· One or more effect size statistics is reported for each outcome. Statistic used is identified
· Interpret effect sizes explicitly and compare to prior or related studies
· Explicitly consider study design and effect size statistic limitations
Four Types of Errors:
1) Failure to interpret structure coefficients: GLM weights reflect correlations of predictors with outcome variables only in the exceptional case that the weights indeed are correlation coefficients
2) Converting Intervally Scaled Variables to Nominal Scale
Researchers may convert intervally scaled independent variables to a nominal scale to run “OVAs” (e.g., take intervally scaled IQ data and convert it to “low” and “high”). This attenuates reliability—it throws data away. Quality indicator: interval data are not converted to nominal unless such choices are justified and the results are interpreted accordingly. Univariate methods are inappropriate with multiple outcome variables (they inflate the probability of Type I error and do not honor the reality that outcome variables can interact with each other to define unique outcomes that are more than their constituent parts).
3) Failure to test statistical assumptions:
Confidence Intervals for Reliability Coefficients, Statistics, and Effect Sizes:
CIs can be used to determine whether a given null hypothesis would be rejected: if a hypothesized value is not within the interval, the null hypothesis positing that parameter value is rejected. Confidence intervals inform judgment regarding all the values of the parameter that appear plausible, given the data. By comparing the overlap of CIs across studies, one can evaluate the consistency of evidence across studies. The widths of CIs within a study or across studies provide critical information regarding the precision of estimates. When intervals are wide, evidence for a given point estimate being correct is called into question.
· confidence intervals are reported for the reliability coefficients derived for study data
· confidence intervals reported for sample statistics
· confidence intervals reported for study effect sizes
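The CI logic above can be sketched for a mean difference. The normal-approximation interval below is an illustrative simplification (a t-based interval would be more appropriate for small samples), and the scores are hypothetical:

```python
from statistics import mean, stdev

def ci_mean_diff(a, b, z=1.96):
    """Approximate 95% CI for the difference of two group means."""
    diff = mean(a) - mean(b)
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return diff - z * se, diff + z * se

# Hypothetical outcome scores
treat = [78, 82, 85, 88, 90, 84, 79, 86]
ctrl = [70, 75, 80, 77, 73, 79, 72, 74]

low, high = ci_mean_diff(treat, ctrl)
print(0 < low)  # zero lies outside the interval -> reject the null
```

The same interval also conveys precision: a narrow band around the estimate supports the point estimate, while a wide one calls it into question even when the null is rejected.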