[Bestplus] RE: [Best plus] 2 BEST Plus updates
Schwerdtfeger, Jane
JaneS at doe.mass.edu
Tue Feb 15 16:10:49 EST 2005
Hi, Annemarie and all--good question. Here's a longish answer because I
want to be clear. There is no policy about NOT telling students the test
score they got. In some cases, the scale score can be a motivator if the
student's post-test score has gone up from the pre-test. It can also be a
way to show relative strengths and areas to work on in a test like BEST Plus
or REEP that has different areas that are being measured (e.g., Language
Complexity for BEST Plus, or Mechanics for REEP.) If the teacher uses the
REEP rubric in class for teaching, then he or she can use the rubric and
scores to help the student understand what his/her REEP score means. Conversely, if the
score won't be a motivating factor (e.g., the score went down), then don't
use it if you don't think it will help the student. It is up to the program
how they tell the student (or not) about their score. I think most students
want to know.
That being said, don't show the student's response to the REEP prompt to
him/her after it has been scored. Instead, staff should tell the student
the areas of strength and areas for improvement reflected in the student's
REEP score. With the BEST Plus, staff can go over all the information on
the score report with the student if they wish. If there is a sense that
the information wouldn't be helpful, then it is left to the program staff to
explain the score how they wish to the student, giving more or less
information as they see fit. Generally, I think it is good to tell students
the score. But I had an interesting conversation about the pros and cons
with Annemarie--how do others feel?
IN ANY CASE, if you are ever concerned that a score might not be accurate,
or something "doesn't look right," please send it to me and I will forward
to the Center for Applied Linguistics. They want to know of any issues, and
so do I, to make sure the test is functioning as we hope it will.
Thanks,
Jane
-----Original Message-----
From: Espindola, Annemarie [mailto:aespindo at bristol.mass.edu]
Sent: Tuesday, February 15, 2005 11:04 AM
To: Schwerdtfeger, Jane; Bestplus at lists.literacytent.org
Cc: Cornellier, Donna
Subject: RE: [Bestplus] 2 BEST Plus updates
Hi Jane,
This was very helpful. I do have one question. It is a policy issue.
I thought we were not to give out student scores on any test.
Is this correct, or are there exceptions?
Thanks,
Annemarie
Annemarie Espindola
SABES Southeast
Bristol Community College
777 Elsbree Street
Fall River, MA 02720
(508) 678-2811 ext 2782
-----Original Message-----
From: Schwerdtfeger, Jane [mailto:JaneS at doe.mass.edu]
Sent: Monday, February 14, 2005 6:14 PM
To: 'Bestplus at lists.literacytent.org'
Cc: Cornellier, Donna
Subject: [Bestplus] 2 BEST Plus updates
Dear BEST Plus trainers: we have recently had two questions come up
about BEST Plus scoring, and since this is the time many programs are
testing for the second time with BEST Plus, I thought I'd mention this
to all of you.
You may run across something similar at your own programs, and this may
help you. This is long, but hopefully useful!
1) One program recently had this issue:
This curious situation arose because we have a set of identical twin
sisters in our intermediate (that's SPL 3-4 at CNA) ESOL class. I gave
them a follow up BEST Plus this morning. Their teacher and I were
comparing their scores and we noticed what appeared to be a discrepancy.
They each scored 1.58 on listening comprehension and 1.15 on language
complexity. One scored 2.25 on communication while the other had 2.16.
So far, so good. The confusing part was that the student with the higher
(albeit only slightly) communication score had a lower scale score (459
vs. 463). Naturally they like to know each other's score. It's difficult
to explain why the overall score was higher for the one with the lower
subscale score. If the difference were in more than one subscale, then I
would guess that somehow items were weighted differently, but in this
case I can't account for it.
2) Another issue came up with comparing the scores from one student's
pre- and post-tests:
I would like to fax you 2 summary reports - a pre and post test for one
student - that a teacher sent to me asking for some explanation, which
neither I nor Jane was able to give him. The issue is, as you will see,
that the student's Subscale Scores all went up in the 2nd test from the
1st, and are all in or above the SPL 4 range. However, the student's
overall Scale Score went down and the SPL is a 3.
(Attached is a Word document that charts out the student's two sets of
pre- and post-test scores for you to see.) <<BP Summary Report Scores
2-05.doc>> I thought it was how the questions were weighted, but wanted
to confirm with Carol. Her reply is below:
Hi Dori and Jane,
Thanks for sending this information on to us and,
Jane, for passing the other 'mystery' on about the two sisters. These
are great questions and we hope that as we get more pre- and post-test
data back, that we will be able to study whether the subscale ranges
need to be adjusted.
Right now they are based on field test data--however, the operational
version of the test has been tweaked. Data from actual use will inform
any adjustments. The short explanation is, Jane, you were pretty much
right that it has to do with the difficulty level of the questions.
Below are the more extensive explanations from Dorry that I "translated"
a tiny bit. Let me know if you want to talk about these or need more
information.
Regarding the two sisters: The two sisters' scores differed by 4
points--that is not a significant difference (esp. on a scale from 88
to 999!) It's quite eerie when you think about how close they were. This
actually shows that the adaptive version works well. It's quite
interesting that they scored exactly the same in two scale categories.
However, there is NOT a direct relationship between raw scores (which is
what the subscale scores represent) and the scale score because the scale
score takes into account the difficulty of the questions.
Here is, perhaps, a more concrete example of what is going on. Let's say
that Jane and I each get four questions on a test. We each get three
right and one wrong. Both our raw scores are 3. However Jane's questions
are much more difficult than mine. Her ability (i.e., the scale score)
will be higher even though we got the same "raw" scores. The raw scores
(e.g., the averages) are only to be used diagnostically to show RELATIVE
strengths and weaknesses WITHIN an individual (as compared to averages in the
SPLs). There is no absolute way to interpret them and they cannot be
used to show differences BETWEEN two individuals. If they are causing
trouble, they should be removed from the score report. We will look into
this as we get more data. So please continue to report these issues. And
let us know what you think about having the subscales reported. Is this
causing too much confusion?
The example with the twin sisters shows how the adaptive nature of the
BEST Plus works. The different questions that they got, based on the
ability estimate, gave them different scale scores (even though the
subscales--or raw scores--appeared to be similar).
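The four-question example above can be sketched in a few lines of Python. This is only an illustration, not CAL's scoring method: it assumes a simple Rasch (one-parameter logistic) model with right/wrong items, which the message does not confirm, and the item difficulty values are made up. The function name rasch_ability is hypothetical.

```python
import math

def rasch_ability(difficulties, responses, iterations=25):
    """Maximum-likelihood ability estimate under an ASSUMED Rasch model.

    difficulties: item difficulties on a logit scale (made-up values here)
    responses:    1 for a correct answer, 0 for an incorrect one
    """
    theta = 0.0
    for _ in range(iterations):
        # Probability of a correct answer on each item at the current estimate
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        # Newton-Raphson step on the log-likelihood
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta

# Both test takers get three right and one wrong: the same raw score of 3.
responses = [1, 1, 1, 0]
easy_items = [-1.5, -1.0, -0.5, 0.0]   # hypothetical easier questions
hard_items = [0.5, 1.0, 1.5, 2.0]      # hypothetical harder questions

theta_easy = rasch_ability(easy_items, responses)
theta_hard = rasch_ability(hard_items, responses)
print(f"ability estimate on easier items: {theta_easy:.2f}")
print(f"ability estimate on harder items: {theta_hard:.2f}")
```

The raw scores are identical, but the estimate for the person who faced the harder questions comes out higher, which is the point of the example.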
Regarding the pre- and post- test scores of Dori's student:
First, there is no statistically significant difference between the two
scores. The first score was 448; the second was 434. That's a difference of 14 points. With
the BEST Plus stopping rule, the standard error is 20 points. The two
scores are within one standard error. Most likely, the student's ability
did not change much from one test to the other. Unfortunately, she was
right at the border between SPL 4 and SPL 3. Her first score was only 9
points above the SPL 4 border; her second score was only 4 points below the
SPL 3 border. She's a low 4/high 3. There's not a big difference. If she
had been a high 4 (e.g., 470) and changed to a low 3 (e.g., 420), that's
a difference of 50 points, which is two and a half times the standard
error. That would be something to be concerned about, but the difference
of 14 points is really not a big concern although it crossed an SPL
threshold and looks discouraging to the teacher and the student.
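The arithmetic above can be checked with a short Python sketch. The 20-point standard error and all of the scores come from this message; the function name and the idea of expressing a difference as a multiple of the standard error are just one convenient way to lay out the comparison, not an official CAL procedure.

```python
STANDARD_ERROR = 20  # points, as stated above for the BEST Plus stopping rule

def difference_in_standard_errors(score_a, score_b, standard_error=STANDARD_ERROR):
    """Express the gap between two scale scores as a multiple of the standard error."""
    return abs(score_a - score_b) / standard_error

# Score pairs taken from the message
student_actual = difference_in_standard_errors(448, 434)        # pre- and post-test
student_hypothetical = difference_in_standard_errors(470, 420)  # the "high 4 to low 3" case
twin_sisters = difference_in_standard_errors(463, 459)

for label, ratio in [
    ("Dori's student (448 vs. 434)", student_actual),
    ("hypothetical drop (470 vs. 420)", student_hypothetical),
    ("twin sisters (463 vs. 459)", twin_sisters),
]:
    verdict = "within" if ratio <= 1 else "more than"
    print(f"{label}: {ratio:.1f} standard errors ({verdict} one standard error)")
```

The 14-point change and the 4-point gap between the sisters both fall inside one standard error, while the hypothetical 50-point drop is two and a half standard errors.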
The second thing is: if the student's ability didn't change, but for
some reason she got several easier questions in the second
administration, her raw score (e.g., the average subscales) could well
improve (that is, she was scoring higher, because the questions were
easier). However, her scoring higher on easier items didn't push her
overall score to make a measurable difference. The average subscale
scores can only be interpreted in relative terms, not in absolute terms,
for diagnostic purposes only.
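As a rough illustration of that second point, the sketch below holds ability fixed and compares the expected raw result on two sets of questions. It assumes a Rasch-style model with right/wrong items and made-up difficulty values; actual BEST Plus items are rated on subscales rather than marked right or wrong, so this shows the general idea only.

```python
import math

def expected_proportion_correct(ability, difficulties):
    """Expected share of items answered correctly under an ASSUMED Rasch model."""
    probs = [1.0 / (1.0 + math.exp(-(ability - b))) for b in difficulties]
    return sum(probs) / len(probs)

ability = 0.0  # the same (unchanged) ability at both administrations

first_administration = [-0.5, 0.0, 0.5, 1.0]     # hypothetical item difficulties
second_administration = [-1.5, -1.0, -0.5, 0.0]  # hypothetical easier items

raw_first = expected_proportion_correct(ability, first_administration)
raw_second = expected_proportion_correct(ability, second_administration)
print(f"expected raw result, first administration:  {raw_first:.2f}")
print(f"expected raw result, second administration: {raw_second:.2f}")
```

The expected raw result rises on the easier set even though the ability value is identical, which is why the subscale averages cannot be read as absolute measures.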
Remember those subscale averages were made on the fixed field test form,
not the adaptive version. As I mentioned above, if we get datasets for
the gain scores study (and I think MA and IL are sending us data),
perhaps we can adjust them.
I wouldn't say that this student has decreased ability, but she has not
made a significant measurable improvement.
It would be good to see her data and follow her through. There are
randomization and decision rules in the program that can be changed. We
have tried to do our best to prevent anyone from being disadvantaged.
However, special cases like this may show where further improvements
could be made.
If the scores from the tests are still in the database, we would like to
have them. If you could make a backup of the pre- and post-database
using the software management system (SMS) and send it to us, it would
be helpful.
I have attached the instructions that we send to trainers in training to
email us their database. I think they will work. If you have trouble,
let me know. Email them to both me and Dorry.
Thanks again for being attentive to this.
Carol
Carol also said, "As you
talk to others, it would be great if they could send us any data similar
to this of Dori's. The data we are already collecting from programs in
MA will enable us to look at the operational sub scores and make
adjustments as well."
Please do pass along questions as you get them--it is helpful for us all to
know, since programs may experience similar problems, and also so CAL
can potentially fix any problems as they crop up. If you want to
discuss Carol's response further, please give me a call-- (781)
338-3855. Thank you for your help as BEST Plus experts and trainers!