[NLA] BEST revision
Carol Van Duzer
carol at cal.org
Wed Oct 23 15:44:15 EDT 2002
Dear NLA colleagues,
Before responding to Kevin's concerns about the BEST revision and Nancy's concern about ED support, we just want to thank the US Department of Education for making the revision possible. OVAE is concerned about the issues of assessment and NRS accountability and has been very supportive of the BEST Plus project since its inception as a pilot project 3 years ago.
Carol
Carol H. Van Duzer
National Center for ESL Literacy Education
Center for Applied Linguistics
202-362-0700
carol at cal.org
Dear Kevin,
It is true that the BEST was developed for a different purpose from the one for which many programs are using it today. That is why the Center for Applied Linguistics is in the process of revising it. The revised version will be appropriate for use in the "high-stakes" environment in which the field now finds itself, and it will also provide useful information to program staff and learners.
As in the development of the original test, this revision has involved professional language test developers, ESOL practitioners, administrators, and learners from across the United States (including your state of Massachusetts). They serve as members of our Technical Working Advisory Board, item writers, field testers, and participants in the reliability and validity studies. Our language testing experts have ensured that the revisions are in line with the current Standards for Educational and Psychological Testing of the AERA, APA, and NCME. We therefore think the field will find that the revisions take into account many of the things you mentioned in your posting, as well as others that we felt could be improved. For example, the revision focuses on assessing oral English proficiency rather than curricular competencies, so many of your concerns about specific test items in the original BEST no longer apply to the revision. Furthermore, the new items and the scoring rubric incorporate developments in language acquisition and proficiency assessment that have emerged in the 20 years since the original BEST was developed.
Again, we believe the revision to the BEST addresses your main concerns. We are working hard to make it available next spring in both computer-based and paper-based versions. We sincerely hope that, once it is available, you will take a good look at it in light of these concerns.
You can find additional information about the revised BEST on our website www.cal.org/BEST/compBEST.htm
I'll be happy to answer any other questions that you (or others) may have about the BEST or BEST Plus. Please write to the email address below.
Carol
Carol H. Van Duzer
National Center for ESL Literacy Education
Center for Applied Linguistics
202-362-0700
carol at cal.org
visit our website at www.cal.org/ncle
-----Original Message-----
From: David J. Rosen [mailto:DJRosen at theworld.com]
Sent: Tuesday, October 22, 2002 9:22 PM
To: nla at lists.literacytent.org
Cc: hans.meeder at ed.gov; koconnor at framingham.k12.ma.us
Subject: [NLA] Discussion: Assessment and WIA, title II Public policy
NLA colleagues who may be interested in assessment issues,
The long message cross-posted below from the NIFL-assessment list,
written by my Massachusetts colleague, Kevin O'Connor, is not focused on
public policy. Yet, because the Workforce Investment Act (WIA) has
required accountability from a field that does not yet have adequate
accountability tools, it >is< about public policy.
Kevin writes about the BEST test, a standardized ESL assessment designed
for adults, widely regarded as one of the better standardized
assessments. Yet he points out how inadequate and misused it is.
This is not a criticism of the test developer -- indeed, the test is
being revised now and that is why Kevin has raised the issues -- nor
necessarily of the states that have chosen to use the BEST test. In my
view, the kinds of issues Kevin raises point to the shortsightedness of
Congress and the U.S. Department of Education in requiring the
implementation of high-stakes standardized assessment in a
long-marginalized field which does not yet have the tools and resources
needed to do this job well.
In my testimony last week on WIA, Title II, at a U.S. Department of
Education public meeting in Nashua, New Hampshire, I said the U.S.
Department of Education should not make decisions based on these data.
We do not have valid assessments for the full range of skills that are
currently required to be assessed. And I said I thought that it is the
responsibility of the U.S. Department of Education to pay for developing
these needed assessments out of their national programs funding (not the
funding which goes to the states for adult education services).
I got an encouraging response. Hans Meeder, representing the U.S.
Department of Education at the meeting, asked me how much it would cost.
I said "millions," but I wonder if there might be some NLA colleagues
who have a more precise answer to this question. If you do, please
e-mail Hans Meeder at <hans.meeder at ed.gov>. He wants to know what
standardized assessments we need and how much it will cost to do a good
job developing them.
Also, if you want to see my full testimony, let me know and I'll e-mail
you a copy.
David J. Rosen
<DJRosen at theworld.com>
-------- Original Message --------
Subject: [NIFL-ASSESSMENT:220] RE: Level 3 SPL
Date: Tue, 22 Oct 2002 14:31:41 -0400 (EDT)
From: "Kevin O'Connor" <koconnor at framingham.k12.ma.us>
Reply-To: nifl-assessment at nifl.gov
To: Multiple recipients of list <nifl-assessment at literacy.nifl.gov>
Sandra, I based this on what I learned from the Standards for Educational
and Psychological Testing. It is the Rosetta Stone for
psychometrics. Sorry it's so long; I really wanted to make this
more than another gripe piece without turning it into an impenetrable mass of
"assessment-speak".
Questioning validity: Realigning the BEST test with the NRS descriptors
The BEST Oral Test was developed in the 1980's through the Office of
Refugee Resettlement and Department of Health and Human Services to help
measure the life skills of Asian immigrants who were coming to the U.S.
Since then, it has been adopted as a placement tool by many programs
across the country. In recent years, it has been mandated by several
states through their Departments of Education as a measure of educational
gains. The BEST is a venerable and respected tool, but I believe that
its use has grown beyond the original framework. Today it is being used
to test linguistic domains that differ from those it was designed for,
and with a population far beyond the designers' expectations. I understand
that there is a revision being developed by the Center for Applied
Linguistics, and I hope that they will consider some of the following
concerns to help produce a valid version of the test.
I am the Assessment Specialist for a large ESL and ABE program in
Massachusetts (over 600 students in ESL). The Mass. Department of
Education has required all DOE-funded programs to use the BEST Oral
Test, Long Form, to test all students within the range of SPL 1-4. As a
result, we have tested over 300 people during this pre-test cycle. As
Assessment Coordinator, I have been closely involved in the testing, and
I wanted to share some of our concerns.
The BEST is similar to a curriculum-based test that we had developed
and used in our program for over 10 years. As a result, we were very
familiar with interview-format performance assessments and how to
build and use them. We have found some major problems with the BEST
test that should be addressed in any revision, and since a new version
is being developed, I wanted to forward some of the biggest concerns
that have arisen from our experience with it.
I understand that it was the Mass. DOE's decision to use this test, not
CAL's. However, many states are now using the BEST for this purpose (as
stated on the CAL website). I believe that there are many people out
there who would benefit from a revision aimed at validly measuring the
domain reportedly being measured: oral communication skills. Those of
us who have been using this test may be of help in revising it.
Test usage has expanded beyond the intended use and domain
I have read the history of the BEST test. When the test was developed,
its conceptual framework did not incorporate high-stakes testing of oral
proficiency based on the scores derived. To begin with, some items do
not test the domain that we (Massachusetts ESL programs) are looking
for. Specifically, here in Massachusetts (and in other states I
imagine), the BEST is being used to measure oral proficiency of English.
However, several of the test items do not seem valid for this purpose
(counting money, following maps, pointing to clocks...). Neither the
NRS Speaking/ Listening Descriptors nor the Massachusetts Curriculum
Frameworks Oral Communication Strand incorporate these kinds of life
skill constructs in this category. This is because the BEST Test was
designed to assess not only grammar but also "Topic areas identified as crucial
to 'survival level' competency in English..." (BEST Test Manual, p.53).
When developing a test, it is important to clearly state the intended
use and design it accordingly. The BEST was not originally built for
this purpose, and its validity suffers because of this. The potential
uses of a test should shape its conceptual framework. "Validation
logically begins with an explicit statement of the proposed
interpretation of test scores.... The proposed interpretation refers to
the constructs or concepts the test is intended to measure" (Standards
for Educational and Psychological Testing, p.9). Therefore, a test
designed to place students in relation to the domain represented by the
NRS Speaking/Listening Descriptors should not incorporate money, maps
and clocks, since they are not included in the construct measured to
report educational gain in NRS Speaking/Listening skills.
Test has expanded beyond the normed population
I have been told that the original BEST series was normed on 987
test-takers. This group encompassed speakers of five Asian language
groups along with one Latin-based and one Slavic language. No
percentages were reported for each language (BEST Test Manual, p.54).
This does not seem to be an adequate sample size considering the tens
of thousands of students now taking the test. Our program alone has
just finished giving it to over 300 students, and by June, we will have
tested more than 950 who speak 32 different languages.
I am not second-guessing the test designers. They designed a good tool
for their intended purpose. It would have been impossible for them to
adequately sample every possible language group. In addition, the test
designers never intended this tool for such wide use of such a narrow
domain. However, the BEST has been brought out into the wider world
for a larger, more high-stakes use. If its use is to be continued (as
indicated by the revision), the developers should be aware of the
information out here in the field that could inform the test revision.
SUGGESTIONS
Redesign the order/difficulty of test items
Everyone I have talked to believes that the order of the test could
benefit from revision. The cutoff point ("cut score") of four correct
answers before #14 leads to too many low-SPL students needing to
complete the whole test. Some of the questions that occur before #14
are just too easy or, as I have said, not relevant to the construct of
speaking/listening. The fact that someone can say their name and point
to two clock pictures does not indicate that they are ready to move on
to a 19-word question about the price of apples. Because of this, many
students have been forced to complete the whole test and still end up at
an SPL 0. That is 15 minutes of feeling like a failure. This is
crushing to a student's morale.
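To make the cut-score rule concrete, here is a minimal Python sketch of the continue/stop decision as the post describes it: the examiner continues past item #14 only if the test-taker has at least four correct answers before it. The function name, the dictionary of responses, and the `excluded_items` parameter are hypothetical illustrations, not CAL's scoring procedure; the sketch only shows how a few very easy early items can push a low-level learner past the cut point.

```python
# Hypothetical sketch of the continue/stop rule as described in the post:
# with at least four correct answers before item #14, the test continues;
# otherwise it stops there. Names and data shapes are illustrative only.

from typing import Dict, Iterable

CUT_ITEM = 14     # decision point described in the post
MIN_CORRECT = 4   # correct answers needed before CUT_ITEM to continue

def must_continue(responses: Dict[int, bool],
                  excluded_items: Iterable[int] = ()) -> bool:
    """Return True if the rule requires administering items from #14 on.

    responses: item number -> True (correct) or False (incorrect)
    excluded_items: item numbers that should not count toward the cut score
    """
    excluded = set(excluded_items)
    correct_before_cut = sum(
        1
        for item, correct in responses.items()
        if item < CUT_ITEM and correct and item not in excluded
    )
    return correct_before_cut >= MIN_CORRECT

# Hypothetical learner: correct on item 1 and three other early items.
responses = {item: False for item in range(1, CUT_ITEM)}
for item in (1, 2, 3, 5):
    responses[item] = True

print(must_continue(responses))                      # True
print(must_continue(responses, excluded_items={1}))  # False
```

In the usage lines, four easy early answers are enough to send the learner through the whole test; if item #1 is not counted, the same learner stops at the cut point.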
Some have noted:
"...We must continue the test because there are some easier questions
later in the test, and test takers may score points on them. It is
unfair to quit too early and deprive the student of a chance to score
those extra points and show the full extent of their abilities."
I agree that students should have the chance to answer all the easy
questions, but I DO NOT AGREE that they should be raked over the coals
on all the hard questions to get there. Almost every assessment tool I
have examined graduates test items based on their difficulty, so
students at low levels can find test items that suit their skill level
BEFORE they are demoralized by test items that are too difficult.
By the time SPL 1 students get to the relatively easy item #42, many
have given up. The items that a low-SPL learner could answer are placed
too far into the test. CAL should move the easy questions up and move
the clock questions beyond the "cut score" point (currently #14), or
better yet, remove them entirely.
Technical concerns with specific items
Construct-irrelevant variance:
"The test scores may be systematically influenced to some extent by
components that are not part of the construct.... construct-irrelevant
components might include an emotional reaction to the test content"
Standards for Educational and Psychological Testing p.13
"An attempt is generally made to avoid words or topics that may offend
or otherwise disturb some test takers, if less offensive material is
equally useful"
Ibid, p.39
Please consider these excerpts when selecting test items and pictures.
Many students react badly to the picture of the child who has
been struck by a car. This "emotional reaction" may be affecting their
test scores.
Question #1 and the standard error of measurement
"My name is..." is not a discriminating question. Out of 300 students,
only one was not able to answer it. That is 0.3% of our
testing sample. Any question that 99.7% of the students can answer
correctly is not providing you with any real data about what students in
general can and cannot do. This item does not "discriminate among test
takers of different standing on the scale" (ibid., p.39); therefore, it
should NOT be included in the "cut score" of four correct before #14.
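The arithmetic behind this point can be sketched with two classical item statistics: the proportion of test-takers answering an item correctly (its difficulty, or p-value) and the variance of a right/wrong item score, p(1 - p), which peaks at 0.25 when half the group answers correctly. The counts are the figures reported above; the 0.95 "too easy" threshold is an arbitrary assumption for illustration, not a published standard.

```python
# Classical item statistics for a right/wrong (0/1) scored item.
# Counts come from the post (1 miss out of 300 test-takers);
# TOO_EASY is a hypothetical flagging threshold.

TOO_EASY = 0.95

def item_difficulty(num_correct: int, num_tested: int) -> float:
    """Proportion of test-takers answering the item correctly (item p-value)."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    return num_correct / num_tested

def item_variance(p: float) -> float:
    """Variance of a 0/1 item score; it is largest (0.25) when p = 0.5."""
    return p * (1 - p)

p = item_difficulty(num_correct=299, num_tested=300)
print(f"p-value: {p:.1%}")                          # p-value: 99.7%
print(f"missed by: {1 - p:.1%}")                    # missed by: 0.3%
print(f"score variance: {item_variance(p):.4f}")    # score variance: 0.0033
print("flag as too easy:", p > TOO_EASY)            # flag as too easy: True
```

An item whose score barely varies across test-takers cannot separate learners of different levels, which is why counting it toward a cut score adds little information.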
Non-computer based BEST needed.
I understand the possibilities that computer-based testing (CBT) allows.
I am impressed with the research, theory and mathematical data that
go into creating a "computer-adaptive test". These tests gradually
increase the difficulty of the test items to best challenge the test-taker
(again, by graduating difficulty, as I proposed above).
However, computer-based testing poses construct-irrelevant difficulties
for an immigrant population. Many of my students have never used a
computer before, and for them the computer itself is a serious challenge
and a source of psychological intimidation. Studies are currently being conducted to
determine how much variance this causes.
In addition, not all programs have the facilities to test all their
students on a computer. I hope that the BEST revisions will appear in a
hard copy form as well, thereby allowing all programs to be able to use
a valid test, not just those who can afford the hardware.
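The idea of graduating difficulty can be illustrated with a deliberately simple technique, a one-up/one-down staircase over difficulty levels: step up a level after a correct answer and down a level after an incorrect one. This is a sketch under stated assumptions, not how BEST Plus or other computer-adaptive tests select items (those typically rely on item response theory); the level numbers and the simulated learner are hypothetical.

```python
# One-up/one-down staircase: a simplified stand-in for adaptive item
# selection. Levels run from 0 (easiest) to num_levels - 1 (hardest).
# In practice each level would draw a fresh item from a pool; here the
# test-taker is represented only by a callback on the level number.

from typing import Callable, List, Tuple

def staircase(num_levels: int,
              answer: Callable[[int], bool],
              start: int = 0,
              max_items: int = 10) -> List[Tuple[int, bool]]:
    """Return (level, correct) pairs in the order items were administered."""
    if not 0 <= start < num_levels:
        raise ValueError("start must be a valid level")
    level = start
    log: List[Tuple[int, bool]] = []
    for _ in range(max_items):
        correct = answer(level)
        log.append((level, correct))
        # Step up after a correct answer, down after an incorrect one,
        # staying within the available levels.
        if correct:
            level = min(level + 1, num_levels - 1)
        else:
            level = max(level - 1, 0)
    return log

# Hypothetical learner who can handle items up to level 2 of 6.
log = staircase(num_levels=6, answer=lambda level: level <= 2, max_items=6)
print(log)
# [(0, True), (1, True), (2, True), (3, False), (2, True), (3, False)]
```

The simulated learner never sees levels 4 or 5; the sequence settles around the edge of what they can answer instead of walking them through every hard item in a fixed order.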
As I have said, I understand how this test was developed. I respect
the work done through ORR and DHHS, and I appreciate the role that the
Center for Applied Linguistics has played in supporting and distributing
the BEST. There are many things going for this test, which is why so
many states are choosing it. However, it needs to be adapted to better
fill the niche it has found in the last 20 years. I hope the revision
team at CAL will consider some of our concerns and suggestions and give
us a valid, hard copy version of the BEST that will help us measure our
students' speaking/listening skills.
Kevin O'Connor
ESL Teacher and
Assessment Specialist
Framingham Adult ESL
508-626-4282
koconnor at framingham.k12.ma.us
-----Original Message-----
From: Sandra Robinson [mailto:srobinson at doe.state.vt.us]
Sent: Thursday, October 17, 2002 3:51 PM
To: Multiple recipients of list
Subject: [NIFL-ASSESSMENT:216] RE: Level 3 SPL
Kevin,
I wonder if you could share those suggestions with the listserv. We
would certainly welcome the help.
Sandra Robinson
Kevin O'Connor wrote:
> Hello, Carol. I work with a large Massachusetts ESL program and we
have quite a few BEST test revision suggestions, based on our program's
concerns about validity. Who could I forward them to?
>
> Kevin O'Connor
> ESL Teacher and
> Assessment Specialist
> Framingham Adult ESL
> 508-626-4282
> koconnor at framingham.k12.ma.us
>
> -----Original Message-----
> From: Carol Van Duzer [mailto:carol at cal.org]
> Sent: Thursday, October 17, 2002 2:02 PM
> To: Multiple recipients of list
> Subject: [NIFL-ASSESSMENT:214] RE: Level 3 SPL
>
> Hi Cheryl,
>
> Part of the problem is that the BEST Oral interview assesses oral
skills and the REEP writing rubric assesses writing skills. For most
learners, speaking and listening skills do not develop at the same
pace--perhaps one is used (or needed immediately) more than the other or
instruction focuses on the most needed language skill. Placement should
reflect what is happening instructionally so learners are placed in
levels that best meet their needs (and proficiency). It frequently
happens that learners have high speaking skills, but lower writing skills.
>
> What did you use for exiting learners from your Level 3 before you
began using these assessments? I assume that would still be valid. Then
what you want to do is increase writing skills so that the learners can
advance on the REEP writing rubric to meet the state's NRS
requirements. Perhaps a stronger writing component will need to be added
to the Level 3 curriculum.
>
> Carol
> Carol H. Van Duzer
> National Center for ESL Literacy Education
> Center for Applied Linguistics
> 202-362-0700
> carol at cal.org
>
> visit our website at www.cal.org/ncle
>
> -----Original Message-----
> From: Cheryl Pyburn [mailto:cpyburn7 at yahoo.com]
> Sent: Wednesday, October 16, 2002 2:52 PM
> To: Multiple recipients of list
> Subject: [NIFL-ASSESSMENT:209] Level 3 SPL
>
> Hi everyone.
>
> I have a question about ESOL levels with regard to
> the REEP and BEST tests...I'll explain...
>
> We're having some trouble at our learning center
> trying to decide where our ESOL level 3 class should
> end. We have just started using the REEP, and we're
> running into the problem of level 3 students who have
> an SPL 7 (BEST oral), but their REEP score is quite
> low: 2 or 3. When would this student move on? At what
> point does a student 'finish'?
>
> Right now we have 3 ESOL levels. Level 1 is SPL 0-3;
> Level 2 is SPL 4-5; Level 3 is SPL 6+; however, with
> the REEP assessment, it seems that students could end
> up being in level 3 forever. Any thoughts?
> Suggestions? What do other learning centers do?
>
> Thank you.
>
> =====
> Cheryl Pyburn
> Operation Bootstrap
>
_______________________________________________
NLA mailing list: NLA at lists.literacytent.org
http://lists.literacytent.org/mailman/listinfo/nla
LiteracyTent: web hosting, news, community and goodies for literacy
http://literacytent.org