[NLA] Discussion: Assessment and WIA, title II Public policy

David J. Rosen DJRosen at theworld.com
Tue Oct 22 21:22:12 EDT 2002


NLA colleagues who may be interested in assessment issues,

The long message cross-posted below from the NIFL-assessment list, 
written by my Massachusetts colleague, Kevin O'Connor, is not focused on 
public policy. Yet, because the Workforce Investment Act (WIA) has 
required accountability from a field that does not yet have adequate 
accountability tools, it >is< about public policy.

Kevin writes about the BEST test, a standardized ESL assessment designed 
for adults, widely regarded as one of the better standardized 
assessments.  Yet, he points out how inadequate and misused it is. 
This is not a criticism of the test developer -- indeed, the test is 
being revised now and that is why Kevin has raised the issues -- nor 
necessarily of the states that have chosen to use the BEST test.  In my 
view, the kinds of issues Kevin raises point to the shortsightedness of 
Congress and the U.S. Department of Education in requiring the 
implementation of high-stakes standardized assessment in a 
long-marginalized field which does not yet have the tools and resources 
needed to do this job well.

In my testimony last week on WIA, Title II, at a U.S. Department of 
Education public meeting in Nashua, New Hampshire, I said the U.S. 
Department of Education should not make decisions based on these data. 
We do not have valid assessments for the full range of skills that are 
currently required to be assessed.  And I said I thought that it is the 
responsibility of the U.S. Department of Education to pay for developing 
these needed assessments out of its national programs funding (not the 
funding that goes to the states for adult education services).

I got an encouraging response.  Hans Meeder, representing the U.S. 
Department of Education at the meeting, asked me how much it would cost. 
I said "millions," but I wonder if there might be some NLA colleagues 
who have a more precise answer to this question.  If you do, please 
e-mail Hans Meeder at <hans.meeder at ed.gov>.  He wants to know what 
standardized assessments we need and how much it will cost to do a good 
job developing them.

Also, if you want to see my full testimony, let me know and I'll e-mail 
you a copy.

David J. Rosen
<DJRosen at theworld.com>




-------- Original Message --------
Subject: [NIFL-ASSESSMENT:220] RE: Level 3 SPL
Date: Tue, 22 Oct 2002 14:31:41 -0400 (EDT)
From: "Kevin O'Connor" <koconnor at framingham.k12.ma.us>
Reply-To: nifl-assessment at nifl.gov
To: Multiple recipients of list <nifl-assessment at literacy.nifl.gov>

Sandra, I based this on what I learned from the Standards for Educational 
and Psychological Testing, which is the Rosetta Stone for 
psychometrics.  Sorry it's so long, but I really wanted to make this 
more than another gripe piece, without turning it into an impenetrable 
mass of "assessment-speak".

Questioning validity: Realigning the BEST test with the NRS descriptors

The BEST Oral Test was developed in the 1980s through the Office of 
Refugee Resettlement and the Department of Health and Human Services to 
help measure the life skills of Asian immigrants who were coming to the 
U.S.  Since then, it has been adopted as a placement tool by many 
programs across the country.  In recent years, several states, through 
their Departments of Education, have mandated it as a measure of 
educational gains.  The BEST is a venerable and respected tool, but I 
believe that its use has grown beyond the original framework.  Today it 
is being used to test linguistic domains different from those it was 
designed for, and with a population far beyond the designers' 
expectations.  I understand that a revision is being developed by the 
Center for Applied Linguistics, and I hope that they will consider the 
following concerns to help produce a valid version of the test.

I am the Assessment Specialist for a large ESL and ABE program in 
Massachusetts (over 600 students in ESL).  The Mass. Department of 
Education has required all DOE-funded programs to use the BEST Oral 
Test, Long Form, to test all students within the range of SPL 1-4.  As a 
result, we have tested over 300 people during this pre-test cycle.  As 
Assessment Coordinator, I have been closely involved in the testing, and 
I wanted to share some of our concerns.

The BEST is similar to a curriculum-based test that we had been evolving 
and using at our program for over 10 years.  As a result, we were very 
familiar with interview-formatted performance assessments and how to 
build and use them.  We have found that there are some major problems 
with the BEST test that should be addressed in any revision.  I 
understand that there is a new version of the BEST test being developed, 
and I wanted to forward some of the biggest concerns to arise from our 
experience with the BEST.

I understand that it was the Mass. DOE's decision to use this test, not 
CAL's.  However, many states are now using the BEST for this purpose (as 
stated on the CAL website).  I believe that there are many people out 
there who would benefit from a revision aimed at validly measuring the 
domain reportedly being measured: oral communication skills.  Those of 
us who have been using this test may be of help in revising it.

Test usage has expanded beyond the intended use and domain
I have read the history of the BEST test.  When the test was developed, 
its conceptual framework did not incorporate high-stakes testing of oral 
proficiency based on the scores derived.  To begin with, some items do 
not test the domain that we (Massachusetts ESL programs) are looking 
for.  Specifically, here in Massachusetts (and, I imagine, in other 
states), the BEST is being used to measure oral proficiency in English.  
However, several of the test items do not seem valid for this purpose 
(counting money, following maps, pointing to clocks...).  Neither the 
NRS Speaking/Listening Descriptors nor the Massachusetts Curriculum 
Frameworks Oral Communication Strand incorporate these kinds of life 
skill constructs in this category.  This is because the BEST Test was 
designed to assess not only grammar but also "Topic areas identified as 
crucial to 'survival level' competency in English..." (BEST Test 
Manual, p.53).

When developing a test, it is important to clearly state the intended 
use and design it accordingly.  The BEST was not originally built for 
this purpose, and its validity suffers because of this.  The potential 
uses of a test should shape its conceptual framework.  "Validation 
logically begins with an explicit statement of the proposed 
interpretation of test scores....  The proposed interpretation refers to 
the constructs or concepts the test is intended to measure" (Standards 
for Educational and Psychological Testing, p.9).  Therefore, a test 
designed to place students in relation to the domain represented by the 
NRS Speaking/Listening Descriptors should not incorporate money, maps 
and clocks, since they are not included in the construct measured to 
report educational gain in NRS Speaking/Listening skills.

Test has expanded beyond the normed population
I have been told that the original BEST series was normed on 987 
test-takers.  This group encompassed speakers of five Asian language 
groups along with one Latin-based and one Slavic language.  No 
statistics gave percentages for each language (BEST Test Manual, p.54).  
This does not seem to be an adequate sample size considering the tens 
of thousands of students now taking the test.  Our program alone has 
just finished giving it to over 300 students, and by June, we will have 
tested more than 950 who speak 32 different languages.

I am not second-guessing the test designers.  They designed a good tool 
for their intended purpose.  It would have been impossible for them to 
adequately sample every possible language group.  In addition, the test 
designers never intended this tool for such wide use of such a narrow 
domain.   However, the BEST has been brought out into the wider world 
for a larger, more high-stakes use.  If its use is to be continued (as 
indicated by the revision), the developers should know of the 
information out here in the field that could inform test revision.

SUGGESTIONS
Redesign the order/difficulty of test items
Everyone I have talked to believes that the order of the test could 
benefit from revision.  The cutoff point ("cut score") of four correct 
answers before #14 leads to too many low-SPL students needing to 
complete the whole test.  Some of the questions that occur before #14 
are just too easy or, as I have said, not relevant to the construct of 
speaking/listening.  The fact that someone can say their name and point 
to two clock pictures does not indicate that they are ready to move on 
to a 19-word question about the price of apples.  Because of this, many 
students have been forced to complete the whole test and still end up at 
an SPL 0.  That is 15 minutes of feeling like a failure.  This is 
crushing to a student's morale.

Some have noted:
"...We must continue the test because there are some easier questions 
later in the test, and test takers may score points on them.  It is 
unfair to quit too early and deprive the student of a chance to score 
those extra points and show the full extent of their abilities."
I agree that students should have the chance to answer all the easy 
questions, but I DO NOT AGREE that they should be raked over the coals 
on all the hard questions to get there.  Almost every assessment tool I 
have examined graduates test items based on their difficulty, so 
students at low levels can find test items that suit their skill level 
BEFORE they are demoralized by test items that are too difficult.

By the time SPL 1 students get to the relatively easy item #42, many 
have given up.  The items that a low-SPL learner could answer are placed 
too far into the test.  CAL should move the easy questions up and move 
the clock questions beyond the "cut score" point (currently #14), or 
better yet, remove them entirely.
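To make the stopping rule concrete, here is a minimal Python sketch. It assumes the rule works as described above: a test-taker goes on past item #14 only after answering at least four earlier items correctly. The function name, the dictionary format, and every item number other than #1 and #14 are hypothetical and are not taken from the BEST Test Manual. The `excluded` parameter shows the effect of the suggestion, made in the section on Question #1, to leave item #1 out of the count.

```python
def continues_past_cut(responses, cut_item=14, required=4, excluded=()):
    """Decide whether a test-taker goes on past the cut-score item.

    Hypothetical sketch: `responses` maps item numbers to True (correct)
    or False (incorrect). Correct answers on items numbered below
    `cut_item` are counted, skipping any items listed in `excluded`;
    the test continues only if that count reaches `required`.
    """
    correct_before_cut = sum(
        1
        for item, is_correct in responses.items()
        if item < cut_item and item not in excluded and is_correct
    )
    return correct_before_cut >= required


# Hypothetical answer pattern: item 1 ("My name is...") plus three other
# early items answered correctly; item numbers other than 1 are invented.
answers = {1: True, 2: True, 3: True, 4: True, 5: False}

print(continues_past_cut(answers))                 # True: 4 correct
print(continues_past_cut(answers, excluded=(1,)))  # False: only 3 count
```

Under these assumptions, dropping very easy or construct-irrelevant items from the count (or raising `required`) stops more low-SPL test-takers at the cut point instead of sending them through the whole test.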

Technical concerns with specific items
Construct-irrelevant variance:
"The test scores may be systematically influenced to some extent by 
components that are not part of the construct.... construct-irrelevant 
components might include an emotional reaction to the test content"
Standards for Educational and Psychological Testing p.13
"An attempt is generally made to avoid words or topics that may offend 
or otherwise disturb some test takers, if less offensive material is 
equally useful"
Ibid, p.39

Please consider these excerpts when selecting test items and pictures. 
Many students react badly to the picture of the child who has 
been struck by a car.  This "emotional reaction" may be affecting their 
test scores.

Question #1 and the standard error of measurement
"My name is..." is not a discriminating question.  Out of 300 students, 
only one was not able to answer this question.  That is 0.3% of our 
testing sample.  Any question that 99.7% of the students can answer 
correctly is not providing you with any real data about what students in 
general can and cannot do.  This item does not "discriminate among test 
takers of different standing on the scale" (ibid, p.39); therefore it 
should NOT be included in the "cut score" of four correct before #14.
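The arithmetic above is the classical item difficulty index (the "p-value"): the proportion of test-takers who answer an item correctly. A minimal Python sketch using only the figures given here (299 correct out of 300) follows. It also computes the variance of a right/wrong item score, p(1 - p), which peaks at 0.25 when p = 0.5 and approaches zero as p approaches 1.0; that is one way to see why such an item says little about differences among test-takers. The function names are illustrative, and this is not a procedure taken from the BEST Test Manual; a true discrimination index would need each student's full set of responses, which this message does not report.

```python
def item_difficulty(num_correct, num_tested):
    """Classical item difficulty (p-value): share of test-takers answering correctly."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    return num_correct / num_tested


def item_variance(p):
    """Variance of a right/wrong item score; largest (0.25) when p = 0.5."""
    return p * (1 - p)


# Figures from the message: 299 of 300 students answered item #1 correctly.
p = item_difficulty(299, 300)
print(f"correct: {p:.1%}, incorrect: {1 - p:.1%}")  # correct: 99.7%, incorrect: 0.3%
print(f"item variance: {item_variance(p):.4f} (maximum possible 0.2500)")
```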

Non-computer based BEST needed.
I understand the possibilities that computer-based testing (CBT) allows.  
I am impressed with the research, theory and mathematical data that 
go into creating a "computer-adaptive test".  These tests adjust the 
difficulty of the test items to the test-taker's answers so as to best 
challenge the test-taker (again, by graduating difficulty, as I have 
proposed before).

However, computer-based testing poses construct-irrelevant difficulties 
for an immigrant population.  Many of my students have never used a 
computer before, and this presents serious challenges, difficulty and 
psychological intimidation.  Studies are currently being conducted to 
determine how much variance this causes.

In addition, not all programs have the facilities to test all their 
students on a computer.  I hope that the BEST revisions will appear in a 
hard copy form as well, thereby allowing all programs to be able to use 
a valid test, not just those who can afford the hardware.

As I have said, I understand how this test was developed.  I respect 
the work done through ORR and DHHS, and I appreciate the role that the 
Center for Applied Linguistics has played in supporting and distributing 
the BEST.  There are many things going for this test, which is why so 
many states are choosing it.  However, it needs to be adapted to better 
fill the niche it has found in the last 20 years.  I hope the revision 
team at CAL will consider some of our concerns and suggestions and give 
us a valid, hard-copy version of the BEST that will help us measure our 
students' speaking/listening skills.

Kevin O'Connor
ESL Teacher and
Assessment Specialist
Framingham Adult ESL
508-626-4282
koconnor at framingham.k12.ma.us



-----Original Message-----
From: 	Sandra Robinson [mailto:srobinson at doe.state.vt.us]
Sent:	Thursday, October 17, 2002 3:51 PM
To:	Multiple recipients of list
Subject:	[NIFL-ASSESSMENT:216] RE: Level 3 SPL

Kevin,
I wonder if you could share those suggestions with the listserv. We 
would certainly welcome the help.
Sandra Robinson

Kevin O'Connor wrote:

 > Hello, Carol.  I work with a large Massachusetts ESL program and we 
 > have quite a few BEST test revision suggestions, based on our program's 
 > concerns about validity.  Who could I forward them to?
 >
 > Kevin O'Connor
 > ESL Teacher and
 > Assessment Specialist
 > Framingham Adult ESL
 > 508-626-4282
 > koconnor at framingham.k12.ma.us
 >
 >  -----Original Message-----
 > From:   Carol Van Duzer [mailto:carol at cal.org]
 > Sent:   Thursday, October 17, 2002 2:02 PM
 > To:     Multiple recipients of list
 > Subject:        [NIFL-ASSESSMENT:214] RE: Level 3 SPL
 >
 > Hi Cheryl,
 >
 > Part of the problem is that the BEST Oral interview assesses oral 
 > skills and the REEP writing rubric assesses writing skills. For most 
 > learners, speaking and listening skills do not develop at the same 
 > pace--perhaps one is used (or needed immediately) more than the other or 
 > instruction focuses on the most needed language skill. Placement should 
 > reflect what is happening instructionally so learners are placed in 
 > levels that best meet their needs (and proficiency). It frequently 
 > happens that learners have high speaking skills, but lower writing skills.
 >
 > What did you use for exiting learners from your Level 3 before you 
 > began using these assessments? I assume that would still be valid. Then 
 > what you want to do is increase writing skills so that the learners can 
 > advance on the REEP writing rubric to meet the state's NRS 
 > requirements. Perhaps a stronger writing component will need to be added 
 > to the Level 3 curriculum.
 >
 > Carol
 > Carol H. Van Duzer
 > National Center for ESL Literacy Education
 > Center for Applied Linguistics
 > 202-362-0700
 > carol at cal.org
 >
 > visit our website at www.cal.org/ncle
 >
 > -----Original Message-----
 > From: Cheryl Pyburn [mailto:cpyburn7 at yahoo.com]
 > Sent: Wednesday, October 16, 2002 2:52 PM
 > To: Multiple recipients of list
 > Subject: [NIFL-ASSESSMENT:209] Level 3 SPL
 >
 > Hi everyone.
 >
 > I have a question about ESOL levels with regards to
 > the REEP and BEST tests...I'll explain...
 >
 > We're having some trouble at our learning center
 > trying to decide where our ESOL level 3 class should
 > end.  We have just started using the REEP, and we're
 > running into the problem of level 3 students who have
 > an SPL 7 (BEST oral), but their REEP score is quite
 > low: 2 or 3. When would this student move on? At what
 > point does a student 'finish'?
 >
 > Right now we have 3 ESOL levels.  Level 1 is SPL 0-3;
 > Level 2 is SPL 4-5; Level 3 is SPL 6+; however, with
 > the REEP assessment, it seems that students could end
 > up being in level 3 forever. Any thoughts?
 > Suggestions? What do other learning centers do?
 >
 > Thank you.
 >
 > =====
 > Cheryl Pyburn
 > Operation Bootstrap




_______________________________________________
NLA mailing list: NLA at lists.literacytent.org
http://lists.literacytent.org/mailman/listinfo/nla
LiteracyTent: web hosting, news, community and goodies for literacy
http://literacytent.org


