第七十一輯.第二期「108課綱回顧與前瞻」專刊-上(Open Access) - 2025-06-30

(專刊論文)【研究論文】拾級而上:第四學習階段國語文閱讀表現水準描述芻議

(Special Issue Paper)【Research Paper】Step-by-Step Progression: A Preliminary Discussion on Performance Level Descriptors for Mandarin Reading in the Fourth Learning Stage

作 者:
謝佩蓉 / Pei-Jung Hsieh
關鍵字:
大型評量、素養導向評量、量尺定錨法、標準設定、閱讀理解 / large-scale assessments, competency-based assessment, scale anchoring, standard setting, reading comprehension
  • 摘要
  • 英文摘要
  • 參考文獻
  • 全文下載
(1)研究目的:本研究目的在於建立第四學習階段國語文閱讀表現量尺及表現水準描述,並提出範例試題使表現水準描述具象化。(2)主要理論或概念架構:標準設定的方法各有所長,大多數研究採用「以內容為基礎」的方法,但設定程序可能受到人為因素影響。本研究比照國際大型教育調查所採用的「量尺定錨法」,包括TIMSS和PISA均採用此方法,讓資料使用者理解量尺分數所代表的意義。(3)研究設計/方法/對象:採用調查研究法蒐集標準設定所需實證資料,以縱貫性研究設計測量學生國語文閱讀學習表現,再以國際大型調查採用的量尺定錨法設定表現標準並產出表現水準描述。研究對象以107學年度和108學年度的七年級學生為母群體,107學年度為第一組、108學年度為第二組追蹤樣本。採二階段分層叢集抽樣設計,第一階段採「依比例的機率」抽取學校、第二階段針對抽中的學校進行校內班級隨機抽樣,所抽中的班級,全班學生均為樣本。「國語文閱讀」於第一組追蹤樣本七年級時分派2,803人,八年級分派2,807人;第二組追蹤樣本於七年級分派2,565人,八年級分派2,780人。評量工具為國語文閱讀素養導向電腦化線上測驗,總計344題,採部分平衡不完全區塊組成題本,信、效度證據良好。以試題反應理論之部分給分模式估計試題難度。(4)研究發現或結論:表現水準的四個切截點分數為400、475、550、625,並命名為M1、M2、M3及M4,撰寫各表現水準的描述,具體描寫學生的能力發展。依閱讀題組的閱讀特性,可概分為評量學生的一般閱讀和數位閱讀兩種素養。以找出訊息、理解及評價與省思三向度的閱讀認知歷程,描述達到每一水準學生所展現的兩種類型閱讀素養。整體而言,當認知複雜性逐漸提升時,試題的難度通常也會隨之增加。從本研究的實證資料可知,即便是認知複雜度較低的任務,也可能會出現令學生感到困難的題目;而認知複雜度較高的試題,可能反而較為容易。(5)理論或實務創見/貢獻/建議:本研究以嚴謹大規模調查之實證證據支持標準設定的有效性,並根據實證數據選取合適的範例題。所完成的表現水準描述揭示各閱讀歷程於不同水準可能涉及的能力差異,有助於教師精準掌握學生閱讀理解歷程中的困難點。建議未來研究能建構第三和第五學習階段國語文閱讀素養表現水準描述,使學習階段能力描述具備縱貫脈絡,並進一步研發涵蓋不同難度層級的數位閱讀試題,補充評價與省思M1至M3的數位閱讀表現描述,確立完整的數位閱讀素養能力階層。此外,可以表現水準描述為基礎,針對不同能力層級學生設計教學方案,發揮表現水準描述作為「學習地圖」在實務現場的應用潛力。
(1) Purpose: The study aims to develop a performance scale and performance level descriptors (PLDs) for Mandarin reading in the fourth learning stage, and to provide sample items that illustrate the PLDs. (2) Main Theories or Conceptual Frameworks: Each standard-setting method has its own strengths. Most studies adopt content-based standard-setting methods, but the standard-setting process can be influenced by subjective factors. This study therefore adopts the scale anchoring method, which is employed in international large-scale educational assessments such as TIMSS and PISA, to help data users interpret the meaning represented by scale scores. (3) Research Design/Methods/Participants: A survey research method was employed to collect the empirical data required for standard setting. A longitudinal design was used to measure students’ performance in Mandarin reading, and the scale anchoring method was applied to set performance standards and generate performance level descriptors. The study population consisted of 7th-grade students in the 107 (2018-19) and 108 (2019-20) academic years, with the 107 cohort as Panel 1 and the 108 cohort as Panel 2 of the follow-up samples. A two-stage stratified cluster sampling design was implemented: in the first stage, schools were selected with probability proportional to size; in the second stage, classes within the selected schools were randomly sampled, and all students in the sampled classes were included in the study. In Panel 1, 2,803 students were assigned to the Mandarin reading assessment in 7th grade and 2,807 in 8th grade; in Panel 2, 2,565 students were assigned in 7th grade and 2,780 in 8th grade. The assessment tool was a competency-based computerized online test of Mandarin reading consisting of 344 items. Test booklets were assembled using a partially balanced incomplete block design, and the assessment provided strong evidence of reliability and validity. Item difficulty was estimated using the partial credit model of item response theory. (4) Research Findings or Conclusions: The four cut scores for the performance levels were set at 400, 475, 550, and 625, corresponding to levels M1, M2, M3, and M4, and descriptions of each performance level were developed to portray the development of students’ abilities. Reading tasks were categorized into two literacy types, general reading and digital reading, and the reading cognitive processes were classified into three dimensions: locating information, understanding, and evaluating and reflecting. The PLDs detail the literacy skills demonstrated by students at each level for both types of reading. In general, item difficulty increased with cognitive complexity. However, the empirical data revealed that tasks with lower cognitive complexity could still challenge students, while tasks with higher cognitive complexity could be relatively easy. (5) Theoretical or Practical Insights/Contributions/Recommendations: This study supports the validity of the standard setting with empirical evidence from a rigorous large-scale survey and selects appropriate sample items based on that evidence. The resulting PLDs reveal the differences in reading processes across performance levels, enabling teachers to accurately identify students’ difficulties in reading comprehension. Future research is recommended to develop PLDs for Mandarin reading literacy in the third and fifth learning stages, providing a longitudinal framework for PLDs across learning stages. Additionally, developing digital reading items that span a range of difficulty levels is suggested to supplement the PLDs for evaluating and reflecting at levels M1 to M3 and to establish a complete hierarchy of digital reading literacy. Furthermore, the PLDs can serve as a “learning map” for designing instructional plans tailored to students at different performance levels, enhancing their practical application in teaching and learning contexts.
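To make the first sampling stage described above concrete, the sketch below shows one common way to draw a probability-proportional-to-size (PPS) sample of schools, namely systematic selection along cumulative size measures. It is an illustrative sketch only; the function name, the use of enrolment counts as the size measure, and the systematic variant of PPS are assumptions, not details taken from the study.

import numpy as np

def pps_systematic_sample(school_sizes, n_schools, rng):
    """Illustrative systematic PPS selection: each school's chance of selection
    is proportional to its size measure (e.g., number of enrolled students)."""
    sizes = np.asarray(school_sizes, dtype=float)
    cum = np.cumsum(sizes)                       # cumulative size boundaries
    step = cum[-1] / n_schools                   # sampling interval
    start = rng.uniform(0, step)                 # random start within the first interval
    points = start + step * np.arange(n_schools)
    return np.searchsorted(cum, points)          # indices of the selected schools

# Example: draw 5 schools from 20 schools of varying size (simulated sizes).
rng = np.random.default_rng(42)
sizes = rng.integers(30, 300, size=20)
print(pps_systematic_sample(sizes, 5, rng))

In the study proper, a second stage then samples intact classes within each selected school; that step is omitted from this sketch.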
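The abstract states that item difficulty was estimated with the partial credit model of item response theory. For reference, the standard Masters (1982) form of that model, written in LaTeX, is

P(X_i = x \mid \theta) = \frac{\exp\!\left[\sum_{k=0}^{x}(\theta - \delta_{ik})\right]}{\sum_{h=0}^{m_i}\exp\!\left[\sum_{k=0}^{h}(\theta - \delta_{ik})\right]}, \qquad x = 0, 1, \ldots, m_i, \qquad \sum_{k=0}^{0}(\theta - \delta_{i0}) \equiv 0,

where \theta is the student's reading ability, m_i is the number of score steps for item i, and the step parameters \delta_{ik} jointly characterize the item's difficulty. The specific estimation software and settings are not reported in the abstract.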
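The cut scores of 400, 475, 550, and 625 are interpreted through scale anchoring. As a rough illustration of the logic of that family of methods, the sketch below assigns each item to the lowest level at which students scoring near the anchor point answer it correctly at a chosen rate. The data layout, the 0.65 criterion, the band half-width, and the dichotomous scoring are illustrative assumptions borrowed from common practice, not the study's actual procedure.

import numpy as np

CUT_SCORES = {"M1": 400, "M2": 475, "M3": 550, "M4": 625}   # from the abstract
BAND_HALF_WIDTH = 12.5       # assumed half-width of the band around each anchor point
ANCHOR_CRITERION = 0.65      # assumed success rate defining "students at this level can do this"

def anchor_items(theta, responses):
    """Assign each item to the lowest level at which students scoring near the
    anchor point answer it correctly at least ANCHOR_CRITERION of the time.

    theta     : (n_students,) array of scale scores
    responses : (n_students, n_items) array of 0/1 scored responses
    """
    assignments = {}
    for j in range(responses.shape[1]):
        assignments[j] = "not anchored"
        for level, cut in CUT_SCORES.items():              # M1 -> M4, lowest first
            near = np.abs(theta - cut) <= BAND_HALF_WIDTH  # students near the cut score
            if near.any() and responses[near, j].mean() >= ANCHOR_CRITERION:
                assignments[j] = level
                break
    return assignments

# Example with simulated data (illustrative only).
rng = np.random.default_rng(0)
theta = rng.normal(500, 100, size=2000)
difficulty = rng.uniform(350, 650, size=40)
prob = 1 / (1 + np.exp(-(theta[:, None] - difficulty) / 50))
responses = (rng.uniform(size=(2000, 40)) < prob).astype(int)
print(anchor_items(theta, responses))

In scale anchoring, the items that anchor at a level typically serve as the exemplars from which that level's descriptors are written.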
任宗浩(2018)。素養導向評量的界定與實踐。載於蔡清華(主編),課程協作與實踐
  第二輯(頁75-82)。教育部。
[Jen, T.-H. (2018). The definition and practice of competency-based assessment. In C.-H. Tsai
  (Ed.), Curriculum collaboration and practice II (pp. 75-82). Ministry of Education.]
行政院(2018)。推動循環經濟—創造經濟與環保雙贏。https://www.ey.gov.tw/
  Page/5A8A0CB5B41DA11E/12c0a2b8-485d-49d7-ba9e-a9a10b82828e
[Executive Yuan. (2018). Promoting a circular economy: Creating a win-win
  situation for the economy and the environment. https://www.ey.gov.tw/Page/
  5A8A0CB5B41DA11E/12c0a2b8-485d-49d7-ba9e-a9a10b82828e]
吳正新(2022)。長期追蹤調查抽樣技術與權重校正(NAER-2019-113-A-1-1-E1-03)。
  國家教育研究院。
[Wu, J.-S. (2022). Sampling techniques and weight adjustment in longitudinal surveys (NAER-
  2019-113-A-1-1-E1-03). National Academy for Educational Research.]
財團法人資源循環臺灣基金會(2019)。邁向循環臺灣:循環經濟實踐案例手冊。
[Circular Taiwan Network. (2019). Toward a circular Taiwan: Handbook of circular economy
  case practices.]
國立臺灣師範大學心理與教育測驗研究發展中心(2024)。113年國中教育會考各科等級
  加標示人數百分比統計表。https://cap.rcpet.edu.tw/exam/113/113Table2.pdf
[Research Center for Psychological and Educational Testing, National Taiwan Normal
  University. (2024). Percentage statistics of students reaching each proficiency level in each
  subject in the 2024 comprehensive assessment program for junior high school students.
  https://cap.rcpet.edu.tw/exam/113/113Table2.pdf]
國立臺灣師範大學心理與教育測驗研究發展中心(2025)。114年國中教育會考各科等級
  加標示人數百分比統計表。https://cap.rcpet.edu.tw/exam/114/114%E5%B9%B4%E5%9
  C%8B%E4%B8%AD%E6%95%99%E8%82%B2%E6%9C%83%E8%80%83%E5%90%
  84%E7%A7%91%E8%A8%88%E5%88%86%E8%88%87%E9%96%B1%E5%8D%B7%
  E7%B5%90%E6%9E%9C%E8%AA%AA%E6%98%8E.pdf
[Research Center for Psychological and Educational Testing, National Taiwan Normal
  University. (2025). Percentage statistics of students reaching each proficiency level
  in each subject in the 2025 comprehensive assessment program for junior high school
  students. https://cap.rcpet.edu.tw/exam/114/114%E5%B9%B4%E5%9C%8B%E4%B
  8%AD%E6%95%99%E8%82%B2%E6%9C%83%E8%80%83%E5%90%84%E7%
  A7%91%E8%A8%88%E5%88%86%E8%88%87%E9%96%B1%E5%8D%B7%E7
  %B5%90%E6%9E%9C%E8%AA%AA%E6%98%8E.pdf]
國立臺灣師範大學心理與教育測驗研究發展中心(無日期)。各科等級描述。2025年5月
  4日,取自https://cap.rcpet.edu.tw/score1.html
[Research Center for Psychological and Educational Testing, National Taiwan Normal
  University. (n.d.). Proficiency level descriptions by subject. Retrieved May 4, 2025, from
  https://cap.rcpet.edu.tw/score1.html]
國家教育研究院(2015)。十二年國教語文領域—國語文課程綱要研修第7次全體委員大
  會會議紀錄。https://www.naer.edu.tw/upload/1/9/doc/1051/1040614%E5%8D%81%E4
  %BA%8C%E5%B9%B4%E5%9C%8B%E6%95%99%E8%AA%9E%E6%96%87%E9%
  A0%98%E5%9F%9F(%E5%9C%8B%E8%AA%9E%E6%96%87)%E8%AA%B2%E7%
  A8%8B%E7%B6%B1%E8%A6%81%E7%A0%94%E4%BF%AE%E7%AC%AC7%E6
  %AC%A1%E5%85%A8%E9%AB%94%E5%A7%94%E5%93%A1%E5%A4%A7%E6
  %9C%83%E6%9C%83%E8%AD%B0%E7%B4%80%E9%8C%84.pdf
[National Academy for Educational Research. (2015). Meeting minutes of the 7th plenary
  session on the curriculum guidelines for the 12-year basic education elementary school,
  junior high school, and upper secondary school: The domain of language arts— Mandarin.
  https://www.naer.edu.tw/upload/1/9/doc/1051/1040614%E5%8D%81%E4%BA%8C%E5
  %B9%B4%E5%9C%8B%E6%95%99%E8%AA%9E%E6%96%87%E9%A0%98%E5%9
  F%9F(%E5%9C%8B%E8%AA%9E%E6%96%87)%E8%AA%B2%E7%A8%8B%E7%B
  6%B1%E8%A6%81%E7%A0%94%E4%BF%AE%E7%AC%AC7%E6%AC%A1%E5%
  85%A8%E9%AB%94%E5%A7%94%E5%93%A1%E5%A4%A7%E6%9C%83%E6%9C
  %83%E8%AD%B0%E7%B4%80%E9%8C%84.pdf]
教育部(2014)。十二年國民基本教育課程綱要總綱。
[Ministry of Education. (2014). Curriculum guidelines of 12-year basic education: General
  guidelines.]
教育部(2018)。十二年國民基本教育課程綱要國民中小學暨普通型高級中等學校:語
  文領域—國語文。
[Ministry of Education. (2018). Curriculum guidelines for the 12-year basic education:
  Elementary school, junior high school, and upper secondary school– The domain of
  language arts— Mandarin.]
教育部(2020)。十二年國民基本教育課程綱要國民中小學暨普通型高級中等學校:議
  題融入說明手冊。
[Ministry of Education. (2020). Manual for integrating curriculum issues into the 12-year basic
  education: Elementary school, junior high school, and upper secondary school.]
教育部(2021)。十二年國民基本教育課程綱要總綱修正。
[Ministry of Education. (2021). Revised curriculum guidelines of 12-year basic education:
  General guidelines.]
教育部統計處(2020)。108(2019-2020)學年度高級中等學校科別資料。https://depart.
  moe.edu.tw/ed4500/News_Content.aspx?n=5A930C32CC6C3818&sms=91B3AAE8C638
  8B96&s=596D9D77281BE257
[Department of Statistics, Ministry of Education. (2020). Data on high school departments for
  the 2019-2020 academic year. https://depart.moe.edu.tw/ed4500/News_Content.aspx?n=5
  A930C32CC6C3818&sms=91B3AAE8C6388B96&s=596D9D77281BE257]
陳冠銘(2022)。長期追蹤調查似真值估算與量尺等化研究(NAER-2019-113-A-1-
  1-E1-02)。國家教育研究院。
[Chen, K.-M. (2022). Plausible value estimation and scale equating in longitudinal surveys
  (NAER-2019-113-A-1-1-E1-02). National Academy for Educational Research.]
曾建銘、王暄博(2012)。標準設定之效度評估:以TASA國語科為例。教育學刊,39,
  77-118。
[Cheng, C.-M., & Wang, H.-P. (2012). Assessing the standards set by TASA and its standard-setting
  procedures. Educational Review, 39, 77-118.]
鄒倫、張祖恩(主編)(2018)。循環經濟系列叢書第三冊:資源及產品循環應用技
  術。財團法人中技社。
[Tsou, L., & Chang, T.-E. (Eds.). (2018). Circular economy series volume 3: Technologies for
  resource and product recycling. CTCI Foundation.]
劉文海(2007)。中國大陸再生鋁市場現況與展望。https://www.mirdc.org.tw/
  FileDownLoad/IndustryNews/2007431016121252.doc
[Liu, W.-H. (2007). Current status and prospects of the recycled aluminum market in Mainland
  China. https://www.mirdc.org.tw/FileDownLoad/IndustryNews/2007431016121252.doc]
環境部(2018)。107至109年資源回收再利用推動計畫。
[Ministry of Environment. (2018). Resource recycling and reuse promotion plan (2018-2020).]
謝佩蓉(2018)。108課綱第四學習階段國語文閱讀素養線上評量之建構。教育科學研究
  期刊,63(4),193-228。https://doi.org/10.6209/JORIES.201812_63(4).0007
[Hsieh, P.-J. (2018). Development and validation of an online assessment of Mandarin
  reading literacy for middle school students of the 12-year basic education program.
  Journal of Research in Education Sciences, 63(4), 193-228. https://doi.org/10.6209/
  JORIES.201812_63(4).0007]
謝佩蓉、林明佳(2022)。國小中文閱讀素養長期追蹤(NAER-2019-077-A-1-
  1-E1-13)。國家教育研究院。
[Hsieh, P.-J., & Lin, M.-C. (2022). A longitudinal study of elementary students’ Mandarin reading
  literacy (NAER-2019-077-A-1-1-E1-13). National Academy for Educational Research.]
Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational
  Evaluation, 31(2-3), 162-172. https://doi.org/10.1016/j.stueduc.2005.05.008
Alderson, J. C. (2007). The CEFR and the need for more research. Modern Language Journal,
  91(4), 659-663. https://doi.org/10.1111/j.1540-4781.2007.00627_4.x
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of
  Educational Statistics, 17(2), 191-204. https://doi.org/10.2307/1165169
Brookhart, S. M. (2010). How to assess higher-order thinking skills in your classroom. ASCD.
Brookhart, S. M., & Chen, F. (2015). The quality and effectiveness of descriptive rubrics.
  Educational Review, 67(3), 343-368. https://doi.org/10.1080/00131911.2014.929565
Chall, J. S. (1983). Stages of reading development. McGraw-Hill.
Chall, J. S. (1996). Stages of reading development (2nd ed.). Harcourt Brace College Publishers.
Chen, K.-M., & Jen, T.-H. (2012, March 5-7). Positional effect of item blocks in international
  large-scale assessment caused the item difficulty different between Taiwan and the USA
  [Paper presentation]. The 6th International Technology, Education and Development
  Conference, Valencia, Spain.
Cizek, G. J. (2012). An introduction to contemporary standard setting: Concepts, characteristics,
  and contexts. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods,
  and innovations (2nd ed., pp. 3-14). Routledge.
Common Core State Standards Initiative. (2009). Common core state standards initiative
  standards-setting criteria. http://www.corestandards.org/about-the-standards/development-process/
Common Core State Standards Initiative. (2022). Key shifts in English language arts. http://
  www.corestandards.org/other-resources/key-shifts-in-english-language-arts/
Council of Europe. (2001). Common European framework of reference for languages.
  Cambridge University Press.
Council of Europe. (2020). Common European framework of reference for languages: Learning,
  teaching, assessment– Companion volume. Council of Europe Publishing.
Dijk, T. A. v., & Kintsch, W. (1983). Strategies of discourse comprehension. Academic Press.
Djefaflia, K. (2018). Developing critical thinking skill through reading: Both teachers and
  first year master students’ perspectives. Guelma University.
Gonzalez, E. J., Galia, J., Arora, A., Erberber, E., & Diaconu, D. (2004). Reporting student
  achievement in mathematics and science. In M. O. Martin, I. V. S. Mullis, & S. J.
  Chrostowski (Eds.), TIMSS 2003 technical report (pp. 275-307). TIMSS & PIRLS
  International Study Center.
Gouëdard, P., Pont, B., Hyttinen, S., & Huang, P. (2020). Curriculum reform: A literature review
  to support effective implementation (OECD Education Working Paper No. 239). OECD.
  https://doi.org/10.1787/efe8a48c-en
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
James, Z., & Rossiter, J. (2018). Using scale-anchoring to interpret the Young Lives 2016-17
  achievement scale. Young Lives.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration
  model. Psychological Review, 95(2), 163-182. https://doi.org/10.1037/0033-295X.95.2.163
Kintsch, W. (2019). Revisiting the construction-integration model of text comprehension and its
  implications for instruction. In D. E. Alvermann, N. J. Unrau, M. Sailors, & R. B. Ruddell
  (Eds.), Theoretical models and processes of literacy (7th ed., pp. 178-203). Routledge.
Kintsch, W., & Rawson, K. A. (2005). Comprehension. In M. J. Snowling & C. Hulme
  (Eds.), The science of reading: A handbook (pp. 209-226). Blackwell. https://doi.org/
  10.1002/9780470757642.ch12
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and
  practices (2nd ed.). Springer.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
  https://doi.org/10.1007/BF02296272
New Zealand Ministry of Education. (2015). The New Zealand curriculum. Crown.
New Zealand Ministry of Education. (2025a). The learning progression frameworks. https://
  curriculumprogresstools.education.govt.nz/lpf-tool/
New Zealand Ministry of Education. (2025b). Understanding the reading framework. https://
  curriculumprogresstools.education.govt.nz/lpfs/understanding-the-reading-framework/
Organisation for Economic Co-operation and Development. (2017). Proficiency scale
  construction. In OECD (Ed.), PISA 2015 technical report (pp. 275-287).
Organisation for Economic Co-operation and Development. (2019). PISA 2018 assessment and
  analytical framework.
Organisation for Economic Co-operation and Development. (2020). PISA 2018 technical report.
Organisation for Economic Co-operation and Development. (2021). 21st-century readers:
  Developing literacy skills in a digital world.
Olsen, R. V., & Nilsen, T. (2017). Standard setting in PISA and TIMSS and how these procedures
  can be used nationally. In S. Blömeke & J.-E. Gustafsson (Eds.), Standard setting in
  education: The Nordic countries in an international perspective (pp. 69-84). Springer.
Perie, M. (2008). A guide to understanding and developing performance-level descriptors.
  Educational Measurement: Issues and Practice, 27(4), 15-29. https://doi.org/10.1111/
  j.1745-3992.2008.00135.x
Phillips, G. W. (2012). The benchmark method of standard setting. In G. J. Cizek (Ed.), Setting
  performance standards: Foundations, methods, and innovations (2nd ed., pp. 323-345).
  Routledge.
Schneider, M. C., Huff, K. L., Egan, K. L., Gaines, M. L., & Ferrara, S. (2013). Relationships
  among item cognitive complexity, contextual demands, and item difficulty: Implications
  for achievement-level descriptors. Educational Assessment, 18(2), 99-121. https://doi.org/
  10.1080/10627197.2013.789296
Shalihin, R. R. (2024). Higher-order versus lower-order thinking skills: How much does the
  hierarchy matter? https://www.bera.ac.uk/blog/higher-order-versus-lower-order-thinking-skills-how-much-does-the-hierarchy-matter
Smith, R., Snow, P., Serry, T., & Hammond, L. (2021). The role of background knowledge in
  reading comprehension: A critical review. Reading Psychology, 42(3), 214-240. https://doi.
  org/10.1080/02702711.2021.1888348
Stanley, T. (2019). Using rubrics for performance-based assessment: A practical guide to
  evaluating student work. Routledge.
Stevens, D. D., & Levi, A. J. (2013). Introduction to rubrics: An assessment tool to save grading
  time, convey effective feedback, and promote student learning (2nd ed.). Stylus.
U.S. Department of Education. (2025). Distribution of reading questions: 2019 and 2024.
  https://nces.ed.gov/nationsreportcard/reading/distributequest.aspx
von Davier, M., Kennedy, A., Reynolds, K., Fishbein, B., Khorramdel, L., Aldrich, C.,
  Bookbinder, A., Bezirhan, U., & Yin, L. (2024). TIMSS 2023 international results in
  mathematics and science. Boston College, TIMSS & PIRLS International Study Center.
  https://doi.org/10.6017/lse.tpisc.timss.rs6460
Wagner, J.-P., & Hastedt, D. (2022). Valuing curriculum-based international large-scale
  assessments: Ensuring alignment with national curricula in IEA studies. IEA Compass:
  Briefs in Education, 16, 1-7.
Yan, Z., & Yang, L. (2022). Assessment-as-learning in the global assessment reforms. In Z. Yan
  & L. Yang (Eds.), Assessment as learning: Maximising opportunities for student learning
  and achievement (pp. 1-7). Routledge.