Paper 022: A novel approach to the ESP keyword list: 2800 entries with frequent lexical bundles for data-driven learning

anthony September 30, 2020 Uncategorized 11 Comments

HENSHAW, Michael (Hokkaido University, Japan)

Keywords: AntConc, DDL, lexical bundles, n-grams, ESP

Abstract

I present a wordlist designed to be put directly in the hands of students, and describe how it was created. Many English for Specific Purposes (ESP) wordlists are constructed with coverage metrics and efficiency in mind, often for use by instructors or materials developers, so these lists and their associated corpora are rarely seen by students. Furthermore, while general academic multiword lists exist, lists designed for specific fields are lacking. As a model to fill these gaps, I present the Keywords of One Health Biomedical Sciences (KOBS), a 2800-lemma database derived from the One Health English Corpus, a 2.8-million-token corpus of 651 research articles. Both materials were created for and distributed to L2 first-year PhD students of veterinary medicine, a multidisciplinary field guided by the One Health approach whose interests overlap with medicine, ecology, and other health sciences. To my knowledge, this is the first large vocabulary list or corpus developed for veterinary medicine. From inception, KOBS was designed for the student-as-user: it is an easily navigable 2 MB Excel file with over 10 tabs of curated sublists. Each entry includes the most-used lexical bundles both left and right of the node, a thematic category (e.g. lab technique, comparison, biochemistry), the most highly correlated subcorpus (e.g. Introduction, Infectious Diseases), and more. KOBS is a data-driven learning tool for writing research papers. Students may, for instance, filter the Discussion sublist for the category ‘transition’ to return 31 words disproportionately used in the discussion section, ranging from #1, also (freq./text = 3.38; “it has also been shown to”, “not only in X, but also in Y”), to #31, unfortunately (freq./text = 0.04; “Unfortunately, this (method does not)”). In short, this enhanced keyword list offers a method for writers to bypass direct interaction with corpora and achieve quick solutions.
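
The filtering step described in the abstract can be sketched in a few lines of Python. This is only an illustration: the column names (`word`, `sublist`, `category`, `freq_per_text`, `bundles`) and the function `filter_entries` are assumptions, since the actual layout of the KOBS Excel file is not given here. The two 'transition' rows reuse the figures quoted in the abstract; the third row is an invented placeholder that only shows the filter excluding something.

```python
# Hypothetical rows; the real KOBS column names and layout are assumed, not known.
rows = [
    {"word": "unfortunately", "sublist": "Discussion", "category": "transition",
     "freq_per_text": 0.04,
     "bundles": ["Unfortunately, this (method does not)"]},
    {"word": "also", "sublist": "Discussion", "category": "transition",
     "freq_per_text": 3.38,
     "bundles": ["it has also been shown to", "not only in X, but also in Y"]},
    # Invented placeholder entry, present only so the filter has something to exclude.
    {"word": "PLACEHOLDER", "sublist": "Methods", "category": "lab technique",
     "freq_per_text": 1.0, "bundles": []},
]


def filter_entries(entries, sublist, category):
    """Return entries matching a sublist and category, most frequent first."""
    matches = [e for e in entries
               if e["sublist"] == sublist and e["category"] == category]
    return sorted(matches, key=lambda e: e["freq_per_text"], reverse=True)


for entry in filter_entries(rows, "Discussion", "transition"):
    print(entry["word"], entry["freq_per_text"], entry["bundles"])
```

In Excel the same result would come from applying column filters on the Discussion tab and sorting by frequency; the sketch only makes the logic explicit.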

Supplementary Information

None

Q&A live (Zoom) session

No longer available.

11 Comments

anthony Post author
October 1, 2020 at 6:36 am

This looks to be a very interesting paper. I hope you enjoy the conference! – Organizing committee
Michael Henshaw
October 2, 2020 at 9:28 am

Links for relevant materials.

Excel file with sample tabs from Keywords of Biomedical Sciences (KOBS):

https://www.researchgate.net/publication/344450112_Keywords_of_Biomedical_Sciences_-_KOBS_-_sampler

Cover pages for the corpus from which the wordlist was created:

https://www.researchgate.net/publication/344449951_One_Health_English_Corpus_OHEC_cover_page
Michael Henshaw
October 3, 2020 at 1:35 am

Bibliography
Chen, M., & Flowerdew, J. (2018). Introducing data-driven learning to PhD students for research writing purposes: A territory-wide project in Hong Kong. English for Specific Purposes, 50, 97-112. https://doi.org/10.1016/j.esp.2017.11.004
Dang, T. N. Y. (2019). Corpus-based word lists in second language vocabulary research, learning, and teaching. In S. Webb (ed.), The Routledge Handbook of Vocabulary Studies. New York: Routledge, 288-303.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28, 157-169. https://doi.org/10.1016/j.esp.2009.02.002
Eriksson, A. (2012). Pedagogical perspectives on bundles: Teaching bundles to doctoral students of biochemistry. In J. Thomas & A. Boulton (eds.), Input, Process and Product: Developments in Teaching and Language Corpora. Brno: Masaryk University Press, 195-211.
Gandur, A. M. (2015). Titles in research and review articles in veterinary medicine: A corpus-based study [Master’s thesis, Universidad Nacional de Córdoba, Argentina]. Repositorio Digital UNC. http://hdl.handle.net/11086/4176
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35, 305-327. https://doi.org/10.1093/applin/amt015
Huntley, S. J., Mahlberg, M., Wiegand, V., van Gennip, Y., Yang, H., Dean, R. S., & Brennan, M. L. (2018). Analysing the opinions of UK veterinarians on practice-based research using corpus linguistic and mathematical methods. Preventive Veterinary Medicine, 150, 60-69. https://doi.org/10.1016/j.prevetmed.2017.11.020
Lerner, H., & Berg, C. (2017). A comparison of three holistic approaches to health: One Health, EcoHealth, and Planetary Health. Frontiers in Veterinary Science, 4, 1-7. https://doi.org/10.3389/fvets.2017.00163
Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes, 22, 42-53. https://doi.org/10.1016/j.jeap.2016.01.008
drprc80
October 3, 2020 at 2:48 am

Hi Michael,

For some reason the Zoom chat is scheduled for tomorrow (Sunday), not the 3rd (Saturday). I had a question about “In short, this enhanced keyword list offers a method for writers to bypass direct interaction with corpora and achieve quick solutions”. How is providing learners with an exhaustive Excel sheet of these words classed as ‘data-driven learning’, and what evidence of learning of these words have you collected, or will you collect?
- drprc80
  October 3, 2020 at 2:53 am
  
  Never mind about the Zoom chat, that’s my mistake, reading the wrong day. I’ll ask you this question again tomorrow!
- Michael Henshaw
  October 3, 2020 at 7:41 am
  
  Thanks for asking, DRPRC. The reasoning behind my statement was thus: it seems my students have faced difficulties in using the corpus itself on AntConc, as they may not have been willing to overcome the learning curve. However, they are all very familiar with Excel in terms of searching, filtering, etc. I believe they can inform their writing choices by examining lines from the wordlist; there is a danger of slavishly copying and risking plagiarism, so it will require on their part some analysis of what to choose and what to leave.
  Perhaps I have misused ‘data-driven learning’ here; I took DDL to mean an inductive approach towards the accumulation of knowledge which can be analyzed and re-applied for one’s own purposes. But I’m relatively new to CL, so I don’t really know how this community uses the phrase.
  Finally, tough question: how will I gather evidence that students can learn this way… I’ve only just finished the list and distributed it during a workshop I gave at my faculty’s annual conference last month. Any suggestions on this would be most welcome. At this point it’s just anecdotal from a handful of self-selected students who have come to my office for writing help over the past few months. In the future, on the months-long scale, I plan to interview students to elicit self-appraisal. On the years-long scale, I plan to interview lab supervisors for progress in their students’ manuscript writing ability.
Akira Moriya
October 4, 2020 at 2:41 am

That was an inspiring presentation! I am interested in auxiliaries and passive forms as well, so I am curious to know whether passive constructions are distributed evenly across the sections of a medical paper, and across fields.
You have identified contexts in which some verbs, like ‘perform’, are more frequently used in the passive voice. I believe this is crucial when teaching students, as they would want to know which verbs can take the passive and where in the paper they are commonly used (Introduction, Methods, etc.).
- Michael Henshaw
  October 4, 2020 at 6:38 am
  
  Thank you for your comments, Akira. I assumed the passive was most common in the Methods, so that’s why I did my brief analysis starting there.
  But you’ve inspired me to look deeper, so I checked 4 of the most frequent non-grammatical / non-helping verbs across the 4 sections of RAs: performed, used, observed, studied (this is actually more common as the noun study, but anyway…).
  The data spoke pretty clearly:
  By both normalized frequency and range, the passive voice was used more in the Methods than in the 3 other sections combined. Predictably, Results was 2nd highest, with Discussion and Introduction in distant 3rd and 4th.
  Rough figures for normalized frequency:
  Methods=4.9
  Results=1.2
  Discussion=0.75
  Introduction=0.6
  Pretty dramatic!
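
As a quick arithmetic check of the rough figures in the reply above, the short Python sketch below sums the three non-Methods values and compares them with Methods. The unit of the normalized frequency is not stated in the thread, so the numbers are treated as unitless; the variable names are only illustrative.

```python
# Rough normalized frequencies quoted in the reply (unit not stated in the source).
passive_freq = {
    "Methods": 4.9,
    "Results": 1.2,
    "Discussion": 0.75,
    "Introduction": 0.6,
}

methods = passive_freq["Methods"]
# Sum of every section except Methods.
others = sum(v for section, v in passive_freq.items() if section != "Methods")

# Sections ranked from highest to lowest normalized frequency.
ranking = sorted(passive_freq, key=passive_freq.get, reverse=True)

print("Ranking:", ranking)
print(f"Other three sections combined: {others:.2f}")
print(f"Methods exceeds the rest combined: {methods > others}")
print(f"Methods / rest combined: {methods / others:.2f}")
```

On these figures the other three sections sum to about 2.55, so Methods (4.9) is a little under twice the rest combined, consistent with the claim made for normalized frequency.
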
Shimizu Takehiko
October 4, 2020 at 3:08 am

Thank you for your interesting and useful presentation.
I would like to offer some comments on using corpora to study English. I am an undergraduate student, and I also struggle with collocations and with usage points such as the passive voice. I have used a corpus (COCA) to deal with these problems. However, even when there are some examples of a usage, it is difficult to judge whether my own use is correct. Moreover, it is exhausting to look up every writing problem, because searching for a collocation in a corpus takes more time than looking up the meaning of a word in a dictionary. Do you have any ideas for solving these difficulties?
My hope is that such corpora will be integrated with software that revises and gives feedback on one's own writing, such as Grammarly. Present grammar checkers can suggest revisions, but the choice of corrections is limited and no evidence is given to support the suggestions. If Grammarly offered not only revision choices but also corpus-based collocations like those in your research, it would be more useful.
Thank you. (A student from Kyushu University)
- Michael Henshaw
  October 4, 2020 at 6:45 am
  
  Thanks for your thoughts, Takehiko. Right, teachers like me may not appreciate the high burden of becoming comfortable with using corpus tools. The short, unsatisfying answer is that you have to keep on practicing with them; after some time, you start to see patterns as if you were Neo in The Matrix.
  I believe corpus approaches are great for finding prepositions–that’s a good place to start and get comfortable.
  Finding the correct word? Now that’s hard. That’s why I spent a lot of effort on the ‘category’ section of the wordlist. I believe if you see frequencies alongside synonyms or words of similar meaning, you can at least see what is most popular. Of course, high frequency doesn’t necessarily mean a word is the best choice, but it is good supporting evidence.
Michael Henshaw
October 4, 2020 at 6:48 am

I really enjoyed this conference, and hope to speak more and collaborate with like-minded people.
Feel free to contact me:
mike@vetmed.hokudai.ac.jp
Twitter: @AcademicEnglis4
YouTube: Academic English for the World

Thank you for your support and helpful feedback.
Mike Henshaw
