
7 Critical Questions Educators must ask about Generative Artificial Intelligence in Language Learning

Is generative AI the future of language education or just tech hype? Find out what questions really matter, beyond those flashy headlines.

Generative AI In Language Learning (A digital painting scene of a young child joyfully writing Egyptian hieroglyphics with a large, glowing digital canvas in which a friendly, anthropomorphic AI character smiles encouragingly, surrounded by swirling letters, symbols, and imaginative doodles.)

Beyond the buzz: What should language educators really ask about Generative AI?

At this point, let’s face it—we’re light years beyond simply worrying that generative artificial intelligence (AI) might disrupt language learning. It’s already here, and it’s already shaking things up for researchers, learners, and educators (yes, that’s us!). By now, most of us—if not all—should have moved beyond the “glitz factor” stage and gained enough insight and hands-on experience to build our own approach to adopting Generative AI, much like an early career teacher transitioning to a more seasoned and confident professional after a few years on the job.

But the reality is, depending on the context we operate in, not everyone enjoys the benefit of clearly defined guidelines or practical strategies to effectively navigate changes. Truth be told, some of us don’t even have a reliable professional learning community to discuss and determine the best practices suited to our local situations—meaning we often need to tackle challenges armed only with optimism and a generous dose of guesswork.

Exploring artificial intelligence in language learning certainly isn’t new—it was happening well before ChatGPT made generative AI a household name. But ChatGPT’s meteoric rise put generative AI firmly on everyone’s radar, prompting trailblazing educators and researchers to capture and organise a diversity of pioneering ideas to guide or inspire subsequent classroom practice.

My goal with this article is straightforward: to distil key insights from content I’ve previously shared on LEA’s LinkedIn page, keeping our minds engaged and critical conversations flowing. I’ve heard stories about professional learning sessions that unfortunately devolve into complicated technical showcases or thinly disguised sales pitches, rather than truly insightful conversations bridging research and practical application. My hope is that this piece provides a refreshing alternative for you, even if it’s not yet comprehensive.

Get real-time updates and BE PART OF THE CONVERSATIONS by joining LEA’s online communities on your favourite platforms! Connect with like-minded language educators and get inspired for your next language lesson.

1. Exploring Generative AI in language classrooms: How much do we really know about the current state?

State of GenAI use in language classrooms (a photo of a young student using digital tablet under teacher guidance at school)
Photo from Rawpixel / A student using tablet under teacher guidance

Yes, every other educator or learner might be generatively doing something in the name of teaching and learning, but let’s pause and ask—how is everyone actually doing? While none of us wants to become that cynical colleague who dismisses every new idea as pointless fluff, we also shouldn’t blindly assume we’re all doing brilliantly or that we’re definitely headed in the right direction.

And so, the first practical question we should ask, as with most fresh initiatives teachers introduce in their own classrooms, is simply: what’s everyone else doing, and is it making a difference? And by everyone, I don’t mean just anyone—I mean the colleagues we’re working with in our local context, teaching the same curriculum we’re familiar with, and probably dealing with the same types of students and local systems. What strategies are they using, how are these strategies working, and what useful insights can I draw from their experiences to enhance my own classroom practice?

This is why I applaud the effort by Nugroho et al. (2024) to codify the experiences of teachers using ChatGPT in a local context (university English Language Teaching in Indonesia). While much has been shared by many people – IT professionals, entrepreneurs, self-proclaimed influencers – they do sometimes over-generalise their use cases as universally effective in any context – which, let’s be honest, they aren’t.

In this study, the teachers were found using ChatGPT to design lesson plans, prepare teaching materials, create assessments, translate texts, paraphrase sentences, enhance vocabulary, and improve grammar and syntax of raw texts. From the students’ perspective, they mainly use it to learn vocabulary and get assistance in writing. Even if these results aren’t exactly headline news, they do offer Indonesian language educators a useful glimpse into the methods and practices their colleagues are adopting with Generative AI.

On a broader scale, Cambridge published a few research papers last year, one of which is on the “Impact of Generative AI on Language Education: Insights from Teachers”. This paper presents the results of a global survey examining English teachers’ experiences, attitudes, and the challenges they encounter. Key findings from the survey show that a solid majority of 71% of teachers currently incorporate Generative AI into their lessons—40% on a weekly basis and another 31% monthly. Interestingly, 84% of educators feel Generative AI effectively complements traditional teaching methods and can comfortably integrate into their roles. Additionally, around two-thirds (67%) said Generative AI offered fresh insights and perspectives they hadn’t previously considered. Teachers also expressed strong interest in further training, with 85% keen to explore more about Generative AI’s potential, particularly among private school educators.

What would the state of Generative AI use in language learning look like in your local context compared with these findings? To be frank, I don’t have the answers to such questions in my own familiar contexts of practice either, though I’d argue that is something we should work on.

2. How do the various manifestations of Generative AI synergise in language classrooms?

I’ll be upfront: I see this as an important question, but I haven’t yet done enough digging to offer useful preliminary findings. Still, I want to highlight it prominently, because it’s been nagging at me ever since the Generative AI excitement kicked off. Suddenly, we’re swimming in a sea of countless platforms and teaching options. Sure, it’s simple enough to brush aside the obviously irrelevant ones, but how do we navigate the gazillion others that actually look promising? Is there a way to dive in without sinking into this overwhelming abyss?

An ecosystem of applications (abstract image of many nodes connected by neon lines with cloud symbols signifying an ecosystem of applications)
Image from Rawpixel / An ecosystem of applications

One guiding framework I always return to is TPACK (Technological Pedagogical Content Knowledge), a key framework to guide our choice of EdTech tools. My key message, viewed through the lens of the TPACK framework, is simple: we should select technological tools first and foremost based on the specific language knowledge and skills we aim to teach (content), and the carefully planned teaching methods we use to support learning (pedagogy). While our own technological skills naturally influence our choices, content and pedagogy must always be the deciding factors.

Particularly pertaining to Generative AI, in the earlier days, Scott Thornbury organised a webinar on “OpenAI & the future of language learning”. During the session, he outlined eight key principles to consider when deciding whether to adopt a new educational tool: Adaptivity, Authenticity, Creativity, Interactivity, Mediation, Autonomy, Engagement, and Indispensability. Feel free to check out the webinar for a deeper dive into these principles—and to see if they’re ones you’d like to follow yourself.

At the other extreme, I’ve also encountered many educators who subscribe to the equation: Generative AI = LLM = ChatGPT. This may still be the case, even after 2+ years of rapid development. It’s understandable, since ChatGPT as a platform and interface combines many possible use cases. That being said, we should be cognisant of the differences between the three:

  • Generative AI: the whole universe of generative technologies that create content (e.g., text, image, audio, video) based on a given set of instructions, drawing on and processing patterns learnt from their training data
  • Large Language Model (LLM): an AI system at the backend trained on large amounts of linguistic textual data that seeks to represent natural language use that is then leveraged to perform various language-related tasks
  • ChatGPT: a chatbot designed to perform various generative tasks leveraging different backend AI systems, including different LLMs developed by OpenAI as well as other models (e.g., Whisper for audio, Dall-E for images)
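To make the LLM bullet concrete: at its simplest, a language model is a device for predicting the next token from patterns counted in its training data, just at an astronomically larger scale and with far richer representations. Here is a deliberately toy sketch (a bigram counter of my own invention, nothing like a real model) showing the bare principle behind the “statistical parrot” criticism one often hears:

```python
from collections import Counter, defaultdict

# A tiny "training corpus" standing in for the vast text an LLM learns from.
corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug .").split()

# "Training": count which word follows which. A bigram table is the crudest
# possible stand-in for the statistical patterns an LLM encodes in its weights.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, given only the last one."""
    return follows[word].most_common(1)[0][0]

def generate(start, length=5):
    """'Write' by repeatedly predicting the next most likely word."""
    out = [start]
    for _ in range(length):
        out.append(predict_next(out[-1]))
    return " ".join(out)

print(generate("the"))  # fluent-looking output with no model of the world behind it
```

The output looks locally fluent, yet nothing in the table “knows” what a cat or a mat is—which is precisely the gap between statistical pattern-matching and the grounded, contextual language use discussed later in this article.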

If Generative AI is an entire system of government, large language models function as the individual agencies within it (alongside other models for image, audio, video or any other form of signal), while ChatGPT acts as an integrated public-facing service platform, gathering relevant information and resources from these agencies to serve citizens through conversational exchanges.

And as such, while ChatGPT might be the most widely used LLM-powered chatbot, we should remember it’s not the only one, and other chatbots or LLM-powered solutions have different strengths and limitations. We can, of course, rely solely on ChatGPT to handle all our use cases. However, I think the deeper consideration lies in whether we can systematically integrate our “governance structures” (various AI solutions) to substantially improve the quality and effectiveness of our language teaching environments.

Join our mailing list!

Receive insights and EXCLUSIVE resources on language education in a monthly newsletter, fresh into your inbox. No Fees, No Spam, so No Worries!


3. Which aspects of language learning can hardly be replaced by Generative AI, at least for now?

We’ve probably heard or seen plenty of examples of how Generative AI has been used for specific language learning goals:

  • GenAI-powered chatbots or conversational agents providing interactive and sustained speaking practice opportunities;
  • GenAI-powered writing tools (or simply prompting publicly accessible AI chatbot to analyse and correct grammar) providing automated and immediate feedback to language learners on their writing, identifying errors and suggesting improvements;
  • GenAI being used to generate vocabulary lists for targeted topics or themes;
  • GenAI being used to create a range of audio content with options for adjusting the pace, visual support, and level of difficulty;
  • GenAI being used to generate texts with customisable features such as the targeted use of words or structures.

Still, as the human element in this equation, recognising the boundaries of what Generative AI can and cannot achieve for us and our learners is just as important as appreciating its capabilities. With this insight, we can then strategically delegate certain language learning tasks to Generative AI while focusing our efforts on facilitating aspects of learning that Generative AI simply can’t handle – at least for now.

So, what are some of these aspects? One that perhaps didn’t get much attention before is the subtle role of pragmatics in everyday language use. Pragmatics is the branch of linguistics concerned with how context influences the interpretation of meaning. It examines how speakers use language in social interactions, considering factors such as intention, inference, and situational context.

Engaging in an authentic chat (three friends having casual conversation in a cafe)
Photo from Rawpixel / Real-life meaning in communication is derived from many different cues

In language curricula, we don’t usually segregate pragmatics from semantics in the teaching and learning of meaning. In our educational language, we may talk about it in terms of “literal vs metaphorical” or “surface vs inferred” meanings. The limitation with Generative AI, owing largely to the fact that LLMs are trained on text-based data, is that it stays mostly confined within textual boundaries when trying to represent the human linguistic experience. In simpler terms, these models are more semantically focused by design. They can often miss the mark when it comes to fully capturing the messy, context-driven aspects of real-world language (pragmatics).

Why is this so? When humans encode or decode language, we deal with data from many sources: global knowledge (e.g., cultural understanding of slang, emerging expressions and neologisms), previous interactional experiences (e.g., private jokes with a close friend), actual situational details, and so on. Generative AI is not (yet) able to access and tag all of that information – and frankly, I’m completely against the idea of Generative AI even aiming for that (a thesis that would require a whole article of its own). That’s also why Generative AI is not good with humour: even if it appears able to generate jokes, these are purely statistical outputs drawn from previous texts whose patterns are tagged with humour.

What does that mean for language learning, even with Generative AI? Rethinking some of the foci in our language classrooms can help. Facilitating creative language use (with less restriction on experimentation) is one direction. Helping our learners encode and decode language use, particularly in frameworks which take in data beyond the text (e.g., other contextual clues, inter-discourse information, situational cues), can become more prominent.

An example is the notion of interjections—such as “um”, “ah”, “oh”, and “hmm”. Previously viewed as mere indicators of hesitation or uncertainty, these linguistic elements actually provide valuable cues that help interlocutors interpret meaning and intention in authentic (e.g., unscripted) conversations. They serve important communicative functions, aiding speakers in managing conversational flow, signalling comprehension, and facilitating turn-taking. This, and many other examples of pragmatics, can become more significant in our language learning experiences – something Generative AI isn’t quite ready to handle just yet (or ever).

4. What is the value proposition of a human educator in contrast to a GenAI language learning solution?

Just as I’m writing this article, my social media feeds are buzzing with posts about Bill Gates’ latest prediction—that within the next decade, AI will replace doctors and teachers. Frankly speaking, I suspect many of these headlines are words taken out of context, possibly deliberately alarmist as clickbait (ok, I confess I also took the bait). Nevertheless, they consistently highlight one key anxiety: will AI eventually take over human roles in many professions? For us, the question becomes: will Generative AI eventually make human language educators obsolete?

I believe my previous section addresses this question to a certain extent: some areas of language learning still inevitably require a human educator to facilitate – Generative AI is not yet, and possibly not ever, able to do that. I’d expand this further with insights from a neuroscientist, David Eagleman. In an interview on the podcast show “People I Mostly Admire” (from the Freakonomics Radio network), Eagleman argues that while current Generative AI models are impressive, they lack essential brain-like attributes such as an internal model of the world and genuine understanding.

“But it’s not yet the brain or anything like it for a couple of reasons. It’s just taking the very first idea about the brain and running with it. What a large language model does not have is an internal model of the world. It’s just acting as a statistical parrot.”

David Eagleman (2024): Feeling Sound and Hearing Color

There are many interesting experiments Eagleman shares in this episode, but I think the significant insight is how humans are able to interpret and respond coherently to sensory input from many different sources – to the extent that humans can even “rewire” themselves to interact with the world by “feeling sound and hearing colour”. What this means is that a highly adaptable psychobiological entity like us is not something Generative AI is yet able to emulate – and using language to represent such a human experience may also be beyond the scope of LLMs.

Sure, Generative AI is steadily becoming more multi-modal, meaning it’s able to handle various types of input simultaneously, and data scientists are busy developing models for different human faculties (e.g., emotions). But putting all these pieces together into one universal, adaptive system capable of processing novel sensory inputs (think: having all five human senses) and continually learning—also known as advanced Artificial General Intelligence—is still firmly in the realm of science fiction. On a practical note, just consider the enormous computational power required to operate such a system efficiently—the amount of energy and resources we’d need is staggering, and frankly, still beyond our current capabilities.

What does all this have to do with language learning? Remember the different functions of language? Language is an important medium in which our human experience is represented and passed on as a legacy. Its fluid and evolving character makes it impossible for statistical prediction alone to encapsulate all its intricate details – Generative AI, without sensory input sources and a full mind, is unable to replace a human completely for human language use.

I’d just add a final argument, especially in the context of our practice. At the end of the day, we’re not here to teach language to Generative AI—we’re teaching human learners how to use language effectively. Sure, our learners might use Generative AI as a tool along the way, but who exactly are they actually hoping to communicate (and connect) with eventually? Are they learning language to chat with a robot, connect with an AI platform, or bond with something utterly lifeless? Of course not.

the human educator in action (a female lecturer explaining aerodynamics with a model aeroplane in her hand)
Photo from Rawpixel / Learners need the human touch in understanding authentic language use

Ultimately, we’re guiding our learners to communicate meaningfully with other humans, thereby expanding our shared human experience. If we take away the human dimension from this process, along with our own roles as educators guiding meaningful interactions, what exactly are we achieving? Will learners still be able to handle genuine human conversations, or will they gradually lose their sensitivity to the subtle yet vital cues—like social expectations and conversational nuances—that bring human interactions to life?

Learners can also be with us on this. In a recent move to introduce AI into administrative processes in higher education, I didn’t observe unanimous support from the students. In this instance, the aim was clearly to put Generative AI to work answering all those mundane, trivial questions—like dates for upcoming exams or assignment details—that academics found increasingly burdensome, thus reducing their opportunity costs and enabling them to concentrate on what really matters: enlightening students with their professional knowledge and experience. Yet even with such an arguably straightforward benefit, I observed criticisms, in a separate thread elsewhere in my social feed, about academics moving ever further away from direct human interactions. The university students still prefer being supported by real people, and I’d bet that’s true for many others, including our language learners.


5. What new skills and knowledge areas have become relevant because of Generative AI?

In the previous two sections, I discussed the “non-negotiables” of language learning and the human educator despite Generative AI. In this section, I move to the other side of the coin, where the key question is about the horizons expanding because of Generative AI. Considering the widespread integration of Generative AI into many dimensions of life and educational practice, which new skills and knowledge areas—formerly not regarded as necessary—have now become relevant to master? Within this set of competencies, which are particularly relevant to the field of language education?

For a start, let’s look at the more generic set of competencies presented over these few years:

  • Prompt Engineering and Effective Querying: the ability to formulate precise, clear, and contextually relevant prompts to generate meaningful responses that actually aid in learning and not misguide learners;
  • Critical Evaluation of AI-Generated Content: the disposition (e.g., willingness, effort) and critical thinking skills to assess the quality, accuracy, and appropriateness of AI-generated content with the awareness that Generative AI can hallucinate; and
  • Digital Literacy and Ethical Awareness: a holistic understanding of digital ethics (e.g., IP rights, plagiarism), data privacy, and responsible AI use.
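To illustrate the first competency: prompt engineering largely comes down to stating role, task, context, and constraints explicitly instead of firing off a bare question. A minimal sketch of this idea (the template fields and the CEFR example values are my own illustration, not from any particular tool or framework):

```python
def build_prompt(role, task, context, constraints):
    """Assemble a structured prompt: an explicit role, the task itself,
    the learner context, and output constraints - exactly the elements
    a vague one-line query leaves out."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Constraints: {'; '.join(constraints)}"
    )

# A vague query vs. a structured one for the same teaching need.
vague = "Give me vocabulary exercises."
structured = build_prompt(
    role="an experienced CEFR-aligned English teacher",
    task="Create 5 gap-fill vocabulary exercises on the topic of travel.",
    context="Learners are at CEFR B1 level; their first language is Indonesian.",
    constraints=["use only vocabulary up to B1",
                 "include an answer key",
                 "avoid idioms without glosses"],
)
print(structured)
```

The point is not the code but the habit it encodes: a prompt that names the learner level, the topic, and the output format gives the model far less room to misguide learners than the vague version.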

Now, these generic sets aren’t solely our business – our fellow educators in other areas should be part of the collective effort in developing them. Beyond this, we should engage in critical discussion on what other “uniquely linguistic” skills and knowledge exist.

One great example was highlighted by Kalantzis and Cope in a very recent article, “Literacy in the Time of Artificial Intelligence”. Kalantzis and Cope argue that traditional literacy education, which focuses on stable rules and forms, is inadequate in the face of Generative AI’s capabilities. Instead, they advocate for a more dynamic understanding of literacy that encompasses “multimodal and transpositional meaning-making”. By “multimodal meaning-making”, they mean the ability to analyse, interpret, and generate outputs that integrate various modes of communication, such as text, images, videos, audio, space, and embodied gestures. By “transpositional meaning-making”, they mean the process of constantly moving across and between different modes and forms of meaning, involving representation (meaning made for oneself), communication (meaning made for others), and interpretation (the meaning others make of one’s communication). This type of “literacy” goes beyond what Generative AI can perform now, but it also demonstrates the kind of linguistic phenomena becoming more prominent alongside Generative AI language use.

The authors also introduce the concept of “cyber-social literacy learning” which highlights a collaborative relationship between learners and AI. This approach encourages our learners to engage with AI as a tool for enhancing their writing and critical thinking skills rather than viewing it merely as a means to produce text. Within this context, literacy is framed as a dialogical and interactive process, where our learners’ interpretation interacts with the perspectives and feedback provided by the AI system. They argue that this relationship can help our learners develop deeper cognitive processes and foster a more nuanced understanding of meaning-making.

Beyond this, my personal take is that we need to equip learners with more precise metalinguistic vocabulary, providing clear labels that help them sharpen their linguistic awareness more easily. While Generative AI can offer technical feedback based on statistical probabilities of grammar rules and patterns, our learners need to genuinely grasp what these rules mean—like understanding terms such as “word order” or “conjugation”, or knowing exactly what to do when these terms confuse them—in order to learn and use language more effectively. And naturally, we also need to complete the loop by exploring how learners can leverage their newfound metalinguistic insights to actively improve their own language skills.


6. What worrying concerns have been raised in relation to Generative AI in language learning?

Back when the widespread adoption of Generative AI seemed unavoidable, we probably encountered numerous debates and criticisms about its use, especially warnings against becoming over-reliant on it, which could lead learners down a path of questionable ethics and failed learning. One particularly striking point, I thought, was the idea of “intellectual laziness”. If students use Generative AI merely as a handy shortcut for finishing assignments—much like grabbing the nearest tool to breeze through a mundane chore—it’s hardly surprising they’d miss out on genuine reflection, meaningful engagement and productive struggle (crucial elements for developing independent thinking).

Even now, as we become more comfortable with Generative AI, these concerns haven’t vanished, and it’s wise to keep them on our radar. Any of us who wishes to keep tabs on these challenges would appreciate the detailed analysis provided in this comprehensive Cambridge report: “Generative AI and Language Education: Opportunities, Challenges and the Need for Critical Perspectives”.

Notwithstanding this, my intention in highlighting this question as one of the key issues we should keep an eye on isn’t to reiterate limitations that also apply to subject areas beyond language learning. Rather, my real aim is to draw our attention to the particular challenges unique to language learning—especially the ones that usually slip through the cracks of our conversations. Let me propose two candidates.

a language exam in progress (close shot of a student completing her language exam with a teacher-invigilator in the background)
Photo from Rawpixel / A student taking a traditional paper-and-pen language test

First up is the trend of automated language assessment. We’ve probably heard of major tests like PTE increasingly adopting automated scoring methods. But what we might not realise is that this shift towards automation has subtly reshaped which aspects of language proficiency get prioritised during testing. For instance, in traditional speaking assessments, a human examiner may pay close attention to what you’re actually saying (content) and how clearly you communicate your ideas (a combination of perceived pronunciation and fluency). But in automated tests, the spotlight can shift noticeably towards fluency—how quickly and smoothly you speak, and how often you pause or hesitate—since these features are simpler for AI to evaluate objectively. The issue is that these changes aren’t really supported by professional research indicating oral assessments should head this way; they’re driven more by practical considerations. So, the question remains—how far should we allow this trend to go unchecked?
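To see why automated scoring gravitates towards fluency, consider how easily such features fall out of the word-level timestamps that speech recognisers already produce. A hypothetical sketch (the pause threshold and metrics are my own illustration, not any testing provider’s actual algorithm):

```python
def fluency_metrics(words, pause_threshold=0.5):
    """Compute simple fluency measures from (word, start_s, end_s) tuples,
    the kind of timestamps a speech recogniser emits. Note that content and
    coherence - what a human examiner listens for - appear nowhere in these
    numbers."""
    total_time = words[-1][2] - words[0][1]  # first word onset to last word offset
    # A "pause" is any silent gap between consecutive words above the threshold.
    pauses = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])
              if nxt[1] - cur[2] >= pause_threshold]
    return {
        "speech_rate_wps": round(len(words) / total_time, 2),  # words per second
        "pause_count": len(pauses),
        "mean_pause_s": round(sum(pauses) / len(pauses), 2) if pauses else 0.0,
    }

# (word, start, end) in seconds - a short, hesitant answer.
sample = [("I", 0.0, 0.2), ("think", 0.3, 0.7), ("um", 1.5, 1.7),
          ("travel", 2.6, 3.1), ("is", 3.2, 3.4), ("fun", 3.5, 3.9)]
print(fluency_metrics(sample))
```

A handful of arithmetic operations yields an “objective” score, which is exactly why pauses and speech rate are tempting proxies for proficiency, and exactly why we should ask what such scores leave out.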

The second is about the standards of language use. I’ve explored several issues around language standards on my blog, including whether we should recognise the significance of linguistic variation and whether adhering strictly to native-speaker norms is beneficial. The tricky thing about Generative AI language models is that they’re trained on language patterns which inherently carry biases. Communities that aren’t prominently represented in the training data often see their linguistic standards ignored or undervalued. English itself is a prime example, as regional dialects can easily become marginalised (e.g., hardly surfacing as examples unless explicitly prompted), and the same applies to many lower-resourced languages (e.g., not even represented as a construct of how the language should be used). So, how exactly will Generative AI reinforce biases by perpetuating language standards that may not be meaningful or appropriate universally? It’s a question worth examining closely, particularly if we genuinely support linguistic diversity. Using publicly accessible AI tools (like ChatGPT, Gemini, or Meta AI) without specifically fine-tuning them for particular contexts could easily lead us down this problematic path.

7. How may Generative AI in language learning be designed with a stronger neuroscientific basis?

My final question touches upon an area that’s rarely discussed—at least, I haven’t encountered many conversations approaching Generative AI and language learning from this perspective. Granted, discussions on related topics are available, but synthesising it all in one coherent manner? That’s still missing. And that’s a shame, because it’s important.

Educational research has steadily shifted towards being more theory-driven, evidence-based, and informed by the learning sciences. Now, with Generative AI entering the scene, we have an exciting opportunity to explore wider and deeper into this area. Plus, today’s technology gives us greater ability to gather and examine vast amounts of language learning data alongside psychological and neuroimaging data—something that previously went unnoticed or was simply too tricky to observe at scale.

I shared Eagleman’s interview earlier. In the same interview, he also argues that traditional classroom instruction—passive, lecture-based learning—is suboptimal because it does not leverage the brain’s natural curiosity-driven learning mechanisms. He explains that curiosity triggers neurotransmitter release, enhancing memory retention. Therefore, educational approaches that deliver information “just-in-time” (when learners are actively curious) rather than “just-in-case” (traditional rote learning) are arguably more effective. In that light, he advocates for interactive, project-based learning approaches, where students actively seek out information and test hypotheses, aligning education with the brain’s intrinsic learning processes.

We may all have our own take on Eagleman’s proposal – and that’s totally fine, as the learners we’re working with may differ. But if we reject his proposal, we have to be clear and justify the rejection with sound reasoning and evidence – not simply on differences in philosophical preference or epistemological understanding.

AI in Language Education
Image generated by Bing / Abstract image of a human brain connected to a network

What’s more, AI, especially the more sophisticated systems anchored in neural-based architectures, has always had an intriguing relationship with the human brain. Scientists have found many similarities between neural activity and AI modelling. When we explore AI as a simulation of brain processes—considering the inner workings of LLMs, and how information is analysed and interpreted within the networks (e.g., how the model weighs different parts of the input with every prompt)—what valuable clues might we uncover about the language learning journeys of our students? Is there potential in using Generative AI to decode and illuminate the nuanced steps learners take as they progress through different stages of language development, say by allowing Generative AI systems to document the changes?

No doubt, such research agendas raise substantial questions of privacy and ethics. Tackling these challenges, especially when the insights gained are genuinely valuable to us and society at large, is exactly what researchers should strive for. But whether or not this actually pans out, my main point is this: let’s stay mindful of what research tells us about the brain and mind when we use Generative AI. And equally, let’s harness Generative AI to deepen our understanding of how the brain and mind work in language learning.

Let’s continue our critical discussion of these questions

And so here you are, having endured my take on the 7 key questions pertaining to the use of Generative AI in language learning. I need to put in a strong disclaimer nevertheless: I’m just one language educator offering commentary shaped by my own interpretation of the research I’ve managed to find and absorb, filtered through my personal idiosyncrasies. These questions should continue undergoing critical examination from all of us (language educators) collectively. It’s only by contributing our unique viewpoints to the conversation that we can jointly uncover answers that are richer and genuinely meaningful in future.

And with that in mind, let me share below some recommended readings for those of us venturing out and exploring: