From Our Neurons to Yours

What ChatGPT understands: Large language models and the neuroscience of meaning | Laura Gwilliams

Wu Tsai Neurosciences Institute at Stanford University | Nicholas Weiler, Laura Gwilliams | Season 7, Episode 4

If you spend any time chatting with a modern AI chatbot, you've probably been amazed at just how human it sounds, how much it feels like you're talking to a real person. Much ink has been spilled explaining how these systems are not actually conversing, not actually understanding — they're statistical algorithms trained to predict the next likely word. 

But today on the show, let's flip our perspective on this. What if instead of thinking about how these algorithms are not like the human brain, we talked about how similar they are? What if we could use these large language models to help us understand how our own brains process language to extract meaning? 

There's no one better positioned to take us through this than returning guest Laura Gwilliams, a faculty scholar at the Wu Tsai Neurosciences Institute and Stanford Data Science Institute, and a member of the Department of Psychology here at Stanford.

Learn more:

Gwilliams' Laboratory of Speech Neuroscience

Fireside chat on AI and Neuroscience at Wu Tsai Neuro's 2024 Symposium (video)

The co-evolution of neuroscience and AI (Wu Tsai Neuro, 2024)

How we understand each other (From Our Neurons to Yours, 2023)

Q&A: On the frontiers of speech science (Wu Tsai Neuro, 2023)

Computational Architecture of Speech Comprehension in the Human Brain (Annual Review of Linguistics, 2025)

Hierarchical dynamic coding coordinates speech comprehension in the human brain (PMC Preprint, 2025)

Behind the Scenes segment:

By re-creating neural pathway in dish, Sergiu Pasca's research may speed pain treatment (Stanford Medicine, 2025)

Bridging nature and nurture: The brain's flexible foundation from birth (Wu Tsai Neuro, 2025)


Get in touch

We want to hear from your neurons! Email us at neuronspodcast@stanford.edu if you'd be willing to help out with some listener research, and we'll be in touch with some follow-up questions.

Episode Credits

This episode was produced by Michael Osborne at 14th Street Studios, with sound design by Morgan Honaker. Our logo is by Aimee Garza. The show is hosted by Nicholas Weiler at Stanford's Wu Tsai Neurosciences Institute.


Thanks for listening! If you're enjoying our show, please take a moment to give us a review on your podcast app of choice and share this episode with your friends. That's how we grow as a show and bring the stories of the frontiers of neuroscience to a wider audience.

Learn more about the Wu Tsai Neurosciences Institute at Stanford and follow us on Twitter, Facebook, and LinkedIn.

Nicholas Weiler:

Welcome back to From Our Neurons to Yours from the Wu Tsai Neurosciences Institute at Stanford University, bringing you to the frontiers of brain science.

Today on the show we're going to be talking about ChatGPT and what it can teach us about our own brain's communication abilities. 

But first, our new segment, Behind the Scenes at Wu Tsai Neuro. 

We started this segment because I wanted to share just some of the interesting conversations and observations I come across in the course of talking to researchers and hearing what's going on at the frontiers of the field. And today I want to talk a little bit about how amazing it is that our brains wire themselves up. They are programmed to build this incredibly complex thing that is both genetically determined and totally flexible to represent our experience and build our unique personalities. Our brains are shaped both by our genes and by our experience.

How does that happen? It's really been a challenge for a long time. How do you study the development of the human brain? Are you going to get babies in a lab? Are we going to watch a baby in the lab growing up over 20 years? It can't be done. And there are many ways people have tried to get around this.

But what I want to talk about today is that there are actually some really exciting new technologies coming out, partly as a result of the Big Ideas in Neuroscience projects, the flagship research projects supported by the Wu Tsai Neurosciences Institute. So I'm going to tell you about two studies that just came out that touch on this question of how the brain builds itself in a couple of different ways.

The first one was led by Sergiu Pasca, who we've had on the show a couple of times. I'll link to our past conversations in the show notes. His lab has done this amazing work coming up with ways of growing simplified human brain circuits in the lab. They can take skin cells, reprogram them into stem cells, and basically get those stem cells to become actual human brain tissue in the lab. And this time, they've actually grown a complete sensory circuit in the lab. It's what they're calling an assembloid, bringing together multiple different kinds of neural tissue to wire up together.

So they've got the sensory neurons that detect pain stimuli. They've got the spinal cord neurons that transmit it to the brain. They've got the thalamus neurons, a sort of waystation that relays incoming and outgoing information. And they've got the cortex, the wrinkly outer part of the brain that we always see, which handles sensory perception. So in a dish, they've got these four different populations of neurons, and they can actually stimulate the sensory neurons and see the information propagate through this little circuit in a lab dish.

And I was talking with Sergiu about this just before the paper came out, and he said something that was really interesting to me. He's really excited about the opportunity this represents to study chronic pain, to understand how these circuits work and so on. That's why they're doing the research. But his observation was that each group of neurons has four different kinds of cells that they could wire up with, and they're picking the right ones. They know how this circuit needs to develop, genetically and molecularly. It's like they know the plan: these neurons know how to wire up this circuit without any help from the human experimenters and without any experience. So this is a way of observing the algorithm of human brain development in a way that has just never been possible before.

And that conversation actually reminded me of another study that just came out, also emerging from one of our Big Ideas in Neuroscience projects, that also touches on this question of how our brains are intrinsically wired versus how they're shaped through our experience of life. This is the age-old nature and nurture conversation.

This one starts with kind of an odd observation. You and I and everyone else have a patch of the brain that is sensitive to faces. This is the part of your brain where face recognition happens. And the weird thing is that the location of this patch, right in front of your ear, is almost exactly the same in you, me, and everyone else who's listening to this show. How can that be? Does the logic of the algorithm of brain wiring that Sergiu and his team are observing in the dish extend all the way down to the level of detail of the patch of the brain that recognizes faces? There's also a patch nearby that recognizes letters. How could that be hard-wired? That's not something we evolved with.

Answering this question is obviously really hard. How would you track the wiring of a human brain to see how these patches develop? To do this, Kalanit Grill-Spector and her lab in the Department of Psychology had to create new MRI brain imaging hardware and software tools to study sleeping human babies in an MRI scanner. And for the first time, they were able to look at how the connections between brain regions develop over the first six months of life.

I'll leave you to read the details in the write-up that we did; it's nuanced. But the takeaway I came away with is that it isn't a question of whether our brains are shaped by a genetic program or by our experience. Those things are intrinsically linked: genetics creates the infrastructure, the foundation that experience then works on to create these refined, complex representations in our brains.

So I really encourage you to check out those two new studies. We'll have links in the show notes. And also let us know how this new segment, Behind the Scenes, is working for you. We want to hear what kinds of things you'd like to hear more about.

So with that, let's transition to our featured conversation for today.

If you spend any time chatting with a modern AI chatbot, you've probably been amazed at just how human it sounds, how much it feels like you're talking to a real person. Personally, I've been using Anthropic's Claude system as sort of a personal diet and exercise coach, trying to keep up with my very modest exercise goals to try to stay fit and healthy. And I have to keep reminding myself, I am not talking to a person, I am talking to a statistical algorithm.

Most people probably are familiar at this point with at least the basics of how these AI models work. They're really just predicting the next likely word. They are trained on vast data sets, many orders of magnitude more than any human could read in a lifetime. And somehow, by virtue of this, they are able to produce text that seems very plausibly like it was written by a human being. They're not human, of course, but somehow we experience them in that way.

But today on the show, I want to ask us to flip our perspective on this. What if, instead of thinking about how these algorithms are not like the human brain, we talked about how similar they are? What if we could use these large language models to help us understand how our own brains process language? That's what we're going to be talking about on the show today, and there's no one better positioned to take us through this than returning guest Laura Gwilliams.

Laura Gwilliams:

I am Laura Gwilliams. I am an assistant professor at Stanford University in the psychology department. I am a faculty scholar at the Wu Tsai Neurosciences Institute and a faculty scholar at Stanford Data Science. My research is focused on trying to understand how the human brain processes speech and language, so how do we understand and how do we talk? And I also use artificial speech and language systems to understand better how the human brain does that.

Nicholas Weiler:

Fantastic. Well, Laura, I'm really excited to have you back on the show, in particular to talk about some of the comparisons we can make and the uses we can make of the large language models and other AI systems that are coming online today. Last time on the show, we talked a lot about how our brains turn the sounds coming out of one another's mouths into a shared sense of meaning. I can say there's an old blue VW bug near my house that the owner has completely covered with seashells, and your brain turns the sounds I just made into an image in your mind that is something like the image that I have in my mind. So we talked a lot about this sort of remarkable capability.

And from our last conversation, it sounded like most of what we know so far about how this works is at the more mechanistic level, how those sounds coming out of my mouth get parsed into words and processed into some sort of structure. But we ended the conversation talking a little bit about how we generate that shared sense of meaning. And I'd love to just start there. How does understanding how we process language in our brains help us get to a better understanding of how we store, how we hold meaning in our minds?

Laura Gwilliams:

Yeah. So a lot of our cognitive processes and our internal world are inaccessible to people in the outside world, and language offers a window into understanding someone else's internal experience. We've actually seen this in artificial systems too. If you have played around with DeepSeek, one of the new language models that has come out, it actually presents its internal reasoning to you through intermediate language outputs before it gets to the ultimate output.

Nicholas Weiler:

So it's sort of explaining to you its "thought process" for how it gets the response.

Laura Gwilliams:

Right, right. Exactly, yeah. So you can ask it, I don't know, please tell me how many R's there are in the word strawberry. You'll see it maybe takes, I don't know, 20 seconds to give you the answer. And you'll see, during those 20 seconds, its thought process unfold through intermediate language outputs. So it'll be like, okay, how would I count this? Okay, S? No, that's not an R. T? No, that's not an R, et cetera.

You can kind of hold that analogy for human language processing: language enables and is deeply tied to many other cognitive abilities. Reasoning is one of them. And studying reasoning directly might be difficult to do, but we can actually use language to investigate many other cognitive abilities, as well as to understand what's going on internally in somebody's mind.

Nicholas Weiler:

And is this sort of the idea, and I know we could probably devote a whole series of podcasts to this question because it's been debated in linguistics and philosophy for ages, but does this get to the idea that our ability to think, our mental lives, depend on having language? That without language, we really don't have the ability to reason?

Laura Gwilliams:

Yeah, this has been a philosophical debate for a long time. I think that for a long time, it was thought that there is a very tight and inherent link between thought and language. But my read of the literature is that this link has been loosened somewhat. For the public, the reasoning goes: okay, language is inherent to thought because I think through language. And we have this kind of internal monologue that helps us ruminate, self-reflect, and think through problems.

But I think that even there, there's actually a lot of variability in how many people actually have this internal monologue. And for people who do have an internal monologue, there's been some work by Russell Hurlburt showing that when you scientifically probe it, this internal monologue actually seems to be less continuous than people perceive it to be. So I think that language certainly does allow us to think through problems and it definitely has a relationship to thought, but I don't see language and thought as having this one-to-one relationship.

Nicholas Weiler:

So that gets back to the question: in your lab you're studying how our brains process speech, how your brain would be processing the sounds coming out of my mouth right now and turning them into this shared sense of meaning, like in the example that I gave a few minutes ago. So how can studying language provide a window into that higher level of representation? How do you get that picture of a blue VW bug covered in seashells and what that means to you, what associations it might have for you? How do you get there from studying language?

Laura Gwilliams:

Yeah. I think this aspect of language processing has really been the hardest part to study, even though you can kind of consider it the holy grail of what is so interesting and exciting about language processing. As you alluded to, we understand much more about how the actual sounds of speech are processed than we do about how the meanings of speech are processed.

Actually, this is where I and others are leveraging large language models to be able to understand these much more abstract representations of language, like meaning at the phrase and sentence level or the structure that holds those sentences together. Because it's very hard to capture your sentence about the blue VW bug: how do you numerically represent that?

What has been done in the past, the kind of traditional approach to studying meaning in language, is: okay, I'm going to ask a bunch of people to rate each of the words in that sentence across different dimensions. So maybe I'll ask them, "Okay, how much do you think color is important to the meaning of this word? How emotionally valent do you think this word is?" Et cetera, et cetera, along a bunch of different dimensions. And then that gives you a numerical representation of that sentence, and you can use those numbers to investigate how the human brain processes that sentence.

Nicholas Weiler:

So the words are all kind of ranked on these different dimensions, like does this have a color meaning? Is this a mode of transportation or something like that?

Laura Gwilliams:

Right, exactly. So for every word, you can have, let's say, 30 dimensions associated with it, where each of those dimensions would be one question that I ask people to rate that word on. So then every word has a number on each scale, let's say for valence and transportation relevance, et cetera, and that would be your representation in numbers of what that word means.
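To make that concrete, here is a minimal sketch of this kind of hand-rated word representation in Python. The dimensions, words, and rating values are hypothetical placeholders, not figures from any actual norming study.

```python
import numpy as np

# Hypothetical rating dimensions, each scored 1-7 by human raters.
dimensions = ["color relevance", "emotional valence", "transportation relevance"]

# Hypothetical averaged ratings for a few words from the example sentence.
word_ratings = {
    "blue":      [6.8, 4.1, 1.5],
    "bug":       [2.3, 3.9, 6.2],   # "bug" as in VW Bug, the car
    "seashells": [4.0, 5.2, 1.1],
}

# Each word becomes a fixed-length numerical vector...
vectors = {word: np.array(ratings) for word, ratings in word_ratings.items()}

# ...and a sentence can be summarized, for example, by averaging its word vectors.
sentence_vector = np.mean([vectors[w] for w in ["blue", "bug", "seashells"]], axis=0)
print(dict(zip(dimensions, sentence_vector.round(2))))
```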

Nicholas Weiler:

And the question is sort of does the brain do that?

Laura Gwilliams:

Right. And whether those numbers are any good is a challenge. So in step large language models, because this is what they do as their day job. You feed them words and sentences and they transform those words and sentences into numbers. It's those numbers that we can use to try to model brain activity, because those numbers naturally represent the more complex, abstract properties of language, precisely getting at the concepts and meaning of what it is the person is trying to convey. That critically fills the gap that we've had so far in language neuroscience in really understanding the more symbolic, abstract component of understanding.

Nicholas Weiler:

That's so interesting. So I hadn't thought about this before, which is that just the fact that there are these statistical machines that are capable of breaking down our sentences, the prompts that we're giving them and their training data presumably into some sort of numerical representation gives you at least a hypothesis or a set of numbers to work with to test how similar or different is this from what the brain is doing when it's listening to words and trying to turn it into meaning that it can respond to.

So I've been dying to talk to you about this actually since you spoke at the Wu Tsai Neuro Symposium last fall about AI systems and the brain. You presented this really interesting idea that we could use large language models like ChatGPT almost as a digital model organism for human language. Could you break down a little bit what you mean by that? Why would you need this for a digital model organism?

Laura Gwilliams:

Yeah, thank you. So in other aspects of neuroscience, let's say vision, we can study how the visual system works in animals. That gives us a lot of flexibility and precision in the types of data that we can collect from an animal's brain that is not feasible to collect from a human participant. Because of that, we know a lot about how the visual system works in animals and in humans by virtue of being able to study that processing system in species other than the human.

But then obviously, if you take the case of language, language is something that is special and unique to humans, and so the ability to use animals as a model organism is limited. There's some aspects of language processing that you could turn to an animal model for, but it's always a kind of subcomponent of speech processing.

Nicholas Weiler:

People often use songbirds, for example, because there are some aspects of how they learn their songs that are a little bit like how humans learn to speak.

Laura Gwilliams:

Right.

Nicholas Weiler:

But I don't think people are really proposing that birdsong functions like language.

Laura Gwilliams:

Totally. And to your prior question of what we know about how the brain processes abstract meaning, there, certainly, we can't use an animal model to investigate abstract meaning processing. So in come large language models, which are the only system other than the human brain that is able to do language. Now, for the first time in history, there is a system other than the human brain that is able to comprehend language inputs and generate language outputs in return that make sense. And they make so much sense that people across the globe are using such systems on a daily basis. So we can take these models, just like we would an animal model, let's say a mouse model of the visual system, to be able to understand how that system is actually achieving language.

Nicholas Weiler:

We can now do experiments on it and we don't have to worry about the ethics of doing human experimentation. It's a digital model and we can start to understand how this language processing works and then compare it to how our own brains work.

Laura Gwilliams:

Yeah, and I know that a lot of people refer to these speech and language models as a black box, which I understand from the perspective that, yes, we don't know exactly how these systems work. On the other hand, they're kind of the opposite of a black box, because we have access to every single neuron in that system. We can turn a neuron down, we can amplify it, we can turn it off entirely. We can turn off ensembles of neurons and kind of simulate a lesion to the brain. We can measure the consequences on that model's behavior. And we can do that as many times as we want to be able to get a really comprehensive link between the model's neural implementation and its behavioral output, on a scale that, again, would just never be possible even with an animal model organism. This is a really exciting species, let's say, to have access to, because you can perform experiments on a scale that has never been possible before.

Nicholas Weiler:

That's amazing. And I want to hear a little bit about what kinds of experiments would you like to do or are you already doing? First, I think let's just take a quick step back. You mentioned a little bit about the neural implementation or turning neurons up and down. If I understand correctly, that refers to the fact that these large language models are built on what's called a neural network approach, which is loosely inspired by how the brain works. So when you talk about turning neurons up and down, these are sort of these nodes in the code of the large language models that sort of metaphorically act a bit like neurons.

Laura Gwilliams:

Right. Yes, exactly. Many of these successful models have an architecture that's called a transformer architecture, which is made up of different layers. If we're going to take an analogy to the human brain, maybe we say those layers are different brain regions. And then within a given layer, there are computational units. Again, by analogy, we could refer to those units as neurons.

Nicholas Weiler:

So they all have their inputs and they can adjust the weights of how much they care about a particular thing. We were talking earlier about the colorness of the word blue or something like that.

Laura Gwilliams:

Exactly. You can have a large language model read that sentence that you said, and then look at the corresponding activation in these neurons in each of these layers as a way of trying to understand what those neurons seem to care about in the text input.
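For readers who want to see what that looks like in code, here is a minimal sketch using the Hugging Face transformers library, with GPT-2 standing in as an illustrative model; the specific model and sentence are assumptions, not necessarily what Gwilliams' lab uses.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# Load a small pretrained text model; GPT-2 is just an illustrative choice.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

sentence = "An old blue VW bug completely covered with seashells"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the embedding layer plus one tensor per transformer layer,
# each of shape (batch, tokens, units) -- the "neurons" within each "layer".
for layer_index, activations in enumerate(outputs.hidden_states):
    print(f"layer {layer_index}: {tuple(activations.shape)}")
```

Each of those per-layer activation matrices is the kind of numerical representation that can then be compared against recorded brain activity.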

Nicholas Weiler:

And I think one of the things people refer to when they say it's a black box is that usually it's not going to be like the neuron in the model actually cares about blue or something as-

Laura Gwilliams:

Right.

Nicholas Weiler:

... clear to us, to our understanding of meaning as we might like. It cares about some statistical property that is a little bit abstract or combines a number of different things. But I love this idea that you can then go in and say, "Well, let's understand how it works. Let's understand how this other system that now exists for the first time in history, that can understand in a sense or at least process and respond to language, how does that work? And what can that teach us about how our own brains work?" So what are some of the experiments you'd love to do with that model system or that you're already doing in your lab?

Laura Gwilliams:

Yeah. There's a bunch of ways that we're using these models. So for example, there's someone in my lab that has worked on aphasia in humans.

Nicholas Weiler:

Aphasia is the loss of the ability to speak or understand language?

Laura Gwilliams:

Right. Exactly, yeah. So she works with people who have had a stroke, usually in the left hemisphere, which has resulted in a loss of their ability to process language. So usually not an entire loss, but they're not able to use language in the same way that they were able to use it before they had a stroke. And there are certain types of errors that people make when they have this post-stroke aphasia. So you can kind of organize people's errors into certain buckets, let's say.

And she's collaborating with a student in linguistics, and they're trying to investigate whether, if we lesion a speech or language model (by lesion, I mean manipulate or turn off certain neurons in that model), the result is a set of error behaviors that fall into the same buckets that a person with post-stroke aphasia's errors would fall into. One thing that often happens is that, in response to a question, the first few words someone says will kind of be in line with the question that you asked, but then the utterance will continue and essentially not really be related to what you asked. And the sentence that they say will be well-formed.

Nicholas Weiler:

It's grammatical.

Laura Gwilliams:

It's grammatical, but the words that they say might not actually make sense together, or they're not really conveying a coherent message, even though on the grammatical level it totally makes sense. And to some degree, lesioning these text models seems to have a similar effect: essentially, the model will produce a sequence of words that are very likely to follow one after the other, but they do not give rise to a larger meaning.
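As an illustration of what "lesioning" a text model can look like in practice, here is a minimal sketch that silences a set of units in one layer of GPT-2 with a PyTorch forward hook and then generates text from the damaged model. The layer, the unit indices, and the prompt are arbitrary placeholders, not the ones used in the work described here.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Arbitrary placeholder set of unit indices to "lesion".
lesioned_units = list(range(200))

def lesion_hook(module, hook_inputs, output):
    # Zero out the chosen units' activations every time this layer runs.
    output[:, :, lesioned_units] = 0.0
    return output

# Attach the lesion to the MLP of transformer block 6 (an arbitrary choice).
handle = model.transformer.h[6].mlp.register_forward_hook(lesion_hook)

prompt = "The doctor asked how she was feeling, and she said"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(generated[0]))

handle.remove()  # detach the hook to restore the intact model
```

The same pattern scales up to switching off whole ensembles of units, rerunning the model, and scoring its errors, which is the kind of comparison to post-stroke aphasia described above.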

Nicholas Weiler:

So that sort of suggests something I was thinking about. These language models, at their base, and I know they've become more complex over the years as they've been developed, fundamentally that's what they're doing: they are statistical models that are predicting what word is most likely to come next, given all the context that they understand and all their training data. What's the next thing that's likely to follow that word? And I've often thought about this, that when I am speaking, unless I've really rehearsed what I'm going to say and sort of planned it out, most of the time it almost feels like that, that the words are coming out and I'm kind of pleased to find that I mostly agree with what I say.

Laura Gwilliams:

That makes sense. Yeah.

Nicholas Weiler:

And like, "Oh, wow, that is what I think. Good. I'm glad that my words are coming out well today." And sometimes they don't, right? Sometimes you say stuff and you're like, "Wow, I don't think that." And that really feels like some of those earlier language models where they could produce sentences, but they didn't really mean anything. It was more obvious that it was just a statistical algorithm.

So I think that's one of the things that's so miraculous, in a way, or so impressive about these large language models: that you can get from a statistical algorithm that's predicting the next word to something you can really use as almost a thought partner or a conversation partner, where it really feels like there's meaning and understanding in what the model is producing, even though we know it's just an algorithm. And that gets it closer to what it feels like to converse with another human.

So is there something more in our minds that you're getting closer to, beyond this ability to produce words in a grammatical and statistically probable way? When you tune down some of these neurons in these models and lose that, what is the thing that you've lost?

Laura Gwilliams:

Yeah. I think this is a really important question, actually. In the field of language neuroscience, some people have actually gone as far as to say that because these statistical models, large language models, are, one, so good at language, and two, do a really good job of modeling brain activity, that means that what the human brain is doing is trying to predict the next word. So because these models are prediction engines and because they predict brain activity well, therefore the brain is also a prediction engine.

I don't dispute that the brain is doing a lot of prediction. We're always trying to anticipate our entire world, not just what word someone's going to say next, not just the word we are ourselves going to say next, but our whole environment. However, I think that that is not all our language system is set up to do, and that's not the only component of acquiring language. There's something much more fundamental going on beyond just trying to predict the next word.

Why do we have language? Or if we weren't able to use language, what would be lost? And I think that a really huge component of that is social interaction. A lot of the time when we talk to one another, it's not actually to exchange crucially important information. Sometimes it is, but a lot of the time it's just because that's how you connect with other people, and that isn't something that is kind of baked into these large language models, at least not yet. So I think that that's a crucial missing component: yes, the human brain is trying to predict things, but that's not all it's trying to do. But it's impressive that a model where all it's trying to do is predict is still a good model of language processing.

Nicholas Weiler:

And when you said a moment ago, they predict brain activity, what were you referring to there?

Laura Gwilliams:

Yeah. So maybe I can go back to your question of the ways in which I'm using these large language models.

Nicholas Weiler:

Yeah, please.

Laura Gwilliams:

And one way is to actually use these models as, say, a cognitive model of the human brain, which means that we take a text model like GPT, and we take the activation in these fake neurons that are in each of the layers, and we use the activity in those neurons to directly predict the brain activity of human participants that are processing the same language that we fed into the model. To say that another way, we take GPT and we ask it to read a book, and then we ask human participants to listen to that same book. And then we see how well we can use the activations in GPT-2 reading that book to model the brain activity of the humans processing that same book.

Nicholas Weiler:

Yeah. So again, GPT has this sort of neural network type architecture, and so you can ask, well, how similar is that actually to what the human brain is doing while it's processing the same stimulus? That's interesting. What are you seeing so far? Are you seeing that it is fairly similar?

Laura Gwilliams:

Yes. It's actually quite impressive how well you can predict what someone's brain activity is going to be based on the activity in this artificial model. We run various benchmark comparisons. So let's say one comparison point is: okay, I have these audiobooks and I take a more traditional approach of going through each word in the book and annotating it for a bunch of different features, like those semantic dimensions that I talked about before, hand-rating different aspects of the words' meanings. But we can annotate these audiobooks for a bunch of different things: is it a noun, verb, adjective? How complicated is the sentence at this point? Which moments would a given linguistic theory hypothesize to be hard to process?

Then we can do that direct comparison. So okay, if I try to predict brain activity from this kind of hand-curated, theoretically motivated bundle of features, how well can I do? And if I take the activation in these artificial neurons of GPT, how well do I do? And GPT is way better than these hand-annotated features. What it doesn't give you, though, is the why. So that's the crucial, interesting challenge, let's say, that we're working on right now: okay, great, these models seem to explain brain activity really well. What is it that they are capturing that the theory of linguistic structure is not capturing?

I'm really excited about answering that question because that is real scientific discovery. If we can understand what the model is converging on that the human brain also converges on that has never been theorized before, that's really exciting.
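As a concrete illustration of the benchmark comparison Gwilliams describes, here is a minimal encoding-model sketch in Python using ridge regression from scikit-learn. The arrays are random placeholders standing in for real hand-annotated features, model-layer activations, and recorded brain responses; with real data, the comparison of cross-validated scores is the point.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder dimensions: time-aligned feature rows and brain-recording channels.
n_timepoints, n_channels = 2000, 50
hand_features = rng.standard_normal((n_timepoints, 30))    # e.g., 30 rated dimensions
model_features = rng.standard_normal((n_timepoints, 768))  # e.g., one GPT-2 layer
brain_activity = rng.standard_normal((n_timepoints, n_channels))

def encoding_score(features, brain):
    """Cross-validated R^2 for predicting brain activity from one feature set."""
    ridge = RidgeCV(alphas=np.logspace(-2, 4, 7))
    return cross_val_score(ridge, features, brain, cv=5, scoring="r2").mean()

print("hand-annotated features:", encoding_score(hand_features, brain_activity))
print("GPT layer activations:  ", encoding_score(model_features, brain_activity))
```

With random placeholders both scores hover around zero; the finding described above is that, on real recordings, the model-derived features predict brain activity substantially better than the hand-annotated ones.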

Nicholas Weiler:

That's incredibly exciting. Well, one thing that you mentioned to me is that obviously these large language models like ChatGPT are generally trained on text. But what the human brain's language system is trained on, how we learn language, is by hearing each other speak and by practicing speaking. Babies babble, right? There's this exchange and interchange.

I understand that you're working on how you would train a large language model on speech, on sound. I'd love to hear a little bit about what you're doing there and how you think that would produce a different kind of algorithm or a different kind of model than what we're becoming used to with these large language models.

Laura Gwilliams:

Yeah, love that question. So yeah, as you rightly say, the most successful language models take text as input. Or, more precisely, they take indices into a dictionary, pointers to which word entry they're referring to, which is very, very different from how humans learn language and how they use language.

Nicholas Weiler:

Right, I'm not breaking down text that I'm reading into like, ah, this is word number 7,208.

Laura Gwilliams:

Right, exactly. Yeah.

Nicholas Weiler:

Although maybe my brain is, I don't know.

Laura Gwilliams:

Right. It's kind of amazing that there is this strong neural alignment between these text models and the human brain, given that they operate on text and that they've learned in such a different way than the human brain has. Yeah, one of the things I'm really interested in pursuing is seeing if we can build a language model that learns directly from audio, just like a baby does.

And there's already been some really excellent work in this space, for example by Emmanuel Dupoux. It's challenging because speech is noisy. There's a lot of variance in there that you actually need to learn to ignore. And it's a sensory input from which you need to figure out what symbolic content to extract. Whereas if you compare that to text input, the text has already been nicely organized for you into appropriate symbolic units, and the meaningless variation has been stripped away.

It's a big challenge from the learning perspective to figure out what is the signal and what is the noise, and what are the appropriate tokens to pay attention to. And one of the students in my lab is particularly interested in that question: if we train one of these speech models, what tokens is it going to learn, and which tokens are optimal for learning language from speech?

Nicholas Weiler:

Interesting. I wonder if we have to use baby talk or something. Right?

Laura Gwilliams:

Yeah.

Nicholas Weiler:

I remember reading, or talking to some researchers, about how we have a very particular way of teaching infants to speak. The kinds of speech that you introduce an infant or a one-year-old or a two-year-old to change over time; you sort of naturally progress them through this.

Imagine that this works. Imagine that we figure out the technical challenges of getting an algorithm to figure out what matters in a stream of speech. What do you think an algorithm trained that way would be like? How would it be different from the algorithms that we're becoming used to?

Laura Gwilliams:

Yeah. So if you interact with Siri or Alexa, what's happening under the hood... You talk to it, right? And it talks back to you. But what's happening is: I talk, it translates my speech into a text sequence, it operates on the text sequence, and it produces a text sequence that then gets synthesized as speech back to me. So even though our experience is kind of a verbal conversation, under the hood it's getting translated into text, operating over text, and then getting translated back into speech.

So one of the big things that you lose through that process is all of the other stuff that gets conveyed when you're talking verbally versus writing something down. So my emotional state, the aspects of my sentence that I really want to emphasize to you, all of that gets lost in the translation to text. So you could imagine that when these are operating end to end, speech to speech, there is a sensitivity actually to the paralinguistic.

Nicholas Weiler:

The emotional nuance, the-

Laura Gwilliams:

Yeah.

Nicholas Weiler:

Yeah.

Laura Gwilliams:

Exactly. All of the stuff that would get lost in text but is actually present in the speech signal would be something that could be conveyed to the model, and that the model could convey back to you. I think it's a separate question whether we want that, but that's information that would be accessible and learnable to the model.

And yeah, I can also plug Diyi Yang here at Stanford, who is doing some work in this space as well. So I think that that would be one of the main ways we would notice the difference, and hopefully it would also make these systems better. Because, at least, I don't know, there's a lot of context that gets lost, and I think that operating at the speech level would actually just lead to better performance in these models overall.

Nicholas Weiler:

Yeah, there's so many questions. I love having these conversations about the connections we're finding between these AI algorithms and the human brain and how they're different and how they're the same. One of the fundamental ones that keeps coming up is these algorithms have no agency. They have no desires, they have no motivation. And one of the things that's so unique about humans is that we are so motivated to communicate and to connect with one another through speech and through other aspects of our communication.

I would love to come back and hear more about this as this research continues. I know that in particular, you're expecting this fantastic new human research tool, the optically pumped magnetometer. It's a fantastic name for a very exciting piece of equipment that's going to let you and others at Stanford study the brain at really high resolution in space and time. And hopefully that will help with some of these questions of how these neural networks in the human brain produce language or listen to language, and how you can compare that to what's going on in the AI system. So we'll have to come back and chat with you again in a few months to hear how all of that is going.

Laura Gwilliams:

Yeah, definitely. That'd be great.

Nicholas Weiler:

Well, thank you so much again, Laura, for coming on the show.

Laura Gwilliams:

Thank you, Nick. Talk to you soon.

Nicholas Weiler:

Thanks again so much to our guest, Laura Gwilliams. She's a faculty scholar with the Wu Tsai Neurosciences Institute and Stanford Data Science Institute, and a faculty member in the Department of Psychology. To read more about her work, check out the links in the show notes. Next time on the show, we're going to go beyond large language models and take an even deeper dive into how researchers are using AI to understand the human brain.

Dan Yamins:

The AI models are kind of like the brains. And so the leap from training a network to solve a task and then asking whether its internals, the internals of the neural network, look like the true brain, to just directly training the neural network on brain data, is like the leap from a theoretical science to an empirical science. And that's very exciting because it tells us an underlying principle for why the brain is as it is, why the neurons are as they are.

Nicholas Weiler:

If you're enjoying the show, I hope you'll subscribe and share it with your friends. It helps us grow and bring more listeners to the frontiers of neuroscience. We'd also love to hear from you. Tell us what you love or what you hate in a comment on your favorite podcast platform, or send us an email at neuronspodcast@stanford.edu. From Our Neurons to Yours is produced by Michael Osborne at 14th Street Studios, with sound design by Morgan Honaker. I'm Nicholas Weiler. Until next time.