A Nod to Non-Traditional Applied Math

What is applied mathematics? The phrase might bring to mind historical applications of analysis to physical problems, or something similar. I think that's often what folks mean when they say "applied mathematics." And yet there's a much broader sense in which mathematics is applied, especially nowadays. I like what mathematician Tom Leinster once had to say about this (emphasis mine):

"I hope mathematicians and other scientists hurry up and realize that there’s a glittering array of applications of mathematics in which non-traditional areas of mathematics are applied to non-traditional problems. It does no one any favours to keep using the term 'applied mathematics' in its current overly narrow sense."

I'm all in favor of rebranding the term "applied mathematics" to encompass this wider notion. I certainly enjoy applying non-traditional areas of mathematics to non-traditional problems — it's such a vibrant place to be! It's especially fun to take ideas that mathematicians already know lots about, then repurpose those ideas for potential applications in other domains. In fact, I plan to spend some time sharing one such example with you here on the blog.

But before sharing the math, which I'll do in the next couple of blog posts, I want to first motivate the story by telling you about an idea from the field of artificial intelligence (AI).

Large Language Models

Currently in AI there's a lot of buzz surrounding large language models (LLMs). You might already know about this, but I'll say a few words to get everyone up to speed.

A language model is just a fancy kind of function. You input some text, and you get text as output. The function itself can get pretty complicated, but we won't worry about details. The point is that you've seen examples before: Google Translate or the autocorrect feature on your phone, for instance. You input text (say, something in English or something spelled incorrectly) and then you get some text as output (a translation of the English text or the original text with the corrected spelling).

In the past few years, language models have become increasingly sophisticated and thus quite newsworthy. Last year the NY Times reported on a widely popular language model called GPT-3 that "generates tweets, pens poetry, summarizes emails, answers trivia questions, translates languages and even writes its own computer programs, all with very little prompting. Some of these skills caught even the experts off guard." In fact, GPT-3's text generation capabilities are so impressive that Microsoft recently announced it plans to incorporate the technology into one of its products.

GPT-3 is an example of a kind of sophisticated neural network model referred to as a "large language model." Here, the adjective "large" means that massive amounts of data are required for this breed of language model to work well. Emphasis on the word "massive." GPT-3, for instance, is a model specified by 175 billion parameters and trained on 570 gigabytes of text. That's a lot! In any case, you can find lots of hype surrounding LLMs on the internet. Do a quick web search and you'll also find articles voicing concern, hesitancy, and a number of other questions.

It's all very interesting, but let's not dwell too much on the news. Remember, my goal is simply to tell you about some math. And the math comes when we take a closer look at how these LLMs are trained.

What do I mean?

Circling back to mathematics

LLMs are trained by being fed lots and lots of examples of texts. It's like letting a kid read millions of pages on the internet and seeing how well they pick up the English language. Except, the kid is actually a language model. And they pick up the language pretty well!

Now here's the thing to know: the text used to train an LLM is totally unstructured, which is to say that no grammatical or semantic rules or information are given to the model. In other words, you don't have to tell an LLM that "firetruck" is a noun, or that "red" is an adjective, or that adjectives precede nouns in English. You also don't have to tell an LLM that "red idea" doesn't have much meaning even though it's grammatically correct.

You simply show an LLM lots of text, and it learns that information from what it's seen.

Pretty neat, right?

And how do we know this? Because of all the news above! LLMs are excellent at producing coherent pieces of text — blog articles, tweets, computer programs — at a human-like level. That's what the hype is all about.

And this brings to mind an interesting math question, doesn't it?

Think about it.

An LLM essentially learns probability distributions on text. You give it a prompt (some piece of input text, like the first sentence of an article you'd like it to complete, or the description of a computer program you'd like it to write) and it generates some desired text as output. So under the hood, it's learning a conditional probability distribution on language: there's a certain probability that "dog park" will come after the expression "I'm walking my dog over to the." What's more, weaving such a grammatically correct sentence into a larger coherent narrative implies some semantic information also must have been learned: that dog parks are located outside, for instance. Impressively, LLMs learn this complex information just by seeing other examples of coherent text.
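To make the phrase "conditional probability distribution on language" a little more concrete, here is a minimal sketch in Python. It is not how GPT-3 or any real LLM works (those are neural networks, not count tables); it simply estimates the probability of the next word given the few words before it by counting occurrences in a tiny made-up corpus. The corpus and the names `continuation_counts` and `conditional_probability` are my own assumptions, chosen just for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration.
# A real LLM is trained on vastly more text.
corpus = [
    "i am walking my dog over to the dog park",
    "i am walking my dog over to the vet",
    "i am walking my dog over to the dog park",
    "i am walking my cat over to the vet",
]


def continuation_counts(sentences, context_len):
    """Count which word follows each run of `context_len` consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_len, len(words)):
            context = tuple(words[i - context_len:i])
            counts[context][words[i]] += 1
    return counts


def conditional_probability(counts, context, next_word):
    """Estimate P(next_word | context) as a ratio of counts."""
    seen = counts.get(tuple(context.split()), Counter())
    total = sum(seen.values())
    if total == 0:
        return 0.0  # this context never appeared in the corpus
    return seen[next_word] / total


counts = continuation_counts(corpus, context_len=4)
print(conditional_probability(counts, "dog over to the", "dog"))
print(conditional_probability(counts, "cat over to the", "vet"))
```

In this toy corpus, "dog" follows "dog over to the" in two of the three sentences containing that context, so the first estimate is 2/3. Nothing about grammar or meaning was supplied; the numbers come only from which words appear together and how often.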

So — and this is the punchline — somewhere in the tangly web of an LLM's billions of parameters, syntactic and semantic structure is being learned. And it's being learned from unstructured text data; that is, from information about "what goes with what" in a language together with the statistics of those expressions.

That's it.

This then prompts a simple math question: What mathematical framework provides a home for these ideas? In other words,

What's a nice mathematical framework in which to explain the passage from probability distributions on text to syntactic and semantic information in language?

This is a question I've been thinking about with John Terilla (CUNY and Tunnel) and Yiannis Vlassopoulos (Tunnel), and we find that enriched category theory provides a compelling place to explore this question. We recently shared our ideas in a preprint on the arXiv.

That's the math I plan to share with you next time.

It's a nice bit of non-traditional applied mathematics, I think.

Stay tuned!

In the next few blog posts, I'll try to make the ideas as accessible as possible, though I'll have to assume some familiarity with basic ideas in category theory. If you're new to this branch of mathematics, I do hope you'll check out some of the introductory articles here on the blog; this one is a good place to start. I've also written an elementary introduction to enriched category theory, whose ideas we'll make good use of going forward. I'd also recommend becoming acquainted with the Yoneda perspective, which is a lovely way to think about mathematics and will play a key role in the next blog post.
