Tech Refactored

S2E8 - Google’s DeepMind AI Changing Biology and Solving Protein Structures

September 22, 2021 Nebraska Governance and Technology Center Season 2 Episode 8

On this episode we’re talking about the protein folding problem, the challenge of determining the 3D shapes that proteins form, and how Google’s DeepMind system stands to transform biology. We won’t lie to you listeners. We don’t really understand what most of that means, which is why we’re joined by Dr. Nicole Buan and Dr. Juan Cui, researchers at the University of Nebraska, who can hopefully help us to understand what this all means.

Disclaimer: This transcript is auto-generated and has not been thoroughly reviewed for completeness or accuracy.

[00:00:00] Gus Herwitz: This is Tech Refactored. I'm your host, Gus Herwitz, the Menard Director of the Nebraska Governance and Technology Center at the University of Nebraska. Today's episode is sure to be a mind bender or perhaps a mind folder. We're talking about the protein folding problem, the challenge of determining the 3D shapes that proteins form, and how Google's DeepMind system, which arguably solved the protein folding problem last year, stands to transform biology.

And I'm not going to lie, I don't really understand what most of what I just said means, which is why we're joined today by Dr. Nicole Buan and Dr. Juan Cui, researchers here at the University of Nebraska who can hopefully help us to understand what all of this means. Dr. Buan completed her PhD in microbiology at the University of Wisconsin-Madison and did her post-doctoral work at UIUC before joining the Department of Biochemistry here at the University of Nebraska-Lincoln.

Dr. [00:01:00] Cui has a PhD in bioinformatics from the National University of Singapore and did her post-doctoral work in biochemistry and molecular biology at the University of Georgia, and she is an associate professor here at the Department of Computer Science and Engineering. Thank you both for joining us today.

[00:01:17] Nicole Buan: Thanks for having us. Yeah, thank you.

[00:01:19] Juan Cui: I like to be, be here. 

[00:01:22] Gus Herwitz: So I, I would like to just start our first part of the discussion today, figuring out what the heck this protein folding problem is. And I'm going to start with the simplest, what I think is the simplest question. What is a protein? 

[00:01:38] Nicole Buan: So proteins are one of the major sort of macromolecules that make up all living organisms.

They are chains of amino acids, which are these tiny little chemicals that contain carbon and nitrogen and a couple other things. Um, so cells link these together to create polymers that then fold up [00:02:00] into little shapes. And, uh, these shapes can do all sorts of things that cells need to do. They can carry out chemical reactions.

We call these types of proteins enzymes. Uh, other proteins, um, you know, they connect to each other to create the structures that make up cells. Um, a lot of proteins that are in our cells are too small to see. Um, but some of them we can actually see with the naked eye, such as, you know, you might have seen spider silk.

So that's one type of protein that spiders will spin out, uh, and, and produce into a web. Um, and of course, meat that we all like to eat is a compilation of, you know, those are animal cells that have a whole lot of protein in the muscles. Things like that. But there's a lot of different types of proteins that every type of cell, uh, makes in nature.

[00:02:49] Gus Herwitz: So probably most listeners think of protein as, uh, that stuff that they need to eat a bunch of. Some of it comes from meat, some of it might come from plants or Impossible Burgers, I [00:03:00] guess. And in our minds, it's just protein. It's all the same stuff. But protein, it sounds like, takes on a vast range of different forms.

It's my rough understanding that when we eat protein, that all gets broken down back into these amino acids that the body can then reassemble into whatever proteins it needs. Yep, that's exactly right. Okay. I'm patting myself on the back for, uh, getting that far. Juan, perhaps you can tell us what's going on with this folding part of the protein folding issue.

[00:03:34] Juan Cui: Uh, yes, actually in a nutshell, uh, it's the shape of the protein that actually determines the function, which is probably the most important thing we want to learn in biology. So, well, the shape actually is determined by the primary sequence of amino acids. So, uh, we learn from structural biology that, uh, proteins change their shape all the time.

But eventually the sequence of amino acids has to twist and fold up into a certain [00:04:00] shape to gain a specific function. So that is the reason we want to study the shape of a particular protein.

[00:04:06] Gus Herwitz: And when we're talking about all these proteins and shapes, how much variety is there? I've seen some pictures, or what I assume are computer-generated images, looking through some of the papers on this topic.

And these things look like really bizarre abstract art, like really fancy, uh, glass blown, really pretty, uh, sorts of things with ribbons. And they're, they're not simple. 

[00:04:32] Nicole Buan: Not at all. So I, I can't even think about how many types of proteins there are, because, um, there's everything from the individual amino acids to peptides of just, uh, you know, 5, 10, 30 amino acid chains all the way up to thousands of amino acids in a single polypeptide.

Um, that can then fold up into very large. Um, like I said, there are some that we can see with the naked [00:05:00] eye and many that we can't. Um, so yeah, it's astronomical, the number of types of proteins that there are. Um, but, um, within, uh, for instance, an E. coli bacterial cell genome, you might have 5,000 proteins.

Or 5,000 genes or so that the cell makes, and, and you know, a large percentage of those are proteins. Um, but one thing I wanted to um, point out, and we'll get into this a little bit later, is that it's not just the protein component. Proteins can also be modified by other chemicals in the cell. Those affect their shape and then also affect their, their function.

Um, but it's really, it's. I don't think I've ever seen an estimate personally of like, how many types of proteins are in the cell. That's like stars in the universe. 

[00:05:49] Gus Herwitz: Do we know why the shape is so important? You, you mentioned, uh, spider silk as a protein. Are, are those proteins, are the shapes like, uh, uh, tube shapes so that they make [00:06:00] better ropes or something like that?

Or how, what's the mechanism by which the, the shape of a protein drives its function?

[00:06:08] Nicole Buan: Wow, that's a really great question. So, um, in the case of something like a spider silk, spider silk is really strong because the individual protein, uh, protein units, those polypeptides, actually link together in a particular way.

So it's their shape and the chemical charges and greasy spots on the surface that then allow them to link together and be really, really strong. So that's what lets, uh, spiders spin it out in a long string. And those interactions, because of the shape of those proteins, help those, uh, individual units link together to create a really strong, you know, spider silk rope.

Um, in the case of enzymes, um, what happens with an enzyme is, Uh, a protein will fold [00:07:00] into a shape and it has a little pocket called an active site. This could be kind of, um, you know, inside the protein and you need to have substrates be able to fit inside of that pocket. Um, so you might have, um, you know, A phosphate group and a sugar, and they have to both bind in there in a particular orientation and then the enzyme catalyzes a reaction that might connect that phosphate to that sugar group.

Um, and then it goes on to do metabolism. So the shape of the outside of the cell, uh, outside of the protein affects how proteins interact with each other to then give various functions. They can also create those active sites that allow the protein to do chemistry. So if we mess up the shape of that active site or the shape of the outside of the protein, um, you would create, uh, and we do this regularly in biochemistry and molecular biology.

You create [00:08:00] mutations in those amino acids and it changes the shape and it definitely correlates to a change in, uh, the function of the proteins that you see. So you would have a spider silk that is no longer as strong, or you would have an enzyme that is much, much poorer at doing the catalysis. Or you could have, you know, one or a few amino acid changes in a huge, you know, several-hundred-amino-acid polypeptide chain, and suddenly your protein doesn't fold up into the same shape it used to but is, you know, really like a spaghetti mess. Uh, so we do this all the time in biochemistry. Um, yeah,

[00:08:38] Gus Herwitz: and, and I, I guess, uh, still trying to get my, uh, head around some of this.

So a, a, a small change in the genetic sequence that defines the protein can dramatically change its, uh, shape. Are similarly shaped proteins, do they tend to have similar characteristics, or will a very minor change in the shape [00:09:00] have a dramatic effect on how the protein functions?

[00:09:03] Juan Cui: Um, okay. Maybe I can comment on this question here.

So related to your previous question about how to categorize the different types of shape. So actually in computational biology we have a hierarchical kind of definition to categorize protein structure into, for example, the first level with the three major families. So alpha-only secondary structure, alpha-beta, or beta.

And then, but if you go down the tree, you have over 1600, if I remember correctly, folds. So assume that, uh, each fold represents a certain specific structure, and the proteins that belong to that family will have very similar function.

[00:09:39] Nicole Buan: Yeah. I wanna kind of add to that. So, um, my background is in, um, biochemistry of strange kind of extremophilic microbes and stuff. So a lot of what we know about protein structure is from proteins that people can purify that fold well. So [00:10:00] there's kind of a huge bias, um, in what we understand about protein structure folds. There's a lot of proteins we cannot do, um, structural biology on, and this is where, you know, they won't necessarily make nice crystals for X-ray crystallography.

Um, they are too big to solve by nuclear magnetic resonance imaging, uh, spectroscopy, uh, and now we have cryo-EM, or cryogenic electron microscopy, and that's a huge help. Um, but there are a lot of proteins in nature that we actually don't know anything about their folds. Um, so of the known folds, there might be, you know, there's very many different classes.

And so when there is a commonality in the structure, We make a good assumption that, uh, it's probably pretty close. It probably does something pretty similar. Um, on the other hand, when you study weird microbes like I do, um, a single amino acid change means that instead of doing a phospho transfer [00:11:00] reaction, like I kind of mentioned beforehand, maybe it's a sulfate transfer reaction.

So very different element. There might be some similarities, um, size similarities between the molecule. But the consequence of adding a sulfate group to a sugar instead of a phosphate to a sugar has huge meta, um, consequences in metabolism for the organism. So we, we still are at that level of we have to actually do biochemistry.

We could get pretty close with structural biology and structure predictions, but there's no substitute for making a mutation in an organism or doing a biochemical assay.

[00:11:38] Gus Herwitz: So that, that's a, a question I should have asked before. What, what's the scale that we're talking on? Are, are proteins molecules?

[00:11:46] Juan Cui: So the question is, you ask about the, is a protein a small molecule?

[00:11:52] Gus Herwitz: Well, how, how large, how uh, small are, uh, proteins? Are they on the scale of molecules? Are they actually themselves, [00:12:00] uh, molecules? 

[00:12:01] Juan Cui: I, I, okay. I'm a bioinformatician. I only handle protein sequences, so I know the protein is considered a big molecule. So the size, if you're talking about the sequence length, could be from, like, within a hundred amino acids to some thousands.

So some of them could be very big. That's my superficial understanding of protein size.

[00:12:22] Nicole Buan: Yeah. So, um, again, there's a huge variety of how big proteins can be. I'm just, I'm trying to think of, I mean, there are some little antimicrobial peptides, which are proteins which are, um, you know, maybe 15 kilodaltons. A dalton is the atomic weight of a hydrogen atom, so pretty small, because amino acids are most of them quite small.

Um, but there are some proteins that, like I said, are thousands of amino acids long. So, um, they can be quite large. And, [00:13:00] uh, as I was also kind of mentioning, in nature, proteins often bind with each other and we have things called, like, mega complexes. So, um, uh, those can be quite large. Most of them, you know, unless they're made into chains like spider silk.

It's not like you can see them with your naked eye. You have to use, uh, electron microscopy to see them. 
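To put the sizes mentioned here on one scale, the short Python sketch below converts between chain length and approximate mass. It assumes a rule-of-thumb average of about 110 daltons per amino acid residue, a figure that is not stated in the conversation, and the function names are made up for illustration.

```python
# Assumed rule-of-thumb average mass of one amino acid residue, in daltons.
# Real proteins vary with their composition, so treat results as rough.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(num_residues: int) -> float:
    """Estimate the mass in kilodaltons of a chain with num_residues amino acids."""
    return num_residues * AVG_RESIDUE_MASS_DA / 1000.0


def approx_length(mass_kda: float) -> int:
    """Estimate how many amino acids a protein of the given mass contains."""
    return round(mass_kda * 1000.0 / AVG_RESIDUE_MASS_DA)


if __name__ == "__main__":
    print(approx_length(15))      # rough chain length of a 15 kDa protein
    print(approx_mass_kda(1000))  # rough mass of a 1,000-residue protein
```

Under that assumption, a 15 kilodalton protein is a chain of a bit over a hundred amino acids, and a chain of a thousand amino acids weighs on the order of a hundred kilodaltons.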

[00:13:25] Gus Herwitz: That brings us to the next question, which I think will then turn us to what's going on with this, uh, protein folding, uh, computational problem. Nicole, you had mentioned, uh, a couple of techniques, X-ray crystallography, uh, and MRI.

How historically have we figured out what proteins look like, what the fold structure is, and why has that been either challenging or.

[00:13:49] Nicole Buan: Right, So that's a great question. Um, you know, it, it was really a huge, uh, huge Nobel Prize winning achievement to, to solve the first protein crystal [00:14:00] structure. Um, and this is because when we used to think about, um, solving the structure of something, our knowledge beforehand was on things like gemstones and rocks, right?

Where you can see this hard thing and. It's, uh, you know, something like a gemstone, like a diamond. You can shine lasers at it. And what happens is the lasers, um, bounce off of the molecules in that crystal and uh, de frac, and then basically it's mathematical magic to then back calculate using forea transform to then figure out.

The density of atoms were in space within this crystal. Um, and crystals are because of the atoms that make it up or the molecules that make it up pack in a very regular, um, repeated structure. So you can, um, use the, the pattern of light defraction to then find out where atoms were in space and figure out what the structure of that lattice was.
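The back-calculation described here can be illustrated with a one-dimensional toy in Python. The sketch assumes a made-up density sampled at eight points along one repeating cell, computes its discrete Fourier transform as a stand-in for the diffraction pattern, and inverts the transform to recover the density. It is a deliberate simplification: a real experiment records only the intensities of the diffracted spots and not their phases, so the real inversion is considerably harder than shown.

```python
import cmath


def dft(values):
    """Discrete Fourier transform of a list of real samples."""
    n = len(values)
    return [
        sum(values[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(coefficients):
    """Inverse transform; returns the real part of each recovered sample."""
    n = len(coefficients)
    return [
        sum(coefficients[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)).real / n
        for x in range(n)
    ]


# Hypothetical density along one repeating cell: two "atoms" of different weight.
density = [0.0, 6.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0]

pattern = dft(density)            # amplitudes and phases of each "reflection"
recovered = inverse_dft(pattern)  # back-calculated density

if __name__ == "__main__":
    print([round(value, 6) for value in recovered])
```

Because the toy keeps the phases, the recovered list matches the starting density up to rounding error.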

So that's a lot [00:15:00] easier to do with something like a diamond. And what happens is people tried for ages, decades, to be able to figure out how to crystallize biological macromolecules, and they could never really do it. Um, and especially proteins were super hard. We could do maybe things like, um, some vitamins or something like that where you could dry them down and crystallize them.

And we do that in our organic chemistry lab. Um, but nothing like the proteins, which we know are so important for biological function. And what happened was they eventually figured out that you had to grow these crystals in an aqueous solution. Uh, so there has to be water in the protein molecules.

They have to be. And then some of them, not all of them, remember, but some of them will start to form regular crystal lattices. So when you look at them under the microscope, they really do look like beautiful little gemstones, just like a diamond, but they're floating in, you know, a [00:16:00] liquid solution that has water in it.

Um, and so we prepare those to do X-ray crystallography, but, uh, as I mentioned, um, X-ray crystallography only works for those proteins that you can purify to a very, uh, high purity. That's not always possible. Uh, you have to be able to get a lot of this very pure protein. Sometimes that's very challenging. You have to be able to grow them, uh, and then be able to have them be stable enough to put in an X-ray to solve the structure.

Um, the other, uh, because that was impossible for a lot of things, we also have, um, nuclear magnetic resonance, NMR. Um, and NMR, um, uh, uses, uh, the magnetic fields of the atoms to then figure out how they're moving in a solution. Um, but because they're all moving, because they're in a liquid milieu, um, if the protein is too big, it gets way too confusing and you can't really figure out what the structure of the [00:17:00] protein is.

So there's a limited size that you can use NMR for, but, uh, it does work for very small proteins, and unlike the crystal structure, NMR will give you an idea of how dynamic that protein is, and you can heat it up or add other chemicals and see how that structure changes, uh, with that changing environment.

Um, so they're complementary and not the same. And then now there's cryo-EM, where people take their pure proteins and, uh, cryo-EM can be done with very large proteins and protein complexes. Uh, and so you can take it and spray it onto a grid and flash-freeze it. And the cool thing with cryo-EM is, it doesn't matter.

Um, you know, or it can solve the structures of many different conformations simultaneously because it takes pictures of these crystals in all these different orientations, because they're just randomly spewed out on this carbon grid. [00:18:00] Um, and it actually, uh, uses very sophisticated image processing to then find the patterns in, uh, the shapes of those, uh, protein pictures.

And it finds how many different types of protein structures there are. Uh, and we can do that with such high resolution now that it really can rival X-ray crystallography. It's really amazing and it's only possible because of, um, our, uh, image processing capabilities, the computers, the strength of the computers behind it, that can solve those structures.

[00:18:38] Gus Herwitz: And I think that brings us, the strength of the computers behind it, to the computational approach that, uh, really brings us to this topic today. It's my understanding that this protein folding problem, computational solutions to it, has been thought of as kind of one of the grand challenges, uh, of the field for the last 50 years or [00:19:00] so.

Juan, can you tell us a little bit about this computational approach, why it's so difficult and why it's so important? And then, listeners, I promise we will give you a brief break and let your brains refresh as we get ready for some more discussion.

[00:19:15] Juan Cui: Okay. I actually learned a lot from Nicole in biochemistry.

So, um, uh, the experimental challenges actually created this situation that, uh, structure research couldn't keep pace with new protein discovery, right? So we know that every day you discover some new protein. In the database, we basically have over a hundred million different types of protein there, but we only have over 200 thousand structures available. So then to fill the gap, as you said, more than 40 years ago, um, biologists started to turn to computational prediction, especially to handle those difficult cases. Um, but, uh, many methodologies [00:20:00] have emerged since then, and they have already improved the field dramatically. However, these early computational methods kind of rely on first principles in physics, uh, as a basis.

So they need a well-defined energy function, with which you can explicitly, uh, kind of, uh, describe the force field potential between all atoms in the protein. But searching for an optimal structure against a huge solution space that captures all the conformation changes, right, is already a very challenging task.

So, um, the complexity goes dramatically higher when the proteins are much longer. So that's why the performance of computational, uh, kind of prediction for a long time was not satisfactory. Uh, in many of the cases we kind of got stuck, yeah, with this very slow progress for a long time.
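The search described here can be sketched with a deliberately tiny toy in Python. Everything in it is hypothetical: each residue is allowed only three made-up states, and `toy_energy` is an invented scoring rule, not a physical force field. The point is only to show the shape of the approach, which is to score every candidate conformation with an energy function and keep the lowest, and to show how quickly the number of candidates grows.

```python
import itertools

# Hypothetical discrete choices for each residue (for example, a turn direction).
STATES = (-1, 0, 1)


def toy_energy(conformation):
    """Invented score: neighbours in the same state are rewarded, nonzero states cost a little."""
    pair_term = sum(-1.0 if a == b else 0.5 for a, b in zip(conformation, conformation[1:]))
    strain_term = sum(0.2 * abs(state) for state in conformation)
    return pair_term + strain_term


def exhaustive_minimum(num_residues):
    """Score every possible conformation and return (best_energy, best_conformation, count)."""
    best_energy, best_conformation, evaluated = None, None, 0
    for conformation in itertools.product(STATES, repeat=num_residues):
        evaluated += 1
        energy = toy_energy(conformation)
        if best_energy is None or energy < best_energy:
            best_energy, best_conformation = energy, conformation
    return best_energy, best_conformation, evaluated


if __name__ == "__main__":
    energy, conformation, evaluated = exhaustive_minimum(8)
    print(energy, conformation, evaluated)
```

Even this toy has to score 3 to the power of 8 candidates for an eight-residue chain, and every extra residue triples the work, which is why exhaustive search stops being practical long before chains reach realistic lengths.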

[00:20:57] Gus Herwitz: So I'm just, uh, doing some quick math in [00:21:00] my head, thinking if we're only talking binary combinations when we're talking potentially hundreds or thousands of combinations, we're talking more combinations than atoms in the universe.

And when we're talking, we're not talking binary. We're talking. Much more complex interaction. So probably more, uh, if we're, if we're multiverse people, more atoms than exist in all the multiverses. So, uh, that, that sounds pretty complicated. Well, listeners, uh, we will be back in a moment to get into the recent discussion of potential computational solutions and the importance of these solutions to the.
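The back-of-the-envelope count here is easy to check in Python, which handles arbitrarily large integers. The sketch assumes the commonly quoted order-of-magnitude figure of 10 to the power of 80 atoms in the observable universe, which is not a number given in the conversation, and the function names are hypothetical.

```python
# Commonly quoted order-of-magnitude estimate, used here as an assumption.
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80


def count_conformations(num_residues, states_per_residue=2):
    """Number of combinations if each residue independently picks one of a few states."""
    return states_per_residue ** num_residues


def min_length_exceeding(limit, states_per_residue=2):
    """Smallest chain length whose combination count is larger than limit."""
    length = 0
    while states_per_residue ** length <= limit:
        length += 1
    return length


if __name__ == "__main__":
    print(min_length_exceeding(ATOMS_IN_OBSERVABLE_UNIVERSE))     # binary choices
    print(min_length_exceeding(ATOMS_IN_OBSERVABLE_UNIVERSE, 3))  # three choices each
```

With only two choices per residue, a chain of 266 residues already has more combinations than the assumed atom count, and with three choices per residue the threshold drops to 168.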

[00:21:43] Lysandra Marquez: I'm Lysandra Marquez, associate producer of Tech Refactored. I hope you're enjoying this episode of our show. And hey, do you have an idea for Tech Refactored? Is there some thorny tech issue you'd love to hear us break down? Visit our website or tweet at us at [00:22:00] UNL_NGTC to submit your ideas to the show. And don't forget, the best way to help us continue making content like this episode is word of mouth, so ask your friends if they have an idea too. Now, back to this episode of Tech Refactored.

[00:22:22] Gus Herwitz: We're now, uh, coming back from our break to talk some more about the, uh, protein folding problem. And, uh, we're gonna focus specifically on the approach to solving this problem that Google has developed with its, uh, DeepMind learning artificial intelligence system. Before we get into that discussion, I want to briefly take a moment to remind you, our listeners, that we want to hear from you. Please submit any topic ideas that you have, uh, for future episodes on our website, or feel free to tweet them at us at UNL_NGTC, or directly to me, Gus Herwitz. That's at [00:23:00] Gus Herwitz. We love hearing your ideas. Uh, is DeepMind really taking the right approach to, uh, thinking about or solving this problem?

[00:23:10] Nicole Buan: I think, I think from the, what I think is interesting is, well, from the biochemist perspective, as a molecular biologist, um, yeah, people have been going about it a little bit wrong, sorry, but it's true, because none of the, none of these, so like, uh, the, um, the way DeepMind has been solving this is a very smart, good way to do it.

It's, you know, applying machine learning to it, so it's taking, you know, comparing the structures that we have with each other and learning from those things. So it's finding those most common folds. Um, but biology, as I always teach my students, is all about the exceptions. Um, [00:24:00] so, so it's only giving you the most likely, but not necessarily, um, reflecting, um, what is actually the case or what's really happening in the cell.

Uh, and if you think actually about what I, what I think, if artificial intelligence is really gonna solve the protein problem, it would seek to, as much as possible, approximate how cells actually do this. And so when cells actually produce protein, uh, they produce it a single amino acid at a time.

Mm-hmm. And those nascent, brand new baby polypeptides, as they're coming off of the ribosome, already start to fold even before the full sequence has been synthesized. Uh, and there are also things like, uh, protein chaperones that bind to those polypeptides and traffic them or move them to different subcellular compartments.

Uh, and that [00:25:00] actually has a much larger effect on the ultimate protein folding than, um, even that primary sequence, like if you're considering the sequence, um, de novo, all at once. So in other words, time is important, and none of the algorithms really kind of consider that.

I consider the AlphaFold method as sort of like cheating that by saying, well, these were all produced in yeast or E. coli or something. Mm-hmm. So that kind of accounts for that. But what I mean by time is really important is that if you synthesize a protein at different rates, it will fold differently.

[00:25:44] Gus Herwitz: So let's turn to what Google actually has done. Juan, can you try to help us understand? Back about a year or so ago, November, I think, of 2020, there was this big announcement that Google had [00:26:00] solved, I just put that in air quotes for listeners, the protein folding problem. What, uh, did Google do?

[00:26:08] Juan Cui: People may, yeah, hear a lot about this company, DeepMind, right?

So they try to develop all kinds of smart software systems to have some humanlike intelligence, to solve problems such as, as you may have heard, Go, right? So they have those, uh, computers that can play Go games against, uh, kind of world-class, uh, masters. Um, so, but then now their focus is, uh, to try to apply AI technology to all kinds of Google kind of products, but this is the first time they try to tackle a scientific kind of, uh, challenge. So then they devised AlphaFold, uh, which is an AI software that utilizes deep learning technology so that you can predict a protein structure directly from, uh, the amino acid sequence. So, uh, technically [00:27:00] this, uh, software included two major components, right?

The first part represents a neural network kind of architecture with fine parameter tuning, which can read the protein sequence on one side, and on the other side output all kinds of pairwise distances between amino acids in the protein. And then the second module, then they try to construct, uh, an optimal kind of, uh, 3D structure based on those predicted sequences, I mean, sorry, distances. So to train the model, actually the team has to utilize existing sequence and structure data to optimize all the parameters. Then with this, uh, AlphaFold, uh, the company participated in, uh, the annual kind of, uh, competition in protein structure prediction called CASP, short for, uh, Critical Assessment of protein Structure Prediction, uh, kind of, uh, challenge.

So then they won the first, uh, [00:28:00] kind of place in 2018 and 2020, among all the hundreds of different teams. So I think compared to the previous computational methods, um, uh, AlphaFold only needs the protein sequence, right, for prediction. And then it doesn't utilize any solved structure as a template, which is a good aspect.

And then one very unique, uh, feature that eventually contributed to the success is that they utilize the multiple sequence alignment as a feature representation. So basically that allows you to learn some useful information from a group of proteins with similar sequences and associated similar structures, instead of just a single pair of sequence and structure alone.

So I think this helps a lot to improve the distance prediction. Um, we know that, um, it's just like a baby, right? A baby's brain can learn [00:29:00] kind of discriminative features of dogs and cats when it has seen, uh, a large number of them. So the learning kind of model, uh, doesn't need much, uh, interference from the human side, where our knowledge is probably very limited and biased.

So I think, uh, that is, uh, the, the nature of this. 
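The second of the two components described above, going from pairwise distances to a 3D shape, can be illustrated with a small Python sketch. This is not AlphaFold's method: it is plain trilateration, and it assumes the distances are exact and mutually consistent, whereas predicted distances are noisy and need an optimization step. The point coordinates and function names are hypothetical.

```python
import math


def coords_from_distances(d):
    """Rebuild 3D points whose pairwise distances match the square matrix d.

    Assumes exact distances, that the first three points are not collinear,
    and that the fourth point lies outside the plane of the first three.
    The answer is only defined up to rotation, translation and mirroring.
    """
    # Fix a frame: point 0 at the origin, point 1 on the x-axis, point 2 in the xy-plane.
    x1 = d[0][1]
    x2 = (d[0][2] ** 2 - d[1][2] ** 2 + x1 ** 2) / (2 * x1)
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    points = [(0.0, 0.0, 0.0), (x1, 0.0, 0.0), (x2, y2, 0.0)]

    for i in range(3, len(d)):
        r0, r1, r2 = d[0][i], d[1][i], d[2][i]
        x = (r0 ** 2 - r1 ** 2 + x1 ** 2) / (2 * x1)
        y = (r0 ** 2 - r2 ** 2 + x2 ** 2 + y2 ** 2 - 2 * x2 * x) / (2 * y2)
        z = math.sqrt(max(r0 ** 2 - x ** 2 - y ** 2, 0.0))
        candidate = (x, y, z)
        if i > 3:
            # Two mirror-image positions fit the first three distances;
            # keep the one that also agrees with the distance to point 3.
            mirrored = (x, y, -z)
            target = d[3][i]
            if abs(math.dist(mirrored, points[3]) - target) < abs(
                math.dist(candidate, points[3]) - target
            ):
                candidate = mirrored
        points.append(candidate)
    return points


def distance_matrix(points):
    """All pairwise Euclidean distances between the given points."""
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical positions (arbitrary units), used only to generate a distance matrix.
true_points = [(1.0, 2.0, 0.5), (4.5, 2.5, 1.0), (6.0, 5.5, 2.0),
               (4.0, 7.0, 5.0), (1.5, 5.0, 7.5), (2.0, 3.0, -2.0)]
predicted = distance_matrix(true_points)  # stands in for the first component's output
rebuilt = coords_from_distances(predicted)
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(predicted, distance_matrix(rebuilt))
    for a, b in zip(row_a, row_b)
)
```

The rebuilt points sit in a different position and orientation from the originals, but `max_error` shows that every pairwise distance is reproduced, which is all the distance information can determine.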

[00:29:21] Gus Herwitz: So it sounds like, as with most, uh, artificial intelligence, what, uh, the DeepMind system did with AlphaFold is, I don't mean to diminish what it did by any means, but it's, uh, complex statistics. It's finding correlations, looking at existing protein structures that we know the sequences for, and using that to predict the shapes that we'll get from other protein sequences. So it isn't telling us this is how the folding mechanism works so that we better understand the mechanism, but it is telling us with a high level of accuracy, without [00:30:00] knowing the mechanism, we can tell you what the resulting protein is going to look like. I guess I'm going to ask a compound question.

I'm, I'm told I'm not supposed to do that, but my, uh, compound question is first, is this actually solving the protein folding problem? And second, does it matter? Is it important for us to understand the mechanism or from a, uh, research perspective is a useful thing, understanding what the shapes are so that we can use them for, uh, further.

[00:30:30] Juan Cui: I think overall it solved the problem, as claimed in the CASP competition and then the Nature paper results. Actually, if you see the performance, the evaluation metric, more than 90, kind of, uh, the score they get there, uh, is considered equivalent, right, to the experimental kind of structure. You can decide.

And then in some of the cases, they even think the discrepancy between the predicted and the experimental kind of ground truth maybe actually reflects an artifact [00:31:00] experimentally. Um, but, uh, there are some limitations. Um, first of all, there's still some room for performance improvement because, uh, in the original work, still 40% of the structures are not very reliable.

So this is actually quite critical, because when you handle some novel bacteria or a virus, right, they really might have some unique new folds and structures you've never observed in other species. So the knowledge, of course, is not learned by the system yet. And then the second thing is that, um, the technology kind of, uh, revolutionizes the way, uh, we look at protein structure, but it doesn't solve the problem entirely.

So you can think about the static protein structures. You can either solve them experimentally or predict them by the computational tool. They are snapshots, right? They don't have much information about the [00:32:00] dynamics, uh, how the conformation changes actually over time. But, uh, so we believe computationally the future effort, moving the system up to be able to predict protein-protein interaction, is actually more important in many disciplines, including drug design.

I think, uh, Nicole also mentioned that for novel species, right, so it doesn't cover those exceptions, yeah, in that sense. So here, of course, the prediction, as I said, is static. It doesn't reflect the mechanism, the dynamic mechanism behind it too.

[00:32:33] Gus Herwitz: And what will the practical payoffs, um, be? How will this affect research in this field?

And I expect from Google's perspective, they don't care about the protein folding problem. This is an AI research problem. So they've done incredible work there, and it's a great accomplishment for artificial intelligence research. But in terms of molecular biology, how is this going to affect that?

[00:32:59] Nicole Buan: [00:33:00] One way that, uh, this improved structure function prediction, um, moves the field forward is by helping with drug design, for instance. Uh, if you can, uh, skip the step of having to crystallize the protein and get a really high approximation of the structure for something like a drug receptor, or, uh, if you wanna design an inhibitor to something that's involved with cancer, for instance, uh, this would short circuit, you know, that, or decrease that discovery time, because you wouldn't have to, uh, solve that crystal structure. You could make very good approximations and spend more time designing or screening chemical libraries that fit better, or designing new organic chemical molecules to inhibit, or something like that.

So that's one way, I think maybe more in medicine, that it might be very helpful. Um, [00:34:00] another kind of interesting way I think would be really cool to see, I bet, you know, upcoming through the pipeline, is on things like, uh, nanomaterial synthesis, when people try to make particular shapes with proteins.

Um, again, we talk about the solution space for, like, you know, the individual chain of amino acids. It's mind-bogglingly complicated to try to think, how is that gonna fold? Um, but now we could probably use DeepMind and say, you know, if I wanted to put a hairpin right here on this protein to then be a handle for some other interaction, or if I wanna make something, make a ribbon, uh, to fit a particular shape.

Uh, this would help dramatically with finding sequences that maybe have the type of shape that you want to find, uh, and then designing cool shapes and structures, so sort of nanomaterials, nanorobots at that [00:35:00] sub, you know, subcellular scale.

Um, I think it, it might spur some of that type of discovery for sure. 

[00:35:07] Gus Herwitz: So I, I love it. Uh, just the description. Uh, am I right? Uh, a hairpin would just be, uh, like a 90 degree, a hairpin bend, and a ribbon is like a cork...

[00:35:18] Nicole Buan: Yeah, yeah. A helix would be like a corkscrew, but what if you wanted to make crazier shapes, like, I don't know, big old circles, or, you know, more regular globules, or, I don't know.

I, you know, this is where people could get really creative. Oh, so I, I'm, doing things like that.

[00:35:34] Gus Herwitz: I don't know. I'm imagining now, I, I'm certain, uh, actually now I'm, I'm literally certain, based upon, uh, Gödel's incompleteness theorem, that there are more, uh, possible expressions of proteins than exist in nature.

So there, there are artificial ones that we could figure out: this has never existed before; if we were to make this, what would the effects be? And new, new domains of research. Uh, [00:36:00] Juan, I'd like to ask you the same question. How will this affect research?

[00:36:04] Juan Cui: Yeah, I agree with Nicole that, uh, having AlphaFold, being able to solve the structure much faster and more precisely, of course it will create a revolution in the field, right?

That kind of releases us from the, the lab struggles, and then we can focus on more important questions. So, um, as I can imagine, in the drug design field, right, uh, if you can get a structure for any type of protein in any other individual, especially those subject to kind of a genetic modification, right?

So then you can focus on really more advanced questions. For example, complex diseases like cancer, Alzheimer's, they are caused by proteins. Um, and then you can study the structural difference between the disease state and the normal state, right? Then you can infer, actually, the function, and then what are the potential [00:37:00] interacting molecules, so that helps you uncover the entire, um, kind of a mechanism in a very, uh, yeah, effective way.

So, um, in that sense, definitely this is kind of a game changer in the medicine field. Uh, I agree for novel species, but that is also another kind of a, uh, good side. Uh, sorry. So for novel species, for example, uh, we have very limited knowledge, right? But, uh, this provides us a tool which is quite promising and easy to access, right, to gain understanding towards, uh, function once we get the sequence done, because, right, sequencing technology is more well established and more mature.

So we can easily get a sequence and then study the structure and then the function, following that line of, kind of, research. So, um, after all, having an easier way to determine if a genetic modification can have [00:38:00] any influence on the structure or the function is clearly very helpful.

[00:38:05] Gus Herwitz: How would you recommend, or I guess if you were talking to a, a grad student in these fields, how would the developments here over the last, uh, year or so affect the advice that you would give to an aspiring grad student, uh, an aspiring PhD in these fields, for the direction their research might go?

[00:38:28] Juan Cui: I think this depends on, uh, which discipline the students get involved in, right? So, uh, just from the computational aspect, right, there are so many research groups, uh, specialized in protein structure prediction, right? It sounds very disruptive to them, because it looks like the problem is largely solved and then people should leave from here.

But actually it's, it's not, right? We are not going to replace scientists with this highly efficient computational [00:39:00] prediction yet, because there are so many more important questions, uh, related to proteins beyond the structure. You should know the function, you should know the, uh, mechanism, and then the dynamics, the interactions, right?

So they, they all, uh, require a large kind of lab labor. From the computational aspect, there are more important questions. As we said, you should do molecular dynamics, right, to understand, uh, over time how the structure changes and interacts with others, and then to realize the, the true function.

So there's a lot of computational work you should do. So I think this field probably will remain the same. It's just the very fundamental structure prediction, probably we don't need that much effort. But there are a lot of high-level questions, yeah, to be discovered.

[00:39:50] Nicole Buan: In my research lab, we, um, as I mentioned, we work with, uh, sort of extremophile microbes that, uh, there's not a lot like them. [00:40:00] Uh, they do very strange biochemistry and molecular biology, and the questions we've been asking for a long time, uh, relate a lot more to protein-protein interactions. How do these bigger proteins come together to then create a living?

And so these are still questions that can't be solved by AlphaFold, and they may never be, because we're talking really big complexes. We're talking about gene regulation. It's always been really nice when we have a crystal structure or a decent prediction to go off of. Um, and so those lead to great knowledge, uh, moving forward.

But a lot of the things, when we're talking about trying to synthesize new biochemicals and, uh, create bioenergy using strange microbes, um, we're still, like I said, we're still gonna have to create mutations on the genome and we're gonna look at the function. And so those types of experiments are independent of a protein structure function.

Uh, but it [00:41:00] does give us a lot more confidence, um, if, if a good approximation is available, gives us more confidence that I can convince a graduate student that it's a good project. 

[00:41:10] Gus Herwitz: So we're, we're, uh, starting to run up on our time, but, uh, there are two questions that, uh, we need to touch on. The first one, uh, and I'll, uh, just ask it, uh, now, is what are the limitations of this solution to the protein folding problem and this technology generally?

[00:41:28] Nicole Buan: So AlphaFold is fantastic. It's a huge, uh, leap forward in this field and will help a whole lot of people in their research, uh, progress. But there are some key things that it doesn't quite solve very well. Um, one of these things is, uh, post-translational modifications. So cells also regulate protein function by modifying them with things like a methyl group or an acetate molecule or a phosphate group.

Uh, and adding [00:42:00] those modifications can turn a, an enzyme on or off, or will make it, um, possible for a protein to interact with another protein. And those are really key interactions in the cell that affect behavior of the cell. That's what's involved in stress signaling, uh, chemotaxis, or movement of a cell, swimming types of behaviors, things like that.

So AlphaFold does not predict those types of modifications: what might be modified, what wouldn't be modified, how the structure would change when it was modified. A lot of proteins that we're interested in, in medicine, uh, and other areas of biology, can be modified with sugar groups, uh, and things like that, like cell surface proteins, cell surface receptors, viral entry proteins, things like that.

Right? So when we don't have an idea of, man, what is that carbohydrate structure doing on the [00:43:00] surface of that protein, and what is the structure of it? Um, those are things that are really important to understand in medicine, uh, and AlphaFold does not predict those.

AlphaFold also doesn't predict, uh, whether or not a cofactor or coenzyme or metal, so an accessory group, so something that's not part of the polypeptide chain but is essential for a protein or enzyme function, uh, it won't predict it. It might kind of get the general fold, but it won't tell you that there needs to be a zinc there. Uh, so if you were trying to do biochemistry, or thinking again about how this enzyme affects cell growth and physiology, AlphaFold won't be able to tell you, uh, but you might be able to get some of those clues from the related structures that fold.

Um, so there's really quite a few things, um, related to that. It also doesn't talk about huge, uh, multi-enzyme complexes. So things [00:44:00] like the photosynthetic apparatus, which is many, many polypeptides organized together in a very, you know, structured way that allows us to get energy from sunlight, or allows plants to do that efficiently.

Uh, we wanna understand that to create efficient biofuels in lots of other different organisms, for instance. So AlphaFold isn't gonna help with that, because those are still too big and involve too many components, what are essentially like molecular protein machines that cells have.

[00:44:30] Gus Herwitz: So we talk, uh, a lot about technology, putting folks out of jobs.

It sounds like this isn't an area where technology will be putting folks, uh, out of jobs, uh, anytime soon. Uh, another thing that we talk about with technology, which is the last question that we have to touch on, uh, is, are there any ethical concerns, uh, raised by this technology that we should be thinking about as we continue to go down this path?

[00:44:56] Juan Cui: I think one thing I can think of is that, um, [00:45:00] once the structure prediction becomes so effortless, so we can easily test, um, all kinds of proteins at any moment, then, like how human genetic variability is used in personalized medicine, right, structure variability in proteins will be able to help, kinda, uh, create a decision towards treatment and the invention, sorry, intervention, which may cause ethical concerns: safety, privacy, fairness.

Um, so in the extreme case, you can imagine a person may not, uh, get a certain treatment because the predicted structure doesn't seem to be targetable by the drug, right? So those are the situations, uh, we need to be aware of. And, and then the other EEO is that, uh, it just sounds very disruptive to the field, but, uh, it'll not change dramatically.

[00:45:51] Nicole Buan: Well, one, one concern, it's always the case, like with any knowledge comes responsibility. And if you have people who wanna use [00:46:00] that knowledge for not good purposes, we would have to worry about that. So that's always gonna be the case. Um, at the same time, though, it's a great leap forward that will help, uh, people in drug design, you know, treating illnesses, uh, designing new biofuels that will, you know, improve our global climate so that, you know, we're not spewing so much CO2 and things like that.

Um, so you take the good with the bad, and we do have a lot of, um, federal regulations that guide the kind of research that we do. So I don't think, I'm not an expert in this per se, but just because we have a protein structure prediction, or improved protein structure fitting, improved protein structure prediction, I don't think that the current federal guidelines would have to be revised.

Um, so that, that's a good sign. But it's definitely, as Juan said, you know, we have to think about what information went into the database, [00:47:00] um, and how we would use that, especially when we're talking about human health concerns.

[00:47:04] Gus Herwitz: Thank you, uh, Juan and Nicole, very much. This has been a doozy of a conversation. Thank you, listeners, for listening. I hope that you learned as much as I did. I've really just loved this conversation and learned so much. Uh, we've been speaking with Doctors, uh, Juan Cui and Nicole Buan, both here at the University of Nebraska, talking about the protein folding problem and, uh, recent computational solutions to it.

And it's been, uh, a great conversation. And I have been your host. Gus Hurwitz. Thank you listeners for joining us on this episode of Tech Refactored. If you want to learn more about what we're doing here at the Nebraska Governance and Technology Center, or submit an idea for a future episode, you can go to our website at ngtc.unl.edu, or you can follow us on Twitter at UNL_ngtc.

If you enjoy the show, don't forget to leave us a rating and review wherever you listen to your podcasts. Our show is produced by Elsbeth Magilton and [00:48:00] by Lysandra Marquez. And Colin McCarthy created and recorded our theme music. This podcast is part of the Menard Governance and Technology Programming Series.

Until next time, keep folding those proteins.