The vast machineries of gene regulation

The life of a cell is a dynamic one, requiring split-second decisions driven by interactions with multiple small molecules in the environment. From sugars to toxins to the molecular products of sun exposure, small molecules assail cells in what are known as signaling events. “There are thousands, if not millions, of different kinds of small molecules,” says B. Franklin Pugh ’83, Molecular Biology and Genetics/Physiology and Biophysics. “Any of them could come into a cell. When they do, the cell responds by putting out products that allow it to metabolize sugars, or protect itself against insults or toxins. But to do that, it needs to reprogram its genome.”

Receptor proteins inside the cell sense the invading small molecules. The proteins then bind to specific points along the cell’s DNA, turning on certain genes that generate the products needed to protect the cell against the molecules. Together, these small molecules and proteins interact to form a vast machinery that regulates gene expression. “That’s universal for all life forms,” Pugh says. “Imagine all those proteins and small molecules coming together in various combinations. How do these events coalesce at specific genes to turn them on?”

That question has been at the heart of Pugh’s research for decades. Before coming to Cornell in 2020, he carried out a groundbreaking, eight-year project at Pennsylvania State University to map the precise binding locations of more than 400 proteins on the genome of budding yeast, Saccharomyces cerevisiae. In early 2021, he and his collaborators published a major paper in the journal Nature about their findings. “We identified the organization of each of those proteins, how they position themselves on the genome with respect to every other protein,” he says. “And in that compilation of every individual protein, we get a sense of the machineries that control gene regulation and genome function in general.”

ChIP-exo: Defining the Coordinates of Protein Binding

A key aspect of the project required the researchers to define the location on the genome where each individual protein binds. “It’s not just one place,” Pugh says. “It could be as few as 10 or 15 genes a particular protein binds to, or it could be as many as a few thousand genes.”

Pugh and his colleagues developed a new technique, called ChIP-exo, for defining the coordinates of protein binding. ChIP-exo builds on ChIP-seq, an earlier technique in which the proteins interacting with a cell's DNA are first chemically fixed in place. The researchers then break the DNA into fragments using high-frequency ultrasonic waves and use an antibody that specifically recognizes the target protein to pull that protein out, bringing along the stretch of DNA it is attached to.

“The problem is that with ChIP-seq you end up with a broad distribution of DNA fragment sizes,” Pugh explains. “They can range from a hundred to a thousand base pairs. Because of that, it’s like taking a picture of something with a moving camera; it’s blurred, very low resolution.”

With ChIP-exo, the researchers add an enzyme, an exonuclease, that trims away each DNA fragment up to the precise point where the target protein is bound. That yields ultra-high, single-base-pair resolution. "So now we can look at proteins bound at that site, plus an assemblage of neighboring proteins all around it, and see how they all interact," Pugh says.
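To make the resolution argument concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not the lab's analysis pipeline: the 20-base-pair protein footprint, its coordinates, and an idealized enzyme that stops exactly at the bound protein are all hypothetical. Sonicated fragments that carry the same bound protein have midpoints scattered over a couple hundred base pairs, while trimming each strand back to the protein collapses every fragment onto the same two borders.

```python
import random
import statistics

# Hypothetical protein footprint on a toy chromosome, as a half-open interval [start, end).
SITE_START, SITE_END = 1_000, 1_020


def sonicate(n, rng):
    """Return n (left, right) fragments, 100-1,000 bp long, that each span the footprint."""
    fragments = []
    for _ in range(n):
        length = rng.randint(100, 1_000)  # broad size range, as with ChIP-seq
        # The fragment must cover the whole footprint, so its left edge
        # can fall anywhere from SITE_END - length up to SITE_START.
        left = rng.randint(SITE_END - length, SITE_START)
        fragments.append((left, left + length))
    return fragments


def chip_seq_positions(fragments):
    """ChIP-seq-style estimate: fragment midpoints, blurred by fragment length."""
    return [(left + right) / 2 for left, right in fragments]


def exo_trim(fragment):
    """Idealized exonuclease (an assumption): digest each strand until the bound protein blocks it."""
    left, right = fragment
    return max(left, SITE_START), min(right, SITE_END)


rng = random.Random(42)
fragments = sonicate(10_000, rng)
seq_spread = statistics.pstdev(chip_seq_positions(fragments))
exo_borders = {exo_trim(f) for f in fragments}
print(f"ChIP-seq midpoint spread: {seq_spread:.0f} bp")
print(f"Distinct ChIP-exo borders: {exo_borders}")
```

Real exonuclease digestion is not perfectly uniform, so actual ChIP-exo data show a tight pile-up near each border rather than a single exact coordinate; the idealization simply isolates why trimming removes the blur that variable fragment lengths introduce.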

PEGR: A New System for Managing and Distilling Data

Every target protein generates a data set of millions and millions of data points. “And because we’re looking at thousands of proteins, we’re now up to billions of data points,” Pugh says. “So we have to manage the data, and we had to develop a computational infrastructure to do that.”

The researchers developed a software system, Platform for Epigenetic and Genomic Regulation (PEGR), that manages and distills the overwhelming mass of data points into something humans can understand. “PEGR is also part of the visualization and dissemination process,” Pugh explains. “When you discover something, you need to get the word out. Part of the way we communicate is to make not only the data available but a means to analyze that data, so that anyone can look at billions of data points and ask a question and hopefully get an answer.”


Pugh's ultimate goal is to uncover the details of human gene regulation and genome function, but as an interim step he and his fellow researchers chose to work with single-celled yeast. While humans have approximately 20,000 genes, yeast has only about 6,000, and it has been well studied for more than a hundred years. "The yeast system is much simpler and much is already known about it," Pugh says. "So we used it to figure out the ground rules, unencumbered by the enormous complexities of multicellular human systems."

Applying Methodologies to Human Tumors

Now, Pugh and his team are collaborating with other researchers at Cornell in Ithaca and at Weill Cornell Medicine to apply their methodologies to human tissues—both normal and diseased. “We’re looking at tumors,” he says. “The configuration of proteins on a genome in a tumor is going to be different than for normal tissue. If we can identify those tumor protein configurations and connect them with the prognosis of the patient, we may be able to predict the prognosis going forward for other patients.”

Pugh’s research could add a new dimension to what researchers already understand about the genetic factors underlying diseases like cancer. “It’s along the lines of molecular diagnostics,” he says. “Many people are doing a lot of sequencing of people’s DNA and of the DNA of tumors, which might help predict outcomes. But that’s only a small part of the picture. We’re saying there’s a lot more out there to be seen in terms of molecular pictures. There’s the epigenetic component, which is partly what we’re looking at.”

Childhood Chemistry Set Gateway to Career

Pugh is no newcomer to Cornell. As an undergraduate, he earned a BS in biology from the university. Even then he knew he wanted to work in the biological sciences, although he didn't yet know in which field. That certainty stemmed from his childhood, when he found a chemistry set in the basement of his family home that one of his older siblings had cast aside.

“I started throwing things together, creating all kinds of messes,” he says. “I thought, ‘This is cool.’ Everything seemed cool. I had no idea what I was doing, but I was just creating sludges of all kinds. When you get that excited about it, I guess you’re just meant to be a scientist.”
