Saturday, February 11, 2017

What did ENCODE researchers say on Reddit?

ENCODE researchers answered a bunch of questions on Reddit a few days ago. I asked them to give their opinion on how much junk DNA is in our genome but they declined to answer that question. However, I think we can get some idea about the current thinking in the leading labs by looking at the questions they did choose to answer. I don't think the picture is very encouraging. It's been almost five years since the ENCODE publicity disaster of September 2012. You'd think the researchers might have learned a thing or two about junk DNA since that fiasco.

The question-and-answer session on Reddit was prompted by the award of a new grant to ENCODE. They just received 31.5 million dollars to continue their search for functional regions in the human genome. You might have guessed that Dan Graur would have a few words to say about giving ENCODE even more money [Proof that 100% of the Human Genome is Functional & that It Was Created by a Very Intelligent Designer @ENCODE_NIH].

Here's the list of researchers who answered questions on Reddit (Feb. 9, 2017).
  • Nadav Ahituv, UCSF professor in the department of bioengineering and therapeutic sciences. Interested in gene regulation and how its alteration leads to morphological differences between organisms and human disease. Loves science and juggling.
  • Elise Feingold: Lead Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since its start in 2003. I came up with the project’s name, ENCODE!
  • Dan Gilchrist, Program Director, Computational Genomics and Data Science, NHGRI. I joined the ENCODE Project Management team in 2014. Interests include mechanisms of gene regulation, using informatics to address biological questions, surf fishing.
  • Mike Pazin, Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since 2011. My background is in chromatin structure and gene regulation. I love science, learning about how things work, and playing music.
  • Yin Shen: Assistant Professor in Neurology and Institute for Human Genetics, UCSF. I am interested in how genetics and epigenetics contribute to human health and diseases, especial for the human brain and complex neurological diseases. If I am not doing science, I like experimenting in the kitchen.
I'll give the ENCODE response followed by my own response or the response on Reddit. I'd appreciate any feedback on which response you think is more accurate and informative.

When asked about repeat sequences (e.g. LTRs, LINEs, SINEs, etc.) in the genome, here's how Nadav Ahituv responded.
Great question and one that my lab is actually very interested in and has active research on! With time and a lot of cool research, repeats are being found to have important functions in our genome. Many of them have been what's called "exapted." This is a term used in evolutionary biology to describe a trait that has been co-opted for a use other than the one for which natural selection originally built it. There are several cases where repeats have been found to turn into additional exons of existing genes, or gene regulatory elements that regulate other genes and change genome structure. Of note also, in the new phase of ENCODE, what we call affectionately call ENCODE phase 4, there is actually a computational group, led by Ting Wang from Washington University in St. Louis, who will specifically study the role of repeats in gene regulation. - Nadav
You can see that several readers on Reddit tried to set the record straight. This happened several times during the session and every time ENCODE researchers were challenged they ignored the challenge. Here's how I would have responded.
Great question! The vast majority of repeat sequences appear to be junk DNA by any reasonable definition. They are mostly broken transposons and fragments of transposons. A tiny percentage have become secondarily functional by adopting new roles in gene expression but the overall picture indicates that most of these repeats have no biological function. This is the largest category of junk DNA.
Someone asked about noncoding RNAs, specifically whether variation in human populations could be used to confirm they were functional. Here's how Nadav Ahituv responded.
Yes! Great question. There is beautiful work from Katie Pollard, David Hauslerr, Shyam Prabhakar, Jim Noonan and many others that used variation to find human accelerated sequences. These are sequences that are conserved in all mammals but changed significantly in humans, much more than expected by chance/neutral evolution. Many of them have been found to be functional enhancers and several have also been associated with human-specific diseases. -Nadav
Someone named "zmil" gave a much better response. Here's my answer.
Most noncoding RNAs aren't conserved, even in our closest relative. This is consistent with the idea that they are mostly junk RNA due to spurious transcription. That's the best explanation for the vast majority of these transcripts. However, there's always the remote possibility that a new functional gene could have arisen in the human lineage. When that happens, you can possibly detect it by looking for variation within the human population. A stretch of DNA that has fewer than average mutations may indicate that it's under negative selection and therefore functional. The experiments are difficult because these putative genes are quite small and the functional target size within the gene may be even smaller. Very few clear examples have been found. The evidence (lack of sequence conservation) still favors the conclusion that most noncoding RNA are nonfunctional.
The initial questioner also asked, "What is the evolutionary advance to keep "neutral" sequences?" ENCODE declined to answer that question but "PsiWavefunction" gave good answers.

A high school teacher asked a question about what's in our genome and here's how ENCODE responded.
As a high school biology teacher, what I've been telling my students for several years is that only about 1.5% of the human genome encodes proteins, and the rest is:
  • regulatory elements
  • genes for structural and regulatory RNAs
  • junk like pseudogenes and endogenous retroviruses
  • duplications of various kinds
  • stuff that may have a function but we have no idea what it is
As a high school level summary, was this a reasonably accurate picture of our knowledge of the genome ~10 years ago when I started teaching? What do you think the biggest revisions have been?

Thanks!

-------------
Yin Shen replies,
This is a pretty good summary. The lessons we learned in the past ten years include: 1. There are millions of non-coding regulatory elements, a much bigger number than the protein coding sequences. 2. The regulatory elements are cell type specific and they are the major driving force for cellular identity. 3. A majority of the genetic variations associated with complex diseases are located in these regulatory elements, therefore mutations in these regions can play important roles in individual's susceptibility to diseases.
I replied by posting a link to: What's in Your Genome?. There was no further response from ENCODE.

All three of the lessons that Yin Shen supposedly learned in the past decade are seriously flawed. If Shen really thinks there are millions of functional regulatory elements in the human genome, then the grant would not have been funded if I had reviewed it.

ENCODE was asked, "Based on what you are doing how much of our DNA would you reckon is actually junk and how much of our DNA actually has a function?" Here's how Nadav Ahituv responded. Keep in mind that he is a professor at the University of California at San Francisco. This is (was?) a very prestigious university.
Great question! Only 2% of our genome are genes that code for protein. Around 45% of our genome is actually made of what's called repeats, many of them viruses that were inserted into our genome. Various cool studies show that several of them have adapted new functions that made them 'stay' in our genome — like becoming parts of other genes or adopting a gene regulatory function (instructing genes when, where and at what levels to turn on). As for the remaining 53%, we see that a lot of it has regulatory function and other functions which we still don't know and which are fascinating in my mind to uncover.

The history of this field is also really fascinating – I recommend this article that does a great job describing when researchers first recognized the role of non-coding regulatory regions in the DNA (earlier than you might think!) Is Most of Our DNA Garbage? -Nadav
I responded on Reddit .... (Image on the right is from Dan Graur.)
If we define a gene as a DNA sequence that's transcribed then protein-coding genes occupy about 25% of our genome. That's because they are mostly introns. Most intron sequences are junk.

Transposon- and virus-related sequences make up a substantial percentage of our genome (probably >50%). Most of it is bits and pieces of defective transposons that look very much like junk. Some tiny percentage of these sequences have secondarily acquired a new function but the vast majority still has all the characteristics of junk DNA.

Proven regulatory sequences make up a very tiny percentage of the genome (less than 1%). Many researchers speculate that regulatory sequences cover a significant fraction of the genome but there's no solid evidence that this is true. If it were true, those thousands of sequences have to be in the few percent of unknown conserved sequences otherwise you have to postulate that they all evolved (and became fixed) in the human population within the last few million years. That's not very likely.

I'd like to ask each of the ENCODE researchers to give us their informed opinion (best guess) on the amount of junk DNA in our genome.

I think it's 90%. If this is true then what is "dark matter"?

Do the ENCODE researchers agree that the null hypothesis is "no function" and function has to be proven in the face of abundant evidence that most of our genome is junk?

For those of you who don't want to slog through Carl Zimmer's article, non-coding regulatory sequences have been in the textbooks since the mid-1960s (more than half-a-century!).
There was no reply from ENCODE.

The researchers were asked, "What would be a good book to understand more about our genome? I have some intro biolology and genetics books but they seem kind of outdated." They replied,
Mike Pazin said: The Deeper Genome, John Parrington; Homology, Genes, and Evolutionary Innovation, Gunter P. Wagner.

Elise Feingold said: For a more lay-oriented audience, I would recommend "The Gene: An Intimate History" by Siddhartha Mukherjee
Really!!! I'm not making this up. Those are actually the books they recommend to others.

I responded with links to my criticism of John Parrington's book. The fact that Mike Pazin would recommend this book indicates that he probably agrees with the author. Parrington is a great defender of ENCODE and the idea that our genome is chock full of sophisticated and mysterious regulatory sequences; hence, the "deeper" genome. If that's what Mike Pazin believes then I would have rejected his grant application.

Elise Feingold has been with ENCODE since the beginning. She recommends Mukherjee's book as a good source for information on the human genome! That's ridiculous. The average person would be hard-pressed to find any useful information about genomes in that book.

Here's another exchange that's very interesting. Do you think ENCODE answers the question?
Djebel1 asks,
  • So, do you now agree that a confirmed chemical activity at a site is not equivalent to the site being functional? And that, no, not 80% of our DNA is functional?
  • How did your point of view evolve on that matter since the 2012 controversy? How dissimilar were your own points of view as compared to the official press releases, saying that everything is functional?

Mike Pazin replies for ENCODE,
ENCODE 2 found biochemical signatures at 80% of the genome, adding up all signatures for all cell types. This was an important first pass. However, if one looks at particular biochemical marks (such as DNase) that are markers for particular candidate functions (regulatory DNA), the numbers are quite different (in this case about 10%). An important part of ENCODE 4 will be its specific focus on examining candidate elements to determine whether, when, and where they function in important human cell types. This will be the task of the new ENCODE characterization centers, two of which Yin and Nadav will be directing at UCSF.
It looks to me like they want to milk this controversy for as much money as possible. If that means giving up scientific integrity, then that's a small sacrifice.

Here's another exchange with a grad student who asked a pointed question.
zackroot asks,
Evo-Devo grad student here, it's great to see such an awesome genomics group for AMA!

Nowadays, genomic "dark matter" seems to be a heavy word implying a whole bunch of different things. Does your analysis include anything regarding transcriptomics or are you purely looking at "junk DNA"?

------------------
Nadav Ahituv responds for ENCODE,
Our group is mainly looking at gene regulatory elements such as promoters and enhancers that regulate transcription. Several of them are actually transcribed and are being referred to as enhancer RNA (eRNAs)

------------------

GoSox2525 says,
I don't really think we should be throwing the term "dark matter" around in whatever context we like. Stuff like that is what strains the trust between scientist and layman

------------------

mylittlesyn says,
Personally I think it's mostly looking at "junk DNA" but also how "junk DNA" might interact or effect other -omics. It'd be interesting to see their answer.

------------------

No further comment from ENCODE.
My response,
ENCODE is mostly looking at junk DNA and spurious transcription but they refuse to consider this possibility. They use the term "dark matter" to imply there's some mysterious function in all that excess DNA. However, after 12 years and several hundred million dollars they still haven't found it.
Here's one response from ENCODE that's almost correct!
Navaltactics asks,

My question is what classifies a sequence as "biologically relevant", and is a relevant sequence always relevant?

-------------------

Elise Feingold replies,
Non-coding regulatory regions are often functional only in specific biological contexts, e.g., in specific cell types, during certain times in development or after particular environmental exposures. So a big challenge is assaying for function in the appropriate biological setting. If you don't find something has functional activity, it could be that you aren't looking for it in the right biological context or it's possible that those sequences have one function under one set of conditions and another function under a different set. It's also possible that we don't have the right set of tools to probe for the particular function. Or perhaps, it just isn't functional?
My response,
After half-a-century of studying genomes we have a pretty good idea that 90% of our genome is junk. The evidence is very solid and comes from several different sources. Thus, we can confidently conclude that much of what ENCODE identifies as "biochemically functional" is NOT biologically relevant. The only remaining step for ENCODE is to admit this and then tell us what small fraction of the genome is actually biologically functional. The best criterion for identifying true function is sequence conservation. In the absence of sequence conservation the burden of proof is on ENCODE researchers to identify specific nonconserved sequences that have a proven function. That can only be done by examining specific targets. So far, they have failed to identify more than a handful of nonconserved sequences that actually have a proven function.

At some point, the continuing attempt to find function where none exists has to stop being funded.
There were tons of questions that weren't answered. I'm guessing the researchers were too busy to address all of the questions. That probably explains why they didn't respond to some of the more pointed questions and followups. I'm sure they would have liked to answer the questions about how much junk DNA there is in our genome but the day went by far too quickly.


18 comments :

  1. "They just received 31.5 million dollars to continue their search for functional regions in the human genome."

    That's an annual grant as far as I can tell, even more than the 20 million I usually see per year. In 10 years that could be another 300 million. Whoa!

    I think they don't really care how much of the genome is functional. That's not the mission of ENCODE.

    But that said, since 90% of disease implicated SNPs are non-coding regions, it would seem a reasonable extrapolation to adopt 100% functionality as a working hypothesis till proven otherwise.

    If only 15% of the genome is functional and 85% is junk, unless we know in advance exactly which 15% to focus on and 85% to ignore, then it is pointless to defund the work of ENCODE, and the 100% functionality hypothesis is good enough to do their job. There would be an uproar in the medical research community to do otherwise.

    What set of experiments for ENCODE 4 do you want to defund? Hi-C, Chip-Seq, RNA-seq, WGBS, and about 40 or so other experiments?

    The 100% functionality figure is probably the intuitive sentiment of most medical researchers. The reason for this? When I was at ENCODE 2015, medical researcher after researcher said to the effect, "GWAS studies has shown the disease I'm studying is strongly associated with non-coding regions". That sort of data, as far as medical researchers are concerned, takes precedence over whatever evolutionary biologists have to say.

    1. Sal Cordova says,

      I think they don't really care how much of the genome is functional. That's not the mission of ENCODE.

      From The ENCODE Project: ENCyclopedia Of DNA Elements

      The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence.

    2. "The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence"

      That statements says nothing of what percentage of the genome ENCODE is supposed to find as functional. They just have to identify all the functional elements, whatever percentage of the genome they are. Furthermore, if "functional" means biochemical or physical activity (aka transcription, serving as spacers, scaffolds, etc.) then ENCODE is doing their job as far I can tell.

    3. I wonder if they would have received the $31 million grant if they had said "All available evidence strongly suggests that the vast majority of non-coding DNA is junk, and we believe that is true. But we want to check, just to make sure.

    4. liarsfordarwin said,

      I think they don't really care how much of the genome is functional. That's not the mission of ENCODE.

      Three hours later he said,

      That statements says nothing of what percentage of the genome ENCODE is supposed to find as functional. They just have to identify all the functional elements, whatever percentage of the genome they are.


      I'm beginning to see why Sal Cordova prefers to be called liarsfordarwin.

    5. "But that said, since 90% of disease implicated SNPs are non-coding regions, it would seem a reasonable extrapolation to adopt 100% functionality as a working hypothesis till proven otherwise."

      No it wouldn't, you couldn't falsify that hypothesis. You could always just reason ad-hoc that there is some hitherto undiscovered circumstance where your piece of DNA has some obscure functional role you just haven't gotten around to test or discover yet.

      Whereas with a junk null, in principle a single observation would falsify the junk-null. For example, a knockout experiment.

      Besides, mutations in junk can easily cause otherwise nonfunctional regions to "look like" a functional region and produce interfering gene products.

      As usual you've got all this shit backwards. Which is, I guess, to be expected from a person working backwards from his emotionally desired conclusion that life was instantaneously wished into existence through a magic spell, in perfect form, about 6000 years ago.

    6. "No it wouldn't, you couldn't falsify that hypothesis."

      True, but it would motivate attempts at falsifying the junkDNA hypothesis, and until we cure every genomically associated disease on the planet (like cancer), we have plenty of reason to keep looking. Besides, that's the sentiment in the medical community that's well-entrenched long before ENCODE made their infamous 80-100% functionality statement. Goes to show they don't care that much about the C-value paradox, and won't let evolutionary biologists tell them what they can and can't discover about the genome.

      "in principle a single observation would falsify the junk-null. For example, a knockout experiment."

      But you can't do this with simple one-gene-at-a-time knockout experiments; you need multi-knockout experiments, since eukaryotic genomes are "highly buffered" (to use the phrase of one of Larry's colleagues at the University of Toronto, Brenda Andrews, who was part of the ENCODE planning session in 2015).

      She alludes to why the ENCODE approach and data aggregation is probably going to be the required approach to understanding genome function.

      https://www.youtube.com/watch?v=aeDHuY5lUek

      https://www.genome.gov/Multimedia/Slides/ENCODE015/25_Andrews.pdf

    7. We are mostly interested in parts of the genome that are transcribed at some low level (less than one RNA per cell), and in sites that bind transcription factors.

      We can focus on those sites that are not conserved—this is probably 99% of the sites.

      The question before us is, do these sites have a biological function? One way to answer the question is to knock out (delete) the site and ask if there's any observable effect. Brenda Andrews's work with yeast cells tells us that such deletions often have no detectable effect even when known genes are removed.

      This result is fantastic from the perspective of ENCODE researchers. It means that the hypothesis of function can't be easily refuted by a simple knockout experiment.

      It should keep the money flowing for decades.

      (They've already convinced themselves that lack of sequence conservation isn't going to count as falsification.)

      The real problem here is the assumption of function that seems to be the unanimous assumption of all ENCODE labs. It places the onus on opponents to prove that the site in question has no function. But, as we all know, proving the negative is an almost impossible goal.

      That's the wrong way to do science. The null hypothesis should be "no function." The onus should be on those who propose a largely functional genome to prove their case by finding function. The default explanation is that the genome is mostly junk. (There is plenty of evidence to support this claim.) So far ENCODE has failed to show that most of the genome has a biological function so the default explanation has not been falsified.

    8. Well, thanks for looking at Brenda Andrews work at your University.

      There may be little or no detectable effect with a single-gene knockout, but there is often a detectable effect with a double or triple knockout. What this meant was that when Dr. Andrews started from a gene dictionary of 6000 yeast genes, the space of double knockouts was 6000 squared, which is a whopping 36,000,000 combinations that could only be surveyed with high-throughput assays. Triple knockouts would be on the order of 6000 cubed, or about 2.2 x 10^11 combinations. But what can't be denied is that she was able to elucidate cooperative function of genes that couldn't be deduced via single knockout. As she said, eukaryotic genomes are heavily buffered.
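      As a rough check on the arithmetic in this comment, here is a minimal Python sketch. The function names and the round figure of 6000 genes are illustrative assumptions taken from the comment, not from the yeast studies themselves. It also shows that if the order of the knockouts doesn't matter and a gene can't be paired with itself, the number of distinct combinations is smaller than the simple powers of 6000.

```python
from math import comb

# Approximate number of yeast genes quoted in the comment (an assumption,
# not a figure taken from the original studies).
N_GENES = 6000


def ordered_tuples(n_genes: int, k: int) -> int:
    """Count k-tuples of genes when order matters and repeats are allowed (n^k)."""
    return n_genes ** k


def unordered_knockouts(n_genes: int, k: int) -> int:
    """Count distinct sets of k different genes knocked out together (n choose k)."""
    return comb(n_genes, k)


if __name__ == "__main__":
    for k in (2, 3):
        print(
            f"k={k}: n^k = {ordered_tuples(N_GENES, k):,}; "
            f"n choose k = {unordered_knockouts(N_GENES, k):,}"
        )
```

      Either way of counting gives tens of millions of double knockouts and tens of billions (or more) of triple knockouts, so the point about needing high-throughput assays stands.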

      Now, one can only imagine what would be needed to do something similar with the coding and non-coding elements. Which would support your claim:

      "It should keep the money flowing for decades."

      How about centuries? :-)

    9. Sal Cordova says,

      Well, thanks for looking at Brenda Andrews work at your University.

      You're welcome.

      Believe it or not, I actually knew about Brenda's work before I watched the video. :-)

      I remember when she was a graduate student.

  2. I would expect that we would have some ENCODE success stories by now. Stories that go: (1) No one suspected this genome segment had a function, (2) but then ENCODE suggested that it does, (3) so someone checked it out and, (4) OMG, there is a clear-cut and essential function there, the mechanism of which will be described in all of the next generation of textbooks.

    Are there any ENCODE success stories? I don't recall reading of any.

    1. "Stories that go: (1) No one suspected this genome segment had a function, (2) but then ENCODE suggested that it does, (3) so someone checked it out and, (4) OMG, there is a clear-cut and essential function there, the mechanism of which will be described in all of the next generation of textbooks."


      One of the occasional ENCODE experimenters, John Rinn at the Broad Institute Harvard, in 2007 studied the differential RNA expression patterns that Dan Graur is so keen to dismiss as noise. One RNA in particular was expressed in skin cells below the human waistline but not above it. Rinn discovered the first trans-acting (inter-chromosome) RNA that regulated DNAs on other chromosomes through the polycomb repression complex. Rinn's discovery was one of the most spectacular ever published in the journal Cell. That seems to fit the bill.


      "Are there any ENCODE success stories?"

      593 research projects and counting that used ENCODE data in the understanding and treatment of disease, which is more than I can say for what the 170 million dollars wasted on the untestable phylogenies of the tree-of-life project did for the advancement of medical science. See here:

      https://www.encodeproject.org/search/?type=Publication&published_by=community&categories=human+disease

      That's why there will now be the PsychENCODE project. :-)

      http://www.nature.com/neuro/journal/v18/n12/full/nn.4156.html

      But if you want to see why there will be so much demand for ENCODE data and functional genomics, particularly ENCODE 4, take a look at this from the ENCODE planning session in 2015. This is a presentation by Brenda Andrews, University of Toronto:

      https://www.youtube.com/watch?v=aeDHuY5lUek

      https://www.genome.gov/Multimedia/Slides/ENCODE015/25_Andrews.pdf


    2. 593 research projects and counting that used ENCODE data in the understanding and treatment of disease, which is more than I can say for what the 170 million dollars wasted on the untestable phylogenies of the tree-of-life project did for the advancement of medical science.

      So could you kindly remind me of the major contributions to medical science made by Intelligent Design Creationist research? Because for some reason it slipped my mind and I can't recall a single one.

    3. liarsfordarwin wrote: "...John Rinn ... polycomb repression ... That seems to fit the bill"

      Well, thanks for the pointer leading me off to some interesting googling and pdf-ing. I'm far from an expert on this stuff. But no, I don't think that does 'fit the bill'. Most of what I read about were descriptions of effects rather than of functions. And the most clearly functional pieces of the story seemed to be the fruits of research that started well before ENCODE and that were in no way inspired by ENCODE.

      Still, my reading led me to this article:

      https://dspace.mit.edu/openaccess-disseminate/1721.1/58204

      ... which is interesting for several reasons.
      (1) Rinn is one of the authors.
      (2) It seems to endorse Larry's argument that 95% of the transcripts that ENCODE called functional really ARE junk.
      (3) Within the remaining 5% they found about a thousand examples of transcribed non-coding RNAs which probably do have a function, since they are conserved in evolution.

      The project and methods in that paper strike me as a much more fruitful approach to research than ENCODE.

  3. The spin doctors, hard at work! Like this above response which I have edited to point out the spin.
    "Great question! Only 2% of our genome are genes that code for protein. Around 45% of our genome is actually made of what's called repeats, many of them viruses that were inserted into our genome. Various cool studies show that BEGIN SPIN several of them END SPIN have adapted new functions that made them 'stay' in our genome — like becoming parts of other genes or adopting a gene regulatory function (instructing genes when, where and at what levels to turn on). As for the remaining 53%, we see that BEGIN SPIN a lot of it END SPIN has regulatory function and other functions which we still don't know and which are fascinating in my mind to uncover."

  4. It seems from this summary that Encode researchers have not learnt from their previous claim that most of the genome was functional. Of course there are transposable elements that have been co-opted; of course not everything in our genomes are protein-coding genes; of course there is functionally relevant alternative splicing. However, there is no evidence to infer that much of our genome is "functional". On the contrary, the evolutionary patterns we see suggest it is not.

    Very frequently tissue-specificity is used to demonstrate functionality, as for example in the context of alternative splicing. This makes no sense. We know our genome is read differently in different cells, thanks to epigenetics, splicing factors, transcription factors... It is clear that all this "real" tissue specificities can result in tissue-specific noise. Hence, although tissue-specificity can be a starting point, it shouldn't be taken as confirmation of functional relevance.

    1. That's my impression too. It looks like they haven't been paying attention to any of the criticisms.

  5. THE FOREIGN DNA AMOUNT ARGUMENT

    Strange celebrities turned politicians,
    May sadly lament 'so-called' judges,
    But scientists with truthful missions
    Should not engage in sim'lar fudges.

    The so-called junk DNA that some count,
    As actor main in Haldane's rule,
    A "question of DNA amount."
    Say Naviera-Masides' fine school.

    All that so-called junk,
    Acts with due celerity,
    How can it be just bunk,
    When endows hybrid sterility?

    Set Darwin's day on stage
    To Haldane's rule add further page!
