Conditional Epigenetics Literature (Roadmap Epigenomics Project, Part 1)
- The ENCODE project
- The Roadmap Epigenomics Project
- The Roadmap Epigenomics Project Publication
- Epigenome Road Map in Nature
- Nature Epigenome Issue
TODO Reading List
- Leung, D., et al. (2015). “Integrative analysis of haplotype-resolved epigenomes across human tissues.” Nature 518(7539): 350-354.
- Ziller, M. J., et al. (2015). “Dissecting neural differentiation regulatory networks through epigenetic footprinting.” Nature 518(7539): 355-359.
- Polak, P., et al. (2015). “Cell-of-origin chromatin organization shapes the mutational landscape of cancer.” Nature 518(7539): 360-364.
- Gjoneska, E., et al. (2015). “Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease.” Nature 518(7539): 365-369.
- Ernst, J. and M. Kellis (2015). “Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.” Nat Biotechnol.
- Whitaker, J. W., et al. (2014). “Predicting the human epigenome from DNA motifs.” Nat Methods.
- Zhou, X., et al. (2015). “Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser.” Nat Biotechnol.
- Seumois, G., et al. (2014). “Epigenomic analysis of primary human T cells reveals enhancers associated with TH2 memory cell differentiation and asthma susceptibility.” Nat Immunol 15(8): 777-788.
- De Jager, P. L., et al. (2014). “Alzheimer’s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci.” Nat Neurosci 17(9): 1156-1163.
- Lunnon, K., et al. (2014). “Methylomic profiling implicates cortical deregulation of ANK1 in Alzheimer’s disease.” Nat Neurosci 17(9): 1164-1170.
- Yao, L., et al. (2014). “Functional annotation of colon cancer risk SNPs.” Nat Commun 5: 5114.
- Delahaye, F., et al. (2014). “Sexual dimorphism in epigenomic responses of stem cells to extreme fetal growth.” Nat Commun 5: 5187.
- Wijetunga, N. A., et al. (2014). “The meta-epigenomic structure of purified human stem cell populations is defined at cis-regulatory sequences.” Nat Commun 5: 5195.
- Reynolds, L. M., et al. (2014). “Age-related variations in the methylome associated with gene expression in human monocytes and T cells.” Nat Commun 5: 5366.
- Lowdon, R. F., et al. (2014). “Regulatory network decoded from epigenomes of surface ectoderm-derived cell types.” Nat Commun 5: 5442.
- Gascard, P., et al. (2015). “Epigenetic and transcriptional determinants of the human breast.” Nat Commun 6: 6351.
Not Related to Our Research
- Farh, K. K., et al. (2015). “Genetic and epigenetic fine mapping of causal autoimmune disease variants.” Nature 518(7539): 337-343.
Notes from literature
(2015) Beyond the genome. Nature, 518, 273.
Non-genetic modifications to the genome: epigenetic modifications that crucially determine which genes are expressed by which cell type, and when.
Epigenetics describes changes in the regulation of gene expression that can be passed on to a cell’s progeny but are not due to changes to the nucleotide sequence of the gene.
an epigenome — a map of the genome-wide modifications made to DNA and the protein scaffold that supports it
Insights into three fundamental aspects of epigenetics emerge
- how the epigenome affects gene expression;
- how the epigenome changes during stem-cell differentiation (that is, during normal development);
- how it changes during disease.
combinations of modifications predict gene activity in ways that a single type of modification does not
the epigenome of a cancer cell carries a fingerprint of the cell type that originated the cancer
Romanoski, C.E., Glass, C.K., Stunnenberg, H.G., Wilson, L. and Almouzni, G. (2015) Epigenomics: Roadmap for regulation. Nature, 518, 314-316.
Differentiation enhanced
Enhancers are activated through interactions with transcription factors, which recognize and bind to specific DNA sequences within the enhancer region.
embryonic stem (ES) cells, which give rise to almost every cell type of the body
Both Ziller et al. and Tsankov et al. found that regulatory elements controlling genes that are essential for cellular identity are often also epigenetically modified in parental cells (hierarchical model needed)
Diseases mapped
Given that epigenomes are cell-type specific, it makes sense to analyze disease-associated variants identified by GWAS in the context of the epigenome of the disease cell type. (Note: consider combining GWAS data too.)
previous groundbreaking observations revealed that non-protein-coding genetic variants that are associated with phenotypic changes are often located in tissue-specific regulatory regions.
disease-related changes in gene expression in the hippocampus of the mouse brain correlate with those in post-mortem brain samples taken from people with Alzheimer’s disease, but not with those from people without the disease.
flanking sequences (of enhancers) might have a topological role affecting chromatin packaging and, consequently, DNA accessibility.
the density and distribution of cancer mutations are strongly linked to a cell-type-specific epigenomic signature (Question: are the learned LDA modules related to cancer, given that they can be used for prediction?)
Chromatin charted
A case in point is modification of the amino-acid residue lysine 27 (K27) on histone H3 in chromatin. Addition of an acetyl group (a modification known as H3K27ac) correlates with transcription of the corresponding region of DNA, whereas trimethylation (H3K27me3) is linked to transcriptional repression.
(Note: even in this paper they say “correlate” rather than “cause”.)
In addition to the linear viewpoint of chromatin alterations presented through histone modifications, long-range chromatin interactions can also modulate gene expression — for instance, by bringing distant enhancers into contact with promoters that regulate the same gene.
haplotype-specific differences in histone modifications and chromatin architecture that correlate with allele-restricted transcription across many tissues. (need to read more about this to understand it)
Future work should try to address the changing relationship between the epigenome and genome over the lifespan of the cell, in different phases of the cell cycle and across cellular generations.
Combining such efforts will be essential for understanding the functional link between the epigenome and the genome.
Roadmap Epigenomics, C., et al. (2015). “Integrative analysis of 111 reference human epigenomes.” Nature 518(7539): 317-330.
- recognize epigenome differences that arise during lineage specification and cellular differentiation
- recognize modules of regulatory regions with coordinated activity across cell types
- identify key regulators of these modules based on motif enrichments and regulator expression.
a core set of five histone modification marks:
- H3 lysine 4 trimethylation (H3K4me3), associated with promoter regions;
- H3 lysine 4 monomethylation (H3K4me1), associated with enhancer regions;
- H3 lysine 36 trimethylation (H3K36me3), associated with transcribed regions;
- H3 lysine 27 trimethylation (H3K27me3), associated with Polycomb repression;
- H3 lysine 9 trimethylation (H3K9me3), associated with heterochromatin regions
a subset of additional epigenomic marks:
- acetylation marks H3K27ac and H3K9ac, associated with increased activation of enhancer and promoter regions;
- DNase hypersensitivity, denoting regions of accessible chromatin commonly associated with regulator binding;
- DNA methylation, typically associated with repressed regulatory regions or active gene transcripts and profiled using whole-genome bisulfite sequencing (WGBS), reduced-representation bisulfite sequencing (RRBS), and mCRF-combined methylation-sensitive restriction enzyme (MRE) and immunoprecipitation-based assays;
- RNA expression levels, measured using RNA-seq and gene expression microarrays
The active states (associated with expressed genes) consist of
- active transcription start site (TSS) proximal promoter states (TssA, TssAFlnk),
- a transcribed state at the 5’ and 3’ end of genes showing both promoter and enhancer signatures (TxFlnk),
- actively transcribed states (Tx,TxWk),
- enhancer states (Enh,EnhG)
- a state associated with zinc finger protein genes (ZNF/Rpts).
The inactive states consist of:
- constitutive heterochromatin (Het),
- bivalent regulatory states (TssBiv, BivFlnk, EnhBiv),
- repressed Polycomb states (ReprPC, ReprPCWk),
- a quiescent state (Quies), which covered on average 68% of each reference epigenome.
Enhancer and promoter states covered approximately 5% of each reference epigenome on average, and showed enrichment for evolutionarily conserved non-exonic regions.
- low DNA methylation and high accessibility in promoter states,
- high DNA methylation and low accessibility in transcribed states,
- intermediate DNA methylation and accessibility in enhancer states
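To make the coverage figures above concrete, here is a minimal sketch of how per-state coverage could be computed from a chromatin-state segmentation. The state names come from the lists above; the `(start, end, state)` segment format, the function name `state_coverage` and the toy coordinates are assumptions for illustration, not the Roadmap pipeline.

```python
# State names follow the active/inactive grouping listed in these notes.
ACTIVE_STATES = {"TssA", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "Enh", "EnhG", "ZNF/Rpts"}
INACTIVE_STATES = {"Het", "TssBiv", "BivFlnk", "EnhBiv", "ReprPC", "ReprPCWk", "Quies"}


def state_coverage(segments):
    """Return the fraction of covered bases assigned to each state.

    `segments` is a list of (start, end, state) tuples with half-open
    coordinates, as in a BED-like segmentation (assumed format).
    """
    totals = {}
    for start, end, state in segments:
        totals[state] = totals.get(state, 0) + (end - start)
    covered = sum(totals.values())
    return {state: length / covered for state, length in totals.items()}


# Toy segmentation with made-up coordinates (not real data).
toy_segments = [
    (0, 200, "TssA"),
    (200, 1000, "Tx"),
    (1000, 1400, "Enh"),
    (1400, 5000, "Quies"),
]
coverage = state_coverage(toy_segments)
active_fraction = sum(f for s, f in coverage.items() if s in ACTIVE_STATES)
print(coverage["Quies"], active_fraction)
```

On a real segmentation the same function would be run once per reference epigenome and the fractions averaged across epigenomes.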
Thread 2. Relationship between different epigenomic marks: DNA accessibility and methylation, histone marks, and RNA
Integrative analyses of reference epigenomes reveal complex context-specific relationships between chromatin state, accessibility, DNA methylation and gene expression
- Distinct chromatin states exhibit different distributions of chromatin accessibility, DNA methylation and gene expression
- Global similarity and differences between epigenomes
- Relationship between allelic enhancer activity and allelic gene expression
- Regulatory region dynamics correlate with gene expression changes in Alzheimer’s disease
- Developmental origins influence epigenomes
- Characterization of age- and gene-expression-related methylation
- Combinatorial patterns of chromatin marks at lncRNA loci
Amin, V., et al. (2015). “Epigenomic footprints across 111 reference epigenomes reveal tissue-specific epigenetic regulation of lincRNAs.” Nat Commun 6: 6370.
A three-stage analysis.
- identify epigenomic features that discriminate established cell and tissue-types. We find that dynamic epigenomic footprints at lincRNA TSSs are at least as tissue specific as the footprints of enhancers and significantly more specific than those of promoters.
- using their epigenomic footprints, we assign a fraction of promoters, a majority of enhancers and even larger fraction of lincRNA TSSs to specific tissue types and discover striking association of those tissue-specific regulatory elements with cell- and tissue-specific developmental processes and mammalian phenotypes.
- examine patterns of epigenetic programming of regulatory elements as cells differentiate, with a focus on Polycomb regulation of lincRNAs.
We observe that H3K4me1 signals are more frequently lineage specific than H3K4me3 and a significant fraction of sites show cluster-specific signals for both marks
A data slice for a set of samples is a set of arrays, one array per sample, each array consisting of average signals of a specific histone mark over a given set of regions of interest (ROIs). (Note: this definition of ROIs and the array representation of samples are quite similar to our LDA representation. For each cell line we used ROIs of +-2 kb around each TSS, so each cell line and histone mark pair is represented as an array, and each cell line as a matrix over all histone marks.)
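A minimal sketch of the data-slice idea, assuming each sample's signal for one mark is available as a per-base list of values. The helper names (`roi_around_tss`, `mean_signal`, `data_slice`) and the toy tracks are hypothetical; real tracks would come from bigWig or similar files.

```python
def roi_around_tss(tss, flank=2000):
    """Half-open window of +-flank bp around a TSS, clipped at position 0."""
    return max(0, tss - flank), tss + flank


def mean_signal(signal, roi):
    """Average of a per-base signal track over one ROI."""
    start, end = roi
    window = signal[start:end]
    return sum(window) / len(window) if window else 0.0


def data_slice(tracks, rois):
    """One array per sample: mean signal of a single mark over each ROI."""
    return {sample: [mean_signal(track, roi) for roi in rois]
            for sample, track in tracks.items()}


# Toy per-base tracks for one mark in two samples (made-up values).
tracks = {
    "sample_A": [0.0] * 3000 + [4.0] * 2000 + [0.0] * 5000,
    "sample_B": [1.0] * 10000,
}
rois = [roi_around_tss(4000), roi_around_tss(8000)]
slice_for_mark = data_slice(tracks, rois)
print(slice_for_mark)
```

Stacking one such slice per histone mark gives the per-cell-line matrix described in the note above.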
(Note: pay more attention to their clustering method, data processing and tree processing.)
Mele, M., et al. (2015). “Human genomics. The human transcriptome across tissues and individuals.” Science 348(6235): 660-665.
Not part of the Roadmap series, but it discusses variation in gene expression across tissues
The Genotype-Tissue Expression Project (GTEx)
Data Available in pilot data freeze: RNA sequencing (RNAseq) from 1641 samples from 175 individuals representing 43 sites: 29 solid organ tissues, 11 brain subregions, whole blood, and two cell lines: Epstein-Barr virus–transformed lymphocytes (LCL) and cultured fibroblasts from skin
At the threshold defined for expression quantitative trait loci (eQTL) analysis [reads per kilobase per million mapped reads (RPKM) > 0.1] (Note: this cutoff seems to be used to define a gene as expressed in a sample.)
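A small sketch of the RPKM cutoff, using the standard definition (reads per kilobase of gene length per million mapped reads). The gene names, counts and library size below are made up, and `expressed_genes` is a hypothetical helper rather than the GTEx pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts, lengths, total_mapped_reads, threshold=0.1):
    """Genes whose RPKM in one sample exceeds the threshold."""
    return {gene for gene, count in counts.items()
            if rpkm(count, lengths[gene], total_mapped_reads) > threshold}


# Toy sample: read counts and gene lengths are illustrative only.
counts = {"geneA": 500, "geneB": 0, "geneC": 1}
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 100000}
expressed = expressed_genes(counts, lengths, total_mapped_reads=20_000_000)
print(expressed)
```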
Although thousands of genes are differentially expressed between tissues (fig. S8) or show tissue-preferential expression (fig. S9 and table S5), fewer than 200 genes are expressed exclusively in a given tissue
identified 92 PCGs and 43 lncRNAs with global sex-biased expression. Genes overexpressed in males are predominantly located on the Y chromosome. Conversely, many genes on the X chromosome are overexpressed in females, suggesting that more genes might escape X inactivation than previously described
Overall, tissue specificity is likely to be driven by the concerted expression of multiple genes.
In contrast to gene expression, variation of splicing, measured either from relative isoform abundance or exon inclusion, is similar across tissues and across individuals => so maybe we should not even consider alternative splicing (AS).
Elliott, G., et al. (2015). “Intermediate DNA methylation is a conserved signature of genome regulation.” Nat Commun 6: 6363.
Hierarchical clustering based on the presence or absence of the IM status or based on MeDIP/MRE-Seq read density at the union set of IM regions strongly separated cell types isolated from different tissues. (Note: maybe hierarchical clustering is sufficient and simple enough to capture the hierarchical structure of tissue types; I am also not sure whether hierarchical LDA would find useful level-related features for different cell lines.) My previous hierarchical clustering of 16 cell lines using LDA features did not show promising results, possibly because we have too few cell lines.
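A minimal sketch of hierarchical clustering on presence/absence vectors, assuming one binary IM-status vector per sample over a shared set of regions. Jaccard distance and average linkage are my choices for illustration; the paper's exact distance and linkage are not recorded in these notes, and the sample names and vectors are made up.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two binary presence/absence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order performed.

    `profiles` maps a sample name to its binary vector. Each merge is a
    (cluster_a, cluster_b, distance) tuple, where clusters are tuples of names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                dist = sum(jaccard_distance(profiles[p], profiles[q])
                           for p, q in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy IM presence/absence over six regions (made-up values).
im_status = {
    "blood_1": [1, 1, 1, 0, 0, 0],
    "blood_2": [1, 1, 0, 0, 0, 0],
    "brain_1": [0, 0, 0, 1, 1, 1],
    "brain_2": [0, 0, 1, 1, 1, 1],
}
merges = average_linkage(im_status)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

In this toy case the two samples of each tissue merge first, which is the kind of tissue separation the paper reports. For real data a library routine such as SciPy's hierarchical clustering would be the more practical choice.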
Approximately 50% of IM regions overlapped loci identified as differentially methylated across cell and tissue types, consistent with our observation that IM is often tissue-specific. => Most patterns found in this paper and other papers are tissue-specific. (Since we now have so many cell lines from various tissues, and each tissue has several different cell lines, it should be possible to find patterns that are shared in only one or a few tissues; some papers already address this.)
the normalized read densities for the active marks of H3K4me1 and H3K4me3 were anti-correlated with DNA methylation. Methylated and unmethylated status at these regions distinguished proximal genes with significantly different mean expression values, following the established inverse correlation between DNA methylation at enhancers and gene expression.
How to discover the conservation of IM regions: we identified IM states using MeDIP/MRE-Seq data from murine embryonic stem (ES) cells and fetal neurons, taking the union of regions in both cell types as the reference IM set for mouse.
Dixon, J. R., et al. (2015). “Chromatin architecture reorganization during stem cell differentiation.” Nature 518(7539): 331-336.
Hi-C experiments: genome-wide chromatin interaction maps in H1 human ES cells and four H1-derived lineages, showing chromatin reorganization during lineage specification.
Not closely related to our study.
Tsankov, A. M., et al. (2015). “Transcription factor binding dynamics during human ES cell differentiation.” Nature 518(7539): 344-349.
TF binding dynamics are grouped into four main classes (static, dynamic, enhanced and suppressed)
Extended H3K27Ac domains have recently been termed super-enhancers
Ernst, J. and M. Kellis (2015). “Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.” Nat Biotechnol.
The method exploits the correlated nature of epigenetic signals, across both marks and samples: signals of the same mark across different cell lines are correlated, and signals of different marks in the same cell line are correlated.
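A quick way to check both kinds of correlation on binned signal vectors is a plain Pearson correlation. This is only a sketch: the sample IDs (`E1`, `E2`), the bin values and the `signals` layout are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy binned signals keyed by (sample, mark); values are illustrative only.
signals = {
    ("E1", "H3K4me3"): [0, 5, 9, 1, 0],
    ("E2", "H3K4me3"): [1, 4, 8, 2, 0],
    ("E1", "H3K27ac"): [0, 3, 7, 2, 1],
}
same_mark_across_samples = pearson(signals[("E1", "H3K4me3")],
                                   signals[("E2", "H3K4me3")])
different_marks_same_sample = pearson(signals[("E1", "H3K4me3")],
                                      signals[("E1", "H3K27ac")])
print(same_mark_across_samples, different_marks_same_sample)
```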
4,315 high-resolution signal maps, of which 26% are also experimentally observed => (111 + 16) cell lines * 34 marks (histone, methylation, DNase, RNA-seq) - 3 = 4,315 (minus 3 for the three marks that each have only one experimentally observed data set and thus cannot be imputed)
18 tissue types (plus 2 others… : other tissue type in Roadmap and cell lines from ENCODE)
common haplotype structure for genotype, GWAS and SNPs? (need to read some more)
Only five ‘core’ histone modification marks were experimentally profiled in all 127 reference epigenomes. These are promoter-associated H3K4me3, enhancer-associated H3K4me1, Polycomb repression-associated H3K27me3, transcription-associated H3K36me3 and heterochromatin-associated H3K9me3.
An ensemble of regression trees is used here, although the details are not specified. Ensemble tree methods are worth considering: we have previously used a similar method (LogitBoost), and Gradient Boosted Regression Trees (GBRT) are extremely popular alongside deep learning.
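To recall how GBRT works, here is a minimal sketch of gradient boosting for squared loss using depth-1 regression trees (stumps). It is a generic textbook version, not the paper's implementation; the toy features (signals of two other marks in a bin) and targets are made up, and all function names are hypothetical.

```python
def fit_stump(rows, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    overall = sum(residuals) / len(residuals)
    # Constant fallback, used only if no valid split exists.
    best = (float("inf"), 0, float("inf"), overall, overall)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[f] > threshold]
            if not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, f, threshold, lmean, rmean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, lmean, rmean = stump
    return lmean if row[f] <= threshold else rmean


def fit_gbrt(rows, targets, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (squared-loss gradient)."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, rows)]
    return base, stumps, learning_rate


def predict_gbrt(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(model, rows, targets):
    return sum((predict_gbrt(model, r) - t) ** 2
               for r, t in zip(rows, targets)) / len(targets)


# Toy task: predict a target mark's signal from two other marks (made-up values).
rows = [[0.1, 0.0], [0.5, 0.2], [1.0, 0.4], [2.0, 1.0], [3.0, 1.5], [4.0, 2.5]]
targets = [0.0, 0.3, 0.9, 2.1, 2.8, 4.2]
baseline = fit_gbrt(rows, targets, n_rounds=0)   # mean-only model
boosted = fit_gbrt(rows, targets, n_rounds=50)
print(mse(baseline, rows, targets), mse(boosted, rows, targets))
```

In practice a library implementation with deeper trees and held-out evaluation would be used; the point here is only the residual-fitting loop.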
Whitaker, J. W., et al. (2014). “Predicting the human epigenome from DNA motifs.” Nat Methods.
predicts histone modification and DNA methylation patterns from DNA motifs.
numerous motifs that have location preference, such as at the center of H3K27ac or at the edges of H3K4me3 and H3K9me3 => for the shape pattern, we may check whether different motifs fall on different shapes (or even shift the shape from standard normal to skewed)
observations:
- (G+C)-rich sequences are strongly correlated with trimethylation of histone 3 lysine 27 (H3K27me3; ref. 2) and H3K4me3
- (G+C)-rich motifs establish H3K27me3 by recruiting the Polycomb repressive complex 2 (PRC2) through interaction with long noncoding RNAs
- CpG-binding protein, CFP1, recruits the H3K4 methyltransferase SETD1 to (G+C)-rich motifs;
- H3K4me3 methyltransferase, PRDM9, has a sequence-specific binding motif that directs it to meiotic recombination hotspots.
DNA 6-mers were used to predict the presence of H3K4me3 with reasonable accuracy but failed to find sequence features associated with other histone modifications (Ha, M., Hong, S. & Li, W.H. Predicting the probability of H3K4me3 occupation at a base pair from the genome sequence context. Bioinformatics 29, 1199–1205 (2013).) (did not focus on DNA motifs)
two types of peaks: tight for H3K27ac, H3K4me1 and H3K4me3; broad for H3K27me3, H3K36me3 and H3K9me3. (For some of the data, we had better use narrow peaks instead of broad peaks.)
identify mark– or cell type–specific and independent DNA motifs (also, for histone marks, there are position-specific ones (promoter, enhancer and so on))
Zhou, X., et al. (2015). “Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser.” Nat Biotechnol.
346 ‘complete epigenomes’, defined as tissues and cell types for which we have collected a complete set of DNA methylation, histone modification, open chromatin and other genomic data sets.
Annotation of two SNPs