CONFLICTS IN THE MIND
Towards an Understanding of the Genome and the Brain
Saturday, August 15, 2015
Gregg Lab Allele Silencing in Subpopulations of Cells
Friday, March 22, 2013
Great Science : It’s All About the Story
During a recent meeting I had the pleasure of talking with Iain Patten, a professional science writer in Europe. Iain travels throughout Europe teaching scientists how to write high-impact papers. We were very like-minded about the scientific process. I wanted to share some of the information from our discussion here.
----------------------------------------------------------
Great, high-impact science papers tell an important new story with broad implications. The story involves effort and planning from day one. Before you start experiments, read and ask big questions.
· What are the potential stories that could be told depending on the answers to your questions?
· What would be new and exciting about those answers and where could they lead experimentally?
· What are the different ways of telling the story? Which ones are likely to be exciting and which are not?
You should have a clear idea of the potential stories when you start your project. Planning, drafting and making figures with real data and imagined data must be a weekly activity. DO NOT gather data mindlessly and then try to write a paper with everything in year 5. Most people do this to some degree.
· Make pretend figures of the result(s) that would fit with your storyline and see what they might look like if they were really convincing. This way you won’t forget controls, you will have a clearer sense of the statistics you will use, and you will think through the potential outcomes.
· Constantly draft ‘pretend’ figures of results describing alternative outcomes and think about where they might take you. This way you know where you could be headed and you will be prepared. This will help alleviate the pressure to make the experimental result fit your hypothesis.
· Constantly reevaluating your story will reduce wasted effort/experiments, help you maintain focus and set goals, and prevent you from overlooking holes in your story or failing to complete experiments.
· Remember that the story does not have to fit the chronological order of the experiments. Rearrange the work to tell the story.
· Plan, draft, make figures, see the best experiments and storylines, do the experiments…repeat
High-impact scientists are journalists who gather evidence and assemble it into a compelling story that changes how people think. You must work on your story constantly. It will light the path in front of you and the paper will be written as you go.
· When putting the final paper together, all that matters is the data that tell the most compelling, clear and well-supported story. You must be prepared to leave things out and do more work that further supports the best story. This stuff happens near the end and it is the toughest to accept when you are close to finishing.
· Your conclusions must be supported by the data. Overstatement is the kiss of death with editors.
· As provided to me by an editor from Nature Medicine, a great story often has the following components:
o Novel, surprising, entertaining and broad implications/impact
o Strong mechanistic insights
o In vivo relevance
o Functional manipulations
o Necessity and sufficiency
o Elegance
· Be clear about the core message of the paper. If you can’t state the core message in 10 words or fewer then you haven’t found the core message.
o Each subheading and associated section of the paper builds logically toward the core message
o Paragraphs of each section have one message per paragraph and a logical transition from one paragraph into the next, all working to support the core message of the section.
o Plan – draft – plan – draft…and repeat to get an elegant and clear message and structure for the story.
· Beware of stories that just connect existing dots – the impact and advance tend to be small.
Think carefully about the journal that fits your story best. A preliminary inquiry can save time. You must clearly (and without exaggeration) explain how your article fits with the journal’s scope, what the core message is, and why it is novel and important.
o The best reviewers are often experts outside of your field. They can judge the technical aspects of the work and assess the impact without the internal biases of the field.
o Editors recommend looking at the journal editorial board for good reviewers to suggest. These people have a reputation for solid, fair reviews. Exclude competitors.
Finally, what are you reading for? It is good to be an expert, but what you are really looking for is the following:
o New and exciting plot lines for your story.
o Gems of information that help you support and tell your story.
o New and exciting plot lines for future stories (untold stories in waiting).
o New and elegant techniques and approaches that will help you tell a better story.
o Literature that will confound your story and contradict your interpretations/results.
o Evidence that your story is novel and high impact.
Good luck!!! Have lots of ideas and chisel out the best stuff.
Sunday, February 17, 2013
The Epigenome Cometh
The factors that contribute to the development of common diseases have been challenging to define. Epigenetic mechanisms may play a role and the field is hopeful that epigenome-wide association studies (EWAS) will yield new insights. However, EWAS face challenges that genome-wide association studies (GWAS) do not. First, the epigenome has a dizzying array of components involving different forms of methylated DNA, numerous histone modifications and various non-coding RNAs. Second, these components assemble in a highly cell-type-specific manner. Finally, some elements of the epigenome change in response to disease, making it challenging to find epigenetic signatures with a causal role. Nonetheless, the first signs that EWAS have potential are upon us.
A recent study by Liu and colleagues undertook an EWAS of rheumatoid arthritis (RA) to uncover DNA methylation changes that interact with genetic factors to mediate disease risk. The authors note that RA is an ideal test case for EWAS because the cell types involved (leukocytes) are well defined and easily isolated. In addition, the disease state can be ascertained by measures of anti-citrullinated protein (CP) antibodies. The authors performed a genome-wide DNA methylation analysis of whole blood from 354 rheumatoid arthritis patients and 335 healthy controls for whom genome-wide SNP and CP antibody data were also available. They first corrected for cellular heterogeneity in their blood samples by effectively normalizing the data using available DNA methylation signatures for major blood cell types. Second, the authors used a clever series of conditional correlation analyses involving genotype, methylation and phenotype data to filter out differentially methylated positions (DMPs) that are not likely to be causally related to RA. Remarkably, this revealed significant associations between a set of SNPs and DMPs located in the MHC gene cluster, which has previously been linked to rheumatoid arthritis. In a final step, the authors used a causal inference test to define 9 DMPs that mediate the genetic risk for RA through interactions with 264 SNPs in the MHC region and one SNP-DMP pair outside of the MHC region.
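To make the cell-heterogeneity problem concrete, here is a minimal simulated sketch in R. It is not the authors' method: it assumes a single hypothetical cell-type fraction (cell_frac) that is known for every sample and simply includes it as a covariate in a linear model, whereas the study used reference methylation signatures for the major blood cell types. All numbers are invented for illustration.

```r
set.seed(42)
n_per_group <- 200
status <- rep(c(0, 1), each = n_per_group)  # 0 = control, 1 = case

# Assumption: cases carry a larger fraction of one (hypothetical) cell type
cell_frac <- rnorm(2 * n_per_group, mean = 0.5 + 0.1 * status, sd = 0.05)

# Methylation at this simulated CpG depends on cell composition only,
# not on disease status
meth <- 0.3 + 0.5 * cell_frac + rnorm(2 * n_per_group, sd = 0.02)

# Without adjustment the CpG looks disease-associated (confounding)
unadjusted <- lm(meth ~ status)
# With the cell fraction as a covariate the apparent effect shrinks toward 0
adjusted <- lm(meth ~ status + cell_frac)

effects <- c(unadjusted = unname(coef(unadjusted)["status"]),
             adjusted   = unname(coef(adjusted)["status"]))
print(round(effects, 4))
```

In this toy example the unadjusted case/control difference is driven entirely by the shift in cell composition, which is why a correction of this kind has to come before any attempt at causal filtering.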
This study not only reveals the importance of understanding the relationships between genetic and epigenetic factors in common diseases, but also establishes a clear methodology to overcome many of the issues inherent to EWAS.
Liu et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nature Biotechnology, published online January 2013
Thursday, January 24, 2013
Retrieve SNPs in Promoters and Flanking Sequence for Many Genes
This post presents R code to retrieve SNPs in promoters for a list of genes. You provide your list of genes in the "genes" variable and then use biomaRt (mus_musculus) to get the transcriptional start sites (TSS) for each transcript of each gene. The code then uses the TSS info to search the promoter regions (defined here as [-1000, +200 bp] relative to the TSS) for SNPs for each gene using biomaRt. My code looks for SNPs distinguishing the C57BL/6J and CAST/EiJ mouse strains, but this can be easily altered in the getSnps function by changing "sp$cast_eij" to another strain name (use listAttributes(snpmart) in biomaRt to see the available strains).
You can view the promoters found to have SNPs in the "dataSnps" variable. Select the transcript SNP set you are interested in and run the final "getflank" function to get the surrounding sequence information using the mouse genome data provided in the BSgenome library. The code finds 100 bp of flanking sequence on each side of each SNP site, and this can be adjusted with the "offset" variable in the "getflank" function. The "final" output provides row names and writes a tab-delimited file that opens easily in Excel.
This is a quick way to get SNP data in promoter regions for genes of interest.
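library(biomaRt)
library(BSgenome.Mmusculus.UCSC.mm10)
# setwd("")  # optionally set the output directory here
#######################INPUT GENES OF INTEREST####
#symbols must match MGI exactly (e.g. "Igf2r"); symbols that do not match return no rows
genes = c("Igf2","H19", "Igf2r", "Rasgrf", "Magel2")
#######################GET TSS#############################
#get transcriptional start sites for all genes of interest
#NOTE: mm10 corresponds to GRCm38, so the Ensembl release queried should use the same assembly
ensembl = useMart("ensembl")
ensembl = useDataset("mmusculus_gene_ensembl", mart=ensembl)
#'strand' is retrieved because Ensembl always reports transcript_start < transcript_end
tss <- getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'chromosome_name','transcript_start', 'transcript_end','mgi_symbol', 'strand'), filters = c("mgi_symbol"), values=genes, mart=ensembl)
########################GET SNPS IN PROMOTER###########################
#get snps in promoter region : promoter region is defined as [-1000,+200] relative to transcriptional start site
#NOTE: attribute and filter names can change between Ensembl releases; check listAttributes(snpmart) and listFilters(snpmart)
snpmart = useMart("snp")
snpmart = useDataset("mmusculus_snp", mart=snpmart)
getSnps <- function(x){
  #apply() passes each row as a named character vector, so convert and trim explicitly
  strand = as.numeric(x[["strand"]])
  id = x[["ensembl_gene_id"]]
  txid = x[["ensembl_transcript_id"]]
  name = x[["mgi_symbol"]]
  chr = trimws(x[["chromosome_name"]])
  if ( strand == 1 ) {
    #plus strand: the TSS is transcript_start and upstream is towards lower coordinates
    txstart = as.numeric(x[["transcript_start"]])
    from = txstart - 1000
    to = txstart + 200
  } else {
    #minus strand: the TSS is transcript_end and upstream is towards higher coordinates
    txstart = as.numeric(x[["transcript_end"]])
    from = txstart - 200
    to = txstart + 1000
  }
  #as.integer() stops large coordinates being sent in scientific notation (e.g. 1e+08)
  #use the transcript's own chromosome (the original hard-coded chromosome 2 in one branch)
  sp <- getBM(attributes=c('refsnp_id', 'chr_name', 'chrom_start', 'allele', 'c57bl_6j', 'cast_eij'), filters=c("chr_name", "chrom_start", "chrom_end"), values=list( chr, as.integer(from), as.integer(to) ), mart = snpmart)
  #keep SNPs with a single-base call in the CAST/EiJ strain
  dataSnps <- sp[sp$cast_eij %in% c("A","T","G","C"), ]
  #return promoters with at least one SNP; otherwise the function returns NULL
  if ( nrow(dataSnps) > 0 ) {
    results <- cbind(id,txid,name,dataSnps, txstart)
    return(results)
  }
}
snps<-apply(tss, 1, getSnps)
dataSnps <- snps[!sapply(snps, is.null)]#All SNP Info with NULLS removed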
###########################GET FLANKING SEQUENCE FOR EACH SNP###########################
#Get sequence for snps
dataSnps #view output
dataSnps[[19]] -> d #select SNP set for transcript of interest (change the index to the list element you want)
getflank <- function(x) {
  #columns of d: id, txid, name, refsnp_id, chr_name, chrom_start, allele, c57bl_6j, cast_eij, txstart
  id = x[1]
  txid = x[2]
  name = x[3]
  position = as.numeric(x[6])
  alleles = paste("[",x[7]," > ",x[9],"]", sep="") #annotated SNP alleles > CAST/EiJ allele
  chr = paste("chr", trimws(x[5]), sep="") #convert Ensembl chromosome names to UCSC style
  txstart = x[10]
  offset = 100 #bp of flanking sequence on each side of the SNP
  leftflank <- as.character(getSeq(BSgenome.Mmusculus.UCSC.mm10,chr,position-offset,position-1))
  rightflank <- as.character(getSeq(BSgenome.Mmusculus.UCSC.mm10,chr,position+1,position+offset))
  paste(leftflank,alleles,rightflank,sep="")->seq
  cbind(id, txid, name, txstart, position, alleles, chr, seq)->out
  return(out)
}
final <- apply(d, 1, getflank)
#apply() returns one column per SNP, so the labels below name the rows
r<- c("id","txid", "name", "txstart", "snp position", "Ref > Alt", "chr", "Seq")
rownames(final) <- r
write.table(final, file = "SNPSandFlankingSEQinPROMOTERS_Gene.txt", sep="\t")
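The promoter definition used in this post, [-1000, +200 bp] relative to the TSS, depends on the strand of the transcript: upstream means lower coordinates on the plus strand and higher coordinates on the minus strand. Here is a minimal offline sketch of that arithmetic, assuming Ensembl-style strand coding (1 / -1). The helper name promoter_window and the example coordinates are hypothetical and are not part of the script above.

```r
# Hypothetical helper: genomic start/end of the promoter window around a TSS.
# upstream/downstream are measured in the direction of transcription.
promoter_window <- function(tss, strand, upstream = 1000, downstream = 200) {
  stopifnot(strand %in% c(1, -1))
  if (strand == 1) {
    # Plus strand: upstream is towards lower genomic coordinates
    c(start = tss - upstream, end = tss + downstream)
  } else {
    # Minus strand: upstream is towards higher genomic coordinates
    c(start = tss - downstream, end = tss + upstream)
  }
}

# Invented example coordinates
plus  <- promoter_window(tss = 5000, strand = 1)
minus <- promoter_window(tss = 5000, strand = -1)
print(plus)
print(minus)
```

Checking the window this way before querying the SNP mart makes it easy to confirm that genes on the minus strand are searched on the correct side of the TSS.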
Monday, December 17, 2012
New Approaches to Gene Co-expression Network Analysis
An important new paper on the methodology for doing gene co-expression network analysis was recently published in PLoS ONE by Kumari et al. (2012). The paper is entitled "Evaluation of Gene Association Methods for Coexpression Network Construction and Biological Knowledge Discovery".
The authors perform a comparative analysis of several different approaches for constructing co-expression networks.
____________________________________________________
Abstract:
Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical.
Methods and Results: In this study, we compared eight gene association methods – Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding’s D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson – and focused on their true knowledge discovery rates in associating pathway genes and construction coordination networks of regulatory genes. We also examined the behaviors of different methods to microarray data with different properties, and whether the biological processes affect the efficiency of different methods.
Conclusions: We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method that can find gene pairs of multiple relationships. Some analyses we did clearly show Pearson and Distance Covariance methods have distinct behaviors as compared to all other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates the pre-analysis to identify the best performing method for gene association and coexpression network construction.
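As a rough illustration of why the choice of association measure matters, the sketch below compares the three measures that ship with base R (Pearson, Spearman and Kendall, via cor()) on simulated expression values. The data are invented, the gene names are placeholders, and the other methods evaluated in the paper (Hoeffding’s D, Theil-Sen, Distance Covariance and so on) are not included here.

```r
set.seed(1)
n_samples <- 50
driver <- rnorm(n_samples)

# Simulated expression matrix (genes in rows, samples in columns):
# geneB follows geneA monotonically but non-linearly; geneC is unrelated
expr <- rbind(
  geneA = driver,
  geneB = exp(2 * driver) + rnorm(n_samples, sd = 0.01),
  geneC = rnorm(n_samples)
)

methods <- c("pearson", "spearman", "kendall")

# One gene-by-gene association matrix per method; pull out two gene pairs
assoc <- sapply(methods, function(m) {
  cm <- cor(t(expr), method = m)
  c(A_B = cm["geneA", "geneB"], A_C = cm["geneA", "geneC"])
})
print(round(assoc, 3))
```

In a case like this the rank-based measures capture the monotonic geneA/geneB relationship more strongly than Pearson does, so a fixed correlation threshold would produce different network edges depending on the method chosen.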
Sunday, December 9, 2012
Dr. Coni Horndli Receives Prestigious Swiss Fellowship!!
Dr. Coni Horndli has been awarded a prestigious fellowship from the Swiss National Science Foundation. Dr. Horndli is a postdoctoral fellow in the Gregg Lab developing novel approaches to study genetic and epigenetic pathways in the brain that modulate complex feeding and foraging behaviors. She has a particular interest in molecular mechanisms that influence stress and anxiety. Dr. Horndli's work is anticipated to transform our understanding of the mechanisms in the brain that contribute to susceptibility to eating disorders, stress and anxiety-related disorders, and depression.
Wednesday, December 5, 2012
IGF2:IGF2R Evolution
By: Dr. Coni Horndli
An Exon Splice Enhancer Primes
IGF2:IGF2R Binding Site Structure and Function Evolution
Christopher Williams,1* Hans-Jürgen
Hoppe,2* Dellel Rezgui,2 Madeleine Strickland,1 Briony E. Forbes,3 Frank
Grutzner,3 Susana Frago,2 Rosamund Z. Ellis,1 Pakorn Wattana-Amorn,1 Stuart N.
Prince,2 Oliver J. Zaccheo,2 Catherine M. Nolan,4 Andrew J. Mungall,5 E. Yvonne
Jones,6 Matthew P. Crump,1† A. Bassim Hassan2†
ABSTRACT
Placental development and genomic
imprinting coevolved with parental conflict over resource distribution to
mammalian offspring. The imprinted genes IGF2 and IGF2R code for the growth
promoter insulin-like growth factor 2 (IGF2) and its inhibitor, mannose
6-phosphate (M6P)/IGF2 receptor (IGF2R), respectively. M6P/IGF2R of birds and
fish do not recognize IGF2. In monotremes, which lack imprinting, IGF2
specifically bound M6P/IGF2R via a hydrophobic CD loop. We show that the DNA
coding the CD loop in monotremes functions as an exon splice enhancer (ESE) and
that structural evolution of binding site loops (AB, HI, FG) improved therian
IGF2 affinity. We propose that ESE evolution led to the fortuitous acquisition
of IGF2 binding by M6P/IGF2R that drew IGF2R into parental conflict; subsequent
imprinting may then have accelerated affinity maturation.
COMMENT
This report, published this week in Science by Matthew Crump’s and Bassim Hassan’s groups, analyses the evolutionary molecular changes that led to high-affinity binding of IGF2R to IGF2 in mammals but not in birds and reptiles. IGF2 and IGF2R are two of the roughly 100 canonically imprinted genes found in mammals, with IGF2 expressed only from the paternal allele and IGF2R only from the maternal allele. In mice, deletion of the maternal IGF2R gene results in overly large offspring while deletion of the paternal IGF2 gene results in dwarf offspring. In humans, IGF2 is imprinted but its receptor is not. Activation of the maternal IGF2 allele causes Beckwith-Wiedemann syndrome, which is characterized by large body size at birth and an increased risk for childhood cancer. The reciprocal expression of IGF2 and IGF2R underscores the parental conflict over the distribution of resources to their offspring. This hypothesis is based on the theory that mothers want to distribute their resources equally to all their current and future offspring, while fathers favor maternal investment in the current offspring.
This study correlates the appearance of IGF2:IGF2R binding with the occurrence of monoallelic expression of the two genes. Specifically, Williams et al. show that binding is present in all mammals, including the primitive monotremes, while imprinting is only found in therians, such as rodents, kangaroos and opossums. Therefore, the authors hypothesize that the evolution of IGF2/IGF2R imprinting was facilitated by the appearance of their molecular binding, and that imprinting may in turn have accelerated the selection for improved regulation of IGF2 through IGF2R.
This report thoroughly reveals the structural changes that led to IGF2:IGF2R complex formation but falls short of explaining the mechanism by which IGF2:IGF2R binding facilitates genomic imprinting of these two genes.