How do cells change themselves in response to cues from their environments? The ability of mammalian cells to remodel and adapt themselves in response to extracellular signals plays a critical role in practically every physiological process, from learning and homeostatic regulation in the nervous system, to immune system function, to development, and dysregulation of this adaptive process is central to the pathogenesis of numerous diseases. Decades of research on this important topic have led to an essential understanding of how external commands become internal actions for the cell. One key aspect of this process, the primary means by which external signals lead to variation in the complement of genes that are expressed in the cell, is shown schematically in the above figure (left). Briefly, external signals (ligands) bind to and activate membrane-localized receptors, leading to dynamic changes in the activity of a large number of signaling proteins through intracellular signaling networks. Modulation of signaling protein activities causes modulation of transcription factor activities, leading to changes in the expression levels (mRNA levels) of a wide variety of target genes, and to functional consequences. The process is rich with crosstalk, where a single ligand may activate multiple receptors, receptors may interact with overlapping sets of signaling networks, signaling networks may interact with overlapping sets of transcription factors, and transcription factors may regulate overlapping sets of genes. The process is also rich with feedback, where the system output directly influences the system input-output behavior. For example, the signaling process described above often causes changes in the expression levels of the signaling proteins themselves.
Finally, the transcription factors themselves are part of a network, the transcription network or gene regulatory network, in which transcription factors regulate the expression of other transcription factors in addition to structural genes, and determine the transcriptional state of the cell. In summary, extracellular signals lead to cellular remodeling or adaptation largely by modulating activities within intracellular signaling networks which interact with the transcription network.
We seek to develop computational models of this process. The means by which mammalian cells respond to extracellular signals is exceptionally complex. Add to that complexity the enormous scale of the process, in terms of number of interactions and diversity of time scales on which events occur, and it is unlikely that the overall system behavior can be predicted by intuition alone. By representing what is known or assumed about biological systems in quantitative computational models, it becomes possible to investigate the impact of specific interactions or assumptions on the overall system behavior, regardless of the system complexity or scale. Thus computational modeling provides a consistent (and demanding) framework for representing what is known or assumed about biological systems, as well as a framework for testing whether what is known or assumed is consistent with what has been observed. Finally, it provides a framework for generating new hypotheses and new ways to test old hypotheses experimentally. For these reasons, we see computational modeling as a natural partner to experimental work in biology, where the knowledge gained by simultaneous application of both approaches exceeds that from either individually.
Research focus
My research focus is the system-wide response of mammalian cells to receptor activation. Specifically, I am interested in developing computational models describing how extracellular signals excite transcription networks in mammalian cells. As described above, a growing body of research exists that describes some of the consequences of receptor activation. This research predominantly derives from drill-down approaches, where the objective is to illuminate the molecular details of these processes with ever finer granularity. This approach has yielded and continues to yield important clinically relevant insights. Yet even a simple schematic (left) reveals the complexity of the process and suggests that efforts to put things together may be as fruitful as those that drill down to the details. For example, it is increasingly difficult to assign a functional role in physiological terms to proteins embedded deep within signaling networks. What is it that these proteins actually "do" for the cell? Fortunately, the advent of large-scale biology or omics, largely driven by the human and other genome projects, has rendered systems-level biology feasible as it has never been before. Whereas just a decade ago a researcher could feasibly measure the expression of only a handful of genes or proteins in a given experiment, it is now possible to measure the expression of thousands of genes in parallel and over time. These two circumstances, the paucity of integrative understanding and the great opportunities provided by omics technologies, combined with the biological and medical significance of the problem, render the system-wide response of mammalian cells to receptor activation an especially relevant, fruitful, and fascinating field of inquiry.
My current experimental systems of interest are EGFR signaling in rat hepatocytes and neuromodulation in the rat SCN. Although a significant fraction of my work to date has employed simulated transcriptional networks (see below), my belief is that, given the innumerable idiosyncrasies of biological systems, there can be no substitute for the real thing. Most recently my experimental and computational energies have been directed towards understanding how epidermal growth factor receptor (EGFR) modulates the transcription network in rat hepatocytes. EGFR is an attractive system given its important role in many physiological processes (cancer, liver regeneration, neuromodulation), and a great deal is known about its signaling. Specifically, I am applying my modeling approaches (described below) to dynamic gene expression profiles (cDNA arrays) I have collected for the response of hepatocytes to EGF with and without inhibitors for specific signaling pathways. The objective is to quantify the transcriptional output (in terms of transcription factor activation and its sequelae) of EGFR activation in computational transcription network models and to link these models to existing models of early EGFR signaling events. This work is being done in collaboration with Dr. Jan Hoek and Dr. Boris Kholodenko.
My experimental and computational efforts have also been directed towards neuromodulation in the rat suprachiasmatic nucleus (SCN). The SCN is an attractive model system for many reasons. As the core pacemaker for circadian rhythms, the SCN is unique in neuroscience in that it is straightforward to link behavior at the cellular level to the physiology of the organism. The self-sustaining circadian oscillation within the SCN neurons is the result of a transcriptional regulatory feedback loop for which many of the biochemical details are known, making this system both fascinating from a modeling perspective and tractable compared with other systems in neuroscience. Finally, the SCN cycling behavior is modulated by an enormous number of neuropeptides and other messengers, with the modulating effects being dependent on the phase of the rhythm itself, rendering the SCN an ideal system for investigating how receptor activation modulates transcription networks. My work with this system is being done in collaboration with a larger group of researchers at DBI, including Dr. Haiping Hao, Dr. Rajanikanth Vadigepalli, and Gregory Miller.
Approach
My approach to the system-wide interaction of receptor signaling with transcription networks is to integrate multiple omics data types into structured dynamic models. The arguments given above for developing computational models of biological systems have been appreciated by many researchers for many years. There is an ever-growing number of detailed computational models describing the early signaling events downstream of receptor activation in mammalian cells. There is also a growing literature for identifying transcription networks or "gene networks" from omics data, in particular transcriptomics (gene expression profiles, microarrays). Furthermore, although a different type of modeling from the first two, the computational prediction of promoters and regulatory elements from genomic sequences is essentially a field of its own. These modeling approaches have yielded and continue to yield important insights into the functioning and organization of complex biological systems. Yet, for our purposes of understanding the system-wide response to receptor activation, none of the above approaches is sufficient on its own. The output of the signaling models is routinely the activity of a particular signaling protein, which is used as a surrogate for "function", even though the system-wide functional consequences of that activity for the cell are often unknown. Gene network modeling approaches often rely solely on transcriptomic data and involve significant simplifying assumptions that render the resulting models difficult to interpret in biological terms, difficult to validate, and potentially biologically incorrect. Promoter and regulatory element prediction methods, while providing candidate transcription factor-gene interactions, do not identify functional interactions, nor do they provide predictions of the functional nature of the interactions.
The structured approach to transcription network modeling overcomes the individual limitations of the above methods by integrating them into a dynamic modeling framework and utilizing additional data sets. Rather than treating the entire cell as a black box, as in most gene network modeling approaches, we first impose subcellular structure, dividing the cell into nuclear and cytoplasmic models. The cytoplasmic model describes how variation in mRNA levels and extracellular signals lead to modulation of transcription factor activities, and thus naturally incorporates existing models of intracellular signaling. The nuclear model describes how modulation of transcription factor activities leads to variation in mRNA levels, as opposed to allowing every gene to regulate every other gene as in most gene network modeling approaches. Finally, we impose nuclear connectivity structure by specifying which transcription factors regulate which genes. Nuclear connectivity may be obtained directly from high-throughput protein-DNA interaction data (as in ChIP-chip studies), or indirectly. In the indirect approach, we first assume that coexpressed genes are coregulated, and then search the promoters of the coregulated genes (clusters) for transcription factor binding sites that are over-represented as compared to random groups of the same size. Our group has developed the bioinformatics tool PAINT (Vadigepalli et al., 2003) to automate the indirect method for mammalian systems. An ongoing aspect of my research involves applying dynamic modeling approaches and using other types of genomic information to improve upon the coexpression-coregulation assumption in the indirect approach. We have had success in applying the structured approach to data for the yeast cell cycle as a case study. I am presently applying the approach to the response of hepatocytes to EGF.
More detail about the approach and the case studies may be found in a poster I presented at ICSB2003 (Zak et al., 2003c), an in press article in Computers and Chemical Engineering (Zak et al., In press-a), and my faculty candidate abstract for AIChE 2004 (Zak, 2004).
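The idea of a nuclear model with imposed connectivity can be illustrated with a minimal sketch. This is not the model from the cited publications: the linear rate law, the basal synthesis term, the Euler integration, and all names (`simulate_nuclear_model`, `connectivity`, `weights`) are assumptions chosen only to show how a connectivity matrix restricts which transcription factors can drive which genes.

```python
def simulate_nuclear_model(connectivity, weights, basal, decay,
                           tf_activity, m0, t_end, dt=0.01):
    """Euler integration of an assumed linear nuclear model:

        dm_i/dt = basal_i + sum_j C_ij * w_ij * tf_j(t) - decay_i * m_i

    connectivity[i][j] is 1 if transcription factor j regulates gene i,
    else 0, so weights on absent connections have no effect.
    tf_activity is a function t -> list of transcription factor activities.
    """
    m = list(m0)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        tf = tf_activity(t)
        updated = []
        for i in range(len(m)):
            drive = sum(connectivity[i][j] * weights[i][j] * tf[j]
                        for j in range(len(tf)))
            rate = basal[i] + drive - decay[i] * m[i]
            # mRNA levels cannot be negative
            updated.append(max(0.0, m[i] + dt * rate))
        m = updated
        t += dt
    return m


# Hypothetical two-gene, two-factor example with constant activities.
connectivity = [[1, 0],   # gene 0 is regulated by TF 0 only
                [1, 1]]   # gene 1 is regulated by TF 0 and TF 1
weights = [[2.0, 5.0],    # the 5.0 is ignored: no connection there
           [1.0, -1.0]]   # TF 1 represses gene 1
final_levels = simulate_nuclear_model(
    connectivity, weights, basal=[1.0, 1.0], decay=[1.0, 0.5],
    tf_activity=lambda t: [1.0, 0.5], m0=[0.0, 0.0], t_end=40.0)
```

With constant transcription factor activities, each gene relaxes to (basal + drive) / decay, which gives a simple check on the integration. In a structured model the `tf_activity` function would be supplied by the cytoplasmic (signaling) model rather than fixed by hand.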
Practical implementation of structured transcription network modeling involves the tight integration of techniques from what have largely existed as independent disciplines, shown schematically in the figure above (right). We broadly lump these disciplines into modeling and model analysis, system identification, data analysis, and bioinformatics. Apart from my ongoing experimental work, my research efforts fit into each of the four categories. Below I provide some details about that work, both related and unrelated to structured modeling, with links to relevant publications and abstracts.
Modeling and model analysis
My modeling work has primarily been focused on the development and analysis of a transcriptional network simulator for in silico experiments where, unlike actual experiments, the underlying network is known. The primary result from the in silico experiments was that microarray data is generally too limited and biological systems generally too complex for the networks to be revealed by microarray data alone. Rather, microarray data is most powerful for inferring networks when coupled with other datasets. Support for this argument can be found in Zak et al., 2001a, Zak et al., 2002a, Zak et al., 2003a, Zak et al., 2003b.
One aspect of my model analysis work, like the in silico experiments described above, has been concerned with linking models to data. I have been using identifiability analysis to evaluate how the accuracy with which model parameters (and thus biological networks themselves) may be determined depends on the quality and quantity of the experimental data. Applying these techniques to transcription networks revealed that, as described above, microarray data must be combined with additional datasets for biological networks to be inferred with reasonable certainty (Zak et al., 2002b, Zak et al., 2003a).
Finally, I have used model analysis to gain insight into biological mechanisms themselves. Using sensitivity analysis and stochastic simulations I have considered the relationship between network robustness and network structure for several simple models of circadian rhythms (Zak et al., 2001b, Zak et al., In press-b). I have also employed stochastic simulations to consider the impact of stochastic gene expression on transcription networks (Zak et al., 2003a). Using existing models for growth factor signaling and apoptosis, I performed simulations to evaluate the physiological consequences of crosstalk between these important pathways (Zak et al., 2003d).
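As a generic illustration of what a stochastic simulation of gene expression involves, the sketch below applies the Gillespie algorithm to the simplest possible case: constitutive mRNA synthesis with first-order degradation. It is not one of the models analyzed in the cited papers; the reaction scheme, rate constants, and function names are assumptions made for the example.

```python
import random


def gillespie_birth_death(k_syn, k_deg, m0, t_end, seed=0):
    """Exact stochastic simulation of: 0 -> mRNA (rate k_syn),
    mRNA -> 0 (rate k_deg * m). Returns event times and copy numbers."""
    rng = random.Random(seed)
    t, m = 0.0, m0
    times, counts = [0.0], [m0]
    while True:
        a_syn = k_syn
        a_deg = k_deg * m
        a_total = a_syn + a_deg
        if a_total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponentially distributed
        t += rng.expovariate(a_total)
        if t >= t_end:
            break
        # Choose which reaction fires in proportion to its propensity
        if rng.random() * a_total < a_syn:
            m += 1
        else:
            m -= 1
        times.append(t)
        counts.append(m)
    return times, counts


def time_average(times, counts, t_end):
    """Average copy number weighted by the time spent in each state."""
    total = 0.0
    for i, m in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += m * (t_next - times[i])
    return total / t_end


times, counts = gillespie_birth_death(k_syn=20.0, k_deg=1.0, m0=0,
                                      t_end=500.0, seed=1)
mean_copies = time_average(times, counts, 500.0)
```

For this scheme the long-run mean copy number is k_syn / k_deg, so the time average of a long trajectory should fluctuate around that value; the spread around it is the intrinsic noise that a deterministic model leaves out.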
System identification
In system identification, the objective is to estimate models from experimental data, with the models primarily being of an empirical nature. My work in this area has been focused on the application of continuous-time system identification methods for identifying models for gene regulation from gene expression data (Zak et al., 2003b). Continuous-time methods were chosen over the more common discrete-time methods for two reasons: they are better suited to the data common in biology (sampled at irregular time intervals), and continuous-time models for gene expression are more straightforward to link to signaling pathway models, which are routinely represented as ordinary differential equations.
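The advantage for irregularly sampled data can be seen in a small sketch. It fits a first-order continuous-time model, dx/dt = a*u - b*x with a step input u and x(0) = 0, directly to samples taken at uneven times, with no resampling onto a uniform grid. The model form, the profile least-squares procedure, and every name here are assumptions for illustration and are not claimed to be the method of Zak et al., 2003b.

```python
import math


def unit_response(t, b, u=1.0):
    """Solution of dx/dt = a*u - b*x, x(0) = 0, divided by a."""
    return u * (1.0 - math.exp(-b * t)) / b


def profile_fit(times, values, b):
    """For a fixed b the model is linear in a, so the least-squares a
    has a closed form. Returns (a, sum of squared errors)."""
    phi = [unit_response(t, b) for t in times]
    a = sum(x * p for x, p in zip(values, phi)) / sum(p * p for p in phi)
    sse = sum((x - a * p) ** 2 for x, p in zip(values, phi))
    return a, sse


def fit_first_order(times, values, b_min=1e-3, b_max=100.0,
                    n_grid=200, iters=80):
    """Estimate (a, b): coarse log-spaced scan over b, then
    golden-section refinement between the neighboring grid points."""
    grid = [b_min * (b_max / b_min) ** (i / (n_grid - 1))
            for i in range(n_grid)]
    k = min(range(n_grid),
            key=lambda i: profile_fit(times, values, grid[i])[1])
    lo, hi = grid[max(k - 1, 0)], grid[min(k + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if profile_fit(times, values, c)[1] < profile_fit(times, values, d)[1]:
            hi = d
        else:
            lo = c
    b = 0.5 * (lo + hi)
    a, sse = profile_fit(times, values, b)
    return a, b, sse


# Noise-free synthetic data at irregular sampling times (hypothetical).
sample_times = [0.2, 0.5, 0.6, 1.5, 2.0, 4.5, 5.0, 9.0]
sample_values = [2.0 * unit_response(t, 0.7) for t in sample_times]
a_hat, b_hat, sse = fit_first_order(sample_times, sample_values)
```

Because the model is evaluated at the actual measurement times, the spacing of the samples never enters the estimation. A discrete-time model, by contrast, ties its parameters to a fixed sampling interval.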
Data analysis
A large amount of my data analysis work has been concerned with measurements of gene expression. I have been applying both parametric and non-parametric methods to test for differential gene expression in microarray data. I have also been working on improved measurement models for quantitative polymerase chain reaction (qPCR). The objective is to develop robust and reliable models that relax the assumptions commonly made in qPCR data analysis.
My data analysis work additionally has involved working with lists of genes, such as those differentially expressed in response to a particular ligand. In particular, my work has involved testing the lists for statistical enrichment for certain properties as compared to random groups of the same size, such as enrichment for functional groups or transcription factor binding sites in their promoters. In the case of transcription factor binding sites, the test is not straightforward given that binding sites may appear multiple times in a given promoter. To accommodate this complexity, we have been developing non-parametric methods (akin to permutation tests) that account for repeated sites (Zak et al., 2003c, Zak et al., In press-a).
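A permutation-style test of this kind can be sketched as follows. The sketch is generic and is not the method from the cited work: the choice of test statistic (total site occurrences across the list, so that repeated sites in one promoter all count), the sampling scheme, and the names `site_enrichment_pvalue` and `site_counts` are assumptions for illustration.

```python
import random


def site_enrichment_pvalue(site_counts, gene_list, n_perm=999, seed=0):
    """Permutation-style enrichment test for one binding site.

    site_counts maps every background gene to the number of times the
    site occurs in its promoter. The statistic is the total number of
    occurrences in gene_list, compared against random gene groups of
    the same size drawn from the background.
    """
    rng = random.Random(seed)
    background = sorted(site_counts)
    observed = sum(site_counts[g] for g in gene_list)
    size = len(gene_list)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        group = rng.sample(background, size)
        if sum(site_counts[g] for g in group) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical background of 100 genes; the site occurs three times in
# each of the first ten promoters and nowhere else.
counts = {f"g{i}": (3 if i < 10 else 0) for i in range(100)}
enriched_list = [f"g{i}" for i in range(10)]
depleted_list = [f"g{i}" for i in range(90, 100)]
p_enriched = site_enrichment_pvalue(counts, enriched_list)
p_depleted = site_enrichment_pvalue(counts, depleted_list)
```

Because the null distribution is built from the observed counts themselves, no assumption is needed about how occurrences are distributed across promoters, which is what makes this style of test convenient when sites repeat.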
Bioinformatics
Structured modeling of transcription networks requires fluency with genomic datasets that are non-numeric in nature, such as promoter sequences and gene annotations. I have been gaining these bioinformatics skills through use of the promoter analysis tool developed in our group, PAINT (Vadigepalli et al., 2003), and other online bioinformatics resources. In addition, I have been developing a tool that automates literature searches (PubMed) for large gene lists in order to determine gene interactions that are already known. This information may then inform subsequent analyses in the structured modeling approach.