Summary
We apply methods from artificial intelligence and machine learning to a broad range of problems in computational and systems biology, as well as in medical text and medical data mining.
- We design ensemble models for predicting mortality in patients with chronic heart failure using a large number of measurements, including time series data on inflammatory biomarkers. The area under the ROC curve (AUC) of our predictive model is 84% on a large patient cohort, a substantial improvement over the 73% achieved by the state-of-the-art model.
- We design machine learning algorithms to learn probabilistic models of metabolic and signaling networks in cancer by integrating multiple sources of information. These include flow cytometry measurements of multiple phosphorylated protein and phospholipid components in cells, SELDI-ToF proteomic data, and mRNA expression data from microarrays. Our key results include (1) explaining the over-expression of putrescine in prostate cancer cells by computationally deriving changes in the glutathione and urea pathways of prostate cancer patients from microarray data; (2) reconstructing the T-cell signaling pathway from the flow cytometry data of Sachs et al. and finding a new crosstalk mechanism between JNK and p38, which has since been experimentally validated; and (3) identifying key biomarkers that support accurate differential diagnosis of colorectal cancer from other bowel diseases using SELDI-ToF data.
- Our newest work is in biomedical text mining: using concept graphs to improve the retrieval of relevant papers from the biomedical literature, and high-throughput phenotyping using text data in electronic medical records.
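The heart failure result above is reported as an area under the ROC curve. As a rough illustration of what that metric measures and how averaging ensemble members can change it, here is a minimal sketch. It is not the group's model: the function names `average_ensemble` and `roc_auc`, the two "member" models, and all scores and labels are made-up toy values chosen only for illustration.

```python
from typing import List, Sequence


def average_ensemble(member_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-patient risk scores across ensemble members.

    Simple unweighted averaging is an assumption made for this sketch;
    real ensembles may weight or stack their members differently.
    """
    n_members = len(member_scores)
    return [sum(column) / n_members for column in zip(*member_scores)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with
    ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = died during follow-up, 0 = survived.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.8, 0.6]    # scores from one hypothetical member
model_b = [0.6, 0.1, 0.7, 0.65, 0.9, 0.3]   # scores from another hypothetical member

ensemble = average_ensemble([model_a, model_b])

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
auc_ensemble = roc_auc(labels, ensemble)

print(f"member A AUC: {auc_a:.3f}")
print(f"member B AUC: {auc_b:.3f}")
print(f"ensemble AUC: {auc_ensemble:.3f}")
```

On this toy data the averaged scores rank the patients better than either member alone, because the two members make different mistakes. That complementarity is the usual motivation for ensembling, but it is not guaranteed on real data.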
Selected Projects
- Ensemble modeling
  - Predictive models of heart failure mortality using time series measurements and ensemble models
  - Model-averaging strategies for structure learning in Bayesian networks
  - Automated endoscopic image analysis to detect early cancer in patients with Barrett’s esophagus using narrow band imaging
- Bayesian networks
  - Bayesian analysis of expression data: revealing new components of the Dictyostelium PKA pathway
  - Modeling metabolic pathways in prostate cancer
  - Deriving T-cell signaling networks from flow cytometry data
- Supervised machine learning
  - Identifying biomarkers for colorectal cancer, ulcerative colitis and Crohn’s disease using proteomic SELDI-ToF data
  - Statistical methods for the objective design of screening procedures for macromolecular crystallization
- Text mining