Research Interests

AI in Healthcare and Medicine

The US medical research community has made remarkable advances in developing technologies and treatments for many previously deadly diseases, but getting these advances to patients in the clinic has been very challenging. This is primarily due to the ever-expanding medical knowledge base juxtaposed with the increasingly time-constrained practitioner in the clinic. The problem has many real-world effects; for example, it takes roughly 17 years on average for published research findings to make it into clinical practice. The current circumstances are even more worrisome than this suggests. Astoundingly, each year more than 250,000 patients in the United States die of preventable medical errors (e.g., misdiagnosis, delayed diagnosis, inappropriate treatment, iatrogenic infections), making medical error the third leading cause of death, after heart disease and cancer. Given our current understanding of medical best practices, this is simply unacceptable. Evidence-based care, the current gold standard, describes a systematic approach that synthesizes technology, science, data, and clinical judgement. This is an exceptionally difficult task for doctors and healthcare providers, especially in the context of information overload. With advancing technologies, including automation and intelligence in data sharing, integration, and analytics, Artificial Intelligence (AI) can enable personalized, evidence-based care delivered in a safe, timely, effective, affordable, and patient-centered way for the benefit of all stakeholders. 
Our research develops, applies, and promotes AI technologies to: integrate and analyze EMR data to derive diagnostic summaries of cohorts of similar patients; predict potential medical risks from patient data to avoid high medical expenses; identify cohorts in EMR data to shorten clinical trial cycles; build tools and apps that enable evidence-based care by presenting summarized diagnostic patterns from cohorts in a form physicians can grasp effortlessly; and enable human-centered medical imaging diagnosis by applying multimodal deep learning to diagnosis notes and medical images for visual grounding, visual question answering, and automatic annotation. We are building these research themes around a human-centered AI approach to data-driven, evidence-based care. Our current projects include:

  • AI in medical imaging diagnosis: developing multimodal deep learning methods on diagnosis notes and medical images (e.g., chest x-rays) to enable visual grounding; exploring knowledge graph based approaches to support visual question answering; merging text graphs and scene graphs of medical images to support automatic annotation.
  • Deep learning on electronic health records: developing pre-trained models for downstream risk and disease prediction; building relational learning models to improve the accuracy of risk prediction; exploring time-series transformers to embed dynamic patient visits and history.
  • Human-centered medical imaging diagnosis: combining radiomics with CNNs to create hybrid approaches that support radiologists' diagnoses during the clinical decision-making workflow.
  • Evidence-based care: utilizing EMR data to generate patterns and evidence from cohorts or patient histories to support evidence-based care, such as data-driven alerts, diagnosis rules generated from medical notes using Natural Language Processing approaches, and suicide risk prediction for patients with mental illnesses.
  • Mining knowledge graphs for drug discovery: developing and adapting deep graph learning algorithms to address the heterogeneity of the graph, including the attributes and content of nodes and edges; exploring graph transformers for metapath-driven approaches to enable explainability of AI results.
  • Natural Language Processing in healthcare: exploring a range of NLP tools and models (e.g., cTAKES, MetaMap, scispaCy, BioBERT, BlueBERT, Amazon Comprehend) to improve entity and relationship extraction, with a particular focus on handling negation in the text.
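To illustrate what handling negation involves, here is a minimal, NegEx-inspired sketch. It is a simplified illustration, not the pipeline used in our projects and not the API of any tool listed above: an entity mention is treated as negated if a trigger phrase such as "no" or "denies" appears within a few tokens before it. The trigger list, the window size, and the function name `is_negated` are hypothetical choices for illustration.

```python
import re

# Hypothetical, deliberately small list of pre-negation triggers (NegEx-style).
NEGATION_TRIGGERS = ["no evidence of", "negative for", "denies", "without", "no"]

# Hypothetical window: how many tokens before the entity a trigger may appear.
WINDOW = 5


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence: str, entity: str, window: int = WINDOW) -> bool:
    """Return True if a negation trigger precedes `entity` within `window` tokens."""
    tokens = tokenize(sentence)
    ent = tokenize(entity)
    for start in range(len(tokens) - len(ent) + 1):
        if tokens[start:start + len(ent)] != ent:
            continue
        # Inspect only the tokens immediately preceding this entity mention.
        context = tokens[max(0, start - window):start]
        context_str = " " + " ".join(context) + " "
        for trigger in NEGATION_TRIGGERS:
            if f" {trigger} " in context_str:
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))
print(is_negated("Chest x-ray shows pneumonia.", "pneumonia"))
```

This toy version only looks backward from the entity and has no scope terminators (e.g., "but"), so a phrase like "pneumonia was ruled out" is not caught; production negation detectors handle both post-entity triggers and scope boundaries.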

Data-driven Science of Science

Science has always played a pivotal role in human civilization. It is a magic maker that creates wonders beyond human imagination. Science, as an area of research, a common practice of scientists, or an enterprise run by funding agencies and governments, has existed for centuries. The fast pace of publication has now accumulated large-scale data that document in detail both scholarly activities (e.g., collaboration, knowledge generation and accumulation, knowledge transfer, labor division, innovation generation and diffusion, knowledge entities for discovery, funding support) and social interactions and implications (e.g., team formation and dynamics, self-promotion and self-citation, leadership and culture, novelty, science inequality, citing behavior and bias, interdisciplinary, translational, and parachuting collaboration, barriers to knowledge transfer, gender bias, funding bias (money and research)). These big scholarly data allow us to analyze and understand phenomena of science itself. The topics we focus on include, but are not limited to: 1) team formation and dynamics, 2) leadership and culture, 3) science inequality, 4) AI innovation diffusion using boundary theory, 5) barriers to AI innovation in medicine, 6) causal inference in information science, 7) selfish knowledge and the role of self-promotion, 8) labor division in modern science, 9) lean collaboration on GitHub, and 10) diversity in science.

Knowledge Graph and Mining

Knowledge graphs are becoming the backbone of the next wave of AI, whether it is called cognitive AI, AI 3.0, or something else. A necessary next step for AI is to build knowledge graphs as its digital brain, much as human intelligence is generated and processed by the biological brain. We expect machine learning and deep learning to reach their next bottleneck soon: pure computation alone is limited in its ability to capture intelligence. Semantics and context should therefore be added to the workflow and fully integrated into machine learning and deep learning algorithms. We built the first PubMed knowledge graph, covering 29 million PubMed articles, and made it open to the public. Our group also built a knowledge graph for drug discovery by integrating 25 different databases (Chem2Bio2RDF). We are working to extend this knowledge graph to healthcare and clinical practice by extracting and mining PubMed articles, clinical trials, and available EMR data. Our research agenda is to create and apply the latest machine learning and deep learning algorithms to derive intelligence by mining the integrated knowledge graphs (e.g., edge2vec), to promote research excellence by connecting experts through knowledge graphs, and to build AI tools powered by knowledge graphs that enable evidence-based care in health.
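Metapath-driven mining of a heterogeneous knowledge graph can be sketched on a toy example: every node has a type, and a metapath such as Drug-Gene-Disease is a sequence of node types. The sketch below enumerates the concrete paths that instantiate a given metapath from a starting node; such paths are one way to explain a candidate drug-disease link through its intermediate nodes. The node names, edges, and the functions `build_adjacency` and `metapath_instances` are hypothetical. This is not edge2vec or Chem2Bio2RDF; real graphs of this size would be queried through a graph store or an embedding model.

```python
from collections import defaultdict

# Hypothetical toy heterogeneous graph: each node has a type.
NODE_TYPES = {
    "drug_A": "Drug", "drug_B": "Drug",
    "gene_1": "Gene", "gene_2": "Gene",
    "disease_X": "Disease",
}
# Hypothetical undirected edges (e.g., drug-targets-gene, gene-associated-with-disease).
EDGES = [
    ("drug_A", "gene_1"), ("drug_A", "gene_2"),
    ("drug_B", "gene_2"),
    ("gene_1", "disease_X"), ("gene_2", "disease_X"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_instances(start, metapath, adj, node_types):
    """Enumerate every simple path from `start` whose node types follow `metapath`.

    `metapath` is a list of node types, e.g. ["Drug", "Gene", "Disease"].
    """
    if node_types.get(start) != metapath[0]:
        return []
    paths = [[start]]
    for next_type in metapath[1:]:
        extended = []
        for path in paths:
            for nbr in sorted(adj[path[-1]]):
                # Follow only neighbors of the required type, never revisiting a node.
                if node_types.get(nbr) == next_type and nbr not in path:
                    extended.append(path + [nbr])
        paths = extended
    return paths


adj = build_adjacency(EDGES)
for p in metapath_instances("drug_A", ["Drug", "Gene", "Disease"], adj, NODE_TYPES):
    print(" -> ".join(p))
```

Here drug_A reaches disease_X through two genes, so the metapath yields two supporting paths; counting or weighting such instances is a common basis for metapath-based features.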

Scholarly Communication for Knowledge Discovery

Science today is conducted very differently than it was twenty years ago. The vast amount of publicly available knowledge (e.g., datasets, publications, patents, case studies, and tools) and exponentially growing computing power have enabled this change. The volume of published knowledge now exceeds what any single scientist can consume: no one can read all the related articles, let alone browse all the related datasets, patents, and tools, so knowledge transfer is constrained by the limits of human cognition. A new way of conducting science is urgently needed. Here we propose the new concept of entitymetrics, which uses entities (i.e., evaluative entities or knowledge entities) in the measurement of impact, knowledge usage, and knowledge transfer to facilitate knowledge discovery. This extends scholarly communication by emphasizing the importance of entities, which are categorized as macro-level entities (e.g., author, journal, article), meso-level entities (e.g., keyword), and micro-level entities (e.g., dataset, method, domain entities). These entities can be analyzed along the temporal dimension to capture dynamic changes or along the spatial dimension to identify geographical differences. Entitymetrics focuses on both knowledge usage and knowledge discovery and can be viewed as the next generation of scholarly communication, as it aims to demonstrate how scholarly communication approaches can be applied to knowledge entities and ultimately contribute to knowledge discovery.
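As a minimal sketch of the temporal perspective, assuming knowledge entities (e.g., datasets or methods) have already been extracted from each article, entity usage can be counted per year to trace how it changes over time. The records, entity names, and the functions `entity_counts_by_year` and `entity_trend` are hypothetical illustrations, not data or code from our studies.

```python
from collections import Counter, defaultdict

# Hypothetical records: (publication year, set of knowledge entities mentioned in the article).
ARTICLES = [
    (2018, {"dataset_1", "method_A"}),
    (2019, {"dataset_1", "method_B"}),
    (2019, {"method_B", "method_A"}),
    (2020, {"method_B", "dataset_1"}),
    (2020, {"method_B"}),
]


def entity_counts_by_year(articles):
    """Map each year to a Counter of how many articles mention each entity."""
    counts = defaultdict(Counter)
    for year, entities in articles:
        # Entities are a set, so each article contributes at most 1 per entity.
        counts[year].update(entities)
    return counts


def entity_trend(counts, entity):
    """Return (year, article count) pairs for one entity, in chronological order."""
    return [(year, counts[year][entity]) for year in sorted(counts)]


counts = entity_counts_by_year(ARTICLES)
for name in ("dataset_1", "method_A", "method_B"):
    print(name, entity_trend(counts, name))
```

The same grouping could be keyed by country or institution instead of year to look at the spatial dimension.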