Machine Learning for Healthcare

We aim to improve data-driven clinical decision making through machine learning and data mining. Our work has spanned time series classification, word embeddings, and document series classification.

Early Classification of Time Series

Early Classification of Time Series is the search for early time steps at which accurate classifications can be made in time series data. For example, in diagnosis problems we want to provide accurate diagnoses to a doctor as early as possible. In particular, we focus on providing tunability in our solutions, allowing users to choose how much weight to place on earliness versus accuracy. Solutions to this problem make clinical decision-making systems more practical by giving clinicians enough time to react to machine predictions.
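
The earliness versus accuracy trade-off can be illustrated with a confidence-threshold halting rule. This is a minimal sketch of one simple baseline, not necessarily the method used in our work: it assumes some upstream classifier already emits class probabilities at every time step, and the names early_classify, prob_stream, and threshold are hypothetical. A lower threshold halts sooner; a higher one waits for more evidence.

```python
from typing import Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def early_classify(
    prob_stream: Sequence[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Return (predicted_class, halting_step).

    prob_stream[t] holds the class probabilities available at time step t
    (assumed to come from an upstream model). Halt at the first step whose
    top probability reaches the threshold; otherwise use the final step.
    """
    if not prob_stream:
        raise ValueError("prob_stream must contain at least one time step")
    for step, probs in enumerate(prob_stream):
        label = argmax(probs)
        if probs[label] >= threshold:
            return label, step
    last_step = len(prob_stream) - 1
    return argmax(prob_stream[last_step]), last_step


# Hypothetical per-step probabilities for two classes (0 and 1).
stream = [[0.55, 0.45], [0.70, 0.30], [0.20, 0.80], [0.05, 0.95]]
eager = early_classify(stream, threshold=0.6)     # halts early
cautious = early_classify(stream, threshold=0.9)  # waits for more evidence
```

In this toy stream the eager setting commits at the second time step, while the cautious setting waits until the last step and arrives at a different class, which is exactly the kind of trade-off a tunable solution lets the user control.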

Clinical Note Classification

A patient’s clinical notes form a sequence of free-form texts written by health care professionals over time, and each note in turn contains a sequence of words. Notes are also accompanied by external attributes at multiple layers, such as the time at which each note was created (note level) or the demographics of the patient (patient level). Thus, the notes in an electronic health record (EHR) form a nested structure of text sequences augmented with external multi-layer attributes. To model this complex data, we propose a number of solutions ranging from hierarchical attention networks to transformer-based architectures.
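
The nested structure can be made concrete with a small sketch. It assumes words are already mapped to vectors, uses plain dot-product attention with fixed context vectors in place of learned encoders, and simply appends each attribute to the pooled vector at its own layer. The class and field names (Note, Patient, hours_since_admission, age_years) are hypothetical, and this is an illustration of two-level attention pooling rather than our actual architecture.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Note:
    word_vectors: List[Vector]    # one vector per word in the note
    hours_since_admission: float  # note-level attribute (hypothetical)


@dataclass
class Patient:
    notes: List[Note]             # notes in chronological order
    age_years: float              # patient-level attribute (hypothetical)


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], context: Vector) -> Vector:
    """Weighted average of vectors; weights are softmax(dot(vector, context))."""
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    return [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(context))
    ]


def encode_patient(patient: Patient, word_ctx: Vector, note_ctx: Vector) -> Vector:
    """Pool words into note vectors, then notes into one patient vector."""
    note_vectors = [
        attention_pool(note.word_vectors, word_ctx) + [note.hours_since_admission]
        for note in patient.notes
    ]
    # note_ctx needs one more dimension than word_ctx (the appended attribute).
    return attention_pool(note_vectors, note_ctx) + [patient.age_years]


patient = Patient(
    notes=[
        Note(word_vectors=[[1.0, 0.0], [0.0, 1.0]], hours_since_admission=2.0),
        Note(word_vectors=[[1.0, 1.0]], hours_since_admission=4.0),
    ],
    age_years=60.0,
)
# All-zero context vectors give uniform attention, i.e. plain averaging.
encoded = encode_patient(patient, word_ctx=[0.0, 0.0], note_ctx=[0.0, 0.0, 0.0])
```

In a trained model the context vectors and the word representations would be learned, so the attention weights would highlight informative words and notes instead of averaging them uniformly.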

Clinical Word Embeddings

Pre-trained word embeddings play a large role in many NLP applications. However, choosing which source of word embeddings to use is a challenging problem. We show which situations benefit most from each type of word embedding, i.e., when to use general pre-trained embeddings, clinical pre-trained embeddings, or locally-learned embeddings. Moreover, we propose using meta-embeddings, which combine many pre-trained sources according to the information contained within their embedding vectors.
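
As a rough illustration of the meta-embedding idea, the sketch below concatenates L2-normalized vectors from several sources and fills in zeros when a source does not contain the word. Concatenation is a common simple baseline and is used here only as an assumption; it is not necessarily the combination method we propose. The function names and the toy embedding tables are hypothetical.

```python
import math
from typing import Dict, List, Sequence

EmbeddingTable = Dict[str, List[float]]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_meta_embedding(
    word: str, sources: Sequence[EmbeddingTable], dims: Sequence[int]
) -> List[float]:
    """Concatenate the word's normalized vector from every source.

    dims[i] is the dimensionality of sources[i]; a source that lacks the
    word contributes a zero block of that size.
    """
    combined: List[float] = []
    for table, dim in zip(sources, dims):
        vec = table.get(word)
        combined.extend(l2_normalize(vec) if vec is not None else [0.0] * dim)
    return combined


# Toy tables standing in for general and clinical pre-trained embeddings.
general = {"fever": [3.0, 4.0]}
clinical = {"fever": [0.0, 2.0, 0.0], "sepsis": [1.0, 0.0, 0.0]}

fever_vec = concat_meta_embedding("fever", [general, clinical], dims=[2, 3])
sepsis_vec = concat_meta_embedding("sepsis", [general, clinical], dims=[2, 3])
```

Normalizing each source before concatenation keeps one source from dominating simply because its vectors have larger magnitudes.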

Multi-modal EHR Mining

We built risk estimation frameworks for hospital-acquired infections (HAIs), including Clostridium difficile and MRSA infections. We embrace a multi-modal approach in which we extract features from different types of EHR data such as clinical notes (text), time-varying vital signs (time series), and patient demographics (tabular).
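
A minimal sketch of the multi-modal idea follows, assuming the simplest possible fusion: extract a few features from each modality and concatenate them into one vector for a downstream risk model. The feature choices (keyword counts, summary statistics of one vital sign, age) and all function names are hypothetical illustrations, not the features used in our frameworks.

```python
import re
from typing import List, Sequence


def text_features(note: str, keywords: Sequence[str]) -> List[float]:
    """Count how often each keyword occurs in a clinical note (text)."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return [float(tokens.count(keyword)) for keyword in keywords]


def vitals_features(series: Sequence[float]) -> List[float]:
    """Summarize a vital-sign time series as mean, min, max, and last value."""
    if not series:
        return [0.0, 0.0, 0.0, 0.0]
    return [sum(series) / len(series), min(series), max(series), series[-1]]


def demographic_features(age_years: float) -> List[float]:
    """Tabular features; only age is used in this sketch."""
    return [float(age_years)]


def fuse(note: str, keywords: Sequence[str], temperatures: Sequence[float],
         age_years: float) -> List[float]:
    """Concatenate the per-modality features into one vector."""
    return (
        text_features(note, keywords)
        + vitals_features(temperatures)
        + demographic_features(age_years)
    )


# Made-up example inputs for one patient.
features = fuse(
    note="Patient has diarrhea and fever. Fever persists.",
    keywords=["fever", "diarrhea"],
    temperatures=[37.0, 38.5, 39.0],
    age_years=70,
)
```

The resulting vector could be passed to any standard classifier; more elaborate designs would learn a representation per modality before combining them.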