As Vice President of Data Science at Discern Health, Khurram Tehseen brings a wealth of experience to the role. In a recent conversation, Tehseen shared his perspective on the nuances of longitudinal data and its pivotal role in shaping data-driven decisions in healthcare.
Navigating Clinical, Claims, and Social Data for Better Predictive Models
Central to data science at Discern Health are publicly and commercially available datasets and research datasets that encompass clinical, claims, and social data.
The company’s data set includes longitudinal data on 77 million patients, providing a rich view of patient journeys. “The richer your data, the easier it is for a model to pick the underlying signal,” Tehseen said. “Having that history allows us to predict adverse events before they happen, so preventative measures can be taken.”
What sets longitudinal data apart is its ability to capture how patient experiences evolve over time and to reveal patterns that may not otherwise be visible. “The intricacies of how these different interactions are happening already exist. What the models are doing is uncovering the complex relationships a human would miss and go through the large volume of data a lot faster than a human can.”
By tracing the trajectory of events, data scientists gain unparalleled insights into underlying patterns and interactions, thereby enhancing predictive accuracy. “Predictive models help with inconsistencies in how data is captured. They are helping to uncover adverse risk events that have a high probability of occurring so that the clinicians or care managers can provide targeted preventative care.”
Other clinical data sets, though smaller in scale, offer richer detail. They give the data science and clinical teams at Discern Health a more comprehensive view and enable deeper analyses that refine the predictive models.
The Interplay of Automation and Responsible Data Use
In a landscape abuzz with automation and AI, Tehseen advocates for a balanced approach that prioritizes responsible data utilization. While embracing cutting-edge machine learning (ML) models, he underscores the importance of explainability and transparency in model outputs. “At Discern Health, the modeling work is done in conjunction with our clinical team. At every step, from target definition to feature engineering to algorithm choice, we work hand in hand with the clinicians to make sure the proper clinical oversight is present. This forces us to explain the various modeling decisions and ensure transparency in the model is maintained.”
Tehseen said it is also important to do whatever is possible to mitigate bias in models. “Every data set has bias in some way, so we try to build models that account for its deficits. We strive not to misrepresent what the data is telling us just because we’ve forgotten to take into account the actual composition of the data itself.”
Bridging the Divide: Interoperability Challenges in Healthcare Data
Beyond data science itself, Tehseen sheds light on broader industry challenges around interoperability and data standardization. Because healthcare data is spread across multiple code sets and faces interoperability hurdles, his team develops internal crosswalks and terminologies to navigate these complexities.
The company has hired medical terminologists to work collaboratively with its data scientists to better understand how conditions can be represented across code sets, which in turn informs the predictive models. “If we are going to build a diabetes model, well, how is diabetes represented in terms of code? There could be a myriad of ways of representing diabetes across diagnostic codes and even beyond the ‘problem’ domain. Understanding that becomes really important.”
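To make the idea of a crosswalk concrete, here is a minimal sketch in Python. It is not Discern Health's implementation; the table structure, function names, and sample patient records are all hypothetical. It assumes codes arrive in dotted form (for example "E11.9") and maps only three standard diagnosis categories: ICD-10-CM E10 and E11, and ICD-9-CM 250. A real crosswalk would cover many more code systems and domains.

```python
# Hypothetical crosswalk: (code system, category) -> internal condition label.
# The categories are standard ICD diagnosis categories; the table layout
# and all names below are illustrative assumptions, not a real product API.
CROSSWALK = {
    ("ICD-10-CM", "E10"): "diabetes",  # type 1 diabetes mellitus
    ("ICD-10-CM", "E11"): "diabetes",  # type 2 diabetes mellitus
    ("ICD-9-CM", "250"): "diabetes",   # diabetes mellitus
}


def category_of(code):
    """Return the part of a dotted code before the decimal point."""
    return code.strip().upper().split(".")[0]


def map_condition(system, code):
    """Return the internal label for a coded record, or None if unmapped."""
    return CROSSWALK.get((system, category_of(code)))


def has_condition(records, label):
    """True if any (system, code) pair in records maps to the label."""
    return any(map_condition(system, code) == label for system, code in records)


# Hypothetical patient history mixing two code systems.
patient_records = [("ICD-9-CM", "250.00"), ("ICD-10-CM", "I10")]

print(map_condition("ICD-10-CM", "E11.9"))
print(has_condition(patient_records, "diabetes"))
```

Keying the table on both the code system and the category keeps a code from one system from colliding with a similar-looking code from another, which is the sort of ambiguity a terminologist would help resolve.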