Applying Data Science to Biomedical Research & Development
Over the past 10 years, data science and machine learning have been applied to almost every industry and academic field. From the general idea of making data-driven decisions and predictions, we have stepped into an era of using models in daily operations for virtually everything. This includes not only classical industries like tech, finance, and retail, but also some that might sound surprising, such as agriculture, human resources, civil engineering, and aviation. The pharmaceutical industry, with its abundance of data, is now a leading force in the use of data science.
One Size Does Not Fit All
Although data science and machine learning models are used across a wide array of industries, they are never a one-size-fits-all approach. In this sense, data science has much in common with other cross-functional jobs, such as software engineering. While there are software engineers in almost every company nowadays, their skill sets and responsibilities differ dramatically across industries – no one expects the daily work of a software engineer at a tech company to be the same as that of a software engineer at a construction company. The same holds for data scientists. At the current speed of industry development, the skill sets of data scientists in different industries will continue to diverge, and problems will soon become so varied that specialization may become a requirement for building a successful career, particularly in pharmaceutical data science.
What Makes Pharmaceutical Data Science So Unique?
First and foremost, data science in the pharmaceutical industry is shaped by the major areas to which data scientists contribute, including analysis of clinical trial results, health economics and outcomes research, and sales and marketing. All of these areas were critical for pharmaceutical companies long before data science became a well-known field, but modern machine learning techniques have brought the depth and quality of analytics to a new level. That said, most of these cutting-edge models were originally developed for use across industries and are not tailored to the needs of data science in biotech. They therefore cannot be applied without scrupulous tuning and adjustment, which requires a deep understanding of the underlying data and processes.
Precision Versus Recall
The British statistician George Box famously said, “All models are wrong, but some are useful.” Indeed, uncertainty, in the form of false positive and false negative errors, is built into every data science model. In drug development, however, where these models can drive life-or-death decisions, the stakes are much higher. By the nature of their work, pharmaceutical data scientists must therefore focus on minimizing the false negative rate (that is, maximizing recall) more often than their counterparts in other industries, and this priority shapes the toolkit they use for model building and validation.
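To make the trade-off concrete, here is a minimal Python sketch of how precision, recall, and the false negative rate change when the decision threshold of a model is lowered. The scores, labels, thresholds, and function names are all hypothetical and chosen only for illustration; they do not come from any real study or library.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def confusion_at(scores, labels, threshold):
    """Count outcomes when every score >= threshold is called positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def precision(c):
    """Share of predicted positives that are truly positive."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    """Share of true positives that the model catches."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def false_negative_rate(c):
    """Share of true positives that the model misses (1 - recall)."""
    return c.fn / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


# Hypothetical model scores and true labels (1 = condition present).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

strict = confusion_at(scores, labels, threshold=0.5)
lenient = confusion_at(scores, labels, threshold=0.35)

for name, c in (("threshold 0.50", strict), ("threshold 0.35", lenient)):
    print(name, precision(c), recall(c), false_negative_rate(c))
```

In this toy data, lowering the threshold removes the one missed positive case without adding a false alarm. In real data, a lower threshold usually costs some precision, which is why the choice depends on how expensive a missed case is compared with a false alarm.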
Data Quality
Minimizing the error rate is closely tied to ensuring the highest data quality and integrity. However skilled the data scientist, models can only be as good as the data they are built on – the “trash in, trash out” rule still holds. Any flaws in data consistency and accuracy may lead to misleading conclusions. With stakes as high as they are in the pharmaceutical industry, such data issues cannot be tolerated.
Furthermore, the data must not only be collected, stored, and structured in a way that preserves its integrity, but also handled in a way that ensures the security of critical personal information.
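As a rough illustration of the consistency and accuracy checks discussed above, the following Python sketch flags records with missing, duplicated, or implausible values before they reach a model. The field names (`patient_id`, `age`, `dose_mg`), the value ranges, and the sample records are hypothetical assumptions, not a real schema or dataset.

```python
def find_issues(records):
    """Return (row_index, message) pairs for records that fail basic checks."""
    issues = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Every record needs a unique identifier.
        pid = rec.get("patient_id")
        if not pid:
            issues.append((i, "missing patient_id"))
        elif pid in seen_ids:
            issues.append((i, "duplicate patient_id"))
        else:
            seen_ids.add(pid)

        # Assumed plausible range for age; adjust to the actual study.
        age = rec.get("age")
        if age is None or not (0 <= age <= 120):
            issues.append((i, "age missing or out of range"))

        # A dose must be present and strictly positive.
        dose = rec.get("dose_mg")
        if dose is None or dose <= 0:
            issues.append((i, "dose_mg missing or not positive"))
    return issues


# Hypothetical records, two of which contain problems.
records = [
    {"patient_id": "P001", "age": 54, "dose_mg": 10.0},
    {"patient_id": "P002", "age": 230, "dose_mg": 10.0},
    {"patient_id": "P001", "age": 61, "dose_mg": None},
]

issues = find_issues(records)
for row, message in issues:
    print(f"row {row}: {message}")
```

Checks like these are deliberately simple. Their value lies in running them on every data load so that problems are caught before modeling rather than after a misleading result.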
Changing the Face of Science at a Fast Pace
Despite these challenges and constraints, data science in the pharmaceutical industry is growing rapidly. Large companies such as Pfizer, Takeda, and Bayer have been employing data scientists for quite some time. In the last few years, newer players in pharmaceutical data science, such as Verily (formerly Google Life Sciences), have taken on a larger role, building teams that combine machine learning engineers, data scientists, and research scientists. A similar strategy of blending technology, science, and engineering is at the core of numerous biotech start-ups, and the demand for data scientists is on the rise. As the industry continues to evolve, how will data scientists help advance your research?
Do you have a data-related project to write up and publish? JetPub Scientific can help!