Medical device companies have massive amounts of data, but imaging data often isn’t set up to support R&D efforts. Here’s how to change that.
Jim Olson, Flywheel
Data is the lifeblood of R&D groups at medical device companies. However, unlocking the full potential of your organization’s data assets isn’t straightforward, especially for complex data such as medical imaging.
Medical imaging assets are extremely valuable for R&D efforts, but they are often disorganized and inconsistently labeled. Before they can be used for analysis or for machine learning and AI applications, they must be standardized and made accessible.
But curating complex medical imaging data poses a major challenge in many organizations. Even after data are organized, existing infrastructure and research processes can continue to impede success and, ultimately, lengthen time to market.
To speed and scale medical device R&D, organizations must embrace extensible data practices and scalable data management solutions. To make the process easier, here are seven data management tips to help maximize the value of imaging data.
Get data out of silos
Information doesn’t add value when it exists in isolation. A data silo occurs when data is not easily discoverable and cannot flow freely between departments, a common situation in life science organizations. In many cases, data may be held in different databases with different structures and storage conventions. A comprehensive data platform can alleviate these issues by centralizing data in a shared repository, with access controls and version history. A well-designed and maintained data platform gives organizations flexibility in how data is stored, accessed and used, while giving users more visibility into available assets.
Conduct upfront data cleaning and standardization
The complex nature of medical imaging data means that researchers must leverage the assets’ metadata to make them useful for large-scale projects. However, metadata conventions often differ between data sources, devices and practitioners. Data scientists can standardize the conventions for an organization, both within the archives and as newly captured data come in. The standardization may include, but is not limited to, standardized labeling for imaging modalities and body parts.
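To make this concrete, here is a minimal sketch of what standardizing labels can look like in practice. The field names, label variants and canonical values below are illustrative assumptions, not a standard vocabulary; a real implementation would draw on an agreed-upon terminology.

```python
# Illustrative sketch: normalizing free-text modality and body-part labels
# in imaging metadata. The mappings below are hypothetical examples.

# Map the label variants seen across sources onto one canonical value.
MODALITY_MAP = {
    "mri": "MR", "mr": "MR", "magnetic resonance": "MR",
    "ct": "CT", "cat scan": "CT",
    "us": "US", "ultrasound": "US",
}

BODY_PART_MAP = {
    "head": "HEAD", "brain": "HEAD",
    "chest": "CHEST", "thorax": "CHEST",
}

def normalize(record: dict) -> dict:
    """Return a copy of the metadata record with canonical labels."""
    out = dict(record)
    modality = record.get("modality", "").strip().lower()
    body_part = record.get("body_part", "").strip().lower()
    out["modality"] = MODALITY_MAP.get(modality, "UNKNOWN")
    out["body_part"] = BODY_PART_MAP.get(body_part, "UNKNOWN")
    return out

print(normalize({"modality": "Magnetic Resonance", "body_part": "Brain"}))
# e.g. {'modality': 'MR', 'body_part': 'HEAD'}
```

Unmapped values are flagged as "UNKNOWN" rather than passed through, so inconsistencies surface immediately instead of hiding in downstream analyses.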
Ensure data hygiene practices can work at scale
The data standardization described above is only meaningful if it can be applied in an automated way at the enterprise level. Manually curating, cataloging and organizing data — even if it’s performed by a trained team adhering to agreed-upon standards — is too time-consuming, and still carries the risk of inconsistency. Automating these processes as much as possible can prevent many challenges later.
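As a sketch of what automated hygiene checks might look like at scale, the example below audits an entire archive in one pass. The required fields and allowed modality values are assumptions chosen for illustration.

```python
# Illustrative sketch of automated hygiene checks run across an archive.
# The required fields and allowed values are example assumptions.
REQUIRED_FIELDS = {"subject_id", "modality", "acquired"}
ALLOWED_MODALITIES = {"MR", "CT", "US"}

def audit(records):
    """Yield (index, problems) for every record that fails a check."""
    for i, rec in enumerate(records):
        problems = []
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            problems.append(f"missing fields: {sorted(missing)}")
        if rec.get("modality") not in ALLOWED_MODALITIES:
            problems.append(f"unrecognized modality: {rec.get('modality')!r}")
        if problems:
            yield i, problems

archive = [
    {"subject_id": "S001", "modality": "MR", "acquired": "2024-01-05"},
    {"subject_id": "S002", "modality": "mri"},  # fails two checks
]
for idx, issues in audit(archive):
    print(idx, issues)
```

Because the checks are code rather than a manual checklist, they can run on every new batch automatically and apply the same standard every time.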
Understand data modalities and associated measures used throughout the organization and ensure automation works for all
Your research teams may use image-based data such as DICOM and microscopy, time series-based data like electroencephalography, and CSV (comma-separated values) files or other self-describing text-based files in their work. Even when utilizing the same modality, different analysis approaches and output measures may need to be followed or captured. In designing an approach to modernize data management, teams should consider every data modality, data type and associated workflow, and ensure that standardization can be applied to all of them.
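One common way to make a single pipeline cover every modality is a handler registry that routes each file type to its own curation step. The sketch below assumes file-extension-based routing and stub handlers; the extensions and handler names are illustrative.

```python
# Sketch of a registry that routes each data type to its own curation
# handler, so one pipeline covers every modality. Handlers are stubs.
from pathlib import Path

HANDLERS = {}

def handles(*suffixes):
    """Register a curation handler for the given file extensions."""
    def register(fn):
        for s in suffixes:
            HANDLERS[s] = fn
        return fn
    return register

@handles(".dcm")
def curate_dicom(path):
    return f"DICOM curation: {path.name}"

@handles(".edf")
def curate_eeg(path):
    return f"time-series curation: {path.name}"

@handles(".csv", ".tsv")
def curate_table(path):
    return f"tabular curation: {path.name}"

def curate(path: Path) -> str:
    handler = HANDLERS.get(path.suffix.lower())
    if handler is None:
        raise ValueError(f"no handler for {path.suffix}")
    return handler(path)

print(curate(Path("scan_001.dcm")))  # DICOM curation: scan_001.dcm
```

A registry like this also makes gaps visible: a file type with no handler fails loudly instead of silently skipping standardization.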
Ensure you have an adequate volume of diverse data for AI training
The old saying “garbage in, garbage out” is especially true for AI training. Models trained using an inadequate volume of data — or data that doesn’t reflect the diversity of impacting variables such as scanner types or patient population — will likely underperform. To prevent this, it’s important to leverage all available data within your enterprise, but you may also look to supplement your own data with publicly available datasets or datasets licensed from collaborators. In either case, the need remains for consistent data curation and for using all of the data within validated workflows.
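A simple diversity check can catch imbalance before training begins. The sketch below tallies the spread of a few key variables across a dataset; the variable names and values are example assumptions.

```python
# Sketch: tallying the spread of key variables before AI training, so an
# imbalanced dataset is caught early. Variable names are example choices.
from collections import Counter

def diversity_report(records, variables):
    """Count how often each value of each variable appears."""
    return {v: Counter(r.get(v, "unknown") for r in records) for v in variables}

dataset = [
    {"scanner": "vendor_a", "site": "site_1"},
    {"scanner": "vendor_a", "site": "site_2"},
    {"scanner": "vendor_b", "site": "site_1"},
]
print(diversity_report(dataset, ["scanner", "site"]))
```

If one scanner vendor or one site dominates the counts, that is a signal to supplement the dataset before training rather than after a model underperforms.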
Leverage cloud-scale resources
Medical imaging data requires vast amounts of storage and computational power. Relying exclusively on on-premises resources can be both costly and limiting. Leveraging cloud-scale resources, on the other hand, allows for elastic compute infrastructure and more flexible storage. Organizations can spin up instances on the cloud as needed for unrivaled scalability.
Consider how to address comprehensive provenance
Provenance (establishing a documented trail to the original data source and associated processes and analysis steps) is required for reproducibility and regulatory approval. Research teams should look for systems that can automate provenance, with recording of access logs, versions and processing actions. Automating this work not only removes the burden from researchers but also reduces the risk of noncompliance and errors.
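At its core, automated provenance is an append-only record of every action taken on a dataset. The sketch below shows the idea in miniature; the source URI, actions and field names are hypothetical examples, not any particular system’s schema.

```python
# Minimal sketch of automated provenance capture: every processing step
# appends a timestamped entry, giving a documented trail from the
# original source through each action. Field names are illustrative.
from datetime import datetime, timezone

class ProvenanceLog:
    def __init__(self, source: str):
        # The trail always starts with the original data source.
        self.entries = [self._entry("ingested", source)]

    def _entry(self, action: str, detail: str) -> dict:
        return {
            "action": action,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }

    def record(self, action: str, detail: str):
        self.entries.append(self._entry(action, detail))

log = ProvenanceLog("pacs://site-a/study-123")  # hypothetical source URI
log.record("deidentified", "removed patient identifiers")
log.record("resampled", "1mm isotropic")
for e in log.entries:
    print(e["action"], "-", e["detail"])
```

Because entries are appended by the pipeline itself rather than written up afterward, the trail stays complete even when researchers never think about it.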
If your organization is facing challenges in leveraging insights from medical imaging data, you’re not alone. Fortunately, there are tools and resources available to automate and scale data capture, curation, and computation.
Incorporating these data management tips comes at an upfront investment of time and resources, but that investment can pay dividends through enriched data and accelerated innovation.
Jim Olson is the CEO of Flywheel, a biomedical research informatics platform that leverages the power of cloud-scale computing infrastructure to address the increasing complexity of modern computational science and machine learning.
The opinions expressed in this blog post are the author’s only and do not necessarily reflect those of Medical Design & Outsourcing or its employees.