A Virtual Solution to a Real Conundrum

by Tony Pietsch

One of the greatest pain points in life science IT is the large discrepancy between the development time of a typical LS product, with many regulatory checkpoints, and the ever shorter cycle time for hardware and software advances. Addressing the rate of computer hardware and operating system turnover (including software patches) through iterative virtualization can substantially lower this barrier to modernization in the life science industry.

In 1936, British mathematician Alan Turing published a paper outlining a simple theoretical computing machine that is widely considered the conceptual precursor to the modern computer. Among the novel capabilities of the Turing Machine, as his invention came to be known, was its capacity to imitate another Turing Machine—a concept he called the universal computing machine. He postulated that given enough processing power, the imitating machine could emulate the imitated machine at a speed comparable to the original machine's performance. Although Turing's original work was strictly hypothetical, anyone familiar with mainframes or who has run a DOS box under Windows XP knows that his model, first devised more than seven decades ago, not only proved viable but has been entrenched in computing for at least 30 years.

As IT professionals working in life science, we are charged with supporting R&D processes that on average span 15 years and ten generations of hardware, a period that typically encompasses significant changes, especially in hardware interfaces to external devices such as laboratory instruments. Increasingly, we are also being called upon to support applications that present data in more than two dimensions and transcend the ability to capture the data on paper, such as genome searches, protein and receptor docking modeling, time-lapse video of cellular activity, and chemical compound screening for activity. With the FDA mandating that all data relevant to the development of a drug, medical device or life science product be available for two years after the last of the product is dispensed, it is essential that these "views" of the collected data be recorded so they can be submitted as supporting documentation when the product is reviewed. One of our biggest quandaries is how to keep legacy data storage and application viewers accessible on current-generation hardware and operating systems at the same time that more and more applications are producing data that cannot be adequately viewed on paper.

Turing's pioneering work may hold the key.

Using Virtualization for Legacy Data Analysis

Currently, the holy grail of life science application design is a public format for data storage that is independent of the data collection program and process. The application needs to be structured for viewing data at a much later date without the collection hardware attached, and ideally without requiring a user to register the application with the vendor just to view the data. (Imagine if all the potential reviewers in government had to predict which applications they would need and still be able to obtain and register 40 years from now!)

Similar considerations govern the use of hardware. In life science it is often necessary to utilize the same selection assay or other instrumentation technique on the drug candidate submitted for approval that was used when the initial target was identified—a time frame that usually exceeds a decade. If the instrumentation changes in the midst of development, the biological assay or technique must usually be validated on the new instrument, resulting in prohibitively expensive time and labor costs to prove consistency and accuracy. As a result, most organizations opt simply to maintain their old instrumentation, computers and applications for the duration of the project.

If, however, through iterative virtualization—an update of Turing's universal machine—computers could be upgraded without losing or altering the validated applications and their instruments, the barrier to performing a companywide upgrade of hardware or base operating systems would be significantly lowered.

Consider how it could work.

Changing the Perspective on Virtualization

Today's data centers frequently contain aggregations of server computers and components that are capable of running multiple applications, each in its own protected real space. The protection is achieved by wrapping the application with special software that presents itself as a "virtual operating system" native to the application it runs. Multiple copies of any virtual OS—from any vendor or generation—may reside on the same hardware; they are interfaced to the "real world" by an underlying operating system commonly called a hypervisor, as diagrammed in figure 1.

Figure 1: Typical Horizontal Virtualization for Servers

This represents the most widespread view of virtualization today: running multiple applications in a horizontal structure, with the hypervisor giving each process its assigned priority as needed.

Similarly, endpoint computers (attached to an endpoint of a network) running common Windows, Apple or Linux operating systems can also emulate each other, as well as earlier versions of the most recent operating system, including other operating environments. Vendors have universally adopted this approach to allow the greatest flexibility for running the widest range of "best in class" or "I'm just used to it" applications, including legacy applications that require some prior version of each operating system or environment. A common view of this virtualization, also horizontal, is shown in figure 2.

Figure 2: Desktop Virtualization using Windows Vista as Example

If we step back from this view and limit ourselves to the emulation of just a single operating system running only the prior version of that operating system from the same vendor, however, we have the simplest essence of Turing's virtualization, in a vertical view. If this is applied recursively—that is, each operating system in the machine, base or virtual, just runs an emulation of the prior version of itself—we theoretically can reach back in time through the chain of operating systems to run any legacy software application, even applications that extend back to hardware or software many generations earlier. Life science companies employing this strategy would no longer need to completely revalidate the biology, chemistry and instrumentation when updating to the next generation of hardware and software. (The same approach could also be applied to any industry requiring long-term, complex data access for regulatory review.)

A simple three-tier view of this vertical concept is shown in figure 3, along with an existing example of this potential recursion.

Figure 3: Theoretical Vertical Virtualization with Possible Realization
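
The vertical chain can be sketched in a few lines of Python. This is only an illustration of the recursion, not a description of any vendor's product: the version names (`os-v1` through `os-v4`), the `PRIOR_VERSION` table and the `emulation_chain` function are all hypothetical, and the sketch assumes each operating system can host exactly one thing, an emulation of its immediate predecessor.

```python
# Hypothetical table: each OS version hosts an emulation of the one before it.
PRIOR_VERSION = {
    "os-v4": "os-v3",
    "os-v3": "os-v2",
    "os-v2": "os-v1",
}


def emulation_chain(host, target, prior_version):
    """Return the stack of OS layers from the host down to the target.

    Each step descends one generation, mirroring the vertical view in
    which every layer runs only the prior version of itself.
    """
    chain = [host]
    seen = {host}
    current = host
    while current != target:
        nxt = prior_version.get(current)
        # Stop if the chain ends or loops before reaching the target.
        if nxt is None or nxt in seen:
            raise LookupError(f"{target!r} is not reachable from {host!r}")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


# A legacy application validated on os-v1, run from current os-v4 hardware.
print(" -> ".join(emulation_chain("os-v4", "os-v1", PRIOR_VERSION)))
```

The point of the sketch is that the legacy application only ever sees the bottom layer of the chain; upgrading the host means adding one entry at the top of the table rather than touching the validated application.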

Making Virtualization Work

This vertical virtualization strategy places the burden of validation equivalency on operating system and hardware vendors, but it limits the scope of life science validation to the OS arena and applies across all legacy applications. While much of the necessary capability already exists in products available today, proof of equivalency is not always up to regulatory standards.

From a pure operating system or hardware source standpoint, equivalency has long since ceased to be an issue: A Turing Machine is a Turing Machine. It doesn't matter what the underlying processor is or how it works; it can emulate any other operating system and enter the recursion chain at any version level to support legacy applications, potentially entering the chain for eternity. For life science applications, however, the emulation must be validated to regulatory standards, which is a critical hurdle to adoption of new products.

In hardware, to be realistic, 100% equivalency is likely to remain elusive: Consider how many bus architectures have come and gone in just the past ten years. The "legacy" barrier to adoption of newer operating systems will be removed, however, so long as emulation extends downward to the most common device support and the current hardware generation is able to communicate with legacy devices. From the standpoint of an endpoint computer vendor, the hardware needs to support some means of connecting legacy devices. This could involve little more than utilizing the current best standard device communication and adding an external converter to the legacy hardware standard, much as USB-to-9-pin-serial adapter cables do today. Key to the success of this approach, however, is a device driver for the current operating system that permits the virtualized operating system to communicate transparently with the legacy device.

Beyond these considerations, virtualization in life science must continue to support the common KVM (keyboard, video, mouse) user interface and, through the layers of virtualized device drivers, must still be able to reach the most basic or common external device connection hardware and protocols. Fortunately, current products cover more than 90% of today's instrumentation, opening the door to defining best practices in the field.

Toward Best Practices

Instrument Software

To support the transition to a virtualized chain of operating systems, instrument applications running on commonplace computers need to divide and isolate data collection from data analysis. This means that only the run-time portion of the application should communicate with the instrument; the raw data, or data passed through a predefined filtering algorithm, should be stored, typically to hard disk or similar semi-permanent media.

The data generated by the run-time application will be read in, processed and presented by a viewing application that does not need the run-time application; it can operate independently. This approach allows a local or remote user to manipulate the data to get the clearest signal and also permits later review of the results, as necessary.
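
The separation described above can be sketched as two independent functions that share nothing but a file. This is a minimal illustration under stated assumptions, not a proposed standard: the JSON layout, the field names (`format_version`, `instrument`, `units`, `samples`), the function names and the sample instrument name are all hypothetical, and a simple moving average stands in for whatever signal processing a real viewer would offer.

```python
import json
import os
import tempfile


def write_run(path, instrument, units, samples):
    """Run-time side: the only code that would talk to the instrument.

    Stores the raw readings together with the metadata needed to
    interpret them later, in a plain, openly readable format.
    """
    record = {
        "format_version": 1,  # hypothetical version tag for the layout
        "instrument": instrument,
        "units": units,
        "samples": list(samples),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh)


def moving_average(values, window):
    """Trailing moving average; a window of 1 returns the data unchanged."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def view_run(path, window=1):
    """Viewer side: needs only the stored file, never the instrument."""
    with open(path, encoding="utf-8") as fh:
        record = json.load(fh)
    return record["units"], moving_average(record["samples"], window)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run_0001.json")
        write_run(path, "plate-reader-A", "RFU", [1.0, 3.0, 5.0])
        print(view_run(path))            # raw view
        print(view_run(path, window=2))  # smoothed view of the same file
```

Because the viewer never modifies the stored file, a reviewer can apply different processing to the same raw record years later, with neither the instrument nor the run-time application present.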

Instrument Hardware

If at all possible, any life science instrument should have one—and only one—bi-directional communication connection to the computer controlling it. That connection should be selected to utilize the most advanced and robust standardized communications protocol available at the time of initial design so that it remains generically supported for as long as possible.

Another key aspect of the connection is that it be "ground-isolated" from the controlling computer to prevent any ground loops. Today that means using opto-isolators or the transformer isolation inherent in most Ethernet interfaces. An alternative is to go wireless. After all, who knows where the virtualized control application may actually reside in ten years?

***

It is a paradox of our field that although we contend with some of the most advanced applications in computing, we must forever be looking backward as well as ahead. Thanks to a forward-thinking mathematician who had the imagination to conceive of the challenges we face long before they existed, we are now poised to solve one of the thorniest dilemmas of our profession.


Tony Pietsch has been involved with LSIT since 2005 and co-leads development of the infrastructure chapter for LSIT's Good Informatics Practices. A consultant to the biotech industry, specializing in software architecture, communications protocols and real-time systems, he has solved intractable problems for such clients as Beckman Coulter, Vertex Pharmaceuticals and Aurora Discovery and has worked with Pfizer, Merck, Bristol Myers Squibb and Johnson and Johnson PRD. During his four years with Aurora Biosciences, he spearheaded software development for multiple large-scale instrument systems for the pharmaceutical industry. He is an expert in translating complex user needs into robust products and has won two awards for user interface design. He can be reached at tony@tonypietsch.com.




© 2003-2008 The Life Sciences Information Technology Global Institute.
LSIT Global Institute, 14677 Via Bettona 110, Suite 800, San Diego, CA 92127 USA • Ph: (858) 759-4750 • Fx: (858) 759-6646
