Pipeline Pilot is a desktop software application developed by Dassault Systèmes under its BIOVIA brand. Initially focused on processing chemical data, the software has evolved into a general platform for extract, transform, and load (ETL) processes and data analytics, with capabilities that extend to a range of scientific and industrial applications.

Developer(s): Accelrys
Initial release: 1999
Stable release: 18.1 / May 2018
Written in: C++
Operating system: Windows and Linux
Type: Visual and dataflow programming language
License: Proprietary
Website: accelrys.com/products/collaborative-science/biovia-pipeline-pilot/

Pipeline Pilot uses a visual and dataflow programming interface, allowing users to design workflows for data processing. The software's functionality spans several domains, including cheminformatics, QSAR,[1][2] next-generation sequencing,[3] image analysis,[4] and text analytics.[5]

Pipeline Pilot is primarily used in industries that require extensive data processing and analysis, including life sciences, materials science, and engineering. The software allows users to create workflows by dragging and dropping functional components that automate data analysis tasks, integrate with databases, and perform various scientific computations. These workflows are referred to as "protocols" and can be shared and reused within teams or organizations.

The product supports multiple programming languages, including Python, .NET, MATLAB, Perl, SQL, Java, VBScript, and R, giving users flexibility in integrating custom code into their workflows. Pipeline Pilot also supports PilotScript, its own scripting language modeled on PL/SQL, which allows users to perform custom data manipulations within their workflows.

Pipeline Pilot has continued to expand its capabilities with additional modules and toolsets for specific scientific tasks, such as next-generation sequencing analysis, cheminformatics, and polymer property prediction.

History

Pipeline Pilot was initially developed by SciTegic, a company that Accelrys acquired in 2004. In 2014, Dassault Systèmes acquired Accelrys, which became part of its BIOVIA brand.

Pipeline Pilot was originally designed for chemistry applications. Its capabilities have since expanded to support extract, transform, and load (ETL) processes and general data analysis across many fields, and it is used to build automated workflows for data analysis and scientific computation in the life sciences, materials science, and engineering.

Overview

Pipeline Pilot is a software tool designed for data manipulation and analysis. It provides a graphical user interface for users to construct workflows that integrate and process data from multiple sources, including CSV files, text files, and databases. The software is commonly used in extract, transform, and load (ETL) tasks.

The interface, known as the Pipeline Pilot Professional Client, allows users to create workflows by selecting and arranging individual data processing units called "components." These components perform a variety of functions such as loading, filtering, joining, or modifying data. Additional components can carry out more complex tasks, such as constructing regression models, training neural networks, or generating reports in formats like PDF.

Pipeline Pilot uses a component-based architecture: components are the nodes of a workflow, and they are connected by "pipes" that carry data between them, forming a directed graph. Each component processes the data as it flows along the pipes.

Users can work with pre-installed components or develop custom ones. Workflows, referred to as "protocols," consist of linked components and can be saved, reused, and shared, which streamlines recurring data processing. The interface displays the connections between components, so that a complex workflow is presented as a sequence of operations.
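To illustrate the component-and-pipe model, the following Python sketch chains a CSV reader, a filter, and a manipulator that adds a computed property. It is not Pipeline Pilot code and does not use its API; the function names (`csv_reader`, `filter_component`, `manipulator_component`, `run_protocol`) are hypothetical, and records are assumed to be Python dictionaries. Each stage is a generator, so records stream through the chain one at a time.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption for this sketch: a record is a dict of property name -> value.
Record = Dict[str, object]
# A "component" takes an incoming stream of records and yields an outgoing stream.
Component = Callable[[Iterable[Record]], Iterator[Record]]


def csv_reader(text: str) -> Iterator[Record]:
    """Source component (hypothetical): emit one record per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def filter_component(predicate: Callable[[Record], bool]) -> Component:
    """Pass on only the records for which predicate(record) is true."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            if predicate(record):
                yield record
    return run


def manipulator_component(name: str, compute: Callable[[Record], object]) -> Component:
    """Add (or overwrite) a property on every record that flows through."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            out = dict(record)  # copy so the upstream record is not mutated
            out[name] = compute(record)
            yield out
    return run


def run_protocol(source: Iterable[Record], components: List[Component]) -> List[Record]:
    """Connect the components in order (the 'pipes') and collect the final output."""
    stream: Iterable[Record] = source
    for component in components:
        stream = component(stream)
    return list(stream)


data = "name,mw\naspirin,180.16\ncaffeine,194.19\nwater,18.02\n"
protocol = [
    filter_component(lambda r: float(r["mw"]) > 100),
    manipulator_component("mw_kDa", lambda r: round(float(r["mw"]) / 1000, 3)),
]
results = run_protocol(csv_reader(data), protocol)
for r in results:
    print(r)
```

Because the stages are generators, nothing is read until `run_protocol` drains the last stage, and each record passes through every component before the next row is read, loosely mirroring data being processed as it moves between components. Real protocols can also branch and merge, which this linear sketch does not model.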

Component collections

Pipeline Pilot offers several add-ons called "collections," which are groups of specialized functions aimed at specific domains, such as genetic information processing or polymer analysis. These collections are available to users for an additional licensing fee.

The collections are organized into two main groups: science-specific and generic. The science-specific collections focus on areas like chemistry, biology, and materials modeling, while the generic collections provide tools for reporting, data analysis, and document search. Below is an overview of the available collections:[6]

Science-specific
  Chemistry: Chemistry, ADMET, Cheminformatics
  Biology: Gene Expression, Sequence Analysis, Mass Spectrometry for Proteomics, Next Generation Sequencing
  Materials Modeling & Simulation: Materials Studio, Polymer Properties (Synthia)
Generic
  Reporting & Visualization: Reporting
  Database & Application Integration: Integration
  Imaging: Imaging
  Analysis & Statistics: Data Modeling, Advanced Data Modeling, R Statistics
  Document Search & Analysis: Chemical Text Mining, Text Analytics
  Laboratory: Plate Data Analytics, Analytical Instrumentation

Custom scripts

Pipeline Pilot is commonly used for processing large and complex datasets, which can exceed 1 TB in size. Early in its development, Pipeline Pilot introduced a scripting language called "PilotScript," which allows users to write basic scripts that can be integrated into a protocol. Support for additional programming languages was added over time, including Python, .NET, MATLAB, Perl, SQL, Java, VBScript, and R. These languages can be used through APIs that execute commands without requiring the graphical user interface.[7]

PilotScript, a language modeled on PL/SQL, is used within specific components such as "Custom Manipulator (PilotScript)" and "Custom Filter (PilotScript)." The following simple PilotScript statement adds a property named "Hello", with the value "Hello World!", to each record passing through the component:

 Hello := "Hello World!";
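For readers more familiar with general-purpose languages, the effect of this statement can be approximated in Python. This is an analogy only, not how Pipeline Pilot executes PilotScript; records are assumed to be dictionaries, and the function name `custom_manipulator_hello` is hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Assumption: each record flowing through the component is a dict of properties.
Record = Dict[str, object]


def custom_manipulator_hello(records: Iterable[Record]) -> Iterator[Record]:
    """Rough analogue of the PilotScript statement  Hello := "Hello World!";
    every record passing through gains a property named "Hello"."""
    for record in records:
        updated = dict(record)  # copy so the incoming record is left unchanged
        updated["Hello"] = "Hello World!"
        yield updated


incoming = [{"ID": 1}, {"ID": 2}]
outgoing = list(custom_manipulator_hello(incoming))
print(outgoing)  # prints [{'ID': 1, 'Hello': 'Hello World!'}, {'ID': 2, 'Hello': 'Hello World!'}]
```

The statement is written once but applied to every record, which is the per-record behavior the component provides.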

References

  1. Hassan, Moises; Brown, Robert D.; Varma-O'Brien, Shikha; Rogers, David (2007). "Cheminformatics Analysis and Learning in a Data Pipelining Environment". ChemInform. 38 (12). doi:10.1002/chin.200712278. ISSN 0931-7597.
  2. Hu, Ye; Lounkine, Eugen; Bajorath, Jürgen (2009). "Improving the Search Performance of Extended Connectivity Fingerprints through Activity-Oriented Feature Filtering and Application of a Bit-Density-Dependent Similarity Function". ChemMedChem. 4 (4): 540–548. doi:10.1002/cmdc.200800408. ISSN 1860-7179. PMID 19263458. S2CID 35868099.
  3. "Accelrys Enters Next Generation Sequencing Market with NGS Collection for Pipeline Pilot". Business Wire. 23 February 2011. Retrieved 15 February 2013.
  4. Rabal, Obdulia; Link, Wolfgang; G. Serelde, Beatriz; Bischoff, James R.; Oyarzabal, Julen (2010). "An integrated one-step system to extract, analyze and annotate all relevant information from image-based cell screening of chemical libraries". Molecular BioSystems. 6 (4): 711–720. doi:10.1039/b919830j. ISSN 1742-206X. PMID 20237649.
  5. Paveley, Ross A.; Mansour, Nuha R.; Hallyburton, Irene; Bleicher, Leo S.; Benn, Alex E.; Mikic, Ivana; Guidi, Alessandra; Gilbert, Ian H.; Hopkins, Andrew L.; Bickle, Quentin D. (2012). "Whole Organism High-Content Screening by Label-Free, Image-Based Bayesian Classification for Parasitic Diseases". PLOS Neglected Tropical Diseases. 6 (7): e1762. doi:10.1371/journal.pntd.0001762. ISSN 1935-2735. PMC 3409125. PMID 22860151.
  6. "Pipeline Pilot Component Collections". Accelrys. Archived from the original on 15 January 2013. Retrieved 26 January 2013.
  7. "Pipeline Pilot Integration Component Collection Datasheet" (PDF). Accelrys. Retrieved 8 February 2013.