Clinical Data Miner - Electronic Data Capture Software Providing a Data Querying Library Allowing Integration in Data Analysis Software
Abstract
Background: The use of Electronic Data Capture (EDC) software in clinical studies has greatly optimized data collection, decreasing personnel costs and reducing error rates. Subsequent data analysis, however, often requires a time-consuming and error-prone process of exporting the data to a file and converting the result to a format recognized by the analysis software.
Objective: Our aim is to create an EDC software framework that integrates straightforwardly with data analysis packages by means of a software library. Instead of exporting data and converting the result, data analysis software then reads study data directly from the EDC system as software objects.
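As an illustration of what "reading study data directly as software objects" can look like, here is a minimal Java sketch. The names `StudyRecord`, `StudyQuery`, `InMemoryStudyQuery` and `pseudonymsAnsweringYes` are hypothetical and are not the Clinical Data Miner API; an in-memory map stands in for the EDC server so the example runs on its own, and the data are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Minimal sketch (hypothetical names, not the CDM API): analysis code asks a
 * query interface for typed study records instead of parsing an exported file.
 */
public class DirectQueryExample {

    /** One pseudonymized record: a pseudonym plus question-id -> answer pairs. */
    record StudyRecord(String pseudonym, Map<String, Object> answers) {}

    /** Hypothetical query interface that an EDC system could expose to analysis code. */
    interface StudyQuery {
        List<StudyRecord> records(String studyId);
    }

    /** In-memory stand-in for the EDC server, so the sketch runs on its own. */
    static class InMemoryStudyQuery implements StudyQuery {
        private final Map<String, List<StudyRecord>> studies;

        InMemoryStudyQuery(Map<String, List<StudyRecord>> studies) {
            this.studies = studies;
        }

        @Override
        public List<StudyRecord> records(String studyId) {
            return studies.getOrDefault(studyId, List.of());
        }
    }

    /** Pseudonyms of all records whose yes/no answer to the given question is "yes". */
    static List<String> pseudonymsAnsweringYes(StudyQuery query, String studyId, String questionId) {
        return query.records(studyId).stream()
                .filter(r -> Boolean.TRUE.equals(r.answers().get(questionId)))
                .map(StudyRecord::pseudonym)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up toy data, for illustration only.
        StudyQuery query = new InMemoryStudyQuery(Map.of(
                "demo-study", List.of(
                        new StudyRecord("P001", Map.of("age", 54, "smoker", true)),
                        new StudyRecord("P002", Map.of("age", 61, "smoker", false)))));

        // No export/convert step: the result is already a Java object.
        System.out.println(pseudonymsAnsweringYes(query, "demo-study", "smoker")); // prints [P001]
    }
}
```

Because the result is an ordinary `List<String>`, it can be handed to any JVM-based analysis code without an intermediate file format.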
Methods: The EDC software framework was implemented in Java using a Test-Driven Development methodology. It has a modular architecture, with modules for common, server, and platform-agnostic client functionality, as well as a platform-specific module that implements a Web 2.0 data collection user interface. Software quality is monitored by a Jenkins continuous integration build server, which automatically validates test results and checks test coverage targets.
The software is hosted on a virtual private server, and the database is backed up daily to a remote site to protect against data loss.
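The module split described above can be sketched in Java as follows, collapsed into a single file for brevity. All type names are hypothetical, and the presenter/view separation used for the platform-agnostic client layer is an assumption chosen for illustration, not a description of the actual framework; a console view stands in for the Web 2.0 interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the module layering, collapsed into one file.
 * All type names are illustrative; the presenter/view split is an assumption.
 */
public class ModularLayersExample {

    // common module: types shared by server and client code
    record FormEntry(String questionId, String answer) {}

    // server module: persistence behind an interface
    interface EntryRepository {
        void save(FormEntry entry);
        List<FormEntry> findAll();
    }

    static class InMemoryEntryRepository implements EntryRepository {
        private final List<FormEntry> entries = new ArrayList<>();

        @Override
        public void save(FormEntry entry) { entries.add(entry); }

        @Override
        public List<FormEntry> findAll() { return List.copyOf(entries); }
    }

    // platform-agnostic client module: form logic with no UI-toolkit dependency
    interface FormView {
        void showError(String message);
        void showSaved(FormEntry entry);
    }

    static class FormPresenter {
        private final EntryRepository repository;
        private final FormView view;

        FormPresenter(EntryRepository repository, FormView view) {
            this.repository = repository;
            this.view = view;
        }

        /** Rejects empty answers; otherwise stores the entry and notifies the view. */
        void submit(String questionId, String answer) {
            if (answer == null || answer.isBlank()) {
                view.showError("An answer is required for " + questionId);
                return;
            }
            FormEntry entry = new FormEntry(questionId, answer);
            repository.save(entry);
            view.showSaved(entry);
        }
    }

    // platform-specific module: a console view here; in the real framework, the Web 2.0 UI
    static class ConsoleFormView implements FormView {
        @Override
        public void showError(String message) { System.out.println("Error: " + message); }

        @Override
        public void showSaved(FormEntry entry) { System.out.println("Saved: " + entry); }
    }

    public static void main(String[] args) {
        EntryRepository repository = new InMemoryEntryRepository();
        FormPresenter presenter = new FormPresenter(repository, new ConsoleFormView());
        presenter.submit("age", "");   // prints Error: An answer is required for age
        presenter.submit("age", "54"); // prints Saved: FormEntry[questionId=age, answer=54]
        System.out.println(repository.findAll().size()); // prints 1
    }
}
```

Since `FormPresenter` depends only on interfaces, it can be unit-tested with a fake view and an in-memory repository, which fits a Test-Driven Development workflow where tests are written before the code they exercise.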
Results: The Clinical Data Miner software framework is currently used in five International Endometrial Tumor Analysis studies and in six interrater agreement studies, with more studies planned. The Test-Driven Development methodology has resulted in very low software bug counts.
A data querying software library, written in Java, can query pseudonymized study data and provides preprocessors that prepare the data for analysis. Most notably, these preprocessors can transform the hierarchical structure of clinical data, which results from the so-called "skip pattern" (follow-up questions are asked only when an earlier answer calls for them), into vectors, which are better suited to vector-based machine-learning algorithms such as logistic regression or Least-Squares Support Vector Machines.
On top of this data querying library, a number of scripts were written for querying and preprocessing data interactively from within a Jython interpreter.
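To make the skip-pattern transformation concrete, here is a minimal Java sketch of flattening a hierarchical questionnaire into fixed-length vectors. It is not the Clinical Data Miner preprocessor: the question IDs, the encoding of yes/no as 1.0/0.0 (any non-zero answer counts as "yes"), and the choice to encode skipped questions as 0.0 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch (assumed encoding, not the CDM preprocessor): flatten answers
 * from a questionnaire with a skip pattern into a fixed-length vector.
 */
public class SkipPatternFlattening {

    /** A schema question; follow-ups are asked only if this answer is non-zero ("yes"). */
    record Question(String id, List<Question> followUps) {
        Question(String id) { this(id, List.of()); }
    }

    /** Column order: depth-first over the schema, identical for every patient. */
    static List<String> columns(List<Question> schema) {
        List<String> cols = new ArrayList<>();
        for (Question q : schema) collectIds(q, cols);
        return cols;
    }

    private static void collectIds(Question q, List<String> out) {
        out.add(q.id());
        for (Question f : q.followUps()) collectIds(f, out);
    }

    /** Flattens one patient's answers; questions skipped by the skip pattern become 0.0. */
    static double[] flatten(List<Question> schema, Map<String, Double> answers) {
        List<Double> values = new ArrayList<>();
        for (Question q : schema) visit(q, answers, true, values);
        double[] v = new double[values.size()];
        for (int i = 0; i < v.length; i++) v[i] = values.get(i);
        return v;
    }

    private static void visit(Question q, Map<String, Double> answers, boolean asked, List<Double> out) {
        // A question that was not asked contributes 0.0, even if a stale answer exists.
        double value = asked ? answers.getOrDefault(q.id(), 0.0) : 0.0;
        out.add(value);
        boolean followUpsAsked = asked && value != 0.0;
        for (Question f : q.followUps()) visit(f, answers, followUpsAsked, out);
    }

    public static void main(String[] args) {
        List<Question> schema = List.of(
                new Question("smoker", List.of(new Question("cigarettesPerDay"))),
                new Question("age"));
        System.out.println(columns(schema)); // prints [smoker, cigarettesPerDay, age]
        System.out.println(Arrays.toString(flatten(schema,
                Map.of("smoker", 1.0, "cigarettesPerDay", 10.0, "age", 54.0)))); // prints [1.0, 10.0, 54.0]
        System.out.println(Arrays.toString(flatten(schema,
                Map.of("smoker", 0.0, "cigarettesPerDay", 5.0, "age", 61.0)))); // prints [0.0, 0.0, 61.0]
    }
}
```

Driving the column order from the schema rather than from each patient's record guarantees that every patient yields a vector of the same length, which vector-based learners require. For a linear model such as logistic regression, encoding a skipped follow-up as 0.0 lets the parent yes/no column absorb the difference between "not asked" and "asked".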
Conclusions: We implemented an Electronic Data Capture software framework with a Web 2.0 interface, currently running eleven clinical studies. Its data querying library can be integrated directly into applications, which could then, for example, automatically perform analyses provided by machine-learning toolbox libraries. The Electronic Data Capture user interface is available at http://cdm.esat.kuleuven.be.
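As a hedged illustration of the kind of analysis an application could run on preprocessed vectors, the Java sketch below fits a plain logistic regression by batch gradient descent on the log-loss. It stands in for a machine-learning toolbox and is not the API of any particular toolbox or of Clinical Data Miner; the class and method names, the learning rate, the epoch count, and the toy vectors in a `[smoker, cigarettesPerDay]` layout are all made up for illustration.

```java
import java.util.Arrays;

/**
 * Minimal sketch: logistic regression trained by batch gradient descent.
 * Stands in for a machine-learning toolbox; names and data are hypothetical.
 */
public class LogisticRegressionSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Predicted probability P(y = 1 | x); the bias is stored in the last weight. */
    static double predict(double[] w, double[] xi) {
        double z = w[w.length - 1];
        for (int j = 0; j < xi.length; j++) z += w[j] * xi[j];
        return sigmoid(z);
    }

    /** Returns weights w[0..d-1] followed by the bias w[d]. */
    static double[] fit(double[][] x, int[] y, double learningRate, int epochs) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d + 1];
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] grad = new double[d + 1];
            for (int i = 0; i < n; i++) {
                // Gradient of the log-loss for one sample: (p - y) * x.
                double error = predict(w, x[i]) - y[i];
                for (int j = 0; j < d; j++) grad[j] += error * x[i][j];
                grad[d] += error;
            }
            for (int j = 0; j <= d; j++) w[j] -= learningRate * grad[j] / n;
        }
        return w;
    }

    public static void main(String[] args) {
        // Made-up toy vectors in a flattened [smoker, cigarettesPerDay] layout; not study data.
        double[][] x = { {0, 0}, {0, 0}, {1, 5}, {1, 20}, {1, 15}, {0, 0} };
        int[] y = { 0, 0, 1, 1, 1, 0 };
        double[] w = fit(x, y, 0.01, 5000);
        System.out.println("weights (last is bias): " + Arrays.toString(w));
        System.out.printf("P(y=1 | [1, 10]) = %.3f%n", predict(w, new double[] {1, 10}));
    }
}
```

In practice an application would replace `fit` with a call into an established toolbox; the point of the sketch is only that vectors obtained through a querying library can be passed to such a learner in memory, without an export step.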
Medicine 2.0® is happy to support and promote other conferences and workshops in this area. Contact us to produce, disseminate and promote your conference or workshop under this label and in this event series. In addition, we are always looking for hosts of future World Congresses. Medicine 2.0® is a registered trademark of JMIR Publications Inc., the leading academic ehealth publisher.

This work is licensed under a Creative Commons Attribution 3.0 License.