Towards a Conceptual Framework for the Specification of Reproducible and Replicable Data Analysis Projects

Conceptual Framework

Abstract

It is becoming increasingly common to exploit the data collected by Information Systems in order to analyze them and draw conclusions that lead to a series of decisions in different research fields. Because in most cases these conclusions cannot be properly backed up, a reproducibility crisis has arisen both in Data Science, the discipline that makes it possible to convert such data into knowledge, and in the research fields that apply it. In this paper we envision a conceptual framework to foster reproducible and replicable Data Science projects. The framework proposes the definition of systematic pipelines that may be (semi)automatically executed by means of concrete implementation platforms. Proprietary or third-party tools are also considered so that flexibility is ensured without hindering reproducibility and replicability.

Publication
Information Systems Development: Crossing Boundaries between Development and Operations (DevOps) in Information Systems (ISD2021 Proceedings), Valencia, Spain, September 8-10, 2021
Roberto Rodriguez-Echeverria
Associate Professor

Associate Professor at the University of Extremadura. Passionate about software, deep learner, MTB rider and father of two.

José M. Conejero
Associate Professor

Associate Professor at the University of Extremadura. My research interests include model-driven development, data science and machine learning.

Fran Melchor
Researcher
Juan D. Gutiérrez
Assistant Professor

Assistant Professor at the University of Extremadura. I like computer science but, above all, learning new things.

Alvaro E. Prieto
Associate Professor

Associate Professor and Director of Electronic Administration at the University of Extremadura.