Workflow-Based Big Data Analytics in the Cloud Environment
As digital data repositories grow larger and more distributed, we need smart data analysis techniques and scalable architectures to extract useful information from them quickly. Cloud computing infrastructures effectively support both the computational and the data storage needs of big data mining applications. Complex data mining tasks involve data- and compute-intensive algorithms, which require large, efficient storage facilities together with high-performance processors to produce results in an acceptable time. In this chapter, we present the Data Mining Cloud Framework, designed for developing and executing distributed data analytics applications as workflows of services. In this environment, datasets, analysis tools, data mining algorithms and knowledge models are each implemented as a single service, and these services can be combined through a visual programming interface into distributed workflows to be executed on clouds. We present the first implementation of the Data Mining Cloud Framework on Azure and describe the main features of its graphical programming interface.
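The idea of composing single services into a workflow can be illustrated with a minimal Python sketch. It is not the Data Mining Cloud Framework's actual API: the service functions, the `WORKFLOW` table and `run_workflow` are all hypothetical names introduced here, and every service runs locally in one process rather than on cloud resources. The sketch only shows how data nodes and tool nodes might be wired by their dependencies and executed in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load_dataset():
    """Data node (hypothetical): returns (value, label) records."""
    return [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]


def split(records):
    """Tool node (hypothetical): alternate records into train and test sets."""
    return {"train": records[::2], "test": records[1::2]}


def train(parts):
    """Tool node (hypothetical): threshold halfway between the class means."""
    lows = [v for v, label in parts["train"] if label == "low"]
    highs = [v for v, label in parts["train"] if label == "high"]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


def evaluate(parts, threshold):
    """Tool node (hypothetical): accuracy of the threshold on the test set."""
    hits = sum((v > threshold) == (label == "high") for v, label in parts["test"])
    return hits / len(parts["test"])


# Each node names the service to call and the upstream nodes it consumes.
WORKFLOW = {
    "dataset": (load_dataset, []),
    "parts": (split, ["dataset"]),
    "model": (train, ["parts"]),
    "accuracy": (evaluate, ["parts", "model"]),
}


def run_workflow(workflow):
    """Execute nodes in dependency order and return every node's output."""
    order = TopologicalSorter({name: deps for name, (_, deps) in workflow.items()})
    results = {}
    for name in order.static_order():
        service, deps = workflow[name]
        results[name] = service(*(results[d] for d in deps))
    return results


if __name__ == "__main__":
    print(run_workflow(WORKFLOW))
```

In a distributed setting, each call inside `run_workflow` would presumably become a request to a remote service, and independent nodes could run in parallel; the dependency table itself would stay the same.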
Architecture