Traffic-aware Task Placement with Guaranteed Job Completion Time for Geo-distributed Big Data
Big data analysis is usually cast into parallel jobs running on geo-distributed data centers. Unlike a single data center, a geo-distributed environment poses significant challenges for big data analytics because of the limited network bandwidth between data centers located in different regions. Although research efforts have been devoted to geo-distributed big data, existing solutions remain far from efficient because of their suboptimal performance or high complexity. In this paper, we propose a traffic-aware task placement scheme to minimize the completion time of big data jobs. We formulate the problem as a non-convex optimization problem and design an algorithm to solve it with a proven performance gap. Finally, extensive simulations are conducted to evaluate the performance of our proposal. The simulation results show that our algorithm can reduce job completion time by 40% compared to a conventional approach that aggregates all data for centralized processing. Meanwhile, it has only a 10% performance gap relative to the optimal solution, while its solving time is extremely small.