The aim of this course is to introduce the student to high-performance Big Data management tools. The student will gain expertise in the use of NoSQL platforms for the analysis and mining of large data volumes, thus performing tasks that would not be feasible with traditional databases.
The course illustrates the techniques, methodologies, and programming tools for conducting data analysis and knowledge extraction from Big Data, including the use of large computational infrastructures.
Python, Hadoop, Pig, Hive, MongoDB, Spark
Some new datasets (Twitter, Movie Rating, Mobility) will be provided. Datasets used in other courses will also be analyzed.
The student will gain expertise in handling high-performance computing tools for parallel and distributed platforms, and will experiment with several applications and use cases on real-world datasets.