High Performance & Scalable Analytics, NO-SQL Big Data Platforms

Credits: 
2
Hours: 
22
Area: 
Big Data Technology
Teachers: 
Academic Year: 
2020-2021
Description: 

This course teaches the basic theoretical concepts behind the MapReduce distributed computing paradigm, and Hadoop in particular, and builds expertise in the practical use of high-performance computing tools for data engineering, analysis, and mining. In particular, students will learn how classical data mining algorithms can be applied to Big Data using Hadoop and Spark. Real, open-source datasets will be used to present examples and to let students build their own projects.

Notions: 

The course illustrates the techniques, methodologies, and programming tools for conducting data analysis and knowledge extraction from Big Data, including the use of large computational infrastructures.

Techniques and tools: 

Python, Hadoop, Pig, Hive, MongoDB, Spark

Case studies and datasets: 

Some new datasets (Twitter, Movie Rating, Mobility) will be provided. Datasets used in other courses will also be analyzed.

Competences: 

Students will gain expertise in handling high-performance computing tools for parallel and distributed platforms, and will experiment with several applications and use cases on real-world datasets.
