Data Mining & Machine Learning

Big Data Mining
Academic Year: 

The formidable advances in computing power, data acquisition, data storage, and connectivity have created unprecedented amounts of data. Data mining, i.e., the science of extracting knowledge from these masses of data, has therefore established itself as an interdisciplinary branch of computer science. Data mining techniques have been applied to many industrial, scientific, and social problems, and are expected to have an ever deeper impact on society. In addition, the wide availability of data has made it possible to build highly accurate predictive models through machine learning techniques. The objective of the course is to provide an introduction to the basic concepts of data mining and machine learning and to the process of extracting knowledge, with insights into analytical and predictive models and the most common algorithms.

  • Lesson 1
    • Introduction to Data Mining
    • Data Understanding
  • Lesson 2
    • Data Preparation & Feature Engineering
    • Data Similarity Measures
  • Lesson 3
    • Introduction to Clustering
    • Clustering Evaluation
    • K-Means
  • Lesson 4
    • Density-based Clustering: DBSCAN & OPTICS
    • Hierarchical Clustering: Complete (Max) Linkage & Single (Min) Linkage
  • Lesson 5
    • Introduction to Machine Learning
    • The Classification Problem
    • Classification Evaluation Measures
  • Lesson 6
    • K Nearest Neighbor Classifier
  • Lesson 7
    • Decision Tree Classifier
  • Lesson 8
    • Support Vector Machines
  • Lesson 9
    • Random Forest Classifier
  • Lesson 10
    • Machine Learning Models for Regression
Techniques and tools:
  • numpy
  • matplotlib
  • pandas
  • scipy
  • scikit-learn (sklearn)

At the end of the course, the student will be able to:

  • Design a KDD (Knowledge Discovery in Databases) process
  • Apply the appropriate data mining & machine learning techniques based on the analytical question to be answered
  • Use data mining & machine learning tools and Python libraries
  • Simulate how the data mining & machine learning algorithms work
  • Select the most suitable algorithm for a given problem setting