Artificial Intelligence Methods For Text Analysis And Web Mining

Credits: 3
Hours: 36
Area: Big Data Mining
Academic Year: 2023-2024
Description: 

This module presents artificial intelligence techniques for defining analytics on text and on data from the Web. The course is organized around three main strands: i) text analytics, which studies text mining methods applied to texts and social media; ii) ranking, through "learning to rank" techniques that estimate the relevance of objects with respect to user requirements; iii) web mining, whose techniques exploit user usage data to improve the quality of services. Using the query logs of a real search engine as a case study, students will be guided through the development of a set of data analysis methodologies aimed at building the knowledge base needed for a recommender system.
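The query-log case study can be illustrated with a minimal, hypothetical sketch: for a given query, it suggests the queries that most often came immediately after it within the same user session. The toy log, the session structure, and the function names `build_followers` and `suggest` are assumptions made for illustration; this simple co-occurrence baseline is not necessarily one of the methods taught in the course.

```python
from collections import Counter, defaultdict


def build_followers(sessions):
    """Count, for each query, which queries immediately followed it in a session."""
    followers = defaultdict(Counter)
    for session in sessions:
        # Pair each query with the next one issued in the same session.
        for current, following in zip(session, session[1:]):
            if current != following:  # ignore a query simply repeated by the user
                followers[current][following] += 1
    return followers


def suggest(followers, query, k=3):
    """Return up to k follow-up queries for `query`, most frequent first."""
    return [q for q, _ in followers.get(query, Counter()).most_common(k)]


# Hypothetical toy query log: each inner list is one user session, in time order.
sessions = [
    ["python tutorial", "python list comprehension", "python lambda"],
    ["python tutorial", "python list comprehension"],
    ["python tutorial", "python install windows"],
    ["elasticsearch", "elasticsearch python client"],
]

followers = build_followers(sessions)
print(suggest(followers, "python tutorial"))
```

Real query logs would also need session segmentation (for example by a time-gap threshold) and query normalization, which this sketch deliberately leaves out.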

Prerequisites: Machine Learning and Python

Topics:
  • Text Analytics
    • Properties of the Language and its Representation
    • Analytics on Text: Tasks, Methods, Applications
    • Language Models: from purely statistical approaches to complex learned solutions
    • Sparse vs. Dense Representations with Neural Approaches
    • Sentiment Analysis & Classification
    • Sentiment Analysis in Python
  • Ranking 
    • Machine Learning for Ranking: from standard techniques to BERT
    • Applications of Neural Networks to Text Ranking: Haystack & Hugging Face
    • Ranking with BERT
  • Web Mining
    • Analytics on Web Usage Data: query log mining for recommendation
    • Methods for Query Suggestion
    • Query Suggestion in Python and Elasticsearch
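As a rough illustration of a sparse text representation used for ranking, the following pure-Python sketch builds TF-IDF vectors as `{term: weight}` dictionaries and orders documents by cosine similarity to a query. Whitespace tokenization, the raw tf × log(N/df) weighting, the toy documents, and the helper names are all assumptions for this example. It is a classic baseline, not the learning-to-rank or BERT-based methods listed above, and library implementations such as scikit-learn's `TfidfVectorizer` use different (smoothed) defaults.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors as {term: weight} dicts using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return document indices sorted by decreasing cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [cosine(q, d) for d in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "bert for text ranking",
    "sentiment analysis of tweets",
    "learning to rank with gradient boosting",
]
print(rank("text ranking with bert", docs))
```

Note that "rank" and "ranking" are treated as different terms here. This is one reason dense, learned representations can match relevant documents that a purely lexical sparse model misses.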


Techniques and tools:

Python libraries and tools:

  • NLTK
  • spaCy
  • Scikit-learn
  • Gensim
  • VADER
  • Keras
  • PyTorch
  • Haystack
  • Hugging Face Transformers
  • Elasticsearch
  • LightGBM
  • Bash

Competences: 

Ability to correctly identify and implement text and web analytics. Ability to use state-of-the-art solutions for text classification, sentiment analysis, sentiment classification, and ranking. Ability to use learning-to-rank techniques and Transformer networks for text (BERT). Ability to define a Web mining problem and design a solution.
