Statistical and Neural Machine Learning for Text Analysis

Credits: 
2
Hours: 
20
Area: 
Big Data Mining
Teachers: 
Academic Year: 
2020-2021
Description: 

This module introduces the main methods for analyzing and mining the opinions and personal evaluations of users, based on Big Data generated on the web or by other sources. Emphasis is placed on text mining methods applied to text originating on social media. Lessons are supported by case studies developed in the SoBigData.eu lab.

Notions: 

Topic-oriented and opinion-oriented text analysis: differences and peculiarities. The machine learning pipeline for automatic text analysis. Building and using lexical resources. Feature engineering for sentiment analysis and opinion mining (SAOM). Recognizing, defining and solving problems of classification, regression, information extraction and quantification. Differences between individual and aggregated analysis. Evaluation of models. State of the art in research and market products.

Techniques and tools: 

Statistical relevance analysis

Case studies and datasets: 

Lexical resources: the SentiWordNet dataset and others produced during the lab sessions. Polarity: case studies on the IMDB and Twitter datasets. Spam detection: case studies on datasets from Tripadvisor and Yelp. Regression: case studies on the Amazon and Tripadvisor datasets. Quantification: case studies on Amazon datasets.

Competences: 

Recognition of SAOM problems in practical contexts. Choice of the best-fitting model for their formalization. Definition of the external resources required to solve the problem. Choice of suitable software tools and implementation of ad hoc solutions. Choice and use of machine learning algorithms for building SAOM models. Evaluation and analysis of results.
