Big Data Sources, Crowdsourcing, Crowdsensing

Credits:

2

Hours:

24

Area:

Big Data Sensing & Procurement

Teachers:

Fagni Tiziano

Academic Year:

2023-2024

Description:

The module presents the characteristics and peculiarities of "big data" and shows, through specific use cases, the growing importance of extracting meaningful information and valuable insights from large volumes of heterogeneous data (for example sensor data, purchase and consumption data, data from social media and social networks, and open data). It also discusses participatory data collection through crowdsourcing and crowdsensing systems, with popular examples of these concepts in practice. The practical part focuses on data ingestion, presenting data crawling and scraping methodologies with concrete examples on social media and the Web, as well as the use of pre-compiled, publicly available datasets.

Prerequisites: Python

Notions:

Lesson 1
- Introduction to big data and the various data sources that characterize them
- Open data and linked open data, crowdsourcing and crowdsensing
- Big data analytics: interesting use cases
Lesson 2
- Social media crawling: REST architecture and the OAuth authentication framework, Twitter and Reddit overview
- Introduction to using the PRAW library for data access to Reddit + exercises with PRAW
Lesson 3
- Exercises with PRAW
- Introduction to HTML/CSS technologies
Lesson 4
- HTML/CSS exercises
- Introduction to Web scraping in Python: Selenium and Beautiful Soup
Lesson 5
- Exercises on Selenium
Lesson 6
- Exercises on Beautiful Soup
- CSV/JSON data parsing
Exam

Techniques and tools:

Selenium
Beautiful Soup
PRAW

Competences:

Theoretical knowledge:
- Characterization of "big data" and of the knowledge that can be gained from their analysis
- Data characterization: open sources, closed sources, open data and linked open data. Data collection or development of specific services that rely on groups of users (crowdsensing, crowdsourcing).
- HTML/CSS technologies underlying the functioning of the Web
- REST architectures
- Social media with focus on Twitter and Reddit: analysis of the main characteristics of social networks and high-level overview of the available APIs.
Practical knowledge:
- Use of HTML tags and CSS selectors for creating web pages.
- Website scraping with concrete examples using the Selenium and Beautiful Soup libraries.
- Social media crawling with concrete examples using the Reddit API through the PRAW library.
- Parsing of data in CSV/JSON format.
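
As an illustration of the last item, here is a minimal sketch of CSV/JSON parsing using only the Python standard library. The sample data, the field names (id, title, score) and the helper functions parse_csv and parse_json are hypothetical and are not taken from the course material; they only stand in for a dataset that might be downloaded or scraped.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a downloaded or scraped dataset.
CSV_TEXT = "id,title,score\n1,First post,10\n2,Second post,25\n"
JSON_TEXT = '[{"id": 3, "title": "Third post", "score": 7}]'


def parse_csv(text):
    """Parse CSV text into a list of dicts, converting numeric fields to int."""
    rows = []
    # DictReader uses the first line as the header and yields one dict per row.
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {"id": int(row["id"]), "title": row["title"], "score": int(row["score"])}
        )
    return rows


def parse_json(text):
    """Parse JSON text; numbers are already typed, so no conversion is needed."""
    return json.loads(text)


# Merge records from both formats into one list with a common structure.
records = parse_csv(CSV_TEXT) + parse_json(JSON_TEXT)
total_score = sum(record["score"] for record in records)
print(len(records), total_score)
```

Note that CSV values are always read as strings and must be converted explicitly, while JSON preserves basic types such as numbers and booleans.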
