Who has a voice in the media? An EPFL project to discover who gets quoted in newspapers

Friday, August 20, 2021 - 2 mins

This project was carried out for the Applied Data Analysis course taught by Prof. West at EPFL.

Made in collaboration with Eliott Zemour, Matheus Bernat, and Benjamin Hansson.

Isn’t it a pleasure to be listened to? The ability to make your voice heard is a privilege that few have. Sometimes it can feel as if only the loudest are listened to. Using the Quotebank dataset from 2015 to 2020 and information about the speakers extracted from Wikidata, we were able to dissect:

WHO you need to be to be quoted (age, gender, occupation)
WHAT you need to say (which subject to talk about)
HOW you need to talk about it (which emotion to use)

Once a primary analysis of the speakers themselves was done, a K-means clustering algorithm was run on the data to group the speakers into sub-groups for deeper analysis. Then, to extract the topic and the emotion of each quote, we used a zero-shot classification approach based on the uncased DistilBERT base model, fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset for the zero-shot classification task. To present our results, a website was developed using Jekyll. All visualisations were made with plotly. To make the content more interactive and more appealing to users, a webapp was also developed.
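To give an idea of the clustering step, here is a minimal sketch using scikit-learn. The toy features (age and number of quotes), the values, and the choice of two clusters are assumptions made for illustration only; they are not the features or settings used in the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy speaker table: [age, number of quotes].
# The real features (derived from Wikidata attributes) are not listed in this post.
speakers = np.array([
    [25, 12],
    [27, 15],
    [30, 10],
    [62, 480],
    [65, 510],
    [70, 495],
], dtype=float)

# Standardize the columns so that no single feature dominates the distance.
scaled = StandardScaler().fit_transform(speakers)

# n_clusters=2 is an arbitrary choice for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

for row, label in zip(speakers, labels):
    print(f"age={row[0]:.0f} quotes={row[1]:.0f} -> cluster {label}")
```

Each speaker ends up with a cluster label, and the resulting sub-groups can then be analyzed separately.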

The webapp, developed using ReactJS and Material-UI, asks who you are, what you want to talk about, and with which emotion, in order to predict a quotation score. The score is computed by a deep learning model built with TensorFlow and trained on the newly labelled Quotebank dataset. An API was developed with the Flask framework to host the predictive model and answer the requests made by the webapp.
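A rough sketch of what such a Flask endpoint could look like is shown here. The route name `/predict`, the JSON field names, and the `predict_quotation_score` stub are all hypothetical: the stub returns a constant where the real API would call the trained TensorFlow model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Fields mirroring the WHO / WHAT / HOW questions asked by the webapp
# (the exact names are an assumption).
REQUIRED_FIELDS = ("age", "gender", "occupation", "topic", "emotion")


def predict_quotation_score(profile: dict) -> float:
    """Hypothetical stand-in for the trained TensorFlow model."""
    # A constant placeholder score; the real model would use the profile.
    return 0.5


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    return jsonify({"score": predict_quotation_score(payload)})


# Exercise the endpoint in-process, without starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={
        "age": 40,
        "gender": "female",
        "occupation": "politician",
        "topic": "politics",
        "emotion": "joy",
    })
    print(response.status_code, response.get_json())
```

In this sketch the frontend would send the user's answers as a JSON POST request and read the `score` field from the response.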

About this project

Python, JavaScript
Packages/Libraries: Pandas, scikit-learn, plotly, TensorFlow, ReactJS, Flask
Interactive webapp, identity- and topic-oriented analysis of the Quotebank dataset
