
Who has a voice in the media? An EPFL project to discover who gets quoted in newspapers
This is a project carried out for the Applied Data Analysis course taught by Prof. West at EPFL.
Made in collaboration with Eliott Zemour, Matheus Bernat, and Benjamin Hansson.
Isn’t it a pleasure to be listened to? The ability to make your voice heard is a privilege that few have. Sometimes it can feel as if only the loudest are listened to. Using the Quotebank dataset from 2015 to 2020 and information about the speakers extracted from Wikidata, we were able to dissect:
- WHO you need to be to be quoted (age, gender, occupation)
- WHAT you need to say (which subject to talk about)
- HOW you need to talk about it (which emotion to use)
Once a primary analysis of the speakers themselves was done, we ran a K-means clustering algorithm on the data to group the speakers into sub-groups for deeper analysis. Then, to extract the topic and the emotion of each quote, we used a zero-shot classification approach based on the uncased DistilBERT base model, fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset for the zero-shot classification task. To present our results, we developed a website using Jekyll. All visualisations were made with plotly. To make the content more interactive and more appealing to users, we also developed a webapp.
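As a rough illustration of the clustering step, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The two features (age and a log-scaled quote count), the toy values, and the function name `kmeans` are assumptions made for this example. They are not the project's actual pipeline, which presumably relied on a library implementation such as the one in scikit-learn.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster `points` into `k` groups with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical speakers described by (age, log10 of number of quotes).
speakers = [
    (25.0, 1.0), (27.0, 1.2), (30.0, 0.8),
    (62.0, 3.9), (65.0, 4.2), (70.0, 4.0),
]
centroids, labels = kmeans(speakers, k=2)
print(labels)
```

In practice the features would usually be standardized first, so that a wide-ranging feature such as age does not dominate the distance.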
The webapp, developed using ReactJS and Material-UI, asks who you are, what you want to talk about, and with which emotion, and then predicts a quotation score. The score is computed by a deep learning model built with TensorFlow and trained on the newly labelled QuoteBank dataset. An API developed with the Flask framework hosts the predictive model and answers the requests made by the webapp.
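To give an idea of what such a prediction request might look like, here is a framework-free sketch of the validation and response logic that could sit behind an API route. Everything in it is an assumption for illustration: the field names, the `quotation_score` key, the `handle_predict` function, and the `stand_in_model` placeholder that replaces the trained TensorFlow model. The project's real Flask routes and payload format are not shown in this post.

```python
import json

# Hypothetical request fields: who you are, what you say, how you say it.
REQUIRED_FIELDS = ("age", "gender", "occupation", "topic", "emotion")


def stand_in_model(features):
    """Placeholder for the trained model: always returns the same score."""
    return 0.5


def handle_predict(raw_body, model=stand_in_model):
    """Validate a JSON request body and return (status_code, response_body)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return 400, json.dumps({"error": "missing fields", "fields": missing})
    # Pass only the expected fields on to the model.
    score = float(model({f: payload[f] for f in REQUIRED_FIELDS}))
    return 200, json.dumps({"quotation_score": score})


request_body = json.dumps({
    "age": 45,
    "gender": "female",
    "occupation": "politician",
    "topic": "climate",
    "emotion": "anger",
})
status, body = handle_predict(request_body)
print(status, body)
```

In a Flask app, a function like this would be called from a route handler with the body of the incoming request, and the real model would be passed in place of the placeholder.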
About this project
- Python, JavaScript
- Packages/Libraries: pandas, scikit-learn, plotly, TensorFlow, ReactJS, Flask
- Interactive webapp, identity- and topic-oriented analysis of the QuoteBank dataset
Links
- Link to the website
- Link to the webapp
- Link to the repositories
- You can find the QuoteBank dataset here