Movies analysis

This analysis uses data science methods such as API requests, text processing techniques (regular expressions), network analysis (betweenness centrality, degree distributions...) and natural language processing mechanisms (TF-IDF, sentiment analysis...).

Home

This site presents a broad analysis of movies from 1989 to the present. Wikipedia pages are used as the information source: we download all the movies together with related information such as actors, directors, genre, year and country. From this data we build two networks, one that links movies by shared actors and one that links actors by shared movies. The information extracted from these networks, as well as from the raw data, is illustrated throughout this webpage with plots, figures, graphs and charts. In addition, movie reviews are processed with text analysis mechanisms such as sentiment analysis and TF-IDF, which lets us find the best movies according to their review comments.
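To make the two networks concrete, here is a minimal sketch of how they could be derived from a movie-to-cast mapping. The sample data, the function name `build_networks` and the choice to weight each edge by the number of shared actors or movies are assumptions made for illustration; the project notebooks may structure this differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: movie title -> list of actors.
movies = {
    "Movie A": ["Actor 1", "Actor 2"],
    "Movie B": ["Actor 2", "Actor 3"],
    "Movie C": ["Actor 4"],
}

def build_networks(movies):
    """Return (movie_edges, actor_edges).

    Each result maps a sorted pair of node names to an edge weight:
    the number of shared actors (movie network) or shared movies
    (actor network).
    """
    # Invert the mapping: actor -> set of movies the actor appears in.
    movies_by_actor = defaultdict(set)
    for title, cast in movies.items():
        for actor in cast:
            movies_by_actor[actor].add(title)

    # Two movies are linked once for every actor they share.
    movie_edges = defaultdict(int)
    for titles in movies_by_actor.values():
        for a, b in combinations(sorted(titles), 2):
            movie_edges[(a, b)] += 1

    # Two actors are linked once for every movie they share.
    actor_edges = defaultdict(int)
    for cast in movies.values():
        for a, b in combinations(sorted(set(cast)), 2):
            actor_edges[(a, b)] += 1

    return dict(movie_edges), dict(actor_edges)

movie_edges, actor_edges = build_networks(movies)
```

In this toy example, "Movie C" shares no actor with any other movie, so it would appear as an isolated node in the movie network.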

 

An explainer notebook with a more detailed and technical analysis of the whole project can be found at the following link:

http://nbviewer.jupyter.org/github/NestorBonjorn/SocialGraphsProject2016/blob/master/Explainer%20Notebook.ipynb

The code behind the project is available at the following links:

Downloading information through the Wikipedia API, JSON files and regular expressions (regex) and creating a clean Python dictionary with all the useful data:

http://nbviewer.jupyter.org/github/ferrancanellas/moviesanalysis/blob/master/Wikipedia_dataset.ipynb
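As a rough illustration of this step, the sketch below builds a MediaWiki API query URL for a page's wikitext and then uses regular expressions to turn an infobox into a Python dictionary. The sample wikitext, the helper names `api_url` and `parse_infobox`, and the two regex patterns are assumptions for this example, not the patterns used in the notebook. No request is sent here.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def api_url(title):
    """Build a MediaWiki query URL that asks for a page's wikitext as JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)

# Hypothetical wikitext fragment in the style of a film infobox.
sample = """{{Infobox film
| name = Example Film
| director = [[Jane Doe]]
| starring = [[John Roe]]<br>[[Mary Major|M. Major]]
| country = United States
}}"""

# "| key = value" lines inside the infobox.
FIELD_RE = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*)$", re.MULTILINE)
# [[Target]] or [[Target|label]]; only the link target is captured.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_infobox(wikitext):
    """Return a dict mapping each infobox field to a list of values."""
    info = {}
    for key, value in FIELD_RE.findall(wikitext):
        links = LINK_RE.findall(value)
        # Prefer linked names; fall back to the plain text of the field.
        info[key] = links if links else [value.strip()]
    return info

url = api_url("Example Film")
movie = parse_infobox(sample)
```

Real infoboxes are less regular than this sample (templates, nested lists, references), so a production parser would need more cases than the two patterns shown.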

Creating the networks and analyzing both the networks and the data itself:

http://nbviewer.jupyter.org/github/NestorBonjorn/SocialGraphsProject2016/blob/master/Network_and_stats.ipynb

Downloading movie reviews, processing them with TF-IDF techniques and assigning a review grade to each movie that has reviews:

http://nbviewer.jupyter.org/github/ferrancanellas/moviesanalysis/blob/master/movie%20reviews.ipynb
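For readers unfamiliar with TF-IDF, the following is a minimal sketch of one common variant (relative term frequency multiplied by the natural log of the inverse document frequency), treating all reviews of a movie as one document. The sample reviews, the tokenizer and the function names are assumptions for illustration; the notebook may use a different weighting, and the way a review grade is derived from the scores is not shown here.

```python
import math
import re
from collections import Counter

# Hypothetical sample data: movie title -> concatenated review text.
reviews = {
    "Movie A": "great film great acting",
    "Movie B": "boring film weak plot",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(docs):
    """Return {document: {term: tf-idf score}} for a dict of raw texts."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    scores = {}
    for name, words in tokens.items():
        if not words:
            scores[name] = {}
            continue
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(reviews)
```

A term that occurs in every document, such as "film" in this sample, gets a score of zero, while terms specific to one movie's reviews stand out.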

Finally, the generated dictionary with the movie information can be accessed at the following link. It does not include the movie reviews, because the file would be larger than GitHub allows. The Movies Reviews section explains how to find all the movie reviews used in this project:

https://raw.githubusercontent.com/NestorBonjorn/SocialGraphsProject2016/master/movies_dict.txt


Project preview

This video describes what is done in this project. Enjoy it!