Movies network

This network has as nodes all the movies from 1989 to 2016 obtained through Wikipedia, and two movies are linked when they share at least one actor. Our dataset contains exactly 5083 movies, resulting in 5083 nodes and 142804 edges, with an average degree of about 56 links per movie. The network looks like this:
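Here is a minimal sketch of how such a movie-to-movie network can be built. It assumes the scraped data is a dictionary mapping each movie title to its list of actors; this input format and the toy titles below are hypothetical. Every pair of movies that shares an actor gets a single unweighted edge. The average degree follows from ⟨k⟩ = 2E/N. With the figures above, 2 × 142804 / 5083 ≈ 56.2, which is consistent with the reported average of about 56.

```python
from itertools import combinations


def build_movie_network(casts):
    """Link two movies whenever their casts share at least one actor.

    casts: dict mapping a movie title to an iterable of actor names
    (hypothetical input format). Returns an adjacency dict movie -> set of movies.
    """
    adj = {movie: set() for movie in casts}
    # Invert the mapping: actor -> set of movies the actor appears in
    movies_by_actor = {}
    for movie, actors in casts.items():
        for actor in actors:
            movies_by_actor.setdefault(actor, set()).add(movie)
    # Every pair of movies sharing an actor gets one unweighted edge;
    # sets ignore duplicates when two movies share several actors
    for movies in movies_by_actor.values():
        for a, b in combinations(sorted(movies), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def summary(adj):
    """Return (number of nodes, number of edges, average degree)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    return n, m, 2 * m / n  # <k> = 2E / N


casts = {  # hypothetical toy data
    "Movie A": ["Actor 1", "Actor 2"],
    "Movie B": ["Actor 2", "Actor 3"],
    "Movie C": ["Actor 3"],
    "Movie D": ["Actor 4"],
}
adj = build_movie_network(casts)
print(summary(adj))  # (4, 2, 1.0)
```

If the data were stored as an actor-movie bipartite graph instead, NetworkX offers the same projection through `networkx.algorithms.bipartite.projected_graph`.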

network_movies

We can see a large cluster containing most of the movies, which forms the Giant Connected Component (GCC). On the other hand, some movies are not connected to the GCC: some of them are not linked to any other movie at all, and others are linked only to a few other movies.
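The GCC and the isolated movies can be extracted with a breadth-first search over the adjacency structure. The sketch below is a plain-Python illustration on a hypothetical toy graph. With NetworkX, the equivalent calls would be `G.subgraph(max(nx.connected_components(G), key=len))` for the GCC and `nx.isolates(G)` for the movies with no links.

```python
from collections import deque


def connected_components(adj):
    """Yield the node set of each connected component using breadth-first search."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        yield component


def giant_component(adj):
    """Return the subgraph (adjacency dict) induced by the largest component."""
    gcc = max(connected_components(adj), key=len)
    return {v: adj[v] & gcc for v in gcc}


adj = {  # hypothetical toy graph
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},  # triangle -> GCC
    "D": {"E"}, "E": {"D"},                             # small separate cluster
    "F": set(),                                         # isolated movie
}
gcc = giant_component(adj)
isolated = [v for v, nbrs in adj.items() if not nbrs]
print(sorted(gcc), isolated)  # ['A', 'B', 'C'] ['F']
```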

The GCC looks like this:

network_gcc_movies

In general, there are so many nodes and edges that it is difficult to interpret anything from this picture. However, some nodes seem to form small clusters, since they are more densely connected among themselves than with the rest of the network. The following picture shows the communities found by the Python-Louvain clustering algorithm:

movies_network_louvain_communities

The colors represent the 19 communities found by this algorithm. The resulting modularity is about 0.3, so the network shows some community structure; however, this structure is not very pronounced.
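The Louvain method greedily maximizes the modularity Q = Σ_c [L_c/m - (d_c/2m)²]. Here m is the total number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of its nodes. The sketch below only evaluates Q for a given partition; it does not implement the Louvain optimization itself. The two-triangle graph is a hypothetical example. In practice, the partition and its score would come from the library, e.g. `community.best_partition(G)` and `community.modularity(partition, G)` in python-louvain.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ],
    where L_c is the number of edges inside c and d_c the total degree of c.
    communities: iterable of sets of nodes.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for community in communities:
        # each internal edge is seen from both endpoints, hence the division by 2
        internal = sum(len(adj[v] & community) for v in community) / 2
        total_degree = sum(len(adj[v]) for v in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Hypothetical graph: two triangles joined by a single bridge edge (C-D)
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(round(modularity(adj, [{"A", "B", "C"}, {"D", "E", "F"}]), 3))  # 0.357
```

Putting every node in a single community always gives Q = 0. Higher values mean more edges fall inside communities than would be expected from the degrees alone.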

Degree distributions

Now, we analyse the node degrees of the network. The top 10 movies by degree are shown in the following figure:

top_movies_degree

This ranking is led by “The Grand Budapest Hotel”, which is linked to 273 other movies. This is not surprising, since its cast includes as many as 17 famous actors. The same goes for the other movies in this top 10: all of them star many very famous actors.
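Ranking movies by degree only requires the size of each neighbor set. Here is a minimal sketch, assuming the network is stored as a hypothetical adjacency dictionary (movie -> set of linked movies):

```python
import heapq


def top_by_degree(adj, k=10):
    """Return the k nodes with the highest degree as (node, degree) pairs."""
    degrees = ((node, len(nbrs)) for node, nbrs in adj.items())
    return heapq.nlargest(k, degrees, key=lambda item: item[1])


adj = {  # hypothetical toy graph
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
}
print(top_by_degree(adj, k=1))  # [('A', 3)]
```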

But what does the degree distribution of this network look like? The following picture answers this question with both linear and logarithmic axes:

movies_degree_distribution

We can see that the distribution is roughly constant for low degrees, whereas for higher degrees it appears to follow a power law. As a result, very few movies have the highest degrees.
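The empirical distribution P(k) shown in such plots is the fraction of movies with each degree k. On log-log axes, a power law P(k) ∝ k^(-γ) becomes a straight line of slope -γ, which is why the tail is easier to judge there. A straight-looking tail is suggestive, though, rather than proof of a power law. Here is a minimal sketch on a hypothetical toy graph:

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Empirical degree distribution P(k): fraction of nodes with degree k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_points(pk):
    """(log10 k, log10 P(k)) pairs for plotting on log-log axes.

    Degree 0 is skipped because log(0) is undefined.
    """
    return [(math.log10(k), math.log10(p)) for k, p in pk.items() if k > 0]


adj = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}, "E": set()}
pk = degree_distribution(adj)
print(pk)  # {0: 0.2, 1: 0.6, 3: 0.2}
```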

Degree assortativity

Another question that comes to mind is whether high-degree movies tend to link with other high-degree movies, and low-degree movies with other low-degree movies, i.e., whether there is any degree assortativity in the network.

In the following picture, we can see the correlation between the degree of a node and the average degree of its neighbors:

movies_degree_assortativity

It seems clear that there is a positive correlation between the degree of a node and the average degree of its neighbors, i.e. the network shows positive degree assortativity.
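The points of such a plot can be computed by averaging, for each node, the degrees of its neighbors (k_nn). If desired, k_nn can then be averaged over all nodes with the same degree k. An increasing k_nn(k) indicates assortative mixing, and a decreasing one indicates disassortative mixing. Below is a plain-Python sketch on a hypothetical star graph. A star is disassortative, so here k_nn decreases. NetworkX provides the same quantities as `nx.average_neighbor_degree(G)` and `nx.average_degree_connectivity(G)`.

```python
from collections import defaultdict


def average_neighbor_degree(adj):
    """k_nn(i): mean degree of the neighbors of node i (0.0 for isolated nodes)."""
    return {
        node: (sum(len(adj[u]) for u in nbrs) / len(nbrs) if nbrs else 0.0)
        for node, nbrs in adj.items()
    }


def knn_by_degree(adj):
    """Average k_nn over all nodes sharing the same degree k."""
    knn = average_neighbor_degree(adj)
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[len(nbrs)].append(knn[node])
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}


star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}  # hypothetical
print(knn_by_degree(star))  # {1: 3.0, 3: 1.0}
```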

We can check this observation by calculating the assortativity coefficient, i.e. the Pearson correlation coefficient between the degrees of the nodes at either end of an edge, with the help of the NetworkX Python library. A coefficient of 1 means that high-degree nodes are attached only to other high-degree nodes, while a value of -1 means that high-degree nodes are attached only to low-degree nodes. A coefficient close to 0 means that there is no correlation. Therefore, we expect a positive value and, in fact, our result was 0.13, a modest but positive value. The linear regression in the previous plot also illustrates this positive degree assortativity.
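NetworkX exposes this coefficient as `nx.degree_assortativity_coefficient(G)`. It is the Pearson correlation between the degrees at the two ends of each edge. The sketch below is plain Python, not the NetworkX implementation. It reproduces the coefficient by listing every edge in both directions and correlating the degrees at its two ends. The star and path graphs are hypothetical test inputs with known values of -1 and -0.5.

```python
import math


def degree_assortativity(adj):
    """Pearson correlation of the degrees at both ends of every edge.

    Each undirected edge is listed in both directions so the result is
    symmetric. Undefined (NaN) when all edge ends have the same degree.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(degree_assortativity(star))            # -1.0
print(round(degree_assortativity(path), 3))  # -0.5
```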

Betweenness centrality

One of the most interesting characteristics of the nodes of a network is their betweenness centrality. According to the Wikipedia definition, “the betweenness centrality is an indicator of a node’s centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that item transfer follows the shortest paths.” In our computation, the betweenness centrality is not equal to the number of shortest paths that pass through a node, but proportional to it.
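NetworkX computes this measure with Brandes' algorithm in `nx.betweenness_centrality(G, normalized=True)`. With normalization, the result is divided by the number of pairs of other nodes. If that default was used, this would explain why the values are proportional to, rather than equal to, the path counts. Note also that the standard definition sums, over each pair of nodes, the fraction of their shortest paths that pass through the node. Below is a plain-Python sketch of Brandes' algorithm for an unweighted, undirected graph, using the same normalization convention as NetworkX. It runs in O(N·E) time, so in pure Python it would be slow on the full 5083-movie network. For large graphs, NetworkX's `k` parameter (sampling of source nodes) is an option.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency dict)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # bc now sums over ordered (source, target) pairs
    n = len(adj)
    if normalized and n > 2:
        denom = (n - 1) * (n - 2)  # ordered pairs of nodes other than v
    else:
        denom = 2  # each unordered pair was counted twice
    return {v: c / denom for v, c in bc.items()}


path = {0: {1}, 1: {0, 2}, 2: {1}}  # hypothetical example
print(betweenness_centrality(path))  # {0: 0.0, 1: 1.0, 2: 0.0}
```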

This is the list of our top 10 movies by betweenness centrality.

top_betweenness_centrality

A large degree can help a node to have a higher betweenness centrality, and indeed “The Grand Budapest Hotel” appears in 3rd place. However, degree is clearly not the only factor, since the list is led by “Kung Fu Panda 2”. Interesting…