Finding communities on graphs

Albert De La Fuente Vigliotti

A few months ago I had to import a fairly large list of use cases, relationships and requirements into Enterprise Architect for a project. Of course I didn’t do that manually; I wrote a quick hack to automate it. Recently I got a new task: splitting that use case list into groups based on their relationships. Once again I thought about doing it automatically, since I already had the data structures available, and then I decided: let’s play a little bit with the data 8-).

Since I’m not very familiar with the use case data itself, I looked for a way to apply the same concepts in other areas, and ended up running some experiments on my Facebook and LinkedIn networks. I learned many things and the results were really interesting.

I started with my Facebook contacts, and this was the initial result:

Afterward I applied a community detection algorithm (Louvain) with a specific parameter and was able to identify 63 groups / communities.
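Louvain works by searching for a partition with high modularity, a score that compares the number of edges inside each group with what a random graph with the same degrees would give. The snippet below is only a minimal sketch of that score, not the tool I used: the `modularity` function, the toy graph and the `resolution` argument are all assumptions made for illustration (resolution is one parameter that implementations commonly expose, and it may or may not be the one I tuned).

```python
from collections import defaultdict


def modularity(edges, community_of, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    resolution: illustrative tuning knob; larger values favour smaller groups.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0

    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # summed node degrees per community

    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (made up): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two > q_one)  # splitting at the bridge scores higher than one big group
```

A real Louvain run repeatedly moves nodes between groups and merges groups while this score keeps improving, which is how it arrives at a number of communities without being told one in advance.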

Can you see them? Well… neither can I… So I applied an atlas layout algorithm to make it visually friendly, and this is the result:

It’s interesting to notice that about 48% of my contacts fall within the three biggest groups (20.77%, 14.46% and 12.52%). There are also contacts (that multicolored group in the middle) whose connections I couldn’t retrieve, probably because of a privacy setting. Of course the generated graphics are anonymized, but I’m able to identify each group. For instance, the bottom middle blue group is my European colleagues from my last job, and the top right cyan group is some friends from FLOSS communities. Pretty cool, huh?
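Percentages like the ones above are easy to derive once every contact has a community label. Here is a small sketch of that bookkeeping; the `community_shares` and `top_share` helpers and the ten-node `membership` mapping are made up for illustration and are not my real data.

```python
from collections import Counter


def community_shares(community_of):
    """Return (label, percentage of nodes) pairs, largest community first."""
    sizes = Counter(community_of.values())
    total = sum(sizes.values())
    return [(label, 100.0 * size / total) for label, size in sizes.most_common()]


def top_share(community_of, k=3):
    """Combined percentage of nodes that sit in the k largest communities."""
    return sum(pct for _, pct in community_shares(community_of)[:k])


# Hypothetical membership: ten anonymized contacts spread over four communities.
membership = {f"n{i}": label for i, label in enumerate("aaaabbbccd")}

shares = community_shares(membership)
for label, pct in shares:
    print(f"{label}: {pct:.2f}%")
print(f"top 3 combined: {top_share(membership):.2f}%")
```

Summing the top entries this way is also a quick sanity check on any headline figure quoted next to the individual shares.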

On the other hand, getting the LinkedIn results wasn’t that straightforward: I had to deal with OAuth and the LinkedIn API web service directly from Python. These are the results for my LinkedIn network:

Some groups that stood out at first: the bottom left cyan group is, again, my European colleagues from my last job, and the green group is my Latin American colleagues from the same job.

Conclusions

This is an amazingly powerful analytics tool that could be used in many areas; probably the most notable ones nowadays are marketing and social networking. At the same time, it’s quite scary to see how our privacy goes away. I don’t think it would be hard to track somebody given the right information. If I could do this just by examining relationships, imagine what would be possible with some extra data.