Data Mining Email to Discover Social Networks and Emergent Communities
The social graph above shows the email flows amongst a large project team. It is an x-ray of how the project actually works! Each person on the team is represented by a node, colored according to the person's department: red, blue, or green. Yellow nodes are consultants and other specialists hired to work on this project. Grey nodes are not formal team members; they are external experts consulted during the project.
The client's I/T department gathered the email data and provided a snapshot for every month of the project. Only the information in each email's To: and From: fields was gathered; the Subject: line and the actual content of the email were ignored. Only emails addressed to individuals were used; emails addressed to large distribution lists were disregarded. A grey link is drawn between two nodes if the two people exchanged email with each other at least weekly.
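The linking rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling used on the project: the function name `build_links`, the `(sender, recipients)` record format, and the `distribution_lists` set are hypothetical, and "weekly or higher frequency" is read as each person sending the other at least one email per week on average over the snapshot period.

```python
from collections import Counter


def build_links(messages, weeks, distribution_lists=frozenset()):
    """Return undirected links between people who emailed each other at least weekly.

    messages: iterable of (sender, recipients) pairs taken from the From: and To: fields.
    weeks: length of the snapshot period in weeks (roughly 4 for a monthly snapshot).
    distribution_lists: hypothetical set of list addresses whose emails are discarded.
    """
    counts = Counter()
    for sender, recipients in messages:
        # Disregard any email addressed to a large distribution list.
        if any(r in distribution_lists for r in recipients):
            continue
        for recipient in recipients:
            if recipient != sender:
                counts[(sender, recipient)] += 1

    links = set()
    for (a, b), sent in counts.items():
        received = counts.get((b, a), 0)
        # Assumed rule: each direction averages at least one email per week.
        if sent >= weeks and received >= weeks:
            links.add(tuple(sorted((a, b))))
    return links


# Example: one four-week snapshot with toy data.
msgs = ([("ann", ["bob"])] * 4 + [("bob", ["ann"])] * 4
        + [("ann", ["carl"])] * 5 + [("ann", ["all-staff"])] * 10)
print(build_links(msgs, weeks=4, distribution_lists={"all-staff"}))  # {('ann', 'bob')}
```

Here ann writes to carl often, but carl never replies, so no link is drawn between them; the mail to the distribution list is dropped entirely.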
In addition to the network visualizations, we generated network metrics to see how well the various departments and groups were interacting. We used the E/I Ratio to compare external flows (between formal groups) with internal flows (within them). We also applied cluster analysis to uncover the emergent informal groups that self-organized as the project progressed. We took several "SNApshots" over time to track these changes as they emerged.
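The text does not spell out how the E/I Ratio was computed. As an assumption, the sketch below uses Krackhardt and Stern's E-I index, (E - I) / (E + I), where E counts links between people in different formal groups and I counts links within the same group. It ranges from -1 (all links internal) to +1 (all links external). The functions `ei_index` and `ei_by_group`, the `group_of` mapping, and the toy data are hypothetical.

```python
def ei_index(links, group_of):
    """E-I index (Krackhardt and Stern) over undirected links: (E - I) / (E + I).

    links: iterable of (person, person) pairs.
    group_of: dict mapping each person to a formal group (e.g. department color).
    Returns None when there are no links.
    """
    external = internal = 0
    for a, b in links:
        if group_of[a] == group_of[b]:
            internal += 1
        else:
            external += 1
    total = external + internal
    return (external - internal) / total if total else None


def ei_by_group(links, group_of):
    """Per-group E-I index: each group's outside links versus its inside links."""
    counts = {}  # group -> [external, internal]
    for a, b in links:
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            counts.setdefault(ga, [0, 0])[1] += 1
        else:
            # A cross-group link counts as external for both groups.
            counts.setdefault(ga, [0, 0])[0] += 1
            counts.setdefault(gb, [0, 0])[0] += 1
    return {g: (e - i) / (e + i) for g, (e, i) in counts.items()}


group_of = {"ann": "red", "bob": "red", "cy": "blue", "dee": "blue"}
links = [("ann", "bob"), ("cy", "dee"), ("bob", "cy")]
print(round(ei_index(links, group_of), 3))  # -0.333
print(ei_by_group(links, group_of))          # {'red': 0.0, 'blue': 0.0}
```

A whole-network value well below zero signals that people talk mostly within their own department; tracking it across monthly snapshots shows whether integration between departments is improving.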
The project x-rays began after a key milestone was missed in the 4th month of the project. They continued for the next 11 months. The project leadership reviewed the network maps and metrics each month to monitor the health of the project. No further milestones or deadlines were missed.
The above diagram shows the project network soon after the missed deadline. Notice the clustering around formal departments: blues interacting with blues, greens interacting with greens. Several of the hubs in this network were under-performing and were often seen as bottlenecks. Project managers saw the need for more direct integration between the departments. One of the solutions was simple yet effective: co-locating more project team members. A surprising solution in the age of the Internet! This intervention, along with others, improved the information flow and reduced the communication load on the hubs, whose performance improved later in the project.
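The hubs mentioned above can be spotted with a simple count. As an assumption, this sketch treats a person's communication load as their degree, the number of weekly email partners they have; the function name `top_hubs` and the toy data are hypothetical. Degree is only a proxy: betweenness centrality, which measures how many shortest paths pass through a node, is a common alternative for finding bottlenecks.

```python
from collections import Counter


def top_hubs(links, k=3):
    """Rank people by degree, i.e. the number of distinct weekly email partners.

    links: iterable of undirected (person, person) pairs, each listed once.
    Returns up to k (person, degree) pairs, highest degree first.
    """
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # most_common keeps first-seen order among people with equal degree.
    return degree.most_common(k)


snapshot = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
print(top_hubs(snapshot, k=2))  # [('hub', 3), ('a', 2)]
```

Running this on each monthly snapshot and comparing a hub's degree over time is one way to check whether its load actually fell after an intervention such as co-location.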