The Social Life of Books

Visualizing Communities of Interest via Purchase Patterns on the WWW

by Valdis Krebs

One of the cardinal rules of human networks is "Birds of a feather flock together". Friends of friends become friends, and coworkers of coworkers become colleagues. Dense clusters of connections emerge throughout the social space. The usual pattern found throughout social structures [and many other complex systems] is dense intra-connectivity within clusters with sparse inter-connectivity between clusters.

One day, while searching for a book on Amazon, I started thinking about its value-added service -- "Customers who bought this book also bought these books". Amazon lists the top 6 books that were bought by individuals who also bought the book currently being browsed. I wondered...

  • How do these listed books relate?
    • Are they 'books of a feather'?
    • Or, are they different -- complementary?

  • What do these books say about the community buying them?
    • Who are these people?
    • What are their goals and interests?
    • Are these people I should know [obviously our interests overlap]?

Being a student of networks, I knew the inquiry would not stop at the books listed on this web page. What would happen if I joined these individual lists into a network?

The key to understanding the dynamics of networks is reading the emergent patterns of connections that surround an individual, or that are present within and around a community of interest. I wanted to see the network in which my book of interest was embedded. Seeing those connections would give me insight into the 'network neighborhood' surrounding this book and, I hoped, help me make a smarter purchase.

I decided to trace the network out one and two steps from the focus book. This is a common procedure in social network analysis when studying ego networks -- the immediate relationships of a chosen individual. An ego network allows us to see who is in one's network neighborhood, how they are interconnected, and how this structure may influence ego.
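The tracing procedure described above can be sketched as a breadth-first walk over "also bought" lists. This is a minimal illustration, not the tooling used for the article: the `also_bought` dictionary, the book names in it, and the `ego_network` function are all hypothetical stand-ins for lists collected by hand from individual product pages.

```python
from collections import deque

def ego_network(also_bought, ego, max_steps=2):
    """Collect the books within max_steps of ego, plus the links found on the way.

    also_bought maps a book to the list of books shown on its page
    (hypothetical data; in the article these lists were gathered by hand).
    """
    distance = {ego: 0}          # steps from the focus book
    edges = set()                # undirected links between books
    queue = deque([ego])
    while queue:
        book = queue.popleft()
        if distance[book] == max_steps:
            continue             # do not expand beyond the chosen radius
        for other in also_bought.get(book, []):
            edges.add(frozenset((book, other)))
            if other not in distance:
                distance[other] = distance[book] + 1
                queue.append(other)
    return distance, edges

# Hypothetical "also bought" lists, for illustration only.
also_bought = {
    "Ego": ["A", "B"],
    "A": ["Ego", "C"],
    "B": ["A", "D"],
    "C": ["E"],
}
distance, edges = ego_network(also_bought, "Ego")
for book, steps in sorted(distance.items(), key=lambda item: item[1]):
    print(steps, book)
```

Books at distance 2 are included but not expanded, so book "E" in the sample data stays outside the neighborhood.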

To continue my exploration I had to choose a book as my focal point, or ego. I chose Tom Petzinger's The New Pioneers. After all, that book was the reason I had originally visited Amazon -- before I got sidetracked. As I collected the data, I started wondering again...

  • What themes would I see...
    • in the books?
    • in their connections?

  • What other topics are Tom's readers interested in?
  • Will Tom's book end up in the center of one large, massively interconnected cluster -- a single community of interest?
  • Or, will it end up linking together otherwise disconnected clusters -- diverse communities of interest?

Below is the network surrounding The New Pioneers. Each node represents a book. A red line links books that were purchased together. The buying pattern of the books has self-organized into emergent clusters that I have named for the content of each cluster. It is obvious that Tom's book does span a diversity of interests!

Next we examine the network measures of each node/book, to see which nodes are well positioned in the web of connections. The most common measure in social networks is network centrality, so to assess 'positional advantage' we measure each node's centrality. The network has two parts: 1) the Complexity cluster and 2) the other three interconnected clusters, which together form one large network component. The highest scoring nodes in the Complexity cluster are Open Boundaries and Complexity Advantage -- they received identical scores. The scores in the large network component, in declining order, are as follows:

  1. [tie] Management Challenges in the 21st Century
  2. [tie] Business @ the Speed of Thought
  3. Dance of Change
  4. Innovator's Dilemma
  5. Information Rules
  6. New Rules for the New Economy

The top two books received the highest scores because they are instrumental in connecting/bridging the three clusters [Internet Economy, Old School, New School]. Without these bridging connections there would be more holes in the network, such as those that surround the currently isolated Complexity cluster. Notice that more connections do not necessarily translate to network benefits -- Information Rules has the most connections but not the highest network score. In networks it is not the number of connections one has, but where those connections lead, that creates advantage. In networks the golden rule is the same as in real estate -- location, location, location. In real estate it is physical location -- geography. In networks it is virtual location -- determined by the pattern of connections surrounding a node.
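The article does not say which centrality measure produced the scores above. As one illustration of why a bridging node can outscore a better-connected one, the sketch below assumes betweenness centrality [how often a node sits on the shortest paths between other nodes] and computes it with Brandes' algorithm. The graph, the node names, and the helper functions are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Turn a list of (a, b) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def betweenness(graph):
    """Betweenness centrality (Brandes' algorithm) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                                # nodes in order of discovery
        preds = {v: [] for v in graph}            # predecessors on shortest paths
        sigma = {v: 0 for v in graph}             # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                   # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:        # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):                 # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical data: two tightly knit clusters joined only through book "X".
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
edges += [("a1", "X"), ("X", "b1")]
graph = build_graph(edges)
scores = betweenness(graph)
degree = {book: len(neighbors) for book, neighbors in graph.items()}
```

In this toy graph, "X" has only two connections while "a1" has four, yet "X" receives the higher betweenness score because every path between the two clusters runs through it.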

Another common network measure is structural equivalence. It reveals which nodes play a similar role in a network. Equivalent nodes may be substitutable for one another in the network. As an author, I would not like my book to be substitutable with many other books! As a reader, I would like equivalent choices.
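One simple way to approximate structural equivalence is to compare the neighbor sets of two nodes: the more neighbors they share, the more interchangeable their positions. The sketch below uses Jaccard similarity for this, which is an assumption made for illustration; the article does not state which equivalence measure was used. The graph and the `equivalence` function are hypothetical.

```python
def equivalence(graph, a, b):
    """Jaccard overlap of the neighbor sets of a and b, ignoring a and b themselves.

    Returns 1.0 when both nodes link to exactly the same other nodes,
    and 0.0 when they share none (or when neither has other neighbors).
    """
    neighbors_a = graph[a] - {a, b}
    neighbors_b = graph[b] - {a, b}
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)

# Hypothetical network: books P and Q are bought with the same three titles.
graph = {
    "P": {"X", "Y", "Z"},
    "Q": {"X", "Y", "Z"},
    "R": {"X"},
    "X": {"P", "Q", "R"},
    "Y": {"P", "Q"},
    "Z": {"P", "Q"},
}
print(equivalence(graph, "P", "Q"))  # identical neighborhoods
print(equivalence(graph, "P", "R"))  # partial overlap
```

A score near 1 suggests two books occupy the same position in the network; a low score suggests they serve different neighborhoods.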

Another value-added service that Amazon provides is the reader-submitted book review. A person considering the purchase of a particular book may be aided by the many reviews that accumulate over time. Unfortunately the reviews can be skewed. An author with a large personal network can quickly get a dozen or more glowing reviews of his/her latest book posted to Amazon. Customers who are comparison shopping based on reader reviews alone may be misled.

There is a similar phenomenon with web pages -- many webmasters have become quite adept at formatting the content of their web pages and meta tags so that their web sites land near the top of the results in many search engines. The creators of a new search engine, Google, recognized this trickery. They created algorithms that score a web page based on the number of other pages that hyperlink to it. The incoming links are further weighted by the popularity of the pages linking in to the focus page. This severely limits the use of 'alchemy of content' to score better with search engines. The social network analysis community has had a measure like Google's for many years. It was developed to trace the diffusion of innovation in a professional community.

In the Google search engine, if no one else points to your web page then you get bottom billing; if many popular web pages [those that have many links pointing to them] point to yours then you get top billing in the search results. It is easy for the webmaster to alter content, but not context [the pattern of incoming hyperlinks to a web site]. It is amazing how well this social network approach to searching the web works. Google usually lists the most useful pages right at the top of the returned search results. IBM is developing a similar search engine -- one that looks at hubs and authorities in the webspace -- under its CLEVER project.
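The idea of weighting incoming links by the popularity of the pages they come from can be sketched with a simple power iteration. This is a simplified illustration in the spirit of the scoring described above, not Google's actual algorithm; the `link_rank` function, the damping value of 0.85, the iteration count, and the sample pages are all assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the score of the linking pages.

    links maps each page to the list of pages it points to (hypothetical data).
    Pages with no outgoing links spread their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its current score among the pages it links to.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical web: "popular" and "obscure" each have exactly one incoming link,
# but the page linking to "popular" is itself pointed to by two other pages.
links = {
    "a": ["hub"],
    "b": ["hub"],
    "hub": ["popular"],
    "c": ["obscure"],
}
rank = link_rank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

Both target pages have a single incoming link, yet "popular" ends up scoring above "obscure" because of who points to it, which is the context a webmaster cannot easily alter.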

Could these community of interest maps work in a similar capacity with other consumer items? If I am not familiar with a product, an author, an artist, a vintage, or a brand, I would like to judge an item by the company it keeps -- its network neighborhood.

  • Who points to it?
  • What communities is it a member of?
  • Is it central in the community?
  • Does it bridge communities?
  • Are there equivalent alternatives?

It appears that as a customer of Amazon I could make smarter decisions by viewing the embeddedness of various items they sell in communities of interest -- especially if I did not have much experience with the items I am considering purchasing.

What are some network rules-of-thumb we can distill from this analysis?

  1. If you have read one nonfiction book of a structurally equivalent pair, you may not be in a rush to read the second [the second book probably covers the same information as the first book]. On the other hand, you may wish to read all structurally equivalent fiction titles [can't get enough of those cyber-thrillers].
  2. If you liked books A, B, and C and want to read something similar, find which books are linked to A AND B AND C. You can only see this in the network; you cannot see it in Amazon's individual lists unless you open three browser windows and compare the lists yourself.
  3. If you want to read just one book about topic X, find the book with the highest network centrality in the cluster of topic X books. This follows the Google philosophy and may reveal a book with excellent 'word of mouth'.
  4. If the book you are looking for is not in stock, find which books are structurally equivalent to the book you were searching for. These will provide similar content and are available now.
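Rule 2 above amounts to intersecting the neighbor lists of the books you liked. A minimal sketch, assuming the "also bought" lists have already been collected into a dictionary; the `also_bought` data and the `linked_to_all` function are hypothetical.

```python
def linked_to_all(also_bought, liked):
    """Return the books that appear in the 'also bought' list of every liked book."""
    neighbor_sets = [set(also_bought.get(book, [])) for book in liked]
    if not neighbor_sets:
        return set()
    common = set.intersection(*neighbor_sets)
    return common - set(liked)   # do not recommend what was already read

# Hypothetical lists, for illustration only.
also_bought = {
    "A": ["X", "Y", "B"],
    "B": ["X", "Z", "A"],
    "C": ["X", "Y"],
}
print(linked_to_all(also_bought, ["A", "B", "C"]))
```

With the sample data, only "X" is linked to all three books; "Y" misses because it does not appear in B's list.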

An irony in Amazon's drive to sell more books to its existing customers through value-added information is that these services could provide an opportunity to the businesses that Amazon competes against. All those local booksellers that have been going out of business from the onslaught of mega-retailers such as Borders, Barnes & Noble, and Amazon can now 'mine the data' on the mega-retailers' web sites to create smarter book orders for their own clientele. Rather than compete on discounting bestsellers -- a game they cannot win -- local booksellers could show their customers other purchase options using the book networks. For instance, they could recommend Petzinger's book to those customers who have interests in business, the internet, and complexity science. It is one of the few books that link to all three communities of interest. With this type of data analysis local booksellers may again thrive in their niche. In a balanced ecosystem the larger species [i.e. Amazon, Barnes & Noble, Borders, etc.] help form a niche for the smaller species [i.e. the local booksellers] and they all co-evolve.

A book author and/or publicist could use the knowledge of existing book networks to position a book where there is a hole in the network. A publisher could view evolving book networks -- they may change weekly -- to adapt its marketing efforts. Amazon, of course, is still the big winner -- they have the data, and a rich upside of untapped possibilities of how to analyze the data and apply the findings.

Copyright © 1999, Valdis Krebs