wiki.sine.space | sinespace

Spider Webs, Bow Ties, Scale-Free Networks, And The Deep Web

From wiki.sine.space

In theory, what makes the Web different from a typical index system is that you can follow hyperlinks from one page to another. In the "small world" theory of the Web, every Web page is thought to be separated from any other Web page by an average of about 19 clicks. In 1967, sociologist Stanley Milgram introduced small-world theory for social networks by noting that every human was separated from any other human by only six degrees of separation. On the Web, the small-world theory was supported by early research on a small sampling of Web sites.

But research conducted jointly by scientists at IBM, Compaq, and AltaVista found something entirely different. These scientists used a Web crawler to identify 200 million Web pages and follow 1.5 billion links on these pages. The researchers discovered that the Web was not like a spider web at all, but rather like a bow tie.

The bow-tie Web had a "strongly connected component" (SCC) composed of about 56 million Web pages. On the right side of the bow tie was a set of 44 million OUT pages that you could get to from the center, but from which you could not return to the center. OUT pages tended to be corporate intranet pages and other Web site pages that are designed to trap you at the site when you land. On the left side of the bow tie was a set of 44 million IN pages from which you could get to the center, but that you could not travel to from the center. These were recently created pages that had not yet been linked to by many center pages. In addition, 43 million pages were classified as "tendril" pages that did not link to the center and could not be linked to from the center. However, the tendril pages were sometimes linked to IN and/or OUT pages. Occasionally, tendrils formed a path from IN pages to OUT pages without passing through the center (these are called "tubes").
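
The bow-tie regions can be recovered from link data with a few reachability searches. The Python sketch below is an illustration only, not the analysis the researchers ran. The function names (`reachable`, `reverse`, `bow_tie`), the toy link graph, and the assumption that the caller already knows one page in the center (the `seed`) are all hypothetical. For simplicity it reports tendrils and tubes together.

```python
from collections import deque


def reachable(graph, start):
    """Return every node that can be reached from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every link flipped; every node becomes a key."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, set())
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def bow_tie(graph, seed):
    """Classify pages relative to the center that contains seed (assumed known)."""
    rev = reverse(graph)
    nodes = set(rev)
    forward = reachable(graph, seed)   # pages the center can reach
    backward = reachable(rev, seed)    # pages that can reach the center
    scc = forward & backward           # reachable in both directions
    out_pages = forward - scc
    in_pages = backward - scc
    # Ignore link direction to find pages attached to the bow tie at all.
    undirected = {n: set(graph.get(n, ())) | rev[n] for n in nodes}
    attached = reachable(undirected, seed)
    return {
        "SCC": scc,
        "OUT": out_pages,
        "IN": in_pages,
        "TENDRILS": attached - scc - out_pages - in_pages,  # includes tubes
        "DISCONNECTED": nodes - attached,
    }


# Toy link graph (hypothetical): page -> pages it links to.
links = {
    "a": ["b"], "b": ["c"], "c": ["a", "o"],  # a, b, c form the center
    "i": ["a", "t"],                          # i links in; t hangs off i
    "u": ["o"],                               # u links only to an OUT page
    "d": ["e"],                               # d and e touch nothing else
}
regions = bow_tie(links, "a")
for name, pages in regions.items():
    print(name, sorted(pages))
```

On a real crawl the center would have to be found first, for example with a strongly connected components algorithm, rather than assumed from a seed page.
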
Finally, there were 16 million pages totally disconnected from everything.

Further evidence for the non-random and structured nature of the Web is provided in research performed by Albert-László Barabási at the University of Notre Dame. Barabási's team found that far from being a random, exponentially exploding network of 50 billion Web pages, activity on the Web was actually highly concentrated in "very-connected super nodes" that provided the connectivity to less well-connected nodes. Barabási dubbed this type of network a "scale-free" network and found parallels in the growth of cancers, disease transmission, and computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: Destroy their super nodes and transmission of messages breaks down rapidly. On the upside, if you are a marketer trying to "spread the message" about your products, place your products on one of the super nodes and watch the news spread. Or build super nodes and attract a huge audience.

Thus the picture of the Web that emerges from this research is quite different from earlier reports. The notion that most pairs of Web pages are separated by a handful of links, almost always under 20, and that the number of connections would grow exponentially with the size of the Web, is not supported.
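
The vulnerability of super nodes can be explored with a small simulation. The Python sketch below is an assumption-laden illustration, not the method Barabási's team used. It grows a network by preferential attachment (new nodes prefer to link to already well-connected nodes), which is one common way to generate a network dominated by a few hubs. It then compares the largest connected group of nodes after removing the best-connected nodes with the result of removing the same number of randomly chosen nodes. The function names, the network size, and the random seed are all hypothetical choices.

```python
import random
from collections import deque


def preferential_attachment(n, m, rng):
    """Grow an undirected network of n nodes; each new node links to m
    existing nodes picked with probability proportional to their degree."""
    # Start from a small fully connected core of m + 1 nodes.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # A node appears in stubs once per link end, so a uniform pick from
    # stubs favors nodes in proportion to their degree.
    stubs = [node for node, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected group once the removed nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


rng = random.Random(42)                      # fixed seed for repeatability
web = preferential_attachment(500, 2, rng)   # toy network, not real Web data
k = 25
hubs = sorted(web, key=lambda node: len(web[node]), reverse=True)[:k]
randoms = rng.sample(sorted(web), k)
print("intact:", largest_component(web))
print("after removing hubs:", largest_component(web, set(hubs)))
print("after removing random nodes:", largest_component(web, set(randoms)))
```

In runs of this kind, removing the hubs would typically be expected to fragment the network more than removing random nodes, though the exact figures depend on the size, the value of `m`, and the seed.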