bias metrics

I asked the authors of the Bias on the Web paper whether they're continuing their research: they are, and they're investigating alternative methods of measuring bias. You can try out their current measurement system, which lets you test a search term against several search engines. I don't understand all the results, but it does let you see, for each site, how many search engines place it in the top twenty hits. Here, for instance, are the results for ludology.

posted: 8/10/02 11:09 |

notes on the bias on the web article

Got my hands on Mowshowitz and Kawaguchi's article on Bias on the Web, which I mentioned earlier this week. It tests search engines against each other, searching for the same terms across several search engines and checking the top hits from each engine against each other. The table they show for which brands of refrigerator turn up when you search for "home refrigerators" on various engines really drives home how partial the information any search engine throws back at you is. None of the search engines tested presented the most common American brands of refrigerator in the top 50 hits. (And writing that I see another problem with the assumption that a search engine should pick up American brands. Most of the listed brands of fridge are quite unknown to me over here in Europe. If you want to buy a new fridge perhaps checking out a review site like, or a catalog like the ODP would be a better bet?) They go on to try searching for a topic, and a controversial topic: euthanasia. Some tricky mathematics produces vectors and cosines and quotients for each search engine, basically proving that none of the search engines provides a full or "objective" sample of information. The solution? Lots of different search engines.
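To make the "vectors and cosines" a little less tricky, here is a minimal sketch of the general idea, not the authors' actual procedure: treat each engine's top hits as a vector, pool all the engines into a shared norm, and call an engine's bias one minus the cosine between its own vector and the pooled one. The engine names, the URLs and the function names `cosine` and `bias_scores` are all made up for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bias_scores(results):
    """Map each engine to 1 - cosine(engine vector, pooled vector).

    `results` maps an engine name to its list of top-hit URLs.
    The pooled vector counts how many engines returned each URL.
    """
    pooled = Counter()
    for urls in results.values():
        pooled.update(set(urls))
    scores = {}
    for engine, urls in results.items():
        own = Counter(set(urls))
        scores[engine] = 1.0 - cosine(own, pooled)
    return scores


# Hypothetical top hits for one search term on three imaginary engines.
results = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["x.example", "y.example", "w.example"],
    "engine_c": ["q.example", "r.example", "s.example"],
}

for engine, score in sorted(bias_scores(results).items()):
    print(engine, round(score, 3))
```

In this toy data engine_c shares no hits with the others, so it ends up furthest from the pooled norm. Note that the norm is only as good as the set of engines you pool, which is rather the point of the article.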

Mowshowitz and Kawaguchi's final point is important, and I don't think it has been fully recognised yet:

Elimination of competition in the search engine business is just as problematic for a democratic society as consolidation in the news media. (60)

posted: 27/9/02 18:33 |

google news

Seen Google News? Look, it's done completely by computers with no human intervention, and that makes it completely free of ideology and bias! Isn't that wonderful!

Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention. Google employs no editors, managing editors, or executive editors. While the sources of the news vary in perspective and editorial approach, their selection for inclusion is done without regard to political viewpoint or ideology. [about google news]

What luck that we've found a way to be totally objective.

[related links: my essay Links and Power, an essay by Mowshowitz and Kawaguchi on Bias on the Web, archive of my posts on Link Politics]

posted: 25/9/02 10:57 |

bias on the web

Jamie sent me a reference to an article on search engines and bias and commercialisation that looks very interesting - it's in the ACM digital library but I can't get the full text, must check at my library. It's by Abbe Mowshowitz and Akira Kawaguchi. A popular version of the research is given at searchenginewatch. (The link to the "free pdf" doesn't work for me, even after registering)

posted: 23/9/02 10:09 |

google news

Google's been banned in China (via Kairosnews), and is viewed with deep suspicion by a conspiracy theorist (via Fibreculture).

posted: 4/9/02 00:00 |

research on linking

Interesting paper (by Pennock, Lawrence, Giles, Flake, Glover; these guys do lots of link research) on the distribution of links: while lots of sites follow the power law of "rich get richer" (i.e. sites with lots of links to them get more links to them; this is also called preferential attachment and is common in heaps of systems apart from the web), in subcategories of sites that have the same topic the power law doesn't apply: for instance, among all university home pages and all newspaper home pages. I'm interested to notice that they use money-related imagery such as rich/poor (see page 3, particularly) - fits in nicely with my idea of links as the currency of the web. In the paper lots of complicated algorithms and simulations are applied to this, but (handily for non-programmers like me) heaps of the paper is in plain, even pleasant-to-read English too and it has some interesting points. I suppose in a way the basic point of the paper is not very surprising:

Intuitively, the two growth components can be viewed as capturing two common behaviors of web page authors: (a) creating links to pages that the author is aware of because they are popular, and (b) creating links to pages that the author is aware of because they are personally interesting or relevant, largely independent of popularity. (page 3)

But it's all framed really well, there are piles of references to related works, lots of good examples, very serious-looking use of mathematical notation, confidence-inspiring reports of simulations to prove it all and it's very relevant to stuff bloggers have been talking about. For instance, should we be worried that tools like Blogdex and Daypop increase the "cohesion" of the blogosphere, making more of us link to the same stuff?
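The two link-making behaviours quoted above are easy to play with in a toy simulation. This is a rough sketch under my own simplifying assumptions (a fixed set of pages, one new link per step), not the model or the code from the paper: with probability `alpha` a new link goes to a page in proportion to the links it already has, otherwise it goes to a page picked uniformly at random. The names `grow_links` and `alpha` are invented for the example.

```python
import random


def grow_links(num_pages, num_links, alpha, seed=0):
    """Hand out `num_links` inbound links among `num_pages` pages.

    alpha = 1.0 is pure "rich get richer"; alpha = 0.0 ignores popularity.
    Returns a list with the inbound link count of each page.
    """
    rng = random.Random(seed)
    inlinks = [0] * num_pages
    # One entry per link already made, so a uniform choice from this list
    # picks a page with probability proportional to its inbound links.
    targets = []
    for _ in range(num_links):
        if targets and rng.random() < alpha:
            page = rng.choice(targets)       # (a) link to what is popular
        else:
            page = rng.randrange(num_pages)  # (b) link to what interests you
        inlinks[page] += 1
        targets.append(page)
    return inlinks


for alpha in (0.0, 0.5, 0.9):
    counts = grow_links(num_pages=1000, num_links=20000, alpha=alpha)
    print("alpha", alpha, "most-linked page has", max(counts), "links")
```

Turning `alpha` up should make the most-linked page pull further ahead of the rest, which is one way of thinking about whether tools that show everyone the same popular links nudge us towards the popularity end of the scale.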

The paper is called "Winners don't take all: Characterizing the competition for links on the web", and has its own website at

posted: 24/6/02 09:43 |

digital libraries

Wow. is a stunning digital library! I found it half an hour ago and have been amazedly uncovering new stuff there ever since. It indexes scientific papers that are online (only PDF and PS files I think, and possibly only freely available ones) and lists and links to all the citations to and from, co-citations (i.e. papers that cite the same papers as that paper), shows a summary etc - and (!) lets you correct or suggest a summary, rate the paper, see how many other people viewed the paper, comment on the paper etc, etc, etc. It uses autonomous citation indexing, which means that people don't have to do it, somehow the machines work it out. I don't think I've seen half of what this system does, yet. Here, have a look at the page for a paper, and rummage around a bit from there. It's run by Steve Lawrence, who's an Australian migrated to Princeton (as you would, unfortunately) and who's written all these fascinating papers about the web and structure and searches and links and self-organisation (he's one of the co-authors of that paper about a self-organising web and communities stuff), and how papers that are online get cited more, and C. Lee Giles and Kurt Bollacker.
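For the "papers that cite the same papers" listing, here is a minimal sketch of how a machine might work it out once it has the citation lists. It only illustrates the idea as described above and is not how the library itself is implemented; the paper names and the function `shared_citation_counts` are hypothetical.

```python
def shared_citation_counts(cites, paper):
    """List other papers that cite at least one paper `paper` also cites.

    `cites` maps a paper to the set of papers it cites. The result is a list
    of (other_paper, number_of_shared_citations), most overlap first.
    """
    own = cites.get(paper, set())
    counts = {}
    for other, refs in cites.items():
        if other == paper:
            continue
        overlap = len(own & refs)
        if overlap:
            counts[other] = overlap
    # Sort by overlap (largest first), then by name to keep the order stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical citation lists.
cites = {
    "paper_a": {"ref_x", "ref_y", "ref_z"},
    "paper_b": {"ref_x", "ref_y"},
    "paper_c": {"ref_z"},
    "paper_d": {"ref_q"},
}

print(shared_citation_counts(cites, "paper_a"))
```

The hard part, of course, is the step this sketch skips: pulling the citation lists out of PDF and PS files automatically in the first place.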

Would we need a separate digital library for papers from the humanities and social sciences? No, surely it would be better if they co-exist.

Us non-computer scientists will have to find tools so we can read gzipped and PS files. And why do they not index html files, I wonder?

posted: 23/6/02 15:53 |

teoma and other search engines

Is Teoma, a new search engine getting some press these days, a step beyond PageRank? I could go on forever thinking about the politics of links and the value of a link economy.

To determine the authority or quality of a site's content, Teoma uses Subject-Specific Popularity. Subject-Specific Popularity ranks a site based on the number of same-subject pages that reference it, not just general popularity, to determine a site's level of authority. (About Teoma)
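Taking that description at face value, the counting could be sketched like this. It is a guess at the general idea only, since the real algorithm isn't public: it assumes every page already carries a single subject label (which is the genuinely hard part), and the page names, the labels and the function `subject_specific_popularity` are all invented.

```python
def subject_specific_popularity(links, subjects, subject):
    """Count, per page, how many distinct same-subject pages link to it.

    `links` is an iterable of (source, target) pairs and `subjects` maps a
    page to its subject label. Only links where both ends carry `subject`
    are counted, and repeated links from the same source count once.
    """
    scores = {}
    for source, target in set(links):
        if source == target:
            continue
        if subjects.get(source) == subject and subjects.get(target) == subject:
            scores[target] = scores.get(target, 0) + 1
    return scores


# Hypothetical link data: "tool" and "portal" each have three pages linking in.
links = [
    ("blog1", "tool"), ("blog2", "tool"), ("news1", "tool"),
    ("news1", "portal"), ("news2", "portal"), ("shop", "portal"),
    ("blog1", "tool"),  # a repeated link, counted once
]
subjects = {
    "blog1": "blogging", "blog2": "blogging", "tool": "blogging",
    "portal": "blogging", "news1": "news", "news2": "news", "shop": "shopping",
}

print(subject_specific_popularity(links, subjects, "blogging"))
```

In this toy data both pages are equally popular in general, but only "tool" is linked from other blogging pages, so only it gets any subject-specific score.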

Brett Tabke of Webmasterworld has posted a short piece on Teoma, with a very interesting looking "advanced reading list" appended to it. So far, though the idea is interesting, I'm not impressed with the results. A Google search for "blog" turns up Blogger as first hit, BlogSpot as number three, and the eatonweb portal as number nine, in between fairly random blogs. A Teoma search for blog suggests I might like to research the genealogy of the Blog family, has a link to Blogstickers (not that interesting, surely), then to MoveableType, which is a tool for blogging and a relevant hit, then to a pile of random blogs, including a couple which are dead. Their suggested authorities (sites that have link collections to a lot of other sites in their cluster) are all just blogs.

On the other hand, they've just started, they only have 200 million sites indexed (Google has 1.5 billion) and the basic idea is really interesting.

Anders searched a number of search engines for "permalink" a week or two ago (18 April), with interesting results. A more thorough research project than my currency of the web experiment ;). Teoma gives the same result as AskJeeves, a kind of relevant one but not really the ideal one. AskJeeves has bought Teoma so presumably they use the same engine.

posted: 4/5/02 17:58 |

poetry, ads, google

Brandon at Texturl has interesting links to a project to put poetry in the textad space at Google. Google stopped it, for some reason.

posted: 22/4/02 12:01 |

private and popular weblogs

Lisbeth comments on a site that ranks the most-visited (registered) Danish weblogs. I've never seen a ranking based on actual hits before - though I suppose if there's one there'll be others. Somehow it feels like - like seeing someone naked when you weren't really supposed to - to see each blog's average hits. Hm. Lisbeth's note that weblogs are still categorised as personal is important too.

posted: 22/4/02 11:26 |

social networking

Torill's going to an art and technology soirée in NY where

A cocktail party will follow the discussion, where guests wear wireless badges called meme tags that track and analyze social interaction in real time.

Please, please, please, Torill, blog this! Wish I could go too. Btw, there's an interesting discussion of that paper on identifying web communities through linking patterns at webmasterworld, quite critical in places, and relating it to Google's penalising of closed link circles.

posted: 18/4/02 12:34 |

google's PigeonRank technology

More April first... Google actually uses pigeons to get us those speedy, accurate results! couple days late, but ya know, I was busy... (thanks Diane)

posted: 3/4/02 21:11 |

cataloging the web

Thomas suggests a collaborative librarians' catalogue of the web, which could be connected to libraries' book catalogues. Great idea. I just signed up (or applied, I've no idea whether it's hard to get this "job") to be an editor at dmoz, the open directory project. The idea there is that experts on fields edit a catalogue of the Web. Unfortunately the fields I'm really an expert on don't exist, so I signed up for Norwegian weblogs instead. I love this idea of using everyone's knowledge, but I must say I have great difficulty using web catalogues. I rarely find the correct slot for whatever I'm looking for or trying to classify. That's why librarians would be the perfect people to catalogue the Web - they know about classifying stuff.

posted: 28/3/02 14:25 |


I thought my short paper for Hypertext (deadline today) was brilliant a few hours ago. Now it's almost finished but I think it's dreadful. It's getting worse and worse. More and more boring as I pare it down (shouldn't it be getting sharper instead?) and oh, just terrible. I'd better send it in before I make it even worse...

God, if I feel this way finishing a two page paper, what's it going to be like finishing my whole dissertation?

posted: 25/3/02 21:40 |

self-organising web

Interesting paper on self-organising and self-identifying communities on the web. Interesting (and scary) discussion of linking strategies in relation to google etc. And a two page short paper is SHORT. It's really hard to write that short!

posted: 25/3/02 18:26 |

"link drain"

There's a weird system where google punishes sites that have lots of external links by reducing their PageRank, which means that site doesn't show up as high as it otherwise would in searches. "search engine considerations are perverting web linkage" - weird stuff, and on the face of it it would seem to contradict the idea that google loves weblogs? [update 26/3: this is unconfirmed. What appears to be a fact is that each page has a PageRank (PR) assigned to it by Google. The page also has that many points to "give" other pages by linking to them. So if a page has one link out, the full PR will be passed on to one site - and kept, the page doesn't lose PR by linking. More links mean more pages split the PR score between them. And you could be "spending" that PR score on your own site by only linking internally - a link from my front page to an archive page gives the archive some PR, you see, and so on. So it seems that some search engine-obsessed people avoid external links. And they don't want to put too many links on one page and so on. Weird, huh?

Good sources for google info are the google forum at webmasterworld and an article by Chris Riding, PageRank Explained (pdf) that elegantly explains what does appear to be known about Google's algorithms. Google themselves are silent on the matter, and the most recent papers by them appear to be from 1998, when Google was still in prototype at Stanford: one by Page et al. and another more extensive paper given at the WWW7 conference by Brin et al., "Anatomy of a Large Scale Search Engine"]
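The "splitting" described in the update can be tried out with a small sketch of the PageRank calculation as it is publicly described. This is a simplified illustration, not Google's actual system: the damping factor of 0.85, the handling of pages without links, the tiny example graph and the function name `pagerank` are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over `links`, a dict of page -> list of linked pages.

    Each round, a page passes on `damping` times its score, split equally
    among the pages it links to. A page's own score depends only on what
    links in to it, so linking out does not subtract anything from it.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links out: spread its score over everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: the front page links to its own archive and to one
# external site; the archive links back to the front page.
links = {
    "front": ["archive", "elsewhere"],
    "archive": ["front"],
    "elsewhere": [],
}

for page, score in sorted(pagerank(links).items()):
    print(page, round(score, 3))
```

Removing "elsewhere" from the front page's links sends the whole share to the archive instead, which is the behaviour the search engine-obsessed are trying to exploit when they only link internally.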

posted: 25/3/02 15:40 |

fake referrers

You can fake referrers, the kind I have displayed to the right there. Here's how.

posted: 25/3/02 15:12 |

links, referrers, Crisc

Lisbeth has some interesting comments on the referrers (i.e. what site a reader clicked a link on to come here) that I've started displaying in the right column. She also mentions Odigo, a system that lets you chat with other (Odigo-registered) visitors to a site. This brings me to art. It wouldn't be hard to explain that showing the referrers is a politico-aesthetical statement about the networked society we live in and that it forces the reader-viewer to assess her complicity in the surveillance of the network. And declare this blog art. (I'm not being sarcastic, I mean it. Honestly. I'll write it out properly later. After I've finished my thesis...)

This brings me to Atle Barclay's Crisc, which shows information about visitors to any site it's installed on and allows you to chat with other visitors without being registered at some central place. He implemented it as an art installation in the issue of Localmotives Kevin and I edited on net art. If you go to Crisc you'll see who you are (your IP number) and the IP numbers of others who are at any of the sites Crisc is installed at. You can zoom in on users and see their browsers, systems and all the other information browsers give a site, and also the other servers that their data is bouncing through on its way between the user and the server they're surfing. Data bounces lots.

Wow! There's a download link, connect your server to Crisc. With some PHP stuff. I'm going to write and ask Atle how to install that, if you can install it just like that.

Crisc is different to the referrer links though. Complementary, perhaps :)

posted: 18/3/02 14:15 |

visibility of links

A friend is deeply suspicious of the idea of showing referrers on a site. I think there are two reasons for this; mind you, these may not be my friend's reasons.

1. It's showing off. It's like having a visible counter on your page, trying to prove that you have so many hits. Showing off is not cool.

2. We've gotten very used to links being one way - from one site to another with no way back except the back button. Until recently, I could safely link to something and expect that not to be visible. Site owners might track their own referrers but until tools like Blogdex came along, readers were very unlikely to do so. When I put the referrers on my site, links to my site become much more visible. This may change the perceived meaning of the link. When I automatically display the links the last ten people followed to get to my site, that can be interpreted as taking those links as public endorsements or recommendations of my site. The links may not have been meant that way at all. It also contests the ownership of the link. When a link to me is displayed as content on my site, that link no longer only belongs to the person who made it.

The increased visibility of links is changing the power structures of the Web.

posted: 15/3/02 14:39 |

here's how to show referrer links

Leuschke and Hilde both sent me the source: Sean Nolan gives you what you need to put referrers on your site. I have to do this. Start the clock...NOW to see how long it'll take me!

posted: 15/3/02 10:12 |

bidirectional links

Over at leuschke there's a little link under the archive links which'll tell you where readers came from. That means what page they were on when they clicked a link that led them to leuschke. I want one of them. I like links that go backwards. So, I guess somewhere I can find a script or something that'll put that on my site? (Torill and Hilde found something like this somewhere?)
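The mechanics behind such a list are simple enough to sketch. When a reader follows a link, the browser usually sends the address of the page they came from in the HTTP `Referer` header (that misspelling is the header's real name). What follows is a minimal illustration of keeping the last few of those, not the script used on leuschke or anywhere else: the class `ReferrerLog`, the host names, and the assumption that request headers arrive as a plain dict are all mine.

```python
from collections import deque
from urllib.parse import urlparse


class ReferrerLog:
    """Remember the most recent external pages that readers arrived from."""

    def __init__(self, own_host, limit=10):
        self.own_host = own_host
        # A deque with maxlen drops the oldest entry once it is full.
        self.recent = deque(maxlen=limit)

    def record(self, headers):
        """Store the referrer of one request, if it has an external one."""
        referrer = headers.get("Referer", "").strip()
        if not referrer:
            return  # typed-in address, bookmark, or header withheld
        host = urlparse(referrer).hostname
        if host is None or host == self.own_host:
            return  # unparseable, or a click within this site
        self.recent.appendleft(referrer)

    def latest(self):
        """Return the stored referrers, newest first."""
        return list(self.recent)


# Hypothetical requests arriving at a blog hosted at myblog.example.
log = ReferrerLog("myblog.example", limit=2)
log.record({"Referer": "http://a.example/post"})
log.record({})
log.record({"Referer": "http://myblog.example/archive"})
log.record({"Referer": "http://b.example/"})
log.record({"Referer": "http://c.example/"})
print(log.latest())
```

The header is supplied by the visitor's browser, so the site has to take it on trust; a real script would at least escape the addresses before printing them on a page.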

posted: 14/3/02 13:52 |

google articles

An interesting article on google's search algorithms and how the system's easily skewed by blogs: Google loves blogs. Also an early paper on Google, The Anatomy of a Search Engine. (thanks Noah) I'm working on making all this search, community, power and blogs stuff fit into my thesis. I think I can do it.

posted: 4/3/02 13:04 |

reading list

Suggested reading for a paper on the politics of the link: Diane Greco's short paper for HT96, Hypertext with Consequences: Recovering a Politics of Hypertext (thanks Mark); the talk Barrett Warren gave at e-poetry last year (Beyond the Demon of Analogy, not online) (thanks Katherine); Adrian Miles' Realism and a General Economy of the Link Force. Hm, I'm sure I've lost an email with some other suggestions.

posted: 27/2/02 17:32 |

not linking

Sometimes blogs don't link to whatever spurred the thoughts posted. Dave Winer carefully doesn't link to his flamers when he mentions them. I didn't link my last post to Adrian's post on blogging though it was what set me off and it was clearly related to what I was saying. I decided that would be mean and wrote an email instead: "it's outrageous to write a post like that, stressing the importance of links and context and so on, and not link *at all*." In my opinion, writing a post discussing a topic (like blogs in teaching and research) without linking to and openly admitting an awareness of work (like mine and Torill's on blogs in research) which you know about and which is highly relevant is unethical.

While not linking can be a result of ignorance or carelessness it is often a conscious choice. A link is a gift, or perhaps payment. You can choose not to link because you don't want to give anything to the person or work you're writing about. After all, if you link it, Google will rank it higher, more people will read it, and it will give the other people more power and influence. That's why Dave doesn't link to his flamers. You can choose not to link because you want your work to appear more original than it is. You can not link to be kind, because you want to poke generic fun at bad design, say, and not accuse individuals. Or you can not link because you're angry, like I was. Newspaper editors used to not link because they thought readers might disappear off their site, and because they worried that they might be legally responsible for content on the linked-to site (what if it turned into child pornography?). Nowadays online newspapers are starting to realise that while a reader may leave the site to follow a link, if the links are good, he or she will come back again for more links.

Adrian writes that he often doesn't link because he writes offline and then publishes his blog later. Often days and weeks later when software and servers are troublesome. I deduce, then, that it is the online nature of the most popular blogging tools that has made the link not only so prevalent in blogs but also so powerful.

Anger can be really productive if you use it right. Today I'm writing an essay on the politics of links. No one seems to have done this, everyone's busy with the poetics of links and such instead. I'm having a ball. Google, blogdex, Nelson, Everquest, economy, beauty contests, weblogs: it's a riot. Viewing things you thought you'd made up your mind about (like hypertext) from a new perspective is great fun, it sets everything in a new light. Now, where shall I send the article, I wonder? Where would the best place be, politically speaking?

posted: 25/2/02 14:05 |

bloggiquette (or: the ethics of the link)

Weblogs are fastidious about links and sources, as much as or more so than academic articles. If you obviously know about a text and it is relevant to an academic article you're writing, you should cite it, even if you're not directly building on it or arguing against it. If it's relevant, and you don't know about it, you probably didn't do your research well enough. (And then there are degrees of "relevant", I'm remembering a discussion with Mark about whether a games researcher should cite every relevant game where I thought that was a tall order but Mark thought it was basic scholarship. Hm. Now I can't find the discussion, which I thought was in our blogs. Maybe I'm mistaken. Here's an indignant post I wrote which is more or less about it though.)

Though it may be an unwritten rule, bloggers clearly concur that when writing a blog, it's important to give your sources. It's a social given, essential to the generosity and sharing of weblogs, just as in academia and as in the open source community, for instance. Go to any random blog, and look at the posts. Even a generic post linking to an article in Wired everyone's already read and that's at the top of Blogdex's most-linked-to-site will almost always have a small "via such-and-such-a-blog" after the link, or written in the text, and there'll be a link to the place the blogger first saw the reference. That's way more conscientious than most academics are.

Professors worrying that the web causes more plagiarism are so terrified to see the traditional essay tearing at the seams that they won't look to understand the new ethics of the web. They're all thieves, they worry, assuming the worst and muttering about Napster. Throw the essays out the window for a while. Tell students to write blogs instead. Teach them the ethics of the link.

After that, the looser rules of academic citations will be easy.

posted: 23/2/02 17:27 |

'objectivity' in online communities

The main problem with applications that map relations between and relative popularity of websites and users is that they claim to be objective. Google, for instance, argues that their search results are automatically generated and therefore better: "Google's complex, automated methods make human tampering with our results extremely difficult." Blogdex's social network explorer is based on the same idea: use link information to map relations between blogs and to recommend other blogs the reader may like if she likes this site.

Blogdex and Google may create community among individually published sites. They interpret data intended for other purposes (links) to map connections and evaluate popularity and relevance. Websites that market themselves as online communities, on the other hand, request this kind of data specifically. Amazon asks you to rate books and reviews so that it can generate recommendations and build up links between customers. Sites like Slashdot and Plastic and Epinions and many others ask you to rank posts from other users and use this information to filter posts both for you individually and globally for other users.

Cameron Barrett, of CamWorld, has written a short essay packed with ideas and links to things like these that actually exist in online communities - this is the field of online community management, btw. Reputation management is an interesting example - at sites like Advogato they use a trust metric where users certify each other as Apprentice, Journeyer or Master. Other sites let users give each other Karma points or call them friends etc. Cam writes about a lot of other systems too, the essay's worth a read.

Although these systems are fascinating, and often work well, there's lots of the slightly unpleasant feel of playground popularity contests here. With the infallibility of the majority vote combined with Big Brother the computer. Every community, offline or online, has informal reputation management going on, but it's not absolute. And it looks different depending on where you're standing. Many of these automatic systems do enable that. But of course, they're never fully automatic: someone programmed them and chose the variables to be used.

Algorithms are always ideological! Machines are never objective!

posted: 29/12/01 18:08 |

