Tuesday, April 22, 2008

Social Bookmarking for Life Scientists Round 2: Connotea vs. CiteULike

Yesterday's post on Connotea stirred up more controversy than I had anticipated, so I'd like to follow it up today with some further thoughts and responses to some of the people who contributed to the discussion, both on this site and in private emails. I was very pleased to hear from Ian Mulvany, the product development manager for Connotea. It's good to hear that Connotea are aware of, and working on, some of the problems users have raised. Ian also made a fair point about Connotea being an NPG (Nature Publishing Group) site:
I think that for every person who worries about it, at least as many choose to use Connotea because they feel that being backed by a publisher that has been around for some time is an implicit guarantee that the service will be around for a while. In the year that I have been involved, this is the third time that I have explicitly addressed this question, and it is a fair one. Part of our answer to that is that our code is open source, and we have an open API into our data. One advantage of a product like this coming out of a publishing company is that we are interested in technologies across the full spectrum of scientific communication. Working with the people who develop Nature Network, or who look at content matching, will feed back into Connotea and make it a stronger product, at least I hope ;).
While the open API gives Connotea an advantage over CiteULike (CiteULike: A Researcher's Social Bookmarking Service. Ariadne, Issue 51, April 2007), in my view, and in the view of some of those who contacted me privately, this is cancelled out by the fact that it is perceived as a tentacle of the "Nature Network" (network by name, but not by nature).

I've had a chance to play a little with CiteULike now, and my feeling is that overall it's very similar to Connotea. In my limited testing, CiteULike seems better than Connotea at keeping out spam links, and I much prefer the CiteULike interface, without the spurious Nature crap and adverts which turn scientists off in a big way. The PDF upload option is also very attractive to many scientists. On balance, CiteULike wins.

But here's the problem:

Almost all the people who emailed me privately said the same thing: I tried Connotea/CiteULike a while ago, but I don't use them; I couldn't see the point. And there's a reason for that. If these sites are just filing systems into which people chuck references, they can't compete against existing systems such as EndNote, RefWorks, etc. The people who emailed me were not big social network users, and had not understood the resource discovery implications of a refined network such as one can build on (or Twitter).

Or had they?

Is either of these sites fit for purpose? Neither site feels like it has a big enough user base in the Life Sciences to make it of much use for resource discovery. And that's confirmed by this paper:

Content Reuse and Interest Sharing in Tagging Communities
Tagging communities represent a subclass of a broader class of user-generated content-sharing online communities. In such communities users introduce and tag content for later use. Although recent studies advocate and attempt to harness social knowledge in this context by exploiting collaboration among users, little research has been done to quantify the current level of user collaboration in these communities. This paper introduces two metrics to quantify the level of collaboration: content reuse and shared interest. Using these two metrics, this paper shows that the current level of collaboration in CiteULike and Connotea is consistently low, which significantly limits the potential of harnessing the social knowledge in communities. This study also discusses implications of these findings in the context of recommendation and reputation systems.


These numbers are trivial compared to (well over 2 million). But it gets worse. Scientists don't share. The study above finds (1) consistently low levels of item reuse, (2) high levels of tag reuse, and (3) most activity being generated by existing users, with little recruitment to a low base.

So here's the solution. Neither Connotea nor CiteULike is fit for purpose. Even if they were to merge (which they won't), they probably couldn't overcome the damage they have done to social bookmarking in the Life Sciences. But there's one site which could. If PubMed were to add social resource discovery features, it would kick both Connotea and CiteULike into touch, and possibly overcome the reluctance of self-absorbed bench scientists to share resources.

And until that happens, I'm not prepared to recommend either Connotea or CiteULike to undergraduates. I may recommend CiteULike to postgraduate and postdoctoral scientists as an alternative to - but only until PubMed introduces social resource discovery and sharing features, then it's game over.


  1. I think this comes back to our debate about trying to encourage scientists and staff to engage with Web 2.0. Offering something because 'everyone else is using it' is not a strong argument to encourage use; in fact, it will usually turn most scientists off. They read Nature because they know everyone else reads Nature, but there is a culture of individualism which goes much deeper. I think that scientists fear that if they reveal what they are reading, someone else will work out their latest whizzy idea and nick it. Competition does not lead to playing nicely and sharing.

  2. AJC, jo_badge,

    There seems to be a pretty strong correlation between openness and the amount of commercialization that is possible from a particular scientific discipline. Broadly (and there are of course exceptions), physics and astronomy have high degrees of openness and reuse of data. Biomedical, pharma, nano-research low levels.

    The physics arXiv is a striking example, where most papers appear pre-publication.

    When we were looking at developing Nature Network, one comment we received, from the head of a biomedical facility, was that they would ban all of their employees from using it in order to prevent disclosure.

    This aside, there are benefits to individuals from engaging openly with their peers. Probably the single most valuable is the chance to find collaborators more quickly, especially across boundaries. At the very simplest, universities making it easy to see what their faculty are working on (in the broadest sense), and how to contact them, has greatly improved the ease of finding collaborators in the West. (Asian science suffers from very opaque websites, to the point that Asian scientists themselves find it hard to find local collaborators.)

    Of course, individuals won't adopt a technology for the technology's sake, or for the greater good of the community. They need to see an immediate benefit to themselves, and it has to be made clear to them fast, probably in under two minutes when you are talking about a web application. When building an application, one should be driven by asking what problem you are solving for people. With online citation software, the aim is to take away the pain of extracting citation information from web pages. Any other network effects come later. As this blog post has pointed out, perhaps much later!

    My experience has also been that many undergraduates, and even graduates, don't know about the kinds of tools that are available. They are in the golden position of developing work habits that will stay with them for much of the rest of their research careers.

    @AJC, I would encourage you to introduce all of these tools to your class, not as a recommendation for use perhaps, but as a pointer to open their eyes to the kinds of things that are evolving. Why not Twitter in the classroom, Google Docs or Basecamp for collaboration, or Delicious, LibraryThing or some other online bookmarking tool for creating reading lists? There is still a long, long way to go, and you are in a fantastic position to help us along the way by imparting an enthusiasm to those who will follow.

  3. Thanks again for your input Ian. Unfortunately, without the social dimension, Connotea, CiteULike et al will not compete against existing standalone tools such as EndNote and RefWorks.

    > @AJC, I would encourage you to introduce all of these tools to your class,
    > not as a recommendation for use perhaps, but as a pointer to open their eyes
    > to the kinds of things that are evolving. Why not twitter in the classroom,
    > google docs or basecamp for collaboration, why not delicious or library
    > thing or some other online bookmarking tool for creating reading lists.
    > There is still a long long way to go, and you are in a fantastic position to
    > help us along the way by imparting an enthusiasm to those who will follow.

    Absolutely, we intend to:
    but I'm less concerned about the reaction from undergraduates than I am about the reaction from PIs.

  4. Just tried to open citeulike... didn't work.
    Strike one.

    The name sounds like "cellulite".
    Strike two.

    I really need to try it before saying any more :-)

    As for, I've only used it to store URLs. I also like to keep my URLs (which are a mix of work and personal) separate from the references in Connotea.
    Can I get an RSS feed of my new tagged material, personal or my network?

    All very interesting....

  5. Worth persisting with though - I prefer it to Connotea.

    > As for, I've only used it to store URLs. I also like to keep my URLs (which are a mix of work and personal) separate from the references in Connotea.

    You can easily have multiple accounts, e.g. one for research, one for teaching...

    > Can I get an RSS feed of my new tagged material, personal or my network?

    Yup, you can get an RSS feed from any page in, including what the rest of your network are reading - that's the really important bit, not storing URLs, which you can do in your browser or on a memory stick.

  6. Good analysis, thanks for your work here. I'm generally in agreement. I tend to side with the idea that a really good search engine trumps a set of user tags every time as far as organizing one's own set of references. The big downfall of both Connotea and CiteULike (and the myriad other sites that do the same thing) is that there is no ability to do a full text search on the papers you've listed there. Compare that with Papers or Yep or Yojimbo, file managers on one's own computer where one can search the full text of a pdf file and the online sites come out the loser. You can spend hours and hours tagging your papers or spend seconds coming up with an effective search string to find what you're seeking. So the online resources fail in terms of archiving one's own reference list. Although they tout the idea of accessing your reference list from any computer, how often is this beneficial? Really, how often can you get online without your own computer, and without the ability to find the paper you're seeking through PubMed or Google? Is it really all that difficult?

    You're also right on the money as far as the social aspects. Not enough participation is an obvious problem. I'm also unconvinced that these social networks are useful for putting together collaborations. I think a directed approach, reading the literature and finding reputable labs that can do what you need is a much better approach than trying to find new online pals who want to collaborate. But that's me, maybe I'm too old, or maybe the labs I was working in were of such a high quality that finding collaborators was never an impossible chore.

  7. Thanks for your comments, David. I agree with most of your remarks, but one of the key features of tagging and filtered social networks is serendipitous resource discovery - the paper you never knew you were looking for - which is difficult to replicate with straight keyword searches, or even MeSH terms, which presuppose you have some idea of what it is you hope to find.

  8. @David - I think the other point where I'd disagree with you is:

    "how often can you get online without your own computer?"

    Depending on how many sprogs or spouses I'm fighting for access, I might be working at any of 4 PCs/laptops, not all of which are connected to a printer. That's the beauty of Delicious, etc.: I can tag the paper without having to go back to Google/PubMed, and when the aliens have been vanquished, the homework done or the shopping ordered, I can quickly access the article I wanted to print.

  9. AJ--
    The use of these sites as a discovery tool has merit, but it will be interesting to see how things play out in practice. Look at the top videos on YouTube on any given day. Is this the sort of quality filtering that one would want in searching the science literature? I guess finding a few trusted voices and viewing their tags would work better than a Digg approach where everyone votes.

    There's a nice comment on the use of tags over at my blog, left by Richard Gayle:
    "Social bookmarking sites were first used to make it easy to retrieve personal bookmarks from anywhere. It was useful to the individual. An emergent property, though, of these sites was that these bookmarks could become useful for others. I just do not see that same sort of process for these scientific reference sites. They do not seem to have the same usefulness.

    Same with tagging. It is useful for a group but not usually for the individual. Thus, many people just do not tag items. No perceived benefit."

    Bioethicsbytes--sounds like you'd be better served by a networked printer than a tagging site.

  10. I certainly agree that if PubMed were to go "web 2.0", they could wipe the floor with any opposition.

    CiteULike is actually a rather subtle website - you really need to use it properly over an extended period to appreciate its features. I don't think it's designed to be "social" in the sense of helping you to find relevant literature through social connections. In fact, I don't think that's even an effective way to find literature. I find papers relevant to me via (1) RSS feeds of journals and (2) PubMed searches. If I think that a paper deserves attention I'll share the item from Google Reader, post it to Twitter or aggregate the shared feed in FriendFeed - and someone else might find that useful - but really, all I want to do is capture and store stuff of interest to me.

    I think CiteULike is pretty good at that last function. For me, it's just an online reference manager. As to the question of "access from anywhere" - yes, it is useful. More to the point, you have everything in the one place, rather than having to synchronise collections across multiple machines.

    There are subtle features in CiteULike that you might call "social". One is that when you store a reference, you see which other individuals and groups stored it too. You can then visit their collections. When you use a tag, you can see the collections of other people who used that tag. The site also uses watchlists and RSS in abundance, allowing you to keep track of tags, people, groups or topics that interest you.

    It's not social in the "discussion" sense either - but there are other sites for that (such as Postgenomic or Scintilla). I note that almost no-one uses the irritating ratings system (might read/have read etc.); this is the one CiteULike feature that I find pointless and think they should scrap.

    So as I say, for me CiteULike is mainly a very capable reference manager with the advantage of storing all my stuff online, in one location - a kind of "EndNote on the web". It gives me a workflow for writing a paper: (1) get references from PubMed or via journal RSS; (2) scrape them into CiteULike and add the PDF; (3) organise by tag ("my_paper_2008"); (4) grab the BibTeX/RIS for that tag to go into citation software (LaTeX in my case). If online editors such as Google Docs could use the online reference managers for citations and bibliographies, my life would be complete :)

    It will be interesting to see how Zotero (the very good Firefox extension for reference scraping/storage) approaches these issues: the developers are working on network storage and social features at the moment.