December 28, 2010 / tommoradpour

Open Letter To Twitter Metrics Companies: Help Filter Spam And Bots!


I recently started to follow Mark Schaefer’s advice to block spam and bot accounts that follow me.

Mark made a convincing case for this in a post on his blog {GROW}. He based it both on the ethics and pride of having a genuine follower list free of fake accounts, and on the positive impact this will have on your social scoring in tools such as Klout, Peer Index or Twitalyzer (which consider the percentage of your followers who “act” on your posts as a key measure of your influence… experts, feel free to jump in if this is not correct).

I find it relatively easy to manage on a daily basis (I typically get 10-25 new followers in any single day), but going back through my 1,500 followers was a real pain. Yesterday and today, I took the time to use My Tweeple and Tweepi to analyze account metrics, figure out who was real and who was not, and block a hundred or so accounts. At the end of the day, I’m sure I blocked a few genuine users and let many bots and spammers slip through.

There must be a better way. So this is an open letter to the smart “social analytics” guys out there – please help us manage this better.

Here are the types of accounts I’d consider block-worthy, or at least suspicious enough to warrant a check:

  • Someone who tweets at predictable time intervals. No matter what the interval, if tweets come like clockwork, then the account must be a bot. Some people do automate part of their tweets for convenience (e.g. to post links across multiple time zones), and that’s OK, as long as a significant portion of their tweets are real and organic. In that case there should not be a completely predictable pattern to their tweeting, down to the minute or second.
  • Someone who only retweets. I know of at least one person who does only retweets but is still genuine (a marketing prof by the handle of @niglesiasg)… How do I know he’s real? Because I follow him and know that, for the most part, he retweets interesting stuff. But he’s the exception. The serial retweeter is more likely to be a bot triggered by keyword searches, such as the word iPad, or Poutine (try it, it’s funny). There should be a way to figure out the “rule” that triggers an RT when it’s set on automatic.
  • Someone who only posts links, never @ mentions others, or never answers. These could be promo-bots, or broadcasters of interesting content, such as CNN or Mashable… but for all intents and purposes, if there is no chance to ever engage with them, they’re as good as bots to me. What I’d like to do with them is create a list with the most interesting ones, but not clutter my timeline or dilute my engagement metrics with them. Personal choice… but that’s the point. I’d like to have a choice.
  • Someone with an abnormally high number of tweets. No need to go to 360,000 per year (the highest in 2010). More reasonably, any account tweeting more than 100 times per day (roughly 36,500 tweets per year) should go through a spam screen. Funnily enough, my own account and most of my friends’ would actually be flagged. But as only 2% of Twitter users drive 60% of all tweets, the list to review would not be that long at these levels.
  • Someone with significantly more followers than tweets. Don’t get me wrong on this one: this will “organically” be the case for anyone famous outside Twitter (like Lady Gaga), or who achieved outstanding popularity on Twitter (like Gary V). What I’m talking about are cases where an account has several thousand followers and fewer than a dozen tweets. There is no way you can achieve this without “cheating”, and the method is quite easy: these types of tweeps start following a few hundred accounts, wait a few days to see who follows back, then flush anyone who did not follow back, and start over with a few hundred more. In a matter of weeks, you can build yourself quite a large following… particularly if you don’t mind having only bots set on “auto-follow” following you.
  • I’m sure commenters can add to this list.
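For the technically inclined, the criteria above can be turned into a simple rule-based screen. This is only a sketch under stated assumptions: the `Account` fields and every threshold (a coefficient of variation below 0.05 for “clockwork” posting, 90% retweets, 90% links, 100 tweets per day, 2,000 followers with fewer than 12 tweets) are hypothetical values I picked for illustration, not anything Klout, Peer Index or Twitalyzer actually use. It flags accounts for a manual check rather than blocking them, since genuine accounts (mine included) will trip some rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Account:
    """Hypothetical summary of an account's recent activity."""
    handle: str
    followers: int
    tweets_total: int
    tweets_per_day: float
    post_times: list[float] = field(default_factory=list)  # sorted, in seconds
    retweet_share: float = 0.0  # fraction of tweets that are retweets
    link_share: float = 0.0     # fraction of tweets containing a link
    mention_share: float = 0.0  # fraction of tweets that @ mention someone


def interval_cv(post_times: list[float]) -> float | None:
    """Coefficient of variation of the gaps between posts; near 0 means clockwork."""
    gaps = [b - a for a, b in zip(post_times, post_times[1:])]
    if len(gaps) < 5 or mean(gaps) == 0:
        return None  # too little data to judge regularity
    return pstdev(gaps) / mean(gaps)


def suspicion_reasons(acct: Account) -> list[str]:
    """List the (assumed) rules an account trips; an empty list means no flag."""
    reasons = []
    cv = interval_cv(acct.post_times)
    if cv is not None and cv < 0.05:
        reasons.append("posts at predictable intervals")
    if acct.retweet_share > 0.9:
        reasons.append("almost only retweets")
    if acct.link_share > 0.9 or acct.mention_share == 0:
        reasons.append("broadcast only (links only or no @mentions)")
    if acct.tweets_per_day > 100:
        reasons.append("abnormally high volume")
    if acct.followers >= 2000 and acct.tweets_total < 12:
        reasons.append("many followers, almost no tweets")
    return reasons
```

For example, an account posting a link exactly every hour is flagged for both its timing and its broadcast-only behaviour:

```python
bot = Account("hourly_links", followers=50, tweets_total=900, tweets_per_day=24,
              post_times=[i * 3600.0 for i in range(24)], link_share=1.0)
print(suspicion_reasons(bot))
```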

My guess – algorithms are already in the works at Klout, Peer Index and Twitalyzer. Maybe there is an efficient way I’m simply not aware of (please share!!!). Peer Index actually has a score called “realness” that estimates how likely it is that a user is a person and not a bot (last I checked, I’m 100% real, YAY!). I’d love this to become a filter I can apply to both my followers and the accounts I follow, in Hootsuite or in a dedicated tool such as Tweepi, helping me zoom in on the most suspicious accounts.

Please, please… give us a tool!

Thanks!

Tom


9 Comments

  1. Howie at Sky Pulse Media / Dec 28 2010 12:40

    This is really well said, Tom. This will offend Twitter and its VCs, and the same tack would offend Facebook. Ever notice that when they brag about network growth, they basically focus on just one number: how many active accounts they have? Because they feel this is their value.

    As a marketer, I tell clients that is a bunch of baloney. If it is not a real person, and if they are not on the network when you seek to reach them, they might as well not exist. It is why I slam Facebook for their ‘Active User’ definition. It is why I look at Tweet Volume per day to gauge how many people are really on the network.

    So the companies that are evolving to help us with analytics should look at this as a way to provide a valuable service that marketers and brands would pay for. If the networks themselves ever offered the real information, we wouldn’t need these outside companies. But since they feel it damages their IPO or Hyped Inflated Perceived value, they refuse to do this.

  2. Zach Cole / Dec 28 2010 13:34

    Great points again, Tom! I like your ideas about what to do with Twitter accounts such as Mashable and CNN.

    Just a quick thought – what about certifiably real accounts that also happen to spam people? For example, I also work as a music blogger, and I get pitches from artists via Twitter where it’s just a link followed by dozens of @handles. Or I’ll get ones that are clearly copied and pasted over and over to many different Twitter users. Even though these people are real, they are still annoying and should be marked as spam.

    I think a lot of the onus here falls upon the real Twitter users to call out the spammers by reporting them.

  3. Heidi Cohen / Dec 28 2010 14:42

    Tom — Well stated argument. Part of the challenge is that spammers and bots have a head start on metrics and analytics companies. Metrics and analytics firms tend to react to customers’ needs rather than lead what should be analyzed. The good news is that in 2011 most companies will look for ways to measure their social media marketing effectiveness as well as how to effectively monitor the social media landscape for those company or brand mentions that require attention. Happy marketing, Heidi Cohen

    • Jeff Katz / Dec 30 2010 11:29

      I have worked at a few metrics/analytics and BI companies, mostly in the capacity of a product manager, so I have a different perspective especially as it relates to my current role at Twitalyzer.

      While much of our product direction is driven by customer feedback, we have, from the earliest stages of our company, tried to be leaders in terms of what should be measured and how. This includes coming up with various calculated measures that simply did not exist because of the relative newness and lack of industry standards of social media metrics. Of course, between Eric and me, we kind of know and understand measurement – http://www.twitalyzer.com/help.asp#company

      Are there challenges in this space, be it spams & bots or what & how to measure? Yup! Are companies simply knee-jerk reacting to customer needs? Hardly.

      — Jeff

  4. Azeem / Dec 28 2010 16:45

    Hi Tom

    We agree–it is important to filter spam accounts, and we have done so (with varying degrees of success) from the beginning of PeerIndex’s life.

    Our realness metric is an attempt to estimate how real a user is. It’s early days, but I can tell you we look at signals like the quality of other accounts a user is linked to, and increasingly the quality of a person’s network, as well as their posting patterns.

    However, unlike email spam where the definition is probably less arbitrary, it can be much harder to judge what is and isn’t a ‘spam account’.

    Seth Godin is a great example: our earliest version of spam checking (back in March 2010) looked at several of the signals above (like posting frequency and URL percentage), and Seth’s twitterfeed was blacklisted as a spambot. Why? It only sends out URLs at alarmingly regular intervals and has no conversational content. Is this or is this not a spam account?

    Seth’s case is in the grey. The most egregious are actually the accounts set up to retweet RSS feeds of sites like the NY Times, the Guardian and others, as well as to auto-retweet top Twitter users (and we’ve now seen the injection of @s resulting in reciprocations by real users!). Courtesy of several free tools out there, it’s easy to set these up. And all they are is noise.

    We are trying to figure out the best way of approaching this. Twitter actually does a good job of suspending and banning these accounts, but the rate of creation is pretty fast.

    One idea: outside a few clear cases of spamming, is it worth presenting a classification of user types instead? For example, Seth’s blog/Twitter account isn’t really a spammer, but is it worth pointing out that you won’t ever engage with it?

    To a certain extent, we also depend on the strength of the network signals, i.e. we rely on people’s reciprocal follow behaviour. As Twitter users mature, they seem to ‘auto-follow’ less frequently, so older accounts seem to have a circle of trust around them which we can tap into. But we’re all still guilty of the ‘auto-follow back’ (mostly because it is easy to do relative to checking out who someone is), so many of us have actually gone out of our way to enable the spammers!

    Great discussion–love to hear what you guys need and we’ll see what we can do about delivering on it.

  5. Claudia Jackson / Dec 28 2010 17:35

    Thanks for another great post, Tom. It’s helpful to see what criteria others are using to identify bots. To date, my criteria for uncovering a bot are not as detailed as yours. I’m still learning! But in addition to your criteria, I block accounts that only post #FF lists, as they are clearly hoping to be followed just for mentioning someone’s name, but never have anything else to post… yet. I figure there will be a major spam rush when they reach whatever magic number of followers they are shooting for.

    While I like the idea of an algorithm I could apply routinely to my follower list to clean out bots, I see some challenges in this. I think what makes such an algorithm difficult to develop is that each of us uses Twitter in a different way. I use it not only to engage, but to learn. So I will follow some accounts I don’t expect to converse with because they bring me a different kind of value: knowledge. Some of those accounts are on lists which I run twice a week, but others are in my home Twitter stream to keep me challenged and to help keep my interest in that stream. So an algorithm that meets your needs would be different from one that meets mine. Hopefully people smarter than me will rise to the challenge and create something useful to us both!

    Again, thanks for sharing your insights and experience. I’ll be broadening my criteria now!

  6. Nick Kellet / Dec 29 2010 16:12

    Makes total sense.

    Beyond a Klout rating, I’d like to see people described along the dimensions you mention.

    That would add value, helping you understand people (and their approach) in less time.

  7. Sean McGinnis / Jan 3 2011 07:47

    Great post Tom. Two points I would add:

    1. Any effort designed to weed out the spammers is likely to be overcome by that community. Take for instance your desire to weed out those that tweet out at regular intervals. If those rules get applied, our friends the bots will simply adapt, applying their own algorithm. If an algo can be gamed, it will be. And the more valuable the payoff for gaming, the swifter the game will adapt. My many years of SEO background confirm this.

    2. I was most interested to read Mark’s original post on {GROW} because I’ve struggled to understand why I should care about who follows me. Like many others, my usage pattern of Twitter has adapted and changed over time. Mark’s post affected my daily routine pretty severely. I now block spam accounts as they follow me. It takes a little bit of work, but I think it will be worth it. My next big project is setting aside the time to go through my existing list of followers in an attempt to weed out the bots as well. It will undoubtedly not be much fun, but, again, I think it will be worth it.

    Thanks again for this post. Great insights.

    Sean McGinnis
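Sean’s first point is easy to see with the “clockwork” rule from the post. In the hypothetical sketch below, regularity is measured as the coefficient of variation of the gaps between posts, with an assumed cut-off of 0.05. A bot posting exactly every hour scores 0 and is caught; the same bot adding a random delay of up to ±30 minutes still posts about once an hour but scores far above the cut-off and slips through. The schedule generator and threshold are illustrative assumptions, not any vendor’s actual check.

```python
import random
from statistics import mean, pstdev


def interval_cv(post_times):
    """Coefficient of variation of the gaps between consecutive posts."""
    gaps = [b - a for a, b in zip(post_times, post_times[1:])]
    return pstdev(gaps) / mean(gaps)


def bot_schedule(n_posts, period=3600.0, jitter=0.0, seed=42):
    """Hypothetical bot: one post every `period` seconds, shifted by up to +/- `jitter`."""
    rng = random.Random(seed)
    return [i * period + rng.uniform(-jitter, jitter) for i in range(n_posts)]


THRESHOLD = 0.05  # assumed "clockwork" cut-off

naive = bot_schedule(48)                   # exact hourly posting
adaptive = bot_schedule(48, jitter=1800.0)  # same rate, randomized timing

print(interval_cv(naive) < THRESHOLD)     # the naive bot is caught
print(interval_cv(adaptive) < THRESHOLD)  # the adaptive bot is not
```

Any single published threshold invites this kind of adaptation, which is one argument for combining several signals rather than relying on one.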

Trackbacks

  1. How to Clean Up Spammers Using Tweepi’s Presets (2)
