The @PlagiarismBad project has been moving along nicely. I wrote about it here and here. As I write this we have 1,636 tweet thieves on the list. That means that my collaborators and I have manually listed all those people after seeing firsthand evidence that they blatantly copied a tweet.
And we aren’t talking about tweets that are just similar. We are talking about tweets that are exactly the same. Genuine copy and paste tweets.
There are two ways to find a thief. The easy way is to start with a great tweet and search for it using the “Search” field in Twitter. You can then look at each user who copied it and failed to give any indication that it was not their original idea. No quotes. No “MT” or “RT”. No credit. Those people get added to the list. The only tricky part is that you can only list about 100 people an hour, and then Twitter puts the brakes on.
The hard way is to start with a suspect. Typically this is someone who was reported to us as a thief. The process of confirming these allegations is pretty labor-intensive. You need to manually copy and paste each of their tweets into the Twitter Search box and then pore through the results, looking for a match. This looked to me like something that could use some automation. So, being me, I wrote a web app.
You give the app the handle of a suspect, and it finds up to 50 recent tweets by that person that are fairly long, don’t contain any links, don’t have “MT” or “RT” in them, and are not @-replies. It then uses Twitter’s “Search API” (which is not the same as the “Search” box in Twitter) to look for earlier tweets that are similar. It goes through the results if there are any, and reports the oldest, bestest match it can find.
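To make those filters concrete, here is a rough Python sketch of the idea. It is not the app's actual code. The `Tweet` class, the 60-character cutoff for "fairly long", and the function names are all made up for illustration, and it only counts a match when the text is identical after normalizing case and whitespace, which is stricter than the "similar" search the app does.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Tweet:
    # Hypothetical container; a real API response has many more fields.
    author: str
    text: str
    created_at: datetime


MIN_LENGTH = 60       # assumed cutoff for "fairly long"
MAX_CANDIDATES = 50   # the post says "up to 50 recent tweets"

_MARKER = re.compile(r"\b(?:RT|MT)\b")                 # credit markers
_LINK = re.compile(r"https?://|\bt\.co/", re.IGNORECASE)


def is_candidate(tweet: Tweet) -> bool:
    """Keep long tweets with no links, no RT/MT, and no leading @."""
    text = tweet.text.strip()
    if len(text) < MIN_LENGTH:
        return False
    if text.startswith("@"):
        return False
    if _LINK.search(text) or _MARKER.search(text):
        return False
    return True


def pick_candidates(timeline: List[Tweet]) -> List[Tweet]:
    """Take the first MAX_CANDIDATES tweets that pass the filters."""
    return [t for t in timeline if is_candidate(t)][:MAX_CANDIDATES]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not hide a copy."""
    return " ".join(text.lower().split())


def oldest_exact_match(suspect: Tweet, results: List[Tweet]) -> Optional[Tweet]:
    """Return the earliest tweet by someone else with the same text, if any."""
    target = normalize(suspect.text)
    earlier = [
        r for r in results
        if r.author.lower() != suspect.author.lower()
        and r.created_at < suspect.created_at
        and normalize(r.text) == target
    ]
    return min(earlier, key=lambda r: r.created_at, default=None)
```

In this sketch you would feed `pick_candidates` the suspect's timeline, run one search per candidate, and hand each set of results to `oldest_exact_match`.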
There are a few problems with this app, though. Twitter’s API for getting a user’s tweets is basically like going to their timeline and scrolling down. You get all the @’s and retweets mixed in. So if the person you are looking at does a lot of that, it can take quite a while to get a good list of tweets to search.
Also, Twitter has a really low rate limit of 180 searches every 15 minutes, which means you can only look up a few people before you bump into that limit.
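Here is the back-of-the-envelope math as a few lines of Python. It assumes every candidate tweet costs exactly one search, so a full check of one suspect uses up to 50 searches. That one-search-per-tweet figure is my simplification for the sketch, and the function names are made up.

```python
import math

SEARCHES_PER_WINDOW = 180     # Search API limit per 15-minute window
MAX_TWEETS_PER_SUSPECT = 50   # assumed: one search per candidate tweet


def suspects_per_window(searches_per_suspect: int = MAX_TWEETS_PER_SUSPECT) -> int:
    """How many full suspects fit in one 15-minute window."""
    if searches_per_suspect <= 0:
        raise ValueError("searches_per_suspect must be positive")
    return SEARCHES_PER_WINDOW // searches_per_suspect


def windows_needed(num_suspects: int,
                   searches_per_suspect: int = MAX_TWEETS_PER_SUSPECT) -> int:
    """How many 15-minute windows it takes to check num_suspects."""
    total_searches = num_suspects * searches_per_suspect
    return math.ceil(total_searches / SEARCHES_PER_WINDOW)


print(suspects_per_window())   # 3
print(windows_needed(10))      # 3
```

So under that assumption you get about three suspects per window, and ten suspects would spread across three windows.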
But the biggest issue is that the Twitter Search API is horrible. It doesn’t return anything but really recent tweets. You can give exactly the same query to the search box on twitter.com and get hundreds of matches, and the Search API reports no results at all! (Programs cannot use the good search on twitter.com, so we have to use the Search API.) It’s a mess, and the developer support forums are full of people basically saying, “What the fuck?” and the Twitter support people replying, “Yeah, sorry about that.”
It turns out that Twitter has no intention of fixing this. They bought a company last year that provides good historical search results. If Twitter made good historical data available in the Search API, people wouldn’t have to pay for that service any more.
What this means is that a really horrible tweet thief who copies tweets right away and does it all the time is easily outed by the app. But a person who just every now and then pretends to write something they originally saw on someone’s FavStar page or in an e-Card is not likely to show any results.
If you want to try it, go to tweetdetective.appspot.com and log in with your Twitter account (the program runs the searches as you). Then put in a suspect’s Twitter handle and see what you find. Have fun!