Plagiarism, Twitter, and the DMCA

It’s been a pretty crazy weekend for the Plagiarism is Bad account. One of the collaborators (not me) noticed a strange thing when he did one of our usual searches to find people stealing a tweet. Some of the tweets were replaced with a notice that said they were being “withheld in response to a report from the copyright holder.” He tweeted about that (with a screen shot) and his tweet went viral, first via retweets and quotes, and subsequently on all the new media news sites.

So what’s going on? I’ve covered a lot of this before so I won’t get into the details here, but basically, Twitter couldn’t care less about tweet theft. They give blue “verified” checkmarks to accounts like “Men’s Humor” that contain nothing but stolen content and links to clickbait sites. They ignore the fact that one guy has created hundreds of accounts and populated them with stolen content and more clickbait links. Twitter really doesn’t care about this at all. They have a form you can use to report accounts that chronically steal, and as far as anyone knows, they’ve never acted on any report filed using that form.

However, there’s this funny law in the USA (and similar laws in other countries) that says unless you provide a way for copyright holders to request that copyright-infringing content be taken down, you, the service provider, become the copyright thief. The penalties for infringing copyright are draconian, so if you run a site like Twitter, it’s really important that you follow these so-called “safe harbor” rules. In the USA, this law is called the DMCA.

The rules go like this: a copyright holder makes a complaint and the service provider gives the infringer a chance to counter. If the infringer says it isn’t stealing, then the service provider is off the hook, and the two parties can go fight it out in court. If the infringer ignores the report, then the service provider has to take the content down. The service provider does not need to weigh in on whether they think it actually is infringing content. It’s certainly safest for them if they just assume all allegations are legitimate.

So what’s happened is that at least one writer has gone ahead and asserted that they own the copyright on a joke. And Twitter simply treated this like any other DMCA “take down request” and took the content down. There’s some question whether they actually notified the person who did the theft. Their policy is that they will, but at least one admitted thief said they got no notice.

So does that mean that Twitter thinks jokes are subject to copyright? No, it really doesn’t. Twitter’s DMCA request form has some language that implies they think poems and song lyrics are. But whether Twitter itself thinks anything is subject to copyright is basically irrelevant. They are going to take down anything that gets reported because that’s the safest way to not end up in legal trouble. Twitter is in no way alone in this regard. Pretty much every service provider has that same policy.

So although it seems that Twitter has suddenly started doing something new, maybe they really haven’t. Maybe what’s new is that authors are using the DMCA form to report theft instead of the abuse form. And whereas I’m pretty sure the abuse form goes to a “write only database,” they actually have to read the DMCA reports or they can lose their “safe harbor” status and be held liable by litigious bulldogs like the record companies.

Whether jokes are subject to copyright is not settled law. I covered this in detail before, but the short version is maybe they are, maybe they aren’t. Making a DMCA complaint about a joke is perfectly reasonable, given how grey this is. As long as you really are the author, there’s nothing “in bad faith” about reporting the theft, and you aren’t going to get in any trouble for doing it, even if some day the courts decide once and for all that joke tweets aren’t protected.

But the DMCA is absolutely the wrong tool for this job. What Twitter should do is read the abuse reports, see that the accounts being reported are crap, and shut them off. Why they don’t is anyone’s guess. But I’m quite certain that would be a lot cheaper to do than to handle all this in the DMCA reporting system. Perhaps all the publicity being generated this weekend is going to cause a big surge in the DMCA requests and that might cause Twitter to take a step back and realize they can do both themselves and the community a favor by simply shutting down any account that steals tweets.

Thoughts on the Plagiarism is Bad Project

A while back, two collaborators and I started an account called @PlagiarismBad. (Background posts: 1, 2, 3.) Things have evolved since we started it, and it seems like a good time to take a step back to see where we are.

What’s the Point?

This is a question I’m hearing both from my collaborators and from people we list. (Not so much from our fans.) I think there are three reasons for this project to continue: education, shaming, and inoculation. I’ll expand on each of these.

Education — A lot of people don’t understand that plagiarism is wrong, and they don’t understand in particular that copy/paste tweeting is plagiarism. So by shining a light on this practice, we are helping to end that ignorance. And it is clearly working. There have been many, many cases of people saying, “I’m sorry. I had no idea I shouldn’t do that. I’ll delete those posts!” And we take those people off our lists.

We are also helping to educate the general twitter population about how incredibly widespread the plagiarism problem is on twitter. There are more than 4,700 people on the main tweet thief list now. Twitter lists only allow a maximum of 5,000 names, so we will have to add a second list soon.

Shaming — Initially, the main point of the project was to shame tweet thieves into stopping this particular behavior. In a social network where “Reporting” people apparently has no impact whatsoever, this is probably the only way to combat the problem. By putting people on a list, we embarrass them. And we put everyone else on notice that certain people are thieves, so those other people may shun them.

There is plenty of evidence that this works. People react strongly to being listed. They argue they are not thieves. When we show them evidence, they either slink away, or switch to the “so what? who made you boss?” argument. The only people who seem to think what we are doing is a waste of time are the tweet thieves. The people who create original tweets seem to really appreciate what we are doing.

Inoculation — I’ve created an app that lets people easily block everyone on the tweet thief list. It also can periodically update those blocks so that as new thieves are added, they will be blocked as well. This effectively inoculates people from having to look at stolen tweets. That was one of my main personal objectives of the project in the first place. I found it embarrassing that I was following thieves and favoriting and even retweeting their stolen tweets. With the automatic blocking, I really don’t have to worry about that any more.

People may think that it also protects their tweets from being stolen. Unfortunately, this is not really the case. There are any number of ways that a thief you have blocked might see your tweet. They might see it being sent by another thief. Or they might pick it up from your FavStar page. Or perhaps your tweet was so good that it made it into a stolen-tweet-compilation site. (Yes, these exist. There are several. And most thieves believe that copying from these sites is not plagiarism, in the same way that ripping off a drug dealer isn’t a crime, I guess.)

Twitter’s Terms of Service

It is quite clear that Twitter (the company) could not care less about tweet theft. You can report people, and Twitter will do nothing. There are massive, verified (“blue check”) accounts that do nothing but post stolen tweets. There are so many that we actually made a list of those accounts (and a handful of other accounts that people are invariably shocked to learn are chronic thieves). Nonetheless, there are two places in Twitter’s terms of service where tweet theft is forbidden: the copyright rules, and the spam rules.

Twitter’s terms forbid you from posting content to which someone else owns the copyright. Whether tweets of words fall under this rule is a matter of debate, which I will cover later. However, most of the pictures you see on twitter (except selfies and foodies) are protected by copyright, and those are pretty much never owned by the person posting them. If you removed all the copyright violating pictures from twitter, you’d basically have no pictures on twitter (except pictures of faces, flesh, and flan). So it is pretty obvious that twitter does not take this service term seriously. But it’s in there.

The spam rules are written in a fluid way, listing many things that might make twitter consider you a spam account. One of the things explicitly listed is tweeting other people’s tweets and pretending that you wrote them. This is the key reason my collaborators and I believe that tweet theft violates Twitter’s terms of service. Because the terms of service say it does. It’s a pretty compelling argument. However, as with copyright violation, there is no evidence of Twitter caring the least bit about enforcing this either.

Twitter only seems to deactivate accounts that follow/unfollow too fast. Other than that, it certainly appears that they completely ignore their own terms of service. I wonder whether Twitter’s legal counsel knows this. It seems unwise, but I’m not a lawyer.

Copyright vs. Plagiarism

A lot of people get wrapped around the axle of whether plagiarism is illegal under copyright law. But really, it doesn’t matter. Plagiarism is wrong. It is unethical. It is immoral. Taking someone else’s words and passing them off as your own is stealing. You shouldn’t do it. It doesn’t matter whether it is illegal. It is wrong regardless of whether it is legal or not.

Okay, so that said, is it illegal? As with many legal issues in intellectual property law, the answer is a mix of “it depends” and “nobody really knows, because it hasn’t been to court.” For plagiarism of tweets to be illegal under copyright law, two things would need to be true. The tweets would have to be copyrightable, and the copying would have to not be “fair use.”

Some tweets, like “Damn it’s cold,” are not entitled to copyright protection under the law. Some tweets, such as short poems (Haiku, Senryu, “six words”, etc.), probably are entitled to copyright protection. Songs are entitled to protection, and 140 characters is probably enough to convey a melody in solfège. A single joke tweet, which is mostly what people plagiarize, is probably not. However, a whole timeline of jokes certainly is. And we have seen accounts that do exactly that: tweet everything another account has ever said. That’s clearly a copyright violation in the USA. (“In the USA” is an important qualification, because every country has different rules and judicial precedents about copyright law.)

So the vast majority of the plagiarism we flag on the account is not copyright violation, because the tweets cannot be copyrighted. But it is still plagiarism. And that’s the whole point.

The account isn’t called “Copyright Violation is Bad.” It is called “Plagiarism is Bad.”

And so…

The fight continues. My collaborators and I are going to keep educating, shaming, and inoculating. (Just to be clear, they are doing all the work; I just set the thing up and wax poetic about it here on my blog.) And soon the list of tweet thieves will hit 5,000 people and we will have to start a second list. Sigh.

Tweet Detective

The @PlagiarismBad project has been moving along nicely. I wrote about it here and here. As I write this we have 1,636 tweet thieves on the list. That means that my collaborators and I have manually listed all those people after seeing firsthand evidence that they blatantly copied a tweet.

And we aren’t talking about tweets that are just similar. We are talking about tweets that are exactly the same. Genuine copy and paste tweets.

There are two ways to find a thief. The easy way is to start with a great tweet and search for it using the “Search” field in Twitter. You can then look at each user who copied it and failed to give any indication that it was not their original idea. No quotes. No “MT” or “RT”. No credit. Those people get added to the list. The only tricky part is that you can only list about 100 people an hour, and then Twitter puts the brakes on.
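The easy way boils down to a simple filter over search results. Here is a minimal sketch of it, not anything we actually run. The `Tweet` record and the `find_copies` name are made up for illustration, and it assumes you already have the search results in hand. Because it only accepts text that is exactly the same as the original, anything wrapped in quotes or prefixed with "RT" or a credit falls out on its own.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record; real search results carry many more fields."""
    user: str
    text: str
    posted: datetime


def normalize(text):
    """Collapse runs of whitespace so trivial spacing changes don't hide a copy."""
    return " ".join(text.split())


def find_copies(original, search_results):
    """Return later tweets by other users whose text exactly matches the original."""
    target = normalize(original.text)
    copies = []
    for tweet in search_results:
        if tweet.user.lower() == original.user.lower():
            continue  # the author reposting their own tweet is not theft
        if tweet.posted <= original.posted:
            continue  # anything earlier cannot be a copy of this tweet
        if normalize(tweet.text) == target:
            copies.append(tweet)
    return copies
```

A human still has to look at each hit before anyone goes on the list. The filter only narrows down who to look at.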

The hard way is to start with a suspect. Typically this is someone that was reported to us as a thief. The process of confirming these allegations is pretty labor intensive. You need to manually copy and paste each of their tweets into the Twitter Search box and then pore through the results, looking for a match. This looked to me like something that could use some automation. So, being me, I wrote a web app.

You give the app the handle of a suspect, and it finds up to 50 recent tweets by that person that are fairly long, don’t contain any links, don’t have “MT” or “RT” in them, and are not @-replies. It then uses Twitter’s “Search API” (which is not the same as the “Search” box in Twitter) to look for earlier tweets that are similar. It goes through the results if there are any, and reports the oldest, bestest match it can find.
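To make the filtering step concrete, here is a rough sketch of that kind of filter. It is not the app's actual code. The 60-character cutoff for "fairly long" is an assumption, as are the function names and the exact regular expressions.

```python
import re

# Assumed values: the post only says "fairly long" and "up to 50".
MIN_LENGTH = 60
MAX_CANDIDATES = 50

LINK_RE = re.compile(r"https?://|\bt\.co/", re.IGNORECASE)
RT_MT_RE = re.compile(r"\b(?:RT|MT)\b")  # case-sensitive on purpose


def is_candidate(text):
    """Return True if a tweet is worth searching for earlier copies."""
    text = text.strip()
    if len(text) < MIN_LENGTH:
        return False  # too short to search for reliably
    if text.startswith("@"):
        return False  # an @-reply
    if LINK_RE.search(text):
        return False  # contains a link
    if RT_MT_RE.search(text):
        return False  # already marked as a retweet or modified tweet
    return True


def pick_candidates(tweets, limit=MAX_CANDIDATES):
    """Keep the first `limit` tweets, in the order given, that pass the filter."""
    picked = []
    for text in tweets:
        if is_candidate(text):
            picked.append(text)
            if len(picked) == limit:
                break
    return picked
```

Passing the timeline in newest-first order gives you the most recent tweets that qualify, which matters because the search only reaches back a short way.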

There are a few problems with this app, though. Twitter’s API for getting a user’s tweets is basically like going to their timeline and scrolling down. You get all the @’s and retweets mixed in. So if the person you are looking at does a lot of that, it can take quite a while to get a good list of tweets to search.

Also, Twitter has a really low rate limit of 180 searches every 15 minutes, which means you can only look up a few people before you bump into that limit.
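The arithmetic behind "a few people" is easy to work out. This sketch assumes one search per tweet checked and the full 50 tweets per suspect, which is a worst case rather than a measurement. The function names are made up for illustration.

```python
SEARCHES_PER_WINDOW = 180  # from the rate limit described above
WINDOW_MINUTES = 15
TWEETS_PER_SUSPECT = 50    # upper bound; assumes one search per tweet


def suspects_per_window(searches_per_suspect=TWEETS_PER_SUSPECT):
    """How many suspects fit in one rate-limit window."""
    return SEARCHES_PER_WINDOW // searches_per_suspect


def minutes_to_check(n_suspects, searches_per_suspect=TWEETS_PER_SUSPECT):
    """Rough minutes spent waiting on the rate limit, ignoring search time itself."""
    total = n_suspects * searches_per_suspect
    windows_needed = -(-total // SEARCHES_PER_WINDOW)  # ceiling division
    return max(0, windows_needed - 1) * WINDOW_MINUTES
```

With these numbers, `suspects_per_window()` is 3, and checking 10 suspects back to back means about 30 minutes of waiting for the limit to reset.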

But the biggest issue is that the Twitter Search API is horrible. It doesn’t return anything but really recent tweets. You can give exactly the same query to the search box on twitter.com and get hundreds of matches, and the Search API reports no results at all! (Programs cannot use the good search on twitter.com, so we have to use the Search API.) It’s a mess, and the developer support forums are full of people basically saying, “What the fuck?” and the Twitter support people replying, “Yeah, sorry about that.”

It turns out that Twitter has no intention of fixing this. They bought a company last year that provides good historical search results. If Twitter makes good historical data available in the Search API, people wouldn’t have to pay for that service any more.

What this means is that a really horrible tweet thief who copies tweets right away and does it all the time is easily outed by the app. But a person who just every now and then pretends to write something they originally saw on someone’s FavStar page or in an e-Card is not likely to show any results.

If you want to try it, go to tweetdetective.appspot.com and log in with your Twitter account (the program runs the searches as you). Then put in a suspect’s Twitter handle and see what you find. Have fun!

Steal this Tweet

The title of this post is an homage to a book published by Abbie Hoffman in 1971. That book is about fighting the power, and it appears that that is at least part of what this Plagiarism is Bad exercise is all about. If you are just tuning in, you should start with Part 1, where I explain what I’m up to. If you’re too lazy to click through that (and I know that you are), we created an account at @PlagiarismBad to keep a list of people who steal tweets.

When I say “we,” I mean me and my collaborators, Frank (@WheelTod) and Andrea (@sheepandrobots). The three of us have access to the account and we are very deliberately going through the massive lists we already have. And the reports are flooding in. We confirm that we have an actual thief and then put them on one or more of the lists.

In one case, I sent a DM to a guy I follow who had been reported, since he really didn’t seem like the tweet-stealing type. In his case, he had stolen his own tweet from an old account. I’m glad I checked!

A funny thing happened as I started going through the reports: I quickly discovered that most tweet thieves aren’t people. They are companies and they are doing it with a profit motive. These companies are ultimately trying to drive clicks to websites full of advertisements. The more people that follow the links, the more money they make. To get you to see their links, they need you to follow their accounts. To do that, they need content. So they steal it.

They start by just tweeting stolen content so they can build a follower base. After a while, they start throwing the links into the mix. An overwhelming portion of these links end up on the site pict-twiter.com. According to the public DNS database, that’s a site run by a fellow named Steven Melton. I found him on Twitter at @StevenMelton14. If following tweet thieving, link baiting, “Professional Poker Player” scumbags is your fetish, he’s your guy.

According to the database and Zillow, Steve lives in Moore, Oklahoma, in a lovely 3 bedroom, 2 bath house worth just over $100K. (The value of his house took a big tumble last year. Bummer, Steve.) I’ll let you search the whois database yourself if you want his address (Google: whois pict-twiter.com). You know, in case you want to send him flowers to thank him for all the lovely links and plagiarism in your feed.

There’s a funny thing about the way that Steve is doing this. He’s mostly using a program called Tweet Adder 4 to post the stolen tweets. I looked into Skootle—the company that makes that tool—and their website certainly looks like a pretty legitimate social media promotion company. They make tools that big companies can use to post content to social media. But as I dug a little further, I found out they have also been a favorite tool for spammers—so much so that Twitter sued them.

The crux of Twitter’s suit was that there was too much automation in the Tweet Adder software. Twitter’s terms of service pretty much ban any kind of automation in managing your followers, so this software was violating those. After putting up a fight for a while, Skootle eventually settled the lawsuit, and took all the automation out of their tools. Their users apparently then started a virtual riot, because they loved all that automation. However, it was never okay with Twitter to do that stuff, so that’s gone now.

Steve doesn’t seem to mind. He’s still using the software to dump his plagiarism and link bait into your feed. It’s just that he needs to click a lot more to do it now than he used to.

I figured that the nice folks at Skootle might be a bit upset if they found out someone was using their tool to violate Twitter’s terms of service. That’s what got them into hot water in the first place. I opened a support ticket with them and told them about what I’d found. This was their reply:

Translation: Fuck you.

For those of you not familiar with the fine art of passing the buck, what they are saying is that their software doesn’t violate any of the rules. If their users violate the rules, that’s not their problem. Obviously that’s nonsense, or they wouldn’t have had to settle with Twitter in the first place.

It’s worth noting at this point that it’s not really clear whether plagiarism is a violation of Twitter’s rules. Copyright infringement is banned. But whether a tweet constitutes copyrightable material is a complex legal question. It depends what the tweet is. A photo or a poem is more likely to qualify, a joke less likely, and a simple statement of fact not at all. Here is an excellent discussion of the topic.

So at this point, I can’t stop the practice. It’s not clear that plagiarism is something Twitter cares about in the slightest. And the company that makes the software that this scumbag is using isn’t going to shut him down. (And if they did, he’d just switch to different software anyway.) So what to do? Write code, of course!

If you think about it, this isn’t really all that different from the situation we have with spam. While various governments have banned spam (way to go up there, Canadia!), the reality is that it just keeps on coming. So we all use spam blocking software to detect it and get rid of it. Mostly this is provided by our email provider, like Google, so you might not even know it is happening. They detect spam using a couple of techniques. Some are based on content, but others just rely on lists of known spammers.

Thanks to the @PlagiarismBad project, we have a list of known plagiarists! Well, how about we set up a service that will automatically block them for you as they are added to the list? So that’s what I did for a few hours on Saturday while my wife was making soup and the kids had their noses buried in electronics.

The result is at listblocker.appspot.com if you want to try it out. You give it a list (it defaults to the list of known tweet thieves), and with one click you can block them all. You can also set it to automatically block new accounts as they are added to the list. So together with our curated list of tweet thieves, this is a plagiarism blocker for your twitter feed.
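The core of the automatic update is just a set difference: who is on the list now, minus who you have already blocked. Here is a minimal sketch of that step, not the app's actual code. The function names and the `exempt` parameter are assumptions, and the calls that fetch the list and perform the blocks are left out entirely.

```python
def normalize(handle):
    """Twitter handles are case-insensitive; drop a leading @ and lowercase."""
    return handle.lstrip("@").lower()


def accounts_to_block(list_members, already_blocked, exempt=()):
    """Return list members that still need blocking, in list order, without repeats."""
    skip = {normalize(h) for h in already_blocked}
    skip |= {normalize(h) for h in exempt}
    todo = []
    for handle in list_members:
        key = normalize(handle)
        if key not in skip:
            skip.add(key)  # guard against the same account appearing twice
            todo.append(key)
    return todo
```

Running this on a schedule and blocking whatever comes back is all that "automatically block new accounts" needs. The `exempt` argument is a hypothetical escape hatch for accounts you want to keep seeing anyway.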

In case you are wondering, no I don’t have a profit motive. I started the plagiarism tracking project because I was sick of having this stuff in my feed. And I wrote the tool because I was sick of having to manually block people as I added them to the list. This is all about me, boys and girls. If the rest of you want to use my toys, you are welcome to. (None of this is costing me a penny, by the way. I’m hosting the app in a system provided by Google that has massive free quotas.)

Update: I wrote a tool to make tracking thieves easier.

Plagiarism Is Bad

That simple tweet started something kind of big. I had just been followed by a woman with a pretty face AVI, and I was doing my usual vetting, when I noticed that she was being a little too funny, a little too frequently. Comedy is hard, and if you have three great tweets on different subjects all within an hour, you are either brilliant or cheating. So I typed a few words from one of these tweets into the search box, and sure enough, that was not an original tweet. None of them were.

I removed all the stars I had just given. And then I tweeted the above. I didn’t block her right away, since I wanted her to maybe see that tweet.

This happens every couple of weeks. I get followed by a beautiful woman and I discover that her tweets are not original. Or, sometimes, I don’t realize that until much later when my feed starts filling with link-bait. But I’ll get back to that.

So after I tweeted that, a twitter friend @sheepandrobots DM’d me. She had a list of plagiarists that she had been keeping. She guessed I had one too and was wondering if there was some way we could get these out there. She knew I create web applications in my sleep, so I was a natural person to ask.

As it turns out, I don’t have a list. I just block them and move on. And I didn’t think a web application was the way to go. But I loved the idea of getting these lists out there. And Twitter lists seemed to be the way to do it. You could easily go through the lists to block people. The people on the list would get a notification. And if anyone looked at the lists they were a member of, they’d see this tweet thief thing mentioned.

Tweet thieves are probably not nearly as pretty as the AVIs they are using.

So I set up a new account @PlagiarismBad, and asked another friend (who also edits my blog; she’s basically superwoman) to make me a cool AVI. We set up a private list containing all of our suspects. My plan is to slowly go through these, and see if they are still stealing tweets. And if they are, move them to a public list.

Why They Steal

When I started this exercise, I had one specific tweeter in mind. Let’s call her “Molly.” I discovered her tweet theft last spring, when I recognized something she wrote. It was a top tweet of someone I followed. I DM’d her and asked her about it. That led to a long conversation in which I learned that she really didn’t think there was anything wrong with what she was doing. “It’s not like it’s my thesis or anything,” she said. I tried to explain that what she was doing was plagiarism. I tried to reason with her. It made no difference. She simply doesn’t understand why stealing tweets is wrong, and she refuses to stop. I really liked the idea of putting her on a list, because a lot of people I know and like retweet her regularly. And I assume that is because they have no idea that every single thing she tweets is copied.

So I figured the first tweeters I would focus on are the 100%’rs. People who do nothing but tweet stolen tweets. Like the one I just found who inspired my tweet at the top of this post. So I started looking through the list for those. And I noticed a pattern.

Noticing patterns is something I do, and do well. It is how I discovered The Minion King. And I used the same tools I described in that post to figure out what I’m about to reveal here.

First, the vast majority of tweet thieves are not actually people. My friend Molly from last spring was an anomaly. These thieves are actually minion accounts created by some corporation. And based on what I’ve seen in the tweet metadata, I’m pretty sure there are just two organizations (possibly only one) that are responsible for almost all the tweet thief minion accounts.

One uses the pattern: pretty girl, generic name, @ handle same as the name but with an _ in the middle. (Elly Hedson @Elly_Hedson) When they advance to link-bait stage (I’ll explain this in a bit), they always link to “pict-twiter.com” sites (which, despite the name, have nothing to do with twitter.com). When you tweet, the client you used to do the tweet is recorded. Usually it’s the iPhone or Android App, or the web site, or whatever. These people have a really weird “source” in the metadata for all their tweets. It’s their twitter home page. That isn’t what the “source” is supposed to be. It’s supposed to be a link to the website of the program.

The other uses the pattern: pretty girl, generic name, @handle of first name, state abbreviation, and last name or initial. (Sarah Moore @SarahCAMoore) Sometimes the @ is first name, last name, birth year in the late ’80s. (Shelly Harkins @ShellyHarkins86) These accounts link to a broader range of web sites. The source of these tweets is always “Tweet Adder 4,” which is a social media tool.

It’s possible that these are both the same company, just using a mix of strategies. Or it could be two different companies. However, most of those sites linked by the “Tweet Adder 4” group also end up ultimately at “pict-twiter.com” so I think perhaps it’s really just one minion federation.
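The naming patterns are regular enough to check mechanically. Here is a sketch that classifies a display name and handle against the three patterns described above. It is an illustration, not the tooling I used. The pattern labels are made up, and reading "late ’80s" as 85 through 89 is an assumption.

```python
import re

# The 50 US state postal abbreviations.
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()
STATE_ALT = "|".join(STATES)


def classify(display_name, handle):
    """Return a pattern label for a (name, handle) pair, or None if nothing fits."""
    handle = handle.lstrip("@")
    parts = display_name.split()
    if len(parts) != 2:
        return None
    first, last = parts

    # Pattern 1: name with an underscore in the middle.
    if handle == f"{first}_{last}":
        return "underscore"

    # Pattern 2: first name, state abbreviation, then last name or initial.
    m = re.fullmatch(rf"{re.escape(first)}(?:{STATE_ALT})(.+)", handle)
    if m and m.group(1) in (last, last[0]):
        return "state"

    # Pattern 3: first name, last name, two-digit birth year (85-89 assumed).
    if re.fullmatch(rf"{re.escape(first)}{re.escape(last)}8[5-9]", handle):
        return "birth-year"

    return None
```

Plenty of real people have handles shaped like these, so a match is only a hint. It tells you which accounts deserve a closer look at the tweet source and the links.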

Stages of Thief

These accounts start off by simply tweeting stolen tweets. They follow people, follow back, unfollow unfollowers, etc. Other than their content being completely stolen, they are just like any other twitter account. Interestingly, they tweet the same things as each other. So if you search for the exact text of one of the tweets on these minion TL’s, you will find a whole slew more minions.

They stay in this stage a while to build a following. Then they start tweeting links. Most of them tweet more than just links. They also continue to tweet stolen tweets. Typically one link and three stolen tweets. The links mostly go to “pict-twiter” or other sites that end up back at “pict-twiter.” That site has short collections of pictures with captions, and is designed to trick the viewer into following a link to an app or another web site. The more people they can trick this way, the more money they make.

One funny thing about all this is that even though the minion accounts are all pretending to be pretty girls, they often tweet things that a girl wouldn’t say. For example:

Um, are you also the kind of guy who has a vagina?

Since many of the most popular comedians on Twitter are men, a surprising number of the posts by these “women” refer to themselves as being male. I can’t find the origin of this tweet, by the way. But I found it in a “best of 2011” list, so it’s a classic.

Old Fashioned Plagiarists

I have been so focused on these corporate plagiarism accounts that I’m mostly not looking at the classic plagiarists yet. However, my hope is that people will report cases to the @PlagiarismBad account as they occur, and then I can add them to the list right away. Perhaps we can create a “teachable moment” for the plagiarist. Probably not. But perhaps.

I have a list of literally hundreds of suspects. People who are known to have posted stolen tweets at one time in the past. My inclination is to not put them on the public shaming list unless they are still doing it. People learn, and I don’t think having once done a stupid thing on twitter should brand you forever. I’d love to get an @ to the account saying “I’m sorry I did it; could you take me off the list?” I’d absolutely take someone off the list if they delete the tweet and apologize to the author.

Since I figure more and more people will be coming to look at the lists page of this account, I am also adding other people’s thief lists by subscribing to them. It’s easy to find these, because I just look at the lists on which my most outrageous thieves (like Molly) are already listed.

What can you do?

If you find a case of tweet theft, send a DM to @PlagiarismBad. Other than that, I’d recommend that you simply block all the people on the list. That way you won’t accidentally star or RT them, and you don’t have to worry about them following you. It’s as easy as this:

As easy as 1, 2, 3!

Updates: I found out who is behind the accounts. And I made a tool to make thief-detection easier.