The art of the troll: New tool reveals egg users’—and Trump’s—posting patterns

Sean Gallagher
Tapping into Twitter's metadata isn't hard.
President Donald J. Trump loves his Twitter account. Much has been made of Trump's 140-character missives, which he blasts out to the world at least in part unfiltered with his own fingers (and sometimes through his staff). Interpreting the data and metadata of Trump's tweets has become akin to a modern form of Kremlinology—the art of reading the temperature of the Soviet Union's leadership by noting who was seated where in the reviewing stands at Red Square.
Twitter may not be the leading social media platform, and it may not be the friendliest place on Earth to exchange ideas. But it does offer an API that lets pretty much anyone mine its metadata. While the usual Twitter clients obscure much of the data associated with tweets, those who've followed the tweeting travails of our 45th president are likely aware that it includes information such as the device or software each tweet was posted from. In the past, law enforcement agencies and others have used access to Twitter metadata for a variety of purposes, including surveillance, by tapping into high-volume feeds of Twitter's tweet-stream. But a great deal of information can be gathered with much simpler, more accessible tools.
"As any other social media website Twitter know a lot of things about you, thanks [to] metadata," a French security researcher known as X0rz wrote in a recent blog post. "Indeed, for a 140 characters message you will get A LOT of metadata—more than 20 times the size of the initial content you typed in! And guess what? Almost all of this metadata is accessible through the open Twitter API." To demonstrate that, X0rz wrote a Python script called tweets_analyzer, a command-line tool to tap into some of Twitter's vast metadata that may not be accessible from the standard client.
Tweets_analyzer requires a Twitter account for authentication, as well as Twitter API credentials and, of course, a tweaked Python environment. It’s not exactly something to be handed over blindly to the average tweeter. But in the right hands (and with a little patience due to Twitter API rate-limiting), it can help analyze accounts to identify networks of Twitter bots or trolls concealing their actual location and identity. In addition to examining the metadata associated with Twitter users and their tweets, X0rz added a "friends" analysis feature that skims information from the metadata of the accounts followed by the target account, including language, timezone and location data.
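The "friends" analysis boils down to tallying a few profile fields across the accounts a target follows. A rough sketch of that idea, run over invented user records (the field names follow Twitter's v1.1 user objects; the data itself is made up):

```python
from collections import Counter

# Invented sample of user objects, of the kind the friends/list
# endpoint returns for the accounts a target follows.
friends = [
    {"screen_name": "a", "time_zone": "Quito", "lang": "es", "location": "Ecuador"},
    {"screen_name": "b", "time_zone": "Quito", "lang": "es", "location": ""},
    {"screen_name": "c", "time_zone": "Eastern Time (US & Canada)", "lang": "en",
     "location": "NYC"},
    {"screen_name": "d", "time_zone": None, "lang": "es", "location": "Guayaquil"},
]

def profile_friends(friends):
    """Tally the declared time zones and languages of followed accounts."""
    zones = Counter(f["time_zone"] for f in friends if f["time_zone"])
    langs = Counter(f["lang"] for f in friends if f["lang"])
    return zones, langs

zones, langs = profile_friends(friends)
print(zones.most_common(3))
print(langs.most_common(3))
```

Declared time zones and interface languages are self-reported and easily faked individually, but in aggregate across hundreds of followed accounts they tend to tell a consistent story.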
For the sake of science, I turned tweets_analyzer loose on a few Twitter accounts to see what sort of information I could uncover. I started with the most obvious of suspects: Donald Trump.

Presidential tweets

There is high-volume tweeting, and then there is high-volume tweeting. To get a feel for the lower range of "high-volume," I pulled metadata for Trump's last 2,000 tweets, which stretch back to July 7, 2016.
Overall, Trump has averaged 9.3 tweets per day since July, though that dropped to 5.7 tweets a day over the last 88 days. During the campaign, more of "his" tweets came from an iPhone client (50 percent) or some other device (the Web client and iPad, 7 percent) than from his Android phone (40 percent). That mix shifted once the campaign ended—over the past 1,000 tweets, 62 percent came from his beloved Android device and 33 percent from an iPhone (lately, likely from Trump social media director Dan Scavino’s device).
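A device breakdown like the one above comes from each tweet's source field, which names the posting client. Here's a minimal sketch of that tally, using simplified invented records (the real API wraps the client name in an HTML anchor tag, which would need stripping first):

```python
from collections import Counter

def source_breakdown(tweets):
    """Return each posting client's share of a batch of tweets, in percent."""
    counts = Counter(t["source"] for t in tweets)
    total = len(tweets)
    return {src: round(100 * n / total) for src, n in counts.items()}

# Invented sample roughly mirroring the post-campaign split described above.
sample = (
    [{"source": "Twitter for Android"}] * 62
    + [{"source": "Twitter for iPhone"}] * 33
    + [{"source": "Twitter Web Client"}] * 5
)
print(source_breakdown(sample))
```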
Unsurprisingly, the accounts Trump retweeted most belonged to his campaign team, Dan Scavino, his three eldest children, and himself. (Notably, the Drudge Report was also RT'd seven times during the campaign, edging out Donald Trump Jr.) He also mentioned his own Twitter handle more than anyone else's by far—twice as often as the @'s for his campaign (and for Hillary Clinton).
Additionally, Trump was loose about using location data, with a significant number of his posts during and after the election carrying location tags. Here are the Top 10 detected places for his tweets:
- United States: 57 (12 percent)
- Nevada: 37 (8 percent)
- Florida: 27 (6 percent)
- University of Nevada, Las Vegas: 21 (4 percent)
- Manhattan: 19 (4 percent)
- Trump Tower: 12 (2 percent)
- Colorado Springs: 11 (2 percent)
- University City: 11 (2 percent)
- Michigan: 10 (2 percent)
- Pennsylvania: 10 (2 percent)
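A place tally of this sort can be computed from the place object attached to geotagged tweets. A sketch with invented data (real place objects carry more fields than full_name):

```python
from collections import Counter

def top_places(tweets, n=10):
    """Count the place names attached to geotagged tweets; most tweets
    carry no place tag at all and are skipped."""
    places = Counter(
        t["place"]["full_name"] for t in tweets if t.get("place")
    )
    return places.most_common(n)

# Invented sample: most tweets carry no place tag, a few do.
sample = (
    [{"place": None}] * 5
    + [{"place": {"full_name": "Trump Tower"}}] * 2
    + [{"place": {"full_name": "Manhattan"}}] * 3
)
print(top_places(sample))
```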

Beating the eggs for trolls

Over the course of the past six months, I’ve encountered a rising number of Twitter accounts of questionable provenance launching missives my way. Some are obviously bots—accounts that simply retweet everything from certain users or offer canned (often hashtag-laden) responses. Some are humans out to troll. Others are not so obvious.
"Egg" accounts—Twitter profiles that have the default "egg" avatar instead of a profile photo—are frequently associated with bots and throwaway accounts. But they're also associated with people who just don't post that often and may not know how (or care) to change their profile picture. A sure sign that there's a normal user behind an egg is if geolocation data has been left turned on for the account. For example, one "egg" I had a run-in with had only 1200 tweets since 2012, had both political and gaming hashtags in tweets, and was clearly spending a lot of time in the San Diego area based on the 285 geolocated tweets the account made.
Today, bots and trolls are making a greater effort to put up the veneer required to make accounts look legitimate at a cursory glance. The biggest ones have been careful to turn off location information from posts and may provide a bogus timezone—what sets them apart is volume.
The tweets_analyzer tool by default pulls data for only the last 1,000 posts—typically enough to profile a user for at least a month. But one account I had run into, @HawleysJadefav, generated 1,000 posts in just two and a half days. A pull of 4,000 tweets went back only eight days, and the account averaged 400 tweets per day. Of the 4,000 tweets that tweets_analyzer pulled in, 2,892 were retweets, and the vast majority of all the tweets (97 percent) were recorded as coming from an iPad.
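Volume and retweet share are the two quick numbers worth computing for a suspect account. A sketch, assuming you've already extracted timestamps and retweet flags from the pulled tweets (the sample data below is invented and only roughly mirrors the account described above):

```python
from datetime import datetime, timedelta

def volume_stats(timestamps, retweet_flags):
    """Tweets per day and retweet percentage for a batch of posts."""
    span_days = (max(timestamps) - min(timestamps)).total_seconds() / 86400
    span_days = max(span_days, 1 / 24)  # guard against near-zero spans
    per_day = len(timestamps) / span_days
    rt_share = 100 * sum(retweet_flags) / len(retweet_flags)
    return per_day, rt_share

# Invented sample: about two days of posting every three minutes,
# with 70 percent of the posts being retweets.
base = datetime(2017, 2, 20)
stamps = [base + timedelta(minutes=3 * i) for i in range(960)]
is_rt = [True] * 672 + [False] * 288

per_day, rt_share = volume_stats(stamps, is_rt)
print(f"{per_day:.0f} tweets/day, {rt_share:.0f}% retweets")
```

Hundreds of posts a day, sustained for days, with retweets dominating: no single number proves automation, but together they put an account well outside human norms.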
Twitter's API requires developers to use a specific "consumer" key and secret to make posts, and it requires unique names for each client—I can't write a Twitter bot and register it as "Twitter for iPad," for example. But that doesn't mean it's impossible to do mass tweets with an iPad—especially if you have multiple people using the same account.
Even without timezone or geographic data, it's still possible to get some idea of where an account is posting from. To do this, you look at when the account posts the most and match that pattern against timezones. The @HawleysJadefav account starts cranking up the volume at 5:00 AM Eastern and hits its second wind at 4:00 PM Eastern. That's not exactly a good match for any timezone except possibly Brazil or Argentina—or Ecuador, where a number of the accounts followed by this one are located. But @HawleysJadefav could just be one very determined person with nothing else to do all day long from dawn until dusk except mash the retweet button on an iPad screen. Only Twitter (and possibly the NSA and FBI) would be able to tell for sure.
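One way to formalize that match-against-timezones step: bin an account's posts by UTC hour and find its quietest stretch, which for a human usually corresponds to local night. A sketch with invented data:

```python
from collections import Counter

def quiet_hours(utc_hours, window=6):
    """Return the start/end (UTC) of the least-active `window`-hour stretch.
    For a human-run account, this usually lines up with local sleeping hours."""
    counts = Counter(utc_hours)
    def load(start):
        return sum(counts[(start + i) % 24] for i in range(window))
    start = min(range(24), key=load)
    return start, (start + window) % 24

# Invented sample: an account posting steadily from 10:00 to 23:00 UTC,
# roughly 5:00 AM to 6:00 PM US Eastern.
hours = [h for h in range(10, 24) for _ in range(5)]
print(quiet_hours(hours))
```

A quiet window of roughly 00:00 to 06:00 UTC would fit an account sleeping on US East Coast time, or staying up very late somewhere else; like everything in this kind of analysis, it narrows possibilities rather than proving anything.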
Some bots are fairly obvious, especially by their volume. One 1,000-post-a-day account, @USFreedomArmy, generated most of its posts as scheduled tweets through Sprout Social—the vast majority set to drop at midday (1 PM Central time) each day. In other cases, posts came almost exclusively from an iPhone client, based on the metadata. But many of the links in those posts were shortened with "branded" URL shortener services aimed at monetizing tweets through interstitial advertisements. In almost all cases, no location was associated with the accounts.
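Scheduled posting leaves a telltale spike: a large share of posts landing at the exact same minute of the day. A sketch of that check, with invented data:

```python
from collections import Counter
from datetime import time

def scheduled_share(post_times):
    """Fraction of posts landing in the single busiest minute of the day.
    A high value suggests a scheduling tool rather than a human."""
    minutes = Counter((t.hour, t.minute) for t in post_times)
    return minutes.most_common(1)[0][1] / len(post_times)

# Invented sample: 80 posts dropped at exactly 1:00 PM, 20 scattered.
times = [time(13, 0)] * 80 + [time(h, 30) for h in range(20)]
print(scheduled_share(times))
```

A human's posts smear across minutes and hours; a queue of Sprout Social or HootSuite posts fires on the dot.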
Doing more extensive fingerprinting of these accounts would most likely require licensing Twitter's raw stream. Getting a real statistical handle on them was more than Twitter's API would stand for, and tweets_analyzer frequently ran up against the usage limit.

Act more like a troll (sort of) for privacy

Most humans don't mass-post and don't use a tweet-staging service like HootSuite or Sprout Social to time their 140-character thought bombs. But many, if not most, Twitter users routinely give up time-sensitive location data with their posts that might be used to gain insight into their movements and possibly their identity. Sure, Donald Trump doesn’t need to worry about someone using his Twitter account to figure out when he's at Trump Tower—but most people don't have the Secret Service watching their condo while they're at work.
Fortunately, Twitter has made it a lot easier for users to strip location data from all of their posts, regardless of what device they make them from. This is done through user settings. But when combined with other sources of intelligence, even what's left after location data is trimmed can offer ample material for an analyst or attacker to work from in order to figure out who to target and how.
It's also important to realize that tweets_analyzer's concept and code could easily be expanded to do much more with the metadata collected from the Twitter API and the posts themselves. An enterprising developer could find ways to extract a lot more information about targeted accounts. So people concerned about how much they're giving away about themselves should probably take a page from the trolls' book and find ways to be heard on Twitter without leaving a trail pointing right back to them.
