The art of the troll: New tool reveals egg users’—and Trump’s—posting patterns
Sean Gallagher
President Donald J. Trump loves his Twitter
account. Much has been made of Trump's 140-character missives, which he
blasts out to the world at least partly unfiltered with his own fingers
(and sometimes via his staff). Interpreting the data and metadata of
Trump's tweets has become akin to a modern form of Kremlinology—the art
of reading the temperature of the Soviet Union's leadership by noting
who was seated where in the reviewing stands at Red Square.
Twitter may not be the leading social media
platform, and it may not be the friendliest place on Earth to exchange
ideas. But it does offer an API that lets pretty much anyone mine its
metadata. While much of the data associated with tweets is obscured by
the usual Twitter clients, those who've followed the tweeting travails
of our 45th president are
likely aware of the fact that it contains information like what device
or software tweets were posted from. In the past, law enforcement agencies and others
have used access to Twitter metadata for a variety of purposes,
including surveillance, by tapping into high-volume feeds of Twitter's
tweet-stream. But a great deal of information can be gathered with much
simpler and more accessible tools.
"As any other social media website Twitter
know a lot of things about you, thanks [to] metadata," a French security
researcher known as X0rz wrote in a recent blog post.
"Indeed, for a 140 characters message you will get A LOT of
metadata—more than 20 times the size of the initial content you typed
in! And guess what? Almost all of this metadata is accessible through
the open Twitter API." To demonstrate that, X0rz wrote a Python script called tweets_analyzer, a command-line tool to tap into some of Twitter's vast metadata that may not be accessible from the standard client.
Tweets_analyzer requires a Twitter account for
authentication, as well as Twitter API credentials and, of course, a
tweaked Python environment. It’s not exactly something to be handed over
blindly to the average tweeter. But in the right hands (and with a
little patience due to Twitter API rate-limiting), it can help analyze
accounts to identify networks of Twitter bots or trolls concealing their
actual location and identity. In addition to examining the metadata
associated with Twitter users and their tweets,
X0rz added a "friends" analysis feature that skims information from the
metadata of the accounts followed by the target account, including
language, timezone and location data.
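The core of that friends analysis is just a tally of declared fields across followed accounts. A minimal stdlib sketch of the tallying step, using made-up account records shaped loosely like the API's user objects (this is not tweets_analyzer's actual code), might look like:

```python
from collections import Counter

def summarize_friends(friends):
    """Tally the declared language, time zone, and location across a
    list of friend records (dicts mimicking Twitter API user objects)."""
    summary = {}
    for field in ("lang", "time_zone", "location"):
        counts = Counter(f.get(field) for f in friends if f.get(field))
        summary[field] = counts.most_common(3)
    return summary

# Made-up sample accounts for illustration
friends = [
    {"lang": "es", "time_zone": "Quito", "location": "Ecuador"},
    {"lang": "es", "time_zone": "Quito", "location": ""},
    {"lang": "en", "time_zone": "Eastern Time (US & Canada)", "location": "NYC"},
]
print(summarize_friends(friends))
```

Aggregated over hundreds of followed accounts, even these self-reported fields can hint at where an account's real audience, and operator, sits.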
For the sake of science, I turned
tweets_analyzer loose on a few Twitter accounts to see what sort of
information I could uncover. I started with the most obvious of
suspects: Donald Trump.
Presidential tweets
There is high-volume tweeting, and then there
is high-volume tweeting. To get a feel for the lower range of
"high-volume," I pulled metadata for Trump's last 2,000 tweets, which
stretch back to July 7, 2016.
Overall, Trump averaged 9.3 tweets per day
since July, though that dropped to 5.7 tweets a day over the last 88
days. During the campaign, more of "his" tweets (50 percent) came from
an iPhone client or from some other device (the Web client and iPad, 7
percent) than from his Android phone (40 percent). That has also shifted
as the campaign ended—over the past 1,000 tweets, 62 percent are from
his beloved Android device and 33 percent are from an iPhone (lately,
likely from Trump social media director Dan Scavino’s device).
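The client split comes straight from the `source` field attached to every tweet. A sketch of that breakdown over synthetic records (not the tool's own code; the sample simply mirrors the post-campaign percentages described above) could be:

```python
from collections import Counter

def source_breakdown(tweets):
    """Percentage of tweets per posting client, given tweet records
    carrying a 'source' field as the Twitter API returns it."""
    counts = Counter(t["source"] for t in tweets)
    total = len(tweets)
    return {src: round(100 * n / total) for src, n in counts.most_common()}

# Made-up sample reflecting the kind of split described above
tweets = ([{"source": "Twitter for Android"}] * 62 +
          [{"source": "Twitter for iPhone"}] * 33 +
          [{"source": "Twitter Web Client"}] * 5)
print(source_breakdown(tweets))
# → {'Twitter for Android': 62, 'Twitter for iPhone': 33, 'Twitter Web Client': 5}
```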
Unsurprisingly, the accounts Trump retweeted
most belonged to his campaign team, to Dan Scavino, his three eldest
children, and himself. (Notably, the Drudge Report was also RT'd seven
times during the campaign to edge out Donald Trump Jr.) He also
mentioned himself by his own Twitter handle more than anyone else by
far—twice as much as the @'s for his campaign (and for Hillary Clinton).
Additionally, Trump was loose about using
location data, with a significant number of his posts during and after
the election carrying location tags. Here are the Top 10 detected
places for his tweets:
- United States 57 (12 percent)
- Nevada 37 (8 percent)
- Florida 27 (6 percent)
- University of Nevada, Las Vegas 21 (4 percent)
- Manhattan 19 (4 percent)
- Trump Tower 12 (2 percent)
- Colorado Springs 11 (2 percent)
- University City 11 (2 percent)
- Michigan 10 (2 percent)
- Pennsylvania 10 (2 percent)
Beating the eggs for trolls
Over the course of the past six months, I’ve
encountered a rising number of Twitter accounts of questionable
provenance launching missives my way. Some are obviously bots—accounts
that simply retweet everything from certain users or offer canned (often
hashtag-laden) responses. Some are humans out to troll. Others are not
so obvious.
"Egg" accounts—Twitter profiles that have the
default "egg" avatar instead of a profile photo—are frequently
associated with bots and throwaway accounts. But they're also associated
with people who just don't post that often and may not know how (or
care) to change their profile picture. A sure sign that there's a normal
user behind an egg is if geolocation data has been left turned on for
the account. For example, one "egg" I had a run-in with had only 1,200
tweets since 2012, had both political and gaming hashtags in tweets, and
was clearly spending a lot of time in the San Diego area based on the
285 geolocated tweets the account made.
Today, bots and trolls are making a greater
effort to put up the veneer required to make accounts look legitimate at
a cursory glance. The biggest ones have been careful to turn off
location information from posts and may provide a bogus timezone—what
sets them apart is volume.
The tweets_analyzer tool by default only pulls
the data for the last 1,000 posts—which is typically enough to profile a
user for at least a month. But one account I had run into,
@HawleysJadefav, generated 1,000 posts in just two and a half days. A
pull of 4,000 tweets only went back 8 days, and the account threw up an
average of 400 tweets per day. Of the 4,000 tweets_analyzer pulled in,
2,892 were retweets, and the vast majority of all the tweets (97
percent) were recorded as coming from an iPad.
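Those volume figures are simple arithmetic over timestamps and retweet flags. A stdlib sketch of the computation (with synthetic records standing in for API data, and numbers chosen only to resemble the account described above) might be:

```python
from datetime import datetime

def volume_stats(tweets):
    """Retweet share and average tweets per day for tweet records
    carrying a 'created_at' datetime and an 'is_retweet' flag."""
    days = (max(t["created_at"] for t in tweets) -
            min(t["created_at"] for t in tweets)).days or 1
    retweets = sum(t["is_retweet"] for t in tweets)
    return {
        "tweets_per_day": round(len(tweets) / days),
        "retweet_pct": round(100 * retweets / len(tweets)),
    }

# Made-up sample: 3,200 posts spread over about a week, mostly retweets
sample = [{"created_at": datetime(2017, 2, d % 8 + 1, d % 24),
           "is_retweet": d % 4 != 0} for d in range(3200)]
print(volume_stats(sample))
```

Hundreds of posts a day with a retweet share this lopsided is the kind of profile that separates amplification accounts from ordinary users.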
Twitter's API requires developers to use a
specific "consumer" key and secret to make posts, and it requires unique
names for each client—I can't write a Twitter bot and register it as
"Twitter for iPad," for example. But that doesn't mean it's impossible
to do mass tweets with an iPad—especially if you have multiple people
using the same account.
Without timezone or geographic data, it's
still possible to get some idea of where an account is posting
from. To do this, you look at when it posts the most and match that
activity pattern against timezones. The @HawleysJadefav
account starts cranking up the volume at 5:00 AM Eastern and hits its
second wind at 4:00 PM Eastern. That's not exactly a good match for any
timezone except for possibly Brazil or Argentina—or Ecuador, where a
number of accounts followed by this one are located. But @HawleysJadefav
could just be one very determined person with nothing else to do all
day long from dawn until dusk except mash the retweet button on an iPad
screen. Only Twitter (and possibly the NSA and FBI) would be able to
tell for sure.
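That hour-of-day matching can be sketched as a histogram of posting times plus a search for the quietest stretch, which usually corresponds to the poster's local night. The snippet below uses synthetic timestamps that ramp up at 10:00 UTC (5 AM Eastern), echoing the pattern described above; it is an illustration, not tweets_analyzer's code:

```python
from collections import Counter

def hourly_histogram(hours_utc):
    """Distribution of posts by UTC hour."""
    counts = Counter(hours_utc)
    return [counts.get(h, 0) for h in range(24)]

def quietest_stretch(histogram, width=6):
    """UTC hour starting the quietest contiguous window (wrapping
    around midnight); likely the poster's local night."""
    totals = {h: sum(histogram[(h + i) % 24] for i in range(width))
              for h in range(24)}
    return min(totals, key=totals.get)

# Made-up sample: activity ramping up at 10:00 UTC (5 AM Eastern)
hours = [h for h in range(10, 24) for _ in range(20)] + [0, 1, 2] * 5
hist = hourly_histogram(hours)
print("quiet window starts at UTC hour:", quietest_stretch(hist))
```

The inferred sleep window is only a rough signal: shift workers, insomniacs, and multi-operator accounts all blur it, which is why the article stops short of a firm conclusion about @HawleysJadefav.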
Some bots are fairly obvious, especially by
their volume. One 1,000-post-a-day account, @USFreedomArmy, generated
most of its tweets through scheduled tweets using Sprout Social—the
vast majority scheduled to drop at midday (1 PM Central time) each day.
In other cases, posts came almost exclusively from an iPhone client,
based on the metadata. But many of the links used in posts were
shortened using "branded" URL shortener services that were aimed at
monetizing tweets using interstitial advertisements. In almost all
cases, there was no location associated with the accounts.
To do more extensive fingerprinting of these
accounts would most likely require licensing Twitter's raw stream.
Getting a real statistical handle on these accounts was more than
Twitter's API would stand for, frequently causing tweets_analyzer to run
over the usage limit.
Act more like a troll (sort of) for privacy
Most humans don't mass-post and don't use a
tweet-staging service like HootSuite or Sprout Social to time their
140-character thought bombs. But many, if not most, Twitter users
routinely give up time-sensitive location data with their posts that
might be used to gain insight into their movements and possibly their
identity. Sure, Donald Trump doesn’t need to worry about someone using
his Twitter account to figure out when he's at Trump Tower—but most
people don't have the Secret Service watching their condo while they're
at work.
Fortunately, Twitter has made it a lot easier
for users to strip location data from all of their posts, regardless of
what device they make them from. This is done through user settings. But
when combined with other sources of intelligence, even what's left
after location data is trimmed can offer ample material for an analyst
or attacker to work from in order to figure out who to target and how.
It's also important to realize that
tweets_analyzer's concept and code could easily be expanded to do much
more with the metadata collected from the Twitter API and the posts
themselves. An enterprising developer could find ways to extract a lot
more information about targeted accounts. So, people concerned about how
much they're giving away about themselves should probably take a page
from the book of the trolls and find ways to be heard on Twitter without
leaving a trail pointing right back to them.