Tuesday, March 3, 2020

Learn To Code Political Edition: Building @WBrettWellson Bot

A few weeks ago Duncan Kinney of Press Progress asked (possibly half-jokingly?) if anyone could build a bot to pester Brett Wilson about his abandoned wells.

Naturally I thought this was a great idea, and a simple example of the political programming I was talking about in the Learn to Code rationale. It took me about 4 hours in total to come up with an initial version, plus a couple more hours to run it through some simple tests and fix bugs. It's now online on Twitter @WBrettWellson.

In this post I'm going to walk you through the source code of this simple bot so you can replicate or expand upon it. You can in theory use this source to set up your own bot to pester Brett Wilson about his abandoned wells.

Also, if you have ideas for tweet templates for me to use with the bot comment them below.

A few years ago, while I was playing with the Twitter API, I built two bots. One spat out Ghostbusters quotes, just for fun and because the movie has some good one-liners. The other I used when I was tracking lots of news: I could use hashtags when I tweeted important links, and my bot would automatically pick them up and store them in a database I could reference later. Neither is operational anymore, but I still had the source code. It seemed like a good base.

Both bots are written in PHP. One used the real-time Twitter API and the other ran on a scheduler. I opted for the scheduler approach with the WBrettWellson bot, so on the system running the bot there is a cron job (cron is the scheduler on Linux) that fires off the bot script every 3 minutes. Logic inside the script further randomizes when the script sends a tweet and what the contents are.

It's nothing fancy: it just uses some keyword matching to vary the tweets, and randomization to offset the scheduler. I also put it together pretty quickly, so the code isn't designed to be very organized or reusable. It's just a quick proof of concept.

The bot source code can be downloaded here. I've removed the application key, user tokens, and the majority of the tweets from the downloaded source (you'll have to fill them in with your own). I also rely on Abraham's Twitter PHP library to interact with the REST API Twitter uses.

The bot consists of two main source files, and we will go through them so you can understand how it works.

File: /Twitter.php
<?php 
  require_once "twitteroauth/autoload.php";

  use Abraham\TwitterOAuth\TwitterOAuth;


  define('KEY', "[INSERT YOUR APPLICATION KEY]");

  define('KEY_PRIVATE', "[INSERT YOUR APPLICATION PRIVATE KEY]");
  define('TOKEN', "[INSERT YOUR USER TOKEN]");
  define('TOKEN_PRIVATE', "[INSERT YOUR PRIVATE USER TOKEN]");

  function twitter_gettweets($sinceId) {
    $twitter = new TwitterOAuth(KEY, KEY_PRIVATE, TOKEN, TOKEN_PRIVATE);
    $tweets = $twitter->get("statuses/user_timeline",
      array('screen_name' => 'WBrettWilson',
            'since_id' => $sinceId,
            'trim_user' => true,
            'exclude_replies' => true,
            'include_rts' => false));

    return $tweets;
  }

  function twitter_posttweet($string, $quoteid = null) {
    $twitter = new TwitterOAuth(KEY, KEY_PRIVATE, TOKEN, TOKEN_PRIVATE);
    $params = array('status' => $string);

    if (!is_null($quoteid)) {
      $params['attachment_url'] = sprintf("https://twitter.com/WBrettWilson/status/%d", $quoteid);
    }

    // Return the API response so the caller can inspect the result of the post.
    return $twitter->post('statuses/update', $params);
  }

?>
The Twitter.php file provides wrapper functions for the required facilities to access Twitter. It contains a reference to Abraham's Twitter OAuth library needed to access the Twitter API, definitions of the API access keys and tokens that are required to access the Twitter API, as well as two functions: twitter_gettweets() and twitter_posttweet() to facilitate the needed operations of the bot.

Now again, when I built this bot I aimed to do it in a single evening. I did not write it in a reusable way; as you can see, I've hardcoded references to the target WBrettWilson directly into the code. These could easily be changed to defined constants, like the tokens and keys, or to command line arguments.
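
As a rough sketch of that change, the target account could live in a constant next to the keys, with an optional override from the command line. TARGET_SCREEN_NAME and timeline_params() are hypothetical names used only for illustration; they are not part of the downloadable source.

```php
<?php
// Hypothetical constant: the account the bot watches (not in the original source).
define('TARGET_SCREEN_NAME', 'WBrettWilson');

// Builds the same parameter array twitter_gettweets() passes to
// "statuses/user_timeline", with the screen name no longer hardcoded.
function timeline_params($sinceId, $screenName = TARGET_SCREEN_NAME) {
  return array(
    'screen_name' => $screenName,
    'since_id' => $sinceId,
    'trim_user' => true,
    'exclude_replies' => true,
    'include_rts' => false
  );
}

// Optional override from the command line, for example:
//   php WBrettWellsonBot.php SomeOtherAccount
// $argv is only populated when running under the CLI, hence the isset() check.
$target = isset($argv[1]) ? $argv[1] : TARGET_SCREEN_NAME;
```

With something like this in place, twitter_gettweets() could pass timeline_params($sinceId, $target) to the library instead of building the array inline.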

The two functions are quite simple, comprising only a few lines each.

Both functions begin by creating a new instance of the TwitterOAuth class, passing in the tokens and keys, and assigning it to the $twitter variable.

twitter_gettweets():
This function has a single argument: $sinceId is the 64-bit integer ID of the last tweet the bot used. This is passed to the Twitter API so that only newer tweets are returned.

Next, after the creation of the OAuth connection, we request from the Twitter API all tweets since the specified ID. We trim out user information with 'trim_user' => true as the bot doesn't need it, we exclude any tweets Brett is making in response to another user with 'exclude_replies' => true, and finally we exclude retweets with 'include_rts' => false. We then store the result in the $tweets variable and return it.

twitter_posttweet():
This function has two arguments: $string is simply the text of the tweet that is to be posted, and $quoteid is an optional argument that defaults to null. The $quoteid argument is the 64-bit integer ID of a tweet to attach as a quoted tweet. Making it optional gives me the flexibility in the future to have the bot post tweets that do not directly quote Brett Wilson.

In this function we use a separate $params variable to store the API arguments for Twitter so that we can optionally add the quoted tweet attachment_url. We initialize $params with the $string as the tweet status text.

Next we test if $quoteid is not null, and if it isn't we assume it contains a valid 64-bit integer ID of a tweet to quote and generate a status URL to attach with sprintf("https://twitter.com/WBrettWilson/status/%d", $quoteid). We then post the tweet and return the post result.
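
One possible refinement, sketched under assumptions: tweet IDs are large enough that %d only formats them correctly on a 64-bit PHP build, so the URL could be built from the ID as a string of digits instead (the tweet objects in Twitter's v1.1 API carry an id_str field alongside id). The quote_url() helper below is hypothetical and not part of the original source.

```php
<?php
// Hypothetical helper: builds the status URL used for 'attachment_url'.
// The ID is treated as a string of digits so it never passes through an
// integer conversion. Returns null when the ID is not a positive number.
function quote_url($screenName, $tweetId) {
  $id = (string)$tweetId;
  // Reject empty values, non-digits, and IDs made only of zeros.
  if (!ctype_digit($id) || ltrim($id, '0') === '') {
    return null;
  }
  return sprintf("https://twitter.com/%s/status/%s", $screenName, $id);
}
```

A caller could then add 'attachment_url' to $params only when quote_url() returns something other than null.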

This makes up our Twitter-centric library functions. The real meat of the bot is in the next file.

File: /WBrettWellsonBot.php
#!/usr/bin/php
<?php
  require_once "Twitter.php";

  $tweets = array();
  $tweetinfo = array(0, null);
  $lastId = file_get_contents("lastid.dat");
  $tweeturl = 'https://pressprogress.ca/money-man-in-ucp-kamikaze-scheme-left-alberta-with-over-a-dozen-orphan-oil-wells-and-millions-in-clean-up-costs/';
  $tweethash = '#cdnpoli #ableg';
  $tweettemplates = array(
    # Generic tweets
    "generic1" => 'And don\'t forget: Making money is easier when you leave the wells behind! #investmenttips',
    "generic2" => 'I say a lot of stuff! All of it is way more important to me than cleaning my abandoned wells.',

    # Tweets about protesters
    "protesters1" => 'Ugh, protesters. Don\'t they know that if they just abandon the problem it goes away? Worked with my now abandoned wells!',
    "protesters2" => 'Can you believe protesters get paid to care about stuff? What bums! Job creators like me get rich creating problems not caring about them! Duh!'
  );
First we include a reference to our Twitter library functions above, and also initialize some variables we will be using in the script. The script starts with #!/usr/bin/php, which on Linux systems (where I'm running the bot) indicates that the script should be executed with the designated interpreter, in my case the PHP command line interface (CLI). This file is executed each time the scheduler triggers it, so every 3 minutes in my current configuration.

$tweets and $tweetinfo are initialized with default values, while $lastId is read from a file called lastid.dat which is a simple text file containing only the ID. The ID of the last tweet the bot quoted is the only piece of data that is persisted between executions. The bot also later writes this file when it successfully posts a tweet.
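
The script reads lastid.dat directly with file_get_contents(), which assumes the file already exists and holds a number. A slightly more defensive version might look like the sketch below. read_last_id() and write_last_id() are hypothetical names, and what the bot should do when no ID is stored yet is left to the caller.

```php
<?php
// Hypothetical helpers for the lastid.dat state file (not in the original source).

// Returns the stored ID as a string of digits, or null when the file is
// missing, unreadable, or does not contain a plain number.
function read_last_id($path) {
  if (!is_readable($path)) {
    return null;
  }
  $raw = trim((string)file_get_contents($path));
  return ctype_digit($raw) ? $raw : null;
}

// Writes the ID while holding an exclusive lock, so two overlapping
// scheduler runs cannot interleave their writes. Returns true on success.
function write_last_id($path, $id) {
  return file_put_contents($path, (string)$id, LOCK_EX) !== false;
}
```

Keeping the ID as a string here is a design choice for the sketch; it avoids any surprises if the value is ever handled on a 32-bit build.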

$tweeturl and $tweethash separately store data that is included in every tweet. $tweettemplates is where I store the actual tweet text. Each tweet is given a key so I can look it up later, such as "generic1" or "protesters2". Apostrophes in the tweet text need to be "escaped" with a backslash, as a single quote also marks the beginning and end of the string.
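
To illustrate the escaping rule with a shortened template (the variable names are only for this example):

```php
<?php
// The same text written two ways; both produce an identical string.
$single = 'And don\'t forget!';  // apostrophe escaped inside single quotes
$double = "And don't forget!";   // no escape needed inside double quotes

// Double quotes come with a trade-off: PHP interpolates $variables inside
// them, so a template containing a dollar sign would need escaping instead.
$same = ($single === $double);
```

Here $same ends up true, so either style works for templates; single quotes are the safer default when the text might contain a dollar sign.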

  $tweetmap = array(
    "default" => array(
      $tweettemplates["generic1"],
      $tweettemplates["generic2"]
    ),
    "protesters" => array(
      $tweettemplates["protesters1"],
      $tweettemplates["protesters2"]
    ),
    "protestors" => array(
      $tweettemplates["protesters1"],
      $tweettemplates["protesters2"]
    )
  );
Next we have the $tweetmap definition. The tweet map associates tweets with keywords, or with "default" if no keyword match is found. If a quoted tweet does not contain a keyword, a "default" tweet will be used. Here you can see that, as the spellings "protester" and "protestor" are used interchangeably on Brett's timeline, I have associated the same two "protester" tweets with both spellings.
  function match_best_tweet_for_pidge($_tweets, $_tweetmap) {
    global $lastId;
    $mapKey = "default";
    $tweetId = 0;
    $keywordTweets = array();

    foreach(array_keys($_tweetmap) as $key) {
      if (strcmp($key, "default") == 0)
        continue;

      foreach ($_tweets as $tweet) {
        $matches = array();
        $pattern = sprintf("/%s/i", $key);
        if ($tweet->id < $lastId)
          continue;
        if (preg_match($pattern, $tweet->text, $matches) == 1) {
          $keywordTweets[] = array($key, $tweet);
        }
      }
    }

    if (count($keywordTweets) > 0) {
      $keywordTweetSet = $keywordTweets[rand(0, count($keywordTweets) - 1)];
      $mapKey = $keywordTweetSet[0];
      $tweetId = $keywordTweetSet[1]->id;
    } else if ($_tweets[count($_tweets) - 1]->id >= $lastId) {
      $tweetId = $_tweets[count($_tweets) - 1]->id;
    }

    return array($tweetId, $_tweetmap[$mapKey][rand(0, count($_tweetmap[$mapKey]) - 1)]);
  }
Next we have a simple function for the bot to determine which of the latest tweets is best to respond to. It's nothing fancy; it's just meant to provide some variation. We begin by importing the $lastId definition from the global scope, as I later found I needed to verify the ID was truly newer. The function also takes two arguments, $_tweets and $_tweetmap, which in our case will be the collection of tweets returned from twitter_gettweets() and our $tweetmap defined above.

We then iterate through all of the keys in the map, skipping default. For each possible keyword we then iterate through all of the tweets returned from the API and re-validate that the ID is in fact newer than the last tweet we responded to. If it is, we use a case-insensitive regular expression match that simply checks if the keyword is contained in the tweet, and if the keyword is found we store that keyword and the associated tweet in $keywordTweets.
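
Because each map key is dropped straight into a regular expression, a key can itself be a small pattern. The sketch below mirrors the pattern-building line from the function and shows one key covering both spellings of "protesters". This is an illustrative alternative, not how the original map is written, and keyword_matches() is a hypothetical name.

```php
<?php
// Mirrors the pattern built inside match_best_tweet_for_pidge(): the map
// key is interpolated into a case-insensitive regular expression.
function keyword_matches($key, $text) {
  $pattern = sprintf("/%s/i", $key);
  return preg_match($pattern, $text) === 1;
}

// One key that matches "protesters" and "protestors" alike.
$key = "protest[eo]rs";
$hitA = keyword_matches($key, "Those PROTESTERS are at it again");
$hitB = keyword_matches($key, "protestors blocking the road");
$miss = keyword_matches($key, "Lovely weather in Calgary today");
```

The flip side is that a key meant literally must not contain regex metacharacters or a forward slash; passing it through preg_quote($key, '/') first would make it safe, at the cost of losing the pattern trick.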

Once we are done iterating, we check whether any keyword tweets were found. If one or more were, we randomly select one; otherwise we select the last tweet in the returned collection (provided its ID is not older than $lastId) and associate the default keyword with it.

We then randomly select one of the applicable tweets, such as "generic1", or "generic2" if we ended up with default as the keyword for instance. We then return the tweet text we intend to send, along with the ID of the tweet we selected to respond to.
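
The random selection appears twice in the function as rand(0, count(...) - 1). As a small sketch, the same idea could be wrapped in a helper built on array_rand(); pick_random() is a hypothetical name, not part of the original source.

```php
<?php
// Hypothetical helper: returns one random element of a non-empty array.
// array_rand() returns a random key, which avoids the count() - 1 arithmetic.
// The caller is expected to check that the array is not empty first.
function pick_random(array $items) {
  return $items[array_rand($items)];
}

// Example with a map shaped like $tweetmap["default"].
$choice = pick_random(array("generic tweet one", "generic tweet two"));
```

With this, the final line of the function could read pick_random($_tweetmap[$mapKey]) for the tweet text.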

Fun fact: This function is called "match_best_tweet_for_pidge" because Pidge is my girlfriend's nickname and she wanted me to include it in the source.
  if (rand(1, 30) != 15)
    die();

  $tweets = twitter_gettweets($lastId);
  if (count($tweets) == 0)
    die();

  $tweetinfo = match_best_tweet_for_pidge($tweets, $tweetmap);
  twitter_posttweet(sprintf("%s %s %s", $tweetinfo[1], $tweeturl, $tweethash), $tweetinfo[0]);

  $lastId = sprintf("%d", (integer)$tweetinfo[0]);
  file_put_contents("lastid.dat", ++$lastId);
?>
And these 9 lines? This is basically all the bot does. This, right here, is the bot.

The first thing it does is generate a random number between 1 and 30 and test whether it is 15. If it isn't, the script exits, so the bot gets past this check roughly 1 run in 30. Combined with a 3-minute scheduler (3 times 30 = 90), that means the bot can send a tweet about once every one and a half hours on average, but the timing is randomized enough to be unpredictable. Sometimes the gap will be shorter, sometimes longer, upwards of 2 hours.
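
The arithmetic behind that can be checked with a few lines. This is only a sketch of the timing of the random check itself; it assumes every scheduled run is independent and ignores runs where there is nothing new to quote. The names are made up for the example.

```php
<?php
// Chance that a single scheduled run gets past the rand(1, 30) check.
$p = 1 / 30;
$minutesPerRun = 3;

// On average it takes 1 / p runs to get one pass, so the average gap in minutes is:
$averageGap = (1 / $p) * $minutesPerRun;

// Probability that `$minutes` go by without a single pass: every run in
// that window has to fail the check, independently of the others.
function chance_of_silence($minutes, $p, $minutesPerRun) {
  $runs = intdiv($minutes, $minutesPerRun);
  return pow(1 - $p, $runs);
}

$overTwoHours = chance_of_silence(120, $p, $minutesPerRun);
$overHalfHour = chance_of_silence(30, $p, $minutesPerRun);
```

Under these assumptions the average gap comes out at 90 minutes, roughly a quarter of gaps run past 2 hours, and about 7 in 10 half-hour windows pass with no tweet at all.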

Next, if the randomization passes we test if there are even any new tweets since the last we responded to.
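
One caveat with the count($tweets) test, offered as an assumption about the API rather than something the original source handles: when a request fails, the decoded response is typically an object with an "errors" property rather than an array of tweets, and count() does not accept a plain object on recent PHP versions. A small guard could normalize the response first; new_tweets_or_empty() is a hypothetical name.

```php
<?php
// Hypothetical guard (not in the original source): treat anything that is
// not an array of tweets as "no new tweets" so the script exits quietly.
function new_tweets_or_empty($response) {
  return is_array($response) ? $response : array();
}

// Example: an error-shaped response is reduced to an empty list.
$errorResponse = (object)array('errors' => array((object)array('code' => 88)));
$safe = new_tweets_or_empty($errorResponse);
```

The script could then call count() on the normalized value and die() as before when it is zero.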

If there are we then execute our match_best_tweet_for_pidge() matching algorithm above on the collection and determine which is best to respond to and what the tweet should say. We combine this information with the $tweeturl and $tweethash, and then post it to Twitter.

Finally we write the ID of the tweet we're responding to, plus 1, to the lastid.dat file.

And that's it!


Richard Fantin is a self-taught software developer who has focused mostly on financial applications and high frequency trading throughout his career. He currently works for CenturyLink.

Nazayh Zanidean is a Project Coordinator for a mid-sized construction contractor in Calgary, Alberta. He enjoys writing as a hobby on topics that include foreign policy, international human rights, security and systemic media bias.

1 comment:

  1. .. amazing.. !
    I don't understand a word of it
    yet I see its brilliant !
    My thinking hat is on !
