Data Cyborg bot

I created @data_cyborg to help me stay updated on software engineering news, opinions, tutorials, and insights from industry experts.

The project combines the best of Twitter and Reddit: a bot fetches the most upvoted posts from several software engineering subreddits and shares them on a dedicated Twitter account. This gives me a curated reading list based on popular content from the subreddits I’m interested in, and it helps others with similar interests discover valuable resources. I also use the Data Cyborg account to share content I find interesting, so the feed is a mix of automated and human-curated content.

The name “Data Cyborg” reflects this unique combination of robot (the bot) and human (my personal curation) contributions.



How it works

A Lambda function runs several times a day with varying parameters, including the subreddit to track and the threshold score for filtering posts. Each execution queries the Reddit API for the specified subreddit and filters posts based on the threshold score.
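The fetch-and-filter step could look roughly like the sketch below. It is not the bot's actual code: it assumes Reddit's public JSON listing endpoint (the real bot may use a client library such as PRAW instead), and the function names, the User-Agent string, and the shape of the Lambda event (`subreddit` and `threshold` keys) are all hypothetical.

```python
import json
import urllib.request


def fetch_listing(subreddit, limit=25):
    """Fetch the day's top posts for a subreddit as raw listing JSON.

    Assumption: the public /top.json listing endpoint is used.
    """
    url = f"https://www.reddit.com/r/{subreddit}/top.json?t=day&limit={limit}"
    # Reddit expects a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "data-cyborg-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def filter_by_score(listing, threshold):
    """Keep only the posts whose score meets or exceeds the threshold."""
    posts = [child["data"] for child in listing["data"]["children"]]
    return [post for post in posts if post["score"] >= threshold]


def lambda_handler(event, context):
    # Hypothetical event shape: {"subreddit": "programming", "threshold": 500}
    listing = fetch_listing(event["subreddit"])
    return [post["id"] for post in filter_by_score(listing, event["threshold"])]
```

Passing the subreddit and threshold in through the event lets a single function serve every scheduled run, with each schedule supplying its own parameters.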

Next, the Lambda function compares each post’s ID against the IDs of previously processed posts (stored in a JSON file on S3 that covers the last three days) and keeps only the new posts for processing.
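A minimal sketch of this deduplication step follows. The layout of the JSON file is an assumption: here it is a map from post ID to the ISO timestamp at which the post was processed, which makes the three-day window easy to enforce. The function names are hypothetical, and `s3_client` stands for a client created with `boto3.client("s3")`.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3)  # the file only needs to cover the last three days


def load_seen(s3_client, bucket, key):
    """Read the assumed {post_id: iso_timestamp} map from S3."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def prune(seen, now):
    """Drop entries that fall outside the retention window."""
    cutoff = now - RETENTION
    return {
        post_id: ts
        for post_id, ts in seen.items()
        if datetime.fromisoformat(ts) >= cutoff
    }


def select_new(posts, seen):
    """Return only the posts whose ID has not been processed yet."""
    return [post for post in posts if post["id"] not in seen]
```

Pruning on every run keeps the file small, so each lookup stays a cheap dictionary membership test.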

The text of the post is extracted and trimmed to fit within Twitter’s 280-character limit, leaving room for the post URL and a hashtag identifying the subreddit.
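The trimming logic can be sketched as below. This is an illustration rather than the bot's real code: `build_tweet` is a hypothetical name, the character budget is passed in as `limit`, and length is measured with Python's `len`, which is a simplification because Twitter applies its own weighting to URLs and some Unicode characters.

```python
ELLIPSIS = "..."  # plain ASCII, so each character counts as one


def build_tweet(text, url, subreddit, limit):
    """Trim text so that text + URL + hashtag fits within limit characters."""
    suffix = f" {url} #{subreddit}"
    budget = limit - len(suffix)  # characters left for the post text
    if budget <= len(ELLIPSIS):
        raise ValueError("limit too small for the URL and hashtag")

    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) > budget:
        cut = text[: budget - len(ELLIPSIS)]
        if " " in cut:
            # Prefer ending on a word boundary over cutting mid-word.
            cut = cut.rsplit(" ", 1)[0]
        text = cut.rstrip() + ELLIPSIS
    return text + suffix
```

Reserving space for the suffix first means the URL and hashtag are never truncated; only the post text gives way.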

Finally, the tweet is published using Tweepy. If the tweet is successfully published, the post ID is added to the JSON file, which is then uploaded back to S3. This ensures that only unprocessed posts are considered in future Lambda runs.
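The publish-then-record step might be sketched as follows. Several things are assumptions: the use of Tweepy's v2 `Client.create_tweet` (the bot may use a different Tweepy interface), the `{post_id: iso_timestamp}` layout of the JSON file, and every function name. `s3_client` stands for a client created with `boto3.client("s3")`.

```python
import json
from datetime import datetime, timezone


def make_tweepy_publisher(credentials):
    """Return a publish(text) callable backed by Tweepy's v2 client.

    credentials: dict with consumer_key, consumer_secret,
    access_token and access_token_secret.
    """
    import tweepy  # imported lazily: only needed when actually tweeting

    client = tweepy.Client(**credentials)
    return lambda text: client.create_tweet(text=text)


def publish_and_record(tweets, seen, publish):
    """Publish each tweet; record a post ID only after its tweet succeeds.

    tweets: {post_id: tweet_text}; seen: assumed {post_id: iso_timestamp} map.
    """
    for post_id, text in tweets.items():
        try:
            publish(text)
        except Exception as exc:  # a failed post stays unrecorded and is retried next run
            print(f"skipping {post_id}: {exc}")
            continue
        seen[post_id] = datetime.now(timezone.utc).isoformat()
    return seen


def save_seen(s3_client, bucket, key, seen):
    """Upload the updated map back to S3."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(seen).encode("utf-8"))
```

Recording the ID only after a successful publish means a failed tweet is picked up again on the next run instead of being silently dropped.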

Technical implementation

technical_diagram

The project is built entirely on the AWS Free Tier.