supervisor is a UNIX utility for managing and respawning long-running Python processes to ensure they are always running. Or, according to its website: Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

Installation

supervisor can be installed with pip

    $ pip install supervisor

Given a script test_proc.py, start the process under supervisor as

    $ sudo supervisorctl start test_proc

Now it will run forever and you can see the process running with
Querying S3 with Presto

This post assumes you have an AWS account and a Presto instance (standalone or cluster) running. We’ll use the Presto CLI to run the queries against the Yelp dataset. The dataset is a JSON dump of a subset of Yelp’s data for businesses, reviews, checkins, users and tips.

Configure Hive metastore

Configure the Hive metastore to point at our data in S3. We are using the docker container inmobi/docker-hive
Creating a Presto Cluster

I first came across Presto when researching data virtualization - the idea that all of your data can be integrated regardless of its format or storage location. One can use scripts or periodic jobs to mash up data or create regular reports from several independent sources. However, these methods don’t scale well, especially when the queries change frequently or the data is ingested in real time. Presto allows one to query a variety of data sources using SQL and presents the data in a standard table format, where it can be manipulated and JOINed like traditional relational data.
Creating an Elixir module

To get a better handle on Elixir, I developed a simple CLI tool for sending files in Slack. To create a new project, run

    $ mix new slack_bot

This creates a new Elixir project which looks like this

    ├── README.md
    ├── config
    │   └── config.exs
    ├── lib
    │   └── slack_bot.ex
    ├── mix.exs
    └── test
        ├── test_helper.exs
        └── slack_bot_test.exs

Navigate to the lib folder and create a folder inside it called slack_bot.

Git aliases

Here’s a quick post for managing your git shortcuts. If you use git regularly, you should have a .gitconfig file in your home directory that looks something like this:

    [user]
        email = [email protected]
        name = Your name

You can add an alias section like so:

    [user]
        email = [email protected]
        name = Your name
    [alias]
        ls = log --oneline
        uom = push -u origin master

These aliases can be used like so:
Recently, I have been working with the Python API for Spark to use distributed computing techniques to perform analytics at scale. When you write Spark code in Scala or Java, you can bundle your dependencies in the jar file that you submit to Spark. However, when writing Spark code in Python, dependency management becomes more difficult because each of the Spark executor nodes performing computations needs to have all of the Python dependencies installed locally.
To streamline my blogging workflow, I wanted to go from written to published post quickly. My general workflow for writing a post for this blog looks like this:

1. Create a post in _posts
2. Write the post
3. Run fab sync

Here is the repo. fab sync is a custom command that uses the magic of Fabric to stage, commit and push changes in my blog repo to GitHub. Next, Fabric uses an ssh session in the Python process to connect to the server on which my blog is hosted, pull down the newest changes from the blog repo and, finally, build the Jekyll blog so that the changes are immediately reflected on this site.
If you have a lot of servers to which you frequently connect, keeping track of IP addresses, pem files, and credentials can be tedious. SSH config files are great for this problem, but they don’t play well with bash. I wanted to store all of my hosts’ info in a config file but still have access to the HostNames since sometimes I just need the IP address of a server to use elsewhere.
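One way to look up a HostName from an SSH config file is to parse the file directly. The following is a minimal sketch in Python, not the approach used in the post itself: the function name parse_hostnames and the sample hosts and addresses are hypothetical, and the parser assumes whitespace-separated "keyword value" lines, skipping wildcard patterns and Match blocks.

```python
from typing import Dict, List


def parse_hostnames(config_text: str) -> Dict[str, str]:
    """Map each Host alias in an SSH config to its HostName value.

    Simplifying assumptions: keywords and values are separated by
    whitespace (not '='), and wildcard or negated Host patterns are skipped.
    """
    hosts: Dict[str, str] = {}
    current: List[str] = []  # aliases named by the most recent Host line
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            # A Host line may list several aliases; ignore patterns.
            current = [a for a in value.split()
                       if not any(c in a for c in "*?!")]
        elif key == "match":
            current = []  # Match blocks are out of scope for this sketch
        elif key == "hostname":
            for alias in current:
                # ssh uses the first value it obtains for a keyword
                hosts.setdefault(alias, value)
    return hosts


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    sample = """
    # work servers
    Host web
        HostName 10.0.0.5
        User deploy

    Host db db-primary
        HostName 10.0.0.9
    """
    for alias, hostname in parse_hostnames(sample).items():
        print(alias, hostname)
```

In practice the text would be read from ~/.ssh/config, and the resulting dictionary gives the IP address or hostname for any alias without opening a connection.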
Bash aliases are great. Whether you use them to quickly connect to servers or just soup up the standard bash commands, they are a useful tool for eliminating repetitive tasks. I’m always adding new ones to optimize my workflow which, of course, led me to create aliases to optimize that workflow. While there are more complete CLI alternatives for alias management like aka, I prefer two simple commands for managing my aliases, which I keep in ~/.
A few days ago, I saw a Guess my word game on the front page of Hacker News. Before spoiling the fun for myself by checking out the comments, I decided to try my hand at writing a solution in Elixir. Afterwards, I generalized the code to choose its own word from the UNIX dictionary and then “guess” it, applying a binary search based on the feedback of whether each guess was alphabetically greater or less than the word itself.
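The binary search described above can be sketched as follows. The post's solution is written in Elixir; this is an illustrative Python version, not the author's code. The names guess_word and make_feedback are hypothetical, and the word list is assumed to be sorted and to contain the secret word.

```python
from typing import Callable, Sequence, Tuple


def make_feedback(secret: str) -> Callable[[str], int]:
    """Build a feedback function for a secret word.

    Returns 0 when the guess is correct, 1 when the secret comes
    alphabetically after the guess, and -1 when it comes before.
    """
    def feedback(guess: str) -> int:
        return (secret > guess) - (secret < guess)
    return feedback


def guess_word(words: Sequence[str],
               feedback: Callable[[str], int]) -> Tuple[str, int]:
    """Find the secret word in a sorted list using only before/after hints.

    Returns the word and the number of guesses it took.
    """
    lo, hi = 0, len(words) - 1
    guesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        guesses += 1
        hint = feedback(words[mid])
        if hint == 0:
            return words[mid], guesses
        if hint < 0:
            hi = mid - 1  # secret sorts before the guess
        else:
            lo = mid + 1  # secret sorts after the guess
    raise ValueError("secret word is not in the list")


if __name__ == "__main__":
    # A small stand-in for the UNIX dictionary, for illustration only.
    words = sorted(["apple", "banana", "cherry", "date",
                    "elderberry", "fig", "grape"])
    word, count = guess_word(words, make_feedback("fig"))
    print(word, count)
```

Because each guess halves the remaining range, the number of guesses grows with the logarithm of the dictionary size rather than with the size itself.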