Creating a Presto Cluster

I first came across Presto when researching data virtualization: the idea that all of your data can be integrated regardless of its format or storage location. One can use scripts or periodic jobs to mash up data or create regular reports from several independent sources. However, these methods don’t scale well, especially when the queries change frequently or the data is ingested in real time. Presto allows one to query a variety of data sources using SQL and presents the results in a standard table format, where they can be manipulated and JOINed like traditional relational data.

Creating an Elixir module

To get a better handle on Elixir, I developed a simple CLI tool for sending files in Slack.
To create a new project, run

$ mix new slack_bot

This creates a new Elixir project that looks like this:

├── README.md
├── config
│   └── config.exs
├── lib
│   └── slack_bot.ex
├── mix.exs
└── test
    ├── slack_bot_test.exs
    └── test_helper.exs

Navigate to the lib folder and create a folder inside it called slack_bot.

Here’s a quick post about managing your git shortcuts. If you use git regularly, you probably have a .gitconfig file in your home directory that looks something like this:
[user]
    email = your_email@example.com
    name = Your name

You can add an alias section like so:
[user]
    email = your_email@example.com
    name = Your name
[alias]
    ls = log --oneline
    uom = push -u origin master

These aliases can be used like so:

$ git ls
$ git uom

Recently, I have been working with the Python API for Spark to perform analytics at scale using distributed computing. When you write Spark code in Scala or Java, you can bundle your dependencies in the jar file that you submit to Spark. However, when writing Spark code in Python, dependency management becomes more difficult, because every Spark executor node performing computations needs to have all of the Python dependencies installed locally.

To help facilitate my blogging workflow, I wanted to go from a written draft to a published post quickly. My general workflow for writing a post for this blog looks like this:
1. Create a post in _posts
2. Write the post
3. Run fab sync

Here is the repo.
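For anyone not using Fabric, here is a rough approximation of what a sync step like this might run, using only Python's standard subprocess module. It is not the task from the repo: the host, the remote path, the commit message, and the exact git and jekyll build commands are assumptions for illustration. By default it does a dry run that only prints each command.

```python
import shlex
import subprocess

# Hypothetical values: replace with your own server and blog checkout path.
HOST = "user@blog.example.com"
REMOTE_DIR = "~/blog"


def sync_commands(message, host=HOST, remote_dir=REMOTE_DIR):
    """Return, in order, the commands a fab-sync-style task might run."""
    # The remote shell expands ~ and runs the three steps in sequence.
    remote = f"cd {remote_dir} && git pull && jekyll build"
    return [
        ["git", "add", "-A"],              # stage all local changes
        ["git", "commit", "-m", message],  # commit them
        ["git", "push"],                   # push to the remote (e.g. GitHub)
        ["ssh", host, remote],             # pull and rebuild on the server
    ]


def sync(message, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in sync_commands(message):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    sync("Publish new post")
```

Calling sync("New post", dry_run=False) would run the commands for real and stop at the first one that exits with an error.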
fab sync is a custom command that uses the magic of Fabric to stage, commit, and push changes in my blog repo to GitHub. Next, Fabric opens an SSH session from the Python process to the server hosting my blog, pulls down the newest changes from the blog repo, and finally builds the Jekyll blog so that the changes are immediately reflected on this site.

If you have a lot of servers that you connect to frequently, keeping track of IP addresses, pem files, and credentials can be tedious. SSH config files are great for this problem, but they don’t play well with bash. I wanted to store all of my hosts’ info in a config file but still have access to the HostNames, since sometimes I just need the IP address of a server to use elsewhere.

Bash aliases are great. Whether you use them to quickly connect to servers or just soup up the standard bash commands, they are a useful tool for eliminating repetitive tasks. I’m always adding new ones to optimize my workflow, which, of course, led me to create aliases to optimize that workflow. While there are more complete CLI alternatives for alias management, like aka, I prefer two simple commands for managing my aliases, which I keep in ~/.

A few days ago, I saw a Guess my word game on the front page of Hacker News. Before spoiling the fun for myself by checking out the comments, I decided to try my hand at writing a solution in Elixir. Afterwards, I generalized the code to choose its own word from the UNIX dictionary and then “guess” it, applying a binary search based on the feedback of whether each guess was alphabetically greater or less than the word itself.

Hello and welcome to my third and final attempt to start a blog. To be honest, the hardest part has been choosing what tool to use to build and maintain the blog itself. I’ve been so inundated with The Next Big Thing™ in static site generators that I never actually got started on the writing part. That didn’t seem right.
I’ve tried to keep things as simple as possible. This is stock Jekyll with no bells or whistles, to get things started on the right foot.

If you spend most of your time in the command line, you don’t want to leave it just to do math. Qc is a script that does inline math right at the command line, without forcing you to leave the main bash prompt as you might with a program like bc or a language interpreter.
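#!/bin/bash
# qc.sh: evaluate the first argument as a Python expression and print the result.
# Quote the expression (e.g. qc "2 * 3") so the shell does not expand * or ().
python3 -c "print($1)"

Make the script executable with the command: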
$ chmod +x qc.sh

Alias it to qc by editing the .
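Because python -c evaluates any Python expression, qc will also happily run code that is not math. If that matters to you, one alternative (a different technique from the script above, sketched here assuming Python 3.8+) is to parse the input with Python's ast module and evaluate only numeric literals and arithmetic operators. The file name qc.py and the exact operator set are illustrative choices.

```python
import ast
import operator
import sys

# Map the supported AST operator node types to the arithmetic they perform.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expr):
    """Evaluate a purely arithmetic expression; reject anything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Only int and float literals (bool and complex are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Join the arguments so both qc.py "2 * 3" and qc.py 2 + 3 work.
    print(evaluate(" ".join(sys.argv[1:])))
```

Usage would look like python3 qc.py "2 ** 10 / 4". This sketch does not cap exponent size, so an input like 9 ** 9 ** 9 can still tie up the CPU.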