Totally f***in’ hack bash to delete nodes from Chef

I totally fucked up the other week by not keeping a calibrated way to clean up nodes in Chef at Artifact Uprising. Now that I have migrated from managed Chef to my own Chef OSS implementation, I have to begin doing some upkeep and cleanup work.

Some Lambda functions that rely on Cloudwatch need to be cleaned up but I write this sitting down after an outpatient procedure I had yesterday, so I made a totally fucking hack-script to do the following:

  • Find AWS EC2 instances in a “Terminated” state
  • Have knife retrieve that array and search for them
  • Have knife clean up the node and client

Why do this? Because EC2 instances can go away at any time, and Auto-scaling is set up to re-provision them automatically and bind them to Chef (another future post here). In other words, Chef's node list needs to stay current within a reasonable time.

Frankly, right now I don’t feel like messing around with Golang or Ruby so Bash is my way to go. If you don’t use Lambda, Cloudwatch or any fancy tooling, cron this bash somewhere:

# Scan for terminated nodes and remove from Chef

getTerminatedNodes() {

# get nodes that are terminated via aws-cli
# (--include-all-instances makes the call return non-running instances too)
nodes=$(aws ec2 describe-instance-status --include-all-instances --filters Name=instance-state-name,Values=terminated | grep "InstanceId" | awk -F ":" '{print $2}' | awk -F '"' '{print $2}')
 echo "$nodes"

# loop and have knife find them, then delete client and node
for node in $nodes; do
 found=$(knife search node "*${node}*" | grep "Node Name" | awk -F ":" '{print $2}')
 if [ -n "${found}" ]; then
  knife client delete ${found} -y && knife node delete ${found} -y
 fi
done
}

getTerminatedNodes



That’s it! Cron it at a desired interval and Chef will stay relatively updated. Next will be a writeup on Lambda and Cloudwatch.
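If the grep/awk chain ever gets brittle, the same extraction can be done on the JSON itself. Here is a minimal Python sketch, assuming the aws-cli output has the usual `InstanceStatuses` list with `InstanceId` and `InstanceState.Name` fields. The function name and the sample instance IDs are made up for illustration:

```python
import json


def terminated_instance_ids(describe_output: str) -> list:
    """Return the instance IDs whose state is 'terminated'.

    `describe_output` is the raw JSON text printed by
    `aws ec2 describe-instance-status`.
    """
    payload = json.loads(describe_output)
    ids = []
    for status in payload.get("InstanceStatuses", []):
        state = status.get("InstanceState", {}).get("Name")
        if state == "terminated":
            ids.append(status["InstanceId"])
    return ids


# Hypothetical sample shaped like the aws-cli output (IDs are invented)
sample = json.dumps({
    "InstanceStatuses": [
        {"InstanceId": "i-0aaa", "InstanceState": {"Name": "terminated"}},
        {"InstanceId": "i-0bbb", "InstanceState": {"Name": "running"}},
    ]
})
print(terminated_instance_ids(sample))
```

Each returned ID could then be handed to knife exactly as the bash loop does.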

…and then there is Docker

I can’t believe that I am entering the fourth month at Artifact Uprising. It has been exciting, challenging and frankly, one of the best professional experiences I’ve had. Great folks, smart engineering, and a progressive attitude when it comes to technology and the workplace overall.

As I now sunset my project of migrating Chef implementations from managed chef to our in-house Chef Open Source environments, I now walk into moving a lot of our actual applications into a clustered Docker solution. Along the way I have also moved our Terraform code.

I still have some outstanding tasks to document all I did here, on this site, to share and get some feedback. In the meantime, I will start working with Amazon ECS. While in the past I have suggested Kubernetes, Nomad, and other container cluster solutions, truth is that I think ECS might be the path for us to have everything together within our already tightly-knit AWS infrastructure. For sure I will share my experiences and provide quick guides as we move forward.

My goals are a bit ambitious and it is very possible that I might fail or change course but, most importantly, I will learn. I feel as if I might be able to leave the entire scene of configuration management if we become really, really good at writing our already fast micro-services and rely solely on services interacting with each other.

An architecture will follow soon…

Back in a kitchen! Provision without a knife.

As many of you whom I interact with online or over a drink (or two…three) know, I left Puppet. The company and the people are fantastic and I am very glad of my time there. I now work at Artifact Uprising in Denver, CO. Their products, culture, amazing people, and technology outlook are very attractive to me. Plus, I don’t travel anymore, which means I see my wife and daughter every day.

Working at my new outfit meant a return to a kitchen. I was a former Chef of a very busy infrastructure kitchen some years ago and now I am back with a knife. Boy was I rusty! Oh my oh my, have I forgotten my cookbook-making skills! Surely need to read more recipes.

One of my current big projects involves some work with our Chef infrastructure and I couldn’t believe how stuck I was doing the simple task of auto-provisioning. If you are using the open-source version of Chef, you usually use the knife tool to provision machines. While knife is a great tool for managing Chef, I simply couldn’t use it given how dynamic, ever-changing, fast and hipster our infrastructure is as it grows, shrinks and moves. I need to not think about a machine coming up and getting provisioned. It has to register by itself.

There are several folks that have written about how to do this using the open-source version of Chef but none of them worked exactly for my setup. This post will show you, very simply what you need to do on your client to get it automatically up and running.

On another post I will detail how to build your own Chef development environment, at least the Xuxo way. For now, this applies to just the client.


On a cloud-init, bootstrap script, etc., script the following commands:

Install the chef-client from It will detect your OS:

curl -L | sudo bash

Obtain the validation pem from your server and place it somewhere on your client:

echo >/tmp/my-validator.pem 'my_validation_key'

Create a “First Boot” JSON file required by Chef (the final command in this post reads it from /tmp/first-boot.json) and add the role(s) you want the machine to have:

{ "run_list": [ "my-role" ] }

Create the configuration folders:

mkdir /etc/chef && mkdir /etc/chef/trusted_certs

Create the file /etc/chef/client.rb with the following contents (change url and validator info):

chef_server_url "https://chef-server/organizations/my_organization"
client_fork true
log_location "/var/log/chef/client.log"
validation_client_name "my-validator"
node_name "this-client-node"
trusted_certs_dir "/etc/chef/trusted_certs"
# Do not crash if a handler is missing / not installed yet
begin
 # handler setup goes here
rescue NameError => e
 Chef::Log.error e
end

The trusted_certs_dir setting was the key for me to get automatic provisioning going. Obtain the Chef server’s CRT file and place it in that directory.

Finally, run this command on the client to provision and register with your open-source Chef server:

sudo chef-client -j /tmp/first-boot.json --validation_key /tmp/my-validator.pem

Chef will now provision the system and the role will be applied to the node. Script that and you don’t need to be on your Chef workstation provisioning via knife!
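To script those steps, the two files can be generated instead of hand-edited. This is only a sketch in Python, assuming the same settings shown above. The function names and the example server URL, validator and node name are placeholders I made up, not anything Chef ships:

```python
import json


def first_boot_json(run_list):
    """Serialize the first-boot attributes file passed to chef-client with -j."""
    return json.dumps({"run_list": list(run_list)}, indent=2)


def client_rb(server_url, validator_name, node_name,
              trusted_certs_dir="/etc/chef/trusted_certs"):
    """Render the client.rb settings used in this post as text."""
    settings = [
        ("chef_server_url", f'"{server_url}"'),
        ("client_fork", "true"),
        ("log_location", '"/var/log/chef/client.log"'),
        ("validation_client_name", f'"{validator_name}"'),
        ("node_name", f'"{node_name}"'),
        ("trusted_certs_dir", f'"{trusted_certs_dir}"'),
    ]
    return "\n".join(f"{key} {value}" for key, value in settings) + "\n"


# Placeholder values for illustration only
print(first_boot_json(["my-role"]))
print(client_rb("https://chef-server/organizations/my_organization",
                "my-validator", "web-01"))
```

A cloud-init script could write the two strings to /tmp/first-boot.json and /etc/chef/client.rb before calling chef-client.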

Thanks for reading.

Puppet Enterprise Orchestrator: A Practical Guide

Oh wow! Just when you got really good at deploying VMs, configuring and installing stuff on them, someone walks into your area and asks: “Hey!, can you use Puppet to deploy multiple nodes in order and install app stacks on them?”. If you are in Denver, your answer might be: “Let me check with Xuxo”. Well, pretend you are in Denver as I will show you how to do that task by deploying a Python Flask app that needs a MongoDB database, plus some ideas to automate these deployments further.

After doing this a couple of times and reviewing the documentation, I thought about splitting this post into two parts as I combine several concepts. Then I thought a bit more and decided to give you everything in one long post. So, grab a coffee if it’s morning or a beer if evening (or lunch time if you are in Colorado)…this is a long post.

Components and knowledge requirements

  • Puppet Enterprise 2015.x.x or higher. There is an open-source guide out there, :). The author will surely hit me on Twitter later but it works differently.
  • Understanding of hiera. Go to Puppet’s docs on it or follow my minimalist guide.
  • Understanding of multi-tier applications. DB tier, application tier.

Flow Architecture

I will describe how to implement orchestration as close to operations as possible. This means that the request for a new stack will come from an external system and the host and stack information will be retrieved rather than hardcoded in Puppet manifests as it is expected to change. While I will use minimal values, you will see that the input data can grow and become as fine-grained as you want it.


The illustration above shows how a user would request a ‘stack’ and how the new host(s) information will be stored in CouchDB, which stands in for a CMDB or a host information database. Once that information is provided, an API call can trigger an orchestration job in Puppet and the build-out will begin. Also in the diagram, Puppet will retrieve the values for credentials and database info from a key/value store. I use Consul and recommend Vault. Puppet will validate all objects and deploy the nodes in order. When the process completes, Puppet returns a report URL with a job ID that can be tracked elsewhere to report completion to the requester.

Now that we know what we are doing, let’s begin. Grab the second cup or second beer.

Hiera setup

I am taking this post to also show you how to extend Hiera capabilities. We will be retrieving values from two places, CouchDB and Consul. For that we need to add two new backends in hiera:

Once you install them, we need to move those backend providers to a new location as we are working with Puppet Enterprise not Open Source. Copy the providers .rb files to:


Now let’s modify our hiera.yaml file (below is my actual config) so we can use CouchDB and Consul. The changes are the http and consul backends and their sections:

---
:backends:
  - yaml
  - json
  - http
  - consul
:yaml:
  :datadir: "/etc/puppetlabs/code/environments/%{::environment}/hieradata"
:json:
  :datadir: "/etc/puppetlabs/code/environments/%{::environment}/hieradata"
:http:
  # :host: set this to your CouchDB server
  :port: 5984
  :output: json
  :failure: graceful
  :paths:
    - /hiera/%{clientcert}
    - /hiera/%{environment}
    - /hiera/common
:consul:
  # :host: set this to your Consul server
  :port: 8500
  :paths:
    - /v1/kv/hiera/common
:hierarchy:
  - "nodes/%{::trusted.certname}"
  - "global"
  - "common"
  - "aws"
  - "stiglinux"
  - "etcd"

Restart pe-puppetserver to apply the new backends and configuration:

systemctl restart pe-puppetserver

CouchDB and Consul setup

You must now be on beer #1 if coffee is done or beer #3. I will not walk you through the installations of CouchDB and Consul. Follow the vendor guides as they are pretty good. BTW, I host them on separate VMs. In this step, we will add some values to those two stores.

CouchDB and Consul have great REST APIs and UIs that can be used to get our data in and out of them. On Couch we will create a document that mimics the posted stack request:

Create DB for hiera:

curl -X PUT

Add document:

curl -X PUT -H \
'Content-Type: application/json' -d \

I prettied the text a bit for readability, but you can see how I labeled each server we will be orchestrating as a DB or App unit. The ‘ready’ states are purely optional, but handy as you will see later. Also, notice how the database and document follow the paths listed under the http backend in hiera.yaml.
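As a rough illustration of what such a request document carries, here is a Python sketch that checks a stack request before anything is triggered. The key names are the ones site.pp looks up through hiera later in this post (AppName, DBServerName, AppServerName, DBServerReady, AppServerReady). The host names, the "ready" value and the helper names are assumptions for the example:

```python
REQUIRED_KEYS = ("AppName", "DBServerName", "AppServerName",
                 "DBServerReady", "AppServerReady")


def missing_keys(doc):
    """List the required keys that are absent from the request document."""
    return [key for key in REQUIRED_KEYS if key not in doc]


def stack_is_ready(doc):
    """True when neither server is flagged 'not ready'.

    Raises ValueError if the document lacks a required key.
    """
    missing = missing_keys(doc)
    if missing:
        raise ValueError("request is missing keys: " + ", ".join(missing))
    return "not ready" not in (doc["DBServerReady"], doc["AppServerReady"])


# Hypothetical request document (all values are invented)
request = {
    "AppName": "pyflask1",
    "DBServerName": "db0.example.test",
    "AppServerName": "app0.example.test",
    "DBServerReady": "ready",
    "AppServerReady": "ready",
}
print(stack_is_ready(request))
```

The same readiness rule shows up again as a conditional in site.pp further down.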

Log in to the Consul server and create key/value objects for hiera:

consul kv put hiera/common/dbuser admin
consul kv put hiera/common/dbport 27017
consul kv put hiera/common/dbpass admin

As you can see, you can put as many things as you want in there. It doesn’t necessarily mean you have to use them. The paths are reflected on the consul section of our hiera.yaml.
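One thing worth knowing when reading these keys back over Consul's HTTP API: the KV endpoint returns a JSON list whose `Value` fields are base64-encoded. The Python sketch below decodes such a response body into a plain dictionary. The function name and the sample payload are mine, built to mimic the keys stored above:

```python
import base64
import json


def decode_kv(response_body, prefix="hiera/common/"):
    """Turn a Consul KV JSON response into {short_key: decoded_value}."""
    result = {}
    for entry in json.loads(response_body):
        key = entry["Key"]
        if key.startswith(prefix):
            key = key[len(prefix):]
        value = entry.get("Value")
        # Consul stores values base64-encoded; keys without a value have null
        result[key] = (base64.b64decode(value).decode("utf-8")
                       if value is not None else None)
    return result


def _entry(key, value):
    """Build one fake KV entry the way Consul would return it."""
    return {"Key": key, "Value": base64.b64encode(value.encode()).decode()}


# Invented sample mimicking a recursive read of hiera/common
sample = json.dumps([
    _entry("hiera/common/dbuser", "admin"),
    _entry("hiera/common/dbport", "27017"),
])
print(decode_kv(sample))
```

Note that everything comes back as a string, so a port like 27017 needs converting if you want a number.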

The Puppet manifests

On to beer #4 or #5…

Now that we have set up a good portion of the infrastructure that will support a request for a stack, it is time to dive into the Puppet piece of this. We will begin by coding our application stack. There are new provisions in the Puppet 4 language to achieve this.

Create the work directory structure:

mkdir -p pyflaskapp/{manifests,templates,files,lib}
mkdir -p pyflaskapp/lib/puppet/type


First, we need to create a small capability (or interface) to share our database information with our application node. Sharing data is the core of orchestration inside Puppet. Create the file nosql.rb inside pyflaskapp/lib/puppet/type with this content:

Puppet::Type.newtype :nosql, :is_capability => true do
 newparam :name, :is_namevar => true
 newparam :user
 newparam :password
 newparam :port
 newparam :host
 newparam :database
end

Our next step is to create our database manifest that will export these values to the orchestrator. The name of the file is on the first line:

# pyflaskapp/manifests/db.pp
define pyflaskapp::db(
  # db_user and db_password are passed in from init.pp
  $db_user,
  $db_password,
  $host = $::fqdn,
  $port = 27017,
  $database = $name,
) {
  class {'::mongodb::globals':
    manage_package_repo => true,
    bind_ip => '',
  } ->
  class {'::mongodb::client': } ->
  class {'::mongodb::server': } ->

  mongodb::db {$database:
    user => $db_user,
    password => $db_password,
  }
}

Pyflaskapp::Db produces Nosql {
  user => $db_user,
  password => $db_password,
  host => $host,
  database => $database,
  port => $port
}

To achieve orchestration, we are using a new block in our manifests. There are a couple of things here that we need to understand.

Define is our entry point in these manifests and it tells us which data we need as parameters between the parentheses. It has been available in the language and it’s essential for these jobs.

The last block is new and very important. Here is where we are stating that this DB module will produce or make available the stated information: user, password, host, database, port.

Our DB tier makes this available for our app tier to know where the resources to use are.
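If the produce/consume idea feels abstract, it can be modeled as a plain data hand-off. The Python sketch below is only an analogy, not how Puppet implements capabilities: the DB side builds a Nosql record and the app side maps its fields onto its own parameter names (the db_* names follow the consumes mapping in app.pp). The function names and sample values are invented:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nosql:
    """Stand-in for the nosql capability: the data shared between tiers."""
    name: str
    user: str
    password: str
    host: str
    port: int
    database: str


def produce(name, db_user, db_password, host, port=27017, database=None):
    """DB tier: publish connection details (database defaults to the name)."""
    return Nosql(name=name, user=db_user, password=db_password,
                 host=host, port=port, database=database or name)


def consume(capability):
    """App tier: map capability fields onto the app's own parameter names."""
    return {
        "db_name": capability.database,
        "db_host": capability.host,
        "db_port": capability.port,
        "db_user": capability.user,
        "db_password": capability.password,
    }


# Invented values for illustration
cap = produce("pyflask1", "admin", "admin", "db0.example.test")
print(consume(cap))
```

The point is that the app never hardcodes where its database lives; it only receives whatever the DB tier published.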

Now we will make our app manifest. This will build our flask application:

# pyflaskapp/manifests/app.pp
define pyflaskapp::app(
  # parameter names taken from the consumes mapping at the bottom
  $db_name,
  $db_host,
  $db_port,
  $db_user,
  $db_password,
) {

  $pippackages = ['flask', 'pymongo']

  package {$pippackages:
    ensure => 'installed',
    provider => 'pip',
  }

  file {'/flask_app':
    ensure => 'directory',
    mode => '0775',
  }

  file {'/flask_app/templates':
    ensure => 'directory',
    mode => '0775',
  }

  file {'/flask_app/':
    ensure => present,
    content => template('pyflaskapp/'),
  }

  file {'/flask_app/index.wsgi':
    ensure => present,
    source => 'puppet:///modules/pyflaskapp/index.wsgi',
  }

  file {'/flask_app/templates/index.html':
    ensure => present,
    source => 'puppet:///modules/pyflaskapp/index.html',
  }

  exec {'run_me':
    path => ['/usr/bin', '/bin', '/sbin', '/usr/local/bin'],
    command => "python &",
    cwd => "/flask_app",
    unless => "/usr/bin/test -f /flask_app/.running.txt",
  }

  # marker file checked by the exec above
  file {'/flask_app/.running.txt':
    ensure => file,
    content => "Running flask instance",
  }
}

Pyflaskapp::App consumes Nosql {
  db_name => $database,
  db_host => $host,
  db_port => $port,
  db_user => $user,
  db_password => $password
}


Notice again the last block. This time we consume what the DB manifest produced. To use some of the values that we will receive from the database piece of the orchestration job, I generated the flask start file from template. In this fashion, we can deploy as many unique instances of our application:

# pyflaskapp/templates/
from flask import Flask, render_template, request, redirect

import os
from pymongo import MongoClient

def connect():
# Substitute the 5 pieces of information you got when creating
# the Mongo DB Database (underlined in red in the screenshots)
# Obviously, do not store your password as plaintext in practice
 connection = MongoClient("<%= @db_host -%>",27017)
 handle = connection["<%= @db_name -%>"]
 handle.authenticate("<%= @db_user -%>","<%= @db_password -%>")
 return handle

app = Flask(__name__)
handle = connect()

# Bind our index page to both
@app.route("/index" ,methods=['GET'])
@app.route("/", methods=['GET'])
def index():
 userinputs = [x for x in handle.mycollection.find()]
 return render_template('index.html', userinputs=userinputs)

@app.route("/write", methods=['POST'])
def write():
 userinput = request.form.get("userinput")
 oid = handle.mycollection.insert({"message":userinput})
 return redirect ("/")

@app.route("/deleteall", methods=['GET'])
def deleteall():
 return redirect ("/")

# Remove the "debug=True" for production
if __name__ == '__main__':
 # Bind to PORT if defined, otherwise default to 5000.
 port = int(os.environ.get('PORT', 5000))
 app.run(host='', port=port, debug=True)

Finally, our module needs to bring this all together. We do this in our init.pp:

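# pyflaskapp/manifests/init.pp
application pyflaskapp(
  String $db_user,
  String $db_password,
  String $host,
  # assumed declaration: $port is used below; default matches db.pp
  $port = 27017,
) {
  pyflaskapp::db { $name:
    db_user => $db_user,
    db_password => $db_password,
    host => $host,
    port => $port,
    export => Nosql[$name],
  }

  pyflaskapp::app { $name:
    consume => Nosql[$name],
  }
}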



The entry point here is the word application. It defines our stack and its components. Notice the export and consume relationship. We are almost ready to trigger this job.

Orchestration job

Probably this is the last beer you will have on your desk as you work through this. It is all down to site.pp now. Just as you are used to defining nodes in that main file, now we define a site, our stack building steps and which nodes get what! Add this to site.pp:


site {
  # get AppName from CouchDB's request
  $name = hiera('AppName')

  # get values from Consul and CouchDB to fulfill request
  pyflaskapp { $name:
    db_user => hiera('dbuser'),
    db_password => hiera('dbpass'),
    host => hiera('DBServerName'),
    nodes => {
      Node[hiera('DBServerName')] => [Pyflaskapp::Db[$name]],
      Node[hiera('AppServerName')] => [Pyflaskapp::App[$name]],
    },
  }
}

Let’s run the job!

Running orchestrator

Orchestrator is a tool within Puppet Enterprise to accomplish these multi node stack deployments. It is available via REST API with secure token authentication.

The tool has two main parts. The first I want to show is the command ‘puppet app show‘. This utility works as a job plan that you can review. It checks that all dependencies are met, node information looks good, and which order things should run:


I show the image because it is actually color coded. If the plan review looks OK, we can go ahead and run the job. If one of the items does not pass validation, this tool will let you know. I added a conditional to my site.pp that only lets a job run if all nodes are in a ‘ready’ state. That way, I protect the dependencies even further:

# conditional block
 if (hiera('AppServerReady') == "not ready") or (hiera('DBServerReady') == "not ready") {
   fail("One of the servers is not ready")
 }

To run the job, the command is as follows:

puppet job run --application Pyflaskapp --environment production

As you can see, we can apply a job to a specific environment also. Output is also color coded:


Our multi-tier node is now ready for use. It is a flask app that I took from the web somewhere and modified along the way:

Flask node:

Mongo node:

And there you have it! A full stack deployment with Puppet!

Thanks for reading.


A Quick Hit of Hash…icorp!

Alright! This is going to be one of the quickest posts ever! Why? Because what we are going to do is ridiculously simple yet powerful! We will build a 2-node Nomad distributed scheduler to run applications on. Sure, there is Kubernetes, Mesos, etc., but…can you do it in about 10 minutes and with single binaries? Ah, the elegance of Hashicorp!

What you need

  • 2 VMs and a Consul installed somewhere. I always use RHEL or Ubuntu for my VMs.
  • Consul agent
  • Nomad


My nodes are called: nomad0.puppet.xuxo and nomad1.puppet.xuxo. We will make nomad0 our server and nomad1 our client. You can scale up and cluster as much as you want! It is very quick and simple to add.

On each of the nodes, download and place Consul agent and nomad:

cp nomad /usr/bin/
cp consul /usr/bin/

Create a config file (/etc/consul/config.json) for the Consul agent on each:

    "advertise_addr":"", (<-IP of node, change for each node)
    "bind_addr":"", (<-IP of node, change for each node)
    "datacenter":"xuxodrome-west", (<- your consul datacenter)
    "node_name":"nomad0", (<-node name, change for each)

Create a config file (/etc/nomad.d/server.hcl) for nomad0 (our server):


# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/opt/nomad0"

# Enable the server
server {
  enabled = true

  # Self-elect
  bootstrap_expect = 1
}

Create a config file (/etc/nomad.d/client.hcl) for nomad1 (our client):

datacenter = "xuxodrome-west"

client {
  enabled = true
}

leave_on_terminate = true

Optionally, you can create a systemd file to manage the nomad service:


ExecStart=/usr/bin/nomad agent -config /etc/nomad.d

Alright, we are ready! Let’s start everything and verify:

On server node run:

consul agent -data-dir=/opt/consul -node=nomad0.puppet.xuxo \
-bind= -config-dir=/etc/consul &


systemctl start nomad (if you did the systemd service file)

On client node run the same commands in the same order but changing the -node option to the client name.
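Since the Consul agent config differs between nodes only in the addresses and the node name, it can be generated rather than edited by hand. A small Python sketch, assuming only the four keys shown in the config earlier. The function name is mine and the IP addresses come from a documentation range, so swap in your own:

```python
import json


def consul_agent_config(node_name, ip, datacenter="xuxodrome-west"):
    """Render /etc/consul/config.json content for one node."""
    config = {
        "advertise_addr": ip,   # IP of the node, changes per node
        "bind_addr": ip,        # IP of the node, changes per node
        "datacenter": datacenter,
        "node_name": node_name,
    }
    return json.dumps(config, indent=4)


# Placeholder addresses (192.0.2.0/24 is reserved for documentation)
nodes = {"nomad0": "192.0.2.10", "nomad1": "192.0.2.11"}
for name, address in nodes.items():
    print(consul_agent_config(name, address))
```

Adding a third or fourth node is then one more entry in the dictionary.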

Verify cluster memberships by running a check on any consul member:

root@nomad0:~# consul members
Node Address Status Type Build Protocol DC
consul0 alive server 0.7.2 2 xuxodrome-west
nomad0.puppet.xuxo alive client 0.7.2 2 xuxodrome-west
nomad1.puppet.xuxo alive client 0.7.2 2 xuxodrome-west

Verify nomad cluster by running this command on our nomad server (nomad0):

root@nomad0:~# nomad node-status
ID DC Name Class Drain Status
e1be248c xuxodrome-west nomad1.puppet.xuxo <none> false ready

We are done! But what fun is it if our scheduler is not running anything? None. Let’s create a mongo container job then.

On nomad0, our server, create a job file:


job "mongo" {
  datacenters = ["xuxodrome-west"]
  type = "service"

  update {
    stagger = "10s"
    max_parallel = 1
  }

  group "cache" {
    count = 1

    restart {
      attempts = 10
      interval = "5m"
      delay = "25s"
      mode = "delay"
    }

    ephemeral_disk {
      size = 300
    }

    task "mongo" {
      driver = "docker"

      config {
        image = "mongo"
        port_map {
          db = 27017
        }
      }

      resources {
        cpu = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "global-mongodb-check"
        tags = ["global", "cache"]
        port = "db"
        check {
          name = "alive"
          type = "tcp"
          interval = "10s"
          timeout = "2s"
        }
      }
    }
  }
}


Start the job:

nomad run mongo.nomad

Verify job after some seconds:

nomad status mongo

ID = mongo
Name = mongo
Type = service
Priority = 50
Datacenters = xuxodrome-west
Status = running
Periodic = false

Task Group Queued Starting Running Failed Complete Lost
cache 0 0 1 0 0 0

ID Eval ID Node ID Task Group Desired Status Created At
230d2ec2 ff77cbff e1be248c cache run running 12/27/16 21:20:32 UTC

Check Consul and now our nomad cluster is alive and the mongo service is available:


Have fun!

Yes, I want that node gone now…please?

If you use any release of Puppet, open source or enterprise, you know that sometimes you have to clean up leftover certnames, or nodes, when servers are decommissioned or simply ‘killed’ away. In this post, I will show you how to build a tool that constantly listens for node cleanup requests and deletes those nodes from Puppet. This is an example; you can take what I did here and do your own, improve it, etc.

Use Case

An external system will send a notification when a machine has been destroyed or decommissioned. Upon receipt of that notification, Puppet should clean up the SSL information about the machine and stop enforcing configuration management on it.

The Code

The tool is written in Ruby and follows the same structure as the rest of my utilities. I stick to the same style of system design and code re-use so I can work quickly on my concepts and validate them.

Message Queue

Install or re-use RabbitMQ on the puppetmaster or another server.


On your working directory, create a folder called config and the file common.yaml:

 # RabbitMQ values
 mq_user: admin
 mq_pass: admin
 mq_server: ls0
 remove_channel: noderemoval

# Mongo values
 mongo_host: ls0
 mongo_port: 27017
 db: removednodes

The mongo values can be ignored. In my environment I store a record for each deletion.

The “middleware”

On your working directory, create a file named clean_node.rb. It is your main class:

#!/usr/bin/env ruby
# encoding: utf-8
# one change

require 'bunny'
require 'yaml'
require 'date'
require 'mongo'

class Cleannode

  # load configs to use across the methods
  fn = File.dirname(File.expand_path(__FILE__)) + '/config/common.yaml'
  config = YAML.load_file(fn)

  # export common variables
  # assumed: DateTime.now (from the 'date' require above)
  @@datetime = DateTime.now

  # export the connection variables
  @@host = config['mq_server']
  @@mq_user = config['mq_user']
  @@mq_pass = config['mq_pass']

  # export the channels to be created/used
  @@remove_ch = config['remove_channel']

  # database values
  @@db = config['db']
  @@mongo_host = config['mongo_host']

  # export connection to RabbitMQ
  # assumed: Bunny.new as the constructor call
  @@conn = Bunny.new(:host => @@host,
                     :user => @@mq_user,
                     :password => @@mq_pass)

  def initialize()
  end

  # define methods to use by server and clients
  # Post a message to remove a node that has been decommissioned
  def remove_node(certname)
    # assumed: the connection must be started before channels are created
    @@conn.start

    type = "REMOVE"
    message = type + "," + certname + "," + String(@@datetime)

    ch = @@conn.create_channel
    q = ch.queue(@@remove_ch)
    # assumed: the queue name is the routing key
    ch.default_exchange.publish(message, :routing_key => q.name)

    puts " [x] Sent Removal Request to Puppet " + certname
  end
end





The listener on the master

This is the actual piece that brings it all together and performs the deletion. Create a file called node_clean_listener.rb:

#!/usr/bin/env ruby
# encoding: utf-8

require "bunny"
require 'yaml'

fn = File.dirname(File.expand_path(__FILE__)) + '/config/common.yaml'
config = YAML.load_file(fn)

# plain locals here: class variables at the top level fail on newer Rubies
host = config['mq_server']
mq_user = config['mq_user']
mq_pass = config['mq_pass']
remove_ch = config['remove_channel']

# assumed: Bunny.new as the constructor call, followed by start
conn = Bunny.new(:host => host,
                 :user => mq_user,
                 :password => mq_pass)
conn.start

ch = conn.create_channel
q = ch.queue(remove_ch)

puts " [*] Waiting for messages in #{q.name}. To exit press CTRL+C"
q.subscribe(:block => true) do |delivery_info, properties, body|
  res = body.split(',')
  typ = res[0]
  certname = res[1]

  puts " [x] Received #{body}"

  #puts res
  puts typ
  puts certname

  if typ == "REMOVE"
    #remove_job = fork do
    fork do
      puts "Removing node"
      exec "/opt/puppetlabs/bin/puppet cert clean #{certname}"
    end
  end
end


The exec cert clean command will have to change to node purge if using Puppet Enterprise.

Run this in the background to listen for deletion requests all the time.
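The wire format between the two pieces is just a comma-separated string: type, certname, timestamp. Here is a Python sketch of building and parsing it, with one extra step I would suggest (it is not in the Ruby code above): checking the certname against a conservative pattern before it ever reaches a shell command. All function names are hypothetical:

```python
import re
from datetime import datetime, timezone


def build_message(certname, when=None):
    """Build 'REMOVE,<certname>,<timestamp>' as the sender does."""
    when = when or datetime.now(timezone.utc)
    return ",".join(["REMOVE", certname, when.isoformat()])


def parse_message(body):
    """Split a message into (type, certname, timestamp)."""
    parts = body.split(",", 2)
    if len(parts) != 3:
        raise ValueError("malformed message: " + repr(body))
    return tuple(parts)


def is_safe_certname(certname):
    """Accept only letters, digits, dots, dashes and underscores."""
    return re.fullmatch(r"[A-Za-z0-9._-]+", certname) is not None


# Fixed timestamp so the example is reproducible
stamp = datetime(2016, 12, 27, 21, 20, 32, tzinfo=timezone.utc)
message = build_message("node1.example.test", stamp)
print(message)
print(parse_message(message))
```

A listener that rejects unsafe certnames avoids handing arbitrary text from the queue to the puppet command line.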

Client test

Create a file called try.rb:

require "./clean_node"

host = "your-cert-name-to-delete"

# assumed: constructor and method call, matching the Cleannode class
d = Cleannode.new
d.remove_node(host)

Replace the string assigned to host with the certname of a node you want removed and run! The node will be deleted from Puppet.

The client piece can be any external system that will tell Puppet to remove the node.

Download the repo here and improve it.

Xuxodrome: My infrastructure (Part 2)

The previous post explained how my virtual datacenter is set up. In this article I will show you a very simple monitoring tool that just checks that my hosts are alive. The tool is deployed on my Linux VMs. Since this is a complete throwaway environment I don’t need a full-blown Nagios or anything like that.

The monitoring tool is written in Ruby and records the checks into a MongoDB.

Something a little extra on this post will be the deployment of the utility via Puppet’s vcsrepo module which essentially keeps our tool up to date whenever Puppet agent runs!

The Ruby Code and YAML

Create a folder somewhere on your system called monitor-my-infra:

mkdir -p monitor-my-infra

Inside create the file config.yaml which will have some values that our tool will need:

  mongo_server: ls0.puppet.xuxo
  mongo_db: monitoring
  mongo_db_collection: host_stats
  host: ls0.puppet.xuxo
  log_dir: /var/log/monitoring/
  ping_timeout: 3

Please replace the values above with your own.

Next create a file called monitors.rb (the actions are described in the comments):

require 'mongo'
require 'yaml'
require 'date'
require 'net/ping'
require 'free_disk_space'
require 'usagewatch'

class Monitorinfra

  # load config
  fn = 'config.yaml'
  config = YAML.load_file(fn)

  class_variable_set(:@@database, config['mongo_db'])
  class_variable_set(:@@db_server, config['mongo_server'])
  class_variable_set(:@@log_locale, config['log_dir'])
  class_variable_set(:@@collection, config['mongo_db_collection'])

  # assumed: DateTime.now (from the 'date' require above)
  @@datetime = DateTime.now

  # Connect to Mongo for record keeping
  # assumed: Mongo::Client.new as the constructor call
  @@db_conn = Mongo::Client.new([ "#{@@db_server}:27017" ], :database => "#{@@database}")

  # Ping a host to see if node can reach out
  def ping_out(host)

    # send both stdout and stderr to /dev/null
    res = system("ping -c1 #{host} >/dev/null 2>&1")

    if res == true
      s = 'ALIVE'
    else
      s = 'DEAD'
    end

    # Insert record in database
    collection = @@db_conn[:host_stats]
    doc = { host: host, status: s, time: @@datetime }
    result = collection.insert_one(doc)
  end

  # Check disk avail in gigabytes
  def disk_space(host, disk)

    # assumed: FreeDiskSpace.new from the free_disk_space gem
    res = FreeDiskSpace.new(disk)
    val = res.gigabytes.round
    # insert record into DB
    collection = @@db_conn[:host_stats]
    doc = { host: host, disk: disk, avail_disk: val, unit: "GB", time: @@datetime }
    result = collection.insert_one(doc)
    # for debug:
    puts result.n
  end
end



And create a client, let’s say try.rb:

require './monitors'

# I am pinging myself here but just use an external hostname
hostname = `hostname`.strip
# assumed: constructor call matching the Monitorinfra class
d = Monitorinfra.new

# ping
status = d.ping_out(hostname)
puts status

# disk
diskstatus = d.disk_space(hostname, '/')
puts diskstatus
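The config file carries a ping_timeout value that the ping check could honor. As a sketch of that idea in Python (not a port of the Ruby class), the function below runs one ping with a timeout and returns the same kind of record that gets stored in Mongo. The `runner` parameter is my own addition so the logic can be exercised without touching the network:

```python
import subprocess
from datetime import datetime, timezone
from types import SimpleNamespace


def ping_record(host, timeout=3, runner=subprocess.run):
    """Ping `host` once and return a {host, status, time} record.

    A non-zero exit code or a timeout both count as DEAD.
    """
    try:
        completed = runner(["ping", "-c1", host],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL,
                           timeout=timeout)
        alive = completed.returncode == 0
    except subprocess.TimeoutExpired:
        alive = False
    return {
        "host": host,
        "status": "ALIVE" if alive else "DEAD",
        "time": datetime.now(timezone.utc),
    }


def fake_ok(*args, **kwargs):
    """Stand-in runner that pretends the ping succeeded."""
    return SimpleNamespace(returncode=0)


# Demonstration with the fake runner; drop `runner=` to ping for real
print(ping_record("ls0.puppet.xuxo", runner=fake_ok)["status"])
```

The returned dictionary has the same host, status and time fields as the documents inserted by ping_out.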

Now, you can commit this to a repo. Why? Because we are going to use Puppet to deploy it and keep it updated!

Deploy with Puppet and keep the code updated on the client

Here is where things get a bit more hip! We will deploy this monitor using Puppet and a module called vcsrepo. Our Puppet module will deploy the code on the client node and then check the git repo on every run to ensure the code is the latest! We will also create a cron job to run the checks every hour. I really don’t need to know status every 60 seconds, once an hour will do.

Create our working module structure:

mkdir -p inframonitor/{manifests,files,templates}

Create our manifest in the manifests folder, simplemon.pp (Read comments for actions):

class inframonitor::simplemon {
  # Array of gems to install
  $gems = ['mongo', 'net-ping', 'free_disk_space', 'usagewatch']

  $gitusername = "your git account"
  $gitrepo = "monitor-my-infra.git"

  # Install the ruby devel package
  package {'ruby-devel':
    ensure => 'installed',
  }

  # Install the gems from the array above
  package {$gems:
    ensure => 'installed',
    provider => 'gem',
  }

  # Install git for repo cloning
  package {'git':
    ensure => 'installed',
  }

  file { '/simplemon':
    ensure => directory,
    mode => '770',
  }

  file{ '/root/':
    ensure => file,
    source => 'puppet:///modules/monitorpack/'
  }

  # Clone repo! Notice the ensure latest!
  vcsrepo { '/simplemon':
    ensure => latest,
    provider => git,
    source => "git://${gitusername}/${gitrepo}",
    revision => 'master',
  }

  # Create cron job
  cron::job { 'run_simplemon':
    minute => '0',
    hour => '*',
    date => '*',
    month => '*',
    weekday => '*',
    user => 'root',
    command => '/root/',
    environment => [ 'MAILTO=root', 'PATH="/usr/bin:/bin"', ],
    description => 'Run monitor',
  }
}

Now create our runner shell script for cron inside the files folder:

cd /simplemon
/bin/ruby try.rb


Set a classification rule that groups all Linux hosts on the master and wait for Puppet agent to run.

Wait a couple of hours and query our Mongo DB via the mongo client:

> use monitoring
switched to db monitoring
> db.host_stats.find()
{ "_id" : ObjectId("5810f8bcd4e736d99bfadf17"), "host" : "ostack-master", "status" : "ALIVE", "time" : ISODate("2016-10-26T18:40:59.450Z") }
{ "_id" : ObjectId("5810f8f6d4e736da6c4d6a10"), "host" : "ostack-master", "status" : "ALIVE", "time" : ISODate("2016-10-26T18:41:58.280Z") }
Type "it" for more

Trimming some entries, but there are our ping results!

In the near future I will probably add notifications, but I can easily query my Database whenever I need to check status.

Download the repo here and have fun!