Dev Diary - Community Development Map

The Problem

The City of Calgary has a number of resources available for finding out what development is happening in your community. The primary resource is the Development Map, which, while comprehensive, is off-putting because of the sheer amount of information presented. It also only covers major developments (land use changes and development permits) that are open to some community consultation.

My particular community - Sunalta - has an additional problem because we joined a project with the City of Calgary that removes the need for most businesses to apply for a development permit for a change of use on our Main Streets (10th Ave. and 14th St.). While we are very supportive of the project, we’ve always struggled with notification, or with knowing when a new business has moved in so we could welcome them to the community. The lack of a development permit makes this a bit harder.

The Data

Thankfully, the City provides this data via their open data portal:

However, the City also releases data for more minor items, which helps us answer residents’ questions that are sometimes directed to us.

But can I use it?

So cool, we have some data. How do we make this easier to use? The Open Data Portal does present the data and even allows you to do some visualization, but you can’t share your visualizations and filtered datasets without logging in. So data.calgary.ca is really only good for exploration. It also doesn’t provide a great way for less technically minded folks on our board to get this data or even glance at it.

So the next best thing is to look at setting something up to read the data and create the visualizations we want!

The “solution”

From here I’ll share my notes about how I arrived at my solution. It’s far from the only solution, but is the result of my trial, error, and realizing there are a LOT of options I’m simply not even aware of.

Filtering

The first step was to filter the data from the 4 datasets down to just the area I cared about - turns out this is easy with Socrata’s Query Language. Add ?communityname=SUNALTA to my queries, pull in the python py-soda library, and I’m off to the races. The land use dataset, however, does NOT include community fields, so I’m betting heavily that 1000 land use items is enough to cover the last 365 days’ worth of data.
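To make the filter concrete, here is a minimal sketch of building that kind of query URL with only the standard library. It is not the code from my project: the dataset ID abcd-1234 is a made-up placeholder and build_query_url is a name invented for this example. The /resource/<id>.json path and the $limit parameter follow Socrata's usual endpoint conventions.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.calgary.ca/resource"


def build_query_url(dataset_id, limit=1000, **filters):
    """Build a Socrata query URL with simple equality filters.

    dataset_id is a placeholder here; real IDs come from the portal.
    """
    params = dict(filters)
    # $limit caps how many rows come back in one request.
    params["$limit"] = limit
    # Keep "$" unescaped so the URL stays readable.
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params, safe='$')}"


# Datasets with a community field can be filtered directly.
sunalta_url = build_query_url("abcd-1234", communityname="SUNALTA")

# The land use dataset has no community field, so only the row cap applies.
land_use_url = build_query_url("abcd-1234")
```

The resulting URL can be handed to whatever HTTP client or Socrata library you prefer; the filtering itself happens on the portal's side.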

Visualization (Streamlit)

The next tech I dove into was how to show off this data on a map and in a table. At work another team has started using Streamlit, and it was incredibly easy to get started with. It also has a fantastic feature where it caches data, which removed my worry about jumping through hoops to avoid hitting data.calgary.ca too many times or needing to create a caching layer myself.
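For a sense of what that saves, here is a rough sketch of the kind of hand-rolled caching layer I would otherwise have needed. To be clear, this is not how Streamlit implements its cache, and ttl_cache and fetch_dataset are names invented for this example; fetch_dataset only pretends to make a request.

```python
import time
from functools import wraps


def ttl_cache(seconds):
    """Cache a function's results per argument tuple for `seconds` seconds."""
    def decorator(func):
        store = {}  # args -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh, skip the real call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator


calls = []  # records every "real" fetch so the effect is visible


@ttl_cache(seconds=3600)
def fetch_dataset(dataset_id):
    # Stand-in for the real HTTP request to the open data portal.
    calls.append(dataset_id)
    return [{"dataset": dataset_id}]


fetch_dataset("abcd-1234")
fetch_dataset("abcd-1234")  # served from the cache, no second "request"
```

Even this toy version needs expiry handling and a key per argument set, which is exactly the busywork I was happy to hand off.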

The biggest gotcha I have found with Streamlit is that the default map is nowhere near as featureful as using the plotly library to put dots on the map. Plotly’s table feature, while supported, has terrible performance and causes everything to grind to a halt. As such I went with Plotly for showing the map, but used Streamlit’s native table view to show off the data.

Hosting

While Streamlit is easy to run and put a reverse proxy in front of (for SSL and caching), they do also run a free service to deploy your Streamlit app to Heroku for showing off. Streamlit Sharing is free for public repositories (like this project). It will also hook up to GitHub to redeploy your app whenever you push an update without needing to code this in GitHub workflows yourself. Very convenient.

Hugo Archetypes

This is the first post in my TIL section, where I’m trying to encourage myself to write more frequently on smaller topics. First up: I learned about Hugo’s Archetypes.

The tl;dr is that archetypes are the templates for new content. Instead of typing out all of the metadata fields each time, I can type hugo new blog/<name>.md to get a new blog post and hugo new til/<name>.md to get a new TIL post. These are each defined in the archetypes folder, in blog.md and til.md.

For example I use the below in blog.md:

+++
categories = []
tags = []
description = ""
slug = '{{ .File.BaseFileName }}'
draft = true
title = "{{ replace .Name "-" " " | title }}"
date = {{ .Date }}
author = "Micheal J."
+++

Dev Diary - yycbike_count - Python 3 and GitHub Actions

A couple weeks back, a Twitter bot I run stopped working correctly. Revamping and cleaning it up has been on my list for a long time - if only because the server it was running on was very overdue for a rebuild. So when Eco-Counter changed their private API and broke the script, it provided a great chance to rewrite it.

Fun items learned along the way:

Python 3 Upgrade Notes

  • Changing print is straightforward.
  • f-strings are awesome.
  • Cleaning up data is super time consuming, but it tidies things up so nicely.
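As a small illustration of the first two notes, here is the sort of change involved. The function name and the message wording are made up for this example and are not lifted from the bot itself.

```python
def format_count(location, count, day):
    """Build a status line using an f-string.

    The Python 2 version would have been something like:
        print "%d bike trips counted at %s on %s" % (count, location, day)
    """
    # {count:,} adds thousands separators without any extra helper.
    return f"{count:,} bike trips counted at {location} on {day}"


message = format_count("Peace Bridge", 1234, "2021-05-01")
print(message)  # print is a function in Python 3, so the parentheses are required
```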

GitHub Actions

As part of the rewrite I wanted to explore ways to stop managing a server in order to run the bot. As it turns out, GitHub Actions has a schedule feature and could work quite well for both testing and running the script. GitHub also provides a fantastic learning tutorial at https://lab.github.com/githubtraining/github-actions:-hello-world.

It gave me a great, easy chance to get my hands dirty and actually set up a simple CD system for myself. This, coupled with act, made it easy to test and run the script while it was being rewritten.

The whole process is summarized here in a sentence, but it took some time, and I hope it will pay off in spades in the future.

You can find the workflow used on GitHub.

Repurposing or Extending yycbike_count

For those who are interested in taking the work in yycbike_count to use for their own city - please do. There are a couple things to mention if you want to re-use most of the work I did.

Counter Config

The counter config is hard coded into twitterBot.py - sorry.

GitHub Actions

If you want to use the workflow you’ll need to make 4 GitHub secrets corresponding to the environment variables (TWITTER_TOKEN, TWITTER_TOKEN_KEY, etc.). If you want to use act, you’ll also need to add them to the .secrets file.
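If you are adapting the script, a small check like the sketch below can fail fast when a secret is missing. It is an illustration rather than what twitterBot.py does: load_secrets is a hypothetical helper, and the list only contains the two variable names mentioned above, so add the remaining ones your fork uses.

```python
import os

# Only these two names are spelled out in this post; extend the list as needed.
REQUIRED_SECRETS = ["TWITTER_TOKEN", "TWITTER_TOKEN_KEY"]


def load_secrets(names=REQUIRED_SECRETS, environ=None):
    """Return the named variables as a dict, or raise if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Passing a plain dict makes the helper easy to try without touching os.environ.
example = load_secrets(environ={"TWITTER_TOKEN": "abc", "TWITTER_TOKEN_KEY": "def"})
```

Both the GitHub secrets and the act .secrets file end up as environment variables for the script, so the same check covers either way of running it.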

act

If you want to use act to run the workflow locally you’ll need to add the following to your .actrc file:

-P ubuntu-latest=catthehacker/ubuntu:act-latest

Running Pi-hole using Docker on your Mac

A colleague of mine mentioned an open source program called Pi-hole, designed to act as a DNS resolver in your local network to blackhole trackers and ads. The biggest advantage of this is that it also works for devices that don’t support adblockers natively, or where adblockers are cumbersome to use.

So how does one trial something on their local Mac to see if it’s worthwhile? Turns out the project has a Docker image and it works quite well, and if you don’t expose the DHCP ports you avoid breaking your work network with a rogue DHCP server. So assuming you already have Docker installed:

cat <<EOF | tee ~/.piholeenv
WEBPASSWORD=pihole
DNS1=1.1.1.1
IPv6=True
EOF

docker pull pihole/pihole

docker run -d -p 80:80 -p 53:53 -p 53:53/udp -p 443:443 --restart=unless-stopped --env-file ~/.piholeenv --name pihole pihole/pihole

open http://127.0.0.1/admin/

Next, set your DNS server to 127.0.0.1:

networksetup -setdnsservers Wi-Fi 127.0.0.1

And ta-da: a quick, dirty, and SUPER ephemeral test that doesn’t mess with the current DHCP setup on your network. If you want to run it more long term, follow the docs properly and specify a volume to save the data.

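To shut it all down, stop the container and clear the DNS override so your Mac goes back to the servers handed out by DHCP:

docker stop pihole
docker rm pihole
networksetup -setdnsservers Wi-Fi Empty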

Unfortunately my home router provided by my ISP doesn’t offer the ability to change DNS. So I guess that’s the push necessary to get around to putting it in bridge mode and getting a proper router.

Using CloudFlare as a v6 to v4 Bridge

CloudFlare offers the ability to turn on CDN caching and present your service to the public without requiring a public IPv4 address (so long as you have a publicly accessible v6 address). To turn it on, add the DNS entry to your domain on CloudFlare, and then turn on the caching service (the coloured-in logo).

The caveats with the CDN are the same as if you had a v4 address: only certain ports (e.g. 80, 8080, 443, 8443, etc.) work, and the output from your server is cached/proxied via CloudFlare’s CDN servers. So it’s not a full fix (e.g. no port 22 to ssh in), but for running a web/HTTP based service it can be quite useful.

SSH Key Types and Cryptography: The Short Notes

On nearly all current (< 3 years old) operating systems there are 4 different SSH key types available, both for a client’s key and for the host key:

  • DSA (No longer allowed by default in OpenSSH 7.0+)
  • RSA
  • ECDSA (OpenSSH 5.7+)
  • ed25519 (OpenSSH 6.5+)

So which one to use?

In general, the best practice is to use ed25519 if possible, and otherwise RSA (4096 bits), due to mistrust of NIST’s curves for ECDSA. Which host key is used is managed by HostKeyAlgorithms in sshd_config, and the client key type is chosen when you create the key by running ssh-keygen. So what about the other parts of an SSH connection, and can I use an ed25519 key anywhere?
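That preference can be written down as a tiny helper. This is only a sketch of the rule of thumb above, assuming you already know the OpenSSH version on the oldest machine involved; keygen_command is a name made up for this example.

```python
def keygen_command(openssh_version):
    """Return ssh-keygen arguments for the preferred client key type.

    openssh_version is a (major, minor) tuple, e.g. (6, 5).
    """
    # ed25519 keys need OpenSSH 6.5 or newer on both ends.
    if openssh_version >= (6, 5):
        return ["ssh-keygen", "-t", "ed25519"]
    # Otherwise fall back to a 4096-bit RSA key rather than ECDSA.
    return ["ssh-keygen", "-t", "rsa", "-b", "4096"]


modern = keygen_command((7, 4))
legacy = keygen_command((5, 3))
```

The returned list can be passed straight to subprocess.run if you want to script key creation, or just read off and typed by hand.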

The key types cover just one portion of an SSH connection: authentication. SSH connections have three major cryptographic phases: the key exchange, the authentication, and then the negotiated symmetric encryption used by the rest of the connection. (If you want more detail, check out Digital Ocean or Cisco’s explanations.)

Unlike the SSH key type, the ciphers and key exchange are decided on between sshd and ssh depending on their feature set and what is defined in their config files.

If you’re running OpenSSH 6.3 or newer you can see what algorithms are supported by running one of three commands (ssh -Q cipher, ssh -Q mac, or ssh -Q kex), or by reading man ssh_config.

Key Exchange

A glossed-over version of the key exchange: the client and the server share some information (e.g. public keys) and use the Diffie-Hellman algorithm, over an agreed group or curve, to set up the symmetric key for the cipher and the MAC (message authentication code, used to confirm validity) for the rest of the connection.
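The core trick is easier to see with toy numbers. The sketch below is classic textbook Diffie-Hellman with a deliberately tiny prime so the arithmetic is readable. It is not what OpenSSH runs (real key exchanges use very large groups or elliptic curves, plus hashing and host key signatures), and the variable names are mine.

```python
import secrets

P = 23  # tiny public prime modulus, toy-sized for readability only
G = 5   # public generator


def keypair():
    """Pick a random private value and derive the matching public value."""
    private = secrets.randbelow(P - 2) + 1  # somewhere in 1..P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private value with the other side's public value."""
    return pow(peer_public, own_private, P)


# Each side keeps its private value and sends only the public one.
client_private, client_public = keypair()
server_private, server_public = keypair()

# Both sides arrive at the same number without ever transmitting it.
client_secret = shared_secret(client_private, server_public)
server_secret = shared_secret(server_private, client_public)
```

In a real connection that shared value is fed through a hash to derive the cipher and MAC keys, which is where the sha256 part of the kex names comes in.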

Mozilla’s recommended list of kex choices (specified in sshd_config), per their wiki, is a great starting point. The summary: anything using at least sha256 helps.

Encryption

The symmetric key created during the key exchange step is now used to encrypt and decrypt the rest of the connection.

Mozilla’s wiki again lists the most recommended ciphers and MACs with the new chacha20-poly1305 being the first on the list.

Key Type Reference

* - disabled by default for sshd
1 - PuTTY stable only supports dsa and rsa but the latest development snapshots support ecdsa and ed25519.

TL;DR

Unless you’re using CentOS 6 or Ubuntu 12.04, use ed25519 keys and Mozilla’s config files to limit the preferred connection ciphers.

http://www.openssh.com/legacy.html

What's calgary.bike?

On April 8th I stopped redirecting calgary.bike to Bike Calgary[1] to start showing off the aggregated data that I was pulling together from the 3 Eco-Counter installations. With the source on GitHub, I thought it’d be worth explaining a little of the why and how.

At the start of January, the City of Calgary made public the web page for the bike counter on the Peace Bridge, with promises of making more available, including at least 10 more during the upcoming cycle track pilot. The Peace Bridge counter had data stretching back to April 24th, 2014, and by default always showed the entire daily data set.

My first curiosity was whether I could have a bookmark to just show the last week or so worth of numbers, which led me to figuring out how the webapp worked. (Good ol' WebKit developer tools.)

After that, in tandem with some projects I was looking into for work, I decided to start scraping the data and storing it somewhere to compare numbers (different installations, averages, weather) more easily. So a big thank you to the people at the City and Eco-Counter for not telling me to “get lost and don’t use things inappropriately”.

As for how - the Python scripts just ask Environment Canada and the counters once a day for their last day’s worth of new data (if possible) and store it in Graphite. Interacting with the data happens through Grafana 2 behind nginx. It’s all hosted on a tiny instance on some publicly available free compute resources that I just happen to also manage as part of my day job. Funnily, most of the script writing was done during an all-nighter at a Denny’s in Kamloops, waiting for 4 AM to roll around so I could swap some power cables in a maintenance window.
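The storing half is pleasantly simple because Graphite accepts plain text lines of the form "metric value timestamp". Here is a minimal sketch of that step, assuming a carbon listener on its default plaintext port 2003. The metric name and helper names are made up for this example and are not copied from the scripts on GitHub.

```python
import socket
import time


def graphite_line(metric, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{metric} {value} {ts}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Send preformatted lines to a carbon listener (assumed host and port)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# One day's count for one counter, stamped with that day's epoch time.
line = graphite_line("bikes.peace_bridge.daily", 1234, 1428451200)
```

Passing an explicit timestamp matters here, since the script is backfilling yesterday's numbers rather than recording something happening right now.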

It’s nothing fancy but it’s fun to see what might come of it when data is made available.

1 - I had registered the domain last year and figured that was a good place to point until I had a better idea of how to use it.

Trying to make sense of when to use Docker vs. LXC

While working on some side projects the past couple weeks I kept confusing myself on how things worked behind the scenes between Linux Containers and Docker. They both leverage the Linux kernel’s cgroups to function on Linux (and in Docker’s case - similar technologies in other OSes), but differ completely in terms of how you interact with them.

While a Linux Container can best be thought of as a super lightweight VM running a whole OS, Docker contains a slew of other features that blur the lines between it acting like a super lightweight VM and being a full platform to build off of. Docker plays closer to the idea of a process or group of processes (an application) under a chroot, versus LXC’s idea of a whole OS/machine in a chroot jail.

So it’s misleading to think of a Docker container the same way as a LXC container. Same technology behind the scenes but completely different approaches. For Docker it’s all in how you set up your container to run - you can have all the other services you normally get in a VM if you so wish.

For example with LXC setting up MySQL would consist of making the container, running the command to install MySQL and setting the service to go. You can then log in or attach and run other commands as well if necessary.

Docker, on the other hand, involves similar steps, with the flexibility of having Docker do the install and run the service when the container starts (defined in the Dockerfile). However, if you want to attach to that container and run more commands, you have to have set up that access ahead of time (e.g. supervisord, runit), create a new container with that command, or try to force your way into the container. (You can try lxc-attach, but if you want a new TTY and you’re attaching to a mysqld instance? Not going to work.)

After figuring that out - the use of Puppet in Docker started to make more sense. Have Puppet configure your image and then save/commit that state or kick off the supervisord process to keep the container “alive”. Docker lends itself more to recreating/iterating whenever a new update is needed over updating settings.

In summary - an LXC container is analogous to a VM, while Docker is a very supercharged sandbox for running a process or group of processes. Use LXC when you’re wanting a separate “server” without the extra overhead, Docker when you’re wanting to run a “service”.

I also recommend reading the FAQ - primarily the section on what Docker “adds to LXC”. In the end it’s left me more leery of using Docker - it’s a bit of a paradigm shift I’m not ready to make just yet.

On one last sidenote, IPv6 support also looks like a lot of pain - but not any worse than LXC.

About Me

I am a Technical Operations Manager at Cybera by day, a geek, father, and husband by night. My current role grants me a great deal of freedom to try out various different solutions both as a learning exercise and as a way to improve how things run. I’ve used the pseudonym Chealion online since 1998 and have subsequently owned and posted content on chealion.ca off and on (more off than on) since 2006.

I’m writing this for myself so I apologize in advance if you find it not very focused.

You can contact me by emailing chealion AT chealion DOT ca

Other haunts:

GitHub
Flickr
Twitter