Things Learned Week of 11/30/20 – 12/06/20

I haven’t had a chance to update this site because I have been working on Udacity’s Data Engineer Nanodegree. It has been very time consuming, but worth it to learn databases and a slew of different technologies. Going forward, I’ll be writing a weekly blog, starting with this one, to show what I’ve learned and some invaluable resources that helped me learn it.

Topics

  • Docker Containers
  • How to run Airflow from Docker
  • Allowing containers to see data updates by exposing ports
  • Docker Data Volumes vs Bind Mounts
  • $(pwd) to paste current directory on command line
  • Running VSC (Visual Studio Code) from terminal using “code .”

As you can see, topics can range from something as small as a shortcut that helps in everyday coding life to something as large as setting up a Docker container to run Airflow by pulling from Docker Hub. I hope you enjoy this series.

Docker Containers

The hardest part of learning new technologies is getting them to play nice with what’s already on your system. I really want to learn Apache Airflow, and Udacity provides a virtual environment to code in, but coding from there won’t help me in the future, so I decided to run Apache Airflow locally.

This led to several days of frustration trying to get everything to work together. That’s when I decided to never have this problem again, so I am learning how to work with Docker. Docker is a tool that lets everyone run the same programs regardless of the user’s operating system. A good example is onboarding a new hire with all the technologies they need to do their job: a simple command to pull the organization’s Dockerfile and run Docker Compose gives the new engineer the same setup as everyone else.
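As a sketch of that onboarding story, a shared Dockerfile might look something like the one below. Everything in it (the base image, requirements.txt, main.py) is a made-up placeholder, not something from a real organization:

```dockerfile
# hypothetical Dockerfile an organization could share with a new hire
FROM python:3.8-slim

# install the team's pinned dependencies so every machine matches
COPY requirements.txt .
RUN pip install -r requirements.txt

# copy the project code into the image
COPY . /app
WORKDIR /app

CMD ["python", "main.py"]
```

With this checked into the repo, a new hire only needs `docker build -t team-app .` to end up with the same environment as everyone else.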

Having code that only works locally and breaks on another machine is exactly the problem Docker was built to solve.

How to Run Airflow from Docker

# get the Airflow docker image from Docker Hub
docker pull puckel/docker-airflow

# run the image, publishing port 8080 and mounting the current directory as the DAG folder
docker run -d -p 8080:8080 -v $(pwd):/usr/local/airflow/dags puckel/docker-airflow webserver

The code above is all someone has to do to get Airflow running locally. Simply populate the DAG folder with code and refresh the Airflow UI in order to run DAGs.
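To test the setup, you can drop a minimal DAG file into the mounted folder. This is only a sketch assuming the Airflow 1.10.x API that the puckel image ships with; hello_dag and the echo task are made-up examples:

```python
# hello_dag.py -- a minimal DAG sketch (Airflow is already installed in the container)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="hello_dag",                 # hypothetical name; appears in the Airflow UI
    start_date=datetime(2020, 11, 30),
    schedule_interval="@daily",
)

say_hello = BashOperator(
    task_id="say_hello",
    bash_command="echo hello from docker-airflow",
    dag=dag,
)
```

Save this as hello_dag.py in the directory you mounted, refresh the UI, and the DAG should appear in the list.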

Expose Ports in a Container

Understanding ports, and how to open them in a Dockerfile, is important for many applications (AWS and anything web-based). The code below builds an image and runs it with a published port.

# in the Dockerfile, make sure to EXPOSE port 3000

# create container on this folder using dockerfile
# testnode is the name of the image
docker build -t testnode .

# run container
docker container run --rm -p 80:3000 testnode
  • --rm: removes the container after it exits
  • -p: publishes a port
  • 80:3000: maps port 80 on the host to port 3000 in the container

Create a Docker Image and Push to Docker Hub

Just like official releases from the organizations themselves, someone can create a mirror that others find useful. Docker Hub is a repository of Docker images that can help with projects.

# tag the image with your Docker Hub username
docker tag testnode johnrickcanque/testing-node

# push to Docker Hub
docker push johnrickcanque/testing-node
Explanation of code
  • rename the image before pushing it to Docker Hub
  • testnode is the name of the local Docker image
  • johnrickcanque is my Docker Hub username
  • /testing-node is the name of the image on Docker Hub

Test the created Docker image by deleting it locally and pulling directly from Docker Hub

# remove the docker image from the local machine
docker image rm johnrickcanque/testing-node

# run docker image from dockerhub
docker container run --rm -p 80:3000 johnrickcanque/testing-node 

Docker Volumes vs Bind Mounts

The great thing about Docker is that deleting images and containers does not delete data that was created or saved in a volume. Docker volumes are storage locations managed by Docker that persist independently of any container, while bind mounts map a host directory (such as a DAG folder) into the container so an application like Airflow can find it.

  • Important to know so you can get at files created by applications
  • Naming a volume gives the storage a friendly name so Docker can find it again in the future
  • use “-v” and “:” to name the volume and set its mount location
# find the VOLUME location declared on Docker Hub
docker container run -d --name psql -v psql:/var/lib/postgresql/data postgres:9.6.1

Bind mounts map a host file or directory into the container. This can be as simple as using a different folder as the DAG folder, or it can be used to build fully personalized applications where only your changes need to be saved.
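The difference shows up in the -v flag itself: a bare name on the left side creates a named volume, while a path on the left side creates a bind mount. A sketch (both commands assume a running Docker daemon, and the psql2 name and pgdata folder are made up):

```shell
# named volume: Docker manages where "psql" lives on the host
docker container run -d --name psql -v psql:/var/lib/postgresql/data postgres:9.6.1

# bind mount: the left side is a host path, so the data lands in ./pgdata
docker container run -d --name psql2 -v $(pwd)/pgdata:/var/lib/postgresql/data postgres:9.6.1
```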

Sample App Launch
  • Start in the directory that contains the Dockerfile
  • Forward port 80:4000 into the container
  • Mount the site directory into the container
  • Run the image from Docker Hub (bretfisher/jekyll-serve)
  • Make changes (this involves changing the default parameters in the source files) and save
  • The bind mount maps the local code into the container, where Jekyll picks up our changes and rebuilds the site. The end result is that we can see the personalized changes without having to save a huge file.
  • In the Jekyll example, you can make changes on your local machine and use a Docker Hub image to create a custom website while editing locally. Docker does not need you to install Jekyll or its dependencies; instead, we run the image and just apply our changes. The changes that we made are then the end product of the project.
# run the container, serving the site from the current directory
docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve

# change the title in a post file and the running site picks up the change

$(pwd) to paste current directory on command line

Instead of copying and pasting a file path, you can simply type $(pwd) to use the current directory. Because it wouldn’t make sense to change the default settings in the Dockerfile, a simple way of pointing Airflow to the correct DAG folder is to use $(pwd):

# to paste current directory on command line 
$(pwd) 

# example: use current location as the Dag folder
# default dag folder is usually /usr/local/airflow/dags
docker run -d -p 8080:8080 -v $(pwd):/usr/local/airflow/dags  puckel/docker-airflow webserver
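Under the hood, $(pwd) is ordinary shell command substitution: the shell runs the command inside $( ) and splices its standard output into the surrounding command line before running it. A quick demonstration:

```shell
# $(...) substitutes the command's output into the command line
here=$(pwd)
echo "the DAG folder would be mounted from: $here"

# substitutions nest cleanly, unlike the older backtick syntax
parent=$(dirname "$(pwd)")
echo "one level up: $parent"
```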

Running VSC from terminal using “code .”

While working with Docker containers, one of my goals is to edit files quickly to make sure programs have the right versions. Editing from the terminal has to be fast, and this setup lets me type “code .” in a folder to open that folder in VSC.

# copy and paste into ~/.bash_profile
code () { VSCODE_CWD="$PWD" open -n -b "com.microsoft.VSCode" --args $* ;}

Code to open VSC from the terminal

# open the current folder in VSC
code .

# open a single file
code <file-name>
# ex. code Dockerfile