Don’t judge a tool by its look.

This story was originally posted on Medium.

Don’t worry if you are feeling stuck. We’ll figure it out.

Remember the old adage, “don’t judge a book by its cover”? Well, what follows is an interesting story that happened to me today and that, in many ways, reminded me of that old saying.

Lately I’ve been very enthusiastic about where Cilium is. I’ve blogged about it and talked to others about how great it is. So you can imagine how surprised I was to hear from one of my gurus that there seemed to be an issue when deploying Cilium with aks-engine to Azure.

From that exchange I was intrigued by the fact that something was out of place. Did the deployment really misconfigure kube-dns? If so, how?

Assumptions at this point:

  1. Installing cilium through AKS-Engine misconfigured the kube-dns service.

Lesson 1: Double check your assumptions.

To confirm the behavior we were seeing, I decided to use a second tool, in this case dig. Using a second tool is a lot like getting a second opinion from a doctor to confirm whether or not you are sick and what the best treatment is. To my surprise, the disease wasn’t the one I was expecting.

# dig  deathstar.default.svc.cluster.local

; <<>> DiG 9.10.4-P8 <<>> deathstar.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62090
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;deathstar.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
deathstar.default.svc.cluster.local. 30    IN A    10.0.70.187

;; Query time: 1 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Fri Jul 06 16:28:29 UTC 2018
;; MSG SIZE  rcvd: 69

Wait, dig works? The reverse lookup also worked.

# dig -x 10.0.70.187

; <<>> DiG 9.10.4-P8 <<>> -x 10.0.70.187
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37144
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;187.70.0.10.in-addr.arpa.    IN    PTR

;; ANSWER SECTION:
187.70.0.10.in-addr.arpa. 30    IN    PTR    deathstar.default.svc.cluster.local.

;; Query time: 1 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Fri Jul 06 16:30:41 UTC 2018
;; MSG SIZE  rcvd: 91

And here’s another surprise: nslookup works now!

Lesson 2: The devil is in the details

How could nslookup work after the installation of bind-tools? Did installing bind-tools have the side effect of fixing nslookup?

Let’s dissect nslookup. First, I rolled back my configuration by uninstalling bind-tools. Then I checked which libraries (if any) nslookup is linked against:

# ldd /usr/bin/nslookup
      /lib/ld-musl-x86_64.so.1 (0x7f31a076b000)
      libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f31a076b000)

Now, after installing bind-tools, here is what we have:

# ldd /usr/bin/nslookup
/lib/ld-musl-x86_64.so.1 (0x7f229f407000)
libdns.so.165 => /usr/lib/libdns.so.165 (0x7f229ee4f000)
liblwres.so.141 => /usr/lib/liblwres.so.141 (0x7f229ec3e000)
libbind9.so.140 => /usr/lib/libbind9.so.140 (0x7f229ea30000)
libisccfg.so.140 => /usr/lib/libisccfg.so.140 (0x7f229e80b000)
libisc.so.160 => /usr/lib/libisc.so.160 (0x7f229e5b0000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f229f407000)
libcrypto.so.1.0.0 => /lib/libcrypto.so.1.0.0 (0x7f229e191000)
libcap.so.2 => /usr/lib/libcap.so.2 (0x7f229df8c000)
libisccc.so.140 => /usr/lib/libisccc.so.140 (0x7f229dd83000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f229db70000)
libz.so.1 => /lib/libz.so.1 (0x7f229d95a000)
libattr.so.1 => /lib/libattr.so.1 (0x7f229d755000)

Wow! What’s up with all of these new libraries? It looks as if installing more libraries fixed the issue, which is odd.
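To see exactly which libraries appeared, the two ldd listings can be compared mechanically. This is a rough sketch, assuming each listing was saved to a file first; the file names before.txt and after.txt and the function names lib_names and new_libs are placeholders of mine, not something from the original session.

```shell
#!/bin/sh
# Hypothetical helper: print the library names found in saved `ldd` output.
# The first field of each ldd line is the library name or the loader path.
lib_names() {
    awk 'NF { print $1 }' "$1" | sort -u
}

# Print the libraries that appear in the second listing but not in the first.
new_libs() {
    before=$(mktemp)
    after=$(mktemp)
    lib_names "$1" > "$before"
    lib_names "$2" > "$after"
    # comm -13 keeps only the lines unique to the second (sorted) file.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Usage, assuming the listings were captured before and after the install:
#   ldd /usr/bin/nslookup > before.txt
#   apk add bind-tools
#   ldd /usr/bin/nslookup > after.txt
#   new_libs before.txt after.txt
```

Comparing names rather than whole lines avoids false differences from the load addresses, which change on every run.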

With that, assumptions so far:

  1. DNS is working in the cluster. This invalidates assumption #1, where we thought that enabling Cilium would misconfigure kube-dns in some shape or form.

  2. With the default nslookup on the Alpine image, name resolution fails.

  3. If you install bind-tools, DNS resolution works with both nslookup and dig. It seems like nslookup is missing a few libraries in the default installation and that these libraries get installed with bind-tools.

This leads me to a question: is nslookup on the Docker Alpine image broken? Comparing the md5sum of nslookup before and after installing bind-tools actually revealed something a little different.

kubectl exec -it xwing -- bash
/ bash-4.3# which nslookup
/usr/bin/nslookup

/ bash-4.3# ls -larht /usr/bin/nslookup
lrwxrwxrwx    1 root     root          12 Jul  6 16:44 /usr/bin/nslookup -> /bin/busybox

/ # md5sum /usr/bin/nslookup
ded4975e6cc7ba1fce58ee7e3d557e13  /usr/bin/nslookup  
/ #

After bind-tools are installed:

/ # md5sum /usr/bin/nslookup
d0c7620c3ed745b5b17d7df77470bbe0  /usr/bin/nslookup
/ #

The checksum changed. That could be due to the update (a new package?).

/ # ls -larht /usr/bin/nslookup
-rwxr-xr-x    1 root     root       94.3K Nov 21 10:57 /usr/bin/nslookup
/ #

Ha! This is it! nslookup on a default Alpine image is actually part of busybox, and it doesn’t have (at least not at the time of this writing) all of the same features and flags as the nslookup provided by bind-tools.
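The same check can be run on any command by resolving its path and seeing where it lands. The following is a minimal sketch, assuming a POSIX shell with readlink -f available (both busybox and GNU coreutils provide it); the function name is_busybox_applet is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is really a busybox applet.
# Accepts either a command name found on PATH or a path to an executable.
is_busybox_applet() {
    # Locate the command; fall back to treating the argument as a path.
    path=$(command -v "$1" 2>/dev/null) || path=$1
    # Follow every symlink to the file that actually gets executed.
    target=$(readlink -f "$path") || return 1
    # Applets installed as symlinks all end at the busybox binary.
    [ "$(basename "$target")" = "busybox" ]
}

if is_busybox_applet nslookup; then
    echo "nslookup is a busybox applet"
else
    echo "nslookup is a standalone binary (or not installed)"
fi
```

Because md5sum follows symlinks too, the first checksum in the session above would have been computed over the file the link points to, /bin/busybox.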

Phew! That was fun! :)