"Too many levels of symbolic links" in NFS via automount resolved by restarting Docker

by krivard   Last Updated December 21, 2017 22:00 PM

This is bizarre, and while I have a workaround, I'd prefer a permanent fix.

I have a small group of GPU machines running Ubuntu 14.04 which I am using as workers for a cloud service that's implemented with Docker images. I have nvidia-docker installed on all the worker machines so that Docker has access to the GPUs. The worker machines also function as individual servers on which lab members can run experiments directly (academic environment, the cloud service is experimental, etc.). For the latter purpose, all the machines automount individual user shares over NFS. We recently switched to automount from a static fstab configuration, and I'm still getting used to it -- it's entirely possible there's some obvious issue at play here that I'm not seeing because I'm an automount n00b. Finally, I haven't set anything up for Docker containers to be able to access the NFS shares, so in theory there should be no connection... in theory.

This week one of our lab members reported the "Too many levels of symbolic links" error when attempting to access their share drive from one of the GPU machines. They're not using Docker at all (to their knowledge). There are no questionable symbolic links in their tree (checked with find -type l), so something else must be getting into a weird state. The mount point looks like this under ls -l from the parent directory:
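For reference, the find -type l check can also be sketched in Python. This minimal sketch (standard library only; the function name find_symlinks is hypothetical) additionally tries to resolve each link, so it separates links that form a loop (the ELOOP error behind "Too many levels of symbolic links") from links that are merely dangling. If the mount point itself fails with ELOOP, the walk cannot enter it, and that error is recorded instead of being silently skipped.

```python
import errno
import os


def find_symlinks(root):
    """Walk `root` without following symlinks and classify every link found.

    Returns a list of (path, status) tuples. status is "ok", "dangling",
    "loop" (ELOOP while resolving), "error", or "unreadable: ..." for
    directories the walk could not list.
    """
    results = []

    def record_error(exc):
        # os.walk calls this when it cannot list a directory (e.g. ELOOP
        # on a broken automount point) instead of silently skipping it.
        results.append((exc.filename, "unreadable: " + os.strerror(exc.errno)))

    for dirpath, dirnames, filenames in os.walk(root, onerror=record_error):
        # Symlinks to directories land in dirnames, all others in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # follows the whole link chain
                status = "ok"
            except OSError as exc:
                if exc.errno == errno.ELOOP:
                    status = "loop"
                elif exc.errno == errno.ENOENT:
                    status = "dangling"
                else:
                    status = "error"
            results.append((path, status))
    return results
```

Calling find_symlinks("/path/to/labmember1") on a healthy share should list only "ok" links; an "unreadable" entry for the top-level path points at the mount point rather than at links inside the tree.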

dr-xr-xr-x 2 root root 0 Dec 5 18:38 labmember1

which seems... bad? root:root 555, really? And when you try to browse it, you get, indeed:

$ cd /path/to/labmember1/
-bash: cd: /path/to/labmember1/: Too many levels of symbolic links

The share doesn't seem to actually be mounted -- it does not appear in /etc/mtab, and (predictably) attempts to unmount it manually report:

$ sudo umount /path/to/labmember1/
umount: /path/to/labmember1/: not mounted
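The same "is it mounted?" check can be run against the kernel's mount table instead of /etc/mtab. A minimal sketch, assuming Linux /proc: /proc/self/mounts lists the mounts visible in the calling process's mount namespace, which can differ from /etc/mtab when that file is maintained by userspace. The helper name mount_entries is hypothetical. Assuming an indirect autofs map, you would expect an autofs entry for the parent directory and, once the share is really mounted, an nfs or nfs4 entry for the share path itself.

```python
import os
import re


def _unescape(field):
    # /proc/mounts encodes space, tab, newline and backslash as octal
    # escapes such as "\040"; decode them so paths compare correctly.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_entries(mountpoint, mounts_path="/proc/self/mounts"):
    """Return (source, fstype, options) for every entry mounted on `mountpoint`."""
    target = os.path.normpath(mountpoint)
    entries = []
    with open(mounts_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip malformed or empty lines
            source, mnt, fstype, options = fields[:4]
            if os.path.normpath(_unescape(mnt)) == target:
                entries.append((_unescape(source), fstype, options))
    return entries
```

An empty result for /path/to/labmember1 alongside an autofs entry for /path/to would match the symptoms above: the automounter owns the parent, but the NFS mount never completed in this namespace.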

Restarting autofs (service autofs restart) did nothing.

What I thought was unrelated at the time: docker had been spewing veth interfaces everywhere. This was a machine being actively used as a cloud worker, so I figured it was our cloud software. Now I'm not so sure.

Today the Too many levels of symbolic links failure occurred on another GPU machine, which has docker/nvidia-docker installed but does not run the cloud worker software. Lo and behold, veth interfaces everywhere, though in far fewer numbers than on the cloud worker machine.
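The "veth interfaces everywhere" observation can be quantified with a small sketch that lists host-side virtual Ethernet interfaces under /sys/class/net, assuming Docker's host-side interfaces keep the default veth name prefix. The function name veth_interfaces is hypothetical. With default bridge networking each running container normally accounts for roughly one host-side veth, so a count far above the output of docker ps -q | wc -l would suggest interfaces (and possibly the namespaces behind them) are being left behind.

```python
import os


def veth_interfaces(sys_net="/sys/class/net"):
    """Return the sorted names of network interfaces starting with 'veth'.

    Each entry in /sys/class/net is one network interface; the sys_net
    parameter exists only so the function can be pointed at a test tree.
    """
    return sorted(name for name in os.listdir(sys_net) if name.startswith("veth"))
```

For example, len(veth_interfaces()) on each machine gives a number that can be compared across the cloud worker and the non-worker GPU machine.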

On a whim, I stopped the docker service (service docker stop). Magic! The share mounts normally and our lab member can use their stuff again. The share remains in working condition after starting docker back up again.

So I can clearly fix this issue by restarting docker if (when) it happens again, but I'd like to know:

  1. What is causing this in the first place? Or, how can I find out?
  2. Is there a way to prevent this from happening again, or am I stuck just fixing it every time it breaks?
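One hypothesis worth testing for question 1 (an assumption, not something established above) is that a mount namespace created for a Docker container still holds a reference to the share's mount, so autofs on the host cannot cleanly remount it until Docker is stopped. A minimal sketch for checking this, assuming Linux /proc and root privileges, scans every process's mounts file for the share path. The function name pids_with_mount is hypothetical, and paths containing whitespace (which /proc escapes as \040) are not handled.

```python
import os


def pids_with_mount(mountpoint, proc_root="/proc"):
    """Return {pid: [fstype, ...]} for processes whose mount namespace
    contains an entry mounted exactly on `mountpoint`.

    Processes that exit mid-scan or cannot be read are skipped.
    """
    target = os.path.normpath(mountpoint)
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip "self", "sys", and other non-PID entries
        try:
            with open(os.path.join(proc_root, entry, "mounts")) as fh:
                for line in fh:
                    fields = line.split()
                    # fields: source, mount point, fstype, options, ...
                    if len(fields) >= 3 and os.path.normpath(fields[1]) == target:
                        found.setdefault(int(entry), []).append(fields[2])
        except OSError:
            continue
    return found
```

If PIDs turn up while the share looks unmounted on the host, ps -p <pid> and ls -l /proc/<pid>/ns/mnt show which processes they are and which mount namespace they share, which would indicate whether Docker-managed processes are involved.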
