JUJU_REMOTE_APP environment variable is not set in relation_broken hook

Bug #1960934 reported by Michele Mancioppi
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

Juju is apparently not setting the `JUJU_REMOTE_APP` environment variable in relation-broken hooks, and this causes hooks to fail with errors like the following:

```
unit-prometheus-k8s-0: 17:27:39 ERROR unit.prometheus-k8s/0.juju-log ingress:3: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1521, in _run
    result = run(args, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-prometheus-k8s-0/relation-get', '-r', '3', '-', '', '--app', '--format=json')' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 377, in <module>
    main(PrometheusCharm)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/main.py", line 414, in main
    charm = charm_class(framework)
  File "./src/charm.py", line 68, in __init__
    external_url = urlparse(self._external_url)
  File "./src/charm.py", line 364, in _external_url
    if ingress_url := self.ingress.url:
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/traefik_k8s/v0/ingress_per_unit/ingress_per_unit.py", line 246, in url
    if not self.urls:
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/traefik_k8s/v0/ingress_per_unit/ingress_per_unit.py", line 234, in urls
    if not self.is_ready():
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 280, in is_ready
    return any(self.is_ready(relation) for relation in self.relations)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 280, in <genexpr>
    return any(self.is_ready(relation) for relation in self.relations)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 282, in is_ready
    data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 357, in unwrap
    version = self._get_version(relation)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 207, in _get_version
    remote_versions_raw = relation.data[relation.app].get(VERSION_KEY)
  File "/usr/lib/python3.8/_collections_abc.py", line 660, in get
    return self[key]
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 430, in __getitem__
    return self._data[key]
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 414, in _data
    data = self._lazy_data = self._load()
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 779, in _load
    return self._backend.relation_get(self.relation.id, self._entity.name, self._is_app)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1588, in relation_get
    return self._run(*args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1523, in _run
    raise ModelError(e.stderr)
ops.model.ModelError: b'ERROR "" is not a valid unit or application\n'
unit-prometheus-k8s-0: 17:27:39 ERROR juju.worker.uniter.operation hook "ingress-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1
```

This is what I get when I log `os.environ`:

```
unit-prometheus-k8s-0: 17:27:39 DEBUG unit.prometheus-k8s/0.juju-log ingress:3: environ({'JUJU_UNIT_NAME': 'prometheus-k8s/0', 'KUBERNETES_PORT': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT': '443', 'JUJU_VERSION': '2.9.22', 'JUJU_CHARM_HTTP_PROXY': '', 'APT_LISTCHANGES_FRONTEND': 'none', 'JUJU_CONTEXT_ID': 'prometheus-k8s/0-ingress-relation-broken-1960487765391186668', 'JUJU_AGENT_SOCKET_NETWORK': 'unix', 'JUJU_API_ADDRESSES': '10.152.183.250:17070 controller-service.controller-development.svc.cluster.local:17070', 'JUJU_CHARM_HTTPS_PROXY': '', 'JUJU_AGENT_SOCKET_ADDRESS': '@/var/lib/juju/agents/unit-prometheus-k8s-0/agent.socket', 'JUJU_MODEL_NAME': 'cos', 'JUJU_DISPATCH_PATH': 'hooks/ingress-relation-broken', 'JUJU_AVAILABILITY_ZONE': '', 'JUJU_REMOTE_UNIT': '', 'JUJU_CHARM_DIR': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'TERM': 'tmux-256color', 'KUBERNETES_PORT_443_TCP_ADDR': '10.152.183.1', 'JUJU_RELATION': 'ingress', 'PATH': '/var/lib/juju/tools/unit-prometheus-k8s-0:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/charm/bin', 'JUJU_RELATION_ID': 'ingress:3', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'JUJU_METER_STATUS': 'AMBER', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'JUJU_HOOK_NAME': 'ingress-relation-broken', 'LANG': 'C.UTF-8', 'CLOUD_API_VERSION': '1.23.0', 'DEBIAN_FRONTEND': 'noninteractive', 'JUJU_SLA': 'unsupported', 'KUBERNETES_PORT_443_TCP': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'JUJU_MODEL_UUID': '36c17c1c-090f-4ee9-8c45-d1fb5e18d3f6', 'KUBERNETES_SERVICE_HOST': '10.152.183.1', 'JUJU_MACHINE_ID': '', 'JUJU_CHARM_FTP_PROXY': '', 'JUJU_METER_INFO': 'not set', 'PWD': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'JUJU_PRINCIPAL_UNIT': '', 'JUJU_CHARM_NO_PROXY': '127.0.0.1,localhost,::1', 'PYTHONPATH': 'lib:venv', 'CHARM_DIR': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'JUJU_REMOTE_APP': '', 'OPERATOR_DISPATCH': '1'})
```

It seems to me that the problem is that `JUJU_REMOTE_APP` is empty (`''`) instead of containing the name of the remote application, as it should.
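A charm that wants to detect this condition can read the variable defensively. The sketch below is a minimal, hypothetical helper (`remote_app_from_env` is not part of Juju or ops); it assumes only that hooks receive `JUJU_REMOTE_APP` through the process environment, and it treats a missing variable and an empty one the same way:

```python
import os
from typing import Mapping, Optional


def remote_app_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the remote application name for the current relation hook.

    Hypothetical helper: a missing JUJU_REMOTE_APP and an empty one ('' as in
    the environment dump above) are both reported as None ("unknown").
    """
    value = environ.get("JUJU_REMOTE_APP", "")
    return value or None
```

In a relation-broken handler, a `None` result would be the signal to skip any lookup that needs the remote application's name.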

John A Meinel (jameinel) wrote:

By the time you get to 'relation-broken', the remote application has been removed from the relation (we are informing you that it has gone away).

If you look at 'relation-list' for that relation, the unit and the associated app have been removed.

In source-relation-changed:
  .../charm# relation-ids source
  source:1
  .../charm# relation-list -r source:1
  dummy-source/0

vs. in source-relation-broken:
  .../charm# relation-ids source
  source:1
  .../charm# relation-list -r source:1
  (no output)

It is true that 'relation-ids' still lists the relation (as relation-broken is the last step before we remove it).
However, it is a little odd for the charm to decide to check "is the remote app telling me that this relation is happy" when we're telling you that the relation is going away.

I don't know exactly why ingress_per_unit is blindly iterating over all relations without handling the case where a relation exists but its remote application is going away. Certainly relation-broken has always operated this way.
I'm guessing that at this point
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 207, in _get_version
    remote_versions_raw = relation.data[relation.app].get(VERSION_KEY)

relation.app is None (or maybe ""?)
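On the charm side, one workaround is to skip relations whose remote application has no usable name before touching its data bag. The sketch below uses hypothetical stand-in classes rather than the real `ops.model` types, and a hypothetical `"version"` key in place of the library's `VERSION_KEY`, so it only illustrates the filtering idea behind an `is_ready`-style check:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class App:
    """Hypothetical stand-in for ops.model.Application (only the name matters)."""
    name: str


@dataclass
class Relation:
    """Hypothetical stand-in for ops.model.Relation.

    During relation-broken, `app` may be None or an App with a blank name.
    `app_data` stands in for relation.data[relation.app]; in the real
    framework, reading it is what triggers the failing relation-get call.
    """
    id: int
    app: Optional[App]
    app_data: Dict[str, str] = field(default_factory=dict)


def readable_relations(relations: Iterable[Relation]) -> List[Relation]:
    """Keep only relations whose remote application still has a usable name."""
    return [r for r in relations if r.app is not None and r.app.name]


def is_ready(relations: Iterable[Relation], version_key: str = "version") -> bool:
    # Check the (hypothetical) version key only on relations that survive the
    # filter, so no lookup is ever made with an empty application name.
    return any(r.app_data.get(version_key) for r in readable_relations(relations))
```

With this filter, a relation that is being broken simply stops counting towards readiness instead of raising.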

Changed in juju:
status: New → Incomplete
Michele Mancioppi (michele-mancioppi) wrote:

I am somewhat ambivalent about whether it is semantically consistent to have the remote app available when executing the hook. It does, however, seem to me a violation of the principle of least surprise: I can discover the relation by iterating over the model, but if I access some of its fields, things blow up.

Cory Johns (johnsca) wrote:

relation.app is an Application instance during the broken hook, but it has a blank name. I believe the application-level relation data is also still available, but because of the way the framework loads the data, it ends up passing that empty name to relation-get, which leads to the error. I don't think the charm / interface code actually needs that data, since it's really just checking whether the relation is valid. Perhaps this should be handled more gracefully in the framework. We can certainly work around it in the library, but I've seen this edge case come up before in other charm code, so it's probably worth handling in the framework by just returning empty relation data to avoid the issue.
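The framework-side fix suggested here (return empty data instead of calling relation-get with a blank name) could look roughly like the sketch below. It is not the ops implementation: `load_relation_data` is a hypothetical function, and the injected `relation_get` callable only mirrors the `relation_get(relation_id, name, is_app)` call visible in the traceback.

```python
from typing import Callable, Dict

# Hypothetical signature, modelled on the backend call in the traceback:
# relation_get(relation_id, entity_name, is_app) -> dict
RelationGet = Callable[[int, str, bool], Dict[str, str]]


def load_relation_data(relation_get: RelationGet, relation_id: int,
                       entity_name: str, is_app: bool) -> Dict[str, str]:
    """Return the data bag for `entity_name`, or {} if the name is blank.

    A blank name is what the framework sees during relation-broken; passing
    it on to relation-get produces 'ERROR "" is not a valid unit or application'.
    """
    if not entity_name:
        # Short-circuit: never invoke relation-get with an empty name.
        return {}
    return relation_get(relation_id, entity_name, is_app)
```

The trade-off is that a charm reading the remote app bag during relation-broken silently sees `{}` rather than an error, which matches the "just return empty relation data" suggestion.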

Michele Mancioppi (michele-mancioppi) wrote:

I'll run it by Pen; let's see what her take is.

Pen Gale (pengale) wrote:

I agree that the Operator Framework should not fall over in this case.

I went ahead and filed https://github.com/canonical/operator/issues/693

Claudiu Belu (cbelu) wrote:

I want to mention that I've encountered this type of error before [1], and that it still happens for the nginx-ingress-integrator charm when a related application is removed (the charm iterates over its relations to determine which Kubernetes Services and Ingress resources need to be created or removed).

I will mention this in https://github.com/canonical/operator/issues/693 as well.

[1] https://github.com/finos/legend-juju-gitlab-integrator/pull/6

Robert Carlsen (rwcarlsen) wrote:

I'm working on better handling of this on the operator framework side. But I think it would still be helpful to have JUJU_REMOTE_APP set in this case, so that a charm can clean up anything it keeps for the remote app, e.g. by purging local caches.
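To make the cleanup use case concrete, here is a hypothetical local cache keyed by remote application name (`RemoteAppCache` is illustrative, not part of ops). The point it shows is that the purge can only target the right entry if relation-broken receives a non-empty remote app name:

```python
from typing import Any, Dict, Optional


class RemoteAppCache:
    """Hypothetical local cache a charm might keep, keyed by remote app name."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def put(self, app_name: str, data: Any) -> None:
        self._entries[app_name] = data

    def purge(self, remote_app: Optional[str]) -> bool:
        """Drop the entry for `remote_app`; return True if one was removed.

        If JUJU_REMOTE_APP is missing or empty, the charm cannot tell which
        entry belongs to the departing application, so nothing is purged.
        """
        if not remote_app or remote_app not in self._entries:
            return False
        del self._entries[remote_app]
        return True

    def __contains__(self, app_name: str) -> bool:
        return app_name in self._entries
```

A relation-broken handler would call something like `cache.purge(os.environ.get("JUJU_REMOTE_APP"))`; with the empty value reported in this bug, the stale entry is left behind.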

John A Meinel (jameinel) wrote:

Looking at the Juju side of the code, it is trying to set it:

github.com/juju/juju/worker/uniter/relation/resolver.go 235:
                return hook.Info{
                        Kind: hooks.RelationBroken,
                        RelationId: relationId,
                        RemoteApplication: r.stateTracker.RemoteApplication(relationId),
                }, nil

It seems that by that time the remote application is already gone from the tracking state, so we don't have a value to give.
Earlier in that file we do have a list of locally known application names in the relation, so it might be possible for us to look there. I'm not certain, since we have already been running things like 'relation-departed' and cleaning up state from there. I'm also not sure whether localState.ApplicationMembers only holds a value if the application has set a value in the app data bag. But it might be a potential solution.
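The fallback idea can be sketched independently of Juju's Go code. The Python below is only an illustration of the decision logic, not Juju's implementation: `tracked` stands in for what `r.stateTracker.RemoteApplication(relationId)` returns, and `last_known_members` for something like the locally persisted `ApplicationMembers`; both are simplified, hypothetical data structures.

```python
from typing import Dict, Optional, Set


def remote_app_for_broken(
    tracked: Dict[int, str],
    last_known_members: Dict[int, Set[str]],
    relation_id: int,
) -> Optional[str]:
    """Pick a remote application name for a relation-broken hook.

    Hypothetical sketch: prefer the live tracker's answer, and fall back to
    the last locally known application names for the relation.
    """
    name = tracked.get(relation_id)
    if name:
        return name
    # Fall back only when exactly one name was recorded, so the choice is
    # unambiguous; otherwise report "unknown".
    members = last_known_members.get(relation_id, set())
    if len(members) == 1:
        return next(iter(members))
    return None
```

If the local state only records applications that wrote to their app data bag (the open question above), the fallback would still return `None` for those that never did.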

John A Meinel (jameinel) wrote:

The code is clearly trying to set JUJU_REMOTE_APP, but is failing to find it when running relation broken. We can determine if we *could* update the context that gets set up for the Uniter, or whether that data is already gone, and then whether we could pull that information out of the last-known-state for the unit agent.

Changed in juju:
importance: Undecided → High
milestone: none → 2.9-next
status: Incomplete → Triaged
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → none
John A Meinel (jameinel) wrote:

I did some debugging here, and at least for machine charms, JUJU_REMOTE_APP was set every time I hit relation-broken. IIRC the unit agent that runs in containers uses a different code path, so there might still be a bug there.
