Comment 0 for bug 1882535

Dmitrii Shcherbakov (dmitriis) wrote : focal/core20: staging conflicts when multiple python parts have the same python dependencies (python 3.7+)

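TL;DR: with python 3.7+, .pyc files are by default generated in PycInvalidationMode.TIMESTAMP mode, i.e. they embed the modification timestamp and the size of their source file. Identical sources installed separately by different parts typically have different modification times, so their .pyc files (and therefore their hashes) differ. This results in staging conflicts between python parts.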
https://docs.python.org/3/library/py_compile.html#py_compile.compile
https://docs.python.org/3/library/py_compile.html#py_compile.PycInvalidationMode.TIMESTAMP

Workaround: add the following build override to every python part that installs the shared dependencies:
    override-build: |
      export SOURCE_DATE_EPOCH=0
      snapcraftctl build
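
To confirm that the override took effect, one can check which invalidation mode a staged .pyc file was built with. The sketch below is a hypothetical helper (`pyc_invalidation_mode` is not part of snapcraft or the standard library). It assumes the python 3.7+ header layout from PEP 552, where bytes 4-7 hold a little-endian flags word: 0 for TIMESTAMP, 3 for CHECKED_HASH, 1 for UNCHECKED_HASH.

```python
import struct
import sys

# Flags word in the .pyc header (PEP 552, python 3.7+):
#   bit 0 set -> hash-based .pyc; bit 1 set -> the hash is checked at import time.
_MODES = {
    0b00: "TIMESTAMP",
    0b11: "CHECKED_HASH",
    0b01: "UNCHECKED_HASH",
}


def pyc_invalidation_mode(path):
    """Return the invalidation mode recorded in a .pyc file's 16-byte header."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError(f"{path}: too short to be a python 3.7+ .pyc file")
    # Bytes 0-3: magic number; bytes 4-7: little-endian flags word.
    (flags,) = struct.unpack("<I", header[4:8])
    return _MODES.get(flags, f"unknown flags {flags:#x}")


if __name__ == "__main__":
    # Usage: python3 pyc_mode.py FILE.pyc [FILE.pyc ...]  (hypothetical script name)
    for arg in sys.argv[1:]:
        print(f"{pyc_invalidation_mode(arg)}\t{arg}")
```

For example, pointing it at `parts/cluster/install/lib/python3.8/site-packages/click/__pycache__/_textwrap.cpython-38.pyc` should report CHECKED_HASH once the override is in place, and TIMESTAMP without it.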

Description/analysis:

When building a project in which multiple python parts have identical python dependencies, I consistently get an error like this:

Failed to stage: Parts 'openstack-projects' and 'cluster' have the following files, but with different contents:
    bin/activate
    bin/activate.csh
    bin/activate.fish
    bin/python3
    pyvenv.cfg
    bin/python3
    lib/python3.8/site-packages/Flask-1.1.2.dist-info/RECORD
    lib/python3.8/site-packages/__pycache__/easy_install.cpython-38.pyc
    lib/python3.8/site-packages/certifi/__pycache__/__init__.cpython-38.pyc
    lib/python3.8/site-packages/certifi/__pycache__/__main__.cpython-38.pyc
# many other .pyc files ...

Snapcraft suggests resolving this with something like `organize`, `filesets` or `stage`. However, the source files of those dependencies are identical, so there should be no need for any manual work here.

Source hashes are the same:

snapcraft-microstack # sha256sum ./parts/cluster/install/lib/python3.8/site-packages/click/_textwrap.py
6a30b3933165cb9b639bd7e843937dfcc39e69824c063025b6e15aebd9f88976 ./parts/cluster/install/lib/python3.8/site-packages/click/_textwrap.py

snapcraft-microstack # sha256sum ./parts/openstack-projects/install/lib/python3.8/site-packages/click/_textwrap.py
6a30b3933165cb9b639bd7e843937dfcc39e69824c063025b6e15aebd9f88976 ./parts/openstack-projects/install/lib/python3.8/site-packages/click/_textwrap.py

.pyc files are different:

snapcraft-microstack # sha256sum ./parts/openstack-projects/install/lib/python3.8/site-packages/click/__pycache__/_textwrap.cpython-38.pyc
398b47a5abfc87e9da73153e42d48dcd5d917bd637a0e0af1eb6999f19fb1085 ./parts/openstack-projects/install/lib/python3.8/site-packages/click/__pycache__/_textwrap.cpython-38.pyc

snapcraft-microstack # sha256sum ./parts/cluster/install/lib/python3.8/site-packages/click/__pycache__/_textwrap.cpython-38.pyc
d4642cfecd727d228944a1d31ff728e7ef6529a7a88898f6568ea6e96d1f8f82 ./parts/cluster/install/lib/python3.8/site-packages/click/__pycache__/_textwrap.cpython-38.pyc

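RECORD files include hashes of the installed files as well, so they also differ. In the diff below, the mismatched entry is the ../../../bin/flask script, whose hash and size (242 vs 231 bytes) both differ: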

snapcraft-microstack # diff ./parts/openstack-projects/install/lib/python3.8/site-packages/Flask-1.1.2.dist-info/RECORD ./parts/cluster/install/lib/python3.8/site-packages/Flask-1.1.2.dist-info/RECORD
1c1
< ../../../bin/flask,sha256=VXQqccMeG03Rn8_yN8Kq3Up13rzyaoHsEckFnCxHor4,242
---
> ../../../bin/flask,sha256=NAzPpe84iZFX3PYsCZEirt3fAFObAjBuCpM25792kSU,231
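
The pairwise sha256sum comparisons above can be extended to a whole part. This is a minimal sketch, assuming both install directories are available locally. `conflicting_files` is a hypothetical helper. It only compares the contents of regular files (following symlinks), so it approximates, rather than reproduces, the check snapcraft performs when staging.

```python
import hashlib
import os


def sha256_of(path):
    """Hex sha256 digest of a file, the same value sha256sum prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def conflicting_files(root_a, root_b):
    """Relative paths of files present in both trees but with different contents."""
    conflicts = []
    for dirpath, _dirnames, filenames in os.walk(root_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, rel)
            # Only files that both parts install can conflict; isfile() follows symlinks.
            if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
                continue
            if sha256_of(path_a) != sha256_of(path_b):
                conflicts.append(rel)
    return sorted(conflicts)
```

Run from the project directory, `conflicting_files("parts/cluster/install", "parts/openstack-projects/install")` returns the relative paths whose contents differ between the two parts.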

By default, .pyc files include the timestamp and size of the source file (PycInvalidationMode.TIMESTAMP). Since python 3.7, this can be overridden by setting the SOURCE_DATE_EPOCH environment variable, which switches py_compile's default to PycInvalidationMode.CHECKED_HASH:

https://docs.python.org/3/library/py_compile.html
py_compile.compile(file, cfile=None, dfile=None, doraise=False, optimize=-1, invalidation_mode=PycInvalidationMode.TIMESTAMP, quiet=0)

invalidation_mode should be a member of the PycInvalidationMode enum and controls how the generated bytecode cache is invalidated at runtime. The default is PycInvalidationMode.CHECKED_HASH ***if the SOURCE_DATE_EPOCH environment variable is set***, otherwise ***the default is PycInvalidationMode.TIMESTAMP***.
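
The environment-dependent default can be observed directly. This sketch compiles a throwaway module without passing `invalidation_mode` and reads the flags word from the resulting header (0 = TIMESTAMP, 3 = CHECKED_HASH per PEP 552). It assumes python 3.7 or later, and `default_pyc_flags` is a hypothetical name.

```python
import os
import py_compile
import struct
import tempfile


def default_pyc_flags(source_date_epoch=None):
    """Compile a throwaway module with py_compile's default invalidation_mode
    and return the flags word from the resulting .pyc header."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    try:
        if source_date_epoch is not None:
            os.environ["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
        with tempfile.TemporaryDirectory() as d:
            src = os.path.join(d, "mod.py")
            with open(src, "w") as f:
                f.write("VALUE = 42\n")
            # No invalidation_mode argument: py_compile picks its default.
            pyc = py_compile.compile(src, cfile=src + "c", doraise=True)
            with open(pyc, "rb") as f:
                return struct.unpack("<I", f.read(8)[4:8])[0]
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved


print(default_pyc_flags())   # 0 -> TIMESTAMP
print(default_pyc_flags(0))  # 3 -> CHECKED_HASH
```

In CPython's py_compile, only the presence of a non-empty SOURCE_DATE_EPOCH is checked, not its value, which is why `0` is sufficient.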

https://docs.python.org/3/library/py_compile.html#py_compile.PycInvalidationMode.TIMESTAMP
TIMESTAMP
The .pyc file includes the timestamp and size of the source file, which Python will compare against the metadata of the source file at runtime to determine if the .pyc file needs to be regenerated.
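
To make the TIMESTAMP header concrete, this sketch writes the same (hypothetical) source into two directories standing in for two parts. It gives the files modification times one minute apart and decodes the header fields. It assumes the PEP 552 layout for timestamp-based .pyc files (magic, flags, source mtime, source size, 4 bytes each).

```python
import os
import py_compile
import struct
import tempfile

TIMESTAMP = py_compile.PycInvalidationMode.TIMESTAMP


def timestamp_pyc_header(directory, mtime):
    """Write a fixed source into `directory`, set its modification time to
    `mtime`, compile it in TIMESTAMP mode and return (flags, mtime, size)
    decoded from the .pyc header."""
    src = os.path.join(directory, "_textwrap.py")  # hypothetical module name
    with open(src, "wb") as f:
        f.write(b"import textwrap\n")  # 16 bytes
    os.utime(src, (mtime, mtime))  # (atime, mtime)
    pyc = py_compile.compile(src, cfile=src + "c", doraise=True,
                             invalidation_mode=TIMESTAMP)
    with open(pyc, "rb") as f:
        header = f.read(16)
    # Header layout: magic (4 bytes) | flags (4) | source mtime (4) | source size (4)
    return struct.unpack("<III", header[4:16])


with tempfile.TemporaryDirectory() as part_a, tempfile.TemporaryDirectory() as part_b:
    # Same source bytes, "installed" one minute apart (hypothetical times).
    print(timestamp_pyc_header(part_a, 1_590_000_000))  # (0, 1590000000, 16)
    print(timestamp_pyc_header(part_b, 1_590_000_060))  # (0, 1590000060, 16)
```

Identical source bytes are enough to produce different .pyc files in this mode, which matches the differing hashes shown for `_textwrap.cpython-38.pyc`.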

https://docs.python.org/3/library/py_compile.html#py_compile.PycInvalidationMode.CHECKED_HASH
CHECKED_HASH
The .pyc file includes a hash of the source file content, which Python will compare against the source at runtime to determine if the .pyc file needs to be regenerated.
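
With CHECKED_HASH, the modification time no longer enters the header. Repeating the same experiment in that mode yields identical headers, and bytes 8-15 match `importlib.util.source_hash()` of the source. This sketch only compares the 16-byte header. The rest of the file is the marshalled code object, which it does not inspect. The module name and contents are hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile

CHECKED_HASH = py_compile.PycInvalidationMode.CHECKED_HASH
SOURCE = b"import textwrap\n"  # hypothetical module contents


def checked_hash_pyc_header(directory, mtime):
    """Write SOURCE into `directory` with the given mtime, compile it in
    CHECKED_HASH mode and return the raw 16-byte .pyc header."""
    src = os.path.join(directory, "_textwrap.py")
    with open(src, "wb") as f:
        f.write(SOURCE)
    os.utime(src, (mtime, mtime))
    pyc = py_compile.compile(src, cfile=src + "c", doraise=True,
                             invalidation_mode=CHECKED_HASH)
    with open(pyc, "rb") as f:
        return f.read(16)


with tempfile.TemporaryDirectory() as part_a, tempfile.TemporaryDirectory() as part_b:
    header_a = checked_hash_pyc_header(part_a, 1_590_000_000)
    header_b = checked_hash_pyc_header(part_b, 1_590_000_060)

# Header layout: magic (4 bytes) | flags (4) | source hash (8)
print(header_a == header_b)                                  # True
print(int.from_bytes(header_a[4:8], "little"))               # 3
print(header_a[8:16] == importlib.util.source_hash(SOURCE))  # True
```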

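The following workaround works because child processes inherit their parent's environment by default. The exported SOURCE_DATE_EPOCH therefore reaches the python processes (e.g. pip) that `snapcraftctl build` spawns and that generate the .pyc files.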

    override-build: |
      export SOURCE_DATE_EPOCH=0
      snapcraftctl build
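
Here is a Python analogue of the override. Setting `os.environ` in the parent (the equivalent of `export`) is enough for a child interpreter, started without an explicit `env=` argument, to pick up SOURCE_DATE_EPOCH. The child program is a hypothetical stand-in for the tools that `snapcraftctl build` runs, such as pip. It does not model snapcraft itself.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: compiles a module with py_compile's default
# settings and prints the .pyc flags word (0 = TIMESTAMP, 3 = CHECKED_HASH).
CHILD = r"""
import os, py_compile, sys
src = os.path.join(sys.argv[1], "mod.py")
with open(src, "w") as f:
    f.write("VALUE = 1\n")
pyc = py_compile.compile(src, cfile=src + "c", doraise=True)
with open(pyc, "rb") as f:
    print(int.from_bytes(f.read(8)[4:8], "little"))
"""


def child_pyc_flags():
    """Run CHILD in a new interpreter. No env= argument is passed, so the
    child inherits this process's current environment."""
    with tempfile.TemporaryDirectory() as d:
        out = subprocess.run([sys.executable, "-c", CHILD, d],
                             check=True, capture_output=True, text=True)
    return int(out.stdout)


os.environ.pop("SOURCE_DATE_EPOCH", None)
print(child_pyc_flags())  # 0
os.environ["SOURCE_DATE_EPOCH"] = "0"  # the equivalent of `export SOURCE_DATE_EPOCH=0`
print(child_pyc_flags())  # 3
```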