Optionally strip translations in PPAs

Bug #391820 reported by Steve Magoun
This bug affects 1 person
Affects                    Status        Importance  Assigned to  Milestone
Launchpad itself           Fix Released  High        Unassigned
pkgbinarymangler (Ubuntu)  Fix Released  Undecided   Martin Pitt

Bug Description

There should be an option to strip translations from packages built in a PPA. The goal is to build packages in the PPA that, when installed, still use Ubuntu's translations (from the language packs). I understand this can be accomplished by installing pkgbinarymangler in the build chroot, which will strip translations from the package. When stripping a package, I don't think there is a requirement to keep the package's translations - they can be discarded.

Since this behavior is not desirable for all PPAs, there should be an option to enable/disable it on a per-PPA basis.

For extra credit consider adding a per-package option that would override the per-PPA option. This would allow someone to set their PPA to strip translations by default, but *not* strip the translations from one or more specific packages.

tags: added: feature
Changed in soyuz:
importance: Undecided → High
status: New → Triaged
Matthias Klose (doko) wrote :

The environment variable NO_PKG_MANGLE controls this on a per-package basis. Maybe another way is needed for the PPA stuff, e.g. using the configuration options of pkgbinarymangler?

Steve Magoun (smagoun) wrote :

If pkgbinarymangler were installed in all PPAs, could 'enable' in /etc/pkgbinarymangler/striptranslations.conf be flipped on/off on a per-PPA basis to control whether translations were stripped?

(It would also be interesting to use pkgmaintainermangler to modify the maintainer of each package in a PPA to be the person/team who owns the PPA or the person who uploaded the package).

Adam Conrad (adconrad) wrote :

I like the sidetrack feature suggested by Steve (mangling maintainer to the PPA owner) in comment number 2, actually, but that's not what I landed here to comment on. Can someone perhaps file a separate bug for that? :)

In Ubuntu proper, langpacks carry translations for fairly specific versions (ie: the release version) of the packages they're meant to be used with. Given the chance that OEM PPAs will include packages that aren't just "customisations of the normal Ubuntu packages", but often newer revisions (with newer strings), then we really DON'T want to strip the translations out, as the Ubuntu langpacks wouldn't correctly cover those strings.

This is why we disable striptranslations on the fly for backports, for instance. Backports will just contain too many changed strings, and will end up being half untranslated due to string mismatches with the langpacks, so they're better off being "fat" binaries that keep their translations.

Keep in mind, of course, that for packages you ship that are "more or less the same version as Ubuntu", the Ubuntu langpacks will still provide extra translations for languages that the package just plain doesn't ship translations for. The only loss of functionality is where we've outright replaced a translation.

On the whole, while I understand why you're asking for the feature (and if you still want it, "the customer is always right", and we can implement it), I don't think it's the right thing to do for your usecase.

Martin Pitt (pitti) wrote :

I see two options here:

quick & dirty
------------------
We could actually implement the proposal above, and enable pkgbinarymangler in the OEM PPAs. As long as you don't do major version upgrades and don't change strings, the Ubuntu langpacks should by and large still apply. However, it would mean that any package in the OEM langpacks which isn't in Ubuntu main would be totally untranslated.

So for new packages, or packages with heavily changed strings, you could set NO_PKG_MANGLE=1 in debian/rules to disable stripping on a per-package basis.

However, this would never be 100% correct, and you'd have to track the NO_PKG_MANGLE settings pretty well.

I certainly consider it a working first option, until we find something better, since it gets you the space savings and 90% of the result.

correct & involves efforts
-------------------------------------
Recently, Rosetta grew the capability of message sharing, which turned the task of opening a new distrorelease from "OMG takes 2 months" to "snap of a finger". From my naive POV it should actually be feasible to use pkgbinarymangler exactly like in Ubuntu proper, i. e. strip out all translations and import the _translation.tar.gz into Rosetta. Rosetta would then import them into a hardy-oemfoo, hardy-oembar, etc. release. The new thing which we'd need to implement is to throw away strings that are already present in "hardy", and just keep the delta in hardy-oem$name. I think Rosetta already sort of works like that with message sharing, but Danilo would know better.

With this, langpack-o-matic could be used the normal way, too, and build (tiny) oem-$name langpacks, similar to the -gnome and -kde langpacks.

Danilo, David, am I talking rubbish here, or would that actually be feasible?

Steve Magoun (smagoun) wrote :

(Filed bug 403510 about using pkgbinarymangler to mangle the maintainer for packages in a PPA)

I think Martin's 'quick + dirty' option is probably what we should start with; I agree that it's the 90% use case for us. I think the 'correct but effort' option is really interesting and probably where we want to be in the long run - I'll defer to Kyle (OEM's translations expert) on that one though.

Kyle Nitzsche (knitzsche) wrote :

I'd like to express the current problem for the oem group to help define the solution.

PROBLEM: when we (oem) build packages currently, we do not strip mo files. Nor do we populate the package first with po files from Ubuntu/lang packs. Instead, to date, we have simply used what was in the package. This is the upstream translation content.

CONSEQUENCES: for pkgs that oem builds, we use upstream translations (we do not benefit from lp.net/ubuntu work) and we cannot use launchpad to check translation status. These are serious problems.

So, what do we need (use cases)?

 * in 99% of the cases we want to use translations from language packs, even if we build the package. Most of our modifications do not involve string changes, and most of the time we use the same major version as the Ubuntu released version, so the strings match up.
 * on a per package basis, we want to be able to NOT use translations from language packs but instead populate the pkg with translations we know are right (usually these would be obtained from Rosetta) and then customize them as needed.

I understand our (oem) plan is to move to ppa building.

So from a high level, I think of the solution as being along these lines:
 * when package is built in an oem ppa, its translations are stripped if the following two conditions are met:
   - that package would have its translations stripped in normal Ubuntu
   - we did not set the NO_PKG_MANGLE=1 in the package

Unless I've missed something, this would handle all cases. We get to use langpacks for pkgs we build and we have the ability to customize translations if we need to (with those customizations delivered as mo files in the pkg itself).
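
The two conditions listed above can be pictured as a small shell predicate. This is only an illustrative sketch, not code from pkgbinarymangler: the function name `should_strip` and its arguments are hypothetical, the list of components that get stripped in normal Ubuntu is passed in rather than looked up, and treating any non-empty NO_PKG_MANGLE value as an opt-out is an assumption.

```shell
#!/bin/sh
# Hypothetical predicate: succeed (exit status 0) if translations should be stripped.
#   $1: component the package has in Ubuntu (e.g. "main", "universe")
#   $2: space-separated list of components whose packages get stripped in Ubuntu
# NO_PKG_MANGLE from the environment acts as the per-package opt-out
# (assumption: any non-empty value counts as "do not mangle").
should_strip() {
    component=$1
    strip_components=$2

    # Second condition: the package did not opt out.
    if [ -n "${NO_PKG_MANGLE:-}" ]; then
        return 1
    fi

    # First condition: the package would be stripped in normal Ubuntu.
    for c in $strip_components; do
        if [ "$c" = "$component" ]; then
            return 0
        fi
    done
    return 1
}
```

With this shape, a package that is in main and does not set NO_PKG_MANGLE is stripped, while a universe package or an opted-out package keeps its mo files.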

Addressing Adam's point, it would be up to us (oem) to track cases where we use a version of a package that is different than the released Ubuntu version. This would be a small number of cases though.

Kyle Nitzsche (knitzsche) wrote :

(Looks like Steve and I were writing comments at the same time.)

So regarding the quick and dirty method: This sounds very good except: I propose only stripping pkgs if they are stripped in ubuntu. This would save us a LOT of work and prevent numerous errors (where someone forgot to add NO_PKG_MANGLE=1).

How hard would that be? (Add logic to pkgbinarymangler to check the component, and only keep a package as a candidate for stripping if it is in the main component?)

Kyle Nitzsche (knitzsche) wrote :

Regarding Martin Pitt's "correct" method, Arne also mentioned that to me in Barcelona, and it sounds reasonable, but it needs to be fleshed out, prototyped, tested, deployed, etc. If it works, it could, perhaps, become a generic feature made more broadly (publicly) available.

Martin Pitt (pitti) wrote :

Thanks, Kyle. So it looks as if the quick&dirty method would actually go a long way to what you need then.

pkgbinarymangler itself does not figure out package components. It consults a file /CurrentlyBuilding which the buildd puts there, based on the soyuz data. Code:

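# check whether /CurrentlyBuilding is present; so we can check the component
if [ -f /CurrentlyBuilding ]; then
    # readctrl leaves the value of the requested field in $RET
    readctrl "$CONFFILE" "components"
    stripcomponents="$RET"
    unset dostrip
    readctrl "/CurrentlyBuilding" "Component"
    # strip only if the package's component is one of the configured ones
    for c in $stripcomponents; do
        if [ "$c" = "$RET" ]; then dostrip=1; fi
    done
else
    # no build information available: strip by default
    dostrip=1
fi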

CONFFILE is /etc/pkgbinarymangler/striptranslations.conf, whose "components" setting defaults to "main". In other words, PPA buildds have to put "Component: main" into /CurrentlyBuilding for packages which are in Ubuntu main, then everything will just work.
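
To try the component check outside a buildd, the logic quoted above can be approximated with throwaway files. The following is a minimal sketch under stated assumptions: `read_field` is a hypothetical stand-in for readctrl (whose real implementation is not shown in this report), both files are assumed to use simple "Key: value" lines, and the paths point into a temporary directory rather than /etc or /CurrentlyBuilding.

```shell
#!/bin/sh
# Hypothetical stand-in for readctrl: print the value of the first
# "Key: value" line in file $1 whose key is exactly $2.
read_field() {
    sed -n "/^$2:/{s/^$2:[[:space:]]*//p;q;}" "$1"
}

# Succeed if the component found in the build-info file ($2) is listed in
# the "components" setting of the conf file ($1). As in the quoted code,
# a missing build-info file means "strip".
component_allows_strip() {
    conf=$1
    buildinfo=$2
    if [ ! -f "$buildinfo" ]; then
        return 0
    fi
    wanted=$(read_field "$conf" components)
    actual=$(read_field "$buildinfo" Component)
    for c in $wanted; do
        if [ "$c" = "$actual" ]; then
            return 0
        fi
    done
    return 1
}

# Demo with temporary files (assumed file contents, for illustration only).
tmp=$(mktemp -d)
printf 'components: main\n' > "$tmp/striptranslations.conf"
printf 'Package: icecream\nComponent: main\n' > "$tmp/CurrentlyBuilding"
if component_allows_strip "$tmp/striptranslations.conf" "$tmp/CurrentlyBuilding"; then
    echo "would strip"
fi
rm -rf "$tmp"
```

Changing the "Component:" line in the demo file to universe makes the check fail, which is the behaviour Kyle asked for.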

David Planella (dpm) wrote :

To address Martin's question on the 'correct & involves efforts' method, it sounds feasible to me except for this point, where only Danilo, Henning or Jeroen can give you an authoritative answer, since I'm not familiar with the inner workings of message sharing:

> The new thing which we'd need to implement is to throw away strings that are already present in "hardy", and just keep the delta in hardy-oem$name. I think Rosetta already sort of works like that with message sharing, but Danilo would know better.

Regarding this, from what you are proposing and from my (limited) understanding of message sharing, there would be several distro series which would have been marked as sharing strings:

hardy (the Ubuntu one)
hardy-oemfoo
hardy-oembar
hardy-oemn

Then a new feature would have to be added to Rosetta whereby for each hardy-oemblablah a tarball with all strings differing (it would probably be mostly new strings, I guess) from the base 'hardy' distro series would be exported and used to build the -oemfoo, etc. language packs. The current message sharing infrastructure could now make the implementation easier. Is that what you meant?
(I apologise if I'm repeating your words here, I'm just trying to make sure I understand the proposal)
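
The "strings differing from the base series" idea can be pictured with plain text tools. The sketch below is only an illustration and says nothing about how Rosetta or langpack-o-matic actually work: it assumes two hypothetical files that hold one message string per line, and the function name `series_delta` is made up.

```shell
#!/bin/sh
# Print the strings that occur in the OEM series list ($2) but not in the
# base series list ($1). Both inputs are hypothetical one-string-per-line files.
series_delta() {
    base_sorted=$(mktemp)
    oem_sorted=$(mktemp)
    # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
    LC_ALL=C sort -u "$1" > "$base_sorted"
    LC_ALL=C sort -u "$2" > "$oem_sorted"
    # -13 suppresses lines unique to the base list and lines common to both,
    # leaving only the lines unique to the OEM list.
    LC_ALL=C comm -13 "$base_sorted" "$oem_sorted"
    rm -f "$base_sorted" "$oem_sorted"
}
```

Real templates would need proper gettext tooling, since message strings can span several lines; the point here is only that the delta is a set difference against the base series.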

If this is the case and if there is interest, this particular feature should probably be filed against rosetta. But by Kyle's comments, I understand that the first option looks good already for the current OEM needs.

Michael Bienia (geser) wrote :

Wouldn't finer control over the parts of pkgbinarymangler be better than using the big switch NO_PKG_MANGLE, which disables all parts of pkgbinarymangler?
E.g. it might be useful to run pkgsanitychecks even for PPAs or creating ddebs while pkgstriptranslations is disabled (or any other combination).

Btw: the diverted dpkg-deb (and dh_strip) check for "Purpose: PPA" in /CurrentlyBuilding and set NO_PKG_MANGLE in that case. So this would need modifications anyway to enable parts of pkgbinarymangler to be run on PPAs.
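
One way to picture such finer-grained control is to let each mangler ask whether it should run for the current build purpose, instead of having the diverted dpkg-deb set one global switch. This is a rough sketch only: the function name `mangler_enabled`, the policy table, and any purpose value other than "PPA" are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical per-mangler policy: succeed if mangler $1 should run for a
# build whose "Purpose" field in /CurrentlyBuilding has the value $2.
mangler_enabled() {
    mangler=$1
    purpose=$2

    # The global opt-out still wins over everything else.
    if [ -n "${NO_PKG_MANGLE:-}" ]; then
        return 1
    fi

    case "$purpose:$mangler" in
        PPA:pkgsanitychecks)      return 0 ;;  # checks are useful in PPAs too
        PPA:pkgstriptranslations) return 1 ;;  # PPA packages keep their translations
        PPA:*)                    return 1 ;;  # assumed: other manglers stay off in PPAs
        *)                        return 0 ;;  # assumed: other builds run everything
    esac
}
```

Each mangler script would then call `mangler_enabled` with its own name, so enabling another part for PPAs means changing one line of the table.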

Martin Pitt (pitti) wrote : Re: [Bug 391820] Re: Optionally strip translations in PPAs

David Planella [2009-07-23 14:55 -0000]:
> Then a new feature would have to be added to Rosetta whereby for
> each hardy-oemblablah a tarball with all strings differing (it would
> probably be mostly new strings, I guess) from the base 'hardy'
> distro series would be exported and used to build the -oemfoo, etc.
> language packs.

Exactly. This has already worked for ages for the delta tarballs,
which we use to build the update langpacks for stable-updates.

Adam Conrad (adconrad) wrote :

And yes, NO_PKG_MANGLE really is meant to be "don't do any mangling at all". It was implemented to work around packages that just plain CAN'T handle being mangled (for instance, test-building fake packages in a testsuite, where you need consistent output). It's not really meant to control turning striptranslations on and off; it's just ended up being used for that. :/

OEM PPAs, I'd imagine, would end up with the same component-level mappings as the security PPAs do (ie: they feed the proper components at build time, rather than just assuming the whole PPA is universe/multiverse), here's hoping that's still the plan. If so, then the "only strip what would be stripped in Ubuntu" thing would be there for free.

From my perspective, I'd rather work toward doing it the "right and elegant" way, rather than any dirty hacks, but that's up to the rosetta guys telling us how feasible that is, and what sort of timeframe they could commit to. The nice thing about the "right" way is that it could easily be extended to allow these langpack-diffs for every PPA, not just special-purpose ones, which would be a pretty slick feature.

Steve, is this really a "blocker" for moving to PPA (given that this is a feature you don't currently have in your pre-PPA archive setup)? I'd like to think not, but...?

Steve Magoun (smagoun) wrote :

@Adam - this is not a blocker for moving to PPAs, but we figured the move to PPAs would be a good time to start doing it.

Данило Шеган (danilo) wrote :

Martin Pitt:
> David Planella [2009-07-23 14:55 -0000]:
> > Then a new feature would have to be added to Rosetta whereby for
> > each hardy-oemblablah a tarball with all strings differing (it would
> > probably be mostly new strings, I guess) from the base 'hardy'
> > distro series would be exported and used to build the -oemfoo, etc.
> > language packs.
>
> Exactly. This has already worked for ages for the delta tarballs,
> which we use to build the update langpacks for stable-updates.

I'd say it would be simpler to open a new series (hardy-oemfoo), and upload only the relevant templates (i.e. build only relevant packages in there). They would still benefit from message sharing (though, note that we are not yet sharing translations from Hardy with Jaunty and Karmic), and a full language pack export would include only the relevant bits and we wouldn't need to provide deltas between different series (which can be hard to define).

Martin Pitt (pitti) wrote :

I actually worked on this during my OEM cycle a few months ago, and OEM projects now do use pkgbinarymangler in their PPAs. Not for the lucid-based ones (if you need them there, you need to backport pkgbinarymangler into your project PPA), but after discussions with Kyle it was found that we want this by default from now on (i. e. for maverick based projects and newer).

I think this can be closed as "fix released" now. Kyle, is there still something missing here?

pkgbinarymangler (70) maverick; urgency=low

  * Move "PPA" check from dpkg-deb to pkgmaintainermangler and
    pkgstriptranslations. We want to call pkgsanitychecks for PPA builds as
    well, and also control behaviour for PPAs individually for each mangler.
  * pkgstriptranslations: Run if we are building in an OEM PPA. If the built
    package is in Ubuntu main, strip translations from it. However, introduce
    a blacklist of OEM projects (oem_blacklist in striptranslations.conf)
    which are close or past release, to not inflict this rather intrusive
    change on them.
  * debian/control: Update Maintainer field from myself to Ubuntu Developers.
  * Add debian/source/format: 3.0 (native).
  * debian/control: Bump Standards-Version to 3.9.0.
  * Apply a consistent indentation to all source files (4 spaces, expand
    tabs).
  * dpkg-deb: Replace hardcoded paths with "which", to allow local testing.
  * Replace hardcoded "/CurrentlyBuilding" path with a $BUILDINFO variable set
    in "common", to allow local testing.
  * Add test/icecream: Test source package building two binary packages
    "vanilla" and "chocolate" with two po/mo files each.
  * Allow changing the path of "common" with $PKGBINARYMANGLER_COMMON_PATH.
  * Allow changing the configuration file directory path with
    $PKGBINARYMANGLER_CONF_DIR.
  * pkgmaintainermangler: Allow changing the path to the override file with
    $PKMAINTAINERGMANGLER_OVERRIDES.
  * Add test/run: Test suite for checking the scripts/config files in the
    local build tree in a sandbox on the "icecream" test package. Cover all
    current scenarios: main/universe/PPA/OEM/NO_PKG_MANGLE/partner/local/no
    mangler.
  * debian/rules: Run tests during build, and have a failed test suite fail
    the build. Add python and fakeroot build dependencies for this.
  * Add debian/pkgbinarymangler.lintian-overrides: We divert "dpkg-deb", no
    need to ship a manpage for it.
  * striptranslations.blacklist: Drop language-selector. (LP: #570240)
  * test/run: Add test case for updating of Installed-Size. This reproduces
    LP #451764.
  * pkgstriptranslations: Adapt Installed-Size: header in control file if we
    stripped any files. (LP: #451764)

 -- Martin Pitt <email address hidden> Fri, 09 Jul 2010 09:19:44 +0200
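
The last changelog entry (adapting Installed-Size after stripping) can be illustrated roughly as follows. This is a sketch under assumptions, not the code that shipped in pkgbinarymangler 70: the function name `update_installed_size` is hypothetical, the package is assumed to be unpacked under a directory containing DEBIAN/control, and the size is approximated with `du -sk`, which may differ from how the packaging tools compute it.

```shell
#!/bin/sh
# Hypothetical helper: recompute the Installed-Size field (in KiB) of the
# unpacked package tree $1 after files have been removed from it.
update_installed_size() {
    root=$1
    control=$root/DEBIAN/control

    # Approximate payload size: whole tree minus the DEBIAN control directory.
    total=$(du -sk "$root" | cut -f1)
    meta=$(du -sk "$root/DEBIAN" | cut -f1)
    size=$((total - meta))

    # Rewrite only the Installed-Size line; all other fields stay untouched.
    sed "s/^Installed-Size:.*/Installed-Size: $size/" "$control" > "$control.new"
    mv "$control.new" "$control"
}
```

A stripping step would call this once after deleting the mo files, so that the control file no longer advertises the pre-strip size.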

Kyle Nitzsche (knitzsche) wrote :

Hi Martin,
Yes, feel free to mark fixed-released. (If any behaviors pop up unexpectedly, I'll let you know.)

Martin Pitt (pitti)
Changed in launchpad:
status: Triaged → Fix Released
Changed in pkgbinarymangler (Ubuntu):
status: New → Fix Released
assignee: nobody → Martin Pitt (pitti)
Ted Gould (ted) wrote :

Martin,

Does this mean that we can get the appropriate symbols to have apport backtrace PPA built packages as well?

Thanks, Ted

William Grant (wgrant) wrote :

Debug symbols are harder, as they need LP support. I did most of the implementation a year or so ago, but it never got finished, and my priorities are no longer my own :(
