[UI][EDP] Schema for "args" is incorrect for Pig jobs

Bug #1269968 reported by Trevor McKay
This bug affects 1 person
Affects: Sahara
Status: Fix Released
Importance: Medium
Assigned to: Chad Roberts

Bug Description

For Pig jobs, job_configs["args"] should be a list of strings. Savanna currently requires it to be a dictionary and generates a workflow that is incorrect but still runs.

Here is some background:

Oozie allows <param> tags and <argument> tags in pig actions. Both are used to pass options to the Pig script (Pig usage is shown at the end of this description).

For Oozie pig actions

  <param>name=value</param>

is shorthand for

  <argument>-param</argument>
  <argument>name=value</argument>

So, job_configs["params"] should be a dictionary (as it is) but job_configs["args"] should be a list of literal strings which are passed as Pig options, separated by spaces.
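
The shorthand described above can be sketched in Python. This is only an illustration of the expansion, not Savanna or Oozie code; the function name params_to_arguments is hypothetical, and the sorting is there only to make the output order predictable.

```python
def params_to_arguments(params):
    """Expand a {name: value} dict into the equivalent Pig argument list."""
    arguments = []
    for name, value in sorted(params.items()):
        # Each <param>name=value</param> corresponds to two <argument>
        # entries: the literal "-param" flag followed by "name=value".
        arguments.extend(["-param", "%s=%s" % (name, value)])
    return arguments


if __name__ == "__main__":
    print(params_to_arguments({
        "INPUT": "swift://tmckay.savanna/input",
        "OUTPUT": "swift://tmckay.savanna/output",
    }))
```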

Savanna currently uses <param> tags to set the INPUT and OUTPUT values for the Pig script based on data sources.
job_configs["args"] would be used to set additional flags (see the Pig usage below).
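
For illustration, a job_configs value with the intended shapes might look like the one below, along with a small check of those shapes. This is a sketch under assumptions: check_pig_job_configs is a hypothetical helper and is not Savanna's actual schema validation, and the SAMPLE parameter is made up. The flags in "args" are taken from the Pig usage text.

```python
def check_pig_job_configs(job_configs):
    """Return a list of shape problems in job_configs (empty if none)."""
    problems = []
    params = job_configs.get("params", {})
    args = job_configs.get("args", [])
    if not isinstance(params, dict):
        problems.append('"params" must be a dictionary')
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        problems.append('"args" must be a list of strings')
    return problems


job_configs = {
    # name=value pairs, each passed to the script as a Pig parameter
    "params": {"SAMPLE": "0.1"},  # hypothetical script parameter
    # literal Pig options, passed through in order
    "args": ["-v", "-x", "mapreduce"],
}

if __name__ == "__main__":
    print(check_pig_job_configs(job_configs))
    # A dictionary for "args" is the shape this bug report calls incorrect.
    print(check_pig_job_configs({"args": {"-v": ""}}))
```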

As an example, the following 2 Oozie workflows are equivalent:

<?xml version="1.0" ?>
<workflow-app name="job-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="job-node"/>
  <action name="job-node">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>fs.swift.service.savanna.password</name>
          <value>password</value>
        </property>
        <property>
          <name>fs.swift.service.savanna.username</name>
          <value>user</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
      </configuration>
      <script>pig.script</script>
      <param>INPUT=swift://tmckay.savanna/input</param>
      <param>OUTPUT=swift://tmckay.savanna/output</param>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>

<?xml version="1.0" ?>
<workflow-app name="job-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="job-node"/>
  <action name="job-node">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>fs.swift.service.savanna.password</name>
          <value>password</value>
        </property>
        <property>
          <name>fs.swift.service.savanna.username</name>
          <value>user</value>
        </property>
      </configuration>
      <script>pig.script</script>
      <argument>-param</argument>
      <argument>OUTPUT=swift://tmckay.savanna/output</argument>
      <argument>-param</argument>
      <argument>INPUT=swift://tmckay.savanna/input</argument>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
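
As a rough illustration of how both kinds of options could end up in a generated workflow, the following Python sketch builds a <pig> element from a params dictionary and an args list using the standard library. The function name build_pig_action is hypothetical and this is not Savanna's workflow generator; the <configuration> block is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_pig_action(script, params=None, args=None):
    """Build a <pig> element with <param> and <argument> children."""
    pig = ET.Element("pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"
    ET.SubElement(pig, "script").text = script
    # "params" is a dictionary: one <param>name=value</param> per entry.
    for name, value in sorted((params or {}).items()):
        ET.SubElement(pig, "param").text = "%s=%s" % (name, value)
    # "args" is a list of literal strings: one <argument> per item, in order.
    for arg in args or []:
        ET.SubElement(pig, "argument").text = arg
    return pig


if __name__ == "__main__":
    action = build_pig_action(
        "pig.script",
        params={"INPUT": "swift://tmckay.savanna/input",
                "OUTPUT": "swift://tmckay.savanna/output"},
        args=["-v"],
    )
    print(ET.tostring(action, encoding="unicode"))
```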

Here is Pig usage, taken from Oozie logs on a failed job:

Apache Pig version 0.10.1 (r1426282)
compiled Dec 27 2012, 11:24:26

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" statement
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -P, -propertyFile - Path to property file

Tags: edp
Changed in savanna:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Trevor McKay (tmckay)
milestone: none → icehouse-3
tags: added: edp
Trevor McKay (tmckay)
summary: - [EDP] Schema for "args" is incorrect for Pig jobs
+ [UI][EDP] Schema for "args" is incorrect for Pig jobs
OpenStack Infra (hudson-openstack) wrote : Fix proposed to savanna (master)

Fix proposed to branch: master
Review: https://review.openstack.org/67588

Changed in savanna:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to savanna (master)

Reviewed: https://review.openstack.org/67588
Committed: https://git.openstack.org/cgit/openstack/savanna/commit/?id=de0a0b008292e5c34f2983686168c3d1a72e6b13
Submitter: Jenkins
Branch: master

commit de0a0b008292e5c34f2983686168c3d1a72e6b13
Author: Trevor McKay <email address hidden>
Date: Fri Jan 17 16:49:48 2014 -0500

    Change configs["args"] to be a list for Pig jobs

    The schema for Pig jobs incorrectly listed "args" as a dictionary.
    It should be a list of strings for all job types. Modify the schema
    and workflow generation to correctly handle "args" as a list.

    There is a UI component to this bug as well. To avoid having to
    sync changes in savanna and savanna-dashboard, temporarily allow
    "args" to be passed from the UI as a dictionary and replace it
    with an empty list. This will prevent the UI from breaking.

    Partial-bug: #1269968

    Change-Id: Ieacb0596dbb6da5d5f26dc6a5b923f2bafa819f2
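
The temporary compatibility behaviour described in the commit message above (a dictionary passed from the UI is replaced with an empty list) could look roughly like the following. This is a sketch based only on that description, not the code from the commit; normalize_args is a hypothetical name.

```python
def normalize_args(job_configs):
    """Return a copy of job_configs where a dict-valued "args" becomes []."""
    normalized = dict(job_configs)
    if isinstance(normalized.get("args"), dict):
        # Older UI payloads send a dictionary here; replace it with an
        # empty list instead of rejecting the request.
        normalized["args"] = []
    return normalized


if __name__ == "__main__":
    print(normalize_args({"args": {"name": "value"}}))
    print(normalize_args({"args": ["-v"]}))
```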

Trevor McKay (tmckay)
Changed in savanna:
assignee: Trevor McKay (tmckay) → Chad Roberts (croberts)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to savanna-dashboard (master)

Fix proposed to branch: master
Review: https://review.openstack.org/68467

OpenStack Infra (hudson-openstack) wrote : Fix merged to savanna-dashboard (master)

Reviewed: https://review.openstack.org/68467
Committed: https://git.openstack.org/cgit/openstack/savanna-dashboard/commit/?id=8eb7c9f65a782aabeca924e76274c7f704ad76d6
Submitter: Jenkins
Branch: master

commit 8eb7c9f65a782aabeca924e76274c7f704ad76d6
Author: Chad Roberts <email address hidden>
Date: Wed Jan 22 14:33:11 2014 -0500

    Changing the args type for Pig jobs to a list

    The args for a Pig job are now required to be a list rather
    than a dict. This change addresses that from the dashboard's perspective.

    Partial-bug: #1269968

    Change-Id: Ia4a30399686db4291e23efdbc9d7eea12a157a52

Chad Roberts (croberts)
Changed in savanna:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in savanna:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in sahara:
milestone: icehouse-3 → 2014.1