Data Proc API, REST: Job methods

Updated on May 26, 2023


A set of methods for managing Data Proc jobs.

JSON Representation

{
  "id": "string",
  "clusterId": "string",
  "createdAt": "string",
  "startedAt": "string",
  "finishedAt": "string",
  "name": "string",
  "createdBy": "string",
  "status": "string",
  "applicationInfo": {
    "id": "string",
    "applicationAttempts": [
      {
        "id": "string",
        "amContainerId": "string"
      }
    ]
  },

  //  includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`
  "mapreduceJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",

    // `mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`
    "mainJarFileUri": "string",
    "mainClass": "string",
    // end of the list of possible fields of `mapreduceJob`

  },
  "sparkJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",
    "mainJarFileUri": "string",
    "mainClass": "string",
    "packages": [
      "string"
    ],
    "repositories": [
      "string"
    ],
    "excludePackages": [
      "string"
    ]
  },
  "pysparkJob": {
    "args": [
      "string"
    ],
    "jarFileUris": [
      "string"
    ],
    "fileUris": [
      "string"
    ],
    "archiveUris": [
      "string"
    ],
    "properties": "object",
    "mainPythonFileUri": "string",
    "pythonFileUris": [
      "string"
    ],
    "packages": [
      "string"
    ],
    "repositories": [
      "string"
    ],
    "excludePackages": [
      "string"
    ]
  },
  "hiveJob": {
    "properties": "object",
    "continueOnFailure": true,
    "scriptVariables": "object",
    "jarFileUris": [
      "string"
    ],

    // `hiveJob` includes only one of the fields `queryFileUri`, `queryList`
    "queryFileUri": "string",
    "queryList": {
      "queries": [
        "string"
      ]
    },
    // end of the list of possible fields of `hiveJob`

  },
  // end of the list of possible fields

}
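
The "only one of" comments in the representation above can be checked on the client side before a job body is used. The following is a minimal sketch, assuming that "only one of" means exactly one field of each group is present; the helper names `_single` and `job_kind` are hypothetical and are not part of the Data Proc API.

```python
# Top-level job-spec fields named in the JSON representation above.
JOB_SPEC_FIELDS = ("mapreduceJob", "sparkJob", "pysparkJob", "hiveJob")

# Nested "only one of" groups named in the JSON representation above.
NESTED_ONEOF = {
    "mapreduceJob": ("mainJarFileUri", "mainClass"),
    "hiveJob": ("queryFileUri", "queryList"),
}


def _single(container: dict, candidates: tuple, where: str) -> str:
    """Return the one candidate key present in `container`, else raise."""
    present = [name for name in candidates if name in container]
    if len(present) != 1:
        raise ValueError(
            f"{where}: expected exactly one of {candidates}, found {present}"
        )
    return present[0]


def job_kind(job: dict) -> str:
    """Hypothetical helper: validate the one-of groups, return the spec field name."""
    kind = _single(job, JOB_SPEC_FIELDS, "job")
    if kind in NESTED_ONEOF:
        # Assumption: exactly one field of the nested group must be set.
        _single(job[kind], NESTED_ONEOF[kind], kind)
    return kind


if __name__ == "__main__":
    # Illustrative values only; they do not come from the API reference.
    example = {"name": "example", "hiveJob": {"queryList": {"queries": ["SHOW TABLES;"]}}}
    print(job_kind(example))  # hiveJob
```

If the service treats a group as "at most one" rather than "exactly one", the length check in `_single` would need to be relaxed accordingly.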

Field	Description
id	string ID of the job. Generated at creation time.
clusterId	string ID of the Data Proc cluster that the job belongs to.
createdAt	string (date-time) Creation timestamp. String in RFC3339 text format. The range of possible values is from `0001-01-01T00:00:00Z` to `9999-12-31T23:59:59.999999999Z`, i.e. from 0 to 9 digits for fractions of a second. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
startedAt	string (date-time) The time when the job was started. String in RFC3339 text format. The range of possible values is from `0001-01-01T00:00:00Z` to `9999-12-31T23:59:59.999999999Z`, i.e. from 0 to 9 digits for fractions of a second. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
finishedAt	string (date-time) The time when the job was finished. String in RFC3339 text format. The range of possible values is from `0001-01-01T00:00:00Z` to `9999-12-31T23:59:59.999999999Z`, i.e. from 0 to 9 digits for fractions of a second. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
name	string Name of the job, specified in the create request.
createdBy	string ID of the user who created the job.
status	string Job status. PROVISIONING: Job is logged in the database and is waiting for the agent to run it. PENDING: Job is acquired by the agent and is in the queue for execution. RUNNING: Job is being run in the cluster. ERROR: Job failed to finish the run properly. DONE: Job is finished. CANCELLED: Job is cancelled. CANCELLING: Job is waiting for cancellation.
applicationInfo	object
applicationInfo. id	string ID of the YARN application.
applicationInfo. applicationAttempts[]	object YARN application attempts.
applicationInfo. applicationAttempts[]. id	string ID of the YARN application attempt.
applicationInfo. applicationAttempts[]. amContainerId	string ID of the YARN Application Master container.
mapreduceJob	object The job includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`.
mapreduceJob. args[]	string Optional arguments to pass to the driver.
mapreduceJob. jarFileUris[]	string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
mapreduceJob. fileUris[]	string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
mapreduceJob. archiveUris[]	string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
mapreduceJob. properties	object Property names and values, used to configure Data Proc and MapReduce.
mapreduceJob. mainJarFileUri	string HCFS URI of the JAR file containing the driver class. `mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`.
mapreduceJob. mainClass	string The name of the driver class. `mapreduceJob` includes only one of the fields `mainJarFileUri`, `mainClass`.
sparkJob	object The job includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`.
sparkJob. args[]	string Optional arguments to pass to the driver.
sparkJob. jarFileUris[]	string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
sparkJob. fileUris[]	string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
sparkJob. archiveUris[]	string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
sparkJob. properties	object Property names and values, used to configure Data Proc and Spark.
sparkJob. mainJarFileUri	string The HCFS URI of the JAR file containing the `main` class for the job.
sparkJob. mainClass	string The name of the driver class.
sparkJob. packages[]	string List of Maven coordinates of JAR files to include on the driver and executor classpaths.
sparkJob. repositories[]	string List of additional remote repositories to search for the Maven coordinates given with --packages.
sparkJob. excludePackages[]	string List of groupId:artifactId pairs to exclude when resolving the dependencies provided in --packages, to avoid dependency conflicts.
pysparkJob	object The job includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`.
pysparkJob. args[]	string Optional arguments to pass to the driver.
pysparkJob. jarFileUris[]	string JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
pysparkJob. fileUris[]	string URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
pysparkJob. archiveUris[]	string URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
pysparkJob. properties	object Property names and values, used to configure Data Proc and PySpark.
pysparkJob. mainPythonFileUri	string URI of the file with the driver code. Must be a .py file.
pysparkJob. pythonFileUris[]	string URIs of Python files to pass to the PySpark framework.
pysparkJob. packages[]	string List of Maven coordinates of JAR files to include on the driver and executor classpaths.
pysparkJob. repositories[]	string List of additional remote repositories to search for the Maven coordinates given with --packages.
pysparkJob. excludePackages[]	string List of groupId:artifactId pairs to exclude when resolving the dependencies provided in --packages, to avoid dependency conflicts.
hiveJob	object The job includes only one of the fields `mapreduceJob`, `sparkJob`, `pysparkJob`, `hiveJob`.
hiveJob. properties	object Property names and values, used to configure Data Proc and Hive.
hiveJob. continueOnFailure	boolean Flag indicating whether a job should continue to run if a query fails.
hiveJob. scriptVariables	object Query variables and their values.
hiveJob. jarFileUris[]	string JAR file URIs to add to CLASSPATH of the Hive driver and each task.
hiveJob. queryFileUri	string URI of the script with all the necessary Hive queries. `hiveJob` includes only one of the fields `queryFileUri`, `queryList`.
hiveJob. queryList	object `hiveJob` includes only one of the fields `queryFileUri`, `queryList`.
hiveJob. queryList. queries[]	string List of Hive queries.
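
The `createdAt`, `startedAt`, and `finishedAt` rows note that some built-in datetime utilities do not support nanosecond precision. Python's `datetime` is one such case: it stores microseconds only. The sketch below is one possible way to keep all nine digits, not the approach the Protocol Buffers reference prescribes. It assumes the timestamp ends in `Z`, as in the range shown in the table; `parse_timestamp` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Date-time with an optional fraction of 1 to 9 digits and a literal "Z" suffix.
# Assumption: numeric UTC offsets such as "+03:00" are not handled here.
_TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_timestamp(value: str) -> Tuple[datetime, int]:
    """Return (UTC datetime truncated to microseconds, fraction in nanoseconds)."""
    match = _TIMESTAMP_RE.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp format: {value!r}")
    seconds_part, fraction = match.groups()
    # Right-pad the fraction to 9 digits so ".5" becomes 500000000 ns.
    nanos = int((fraction or "").ljust(9, "0"))
    moment = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc, microsecond=nanos // 1000
    )
    return moment, nanos


if __name__ == "__main__":
    moment, nanos = parse_timestamp("2023-05-26T10:15:30.123456789Z")
    print(moment.isoformat())  # 2023-05-26T10:15:30.123456+00:00
    print(nanos)               # 123456789
```

Returning the nanosecond count next to the `datetime` keeps the last three digits available to callers that need them, while ordinary date arithmetic still works on the truncated value.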

Methods

Method	Description
cancel	Cancels the specified Data Proc job.
create	Creates a job for a cluster.
get	Returns the specified job.
list	Retrieves a list of jobs for a cluster.
listLog	Returns a log for the specified job.
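
A common pattern with the `get` method is to poll a job until its `status` stops changing. The sketch below shows that loop under stated assumptions: `DONE`, `ERROR`, and `CANCELLED` are treated as final statuses based on their descriptions in the field table, and `get_job` is a hypothetical caller-supplied function that performs the actual `get` request and returns the job as a dict. The endpoint, authentication, and the names `wait_for_job`, `poll_interval`, and `max_polls` are not taken from this reference.

```python
import time
from typing import Callable

# Assumption: these statuses are final, judging by their descriptions.
TERMINAL_STATUSES = frozenset({"DONE", "ERROR", "CANCELLED"})


def wait_for_job(
    get_job: Callable[[], dict],
    poll_interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `get_job` until the job reaches a terminal status, then return it."""
    for _ in range(max_polls):
        job = get_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        # PROVISIONING, PENDING, RUNNING, CANCELLING: keep waiting.
        sleep(poll_interval)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in for a real `get` call: replays a fixed sequence of statuses.
    replies = iter(
        {"id": "job-1", "status": s} for s in ("PROVISIONING", "RUNNING", "DONE")
    )
    final = wait_for_job(lambda: next(replies), sleep=lambda _: None)
    print(final["status"])  # DONE
```

Passing `get_job` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.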
