Monday, August 15, 2022

How to Automate Apache NiFi Data Flow Deployments in the Public Cloud


With the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC) we added new CLI capabilities that allow you to automate data flow deployments, making it easier than ever before to incorporate Apache NiFi flow deployments into your CI/CD pipelines. This blog post walks you through the data flow development lifecycle and shows how you can use APIs in CDP Public Cloud to fully automate your flow deployments.

Understanding the data flow development lifecycle

Like any other software application, NiFi data flows go through a development, testing and production phase. While key NiFi features like visual flow design and interactive data exploration are front and center during the development phase, operational features like resource management, auto-scaling and performance monitoring become crucial once a data flow has been deployed in production and business functions depend on it.

CDF-PC, the first cloud-native runtime for Apache NiFi data flows, is focused on operationalizing NiFi data flows in production by providing resource isolation, auto-scaling and detailed KPI monitoring for flow deployments.

At the same time, Flow Management for Cloudera Data Hub provides a traditional NiFi experience focused on visual flow design and interactive data exploration. Together, Flow Management for Data Hub and Cloudera DataFlow for the Public Cloud provide all the capabilities you need to support the entire data flow development lifecycle from development to production.

Figure 1: Develop your data flows using Flow Management for Data Hub and operationalize them using Cloudera DataFlow for the Public Cloud (CDF-PC)

Developing data flows with version control

As Figure 1 shows, Flow Management for Data Hub provides an ideal environment that allows developers to quickly iterate on their data flows until they are ready to be deployed in production. Every Flow Management cluster comes preinstalled with NiFi Registry, making it easy for developers to version control their data flows.

Note: While version controlling data flows is not required for manually exporting data flows from the NiFi canvas, it is a prerequisite for automating data flow export using the NiFi Registry API.

To start version controlling a data flow, right-click the process group you want to version, select Version and then Start version control.

Figure 2: Starting version control stores process groups in the NiFi Registry and makes them available through the NiFi Registry API

In the next window, use the Bucket option to associate your data flow with a specific project or team and specify a Flow Name. Optionally, you can also provide a Flow Description and Version Comments.

Figure 3: When you start version control you can pick a Bucket and provide a name for your flow definition

Once your data flow version has been saved to the NiFi Registry, you will notice a green tick on your NiFi process group, indicating that the process group is up to date and represents the latest version stored in the NiFi Registry.

Figure 4: The green tick indicates that this process group is using the latest version of the flow definition

Changing your data flow logic in the NiFi canvas introduces local changes that are not yet synchronized to the NiFi Registry. Right-click the process group, select Version and then Commit local changes to create a new version that includes your recent changes.

Figure 5: A gray star indicates that local changes need to be committed to the NiFi Registry, resulting in a new version of the data flow

Note: If you are planning to export your data flows from the development environment using the NiFi Registry API, make sure that any local changes you want to include have been committed back to the Registry.

Now that you are familiar with versioning your data flows in your development environment, let’s look at how you can export these versions and deploy them using CDF-PC.

Exporting data flows from Flow Management for Data Hub

Apache NiFi 1.11 introduced a new Download flow definition capability which exports the data flow logic of a process group. The export includes any controller services that exist in the selected process group as well as parameter contexts which have been assigned to the selected process group.

Figure 6: Exporting data flows using the “Download flow definition” capability in the NiFi canvas works even when you are not versioning your process groups

To manually export a flow definition from the NiFi canvas, right-click the process group you want to export and select Download flow definition to obtain the flow definition in JSON format. This method exports the current process group from NiFi, including any local changes which might not have been committed to the NiFi Registry yet. Since this operation does not rely on the NiFi Registry, you can download flow definitions without versioning your data flows.

Exporting data flows using the NiFi Registry API

Downloading flow definitions right from the NiFi canvas is easy but it requires a manual action. One way to automate this process is to use the NiFi Registry API directly, which allows you to programmatically export any version of your data flow that has been stored in the Registry.

Note: To use the NiFi Registry approach you have to version your data flows as explained in the previous section.

In CDP Public Cloud, endpoints like the NiFi Registry API are protected and exposed through a central Apache Knox proxy. To obtain the NiFi Registry API endpoint, navigate to your Flow Management Data Hub cluster and select the Endpoints tab.

Figure 7: Flow Management cluster endpoints exposed through Knox

Copy the NiFi Registry REST URL and use it as the base URL to construct your REST calls. Refer to the Apache NiFi Registry REST API documentation for all available API calls. First, you want to export the latest version of your data flow from the Registry, so the endpoint you need to use is /buckets/{bucketId}/flows/{flowId}/versions/latest.

After obtaining the Registry REST URL and the API endpoint, you need to obtain the bucketId and flowId to construct the full API path. To do this, navigate to your Flow Management Data Hub cluster and click the NiFi Registry icon, which logs you into the NiFi Registry UI.

Figure 8: Navigating to the NiFi Registry UI

In the NiFi Registry UI, find the flow definition that you want to export by looking for the flow name that you provided when you started versioning your process group. Expand the corresponding entry and copy the BUCKET IDENTIFIER and the FLOW IDENTIFIER.

Using the NiFi Registry REST URL as well as the bucket and flow identifiers, you can now construct the final URL by appending /buckets/{bucketId}/flows/{flowId}/versions/latest to the REST URL, with both identifiers filled in.

Figure 9: Obtaining the bucketId and flowId from NiFi Registry
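The URL construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official client: the function name is made up for this example and the base URL is a placeholder that you would replace with the NiFi Registry REST URL copied from the Endpoints tab.

```python
from urllib.parse import quote


def build_export_url(registry_rest_url: str, bucket_id: str, flow_id: str) -> str:
    """Return the Registry URL that serves the latest version of a flow."""
    # Drop a trailing slash so the path is not joined with a double slash.
    base = registry_rest_url.rstrip("/")
    # Percent-encode the identifiers in case they contain unexpected characters.
    bucket = quote(bucket_id, safe="")
    flow = quote(flow_id, safe="")
    return f"{base}/buckets/{bucket}/flows/{flow}/versions/latest"


# Hypothetical base URL; the identifiers are example values only.
example_url = build_export_url(
    "https://gateway.example.com/nifi-registry-api/",
    "caea6227-2bde-452f-a325-3eac0424868f",
    "45f308ce-9dc2-4ac7-9ff2-153d714b52dd",
)
print(example_url)
```

Keeping the path template in one place makes it easy to reuse the same helper in scripts that export several flows.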

Since the NiFi Registry API is exposed through a Knox proxy, you need to authenticate your REST API call using a CDP workload user and password. You can use your personal CDP workload user or a machine user for this purpose, as long as the EnvironmentUser role has been assigned to the CDP workload user for the CDP environment which is hosting your Flow Management cluster.

To add the EnvironmentUser role, navigate to your CDP environment, select “Manage Access” from the Actions menu and assign the EnvironmentUser role to the CDP workload user you want to use.

Figure 10: Assigning the EnvironmentUser role to a CDP workload user

In CDP Public Cloud, access to versioned NiFi data flows in the NiFi Registry is controlled by Apache Ranger. The CDP workload user that you are planning to use to call the NiFi Registry REST API needs to be allowed access to the flow definition that you want to export. To allow the nifi-kafka-ingest user access to the bucket caea6227-2bde-452f-a325-3eac0424868f you need to create a corresponding policy in Ranger:

Figure 11: This Ranger policy allows your previously created machine user to access the NiFi Registry bucket which stores the flow definition you want to export.

Now that you have set up your CDP workload user, ensured that it can access the flow definition in the Registry, and obtained all the required IDs, you can go ahead and export your flow definition from the NiFi Registry.

Let’s combine the endpoint URL information you collected earlier with the bucket and flow identifiers and the CDP workload user details to construct your final REST API call. The response is the flow definition in JSON format, and you can save it to a file using the redirect operator >.

curl -u CDP_WORKLOAD_USER:CDP_WORKLOAD_USER_PASSWORD NIFI_REGISTRY_REST_URL/buckets/BUCKET_ID/flows/FLOW_ID/versions/latest > /home/youruser/myflowdefinition.json

Note: If you are running the command on one of the NiFi instances, replace “gateway” with “management0” in the endpoint URL to ensure the Registry endpoint can be reached.

Note: In this example we are using curl to invoke the Registry REST endpoint. If you are using Python, check out nipyapi, which already provides Python wrappers for the NiFi and NiFi Registry API endpoints.
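If you would rather not add a dependency, the same authenticated call can be sketched with the Python standard library alone. The helper names below are illustrative and not part of any Cloudera or NiFi tooling; the sketch assumes the Knox proxy accepts HTTP Basic authentication with your workload credentials, which is what the curl call relies on.

```python
import base64
import urllib.request


def build_export_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare a GET request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def export_flow_definition(url: str, user: str, password: str, output_path: str) -> None:
    """Download the flow definition JSON and write it to output_path."""
    request = build_export_request(url, user, password)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())
```

Calling export_flow_definition with the full Registry URL, your workload user name and password, and a target path performs the equivalent of the curl command; nothing is sent over the network until that function is called.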

Note: To automate exporting data flows even further you can use NiFi Registry hooks, which allow you to execute a script when a certain action in the Registry is triggered. You could set up a Registry hook that automatically exports the flow definition and uploads it to the CDF-PC Flow Catalog every time a new version is created.
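As a rough sketch of what such a hook script might look like, the Python below parses the arguments a script-based Registry hook passes on. It assumes the hook invokes the script with the event type first, followed by the bucket identifier, flow identifier and version number for a CREATE_FLOW_VERSION event; verify this ordering against the NiFi Registry documentation for your version before relying on it. All names in the snippet are illustrative.

```python
import sys
from typing import NamedTuple, Optional, Sequence


class FlowVersionEvent(NamedTuple):
    bucket_id: str
    flow_id: str
    version: int


def parse_hook_args(argv: Sequence[str]) -> Optional[FlowVersionEvent]:
    """Return the event details for a new flow version, or None for other events.

    Assumed argument order (without the script name):
    CREATE_FLOW_VERSION <bucketId> <flowId> <version> [user] [comment]
    """
    if len(argv) < 4 or argv[0] != "CREATE_FLOW_VERSION":
        return None
    return FlowVersionEvent(argv[1], argv[2], int(argv[3]))


if __name__ == "__main__":
    event = parse_hook_args(sys.argv[1:])
    if event is not None:
        # An export and catalog upload step would be triggered from here.
        print(f"New version {event.version} of flow {event.flow_id} in bucket {event.bucket_id}")
```

Filtering on the event type first keeps the script inexpensive for all the other Registry events that the hook may also fire for.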

Exporting data flows using the NiFi CLI

You can also use the NiFi CLI to export flow definitions from the Registry. The NiFi CLI is part of the NiFi toolkit, which is installed on every NiFi node in your Flow Management cluster.

To use the NiFi CLI, establish an SSH connection to any NiFi node and log in with your CDP workload user name. Start the NiFi CLI by running the cli.sh script in the toolkit’s bin directory.


In addition to the flow identifier, NiFi Registry REST endpoint and CDP workload user credentials, this approach also requires you to explicitly specify a truststore configuration to establish a secure connection. While the truststore location (/hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks) and the truststore type (JKS) are the same on every Flow Management cluster, the truststore password is unique for each cluster and needs to be obtained from /etc/hadoop/conf/ssl-client.xml.
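Looking up the truststore password can be scripted as well. The sketch below reads a Hadoop-style configuration document made of property elements with name and value children. The property name ssl.client.truststore.password is the conventional Hadoop setting and is an assumption here; confirm it against the ssl-client.xml on your cluster.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed property name; check the ssl-client.xml on your cluster.
TRUSTSTORE_PASSWORD_PROPERTY = "ssl.client.truststore.password"


def read_property(xml_text: str, property_name: str) -> Optional[str]:
    """Return the value of a named property from Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == property_name:
            return prop.findtext("value")
    return None


def read_truststore_password(path: str = "/etc/hadoop/conf/ssl-client.xml") -> Optional[str]:
    """Read the truststore password from the given ssl-client.xml file."""
    with open(path, encoding="utf-8") as handle:
        return read_property(handle.read(), TRUSTSTORE_PASSWORD_PROPERTY)
```

Separating the parsing from the file access lets you try the lookup on a copied snippet of the file before running it on a cluster node.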

With the Registry REST endpoint, CDP workload user credentials, flow identifier and truststore information you can now construct the full registry export-flow-version command:

registry export-flow-version --baseUrl NIFI_REGISTRY_REST_URL --flowIdentifier 45f308ce-9dc2-4ac7-9ff2-153d714b52dd --basicAuthUsername CDP_WORKLOAD_USER --basicAuthPassword CDP_WORKLOAD_USER_PASSWORD --truststore /hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks --truststorePasswd TRUSTSTORE_PASSWORD --truststoreType jks --outputType json --outputFile /home/youruser/myflowdefinition.json

The command returns the flow definition in JSON format and writes it to the location specified using --outputFile.

Note: If you are running the NiFi toolkit on one of the NiFi instances, replace “gateway” with “management0” in the base URL to ensure the Registry endpoint can be reached.

Importing data flows into CDF for the Public Cloud

Now that you have exported the flow definition from the Flow Management development environment, you need to import it into CDF-PC’s central Flow Catalog before you can create deployments.

Most of the actions that you can perform in CDF-PC’s UI can also be automated using the CDP CLI. Before you can start using the CDP CLI to upload your flow definition to the Flow Catalog you need to download and configure it.

Note: CDF-PC CLI commands are currently only available in the CDP Beta CLI. Follow the CDP Beta CLI installation instructions to install and configure it.

Once you have set up the CDP CLI you can explore all available CDF-PC commands by running cdp df.

The command for importing flow definitions into the catalog is df import-flow-definition. It requires you to specify the path to the flow definition you want to upload and to provide a name for it in the catalog.

cdp df import-flow-definition --file myflowdefinition.json --name MyFlowDefinition --description "This is my first uploaded Flow Definition" --comments "Version 1"
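In a CI/CD job you may want to issue this import from a script rather than by hand. The following Python sketch assembles the argument list from the options shown above and runs it with subprocess; the function names are illustrative, and it assumes the cdp executable is installed and configured on the machine running the script.

```python
import subprocess
from typing import List


def import_flow_definition_command(
    file_path: str, name: str, description: str = "", comments: str = ""
) -> List[str]:
    """Assemble the cdp df import-flow-definition argument list."""
    command = ["cdp", "df", "import-flow-definition", "--file", file_path, "--name", name]
    if description:
        command += ["--description", description]
    if comments:
        command += ["--comments", comments]
    return command


def run_cli(command: List[str]) -> str:
    """Run the command and return its standard output.

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with descriptions and comments that contain spaces.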

You have now successfully imported your flow definition and can explore it in the Flow Catalog.

Figure 12: The flow definition has been imported successfully to the catalog

If you want to upload new versions of this flow definition, use the import-flow-definition-version command. It requires you to specify the CRN of the existing flow definition in the catalog as well as the new flow definition JSON file that you want to upload as a new version.

To get the flow definition CRN, navigate to the catalog, select your flow definition and copy the CRN. Use the CRN to construct the final import-flow-definition-version command:

cdp df import-flow-definition-version --file myflowdefinition_v2.json --flow-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition --comments "Version 2 with fixes for processing data"

After successful execution, you will see a second version of the flow definition in the catalog.

Figure 13: A new version has been created for the imported flow definition

Deploying data flows with CDF for the Public Cloud

After importing your flow definition into the catalog you can use the create-deployment command to automate flow deployments.

To create a flow deployment in CDF-PC, you have to provide the flow definition CRN from the Flow Catalog, any parameter values the flow might require, any KPIs you want to set up, and deployment configurations like the NiFi node size or whether the deployment should automatically scale up and down.

The easiest way to construct the full create-deployment command is to walk through the Deployment Wizard once and use the View CLI Command feature in the Review step to generate the corresponding CLI command and the required parameter and KPI files.

Figure 14: The Review step in the Deployment Wizard creates parameter and KPI property files and constructs the final create-deployment command

If your flow deployment contains flow parameters and KPIs, download the Flow Deployment Parameters JSON and Flow Deployment KPIs JSON files. These files define all parameters and their values as well as the KPIs that you defined in the wizard.

Note: Values for parameters marked as sensitive are not included in the generated parameters file. Update those parameter values after downloading the file.
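Filling in the sensitive values can be automated before the deployment command runs. The sketch below assumes the downloaded parameters file is a JSON list of groups, each holding a parameters list whose entries have name and value keys. That layout is an assumption, so inspect your own downloaded file and adjust the keys if it differs; the function names are illustrative.

```python
import json
from typing import Any, Dict, List


def set_parameter_values(groups: List[Dict[str, Any]], values: Dict[str, str]) -> List[str]:
    """Set the given values in place and return the names that were updated."""
    updated = []
    for group in groups:
        for parameter in group.get("parameters", []):
            name = parameter.get("name")
            if name in values:
                parameter["value"] = values[name]
                updated.append(name)
    return updated


def update_parameter_file(path: str, values: Dict[str, str]) -> List[str]:
    """Load the parameters file, apply the values and write the file back."""
    with open(path, encoding="utf-8") as handle:
        groups = json.load(handle)
    updated = set_parameter_values(groups, values)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(groups, handle, indent=2)
    return updated
```

In a pipeline the values dictionary would typically come from a secret store or from environment variables rather than from source control.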

With these two files downloaded, all you have left to do is copy the CLI command from the wizard and adjust the parameter-groups and kpis file paths. Then you can hit enter and programmatically create your first flow deployment.

  cdp df create-deployment \
    --service-crn crn:cdp:df:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:service:e7aef078-aa34-44eb-8bb7-79e89a734911 \
    --flow-version-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition/v.2 \
    --deployment-name "MyFirstDeployment" \
    --cluster-size-name EXTRA_SMALL \
    --auto-scale-min-nodes 1 \
    --auto-scale-max-nodes 3 \
    --parameter-groups file://PATH_TO_UPDATE/flow-parameter-groups.json \
    --kpis file://PATH_TO_UPDATE/flow-kpis.json
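For repeated deployments it can help to generate this command from a few inputs. The Python sketch below derives the flow version CRN by appending /v.N to the flow definition CRN, mirroring the pattern in the command above, and assembles the remaining options. The function names and default values are illustrative, and the CRNs you pass in must come from your own catalog and environment.

```python
from typing import List


def flow_version_crn(flow_crn: str, version: int) -> str:
    """Append the version suffix used by --flow-version-crn, for example /v.2."""
    return f"{flow_crn}/v.{version}"


def create_deployment_command(
    service_crn: str,
    flow_crn: str,
    version: int,
    deployment_name: str,
    parameter_groups_file: str,
    kpis_file: str,
    cluster_size: str = "EXTRA_SMALL",
    min_nodes: int = 1,
    max_nodes: int = 3,
) -> List[str]:
    """Assemble the cdp df create-deployment argument list."""
    return [
        "cdp", "df", "create-deployment",
        "--service-crn", service_crn,
        "--flow-version-crn", flow_version_crn(flow_crn, version),
        "--deployment-name", deployment_name,
        "--cluster-size-name", cluster_size,
        "--auto-scale-min-nodes", str(min_nodes),
        "--auto-scale-max-nodes", str(max_nodes),
        # The CLI reads these options from files referenced with a file:// prefix.
        "--parameter-groups", f"file://{parameter_groups_file}",
        "--kpis", f"file://{kpis_file}",
    ]
```

The resulting list can be passed to subprocess.run, which keeps the deployment name and file paths intact even when they contain spaces.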

After issuing the create-deployment command, you can navigate to the Dashboard in CDF-PC and watch the deployment process. Once the deployment has been created successfully you can manage it using both the UI and the CLI.


Automating flow deployments with a single command is a key feature of CDF-PC and helps you focus on data flow development, deployment and monitoring instead of worrying about creating infrastructure and setting up complex CI/CD pipelines. Going forward we will continue to improve the CDF-PC CLI capabilities to further optimize the flow development lifecycle. Take the CDF-PC Product Tour and learn more about CDF-PC in the documentation.



