Part I: Data processing pipelines with Spring Cloud Data Flow on Oracle Cloud
For a master table-of-contents for blog posts on microservice topics, please refer to https://medium.com/oracledevs/bunch-of-microservices-related-blogs-57b5f1f062e5
Asynchronous communication/messaging is a key pattern for building loosely coupled, scalable applications (including microservices) in the cloud. One of the major classes of problems this pattern solves is real-time data processing
What if you could have a platform/framework built specifically for this? And even better, what if you could deploy and operate it on the cloud?
This is the first of a two-part blog series on Spring Cloud Data Flow on Oracle Cloud
- Part 1 will give you an introduction and demonstrate how to deploy the Spring Cloud Data Flow server on Oracle Application Container Cloud, along with its supporting components
- Part 2 will demonstrate how you can build stream processing pipelines using the foundation you set up in Part 1
To be specific, Part 1 will cover
- Gentle introduction to Spring Cloud Data Flow
- the secret sauce — Spring Cloud Data Flow on Oracle Application Container Cloud
- Infrastructure setup — Oracle Event Hub Cloud (Kafka) is used as the messaging middleware and Oracle MySQL Cloud serves as the persistent data store for the Spring Cloud Data Flow setup
Hello “Spring Cloud Data Flow”
Here is a quick intro — please refer to the documentation for details
TL;DR — It is a framework/toolkit for building data processing pipelines
- Types of apps — long lived stream processing, data integration and short-lived tasks
- Pipelines are nothing but Spring Boot apps which make use of Spring Cloud Stream or Spring Cloud Task
By the way, here is a blog on how to use messaging-based microservices with vanilla Spring Cloud Stream & Kafka on Oracle Cloud
- Event Bus/Middleware — Support for Kafka and RabbitMQ (we will use Kafka via Oracle Event Hub Cloud)
- Infrastructure — the pipelines themselves can be deployed to a variety of runtimes (Kubernetes, Mesos etc.) whose implementations are pluggable
One such custom implementation for Oracle Application Container Cloud will be covered in the next section
- Interfaces — you can use the dashboard (graphical editor), REST API or CLI to work with Spring Cloud Data Flow
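To give a flavor of these interfaces, here is roughly what the stream lifecycle looks like from the Spring Cloud Data Flow shell. The ticktock stream below is the canonical example from the Spring Cloud Data Flow documentation (time and log are pre-built starter apps), shown purely as an illustration:

```
dataflow:> app list
dataflow:> stream create --name ticktock --definition "time | log" --deploy
dataflow:> stream list
```

The same operations are exposed as REST endpoints; the dashboard itself is just a client of that API.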
Let’s try to understand the solution and its components
Spring Cloud Data Flow on Oracle Application Container Cloud
Here is a high level solution architecture which involves all the components. You will encounter each one of them as you read on…
Oracle Application Container Cloud has a two-fold role in this case
- It serves as the platform for running the Spring Cloud Data Flow server itself
- It also doubles as a runtime for the pipelines which you build using Spring Cloud Data Flow — this is the interesting part!
Spring Cloud Data Flow Server
The server module is a Spring app with an embedded servlet container (e.g. Tomcat). As you will see in the upcoming sections — this can simply be run as a fat JAR on top of the Java SE runtime support in Oracle Application Container Cloud
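Since it is a self-contained fat JAR, you can also smoke-test the server on your own machine before pushing it to the cloud. The command below assumes you have already built the JAR (as shown in the build section further down) and have a local Kafka broker available; by default the server falls back to its embedded in-memory database:

```
java -jar spring-cloud-dataflow-server-accs-1.0.0-SNAPSHOT.jar
```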
Spring Cloud Data Flow pipelines
(As mentioned above) the data processing pipelines created using Spring Cloud Data Flow are just Spring Boot apps which need to run somewhere. This runtime portion is abstracted in the form of a Spring Cloud Deployer SPI, which encapsulates the implementation for a specific runtime, e.g. the local JVM (of the Data Flow server), Kubernetes, Apache Mesos etc.
Here is a snippet from the Spring Cloud Data Flow documentation which illustrates this concept
There is a Spring Cloud Deployer implementation specific to Oracle Application Container Cloud as well
Although this is a work in progress and evolving, in its current state it can be used to operate Spring Cloud Data Flow pipelines; you will see it in action in Part 2 of this blog
Message broker: Oracle Event Hub Cloud
The individual pipelines need an underlying messaging layer for asynchronous communication — Spring Cloud Data Flow supports Apache Kafka and RabbitMQ
We will be using Oracle Event Hub Cloud (Managed Kafka) in this case
Persistent store: Oracle MySQL Cloud
By default, Spring Cloud Data Flow stores all its information in an in-memory database, but it supports other RDBMSes as well
We will leverage Oracle MySQL Cloud as the persistent data store for the Spring Cloud Data Flow server
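For instance, pointing the server at an external MySQL instance boils down to the standard Spring Boot datasource properties. The values below are placeholders, and the driver class assumes the MySQL Connector/J driver is on the classpath:

```
spring.datasource.url=jdbc:mysql://<mysql-host>:3306/<database>
spring.datasource.username=<db-user>
spring.datasource.password=<db-password>
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
```

In our setup, these values will flow in from the MySQL service binding rather than being hard-coded, as you will see in the deployment metadata.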
Maven repository
Although you will see this in action in Part 2 of the blog, for now it's sufficient to understand that Spring Cloud Data Flow uses Maven as one of its sources for the applications which need to be deployed as part of the pipelines you build (more details here and here)
Infrastructure setup
This section provides a summary of how to set up the infrastructure components required for the Spring Cloud Data Flow setup
- Oracle Event Hub Cloud instance
- Oracle MySQL Cloud instance
Oracle Event Hub Cloud (Kafka broker)
The Kafka cluster topology used in this case is relatively simple, i.e. a single broker co-located with Zookeeper. You can opt for a topology specific to your needs, e.g. an HA deployment with a 5-node Kafka cluster and 3 Zookeeper nodes
Please refer to the documentation for further details on topology and the detailed installation process (hint: it's straightforward!)
Creating custom access rule
You would need to create a custom Access Rule to open port 2181 on the Kafka Server VM on Oracle Event Hub Cloud — details here
Oracle Application Container Cloud does not need port 6667 (Kafka broker) to be opened since the secure connectivity is taken care of by the service binding
Oracle MySQL Cloud
Provision a MySQL database instance — you can refer to the detailed documentation here
Now that we have the infrastructure foundation, it's time to deploy the Data Flow server on the cloud
Build & deployment
Build Spring Cloud Dataflow from source
This includes the SPI implementation specific to Oracle Application Container Cloud
git clone https://github.com/ankitbansal/spring-cloud-dfs-accs.git
mvn clean install
This will create spring-cloud-dataflow-server-accs-1.0.0-SNAPSHOT.jar under the spring-cloud-dataflow-server-accs/target folder
Zip it up:
zip scdf.zip spring-cloud-dataflow-server-accs/target/spring-cloud-dataflow-server-accs-1.0.0-SNAPSHOT.jar
Edit metadata files
Configure the metadata files as per your setup
manifest.json
Here is a summary of the key attributes
- maven.remote-repositories.repo1.url: the Maven repository URL. Use maven.remote-repositories.repo1.auth.username and maven.remote-repositories.repo1.auth.password if applicable. We are using the Spring Maven repo in this case (more in Part 2)
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers: leave this unchanged as it will be picked up from the environment variable
- spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNode: enter the value for the Event Hub Zookeeper host and port
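To make this concrete, here is a rough sketch of what such a manifest.json could look like. This assumes the attributes are passed as arguments on the java -jar command line; the broker attribute is omitted here since it is picked up from the service binding's environment variable, and the checked-in manifest.json in the repository is the authoritative reference:

```json
{
  "runtime": {
    "majorVersion": "8"
  },
  "command": "java -jar spring-cloud-dataflow-server-accs-1.0.0-SNAPSHOT.jar --maven.remote-repositories.repo1.url=https://repo.spring.io/libs-release --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNode=<zookeeper-host>:2181"
}
```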
deployment.json
Here is a summary of the key attributes
- ACCS_URL: use emea instead of us in case of a Europe data center
- spring_datasource_username: leave this unchanged as it will be picked up from the environment variable
- spring_datasource_password: leave this unchanged as it will be picked up from the environment variable
- spring_datasource_driver-class-name: leave this unchanged
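As with the manifest, here is a rough, illustrative sketch of a deployment.json. The memory, instances and environment keys are the standard Oracle Application Container Cloud deployment attributes; the MYSQLCS_* variable names are an assumption based on the usual ACCS MySQL service-binding environment variables, so treat the repository's sample as the authoritative version:

```json
{
  "memory": "2G",
  "instances": "1",
  "environment": {
    "ACCS_URL": "<your ACCS REST endpoint, with us or emea as appropriate>",
    "spring_datasource_username": "$MYSQLCS_USER_NAME",
    "spring_datasource_password": "$MYSQLCS_USER_PASSWORD",
    "spring_datasource_driver-class-name": "com.mysql.jdbc.Driver"
  }
}
```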
Push to cloud
With Oracle Application Container Cloud, you have multiple options in terms of deploying your applications. This blog will leverage PSM CLI which is a powerful command line interface for managing Oracle Cloud services
Other deployment options include the REST API, Oracle Developer Cloud and, of course, the console/UI
Download and set up PSM CLI on your machine using psm setup (details here)
Deploy the Spring Cloud Dataflow server — psm accs push -n SpringCloudDataflowServer -r java -s hourly -m manifest.json -d deployment.json -p scdf.zip
Once executed, an asynchronous process is kicked off and the CLI returns its Job ID for you to track the application creation
Check your application
Access the application and check the instance and topology information
Notice the Service Bindings for Event Hub and MySQL cloud
Test drive
Access the Spring Cloud Data Flow dashboard — navigate to the URL which you see on your application detail screen e.g. https://SpringCloudDataflowServer-mydomain.apaas.us2.oraclecloud.com/dashboard
Change the deployment topology
You can easily modify the topology of your Spring Cloud Data Flow setup
- Scale up/down — Increase/decrease the memory (RAM) allocated to the Data Flow server
- Scale in/out — Increase/decrease the number of instances for HA and performance. This is easy since the persistent state is stored in MySQL and the app itself is stateless
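If you prefer the command line over the console for this, scaling can also be driven through PSM CLI. The flags below are illustrative assumptions rather than confirmed syntax; consult psm accs scale -h (or the PSM CLI reference) for the exact options supported by your CLI version:

```
psm accs scale -n SpringCloudDataflowServer -c 2 -m 2G
```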
Summary
That’s it for Part I, where we
- covered the basic concepts, and
- deployed a Spring Cloud Data Flow server on Oracle Application Container Cloud along with its dependent components, which included
- Oracle Event Hub Cloud as the Kafka-based messaging layer, and
- Oracle MySQL Cloud as the persistent RDBMS store
The next installment will demonstrate things in action as we build Kafka-based stream processing pipelines using Spring Cloud Data Flow on Oracle Application Container Cloud.
Don’t forget to…
- check out the tutorials for Oracle Application Container Cloud — there is something for every runtime!
- check out other blogs on Application Container Cloud
Cheers!
The views expressed in this post are my own and do not necessarily reflect the views of Oracle.