AWS CloudFormation

Hello everyone,

We are starting a series of posts where we are going to show how to use AWS CloudFormation.

In this series we’re going to showcase AWS CloudFormation by creating one RDS instance and one DMS instance, each deployed from a CloudFormation stack.

First, a quick introduction to AWS CloudFormation.

Let’s start by explaining what AWS CloudFormation is. The official description says:

“AWS CloudFormation provides a common language for you to model and provision AWS and third party application resources in your cloud environment.”

In practice, it is a JSON or YAML file in which we describe the AWS resources we want to create.
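Just to illustrate the format, a minimal template sketch could look like the one below. This is only a generic example with placeholder names, not the RDS/DMS templates we will use later in the series:

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal example that provisions a single S3 bucket

Resources:
  ExampleBucket:                          # logical ID referenced inside the template
    Type: AWS::S3::Bucket                 # the AWS resource type to create
    Properties:
      BucketName: my-example-bucket-12345 # placeholder, bucket names must be globally unique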

Let’s code!

The first step is to connect to the AWS Console. In the search field, type CloudFormation, as in the picture below.

Click on CloudFormation to open the service console.

 

Click on Stacks.

 

 

After this, click on Create stack and select “With new resources”.

We are going to click on Create template in Designer, and you will be redirected to a page like the one below.

 

Click on Template and the code editor will open.

The next step is to create a script to deploy a service. In this example, we are going to use a DMS script and an RDS PostgreSQL script. The examples used in this article will be available in the next article.
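To give an idea of what those resources look like in a template, here is a trimmed-down, illustrative sketch with placeholder names, sizes, and credentials. It is not the actual script from this series, which will be published in the next article:

Resources:
  PostgresInstance:
    Type: AWS::RDS::DBInstance               # RDS PostgreSQL instance
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.micro           # placeholder instance size
      AllocatedStorage: '20'                 # storage in GiB
      MasterUsername: postgres               # placeholder credentials; prefer parameters
      MasterUserPassword: ChangeMe12345      # or AWS Secrets Manager in real templates

  DmsReplicationInstance:
    Type: AWS::DMS::ReplicationInstance      # DMS replication instance
    Properties:
      ReplicationInstanceClass: dms.t3.micro # placeholder instance size
      AllocatedStorage: 50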

To execute the script, the first step is to validate it by clicking the highlighted button in the image below.

The result can be OK or an error. If the result is OK, you can create the stack by clicking the highlighted button in the image below.

You can check the execution events by clicking on the Events tab. The output is similar to the image below.
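If you prefer the command line, the same validate / create / monitor flow can also be done with the AWS CLI. A quick sketch, where the stack name and template file are placeholders:

# validate the template syntax before creating anything
aws cloudformation validate-template --template-body file://my-template.yaml

# create the stack from the validated template
aws cloudformation create-stack --stack-name my-dms-rds-stack --template-body file://my-template.yaml

# follow the execution events of the stack
aws cloudformation describe-stack-events --stack-name my-dms-rds-stack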

In the next articles, we are going to look at the source code and use the AWS DMS service to replicate data from an Oracle database to RDS PostgreSQL.

 

Apache Airflow Scheduler: The scheduler does not appear to be running. Last heartbeat was received % seconds ago.

Hello everyone,

Are you facing the same issue?

Well, after opening a few tasks to investigate the Apache Airflow test environment, I decided to check the Apache Airflow configuration files to try to find something wrong that could be causing this error. I noticed that every time the error happens, the Apache Airflow console shows a message like this:

The scheduler does not appear to be running. Last heartbeat was received 14 seconds ago.

The DAGs list may not update, and new tasks will not be scheduled.

In general, we see this message when the environment does not have enough resources available to execute a DAG. But this case was different: CPU usage was at 2%, memory usage at 50%, there was no swap in use, and no disk was at 100% usage. I checked the DAG logs from the last few hours and there were no errors in them. I also went through the airflow.cfg file and checked the database connection parameters, task memory, and the parallelism settings. Nothing was wrong. Long story short: everything was fine!

I then searched for the message in the Apache Airflow issue tracker and found a very similar bug: AIRFLOW-1156 BugFix: Unpausing a DAG with catchup=False creates an extra DAG run. In summary, it seems this situation happens when the parameter catchup_by_default is set to False in the airflow.cfg file.

When set to False, this parameter tells Apache Airflow to ignore past execution dates and start scheduling from now. To confirm the case, I checked with change management whether there had been any recent change in this environment. To my surprise, that exact parameter had been changed one month earlier.

I then changed the Apache Airflow configuration file and set the parameter catchup_by_default back to True. The environment was released to the developer team so they could check that everything was all right. One week later, no issues have been reported.
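For reference, this is roughly how the setting looks in airflow.cfg after the change (in the Airflow versions I worked with, the parameter lives in the [scheduler] section):

[scheduler]
# When False, the scheduler ignores past execution dates and only schedules from "now".
# Setting it back to True restores the default catchup behaviour.
catchup_by_default = True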

Conclusion?

This issue showed us that the development environment is a no man’s land: the change management process exists on its own, without an approval process to support it. That lack of an approval process led us to a four-hour outage and two teams unable to work.

I hope you enjoy it!

And please be responsible with your environments!

Apache Airflow REST API

Hello everyone,

Today I’ll talk about another aspect of Apache Airflow usage: its REST API.

I frequently have customers asking about Apache Airflow’s integration with their own applications. “How can I execute a job from my application?” or “how can I get my job status in my dashboard?” are good examples of the questions I receive the most.

I’ll use the following question from a customer to show this great feature in Apache Airflow:

“I would like to call one specific job orchestrated in my Apache Airflow environment from my application. Is it possible?”

Quick answer: “Yes, all you need to do is call the Airflow DAG using the REST API…”

Details:

The simplest way to show how to achieve this is by using curl to call my Apache Airflow environment. I have a DAG that runs a bash operator for this. Quick example:

curl -X POST \
  http://localhost:8080/api/experimental/dags/my_bash_oeprator/dag_runs \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{"conf":"{\"key\":\"value\"}"}'

The curl execution returns the execution date ID; you can use this ID to get the execution status, like this:

curl -X GET http://localhost:8080/api/experimental/dags/my_bash_oeprator/dag_runs/2020-04-05T00:26:35

{"state":"running"}

This command can also return other states, such as {"state":"failed"} or {"state":"success"}.
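To give an idea of how an application could consume this, here is a small bash sketch that polls the same endpoint every few seconds until the run leaves the running state. The host, DAG id, and execution date are the same placeholders used above:

#!/bin/bash
HOST=http://localhost:8080
DAG=my_bash_oeprator
RUN_DATE=2020-04-05T00:26:35   # execution date returned by the POST call above

# poll the run state every 10 seconds while it is still "running"
while curl -s "$HOST/api/experimental/dags/$DAG/dag_runs/$RUN_DATE" | grep -q '"running"'; do
  sleep 10
done

# print the final state: {"state":"success"} or {"state":"failed"}
curl -s "$HOST/api/experimental/dags/$DAG/dag_runs/$RUN_DATE"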

I hope you enjoy it!