While I enjoy learning about DevOps, my primary role is as a Data Engineer, though I often function more like a software engineer working on data-intensive applications. Much of what I’ve learned comes from examining existing workflows, reading documentation, exploring other blogs, and practical experimentation. If you notice any inaccuracies, please feel free to create an issue on the repository. Your feedback is greatly appreciated!
How did I get here?
I’ve found working with GitHub Actions both empowering and intimidating. It’s empowering because it automates tasks that would otherwise be manual, but intimidating due to the complexity it adds, often leading to tricky debugging scenarios. A key takeaway is that while striving for simplicity can make code more manageable, over-optimization, especially in DevOps, can lead to diminishing returns and increased fragility. Debugging a draft PR with numerous commits and failed runs can be mentally exhausting, particularly when you’re making changes just to see what happens.
Recently, I explored how to trigger certain workflows based on the specific needs of the code changes for my team. This may not seem necessary for small projects where all tests run in under ten minutes, but in a monorepo with multiple services, including SDKs, containers running Python, JS, or databases, tests could take 15 minutes or more. This can significantly slow down the review process for PRs.
One solution I delved into was `workflow_call`. I had to spend some time understanding how it works through examples in our repo, particularly how to manage inputs and outputs between workflow parts effectively.
`workflow_call` from the docs
Let’s start by looking at the docs.
The documentation introduces `workflow_call` as a method to allow a workflow to be triggered by another. It then directs you to the section on “Reusing workflows”. This resource is thorough and detailed, but can be overwhelming initially, so I’ll break down the basics first to build a solid foundation.
Basic workflows
GitHub Action workflows are located in the `.github/workflows/` directory of your repo. Workflows are `.yaml` files that specify which actions should run in response to certain triggers. Initially, I found this part of a repo daunting, often relying on the community for pre-built actions. However, workflows are actually quite straightforward once you understand the syntax and logic.
Here is an example of a simple linting workflow:
lint.yaml

```yaml
name: Lint
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install ruff
      - name: Lint with Ruff
        run: ruff check code/
```
This workflow includes three top-level elements:
- `name`: The name of the workflow.
- `on`: Specifies the triggers for the workflow, such as `push` and `pull_request` events on the `main` branch.
- `jobs`: Lists the jobs that will run, typically in parallel.
Let’s continue with a unit testing workflow:
test.yaml

```yaml
name: Test
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install pytest
      - name: Run tests
        run: pytest code/
```
Similar to the linting workflow, this workflow contains the same three top-level elements, but with a different job configuration.
Understanding the components of `jobs:` and their keys was initially confusing. These elements specify the steps to run in each job, which can be:

- Shell commands: executable commands using the `run:` key, similar to what you would type in a terminal.
- Actions: reusable actions specified with the `uses:` key, like `actions/checkout@v3` for checking out repository code.
- Composite actions: several steps grouped within a single action so they can be reused across workflows, defined under a `steps:` key in the action’s own metadata file.
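As an illustration of that third kind, a composite action bundles several steps behind one `uses:` reference. A minimal sketch, with a hypothetical path and action name, could look like this:

```yaml
# .github/actions/setup-and-lint/action.yaml (hypothetical path and name)
name: Setup and Lint
description: Set up Python and run Ruff in one reusable action
runs:
  using: composite
  steps:
    - uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    - name: Lint
      # Composite run steps must declare their shell explicitly.
      shell: bash
      run: |
        pip install ruff
        ruff check code/
```

A workflow could then call it with `uses: ./.github/actions/setup-and-lint` instead of repeating those steps in every job.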
More complex workflows with workflow_call
Now let’s return to `workflow_call` and how it can be used to trigger a workflow from another workflow.
determineChanges.yaml

```yaml
name: Determine Changes
on:
  workflow_call:
    outputs:
      changed-files-data:
        description: "JSON formatted list of changed files with metadata"
        value: ${{ jobs.determine-changes.outputs.changed-files-data }}
jobs:
  determine-changes:
    runs-on: ubuntu-latest
    outputs:
      changed-files-data: ${{ steps.create-changed-files-data.outputs.result }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Fetch base and head branches
        id: gather-branches
        run: |
          git fetch origin ${{ github.base_ref }}:${{ github.base_ref }}
          git fetch origin ${{ github.head_ref }}:${{ github.head_ref }}
      - name: Create changed files data
        id: create-changed-files-data
        run: |
          echo "Base reference: ${{ github.base_ref }}"
          echo "Head reference: ${{ github.head_ref }}"
          # Get the list of changed files
          DIFF_OUTPUT=$(git diff --name-only ${{ github.base_ref }}...${{ github.head_ref }})
          mapfile -t CHANGED_FILES <<< "$DIFF_OUTPUT"
          echo "Changed files:"
          printf '%s\n' "${CHANGED_FILES[@]}"
          JSON_ARRAY="["
          # Build a JSON array of the changed files
          for FILE in "${CHANGED_FILES[@]}"; do
            EXTENSION="${FILE##*.}"
            JSON_ENTRY=$(jq -nc \
              --arg file "$FILE" \
              --arg extension "$EXTENSION" \
              '{file: $file, extension: $extension}')
            JSON_ARRAY+="$JSON_ENTRY,"
          done
          JSON_ARRAY="${JSON_ARRAY%,}]"
          echo "Changed files data: $JSON_ARRAY"
          echo "result=$JSON_ARRAY" >> "$GITHUB_OUTPUT"
```
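If you want to sanity-check the JSON-building loop without pushing a commit, you can run its core locally against a made-up file list. This sketch assumes `jq` is installed, as it is on GitHub’s Ubuntu runners:

```shell
# Hypothetical stand-in for the git diff result.
CHANGED_FILES=("src/app.py" "web/index.js")

JSON_ARRAY="["
for FILE in "${CHANGED_FILES[@]}"; do
  # Everything after the last dot in the path.
  EXTENSION="${FILE##*.}"
  # jq -nc builds one compact JSON object per file.
  JSON_ENTRY=$(jq -nc --arg file "$FILE" --arg extension "$EXTENSION" \
    '{file: $file, extension: $extension}')
  JSON_ARRAY+="$JSON_ENTRY,"
done
# Trim the trailing comma and close the array.
JSON_ARRAY="${JSON_ARRAY%,}]"
echo "$JSON_ARRAY"
```

Running it prints a single compact JSON array, one object per file, which is exactly the shape the downstream workflows will consume.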
This workflow has the same three top-level elements, but we can see some big differences. In the `on:` top-level element we find the `workflow_call` trigger with its own set of keys that are unfamiliar.
```yaml
on:
  workflow_call:
    outputs:
      changed-files-data:
        description: "JSON formatted list of changed files with metadata"
        value: ${{ jobs.determine-changes.outputs.changed-files-data }}
```
What we’re demonstrating here is that this workflow is triggered exclusively by another workflow; that is the only way it activates, via the `workflow_call` trigger. This setup is designed to ensure that the workflow can contribute data to subsequent processes.

We also define an output for this workflow. This output is named `changed-files-data` and it includes a description and a value specifying where in this workflow the data is produced. Whenever you encounter `${{ ... }}`, you’re seeing what’s known as a variable expression. These expressions help us dynamically reference data produced by the workflow.
Let’s break down the location of this variable:
- `jobs`: The variable lives under the `jobs` top-level element of the workflow.
- `determine-changes`: The name of the specific job within our workflow where the output is produced.
- `outputs`: The section within the job that declares which outputs it exposes.
- `changed-files-data`: Here’s the interesting part! Its value is another variable expression that points to where, in the job’s steps, the data is finalized.
Now, looking into the `changed-files-data` variable expression, `${{ steps.create-changed-files-data.outputs.result }}`:

- `steps`: The variable is found among the steps of the job.
- `create-changed-files-data`: The `id` of the step, labeled “Create changed files data”, where the output is generated.
- `outputs`: The subsection within the step where the output is created.
- `result`: The identifier for the output produced by the step.
By structuring workflows in this manner, we enable modular, reusable components that can interact seamlessly within GitHub Actions.
How do we use the output from `workflow_call`?
Now that we’ve seen how we can set up a workflow that can be called by another workflow, let’s return to a slightly expanded linting example that includes the workflow from our previous steps.
lintChangedFiles.yaml

```yaml
name: Lint Changed Files
on:
  workflow_call:
    inputs:
      changed-files-data:
        required: true
        type: string
jobs:
  lint-python:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install Ruff
        run: |
          pip install ruff
      - name: Lint Python files
        run: |
          CHANGED_FILES_JSON='${{ inputs.changed-files-data }}'
          CHANGED_PY_FILES=$(echo "$CHANGED_FILES_JSON" | jq -r '.[] | select(.extension == "py") | .file')
          if [[ -n "$CHANGED_PY_FILES" ]]; then
            echo "Changed Python Files: $CHANGED_PY_FILES"
            ruff check --output-format=github $CHANGED_PY_FILES
            ruff format --check $CHANGED_PY_FILES
          else
            echo "No Python files to lint."
          fi
```
In this example, the `on:` top-level element specifies that this workflow is triggered by a `workflow_call` event, and it requires an input named `changed-files-data`. This input must be provided by the calling workflow, and it contains JSON-formatted data about which files have changed.

The `run` key within the “Lint Python files” step shows how we can access the output from the previous workflow. We use the `${{ inputs.changed-files-data }}` variable expression to retrieve the JSON data produced by the previous workflow. This data is then parsed with `jq` to extract the paths of Python files that have changed.
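To see what that `jq` filter does in isolation, you can feed it a hand-written sample in the same shape as the `determine-changes` output (assuming `jq` is installed locally):

```shell
# Hypothetical input, shaped like the determine-changes job's output.
CHANGED_FILES_JSON='[{"file":"src/app.py","extension":"py"},{"file":"web/index.js","extension":"js"}]'

# Keep only the paths of files whose extension is "py", one per line.
CHANGED_PY_FILES=$(echo "$CHANGED_FILES_JSON" | jq -r '.[] | select(.extension == "py") | .file')
echo "$CHANGED_PY_FILES"
```

Here only `src/app.py` survives the filter; the `.js` entry is dropped, so the lint step never touches files Ruff can’t handle.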
Now, let’s expand this concept to our unit test workflow example from earlier.
testChangedFiles.yaml

```yaml
name: Test Changed Files
on:
  workflow_call:
    inputs:
      changed-files-data:
        required: true
        type: string
jobs:
  test-python:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install Dependencies
        run: |
          pip install -U pip
          pip install pytest
      - name: Test Python files
        run: |
          # Extract Python files from JSON input
          CHANGED_FILES_JSON='${{ inputs.changed-files-data }}'
          CHANGED_PY_FILES=$(echo "$CHANGED_FILES_JSON" | jq -r '.[] | select(.extension == "py") | .file')
          if [[ -n "$CHANGED_PY_FILES" ]]; then
            echo "Changed Python Files: $CHANGED_PY_FILES"
            pytest tests/
          else
            echo "No Python files to test."
          fi
```
In this example, the workflow assumes that the tests are located in a `tests/` directory. This setup illustrates how you can use the output from a previous workflow to trigger subsequent testing only on the relevant files, although here we run the whole test directory whenever any Python file changes.
With individual workflows connected from output to input, we can create a workflow for a pull request that runs a smaller subset of tests, balancing speed and coverage efficiently. This setup helps keep “push to main” tests comprehensive for the entire codebase.
Stitching it all together
In a PR workflow, we can integrate the workflow that determines what has changed in our repo, which then sequentially triggers linting and testing based on those changes.
pullRequest.yaml

```yaml
name: Pull Request
on:
  pull_request:
    branches:
      - "**"
jobs:
  determine-changes:
    uses: ./.github/workflows/determineChanges.yaml
  lint-changed-files:
    needs: determine-changes
    uses: ./.github/workflows/lintChangedFiles.yaml
    with:
      changed-files-data: ${{ needs.determine-changes.outputs.changed-files-data }}
  test-changed-files:
    needs: determine-changes
    uses: ./.github/workflows/testChangedFiles.yaml
    with:
      changed-files-data: ${{ needs.determine-changes.outputs.changed-files-data }}
```
This setup triggers on any pull request, first determining the changes, then linting, and finally testing the code based on those changes. Using `needs` ensures that each job waits for the necessary data from its predecessor, creating an efficient and effective CI pipeline.
Now that we have established a dedicated PR workflow that efficiently manages determining changes, linting, and testing, we can streamline our existing `lint.yaml` and `test.yaml` workflows. Since the new PR workflow handles all pull requests, the original workflows can be adjusted to trigger only on pushes to the `main` branch. This reduces redundancy and focuses these workflows on final validation of changes that land on `main`.
Here is the specific section we can now remove from both the `lint.yaml` and `test.yaml` workflows:
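Based on the `on:` blocks shown earlier, the section to delete is the `pull_request` trigger:

```yaml
pull_request:
  branches:
    - main
```

After removing it, only the `push` trigger for `main` remains in each file.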
This modification ensures that the `lint.yaml` and `test.yaml` workflows are no longer triggered by pull requests, as the new PR workflow now covers that scenario. Instead, they will continue to run only on direct pushes to the `main` branch, which might include final checks or post-merge validations.
By making this change, we achieve a cleaner separation of concerns:
- The PR workflow is optimized for handling all pull request-related checks and tests.
- The main branch workflows (`lint.yaml` and `test.yaml`) are streamlined to focus solely on changes pushed directly to `main`, ensuring that these changes meet our standards without duplicating the checks done in PRs.
This setup not only organizes our workflows more logically but also helps in conserving CI/CD resources and reducing the potential for confusion about which workflows run under which circumstances.
Conclusion
There is always a trade-off when deciding to make certain parts of the codebase more DRY (Don’t Repeat Yourself) and simultaneously more complex. For many projects, determining changes and adding extra steps could introduce new points of potential failure in workflows. Like any other code, workflows require maintenance, and more complex workflows are no exception. This guide serves as a minimum viable product (MVP) for using `workflow_call`, but it can also be a good starting point for thinking about what you really want out of your GitHub Actions workflows.
Perhaps instead of relying on off-the-shelf workflows, you’ll decide to tailor a set of automations that is uniquely suited to your project’s needs. Feel free to get in touch with me if you have any questions or comments. And again, if something isn’t correct, please create an issue on the repo so I can improve it.