I’m working on a GitHub Actions workflow where I need to dynamically create jobs based on what gets discovered in an earlier step. Here’s what I’m trying to do:
First job runs with a matrix to scan multiple repositories
Each matrix job finds different tags that need processing
Second job should run for every repo/tag combination found
I have around 100 repositories to check daily, but here’s a simple example with 3.
The problem is that when the first job runs as a matrix, only the last completed job’s output gets passed to the second job. I need all discovered items from all matrix runs to feed into the next job.
I’ve considered using artifacts but I’m not sure how to make them work as a queue for the second job. Any ideas on the right approach?
You need a middleman job between your matrix discovery job and your processing job; the matrix-output limitation you're hitting is a common one in GitHub Actions. I fixed it by having each matrix job write its results to its own artifact, then adding a separate job that waits for the whole matrix to finish, downloads all the artifacts, and merges them into one JSON structure. That merged output then feeds your processing job. The trick is using strategy.matrix.include in the final processing job and pointing it at the combined output of the aggregation job, instead of trying to grab matrix outputs directly.
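A sketch of that shape, where the job names (scan, aggregate, process), the scan.sh script, and the repo/tag keys are all placeholders, and the merge assumes each matrix leg writes a JSON array to results.json:

```yaml
jobs:
  scan:
    strategy:
      matrix:
        repository: [repo-a, repo-b, repo-c]
    runs-on: ubuntu-latest
    steps:
      - name: Scan and write results
        run: ./scan.sh "${{ matrix.repository }}" > results.json  # scan.sh is a placeholder
      - uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.repository }}-results
          path: results.json

  aggregate:
    needs: scan
    runs-on: ubuntu-latest
    outputs:
      combinations: ${{ steps.merge.outputs.combinations }}
    steps:
      - uses: actions/download-artifact@v3  # omitting `name:` downloads every artifact
      - id: merge
        run: |
          # each artifact lands in its own subdirectory; concatenate the arrays
          echo "combinations=$(jq -cs 'add' */results.json)" >> "$GITHUB_OUTPUT"

  process:
    needs: aggregate
    strategy:
      matrix:
        include: ${{ fromJSON(needs.aggregate.outputs.combinations) }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "Processing ${{ matrix.repo }} at tag ${{ matrix.tag }}"
```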
Been dealing with this exact workflow headache for years. GitHub Actions matrix outputs are a nightmare when you need to collect results from multiple runs.
You’re fighting GitHub’s design instead of working with it. All that artifact juggling and aggregation just adds complexity and failure points.
I switched to handling repository scanning and dynamic processing completely outside GitHub Actions. Set up simple automation that runs on schedule, scans your repos, discovers tags, then triggers individual GitHub Actions for each repo/tag combo that needs processing.
You get proper error handling, can easily see which repos failed, and don't have to worry about GitHub's matrix output limitations. Plus you can add smarter scheduling, retry logic, and limits on parallelism.
For 100 repos, this scales way better than making GitHub Actions do something it wasn’t designed for. Clean separation between discovery and processing, with much better visibility.
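A minimal sketch of that external scheduler, runnable from any cron host. Here discover_tags is a stub you would replace with real discovery, process.yml is an assumed workflow name, and trigger() only echoes the gh CLI command (a dry run) so the sketch is safe to execute; drop the echo to dispatch for real:

```shell
#!/bin/sh
set -eu

discover_tags() {
  # Placeholder: replace with real tag discovery, e.g. parsing `git ls-remote --tags`.
  printf 'v1.0\nv1.1\n'
}

trigger() {
  # Dispatches the processing workflow for one repo/tag pair via the gh CLI.
  # The echo makes this a dry run that prints the command instead of running it.
  echo gh workflow run process.yml --repo "$1" -f tag="$2"
}

for repo in org/repo-a org/repo-b; do  # in practice, the ~100 repos from a config file
  for tag in $(discover_tags "$repo"); do
    trigger "$repo" "$tag"
  done
done
```

The gh CLI applies your parallelism and retry policy here, not GitHub Actions, which is what gives you the visibility the answer describes.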
You’re encountering issues passing the combined outputs from multiple matrix jobs in a GitHub Actions workflow to a subsequent job. Each matrix job in your scan-repositories step generates output, but only the last job’s output is accessible to the process-updates job. You need a mechanism to aggregate results from all matrix jobs before proceeding to the next stage.
Understanding the “Why” (The Root Cause):
GitHub Actions’ matrix strategy executes jobs concurrently. Each job runs independently and has its own isolated output context. When a subsequent job depends on a matrix job, it only receives the outputs from the last completed job in the matrix. There’s no built-in mechanism to automatically collect and combine outputs from multiple matrix jobs.
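To illustrate (job and output names here are made up): every leg of the matrix maps to the same outputs key, so each finishing leg overwrites the previous value.

```yaml
jobs:
  scan:
    strategy:
      matrix:
        repo: [a, b, c]
    outputs:
      # all three matrix legs write this single key; whichever finishes last wins
      found: ${{ steps.scan.outputs.found }}
```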
Step-by-Step Guide:
Add an Aggregation Job: Introduce a new job specifically designed to collect and merge the outputs from your scan-repositories matrix job. This job will act as an intermediary.
Use Artifacts: Modify your scan-repositories job to upload its outputs as artifacts. Each matrix job will upload its own artifact. Use a unique identifier (e.g., the repository name) in the artifact name to distinguish them.
jobs:
  scan-repositories:
    # ... (your existing matrix configuration) ...
    steps:
      - name: Upload artifact
        uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.repository }}-results
          path: ./results.json # assumes each matrix job writes its results to this file
Create an Artifact Aggregation Job: Create a new job that depends on scan-repositories. This job will download all the artifacts, parse them and combine their contents into a single JSON file.
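One possible shape for that job, with names chosen to match the other snippets here; the jq merge assumes each per-repo results.json contains a JSON array:

```yaml
  aggregate-results:
    name: Aggregate scan results
    needs: scan-repositories
    runs-on: ubuntu-latest
    steps:
      - name: Download all per-repo artifacts
        uses: actions/download-artifact@v3  # omitting `name:` downloads every artifact
        with:
          path: ./artifacts
      - name: Merge into one JSON file
        run: |
          mkdir -p combined
          # each artifact is extracted into its own subdirectory under ./artifacts
          jq -s 'add' ./artifacts/*/results.json > combined/all_results.json
      - name: Upload combined artifact
        uses: actions/upload-artifact@v3
        with:
          name: all-repo-tags
          path: combined/all_results.json
```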
Modify the Processing Job: Update your process-updates job to use the combined artifact from the aggregation job.
  process-updates:
    name: Process discovered items
    needs: aggregate-results
    runs-on: ubuntu-latest
    steps:
      - name: Download combined artifact
        uses: actions/download-artifact@v3
        with:
          name: all-repo-tags
          path: ./combined_results
      - name: Process data
        run: |
          # Process the combined JSON data; assumes an array of repo/tag objects
          jq -c '.[]' ./combined_results/all_results.json
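If you instead want process-updates to fan out into one job per discovered repo/tag pair, the aggregation job can expose the merged JSON as a job output and the processing job can build a dynamic matrix from it with fromJSON. A sketch, where the merge step id and the combinations output name are assumptions:

```yaml
  aggregate-results:
    needs: scan-repositories
    runs-on: ubuntu-latest
    outputs:
      combinations: ${{ steps.merge.outputs.combinations }}
    steps:
      # ... the artifact-download steps described in the aggregation job above ...
      - id: merge
        run: echo "combinations=$(jq -cs 'add' ./artifacts/*/results.json)" >> "$GITHUB_OUTPUT"

  process-updates:
    needs: aggregate-results
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include: ${{ fromJSON(needs.aggregate-results.outputs.combinations) }}
    steps:
      - run: echo "Processing ${{ matrix.repo }} at tag ${{ matrix.tag }}"
```

Note that job outputs are capped at a much smaller size than artifacts, so for very large result sets the single-artifact approach above is safer.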
Common Pitfalls & What to Check Next:
Error Handling: Implement robust error handling in the aggregation step to gracefully manage scenarios where artifacts are missing or corrupted.
JSON Structure: Ensure the JSON structure of the individual artifacts is consistent to allow for seamless merging.
Large Datasets: For a very large number of repositories, consider optimizing the aggregation process (e.g., using parallel processing). Also, consider the size of the final combined artifact; very large files might exceed GitHub’s artifact size limits.
Still running into issues? Share your (sanitized) config files, the exact command you ran, and any other relevant details. The community is here to help!
Here’s what worked for me: restructure your workflow to lean on job dependencies instead of passing matrix outputs downstream (that really is a nightmare). Modify your first job so the workflow ends up with one consolidated JSON containing every repo-tag combination from the entire matrix run. Since matrix jobs run on separate runners and don’t share a filesystem, the shared spot has to be per-job artifacts that a follow-up step or job collects and merges into one unified structure. Your processing job then references that single output instead of trying to juggle multiple matrix outputs. Way cleaner, and it stays within GitHub Actions’ native functionality.
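The collect-and-merge step that answer describes can be sketched with jq. Assuming each matrix leg wrote a JSON array of repo/tag objects (the file names and keys below are made up for the demo):

```shell
# Build two sample per-repo result files, then merge them into one compact
# JSON array -- the shape you would write to "$GITHUB_OUTPUT" or a combined artifact.
mkdir -p results
printf '[{"repo":"repo-a","tag":"v1.0"}]' > results/repo-a.json
printf '[{"repo":"repo-b","tag":"v2.3"},{"repo":"repo-b","tag":"v2.4"}]' > results/repo-b.json

# -s slurps every file into one array of arrays; `add` concatenates them; -c compacts
combined=$(jq -cs 'add' results/*.json)
echo "$combined"
```

The compact (-c) single-line form matters because job outputs must not contain raw newlines.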