Generate Changelog Videos with GitHub Copilot Hooks and Microsoft Foundry Voice Models

In this blog post, you will learn how to use GitHub Copilot Agent Skills together with GitHub Copilot Hooks to generate a changelog video using Remotion. The agentic pipeline also creates an automated voice-over with Microsoft’s MAI-Voice-1 text-to-speech model through Microsoft Foundry.

Hooks allow you to execute custom shell commands at specific lifecycle points during a GitHub Copilot agent session. Unlike prompt files, skills, or AI-driven behaviour in general, hooks provide deterministic, code-driven automation that runs regardless of how the agent was prompted.

There are a lot of use cases for hooks, but a few examples are:

  • Code quality: automatically format code, run tests and lint
  • Inject context: run a script to retrieve Azure Key Vault secrets and inject them while GitHub Copilot is actively running
  • Auditing: log Copilot’s behaviour during an agent session

Hooks are configured in JSON files stored in the .github/hooks/ folder of your repository and are loaded automatically by VS Code for GitHub Copilot. To use a hook across all of your repositories, store it in the ~/.copilot/hooks folder instead. There are eight hook events available:

| Hook event | When it fires | Example |
| --- | --- | --- |
| SessionStart | User submits the first prompt of a new session | Fetch Azure Key Vault secrets and inject them as environment variables |
| UserPromptSubmit | User submits a prompt | Validate that a Bicep file exists before allowing deployment prompts |
| PreToolUse | Before the agent invokes a tool | Run az account show to confirm the correct Azure subscription is set |
| PostToolUse | After a tool completes successfully | Run az deployment group validate after the editFile tool writes a Bicep file |
| PreCompact | Before conversation context is compacted | Save the current GitHub Copilot state to a log file for auditing |
| SubagentStart | When a subagent is spawned | Pass the Azure DevOps project and pipeline ID to the subagent |
| SubagentStop | When a subagent completes | Aggregate subagent results and post a summary to an Azure DevOps PR |
| Stop | When the agent session ends | Run a linter |

For this blog, I will use the PostToolUse hook event to trigger a PowerShell script.
Note! Agent hooks in VS Code are currently in Preview. Hook events might change in the future.

The two main technologies used to generate the video and voice-over are Remotion for video generation and MAI-Voice-1, in Microsoft Foundry, for speech generation.

Remotion

Remotion is an open-source framework that lets you create videos programmatically using React. Instead of editing videos in a traditional timeline editor, you write code to define animations, layouts, and content, and Remotion renders them into MP4 files. To learn more about Remotion, I recommend visiting their website: https://www.remotion.dev/

MAI-Voice-1

MAI-Voice-1 is Microsoft’s first-party speech generation model available in Microsoft Foundry via Azure Speech. It can produce 60 seconds of expressive, natural-sounding audio in under one second, making it one of the most efficient text-to-speech systems available today. To learn more about MAI-Voice-1, I recommend reading the announcement blog post: https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-mai-transcribe-1-mai-voice-1-and-mai-image-2-in-microsoft-foundry/4507787

In this blog post, Remotion is used to generate the changelog video. Microsoft Foundry provides the voice-over by using MAI-Voice-1 for text-to-speech. GitHub Copilot drives the orchestration of skills and content generation.

To wire up the full video and voice generation pipeline, you need a few key components: Copilot skills, hooks, and scripts. The table below gives a high-level overview before diving into each component.

Skills

Four Copilot skills drive the content gathering and orchestration of the video and voice generation pipeline. Each skill is defined as a SKILL.md file, stored in .github/skills/<skill name>, and is automatically available to GitHub Copilot during an agent session.

| Component | Name | Purpose |
| --- | --- | --- |
| Skill | fetching-bicep-changelog | Fetches the latest Bicep release notes from GitHub |
| Skill | fetching-youtube-video | Extracts metadata from a Bicep community call on YouTube. This skill is optional. |
| Skill | summarizing-bicep-update | Combines changelog and YouTube data into a structured summary |
| Skill | generate-video | Orchestrates the full flow and writes summary.json |

generate-video is the only skill marked user-invocable: true. This means it acts as the entry point you call directly as a slash command (/generate-video). The other three skills are invoked automatically by Copilot when needed.

You can find the skill definitions in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/tree/main/.github/skills

Hook

The PostToolUse hook is configured in .github/hooks/video-pipeline.json and fires every time GitHub Copilot successfully uses a tool. The hook runs a PowerShell script that watches for one specific event: Copilot writing summary.json. All other tool usage is ignored, and the hook exits immediately.

At the moment, hooks cannot be configured to monitor edits to specific files directly. Because of this, you need to create your own middleware logic in the PowerShell script that you will read about later.

The contents of the video-pipeline.json hook look as follows:

{
  "hooks": {
    "PostToolUse": [
      {
        "type": "command",
        "command": "pwsh -NoProfile -File .github/hooks/scripts/Invoke-VideoPipeline.ps1 -HookMode",
        "cwd": ".",
        "timeout": 1800
      }
    ]
  }
}

You can find the hook definition in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/blob/main/.github/hooks/video-pipeline.json

Script

Invoke-VideoPipeline.ps1 runs the three-step pipeline automatically once summary.json is created or updated:

  1. Voice generation: reads the summary, builds Speech Synthesis Markup Language (SSML) segments (intro, headline, one per highlight, breaking changes if any, outro), and calls the Azure Speech TTS endpoint using the MAI-Voice-1 voice for each segment. Each MP3 is measured for duration and written to an audio-manifest.json.
  2. Scaffolding: runs a Node.js scaffold script that wires the summary.json content and audio manifest into the Remotion project source files.
  3. Rendering: runs npx remotion render inside the Remotion project directory and outputs the final MP4 to the output/ folder.

The script reads credentials (AZURE_TTS_KEY, AZURE_TTS_ENDPOINT, AZURE_TTS_VOICE) from a .env file or environment variables, so no secrets are hardcoded.
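To make the credential lookup concrete, here is a minimal sketch in JavaScript (Node.js is already a prerequisite for the scaffold). The real pipeline does this in PowerShell, so the function names parseDotEnv and resolveTtsSettings are hypothetical, and the precedence (environment variables win over .env values) is an assumption rather than confirmed behaviour.

```javascript
// Parse KEY=VALUE lines from the text of a .env file.
// Blank lines and lines starting with '#' are ignored.
function parseDotEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before '=' (or no '=' at all): skip
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    if (value.length >= 2 && /^(["']).*\1$/.test(value)) {
      value = value.slice(1, -1);
    }
    values[key] = value;
  }
  return values;
}

// The three variable names come from the post.
const REQUIRED = ["AZURE_TTS_KEY", "AZURE_TTS_ENDPOINT", "AZURE_TTS_VOICE"];

// Assumption: environment variables take precedence over .env values.
function resolveTtsSettings(env, dotEnvText = "") {
  const fromFile = parseDotEnv(dotEnvText);
  const settings = {};
  const missing = [];
  for (const name of REQUIRED) {
    const value = env[name] || fromFile[name];
    if (value) settings[name] = value;
    else missing.push(name);
  }
  if (missing.length > 0) {
    throw new Error(`Missing TTS settings: ${missing.join(", ")}`);
  }
  return settings;
}
```

In a real script you would pass process.env and the contents of the .env file (if it exists) to resolveTtsSettings, and fail early when a setting is missing instead of sending an unauthenticated request.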

Additionally, the PowerShell script imports a HookPayloadHelpers module that acts as the middleware. Because PostToolUse fires after every tool invocation, the script must check whether summary.json was created or updated. The module reads the hook payload, normalises the tool name and input across all common write tools (editFiles, createFile, apply_patch, etc.), and checks whether summary.json was the target. Any other tool call causes an immediate exit with no side effects.

Link to the PowerShell script and the helper module: https://github.com/johnlokerse/changelog-video-tts-generator/tree/main/.github/hooks/scripts

In this section, I will explain the three stages that make up the full video and voice-over generation pipeline. Each stage builds on the previous one: GitHub Copilot gathers and structures the changelog data, a hook detects when the data is ready, and a PowerShell script handles the audio synthesis and video rendering automatically.

Before you start, make sure you have the following:

  • Node.js for Remotion and scaffolding the Remotion project
  • Microsoft Foundry instance with access to the MAI-Voice-1 model. Use the variables AZURE_TTS_KEY, AZURE_TTS_ENDPOINT, and AZURE_TTS_VOICE for the key, endpoint, and voice name, stored in a .env file or as environment variables.
  • afinfo (macOS) or ffprobe (other platforms) for audio duration measurement

Stage 1:

  1. A user calls the /generate-video skill, which orchestrates the other skills: fetching-bicep-changelog retrieves the changelog content from GitHub, fetching-youtube-video retrieves the YouTube video information tied to a specific Bicep release, and summarizing-bicep-update summarises the changes retrieved by the previous skills.
  2. The summarizing-bicep-update skill aggregates the gathered information into a summary.json file containing the highlights, headline, breaking changes, version, and release date.

Stage 2 and 3:

  1. When the PostToolUse GitHub Copilot hook fires, a PowerShell script calls the Microsoft Foundry text-to-speech (TTS) API and uses the MAI-Voice-1 model to generate the audio. The script also creates an audio manifest so the audio can be aligned with the video.
  2. Microsoft Foundry returns the MP3 files based on the content in summary.json.
  3. Finally, Remotion scaffolds and renders the video, including the generated audio.

Let’s take a look at each stage in more depth.

The first stage is handled entirely by GitHub Copilot. It uses four Agent Skills that together form the changelog video pipeline. Once the skills are installed, you can trigger the full workflow by calling the generate-video skill in GitHub Copilot Agent mode:

/generate-video Use the following changelog from GitHub: https://github.com/Azure/bicep/releases/tag/v0.41.2

You can also pass a YouTube community call URL alongside the GitHub release URL, or omit it when no community call video exists for that release.

GitHub Copilot will then:

  1. Fetch the release notes for the specified Bicep version from GitHub.
  2. Optionally fetch community call metadata from YouTube, such as the title, description, and thumbnail, when a URL is provided.
  3. Enrich each highlight by fetching the referenced GitHub pull request or issue, creating more accurate narration.
  4. Summarise the release into a structured summary.json file containing the version, headline, highlights, breaking changes, and a narration field for each highlight.
  5. Write summary.json to the repository root.

The moment summary.json is written, the PostToolUse hook fires automatically.

Stage output

  • summary.json in the repo root, containing version, headline, highlights (up to 6 with narration), breaking changes, release date, and optional YouTube metadata.
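Because everything downstream depends on this file, it can help to sanity-check it before any audio is generated. The sketch below is only an illustration in JavaScript: the field names (version, headline, highlights, narration) are assumed from the description above and may not match the actual summary.json, and validateSummary is a hypothetical helper, not part of the repository.

```javascript
// The post mentions "up to 6" highlights.
const MAX_HIGHLIGHTS = 6;

// Returns a list of human-readable problems; an empty list means the
// summary looks usable. Field names are assumptions, not the real schema.
function validateSummary(summary) {
  const problems = [];

  for (const field of ["version", "headline"]) {
    if (typeof summary[field] !== "string" || summary[field].trim() === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }

  if (!Array.isArray(summary.highlights) || summary.highlights.length === 0) {
    problems.push("highlights must be a non-empty array");
    return problems;
  }

  if (summary.highlights.length > MAX_HIGHLIGHTS) {
    problems.push(`highlights has more than ${MAX_HIGHLIGHTS} entries`);
  }

  // Every highlight needs narration text, otherwise there is nothing to speak.
  summary.highlights.forEach((highlight, index) => {
    const narration = highlight?.narration;
    if (typeof narration !== "string" || narration.trim() === "") {
      problems.push(`highlights[${index}].narration is missing`);
    }
  });

  return problems;
}
```

A check like this would let the pipeline stop with a clear message before spending time on speech synthesis and rendering.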

You can find an example summary.json file in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/blob/main/summary.json

The hook is configured in .github/hooks/video-pipeline.json. Whenever Copilot completes a tool call, VS Code executes the command defined under the PostToolUse event.

{
  "hooks": {
    "PostToolUse": [
      {
        "type": "command",
        "command": "pwsh -NoProfile -File .github/hooks/scripts/Invoke-VideoPipeline.ps1 -HookMode",
        "cwd": ".",
        "timeout": 1800
      }
    ]
  }
}

When invoked with -HookMode, Invoke-VideoPipeline.ps1 calls two helper functions from the HookPayloadHelpers module: Read-HookPayload reads the PostToolUse payload, and Test-ShouldRunPipeline verifies that a write-type tool targeted summary.json. If not, the script exits immediately.

This means the hook fires after every tool use, but the video and voice generation pipeline only runs when the right file has been written: summary.json.

An example of a hook payload:

{
  "timestamp": "2026-04-21T08:28:53.253Z",
  "hook_event_name": "PostToolUse",
  "session_id": "<session id guid>",
  "transcript_path": "/Users/<user>/Library/Application Support/Code/User/workspaceStorage/<id>/GitHub.copilot-chat/transcripts/<id>.jsonl",
  "tool_name": "create_file",
  "tool_input": "...",
  "tool_response": "",
  "tool_use_id": "toolu_vrtx_01Tue5PksWNk5eSmAvMJphY3__vscode-1776759846716",
  "cwd": "/Users/<user>/Documents/Repositories/changelog-video-tts-generator"
}
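The filtering logic described above could look roughly like the following. This is a sketch in JavaScript, not the HookPayloadHelpers module itself (which is written in PowerShell). The function names are hypothetical. Because the shape of tool_input is not shown in the payload, the sketch searches its serialised form for the file name instead of relying on a specific field.

```javascript
// Write-type tools, in normalised form. These three are named in the post;
// a real implementation may recognise more.
const WRITE_TOOLS = new Set(["editfiles", "createfile", "applypatch"]);

// "create_file", "createFile" and "CreateFile" all become "createfile".
function normaliseToolName(name) {
  return String(name ?? "").toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Assumption: tool_input may be a string or an object of unknown shape,
// so look for "summary.json" as a whole file name anywhere in it.
function mentionsSummaryJson(toolInput) {
  const text =
    typeof toolInput === "string" ? toolInput : JSON.stringify(toolInput ?? "");
  return /(^|[\\\/"'\s])summary\.json(?=$|["'\s\\])/i.test(text);
}

// True only for a PostToolUse event where a write tool targeted summary.json.
function shouldRunPipeline(payload) {
  if (payload?.hook_event_name !== "PostToolUse") return false;
  if (!WRITE_TOOLS.has(normaliseToolName(payload.tool_name))) return false;
  return mentionsSummaryJson(payload.tool_input);
}
```

A hook script would read the payload JSON from standard input, call shouldRunPipeline, and exit immediately when it returns false. Matching on the serialised input is deliberately loose; if you know the exact field that holds the file path, checking that field is stricter.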

If the hook is triggered and the written file is not summary.json, the script logs the following:

2026-04-21 08:42:13.110 [info] [#116] [PostToolUse] Output:
2026-04-21T06:42:13.0656950Z [INFO] Skipped: tool 'read_file' did not write summary.json.

You can find an example of the GitHub Copilot hook, the PowerShell script, and the helper function in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/blob/main/.github/hooks

This stage is started by the GitHub Copilot hook and invokes Invoke-VideoPipeline.ps1 to perform three tasks in sequence: it synthesises per-scene MP3 audio from summary.json, scaffolds the Remotion project, and renders the finished video.

The script reads summary.json and builds a separate SSML document for each scene: intro, headline, one per highlight, an optional breaking-changes slide, and an outro. Each SSML block uses mstts:express-as with a media delivery style (via style="media") for a clear, broadcast-friendly tone:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-Jasper:MAI-Voice-1'>
    <mstts:express-as style="media">
      Hey everyone, and welcome to the latest Azure Bicep release update!
    </mstts:express-as>
  </voice>
</speak>

For this blog, I went with the persona en-US-Jasper:MAI-Voice-1 and the media expression style. The Microsoft Foundry documentation lists the available personas and expressions.

Link to the Build-Ssml function in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/blob/main/.github/hooks/scripts/Invoke-VideoPipeline.ps1#L139
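To illustrate how the per-scene SSML documents can be assembled, here is a hedged JavaScript sketch. It is not the Build-Ssml function from the repository. The summary field names (version, headline, highlights, narration, breakingChanges), the intro and outro wording, and the segment names breaking-changes and outro are all assumptions made for the example.

```javascript
// Escape characters that are special in XML text, so narration such as
// "params & outputs" cannot break the SSML document.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap one piece of narration in the SSML structure shown above.
function buildSsml(text, voice, style = "media") {
  return [
    "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'",
    "       xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>",
    `  <voice name='${voice}'>`,
    `    <mstts:express-as style="${style}">`,
    `      ${escapeXml(text)}`,
    "    </mstts:express-as>",
    "  </voice>",
    "</speak>",
  ].join("\n");
}

// One segment per scene: intro, headline, each highlight, an optional
// breaking-changes scene, and an outro. Field names are hypothetical.
function buildSegments(summary, voice) {
  const segments = [
    ["intro", `Welcome to the Azure Bicep ${summary.version} release update!`],
    ["headline", summary.headline],
    ...summary.highlights.map((h, i) => [`highlight-${i}`, h.narration]),
  ];
  if (summary.breakingChanges?.length > 0) {
    segments.push([
      "breaking-changes",
      `Breaking changes: ${summary.breakingChanges.join(". ")}`,
    ]);
  }
  segments.push(["outro", "Thanks for watching!"]);
  return segments.map(([name, text]) => ({ name, ssml: buildSsml(text, voice) }));
}
```

Keeping one SSML document per scene means each MP3 can be measured separately, which is what later allows every slide to match the length of its narration.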

The script sends each SSML document to the Azure Speech REST API and saves the generated audio as an MP3 file:

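function Invoke-Tts {
    param(
        [string]$Ssml,       # SSML document for a single scene
        [string]$OutputFile  # Path of the MP3 file to write
    )

    # Authenticate with the Speech resource key and request MP3 output.
    $headers = @{
        'Ocp-Apim-Subscription-Key' = $script:SubscriptionKey
        'Content-Type'              = 'application/ssml+xml'
        'X-Microsoft-OutputFormat'  = 'audio-48khz-192kbitrate-mono-mp3'
        'User-Agent'                = 'MAIVoiceDemo'
    }

    # POST the SSML and write the returned audio straight to disk.
    Invoke-WebRequest -Uri $script:Endpoint -Method Post -Headers $headers -Body $Ssml -OutFile $OutputFile
}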

After each MP3 file is saved, the script measures its actual duration using afinfo on macOS or ffprobe on other platforms. It then writes an audio/audio-manifest.json file containing the duration in seconds and frames. The output of audio-manifest.json looks like this:

{
  "intro": {
    "file": "audio/intro.mp3",
    "durationSeconds": 12.17,
    "durationFrames": 396
  },
  "headline": {
    "file": "audio/headline.mp3",
    "durationSeconds": 10.13,
    "durationFrames": 334
  },
  "highlight-0": {
    "file": "audio/highlight-0.mp3",
    "durationSeconds": 8.30,
    "durationFrames": 280
  }
}
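The conversion from a measured duration to a manifest entry can be sketched as follows. This is an illustrative JavaScript version, not the logic from Invoke-VideoPipeline.ps1. The frame rate of 30 fps and the 30 padding frames are assumptions. With these defaults the formula reproduces the intro and headline frame counts in the sample above, but that may be a coincidence and the script may compute frames differently.

```javascript
// ffprobe prints the duration in seconds as a bare number when invoked as:
//   ffprobe -v error -show_entries format=duration \
//           -of default=noprint_wrappers=1:nokey=1 intro.mp3
function parseFfprobeDuration(stdout) {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`Unexpected ffprobe output: ${JSON.stringify(stdout)}`);
  }
  return seconds;
}

// Build one manifest entry. fps and paddingFrames are assumed values:
// the padding leaves a short pause after each narration clip.
function toManifestEntry(file, durationSeconds, fps = 30, paddingFrames = 30) {
  return {
    file,
    durationSeconds: Math.round(durationSeconds * 100) / 100,
    // Round up so a slide is never shorter than its audio.
    durationFrames: Math.ceil(durationSeconds * fps) + paddingFrames,
  };
}
```

Rounding up matters more than the exact padding: if a scene is even one frame shorter than its clip, the narration gets cut off at the transition.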

The audio files and manifest are saved in the audio folder:

MP3 files and a manifest file saved in the audio folder

You can find these examples in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/tree/main/audio

Once the audio is ready, the script builds the video. It first runs a scaffold step that takes summary.json and the audio manifest, then generates a ready-to-render Remotion project in the bicep-video/ folder.

The scaffold script does a few things:

  • Generates the Remotion source files: it writes the React component files that make up each slide (intro, headline, highlights, outro), populating them with the content from summary.json.
  • Copies the audio files: it moves each per-scene MP3 into bicep-video/public/audio/ so Remotion can serve them as static assets during the render.
  • Sets scene durations: it reads durationFrames from the audio manifest and uses those values to size each slide so it matches the length of its narration clip exactly.
  • Styling: additionally, the scaffold handles the styling of the video.
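The scene-duration step above amounts to laying the scenes end to end on a single timeline. The following JavaScript sketch shows the idea. It is not taken from scaffold.js, and layoutScenes and the scene order are hypothetical. The resulting from and durationInFrames values correspond to the props of Remotion's Sequence component.

```javascript
// Place scenes back to back: each scene starts where the previous one ends
// and lasts exactly as long as its narration clip (in frames).
function layoutScenes(manifest, order) {
  let from = 0;
  const scenes = [];
  for (const name of order) {
    const entry = manifest[name];
    if (!entry) continue; // optional scene (e.g. breaking changes) not present
    scenes.push({
      name,
      from,
      durationInFrames: entry.durationFrames,
      audio: entry.file,
    });
    from += entry.durationFrames;
  }
  // totalFrames is the length of the whole composition.
  return { scenes, totalFrames: from };
}

// Example using the three entries from the sample manifest.
const sampleManifest = {
  intro: { file: "audio/intro.mp3", durationSeconds: 12.17, durationFrames: 396 },
  headline: { file: "audio/headline.mp3", durationSeconds: 10.13, durationFrames: 334 },
  "highlight-0": { file: "audio/highlight-0.mp3", durationSeconds: 8.3, durationFrames: 280 },
};
const timeline = layoutScenes(sampleManifest, [
  "intro", "headline", "highlight-0", "breaking-changes", "outro",
]);
console.log(timeline.totalFrames); // 1010
```

Deriving the offsets from the manifest instead of hardcoding them means the video automatically stretches or shrinks when the narration changes.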

It then installs the project’s dependencies and starts the render. The steps are essentially equivalent to the following commands, although in the pipeline they are triggered via scaffold.js:

# Scaffold
node remotion-scaffold/scaffold.js `
  --input summary.json `
  --output bicep-video/ `
  --audio audio/audio-manifest.json

# Install and render
cd bicep-video
npm install --prefer-offline --no-audit --no-fund
npx remotion render BicepUpdate --output output/bicep-update-v0.41.2.mp4

The scaffold was generated beforehand by GitHub Copilot. You can find the scaffold example in my GitHub repository: https://github.com/johnlokerse/changelog-video-tts-generator/blob/main/remotion-scaffold/scaffold.js

Stage output

  1. Per-scene MP3 files, for example intro.mp3, headline.mp3, and highlight-0.mp3
  2. audio-manifest.json, which maps each scene name to its file path, duration in seconds, and duration in frames
  3. Final MP4 video (e.g. output/bicep-update-v0.41.2.mp4), rendered via Remotion from the scaffold

After running the full pipeline, the output is a generated changelog video that includes the voice-over.

You might wonder why the audio and render steps are handled by a hook instead of additional Agent Skills. The key difference is determinism.

A skill can instruct Copilot to call a script, but the agent still decides when and how to use it. A hook is deterministic because it executes your code at a defined lifecycle point, regardless of how the agent was prompted.

In short, a skill helps Copilot decide what to do. A hook guarantees that your action/command/script runs when a specific lifecycle event happens.

At the moment, the interface is quite limited, and there is no easy way to see when a hook has fired apart from a message in the Copilot chat window. However, you can view the log trail of fired hooks in VS Code, including the executed events and payloads.

In VS Code, open the Output view and select GitHub Copilot Chat Hooks from the drop-down list to see the hook logs. These are live logs, meaning the Output view is updated after every Copilot call.

GitHub Copilot hooks output logs

In the image below, you can see the logs of the hook I created. First, the hook is executed. The payload then shows the tool name create_file, which means the script should continue. After that, you can see the logs for the voice-over generation, scaffolding, and rendering of the Remotion video.

GitHub Copilot hook execution log

GitHub Copilot Hooks in VS Code are in preview, and there are a few limitations I would like to see improved:

  • Trigger hooks from Agent Skills
    It is currently not possible to trigger hooks directly from an invoked Agent Skill. This would make it easier to chain skills and hooks together in more advanced automation scenarios.
  • Native file-change conditions
    Today, you need to build your own middleware to check whether specific files have been created, modified, or deleted. It would be useful if hooks could support file-based conditions out of the box.
  • Better visibility and debugging experience
    The current user experience makes it difficult to see when hooks are triggered, how they are triggered, and what output they produce. A dedicated hooks view, structured logs, or clearer output in the chat window would make using GitHub Copilot Hooks much easier.

This is what you currently see in the chat window that indicates GitHub Copilot triggered a hook:

Hooks trigger indication in GitHub Copilot chat

This is how you can combine GitHub Copilot Agent Skills, a PostToolUse hook in VS Code, Azure Speech, and Remotion to fully automate narrated Azure Bicep changelog videos.

This was a fun project to set up, and it shows the power of hooks in GitHub Copilot. Hooks can help you save tokens by moving deterministic work out of the agent flow, while giving you more certainty that scripts run when specific lifecycle events happen.

The examples in this blog are entirely focused on generating Azure Bicep changelog videos, but the skills and the scaffold can be reused for any other repository you want to generate changelog videos for.
