First Generation

Start with a short verification render before attempting a long audio-driven clip.

Suggested baseline:

  • image mode enabled
  • short clean reference audio
  • 960x544 or 480x832
  • 5 seconds
  • fixed seed

Workflow import

Use the archived App workflow:

Keep the notebook as a dependency reference:

Input checklist

  • prompt text is set
  • reference image is loaded
  • reference audio is loaded
  • the model files listed in the setup guide exist on the remote GPU machine
  • USE ONLY VOCALS is set deliberately, not left at whatever value it last had

What to validate

  • the MP4 is created
  • lip-sync and facial framing are plausible
  • duration and aspect ratio are close to the requested values
  • the remote GPU machine stays stable under the chosen settings
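The duration and frame-size checks can be scripted. The following is a minimal sketch, assuming ffprobe (from FFmpeg) is installed where the MP4 lives. The function names, the 960x544 default, and the 0.5 second tolerance are illustrative assumptions, not values from the workflow.

```python
import json
import subprocess


def parse_probe(info):
    """Extract (width, height, duration_seconds) from ffprobe's JSON output."""
    stream = info["streams"][0]
    return int(stream["width"]), int(stream["height"]), float(info["format"]["duration"])


def probe_video(path):
    """Run ffprobe on the first video stream of `path` and parse the result."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_probe(json.loads(out))


def matches_baseline(width, height, duration,
                     expected_size=(960, 544),   # assumed baseline resolution
                     expected_seconds=5.0,       # baseline duration from this page
                     tolerance_seconds=0.5):     # assumed tolerance, adjust as needed
    """True when the frame size is exact and the duration is within tolerance."""
    same_size = (width, height) == tuple(expected_size)
    close_duration = abs(duration - expected_seconds) <= tolerance_seconds
    return same_size and close_duration
```

Typical use would be `matches_baseline(*probe_video("output.mp4"))`, where `output.mp4` is a placeholder path. Lip-sync and framing still need a visual check.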

Comparison rule

If the batch is meant to compare models:

  • keep prompt fixed
  • keep image fixed
  • keep audio fixed
  • keep duration fixed
  • change only one or two variables at a time
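The rule above can be enforced before a batch is queued. This is a minimal sketch, assuming each run is described by a flat dict of settings. The key names (`prompt`, `image`, `audio`, `duration`) and the function names are hypothetical and should be mapped to whatever the batch actually records.

```python
_MISSING = object()

# Settings that must stay identical across a comparison batch (assumed key names).
FIXED_KEYS = ("prompt", "image", "audio", "duration")


def changed_keys(baseline, variant):
    """Return the sorted list of setting names whose values differ."""
    keys = set(baseline) | set(variant)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != variant.get(k, _MISSING)
    )


def check_comparison(baseline, variant, fixed=FIXED_KEYS, max_changes=2):
    """Raise ValueError if a fixed input changed or too many variables changed."""
    changed = changed_keys(baseline, variant)
    broken = [k for k in changed if k in fixed]
    if broken:
        raise ValueError(f"fixed inputs changed: {broken}")
    if len(changed) > max_changes:
        raise ValueError(f"too many variables changed: {changed}")
    return changed
```

Calling `check_comparison(baseline, variant)` for every variant in the batch returns the changed variables, which also makes a convenient label for the output file.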

If the API-format prompt is already committed, submit it directly to /prompt instead of exporting it from the UI again.
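Direct submission might look like the sketch below. It assumes a ComfyUI-style server that accepts a POST to /prompt with a JSON body of the form {"prompt": ...}. The base URL and the function names are assumptions, so substitute the address of the remote GPU machine.

```python
import json
import urllib.request


def build_prompt_request(api_prompt, base_url="http://127.0.0.1:8188"):
    """Build a POST request for /prompt. The default base_url is an assumption."""
    body = json.dumps({"prompt": api_prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_prompt(api_prompt, base_url="http://127.0.0.1:8188"):
    """Send the request and return the server's decoded JSON response."""
    request = build_prompt_request(api_prompt, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Here `api_prompt` is the dict loaded from the committed API prompt file with `json.load`. Keeping request construction separate from sending makes the payload easy to inspect before anything is queued.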

Output Retrieval Rule

  • After pulling outputs back from the remote machine, compare the local file size with the recorded remote size before treating the file as valid.
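The size comparison is a one-line check once the remote size has been recorded. This is a minimal sketch; the function name is hypothetical, and it assumes the remote size was captured in bytes (for example with `stat -c %s` on a Linux host).

```python
import os


def size_matches(local_path, recorded_remote_bytes):
    """True when the local file is non-empty and its byte size equals the recorded remote size."""
    local_bytes = os.path.getsize(local_path)
    return local_bytes > 0 and local_bytes == recorded_remote_bytes
```

A mismatch most likely means the transfer was interrupted, so pull the file again rather than trying to play it.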

Released under the MIT License.