First Generation
Recommended first run
Start with a short verification render before attempting a long audio-driven clip.
Suggested baseline:
- image mode enabled
- short clean reference audio
- 960x544 or 480x832
- 5 seconds
- fixed seed
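The baseline above can be written down once so every verification render starts from the same values. This is only a sketch: the key names and the seed value are hypothetical and do not correspond to actual workflow fields.

```python
# Hypothetical record of the suggested baseline; key names are illustrative only.
ALLOWED_RESOLUTIONS = [(960, 544), (480, 832)]

BASELINE = {
    "image_mode": True,
    "resolution": (960, 544),   # or (480, 832)
    "duration_seconds": 5,
    "seed": 12345,              # arbitrary; what matters is that it stays fixed
}


def is_baseline(settings):
    """Return True if the settings match the suggested first-run baseline."""
    return (
        settings.get("image_mode") is True
        and settings.get("resolution") in ALLOWED_RESOLUTIONS
        and settings.get("duration_seconds") == 5
        and settings.get("seed") is not None
    )
```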
Workflow import
Use the archived App workflow:
Keep the notebook as a dependency reference:
Input checklist
- prompt text is set
- reference image is loaded
- reference audio is loaded
- the model files listed in the setup guide exist on the remote GPU machine
- USE ONLY VOCALS is chosen intentionally
What to validate
- the MP4 is created
- lip-sync and facial framing are plausible
- duration and aspect ratio match expectations closely enough
- the remote GPU machine stays stable under the chosen settings
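The duration and aspect-ratio checks can be partly automated. The sketch below assumes ffprobe (part of FFmpeg) is installed on the machine that holds the MP4. The function names and tolerance values are assumptions, not part of the workflow. Lip-sync and framing still need a visual check.

```python
import json
import subprocess


def probe_video(path):
    """Return (width, height, duration_seconds) as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    info = json.loads(result.stdout)
    stream = info["streams"][0]
    return stream["width"], stream["height"], float(info["format"]["duration"])


def matches_expectations(width, height, duration_s, expected_size,
                         expected_duration_s, duration_tol_s=0.5, aspect_tol=0.02):
    """Compare measured values with the requested ones.

    The tolerances are assumed defaults; adjust them to what counts as
    "close enough" for the batch.
    """
    expected_w, expected_h = expected_size
    aspect_ok = abs(width / height - expected_w / expected_h) <= aspect_tol
    duration_ok = abs(duration_s - expected_duration_s) <= duration_tol_s
    return aspect_ok and duration_ok
```

A typical call would be `matches_expectations(*probe_video("output.mp4"), (960, 544), 5.0)`, where `output.mp4` is a placeholder file name.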
Comparison rule
If the batch is about model comparison:
- keep prompt fixed
- keep image fixed
- keep audio fixed
- keep duration fixed
- change only one or two variables at a time
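One way to enforce the rule before launching a batch is to diff each run's settings against the baseline run. This is a minimal sketch, assuming run settings are kept as plain dictionaries; the key names are hypothetical.

```python
# Inputs that must stay identical across a comparison batch (hypothetical key names).
FIXED_KEYS = ("prompt", "image", "audio", "duration_seconds")

_MISSING = object()


def changed_variables(baseline, candidate):
    """Return the sorted list of keys whose values differ between two runs."""
    keys = set(baseline) | set(candidate)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != candidate.get(k, _MISSING)
    )


def is_fair_comparison(baseline, candidate, max_changes=2):
    """True if no fixed input changed and at most max_changes settings differ."""
    changed = changed_variables(baseline, candidate)
    touches_fixed = any(k in FIXED_KEYS for k in changed)
    return not touches_fixed and len(changed) <= max_changes
```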
If the API prompt is already committed, submit it directly to /prompt instead of re-exporting it from the UI.
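Direct submission can be sketched as follows. This assumes a ComfyUI-style server where POST /prompt accepts a JSON body with the API-format workflow under a "prompt" key. The base URL, port, and function name are assumptions; substitute the values for the remote GPU machine.

```python
import json
import urllib.request


def build_prompt_request(api_prompt, base_url="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request for the /prompt endpoint.

    api_prompt is the committed API-format workflow, already loaded as a dict.
    base_url is an assumed default and must point at the actual server.
    """
    body = json.dumps({"prompt": api_prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it:
#   with urllib.request.urlopen(build_prompt_request(api_prompt)) as response:
#       print(response.read().decode("utf-8"))
```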
Output Retrieval Rule
- After pulling outputs back from the remote machine, compare the local file size with the recorded remote size before treating the file as valid.
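The size comparison is a one-line check once the remote size has been recorded in bytes. The sketch below assumes that recording exists; the function name is hypothetical.

```python
import os


def size_matches(local_path, recorded_remote_bytes):
    """Return True if the local file size equals the size recorded on the remote machine."""
    return os.path.getsize(local_path) == recorded_remote_bytes
```

A mismatch usually means the transfer was truncated, so the file should be pulled again rather than inspected.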