Notable points:
– All imagery, movement, and voices were generated by AI.
– The AI invented an alien language in about 10 seconds. We fed it lines, and it returned them in that language, complete with verb conjugations, tenses, etc.
– There’s no reason you couldn’t use the same tech to create a long-form piece.
– Lip syncing still needs to be improved, but with recent innovations, we’re only a few months away from public-facing lip sync tools that feel natural and can overlay on existing video. We used Wav2Lip, but other tools like lalamu are also very helpful.
– The music and sound effects came from a stock library. For the music, that was simply a time constraint; we could easily have used new tools like Stable Audio instead. Sound effects still need to come from a library at this point, because we haven’t found any good AI sound-effects tools.
– Each audio generation in ElevenLabs took about 2 iterations.
– Midjourney took about 20 renders per shot to get the right scene (with repeaters to save time).
– Pika took about 5 renders per shot. The camera controls made movement a lot easier to direct. Adding more motion than the default value seemed to work pretty well.
– We upscaled the video using Topaz Video AI.
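
To get a feel for what the iteration counts above add up to on a larger project, here is a rough back-of-the-envelope sketch in Python. The per-unit numbers are the approximate figures from these notes; the function name, the shot count, and the number of audio lines are hypothetical values chosen for illustration, not figures from our production.

```python
# Approximate iteration counts quoted in the notes above.
RENDERS_PER_UNIT = {
    "midjourney": 20,  # image renders per shot
    "pika": 5,         # video renders per shot
    "elevenlabs": 2,   # iterations per audio generation
}


def estimate_generations(num_shots: int, num_audio_lines: int) -> dict:
    """Return a rough count of generations per tool, plus a total.

    num_shots and num_audio_lines are hypothetical inputs; plug in
    the numbers for your own project.
    """
    if num_shots < 0 or num_audio_lines < 0:
        raise ValueError("counts must be non-negative")
    estimate = {
        "midjourney": num_shots * RENDERS_PER_UNIT["midjourney"],
        "pika": num_shots * RENDERS_PER_UNIT["pika"],
        "elevenlabs": num_audio_lines * RENDERS_PER_UNIT["elevenlabs"],
    }
    estimate["total"] = sum(estimate.values())
    return estimate


if __name__ == "__main__":
    # Example: an imaginary 10-shot piece with 6 lines of dialogue.
    print(estimate_generations(num_shots=10, num_audio_lines=6))
```

The counts scale linearly with the number of shots, so the image step dominates: at roughly 20 renders per shot, Midjourney accounts for most of the generations in any project of this shape.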