Bernini Video

Edit & Generate Video with Bernini Video: Free & Open Source

The open source AI video editor by ByteDance. Edit videos with text prompts, generate from images, and run locally — no subscription, no watermark, your data stays on your machine. Built on MLLM semantic planning and DiT rendering, released under Apache 2.0.

Free & Open Source · 12+ Video Edit Types · Runs Locally

The online generator lets you choose a model, prompt, aspect ratio, resolution, and duration. It supports clips of 3 to 15 seconds.

Everything You Can Do With Bernini Video

One unified model. Seven task types. Edit, generate, and compose — all from text prompts, images, or reference footage. No separate tools needed.

Video Editing (V2V) — 12+ Edit Types

Upload a source video, describe what you want to change in natural language, and Bernini Video applies the edit while preserving everything else. Change backgrounds, add or remove objects, transform styles, shift weather, adjust expressions, swap camera angles, add effects: 12+ editing operations in one model. Editing quality reaches the first tier, matching leading closed-source commercial video editors in blind human evaluation.

Reference-Guided Editing (RV2V)

Combine a source video with reference images to guide precise edits. Upload a reference image and tell Bernini what to pull from it — object appearance, material texture, background scene, art style, or weather atmosphere. The DiT renderer preserves fine VAE details from the source so unedited regions stay pixel-perfect. Get exactly the look you want without trial-and-error prompting.

Text to Video Generation (T2V)

Turn a text description into a video clip. The MLLM semantic planner reasons about composition, motion timing, and object relationships before the DiT renderer produces the frames — so complex, multi-part prompts come out more faithfully. Ideal for B-roll, concept visualizations, or starting footage you'll refine with Bernini's editing tools.

Reference-to-Video (R2V)

Upload up to five reference images — characters, outfits, backgrounds, props — and Bernini Video assembles them into a single coherent video with consistent details across every frame. Use references to lock in subject appearance, material palette, or visual style without relying on prompt engineering alone.

Content Insertion (VV2V)

Place a product shot onto a billboard. Put your logo on a screen. Composite one video into another. Bernini Video inserts images or video clips into existing footage with natural blending — ideal for product placement, branded content, and scene compositing without a separate VFX tool.

Text to Image & Image Editing (T2I/I2I)

Bernini Video handles still images too — generate from text prompts or edit existing images on a single GPU. The same semantic planning pipeline works across stills and motion, so you can concept in images first and graduate to video in the same tool.

Start Editing & Generating in 3 Steps

Use Bernini Video online with no GPU, no installation, and no setup, or run it locally on your own hardware. The choice is yours.

1. Describe what you want to create or edit

Enter a text prompt describing the video you want to generate or the edit you want to apply. For reference-based tasks, upload source images or video clips. Bernini Video reads text, image, and video inputs together — describe the change in plain English and let the MLLM planner figure out the rest.

2. Choose your task and generate

Pick from text-to-video, video editing, reference-to-video, or content insertion. The semantic planner works out the target scene, then the DiT renderer synthesizes the frames. Adjust your prompt and re-run for variations. Also works with ComfyUI through community integration for node-based workflows.

3. Download, use, and own your video

Generation completes in minutes depending on length and resolution. Download the result — no watermark, no usage restrictions. Use it for social media, marketing, client work, or creative projects. Commercial use is fully covered under Apache 2.0. Outputs belong to you.

Who Uses Bernini Video

From content creators to indie hackers to researchers — anyone who needs AI video editing and generation without the subscription trap.

AI Content Creators

You're already making AI-generated content and want to push beyond basic text-to-video. Bernini Video lets you edit existing footage with text prompts — change expressions, swap backgrounds, add objects — without re-rendering from scratch. One unified tool for both generation and editing instead of stitching together multiple paid services.

Open Source Developers & Indie Hackers

You want AI video capabilities in your product but can't justify $12–35/month per user for Runway or Pika APIs. Bernini Video is Apache 2.0 licensed — integrate it, modify it, deploy it. Zero per-use cost, no API rate limits, full control over the stack. Built on open foundations (Wan 2.2, Qwen2.5-VL).

Privacy-Conscious Video Professionals

You work with sensitive footage that can't leave your machine — client projects, internal communications, unreleased products. Bernini Video runs entirely locally. No cloud uploads, no third-party data processing, no privacy policy to worry about. Your data stays on your hardware.

AI Researchers & Students

You're working on video generation, editing, or multimodal AI and need a strong open source baseline. Bernini Video achieves SOTA on video editing benchmarks with a novel MLLM-planner + DiT-renderer architecture. Full code and weights available — reproduce, modify, and build on published research (arXiv 2605.22344).

ComfyUI Enthusiasts & Workflow Builders

You build custom AI pipelines in ComfyUI and want to add video editing to your node graph. The community has integrated Bernini Video nodes — chain video edits with other models in your existing setup. Drop Bernini into your workflow instead of learning yet another tool.

Why Bernini Video Over Closed-Source Alternatives

The open source AI video editor that replaces your Runway or Pika subscription — free, private, and fully customizable.

100% Free. No Subscription. No Watermark.

Closed-source tools charge $12–35/month per user and still cap your generations. Bernini Video is Apache 2.0 — download the weights, run the code, generate as much as your hardware allows. No credit card, no usage limits, no watermark on your output. Use it for commercial projects, client work, or product videos with zero licensing fees — forever.

Your Data Never Leaves Your Machine

Runway, Pika, and Kling process your videos on their cloud — which means your content, your client footage, and your unreleased projects sit on someone else's server. Bernini Video runs entirely locally. All inference happens on your own hardware. Work offline, protect sensitive footage, and stay compliant with no third-party data processing.

Smarter Edits Through Semantic Planning

Most video AI tools jump straight from prompt to pixels — which is why they struggle with complex instructions. Bernini Video inserts a semantic planning step: the MLLM reasons about composition, object relationships, and motion logic before any frame is rendered. The result: better instruction following on multi-part prompts, and stronger consistency during edits where unchanged regions must stay intact.

Why Bernini Video Costs Nothing — and Always Will

Open source under Apache 2.0 means zero subscription fees, full model access, and the freedom to run it wherever you want.

100% Free & Open Source

Apache 2.0 license. No monthly fees, no credit card, no usage caps, no watermark. Download the weights, run the code, modify it, deploy it — all at zero cost. Compare that to $12–35/month per seat for closed-source alternatives.

Runs Locally — Your Data, Your Machine

All inference happens on your own hardware. Your videos never touch a third-party server. Work offline, protect sensitive footage, and stay compliant — no cloud uploads, no data processing by anyone but you.

Built by ByteDance, Backed by Research

Developed and open-sourced by one of the world's leading AI research organizations. Published on arXiv (2605.22344) with reproducible benchmarks, open weights on Hugging Face, and SOTA results on video editing leaderboards.

Technical Highlights

The architecture and specs that power Bernini Video's editing and generation — research-backed, open source, and built for real workloads.

MLLM Planner + DiT Renderer

A two-stage architecture: the MLLM semantic planner (Qwen2.5-VL) reasons about composition, motion, and object relationships first, then the DiT renderer (Wan 2.2) synthesizes the actual video frames. This separation means the model thinks before it draws — producing better instruction following on complex, multi-part prompts.

SA-3D RoPE Encoding

Segment-Aware 3D RoPE positional encoding distinguishes tokens from different visual inputs — source video, reference images, and generated content stay cleanly separated throughout the diffusion process. Critical for editing tasks where unchanged regions must stay pixel-perfect.

480p–720p at up to 24fps

Configurable output from 480p/16fps to 720p/24fps. Video length configurable via frame count — typically 2 to 15 seconds per generation. Single GPU handles image tasks; 8 GPUs recommended for full-quality video inference.
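Since clip length is set by frame count, the quoted duration range translates to frames by multiplying seconds by frame rate. The sketch below shows only that arithmetic. `frames_for` is a hypothetical helper, not part of Bernini's code, and the real model may place further constraints on which frame counts are valid.

```python
def frames_for(seconds: float, fps: int) -> int:
    """Number of frames needed for a clip of the given length."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return round(seconds * fps)


# The two frame rates and the 2 to 15 second range quoted above.
for fps in (16, 24):
    shortest = frames_for(2, fps)
    longest = frames_for(15, fps)
    print(f"{fps} fps: {shortest} to {longest} frames")
```

At 16 fps the range works out to 32 to 240 frames, and at 24 fps to 48 to 360 frames, which is one reason the higher setting costs noticeably more compute per clip.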

7 Task Types, One Architecture

T2V, I2V, V2V, RV2V, R2V, Content Insertion (VV2V), and T2I/I2I — all seven tasks run through the same unified MLLM + DiT pipeline. No switching between separate models or tools for different jobs.
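One way to picture the task list is as a lookup from task name to the kinds of input each one needs. The table below is an illustrative assumption read off the feature descriptions on this page, not Bernini's actual interface; `REQUIRED_INPUTS` and `tasks_for` are hypothetical names. T2I and I2I are listed separately here even though the page counts them as one task type.

```python
# Input kinds each task needs, as described on this page (an assumption,
# not an official spec). VV2V may also take a video clip as the inserted
# content; "image" stands in for the inserted asset here.
REQUIRED_INPUTS = {
    "T2V": {"text"},
    "I2V": {"text", "image"},
    "V2V": {"text", "video"},
    "RV2V": {"text", "video", "image"},
    "R2V": {"text", "image"},
    "VV2V": {"video", "image"},
    "T2I": {"text"},
    "I2I": {"text", "image"},
}


def tasks_for(available: set) -> list:
    """Return the tasks whose required inputs are all available, sorted by name."""
    return sorted(
        task for task, needed in REQUIRED_INPUTS.items() if needed <= available
    )


print(tasks_for({"text"}))
print(tasks_for({"text", "video"}))
```

With only a text prompt, the text-driven generation tasks are available. Adding a source video brings in V2V, and adding reference images on top of that brings in the reference-guided tasks.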

What Is Bernini Video — and Why It Matters

Bernini Video is ByteDance's open source AI video editor and generator: a unified framework that handles video editing, text-to-video generation, reference-to-video creation, and content insertion in a single model. Most AI video tools force you to choose: generate from text with one tool, edit footage with another, animate from images with a third. Bernini Video does all of them in one architecture, and it's completely free under Apache 2.0.

Under the hood, an MLLM-based semantic planner (Qwen2.5-VL) reasons about the scene first, working out composition, object relationships, and motion logic, then a DiT-based renderer (Wan 2.2) turns that plan into actual video frames. This two-stage approach means the model thinks before it draws, producing better instruction following for complex prompts and stronger consistency during edits where unchanged regions must stay intact.

Built on Wan 2.2 as its video foundation, Bernini adds semantic-level understanding on top, so instead of just generating a video from text, it can reason about edits like 'change the background to a mountain scene' or 'make the character smile' while preserving everything else in the frame. Weights are on Hugging Face, code is on GitHub, and the paper is on arXiv (2605.22344, May 2026).

Edit videos with text prompts (12+ edit types), generate from text, and create from reference images — all in one unified model.

Two-stage architecture: MLLM semantic planner reasons about the scene first, DiT renderer produces frames second — the model thinks before it draws.

Apache 2.0 open source: completely free to use, modify, deploy commercially, and run locally — no subscription, no watermark, no vendor lock-in.

Free and Open Source — No Strings Attached

Bernini Video is Apache 2.0 licensed. Download the model, run it locally, modify the code, use it commercially, all free. Hosted online access is also available, with free trial credits to get started instantly and the paid credit plans below for higher volume.

Basic

$15.9/month

Unlock video and image generation. With 1,200 credits, generate up to about 600 basic images at 2 credits each.

1,200 credits included every month
Up to about 600 basic images at 2 credits per image
About 20 standard videos at 60 credits per video
Unlock advanced video and image models, including Kling, Veo, Seedance, LTX, Nano Banana, GPT Image 2, and more
Supports text-to-image, image-to-image, text-to-video, image-to-video, first/last-frame video, and motion control
Full commercial use rights included
24/7 customer support
No watermark on exported videos

Pro (Popular)

$29.9/month

For steady image and video production. With 3,000 credits, generate up to about 1,500 basic images at 2 credits each.

3,000 credits included every month
Up to about 1,500 basic images at 2 credits per image
About 50 standard videos at 60 credits per video
Unlock advanced video and image models, including Kling, Veo, Seedance, LTX, Nano Banana, GPT Image 2, and more
Supports text-to-image, image-to-image, text-to-video, image-to-video, first/last-frame video, and motion control
Full commercial use rights included
24/7 customer support
No watermark on exported videos

Max

$69.9/month

For teams and high-volume production. With 8,000 credits, generate up to about 4,000 basic images at 2 credits each.

8,000 credits included every month
Up to about 4,000 basic images at 2 credits per image
About 133 standard videos at 60 credits per video
Unlock advanced video and image models, including Kling, Veo, Seedance, LTX, Nano Banana, GPT Image 2, and more
Supports text-to-image, image-to-image, text-to-video, image-to-video, first/last-frame video, and motion control
Full commercial use rights included
24/7 customer support
No watermark on exported videos

Top up

Need more credits?

One-time purchase. Add credits anytime - works alongside any plan.

$9.9 for 600 credits

600 credits that unlock advanced models. Generate up to about 300 basic images at 2 credits each, or about 10 standard videos. Valid for 30 days. Credit packs also unlock advanced video and image generation; only the credit amount and validity differ.
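The image and video counts quoted for each plan follow from integer division of the credit allowance by the per-item cost. A minimal sketch of that arithmetic, using the figures from this page; the names `PLANS` and `capacity` are illustrative only.

```python
IMAGE_COST = 2   # credits per basic image
VIDEO_COST = 60  # credits per standard video

# Credit allowances listed on this page.
PLANS = {"Basic": 1200, "Pro": 3000, "Max": 8000, "Top-up pack": 600}


def capacity(credits: int) -> tuple:
    """Return (basic images, standard videos) if all credits go to one item type."""
    return credits // IMAGE_COST, credits // VIDEO_COST


for name, credits in PLANS.items():
    images, videos = capacity(credits)
    print(f"{name}: up to {images} images or {videos} videos")
```

The two counts are alternatives, not a bundle: every video generated uses credits that could otherwise pay for 30 basic images.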

Frequently Asked Questions

What is Bernini Video?

Bernini Video is ByteDance's open source AI video editor and generator, released under Apache 2.0. It combines an MLLM semantic planner with a DiT renderer to handle video editing (12+ edit types), text-to-video generation, reference-to-video creation, and content insertion — all in a single unified model. Think of it as a free, open source alternative to Runway or Pika that you can run on your own machine.

Is Bernini Video really free?

Yes, the model itself is completely free. Bernini Video is open source under the Apache 2.0 license. You can download the code and model weights, run them locally, modify them, and even use them commercially, all at zero cost. Running it yourself involves no monthly subscription, no per-generation fee, no credit card, and no watermark on your output. Only the optional hosted online service uses paid credits once the free trial credits run out.

How does Bernini Video compare to Runway?

Bernini Video is open source and free, while Runway costs $12–35/month. On video editing quality, Bernini reaches first-tier performance matching leading closed-source commercial models (based on blind human evaluation). Runway has a polished web UI and stronger raw text-to-video visual quality. Bernini offers stronger editing consistency, open weights, local deployment for data privacy, and zero licensing cost. The tradeoff: you give up some visual polish in pure generation in exchange for complete freedom, privacy, and no recurring fees.

What kind of video editing can Bernini Video do?

Bernini Video supports 12+ types of video edits: style transfer, background replacement, object addition and removal, weather changes, facial expression changes, camera angle adjustments, focus shifts, temporal reasoning edits (actions across time), character interaction changes, special effect overlays, material and texture swaps, and more — all controlled by natural language prompts or reference images.

Can I use Bernini Video without a GPU?

Yes — through hosted online services that run the model in the cloud, you can generate and edit video from any device with no GPU, no installation, and no setup. Self-hosting requires 16GB+ VRAM for image tasks and 8 GPUs for full-quality video inference, but you don't need any of that to get started online.

Does Bernini Video work with ComfyUI?

Yes. The ComfyUI community has integrated Bernini Video through a community PR. You can use Bernini nodes in your existing ComfyUI workflows alongside other models and tools — chain video edits, combine with upscalers, or build custom multi-model pipelines.

Is my video data safe with Bernini Video?

Yes, when you run it locally: your videos never leave your machine. Unlike cloud-based tools like Runway or Pika that require uploading your content to their servers, a local Bernini Video install runs entirely on your own hardware, with no uploads and no third-party data processing. You can even use it completely offline after the initial model download. If you use a hosted online service instead, the model runs in the cloud, so your content is uploaded for processing.

Can I use Bernini Video for commercial projects?

Yes. The Apache 2.0 license allows commercial use, modification, and distribution. You can integrate Bernini Video into commercial products, fine-tune it on your own data, use outputs for client work, social media, advertising, or product videos — all without licensing restrictions or royalty payments.

What's the difference between Bernini Video and Wan 2.2?

Bernini Video uses Wan 2.2 as its base video diffusion model but adds an MLLM-based semantic planner on top. Think of it this way: Wan 2.2 handles the pixel-level rendering, while Bernini's planner understands what you want semantically — so it can execute complex edits like 'change the background to a mountain scene but keep the lighting consistent' that pure text-to-video models struggle with. If you only need basic text-to-video, Wan 2.2 alone works. If you need editing precision and multi-modal inputs, you need Bernini.

Do I own the videos I create with Bernini Video?

Yes. Because Bernini Video is released under Apache 2.0, every output you generate belongs to you. Bernini does not add watermarks. Use your creations for commercial purposes — social media, advertising, client work, product videos — without restrictions from the model license.

What GPU or VRAM do I need to run Bernini Video locally?

For image tasks (T2I/I2I): a single GPU with 16GB+ VRAM works well. For video at 480p: a 16GB GPU with the distill LoRA gives reasonable speeds. Full-quality 480p/16fps video inference uses 8 GPUs (H100/H800 recommended). 720p is possible but significantly slower. Check the GitHub README for the latest hardware recommendations and the ComfyUI integration for easier local setup.
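The guidance above can be condensed into a small decision helper. This is a sketch under stated assumptions: `suggest_setup` is a hypothetical function, the thresholds are the ones quoted in this answer, and the 8-GPU branch ignores VRAM because no per-GPU figure is given here. Check the GitHub README before relying on any of it.

```python
def suggest_setup(gpu_count: int, vram_gb: float) -> str:
    """Map available hardware to the workload this FAQ says it can handle."""
    if gpu_count >= 8:
        # The FAQ recommends H100/H800-class cards for this configuration.
        return "full-quality 480p/16fps video"
    if gpu_count >= 1 and vram_gb >= 16:
        return "image tasks, or 480p video with the distill LoRA"
    # Below 16GB of VRAM, the page points to hosted online access instead.
    return "hosted online service"


print(suggest_setup(gpu_count=1, vram_gb=24))
print(suggest_setup(gpu_count=8, vram_gb=80))
print(suggest_setup(gpu_count=0, vram_gb=0))
```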

Does Bernini Video add watermarks to output videos?

No. Bernini Video does not add any watermarks to generated or edited videos. Everything you create is clean output that you own fully.

Ready to Edit & Generate Video for Free?

Start using Bernini Video today — open source, no subscription, no watermark. Edit videos with text prompts, generate from images, run locally. The free, open source alternative to Runway and Pika.
