Edit & Generate Video with Bernini Video
Free & Open Source
Supports clips from 3 to 15 seconds long.

Everything You Can Do With Bernini Video
One unified model. Seven task types. Edit, generate, and compose — all from text prompts, images, or reference footage. No separate tools needed.
Video Editing (V2V) — 12+ Edit Types
Upload a source video, describe what you want to change in natural language, and Bernini Video applies the edit while preserving everything else. Change backgrounds, add or remove objects, transform styles, shift weather, adjust expressions, swap camera angles, or add effects: 12+ editing operations in one model. It reaches first-tier quality, matching leading closed-source commercial video editors.
Reference-Guided Editing (RV2V)
Combine a source video with reference images to guide precise edits. Upload a reference image and tell Bernini what to pull from it — object appearance, material texture, background scene, art style, or weather atmosphere. The DiT renderer preserves fine VAE details from the source so unedited regions stay pixel-perfect. Get exactly the look you want without trial-and-error prompting.
Text to Video Generation (T2V)
Turn a text description into a video clip. The MLLM semantic planner reasons about composition, motion timing, and object relationships before the DiT renderer produces the frames — so complex, multi-part prompts come out more faithfully. Ideal for B-roll, concept visualizations, or starting footage you'll refine with Bernini's editing tools.
Reference-to-Video (R2V)
Upload up to five reference images — characters, outfits, backgrounds, props — and Bernini Video assembles them into a single coherent video with consistent details across every frame. Use references to lock in subject appearance, material palette, or visual style without relying on prompt engineering alone.
Content Insertion (VV2V)
Place a product shot onto a billboard. Put your logo on a screen. Composite one video into another. Bernini Video inserts images or video clips into existing footage with natural blending — ideal for product placement, branded content, and scene compositing without a separate VFX tool.
Text to Image & Image Editing (T2I/I2I)
Bernini Video handles still images too — generate from text prompts or edit existing images on a single GPU. The same semantic planning pipeline works across stills and motion, so you can concept in images first and graduate to video in the same tool.
Start Editing & Generating in 3 Steps
Use Bernini Video online with no GPU, no installation, and no setup, or run it locally on your own hardware. The choice is yours.
1. Describe what you want to create or edit
Enter a text prompt describing the video you want to generate or the edit you want to apply. For reference-based tasks, upload source images or video clips. Bernini Video reads text, image, and video inputs together — describe the change in plain English and let the MLLM planner figure out the rest.
2. Choose your task and generate
Pick from text-to-video, video editing, reference-to-video, or content insertion. The semantic planner works out the target scene, then the DiT renderer synthesizes the frames. Adjust your prompt and re-run for variations. Bernini Video also works with ComfyUI through a community integration for node-based workflows.
3. Download, use, and own your video
Generation completes in minutes depending on length and resolution. Download the result — no watermark, no usage restrictions. Use it for social media, marketing, client work, or creative projects. Commercial use is fully covered under Apache 2.0. Outputs belong to you.
Who Uses Bernini Video
From content creators to indie hackers to researchers — anyone who needs AI video editing and generation without the subscription trap.
AI Content Creators
You're already making AI-generated content and want to push beyond basic text-to-video. Bernini Video lets you edit existing footage with text prompts — change expressions, swap backgrounds, add objects — without re-rendering from scratch. One unified tool for both generation and editing instead of stitching together multiple paid services.
Open Source Developers & Indie Hackers
You want AI video capabilities in your product but can't justify $12–35/month per user for Runway or Pika APIs. Bernini Video is Apache 2.0 licensed — integrate it, modify it, deploy it. Zero per-use cost, no API rate limits, full control over the stack. Built on open foundations (Wan 2.2, Qwen2.5-VL).
Privacy-Conscious Video Professionals
You work with sensitive footage that can't leave your machine: client projects, internal communications, unreleased products. Run locally, Bernini Video needs no cloud uploads and no third-party data processing, so there is no external privacy policy to worry about. Your data stays on your hardware.
AI Researchers & Students
You're working on video generation, editing, or multimodal AI and need a strong open source baseline. Bernini Video achieves SOTA on video editing benchmarks with a novel MLLM-planner + DiT-renderer architecture. Full code and weights available — reproduce, modify, and build on published research (arXiv 2605.22344).
ComfyUI Enthusiasts & Workflow Builders
You build custom AI pipelines in ComfyUI and want to add video editing to your node graph. The community has integrated Bernini Video nodes — chain video edits with other models in your existing setup. Drop Bernini into your workflow instead of learning yet another tool.
Why Bernini Video Over Closed-Source Alternatives
The open source AI video editor that replaces your Runway or Pika subscription — free, private, and fully customizable.
100% Free. No Subscription. No Watermark.
Closed-source tools charge $12–35/month per user and still cap your generations. Bernini Video is Apache 2.0 — download the weights, run the code, generate as much as your hardware allows. No credit card, no usage limits, no watermark on your output. Use it for commercial projects, client work, or product videos with zero licensing fees — forever.
Your Data Never Leaves Your Machine
Runway, Pika, and Kling process your videos on their cloud, which means your content, your client footage, and your unreleased projects sit on someone else's server. Bernini Video can run entirely locally, with all inference happening on your own hardware. Work offline, protect sensitive footage, and stay compliant with no third-party data processing.
Smarter Edits Through Semantic Planning
Most video AI tools jump straight from prompt to pixels — which is why they struggle with complex instructions. Bernini Video inserts a semantic planning step: the MLLM reasons about composition, object relationships, and motion logic before any frame is rendered. The result: better instruction following on multi-part prompts, and stronger consistency during edits where unchanged regions must stay intact.
Why Bernini Video Costs Nothing — and Always Will
Open source under Apache 2.0 means zero subscription fees, full model access, and the freedom to run it wherever you want.
100% Free & Open Source
Apache 2.0 license. No monthly fees, no credit card, no usage caps, no watermark. Download the weights, run the code, modify it, deploy it — all at zero cost. Compare that to $12–35/month per seat for closed-source alternatives.
Runs Locally — Your Data, Your Machine
When you run Bernini Video locally, all inference happens on your own hardware and your videos never touch a third-party server. Work offline, protect sensitive footage, and stay compliant, with no cloud uploads and no data processing by anyone but you.
Built by ByteDance, Backed by Research
Developed and open-sourced by one of the world's leading AI research organizations. Published on arXiv (2605.22344) with reproducible benchmarks, open weights on Hugging Face, and SOTA results on video editing leaderboards.
Technical Highlights
The architecture and specs that power Bernini Video's editing and generation — research-backed, open source, and built for real workloads.
MLLM Planner + DiT Renderer
A two-stage architecture: the MLLM semantic planner (Qwen2.5-VL) reasons about composition, motion, and object relationships first, then the DiT renderer (Wan 2.2) synthesizes the actual video frames. This separation means the model thinks before it draws — producing better instruction following on complex, multi-part prompts.
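As a rough illustration of the plan-then-render split described above, the sketch below wires two stages together in Python. Every name in it (`ScenePlan`, `plan_scene`, `render_frames`, `run_pipeline`) is hypothetical and both stage bodies are placeholders; none of it is Bernini Video's actual API. It shows only the control flow: a structured plan is produced first, and the renderer consumes that plan rather than the raw prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenePlan:
    """Hypothetical structured output of the planning stage."""
    prompt: str
    instructions: List[str] = field(default_factory=list)


def plan_scene(prompt: str) -> ScenePlan:
    # Placeholder for the MLLM planner: split a multi-part prompt into
    # separate instructions. A real planner would reason about composition,
    # motion, and object relationships instead of splitting on commas.
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    return ScenePlan(prompt=prompt, instructions=parts)


def render_frames(plan: ScenePlan, num_frames: int) -> List[str]:
    # Placeholder for the DiT renderer: return one text label per frame so
    # the data flow is visible. A real renderer would return image data.
    summary = "; ".join(plan.instructions)
    return [f"frame {i}: {summary}" for i in range(num_frames)]


def run_pipeline(prompt: str, num_frames: int) -> List[str]:
    # Stage 1 plans, stage 2 renders from the plan.
    plan = plan_scene(prompt)
    return render_frames(plan, num_frames)


if __name__ == "__main__":
    prompt = "change the background to a mountain scene, make the character smile"
    for line in run_pipeline(prompt, 3):
        print(line)
```

Keeping the plan as its own object is the design point: the renderer never has to parse the prompt, so a multi-part instruction arrives already broken into pieces.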
SA-3D RoPE Encoding
Segment-Aware 3D RoPE positional encoding distinguishes tokens from different visual inputs — source video, reference images, and generated content stay cleanly separated throughout the diffusion process. Critical for editing tasks where unchanged regions must stay pixel-perfect.
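This page does not spell out how SA-3D RoPE is computed, so the following is only a conceptual sketch of what "segment-aware" 3D positions could look like: each token gets a (time, height, width) grid index plus a segment id naming the input it came from. The function name, the segment labels, and the choice to carry the segment as a separate id are all assumptions made for illustration, not the published method.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int, int, int]  # (segment_id, t, h, w)


def segment_positions(
    segments: Dict[str, Tuple[int, int, int]]
) -> Dict[str, List[Position]]:
    """Assign every token a 3D grid index tagged with its segment id.

    `segments` maps a segment name to its (frames, height, width) token grid.
    Segment ids follow the dict's insertion order (hypothetical convention).
    """
    out: Dict[str, List[Position]] = {}
    for seg_id, (name, (frames, height, width)) in enumerate(segments.items()):
        out[name] = [
            (seg_id, t, h, w)
            for t in range(frames)
            for h in range(height)
            for w in range(width)
        ]
    return out


if __name__ == "__main__":
    pos = segment_positions({
        "source_video": (2, 2, 2),
        "reference_image": (1, 2, 2),
        "generated": (2, 2, 2),
    })
    # Tokens at the same (t, h, w) in different segments stay distinguishable
    # because their segment ids differ.
    print(pos["source_video"][0], pos["generated"][0])
```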
480p–720p at up to 24fps
Output is configurable from 480p at 16 fps to 720p at 24 fps. Video length is set via frame count, typically 2 to 15 seconds per generation. A single GPU handles image tasks; 8 GPUs are recommended for full-quality video inference.
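Because length is set by frame count, converting a target duration is simple arithmetic: frames = seconds × fps. The helper below is a minimal sketch. `frame_count` and its `snap_4k_plus_1` option are hypothetical names, and the 4k+1 snapping reflects a constraint that some video diffusion models impose; it is assumed here, not confirmed for Bernini Video.

```python
def frame_count(seconds: float, fps: int, snap_4k_plus_1: bool = False) -> int:
    """Convert a clip duration to a frame count.

    With snap_4k_plus_1=True, the result is moved to the nearest value of
    the form 4k + 1 (an assumed model constraint, off by default).
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    frames = max(1, round(seconds * fps))
    if snap_4k_plus_1:
        # Integer round-half-up of (frames - 1) / 4, then rebuild 4k + 1.
        frames = 4 * ((frames - 1 + 2) // 4) + 1
    return frames


if __name__ == "__main__":
    for seconds, fps in [(2, 16), (5, 16), (15, 24)]:
        print(seconds, fps, frame_count(seconds, fps), frame_count(seconds, fps, True))
```

For example, 5 seconds at 16 fps is 80 frames, and 15 seconds at 24 fps is 360 frames.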
7 Task Types, One Architecture
T2V, I2V, V2V, RV2V, R2V, Content Insertion (VV2V), and T2I/I2I — all seven tasks run through the same unified MLLM + DiT pipeline. No switching between separate models or tools for different jobs.
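One way to picture seven tasks sharing one pipeline is a single entry point that checks which inputs each task expects. The table below is a hypothetical sketch inferred from the task descriptions on this page. The field names, the `validate_request` helper, and the I2V and T2I/I2I rows (which the page does not detail) are assumptions, not Bernini Video's actual interface.

```python
from typing import Dict, List

# Inputs each task expects, inferred from the descriptions on this page.
# The I2V and T2I/I2I rows are assumptions; the page does not detail them.
TASK_INPUTS: Dict[str, List[str]] = {
    "T2V": ["prompt"],
    "I2V": ["prompt", "image"],
    "V2V": ["prompt", "source_video"],
    "RV2V": ["prompt", "source_video", "reference_images"],
    "R2V": ["prompt", "reference_images"],
    "VV2V": ["source_video", "insert_media"],
    "T2I/I2I": ["prompt"],
}

MAX_REFERENCE_IMAGES = 5  # "up to five reference images" for R2V


def validate_request(task: str, inputs: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    if task not in TASK_INPUTS:
        return [f"unknown task: {task}"]
    problems = [
        f"missing input: {name}"
        for name in TASK_INPUTS[task]
        if name not in inputs
    ]
    refs = inputs.get("reference_images", [])
    if task == "R2V" and isinstance(refs, list) and len(refs) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {len(refs)} > {MAX_REFERENCE_IMAGES}"
        )
    return problems


if __name__ == "__main__":
    print(validate_request("V2V", {"prompt": "make the character smile"}))
    print(validate_request("R2V", {"prompt": "a walk", "reference_images": ["a.png"]}))
```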
What Is Bernini Video — and Why It Matters
Bernini Video is ByteDance's open source AI video editor and generator: a unified framework that handles video editing, text-to-video generation, reference-to-video creation, and content insertion in a single model. Most AI video tools force you to choose: generate from text with one tool, edit footage with another, animate from images with a third. Bernini Video does all of them in one architecture, and it's completely free under Apache 2.0.
Under the hood, an MLLM-based semantic planner (Qwen2.5-VL) reasons about the scene first, working out composition, object relationships, and motion logic. A DiT-based renderer (Wan 2.2) then turns that plan into actual video frames. This two-stage approach means the model thinks before it draws, producing better instruction following for complex prompts and stronger consistency during edits where unchanged regions must stay intact.
Built on Wan 2.2 as its video foundation, Bernini adds semantic-level understanding on top. Instead of just generating a video from text, it can reason about edits like 'change the background to a mountain scene' or 'make the character smile' while preserving everything else in the frame. Weights are on Hugging Face, code is on GitHub, and the paper is on arXiv (2605.22344, May 2026).
Free and Open Source — No Strings Attached
Bernini Video is Apache 2.0 licensed. Download the model, run it locally, modify the code, use it commercially — all free. Hosted online access is available with free trial credits to get started instantly.
Need more credits?
One-time purchase. Add credits anytime; they work alongside any plan.
Frequently Asked Questions
What is Bernini Video?
Is Bernini Video really free?
How does Bernini Video compare to Runway?
What kind of video editing can Bernini Video do?
Can I use Bernini Video without a GPU?
Does Bernini Video work with ComfyUI?
Is my video data safe with Bernini Video?
Can I use Bernini Video for commercial projects?
What's the difference between Bernini Video and Wan 2.2?
Do I own the videos I create with Bernini Video?
What GPU or VRAM do I need to run Bernini Video locally?
Does Bernini Video add watermarks to output videos?
Ready to Edit & Generate Video for Free?
Start using Bernini Video today — open source, no subscription, no watermark. Edit videos with text prompts, generate from images, run locally. The free, open source alternative to Runway and Pika.