Commit 9c176da (0 parents): 19 changed files with 4,242 additions and 0 deletions.
@@ -0,0 +1,87 @@
![Screenshot 2023-08-04 212706](https://github.com/Hillobar/Rope/assets/63615199/114b4073-9a25-42cc-844d-1afc3625907b)

# Rope
Rope implements the insightface inswapper_128 model with a helpful GUI.

### Discord link: ###
[Discord](https://discord.gg/5CxhgRKBdN)

### Features: ###
* Ugly GUI, but incredible features and a fast workflow
* Fastest face swapper available
* Real-time video player
* Occlusion functions

### Changes for Rope - Space Worm: ###
* Updated video rendering to use the Target Video parameters
* Mousewheel scroll on the time bar to control frame position
* Added an occluder model (experimental, very fast; make sure you download the new model linked below)
* Greatly increased performance for larger videos/multiple faces
* Fixed CLIP crashing. Add as many words as you like!
* Detachable video preview
* Fixed most bugs related to changing options while playing. Adjust settings on the fly!
* GFPGAN now renders up to 512x512
* Status bar (still adding features to this)

### Known bugs: ###
* Stop video playback before loading a new video, or it will bork

### Preview: ###
![Screenshot 2023-08-05 154156](https://github.com/Hillobar/Rope/assets/63615199/921698ab-af0e-43ca-b669-a2b2537d5c0f)

### Getting Started: ###
![Screenshot 2023-08-05 152851](https://github.com/Hillobar/Rope/assets/63615199/68b4ec4e-615f-4fd6-9215-f5a2ae8314b4)

### Features: ###
![Screenshot 2023-08-05 152835](https://github.com/Hillobar/Rope/assets/63615199/4e64237e-7d0f-4a83-a738-64b0df206766)

### Disclaimer: ###
Rope is a personal project that I'm making available to the community as a thank you to all of the contributors ahead of me. I don't have time to troubleshoot or add requested features, so it is provided as-is. Don't look to this code for examples of good coding practices; I am primarily focused on performance and my specific use cases. There are plenty of ways to bork the workflow, so please see how to use it below.
### Install: ###
Note: Rope is only configured for CUDA (Nvidia).
* Set up a local venv
  * `python.exe -m venv venv`
* Activate your new venv
  * `.\venv\Scripts\activate`
* Install requirements
  * `.\venv\Scripts\pip.exe install -r .\requirements.txt`
* Place [GFPGANv1.4.onnx](https://github.com/Hillobar/Rope/releases/download/Space_Worm/GFPGANv1.4.onnx), [inswapper_128_fp16.onnx](https://github.com/Hillobar/Rope/releases/download/Space_Worm/inswapper_128.fp16.onnx), and [occluder.ckpt](https://github.com/Hillobar/Rope/releases/download/Space_Worm/occluder.ckpt) in the root directory
* Do this if you've never installed roop or Rope (or any other ONNX runtime):
  * Install CUDA Toolkit 11.8
  * Install dependencies:
    * `pip uninstall onnxruntime onnxruntime-gpu`
    * `pip install onnxruntime-gpu==1.15.1`
* Double-click on Rope.bat! (The full sequence is also sketched as a single script below.)
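For convenience, the steps above can be collapsed into a single setup script. This is only an illustrative sketch, not a file shipped with Rope; it assumes Python and CUDA Toolkit 11.8 are already installed and that it is run from the Rope root directory.

```bat
:: setup.bat (hypothetical) - one-shot install following the steps listed above
python.exe -m venv venv
call .\venv\Scripts\activate

:: Project requirements
.\venv\Scripts\pip.exe install -r .\requirements.txt

:: Swap in the GPU build of the ONNX runtime (needed on a fresh machine)
.\venv\Scripts\pip.exe uninstall -y onnxruntime onnxruntime-gpu
.\venv\Scripts\pip.exe install onnxruntime-gpu==1.15.1

:: Reminder: download GFPGANv1.4.onnx, inswapper_128_fp16.onnx, and occluder.ckpt
:: from the release links above into this directory before launching
echo Setup complete. Place the three model files here, then run Rope.bat
pause
```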
### To use: ###
* Run Rope.bat
* Set your Target Video, Source Faces, and Video Output folders
  * Buttons will be gold if they are not set
  * Only place videos or images in the respective folders. Other files may bork it
  * Rope creates a JSON file to remember your last set paths
  * I like to keep my folders to around 20 items or fewer. It helps with organization and reduces load times
* Click on the Load Models button to initialize Rope
* Select a video to load it into the player
* Find Target Faces
  * Adds all faces in the current frame to the Found Faces pane
  * If a face has already been found and is in the pane, it won't be re-added
* Click a Source Face
  * The Source Face number will appear
* Select a Target Face
  * Target Faces show the number of the Source Face they are assigned to
  * Toggle a Target Face to unselect it and reassign it to the currently selected Source Face
* Continue to select other Source Faces and assign them to Target Faces
* Click SWAP to enable face swapping
* Click PLAY to play
* Click REC to arm recording
  * Click PLAY to start recording using the current settings
  * Click PLAY again to stop recording; otherwise it will record to the end of the Target Video
* Toggle GFPGAN and adjust the blending amount
* Toggle Diffing and adjust the blending amount
* Lower the threshold if you have multiple Source Faces assigned and they are jumping around. You can also try clearing and finding new Target Faces (disable SWAP first)
* Modify the Masking boundaries
* Use CLIP to identify objects to swap or not swap (e.g., Pos: face, head; Neg: hair, hand), adjust the gain of the words, and set the blur amount around the items (see the sketch after this list for how such prompts are tokenized by the bundled CLIP code)
* Change # threads to match your GPU memory (24GB ~9 threads with GFPGAN on; more threads without GFPGAN)
  * Start with the lowest number you think will run and watch your GPU memory
  * Once you allocate memory by increasing # threads, you can't un-allocate it by reducing # threads. You will need to restart Rope
* In general, always stop the video before changing anything; otherwise, it might bork. Reassigning faces is okay
* If it does bork, reload the video (re-click on it). If that doesn't work, you'll need to restart
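As a rough illustration of what happens to the positive and negative CLIP words under the hood, the sketch below tokenizes and encodes two prompt lists with the `load` and `tokenize` functions from the CLIP module included later in this commit. The import path, model name, and device handling are assumptions for illustration; this is not Rope's actual masking pipeline.

```python
# Hypothetical sketch: encode positive/negative CLIP prompts (not Rope's actual code)
import torch
from clip import load, tokenize  # module path assumed from this commit's package layout

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load("ViT-B/32", device=device)  # downloads to ~/.cache/clip on first use

pos_words = ["face", "head"]   # regions to swap
neg_words = ["hair", "hand"]   # regions to protect

with torch.no_grad():
    pos_tokens = tokenize(pos_words).to(device)   # shape: [2, 77]
    neg_tokens = tokenize(neg_words).to(device)
    pos_features = model.encode_text(pos_tokens)  # one embedding per word
    neg_features = model.encode_text(neg_tokens)

print(pos_features.shape, neg_features.shape)
```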
@@ -0,0 +1,3 @@
:: Activate the venv created during install, launch Rope, and keep the console window open on exit
call venv\Scripts\activate.bat
python run_working.py
pause
@@ -0,0 +1 @@
from .clip import *
@@ -0,0 +1,245 @@
import hashlib
import os
import urllib
import warnings
from typing import Any, Union, List
from pkg_resources import packaging

import torch
from PIL import Image
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
from tqdm import tqdm

from .model import build_model
from .simple_tokenizer import SimpleTokenizer as _Tokenizer

try:
    from torchvision.transforms import InterpolationMode
    BICUBIC = InterpolationMode.BICUBIC
except ImportError:
    BICUBIC = Image.BICUBIC


if packaging.version.parse(torch.__version__) < packaging.version.parse("1.7.1"):
    warnings.warn("PyTorch version 1.7.1 or higher is recommended")


__all__ = ["available_models", "load", "tokenize"]
_tokenizer = _Tokenizer()

_MODELS = {
    "RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
    "RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
    "RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
    "RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
    "RN50x64": "https://openaipublic.azureedge.net/clip/models/be1cfb55d75a9666199fb2206c106743da0f6468c9d327f3e0d0a543a9919d9c/RN50x64.pt",
    "ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
    "ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
    "ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
    "ViT-L/14@336px": "https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt",
}


def _download(url: str, root: str):
    os.makedirs(root, exist_ok=True)
    filename = os.path.basename(url)

    expected_sha256 = url.split("/")[-2]  # the checksum is embedded in the download URL
    download_target = os.path.join(root, filename)

    if os.path.exists(download_target) and not os.path.isfile(download_target):
        raise RuntimeError(f"{download_target} exists and is not a regular file")

    if os.path.isfile(download_target):
        if hashlib.sha256(open(download_target, "rb").read()).hexdigest() == expected_sha256:
            return download_target
        else:
            warnings.warn(f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file")

    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
        with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True, unit_divisor=1024) as loop:
            while True:
                buffer = source.read(8192)
                if not buffer:
                    break

                output.write(buffer)
                loop.update(len(buffer))

    if hashlib.sha256(open(download_target, "rb").read()).hexdigest() != expected_sha256:
        raise RuntimeError("Model has been downloaded but the SHA256 checksum does not match")

    return download_target


def _convert_image_to_rgb(image):
    return image.convert("RGB")


def _transform(n_px):
    return Compose([
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ])


def available_models() -> List[str]:
    """Returns the names of available CLIP models"""
    return list(_MODELS.keys())


def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit: bool = False, download_root: str = None):
    """Load a CLIP model
    Parameters
    ----------
    name : str
        A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
    device : Union[str, torch.device]
        The device to put the loaded model
    jit : bool
        Whether to load the optimized JIT model or more hackable non-JIT model (default).
    download_root: str
        path to download the model files; by default, it uses "~/.cache/clip"
    Returns
    -------
    model : torch.nn.Module
        The CLIP model
    preprocess : Callable[[PIL.Image], torch.Tensor]
        A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
    """
    if name in _MODELS:
        model_path = _download(_MODELS[name], download_root or os.path.expanduser("~/.cache/clip"))
    elif os.path.isfile(name):
        model_path = name
    else:
        raise RuntimeError(f"Model {name} not found; available models = {available_models()}")

    with open(model_path, 'rb') as opened_file:
        try:
            # loading JIT archive
            model = torch.jit.load(opened_file, map_location=device if jit else "cpu").eval()
            state_dict = None
        except RuntimeError:
            # loading saved state dict
            if jit:
                warnings.warn(f"File {model_path} is not a JIT archive. Loading as a state dict instead")
                jit = False
            state_dict = torch.load(opened_file, map_location="cpu")

    if not jit:
        model = build_model(state_dict or model.state_dict()).to(device)
        if str(device) == "cpu":
            model.float()
        return model, _transform(model.visual.input_resolution)

    # patch the device names
    device_holder = torch.jit.trace(lambda: torch.ones([]).to(torch.device(device)), example_inputs=[])
    device_node = [n for n in device_holder.graph.findAllNodes("prim::Constant") if "Device" in repr(n)][-1]

    def _node_get(node: torch._C.Node, key: str):
        """Gets attributes of a node which is polymorphic over return type.
        From https://github.com/pytorch/pytorch/pull/82628
        """
        sel = node.kindOf(key)
        return getattr(node, sel)(key)

    def patch_device(module):
        try:
            graphs = [module.graph] if hasattr(module, "graph") else []
        except RuntimeError:
            graphs = []

        if hasattr(module, "forward1"):
            graphs.append(module.forward1.graph)

        for graph in graphs:
            for node in graph.findAllNodes("prim::Constant"):
                if "value" in node.attributeNames() and str(_node_get(node, "value")).startswith("cuda"):
                    node.copyAttributes(device_node)

    model.apply(patch_device)
    patch_device(model.encode_image)
    patch_device(model.encode_text)

    # patch dtype to float32 on CPU
    if str(device) == "cpu":
        float_holder = torch.jit.trace(lambda: torch.ones([]).float(), example_inputs=[])
        float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
        float_node = float_input.node()

        def patch_float(module):
            try:
                graphs = [module.graph] if hasattr(module, "graph") else []
            except RuntimeError:
                graphs = []

            if hasattr(module, "forward1"):
                graphs.append(module.forward1.graph)

            for graph in graphs:
                for node in graph.findAllNodes("aten::to"):
                    inputs = list(node.inputs())
                    for i in [1, 2]:  # dtype can be the second or third argument to aten::to()
                        if _node_get(inputs[i].node(), "value") == 5:
                            inputs[i].node().copyAttributes(float_node)

        model.apply(patch_float)
        patch_float(model.encode_image)
        patch_float(model.encode_text)

        model.float()

    return model, _transform(model.input_resolution.item())


def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> Union[torch.IntTensor, torch.LongTensor]:
    """
    Returns the tokenized representation of given input string(s)
    Parameters
    ----------
    texts : Union[str, List[str]]
        An input string or a list of input strings to tokenize
    context_length : int
        The context length to use; all CLIP models use 77 as the context length
    truncate: bool
        Whether to truncate the text in case its encoding is longer than the context length
    Returns
    -------
    A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length].
    We return LongTensor when torch version is <1.8.0, since older index_select requires indices to be long.
    """
    if isinstance(texts, str):
        texts = [texts]

    sot_token = _tokenizer.encoder["<|startoftext|>"]
    eot_token = _tokenizer.encoder["<|endoftext|>"]
    all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
    if packaging.version.parse(torch.__version__) < packaging.version.parse("1.8.0"):
        result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
    else:
        result = torch.zeros(len(all_tokens), context_length, dtype=torch.int)

    for i, tokens in enumerate(all_tokens):
        if len(tokens) > context_length:
            if truncate:
                tokens = tokens[:context_length]
                tokens[-1] = eot_token
            else:
                raise RuntimeError(f"Input {texts[i]} is too long for context length {context_length}")
        result[i, :len(tokens)] = torch.tensor(tokens)

    return result
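A short usage sketch for the module above, showing how the returned `preprocess` transform and the model's image/text encoders fit together. The import path, image filename, and prompt strings are placeholders, and the forward call assumes the accompanying model.py matches upstream OpenAI CLIP.

```python
# Illustrative usage of load/tokenize from the file above (not part of this commit)
import torch
from PIL import Image
from clip import load, tokenize, available_models  # module path assumed from __init__.py

print(available_models())  # model names defined in _MODELS above

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image path
text = tokenize(["a face", "a hand"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)
print(probs)  # relative match of each prompt to the image
```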