
AI editing tool DragGAN takes off: a lifesaver for the clumsy, precise photo retouching by dragging

码上开花_lancer, posted on 2023/07/05 12:00:06


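🔹 This case requires the PyTorch-1.8 image on a GPU-P100 flavor or higher.

🔹 Click Run in ModelArts to open ModelArts CodeLab. You need to log in with a Huawei Cloud account. If you do not have one, register and complete real-name verification; the guide "ModelArts准备工作_简易版" (ModelArts Preparation, Simplified Edition) walks through both. After logging in, wait a moment for the CodeLab runtime to load.

🔹 If you see out of memory, check whether your parameter settings are too high. Lower them, restart the kernel, or switch to a higher-spec resource ❗❗❗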

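DragGAN is a new AI model created by researchers at Google, MIT, and the Max Planck Institute. With simple interactions such as clicking and dragging, it lets you change the pose, shape, and expression of the subject in a photo.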

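DragDiffusion has since come into view. DragGAN already achieved point-based interactive image editing with pixel-level precision, but it has a limitation: because it is built on a generative adversarial network (GAN), its generality is bounded by the capacity of the pretrained GAN model.

In the new work, researchers from the National University of Singapore and ByteDance extend this editing framework to diffusion models and propose DragDiffusion. By using a large-scale pretrained diffusion model, they greatly improve the applicability of point-based interactive editing to real-world scenes.

Most current diffusion-based image editing methods operate on text embeddings. DragDiffusion instead optimizes the diffusion latent, which gives it precise spatial control.

The researchers note that although diffusion models generate images iteratively, optimizing the diffusion latent at a single step is enough to produce coherent results, so DragDiffusion completes high-quality edits efficiently.

They ran extensive experiments on a range of challenging cases (multiple objects, different object categories) to verify DragDiffusion's flexibility and generality. The code is also due to be released soon.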

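Below are a few images generated with DragGAN. Take a look 👀

Now let's see how DragDiffusion performs.

First, we want the kitten in the image below to lift its head a little higher. The user only needs to drag the red point to the blue point:

Next, we want the mountain peak to be a bit taller. No problem, just drag the red key point:

To make the sculpture turn its head, a single drag does it:

To make the flowers on the shore spread over a wider area: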

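DragGAN does not move pixels directly. It edits the image in the latent space of a pretrained GAN (StyleGAN2 in the paper), so every intermediate result is still an image the generator can produce. The user marks handle points and target points, and DragGAN then alternates two steps. Motion supervision optimizes the latent code so that the generator features around each handle point shift a small step toward its target. Point tracking then relocates each handle point with a nearest-neighbour search in the generator's feature map, so the next step starts from the right place. The loop repeats until the handle points reach their targets. An optional mask marks the region that is allowed to change, so the position, shape, expression, and layout of objects can be adjusted while the rest of the image is largely preserved. No additional network has to be trained for this.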
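
As published, DragGAN follows each dragged point by searching the generator's feature map for the location whose feature best matches the point's original feature. The NumPy sketch below illustrates that nearest-neighbour search on a toy feature map. The function name `track_point`, the search radius, and the random features are assumptions made for illustration; this is not DragGAN's actual code.

```python
import numpy as np

def track_point(features, ref_feature, point, radius=3):
    """Return the (row, col) inside a square window around `point` whose
    feature vector is closest (L1 distance) to `ref_feature`.

    features:    array of shape (H, W, C), a feature map
    ref_feature: array of shape (C,), the handle point's original feature
    point:       (row, col), the current estimate of the handle point
    radius:      half-width of the search window (hypothetical default)
    """
    h, w, _ = features.shape
    r0, c0 = point
    # Clamp the search window to the feature-map bounds.
    top, bottom = max(r0 - radius, 0), min(r0 + radius + 1, h)
    left, right = max(c0 - radius, 0), min(c0 + radius + 1, w)
    window = features[top:bottom, left:right]            # (h', w', C)
    dist = np.abs(window - ref_feature).sum(axis=-1)     # (h', w')
    dr, dc = np.unravel_index(np.argmin(dist), dist.shape)
    return (top + int(dr), left + int(dc))

# Toy example: a random feature map whose content shifts 2 pixels to the right.
rng = np.random.default_rng(0)
feat_before = rng.normal(size=(16, 16, 8))
feat_after = np.roll(feat_before, shift=2, axis=1)

handle = (8, 5)
ref = feat_before[handle]          # feature of the handle point before the edit
new_handle = track_point(feat_after, ref, handle)
print(new_handle)                  # (8, 7)
```

Restricting the search to a small window keeps the lookup cheap and avoids jumping to a distant but similar-looking feature.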

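To edit a photo with the DragGAN AI tool, follow these steps:

1. Upload the image you want to modify.

2. Drag a point in the image to the desired position.

3. When you release the point, DragGAN automatically modifies the image to match your change.

4. Keep dragging points and adjust the image as needed.

5. When you are done, click the "Save" button to save the updated image.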
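
Behind steps 2 to 4, a drag is carried out as many small moves rather than one jump, and the number of moves is capped by an iteration budget. The sketch below shows only that control flow on plain coordinates. The names `step_toward` and `drag` and the default values are hypothetical, and in the real model the handle position after each step comes from the edited image rather than from simple arithmetic.

```python
import math

def step_toward(handle, target, step=1.0):
    """Move `handle` at most `step` pixels along the straight line to `target`."""
    dx, dy = target[0] - handle[0], target[1] - handle[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return target
    return (handle[0] + step * dx / dist, handle[1] + step * dy / dist)

def drag(handle, target, step=1.0, max_iters=20):
    """Take small steps until the handle reaches the target or the
    iteration budget runs out. Returns (final_position, steps_used)."""
    for i in range(max_iters):
        if handle == target:
            return handle, i
        handle = step_toward(handle, target, step)
    return handle, max_iters

pos, used = drag((0.0, 0.0), (3.0, 4.0), step=1.0, max_iters=20)
print(pos, used)                  # reaches (3.0, 4.0) well within the budget

short_pos, short_used = drag((0.0, 0.0), (3.0, 4.0), step=1.0, max_iters=2)
print(short_pos, short_used)      # stops early, still short of the target
```

The second call shows why a long drag can stop before reaching its target when the iteration budget is too small.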

Below is part of the core code, implemented in a Notebook on ModelArts. It is provided for research reference only.

1. Environment setup

Check the GPU and copy the code and data

!nvidia-smi
import os
import moxing as mox

parent = "/home/ma-user/work/draggan"
bfp = "/home/ma-user/work/draggan/openai/clip-vit-large-patch14/pytorch_model.bin"
sfp = "/home/ma-user/work/draggan/models/draggan_sd15_scribble.pth"
if not os.path.exists(parent):
    # First run: copy the whole package from OBS.
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/scribble2img/draggan', parent)
    if os.path.exists(parent):
        print('download success')
    else:
        raise Exception('download failed')
elif not os.path.exists(bfp) or os.path.getsize(bfp) != 1710671599:
    # CLIP weights are missing or incomplete: copy them again.
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/scribble2img/draggan/openai/clip-vit-large-patch14/pytorch_model.bin', bfp)
elif not os.path.exists(sfp) or os.path.getsize(sfp) != 5710757851:
    # Model checkpoint is missing or incomplete: copy it again.
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/scribble2img/draggan/models/draggan_sd15_scribble.pth', sfp)
else:
    print("model package already exists!")

Tue Apr 25 15:49:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:0E.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
INFO:root:Using MoXing-v2.1.0.5d9c87c8-5d9c87c8
INFO:root:Using OBS-Python-SDK-3.20.9.1
model package already exists!
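
The copy step decides whether to fetch a file again by checking that it exists and that its size in bytes matches a known value. That check can be isolated in a small helper, sketched here. The name `needs_download` is hypothetical, and the demonstration uses a temporary file rather than a real checkpoint.

```python
import os
import tempfile

def needs_download(path, expected_size):
    """True when `path` is missing or its size differs from `expected_size` bytes."""
    return not os.path.exists(path) or os.path.getsize(path) != expected_size

# Demonstration with a temporary file instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    missing = needs_download(demo, 4)       # file does not exist yet
    with open(demo, "wb") as f:
        f.write(b"abcd")
    complete = needs_download(demo, 4)      # size matches
    truncated = needs_download(demo, 5)     # size mismatch

print(missing, complete, truncated)         # True False True
```

A size comparison catches interrupted copies but not corrupted content; a checksum would be stricter. Note also that the `elif` chain in the cell above repairs at most one file per run, so if both files are incomplete the cell has to be run twice.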

Install the libraries. This takes about 1 minute, so please be patient.

%cd /home/ma-user/work/draggan
!pip uninstall torch torchtext -y
!pip install torch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 
!pip install omegaconf==2.1.1 einops==0.3.0
!pip install pytorch-lightning==1.5.0
!pip install transformers==4.19.2 open_clip_torch==2.0.2
!pip install gradio==3.32.0
!pip install translate==3.6.1

/home/ma-user/work/controlnet
Found existing installation: torch 1.8.0
Uninstalling torch-1.8.0:
  Successfully uninstalled torch-1.8.0
Found existing installation: torchtext 0.5.0
Uninstalling torchtext-0.5.0:
  Successfully uninstalled torchtext-0.5.0
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting torch==1.12.1
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/b9/af/23c13cd340cd333f42de225ba3da3b64e1a70425546d1a59bfa42d465a5d/torch-1.12.1-cp37-cp37m-manylinux1_x86_64.whl (776.3 MB)
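
Since the rest of the notebook depends on these exact pins, it can help to confirm them before loading the model. The sketch below is one possible check; the helper name `check_pins` is hypothetical. It uses `importlib.metadata`, which is in the standard library from Python 3.8. On the Python 3.7 environment shown in the logs it falls back to the `importlib_metadata` backport, which is assumed to be installed.

```python
try:
    from importlib import metadata              # Python 3.8+
except ImportError:                              # Python 3.7: use the backport
    import importlib_metadata as metadata

def check_pins(pins):
    """Return {package: (installed_or_None, wanted)} for every package whose
    installed version differs from the pinned one."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            # Drop a local suffix such as '+cu102' before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches

# Pins copied from the install commands above.
pins = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "pytorch-lightning": "1.5.0",
    "transformers": "4.19.2",
    "gradio": "3.32.0",
}
for name, (installed, wanted) in check_pins(pins).items():
    print(f"{name}: installed {installed}, expected {wanted}")
```

An empty result means every pinned package is present at the expected version.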

2. Load the model

Import the packages and load the model. Loading takes about 40 s, so please be patient.

import numpy as np
from PIL import Image as pilimage
import cv2
import einops
import matplotlib.pyplot as plt
from IPython.display import HTML, Image
from base64 import b64decode
from translate import Translator
import torch
from pytorch_lightning import seed_everything
import config
from draggan.model import create_model, load_state_dict
from ldm.models.diffusion.ddim import DDIMSampler
from annotator.util import resize_image, HWC3

# Build the model from its config, load the checkpoint, and move it to the GPU.
model = create_model('./models/draggan_v15.yaml')
model.load_state_dict(load_state_dict('./models/draggan_sd15_scribble.pth', location='cuda'))
model = model.cuda()
ddim_sampler = DDIMSampler(model)

INFO:matplotlib.font_manager:generated new fontManager
/home/ma-user/anaconda3/envs/pytorch-1.8/lib/python3.7/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (5.1.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  RequestsDependencyWarning)
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp383fe77g
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp383fe77g/_remote_module_non_scriptable.py
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.

3. Generate an image from adjustable parameters

The function that generates images from the adjusted parameters

import random

# The `def` line is missing from the source. This signature is reconstructed
# from the Gradio `ips` list used later; the UI does not pass `guess_mode`,
# so its default value here is an assumption.
def process(input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution,
            ddim_steps, strength, scale, seed, eta, guess_mode=False):
    with torch.no_grad():
        if isinstance(input_image, str):
            # A file path: load the image from disk.
            input_image = np.array(pilimage.open(input_image))
            img = resize_image(HWC3(input_image), image_resolution)
        else:
            # A Gradio sketch dict: the drawn mask is the scribble.
            img = resize_image(HWC3(input_image['mask'][:, :, 0]), image_resolution)
        h, w, c = img.shape
        # initialize detection map
        detected_map = np.zeros_like(img, dtype=np.uint8)
        detected_map[np.min(img, axis=2) > 127] = 255
        control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()
        # set random seed
        if seed == -1:
            seed = random.randint(0, 65535)
        seed_everything(seed)
        if config.save_memory:
            model.low_vram_shift(is_diffusing=False)
        cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
        un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
        shape = (4, h // 8, w // 8)
        if config.save_memory:
            model.low_vram_shift(is_diffusing=True)
        # sampling
        model.control_scales = [strength * (0.825 ** float(12 - i)) for i in range(13)] if guess_mode else ([strength] * 13)  # magic number.
        samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
                                                     shape, cond, verbose=False, eta=eta,
                                                     unconditional_guidance_scale=scale,
                                                     unconditional_conditioning=un_cond)
        if config.save_memory:
            model.low_vram_shift(is_diffusing=False)
        # post-processing: map the decoder output from [-1, 1] to uint8 [0, 255]
        x_samples = model.decode_first_stage(samples)
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)
        results = [x_samples[i] for i in range(num_samples)]
    # The inverted control map comes first, followed by the generated images.
    return [255 - detected_map] + results
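
Two small pieces of the function above can be tried without a GPU: the thresholding that turns the input into a binary control map, and the per-layer control strengths. The sketch below isolates them with NumPy only. The helper names `scribble_to_control` and `control_scales` are hypothetical; the threshold of 127, the 13 layers, and the 0.825 decay are taken from the code above.

```python
import numpy as np

def scribble_to_control(img):
    """Binarize an (H, W, 3) uint8 image: a pixel becomes 255 only when all
    three channels exceed 127, otherwise it stays 0."""
    control = np.zeros_like(img, dtype=np.uint8)
    control[np.min(img, axis=2) > 127] = 255
    return control

def control_scales(strength, guess_mode, n_layers=13, decay=0.825):
    """Per-layer control strengths: constant in normal mode, geometrically
    smaller for the earlier layers in guess mode."""
    if guess_mode:
        return [strength * decay ** float(n_layers - 1 - i) for i in range(n_layers)]
    return [strength] * n_layers

# One bright pixel and one pixel with a dark green channel.
img = np.array([[[200, 200, 200], [200, 50, 200]]], dtype=np.uint8)
print(scribble_to_control(img)[0].tolist())   # [[255, 255, 255], [0, 0, 0]]

scales = control_scales(1.0, guess_mode=True)
print(scales[0] < scales[-1], scales[-1])     # True 1.0
```

In guess mode only the last layer receives the full strength, and each earlier layer is scaled down by a further factor of 0.825.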

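4. Set the parameters and generate the image

➡ Parameter description

🔸 Model: 10 models are currently provided (each downloads automatically once selected in the web UI). The models differ in output resolution and in GPU memory requirements. Choose the model closest to the image you upload; for example, there is one for human faces and one for cats.

🔸 Maximum iteration steps: harder drags need more iterations, and simple ones can use fewer.

🔸 Drag point pairs: the model drags each red handle point to its blue target point. Remember to set the point pairs under Setup handle points.

🔸 Editable region (optional): this part is optional, and setting the drag point pairs is enough for the model to run. If you like, you can paint the region the model is allowed to change in the Draw a mask panel. Note that this is a soft constraint: even with the mask, the model may still change areas outside the permitted region.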
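
One common way to express a soft constraint like this mask is as a penalty added to the optimization objective: changes outside the editable region cost something, but they are not forbidden. The NumPy sketch below shows such a penalty on toy feature maps. The function name `masked_penalty` and the weight `lam=20.0` are assumptions for illustration, not values taken from this notebook.

```python
import numpy as np

def masked_penalty(feat, feat_init, mask, lam=20.0):
    """L1 penalty on feature changes outside the editable region.

    feat, feat_init: arrays of shape (H, W, C), current and initial features
    mask:            array of shape (H, W), 1 = editable, 0 = should stay fixed
    lam:             penalty weight (hypothetical value)
    """
    outside = (1.0 - mask)[..., None]          # broadcast over channels
    return lam * np.abs((feat - feat_init) * outside).sum()

feat_init = np.zeros((4, 4, 2))
mask = np.zeros((4, 4))
mask[:2] = 1                                   # top two rows are editable

inside_edit = feat_init.copy()
inside_edit[0, 0] = 1.0                        # change inside the editable rows
outside_edit = feat_init.copy()
outside_edit[3, 3] = 1.0                       # change in the fixed rows

print(masked_penalty(inside_edit, feat_init, mask))    # 0.0
print(masked_penalty(outside_edit, feat_init, mask))   # 40.0
```

Because the penalty is weighed against the drag objective instead of being enforced exactly, a strong enough drag can still alter pixels outside the mask.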

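5. Gradio visual deployment

To deploy a visual interface, continue with the steps below. Once the Gradio app starts, you can generate images by clicking in the page below. You can also share the public URL and generate images from a phone or PC.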

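📢 DragGAN additional notes

🟪 Model: there is one model for human faces and one for cats.

🟪 Maximum iteration steps: 20

🟪 Drag point pairs: remember to set the point pairs under Setup handle points.

🟫 Editable region (optional): this part is optional, and setting the drag point pairs is enough for the model to run. If you like, you can paint the region the model is allowed to change in the Draw a mask panel. Note that this is a soft constraint: even with the mask, the model may still change areas outside the permitted region.

Note: image generation consumes GPU memory. You can view real-time resource usage in the left-hand toolbar by clicking GPU memory usage. When memory runs out, image generation may fail with an error. In that case, restart the kernel to reset and run again from the start, or switch to a higher-spec resource.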
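
GPU memory can also be checked from a notebook cell. The sketch below asks `nvidia-smi` for used and total memory in CSV form and parses the result. The function names `parse_gpu_memory` and `query_gpu_memory` are hypothetical. The parsing demo runs on a hard-coded string, so it works without a GPU.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse lines like '1024, 16280' (MiB used, MiB total) into a list of
    (used, total, fraction_used) tuples, one per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        stats.append((used, total, used / total))
    return stats

def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use; requires an NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

# Parsing demo on a captured string.
print(parse_gpu_memory("0, 16280\n"))   # [(0, 16280, 0.0)]
```

Calling `query_gpu_memory()` before and after generating an image shows how close a given parameter setting comes to the card's limit.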

After changing the parameters:

import gradio as gr

# Create a white canvas of the requested size with a 1-pixel black border.
def create_canvas(w, h):
    img = np.zeros(shape=(h-2, w-2, 3), dtype=np.uint8) + 255
    im = cv2.copyMakeBorder(img, 1, 1, 1, 1, cv2.BORDER_CONSTANT)
    return im

block = gr.Blocks().queue()
with block:
    with gr.Row():
        gr.Markdown("## 🎨scribble to image")
    with gr.Row():
        with gr.Column():
            # Canvas controls and the sketch input
            canvas_width = gr.Slider(label="canvas width", minimum=256, maximum=1024, value=512, step=1)
            canvas_height = gr.Slider(label="canvas height", minimum=256, maximum=1024, value=512, step=1)
            create_button = gr.Button(label="start", value='create canvas!')
            gr.Markdown(value='click the little pencil icon below to change your brush width to make it finer (gradio does not allow developers to set brush width, so it needs to be set manually) ')
            input_image = gr.Image(source='upload', type='numpy', tool='sketch')
            create_button.click(fn=create_canvas, inputs=[canvas_width, canvas_height], outputs=[input_image])
            prompt = gr.Textbox(label="prompt")
            run_button = gr.Button(label="run")
            with gr.Accordion("advanced options", open=False):
                num_samples = gr.Slider(label="images", minimum=1, maximum=3, value=1, step=1)
                image_resolution = gr.Slider(label="image resolution", minimum=256, maximum=768, value=512, step=64)
                strength = gr.Slider(label="control strength", minimum=0.0, maximum=2.0, value=1.0, step=0.01)
                ddim_steps = gr.Slider(label="steps", minimum=1, maximum=30, value=20, step=1)
                scale = gr.Slider(label="guidance scale", minimum=0.1, maximum=30.0, value=9.0, step=0.1)
                seed = gr.Slider(label="seed", minimum=-1, maximum=2147483647, step=1, randomize=True)
                eta = gr.Number(label="eta (ddim)", value=0.0)
                a_prompt = gr.Textbox(label="added prompt", value='best quality, very detailed')
                n_prompt = gr.Textbox(label="negative prompt",
                                      value='cropped, worst quality, low quality')
        with gr.Column():
            result_gallery = gr.Gallery(label='output', show_label=False, elem_id="gallery").style(grid=2, height='auto')
    # The input order must match the parameters of process().
    ips = [input_image, prompt, a_prompt, n_prompt, num_samples, image_resolution, ddim_steps, strength, scale, seed, eta]
    run_button.click(fn=process, inputs=ips, outputs=[result_gallery])
block.launch(share=True)

INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): proxy.modelarts.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.huaweicloud.com
Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://d6ad282fad59a417d6.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces

