
{{NoteTA
| G1 = IT
| G2 = 地名
| 1 = zh-cn:高斯噪声;zh-tw:高斯雜訊;
}}
{{Infobox software
| name = Stable Diffusion
| screenshot = Astronaut Riding a Horse (SD3.5).webp
| screenshot size = 250px
| caption = An image generated by Stable Diffusion from the text prompt "a photograph of an astronaut riding a horse"
| author = Runway, CompVis, Stability AI
| developer = Stability AI
| released = August 22, 2022
| repo =
| programming language = [[Python]]
| operating system = Any operating system that supports [[CUDA]] [[內核函數|kernels]]
| genre = [[文本到图像生成模型|Text-to-image model]]
| website =
}}
'''Stable Diffusion''' is a [[深度學習|deep learning]] [[文本到图像生成模型|text-to-image model]] released in 2022. It is used primarily to generate detailed images from text descriptions, but it can also be applied to other tasks such as [[圖像修復|inpainting]], outpainting, and image-to-image translation guided by a [[提示工程|text prompt]].<ref>{{Cite web |title=Diffuse The Rest - a Hugging Face Space by huggingface |url=https://huggingface.co/spaces/huggingface/diffuse-the-rest |access-date=2022-09-05 |website=huggingface.co |archive-date=2022-09-05 |archive-url=https://web.archive.org/web/20220905141431/https://huggingface.co/spaces/huggingface/diffuse-the-rest |url-status=live|language=en}}</ref>

It is a {{en-link|潛在變量模型|Latent variable model|latent}} [[擴散模型|diffusion model]], one of several generative [[人工神經網絡|artificial neural networks]] developed by the CompVis research group at [[慕尼黑大學|LMU Munich]].<ref name="paper"/> It was developed by the [[初創公司|start-up]] Stability AI in collaboration with CompVis and Runway, with support from {{en-link|EleutherAI}} and {{en-link|LAION}}.{{r|stable-diffusion-launch|stable-diffusion-github}}<ref>{{cite web |title=Revolutionizing image generation by AI: Turning text into images |url=https://www.lmu.de/en/newsroom/news-overview/news/revolutionizing-image-generation-by-ai-turning-text-into-images.html |website=LMU Munich |access-date=2022-09-17 |language=en |archive-date=2022-09-17 |archive-url=https://web.archive.org/web/20220917200820/https://www.lmu.de/en/newsroom/news-overview/news/revolutionizing-image-generation-by-ai-turning-text-into-images.html |dead-url=no }}</ref> As of October 2022, Stability AI had raised US$101 million in funding.<ref>{{Cite web |last=Wiggers |first=Kyle |title=Stability AI, the startup behind Stable Diffusion, raises $101M |url=https://techcrunch.com/2022/10/17/stability-ai-the-startup-behind-stable-diffusion-raises-101m/ |access-date=2022-10-17 |website=Techcrunch |language=en |archive-date=2022-10-17 |archive-url=https://web.archive.org/web/20221017170503/https://techcrunch.com/2022/10/17/stability-ai-the-startup-behind-stable-diffusion-raises-101m/ |dead-url=no }}</ref>

The source code and model weights of Stable Diffusion have been released publicly on [[GitHub]] and [[Hugging Face]] respectively, and the model can run on most consumer hardware equipped with a modest [[圖形處理器|GPU]]. Earlier proprietary text-to-image models such as [[DALL-E]] and [[Midjourney]], by contrast, were accessible only through [[雲端運算|cloud services]].<ref name="pcworld">{{cite web |title=The new killer app: Creating AI art will absolutely crush your PC |url=https://www.pcworld.com/article/916785/creating-ai-art-local-pc-stable-diffusion.html |access-date=2022-08-31 |website=PCWorld |archive-date=2022-08-31 |archive-url=https://web.archive.org/web/20220831065139/https://www.pcworld.com/article/916785/creating-ai-art-local-pc-stable-diffusion.html |url-status=live|language=en}}</ref><ref name="verge"/>

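== Technical architecture ==
[[File:Stable Diffusion architecture.png|thumb|upright=1.3|Diagram of the latent diffusion architecture used by Stable Diffusion.]]
[[File:X-Y plot of algorithmically-generated AI art of European-style castle in Japan demonstrating DDIM diffusion steps.png|thumb|300px|The denoising process used by diffusion models.]]
Stable Diffusion is a variant of the [[擴散模型|diffusion model]] called a latent diffusion model (LDM). Diffusion models, introduced in 2015, are trained to remove successive applications of [[高斯噪聲|Gaussian noise]] from training images and can be thought of as a sequence of denoising [[自編碼器|autoencoders]]. Stable Diffusion consists of three parts: a [[变分自编码器|variational autoencoder]] (VAE), a [[U-Net]], and a text encoder. Rather than learning to denoise image data in pixel space, the model uses the VAE encoder to compress an image into a lower-dimensional [[潜空间 (机器学习)|latent space]], and the process of adding and removing Gaussian noise is applied to this latent representation. During forward diffusion, Gaussian noise is applied iteratively to the compressed latent representation. Each denoising step is performed by a U-Net with a [[殘差神經網絡|ResNet]] backbone, which reverses the forward diffusion to recover a latent representation. Finally, the VAE decoder generates the output image by converting the representation back into pixel space. The researchers cite reduced computational requirements for training and generation as an advantage of LDMs.{{r|stable-diffusion-launch|paper}}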
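
The noising and denoising of a latent representation can be illustrated with a toy example. The following is a minimal sketch, not the project's code: it assumes the closed-form noising rule commonly used for DDPM-style diffusion models, and the names <code>forward_diffuse</code>, <code>remove_noise</code> and <code>alpha_bar_t</code> are hypothetical. A real model would use a U-Net to estimate the noise; here the true noise is reused so the round trip can be checked.

```python
import math
import random

def forward_diffuse(latent, alpha_bar_t, rng):
    """Noise a (flattened) latent in one closed-form step.

    alpha_bar_t is the assumed fraction of signal kept at this timestep
    (1.0 = no noise, close to 0.0 = almost pure Gaussian noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    noisy = [a * x + b * n for x, n in zip(latent, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar_t):
    """Invert forward_diffuse given an estimate of the added noise."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [(y - b * n) / a for y, n in zip(noisy, predicted_noise)]

rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(16)]  # toy stand-in for a VAE latent
noisy, noise = forward_diffuse(latent, 0.5, rng)
# A trained U-Net would predict `noise`; the true noise is used here.
recovered = remove_noise(noisy, noise, 0.5)
```

In an actual latent diffusion model the latent comes from the VAE encoder and the recovered latent is passed to the VAE decoder to obtain pixels.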

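The denoising step can be conditioned on a text string, an image, or other data. The encoded conditioning data is exposed to the denoising U-Net through a [[注意力機制|cross-attention mechanism]]. For conditioning on text, a fixed, pretrained CLIP ViT-L/14 text encoder transforms the prompt into an embedding space.{{r|paper}}<ref name="stable-diffusion-github"/>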
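
As a rough illustration of cross-attention, the sketch below lets image-side vectors attend to text-side vectors using scaled dot-product attention. It is a simplification under stated assumptions: the learned query, key and value projections of a real U-Net are omitted, and the function names and toy vectors are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: vectors from the image (latent) side.
    keys, values: vectors from the conditioning (text) side.
    Learned projection matrices are left out for brevity.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

image_tokens = [[1.0, 0.0], [0.0, 1.0]]   # toy latent features
text_tokens = [[10.0, 0.0], [0.0, 10.0]]  # toy text embeddings
attended = cross_attention(image_tokens, text_tokens, text_tokens)
```

Each image-side vector ends up as a weighted mix of the text embeddings it matches most closely, which is how the prompt can steer individual regions of the latent.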

== Usage ==
The Stable Diffusion model can generate new images from scratch using a text prompt that describes elements to include or omit.<ref name="stable-diffusion-github"/> It can also redraw existing images to incorporate new elements described in a prompt, a process commonly called "guided image synthesis",<ref>{{cite journal|date=2021-08-02|first1=Chenlin|last1=Meng|first2=Yutong|last2=He|first3=Yang|last3=Song|first4=Jiaming|last4=Song|first5=Jiajun|last5=Wu|first6=Jun-Yan|last6=Zhu|first7=Stefano|last7=Ermon|title=SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations|publisher=arXiv|journal=arXiv|url=https://arxiv.org/abs/2108.01073|doi=10.48550/arXiv.2108.01073|language=en|access-date=2022-10-10|archive-date=2022-12-09|archive-url=https://web.archive.org/web/20221209012934/https://arxiv.org/abs/2108.01073|dead-url=no}}</ref> through the model's diffusion-denoising mechanism.<ref name="stable-diffusion-github"/> In addition, when used with a user interface that supports these features, of which many [[開源軟件|open-source]] implementations exist, the model allows existing images to be partially altered through prompt-guided inpainting and outpainting.<ref name="webui_showcase">{{cite web|url=https://github.com/AUTOMATIC1111/stable-diffusion-webui-feature-showcase|title=Stable Diffusion web UI|website=GitHub|language=en|access-date=2022-10-10|archive-date=2023-01-20|archive-url=https://web.archive.org/web/20230120032734/https://github.com/AUTOMATIC1111/stable-diffusion-webui-feature-showcase|dead-url=no}}</ref>

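Running Stable Diffusion with 10 GB or more of [[显存|VRAM]] is recommended, but users with less VRAM can load the weights in [[半精度浮點數|float16]] precision instead of the default [[單精度浮點數|float32]] to reduce VRAM usage.<ref name="diffusers"/>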
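
The saving from half precision follows from simple arithmetic: each float32 weight occupies 4 bytes and each float16 weight 2 bytes. The sketch below is an estimate for the weights only and ignores activations and other runtime buffers; the parameter count is an illustrative assumption taken from the 983M figure listed for version 1.5 in the release table, and <code>weight_memory_gib</code> is a hypothetical helper.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gib(num_params, dtype):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

params = 983_000_000  # illustrative parameter count (assumption)
fp32_gib = weight_memory_gib(params, "float32")
fp16_gib = weight_memory_gib(params, "float16")
print(f"float32: {fp32_gib:.2f} GiB, float16: {fp16_gib:.2f} GiB")
```

Halving the bytes per weight halves the weight footprint; actual VRAM use during generation is higher because intermediate tensors also need memory.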

=== Text-to-image ===
{{multiple image
 | direction = vertical
 | align = right
 | total_width = 200
 | image1 = Algorithmically-generated landscape artwork of forest with Shinto shrine.png
 | image2 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for green trees.png
 | image3 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for round stones.png
 | footer = Demonstration of the effect of negative prompts on image generation.
*'''Top''': no negative prompt
*'''Center''': "green trees"
*'''Bottom''': "round stones"
}}

The text-to-image sampling script in Stable Diffusion, known as "txt2img", takes a text prompt along with option parameters covering the sampler type, output image dimensions, and [[隨機種子|seed]] value, and outputs an image file based on the model's interpretation of the prompt.<ref name="stable-diffusion-github"/> Generated images are tagged with an invisible [[數位浮水印|digital watermark]] so that users can identify an image as generated by Stable Diffusion,<ref name="stable-diffusion-github"/> although the watermark loses its effectiveness if the image is resized or rotated.<ref>{{cite web|url=https://github.com/ShieldMnt/invisible-watermark/blob/main/README.md|title=invisible-watermark README.md|website=GitHub|language=en|access-date=2022-10-10|archive-date=2022-09-29|archive-url=https://web.archive.org/web/20220929054846/https://github.com/ShieldMnt/invisible-watermark/blob/main/README.md|dead-url=no}}</ref> The Stable Diffusion model was trained on a dataset of 512×512 resolution images,<ref name="stable-diffusion-github"/>{{r|Waxy}} so txt2img produces its best results at 512×512, and deviating from this size can degrade output quality.<ref name="diffusers">{{cite web|date=2022-08-22|url=https://huggingface.co/blog/stable_diffusion|title=Stable Diffusion with 🧨 Diffusers|website=Hugging Face official blog|language=en|access-date=2022-10-10|archive-date=2023-01-17|archive-url=https://web.archive.org/web/20230117222142/https://huggingface.co/blog/stable_diffusion|dead-url=no}}</ref> Stable Diffusion 2.0 later introduced the ability to generate images at 768×768 resolution.<ref name="release2.0"/>

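Each txt2img generation involves a seed value that affects the output image; users can randomize the seed to explore different outputs or reuse the same seed to reproduce a previously generated image.<ref name="diffusers"/> Users can also adjust the number of inference steps for the sampler; a higher value takes longer to run, whereas a lower value may produce visual defects.<ref name="diffusers"/> Another configurable option, the classifier-free guidance scale value, lets users adjust how closely the output image adheres to the prompt;<ref>{{cite journal|date=2022-07-26|first1=Jonathan|last1=Ho|first2=Tim|last2=Salimans|title=Classifier-Free Diffusion Guidance|publisher=arXiv|journal=arXiv|url=https://arxiv.org/abs/2207.12598|doi=10.48550/arXiv.2207.12598|language=en|access-date=2022-10-10|archive-date=2023-01-03|archive-url=https://web.archive.org/web/20230103042523/https://arxiv.org/abs/2207.12598|dead-url=no}}</ref> more experimental or creative use cases may opt for a lower value, while use cases aiming for more specific outputs may use a higher value.<ref name="diffusers"/>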
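
The roles of the seed and the guidance scale can be sketched in a few lines. This is an illustration under assumptions, not the txt2img script itself: it uses the guidance formula found in many open-source implementations, where the final noise estimate is the unconditional estimate plus the scale times the difference between the conditional and unconditional estimates. The function names and toy vectors are hypothetical.

```python
import random

def initial_latent(seed, size):
    """Draw the starting Gaussian noise; the same seed gives the same noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: push the estimate toward the prompt.

    guidance_scale = 0 ignores the prompt, 1 uses the conditional
    estimate as is, and larger values exaggerate the prompt's effect.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 1.0]  # toy noise estimate without the prompt
cond = [1.0, 1.0]    # toy noise estimate with the prompt
strongly_guided = guided_noise(uncond, cond, 7.5)
same_start = initial_latent(42, 4) == initial_latent(42, 4)
```

Reusing a seed reproduces the starting noise, which is why identical settings yield the same image, while a larger scale moves the estimate further along the direction the prompt suggests.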

A negative prompt is a feature included in some user-interface software for Stable Diffusion, including Stability AI's own "DreamStudio" [[雲端運算|cloud]] [[軟件即服務|software-as-a-service]] [[訂閱|subscription]] service. It lets the user specify prompts that the model should avoid during image generation, for cases where undesirable features such as malformed hands or feet would otherwise appear in the output because of the user's positive prompt or the way the model was originally trained.<ref name="webui_showcase"/><ref name="release2.1">{{cite web|url=https://stability.ai/blog/stablediffusion2-1-release7-dec-2022|title=Stable Diffusion v2.1 and DreamStudio Updates 7-Dec 22|website=stability.ai|language=en|archive-date=2022-12-10|archive-url=https://web.archive.org/web/20221210062732/https://stability.ai/blog/stablediffusion2-1-release7-dec-2022|url-status=no|access-date=2022-12-11}}</ref> Compared with emphasis markers, negative prompts have a highly statistically significant effect on reducing the frequency of undesirable outputs; emphasis markers are an alternative way of weighting parts of a prompt, used by some open-source implementations of Stable Diffusion, in which brackets are added around keywords to increase or reduce emphasis.<ref>{{cite web|url=https://github.com/JohannesGaessler/stable-diffusion-tools/tree/master/emphasis|date=2022-09-11|author=Johannes Gaessler|title=Emphasis|website=GitHub|language=en|access-date=2022-10-10|archive-date=2022-12-09|archive-url=https://web.archive.org/web/20221209053625/https://github.com/JohannesGaessler/stable-diffusion-tools/tree/master/emphasis|dead-url=no}}</ref>
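
One common way to implement a negative prompt, assumed here rather than taken from the cited sources, is to use the noise estimate for the negative prompt in place of the unconditional estimate during guidance, so that each step moves away from the unwanted content. The function name and the one-element toy vectors below are hypothetical.

```python
def guided_noise_with_negative(noise_negative, noise_positive, guidance_scale):
    """Guidance that starts from the negative-prompt estimate.

    The result moves from the negative estimate toward (and, for
    scales above 1, beyond) the positive estimate.
    """
    return [n + guidance_scale * (p - n)
            for n, p in zip(noise_negative, noise_positive)]

negative = [1.0]  # toy estimate for the unwanted content
positive = [0.0]  # toy estimate for the desired content
result = guided_noise_with_negative(negative, positive, 2.0)
```

With a scale of 2.0 the toy result lands on the far side of the positive estimate from the negative one, which illustrates why listed features become less likely to appear.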
{{Gallery
     | height = 300
     | width = 640
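     | File:X-Y plot of algorithmically-generated AI art by different science-fiction subgenres.png|Demonstration of how different prompts affect the images produced by the Stable Diffusion model when it is asked to draw the same subject. Each column represents a different prompt fed into the model. Left to right: [[賽博朋克|cyberpunk]], [[蒸汽朋克|steampunk]], [[柴油朋克|dieselpunk]], [[生物朋克|biopunk]], {{tsl|en|Cyberpunk derivatives#Cassette futurism/Formicapunk|磁帶朋克}}, [[:wikt:en:atompunk|atompunk]], [[:wikt:en:cyberpop|cyberpop]], [[哥德次文化|goth]], [[奇幻作品|fantasy]]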
}}

=== Image-to-image ===
{{Multiple image
| direction         = horizontal
| align             = right
| total_width       = 400
| image1            = NightCitySphere (SD1.5).jpg
| image2            = NightCitySphere (SDXL).jpg
| footer            = Demonstration of img2img modification
*'''Left''': original image made with Stable Diffusion 1.5
*'''Right''': image modified with Stable Diffusion XL 1.0
}}
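Stable Diffusion also includes another sampling script, "img2img", which takes a text prompt, the file path of an existing image, and a denoising strength between 0.0 and 1.0, and produces a new image based on the original that also features the elements described in the prompt. The strength value denotes the amount of noise added to the image: a higher value produces more variation but may yield an image that is not semantically consistent with the prompt.<ref name="stable-diffusion-github"/> Image upscaling is another potential use case of img2img, among others.<ref name="stable-diffusion-github"/>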
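
One way to picture the denoising strength is as the fraction of the sampling schedule that is actually run on the input image. The sketch below assumes that interpretation, which several open-source implementations follow; <code>img2img_schedule</code> is a hypothetical helper, not part of the img2img script.

```python
def img2img_schedule(num_inference_steps, strength):
    """Return (start_step, steps_to_run) for a given denoising strength.

    strength = 0.0 keeps the input image untouched (no steps run);
    strength = 1.0 noises it completely and runs the full schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_step = num_inference_steps - steps_to_run
    return start_step, steps_to_run

half = img2img_schedule(50, 0.5)
```

With 50 steps and a strength of 0.5, sampling would start halfway through the schedule, so only the last 25 denoising steps alter the noised input.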

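Stable Diffusion 2.0, released on November 24, 2022, includes a depth-guided model called "depth2img", which infers the {{en-link|深度貼圖|Depth map|depth}} of the supplied input image and generates a new image from both the prompt and the depth information, preserving the coherence and depth of the original image in the result.<ref name="release2.0"/>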

==== Inpainting and outpainting ====
Many user-interface front ends for Stable Diffusion offer further image-to-image use cases through img2img. Inpainting selectively modifies the part of an existing image delineated by a user-provided {{en-link|蒙版|Layers (digital image editing)#Layer mask|layer mask}}, filling the masked area with newly generated content based on the supplied prompt.<ref name="webui_showcase"/> With the release of Stable Diffusion 2.0, Stability AI also created a dedicated model specifically for inpainting.<ref name="release2.0">{{cite web|url=https://stability.ai/blog/stable-diffusion-v2-release|title=Stable Diffusion 2.0 Release|website=stability.ai|language=en|archive-date=2022-12-10|archive-url=https://web.archive.org/web/20221210062729/https://stability.ai/blog/stable-diffusion-v2-release|url-status=no|access-date=2022-12-11}}</ref> Conversely, outpainting extends an image beyond its original dimensions, filling the previously empty space with content generated from the supplied prompt.<ref name="webui_showcase"/>
{{multiple image
 | direction = horizontal
 | align = none
 | total_width = 500
 | image1 = Demonstration of inpainting and outpainting using Stable Diffusion (step 1 of 4).png
 | width1 = 125
 | height1 = 218 
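 | caption1 = '''Step 1:''' A new image is generated with txt2img. By coincidence, the person in it was generated with one arm missing.
 | image2 = Demonstration of inpainting and outpainting using Stable Diffusion (step 2 of 4).png
 | width2 = 125
 | caption2 = '''Step 2:''' Through outpainting, the bottom of the image is extended by 512 pixels and filled with AI-generated content.
 | image3 = Demonstration of inpainting and outpainting using Stable Diffusion (step 3 of 4).png
 | width3 = 125
 | caption3 = '''Step 3:''' In preparation for inpainting, a makeshift arm is drawn with the paintbrush in [[GIMP]].
 | image4 = Demonstration of inpainting and outpainting using Stable Diffusion (step 4 of 4).png
 | width4 = 125
 | caption4 = '''Step 4:''' An inpainting mask is applied over the makeshift arm, and img2img generates a new arm while leaving the rest of the image unchanged.
 | header = Demonstration of inpainting and outpainting techniques using img2img in Stable Diffusion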
}}
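
The effect of an inpainting mask can be sketched as a per-element blend between the original image and the newly generated content. This is a simplified illustration with hypothetical names, operating on a flattened toy image; actual implementations typically apply the mask to latent representations during sampling.

```python
def blend_with_mask(original, generated, mask):
    """Keep the original where mask is 0.0 and use generated content where it is 1.0.

    Fractional mask values produce a soft transition at the mask edge.
    """
    return [m * g + (1.0 - m) * o for o, g, m in zip(original, generated, mask)]

original = [1.0, 2.0, 3.0]   # toy pixel (or latent) values
generated = [9.0, 9.0, 9.0]  # toy newly generated content
mask = [0.0, 1.0, 0.0]       # only the middle element is repainted
result = blend_with_mask(original, generated, mask)
```

Outpainting can be viewed the same way: the canvas is enlarged first, and the mask covers the newly added empty area.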

== License ==
Unlike models such as [[DALL-E]], Stable Diffusion makes its [[看源軟件|source code available]]<ref name="stability">{{cite web|url=https://stability.ai/blog/stable-diffusion-public-release|title=Stable Diffusion Public Release|website=Stability.Ai|access-date=2022-08-31|archive-date=2022-08-30|archive-url=https://web.archive.org/web/20220830210535/https://stability.ai/blog/stable-diffusion-public-release|url-status=live|language=en}}</ref><ref name="stable-diffusion-github" /> along with pretrained weights. Its license prohibits certain use cases, including crime, [[誹謗|libel]], [[騷擾|harassment]], [[人肉搜索|doxing]], "exploiting ... minors", giving medical advice, automatically creating legal obligations, forging legal evidence, and "discriminating against or harming individuals or groups based on ... social behavior or ... personal or personality characteristics ... or [[反歧視法|legally protected characteristics or categories]]".<ref name="washingtonpost">{{cite news |date=2022-08-30 |title=Ready or not, mass video deepfakes are coming |newspaper=The Washington Post |url=https://www.washingtonpost.com/technology/2022/08/30/deep-fake-video-on-agt/ |url-status=live |access-date=2022-08-31 |archive-url=https://web.archive.org/web/20220831115010/https://www.washingtonpost.com/technology/2022/08/30/deep-fake-video-on-agt/ |archive-date=2022-08-31|language=en}}</ref><ref>{{Cite web |title=License - a Hugging Face Space by CompVis |url=https://huggingface.co/spaces/CompVis/stable-diffusion-license |access-date=2022-09-05 |website=huggingface.co |archive-date=2022-09-04 |archive-url=https://web.archive.org/web/20220904215616/https://huggingface.co/spaces/CompVis/stable-diffusion-license |url-status=live|language=en}}</ref> Users own the rights to the images they generate and are free to use them commercially.<ref>{{cite web|author=Katsuo Ishida|date=2022-08-26|url=https://forest.watch.impress.co.jp/docs/review/1434893.html|title=言葉で指示した画像を凄いAIが描き出す「Stable Diffusion」 ～画像は商用利用も可能|website=Impress Corporation|language=ja|access-date=2022-10-10|archive-date=2022-11-14|archive-url=https://web.archive.org/web/20221114020520/https://forest.watch.impress.co.jp/docs/review/1434893.html|dead-url=no}}</ref>

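== Training ==
Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from {{en-link|公用抓取|Common Crawl|Common Crawl}} data [[網頁抓取|scraped]] from the web. The dataset was created by {{en-link|LAION}}, a German non-profit that receives funding from Stability AI.{{r|Waxy|MIT-LAION}} The model was initially trained on a large subset of LAION-5B, with the final rounds of training done on "LAION-Aesthetics v2 5+", a subset of 600 million captioned images that an AI model predicted humans would rate at least 5 out of 10 when asked how much they liked them.{{r|Waxy|LAION-Aesthetics}} This final subset also excluded low-resolution images and images that an AI model identified as carrying a [[水印|watermark]].{{r|Waxy}} A third-party analysis of the training data found that, in a smaller subset of 12 million images drawn from the original wider dataset, approximately 47% of the sample came from 100 different websites, with [[Pinterest]] accounting for 8.5% of the subset, followed by websites such as [[WordPress]], [[Blogger|Blogspot]], [[Flickr]], [[DeviantArt]] and [[維基共享資源|Wikimedia Commons]].{{r|Waxy}}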
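
The three filtering criteria described above (predicted aesthetic score, resolution and watermark detection) can be sketched as a simple predicate. Only the score threshold of 5 comes from the text; the <code>Sample</code> fields, the minimum side length and the watermark probability cut-off are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    url: str
    width: int
    height: int
    aesthetic_score: float  # predicted human rating on a 0-10 scale
    watermark_prob: float   # predicted probability that a watermark is present

def keep_sample(sample, min_score=5.0, min_side=512, max_watermark_prob=0.5):
    """Return True if the sample passes all three filters.

    min_side and max_watermark_prob are illustrative assumptions.
    """
    return (sample.aesthetic_score >= min_score
            and min(sample.width, sample.height) >= min_side
            and sample.watermark_prob < max_watermark_prob)

candidates = [
    Sample("a", 1024, 768, 6.1, 0.1),   # passes every filter
    Sample("b", 1024, 768, 4.2, 0.1),   # predicted score too low
    Sample("c", 300, 200, 7.0, 0.1),    # resolution too low
    Sample("d", 1024, 768, 6.5, 0.9),   # likely watermarked
]
kept = [s.url for s in candidates if keep_sample(s)]
```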

The model was trained using 256 [[安培微架構|NVIDIA A100]] GPUs on [[亞馬遜雲計算服務|Amazon Web Services]] for a total of 150,000 GPU-hours, at a cost of US$600,000.<ref>{{Cite web |last=Mostaque |first=Emad |date=2022-08-28 |title=Cost of construction |url=https://twitter.com/emostaque/status/1563870674111832066 |access-date=2022-09-06 |website=Twitter |language=en |archive-date=2022-09-06 |archive-url=https://web.archive.org/web/20220906155426/https://twitter.com/EMostaque/status/1563870674111832066 |url-status=live }}</ref><ref name="stable-diffusion-model-card-1-4">{{cite web|url=https://huggingface.co/CompVis/stable-diffusion-v1-4|title=Stable Diffusion v1-4 Model Card|website=huggingface.co|access-date=2022-09-20|url-status=no|language=en|archive-date=2023-01-11|archive-url=https://web.archive.org/web/20230111161920/https://huggingface.co/CompVis/stable-diffusion-v1-4}}</ref><ref name="techcrunch-model">{{cite web|url=https://techcrunch.com/2022/08/12/a-startup-wants-to-democratize-the-tech-behind-dall-e-2-consequences-be-damned/|title=This startup is setting a DALL-E 2-like AI free, consequences be damned|website=TechCrunch|access-date=2022-09-20|url-status=no|language=en|archive-date=2023-01-19|archive-url=https://web.archive.org/web/20230119005503/https://techcrunch.com/2022/08/12/a-startup-wants-to-democratize-the-tech-behind-dall-e-2-consequences-be-damned/}}</ref>
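
The quoted figures imply an average price per GPU-hour and an approximate training duration. The calculation below is back-of-the-envelope arithmetic on the numbers above; it assumes all 256 GPUs ran concurrently for the whole job, which the sources do not state.

```python
gpu_hours = 150_000
num_gpus = 256
total_cost_usd = 600_000

cost_per_gpu_hour = total_cost_usd / gpu_hours  # average price per GPU-hour
wall_clock_hours = gpu_hours / num_gpus         # assumes all GPUs run in parallel
wall_clock_days = wall_clock_hours / 24

print(f"${cost_per_gpu_hour:.2f} per GPU-hour, about {wall_clock_days:.1f} days")
```

Under that assumption the figures correspond to US$4 per GPU-hour and roughly 24 days of continuous training.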

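=== End-user fine-tuning ===
To address the limitations of the model's initial training, end users may opt for additional training that fine-tunes generation outputs to match more specific use cases. There are three methods by which users can fine-tune a Stable Diffusion model checkpoint:
* An "embedding" can be trained from a collection of user-provided images, and allows the model to generate visually similar images whenever the name of the embedding is used in a prompt.<ref>{{cite web|author=Dave James|date=2022-10-28|url=https://www.pcgamer.com/nvidia-rtx-4090-stable-diffusion-training-aharon-kahana/|title=I thrashed the RTX 4090 for 8 hours straight training Stable Diffusion to paint like my uncle Hermann|website=[[PC Gamer]]|language=en|archive-url=https://web.archive.org/web/20221109154310/https://www.pcgamer.com/nvidia-rtx-4090-stable-diffusion-training-aharon-kahana/|archive-date=2022-11-09|url-status=no|access-date=2022-12-11}}</ref> Embeddings are based on the "textual inversion" concept developed in 2022 by researchers at [[臺拉維夫大學|Tel Aviv University]] with support from [[輝達|Nvidia]], in which the vector representation of a specific token used by the model's text encoder is linked to a new pseudo-word. Embeddings can be used to reduce biases in the original model or to mimic visual styles.<ref>{{cite arXiv|first1=Rinon|last1=Gal|first2=Yuval|last2=Alaluf|first3=Yuval|last3=Atzmon|first4=Or|last4=Patashnik|first5=Amit H.|last5=Bermano|first6=Gal|last6=Chechik|first7=Daniel|last7=Cohen-Or|date=2022-08-02|title=An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion|class=cs.CV|eprint=2208.01618|language=en}}</ref>
* A "hypernetwork" is a technique created in 2021 by [[NovelAI]] software developer Kurumuz, originally intended for text-generation [[Transformer模型|transformer models]]. It is a small pretrained neural network that is applied at various points within a larger neural network, and it allows text-to-image models derived from Stable Diffusion to imitate the style of specific artists, even when the original model does not recognize the artist. Hypernetworks steer txt2img or img2img results in a particular direction, for example by imposing an art style. They process an image by finding key areas of importance, such as eyes and hair, and then patch these areas in a secondary latent space. A drawback of hypernetworks is their relatively low accuracy, and they sometimes produce unpredictable results; they are therefore best suited to applying a visual style or cleaning up blemishes in human anatomy.<ref>{{cite web|date=2022-10-11|url=https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac|title=NovelAI Improvements on Stable Diffusion|website=NovelAI|language=en|archive-url=https://archive.today/20221027041603/https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac|archive-date=2022-10-27|url-status=live}}</ref>
:[[File:X-Y plot of algorithmically-generated AI art demonstrating Hypernetworks.png|thumb|none|500px|Demonstration of the "hypernetwork" technique with Stable Diffusion.]]
* [[DreamBooth]] is a deep learning model developed in 2022 by researchers from [[Google|Google Research]] and [[波士頓大學|Boston University]] that fine-tunes a model to generate output images depicting a specific subject.<ref>{{cite web|author=山下裕毅|date=2022-09-01|url=https://www.itmedia.co.jp/news/articles/2209/01/news041.html|title=愛犬の合成画像を生成できるAI　文章で指示するだけでコスプレ　米Googleが開発|website=ITmedia Inc.|language=ja|archive-url=https://web.archive.org/web/20220831232021/https://www.itmedia.co.jp/news/articles/2209/01/news041.html|archive-date=2022-08-31|url-status=no|access-date=2022-12-11}}</ref>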
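
The embedding approach in the first item of the list above can be pictured as adding one row to the text encoder's token-embedding table while leaving every other weight untouched. The sketch below is a toy illustration with hypothetical names and a whitespace tokenizer; in real textual inversion the new vector is learned by optimization on the user's images rather than supplied by hand.

```python
def add_pseudo_word(vocab, embeddings, pseudo_word, vector):
    """Register a new pseudo-word with its own embedding vector.

    vocab maps token strings to row indices in embeddings.
    Existing rows are not modified.
    """
    if pseudo_word in vocab:
        raise ValueError(f"{pseudo_word!r} is already in the vocabulary")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("vector has the wrong dimension")
    vocab[pseudo_word] = len(embeddings)
    embeddings.append(list(vector))
    return vocab[pseudo_word]

def embed_prompt(prompt, vocab, embeddings):
    """Look up the embedding of each whitespace-separated token."""
    return [embeddings[vocab[token]] for token in prompt.split()]

vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.3]]
new_index = add_pseudo_word(vocab, embeddings, "<my-dog>", [0.5, -0.5])
prompt_vectors = embed_prompt("a photo of <my-dog>", vocab, embeddings)
```

Because only the new row is added, the base model behaves as before for prompts that do not contain the pseudo-word.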

== Releases ==
{| class="wikitable"
|+
!Version
!Release date
!Parameters
!Notes
|-
|1.1, 1.2, 1.3, 1.4<ref>{{Cite web |title=CompVis/stable-diffusion-v1-4 · Hugging Face |url=https://huggingface.co/CompVis/stable-diffusion-v1-4 |url-status=live |archive-url=https://web.archive.org/web/20230111161920/https://huggingface.co/CompVis/stable-diffusion-v1-4 |archive-date=2023-01-11 |access-date=2023-08-17 |website=huggingface.co}}</ref>
|August 2022
|
|All released by CompVis. There is no version 1.0. Version 1.1 gave rise to 1.2, and 1.2 gave rise to both 1.3 and 1.4.<ref>{{Cite web |date=2023-08-23 |title=CompVis (CompVis) |url=https://huggingface.co/CompVis |access-date=2024-03-06 |website=huggingface.co |archive-date=2025-02-01 |archive-url=https://web.archive.org/web/20250201095251/https://huggingface.co/CompVis |dead-url=no }}</ref>
|-
|1.5<ref>{{Cite web |title=runwayml/stable-diffusion-v1-5 · Hugging Face |url=https://huggingface.co/runwayml/stable-diffusion-v1-5 |access-date=2023-08-17 |website=huggingface.co |archive-date=2023-09-21 |archive-url=https://web.archive.org/web/20230921025150/https://huggingface.co/runwayml/stable-diffusion-v1-5 |dead-url=no }}</ref>
|October 2022
|983M
|Initialized with the weights of 1.2 rather than 1.4. Released by RunwayML.
|-
|2.0<ref>{{Cite web |title=stabilityai/stable-diffusion-2 · Hugging Face |url=https://huggingface.co/stabilityai/stable-diffusion-2 |access-date=2023-08-17 |website=huggingface.co |archive-date=2023-09-21 |archive-url=https://web.archive.org/web/20230921135247/https://huggingface.co/stabilityai/stable-diffusion-2 |dead-url=no }}</ref>
|November 2022
|
|Retrained from scratch on a filtered dataset.<ref>{{Cite web |title=stabilityai/stable-diffusion-2-base · Hugging Face |url=https://huggingface.co/stabilityai/stable-diffusion-2-base |access-date=2024-01-01 |website=huggingface.co |archive-date=2025-02-09 |archive-url=https://web.archive.org/web/20250209212718/https://huggingface.co/stabilityai/stable-diffusion-2-base |dead-url=no }}</ref>
|-
|2.1<ref>{{Cite web |title=stabilityai/stable-diffusion-2-1 · Hugging Face |url=https://huggingface.co/stabilityai/stable-diffusion-2-1 |access-date=2023-08-17 |website=huggingface.co |archive-date=2023-09-21 |archive-url=https://web.archive.org/web/20230921025146/https://huggingface.co/stabilityai/stable-diffusion-2-1 |dead-url=no }}</ref>
|December 2022
|
|Initialized with the weights of 2.0.
|-
|XL 1.0<ref>{{Cite web |title=stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face |url=https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 |access-date=2023-08-17 |website=huggingface.co |archive-date=2023-10-08 |archive-url=https://web.archive.org/web/20231008071719/https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 |dead-url=no }}</ref>
|July 2023
|3.5B
|The XL 1.0 base model has 3.5 billion parameters, making it about 3.5 times as large as previous versions.<ref>{{Cite web |title=Announcing SDXL 1.0 |url=https://stability.ai/news/stable-diffusion-sdxl-1-announcement |access-date=2024-01-01 |website=Stability AI |language=en-GB |archive-date=2024-06-01 |archive-url=https://web.archive.org/web/20240601005434/https://stability.ai/news/stable-diffusion-sdxl-1-announcement |dead-url=no }}</ref>
|-
|XL Turbo<ref>{{Cite web |title=stabilityai/sdxl-turbo · Hugging Face |url=https://huggingface.co/stabilityai/sdxl-turbo |access-date=2024-01-01 |website=huggingface.co |archive-date=2024-05-23 |archive-url=https://web.archive.org/web/20240523005700/https://huggingface.co/stabilityai/sdxl-turbo |dead-url=no }}</ref>
|November 2023
|
|Distilled from XL 1.0 to run in fewer diffusion steps.<ref>{{Cite web |title=Adversarial Diffusion Distillation |url=https://stability.ai/research/adversarial-diffusion-distillation |access-date=2024-01-01 |website=Stability AI |language=en-GB |archive-date=2024-04-15 |archive-url=https://web.archive.org/web/20240415165742/https://stability.ai/research/adversarial-diffusion-distillation |dead-url=no }}</ref>
|-
|3.0<ref>{{Cite web |title=Stable Diffusion 3 |url=https://stability.ai/news/stable-diffusion-3 |access-date=2024-03-05 |website=Stability AI |language=en-GB |archive-date=2025-02-03 |archive-url=https://web.archive.org/web/20250203054235/https://stability.ai/news/stable-diffusion-3 |dead-url=no }}</ref><ref name=":6">{{Citation |last1=Esser |first1=Patrick |title=Scaling Rectified Flow Transformers for High-Resolution Image Synthesis |date=2024-03-05 |arxiv=2403.03206 |last2=Kulal |first2=Sumith |last3=Blattmann |first3=Andreas |last4=Entezari |first4=Rahim |last5=Müller |first5=Jonas |last6=Saini |first6=Harry |last7=Levi |first7=Yam |last8=Lorenz |first8=Dominik |last9=Sauer |first9=Axel}}</ref>
|February 2024 (early preview)
|800M to 8B
|A family of models.
|-
|3.5<ref name="release-sd3.5">{{cite web|url=https://stability.ai/news/introducing-stable-diffusion-3-5|title=Stable Diffusion 3.5|website=[[Stability AI]]|access-date=2024-10-23|archive-date=2024-10-23|archive-url=https://archive.today/20241023040750/https://stability.ai/news/introducing-stable-diffusion-3-5|url-status=live}}</ref>
|October 2024
|2.5B to 8B
|A family of models comprising Large (8 billion parameters), Large Turbo (distilled from SD 3.5 Large) and Medium (2.5 billion parameters).
|}

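== Societal impact ==
Because [[藝術風格|artistic styles]] and [[構圖|compositions]] are not protected by copyright, it is generally held that users of Stable Diffusion who generate images of artworks should not be considered to be infringing the copyright of visually similar works. However, the great majority of artists have not authorized the use of their works to train AI, and critics argue that this could cost illustrators their livelihoods.<ref>{{Cite web |url=https://www.digitaling.com/articles/875566.html |title=存档副本 |access-date=2023-04-13 |archive-date=2023-04-17 |archive-url=https://web.archive.org/web/20230417173024/https://www.digitaling.com/articles/875566.html |dead-url=no }}</ref><ref name="automaton"/> Real people depicted in generated images remain protected by [[人格權|personality rights]],<ref name="automaton">{{cite web|date=2022-08-24|url=https://automaton-media.com/articles/newsjp/20220824-216074/|title=高性能画像生成AI「Stable Diffusion」無料リリース。「kawaii」までも理解し創造する画像生成AI|website=Automaton Media|language=ja|access-date=2022-10-10|archive-date=2022-12-08|archive-url=https://web.archive.org/web/20221208021831/https://automaton-media.com/articles/newsjp/20220824-216074/|dead-url=no}}</ref> and [[知識產權|intellectual property]] such as recognizable brand logos remains protected by copyright. Nonetheless, artists have expressed concern that widespread use of models such as Stable Diffusion may eventually cause human artists, along with photographers, models, cinematographers and actors, to gradually lose commercial viability against AI-based competitors.<ref name="MIT-LAION"/>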

Compared with similar machine-learning image-synthesis products from other companies, Stable Diffusion is notably more permissive in the types of content users may generate, such as violent or sexually explicit imagery.<ref name="bijapan">{{cite web|author=Ryo Shimizu|date=2022-08-26|url=https://www.businessinsider.jp/post-258369|title=Midjourneyを超えた？ 無料の作画AI｢ #StableDiffusion ｣が｢AIを民主化した｣と断言できる理由|website=Business Insider Japan|language=ja|access-date=2022-10-10|archive-date=2022-12-10|archive-url=https://web.archive.org/web/20221210192453/https://www.businessinsider.jp/post-258369|dead-url=no}}</ref>

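Stability AI's chief executive, Emad Mostaque, has addressed concerns that the model may be used for abusive purposes, arguing that it is people's responsibility to operate the technology ethically, morally and legally,<ref name="verge"/> and that putting the capabilities of Stable Diffusion into the hands of the public would yield a net benefit overall, despite the potential negative consequences.<ref name="verge"/> Mostaque further argues that the intention behind Stable Diffusion's open availability is to end the control and dominance of large companies over such technologies, as they had previously developed only closed AI systems for image synthesis.<ref name="verge"/><ref name="bijapan"/>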

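== See also ==
* [[生成式人工智能|Generative artificial intelligence]]
* [[15.ai]]
* [[文本到图像生成模型|Text-to-image model]]
* [[人工智慧藝術|Artificial intelligence art]]
* [[DALL-E]]
* [[谷歌大腦|Google Brain]]
* {{tsl|en|Synthography}}
* [[Hugging Face]]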

== References ==
{{reflist|2|refs=
<ref name="MIT-LAION">{{cite web
|work=MIT Technology Review
|last=Heikkilä
|first=Melissa
|date=2022-09-16
|title=This artist is dominating AI-generated art. And he's not happy about it.
|url=https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/
|language=en
|access-date=2022-10-10
|archive-date=2023-01-14
|archive-url=https://web.archive.org/web/20230114125952/https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/
|dead-url=no
}}</ref>
<ref name="Waxy">{{cite web
|work=Waxy.org
|last=Baio
|first=Andy
|date=2022-08-30
|title=Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion's Image Generator
|url=https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/
|language=en
|access-date=2022-10-10
|archive-date=2023-01-20
|archive-url=https://web.archive.org/web/20230120124332/https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/
|dead-url=no
}}</ref>
<ref name="LAION-Aesthetics">{{Cite web |title=LAION-Aesthetics {{!}} LAION |url=https://laion.ai/blog/laion-aesthetics |access-date=2022-09-02 |website=laion.ai |language=en |archive-date=2022-08-26 |archive-url=https://web.archive.org/web/20220826121216/https://laion.ai/blog/laion-aesthetics/ |url-status=live }}</ref>
<ref name="paper">{{cite conference |last1=Rombach |last2=Blattmann |last3=Lorenz |last4=Esser |last5=Ommer |title=High-Resolution Image Synthesis with Latent Diffusion Models |conference=International Conference on Computer Vision and Pattern Recognition (CVPR) |pages=10684–10695 |date=June 2022 |location=New Orleans, LA |url=https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf |arxiv=2112.10752 |language=en |access-date=2022-10-10 |archive-date=2023-01-20 |archive-url=https://web.archive.org/web/20230120163151/https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf |dead-url=no }}</ref>
<ref name="stable-diffusion-launch">{{cite web|url=https://stability.ai/blog/stable-diffusion-announcement|title=Stable Diffusion Launch Announcement|website=Stability.Ai|access-date=2022-09-06|archive-date=2022-09-05|archive-url=https://web.archive.org/web/20220905105009/https://stability.ai/blog/stable-diffusion-announcement|url-status=live|language=en}}</ref>
<ref name="stable-diffusion-github">{{cite web |title=Stable Diffusion Repository on GitHub |url=https://github.com/CompVis/stable-diffusion |publisher=CompVis - Machine Vision and Learning Research Group, LMU Munich |access-date=2022-09-17 |date=2022-09-17 |language=en |archive-date=2023-01-18 |archive-url=https://web.archive.org/web/20230118183342/https://github.com/CompVis/stable-diffusion |dead-url=no }}</ref>
<ref name="verge">{{cite web
|work=The Verge
|last=Vincent
|first=James
|date=2022-09-15
|title=Anyone can use this AI art generator — that’s the risk
|url=https://www.theverge.com/2022/9/15/23340673/ai-image-generation-stable-diffusion-explained-ethics-copyright-data
|language=en
|access-date=2022-10-10
|archive-date=2023-01-21
|archive-url=https://web.archive.org/web/20230121153021/https://www.theverge.com/2022/9/15/23340673/ai-image-generation-stable-diffusion-explained-ethics-copyright-data
|dead-url=no
}}</ref>
}}

== External links ==
{{Commons category}}
* [https://huggingface.co/spaces/stabilityai/stable-diffusion Stable Diffusion demo] {{Wayback|url=https://huggingface.co/spaces/stabilityai/stable-diffusion |date=20230121205631 }}
* [https://poloclub.github.io/diffusion-explainer/ Interactive Explanation of Stable Diffusion] {{Wayback|url=https://poloclub.github.io/diffusion-explainer/ |date=20231012084858 }}
* [https://github.com/AUTOMATIC1111/stable-diffusion-webui AUTOMATIC1111's open-source Stable Diffusion web UI (supports Traditional and Simplified Chinese)] {{Wayback|url=https://github.com/AUTOMATIC1111/stable-diffusion-webui |date=20230121212439 }}

{{Differentiable computing}}
{{生成式人工智能}}

[[Category:2022年軟體]]
[[Category:数据挖掘和机器学习软件]]
[[Category:人工神经网络]]
[[Category:应用机器学习]]
[[Category:开源人工智能]]
[[Category:计算语言学]]
[[Category:深度学习]]
[[Category:计算机图形学]]
[[Category:自然語言處理]]