Stable Diffusion

Stable Diffusion
	Image generated by Stable Diffusion from the text prompt "a photograph of an astronaut riding a horse"
Original author(s)	Runway, CompVis, Stability AI
Developer(s)	Stability AI
Initial release	August 22, 2022
Written in	Python
Operating system	Any that supports CUDA kernels
Type	Text-to-image model

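Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images from text descriptions, although it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.^[1]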

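It is a latent diffusion model, one of a family of deep generative neural networks developed by the CompVis research group at Ludwig Maximilian University of Munich (LMU Munich).^[2] It was developed by the start-up Stability AI in collaboration with CompVis and Runway, with support from EleutherAI and LAION.^[3]^[4]^[5] As of October 2022, Stability AI had raised US$101 million in funding.^[6]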

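Stable Diffusion's source code and model weights have been released publicly on GitHub and Hugging Face respectively, and the model can run on most consumer hardware equipped with a modest GPU. By contrast, earlier proprietary text-to-image models such as DALL-E and Midjourney were accessible only through cloud services.^[7]^[8]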

Technical architecture

Diagram of the latent diffusion architecture used by Stable Diffusion.

The denoising process used by diffusion models.

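Stable Diffusion is a variant of diffusion model called a "latent diffusion model" (LDM). Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise from training images, and can be thought of as a sequence of denoising autoencoders. Stable Diffusion consists of three parts: a variational autoencoder (VAE), a U-Net, and a text encoder. Rather than learning to denoise image data directly (in "pixel space"), the VAE is trained to compress images into a lower-dimensional latent space. Gaussian noise is added to and removed from this latent representation, and the final denoised output is then decoded back into pixel space. During forward diffusion, Gaussian noise is iteratively applied to the compressed latent representation. Each denoising step is performed by a U-Net with a ResNet backbone, which recovers a latent representation by denoising in the reverse direction of the forward diffusion. Finally, the VAE decoder generates the output image by converting the representation back into pixel space. The researchers point out that reduced computational requirements for both training and generation are an advantage of LDMs.^[3]^[2]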
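The compression and forward-noising steps described above can be sketched numerically. This is a minimal sketch under stated assumptions: an SD 1.x-style VAE that downsamples by a factor of 8 into 4 latent channels, and a DDPM-style "scaled linear" noise schedule over 1,000 steps. The helper names `latent_shape`, `make_alpha_bars` and `add_noise` are hypothetical, and the schedule constants are illustrative rather than taken from this article.

```python
import numpy as np


def latent_shape(height, width, downsample=8, channels=4):
    """Shape (C, H, W) of the VAE latent. Assumption: 8x downsampling, 4 latent channels."""
    return (channels, height // downsample, width // downsample)


def make_alpha_bars(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products of (1 - beta_t) for an assumed "scaled linear" schedule (linear in sqrt(beta))."""
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def add_noise(z0, t, alpha_bars, rng):
    """Closed-form forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    # In an epsilon-prediction setup, the U-Net learns to recover eps from (zt, t, conditioning).
    return zt, eps


pixels = 512 * 512 * 3
c, h, w = latent_shape(512, 512)
print("latent", (c, h, w), "is", pixels / (c * h * w), "times smaller than the RGB image")

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
z0 = rng.standard_normal((c, h, w))  # stand-in for a VAE-encoded image
zt, eps = add_noise(z0, t=999, alpha_bars=alpha_bars, rng=rng)
print(zt.shape, alpha_bars[0], alpha_bars[-1])
```

Working on a 4×64×64 latent instead of a 3×512×512 image means each denoising step processes 48 times fewer values, which is the computational advantage the LDM authors point to.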

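The denoising step can be conditioned on a string of text, an image, or other data. An encoding of the conditioning data is exposed to the denoising U-Net through a cross-attention mechanism. For text conditioning, a fixed, pretrained CLIP ViT-L/14 text encoder is used to transform prompts into an embedding space.^[2]^[4]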
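A minimal single-head sketch of the cross-attention described above: queries come from the U-Net's flattened latent features, while keys and values come from the text encoder's token embeddings. The sizes are assumptions for illustration (77 prompt tokens of width 768, as produced by the CLIP ViT-L/14 text encoder, and 320-channel features on a 64×64 grid at one U-Net level), and the random projection matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latent_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    latent_tokens: (N, d_latent) U-Net features, one row per spatial position.
    text_tokens:   (M, d_text) text-encoder output, one row per prompt token.
    """
    q = latent_tokens @ w_q                    # (N, d) queries from the image side
    k = text_tokens @ w_k                      # (M, d) keys from the prompt
    v = text_tokens @ w_v                      # (M, d) values from the prompt
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) similarity of each position to each token
    weights = softmax(scores, axis=-1)         # each latent position attends over prompt tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
d_latent, d_text, d = 320, 768, 320            # assumed sizes, see the lead-in
latent = rng.standard_normal((64 * 64, d_latent))
text = rng.standard_normal((77, d_text))
w_q = rng.standard_normal((d_latent, d)) / np.sqrt(d_latent)
w_k = rng.standard_normal((d_text, d)) / np.sqrt(d_text)
w_v = rng.standard_normal((d_text, d)) / np.sqrt(d_text)
out, attn = cross_attention(latent, text, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because keys and values are computed only from the prompt, changing the prompt changes what every spatial position of the latent is pulled toward, while the U-Net's own features decide where each token's influence lands.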

Usage

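The Stable Diffusion model supports generating new images from scratch using a text prompt that describes elements to be included in or omitted from the output,^[4] and redrawing existing images to incorporate new elements described by a text prompt (a process often known as "guided image synthesis"^[9]) by means of the model's diffusion-denoising mechanism.^[4] In addition, the model allows prompts to partially alter existing images through inpainting and outpainting, when used with a user interface that supports these features; numerous open-source implementations of such interfaces exist.^[10]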

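Stable Diffusion is recommended to run with at least 10 GB of VRAM; however, users with less VRAM can choose to load the weights in float16 precision instead of the default float32 to reduce VRAM usage.^[11]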
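The saving from float16 can be estimated with simple arithmetic: a float32 parameter occupies 4 bytes and a float16 parameter 2 bytes. The sketch below assumes the 983 million parameter figure listed for version 1.5 in the release table; it counts only the weights covered by that figure and ignores activations and framework overhead, so real VRAM usage is higher.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


params_v1_5 = 983_000_000  # assumption: figure taken from the release table for version 1.5
for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(params_v1_5, dtype):.2f} GiB of weights")
```

Halving the bytes per parameter halves the weight footprint, which is why switching to float16 is the first option suggested for GPUs with less memory.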

Text to image

Demonstration of the effect of negative prompts on image generation.

Top: no negative prompt
Middle: "green trees"
Bottom: "round stones"

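The text-to-image sampling script within Stable Diffusion, known as "txt2img", takes a text prompt together with assorted option parameters covering the sampling type, output image dimensions and seed value, and outputs an image file based on the model's interpretation of the prompt.^[4] Generated images carry an invisible digital watermark that allows users to identify an image as generated by Stable Diffusion,^[4] although the watermark loses its effectiveness if the image is resized or rotated.^[12] Stable Diffusion models were trained on a dataset of 512×512 resolution images,^[4]^[13] so txt2img works best when generating images at 512×512; deviating from this size can result in poor-quality output.^[11] Stable Diffusion 2.0 later introduced the ability to generate images at 768×768 resolution.^[14]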
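The options described above can be grouped into a small configuration object. In this hypothetical sketch, `Txt2ImgOptions` and `initial_latent` are illustrative names rather than part of the txt2img script, and the default sampler, step count and guidance scale are assumptions. The seed fully determines the starting Gaussian noise, and the latent shape follows from the output size under an assumed 8× VAE downsampling, so sizes that are not multiples of 8 are rejected.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Txt2ImgOptions:
    prompt: str
    sampler: str = "ddim"         # sampling type (illustrative default)
    height: int = 512             # SD 1.x was trained on 512x512 images
    width: int = 512
    seed: int = 42
    steps: int = 50               # illustrative default
    guidance_scale: float = 7.5   # illustrative default

    def initial_latent(self, channels=4, downsample=8):
        """Starting noise for sampling. Assumption: 8x-downsampling VAE with 4 latent channels."""
        if self.height % downsample or self.width % downsample:
            raise ValueError("height and width must be multiples of 8")
        rng = np.random.default_rng(self.seed)  # same seed -> identical starting noise
        shape = (channels, self.height // downsample, self.width // downsample)
        return rng.standard_normal(shape)


opts = Txt2ImgOptions(prompt="a photograph of an astronaut riding a horse")
print(opts.initial_latent().shape)
```

With a deterministic sampler, identical starting noise plus identical settings yields the same image, which is why reusing a seed reproduces an earlier result.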

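Each txt2img generation involves a specific seed value that affects the output image; users can randomize the seed to explore different generated outputs, or reuse the same seed to obtain the same output as a previously generated image.^[11] Users can also adjust the number of inference steps for the sampler; a higher value takes longer to run, whereas a value that is too low may result in visual defects.^[11] Another configurable option, the classifier-free guidance scale value, lets users adjust how closely the output adheres to the prompt;^[15] more experimental or creative use cases can opt for a lower value, while use cases aiming for more specific outputs can use a higher value.^[11]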
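Classifier-free guidance combines two noise predictions at every sampling step: one conditioned on the prompt and one unconditioned. A minimal sketch, with the two predictions passed in as plain arrays (in practice both come from the U-Net). It uses the parameterization common in Stable Diffusion front ends; the Ho and Salimans paper writes the same rule with a weight w equal to the scale minus 1.

```python
import numpy as np


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Guided noise estimate: start at the unconditional prediction and move toward the conditional one.

    guidance_scale = 0 ignores the prompt, 1 gives the plain conditional prediction,
    and values above 1 extrapolate further in the direction the prompt suggests.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])
for scale in (0.0, 1.0, 7.5):
    print(scale, classifier_free_guidance(eps_uncond, eps_cond, scale))
```

Only the components where the two predictions disagree are amplified, so a higher scale sharpens adherence to the prompt at the cost of variety.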

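Negative prompts are a feature included in some user interfaces for Stable Diffusion (including Stability AI's own "DreamStudio", a cloud-based, subscription software-as-a-service offering). They let users specify prompts that the model should avoid during image generation, which is useful when undesirable image features, such as deformed hands and feet, appear in the output either because of the user's ordinary prompt or because of how the model was originally trained.^[10]^[16] Compared with emphasis markers, negative prompts have a highly statistically significant effect in reducing the frequency of undesirable outputs. Emphasis markers are an alternative way of adding weight to parts of a prompt, used by some open-source implementations of Stable Diffusion, in which brackets are placed around keywords to increase or decrease their emphasis.^[17]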
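Emphasis syntax differs between front ends. The sketch below is a simplified, hypothetical parser loosely following the convention associated with the AUTOMATIC1111 web UI (an assumption here): each enclosing pair of parentheses multiplies a word's attention weight by 1.1, and each pair of square brackets divides it by 1.1. Explicit weights such as `(word:1.5)` and unbalanced brackets are not handled.

```python
import re


def parse_emphasis(prompt, factor=1.1):
    """Return (word, weight) pairs; weight = factor ** (parenthesis depth - bracket depth).

    Simplified sketch: assumes balanced brackets and ignores explicit "(word:1.5)" weights.
    """
    weights = []
    depth = 0  # net emphasis level: +1 inside each "(", -1 inside each "["
    for token in re.findall(r"[()\[\]]|[^()\[\]\s,]+", prompt):
        if token in ("(", "]"):
            depth += 1
        elif token in (")", "["):
            depth -= 1
        else:
            weights.append((token, round(factor ** depth, 4)))
    return weights


print(parse_emphasis("a ((castle)), [fog]"))
```

A weight above 1 makes the corresponding token embeddings count more in conditioning; a negative prompt, by contrast, acts on the whole generation rather than on individual words.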

Demonstration of how different prompts affect the images produced by a Stable Diffusion model when it is instructed to draw the same subject. Each image corresponds to a different prompt. From left to right: cyberpunk, steampunk, dieselpunk, biopunk, cassette futurism, atompunk, cyber pop, goth subculture, fantasy.

Image to image

Demonstration of img2img modification

Left: original image created with Stable Diffusion 1.5
Right: image modified with Stable Diffusion XL 1.0

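Stable Diffusion includes another sampling script, "img2img", which takes a text prompt, the path to an existing image, and a denoising strength value between 0.0 and 1.0, and outputs a new image based on the original image that also contains the elements given in the prompt. The denoising strength indicates the amount of noise added to the output image: a higher value produces more variation in the image, but the result may not be semantically consistent with the prompt provided.^[4] Among other uses, img2img can also be used for image upscaling.^[4]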
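One common way to implement the denoising strength, assumed here in the spirit of the SDEdit method cited above, is to noise the input image's latent only part of the way up the schedule and then run just the remaining fraction of the sampling steps. The hypothetical helper below returns the timesteps that would actually be denoised for an evenly spaced schedule over an assumed 1,000 training timesteps.

```python
import numpy as np


def img2img_timesteps(num_inference_steps, strength, num_train_timesteps=1000):
    """Timesteps actually denoised (strength 0.0 keeps the image, 1.0 starts from pure noise)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    step = num_train_timesteps // num_inference_steps
    # Evenly spaced descending schedule, e.g. 980, 960, ..., 0 for 50 steps (assumed spacing).
    timesteps = np.arange(num_inference_steps)[::-1] * step
    n_run = int(num_inference_steps * strength)
    # Skip the noisiest part of the schedule: the input latent is noised to the first
    # returned timestep and denoising proceeds from there.
    return timesteps[num_inference_steps - n_run:]


for s in (0.3, 0.75, 1.0):
    ts = img2img_timesteps(50, s)
    print(f"strength={s}: {len(ts)} steps, starting at t={ts[0]}")
```

A low strength starts close to the clean image, so only fine details change; at 1.0 the input is fully replaced by noise and the result depends on the prompt alone.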

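Stable Diffusion 2.0, released on November 24, 2022, includes a depth-guided model called "depth2img", which infers the depth map of the provided input image and generates a new image from both the text prompt and the depth information, preserving the coherence and depth of the original image.^[14]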

Inpainting and outpainting

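Many different user interfaces for Stable Diffusion offer further image-to-image use cases built on img2img. Inpainting selectively modifies a portion of an existing image delineated by a user-provided layer mask, filling the masked area with newly generated content based on the provided prompt.^[10] With the release of Stable Diffusion 2.0, Stability AI also created a dedicated model specifically for inpainting.^[14] Conversely, outpainting extends an image beyond its original dimensions, filling the previously empty space with content generated from the provided prompt.^[10]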
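Both operations reduce to a canvas plus a mask marking which pixels the model may regenerate. The numpy sketch below is an assumption-level illustration rather than any particular interface's code: the hypothetical `outpaint_canvas` pads an image downward and marks only the new strip for generation (mirroring the 512-pixel extension in the demonstration that follows), and the hypothetical `composite` keeps original pixels wherever the mask is 0, one simple way to guarantee that the unmasked region stays untouched.

```python
import numpy as np


def outpaint_canvas(image, pad_bottom, fill=0):
    """Extend an (H, W, C) image downward; return the new canvas and a mask (1 = generate)."""
    h, w, c = image.shape
    canvas = np.full((h + pad_bottom, w, c), fill, dtype=image.dtype)
    canvas[:h] = image
    mask = np.zeros((h + pad_bottom, w), dtype=np.float32)
    mask[h:] = 1.0  # only the newly added strip is open for generation
    return canvas, mask


def composite(original, generated, mask):
    """Keep original pixels where mask == 0 and generated pixels where mask == 1."""
    m = mask[..., None]  # broadcast the (H, W) mask over the color channels
    return m * generated + (1.0 - m) * original


image = np.ones((512, 512, 3), dtype=np.float32)
canvas, mask = outpaint_canvas(image, pad_bottom=512)
generated = np.full_like(canvas, 0.5)  # stand-in for the model's output
result = composite(canvas, generated, mask)
print(canvas.shape, mask.mean())
```

Inpainting uses the same `composite` step with a user-painted mask inside the original frame instead of a padded strip.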

Demonstration of inpainting and outpainting techniques using img2img within Stable Diffusion

Step 1: A new image is generated with txt2img. By chance, it depicts a person who is missing an arm.

Step 2: Using outpainting, the bottom of the image is extended by 512 pixels and filled with AI-generated content.

Step 3: In preparation for inpainting, a makeshift arm is drawn with the paintbrush in GIMP.

Step 4: An inpainting mask is applied over the makeshift arm, and img2img generates a new arm while leaving the rest of the image unchanged.

License

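Unlike models such as DALL-E, Stable Diffusion makes its source code available,^[18]^[4] along with pretrained weights. Its license prohibits certain use cases, including crime, libel, harassment, doxing, "exploiting … minors", giving medical advice, automatically creating legal obligations, fabricating legal evidence, and "discriminating against or harming individuals or groups based on … social behavior or … personal or personality characteristics … [or] legally protected characteristics or categories".^[19]^[20] Users own the rights to the images they generate and are free to use them commercially.^[21]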

Training

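Stable Diffusion was trained on image and caption pairs from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web. The dataset was created by LAION, a German non-profit organization that receives funding from Stability AI.^[13]^[22] The model was initially trained on a large subset of LAION-5B, with the final rounds of training done on "LAION-Aesthetics v2 5+", a subset of 600 million captioned images that an AI model predicted humans would rate at least 5 out of 10 when asked how much they liked them.^[13]^[23] This final subset also excluded low-resolution images and images that an AI model identified as carrying a watermark.^[13] A third-party analysis of the model's training data found that, in a smaller subset of 12 million images drawn from the original, broader dataset, about 47% of the sample came from 100 different domains, with Pinterest accounting for 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons.^[13]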
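The final-stage filtering described above can be expressed as a predicate over image records. The field names (`aesthetic_score`, `width`, `height`, `p_watermark`), the example URLs, and the resolution and watermark thresholds below are all hypothetical; only the 5-out-of-10 aesthetic cut-off comes from the description of LAION-Aesthetics v2 5+.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    url: str
    caption: str
    width: int
    height: int
    aesthetic_score: float  # predicted human rating out of 10
    p_watermark: float      # classifier's probability that the image carries a watermark


def keep_for_final_training(rec, min_score=5.0, min_side=512, max_p_watermark=0.5):
    """Hypothetical filter: score >= 5/10, not low-resolution, not likely watermarked.

    min_side and max_p_watermark are illustrative assumptions, not LAION's actual values.
    """
    return (
        rec.aesthetic_score >= min_score
        and min(rec.width, rec.height) >= min_side
        and rec.p_watermark < max_p_watermark
    )


records = [
    ImageRecord("https://example.com/a.jpg", "a castle at dusk", 1024, 768, 6.1, 0.05),
    ImageRecord("https://example.com/b.jpg", "stock photo", 1024, 768, 6.3, 0.92),
    ImageRecord("https://example.com/c.jpg", "tiny icon", 64, 64, 7.0, 0.01),
    ImageRecord("https://example.com/d.jpg", "blurry snapshot", 800, 600, 3.2, 0.02),
]
kept = [r.caption for r in records if keep_for_final_training(r)]
print(kept)
```

Each record is rejected for a different reason (watermark, resolution, aesthetic score), showing that the three criteria are applied together.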

The model was trained on Amazon Web Services using 256 Nvidia A100 GPUs, for a total of 150,000 GPU-hours at a cost of US$600,000.^[24]^[25]^[26]
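The figures above imply a few derived quantities. Assuming the GPU-hours were spread evenly across all 256 GPUs running in parallel (an assumption; the sources give only totals), training took roughly 586 hours of wall-clock time, about 24 days, at an average of US$4 per GPU-hour.

```python
gpus = 256
gpu_hours = 150_000
cost_usd = 600_000

wall_clock_hours = gpu_hours / gpus       # assumes all GPUs ran in parallel the whole time
wall_clock_days = wall_clock_hours / 24
cost_per_gpu_hour = cost_usd / gpu_hours

print(f"{wall_clock_hours:.1f} h = {wall_clock_days:.1f} days, ${cost_per_gpu_hour:.2f} per GPU-hour")
```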

End-user fine-tuning

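To address the limitations of the model's initial training, end users may opt to carry out additional training to fine-tune generation outputs for more specific use cases. There are three methods by which users can fine-tune a Stable Diffusion model checkpoint: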

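An "embedding" can be trained from a collection of user-provided images, and allows the model to generate visually similar images whenever the embedding's name is used in a prompt.^[27] Embeddings are based on the "textual inversion" concept developed by researchers from Tel Aviv University in 2022 with support from Nvidia, in which the vector representations of specific tokens used by the model's text encoder are linked to new pseudo-words. Embeddings can be used to reduce biases in the original model or to mimic visual styles.^[28]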
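A "hypernetwork" is a technique created in 2021 by Kurumuz, a software developer at NovelAI, originally intended for conditioning text-generation transformer models. By applying a small pretrained neural network at various points within a larger neural network, it allows text-to-image models derived from Stable Diffusion to imitate the styles of specific artists, even if the original model does not recognize the artist. Combined with a larger neural network, a hypernetwork steers text-to-image or image-to-image results in a particular direction, for example toward an art style. It processes images by finding key areas of importance (for example, eyes and hair) and then patching these areas in a secondary latent space. One drawback of hypernetworks is their relatively low accuracy, and they sometimes produce unpredictable results. For this reason, hypernetworks are best suited to adding a visual style or cleaning up anatomical flaws.^[29]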

Demonstration of the "hypernetwork" technique for Stable Diffusion.

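DreamBooth is a deep learning model developed in 2022 by researchers from Google Research and Boston University that can fine-tune a model to generate output images related to a specified subject.^[30]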
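The three methods differ mainly in which parameters they train. The toy numpy sketch below contrasts them under stated assumptions, with all names and sizes hypothetical. Textual inversion updates only the embedding row of one new pseudo-word, here driven by a stand-in squared-error loss rather than the real diffusion loss. The hypernetwork is modeled, as an assumption about NovelAI's design, as small residual MLPs applied to the text context before a cross-attention layer's key and value projections. The DreamBooth function shows the class-prior preservation idea from the DreamBooth paper: the subject loss plus a weighted loss on generic images of the subject's class.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 8  # toy sizes (hypothetical)

# --- Textual inversion: only the new pseudo-word's embedding row is trained. ---
embeddings = rng.standard_normal((vocab_size, dim))
embeddings = np.vstack([embeddings, np.zeros((1, dim))])  # append a pseudo-word such as "<my-style>"
new_id = vocab_size
frozen_before = embeddings[:new_id].copy()
target = rng.standard_normal(dim)  # stand-in for where the real diffusion loss would pull the vector

for _ in range(200):
    grad = 2.0 * (embeddings[new_id] - target)  # gradient of ||e - target||^2
    embeddings[new_id] -= 0.05 * grad           # update the pseudo-word row only


# --- Hypernetwork (assumed design): residual MLPs reshape the context fed to keys and values. ---
def residual_mlp(x, w1, w2):
    return x + np.maximum(x @ w1, 0.0) @ w2  # x + ReLU(x W1) W2; shape is preserved


def hyper_kv(context, params_k, params_v):
    """Contexts passed to the key and value projections of one cross-attention layer."""
    return residual_mlp(context, *params_k), residual_mlp(context, *params_v)


hidden = 16
zero_init = (rng.standard_normal((dim, hidden)), np.zeros((hidden, dim)))  # starts as identity
context = rng.standard_normal((77, dim))
ctx_k, ctx_v = hyper_kv(context, zero_init, zero_init)


# --- DreamBooth: fine-tune the model with an added class-prior preservation term. ---
def dreambooth_loss(eps_subject, pred_subject, eps_prior, pred_prior, prior_weight=1.0):
    subject = np.mean((eps_subject - pred_subject) ** 2)  # images of the specific subject
    prior = np.mean((eps_prior - pred_prior) ** 2)        # generic images of the same class
    return subject + prior_weight * prior


print(np.allclose(embeddings[new_id], target), np.array_equal(embeddings[:new_id], frozen_before))
print(np.allclose(ctx_k, context), np.allclose(ctx_v, context))
```

The contrast is the point: textual inversion leaves the whole network frozen, a hypernetwork adds a few small trainable modules, and DreamBooth updates the model itself while the prior term discourages it from forgetting the general class.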

Releases


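Version	Release date	Parameters	Notes
1.1, 1.2, 1.3, 1.4^[31]	August 2022		All released by CompVis. There is no version 1.0. 1.1 gave rise to 1.2, and 1.2 gave rise to both 1.3 and 1.4.^[32]
1.5^[33]	October 2022	983M	Initialized with the weights of 1.2, not 1.4. Released by RunwayML.
2.0^[34]	November 2022		Retrained from scratch on a filtered dataset.^[35]
2.1^[36]	December 2022		Initialized with the weights of 2.0.
XL 1.0^[37]	July 2023	3.5B	The XL 1.0 base model has 3.5 billion parameters, making it about 3.5 times larger than previous versions.^[38]
XL Turbo^[39]	November 2023		Distilled from XL 1.0 to run in fewer diffusion steps.^[40]
3.0^[41]^[42]	February 2024 (early preview)	800M to 8B	A family of models.
3.5^[43]	October 2024	2.5B to 8B	A family of models comprising Large (8 billion parameters), Large Turbo (distilled from SD 3.5 Large) and Medium (2.5 billion parameters).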

Societal impact

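Because art styles and composition are not subject to copyright, it is generally held that users who generate images of artworks with Stable Diffusion should not be regarded as infringing the copyright of visually similar works. However, the vast majority of artists never authorized the use of their works to train the AI, a situation that may lead to job losses among illustrators.^[44]^[45] Real people whose likenesses appear in generated images are still protected by personality rights,^[45] and intellectual property such as recognizable brand logos remains protected. Nonetheless, artists have expressed concern that widespread use of models such as Stable Diffusion may eventually cause human artists, as well as photographers, models, cinematographers and actors, to gradually lose commercial viability against AI-based competitors.^[22]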

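Compared with similar machine-learning image synthesis products from other companies, Stable Diffusion is noticeably more permissive in the types of content users may generate, such as violent or sexually explicit imagery.^[46]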

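Addressing concerns that the model could be used for abusive purposes, Stability AI CEO Emad Mostaque explained that "it is people's responsibility as to whether they are ethical, moral, and legal in how they operate this technology",^[8] and argued that putting Stable Diffusion's capabilities into the hands of the public would give the technology a net benefit overall, despite potential negative consequences.^[8] Mostaque also argued that the intention behind making Stable Diffusion openly available is to end the control and dominance of large corporations over such technologies, since they had previously developed only closed AI systems for image synthesis.^[8]^[46]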


References

[1] Diffuse The Rest - a Hugging Face Space by huggingface. huggingface.co. [2022-09-05]. (Archived from the original on 2022-09-05.)
[2] Rombach; Blattmann; Lorenz; Esser; Ommer. High-Resolution Image Synthesis with Latent Diffusion Models (PDF). IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA: 10684–10695. June 2022 [2022-10-10]. arXiv:2112.10752 (free access). (PDF archived from the original on 2023-01-20.)
[3] Stable Diffusion Launch Announcement. Stability.Ai. [2022-09-06]. (Archived from the original on 2022-09-05.)
[4] Stable Diffusion Repository on GitHub. CompVis - Machine Vision and Learning Research Group, LMU Munich. 2022-09-17 [2022-09-17]. (Archived from the original on 2023-01-18.)
[5] Revolutionizing image generation by AI: Turning text into images. LMU Munich. [2022-09-17]. (Archived from the original on 2022-09-17.)
[6] Wiggers, Kyle. Stability AI, the startup behind Stable Diffusion, raises $101M. TechCrunch. [2022-10-17]. (Archived from the original on 2022-10-17.)
[7] The new killer app: Creating AI art will absolutely crush your PC. PCWorld. [2022-08-31]. (Archived from the original on 2022-08-31.)
[8] Vincent, James. Anyone can use this AI art generator — that's the risk. The Verge. 2022-09-15 [2022-10-10]. (Archived from the original on 2023-01-21.)
[9] Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun-Yan; Ermon, Stefano. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. arXiv. 2021-08-02 [2022-10-10]. doi:10.48550/arXiv.2108.01073. (Archived from the original on 2022-12-09.)
[10] Stable Diffusion web UI. GitHub. [2022-10-10]. (Archived from the original on 2023-01-20.)
[11] Stable Diffusion with 🧨 Diffusers. Hugging Face official blog. 2022-08-22 [2022-10-10]. (Archived from the original on 2023-01-17.)
[12] invisible-watermark README.md. GitHub. [2022-10-10]. (Archived from the original on 2022-09-29.)
[13] Baio, Andy. Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion's Image Generator. Waxy.org. 2022-08-30 [2022-10-10]. (Archived from the original on 2023-01-20.)
[14] Stable Diffusion 2.0 Release. stability.ai. [2022-12-11]. (Archived from the original on 2022-12-10.)
[15] Ho, Jonathan; Salimans, Tim. Classifier-Free Diffusion Guidance. arXiv. 2022-07-26 [2022-10-10]. doi:10.48550/arXiv.2207.12598. (Archived from the original on 2023-01-03.)
[16] Stable Diffusion v2.1 and DreamStudio Updates 7-Dec 22. stability.ai. [2022-12-11]. (Archived from the original on 2022-12-10.)
[17] Johannes Gaessler. Emphasis. GitHub. 2022-09-11 [2022-10-10]. (Archived from the original on 2022-12-09.)
[18] Stable Diffusion Public Release. Stability.Ai. [2022-08-31]. (Archived from the original on 2022-08-30.)
[19] Ready or not, mass video deepfakes are coming. The Washington Post. 2022-08-30 [2022-08-31]. (Archived from the original on 2022-08-31.)
[20] License - a Hugging Face Space by CompVis. huggingface.co. [2022-09-05]. (Archived from the original on 2022-09-04.)
[21] Katsuo Ishida. 言葉で指示した画像を凄いAIが描き出す「Stable Diffusion」～画像は商用利用も可能. Impress Corporation. 2022-08-26 [2022-10-10]. (Archived from the original on 2022-11-14.) (in Japanese)
[22] Heikkilä, Melissa. This artist is dominating AI-generated art. And he's not happy about it. MIT Technology Review. 2022-09-16 [2022-10-10]. (Archived from the original on 2023-01-14.)
[23] LAION-Aesthetics | LAION. laion.ai. [2022-09-02]. (Archived from the original on 2022-08-26.)
[24] Mostaque, Emad. Cost of construction. Twitter. 2022-08-28 [2022-09-06]. (Archived from the original on 2022-09-06.)
[25] Stable Diffusion v1-4 Model Card. huggingface.co. [2022-09-20]. (Archived from the original on 2023-01-11.)
[26] This startup is setting a DALL-E 2-like AI free, consequences be damned. TechCrunch. [2022-09-20]. (Archived from the original on 2023-01-19.)
[27] Dave James. I thrashed the RTX 4090 for 8 hours straight training Stable Diffusion to paint like my uncle Hermann. PC Gamer. 2022-10-28 [2022-12-11]. (Archived from the original on 2022-11-09.)
[28] Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal; Cohen-Or, Daniel. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. 2022-08-02. arXiv:2208.01618 (free access) [cs.CV].
[29] NovelAI Improvements on Stable Diffusion. NovelAI. 2022-10-11. (Archived from the original on 2022-10-27.)
[30] 山下裕毅. 愛犬の合成画像を生成できるAI　文章で指示するだけでコスプレ　米Googleが開発. ITmedia Inc. 2022-09-01 [2022-12-11]. (Archived from the original on 2022-08-31.) (in Japanese)
[31] CompVis/stable-diffusion-v1-4 · Hugging Face. huggingface.co. [2023-08-17]. (Archived from the original on 2023-01-11.)
[32] CompVis (CompVis). huggingface.co. 2023-08-23 [2024-03-06]. (Archived from the original on 2025-02-01.)
[33] runwayml/stable-diffusion-v1-5 · Hugging Face. huggingface.co. [2023-08-17]. (Archived from the original on 2023-09-21.)
[34] stabilityai/stable-diffusion-2 · Hugging Face. huggingface.co. [2023-08-17]. (Archived from the original on 2023-09-21.)
[35] stabilityai/stable-diffusion-2-base · Hugging Face. huggingface.co. [2024-01-01]. (Archived from the original on 2025-02-09.)
[36] stabilityai/stable-diffusion-2-1 · Hugging Face. huggingface.co. [2023-08-17]. (Archived from the original on 2023-09-21.)
[37] stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face. huggingface.co. [2023-08-17]. (Archived from the original on 2023-10-08.)
[38] Announcing SDXL 1.0. Stability AI. [2024-01-01]. (Archived from the original on 2024-06-01.)
[39] stabilityai/sdxl-turbo · Hugging Face. huggingface.co. [2024-01-01]. (Archived from the original on 2024-05-23.)
[40] Adversarial Diffusion Distillation. Stability AI. [2024-01-01]. (Archived from the original on 2024-04-15.)
[41] Stable Diffusion 3. Stability AI. [2024-03-05]. (Archived from the original on 2025-02-03.)
[42] Esser, Patrick; Kulal, Sumith; Blattmann, Andreas; Entezari, Rahim; Müller, Jonas; Saini, Harry; Levi, Yam; Lorenz, Dominik; Sauer, Axel. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. 2024-03-05. arXiv:2403.03206 (free access).
[43] Stable Diffusion 3.5. Stability AI. [2024-10-23]. (Archived from the original on 2024-10-23.)
[44] Archived copy. [2023-04-13]. (Archived from the original on 2023-04-17.)
[45] 高性能画像生成AI「Stable Diffusion」無料リリース。「kawaii」までも理解し創造する画像生成AI. Automaton Media. 2022-08-24 [2022-10-10]. (Archived from the original on 2022-12-08.) (in Japanese)
[46] Ryo Shimizu. Midjourneyを超えた？無料の作画AI｢ #StableDiffusion ｣が｢AIを民主化した｣と断言できる理由. Business Insider Japan. 2022-08-26 [2022-10-10]. (Archived from the original on 2022-12-10.) (in Japanese)

External links

Stable Diffusion demo
Interactive Explanation of Stable Diffusion
AUTOMATIC1111's open-source Stable Diffusion web UI (supports Traditional and Simplified Chinese)

