
== Model training ==
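Stable Diffusion was trained on image-caption pairs from LAION-5B, a publicly available dataset derived from {{en-link|公用抓取|Common Crawl|Common Crawl}} data [[網頁抓取|scraped]] from the web. The dataset was created by {{en-link|LAION}}, a German non-profit organization that receives funding from Stability AI.{{r|Waxy|MIT-LAION}} The model was initially trained on a large subset of LAION-5B, with the final rounds of training done on "LAION-Aesthetics v2 5+", a subset of 600 million captioned images that an AI model predicted humans would rate at least 5 out of 10 when asked how much they liked them.{{r|Waxy|LAION-Aesthetics}} This final subset also excluded low-resolution images and images that an AI model identified as carrying a [[水印|watermark]].{{r|Waxy}} A third-party analysis of the model's training data found that, in a smaller subset of 12 million images drawn from the original, wider dataset, approximately 47% of the sampled images came from 100 different websites. [[Pinterest]] accounted for 8.5% of the subset, followed by websites such as [[WordPress]], [[Blogger|Blogspot]], [[Flickr]], [[DeviantArt]] and [[維基共享資源|Wikimedia Commons]].{{r|Waxy}}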
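
The three filtering criteria described above (predicted aesthetic score, resolution, predicted watermark) can be illustrated with a minimal sketch. The <code>Sample</code> record, its field names, and the resolution and watermark cut-offs are assumptions made for illustration only; the text above gives just the aesthetic threshold of 5 out of 10, and this is not the pipeline LAION actually used.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record for one captioned image."""
    caption: str
    width: int
    height: int
    aesthetic_score: float  # predicted human rating on a 0-10 scale
    watermark_prob: float   # predicted probability that a watermark is present


MIN_AESTHETIC = 5.0  # from the text: predicted rating of at least 5/10
MIN_SIDE = 512       # assumption: the text only says "low-resolution"
MAX_WATERMARK = 0.5  # assumption: the text gives no cut-off


def keep(sample: Sample) -> bool:
    """Return True if the sample passes all three filters."""
    return (
        sample.aesthetic_score >= MIN_AESTHETIC
        and min(sample.width, sample.height) >= MIN_SIDE
        and sample.watermark_prob < MAX_WATERMARK
    )


samples = [
    Sample("a castle at dusk", 1024, 768, 6.2, 0.05),
    Sample("stock photo of a desk", 1024, 1024, 5.8, 0.93),  # watermarked
    Sample("blurry thumbnail", 160, 120, 5.1, 0.02),         # too small
    Sample("scanned receipt", 800, 600, 3.4, 0.01),          # low score
]

kept = [s.caption for s in samples if keep(s)]
print(kept)
```

Only the first sample passes all three checks; each of the others fails exactly one of them.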

The model was trained on [[亞馬遜雲計算服務|Amazon Web Services]] using 256 [[安培微架構|NVIDIA A100]] GPUs, for a total of 150,000 GPU-hours, at a cost of US$600,000.<ref>{{Cite web |last=Mostaque |first=Emad |date=2022-08-28 |title=Cost of construction |url=https://twitter.com/emostaque/status/1563870674111832066 |access-date=2022-09-06 |website=Twitter |language=en |archive-date=2022-09-06 |archive-url=https://web.archive.org/web/20220906155426/https://twitter.com/EMostaque/status/1563870674111832066 |url-status=live }}</ref><ref name="stable-diffusion-model-card-1-4">{{cite web|url=https://huggingface.co/CompVis/stable-diffusion-v1-4|title=Stable Diffusion v1-4 Model Card|website=huggingface.co|access-date=2022-09-20|url-status=no|language=en|archive-date=2023-01-11|archive-url=https://web.archive.org/web/20230111161920/https://huggingface.co/CompVis/stable-diffusion-v1-4}}</ref><ref name="techcrunch-model">{{cite web|url=https://techcrunch.com/2022/08/12/a-startup-wants-to-democratize-the-tech-behind-dall-e-2-consequences-be-damned/|title=This startup is setting a DALL-E 2-like AI free, consequences be damned|website=TechCrunch|access-date=2022-09-20|url-status=no|language=en|archive-date=2023-01-19|archive-url=https://web.archive.org/web/20230119005503/https://techcrunch.com/2022/08/12/a-startup-wants-to-democratize-the-tech-behind-dall-e-2-consequences-be-damned/}}</ref>
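
As a rough sanity check on these figures, the implied price per GPU-hour and the approximate wall-clock duration can be worked out directly. The duration estimate assumes that all 256 GPUs ran concurrently for the whole run, which the sources do not state.

```python
NUM_GPUS = 256            # from the text
GPU_HOURS = 150_000       # from the text
TOTAL_COST_USD = 600_000  # from the text

# Implied average price of one GPU-hour.
cost_per_gpu_hour = TOTAL_COST_USD / GPU_HOURS

# Assumption: every GPU was busy for the entire run, so the
# wall-clock time is simply the GPU-hours divided by the GPU count.
wall_clock_hours = GPU_HOURS / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

print(f"${cost_per_gpu_hour:.2f} per GPU-hour")
print(f"{wall_clock_hours:.1f} hours, or about {wall_clock_days:.1f} days")
```

Under that assumption the figures correspond to US$4 per GPU-hour and a little over three weeks of continuous training.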

=== End-user fine-tuning ===
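To address the limitations of the model's initial training, end users may opt to carry out additional training that fine-tunes the generated output for more specific use cases. There are three methods by which users can fine-tune a Stable Diffusion model checkpoint: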
* An "embedding" can be trained from a handful of user-provided images, and allows the model to generate visually similar images whenever the name of the embedding is used in a prompt.<ref>{{cite web|author=Dave James|date=2022-10-28|url=https://www.pcgamer.com/nvidia-rtx-4090-stable-diffusion-training-aharon-kahana/|title=I thrashed the RTX 4090 for 8 hours straight training Stable Diffusion to paint like my uncle Hermann|website=[[PC Gamer]]|language=en|archive-url=https://web.archive.org/web/20221109154310/https://www.pcgamer.com/nvidia-rtx-4090-stable-diffusion-training-aharon-kahana/|archive-date=2022-11-09|url-status=no|access-date=2022-12-11}}</ref> Embeddings are based on the "Textual Inversion" concept developed in 2022 by researchers at [[臺拉維夫大學|Tel Aviv University]] with support from [[輝達|Nvidia]], in which vector representations of specific tokens used by the model's text encoder are linked to new pseudo-words. Embeddings can be used to reduce biases in the original model or to mimic visual styles.<ref>{{cite arXiv|first1=Rinon|last1=Gal|first2=Yuval|last2=Alaluf|first3=Yuval|last3=Atzmon|first4=Or|last4=Patashnik|first5=Amit H.|last5=Bermano|first6=Gal|last6=Chechik|first7=Daniel|last7=Cohen-Or|date=2022-08-02|title=An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion|class=cs.CV|eprint=2208.01618|language=en}}</ref>
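* A "hypernetwork" is a technique created in 2021 by [[NovelAI]] software developer Kurumuz, originally to condition text-generation [[Transformer模型|transformer models]]. It applies a small pre-trained neural network at various points within a larger neural network, which allows text-to-image models derived from Stable Diffusion to imitate the styles of specific artists, whether or not the original model recognizes the artist. Used together with the larger network, a hypernetwork steers text-to-image or image-to-image results in a particular direction, such as adding an art style. It processes an image by finding key areas of importance (e.g. eyes, hair) and then patching those areas in a secondary latent space. A drawback of hypernetworks is that their accuracy is relatively low and they sometimes produce unpredictable results. Hypernetworks are therefore best suited to adding a visual style or cleaning up blemishes in human anatomy.<ref>{{cite web|date=2022-10-11|url=https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac|title=NovelAI Improvements on Stable Diffusion|website=NovelAI|language=en|archive-url=https://archive.today/20221027041603/https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac|archive-date=2022-10-27|url-status=live}}</ref>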
:[[File:X-Y plot of algorithmically-generated AI art demonstrating Hypernetworks.png|thumb|none|500px|Demonstration of the "hypernetwork" technique in Stable Diffusion.]]
* [[DreamBooth]] is a deep learning model, developed in 2022 by researchers from [[Google|Google Research]] and [[波士頓大學|Boston University]], that can fine-tune a model to generate output images depicting a specified subject.<ref>{{cite web|author=山下裕毅|date=2022-09-01|url=https://www.itmedia.co.jp/news/articles/2209/01/news041.html|title=愛犬の合成画像を生成できるAI　文章で指示するだけでコスプレ　米Googleが開発|website=ITmedia Inc.|language=ja|archive-url=https://web.archive.org/web/20220831232021/https://www.itmedia.co.jp/news/articles/2209/01/news041.html|archive-date=2022-08-31|url-status=no|access-date=2022-12-11}}</ref>
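
The core idea behind the "embedding" method in the list above is that only the vector for a new pseudo-word is learned, while every other part of the model stays frozen. The following toy sketch illustrates that idea under heavy simplification. The real method optimizes the vector against the diffusion model's denoising loss; here, as a stated assumption, the frozen model is replaced by a fixed linear map (<code>FROZEN_WEIGHTS</code>) and the user's images by a single hypothetical feature vector (<code>target</code>). All names are illustrative.

```python
# Frozen parts of the toy "model": nothing below ever modifies them.
FROZEN_EMBEDDINGS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
FROZEN_WEIGHTS = [[2.0, 0.0], [0.0, 0.5], [1.0, 1.0]]  # 2-D embedding -> 3-D features


def render(embedding):
    """Stand-in for the frozen generator: a fixed linear map."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_WEIGHTS]


def loss(embedding, target):
    """Squared error between the rendered features and the target features."""
    return sum((o - t) ** 2 for o, t in zip(render(embedding), target))


def learn_pseudo_word(target, steps=500, lr=0.05):
    """Gradient descent on the new token's vector only."""
    vec = [0.0, 0.0]
    for _ in range(steps):
        residual = [o - t for o, t in zip(render(vec), target)]
        # Gradient of the squared error with respect to vec: 2 * W^T * residual.
        grad = [
            2 * sum(FROZEN_WEIGHTS[i][j] * residual[i] for i in range(len(residual)))
            for j in range(len(vec))
        ]
        vec = [v - lr * g for v, g in zip(vec, grad)]
    return vec


# Hypothetical features standing in for the user's example images.
target = [0.6, -0.35, -0.4]
learned = learn_pseudo_word(target)

# The vocabulary gains one new entry; existing entries are untouched.
vocabulary = dict(FROZEN_EMBEDDINGS)
vocabulary["<my-style>"] = learned

print(sorted(vocabulary))
print(f"final loss: {loss(learned, target):.2e}")
```

Because only one small vector is trained, the result is tiny compared with a full checkpoint and can be shared separately from the model, at the cost of being limited to what the frozen model is already able to express.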