Imagen

Image generated by Imagen 3. Partial prompt: an afternoon valley in soft light with a winding river.
Developer	Google DeepMind
Current version	Imagen 3 (August 13, 2024)
Type	Text-to-image model
Website	deepmind.google/technologies/imagen-3/

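Imagen, Imagen 2, and Imagen 3 are text-to-image models developed by Google DeepMind. Before Google Brain and DeepMind merged to form Google DeepMind in April 2023, the model was developed by Google Brain.^[1] Imagen is mainly used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney.

The first version of the model was introduced in a paper in May 2022.^[2] It produces high-quality images, and anyone with a Google account can currently use it through services such as Gemini, ImageFX, and Vertex AI.^[3]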

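History

The original Imagen was first presented in a paper in May 2022 and could generate high-fidelity images from natural language.^[2] The second generation, Imagen 2, was released in December 2023;^[4] its standout feature was the ability to render text and logos in images.^[5] Imagen 3 followed in August 2024,^[6] and Google said the new version markedly improved detail and lighting.^[7]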

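Technology

Imagen relies on two key techniques:

First, it uses a large Transformer-based language model, specifically T5, to understand and encode the text prompt for image generation.

Second, it uses cascaded diffusion models for high-fidelity image generation. Generation proceeds in three stages: a 64x64 base image is produced first and is then successively upscaled to 256x256 and 1024x1024.^[2]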
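The three-stage cascade can be sketched as a simple data structure. This is a minimal illustration, not the model's implementation: only the resolutions come from the text, while the `Stage` class, the stage names, and the helper functions are hypothetical and chosen for readability. It shows that each super-resolution step enlarges the image by a factor of 4 per side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the cascade (stage names are illustrative assumptions)."""
    name: str
    resolution: int  # the stage outputs a resolution x resolution image


# Resolutions stated in the text: 64 -> 256 -> 1024.
CASCADE = [
    Stage("base", 64),
    Stage("super_resolution_1", 256),
    Stage("super_resolution_2", 1024),
]


def upscale_factors(stages):
    """Return the per-side upscaling factor between consecutive stages."""
    return [nxt.resolution // cur.resolution
            for cur, nxt in zip(stages, stages[1:])]


def pixel_counts(stages):
    """Return the number of pixels each stage outputs."""
    return [stage.resolution ** 2 for stage in stages]


if __name__ == "__main__":
    print(upscale_factors(CASCADE))  # [4, 4]
    print(pixel_counts(CASCADE))     # [4096, 65536, 1048576]
```

Splitting generation this way means the base stage only has to produce 4,096 pixels, and the later stages add detail at 16 times the pixel count of the stage before them.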

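Features

Imagen can generate photorealistic images from text prompts.^[3] It also supports a range of styles, including cinematic, 35 mm film, illustration, and surreal. The model can output five aspect ratios: 9:16, 3:4, 1:1, 4:3, and 16:9. In addition, Imagen can edit an already generated image when the text prompt is modified.^[7]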
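As a small illustration of the five aspect ratios listed above, the sketch below validates a ratio string and classifies its orientation. It assumes the ratios are written as width:height, which the text does not state explicitly. The names `SUPPORTED_RATIOS`, `parse_ratio`, and `orientation` are hypothetical helpers and are not part of any Google API.

```python
from fractions import Fraction

# The five aspect ratios listed in the text, assumed to mean width:height.
SUPPORTED_RATIOS = ("9:16", "3:4", "1:1", "4:3", "16:9")


def parse_ratio(ratio: str) -> Fraction:
    """Parse 'W:H' into an exact fraction, rejecting ratios not in the list."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    width, height = ratio.split(":")
    return Fraction(int(width), int(height))


def orientation(ratio: str) -> str:
    """Classify a supported ratio as portrait, square, or landscape."""
    value = parse_ratio(ratio)
    if value < 1:
        return "portrait"
    if value > 1:
        return "landscape"
    return "square"


if __name__ == "__main__":
    for r in SUPPORTED_RATIOS:
        print(r, orientation(r))
```

Under that assumption, 9:16 and 3:4 are portrait formats, 1:1 is square, and 4:3 and 16:9 are landscape formats.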

References

[1] Roth, Emma; Peters, Jay. Google's big AI push will combine Brain and DeepMind into one team. The Verge. 2023-04-20. Retrieved 2025-03-18. Archived from the original on 2023-04-20.
[2] Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Seyed Kamyar Seyed Ghasemipour; Burcu Karagol Ayan; Sara Mahdavi, S.; Rapha Gontijo Lopes; Salimans, Tim; Ho, Jonathan; David J Fleet; Norouzi, Mohammad. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. 2022. arXiv:2205.11487 [cs.CV].
[3] Peterson, Jake. Anyone With a Google Account Can Try Google's Latest AI Image Generator Right Now. Lifehacker. 2024-08-16. Retrieved 2025-03-18.
[4] Imagen 2 - our most advanced text-to-image technology. Google DeepMind. 2025-03-12. Retrieved 2025-03-18.
[5] Wiggers, Kyle. Google debuts Imagen 2 with text and logo generation. TechCrunch. 2023-12-13. Retrieved 2025-03-18.
[6] Schoon, Ben. Google opens access to Imagen 3, its latest model for AI image generation. 9to5Google. 2024-08-16. Retrieved 2025-03-18. Archived from the original on 2024-08-18.
[7] Christian Rowlands. Some of the most realistic AI images you'll see were created with this free tool. TechRadar. 2025-02-26. Retrieved 2025-03-18.
