Midjourney founder says ‘the world needs more imagination’







In April 2022, OpenAI, the artificial intelligence (AI) company cofounded by Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman, Wojciech Zaremba and John Schulman, debuted DALL-E 2, an AI tool that can create realistic images and art from a natural-language description such as “teddy bears working on new AI research on the moon in the 1980s.”


By giving AI a sense of sight, OpenAI hoped to take a step toward artificial general intelligence (AGI), and in the process it created an internet sensation. In the company’s words, “DALL-E 2 will empower people to express themselves creatively.”


Think of anything as random as “Cookie Monster reacting to his cookie stocks tanking” or “astronaut riding a horse in the style of Andy Warhol” and DALL-E 2 could generate it. 


The technology was even recently used to make the first magazine cover generated by AI. 


OpenAI has only just expanded early access to the tool, yet DALL-E 2 has already inspired many similar image generators, including Google’s Imagen, Meta’s Make-A-Scene, TikTok’s AI green screen and the fun-yet-horrifying DALL-E mini by Boris Dayma.


As these tech giants battled for AI art supremacy, The Economist featured a new entrant, Midjourney, on the cover of its June 2022 issue.


Midjourney, David Holz’s take on this technology, quickly rose to prominence, and everyone who got their hands on the text-to-image generator was thoroughly impressed. Most recently, the Colorado State Fair’s annual art competition awarded its blue ribbon for emerging digital artists to Jason M. Allen, who had used Midjourney to create an artwork called “Théâtre D’opéra Spatial.”


To understand Midjourney, it’s important to look back to 2011 — the year David Holz launched his first AI-based startup, Leap Motion. 


“In many ways, I wasn’t interested in artificial intelligence (AI) because I did not care much for making machines better,” Holz told VentureBeat. “Coming from the IA [intelligence augmentation] school of thought, I’ve always been more interested in empowering people and trying to make people better.” 


Many AI experts focus on using machines to perform tasks that humans would consider intelligent or smart, while experts in IA place humans at the center of the system and use technology to support and complement human cognitive functions. Holz chose a path that would let him enjoy the best of both worlds.


“Over the years, I’ve realized that we can use AI to empower people and to make people better and those people can make better AI — it’s like coming full circle and everyone wins,” he said. 


Leap Motion grew out of this ideology. The company developed an optical hand-tracking module that uses AI to capture the movements of human hands. “The goal wasn’t to replace a sign language person, but it was to allow us to literally be embodied in virtual spaces inside of computers. And now, with Midjourney, we are not trying to replace an artist but are giving them tools to explore new mediums of thought and expand their imaginative powers,” Holz explained.


In 2021, Holz started Midjourney as an independent research lab. Around the same time, industry buzzwords like ‘diffusion models’ and ‘contrastive language-image pre-training (CLIP)’ were on everyone’s lips. 


Building on these developments, the lab began offering its text-to-image service in 2022. Like its counterparts, the AI system accepts a design prompt or idea in the form of a phrase and uses it as inspiration to create captivating images. Midjourney stands out because its AI bot can only be accessed through Discord, the voice-over-IP and instant messaging social platform, rather than through its own website or mobile app.


When a natural language query is issued, the bot responds with four low-resolution images in about 60 seconds. At this point, users can generate variations or entirely new images to get closer to what they have in mind. Users can also change the aspect ratio of the output, up to a maximum resolution of 2048×1280 pixels, much higher than DALL-E 2’s 1024×1024.
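To put those resolution figures in perspective, here is a short Python sketch that computes total pixel counts and reduced aspect ratios. It uses only the numbers quoted in this article, which may have changed since publication; the helper name `describe` is illustrative, not part of any Midjourney or OpenAI tool.

```python
from fractions import Fraction


def describe(width: int, height: int) -> tuple[int, Fraction]:
    """Return the total pixel count and the aspect ratio reduced to lowest terms."""
    return width * height, Fraction(width, height)


# Maximum output sizes as quoted in the article.
mj_pixels, mj_ratio = describe(2048, 1280)  # Midjourney
de_pixels, de_ratio = describe(1024, 1024)  # DALL-E 2

print(f"Midjourney max: {mj_pixels:,} px, aspect {mj_ratio.numerator}:{mj_ratio.denominator}")
print(f"DALL-E 2:       {de_pixels:,} px, aspect {de_ratio.numerator}:{de_ratio.denominator}")
print(f"Pixel-count ratio: {mj_pixels / de_pixels:.2f}x")
```

The 2048×1280 maximum reduces to 8:5 (often written 16:10) and holds 2.5 times as many pixels as a 1024×1024 square image.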


Close-up photographs of discrete objects, pop culture references, charcoal or pencil sketches, paintings in the styles of various renowned artists — Midjourney can do it all. It’s exceptional at creating larger-than-life scenes. 


As to the competition, Holz said, “I don’t really want to spend too much time comparing ourselves to others. I like to hope that the results speak for themselves. Kind of like how Apple doesn’t spend all their time talking about how Android sucks.” 


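Given the grand scale on which Midjourney operates, artists and researchers alike have begun expressing concerns about the technology’s collateral damage. Of the many questions raised, three garnered the most attention: whether Midjourney can replace artists, whether it copies existing artwork, and how it handles offensive content.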


Holz addressed all three at length.


Can Midjourney replace artists? No, it cannot. According to Holz, Midjourney is meant to augment our capabilities, not to replace us.


“It’s kind of like the moment humans invented cars. Just because cars can go faster than humans, doesn’t mean we cut our legs off. You are going to use cars to get someplace faster. It’s basically augmenting our speed,” he said. “Similarly, our product involves an iterative, beautiful explorative process, where it becomes an extension of your imagination. And you can wander, explore and figure out what you want on the fly. That’s a positive thing.” 


Whether Midjourney copies artists’ work is a particularly interesting and controversial question, since Midjourney pulls its training data from the internet. Holz claims, however, that the AI engine is designed only to “take inspiration” from the data and to ensure that the output is entirely novel, that is, unlike any publicly available image. Oddly enough, Holz says he has received multiple requests from artists asking Midjourney to double down on its ability to take inspiration from their own work as well as that of others.


“The number one request from artists is to make Midjourney better at copying, to which I don’t fully know how to respond yet. They’re like, David, ‘let me put all of my art into the system. I want to copy it as well as possible so that it can be part of my artistic flow,’” he explained. “They think that the better they can get at copying their personal art style, the more useful it is. Whereas if it has its own style, they have to kind of meet it halfway and pull their stuff out of it. Which is interesting. It’s a little scary for me because I see how it could be used for good and evil.”


As for offensive content, Midjourney is intended to be open by default, so it has strict policies to ensure that content stays PG-13. It automatically blocks text inputs that are inherently disrespectful, aggressive, abusive or sexual, Holz confirmed. Most importantly, the rules are enforced for all content, including interactions in private mode.
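The article does not say how Midjourney’s prompt filter works. As a toy illustration only, a word-level blocklist check might look like the Python sketch below. `BLOCKED_TERMS` and `is_allowed` are hypothetical names, and the two placeholder terms stand in for a much larger list; a real moderation system would likely need far more than keyword matching.

```python
import re

# Hypothetical placeholder blocklist; Midjourney's actual terms and method
# are not described in the article.
BLOCKED_TERMS: frozenset[str] = frozenset({"gore", "nsfw"})


def is_allowed(prompt: str, blocked: frozenset[str] = BLOCKED_TERMS) -> bool:
    """Reject a prompt if any whole word matches a blocked term (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    return not any(word in blocked for word in words)


print(is_allowed("astronaut riding a horse in the style of Andy Warhol"))
print(is_allowed("NSFW portrait"))
print(is_allowed("a gorgeous sunset"))
```

Matching whole words rather than substrings keeps a harmless prompt like “a gorgeous sunset” from being rejected just because it contains the letters “gor”. Because the check takes only the prompt text, it behaves the same in public and private mode, mirroring the policy Holz describes.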


Midjourney currently offers a limited “freemium” model that allows users to submit 20 to 25 prompts for image generation. After that, users can choose from a range of subscription packages: a basic membership with 200 images, a standard membership with unlimited images, or a premium corporate membership with both unlimited images and complete privacy.


It’s important to note that “corporate membership” does not refer to an enterprise software-as-a-service (SaaS) product. In fact, Holz explicitly said that the company has no interest in building one, even though it has many customers who use the product to make commercial video games, concept art and videos.


“Our technology is moving so fast that it makes sense to focus on the consumer side because that’s where people can just take things and run. Also, there’s something very simple and beautiful about making a cool thing,” Holz said. “It only gets better when regular people can pay and have fun with it while professionals pay less than they would for an enterprise product, and still enjoy the product and use it for work. I think this simplicity is worth a lot and we want to keep it.” 


While many believe that the next phase of text-to-image evolution will move toward full-blown videos or movies, Midjourney begs to differ. In fact, the company may avoid that direction as much as possible: adding text-to-video capabilities could make the product more expensive, and the output could be a dealbreaker if it isn’t thoroughly thought through.


That said, Holz does plan to take things to the next level via text-to-3D. He detailed Midjourney’s quest to make the output more real and to move toward augmented and virtual reality. The company aspires to bring that liquid imagination into the real world.


“I care about three things: Reflection, coordination and imagination. To make a better world, we need to be more reflective, more imaginative and we need to be better at coordinating. And I want to build something big in each area, and then bring them together one day,” he said. 


That aside, the company does intend to build out the existing product with enhanced features, making the output more realistic and nuanced.


In addition, Midjourney’s technology uses a combination of its own models and open-source code to create art. Holz’s near-term goal is to stop relying on open-source products and write all of the code in-house.


“I feel like there are people in technology who basically act like we have no past, and there’s a lot of people in the world in fear of not having a future. But I feel like the truth is we’re actually very much mid-journey,” Holz said. “We have this beautiful and rich history behind us, and an equally rich wonderful future ahead of us,” he concluded, on an optimistic note that hints at AI’s promise of limitless possibilities and at the company’s ethos.