DALL-E 3 by OpenAI: a guide to the neural network
A brief history of DALL-E 3
In September 2023, OpenAI, the company behind the popular chatbot ChatGPT, introduced DALL-E 3, a new version of its image-generation neural network that follows instructions closely and produces images of near-photographic quality. In this article, we will explore its advantages.
The first iteration, DALL-E, was introduced by OpenAI in January 2021. Named after the artist Salvador Dalí and the robot WALL-E from the popular animated film, this system could generate unique images from text descriptions. DALL-E was a 12-billion-parameter version of the GPT-3 transformer model and produced both realistic and cartoon-style images that combined unrelated concepts in plausible ways.
In April 2022, OpenAI announced the release of DALL-E 2, a more powerful version. Compared to the original, the quality and detail of the images improved. DALL-E 2 introduced new features such as editing parts of existing images and creating images based on a combination of text and visual inputs.
Access to DALL-E 2 was initially limited to users on a waiting list. Since few people got in, this version was rarely discussed online. However, in September 2022, the neural network became available to everyone: DALL-E 2 appeared on the OpenAI website and was later integrated into the Bing search engine.
Public access to DALL-E 3 through ChatGPT-4 was added at the end of October 2023.
Features and advantages of DALL-E 3
Request: beauty co-working space WOW Space, painted in watercolor.
Simple prompting
Prompting is the process of inputting data (prompts) into an artificial intelligence system to elicit or direct a specific response or action. A prompt typically consists of one or two sentences that tell the neural network what to generate.
Another popular image-generation neural network, Midjourney, requires prompts written in English that follow strict syntactic rules and include detailed descriptions and additional parameters. DALL-E 3 prompts, by contrast, can be brief and abstract.
The algorithm fills in the remaining details on its own: the language model built into ChatGPT-4 analyzes your request according to built-in instructions and rewrites it into a prompt that the image-generation model can interpret more reliably. More on this in the next section.
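The same prompt rewriting can be observed outside ChatGPT. The sketch below assumes the official `openai` Python package (version 1.0 or later) and an `OPENAI_API_KEY` environment variable. When DALL-E 3 is called through OpenAI's Images API, each returned image carries a `revised_prompt` field with the expanded prompt that was actually used. The helpers `build_request` and `generate_with_revision` are our own illustrative names, not part of the SDK, and the image size is an arbitrary choice.

```python
import os


def build_request(user_text: str) -> dict:
    """Assemble Images API parameters for a short, plain-language request."""
    return {
        "model": "dall-e-3",
        "prompt": user_text,
        "size": "1024x1024",  # arbitrary choice for this sketch
        "n": 1,  # dall-e-3 returns one image per API call
    }


def generate_with_revision(client, user_text: str) -> tuple[str, str]:
    """Generate one image and return (image URL, prompt actually used)."""
    response = client.images.generate(**build_request(user_text))
    image = response.data[0]
    # revised_prompt holds the automatically expanded version of user_text
    return image.url, image.revised_prompt


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    # Imported here so the helpers above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    url, revised = generate_with_revision(client, "chickens staged a revolt in the backyard")
    print("Revised prompt:", revised)
    print("Image URL:", url)
```

Printing `revised` next to the original one-line request shows how many details the model adds on its own. Note that the API accepts only `n=1` for DALL-E 3, so several variations require several calls.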
Integration with other OpenAI products
When a user asks ChatGPT-4 to create an image, ChatGPT-4 communicates with DALL-E 3 through an integrated interface. It formulates a prompt from the user's text and sends it to DALL-E 3, which generates the corresponding image and returns it to ChatGPT-4; ChatGPT-4 then shows the image to the user. Users can give feedback or request changes to the image.
ChatGPT-4 adds a brief description to each image.
To adjust the images, it is not necessary to create a new prompt; simply describe the necessary details in plain language.
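The round trip described above can be modeled as a small loop. This is a simplified sketch, not OpenAI's implementation: `rewrite_prompt` and `render` are hypothetical stubs standing in for ChatGPT-4 and DALL-E 3, and a follow-up edit is merged by plain string concatenation where the real system uses a language model. What it shows is the data flow: each follow-up request is combined with the previous prompt, so the user never has to write a new prompt from scratch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def rewrite_prompt(request: str, previous: Optional[str] = None) -> str:
    """Hypothetical stand-in for ChatGPT-4: turn plain language into a full prompt.

    A real assistant rewrites text with a language model; this stub only
    merges a follow-up change into the previous prompt to show the data flow.
    """
    if previous is None:
        return f"A detailed illustration of {request}"
    return f"{previous}; change: {request}"


def render(prompt: str) -> str:
    """Hypothetical stand-in for DALL-E 3: return a placeholder instead of pixels."""
    return f"<image: {prompt}>"


@dataclass
class ImageChat:
    """One chat session; remembers prompts so edits need no new prompt."""

    history: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, image)

    def ask(self, request: str) -> Tuple[str, str]:
        previous = self.history[-1][0] if self.history else None
        prompt = rewrite_prompt(request, previous)  # ChatGPT-4 writes the prompt
        image = render(prompt)  # DALL-E 3 generates the image
        self.history.append((prompt, image))
        caption = f"Generated from: {prompt}"  # ChatGPT-4 adds a short description
        return image, caption


chat = ImageChat()
chat.ask("chickens staged a revolt in the backyard")
image, caption = chat.ask("paint it in watercolor")
print(caption)
```

Keeping the whole history, rather than only the last prompt, mirrors the chat interface, where earlier images stay visible and can be referred back to.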
High-quality, detailed images
Unlike the original DALL-E, which was built on the GPT (Generative Pre-trained Transformer) architecture, DALL-E 2 and DALL-E 3 generate images with diffusion models, which gradually turn random noise into a detailed picture. OpenAI has not disclosed the number of parameters in DALL-E 3.
DALL-E 3 was trained on a vast dataset of images paired with text descriptions; OpenAI's technical report notes that many of these descriptions were detailed captions written by a separate captioning model. This helps the model capture the context of a request along with its fine details and nuances.
Request: chickens staged a revolt in the backyard
Request: Leo Tolstoy as a mirror of the Russian Revolution. The neural network did not forget to include Palace Square and the Hermitage, the epicenter of the revolutionary events.
You can ask the neural network for a photographic image. Here, an editor is reprimanding a writer.
The neural network still reveals its artificial nature in photorealistic images. This is how DALL-E 3 imagines young people photographed with a Leica M11 rangefinder camera.
How does DALL-E 3 generate images?
A guide posted on Reddit describes the rules by which the neural network processes requests. Here is a summary:
- DALL-E 3 automatically translates texts from any language into English to process the request.
- The neural network cannot generate more than 4 images, even if the user requests more.
- It is prohibited to generate photos of politicians and other famous people, as well as of real people without their consent. This restriction was introduced after the scandal over AI-generated images of the Pope in a Balenciaga puffer jacket and of Donald Trump being arrested (both made with Midjourney), which some media outlets around the world mistook for real photos.
- When an image includes people, DALL-E 3 varies their gender and ethnicity unless the request specifies them.
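Two of these rules, translation into English and the four-image limit, amount to simple preprocessing. The sketch below is a hypothetical illustration based only on the Reddit summary: `prepare_request` and the `translate` hook are invented names, and a toy dictionary replaces a real translator.

```python
from dataclasses import dataclass
from typing import Callable

MAX_IMAGES = 4  # limit reported in the Reddit guide


@dataclass
class PreparedRequest:
    prompt_en: str
    count: int


def prepare_request(
    text: str,
    requested: int,
    translate: Callable[[str], str] = lambda s: s,
) -> PreparedRequest:
    """Apply two of the reported rules before generation (hypothetical helper).

    `translate` is a hypothetical hook standing in for automatic translation;
    the identity default assumes the text is already in English.
    """
    count = max(1, min(requested, MAX_IMAGES))  # at least one, never more than four
    return PreparedRequest(prompt_en=translate(text), count=count)


es_to_en = {"un zorro rojo": "a red fox"}  # toy dictionary instead of a real translator
print(prepare_request("un zorro rojo", requested=10, translate=lambda s: es_to_en.get(s, s)))
```

Clamping the count rather than rejecting the request matches the rule as stated: the user still gets images, just no more than four.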
What cannot be generated
- It is prohibited to generate content in the style of artists whose latest works were created less than 100 years ago. For example, images in the style of Pablo Picasso, Salvador Dalí, and Andy Warhol cannot be generated. By contrast, the neural network readily creates images in the style of Vincent van Gogh, Pieter Bruegel, and Claude Monet.
- Other copyrighted images: book illustrations and scenes from movies and TV shows. The ban also covers logos; for instance, the neural network refused to create an image of a Microsoft sign in the center of London.
- Pornographic or sexually explicit material.
At the same time, DALL-E 3 generates fictional logos for company branding. For example, a potato chip logo in the style of Microsoft.
DALL-E 3 can imitate the style of artists who lived more than 100 years ago. For example, an image of Moscow painted in the style of Paul Gauguin.
The ban on real people applies only to photographic images. We managed to generate a cartoon-style Leonardo DiCaprio fishing.
- Images containing scenes of violence, cruelty, or abuse.
- Images containing discriminatory or degrading elements, including racism, sexism, and other forms of hate.
- The neural network does not allow the creation of photographic images of real people without their consent, especially in compromising contexts.
- Political images: pictures related to specific officials or events that may be perceived as interference in public processes. For instance, we were unable to generate a propaganda poster with Emmanuel Macron in the style of Soviet avant-garde.
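The rules in this list can be read as a decision procedure. The toy checker below assumes each request has already been tagged with a few hypothetical fields (`artist_latest_work_year`, `real_person`, `photographic`, `real_logo`); OpenAI's actual moderation is model-based and not public, and the political, violent, and explicit categories are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

MIN_STYLE_AGE_YEARS = 100  # threshold quoted in this article


@dataclass
class Request:
    """Hypothetical tags a moderation step might attach to a request."""

    artist_latest_work_year: Optional[int] = None  # None: no artist style asked for
    real_person: bool = False
    photographic: bool = False
    real_logo: bool = False


def refusal_reason(req: Request, current_year: int) -> Optional[str]:
    """Return why a request would be refused under the rules above, or None."""
    if (
        req.artist_latest_work_year is not None
        and current_year - req.artist_latest_work_year < MIN_STYLE_AGE_YEARS
    ):
        return "style of a recent artist"
    if req.real_person and req.photographic:  # cartoons of real people pass
        return "photographic image of a real person"
    if req.real_logo:  # fictional logos "in the style of" a brand are not flagged
        return "copyrighted logo"
    return None
```

With these tags, a cartoon of a real actor passes while a photographic portrait is refused, matching the DiCaprio example above; likewise a style whose latest work dates from 1890 passes in 2023, while one from 1973 does not.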
How to use DALL-E 3
In ChatGPT
- Visit the ChatGPT website and sign up.
- Subscribe to a paid plan. In the lower left corner, click on the user icon and open the «My Plan» section.
- Select «Upgrade to Plus». The subscription costs $20 per month.
Congratulations, you now have access to DALL-E 3.
- Open a new chat in the top left corner of the screen.
- Enter your request in the chat window.
Please note: DALL-E 3 is integrated into the same chat as ChatGPT-4.
Through Bing Image Creator
Like DALL-E 2, DALL-E 3 is integrated into Bing. With Bing Image Creator, you can use it completely free of charge by following a few simple steps.
Bing Image Creator start page
- Enable a VPN if Bing Image Creator is not available in your region
- Create a Microsoft account on the Bing website
- Go to Bing Image Creator and click «Join & Create»
You can generate 25 images per day «quickly». After that, requests wait in a queue and take longer, but the service is free, saving you $20 per month.