When this was up yesterday I complained that the refusal rate was super high especially on government and military shaped tasks, and that this would only push contractors to use CN-developed open source models for work that could then be compromised.
Today I'm discovering there is a tier of API access with virtually no content moderation available to companies working in that space. I have no idea how to go about requesting that tier of access, but have spoken to 4 different defense contractors in the last day who seem to already be using it.
qhwudbebd
I hope the image support in the Responses API is more competently executed than the mess piling up in the v1/images/generations endpoint.
To pick an example, we have a model parameter and a response_format parameter. The response_format parameter selects whether image data should be returned as a URL (old method) or directly, base64-encoded. The new model only supports base64, whereas the old models default to a URL return, which is fine and understandable.
But the endpoint refuses to accept any value for response_format including b64_json with the new model, so you can't set-and-forget the new behaviour and allow the model to be parameterised without worrying about it. Instead, you have to request the new behaviour with the older models, and not request it (but still get it) with the new one. sigh
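A minimal sketch of working around this quirk, assuming the official `openai` Python client (the actual network call is commented out; only the kwargs-building logic is shown):

```python
def image_request_kwargs(model: str, prompt: str) -> dict:
    """Build kwargs so every model returns base64 image data."""
    kwargs = {"model": model, "prompt": prompt}
    # gpt-image-1 only returns b64_json and rejects the parameter outright;
    # dall-e-2/3 default to URL returns, so b64 must be requested explicitly.
    if model != "gpt-image-1":
        kwargs["response_format"] = "b64_json"
    return kwargs

# from openai import OpenAI
# client = OpenAI()
# resp = client.images.generate(**image_request_kwargs("dall-e-3", "a red fox"))
# b64 = resp.data[0].b64_json
```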
tezza
For the curious I generated the same prompt for each of the quality types: 'auto', 'low', 'medium', 'high'.
Prompt: "a cute dog hugs a cute cat"
https://x.com/terrylurie/status/1915161141489136095
I also then showed a couple of DALL·E 3 images for comparison in a comment.
I built a local playground for it if anyone is interested (your OpenAI org needs to be verified, btw):
https://github.com/Alasano/gpt-image-1-playground
OpenAI's Playground doesn't expose all the API options.
Mine covers all options, and has built-in mask creation and cost tracking as well.
film42
I generated 5 images in the playground. One using a text-only prompt and 4 using images from my phone. I spent $0.85 which isn't bad for a fun round of Studio Ghibli portraits for the family group chat, but too expensive to be used in a customer facing product.
Imnimo
I'm curious what the applications are where people need to generate hundreds or thousands of these images. I like making Ghibli-esque versions of family photos as much as the next person, but I don't need to make them in volume. As far as I can recall, every time I've used image generation, it's been one-off things that I'm happy to do in the ChatGPT UI.
PeterStuer
My number one ask as an almost-two-year OpenAI-in-production user: enable tool use in the API so I can evaluate OpenAI models in agentic environments without jumping through hoops.
minimaxir
Pricing-wise, this API is going to be hard to justify unless you can really get value out of providing references. A generated `medium` 1024x1024 image is $0.04/image, which is in the same cost class as Imagen 3 and Flux 1.1 Pro. Testing from their new playground (https://platform.openai.com/playground/images), the medium images are indeed lower quality than either of the two competitor models and still take 15+ seconds to generate: https://x.com/minimaxir/status/1915114021466017830
Prompting the model is also substantially different from, and more difficult than, traditional models, which is unsurprising given how the model works. The traditional image-prompting tricks don't work out of the box, and I'm struggling to get something that works without significant prompt augmentation (which is what I suspect was used for the ChatGPT image generations).
badmonster
Usage of gpt-image-1 is priced per token, with separate pricing for text and image tokens:
Text input tokens (prompt text): $5 per 1M tokens
Image input tokens (input images): $10 per 1M tokens
Image output tokens (generated images): $40 per 1M tokens
In practice, this translates to roughly $0.02, $0.07, and $0.19 per generated image for low, medium, and high-quality square images, respectively.
That's a bit pricey for a startup.
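A back-of-envelope check of the per-image prices quoted above. The $/token rate comes from the comment; the implied output-token counts are inferred from the quoted prices, not official figures:

```python
OUTPUT_RATE_USD = 40 / 1_000_000  # $40 per 1M image output tokens

def image_cost(output_tokens: int) -> float:
    """Dollar cost of the image-output tokens for one generation."""
    return output_tokens * OUTPUT_RATE_USD

# Output-token counts that would reproduce the quoted per-image prices:
implied_tokens = {"low": 500, "medium": 1750, "high": 4750}
for quality, tokens in implied_tokens.items():
    print(f"{quality}: ${image_cost(tokens):.2f}")
# low: $0.02, medium: $0.07, high: $0.19
```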
jumploops
This new model is autoregression-based (token by token, similar to LLMs) rather than diffusion-based, meaning it adheres to text prompts with much higher accuracy.
As an example, some users (myself included) of a generative image app were trying to make a picture of a person in the pouch of a kangaroo.
No matter what we prompted, we couldn’t get it to work.
GPT-4o did it in one shot!
_pdp_
We have integrated it into our platform and we already have use-cases for it to help create ads and other marketing material.
However, while better than other models, it is not perfect. The image edit API will produce a similar-looking picture (even with masking), rather than exactly the same picture with the requested modifications.
gervwyk
Great SVG generation would be far more useful! For example, being able to edit SVG images after they're generated by AI would make last-mile modifications quick. For our new website https://resonancy.io the simple SVG workflow images were still very much created by hand, and trying various AI tools to make such images yielded shockingly bad, off-brand results even when provided multiple examples. By far the best tool for this is still Canva for us.
Anyone know of an AI model for generating SVG images? Please share.
hnthrowaway0315
I wonder which model is best for outputting standard 2D game resources:
- N by N sprite sheets
- Isometric sprite sheets
Basically anything that I can directly drop into my little game engine.
sebastiennight
Hmm seems pricey.
What's the current state of the art for API generation of an image from a reference plus modifier prompt?
Say, in the 1c per HD (1920*1080) image range?
pknerd
I would like to know about some resources on prompt engineering for OpenAI's image generation module, especially for products related to images or ads.
PS: Does anyone know a good LLM/service to turn images into Videos?
ChaitanyaSai
Almost every image has a yellow tint. Any discussion of why and when that's being fixed?
claiir
> GoDaddy is actively experimenting to integrate image generation so customers can easily create logos that are editable [..]
I remember meeting someone on Discord 1-2 years ago (?) working on a GoDaddy effort to have customer-generated icons using bespoke foundation image gen models? Suppose that kind of bespoke model at that scale is ripe for replacement by gpt-image-1, given the instruction-following ability / steerability?
greatgib
Anyone have an idea of what an "image token" represents for the pricing?
Is it a block of the image of a given fixed size?
verelo
“Editing videos: invideo enables millions of users to transform their ideas into videos using AI. With the integration of gpt-image-1, the platform now offers improved text generation, fine-grain editing controls, and advanced style guidance.”
Does this mean this also does video in some manner?
MisterBiggs
Lots of comments on the price being too high, what are the odds this is a subsidized bare metal cost?
jeevships
Genuinely curious: why would someone buy from your GPT image wrapper when they can just create it in ChatGPT themselves?
scyzoryk_xyz
Intelligence is fast approaching utility status.
jonplackett
Does anyone know if you can give this endpoint an image as input along with text - not just an image to mask, but an image as part of a text input description?
I can't see a way to do this currently; you just get a prompt.
This, I think, is the most powerful way to use the new image model, since it actually understands the input image and can make a new one based on it.
E.g. you can give it a person sitting at a desk and it can make one of them standing up. Or from another angle. Or on the moon.
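If the edits endpoint does accept a reference image without a mask, a request could look like the sketch below. This assumes the official `openai` Python client; the call itself is commented out and untested, and the file name is made up:

```python
def edit_request_args(image_file, prompt: str) -> dict:
    """Arguments for images.edit: a reference image plus a text prompt, no mask."""
    return {"model": "gpt-image-1", "image": image_file, "prompt": prompt}

# from openai import OpenAI
# client = OpenAI()
# with open("person_at_desk.png", "rb") as f:
#     result = client.images.edit(
#         **edit_request_args(f, "The same person, now standing on the moon")
#     )
# b64_png = result.data[0].b64_json
```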
drakenot
Does the AI have the same content restrictions that the chat service does?
gcrfelix
lesson: never build your moat around optimizing the existing AI capability
smrt
I don't understand why this API needs organization verification. More paperwork ahead. Facepalm
PermissionDeniedError: Error code: 403 - {'error': {'message': 'To access gpt-image-1, please complete organization verification
Criminally wasteful.
Far too expensive; I think I'll wait for an equivalent Gemini model.
1oooqooq
aren't you all embarrassed seeing lame press releases of the most uninteresting things on the top of HN front page? i kinda feel bad.
animanoir
Wow more AI slop
hexo
Thank you for a great contribution to global warming.
p1dda
For how long can OpenAI beat the dead horse that is LLMs?
pkulak
I don't get it. I've been using `dall-e-3` over the public API for a couple years now. Is this just a new model?
EDIT: Oh, yes, that's what it appears to be. Is it better? Why would I switch?
rahulg
Been waiting for this to implement Ghibli, Muppets etc. in my WhatsApp bot that converts your photos into AI generated art. Check it out at https://artstudiobot.com. 80% vibe-coded, 20% engineer friend.