Today I’ll be talking about the current state of generative AI. I’ve been presenting to CEO groups about it all month, and I’ve learned there’s a lot of interest, but people aren’t quite sure how to use it. So I’ll be stepping through the various platforms with recommendations and my two cents.
GenAI came onto the scene in a major way in November 2022, when Microsoft-backed OpenAI released ChatGPT. By January, it had over 100M users, making it the fastest-growing consumer application in the history of the internet up to that point. It also opened the eyes of the world to the jaw-dropping power of transformer models.
I tend to think of GenAI in three buckets: text, images, and everything else. What classifies a model as GenAI is its ability to generate entirely new content from its inputs and training data.
Text generation is the first “killer app” for GenAI, in the way that email was for the internet: it demonstrated mainstream usefulness. It can be used to write blog posts, summarize research, ideate on product names, and really handle any task that takes text as an input and returns text as an output. Hint: text doesn’t just mean spoken languages. Computer code, database contents, math – all of these can be expressed in text form.
ChatGPT – OpenAI’s signature product is still the leader right now, but it is an all-out arms race: each new model release beats its predecessors on various benchmarks, which are themselves quickly becoming obsolete. If you want to explore its capabilities further, I recommend browsing the “Explore GPTs” section (link in upper-left). These come from individuals and companies that customize ChatGPT with their own instructions and reference material to focus on narrower use cases. You should pay the $20/mo. for the Plus subscription, as the free plan has major limitations.
Gemini – This is Google’s LLM. Google researchers actually introduced the transformer architecture in 2017, but the company failed to productize that research before others launched and got beaten to the punch. Google then rushed to release their first version, Bard, which was so bad they needed to rebrand as Gemini once they had something presentable. The saving grace of Gemini at the moment is its integration with Google Workspace (formerly G Suite) products like Gmail and Docs. Look for the little star icon or where it says “Help me write”.
Claude – From Anthropic, this is the dark horse to watch out for. It’s competitive with GPT-4, and its free plan offers up to 30 messages per day. The Claude free plan is better than the ChatGPT free plan, so if you’re just a light user, I suggest Claude.
Perplexity – This is built on other LLMs like the ones above, with the company’s own model layered in, and it is positioned as the next-generation search engine. Great for answering questions and conducting research. Because it is a little narrower in scope, it can give you better responses in those use cases than the broad, foundational LLMs.
Llama – From Meta (Facebook’s parent company), Llama 3 just dropped last week and it looks very powerful. The most important part of Llama is that its weights are openly available under Meta’s license, allowing developers to download it, run it themselves, and use it for all kinds of purposes that the closed source models (all of the above) don’t allow.
I’m also keeping an eye on Mistral (the other major open source LLM) and Grok (from xAI).
The primary use case at the moment for diffusion models is image generation. Diffusion models train by adding Gaussian noise to images, and then reversing that process to recreate the images, thus learning how to perform a series of transformations to go from all noise to a realistic image.
Don’t worry – no test on this.
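For the curious, here is a minimal sketch of that noising-and-reversing idea in plain Python. Everything in it is an assumption for illustration: the function names, the four-number "image", and the linear noise schedule are made up for this example and are not how any particular product is implemented. A real diffusion model trains a neural network to guess the noise; this sketch cheats by handing the exact noise back, just to show that a good guess lets you undo the corruption.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Amount of noise variance added at each step (assumed linear schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: fraction of the original signal variance left after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Jump straight to a noisy version of `pixels`; also return the noise used."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Invert add_noise, given a guess of the noise (here: a perfect guess)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(noisy, predicted_noise)]

rng = random.Random(0)
image = [0.1, 0.5, 0.9, -0.3]  # hypothetical stand-in for pixel values
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

# Corrupt the image halfway through the schedule, then undo it.
noisy, noise = add_noise(image, alpha_bars[499], rng)
recovered = remove_noise(noisy, noise, alpha_bars[499])
```

By the last step almost none of the original signal is left, which is why generation can start from pure noise and work backward one step at a time.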
Dall-E – Once again, OpenAI leads the pack in being able to generate high quality images. Dall-E is the OpenAI diffusion model that can generate realistic images based on text or image prompts. One reason to pay for ChatGPT Plus is that you get Dall-E integrated directly into chat. It wasn’t like that too long ago but it’s helpful to not have to jump back and forth between applications. If you used chat to write a blog post, Dall-E can make your header image.
Midjourney – If you want to level up your image generation skills, then Midjourney would be the next step after Dall-E. It is harder to use than Dall-E, but offers far more power and control for a somewhat skilled user. I also think it tends to be more creative. It’s annoying that you have to use Discord to generate the images (there is no visual UI other than Discord at the time of writing this), and occasionally it will miss key parts of your prompt. But the ability to continue refining images and a more advanced set of features allow a creator to do things you just can’t with Dall-E. There is also a wonderful Midjourney community on Discord where you can learn from others, which is becoming increasingly important as documentation can’t keep up with these new platforms.
Stable Diffusion – If Midjourney is the quirky cousin, then Stable Diffusion is the creepy uncle. This is where the misfits are. But this is also the most powerful tool available. It enables you to give both positive and negative prompts, set all sorts of parameters, and even train and run your own models on top of other foundational diffusion models. It also has an extensive set of third-party extensions that can help with specific goals for the image output. It is not for the faint of heart, requires some technical skill because the application runs locally on your computer, and is the most likely of the three to generate a third arm or NSFW content. But if you’re looking to really put some time into image generation, then this needs to be a tool in your belt.
So goes the saying, “there’s an AI for that”. It is simply impossible to keep up with all of the new tools that get released every day. Exploring GPTs and theresanaiforthat.com can be a good way to see what everyone is working on these days. But a few thoughts…
Video – we are not there yet. I haven’t used a tool that can produce a realistic video. But that should change in 2024 when OpenAI launches its video tool, Sora. Word is that Tyler Perry put an $800M expansion of his film production studio on hold after he saw a demo of Sora. So stay tuned.
Presentations – beautiful.ai is a great tool for building decks.
Graphic Design – check out Canva. Good for general design and decks as well.
Code – I’m watching Devin closely, although there are rumors that the founders might have exaggerated its capabilities in the demo.
There are also tools for notetaking, calendar and inbox management, summaries of anything you could ask for, and so much more.
So there we have it. If you haven’t yet tinkered with this tech, now is the time. And if you’ve already tinkered, perhaps it’s time to level up. Signing off.
MIDJOURNEY PROMPT:
Highly minimalist, single-line drawing where the outline of a tiny robot on the left smoothly and fluidly transitions continuing into a handwritten signature on the right. The signature, prominently spelling out the name ‘Hunter Jensen’, should be clear, legible, and stylish on a white canvas. This drawing should embody simplicity and elegance, capturing the essence of a sophisticated and personal brand with an artistic and graceful feel.