This chapter reflects recent developments and was last updated in October of 2022.
This landscape of top artificial intelligence teams tracks the most prominent teams developing products and tools in each of several areas. Tracking these teams gives a good picture of current activity and a starting point for anticipating where future development will happen.
In 2022, we continue to see neural networks grow in size, even though there hasn't been a development as game-changing as Transformers in 2017. In 2021, Microsoft released a 135-billion-parameter neural network model, and at the end of 2021, Nvidia and Microsoft together released an even larger model, called Megatron-Turing NLG, with 530 billion parameters. There is no reason to believe that this growth will stop any time soon. We haven't seen a model of headline-grabbing size as of July 2022, but that could change by the end of the year.
Credit: Generated with DALL-E 2.
2022 has seen remarkable tools developed by top teams. In April 2022, OpenAI released DALL-E 2, an AI system that can create realistic images and art from a description in natural language, and it took the world by storm. This builds on high-profile tools released to the public in the past two years, including GPT-3 in 2020 and GitHub Copilot (based on GPT-3) in June 2021, which now enjoys widespread use by almost 2 million developers.
Moreover, these tools are being adopted faster and faster. It took around two years for GPT-3 to gather 1 million signups; GitHub Copilot took around six months, and DALL-E 2 only two and a half months to hit the same milestone.
The capabilities of AI systems are improving steadily and predictably. For example, it was not a surprise to anyone following the industry to see DALL-E 2 come to life: it was a natural evolution of generative capabilities for images, the pieces had already been built, and the quality of image generation had been improving steadily.
A rapidly expanding number of companies have formed to support machine learning engineers. In the landscape of top artificial intelligence companies, more than 20 companies serve developers, reflecting growth in every area. As these tools mature, it's likely we will see consolidation in machine learning developer tooling.
One of the most important companies in the latest round of innovation is Hugging Face. Originally started in 2016 by two French entrepreneurs, Clément Delangue and Julien Chaumond, Hugging Face provides the tools that engineers need to create new natural language processing (NLP) services. It serves research labs such as Meta AI and Google AI and companies like Grammarly and Microsoft. Just as we are seeing Hugging Face transform NLP, we can expect to see companies emerge on top of newer image generation tools. Four years ago, NLP was in a state similar to that of image generation tools today.
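To make that concrete, here is a minimal sketch of the kind of workflow Hugging Face's transformers library enables. The model named below is one public sentiment checkpoint from the Hugging Face Hub, chosen only for illustration:

```python
# A minimal example of the kind of NLP workflow Hugging Face enables.
# Requires: pip install transformers torch
from transformers import pipeline

# Download a pretrained sentiment model from the Hugging Face Hub
# and wrap it in a ready-to-use inference pipeline.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("Making things think is harder than it looks."))
# [{'label': 'NEGATIVE', 'score': ...}]  (label and score vary by input)
```

A few lines like these replace what, a few years earlier, would have required training a model from scratch.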
Hugging Face is also leading the BigScience Research Workshop, a year-long research workshop on large multilingual models and datasets. Over one year, 900 researchers from 60 countries and more than 250 institutions built a very large multilingual neural network language model, called BLOOM, and an accompanying text dataset on the 28-petaflop Jean Zay (IDRIS) supercomputer located near Paris, France. It is all open source. The model finished training in June 2022 and is available to the public. This is the first massive project of its kind where the model is openly available.
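Because the weights are open, anyone can download and run them. A minimal sketch, using the small 560-million-parameter BLOOM checkpoint the project also published (the full 176-billion-parameter model needs far more memory than a typical machine has):

```python
# Sketch: generating text with an open BigScience checkpoint.
# "bloom-560m" is a small sibling of the full BLOOM model that can
# run on an ordinary machine. Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("The history of artificial intelligence", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```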
Some of the latest models are being offered as a service to other companies. Replit offers an important preview of what is coming: using OpenAI's APIs, it has rolled out a tool that explains code in natural language and another that helps fix buggy code before it is deployed. Similarly, Hugging Face hosts state-of-the-art models that other companies and research labs can use.
This reflects a transition from developer APIs that require developers to build their own models to ready-to-use models, which will unleash capabilities rapidly as powerful models with APIs are integrated into many products: "develop once, deploy everywhere." Adoption will accelerate among both end users and companies, and as these teams sell their tools to companies, they will have a bigger base to sell newer tools to. If the past cycle of ML tools product development was defined by software-as-a-service, this cycle is seeing the emergence of models-as-a-service.
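As a concrete sketch of the models-as-a-service pattern, here is roughly what calling a hosted model over HTTP looks like, using Hugging Face's hosted Inference API as the example. The token is a placeholder, and no model is downloaded at all:

```python
# Sketch: "models-as-a-service" means no model download, just an HTTP call.
# Hugging Face's hosted Inference API serves models by name; substitute
# a real API token for the placeholder below.
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}  # placeholder token

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Models as a service means"},
)
print(response.json())  # e.g. [{'generated_text': '...'}]
```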
There has been an explosion of generative media companies addressing a range of applications, from writing text to creating personalized videos. One of the most popular new tools is Lex, which helps writers be more productive. Tools that help consumers write better marketing copy are especially numerous: at least nine companies are tackling marketing copy, including Anyword, Copysmith, Writesonic, Hypotenuse AI, Jasper, Copy.ai, Peppertype, Regie.ai, and Contenda.

Generative media is currently making rapid progress on text and images, but as the algorithms improve, video will become a bigger focus. A few kinds of players in the ecosystem help create generative media, including big companies, startups, artists, and chip makers, and it is not yet clear which group will capture most of the value of these models.

Generative media also enables a new kind of artist to emerge. "Prompt engineering," the technique by which these artists craft the prompts that produce images, audio, and videos (as sketched below), allows artists to become more productive and unlock their creativity. However, an emerging legal debate surrounds the copyright status of AI-generated media.
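Mechanically, prompt engineering is just the careful, systematic construction of the text handed to a generative model. A toy sketch of the idea; the subject and modifier lists below are purely illustrative, not any tool's official vocabulary:

```python
# Toy sketch of prompt engineering: systematically composing a base
# subject with style modifiers to steer an image-generation model.
# Real prompt vocabularies are discovered by experimentation.
subject = "a lighthouse on a cliff at dusk"
styles = ["oil painting", "35mm photograph", "isometric 3D render"]
details = ["dramatic lighting", "soft pastel palette"]

prompts = [
    f"{subject}, {style}, {detail}"
    for style in styles
    for detail in details
]

for p in prompts:
    print(p)  # each string would be sent to an image model such as DALL-E 2
```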
Image editing is used everywhere in the creator economy, and better editing capabilities are in constant demand. Facet.ai exemplifies this trend: users can segment an image with a one-click tool, apply a style transferred from another image, and then easily apply that same style across other images.
Lightricks is another company with a mobile-first focus. One of its apps, Facetune, helps users edit and modify their selfies with simple gestures. Another company building an editing toolbox is Topaz Labs, which helps users upscale or denoise images and videos.
Audio processing and creation have seen massive improvements in the past two years or so. AI assistants like Siri, Alexa, and Google Assistant have made the average consumer aware of, and comfortable with, interacting with AI through speech.
We are now seeing new tools that help people edit audio almost as easily as text. The leader in the field is Descript, which lets users remove unwanted audio through its text editing tool and create new audio clips in the user's own voice with its text-to-speech models. Companies like Krisp are helping users remove unwanted background noise.
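The core trick behind text-based audio editing is to align the transcript to word-level timestamps, then cut the waveform wherever words are deleted from the text. A simplified sketch, assuming the timestamps already exist (real products get them from a speech recognition model) and using the pydub library for the cutting; the file names and timings are placeholders:

```python
# Simplified sketch of text-based audio editing: delete words from the
# transcript, and the corresponding spans are cut from the audio.
# Requires: pip install pydub (plus ffmpeg for non-WAV formats)
from pydub import AudioSegment

# (word, start_ms, end_ms) tuples, normally produced by speech recognition.
transcript = [
    ("So", 0, 180), ("um", 180, 420), ("this", 420, 640),
    ("is", 640, 760), ("like", 760, 1020), ("the", 1020, 1140),
    ("demo", 1140, 1500),
]
words_to_remove = {"um", "like"}  # filler words deleted in the text editor

audio = AudioSegment.from_file("take1.wav")
edited = AudioSegment.empty()
for word, start_ms, end_ms in transcript:
    if word.lower() not in words_to_remove:
        edited += audio[start_ms:end_ms]  # keep only the surviving words

edited.export("take1_edited.wav", format="wav")
```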
Intelligent video editing requires significantly more compute than image and audio editing, so it is not yet as advanced.
Tools like RunwayML help editors modify videos by applying new styles or segmenting them automatically; it can remove a subject with a simple brush stroke. Going further, Synthesia can create fully synthetic tutorial videos from text. We are seeing only the very beginning of these capabilities, and we could well see AI-generated or AI-enhanced influencers within a few years. (Do any old TV fans remember Max Headroom?)
Self-driving companies were heavily hyped about five years ago, when it seemed like every other week another company was raising yet more venture funding. Then came significant consolidation, as some companies failed and others were sold. Now, 2022 is the year that many of the successful efforts are deploying their systems to the general public. For example, Cruise now charges for rides in San Francisco, and Comma.ai is profitable, selling a device that connects to cars to help them navigate highways.
But the most prominent success in self-driving technology is Tesla. It sells almost one million cars per year, is increasing production 50% year over year, and ships self-driving software with its cars. Tesla publishes its safety numbers, which show consistently improving safety for drivers, though with broader use both accidents and regulatory scrutiny are growing: 873,000 vehicles now have Tesla Autopilot, and they were involved in 273 accidents last year.
Over the past year, Tesla has moved toward using neural networks in every area of its stack, from perception to prediction. This year, Elon Musk stated in a TED talk that Tesla's self-driving cars will be better than humans by the end of the year. His predictions about self-driving technology have been over-optimistic in the past, but the progress is undeniable.
The leader in self-flying drones is Skydio, which first released a drone with obstacle avoidance in 2018; now all major drone companies offer it, and self-flying drones are the standard. Companies are also starting to offer smaller drones: Snapchat recently launched a small drone called Pixy to help users take selfies. These are much more lightweight and pack in fewer features than full-fledged drones.
The ever-increasing computational demands of bigger and bigger neural network models mean more companies are specializing in chips for these workloads. The most prominent is Cerebras, which announced its wafer-scale chip in 2019. The challenge for newcomers is the software needed to translate neural network code into what runs on the hardware, and that software stack is a big leg up that Nvidia has over its competitors.
More companies, including Meta and Tesla, are announcing big clusters of GPUs or AI chips to train their ever-larger models. This year, Meta AI announced the AI Research SuperCluster, which will contain 16,000 GPUs when complete and which Meta expects to be the fastest AI supercomputer in the world. In 2021, Tesla announced its own supercomputer, called Dojo, which had around 5,000 GPUs when announced and is growing over time; Tesla is also developing its own chips for training neural networks.
The amount of compute for machine learning and AI tools will only increase over time. ARK Invest predicts that AI-related compute unit production costs could decline at a 39% annual rate and that software improvements could contribute an additional 37% in cost declines over the next eight years. At those rates, the GPT-3 model, which cost around 12 million dollars to train, would cost on the order of a few thousand dollars in 2030. In short, there is a lot of room for new entrants and competition in hardware to power AI applications.
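A quick sketch of the compounding arithmetic implied by those rates, which is where the rough 2030 figure comes from:

```python
# Quick check of the compounding implied by ARK's projected rates.
hardware_decline = 0.39   # annual cost decline from cheaper compute
software_decline = 0.37   # additional annual decline from better software
years = 8
gpt3_training_cost = 12_000_000  # rough dollar cost cited in the text

retained_per_year = (1 - hardware_decline) * (1 - software_decline)
projected_cost = gpt3_training_cost * retained_per_year ** years
print(f"${projected_cost:,.0f}")  # roughly $5,700
```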
Predictive coding has stalled in its current form. Yann LeCun has been on a quest to create a model that is better at predictions, beginning with a specific proposal: to predict the next frames of a video. However, this work hasn't had significant breakthroughs yet.
LeCun is now working on a new architecture with six separate modules to try to break through this wall. The hope is to advance deep learning algorithms in ways that go beyond simply increasing the size of neural networks. The six modules are the configurator, the perception module, the world model module, the cost module, the actor module, and the short-term memory module, a design loosely inspired by how the human brain works.
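As a structural sketch only: the module names below come from LeCun's proposal, but every body is a placeholder, since the architecture is a research direction rather than a finished design.

```python
# Structural sketch of LeCun's proposed six-module agent. Only the
# module names come from the proposal; all bodies are placeholders.
class PerceptionModule:
    def estimate_state(self, observation):
        return observation  # placeholder: encode raw input into a state

class WorldModelModule:
    def predict(self, state, action):
        return state  # placeholder: imagine the next state given an action

class CostModule:
    def cost(self, state):
        return 0.0  # placeholder: how "bad" is this state for the agent?

class ActorModule:
    def propose(self, state):
        return ["noop"]  # placeholder: candidate actions

class ShortTermMemory:
    def __init__(self):
        self.states = []
    def store(self, state):
        self.states.append(state)

class Configurator:
    """Configures the other modules for the task at hand."""
    def run(self, observation, perception, world, cost, actor, memory):
        state = perception.estimate_state(observation)
        memory.store(state)
        # Pick the candidate action whose imagined outcome costs least.
        return min(actor.propose(state),
                   key=lambda a: cost.cost(world.predict(state, a)))

# One decision step with dummy wiring:
action = Configurator().run(
    observation={"pixels": []},
    perception=PerceptionModule(), world=WorldModelModule(),
    cost=CostModule(), actor=ActorModule(), memory=ShortTermMemory(),
)
```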
This is a novel idea, and it will take researchers some time to figure out whether this multi-module approach works.
Beyond the continuous movement toward faster computation and bigger neural nets, there seems to be another significant trend: more tools and models are becoming reusable in open source and as APIs, and more money is flowing into key groups that fund these flexible and reusable tools and models. So more combinations of tools are possible more quickly than ever before, a sort of "network effect" for ML tooling and models. This means we can realistically expect unusually rapid adoption of neural nets in more and more software.
Given the increase in computation and the exciting new models released in the past few years, including GPT-3, DALL-E 2, GitHub Copilot, and LaMDA (Google's powerful language model for conversational applications), we continue to see rapid advancements in software for text, image, video, and audio understanding and generation. In spite of the viral, intelligent-sounding chat transcript shared by one Google employee, the system is only synthesizing human-like responses to questions, and it does not make sense to say that Google's LaMDA, or any other neural network model, is sentient.
Coupled with the trend toward models-as-a-service, we are likely to see these features embedded in many more products quickly. While specific concerns like deepfakes and "sentient" chatbots will continue to grab headlines, it is much more likely that machine learning will simply keep appearing as highly useful features embedded in the products we use every day.