Founded Year

2019

Stage

Series C - II | Alive

Total Raised

$342.75M

Valuation

$0000 

Last Raised

$146M | 3 mos ago

Mosaic Score

The Mosaic Score is an algorithm that measures the overall financial health and market potential of private companies.

+225 points in the past 30 days

About Temporal

Temporal develops an execution platform within the software development and cloud computing sectors. The company's main offering is a system that simplifies the creation of applications by abstracting the complexity of distributed systems and providing tools for managing workflow execution and visibility. Temporal's solutions focus on application reliability and scalability, enabling developers to build and maintain software. It was founded in 2019 and is based in Bellevue, Washington.

Headquarters Location

2337 148th Avenue North East, Suite 1335

Bellevue, Washington 98007

United States


ESPs containing Temporal

The ESP matrix leverages data and analyst insight to identify and rank leading companies in a given technology landscape.

Companies are plotted by execution strength and market strength into Challenger, Highflier, Outperformer, and Leader quadrants.
Enterprise Tech / Development

Java frameworks provide developers with pre-built, reusable components that can be easily integrated into their applications, reducing the amount of time and effort needed to build complex software systems. With the availability of a wide range of Java frameworks in the market, developers can choose the ones that best suit their needs and preferences, thereby increasing productivity and efficiency…

Temporal was named a Highflier among 9 other companies, including DataStax, The Apache Software Foundation, and Akka.

Temporal's Products & Differentiators

    Temporal

    Open source developer platform that helps developers ensure the durable execution of their code. Failures happen. Temporal makes them irrelevant. Build applications that never lose state, even when everything else fails.
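
For context, this is roughly what durable execution looks like in a minimal sketch written in the style of Temporal's open-source Python SDK (temporalio) quick-start examples; the workflow and activity names are illustrative, and details may differ by SDK version. Side-effecting work runs in activities, and if the process crashes mid-run, completed activity results are replayed from history instead of being re-executed.

    from datetime import timedelta
    from temporalio import activity, workflow

    @activity.defn
    async def charge_order(order_id: str) -> str:
        # Side-effecting work (API calls, payments, LLM calls) belongs in activities.
        return f"charged {order_id}"

    @workflow.defn
    class OrderWorkflow:
        @workflow.run
        async def run(self, order_id: str) -> str:
            # If the worker crashes after this activity completes, replay reuses the
            # recorded result rather than charging the card a second time.
            return await workflow.execute_activity(
                charge_order,
                order_id,
                start_to_close_timeout=timedelta(seconds=30),
            )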


Expert Collections containing Temporal

Expert Collections are analyst-curated lists that highlight the companies you need to know in the most important technology spaces.

Temporal is included in 1 Expert Collection: Unicorns - Billion Dollar Startups.


Unicorns - Billion Dollar Startups

1,276 items

Latest Temporal News

Reliability for unreliable LLMs – Stack Overflow

Jul 1, 2025

As generative AI technologies become more integrated into our software products and workflows, those products and workflows start to look more and more like the LLMs themselves. They become less reliable, less deterministic, and occasionally wrong. LLMs are fundamentally non-deterministic, which means you’ll get a different response for the same input. If you’re using reasoning models and AI agents, then those errors can compound when earlier mistakes are used in later steps.

“Ultimately, any kind of probabilistic model is sometimes going to be wrong,” said Dan Lines, COO of LinearB. “These kinds of inconsistencies that are drawn from the absence of a well-structured world model are always going to be present at the core of a lot of the systems that we’re working with and systems that we’re reasoning about.”

The non-determinism of these systems is a feature of LLMs, not a bug. We want them to be “dream machines,” to invent new and surprising things. By nature, they are inconsistent: if you drop the same prompt ten times, you’ll get ten responses, all of them given with a surety and confidence that can only come from statistics. When those new things are factually wrong, then you’ve got a bug. With the way that most LLMs work, it’s very difficult to understand why the LLM got it wrong and sort it out.

In the world of enterprise-ready software, this is what is known as a big no-no. You (and the customers paying you money) need reliable results. You need to gracefully handle failures without double-charging credit cards or providing conflicting results. You need to provide auditable execution trails and understand why something failed so it doesn’t happen again in a more expensive environment.

“It becomes very hard to predict the behavior,” said Daniel Loreto, Jetify CEO. “You need certain tools and processes to really ensure that those systems behave the way you want to.”

This article will go into some of the processes and technologies that may inject a little bit of determinism into GenAI workflows. The quotes here are from conversations we’ve had on the Stack Overflow Podcast; check out the full episodes linked for more information on the topics covered here.

Enterprise applications succeed and fail on the trust they build. For most processes, this trust rests on authorized access, high availability, and idempotency. For GenAI processes, there’s another wrinkle: accuracy.

“A lot of the real success stories that I hear about are apps that have relatively little downside if it goes down for a couple of minutes or there’s a minor security breach or something like that,” Sonar CEO Tariq Shaukat said. “I think JP Morgan AI’s team just put out some research on the importance of hallucinations in banking code, and I think it’s probably obvious to say that it’s a much bigger deal in banking code than it would be in my kid’s web app.”

The typical response to hallucinations is to ground responses in factual information, usually through retrieval-augmented generation. But even RAG systems can be prone to hallucinations. “Even when you ground LLMs, 1 out of every 20 tokens coming out might be completely wrong, completely off topic, or not true,” said Amr Awadallah, CEO of GenAI platform Vectara. “Gemini 2.0 from Google broke new benchmarks and they’re around 0.8%, 0.9% hallucinations, which is amazing. But I think we’re going to be saturating around 0.5%. I don’t think we’ll be able to beat 0.5%. There are many, many fields where that 0.5% is not acceptable.”
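
Grounding works by fetching relevant documents at request time and pinning the model to them in the prompt, so answers come from supplied facts rather than memory. A minimal sketch of that retrieval-augmented pattern in Python; retrieve(), call_llm(), and the toy corpus are hypothetical stand-ins, not any specific vendor's API:

    # Minimal retrieval-augmented generation sketch. call_llm() is a hypothetical
    # stand-in for your model provider's client; CORPUS is a toy in-memory store.
    CORPUS = [
        "Temporal was founded in 2019 and is based in Bellevue, Washington.",
        "Retrieval-augmented generation grounds model answers in retrieved documents.",
    ]

    def retrieve(query: str, k: int = 2) -> list[str]:
        # Toy relevance score: count words shared between the query and each passage.
        words = set(query.lower().split())
        ranked = sorted(CORPUS, key=lambda p: len(words & set(p.lower().split())), reverse=True)
        return ranked[:k]

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model provider's client here")

    def grounded_answer(question: str) -> str:
        context = "\n".join(retrieve(question))
        prompt = (
            "Answer using only the context below; say you don't know if it is not there.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return call_llm(prompt)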
You’ll need additional guardrails on prompts and responses. Because these LLMs can accept any text prompt, they could respond with anything within their training data. When the training data includes vast swaths of the open internet, those models can say some wild stuff. You can try fine-tuning toxic responses out or removing personally identifiable information (PII) on responses, but eventually, someone is going to throw you a curve ball.

“You want to protect the model from behaviors like jailbreaking,” said Maryam Ashoori, Head of Product, watsonx.ai, at IBM. “Before the data is passed to the LLM, make sure that you put guardrails in place in terms of input. We do the same thing on the output. Hate, abusive language, and profanity are filtered. PII is all filtered. Jailbreak is filtered. But you don’t wanna just filter everything, right? If you filter everything, potentially there’s nothing left to come out of the model.”

Filtering on the prompt side is defense; filtering on the output side is preventing accidents. The prompt might not be malicious, but the data could be harmful anyway.

“On the way back from the LLM, you’re looking at doing data filtering, data loss prevention, data masking controls,” said Keith Babo, Head of Product at Solo.io. “If I say to the LLM, ‘What are three fun facts about Ben?’ it could respond with one of those facts as your Social Security number because it’s trying to be helpful. So I’m not deliberately trying to phish for your Social Security number but it could just be out there.”
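
A guardrail layer of the kind Ashoori and Babo describe typically wraps both sides of the model call: screen the prompt before it goes in, and mask or block sensitive material on the way out. A minimal sketch, assuming a hypothetical call_llm() client; the SSN regex and blocklist are illustrative only, and real deployments lean on dedicated moderation and data-loss-prevention services:

    import re

    SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # illustrative PII rule only
    BLOCKED_TERMS = {"ignore previous instructions"}      # toy jailbreak blocklist

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model provider's client here")

    def guarded_completion(user_prompt: str) -> str:
        # Input guardrail: refuse prompts that trip the blocklist.
        lowered = user_prompt.lower()
        if any(term in lowered for term in BLOCKED_TERMS):
            return "Sorry, I can't help with that."

        response = call_llm(user_prompt)

        # Output guardrail: mask anything that looks like an SSN before returning it.
        return SSN_PATTERN.sub("[REDACTED]", response)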
With the introduction of agents, it gets worse. Agents can use tools, so if an agent hallucinates and uses a tool, it could take real actions that affect you. “We have all heard these stories of agents getting out of control and starting to do things that they were not supposed to do,” said Christophe Coenraets, SVP of developer relations at Salesforce. “Guardrails make sure that the agent stays on track and define the parameters of what an agent can do. It can be as basic as, initially, ‘Answer that type of question, but not those.’ That’s very basic, but you can go really deep in providing these guardrails.”

Agents, in a way, show how to make LLMs less non-deterministic: don’t have them do everything. Give them access to a tool (an API or SMTP server, for example) and let them use it. “How do you make the agents extremely reliable?” asked Jeu George, CEO of Orkes. “There are pieces that are extremely deterministic. Sending an email, sending a notification, right? There are things which LLMs are extremely good at. It gives the ability to pick and choose what you want to use.”

But eventually something is going to get past you. Hopefully, it happens in testing. Either way, you’ll need to see what went wrong. The ability to observe it, if you will. On the podcast, we’ve talked a lot about observability and monitoring, but that’s dealt with the stuff of traditional computing: logs, metrics, stack traces, etc. You drop a breakpoint or a println statement and, with aggregation and sampling, can get a view of the way your system works (or doesn’t). In an LLM, it’s a little more obtuse.

“I was poking on that and I was like, ‘Explain this to me,’” said Alembic CTO Abby Kearns. “I’m so used to having all of the tools at my disposal to do things like CI/CD and automation. It’s just baffling to me that we’re having to reinvent a lot of that tooling in real time for a machine workload.”

Outside the standard software metrics, it can be difficult to get metrics that show equivalent performance in real time. You can get aggregate values for things like hallucination rates, factual consistency, bias, and toxicity/inappropriate content. You can find leaderboards for many of these metrics over on Hugging Face. Most of these evaluate on multiple and holistic benchmarks, but there are specialized leaderboards for things you don’t want to rank highly on: hallucinations and toxicity.

These metrics don’t really do anything for you in live situations. You’re still relying on probabilities to keep your GenAI applications from saying something embarrassing or legally actionable. Here’s where the LLM version of logging comes into play.

“You need a system of record where you can see, for any session, exactly what the end user typed, exactly what was the prompt that your system internally created, exactly what did the LLM respond to that prompt, and so on for each step of the system or the workflow so that you can get in the habit of really looking at the data that is flowing and the steps that are being taken,” said Loreto.
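
A system of record in this sense is little more than an append-only log keyed by session, with one entry per step: the raw user input, the prompt actually sent, and the model's response. A minimal sketch; the JSONL file and the record_step() helper are illustrative, not any particular observability product:

    import json
    import time
    from pathlib import Path

    LOG_FILE = Path("llm_audit_log.jsonl")  # illustrative system of record

    def record_step(session_id: str, step: str, **data: str) -> None:
        # Append one auditable record per step of the workflow.
        entry = {"session": session_id, "step": step, "ts": time.time(), **data}
        with LOG_FILE.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    # Usage inside a GenAI workflow:
    #   record_step(sid, "user_input", text=user_text)
    #   record_step(sid, "prompt", text=constructed_prompt)
    #   record_step(sid, "llm_response", text=model_output)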
You can also use other LLMs to evaluate outputs to generate the metrics above, an “LLM-as-judge” approach. It’s how one of the most popular leaderboards works. It may feel a little like a student correcting their own tests, but by using multiple different models, you can ensure more reliable outputs.

“If you put a smart human individual, lock them away in a room with some books, they’re not going to think their way to higher levels of intelligence,” said Mark Doble, CEO of Alexi. “Put five people in a room, they’re debating, discussing ideas, correcting each other. Now let’s make this a thousand, ten thousand. Regardless of the fixed constraint of the amount of data they have access to, it’s very plausible that they might get to levels of higher intelligence. I think that’s exactly what’s happening right now with multiple agents interacting.”

Agents and chain-of-thought models can make the internal workings of LLMs more visible, but the errors from hallucinations and other mistakes can compound. While there are some advances in LLM mind reading (Anthropic published research on the topic), the process is still opaque. While not every GenAI process can peer into the mind of an LLM, there are ways to make that thought process more visible in outputs.

“One approach that we were talking about was chain of reasoning,” said Ashoori. “Break a prompt down to smaller pieces and solve them. Now when we break it down step-by-step, you can think of a node at each step, so we can use LLMs as a judge to evaluate the efficiency of each node.”

Fundamentally, though, LLM observability is nowhere near as mature as its umbrella domain. What the chain-of-thought method essentially does is improve LLM logging. But there are lots of factors that affect the output response in ways that are not well understood.

“There’s still questions around tokenization, how that impacts your output,” said Raj Patel, AI transformation lead at Holistic AI. “There is properly understanding the attention mechanism. Interpretability of outcomes has a big question mark over it. At the moment, a lot of resources are being put into output testing. As long as you’re comfortable with the output, are you okay with putting that into production?”

One of the most fun parts of GenAI is that you can get infinite little surprises; you press a button and a new poem about development velocity in the style of T.S. Eliot emerges. When this is what you want, it sparks delight. When it isn’t, there is much gnashing of teeth and huddles with the leadership team. Most enterprise software depends on getting things done reliably, so the more determinism you can add to an AI workflow, the better.

GenAI workflows increasingly lean on APIs and external services, which themselves can be unreliable. When a workflow fails midway, that can mean rerunning prompts and getting entirely different responses for that workflow.

“We’ve always had a cost to downtime, right?” said Jeremy Edberg, CEO of DBOS. “Now, though, it’s getting much more important because AI is non-deterministic. It’s inherently unreliable because you can’t get the same answer twice. Sometimes you don’t get an answer or it cuts off in the middle; there’s lots of things that can go wrong with the AI itself. With the AI pipelines, we need to clean a ton of data and get it in there.”

Failures within these workflows can be more costly than failures within standard service-oriented architectures. GenAI API calls can cost money per token sent and received, so a failure costs money. Agents and chain-of-thought processes can pull in web data for inference-time processing. A failure there still pays the fee but loses the product.

“One of the biggest pain points is that those LLMs could be unstable,” said Qian Li, cofounder at DBOS. “They can return failures, but also they’ll rate limit you. LLMs are expensive, and most of the APIs will say, don’t call me more than five times per minute or so.”

You can use durable execution technologies to save progress in any workflow. As Qian Li said, “It’s checkpointing your application.” When your GenAI application or agent processes a prompt, inferences data, or calls tools, durable execution tools store the result. “If a call completes and is recorded, it will never repeat that call,” said Maxim Fateev, cofounder and CTO of Temporal. “It doesn’t matter if it’s AI or whatever.”

How it works is similar to autosave in video games. “We use the database to store your execution state so that it also combines with idempotency,” said Li. “Every time we start a workflow, we store a database record saying this workflow has started. And then before executing each step, we check if this step has executed before from the database. And then if it has executed before, we’ll skip the step and then just use the recorded output. By looking up the database and checkpointing your state to the database, we’ll be able to guarantee anything called exactly once, or at least once plus idempotency is exactly once.”
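
The checkpointing loop Li describes is easy to picture: before running a step, look up whether its result was already recorded for this workflow; if so, reuse it, otherwise run the step once and persist the output. A minimal sketch using SQLite as the record store; the schema, durable_step() helper, and call_llm reference are illustrative, not DBOS's or Temporal's actual APIs:

    import json
    import sqlite3
    from typing import Any, Callable

    db = sqlite3.connect("workflow_state.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "workflow_id TEXT, step_name TEXT, output TEXT, "
        "PRIMARY KEY (workflow_id, step_name))"
    )

    def durable_step(workflow_id: str, step_name: str, fn: Callable[[], Any]) -> Any:
        # If this step already ran for this workflow, reuse the recorded output.
        row = db.execute(
            "SELECT output FROM steps WHERE workflow_id = ? AND step_name = ?",
            (workflow_id, step_name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])

        # Otherwise run it once and checkpoint the (JSON-serializable) result.
        result = fn()
        db.execute(
            "INSERT INTO steps (workflow_id, step_name, output) VALUES (?, ?, ?)",
            (workflow_id, step_name, json.dumps(result)),
        )
        db.commit()
        return result

    # Usage: on a re-run of workflow "wf-42", the expensive call is skipped and the
    # recorded answer is returned instead.
    #   answer = durable_step("wf-42", "summarize", lambda: call_llm(prompt))

On a crash and re-run, completed steps return their recorded outputs instead of re-invoking the model, which is the "at least once plus idempotency is exactly once" guarantee Li describes.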
Another way to make GenAI workflows more deterministic is to not use LLMs for everything. With LLMs being the new hotness, some folks may be using them in places where it doesn’t make sense. One of the reasons everyone is getting onboard the agent train is that it explicitly enables non-deterministic tool use as part of a GenAI-powered feature. “When people build agents, there are pieces that are extremely deterministic, right?” said George. “Sending an email, sending a notification, that’s part of the whole agent flow. You don’t need to ask an agent to do this if you already have an API for that.”

In a world where everyone is building GenAI into their software, you can adapt some standard processes to make the non-determinism of LLMs a little more reliable: sanitize your inputs and outputs, observe as much of the process as possible, and ensure your processes run once and only once. GenAI systems can be incredibly powerful, but they introduce a lot of complexity and a lot of risk. For personal programs, this non-determinism can be overlooked. For enterprise software that organizations pay a lot of money for, not so much.

In the end, how well your software does the thing you claim it does is the crux of your reputation. When prospective buyers are comparing products with similar features, reputation is the tie breaker. “Trust is key,” said Patel. “I think trust takes years to build, seconds to break, and then a fair bit to recover.”

Temporal Frequently Asked Questions (FAQ)

  • When was Temporal founded?

    Temporal was founded in 2019.

  • Where is Temporal's headquarters?

    Temporal's headquarters is located at 2337 148th Avenue North East, Bellevue.

  • What is Temporal's latest funding round?

    Temporal's latest funding round is Series C - II.

  • How much did Temporal raise?

    Temporal raised a total of $342.75M.

  • Who are the investors of Temporal?

    Investors of Temporal include Amplify Partners, Sequoia Capital, Index Ventures, Hanwha Group, Stepstone Group and 9 more.

  • Who are Temporal's competitors?

    Competitors of Temporal include Diagrid and 3 more.

  • What products does Temporal offer?

    Temporal's products include Temporal and 1 more.


Compare Temporal to Competitors

Trek10 Logo
Trek10

Trek10 is an AWS Premier Tier Services Partner that focuses on cloud-native development and managed AWS services. The company provides various services such as serverless architecture, IoT solutions, DevOps transformation, cloud migration, data analytics, and enterprise architecture. Trek10 serves sectors that need cloud infrastructure and support, including the public sector and retail industry. It is based in South Bend, Indiana.

Dokku Logo
Dokku

Dokku is a compact platform as a service (PaaS) solution that offers an alternative to Heroku. It enables users to build, deploy, and manage application lifecycles using Heroku buildpacks and isolated containers. Dokku is extensible and customizable with plugins, allowing for additional features and personalization. It was founded in 2013 and is based in Austin, Texas.

H
Hookdeck

Hookdeck provides an event gateway for event-driven applications within the technology sector. Their platform allows developers to receive, process, and deliver asynchronous messages, facilitating communication between various services. Hookdeck's offerings include infrastructure for webhook management, message routing and transformation, and tools for building, deploying, and monitoring event-driven applications. It was founded in 2020 and is based in Montreal, Canada.

Platform9 Logo
Platform9

Platform9 provides private cloud solutions within the cloud computing industry. It offers software that allows existing infrastructure to function as a private cloud, and includes a community edition and a VMware migration tool. It serves sectors that utilize cloud infrastructure management and virtualization, including the enterprise and information technology (IT) services industries. It was founded in 2013 and is based in San Jose, California.

Getup Logo
Getup

Getup specializes in cloud infrastructure and cloud native environments, focusing on Kubernetes support and optimization. The company offers services including Kubernetes support, security practices, monitoring, observability, cloud cost optimization, and training, aimed at improving production environments. Getup's expertise includes DevOps, DevSecOps, and Infrastructure as Code (IaC), providing solutions for cloud infrastructure management. It was founded in 2013 and is based in São Paulo, Brazil.

K
Keptn

Keptn specializes in cloud-native application lifecycle orchestration, focusing on the automation of deployment and operations within the Kubernetes ecosystem. The company offers solutions for enhancing Kubernetes monitoring, streamlining metrics ingestion, and automating deployment validation without the need for multiple plugins. Keptn primarily serves sectors that require robust deployment observability and automation, such as the cloud computing industry. It was founded in 2019 and is based in Linz, Austria.

