Open AI Reliability and Performance
While Open AI's ChatGPT APIs are some of the most powerful and exciting developments in technology in general, and in conversational technology like Flow XO specifically, their overwhelming popularity and rapid growth have resulted in inconsistent and unreliable service, at least as of the time of this writing (March 20, 2023). Some requests to ChatGPT may take a second or less, while the exact same request might take 60 seconds or more. One API request may appear to hang, while the very next succeeds quickly. Delays of 60 seconds or longer might be OK when you're building a blog writing tool, but in an automated chat conversation that is too long, especially when you can't anticipate when it is going to happen.
We obviously cannot solve this problem ourselves, but we have implemented some enhancements to the Open AI integration with Flow XO that we hope will mitigate it enough to make the Open AI APIs usable within a chatbot. Open AI will surely figure out how to scale their service to meet demand. In the meantime, it would be nice to be able to use their wonderful tools somewhat reliably in our conversational flows.
Based on our tests and experimentation, very long wait times on Open AI requests appear to be almost random: one request sent at one moment may take a minute to return a result, while another request sent less than a second later takes only a second or two. One way to work around this behavior is to make multiple identical requests to the Open AI API and take the fastest one. Usually, this results in a response within 5 seconds or so, rather than a delay of 30 or 60 seconds. The downside is that almost every request to the Open AI API eventually succeeds, which means that you are charged by Open AI for each simultaneous request made to try to get the fastest one. So when using this feature, you need to be aware of the cost/performance trade-off.
There are two parameters that can control how this feature works:
* Maximum retries
* Timeout in seconds
Maximum Retries controls the maximum number of extra requests that will be made to the Open AI API for a single task. The default is 0 (we will only make the initial request). Setting the value to 3 means that for each Open AI task, we will issue up to 3 requests to the Open AI API, trying to get a response as quickly as possible.
Timeout in seconds controls how long to wait between requests. If you set the timeout to 0, Flow XO immediately makes Maximum Retries requests to the Open AI API for every task, and the fastest result wins. For example, if Maximum Retries is set to 3 and Timeout in seconds is set to 0, then the moment the Open AI task starts, we will make 3 identical API calls. If the first API call takes 22 seconds, the second takes 2 seconds, and the third takes 9 seconds, your user will get a response in 2 seconds.
If Timeout in seconds is more than 0, each additional request is delayed by that number of seconds after the previous one. This is useful because most Open AI API calls take more than 1 second to return a result, so with a timeout of 0 you may always be paying for more Open AI requests than you need. As an example, if Maximum Retries is set to 3 and Timeout in seconds is set to 2, then when the Open AI task first runs, we will make only one API call. Two seconds later, if we haven't received a result, we will make another API call. Two seconds after that we do the same, and so on until either we receive a result or Maximum Retries is exhausted.
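To make the behavior described above more concrete, here is a minimal Python sketch of staggered, identical requests where the first result wins. It is not Flow XO's actual implementation. The names first_result, make_request and fake_request are hypothetical, the API call is simulated with a sleep, and the sketch assumes (based on the examples in this article) that a Maximum Retries value of 0 means one request in total and a value of 3 means at most 3 requests in total.

```python
import asyncio


async def first_result(make_request, max_retries, timeout_seconds):
    """Start identical requests one at a time and return the first result.

    Assumption drawn from this article's examples: max_retries=0 sends one
    request in total, and max_retries=N (N >= 1) sends at most N in total.
    """
    total_requests = max(1, max_retries)
    pending = set()
    try:
        for attempt in range(total_requests):
            pending.add(asyncio.ensure_future(make_request(attempt)))
            is_last = attempt == total_requests - 1
            # Wait for the stagger delay, or with no time limit once every
            # allowed request has been started.
            done, pending = await asyncio.wait(
                pending,
                timeout=None if is_last else timeout_seconds,
                return_when=asyncio.FIRST_COMPLETED,
            )
            if done:
                # Error handling is omitted: a failed request raises here.
                return done.pop().result()
    finally:
        # Stop waiting on the slower requests once a result has been returned.
        for task in pending:
            task.cancel()


async def fake_request(attempt):
    # Hypothetical stand-in for a real API call. The latencies are the
    # 22 s, 2 s and 9 s from the example above, divided by 100.
    latency = [0.22, 0.02, 0.09][attempt]
    await asyncio.sleep(latency)
    return f"reply from request {attempt + 1}"


print(asyncio.run(first_result(fake_request, max_retries=3, timeout_seconds=0)))
# prints "reply from request 2"
```

Note that cancelling the slower requests on your side only stops waiting for them. As explained above, you should still expect to be charged by Open AI for every request that was started.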
If you don't see these settings, you may need to create a new Open AI connection. To do that, click the Account button on the editor for your Open AI task, and choose "Configure new service".
Enabling this feature will increase your Open AI API usage and therefore your cost with them. Only use this feature if you need it, and make sure you experiment with the settings to find the right balance between cost and performance.
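When experimenting, it can help to reason about the trade-off with some simple arithmetic. The following Python sketch is only an illustration: the function name simulate is hypothetical, and it assumes you already know how long each request would take on its own, which you never do in practice. Given those latencies and a Timeout in seconds value, it returns how long your user waits and how many requests were started (and therefore likely billed).

```python
def simulate(latencies, timeout_seconds):
    """Return (seconds until the first reply, number of requests started).

    latencies[i] is how long request i would take on its own. Request i is
    started i * timeout_seconds after the task begins, but only if no reply
    has arrived by then.
    """
    first_reply = float("inf")
    started = 0
    for i, latency in enumerate(latencies):
        start = i * timeout_seconds
        if start >= first_reply:
            break  # a reply has already arrived, so nothing more is sent
        started += 1
        first_reply = min(first_reply, start + latency)
    return first_reply, started


# The 22 s / 2 s / 9 s example with no delay: fastest reply, 3 requests paid.
print(simulate([22, 2, 9], timeout_seconds=0))   # (2, 3)
# A 3 second delay: the reply takes longer, but only 2 requests are started.
print(simulate([22, 2, 9], timeout_seconds=3))   # (5, 2)
# When the first request is fast, a delay avoids any extra requests.
print(simulate([1.5, 2, 9], timeout_seconds=2))  # (1.5, 1)
```

A lower timeout buys speed at the price of more requests. A higher timeout saves money whenever the first request happens to be fast.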
Please note that there are two places where these settings can be configured. You can set global defaults when you set up your Open AI credentials.
You can also override these defaults in the settings for each task.
There are two reasons for this:
1. Not every Open AI prompt or API call takes the same amount of time. Larger, more complex prompts with more data may naturally take more than 1 or 2 seconds even at their fastest. For tasks that always take longer, it does not make sense to issue a bunch of extra API calls to try to make them faster, because it won't be possible. In these cases, you may want to override the individual task and make its Retry timeout in seconds longer than your normal default, so that the task has enough time to process normally before we issue (and you pay for) extra Open AI API requests.
2. Some places in your workflows are more time sensitive than others. For example, perhaps you are using Intent Detection at the beginning of a flow to route your user to the right sub-flow. If this step takes 30 seconds, your new users will be sure your bot is broken. So in this case, you may want to configure your intent detection task for the maximum possible speed (i.e., a higher Maximum retries and a lower Retry timeout). But if you are generating content for an internal tool, you may not care how long it takes. In this case, you may choose to set Maximum retries to 0, meaning this feature is entirely disabled.
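One way to picture how global defaults and per-task overrides fit together is the Python sketch below. This is a mental model only, not how Flow XO stores or resolves these settings. The names RetrySettings and resolve are hypothetical, the numbers are made-up examples, and it assumes each setting can be overridden independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySettings:
    max_retries: int
    timeout_seconds: float


def resolve(defaults, max_retries=None, timeout_seconds=None):
    """Apply per-task overrides on top of the connection-level defaults.

    None means "not overridden". An explicit 0 is respected, which matters
    because a Maximum retries value of 0 is how the feature is turned off.
    """
    return RetrySettings(
        max_retries=defaults.max_retries if max_retries is None else max_retries,
        timeout_seconds=(defaults.timeout_seconds
                         if timeout_seconds is None else timeout_seconds),
    )


# Example values only; pick your own by experimenting.
connection_defaults = RetrySettings(max_retries=2, timeout_seconds=3)

# Time-sensitive step: more retries, shorter delay.
intent_detection = resolve(connection_defaults, max_retries=3, timeout_seconds=1)
# Large prompt that is always slow: give it longer before sending extras.
long_prompt = resolve(connection_defaults, timeout_seconds=10)
# Internal tool where speed does not matter: feature disabled.
internal_tool = resolve(connection_defaults, max_retries=0)

print(intent_detection, long_prompt, internal_tool, sep="\n")
```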
Which settings are best for your flows, your users and your specific data is best determined (like everything else when using Open AI) by experimentation and testing. There is no one-size-fits-all way to configure these settings, and settings that work great now might need to be tweaked later as Open AI adjusts their capacity and begins to fix their performance problems. Conversely, if their service slows down due to increased traffic or some other reason, you may need to make your settings more aggressive for your critical flows.
Do be aware that even with the most aggressive settings, Open AI may still take longer than you would like to respond. Their service is currently experiencing frequent downtime events and highly unpredictable behavior, so the tools we have provided are simply band-aids to help you work around this situation as best you can. If you need consistent performance, you may want to look at alternatives, like https://cohere.ai/ or even the Open AI APIs available in Azure (https://azure.microsoft.com/en-us/products/cognitive-services/openai-service).
As always, feel free to send any feedback or questions to firstname.lastname@example.org
The Flow XO Support Team