Monitor your endpoints effectively by understanding job states and key metrics.
This documentation covers the states your Serverless jobs can be in and the key metrics available for monitoring endpoint performance and health.
You can find endpoint metrics in the Metrics tab of the Serverless endpoint details page in the Brightnode web interface.
Requests: Displays the total number of requests received by your endpoint, along with the number of completed, failed, and retried requests.
Execution time: Displays the P70, P90, and P98 execution times for requests on your endpoint. These percentiles help analyze execution time distribution and identify potential performance bottlenecks.
Delay time: Delay time is the duration a request spends waiting in the queue before it is picked up by a worker. Displays the P70, P90, and P98 delay times for requests on your endpoint. These percentiles help assess whether your endpoint is scaling efficiently.
Cold start time: Cold start time measures how long it takes to wake up a worker. This includes the time needed to start the container, load the model into GPU VRAM, and get the worker ready to process a job. Displays the P70, P90, and P98 cold start times for your endpoint.
Cold start count: Displays the number of cold starts your endpoint experienced during a given period. Fewer cold starts are better, because each cold start adds startup latency before a request can be processed.
Webhook request responses: Displays the number of webhook requests sent and their corresponding responses, including success and failure counts.
Worker states: Displays the number of workers that are running, idle, throttled, etc. across the selected time interval.
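To make the percentile metrics above more concrete, here is a minimal sketch of how P70, P90, and P98 values could be derived from a set of raw timings. It assumes the nearest-rank percentile method. The method the Metrics tab actually uses is not stated here, so results may differ slightly. The `percentile` and `summarize` helpers and the sample timings are hypothetical and exist only for illustration.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. `p` must be an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not isinstance(p, int) or not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(values_ms):
    """Return the three percentiles shown for the time-based metrics."""
    return {f"p{p}": percentile(values_ms, p) for p in (70, 90, 98)}


# Hypothetical execution times in milliseconds (not real endpoint data).
execution_ms = [120, 135, 150, 160, 180, 210, 250, 400, 900, 2500]
summary = summarize(execution_ms)
print(summary)
```

In this sample, most requests finish quickly while a few take far longer, so the higher percentiles sit well above P70. A wide gap like this on an execution time, delay time, or cold start time chart usually points to a small number of slow outliers rather than uniformly slow requests.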