Code | Tracked since May 21, 2026

Cache openai-python response typing helpers in cognee LLM path

This PR memoizes the openai-python type-introspection helpers used in Cognee’s OpenAI response pipeline. It wraps `get_origin`, `get_args`, `is_annotated_type`, and `is_literal_type` with `functools.lru_cache` and rebinds both the source modules and the imported aliases, so response and client call sites use the cached versions. In a measured 200-document Cognify run, CPU time fell from 93.85s to 67.70s (~28%).

openai-python, cognee.infrastructure.llm, functools.lru_cache, get_origin

What Happened

  • The PR wraps the openai-python helpers `get_origin`, `get_args`, `is_annotated_type`, and `is_literal_type` with `functools.lru_cache`.
  • It rebinds both the source modules and the imported aliases, so response and client call sites in Cognee’s OpenAI response pipeline use the cached versions.
  • A measured 200-document Cognify run shows CPU time falling from 93.85s to 67.70s (~28%).
  • 1 evidence item attached for review.

What is Different

Before

Scattered source updates, isolated context, and manual follow-up across multiple feeds.

Now

The PR implements an idempotent cache installation. It wraps the core openai typing helpers in `lru_cache(maxsize=4096)` and rebinds them at every openai SDK import site currently in use (`_models`, `_response`, `_legacy_response`, `_base_client`), so repeated structured-output processing no longer recomputes identical type checks.
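The wrap-and-rebind approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it runs against stand-in modules built with `types.ModuleType` and uses `typing.get_origin` / `typing.get_args` in place of the SDK helpers. The names `cached_with_fallback`, `install_cache`, and the `_typing_cache_installed` marker are hypothetical.

```python
import functools
import types
import typing

_INSTALLED_FLAG = "_typing_cache_installed"  # hypothetical marker attribute


def cached_with_fallback(func, maxsize=4096):
    """Wrap func in lru_cache, but call it directly for unhashable arguments."""
    cached = functools.lru_cache(maxsize=maxsize)(func)

    @functools.wraps(func)
    def wrapper(*args):
        try:
            hash(args)
        except TypeError:
            # Unhashable argument: bypass the cache rather than fail.
            return func(*args)
        return cached(*args)

    wrapper.cache_info = cached.cache_info
    return wrapper


def install_cache(source_module, importer_modules, helper_names):
    """Install cached helpers once; return False if already installed."""
    if getattr(source_module, _INSTALLED_FLAG, False):
        return False
    for name in helper_names:
        original = getattr(source_module, name)
        wrapped = cached_with_fallback(original)
        setattr(source_module, name, wrapped)
        for mod in importer_modules:
            # Rebind only aliases that still point at the original helper.
            if getattr(mod, name, None) is original:
                setattr(mod, name, wrapped)
    setattr(source_module, _INSTALLED_FLAG, True)
    return True


# Stand-in modules (assumptions for the demo, not the real SDK layout).
source = types.ModuleType("fake_typing_helpers")
source.get_origin = typing.get_origin
source.get_args = typing.get_args
importer = types.ModuleType("fake_models")
importer.get_origin = source.get_origin  # simulates "from ... import get_origin"

names = ["get_origin", "get_args"]
first = install_cache(source, [importer], names)   # installs the wrappers
second = install_cache(source, [importer], names)  # no-op on the second call
```

The explicit rebinding step matters because a `from module import helper` statement copies the reference into the importing module's namespace, so replacing the helper only in the source module would leave those aliases pointing at the uncached function.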

Why Track This

Why It Matters

Operators running Cognee workloads that use OpenAI structured-output handling can process the same amount of work with noticeably less CPU time. The freed headroom can improve queue throughput and reduce the chance of saturation under heavy document processing. In follow-up monitoring, check whether openai-python updates introduce new helper call sites or import paths that the current rebind list does not cover; those would fall back to uncached calls and bring back the extra overhead. Cache installation is one-time and idempotent, and unhashable arguments bypass the cache so results stay correct.
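One way to monitor for uncovered import sites is a small audit that looks for helper aliases lacking the `cache_info` attribute that `functools.lru_cache` attaches to its wrappers. The sketch below is an assumption about how such a check could look, not part of the PR: `find_uncached_aliases` and the `fakepkg` module names are hypothetical, and the demo uses stand-in modules rather than the real SDK.

```python
import functools
import sys
import types
import typing


def find_uncached_aliases(package_prefix, helper_names, modules=None):
    """Return sorted (module_name, helper_name) pairs whose helper lacks cache_info."""
    modules = dict(sys.modules) if modules is None else modules
    missing = []
    for mod_name, mod in modules.items():
        if mod is None or not mod_name.startswith(package_prefix):
            continue
        for helper in helper_names:
            obj = vars(mod).get(helper)
            # lru_cache wrappers expose cache_info; plain functions do not.
            if obj is not None and not hasattr(obj, "cache_info"):
                missing.append((mod_name, helper))
    return sorted(missing)


# Stand-in modules: one rebound to a cached helper, one newly added and missed.
covered = types.ModuleType("fakepkg._models")
covered.get_origin = functools.lru_cache(maxsize=4096)(typing.get_origin)
new_site = types.ModuleType("fakepkg._new_module")
new_site.get_origin = typing.get_origin

report = find_uncached_aliases(
    "fakepkg",
    ["get_origin"],
    {"fakepkg._models": covered, "fakepkg._new_module": new_site},
)
print(report)
```

A check like this could run after an SDK upgrade; a non-empty report would indicate import sites that still call the uncached helper.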

What To Watch Next

  • Watch whether openai-python overhead becomes a recurring pattern in later signals.
  • Track follow-up changes around Structured Outputs.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: openai_module_layout_change_bypasses_cached_aliases, new_import_aliases_not_rebound, cache_install_order_before_first_openai_call.

Supporting Evidence