
The Pattern

Every request should follow this pattern:
  1. Check - Pre-flight check against limits
  2. Request - If allowed, make your request
  3. Record - Log the actual usage after the request
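As a rough sketch that does not depend on any SDK, the three steps fit into a small wrapper. The names run_with_limits and LimitExceeded, and the check, request, and record callables, are hypothetical stand-ins for limitry.limits.check, the OpenAI call, and limitry.events.record. They are not part of Limitry's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class LimitExceeded(Exception):
    """Hypothetical error raised when the pre-flight check rejects a request."""


def run_with_limits(
    check: Callable[[], bool],
    request: Callable[[], T],
    record: Callable[[T], None],
) -> T:
    """Hypothetical helper (not part of the Limitry SDK): check, request, record."""
    # 1. Check: reject before the request incurs any cost
    if not check():
        raise LimitExceeded("pre-flight check failed")
    # 2. Request: if this raises, the exception propagates and step 3 never runs
    result = request()
    # 3. Record: only reached after a successful request
    record(result)
    return result


recorded = []
run_with_limits(lambda: True, lambda: 42, recorded.append)
print(recorded)  # [42]
```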

Why This Pattern?

Check Before, Record After

  • Check verifies the customer is within limits
  • Record logs actual usage (for metering and analytics)
This separation allows you to:
  • Reject requests before incurring costs
  • Track accurate values (not estimates)
  • Handle failures gracefully (don’t record failed requests)
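The check/record split is easier to see in a toy in-memory meter. InMemoryMeter is hypothetical and only models the idea; in Limitry, recorded events increment meters that limit checks read. One consequence shows up in the model: a check only sees usage that has already been recorded. A request that starts under the limit can therefore overshoot it, and the next request is the one that gets rejected.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryMeter:
    """Toy per-customer token meter (hypothetical, for illustration only)."""

    limit: int
    used: dict[str, int] = field(default_factory=dict)

    def check(self, customer_id: str) -> bool:
        # Read-only: checking never consumes quota
        return self.used.get(customer_id, 0) < self.limit

    def record(self, customer_id: str, tokens: int) -> None:
        # Add the actual amount consumed, after the request succeeded
        self.used[customer_id] = self.used.get(customer_id, 0) + tokens


meter = InMemoryMeter(limit=1000)
meter.check("cus_1")         # True, and recorded usage is still 0
meter.record("cus_1", 1200)  # actual usage can overshoot the remaining quota
meter.check("cus_1")         # False: the next request is rejected up front
```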

Implementation

from limitry import Limitry
from openai import OpenAI

limitry = Limitry()
openai = OpenAI()

def chat(customer_id: str, message: str) -> str:
    # 1. Check limits
    check = limitry.limits.check(customer_id=customer_id)
    
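    if not check.allowed:
        # Report which limit(s) were exceeded; check.limits lists every matching limit
        exceeded = [limit.name for limit in check.limits if limit.exceeded]
        raise Exception(f"Limit exceeded: {', '.join(exceeded)}")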
    
    # 2. Make request
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )
    
    # 3. Record actual usage
    limitry.events.record(
        customer_id=customer_id,
        event_type="llm.completion",
        values={"tokens": response.usage.total_tokens},
        dimensions={"model": "gpt-4"}
    )
    
    return response.choices[0].message.content

Handling Failures

If the request fails, don’t record usage:
try:
    response = openai.chat.completions.create(...)
except Exception:
    # Don't record - the request failed; a bare raise keeps the original traceback
    raise

# Only record on success
limitry.events.record(...)

Check Response

The check response tells you about all matching limits:
{
  "allowed": true,
  "limits": [
    {
      "id": "lmt_abc123",
      "name": "Daily token limit",
      "meterId": "mtr_xyz789",
      "period": "day",
      "limit": 100000,
      "used": 45000,
      "remaining": 55000,
      "exceeded": false,
      "reset": 1704153600
    }
  ]
}
Use this to show users their remaining allowance or when limits reset. The reset field is a Unix timestamp in seconds (null for all_time limits).
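The check response can be turned into user-facing text. This sketch works on the raw JSON shown above. describe_limits and format_reset are hypothetical helpers. With the Python SDK you would presumably read the same fields as attributes, as check.limits[0].name does in the implementation; that mapping is an assumption here. A null reset is treated as an all_time limit.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# The example check response from above, as raw JSON
RAW = """{"allowed": true, "limits": [{"id": "lmt_abc123", "name": "Daily token limit",
"meterId": "mtr_xyz789", "period": "day", "limit": 100000, "used": 45000,
"remaining": 55000, "exceeded": false, "reset": 1704153600}]}"""


def format_reset(reset: Optional[int]) -> str:
    # reset is a Unix timestamp in seconds; null (None) means an all_time limit
    if reset is None:
        return "never"
    return datetime.fromtimestamp(reset, tz=timezone.utc).isoformat()


def describe_limits(check: dict) -> list[str]:
    """Hypothetical helper: one human-readable line per matching limit."""
    lines = []
    for limit in check["limits"]:
        pct_used = 100 * limit["used"] / limit["limit"]
        lines.append(
            f'{limit["name"]}: {limit["remaining"]} remaining '
            f'({pct_used:.0f}% used), resets {format_reset(limit.get("reset"))}'
        )
    return lines


for line in describe_limits(json.loads(RAW)):
    print(line)
# Daily token limit: 55000 remaining (45% used), resets 2024-01-02T00:00:00+00:00
```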

With Balances

If you use prepaid credits, check the balance before the request and debit the actual usage afterward:
def chat_with_credits(customer_id: str, balance_name: str, message: str) -> str:
    # 1. Check balance
    check = limitry.balances.check_sufficiency(
        customer_id=customer_id,
        name=balance_name,
        amount=100  # Estimated tokens; the actual count is debited after the request
    )

    if not check.sufficient:
        raise Exception("Insufficient credits")

    # 2. Make request
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )

    # 3. Use the actual token count as the cost
    actual_tokens = response.usage.total_tokens

    # 4. Debit balance
    limitry.balances.debit(
        customer_id=customer_id,
        name=balance_name,
        amount=actual_tokens,
        description=f"GPT-4: {actual_tokens} tokens"
    )

    # 5. Record event (for analytics)
    limitry.events.record(
        customer_id=customer_id,
        event_type="llm.completion",
        values={"tokens": actual_tokens},
        dimensions={"model": "gpt-4"}
    )

    return response.choices[0].message.content
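The balance flow checks against an estimate but debits the actual count, and the two can differ. The toy model below shows the arithmetic when actual usage exceeds the estimate. InMemoryBalance is hypothetical, and this page does not say how Limitry handles a debit larger than the remaining balance.

```python
from dataclasses import dataclass


@dataclass
class InMemoryBalance:
    """Toy prepaid balance (hypothetical; not how Limitry stores balances)."""

    amount: int

    def check_sufficiency(self, estimate: int) -> bool:
        # Pre-flight: is there enough for the *estimated* cost?
        return self.amount >= estimate

    def debit(self, actual: int) -> None:
        # Post-request: charge what the request actually used.
        # This toy model never refuses a debit, so the balance may go negative.
        self.amount -= actual


balance = InMemoryBalance(amount=120)
if balance.check_sufficiency(estimate=100):  # passes: 120 >= 100
    actual_tokens = 150  # stand-in for response.usage.total_tokens
    balance.debit(actual_tokens)
print(balance.amount)  # -30
```

The estimate is a trade-off. A higher estimate rejects more borderline requests, and a lower one lets more requests through whose actual cost exceeds what the balance can cover.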

Both Limits and Balances

You can combine both, using limits for usage caps and balances for prepaid credits:
def chat_hybrid(customer_id: str, balance_name: str, message: str) -> str:
    # 1. Check limits
    limit_check = limitry.limits.check(customer_id=customer_id)
    if not limit_check.allowed:
        raise Exception("Limit exceeded")

    # 2. Check balance
    balance_check = limitry.balances.check_sufficiency(
        customer_id=customer_id,
        name=balance_name,
        amount=100  # Estimated tokens
    )
    if not balance_check.sufficient:
        raise Exception("Insufficient credits")

    # 3. Make request
    response = openai.chat.completions.create(...)
    actual_tokens = response.usage.total_tokens

    # 4. Debit balance
    limitry.balances.debit(
        customer_id=customer_id,
        name=balance_name,
        amount=actual_tokens,
        description=f"GPT-4: {actual_tokens} tokens"
    )

    # 5. Record event (increments meters for limit checks)
    limitry.events.record(...)

    return response.choices[0].message.content
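chat_hybrid runs its pre-flight checks in sequence and stops at the first failure, so a customer who is over a limit never triggers a balance check. Below is a hypothetical generalization that takes the checks as an ordered list. first_failed_check is not part of Limitry, and the lambdas stand in for the .allowed and .sufficient results of the two Limitry calls.

```python
from typing import Callable, Optional

# Each entry pairs a name with a zero-argument predicate that returns True when it passes
Check = tuple[str, Callable[[], bool]]


def first_failed_check(checks: list[Check]) -> Optional[str]:
    """Hypothetical helper: run checks in order, return the first failing name or None."""
    for name, passes in checks:
        if not passes():
            return name  # short-circuit: later checks are not run
    return None


# In chat_hybrid these would wrap limit_check.allowed and balance_check.sufficient
result = first_failed_check([
    ("limits", lambda: False),
    ("balance", lambda: True),
])
print(result)  # limits
```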