Developer Guide

Claude Code tool use monitoring: practical logging and guardrails for agent runs

Published for engineers running Claude Code in local or CI workflows

Claude Code tool use monitoring matters once an agent can run shell commands, read files, or invoke other tools on your machine. If you need an audit trail, or you want to stop certain actions before they happen, you need more than transcripts. You need structured logs and an enforcement layer around each Claude Code process.

That is the problem agentcheck is built to solve. It wraps Claude Code with hook-based guardrails, writes tool activity to JSONL, and applies YAML rule packs that constrain what an agent can do. This is useful when you want repeatable visibility into tool calls, especially in team environments, CI jobs, or any workflow where "the agent probably did X" is not good enough.

What you actually need to monitor

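For most developer workflows, good monitoring is not about recording everything forever. It is about answering a few concrete questions quickly: which tools the agent called, what arguments it passed, and whether each call was allowed or blocked.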

Claude Code tool use monitoring is most useful when logs are machine-readable. JSONL works well because each event is a standalone line, which means you can inspect it with jq, stream it, or ship it into a log pipeline without special parsing.
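As a minimal sketch of why line-delimited events are easy to consume, the Python below reads a JSONL stream one line at a time and skips blank or truncated lines instead of failing. The field names (ts, tool, decision) are taken from the sample events in this guide; whether agentcheck emits exactly these fields is an assumption, and iter_events is a hypothetical helper, not part of agentcheck.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSONL line, skipping blank and broken lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A partially written last line is common when tailing a live log.
            continue
        if isinstance(event, dict):
            yield event


# Sample input: one complete event, one blank line, one truncated event.
sample = [
    '{"ts":"2026-04-23T12:00:01Z","tool":"shell","decision":"allow"}',
    "",
    '{"ts":"2026-04-23T12:00:03Z","tool":"read_fi',
]
events = list(iter_events(sample))
print(len(events), events[0]["tool"])
```

Because every line parses independently, a damaged line costs you one event rather than the whole file, which is the main practical advantage over a single large JSON document.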

Install and run agentcheck

The install path is intentionally simple:

npm install -g github:paprika-org/agentcheck

Once installed, run Claude Code through the wrapper instead of launching it directly. The exact invocation depends on how you use Claude Code locally, but the pattern is the same: start the agent process under agentcheck so hooks can observe tool calls and apply policies.

# Example pattern: run Claude Code through agentcheck
agentcheck claude

# Or pass through normal arguments
agentcheck claude --prompt "inspect the repo and summarize failing tests"

If your environment writes logs to a dedicated directory, keep that directory outside the repo and make rotation explicit. Agent traces can include file paths, command arguments, and snippets of content, so treat them as operational data, not harmless debug output.
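One way to make both points concrete is a small pre-run check, sketched here in Python. It confirms the log directory is not inside the repository and rotates the log file by size. The helper names, the size threshold, and the numbered-suffix scheme are all assumptions for illustration; agentcheck may offer its own log location and rotation settings.

```python
import tempfile
from pathlib import Path


def is_outside(log_dir: Path, repo_root: Path) -> bool:
    """True when log_dir is neither the repo root nor anywhere beneath it."""
    log_dir, repo_root = log_dir.resolve(), repo_root.resolve()
    return log_dir != repo_root and repo_root not in log_dir.parents


def rotate_if_large(log_file: Path, max_bytes: int, keep: int = 3) -> bool:
    """Shift log -> log.1 -> log.2 ... once log_file reaches max_bytes."""
    if not log_file.exists() or log_file.stat().st_size < max_bytes:
        return False
    oldest = log_file.with_name(f"{log_file.name}.{keep}")
    if oldest.exists():
        oldest.unlink()  # drop the oldest generation
    for i in range(keep - 1, 0, -1):
        src = log_file.with_name(f"{log_file.name}.{i}")
        if src.exists():
            src.replace(log_file.with_name(f"{log_file.name}.{i + 1}"))
    log_file.replace(log_file.with_name(f"{log_file.name}.1"))
    return True


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    logs = Path(tmp) / "agent-logs"
    repo.mkdir()
    logs.mkdir()
    log_file = logs / "agentcheck.log.jsonl"
    log_file.write_text('{"tool":"shell"}\n' * 10)
    outside = is_outside(logs, repo)
    inside = is_outside(repo / "logs", repo)
    rotated = rotate_if_large(log_file, max_bytes=64)
    rotated_exists = (logs / "agentcheck.log.jsonl.1").exists()
print(outside, inside, rotated, rotated_exists)
```

In practice a system tool such as logrotate may be a better fit; the point is only that rotation should be a deliberate, visible step rather than something left to chance.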

Example: inspect tool activity with JSONL

A typical JSONL event stream for tool monitoring should be boring and parseable. That is a good thing. Here is the kind of output you want to be able to query:

{"ts":"2026-04-23T12:00:01Z","tool":"shell","args":{"cmd":"ls -la"},"decision":"allow"}
{"ts":"2026-04-23T12:00:03Z","tool":"read_file","args":{"path":"src/index.ts"},"decision":"allow"}
{"ts":"2026-04-23T12:00:05Z","tool":"shell","args":{"cmd":"rm -rf /tmp/build"},"decision":"deny","rule":"block-destructive-shell"}

Once you have that, normal Unix tooling becomes enough for quick analysis:

# Count tool usage by tool name
jq -r '.tool' agentcheck.log.jsonl | sort | uniq -c | sort -nr

# Show denied actions
jq 'select(.decision == "deny")' agentcheck.log.jsonl

# Review shell commands only
jq 'select(.tool == "shell") | {ts, cmd: .args.cmd, decision, rule}' agentcheck.log.jsonl

This is where the monitoring becomes practical. You are no longer reading a chat transcript trying to infer whether the agent used grep, touched a secret file, or retried a blocked command five times.
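The retry case is awkward to express in a jq one-liner, so here is a hedged Python sketch that counts how often the same denied command appears. The log lines are illustrative sample data in the format shown above, and denied_retries is a hypothetical helper name.

```python
import json
from collections import Counter

# Illustrative sample data, not real agentcheck output.
LOG_LINES = """\
{"ts":"2026-04-23T12:00:05Z","tool":"shell","args":{"cmd":"rm -rf /tmp/build"},"decision":"deny","rule":"block-destructive-shell"}
{"ts":"2026-04-23T12:00:06Z","tool":"shell","args":{"cmd":"rm -rf /tmp/build"},"decision":"deny","rule":"block-destructive-shell"}
{"ts":"2026-04-23T12:00:07Z","tool":"shell","args":{"cmd":"ls -la"},"decision":"allow"}
""".splitlines()


def denied_retries(lines, threshold=2):
    """Return {(tool, command): count} for denied calls seen at least `threshold` times."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("decision") == "deny":
            key = (event.get("tool"), event.get("args", {}).get("cmd"))
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


retries = denied_retries(LOG_LINES)
print(retries)
```

A repeated denial is usually a better alert signal than a single one, since it suggests the agent is working against a rule rather than stumbling into it once.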

Example: constrain behavior with YAML rule packs

Logging alone is passive. If you already know what must not happen, add rules and fail fast. A rule pack can block high-risk commands, prevent writes outside a working directory, or restrict tools entirely for certain jobs.

rules:
  - id: block-destructive-shell
    when:
      tool: shell
      args:
        cmd_regex: '(^|\s)(rm\s+-rf(\s|$)|mkfs|dd\s+if=)'
    action: deny

  - id: block-home-directory-writes
    when:
      tool: write_file
      args:
        path_regex: '^/home/'
    action: deny

  - id: allow-readonly-repo-analysis
    when:
      tool: shell
      args:
        cmd_regex: '^(ls|find|rg|cat|git\s+status|git\s+diff)(\s|$)'
    action: allow

That setup will not solve every policy problem, but it covers the common ones: destructive shell calls, writes to the wrong place, and enforcing a read-only analysis mode. For CI, this is often enough to turn an agent from "interesting but risky" into something you can actually run in automation.
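To reason about how such a rule pack behaves, it can help to model the evaluation in a few lines of Python. The sketch below is not agentcheck's implementation. It assumes rules are checked top to bottom, that the first match wins, and that unmatched calls are denied (a choice that suits a read-only analysis job). The rules are restated as plain dicts that follow the intent of the YAML above, so no YAML library is needed.

```python
import re

# Hypothetical in-memory form of the rule pack; field names are illustrative.
RULES = [
    {"id": "block-destructive-shell", "tool": "shell", "field": "cmd",
     "regex": r"(^|\s)(rm\s+-rf(\s|$)|mkfs|dd\s+if=)", "action": "deny"},
    {"id": "block-home-directory-writes", "tool": "write_file", "field": "path",
     "regex": r"^/home/", "action": "deny"},
    {"id": "allow-readonly-repo-analysis", "tool": "shell", "field": "cmd",
     "regex": r"^(ls|find|rg|cat|git\s+status|git\s+diff)(\s|$)", "action": "allow"},
]


def decide(event, rules=RULES, default="deny"):
    """ASSUMPTION: first matching rule wins; unmatched calls get `default`."""
    for rule in rules:
        if event["tool"] != rule["tool"]:
            continue
        value = event.get("args", {}).get(rule["field"], "")
        if re.search(rule["regex"], value):
            return rule["action"], rule["id"]
    return default, None


print(decide({"tool": "shell", "args": {"cmd": "git status"}}))
print(decide({"tool": "shell", "args": {"cmd": "cat notes.txt; rm -rf /"}}))
print(decide({"tool": "shell", "args": {"cmd": "curl example.com"}}))
```

Under these assumptions, ordering matters: the chained command starts with cat, which the allow rule would accept, but the deny rule is listed first and catches the rm -rf. If the real evaluation order differs, the same rule pack could produce a different decision, so it is worth confirming against the project documentation.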

What this solves well, and what it does not

There are real limits here, and they matter.

The honest framing is this: agentcheck gives you structured visibility and enforceable constraints around Claude Code, but it is not a full sandbox. If you need isolation from the host system, use OS-level controls as well. Behavioral governance and system isolation solve different parts of the problem.

Recommended baseline for teams

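If you are introducing Claude Code tool use monitoring for a team, a reasonable baseline looks like this: run every agent session through the wrapper, keep the JSONL logs outside the repo with explicit rotation, and start with a small rule pack that blocks destructive shell commands and out-of-bounds writes.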

That gives you something operationally useful without overengineering the first pass.

Practical takeaway: if your current answer to "what did the agent actually do?" is "read the transcript," you do not have reliable tool monitoring yet.

Call to action

If you want a straightforward starting point for Claude Code tool use monitoring, install agentcheck, run Claude Code through it, and inspect the JSONL output before writing any complex rules. Then add a small YAML rule pack that blocks destructive shell commands and out-of-bounds writes. That sequence gives you immediate visibility and a sane first layer of control.

Project repo: github.com/paprika-org/agentcheck

Install with npm install -g github:paprika-org/agentcheck and use the repository examples as the starting point for your own rule packs.

Frequently Asked Questions

What is Claude Code tool use monitoring?
Claude Code tool use monitoring is the practice of tracking which tools an AI coding agent calls, what arguments it passes, and what actions it takes during a session. For teams using Claude Code, this helps surface risky commands, permission gaps, and unexpected agent behavior. The goal is usually better security, debugging, and auditability.
How do I fix Claude Code tool use monitoring if it is not catching actions?
If monitoring is missing events, start by checking whether your monitor is attached to the actual tool execution layer rather than only reading static config or prompts. Many gaps appear when sub-agents, dynamic arguments, or wrapper scripts bypass the expected logging path. A reliable fix is to capture runtime tool calls with timestamps, tool names, arguments, and outcomes so you can see what really happened.
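A quick way to check whether captured events are complete is to validate each line against the fields you expect. The Python sketch below assumes the minimum set is ts, tool, args, and decision, matching the timestamp, tool name, arguments, and outcome mentioned above; the exact field names in your logs may differ, and missing_fields is a hypothetical helper.

```python
import json

# Assumed minimum fields, based on the sample events in this guide.
REQUIRED = ("ts", "tool", "args", "decision")


def missing_fields(line):
    """Return the required keys that are absent from one JSONL event."""
    event = json.loads(line)
    return [key for key in REQUIRED if key not in event]


lines = [
    '{"ts":"2026-04-23T12:00:01Z","tool":"shell","args":{"cmd":"ls -la"},"decision":"allow"}',
    '{"ts":"2026-04-23T12:00:03Z","tool":"read_file"}',
]
# Map 1-based line numbers to the fields each incomplete event lacks.
gaps = {}
for number, line in enumerate(lines, start=1):
    absent = missing_fields(line)
    if absent:
        gaps[number] = absent
print(gaps)
```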
Why does Claude Code use tools I did not expect?
This usually happens because AI agents choose tools dynamically based on the task, not from a fixed script. Claude Code may select a shell command, file read, or web action that was logically available even if you did not anticipate that exact path. That is why Claude Code tool use monitoring matters: it shows real runtime behavior instead of relying only on assumed permissions.
What are the alternatives to Claude Code tool use monitoring?
Alternatives include static allowlists, deny rules, sandboxing, wrapper scripts, and full session logging platforms. These can help, but they often miss context about why a tool was called or whether the call succeeded. Compared with static controls alone, runtime monitoring gives more actionable visibility into agent tool usage and misuse.
What is a practical tip for Claude Code tool use monitoring?
A practical tip is to review a few real agent runs and build alert rules from actual tool patterns instead of hypothetical ones. For example, flag repeated shell retries, file writes outside expected directories, or web calls during sensitive workflows. This makes Claude Code tool use monitoring more useful for both debugging and security without creating noisy logs.
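As one example of such an alert rule, the Python sketch below flags write_file events whose path falls outside an allowed root. It assumes events use the tool and args.path fields from the sample log, that relative paths are relative to the allowed root, and that paths are POSIX-style; write_outside and /work/repo are hypothetical names chosen for illustration.

```python
import posixpath


def write_outside(event, allowed_root):
    """True for a write_file event whose normalized path is not under allowed_root."""
    if event.get("tool") != "write_file":
        return False
    root = posixpath.normpath(allowed_root)
    raw = event.get("args", {}).get("path", "")
    # ASSUMPTION: relative paths are resolved against the allowed root.
    # normpath collapses ".." so traversal out of the root is caught.
    path = posixpath.normpath(posixpath.join(root, raw))
    return not (path == root or path.startswith(root + "/"))


events = [
    {"tool": "write_file", "args": {"path": "/work/repo/src/a.ts"}},
    {"tool": "write_file", "args": {"path": "/work/repo/../secrets.env"}},
    {"tool": "write_file", "args": {"path": "src/index.ts"}},
    {"tool": "read_file", "args": {"path": "/etc/hosts"}},
]
flags = [write_outside(e, "/work/repo") for e in events]
print(flags)
```

This check is purely lexical and does not follow symlinks, so treat it as an alerting aid rather than an enforcement boundary.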