# Part 1: I RAG'd every conversation I ever had with AI

I extracted 727MB of conversations from Cursor, Claude, and Codex. Then I ran a privacy-preserving analysis on 809/100,000 conversations spanning 4 months. What I found changed how I think about working with AI.

## The Setup

I built a pipeline that:

  1. Extracted conversations from local AI tools (Cursor, Claude, and Codex)
  2. Analyzed behavioral signals without storing raw text
  3. Removed all environment variables and sensitive information
  4. Generated composite indices from keyword patterns

Total: 19,539 user messages, 11,726 assistant messages, 16,484 tool uses.
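
The extraction code itself is not part of this post, but as a rough sketch: each tool's local store gets exported, secrets are scrubbed, and only counts and keyword signals are kept. Something like the following, where the export layout and field names are assumptions rather than the actual pipeline:

```python
import json
import re
from collections import Counter
from pathlib import Path

# Assumed export layout: one JSON file per conversation, each a list of
# {"role": "user" | "assistant" | "tool", "text": "..."} messages.
EXPORT_DIR = Path("exports")

# Scrub obvious secrets before any analysis touches the text.
SECRET_PATTERNS = [
    re.compile(r"\b[A-Z][A-Z0-9_]+=\S+"),      # KEY=value environment variables
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),    # API-key-like tokens
    re.compile(r"\b0x[a-fA-F0-9]{40,64}\b"),   # hex addresses / key material
]

def scrub(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

role_counts: Counter = Counter()
for path in EXPORT_DIR.glob("*.json"):
    for message in json.loads(path.read_text()):
        role_counts[message["role"]] += 1
        message["text"] = scrub(message.get("text", ""))  # analyzed, never stored raw

print(role_counts)  # user / assistant / tool message totals
```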

## The Strengths, According to the Data

### 1. Bias toward actionability

My spec completeness score averaged ~25% across all my conversations. I try to frame problems with clear next steps rather than abstract discussions.

For example, my most common conversations are debugging-related loops:

"Fix the TypeScript error in file X. Error: Property 'toString' does not exist on type 'never'."

Not "I have a problem." Just the problem, the file, the error, the ask.

### 2. Debug loops over single-shot asks

40.3% of my conversations included error sharing. I iterate: I share the error, get a fix, encounter the next error, share that, repeat. This is incredibly wasteful, given that I am in the loop when I shouldn't be. I jump straight to the errors instead of having the model lint, typecheck, and fix the errors autonomously.

### 3. Multi-system orchestration

Top languages by conversation:

  • TypeScript: 335
  • Bash: 175
  • Go: 156
  • Rust: 133
  • SQL: 95
  • Solidity: 55

I move between frontend (React, Next.js), backend (Go, TypeScript), infrastructure (Docker, AWS), and automation (n8n, Slack integrations).

### 4. Testing awareness

43.6% of conversations mentioned testing. I do not think of myself as test-disciplined, but I mention tests frequently even if I am bad at writing them. I wish I were more disciplined about testing: it saves a lot of time, specifically because it catches bugs early and reduces the need for manual in-the-loop debugging.

## The Improvement Opportunities

The report was honest. Here are the gaps.

### 1. Minimal reproduction

User conversations with reproduction language: 0.4%. That means 99.6% of the time, I ask for help without providing a minimal reproducible example. I just dump the error and expect the AI to figure it out, and that is not always enough. I need to improve my minimal reproduction skills.

### 2. Test discipline

In repositories with tests, I did much less debugging and much more feature development. In repositories without tests, it was the opposite: much more debugging, much less feature development. This is obvious, but it is not always easy to remember. I need to improve my test discipline.

### 3. Security hygiene

Risky disclosure signals: 32.5%

32.5% of my conversations contained signals that could indicate sensitive data exposure: API keys, environment variables, addresses. The heuristics do not store tokens, but they flag patterns. That makes life easier, but if I had spent more time configuring my tools and environment, I would not have to rely on the AI to flag them. I need to improve my security hygiene.
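
The report does not publish the exact heuristics, but the flag-without-storing idea looks roughly like this sketch; the pattern names and regexes are illustrative assumptions:

```python
import re

# Illustrative patterns; the real heuristic set is not published.
# Only the fact of a match is recorded, never the matched text.
RISKY_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
    "hex_secret": re.compile(r"\b0x[a-fA-F0-9]{40,64}\b"),
    "env_assignment": re.compile(r"\b[A-Z][A-Z0-9_]{2,}=\S+"),
}

def risky_signals(text: str) -> dict:
    """Which categories matched, without keeping what matched."""
    return {name: bool(p.search(text)) for name, p in RISKY_PATTERNS.items()}

def conversation_is_risky(messages: list) -> bool:
    """A conversation counts toward the 32.5% if any message trips a flag."""
    return any(any(risky_signals(m).values()) for m in messages)
```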

### 4. Acceptance criteria

When delegating multi-file changes, I rarely specify acceptance criteria upfront. This shows up in the spec completeness index sitting barely above 25%. The data caught what I knew was true: I get vague, then iterate, instead of being clear, then shipping. This habit is not sustainable and has made me actively less productive. I need to improve my acceptance criteria.

The weekly breakdown told a story.

| Week | n | Spec | Debug | Test | Security | Risky % | Tool Uses |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-09-22 | 19 | 0.487 | 0.049 | 0.357 | 0.113 | 42.1% | 0 |
| 2025-10-13 | 156 | 0.323 | 0.073 | 0.078 | 0.036 | 20.5% | 0 |
| 2025-12-22 | 117 | 0.237 | 0.073 | 0.110 | 0.093 | 41.9% | 6,261 |
| 2026-01-05 | 70 | 0.213 | 0.050 | 0.098 | 0.118 | 38.6% | 3,044 |
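
For reference, a minimal sketch of how a weekly rollup like this could be produced, assuming each conversation record already carries a date and per-conversation scores; the field names here are hypothetical:

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def week_start(d: date) -> date:
    """Monday of the ISO week containing d, used as the row label."""
    iso = d.isocalendar()
    return date.fromisocalendar(iso[0], iso[1], 1)

def weekly_rollup(conversations: list) -> dict:
    buckets = defaultdict(list)
    for convo in conversations:
        buckets[week_start(convo["date"])].append(convo)
    return {
        week: {
            "n": len(convos),
            "spec": round(mean(c["spec"] for c in convos), 3),
            "risky_pct": round(100 * mean(c["risky"] for c in convos), 1),
            "tool_uses": sum(c["tool_uses"] for c in convos),
        }
        for week, convos in sorted(buckets.items())
    }
```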

### Spec completeness degrades with volume

As conversations increased, spec completeness dropped. More volume, less care. This is the classic trade-off: speed over quality. It is easy to surrender to the temptation of shipping quickly, but it is important to prioritize quality over speed.

Specification, planning, and tests all improve my mental map of the system. By taking the time to write clear acceptance criteria, I can ensure that my work meets the necessary standards and reduce the risk of errors or security vulnerabilities.

### Security awareness is cyclical

Security mentions spiked in weeks 50 (0.135), 47 (0.133), and 46 (0.118). These spikes correlate with integration work: adding new tools, connecting new systems.

## The Composite Indices

The report included heuristic proxies (0-1 scale):

| Index | User (Me) | Assistant |
| --- | --- | --- |
| Spec completeness | 0.275 | 0.277 |
| Debug maturity | 0.056 | 0.040 |
| Testing discipline | 0.096 | 0.173 |
| Security awareness | 0.058 | 0.064 |
| Architecture thinking | 0.168 | 0.129 |
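
The keyword lists behind these proxies are not published in the report; as a sketch, a 0-1 index like testing discipline could be as simple as the fraction of messages that hit a keyword set (the keywords below are my own illustrative choices):

```python
import re

# Illustrative keyword set, not the report's actual list.
TESTING_KEYWORDS = re.compile(
    r"\b(test|tests|testing|pytest|jest|coverage|assert|regression)\b",
    re.IGNORECASE,
)

def testing_discipline(messages: list) -> float:
    """Fraction of messages mentioning testing, as a 0-1 heuristic proxy."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if TESTING_KEYWORDS.search(m))
    return hits / len(messages)
```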

The assistant beats me on testing (0.173 vs 0.096). It beats me on security (0.064 vs 0.058). But I beat it on architecture (0.168 vs 0.129).

This is telling. I think about structure more than execution. The AI executes better than I do.

## What This Taught Me

### 1. I am an action-oriented, high-volume developer

100,000+ conversations in 8 months. I use AI for practically everything: code generation, debugging, testing, security, architecture. I also overuse it, reaching for it to run commands, set up Docker, even to commit and push. This is telling: I need to be more intentional about decreasing my token usage and optimizing my workflows.

### 2. I iterate more than I plan

40% error sharing but only 0.4% repro language. I debug in public. This is efficient for me but exhausting for collaborators.

### 3. I delegate testing, rarely doing it myself

Talking about tests (43.6%) is not writing tests. The assistant has higher testing discipline than I do.

### 4. I leak too much sensitive data

32.5% of conversations contained risky disclosure signals. This is a concrete, measurable hygiene problem.

### 5. The assistant complements my weaknesses

The AI has higher testing discipline and security awareness. It catches what I miss. This is the right mental model. AI as amplifier, not replacement.

## The Action Plan

Based on the data, here is what I am changing.

### 1. Add repro template

```
## Error
[exact error message]

## Expected
[what should happen]

## Actual
[what actually happens]

## Minimal repro
[shortest code that demonstrates the issue]
```

### 2. Test discipline checklist

  • Write test before fix
  • Run tests after fix
  • Add regression check
  • Document test coverage

### 3. Security scan before commit

```bash
# Pre-commit hook: flag API-key-like tokens and 64-char hex strings before they land in a commit
grep -rE 'sk-|pk-|0x[a-fA-F0-9]{64}' --exclude-dir=node_modules .
```

### 4. Acceptance criteria for delegation

```
## Deliverables
- [ ] File A modified
- [ ] File B created
- [ ] Tests pass

## Acceptance
- [ ] Compiles without errors
- [ ] Handles edge case X
- [ ] Matches style of existing code
```

## The Bigger Picture

The question is not whether I use AI. I clearly do, aggressively.

The question is: am I using it to grow, or to avoid growth?
The data suggests both. I ship faster (actionability is high). I think less (spec completeness is low). I delegate testing (assistant beats me).

This is a trade-off. Every speedup has a cost. Every delegation has a gap.

## What I Would Tell Someone Else

If you are analyzing your own AI usage:

1. Extract your data. It is easier than it sounds, and worth it
2. Run privacy-preserving analysis. Do not store raw conversations
3. Look for patterns, not scores. The indices are heuristics. Trends are truth
4. Find gaps. Where are you delegating what you should own?
5. Set concrete changes. Vague improvement goals produce vague results

## The Verdict

The workstyle report was humbling. It confirmed suspicions I had and revealed blind spots I did not.

I am not a great debugger (low debug maturity). I do not write tests (low testing discipline). I leak sensitive data (high risky disclosure).

But I am action-oriented (high constraint framing), multi-system (broad language distribution), and iterative (high error sharing).

The profile is not good or bad. It is just true.

And now that it is true, I can work with it.

---

Continue to [Part 2: Training My Own Coding Model](https://www.sybilsolutions.ai/blog/02-training-my-coding-model-the-pipeline), where I take these insights and my actual conversations and train sero-nouscoder.
