Data Security and AI Coding Tools
Data security concerns shouldn’t keep you from using the best AI coding tools out there.
Here are a few nudges to get your thinking gears moving:
🤔 Are you confusing code classification with data classification? Your data might be strictly confidential, but the code itself could be minimally classified.
(Example: You might use nginx to power a defense system. Nginx is open source. But the data it serves could be Secret.)
🤔 Could your system architecture offload sensitive business-rule parameters somewhere other than code? Could they live in a database, so the code itself stays minimally classified?
(Example: Eligibility criteria)
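A minimal sketch of that idea (the table, column, and field names here are made up): the checked-in code only knows that eligibility is “a set of named numeric thresholds,” while the sensitive values live in a database.

```python
import sqlite3

# The code is generic and minimally classified: it never hardcodes the
# sensitive thresholds. Those live in a database, not in source control.
def load_criteria(conn: sqlite3.Connection) -> dict[str, float]:
    rows = conn.execute("SELECT name, threshold FROM eligibility_criteria")
    return {name: threshold for name, threshold in rows}

def is_eligible(applicant: dict[str, float], criteria: dict[str, float]) -> bool:
    return all(applicant.get(name, 0.0) >= threshold
               for name, threshold in criteria.items())

# Demo with an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eligibility_criteria (name TEXT, threshold REAL)")
conn.executemany("INSERT INTO eligibility_criteria VALUES (?, ?)",
                 [("min_age", 18), ("min_score", 700)])

criteria = load_criteria(conn)
print(is_eligible({"min_age": 25, "min_score": 720}, criteria))  # True
print(is_eligible({"min_age": 25, "min_score": 650}, criteria))  # False
```

An LLM can freely refactor this file, because nothing in it reveals what the real thresholds are.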
🤔 Could you make a new module that is invisible to LLMs? Sensitive logic resides there but you can build around it!
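One way to sketch this pattern (the module path `sensitive.policy` and the names here are hypothetical): the LLM-visible tree contains only an interface plus a harmless stub, and the real implementation is imported from a module you keep out of the tool’s context via its ignore/exclude settings.

```python
from typing import Protocol

# Public, LLM-visible contract: the AI tool only ever sees this interface.
class TargetingPolicy(Protocol):
    def priority(self, track: dict) -> float: ...

# Harmless stub checked into the visible tree; keeps tests and type checks green.
class StubPolicy:
    def priority(self, track: dict) -> float:
        return 0.0  # placeholder: the real ranking logic lives elsewhere

def load_policy() -> TargetingPolicy:
    try:
        # Hypothetical module excluded from the AI tool's context.
        from sensitive.policy import RealPolicy
        return RealPolicy()
    except ImportError:
        return StubPolicy()

policy = load_policy()
print(policy.priority({"speed": 300}))  # 0.0 when only the stub is present
```

Everything around the interface can be built with an LLM; only the one excluded module needs tighter handling.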
🤔 How about a tiered LLM approach?
- Your complex application gets built with the state-of-the-art models from commercial vendors (Claude Code/Codex).
- The sensitive business rules get written as a DSL. Build the DSL itself with the big models, then train a small model on your DSL. Don’t go overboard with the language design: think something with the feel of Ruby or Kotlin.
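To make the DSL idea concrete, here is a toy sketch (the rule syntax and field names are invented): one “FIELD OP NUMBER” comparison per line, simple enough that a small, locally hosted model could plausibly learn to emit it.

```python
import re

# Toy rule DSL: one comparison per line, e.g.
#   age >= 18
#   score >= 700
RULE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|==)\s*([\d.]+)\s*$")
OPS = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
       ">": lambda a, b: a > b, "<": lambda a, b: a < b,
       "==": lambda a, b: a == b}

def evaluate(program: str, record: dict[str, float]) -> bool:
    """A record passes only if it satisfies every rule line."""
    for line in filter(None, (ln.strip() for ln in program.splitlines())):
        field, op, number = RULE.match(line).groups()
        if not OPS[op](record.get(field, 0.0), float(number)):
            return False
    return True

rules = """
age >= 18
score >= 700
"""
print(evaluate(rules, {"age": 30, "score": 720}))  # True
print(evaluate(rules, {"age": 30, "score": 650}))  # False
```

The interpreter itself contains nothing sensitive, so the big models can build and maintain it; only the rule files carry the confidential thresholds.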
A few months ago I had an entire file classified as Secret because of 2 out of 10,000 rows.
✨What can we remove / mask from the codebase to lower its confidentiality/sensitivity? What can we compartmentalize? ✨
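One simple masking move (the variable name `SITE_FACILITY_ID` here is made up): pull the sensitive identifier out of source entirely and inject it at runtime, so the checked-in file only carries a neutral placeholder.

```python
import os

# The sensitive identifier never appears in source control; it is injected
# at runtime. The code that an LLM sees only contains a placeholder.
SITE_FACILITY_ID = os.environ.get("SITE_FACILITY_ID", "REDACTED")

def build_report_header() -> str:
    return f"Status report for site {SITE_FACILITY_ID}"

print(build_report_header())
```

The same trick works for hostnames, unit designators, or any other value that would otherwise raise the classification of the whole file.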
What do you do to abstract away sensitive information from LLMs? Did this mental model help?