Data Security and AI Coding Tools
Data security concerns shouldn’t keep you from using the best AI coding tools out there.
Here are a few nudges to get your thinking gears moving:
🤔 Are you confusing code classification with data classification? Your data might be strictly confidential, but the code itself could be minimally classified.
(Example: You might use nginx to power a defense system. Nginx is open source. But the data it serves could be Secret.)
🤔 Could your system architecture offload sensitive business-rule parameters somewhere other than code? Could they live in a database, so the code itself stays minimally classified?
(Example: Eligibility criteria)
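A minimal sketch of that idea (the table, column, and field names here are made up): the checked-in code only knows that eligibility is “a set of named numeric thresholds,” while the sensitive values live in a database.

```python
import sqlite3

# The code is generic and minimally classified: it never hardcodes the
# sensitive thresholds. Those live in a database, not in source control.
def load_criteria(conn: sqlite3.Connection) -> dict[str, float]:
    rows = conn.execute("SELECT name, threshold FROM eligibility_criteria")
    return {name: threshold for name, threshold in rows}

def is_eligible(applicant: dict[str, float], criteria: dict[str, float]) -> bool:
    return all(applicant.get(name, 0.0) >= threshold
               for name, threshold in criteria.items())

# Demo with an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eligibility_criteria (name TEXT, threshold REAL)")
conn.executemany("INSERT INTO eligibility_criteria VALUES (?, ?)",
                 [("min_age", 18), ("min_score", 700)])

criteria = load_criteria(conn)
print(is_eligible({"min_age": 25, "min_score": 720}, criteria))  # True
print(is_eligible({"min_age": 25, "min_score": 650}, criteria))  # False
```

An LLM can freely refactor this file, because nothing in it reveals what the real thresholds are.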
🤔 Could you make a new module that is invisible to LLMs? Sensitive logic resides there but you can build around it!
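One way to sketch this pattern (the module path `sensitive.policy` and the names here are hypothetical): the LLM-visible tree contains only an interface plus a harmless stub, and the real implementation is imported from a module you keep out of the tool’s context via its ignore/exclude settings.

```python
from typing import Protocol

# Public, LLM-visible contract: the AI tool only ever sees this interface.
class TargetingPolicy(Protocol):
    def priority(self, track: dict) -> float: ...

# Harmless stub checked into the visible tree; keeps tests and type checks green.
class StubPolicy:
    def priority(self, track: dict) -> float:
        return 0.0  # placeholder: the real ranking logic lives elsewhere

def load_policy() -> TargetingPolicy:
    try:
        # Hypothetical module excluded from the AI tool's context.
        from sensitive.policy import RealPolicy
        return RealPolicy()
    except ImportError:
        return StubPolicy()

policy = load_policy()
print(policy.priority({"speed": 300}))  # 0.0 when only the stub is present
```

Everything around the interface can be built with an LLM; only the one excluded module needs tighter handling.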
🤔 How about a tiered LLM approach?
- Your complex application gets built with the state-of-the-art models from commercial vendors (Claude Code/Codex).
- The sensitive business rules get written as a DSL. Build the DSL itself with the big models, then train a small model on your DSL. Don’t go overboard with the language design: think something with the feel of Ruby or Kotlin.
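To make the DSL idea concrete, here is a toy sketch (the rule syntax and field names are invented): one “FIELD OP NUMBER” comparison per line, simple enough that a small, locally hosted model could plausibly learn to emit it.

```python
import re

# Toy rule DSL: one comparison per line, e.g.
#   age >= 18
#   score >= 700
RULE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|==)\s*([\d.]+)\s*$")
OPS = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
       ">": lambda a, b: a > b, "<": lambda a, b: a < b,
       "==": lambda a, b: a == b}

def evaluate(program: str, record: dict[str, float]) -> bool:
    """A record passes only if it satisfies every rule line."""
    for line in filter(None, (ln.strip() for ln in program.splitlines())):
        field, op, number = RULE.match(line).groups()
        if not OPS[op](record.get(field, 0.0), float(number)):
            return False
    return True

rules = """
age >= 18
score >= 700
"""
print(evaluate(rules, {"age": 30, "score": 720}))  # True
print(evaluate(rules, {"age": 30, "score": 650}))  # False
```

The interpreter itself contains nothing sensitive, so the big models can build and maintain it; only the rule files carry the confidential thresholds.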
A few months ago I had an entire file classified as Secret because of 2 out of 10,000 rows.
✨What can we remove / mask from the codebase to lower its confidentiality/sensitivity? What can we compartmentalize? ✨
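One simple masking move (the variable name `SITE_FACILITY_ID` here is made up): pull the sensitive identifier out of source entirely and inject it at runtime, so the checked-in file only carries a neutral placeholder.

```python
import os

# The sensitive identifier never appears in source control; it is injected
# at runtime. The code that an LLM sees only contains a placeholder.
SITE_FACILITY_ID = os.environ.get("SITE_FACILITY_ID", "REDACTED")

def build_report_header() -> str:
    return f"Status report for site {SITE_FACILITY_ID}"

print(build_report_header())
```

The same trick works for hostnames, unit designators, or any other value that would otherwise raise the classification of the whole file.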
What do you do to abstract away sensitive information from LLMs? Did this mental model help?