
We’re told constantly that data is the new oil: an asset that gets more precious the longer you hold it. That’s a dangerously misleading metaphor. Some data (your chart of accounts, product catalog, customer master) does gain value with careful stewardship. But most enterprise data isn’t wine. It’s milk. It has a shelf life, and keeping it indefinitely will sour system performance, bloat costs, add security exposure, and pollute analytics.
This conclusion comes from years of analyzing ERP, CRM, and financial platforms across industries. The pattern is consistent: organizations become digital hoarders. Terabytes of old transactional records linger in primary systems “just in case,” turning databases into overstuffed garages. The ROI rarely materializes. The hidden costs almost always do.
The Hidden Costs of Hoarding
- Degraded performance: Reporting on last quarter shouldn’t require wading through 15 years of orders. Unbounded tables bloat indexes and slow queries, batch jobs, and backups. Users feel it as latency everywhere.
- Compliance and security risk: Regulations such as GDPR and US state privacy laws build in purpose limitation and limits on how long personal data is kept. Old PII and HR data widen the blast radius of a breach and can raise penalties. Stale data carries risk with no offsetting value.
- Polluted analytics: Seven-year-old transactions from a discontinued product line don’t improve forecasts; they add noise and bias, especially when business context (pricing, channels, policy) has changed.
- Operational drag: Backups, DR replication, reindexing, and upgrades all take longer. RTO/RPO targets get harder to meet. eDiscovery scope balloons.
- Cloud costs that creep: “Storage is cheap” until you count snapshots, replication, query scans, and cross-region retrieval.
What Should Age and What Shouldn’t
Think in classes, not a monolith:
- Master and reference data (wine): Chart of accounts, customers, suppliers, products, tax codes. Curate carefully, version intentionally, and retain.
- Transactional data (milk): Orders, invoices, journal lines, time entries, shipments. High value when fresh; diminishing utility over time.
- Logs and telemetry (fresh produce): Useful for debugging, capacity planning, and security analytics over short horizons; decays fastest.
- Documents and communications (policy-bound): Contracts, statements, and emails, with retention driven by regulation and legal holds.
A Pragmatic Retention and Archiving Playbook
This isn’t about indiscriminate deletion. It’s disciplined, governed lifecycle management: a core tenet of effective financial data governance.
- Classify and scope
  - Define domains: finance, order-to-cash, procure-to-pay, HR, CX.
  - Label datasets by sensitivity (PII/PHI/PCI), system of record, and criticality.
- Set retention policies by class
  - Active window in the primary system (e.g., 24 to 36 months for high-volume transactions; longer for GL where needed).
  - Archive window in cold or nearline storage (e.g., 5 to 7 years to meet audit requirements).
  - Deletion window when legal and business obligations end, with secure erase and audit trail.
- Make access intentional
  - Keep recent data “hot” and indexed.
  - Move older data to cheaper stores with lifecycle policies and clear SLAs (e.g., 48-hour restore or query-on-demand).
  - Provide governed access paths for auditors and analysts so people don’t default to keeping everything hot.
- Automate the pipeline
  - Use extract/archive jobs tied to fiscal calendar close.
  - Partition primary tables by date; implement rolling detach/attach or partition switching.
  - Enforce lifecycle rules in cloud storage (S3/GCS/Azure Blob) to transition and expire objects.
- Prove compliance every cycle
  - Evidence: policy docs, job logs, deletion certificates, chain-of-custody for legal holds.
  - KPIs: storage growth rate, percentage archived vs. hot, query latency, backup duration, percentage of datasets with an assigned owner.
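The retention windows in this playbook can be expressed as a small policy check. This is a minimal sketch, not a prescribed implementation: `RetentionPolicy`, `assign_tier`, and the example windows are hypothetical, ages are counted in whole months from the record date, and the archive window is assumed to run until a total retention age (for example 7 years) rather than being added on top of the active window. A legal hold simply blocks the move to deletion.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # keep in the primary system, indexed
    ARCHIVE = "archive"  # cold or nearline storage
    DELETE = "delete"    # eligible for secure erase with an audit trail


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy record; both windows are counted in months from the record date."""
    data_class: str
    active_months: int  # time spent hot in the primary system
    total_months: int   # assumed total retention (hot + archive) before deletion

    def __post_init__(self) -> None:
        if not 0 <= self.active_months <= self.total_months:
            raise ValueError("need 0 <= active_months <= total_months")


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end; a partial final month is not counted."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def assign_tier(record_date: date, today: date, policy: RetentionPolicy,
                on_legal_hold: bool = False) -> Tier:
    age = months_between(record_date, today)
    if age < policy.active_months:
        return Tier.HOT
    if age < policy.total_months or on_legal_hold:
        # Past the active window: archived. A legal hold keeps expired records
        # here instead of letting them move to deletion.
        return Tier.ARCHIVE
    return Tier.DELETE


# Example windows within the ranges suggested above (assumed values; tune by domain).
POLICIES = {
    "transactional": RetentionPolicy("transactional", active_months=36, total_months=84),
    "telemetry": RetentionPolicy("telemetry", active_months=3, total_months=3),
}
```

With these assumed windows, an order dated 2019-01-01 evaluated on 2025-01-15 is 72 months old and lands in `ARCHIVE`; one dated 2014-01-01 is past 84 months and becomes `DELETE` unless a hold is in place. Running the same check on a schedule is what turns deletion from a special project into routine work.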
Queryable Archives Without Slowing Your Core
The biggest pushback is, “But what if we need it?” The answer isn’t “keep it all in ERP.” The answer is “make archives queryable.” Options:
- External tables over object storage: Query archived files in S3, GCS, or Azure Blob Storage with Athena, BigQuery, or Synapse serverless SQL, respectively, without rehydrating them into the ERP database.
- Analytical warehouses: Land archives in a warehouse with cheaper compute, columnar storage, and strong partition pruning.
- App-native archiving: Many ERPs provide archive objects with on-demand restore; use them with strict SLAs.
Design so that 95% of daily work stays fast, while auditors and power analysts can still reach history without dragging the core.
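One common way to keep an object-store archive cheap to query is a date-partitioned layout. The sketch below assumes a Hive-style `key=value` directory convention (`year=YYYY/month=MM/`), which engines such as Athena and BigQuery external tables can map to partition columns; the bucket path and the helper names are hypothetical, and `prune` only imitates what the engine does when a query filters on those columns.

```python
from datetime import date
from typing import Iterator, List


def month_starts(start: date, end: date) -> Iterator[date]:
    """Yield the first day of every month from start's month through end's month."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def partition_prefix(root: str, table: str, month: date) -> str:
    """Hive-style key=value prefix for one month of archived rows."""
    return f"{root.rstrip('/')}/{table}/year={month.year}/month={month.month:02d}/"


def prune(prefixes: List[str], start: date, end: date) -> List[str]:
    """Keep only the monthly prefixes inside [start, end].

    This imitates partition pruning: a filter on the partition columns lets the
    engine skip whole directories instead of scanning every archived file.
    """
    lo, hi = (start.year, start.month), (end.year, end.month)
    kept = []
    for prefix in prefixes:
        # Pull year/month out of the key=value path segments.
        keys = dict(seg.split("=", 1) for seg in prefix.strip("/").split("/") if "=" in seg)
        if lo <= (int(keys["year"]), int(keys["month"])) <= hi:
            kept.append(prefix)
    return kept
```

For example, three years of monthly prefixes under a hypothetical `s3://example-archive-bucket/erp` root prune down to three directories for a Q2 2019 audit request. In the query engine itself this is a `WHERE` clause on the partition columns; the DDL for declaring those columns differs by engine.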
Governance, Controls, and Compliance
- RACI and ownership: Every domain has an accountable data owner. Lifecycle jobs run under change control with approvals.
- Legal holds: Retention schedules pause automatically when a hold is in place, with documented scope and release.
- PII minimization: Where regulations and business needs allow, tokenize or delete unneeded PII fields before the surrounding transactional records reach the end of their retention window.
- Encryption and keys: Ensure archives inherit encryption-at-rest, key rotation, and key escrow policies; test restores routinely.
- Auditability: Produce artifacts quarterly, including policy versions, job runs, exceptions, and remediation.
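PII minimization often relies on keyed, deterministic tokenization: the same input always maps to the same token, so archived tables can still be joined on the column, but the raw value is gone. A minimal sketch using HMAC-SHA256 from Python's standard library follows; the field names, the lowercase normalization rule, the `tok_` prefix, and the inline demo key are illustrative assumptions (a real key belongs in a managed secret store). Keyed tokens are pseudonymized rather than anonymized data for as long as the key exists.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: same normalized input and key give the same token."""
    normalized = value.strip().lower()  # assumed normalization rule (suits emails)
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


def minimize(record: dict, tokenize_fields: set, drop_fields: set, key: bytes) -> dict:
    """Return a copy of record with PII fields either tokenized or removed."""
    out = {}
    for field, value in record.items():
        if field in drop_fields:
            continue  # not needed downstream: remove it entirely
        if field in tokenize_fields and value is not None:
            out[field] = tokenize(str(value), key)
        else:
            out[field] = value
    return out
```

Applied to a hypothetical order record, `minimize(record, {"email"}, {"phone"}, key)` keeps the order ID and amount, drops the phone number, and replaces the email with a stable token that still matches the same customer in other archived tables.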
Common Pitfalls to Avoid
- Breaking referential integrity: Archive related tables together so child rows are never orphaned; keep minimal surrogate keys in the archive for joins.
- Rebuilding history in reporting: Don’t stitch hot and archived data together ad hoc in individual reports. Use a semantic model that spans both.
- Skipping context: Archive the reference data in effect at the time, such as tax rules, price lists, and currency tables, alongside the facts so old numbers still reconcile.
- One-size-fits-all windows: Finance may need 7 years, while telemetry might need only 30 to 90 days. Tune by domain.
- “Delete later” culture: If deletion requires a special project, it won’t happen. Automate and schedule.
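Archiving related tables together means moving parent and child rows in a single transaction: copy parents before children into the archive, then delete children before parents from the primary tables. The sketch below uses an in-memory SQLite database with hypothetical `orders`/`order_lines` tables and foreign keys switched on, so the database itself would reject a deletion that orphaned a line. A production ERP would more likely use its own archive tooling or partition operations; this only illustrates the ordering and the all-or-nothing commit.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: primary and archive copies of orders and their lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce parent/child references
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id), sku TEXT);
        CREATE TABLE archive_orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
        CREATE TABLE archive_order_lines (id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES archive_orders(id), sku TEXT);
        INSERT INTO orders VALUES (1, '2016-03-01'), (2, '2024-05-10');
        INSERT INTO order_lines VALUES (10, 1, 'A'), (11, 1, 'B'), (12, 2, 'C');
    """)
    return conn


def archive_orders_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` (ISO date) and all their lines, atomically.

    Returns the number of orders removed from the primary table.
    """
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT INTO archive_orders SELECT * FROM orders WHERE order_date < ?",
            (cutoff,))
        conn.execute(
            """INSERT INTO archive_order_lines
               SELECT l.* FROM order_lines l JOIN orders o ON o.id = l.order_id
               WHERE o.order_date < ?""",
            (cutoff,))
        # Delete children first so no line ever points at a missing order.
        conn.execute(
            "DELETE FROM order_lines WHERE order_id IN "
            "(SELECT id FROM orders WHERE order_date < ?)",
            (cutoff,))
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
        return cur.rowcount
```

Calling `archive_orders_before(create_demo_db(), "2020-01-01")` moves order 1 with both of its lines and leaves order 2 untouched; if any statement fails, the whole move rolls back instead of leaving half an order behind.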
A 90-Day Starter Plan
- Days 0 to 15: Inventory the top 20 datasets by size and access frequency; assign owners; draft retention targets by class.
- Days 16 to 45: Implement table partitioning and lifecycle jobs in 1 to 2 domains; stand up an object-store archive with lifecycle rules; document SLAs.
- Days 46 to 75: Wire a queryable archive path (external tables or warehouse) and validate analytics and audit access.
- Days 76 to 90: Turn on deletion for one low-risk domain; establish a quarterly evidence package; set KPIs and dashboards.
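The first step of the plan, ranking datasets by size and access frequency, can start as a simple script over catalog statistics. In this sketch `DatasetStats`, the `reads_last_90d` metric, and the 10-read "cold" threshold are hypothetical; in practice the numbers would come from the database's storage views and query logs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatasetStats:
    name: str
    size_gb: float
    reads_last_90d: int          # hypothetical metric, e.g. counted from query logs
    owner: Optional[str] = None  # accountable data owner, if one is assigned


def starter_inventory(stats: List[DatasetStats], top_n: int = 20,
                      cold_reads: int = 10) -> Tuple[List[DatasetStats], List[str], List[str]]:
    """Rank datasets by size, then flag missing owners and rarely read data."""
    largest = sorted(stats, key=lambda s: s.size_gb, reverse=True)[:top_n]
    needs_owner = [s.name for s in largest if s.owner is None]
    # Large but rarely read: likely first candidates for a retention target.
    cold = [s.name for s in largest if s.reads_last_90d < cold_reads]
    return largest, needs_owner, cold
```

The output maps directly onto the days 0 to 15 tasks: `needs_owner` is the list to assign, and `cold` is where draft retention targets will pay off first.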
Stop treating your data estate like a museum where everything is preserved forever. Treat it like a pantry: clearly labeled shelves, rotation, and expiration dates. A clean, well-managed environment is faster, safer, cheaper, and ultimately more valuable, because the right people can find the right, fresh data at the right time.
For more discussion on data management strategy, let’s connect on LinkedIn.