CDM, LDM, PDM: The Three-Layer Architecture That Makes Geospatial Data Last

Geospatial data projects have a particular pathology: they start with a shapefile, graduate to a GeoPackage, and eventually arrive at a PostGIS schema that nobody fully understands, carrying geometry in a projection nobody chose deliberately, with attribute names inherited from a 2003 ArcInfo coverage.

The CDM → LDM → PDM hierarchy exists to prevent this. It forces deliberate thinking at three levels of abstraction before a single table is created.

The Conceptual Data Model (CDM)

The CDM answers: what entities exist in this domain, and how do they relate to each other? It is technology-agnostic, uses business language, and is the layer where domain experts (not database architects) should lead.

For a utility network, the CDM entities might be: Asset, Asset Class, Owner, Maintenance Zone, Survey Record, and Feature of Interest. The CDM makes explicit that a Survey Record belongs to exactly one Asset but an Asset can have many Survey Records — a cardinality decision that will propagate through every subsequent layer.
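The cardinality rule above can be written down before any database is chosen. What follows is a minimal Python sketch rather than a formal CDM notation. The class and field names (`Asset`, `SurveyRecord`, `asset_id`, `add_survey`) are hypothetical, chosen only to mirror the entities named in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SurveyRecord:
    """A survey record; belongs to exactly one asset (hypothetical fields)."""
    survey_id: str
    asset_id: str  # mandatory reference: exactly one Asset per Survey Record


@dataclass
class Asset:
    """An asset; may have zero or more survey records."""
    asset_id: str
    asset_class: str
    surveys: List[SurveyRecord] = field(default_factory=list)

    def add_survey(self, survey: SurveyRecord) -> None:
        # Enforce the "exactly one Asset" side of the relationship.
        if survey.asset_id != self.asset_id:
            raise ValueError(
                f"survey {survey.survey_id} belongs to asset {survey.asset_id}, "
                f"not {self.asset_id}"
            )
        self.surveys.append(survey)


valve = Asset(asset_id="A-001", asset_class="Valve")
valve.add_survey(SurveyRecord(survey_id="S-1", asset_id="A-001"))
valve.add_survey(SurveyRecord(survey_id="S-2", asset_id="A-001"))
print(len(valve.surveys))  # 2
```

Writing the rule this explicitly tends to surface questions a diagram hides, such as whether an Asset with no Survey Records is valid at all.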

The CDM is where you discover that "pipeline" means different things to the integrity team (a physical object with a weld count) and the GIS team (a line geometry with a licence reference) — and where you reconcile that before it becomes a schema bug.

COMMON MISTAKE
Treating the CDM as optional documentation produced after the physical schema is built. By that point the CDM is archaeology, not design.

The Logical Data Model (LDM)

The LDM translates CDM entities into normalised tables with typed attributes, but still without committing to a specific database technology. It is where you make decisions about normalisation (3NF as default, with documented exceptions), primary and foreign key strategies, and the handling of temporal data.

For geospatial systems, the LDM is where geometry enters the model, but as a typed attribute (geometry: Polygon, SRID: 27700), not as a physical column definition. The LDM should specify: what geometry types are allowed, what coordinate reference systems are permitted, whether geometry is stored alongside attributes or in a linked geometry table, and what topology rules govern relationships between features.

Projects that skip the LDM typically discover CRS ambiguity (mixed OSGB36 and WGS84 in the same table) and geometry type confusion (LineString vs MultiLineString) only when spatial queries start returning wrong results in production.
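One way to keep these failures out of production is to express the logical rule as data and check incoming features against it. The sketch below is illustrative only: `GeometrySpec`, its fields, and the example pipeline rule are assumptions, not part of any library or of a specific ActiveSense method.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class GeometrySpec:
    """Logical-level rule for one geometry attribute (hypothetical structure)."""
    allowed_types: FrozenSet[str]
    allowed_srids: FrozenSet[int]

    def violations(self, geom_type: str, srid: int) -> List[str]:
        """Return human-readable rule violations; an empty list means valid."""
        problems = []
        if geom_type not in self.allowed_types:
            problems.append(f"geometry type {geom_type} not permitted")
        if srid not in self.allowed_srids:
            problems.append(f"SRID {srid} not permitted")
        return problems


# Assumed example rule: pipeline centrelines are MultiLineString in EPSG:27700.
pipeline_spec = GeometrySpec(
    allowed_types=frozenset({"MultiLineString"}),
    allowed_srids=frozenset({27700}),
)

print(pipeline_spec.violations("MultiLineString", 27700))  # []
print(pipeline_spec.violations("LineString", 4326))
```

The second call reports both problems at once, which is the point: a LineString in WGS84 is rejected at load time instead of surfacing later as a wrong query result.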

→ Define geometry types and permitted CRS at LDM stage, not at ETL time
→ Specify temporal modelling strategy: valid time, transaction time, or bitemporal
→ Document all many-to-many relationships that will require bridge tables
→ Define the coordinate precision policy: how many decimal places are meaningful?
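The precision question in the last point can be made concrete with a small sketch. The policy values below are assumptions for illustration, not recommendations: `PRECISION_POLICY` and `apply_precision` are hypothetical names, and the right number of places depends on how the data was captured.

```python
from typing import Dict, Tuple

# Hypothetical policy: decimal places kept per CRS, keyed by EPSG code.
# EPSG:27700 is in metres, so 3 places is millimetre resolution.
# EPSG:4326 is in degrees; 7 places is roughly a centimetre at the equator.
PRECISION_POLICY: Dict[int, int] = {27700: 3, 4326: 7}


def apply_precision(coord: Tuple[float, float], srid: int) -> Tuple[float, float]:
    """Round an (x, y) pair to the number of places the policy allows."""
    if srid not in PRECISION_POLICY:
        # Refuse to guess: an undeclared CRS has no agreed precision.
        raise KeyError(f"no precision policy defined for SRID {srid}")
    places = PRECISION_POLICY[srid]
    x, y = coord
    return (round(x, places), round(y, places))


print(apply_precision((530123.456789, 180987.654321), 27700))
```

Raising an error for an SRID with no policy is a deliberate choice in this sketch: it ties the precision rule back to the list of permitted CRS instead of silently accepting anything.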

The Physical Data Model (PDM)

The PDM is the technology-specific realisation of the LDM. For a PostGIS deployment, this means table DDL with specific column types, an index strategy (GiST for geometry, B-tree for attribute lookups), a partition strategy for large feature classes, row-level security definitions, and materialised view specifications for performance-critical queries.
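To show how a logical rule becomes physical DDL, here is a minimal sketch that renders PostGIS statements from a geometry type and SRID. The function name `postgis_ddl`, the table name, and the `asset_ref` column are hypothetical. The typed column form `geometry(<type>, <srid>)` and `USING GIST` are standard PostGIS and PostgreSQL syntax.

```python
from typing import List


def postgis_ddl(table: str, geom_type: str, srid: int, geom_col: str = "geom") -> List[str]:
    """Build CREATE TABLE and CREATE INDEX statements for one feature table.

    Identifiers are interpolated directly, so this is for trusted,
    design-time input only, never for user-supplied names.
    """
    create_table = (
        f"CREATE TABLE {table} (\n"
        f"    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n"
        f"    asset_ref TEXT NOT NULL,\n"
        f"    {geom_col} geometry({geom_type}, {srid}) NOT NULL\n"
        f");"
    )
    # GiST index for spatial predicates on the geometry column.
    spatial_index = (
        f"CREATE INDEX {table}_{geom_col}_gist ON {table} USING GIST ({geom_col});"
    )
    # PostgreSQL builds a B-tree index when no method is named.
    attribute_index = f"CREATE INDEX {table}_asset_ref_idx ON {table} (asset_ref);"
    return [create_table, spatial_index, attribute_index]


for statement in postgis_ddl("pipeline_segment", "MultiLineString", 27700):
    print(statement)
```

Because the column is declared with an explicit type and SRID, the database itself rejects a LineString or a WGS84 geometry, which turns the LDM rule into an enforced constraint.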

The PDM is also where you document the non-obvious decisions: why the geography type was chosen over geometry for a global pipeline network (because geography measures distance over the spheroid rather than on a flat plane, which matters at those scales); why a BRIN index was chosen over a GiST index for a time-series raster product; why the geometry is stored in EPSG:27700 internally but reprojected to EPSG:4326 at the API layer.
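The scale argument can be checked with simple arithmetic. The sketch below uses the haversine formula on a sphere of mean Earth radius, which is an approximation: PostGIS geography functions measure on the spheroid by default, so their results will differ slightly. The function name `haversine_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (sphere assumption)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
at_equator = haversine_m(0.0, 0.0, 1.0, 0.0)
at_60_north = haversine_m(0.0, 60.0, 1.0, 60.0)
print(round(at_equator), round(at_60_north))
```

One degree of longitude at 60 degrees north covers about half the ground distance it does at the equator. A planar calculation on raw degrees treats the two as equal, which is why the type choice deserves a written rationale.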

Why geospatial projects skip CDM and LDM

The honest answer is that GIS tools make it easy. QGIS will import a shapefile and create a PostGIS table in three clicks. ArcGIS Pro will propose a schema from a CSV. The tooling creates an illusion that the modelling work is done when the data is loaded.

The second reason is that geospatial projects are often initiated by field data collection rather than system design — a survey team has captured data in a particular format, and the project brief is "put this in a database," not "design a data architecture."

ActiveSense addresses this by insisting on a CDM workshop before any schema work begins. This typically takes one day for a bounded domain and two to three days for multi-source integration projects. The investment is returned within the first month of development.

The cost of skipping

On a recent UK government spatial data platform project, ActiveSense inherited a PostGIS database with 340 tables, no documented schema, mixed CRS (WGS84 and OSGB36 in the same feature class, no column indicating which), and a geometry column called "geom" in some tables and "shape" in others. The retroactive data modelling exercise took six weeks and required a controlled migration of 2.4 billion geometries.
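When no column records the CRS, a first triage pass can classify coordinates by value range. The sketch below is a heuristic under stated assumptions, not the procedure used on that project: `guess_crs` is a hypothetical name, the bounds are the nominal 700 km by 1300 km British National Grid extent and the valid longitude/latitude range, and a range check can only suggest a CRS, never prove one.

```python
def guess_crs(x: float, y: float) -> str:
    """Classify a coordinate pair by range only; a triage heuristic, not proof."""
    looks_like_degrees = -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0
    looks_like_bng = 0.0 <= x <= 700_000.0 and 0.0 <= y <= 1_300_000.0
    if looks_like_degrees and looks_like_bng:
        # Small positive values fit both ranges; leave these for human review.
        return "ambiguous"
    if looks_like_degrees:
        return "EPSG:4326"
    if looks_like_bng:
        return "EPSG:27700"
    return "unknown"


print(guess_crs(-0.1276, 51.5072))     # EPSG:4326
print(guess_crs(530000.0, 180000.0))   # EPSG:27700
print(guess_crs(1.5, 52.0))            # ambiguous
```

The ambiguous bucket is intentional. Small positive values fall inside both ranges, so the sketch flags them for review instead of guessing.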

A CDM + LDM exercise at the start of that project would have taken five days. We have seen the same pattern in every engagement where the modelling stages were skipped.


Soheil Sotoodeh

Principal Geospatial Data Architect

Esri Advanced & Enhanced Certified · PMP · 12+ years geospatial data architecture
