Efficient storage and presentation of your geoscientific data
It is quite common to see data such as laboratory assay results stored in geochemical databases in a “flat” table structure – a column for each assay element, with the analytical values for the element underneath. It’s a simple structure that is fine for basic presentation and management, but the system is not without its limitations.
Maintainability of data stored this way can be a headache – for example you have to add an additional column whenever you decide to assay for a new element or chemical species. Let’s face it: you may have analysed a particular set of samples for only a few elements, meaning that you have a whole stack of Null or empty values for all of the elements not assayed for.
Separating the way you store your data (storage layer) from the ways you would like to present it (your presentation layers) is a useful concept to understand. In the simplified example below geochemical assay data is used as a way of illustrating the point:
For most software applications that can connect to a database (GIS, 3D Modelling), a stored SQL query is treated much the same way as a table, typically presenting the two object types together in the same user-dialog to select from. This aids in separation of the way the data is stored, from the way the data is presented to common software such as Surpac, MapInfo or ArcView etc, used by geologists. You essentially store your data in a well thought-out and efficient database design, and create a new presentation layer for each intended use. Presentation layers are called Views in Oracle and SQL Server, or Queries in MS Access.
Using an MS Access environment to illustrate the point, a Crosstab query is one solution to present your analyses in a simple flat format for a GIS application whilst still storing your data efficiently. The example presented below only uses a few records, so if your dataset consists of many hundreds of thousands of records you may want to consider performance implications at an early stage. Having said this, I have experimented with a dataset of ~ 212,000 analyses records in MS Access, with no significant performance issues using Crosstab queries.
I have created a simple MS Access database for download and inspection of the query design. Open transpose_assays.mdb (Access 2000 file format), then have a look at the query named QRY_Analyses_Crosstab_Example. This query selects data from the analyses table (which you may also inspect), presenting the analytical data in a familiar transposed structure.
SQL Server and Oracle databases also allow you to transpose data; using the Pivot function in TSQL, and a similarly named function in Oracle PL/SQL.
This same logic may be applied to other types of geological data you commonly work with, and I hope the article provides some food for thought when you design or review your own geoscientific database.