Articles and information relating to geological data management

Geochemical assay data – Storage and Presentation

Efficient storage and presentation of your geoscientific data

It is quite common to see data such as laboratory assay results stored in geochemical databases in a “flat” table structure – a column for each assay element, with the analytical values for the element underneath. It’s a simple structure that is fine for basic presentation and management, but the system is not without its limitations.

Maintainability of data stored this way can be a headache – for example you have to add an additional column whenever you decide to assay for a new element or chemical species. Let’s face it: you may have analysed a particular set of samples for only a few elements, meaning that you have a whole stack of Null or empty values for all of the elements not assayed for.

Separating the way you store your data (storage layer) from the ways you would like to present it (your presentation layers) is a useful concept to understand. In the simplified example below geochemical assay data is used as a way of illustrating the point:

Storage layer
Store the analyses like this
Analyses Crosstab example
But display the analyses like this

For most software applications that can connect to a database (GIS, 3D Modelling), a stored SQL query is treated much the same way as a table, typically presenting the two object types together in the same user-dialog to select from. This aids in separation of the way the data is stored, from the way the data is presented to common software such as Surpac, MapInfo or ArcView etc, used by geologists. You essentially store your data in a well thought-out and efficient database design, and create a new presentation layer for each intended use. Presentation layers are called Views in Oracle and SQL Server, or Queries in MS Access.

Using an MS Access environment to illustrate the point, a Crosstab query is one solution to present your analyses in a simple flat format for a GIS application whilst still storing your data efficiently. The example presented below only uses a few records, so if your dataset consists of many hundreds of thousands of records you may want to consider performance implications  at an early stage. Having said this, I have experimented with a dataset of  ~ 212,000 analyses records in MS Access, with no significant performance issues using Crosstab queries.

I have created a simple MS Access database for download and inspection of the query design. Open transpose_assays.mdb (Access 2000 file format), then have a look at the query named QRY_Analyses_Crosstab_Example. This query selects data from the analyses table (which you may also inspect), presenting the analytical data in a familiar transposed structure.

Download Access 2000 example

SQL Server and Oracle databases also allow you to transpose data; using the Pivot function in TSQL, and a similarly named function in Oracle PL/SQL.

This same logic may be applied to other types of geological data you commonly work with, and I hope the article provides some food for thought when you design or review your own geoscientific database.

Geological Data Management for the Small Explorer

Download article in pdf format

This Article was published in the Australian Institute of Geoscientists (AIG) News No. 107, February 2012.

Solid foundations

It is essential for a small, recently listed or low market-cap exploration company to lay a solid foundation for the management of its geological data. The geological database a company uses provides this foundation, and is a key asset, with its value being increased by improving the inherent quality of the data. Improving the quality of this data, which can provide good dividends both now and in the future, is the focus of this article.

Acquisition of data

Typically, a company will have acquired its tenement holdings with an accompanying package of legacy geoscientific and spatial (GIS) data. This data represents the sum total of the previous explorers’ investment in the ground that makes up the tenement holdings. It will take many forms; and is usually of widely varying quality and completeness. In its most basic form, the data will be a pile of field notes and associated maps and plans. You maybe a little luckier and find a collection of quizzically named spreadsheets, which when opened display an artistic collage of colourful cells, the exact meaning of which has been lost long ago. At least in this instance much of the data is in digital form. Yet there are many good reasons why a company should be thinking beyond spreadsheets.

Thinking beyond spreadsheets

Why do so many small companies continue to use spreadsheets when a database is the better choice? A spreadsheet has serious drawbacks when used for data storage. It is cumbersome to retrieve selective data from a spreadsheet. Spreadsheets offer little or no data validation and little or no protection against data corruption from well-meaning but poorly trained users (Read: Data sorting). So why do so many small explorers stick with spreadsheets? It is often because of familiarity and a lack of understanding of the alternatives.

Some companies will have discovered they have one or more ‘desktop’ databases, most commonly MS Access or the equivalent. This is a promising sign because there is a good chance at least some data will be in relational tables. However, before an existing desktop database is used, a company should consider the following questions:

  1. Does the database design suit your current business requirements?
  2. Does the database have a means of data viewing, entry and editing using well-designed forms?
  3. Is there an inbuilt means to create presentation reports for the stored data?

If the answer to any of these questions is no, then either a commercial solution, or some in-house development should be considered. If you are already in possession of a 3D mining package, most provide some means of creating a geological database, however, these databases are often too simplistic and inflexible for other uses. Often in a small company the expense, or indeed the need, for a fully-fledged commercial data-management solution can’t be justified. However, there are other solutions available to improve or extend the design and functionality of your current database. There is also the option of creating a new solution from scratch.

There are good reasons to consider your data management needs early, so as not to compound problems over time. Ignoring the health of your geological data will mean that when it comes time for a resource calculation you may have to pay for a consultant and a significant part of their time (and your money) will be spent making sense of your poorly-organised data.

Organising data – some solutions

Drill hole and Samples Database

For the small explorer, the starting point (or scope of the undertaking) is usually a means of managing reconnaissance ‘point’ data e.g. soil samples, stream sediment samples, rock chip samples etc., as well as drill hole (or linear data). A practical and proven solution for the small explorer is to use a desktop database which also has some means of creating data-entry forms, a graphic user-interface (GUI) to aid in creating queries, a method to produce basic reports such as a summary drill hole listing (for exploration announcements) and a detailed drill hole report (annual reporting to the government body).

Microsoft Access is a common choice in industry for both a data entry and reporting tool, especially for those with little programming experience. This is a valid choice for the small operation; there is also a good business case for using MS Access as a front-end CRUD tool, with a separate instance for the back-end database (where only the data will reside). Some suggested back-end options are MS Access again, or for a sturdier solution; SQL Server Express 2008 R2, or Oracle XE which are both freely down-loadable tools and very capable. With respect to CRUD, I’m not talking about a ‘coating or an incrustation of filth or refuse’ (to which the geologist’s amongst us are intimately familiar). I am referring to Create, Read, Update and Delete, basic functions of a computer database system.

The separation of front-end from back-end is a good future-proofing idea; the data tables in the back-end are linked to the front-end database containing the forms, queries and reports. This means that work can be done on the front-end without worrying about taking the database off-line or data synchronisation later.

The Database Model

For the beginning geology database modeller there are a number of common pitfalls. Whilst there are important criteria to generally follow in order to create or model a database, there is no single, correct design. Many volumes have been written regarding general database design, however I will discuss only the most problematic pitfalls as they apply to geological and drilling-related data repositories.

It is crucial to understand the importance of storing your geoscientific data in the most logically consistent and efficient manner. It is a common misconception that you store your data according to a specific end use. An example would be the modelling of a database in such a way that collar, sample and perhaps even analysis data are held in the same database table. This may have been designed to eliminate the use of joins in GIS/3D software, but this practice severely limits the flexibility and use of the stored data for other future software or purposes. I have reviewed a database from a medium sized mining company where the ‘Prospect’ field was duplicated in several tables, including the collar table, the geology table and the assay table. Data redundancy such as this can lead to data anomalies and corruption. As an example, if the prospect for a given collar required editing, it would have to be changed in literally thousands of records, instead of one edit in one lookup table.

This leads to the practice of separating the storage from the presentation layers. Ideally you will have one storage layer for your data, but many presentation layers. For example, a GIS presentation layer for each of your GIS software tools of choice, a 3D presentation layer for your modelling software of choice and so on, the key point being that you can easily add presentation layers as the business need arises with the proviso that your storage layer has been properly designed from the start. Presentation layers are commonly created through the use of database Views (saved queries).

To achieve the best protection for your valuable geological data, it is important to understand that the current generation of desktop database engines typically provide inbuilt tools and functionality to ensure the data in your database is accurate and consistent i.e. has integrity.

Whilst no amount of programming can prevent every type of error that could be introduced, it is important to learn how to utilise the provided tools to the best of your ability in order to maximise the validity and quality of your geoscientific data. You will have heard of the old adage: Garbage In – Garbage Out , well let’s keep the garbage out!

Ensuring data integrity

There are four primary types of data integrity: entity, domain, referential, and user-defined. In general terms; entity integrity applies at the row level; domain integrity applies at the column (or field) level; and referential integrity applies at the table level.

1. Entity Integrity

AKA: Hey, my drill hole is in there twice!

No duplicate rows. Entity Integrity ensures that the data that you store remains in the proper format as well as remaining comprehensible. Every row has a unique field that can’t be null or empty.

In a typical table designed to hold collar information it is very common practice to designate the name of the drill hole or costean as the Primary Key or unique, not null field. Many drill databases generated from 3D modelling software such as Surpac do this. Text fields traditionally make poor primary keys, especially fields that the user manually enters, such as the drill hole name.

This happens both in commercial software, and more than likely in your very own database. For a small scale, simple to maintain system suited to the small explorer, this is understandable particularly given the user-friendliness it offers.

Drill hole ‘DDH-97 GREENFIELDS-009′ is the same as ‘DDH97 GREENFIELDS-009′ isn’t it? You and I know that, but to your database they couldn’t be more different.

During the data collation stage where you may be initially loading your database tables with data, the practice of using the drill hole name as the Primary Key is going to be at its highest risk in terms of data integrity between tables.

You may find that the geology logs are not displaying for a drill hole you could swear has been logged and the data entered. The end result may be that you enter this log information again and duplication occurs – your entity integrity has been compromised.

One partial solution here is to provide the user a means of easily performing a search for existing data before new rows are created. A search performed in the background to display drill holes with similar names as you enter your new hole, for example. Another, even better solution is to use integer type fields to link tables, such as a drill-id.

2. Domain Integrity

The values in any given column fall within an accepted range. (Validation)

It is important to make sure that the data entered into a table is not only correct, but appropriate for the fields it is entered into.

Some familiar examples may include:

  1. A numeric field to hold end of hole (E.O.H) depth.
  2. A date field (with date picker in your entry form) for the drill start date, drill end date
  3. Drill hole collar azimuth in the range (0-360 degrees)
  4. Drill hole collar dip in the range (-90 – +90).
  5. Multi-field validation: hole survey depths must be less than or equal to E.O.H depth.

If the range of acceptable values for a given field must conform to a list of values (e.g. a rock code or alteration type), then typically a lookup list would be employed within the application, with or without a referential integrity constraint (see next) applied at database level on a matching lookup table to enforce this referential integrity.

3. Referential Integrity

Foreign key values point to valid rows in the referenced table. In geological database terms, a consequence will be the requirement to have a drill hole collar entered first, before you can enter its geological log. Additionally you will be prevented from deleting a drill hole collar before first deleting all its associated logs, down-hole survey entries, RQD and Recovery values etc.

4. User-defined Integrity

The data complies with applicable business rules. For instance you may have a rule that states all analyses are to be entered in parts per million units rather than percent. You may require that a drill hole name be entered in the form: “DDH“+ Year + Prospect Name. It is almost certain that you will want to prevent lithology intervals for a given hole from overlapping.

Summary

Given that geologists are normal, fallible human beings a database design and its application code should be able to anticipate and prevent many types of errors, there should also be an expectation that every effort is made to enter data correctly. Some errors may be impossible to trap systematically (for example, an incorrectly typed name or rock description), but any business rules which can be defined should be enforced in your database. Avoid the temptation to rely solely on vigilance by end-users to ensure accurate data.

A database and application is ultimately worthless if the data is of poor quality and cannot be relied upon. The geological database a company uses is a key asset, with its value being increased by improving the inherent quality of the data – and improving the quality of data can provide good dividends both now and in the future.

A Geological Database Template for the Small Explorer

The following article outlines aspects of a design for a practical and proven geological database that hopefully provides some ideas for your own business requirements. This model has been influenced by the Mineral Resources Tasmania (MRT) drilling database design, as a direct result of my involvement with Project Tiger several years ago.

CoreStore datamodel
Figure 1: CoreStore datamodel

At the logical centre of the model is an entity to storecollar details – Drill hole Collars shown as one of the two entities in light blue (see Figure 1) The light blue colour indicates a spatial entity, the Collars entity representing the surface position of linear features (drill holes, costeans etc). If you are dealing mostly with geological point data: soils, rock chips, stream sediments samples and the like, you will want to focus mainly on the Sites entity, which is also shown in light blue.

The Drill hole Collars entity typically contains attributes (fields or columns) to hold values such as the 3D collar position in space (Easting, Northing, RL), and the corresponding datum e.g. GDA94, AMG66 etc. The collar azimuth and the collar dip are other key attributes. There are a number of additional attributes that you may want to store depending upon the specific nature of your own drill holes or drilling program.

The important thing to ensure is that these attributes are actually drill hole attributes and not attributes of another entity, such as samples or assays. If you find that you need to store more than one similar attribute for a drill hole e.g. drill type 1 =’percussion’ (pre-collar), drill type 2 = ‘diamond’ (tail), then this is a signal that the information is better stored in a separate Drill-Type entity (table), as illustrated in Figure 1. This would be more logically consistent, and also allow for extra fields such as depth from, depth to, core diameter (NQ, HQ etc.) to be stored to better describe the hole with depth. A single “Primary Drill type” may be a better attribute for the collar table – allowing easy query retrieval of all diamond holes for example.

The Downhole Survey entity is conceptually fairly simple. It is readily modelled as a table with depth attributes, azimuth and dip readings. You may also want to include an attribute to describe the survey method e.g. Eastman, gyro etc. In some instances an x,y and z coordinate for each down-hole depth may need to be stored in order to describe the drill hole path in three dimensions for certain applications. As this information can be considered to be derived data it may be better represented in a presentation layer, rather than physically stored as such.

A separate Sample Information entity is shown. This makes it easy to store depth from, depth to and sample type attributes, e.g. Blanks, Standards and True (core, chip etc.) samples. This allows for a hole to have many samples.

A separate Analyses entity is shown in the datamodel. This allows for a sample to have more than one (many) assays. Included in the data model is both an Analysis Type and Results entities. This provides for a quite powerful way to store any number of scientific readings, including assays results from chemical species. This essentially allows for a given sample to have many analyses of different types, which could be particularly useful for example in the academic arena  in areas such as micro-probing of minerals and isotope studies. For the small explorer, it also easily allows for, say, the same piece of core to have XRF and ICP (and other) analyses stored optimally in the database. You will also notice that the Results entity has both a numeric value and a description field. The description field gives the added flexibility for analytical results to be stored that have a text-based description rather than a purely numerical result. For the small explorer or resources company, the simpler Analyses entity is probably the better choice based purely on the ease of use this simple model provides. However, for a growing exploration and mining company, it is likely that the alternate Analysis Type and Results approach to modelling analyses would be the more future-proof and powerful solution.

Lithology entity allows your drill hole to have many lithologies i.e. many intervals with a chosen lithology. Extra fields often stored here include full description of the lithology, stratigraphic unit etc.

Figure 2: Geology Log

A separate Modifiers entity is shown in Figure 1. This design allows for a lithology to have many modifiers such as alteration, mineralisation, weathering, colours and textures. This table stores the information that the geologist collects while logging and will often see the largest variation in required fields. This table represents the centre of customisation depending upon the data capture requirements of the commodity or mineral sought. It is very common to find the greatest number of data normalisation issues surrounding the geological logging storage tables. There is a good chance that your database has only a single table to store logging data, meaning that for a given drill hole the logging interval is duplicated

The Drill hole Structures entity allows for a drill hole to have many structural readings. The most common data fields incorporated here include RQD (Rock Quality Designation), and core recovery, although any structural type feature of the drill core or costean that need to be recorded with depth is suitable for inclusion in this table.

Drill hole intersections. One often overlooked area of data storage is drill hole intersections. In many respects this is derived data – derived from the sample and assay data, but enough geological interpretation is usually required that intersections can be considered as ‘raw’ data items in many cases. It is handy to be able to store and extract a piece of information such as: DDH-97 GREENFIELDS-009 from 76 – 87 m, 11m @ 2.02 % Pb, 3.01 % Zn, which could then be exported to a variety of reports. This information could be stored in a number of formats, however free-text fields are adequate for most purposes.

Figure 3: Individuals and Parties

In the context of this database template, a Party refers to an individual person, company or other organisation. For the purposes of a geological database, common Party-types include the drilling company that drilled the hole, the geologist or geologists(s) that logged the hole, and the resources company that initiated the hole. The Party Details entity stores the name of the Party along with a host of other information such as address and contact numbers. It is worth noting that the Party Details entity could also be referenced in an expanded database system within your business to hold personal details for many other uses. In the Organisations entity, there is a field present to store the Party-type, which in turn relates back to a record in the General Codes table. The Organisations entity acts as a link table between the drill hole and the Party table, thereby providing a Many-to-Many functionality. The Company comment field in the Organisations entity allows for a comment specific to a drill hole and a related Party to be assigned.

The Drill hole Purpose entity is depicted as a One-to-Many relationship with drill holes. A drill hole can have many Purposes in this model, another name for a purpose could be a drill hole aim, in which case there would often be multiple aims for a given exploration hole. The Drill hole Purpose entity is probably optional for many small exploration companies, as this functionality could also be accomplished using the General Comments entity, which leads us to the next discussion.

To reduce the number of tables having their own individual comments fields, the template illustrates using a General Comments entity.

Figure 4: Comments

This entity has a Comment Table field, which is a character format field to store the name of the table the comment relates to. Along with the Record Id field, the Comment Table field uniquely identifies the specific record to which the comment belongs. One benefit of this approach to managing comments is that no new fields need to be created in your entities each time you decide to include comments in them. The Comment-type field in the General Comments entity provides the ability to assign a standard reusable Comment-type from the General Codes table. Using the previous example, one Type you could include is a Drill hole Aim. Another example would be to create a drill hole summary Comment-type to record an overall description of the drill hole, whether the aims were achieved and whether any mineralisation was intersected.

Two simple but key entities are the code tables shown below. There is no reason that you couldn’t merge the two tables into one, but I prefer to have one code table to store drilling (geology) related codes, and another more general code table to store code values that might also be applicable to other databases or models I work on. A useful attribute in the two code tables is the Parent attribute. This allows the ability to specify a code parent-child relationships and build up a hierarchy for lookups in a geological application. For code values I have got into the habit of using three-letter codes rather than numbers, with some approximate abbreviation of the parameter where possible. This gives a measure of user-friendliness to the data when viewed in queries, and can help at times when performing data validation etc. By having all the geology codes in one Drillhole Codes table, rather than a dozen or more separate lookup tables, data management is simplified. As your business requirements inevitably evolve, insertion of new code types and records in the code tables is much simpler than having to create new physical lookup tables.

Figure 5: General Codes
Figure 6: Drill hole Codes

That completes my summary of a geological database template for the small explorer. If you have a geological database and/or application of your own that needs some work, or you need some professional advice and assistance, please feel free to contact me to discuss by using the Contact Me tab at the top of the page.