Shapefile wins the FME Lifetime Achievement Award on its 25th birthday, and Twitter is abuzz

Shapefile is the most popular geospatial vector data format for geographic information system (GIS) software. It was developed by Esri, so this doesn’t come as a surprise, and its usability and simplicity have helped too. Its ubiquity earns it front-row support in virtually all GIS applications as the de facto format.

When I first created my custom GIS application using the MapWinGIS engine, it supported just one spatial data format. Yes, you guessed right: Shapefile to start with, then everything else followed.

And Friday was a special day for Shapefile at the FME UC, with a Twitter parody and commentary account that has lived up to its reputation: hilarious and inspiring.

If you thought your Twitter bio rocks, no, not even close.

Shapefile is said to have a host of major weaknesses in data storage and management, which has stirred up a lot of controversy in the GIS community.

Just take a look at these links, for example:

This discussion of the Shapefile manifesto sheds light on the weaknesses of Shapefile and puts forward SQLite extended with SpatiaLite as the format to replace it.

Follow this Stack Exchange discussion titled “Are there any attempts to replace the shapefile? [closed]”. Quite a number of replacements have been put forward: GeoPackage, SpatiaLite (on top of SQLite, of course), FileGDB and many more. The thread is quite intriguing and I can’t summarise it precisely for you, so take a look for yourself.

There are also Shapefile’s documented limitations:

  • The shapefile format does not have the ability to store topological information
  • The size of both .shp and .dbf component files cannot exceed 2 GB (or 2^31 bytes), around 70 million point features at best.
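
As a rough check on the 70 million figure, the arithmetic can be sketched in Python. The byte counts below (a 100-byte file header, an 8-byte record header and 20 bytes of content for a 2D point) come from the Esri shapefile whitepaper rather than from this post, and the constant and function names are my own.

```python
# Back-of-the-envelope estimate of how many 2D points fit in a 2 GB .shp file.
# Byte sizes follow the Esri shapefile technical description.

MAX_FILE_BYTES = 2**31            # the 2 GB ceiling quoted above
FILE_HEADER_BYTES = 100           # fixed-size header at the start of every .shp
RECORD_HEADER_BYTES = 8           # record number + content length
POINT_CONTENT_BYTES = 4 + 8 + 8   # shape type + X + Y (two doubles)


def max_point_records(max_bytes: int = MAX_FILE_BYTES) -> int:
    """Return the largest number of 2D point records that fit in max_bytes."""
    per_record = RECORD_HEADER_BYTES + POINT_CONTENT_BYTES
    return (max_bytes - FILE_HEADER_BYTES) // per_record


if __name__ == "__main__":
    print(f"{max_point_records():,} point records")
```

That works out to roughly 76 million points for the .shp alone. A wide attribute table will likely push the .dbf to its own 2 GB limit well before that.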

The .dbf format also has numerous inherent weaknesses, especially those it inherits from its parent, dBase. If you’ve ever wondered why you cannot use field names longer than 10 characters in Shapefile, it is because of the dBase standard. Read more here
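
To see why the 10-character limit bites, here is a small sketch that flags column names that would collide once cut down to 10 characters. Plain truncation is an assumption made for illustration; real GIS software may rename clashing fields in its own way. The function name and sample columns are hypothetical.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # dBase field names hold at most 10 characters


def find_truncation_clashes(field_names):
    """Map each truncated name to the original names that collapse into it.

    Only truncated names shared by two or more originals are returned.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:DBF_FIELD_NAME_LIMIT]].append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical attribute columns from a census layer.
    columns = ["population_2010", "population_2020", "area_km2"]
    print(find_truncation_clashes(columns))
```

Running a check like this before exporting to Shapefile makes it easier to pick short, distinct names yourself instead of leaving it to the export tool.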

Worst of all, Shapefile represents null values as zero. If you’re dealing with quantitative data for analytics, this will return skewed results, which is dangerous. And it gets dirty really fast for programmers, since null and zero are completely different values.
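
A tiny illustration of the skew, using made-up rainfall readings (the numbers and variable names are hypothetical, not taken from any real dataset):

```python
from statistics import mean

# Hypothetical rainfall readings (mm); None marks a station with no reading.
readings = [12.0, None, 8.0, None, 10.0]

# With real nulls, the missing readings can simply be skipped.
known = [r for r in readings if r is not None]
mean_ignoring_nulls = mean(known)

# Once the nulls have been stored as zero, they drag the average down.
as_stored = [0.0 if r is None else r for r in readings]
mean_with_zeros = mean(as_stored)

print(mean_ignoring_nulls, mean_with_zeros)
```

The first average uses only the three real readings; the second treats the two missing stations as if they had recorded no rain at all, and there is no way to tell them apart from stations that really measured zero.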

Read here for more limitations

These oddities did not help Shapefile’s competitors much at the FME UC: Shapefile emerged as the winner of the FME Lifetime Achievement Award for data formats, beating 350+ other geographic data formats.

The FME International User Conference (FME UC) is where hundreds of the world’s top data experts come together for three days of inspiring and informative sessions to advance their skills, exchange knowledge, and discover new ideas for their data in beautiful Vancouver, Canada.

Safe Software, the company behind FME (Feature Manipulation Engine), announced Shapefile’s achievement on its 25th birthday, and Twitter quickly went abuzz.

So what is the Shapefile format all about?

The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes ~Wikipedia

Talking about a “shapefile” is a little confusing, since it’s normally a collection of files, usually with a common file name prefix and always in the same directory. The file with the .shp extension is one of the three mandatory files in the collection, together with its sisters .shx and .dbf.
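
Because the three mandatory files share a name and a directory, checking that a dataset is complete is straightforward. The sketch below uses only the standard library. The function name is my own, and it assumes lower-case extensions, which will not always hold on case-sensitive file systems.

```python
from pathlib import Path

MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the mandatory extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [
        ext for ext in MANDATORY_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]
```

Calling missing_components on a path such as roads.shp (a hypothetical file name) returns an empty list when the set is complete, or the extensions that someone forgot to copy along.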

Mandatory files

.shp — shape format; the feature geometry itself

.shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly

.dbf — attribute format; columnar attributes for each shape, in dBase IV format
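
To make the .shp file a little less abstract, here is a sketch that decodes its fixed 100-byte header with Python’s struct module. The layout (file code 9994, length in 16-bit words, shape type at byte 32, then the bounding box) follows the Esri whitepaper. The function name and the returned dictionary are my own choices, and only the basic 2D shape types are named.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian; the length counts 16-bit words.
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    # Version, shape type and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    return {
        "length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

To try it, open a .shp file in binary mode, read the first 100 bytes and pass them to parse_shp_header. The mix of big-endian and little-endian fields in one header is one of the format’s better-known quirks.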

Other files

.prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format

.sbn and .sbx — a spatial index of the features

.fbn and .fbx — a spatial index of the features that are read-only

.ain and .aih — an attribute index of the active fields in a table

.ixs — a geocoding index for read-write datasets

.mxs — a geocoding index for read-write datasets (ODB format)

.atx — an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later)

.shp.xml — geospatial metadata in XML format, such as ISO 19115 or other XML schema

.cpg — used to specify the code page (only for .dbf) for identifying the character encoding to be used

.qix — an alternative quadtree spatial index used by MapServer and GDAL/OGR software

Read more from the Esri Whitepaper 

Steve Ochieng
Geospatial Specialist | Avid Reader and Vivid Writer here | Geospatial Tech Advocacy and Evangelism | GeoProgrammer and Follower of @geohipster
