May 13, 2014

A guide to the rasterization of vector coverages in PostGIS

A frequent question since the beginning of the PostGIS Raster adventure has been "How can I convert my geometry table to a raster coverage?" People often describe how to rasterize a single geometry using ST_AsRaster(), but I don't know of any text or tutorial explaining how to convert a complete vector coverage to a single raster or to a tiled raster coverage.

In this article I will describe two methods to rasterize a vector coverage: one very fast but not very flexible, and one very flexible but unfortunately a bit slow! I will rasterize a vector forest cover so that it aligns perfectly with an already loaded raster elevation layer. I will use the "height" column of the forest cover to assign values to pixels. I will assume that both layers use the same SRID and that the forest only partially covers the elevation layer. The elevation layer is divided into 625 tiles of 100x100 pixels for a total of 6 250 000 pixels.


"Intelligent" vector VS "stupid" raster!

Why would one go from a human-parsed, precise, object-oriented, "intelligent" vector table to an unparsed, jagged, single-variable, "stupid" raster layer? Over the years I have identified only two good reasons to do so. The first is to get all your data layers into a single analysis paradigm (in this case the raster one) to simplify the geoprocessing chain. The second is that this "stupid" raster paradigm is still the fastest one when it is time to analyse large volumes of data. Processing a raster stack relies on very simple matrix operations, generally available through what geo people call map algebra operations. This is still generally faster than the linear algebra necessary to analyse vector coverages. Anyway, our task here is not to decide whether doing it in vector mode is better than in raster mode. Our task is just to get the conversion done properly…


"Rasterizing" vs "rendering"

Another interesting matter is the difference between rasterizing and rendering. Rendering is the conversion of a vector model to an image for display purposes. Rasterizing is the conversion of a vector model to a raster for data analysis purposes. Renderers, like MapServer, GeoServer and Mapnik, besides blending many layers together, symbolizing geometries and labeling them, will generally produce anti-aliasing along the edges of polygons and translate values into symbolic colors. Any original data value is lost and there is no way to use the product of a renderer for GIS analysis. On the other hand, a rasterizer, like the ones provided by GDAL, QGIS, ArcGIS or PostGIS, will produce raw pixels trying to represent the original vector surface as crisply and accurately as possible, assigning to each pixel one of the values (and only those values) associated with the original geometries. The raster resulting from a rasterization process is a valuable input to a geoprocessing analysis chain. The result of a rendering process is not. We must also say that although there are some methods to rasterize a vector coverage in PostGIS, there is still no way to render one (unless you consider ST_AsJPEG(), ST_AsTIFF() and ST_AsPNG() as very basic renderers).


Method 1.1 – Basic ST_Union() and ST_Tile()

The first rasterization method I present is the one that was planned from the beginning of the PostGIS Raster project: simply convert all the geometries to small rasters and union them into a single raster, the same way we union many geometries together with ST_Union(geometry). A complete query, aligning the resulting raster on an existing elevation raster coverage, looks like this:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)) rast
FROM forestcover, (SELECT rast FROM elevation LIMIT 1) rast;

Close-up on the unioned, rasterized version of the forest cover, resulting in "jagged" areas of the same height.

The result is a single "big" raster aggregating all the "small" rasterizations of the individual geometries.

Note the arguments to ST_AsRaster(). The first one is the geometry to rasterize. The second one is the reference raster from which the alignment parameters are borrowed (a pixel corner x and y, the scale (or pixel size) and the skew (or rotation)). The third argument is the pixel type of the expected raster (32 bit float). The fourth parameter is the value to assign to the pixels; here it is the 'height' column from the forest cover table. The last parameter ('-9999') is the nodata value assigned to the pixels padding the area surrounding each rasterized geometry. This is the very first variant of the ST_AsRaster() function in the PostGIS Raster reference.

Like its geometric counterpart, ST_Union() simply aggregates (merges) all those small rasters into a single raster. Without ST_Union(), the query would have returned 2755 small rasters, one for each geometry.
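
For comparison, the un-aggregated query below (a minimal sketch reusing the same tables) returns exactly one small raster per forest cover row:

SELECT ST_AsRaster(geom, rast, '32BF', height, -9999) rast
FROM forestcover, (SELECT rast FROM elevation LIMIT 1) rast;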

Note also that we picked only one raster from the elevation layer as the reference raster for ST_AsRaster(). This is because all the rasters in this table are well aligned together and we need only one set of pixel corner coordinates to align everything. We could also have put the query selecting a single raster directly inside the SELECT part:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, (SELECT rast FROM elevation LIMIT 1), '32BF', height, -9999)) rast
FROM forestcover;

We normally don't want to keep big rasters in PostGIS, so we can split the merged raster into smaller tiles having the same dimensions as the tiles of the elevation coverage with the ST_Tile() function:

CREATE TABLE tiled_forestheight_rast AS
SELECT ST_Tile(rast, 100, 100) rast
FROM forestheight_rast;

The tiling of the rasterized forest cover, in red, is not well aligned with the tiling of the elevation raster, in grey. Note that the squares are 100x100 pixel tiles, not pixels...
This results in 48 tiles, most of them 100x100 pixels. Some are only 18x100 pixels and some are only 100x46 pixels. The lower right one is only 18x46 pixels...
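
As a quick sanity check, the tile dimensions can be inspected directly (a sketch assuming the tiled_forestheight_rast table created above):

SELECT ST_Width(rast) width, ST_Height(rast) height, count(*)
FROM tiled_forestheight_rast
GROUP BY ST_Width(rast), ST_Height(rast);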

There are 625 tiles of 100x100 pixels in the elevation coverage and 2755 geometries in the forest cover. These geometries cover only 31 of those tiles and the resulting big raster covers 56 of them. The query creating the "big" raster takes 5 seconds (thanks to Bborie Park, who sped up ST_Union() by a huge factor in PostGIS 2.1) and the query to resample/retile it takes only 2 seconds.

One problem with this result is that even though the pixels are well aligned with the elevation layer and most of the tiles have the same dimensions, the tiles themselves are not aligned with the elevation coverage and they do not cover the same area. Method 1.2 will fix that.


Method 1.2 – ST_Union() and ST_MapAlgebra()

In order to produce a raster coverage with a footprint identical to the elevation coverage, we must use this coverage not only for alignment but also as the base for our final tiles. The trick is to "burn" the big raster into empty tiles based on each of the elevation tiles. For that we will use a quite simple (!) two-raster ST_MapAlgebra() call:

CREATE TABLE welltiled_forestheight_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestheight_rast fc, elevation e;

The two-raster version of ST_MapAlgebra() overlays the two rasters and computes an expression for each aligned pair of pixels. The result of this expression becomes the value assigned to the corresponding pixel of a new raster.

As the two rasters are not necessarily well aligned, the extent of the new raster can be equal to 1) the extent of the union of both rasters, 2) the extent of the intersecting area, 3) the extent of the first raster or 4) the extent of the second raster. Here, no two rasters involved in the map algebra have the same extent. Since we want to "burn" the big raster into tiles having the same extent as the elevation tiles, which are second in the list of arguments, we have to specify that the new raster will have the same extent as the "SECOND" raster.

The first argument is the "big" forest height raster. It is the only row of the forestheight_rast table.

The second argument is a new tile generated with ST_MakeEmptyRaster() and ST_AddBand() from each of the 625 elevation tiles. We could have used the original tiles directly, but since we are specifying the resulting extent to be "SECOND", and the nodata value of the result of ST_MapAlgebra() is picked from the "FIRST" or the "SECOND" raster when those values are specified, the resulting raster would have had 32767 as its nodata value. This is the nodata value of the elevation raster coverage. We want the nodata value of the result to be the same as the one of the forest height coverage, which is -9999. So the trick is to "build" a new tile from scratch with ST_MakeEmptyRaster(), using the original elevation tiles as alignment reference, and to add a band to it with ST_AddBand(), specifying a pixel type ('32BF'), an initial value (-9999) and a nodata value (-9999) identical to those of the forest height raster.

The third argument is the expression computing the value to assign to each pixel. Pixel values from the first raster are referred to as "[rast1]" and pixel values from the second raster are referred to as "[rast2]". You can also refer to the x coordinate of the pixel with "[rast.x]" and to the y coordinate with "[rast.y]". The expression is any normal PostgreSQL expression, like for example "([rast1] + [rast2]) / 2".
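
As an illustration only (this query is not part of our rasterization workflow), a two-raster expression combining both inputs could look like the sketch below; it reuses the forestheight_rast and elevation tables and keeps only the intersecting extent:

-- Average of the forest height and elevation values wherever the two rasters overlap
SELECT ST_MapAlgebra(fc.rast, e.rast, '([rast1] + [rast2]) / 2', '32BF', 'INTERSECTION') rast
FROM forestheight_rast fc, elevation e
WHERE ST_Intersects(fc.rast, e.rast);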

In our case the expression is simply the value of the first raster, the forest height one, that we want to "burn" into the tile. It does not involve values from the elevation raster. In other words, the elevation tiles are merely used as well aligned blank pieces of paper on which we "print" the values of the forest height raster...

The fourth argument is the pixel type ('32BF') expected for the resulting raster. It is the same as the forest height raster.

The last argument ('SECOND') tells ST_MapAlgebra() to use the extent of the second raster (the elevation tiles) to build the resulting raster.

ST_MapAlgebra() will hence copy each value from the forest height raster to new empty tiles whose dimensions are based on the elevation ones. The query results in 625 tiles. When a tile does not intersect the area covered by the forest cover, all its pixels are simply set to nodata.
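
As a quick check of that behavior, you can count the resulting tiles whose band contains nothing but nodata values (a sketch; the TRUE flag forces ST_BandIsNoData() to actually inspect the pixel values):

SELECT count(*)
FROM welltiled_forestheight_rast
WHERE ST_BandIsNoData(rast, 1, TRUE);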

If your big raster does not cover much of the reference raster coverage, you can speed up the process by adding a spatial index on the reference coverage (it should already be there) and restricting the ST_MapAlgebra() computation to tiles intersecting the big forest height raster:

CREATE INDEX elevation_rast_gist ON elevation USING gist (st_convexhull(rast));

CREATE TABLE welltiled_forestcover_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestheight_rast fc, elevation e
WHERE ST_Intersects(fc.rast, e.rast);

Keep in mind however that this query is faster because it does not process, and hence does not return, tiles not touching the forest cover area. If you want to produce a complete set of tiles, identical in number and extent to the elevation coverage, stick with the previous query.


Some drawbacks to the methods using ST_AsRaster() and ST_Union()

All those queries involving ST_AsRaster() and ST_Union() are quite fast but they present two drawbacks:
  • ST_Union() limited by RAM - Any query involving ST_Union() is limited by the RAM available to PostgreSQL. If the big raster resulting from the union of all the rasterized geometries is bigger than the available RAM, the query will fail. The trick to avoid this is to aggregate the small rasters not into a single huge raster, but into smaller groups of rasters intersecting with tiles from the reference raster. Method 1.3 will show how to do that.
  • Only one extractable metric - A second limitation is that ST_AsRaster() has only one method to determine whether the geometry value must be assigned to an intersecting pixel or not. When the geometry is a polygon, for example, ST_AsRaster() assigns the value of the geometry to the pixel if the center of the pixel is inside this polygon, whatever proportion of the pixel area the polygon covers. In the worst case, the value of a very small polygon encircling the center of the pixel would be assigned even though the rest of the pixel area is covered by a polygon with a different value. You can think of many cases where the value at the center of the pixel is not necessarily representative of the area covered by the pixel.

    This is because ST_AsRaster() rasterizes only one geometry at a time, without consideration for other geometries intersecting the pixel. If you consider all the geometries of a layer when assigning a value, you might want to assign other metrics to the pixels, like the count of geometries intersecting with the pixel, the total length of all the polylines intersecting with the pixel, the value associated with the smallest polygon intersecting with the pixel, etc...

    Method 2 will show how to use the PostGIS Add-ons ST_ExtractToRaster() function to extract some of those metrics from a whole vector coverage and how to easily implement new ones.

Method 1.3 – Memory efficient ST_Union() and ST_MapAlgebra()

This method is RAM safe in that it never produces a "big" raster bigger than the area covered by one tile of the reference raster. It is a little bit slower than the first query of method 1.2 because it unions some rasters more than once when they touch several tiles. Like the elevation coverage, it produces 625 tiles.

CREATE TABLE ramsafe_welltiled_forestcover_rast AS
WITH forestrast AS (
  SELECT rid, ST_MapAlgebra(
                ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)),
                ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999), 
                '[rast1]', '32BF', 'SECOND') rast
  FROM forestcover, elevation
  WHERE ST_Intersects(geom, rast)
  GROUP BY rid, rast
)
SELECT a.rid,
       CASE
         WHEN b.rid IS NULL THEN ST_AddBand(ST_MakeEmptyRaster(a.rast), '32BF'::text, -9999, -9999)
         ELSE b.rast
       END rast
FROM elevation a LEFT OUTER JOIN forestrast b 
ON a.rid = b.rid;

The first query, in the WITH statement (forestrast), rasterizes the geometries, unions them in groups intersecting the reference raster tiles and burns each group into a new tile as seen before. This is done by adding a WHERE clause limiting the operation to geometries intersecting the tiles and a GROUP BY clause ensuring that ST_Union() only aggregates small rasters "belonging" to the same tile.

The WHERE clause, however, causes the extent to be limited to the area covered by the forest cover. This first part alone would return only 32 tiles. The second part of the query makes sure we get the 593 remaining tiles. It uses a LEFT OUTER JOIN to make sure a row is returned for each of the 625 tiles of the reference raster coverage. When the rid of a reference tile matches one of the 32 tiles, that tile is returned. Otherwise a new empty tile is returned. You might skip this second part if your source vector layer covers the full area of the reference raster or if you wish to limit the rasterized area to the one of the vector layer.
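
In that case the query reduces to the WITH part alone; a sketch (with a hypothetical output table name) could look like this:

CREATE TABLE forestarea_forestheight_rast AS
SELECT rid, ST_MapAlgebra(
              ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)),
              ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
              '[rast1]', '32BF', 'SECOND') rast
FROM forestcover, elevation
WHERE ST_Intersects(geom, rast)
GROUP BY rid, rast;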


Method 2 – PostGIS Add-ons ST_ExtractToRaster()

As said before, ST_AsRaster() is quite limited in terms of the kind of value it can assign to pixels. Not so many values can be extracted from a single geometry: the value at the pixel centroid, the proportion of the pixel area covered, the value weighted by this proportion, the length of the intersecting part of a polyline being rasterized... ST_AsRaster(), because it is based on GDAL, only implements the first one.

Furthermore, many more metrics can be imagined if the value assigned to each pixel comes not only from one geometry but from the aggregation of all the geometries (or the values associated with them) of a vector coverage intersecting the centroid or the surface of the pixel. We might, for example, want to compute the number of geometries intersecting with the pixel. We might be interested in the value of the polygon covering the biggest part of the pixel, or of the polyline having the longest length inside the pixel area. We might want to merge together geometries having the same value before computing the metric. I could identify some 76 different possible metrics extractable from point, polyline and polygon coverages.

We hence need a function that is able to extract metrics from a vector coverage as a whole and which allows us to implement new extraction methods easily. The PostGIS Add-ons ST_ExtractToRaster() function was written with exactly this goal in mind.

The PostGIS Add-ons is a set of about 15 pure PL/pgSQL, community-written functions helping to write advanced PostGIS queries. One of the most useful of these functions is ST_ExtractToRaster().

A query using ST_ExtractToRaster(), returning the same set of tiles as method 1.3 and also not suffering from RAM limitations would look like this:

CREATE INDEX forestcover_geom_gist ON forestcover USING gist (geom);

CREATE TABLE extracttoraster_forestcover AS
SELECT ST_ExtractToRaster(
         ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999), 
         'public', 
         'forestcover', 
         'geom', 
         'height', 
         'MEAN_OF_VALUES_AT_PIXEL_CENTROID') rast
FROM elevation;

First and very important remark: this query takes around 200 seconds! This is about 16 times more than the equivalent RAM safe query of method 1.3! This is because ST_ExtractToRaster() does much more than ST_AsRaster(): it not only considers the whole coverage, it can also compute very different things from this coverage. It is definitely not worth using ST_ExtractToRaster() if all you want is the value of the geometry intersecting the pixel centroid. The interest of this function lies in the other available methods...

Note the last argument to ST_ExtractToRaster(): 'MEAN_OF_VALUES_AT_PIXEL_CENTROID'. This is the method used to extract a value for the pixel. Fifteen (15) of these methods have been implemented in ST_ExtractToRaster() so far. They are listed below with a short description.

The 'MEAN_OF_VALUES_AT_PIXEL_CENTROID' method returns a raster coverage whose values are identical to the raster coverage produced with the previous methods. Most other methods, however, will return quite different rasters…

The other arguments to ST_ExtractToRaster() are quite trivial. The first is the raster into which the values are extracted. It's a good practice to pass a new empty raster band having the right pixel type, initial value and nodata value, and whose alignment is based on the tiles of the reference raster. This is identical to what we did in methods 1.2 and 1.3. The second argument is the schema of the vector table to rasterize ('public'). The third and the fourth are the name of the table ('forestcover') and the name of the column ('geom') containing the geometries to rasterize. The fifth argument is the column containing the value to be used in the computation of a new value when necessary ('height'). When the computation involves only geometries (e.g. COUNT_OF_VALUES_AT_PIXEL_CENTROID or SUM_OF_LENGTHS), this argument is not necessary.

The query will return a new raster for each raster in the elevation table, in this case 625.
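
If you only need the tiles actually touched by the forest cover, a variant along the lines of the sketch below (the output table name is hypothetical) restricts the computation to intersecting tiles, at the cost of returning fewer than 625 tiles:

CREATE TABLE extracttoraster_forestcover_subset AS
SELECT ST_ExtractToRaster(
         ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999),
         'public',
         'forestcover',
         'geom',
         'height',
         'MEAN_OF_VALUES_AT_PIXEL_CENTROID') rast
FROM elevation e
WHERE EXISTS (SELECT 1 FROM forestcover fc WHERE ST_Intersects(e.rast, fc.geom));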

Close-up on four pixels taking different values depending on the method used to assign them a value. The green and blue pixels were extracted using the MEAN_OF_VALUES_AT_PIXEL_CENTROID method. The pixels delimited by the red lines were extracted using the VALUE_OF_MERGED_BIGGEST method. The values displayed are the values associated with the polygons, which are delineated in black. Note how the pixels delimited by the red lines take the values of the polygons occupying the biggest proportion of the pixel.

Here is the list of methods already implemented for ST_ExtractToRaster(). All methods using the pixel centroid are suffixed with "_AT_PIXEL_CENTROID". All other methods involve the pixel as a rectangular surface.
  • COUNT_OF_VALUES_AT_PIXEL_CENTROID: Number of geometries intersecting with the pixel centroid.
  • MEAN_OF_VALUES_AT_PIXEL_CENTROID: Average of all values intersecting with the pixel centroid.
  • COUNT_OF_POLYGONS: Number of polygons or multipolygons intersecting with the pixel surface.
  • COUNT_OF_LINESTRINGS: Number of linestrings or multilinestrings intersecting with the pixel surface.
  • COUNT_OF_POINTS: Number of points or multipoints intersecting with the pixel surface.
  • COUNT_OF_GEOMETRIES: Number of geometries (whatever they are) intersecting with the pixel surface.
  • VALUE_OF_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface.
  • VALUE_OF_MERGED_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface. Polygons with identical values are merged before computing the area of the surface.
  • VALUE_OF_MERGED_SMALLEST: Value associated with the polygon covering the smallest area in the pixel. Polygons with identical values are merged before computing the area of the surface.
  • MIN_AREA: Area of the geometry occupying the smallest portion of the pixel surface.
  • SUM_OF_AREAS: Sum of the areas of all polygons intersecting with the pixel surface.
  • SUM_OF_LENGTHS: Sum of the lengths of all linestrings intersecting with the pixel surface.
  • PROPORTION_OF_COVERED_AREA: Proportion, between 0.0 and 1.0, of the pixel surface covered by the union of all the polygons intersecting with the pixel surface.
  • AREA_WEIGHTED_MEAN_OF_VALUES: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the maximum between the intersecting area of the geometry and the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the value is multiplied by the proportion of the covering area. For example, if a polygon covers only 25% of the pixel surface and there are no other polygons covering the surface, only 25% of the value associated with the covering polygon is assigned to the pixel. This is generally the preferred behavior.
  • AREA_WEIGHTED_MEAN_OF_VALUES_2: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the full weighted value is assigned to the pixel. For example, even if a polygon covers only 25% of the pixel surface, 100% of its value is assigned to the pixel.
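
For example, a sketch extracting one of the surface-based methods listed above, the proportion of each pixel covered by forest, could look like this (the output table name is hypothetical, and the value column is presumably ignored by this method):

CREATE TABLE forestproportion_rast AS
SELECT ST_ExtractToRaster(
         ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
         'public',
         'forestcover',
         'geom',
         'height',
         'PROPORTION_OF_COVERED_AREA') rast
FROM elevation;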

Adding methods to ST_ExtractToRaster()

Want to compute some other esoteric metric with ST_ExtractToRaster()? Internally the function uses the callback version of ST_MapAlgebra(), which calls one of two predefined callbacks depending on whether the value is computed for the pixel centroid or using the full surface of the pixel. Those functions are named ST_ExtractPixelCentroidValue4ma() and ST_ExtractPixelValue4ma(). They fulfil the criteria for ST_MapAlgebra() callback functions: they take three arguments defining the values of the input pixels (there can be many), the position of the pixel in the raster and some extra parameters when necessary. In order to build a query for each pixel, both functions expect extra parameters: the alignment parameters of the raster for which values are being computed and the parameters of the geometry column from which values are being extracted. The callbacks then define a series of extraction methods.

To add a new method to these callbacks, you can simply copy the query associated with an existing method that extracts a value similar to what you want to extract, give it a proper name, modify it to compute the expected value and make sure it uses the callback's extra arguments properly. It is a good idea to test the query directly on the vector coverage before testing it in one of the two callbacks. Most of the methods do an ST_Intersects() between the shape of the pixel and the vector coverage (this is why it is important that the vector coverage be spatially indexed) and extract some metric from the list of intersecting geometries.
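
For example, a prototype for a hypothetical "sum of boundary lengths inside the pixel" metric could first be tested against a hand-built pixel polygon (the extent and SRID below are placeholders; use a real pixel footprint in the forest cover's SRID):

WITH pixel AS (
  SELECT ST_MakeEnvelope(0, 0, 10, 10, 4326) geom  -- placeholder pixel footprint and SRID
)
SELECT sum(ST_Length(ST_Intersection(ST_Boundary(fc.geom), p.geom))) boundary_length
FROM forestcover fc, pixel p
WHERE ST_Intersects(fc.geom, p.geom);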

If you add a new method which might be useful to others, let me know. I will add it to the PostGIS Add-ons code…


Conclusion

Rasterizing a complete vector coverage can be done very quickly with ST_AsRaster() and ST_Union(), but some projects require more sophisticated methods assigning values to pixels while considering the coverage as a whole. The PostGIS Add-ons ST_ExtractToRaster() function is slow but provides much more control over the metric that is extracted from the vector coverage.

In both cases it is possible to extract values following a preloaded reference tiled coverage, which should be the case for most applications. It is also easy to add more extraction methods to ST_ExtractToRaster().

November 18, 2013

Launching the PostGIS Add-ons! A new PostGIS extension for pure PL/pgSQL contributions.

I'm glad today to launch the PostGIS Add-ons. This new PostGIS extension aims at providing a quick, more agile way to release pure PL/pgSQL functions developed by the PostGIS user community. It includes 15 new functions. Among them are two functions to aggregate raster/vector overlay statistics, one to extract values from a vector coverage to a raster, one to generate random points inside a polygon and two new aggregate functions helping to remove overlaps in a geometry table.

You can download the PostGIS Add-ons from my GitHub account. I'm already at version 1.19. The minor version number should increment with each significant addition. The major version number should increase when there is a major change in some function parameters or when we stop supporting the current minimal required versions of PostgreSQL and PostGIS, which are now PostgreSQL 9.x and PostGIS 2.1.x.

You can contribute to the PostGIS Add-ons with your own PL/pgSQL functions by following the Fork & Pull GitHub model. Just make sure that:

  • your function is written in pure PL/pgSQL (no C!),
  • your function is generic enough to be useful to other PostGIS users, 
  • you follow the function and variable naming and indentation conventions already in use in the files, 
  • you document your function like the ones already provided, 
  • you provide some tests in the postgis_addons_test.sql file,
  • you add the necessary DROP statements in the postgis_addons_uninstall.sql file.

So why not contribute your PL/pgSQL work directly to the core PostGIS project? Good question! It's easier to contribute pure PL/pgSQL functions to the PostGIS Add-ons for a number of reasons:
  • Because you can do it yourself without relying on one of the core PostGIS developers. No need to compile PostGIS, no need to write to the mailing list and explain the usefulness of your function, no need to write a ticket and wait for one of the main PostGIS developers to add it for you. Just make sure your code is ok and send a pull request to the GitHub project. It should be added pretty fast and trigger a new PostGIS Add-ons release. Your new function will be released in a matter of hours or days, not weeks or months.
  • Because your function does not fit well in the PostGIS SQL API. Sometimes PL/pgSQL functions are not as generic as the ones provided in the PostGIS SQL API or do not perform as well as most core PostGIS functions, which are written in C. The PostGIS Add-ons will try to be very open to functions performing very high level tasks or which, because they are implemented in PL/pgSQL, would not perform as well as if they were implemented in C. Requesting a function to be implemented in C generally means you will not see your function added to the PostGIS core for weeks or months. The PostGIS Add-ons allows you to release them NOW, making them easily accessible, documented, testable and changeable before they get a more definitive signature and find their way into the PostGIS core in C. Some PostGIS Add-ons functions will find their way into the core after being converted to C; some will not and will stay as they are.
So the main goal is to provide casual PL/pgSQL developers, not necessarily skilled in C or not wanting to build PostGIS (on Windows for example, which can take days), a channel to contribute to PostGIS. I'm sure there are dozens of PostGIS users out there who have developed quite useful stuff but never took the time to share it because they found it "too complicated". The PostGIS Add-ons is a first effort to give them an easy and consistent way to contribute to PostGIS.

The PostGIS Add-ons tries to make things as simple as they can be. It consists of a single, easy to install, postgis_addons.sql file accompanied by an uninstall file and a test file. For the sake of simplicity, documentation about each function is embedded in the main file, before each function declaration. A list of all the functions available, with a short description, is provided at the beginning of the file.

So here is a list of the functions available in this first release (in order of what I think are the most interesting contributions):

ST_ExtractToRaster() - Computes a raster band by extracting values for the centroid or the footprint of each pixel from a global geometry coverage using different methods like count, min, max, mean, value of the biggest geometry or area-weighted mean of values. This function uses ST_MapAlgebra() to compute stats about a geometry or a raster coverage for each pixel of a raster. Stats are computed using custom SQL queries exploiting the spatial index existing on the coverage being extracted. New stats queries can easily be added on demand. A list of possible stats is provided in objective FV.27 of the PostGIS raster working specifications. A number of them are already implemented.

ST_SplitAgg() - Returns the first geometry as a set of geometries after being split by all the second geometries being part of the aggregate. This function is a more robust and powerful alternative to the solution provided in the PostGIS wiki to overlay two vector coverages. It is normally used to remove overlaps in a vector coverage by splitting overlapping polygons. It does not imply the union of all the geometries, so it can work on very large geometry tables. A tolerance parameter allows removing slivers resulting from the splitting.

ST_DifferenceAgg() - Returns the first geometry after having removed all the subsequent geometries in the aggregate. It is used to remove overlaps in a geometry table by erasing the overlapping parts from all but one of the overlapping polygons.

ST_RandomPoints() - Generates points located randomly inside a geometry.
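
A usage sketch, assuming the two-argument form (a geometry and a number of points) and reusing the forest cover table from the previous article:

-- 100 random points inside one forest cover polygon
SELECT ST_RandomPoints(geom, 100) geom
FROM (SELECT geom FROM forestcover LIMIT 1) foo;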

ST_GlobalRasterUnion() - Builds a new raster by extracting all the pixel values from a global raster coverage using different methods like count, min, max, mean, stddev and range. Similar to, but more flexible (and slower) than, ST_Union(raster).

ST_AreaWeightedSummaryStats() - Aggregate function computing statistics on a series of intersected values weighted by the area of the corresponding geometry. The most notable statistic is the weighted mean, very useful when intersecting buffers with a raster coverage.

ST_SummaryStatsAgg() - Aggregate function computing statistics on a series of rasters generally clipped by a geometry.

ST_CreateIndexRaster() - Creates a new raster as an index grid. Many parameters allow creating grids indexed following any order.

ST_BufferedSmooth() - Returns a smoothed version of the geometry. The smoothing is done by adding a buffer around the geometry and removing it afterward. This technique, sometimes called dilation/erosion or inward/outward polygon offsetting, was described years ago by Paul Ramsey and more recently by "spluque" in the PostGIS wiki (with a nice animated GIF).
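
The underlying idea, expressed with plain PostGIS buffers rather than the Add-ons function itself (a sketch on a hypothetical polygon table):

-- Buffer outward then back inward by the same amount; small concavities disappear
SELECT ST_Buffer(ST_Buffer(geom, 10), -10) geom
FROM my_polygons;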

ST_BufferedUnion() - Alternative to ST_Union(geometry) adding a buffer around each geometry before unioning and removing it afterward. Used when ST_Union() leaves undesirable internal vertices after a complex union or when you want holes removed from the resulting union. With some very complex and detailed geometry coverages, this function speeds up and simplifies the unioning of all the geometries into one.

ST_DeleteBand() - Removes a band from a raster.

ST_AddUniqueID() - Adds a column to a table and fills it with a unique integer starting at 1.

ST_NBiggestExteriorRings() - Returns the n biggest exterior rings of the provided geometry based on their area or their number of vertices. Mostly useful to the ST_BufferedUnion() function.

ST_TrimMulti() - Returns a multigeometry from which simple geometries having an area smaller than the tolerance parameter have been removed. Mostly useful to the ST_SplitAgg() function.

ST_ColumnExists() - Returns true if a column exists in a table. Mostly useful to the ST_AddUniqueID() function.

I hope the PostGIS Add-ons will be useful to some people and trigger a new wave of contributions to PostGIS.



October 7, 2013

Semantic MediaWiki: A promising platform for the development of web geospatial crowdsourcing applications


Are you looking for a simple web platform to develop a crowdsourcing data application? Are you planning to develop your own application with some web development language like PHP, Java, Python or Ruby? Do you want to do it quickly and at minimal cost? Do you want your users to be able to capture geographical entities as part of the data? Read on...

Semantic MediaWiki is a semantic extension to MediaWiki, the wiki platform on which Wikipedia is built. MediaWiki itself is serious business: it is used by thousands of organizations as the base for their websites, and Wikipedia is the sixth most active website in the world. MediaWiki can be scaled to accept thousands of edits and millions of visits per hour. There is no doubt MediaWiki is an incredibly robust website development platform.

Semantic MediaWiki (SMW) adds the possibility to attach structured data to MediaWiki pages. In other terms: "to create objects (pages) of a certain class (category) having certain properties". For those with a relational database background, the transcription looks like: "to create records in tables having certain attributes". With the SMW extension (and a bunch of other extensions), you can build a complete relational database on the web, present the data in pages formatted with wikitext templates (instead of complex CSS or HTML), and list and edit them with fully customizable forms. You can query the data to create views and format them in a multitude of ways. All that online, without having to write one line of low level Python, Java, JavaScript or PHP code! Only wiki markup! The same simple markup language that has allowed hundreds of thousands of people with no development skills to contribute to Wikipedia!

I challenge anybody to find a CMS offering the same features and flexibility SMW does. Seriously! All the big players in data management, Microsoft, Google, Oracle (certainly not Oracle), still don't offer any system that competes with Semantic MediaWiki in terms of simplicity, from both an administrator's and a user's point of view, for developing a database on the web, styled with templates and editable with forms.

In this article, I want to emphasize the characteristics of a good crowdsourcing web development platform and try to demonstrate that MediaWiki, when installed with the Semantic MediaWiki extension, is about the best platform for developing crowdsourcing websites, and that it could also become, with little effort, a very good platform for the crowdsourcing of geographic information. I also try to show how fundamental geographic information system concepts could be transposed to the world of a semantic web database like SMW. I presented most of these ideas during the last New York City SMW conference. I'm addressing two kinds of readers: geospatial web application developers familiar with geospatial and web technology but unfamiliar with wikis, and especially with MediaWiki, and Semantic MediaWiki gurus who might be interested in adding complex geospatial information to their wiki.

December 13, 2012

Duncan Golicher's series of articles on PostGIS Raster

Duncan Golicher is a researcher in forest ecology at the Colegio de la Frontera Sur in Mexico. "The Colegio de la Frontera Sur is a scientific research center that seeks to contribute to the sustainable development of the southern border of Mexico, Central America and the Caribbean through the generation of knowledge, the formation of human resources and the linkage between social and natural sciences."

In the past few months, Duncan has been experimenting a lot with the new raster/vector analysis capabilities of PostGIS 2.0 and the integration of PostGIS data in his R workflow. The nicest thing about it is that he has posted a quite impressive series of articles about his experience on his blog, and I thought it was worth sharing them here (so I gain a bit of time to write my next long article :). Here are the posts in chronological order:


Nice work Duncan!

October 22, 2012

Read and write PostGIS rasters with FME Beta now!

It's done! FME 2013 Beta reads and writes rasters from and to PostGIS! Safe Software posted a short tutorial on YouTube at the beginning of the month explaining how to write rasters into PostGIS and how to read them back using FME Workbench. There is no option to tile the raster before loading it because FME provides another tool, the "RasterTiler" transformer, to tile a raster source before feeding the PostGIS writer.

September 5, 2012

Loading many rasters into separate tables with the PostGIS raster loader

There is no way yet, with the PostGIS raster loader, to load a series of rasters into separate tables. You can certainly load a series of rasters invoking the loader like this:

raster2pgsql -s 4236 -t 100x100 -I -C -F dem*.* public.dem | psql -d mydb

But all the rasters will be loaded into the same table, split into 100x100 tiles. If you load ten 1000x1000 rasters, you end up with a table of 1000 rows.

August 27, 2012

FME 2013 will soon read and write rasters stored in PostGIS

Safe Software will write a reader, a writer and some other functionalities to interact with rasters stored in PostGIS! These tools should be released with FME 2013 and should already be available in beta versions of FME this fall. Great news for the PostGIS community!

On the GDAL side, thanks to Jorge Arevalo, lots of improvements have recently been made to the PostGIS Raster driver. You can now read irregular arrangements of rasters and translate them into any format supported by GDAL. We are still looking for sponsors to implement the writer part of the driver.