***2004 Florida Cropland Data Layer specific information***
The processing for the Florida CDL differed from that of the other 2004 CDLs in two major ways. First, the Florida CDL used additional citrus training/validation data provided by the Florida NASS Field Office; this Citrus Grove Data Layer is confidential and for internal NASS use only. Second, special processing was required to identify sugarcane properly. This included photo interpretation to identify additional sugarcane training data and to limit the spatial extent of the sugarcane area.
The Cropland Data Layer (CDL) Program provides the National Agricultural Statistics Service (NASS) with internal, proprietary county- and state-level acreage indications for major crop commodities. Secondarily, after the public release of county estimates, it provides the public with "statewide" (where available) raster, geo-referenced, categorized land cover data products. This project builds upon NASS's traditional crop acreage estimation program. It integrates enumerator-collected ground survey data and/or Farm Service Agency field data with satellite imagery to create an unbiased statistical estimator of crop area at the state and county level for internal use. Please note that in no case is farmer-reported data revealed or derivable from the public-use Cropland Data Layer. The 2004 Florida Cropland Data Layer was developed using Leica Geosystems ERDAS Imagine in tandem with Rulequest See5.0, both commercial software packages. ERDAS Imagine, a comprehensive image processing suite, handled the bulk of the processing steps: preprocessing and managing the raw imagery and training data, building the scene classifications, and creating the final statewide mosaics. See5.0 was used solely to derive the classification rules from the training data, and ERDAS Imagine then applied those rules back to the input imagery. Broadly defined, See5.0 is a niche data mining tool that derives decision trees, or sets of if-then rules, to assign data to categories. It is not a GIS application in itself.
Decision trees offer several advantages over the more traditional maximum likelihood classification method. They are: 1) non-parametric by nature, and thus not reliant on the assumption that the input data are normally distributed, 2) efficient to construct, and thus capable of handling large and complex data sets, 3) able to incorporate missing and non-continuous data, and 4) able to sort out non-linear relationships.
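To illustrate what a decision tree of if-then rules looks like when it is applied to imagery, here is a minimal sketch using a generic binary tree. It does not reproduce See5.0's own tree or rule format, and it does not show how ERDAS Imagine actually applies the rules. The band names, thresholds, and category labels are all hypothetical. Each node compares a single band value against a threshold, so no distributional assumption about the input data is involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    label: str  # land cover category assigned to pixels reaching this leaf


@dataclass
class Node:
    band: str         # input layer tested at this node
    threshold: float  # split value (learned from training data in practice)
    low: "Tree"       # branch taken when the pixel value <= threshold
    high: "Tree"      # branch taken when the pixel value > threshold


Tree = Union[Leaf, Node]
Pixel = Dict[str, float]  # band name -> value for one pixel


def classify(tree: Tree, pixel: Pixel) -> str:
    """Follow one if-then test per node until a leaf (category) is reached."""
    while isinstance(tree, Node):
        tree = tree.low if pixel[tree.band] <= tree.threshold else tree.high
    return tree.label


def classify_image(tree: Tree, image: List[List[Pixel]]) -> List[List[str]]:
    """Apply the same tree to every pixel of a (tiny) image."""
    return [[classify(tree, px) for px in row] for row in image]


# Hypothetical tree: band names, thresholds, and labels are invented.
toy_tree = Node("nir", 80.0,
                low=Node("red", 50.0, low=Leaf("water"), high=Leaf("bare soil")),
                high=Node("swir", 60.0, low=Leaf("crop"), high=Leaf("woods")))

image = [[{"nir": 40, "red": 20, "swir": 10}, {"nir": 120, "red": 30, "swir": 40}],
         [{"nir": 60, "red": 70, "swir": 90}, {"nir": 110, "red": 25, "swir": 75}]]
print(classify_image(toy_tree, image))
# [['water', 'crop'], ['bare soil', 'woods']]
```

In a real workflow, the classification tool derives the thresholds from the training pixels. In this sketch they are fixed by hand.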
Together, these four advantages usually lead to improved classifications over the maximum likelihood method. There are several varieties of decision tree classifiers, but See5.0 stands out because it also employs a statistical technique known as "boosting", which has been shown to improve results further. More information on Rulequest's See5.0 software can be found at <http://www.rulequest.com/>.
As with the maximum likelihood method, decision tree analysis is a supervised classification technique. It therefore relies on a sample of known ground truth areas with which to "train" the classifier. The classifier can then "learn" how to assign each of the remaining, unknown pixels to the most plausible category. The best ground truth comes from a statistically representative probability sample that is dense enough to account for the variability of the land cover types being mapped. Traditionally, NASS CDLs have used ground truth data from the annual June Agricultural Survey (JAS). More information about the JAS can be found at <https://www.nass.usda.gov/Surveys/June_Area/>. Making this survey data usable in a classification takes a considerable amount of labor. The field boundaries and attributes are recorded only on paper, so they must be manually digitized into a GIS. More recently, the Farm Service Agency (FSA) has provided very comprehensive ground truth data, and NASS has begun using it as a replacement for the JAS information. The FSA data has two advantages: it is natively in a GIS, and it contains orders of magnitude more field-level information. Its disadvantages are that it is not truly a probability sample of land cover and that it is biased toward subsidized "program" crops.
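The following minimal sketch shows how boosting works together with supervised training on labeled ground truth pixels. It swaps in textbook AdaBoost with one-level decision trees ("stumps"). It is not See5.0's own boosting implementation. It also assumes a binary problem with labels +1/-1, whereas See5.0 handles many classes. The training pixels and band values are hypothetical. Each round fits the stump with the lowest weighted error and then increases the weight of the pixels that stump got wrong, so later stumps concentrate on the hard cases.

```python
import math
from typing import List, Tuple

# One training pixel: (band values, label), label is +1 or -1 (binary case only).
Sample = Tuple[List[float], int]
# One weak learner: (vote weight alpha, band index, threshold, polarity).
Stump = Tuple[float, int, float, int]


def stump_predict(x: List[float], band: int, threshold: float, polarity: int) -> int:
    """Predict +polarity when the band value is at or above the threshold."""
    return polarity if x[band] >= threshold else -polarity


def fit_stump(samples: List[Sample], weights: List[float]) -> Tuple[float, int, float, int]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = (math.inf, 0, 0.0, 1)
    for band in range(len(samples[0][0])):
        for threshold in sorted({x[band] for x, _ in samples}):
            for polarity in (1, -1):
                err = sum(w for (x, y), w in zip(samples, weights)
                          if stump_predict(x, band, threshold, polarity) != y)
                if err < best[0]:
                    best = (err, band, threshold, polarity)
    return best


def adaboost(samples: List[Sample], rounds: int) -> List[Stump]:
    n = len(samples)
    weights = [1.0 / n] * n  # start with uniform pixel weights
    ensemble: List[Stump] = []
    for _ in range(rounds):
        err, band, threshold, polarity = fit_stump(samples, weights)
        if err >= 0.5:  # no better than chance: stop
            break
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, band, threshold, polarity))
        if err == 0:  # perfect stump: further rounds would add nothing
            break
        # Up-weight misclassified pixels (y * pred = -1), down-weight correct ones.
        weights = [w * math.exp(-alpha * y * stump_predict(x, band, threshold, polarity))
                   for (x, y), w in zip(samples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble: List[Stump], x: List[float]) -> int:
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, band, threshold, polarity)
                for alpha, band, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical training pixels: (red, nir) values; +1 = crop, -1 = non-crop.
training: List[Sample] = [
    ([30, 120], 1), ([35, 110], 1), ([28, 130], 1),
    ([60, 70], -1), ([65, 60], -1), ([55, 75], -1),
]
model = adaboost(training, rounds=10)
print(boosted_predict(model, [32, 125]))  # 1
```

This toy set is separable by a single stump, so boosting stops after one round. On overlapping classes, several weighted stumps would vote together.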
Additional information about the FSA data can be found at <https://www.fsa.usda.gov/> and <https://datagateway.nrcs.usda.gov/>.
All available raw satellite imagery for the region was used as input, along with the non-agricultural portion of the United States Geological Survey's (USGS) 2001 National Land Cover Data (NLCD). Additional information on the USGS NLCD can be found at <https://www.mrlc.gov/>.
Scene selection begins in early summer and can run into the late fall, depending on image availability. The Cropland Data Layer program primarily uses the Landsat TM or IRS AWiFS platforms for acreage estimation. However, other platforms such as SPOT or gap-filled Landsat ETM+ are used to fill "data acquisition" holes within a state. For multi-temporal analysis of summer crops, a spring date and a summer date of observation are preferred, because together they give maximum separation between crop covers. If only one date of observation is available (unitemporal), a mid-summer date is preferred. Sometimes the only date available during the growing season is an early spring date (March-May) or a fall date (September-October). In that case, it is best not to use that scene or analysis district for estimation, because bare soil in the spring and fully senesced crops in the fall will produce erroneous results.
For estimation purposes, the effect of clouds can be minimized in three ways: by defining Analysis Districts (ADs) along adjacent scene edges, by cutting the ADs along county boundaries, or by cutting the clouds out by primary sampling unit. An AD can consist of a single scene footprint or of multiple scene footprints that were observed on the same date and are analyzed as one. An AD can be bounded either by a scene edge or by a county boundary. Multi-temporal ADs are possible as long as all scenes in the AD share the same two dates. A single- or multi-scene AD uses all potential training fields for clustering, classification, and estimation.
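The following minimal sketch encodes the date-selection and Analysis District rules described above. It assumes that "summer" means June-August, because the text does not define summer months. It also assumes that multi-temporal pairs other than spring + summer are still usable, which is a case the text does not address. The scene IDs and dates are hypothetical. The AD check simply requires every scene in the district to share the same observation dates.

```python
from datetime import date
from typing import Dict, List

SPRING = {3, 4, 5}   # March-May, per the text
SUMMER = {6, 7, 8}   # assumption: the text does not define summer months
FALL = {9, 10}       # September-October, per the text


def check_analysis_district(scene_dates: Dict[str, List[date]]) -> List[date]:
    """Return the observation dates shared by every scene in an AD.

    Raises ValueError if the scenes were not observed on the same date(s).
    """
    date_sets = {frozenset(dates) for dates in scene_dates.values()}
    if len(date_sets) != 1:
        raise ValueError("scenes in an Analysis District must share observation dates")
    return sorted(next(iter(date_sets)))


def assess_dates(dates: List[date]) -> str:
    """Rate an AD's observation dates for summer-crop estimation."""
    months = {d.month for d in set(dates)}
    if len(set(dates)) >= 2:
        if months & SPRING and months & SUMMER:
            return "preferred"  # spring + summer: maximum crop cover separation
        return "usable"         # assumption: other pairs are not addressed by the text
    if not dates:
        return "no data"
    month = dates[0].month
    if month in SUMMER:
        return "usable"         # unitemporal: a mid-summer date is preferred
    if month in SPRING or month in FALL:
        return "avoid"          # bare soil (spring) or senesced crops (fall)
    return "unassessed"         # month outside the windows the text discusses


# Hypothetical scene IDs and dates.
ad = {"scene_A": [date(2004, 4, 20), date(2004, 7, 9)],
      "scene_B": [date(2004, 7, 9), date(2004, 4, 20)]}
shared = check_analysis_district(ad)
print(assess_dates(shared))  # preferred
```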
Several factors can lead to problems in a classification; some are corrected in early edits and some are not. They include poor imagery dates with respect to the major crops of interest; incomplete or incorrect ground truth; and irrigation ditches, wooded areas, low spots filled with water, and/or bare soil areas within an otherwise vegetated field. Other problems arise from crops that look alike to the clustering algorithm(s) because of their planting/growing cycles: spring wheat and barley at almost any time, crops in senescence, and grassy waste fields and idle cropland. Still others come from cover types that are essentially the same but are used differently: wooded pasture versus woods or waste fields (the only difference may be the presence of livestock), corn for grain versus corn silage, and cover crops such as rye and oats. Finally, some cover types change signatures back and forth during the growing season, such as alfalfa and other hays before and after cutting, with multiple cuttings per year.
Each categorized scene is co-registered to MDA Federal Inc.'s GeoCover LC imagery (50 meters RMS) and then stitched together using Peditor's Batch program. A block correlation is run between band two of each raw scene and band two of the ortho-base image. The registration between the GeoCover mosaicked scene and the individual raw input scenes provides an approximate correspondence. A correlation procedure is then applied to the raw scenes and the mosaicked scene to get an exact mapping of each pixel from the input scenes to the mosaicked scene. The results of the correlation are used to remap the pixels of the individual input scenes into the coordinate system of the mosaicked scene. The mosaic process now: 1) automatically performs precision registration of the images, 2) automatically converts each categorized image and its associated statistics file to a set standard (recode), 3) specifies overlap priority by scene or county, and 4) filters out clouds when possible.
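The block correlation step described above can be sketched as follows. A window from band two of the raw scene is slid over band two of the base image around its approximate position, and the shift with the highest correlation is kept. The text does not name the correlation measure that Peditor's Batch program uses. This minimal sketch assumes zero-mean normalized cross-correlation, integer-pixel shifts only, and single-band images stored as nested lists. The random base image and the shifted "raw" scene are synthetic test data.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]  # one band (e.g. band two), indexed [row][col]


def block(img: Image, row: int, col: int, size: int) -> List[float]:
    """Flatten the size x size window whose top-left corner is (row, col)."""
    return [img[row + i][col + j] for i in range(size) for j in range(size)]


def ncc(a: List[float], b: List[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length pixel vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0


def best_offset(raw: Image, base: Image, row: int, col: int,
                size: int, search: int) -> Tuple[int, int]:
    """Find the (d_row, d_col) shift that best aligns a raw block with the base.

    (row, col) is the block's approximate position from the coarse registration;
    shifts of up to +/- `search` pixels around it are tested.
    """
    template = block(raw, row, col, size)
    best_score, best = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            if r < 0 or c < 0 or r + size > len(base) or c + size > len(base[0]):
                continue  # window would fall outside the base image
            score = ncc(template, block(base, r, c, size))
            if score > best_score:
                best_score, best = score, (dr, dc)
    return best


# Synthetic data: the "raw" scene is the base offset by 1 row and 2 columns.
rng = random.Random(42)
base = [[rng.uniform(0, 255) for _ in range(24)] for _ in range(24)]
raw = [[base[i + 1][j + 2] for j in range(20)] for i in range(20)]
print(best_offset(raw, base, row=5, col=5, size=8, search=3))  # (1, 2)
```

Repeating this over many blocks would yield the tie points used to remap each input pixel. That remapping step is not shown here.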
The scenes are stitched together using the priorities previously assigned from the scene observation dates/analysis districts map. Scenes/analysis districts with better-quality observation dates are assigned a higher priority when the images are stitched together. Clouds are assigned a null value on all scenes, so a cloud-free scene of lower priority takes precedence over a clouded scene of higher priority. Once cloud cover has been established throughout the mosaic, the clouds are assigned a digital value.
The Cropland Data Layer products contain imagery in GeoTIFF image file format.
All CDL distribution for the previous crop year is held until the release of the official NASS county estimates for the major commodities grown within a given state. County estimates for the previous crop year are released on the following schedule:
- corn and soybeans for the Midwestern states in March;
- rice and cotton for the Delta states in June;
- small grains for the Great Plains states in March.
NASS publishes all available accuracy statistics for end-user viewing. The 'Percent Correct' is calculated for each cover type in the ground truth. It shows what percentage of that cover type's ground truth pixels were correctly classified. The 'Commission Error' is the percentage of all pixels categorized to a specific cover type that were not of that cover type in the ground truth (i.e., incorrectly categorized). CAUTION: a quoted Percent Correct for a specific cover type is worthless unless accompanied by its respective Commission Error. For example, if you classify every pixel in a scene as 'wheat', you have a 100% correct wheat classifier. However, its Commission Error is also very high, approaching 100% when wheat covers only a small share of the scene. The 'Kappa Statistic' is an attempt to adjust the Percent Correct for agreement expected by chance, using information from the confusion matrix for that cover type.
The NASS CDL Program is continuing efforts to reduce end-user burden, increase functionality, and take advantage of enhancements in computer technology.
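The accuracy measures described above can be computed from a confusion matrix, as in the following minimal sketch. It assumes that rows are ground truth and columns are classified categories. It computes the overall Cohen's kappa; the exact per-cover-type kappa formula NASS reports is not given in this text, so that variant is not reproduced. The matrices are invented. The first one reproduces the "classify everything as wheat" warning: Percent Correct is 100%, but Commission Error is high and kappa is 0.

```python
import math
from typing import List

Matrix = List[List[int]]  # assumption: rows = ground truth, columns = classified


def percent_correct(cm: Matrix, k: int) -> float:
    """Share of cover type k's ground truth pixels that were classified as k."""
    row_total = sum(cm[k])
    return 100.0 * cm[k][k] / row_total if row_total else 0.0


def commission_error(cm: Matrix, k: int) -> float:
    """Share of pixels classified as k that are not k in the ground truth."""
    col_total = sum(row[k] for row in cm)
    return 100.0 * (col_total - cm[k][k]) / col_total if col_total else 0.0


def kappa(cm: Matrix) -> float:
    """Overall Cohen's kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else math.nan


# Invented example: 10 wheat and 90 other ground truth pixels, all classified as wheat.
all_wheat = [[10, 0],
             [90, 0]]
print(percent_correct(all_wheat, 0), commission_error(all_wheat, 0), kappa(all_wheat))
# 100.0 90.0 0.0
```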
PUBLIC RELEASE: The USDA, NASS Cropland Data Layer is considered public domain and free to redistribute. The official website is <https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php>. The data are available for free download through CropScape <https://nassgeodata.gmu.edu/CropScape/> and the Geospatial Data Gateway <https://datagateway.nrcs.usda.gov/>. See the 'Ordering Instructions' section of this metadata file for detailed download instructions. Please note that in no case are farmer-reported data revealed or derivable from the public-use Cropland Data Layer.