isdparser is a parser for NOAA ISD/ISH files

The code was split out of rnoaa to focus on ISD parsing, which is fairly complicated. The package has minimal dependencies, so you can parse your ISD/ISH files without installing everything rnoaa depends on. rnoaa will use isdparser once it is on CRAN.

The ISD file format is documented at ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf

Package API:

  • isd_parse() - parse all lines in a file, with parallel option
  • isd_parse_line() - parse a single line - you choose which lines to parse and how to apply the function to your lines
  • isd_transform() - transform ISD data variables
  • isd_parse_csv() - parse csv format files

isd_parse_csv() parses NOAA ISD csv files, whereas isd_parse() and isd_parse_line() both handle compressed files where each row of data is a string that needs to be parsed.
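To illustrate what "a string that needs to be parsed" means, here is a minimal base R sketch of fixed-width parsing. The field positions are an assumption based on a reading of the format document linked above, parse_control() is a hypothetical helper that is not part of isdparser, and the demo line is assembled from the field values shown in the isd_parse_line() output later in this README. It is not the package's implementation.

```r
# Field boundaries for the start of an ISD record (the control section).
# ASSUMPTION: these positions follow the NOAA format document as I read it;
# isdparser's internals may differ.
control_fields <- data.frame(
  name  = c("total_chars", "usaf_station", "wban_station", "date",
            "time", "date_flag", "latitude", "longitude"),
  start = c(1, 5, 11, 16, 24, 28, 29, 35),
  end   = c(4, 10, 15, 23, 27, 28, 34, 41),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an isdparser function): cut one line into
# named character fields using the positions above.
parse_control <- function(line, fields = control_fields) {
  values <- substring(line, fields$start, fields$end)
  as.list(stats::setNames(values, fields$name))
}

# Synthetic record built from the field values printed further down.
demo_line <- paste0("0054", "024130", "99999", "20160101",
                    "0000", "4", "+60750", "+012767")

rec <- parse_control(demo_line)
str(rec)
```

Every field comes back as a character string, which matches the all-<chr> columns in the isd_parse() and isd_parse_line() output below.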

isd_parse_csv() is faster than isd_parse() because isd_parse() has to parse every line as a string, which takes time. Using isd_parse(parallel = TRUE) brings it closer to the speed of isd_parse_csv().

Install

Stable from CRAN

install.packages("isdparser")

Dev version

remotes::install_github("ropensci/isdparser")
library("isdparser")

isd_parse_csv: parse a CSV file

Using a csv file included in the package:

path <- system.file('extdata/00702699999.csv', package = "isdparser")
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name 
#>      <int> <dttm>               <int>    <dbl>     <dbl>     <dbl> <chr>
#>  1  7.03e8 2017-02-10 14:04:00      4        0         0      7026 WXPO…
#>  2  7.03e8 2017-02-10 14:14:00      4        0         0      7026 WXPO…
#>  3  7.03e8 2017-02-10 14:19:00      4        0         0      7026 WXPO…
#>  4  7.03e8 2017-02-10 14:24:00      4        0         0      7026 WXPO…
#>  5  7.03e8 2017-02-10 14:29:00      4        0         0      7026 WXPO…
#>  6  7.03e8 2017-02-10 14:34:00      4        0         0      7026 WXPO…
#>  7  7.03e8 2017-02-10 14:39:00      4        0         0      7026 WXPO…
#>  8  7.03e8 2017-02-10 14:44:00      4        0         0      7026 WXPO…
#>  9  7.03e8 2017-02-10 14:49:00      4        0         0      7026 WXPO…
#> 10  7.03e8 2017-02-10 14:54:00      4        0         0      7026 WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type <chr>,
#> #   call_sign <int>, quality_control <chr>, wnd <chr>, cig <chr>, vis <chr>,
#> #   tmp <chr>, dew <chr>, slp <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, automated_atmospheric_condition_code <chr>,
#> #   quality_automated_atmospheric_condition_code <chr>, coverage_code <chr>,
#> #   coverage_quality_code <chr>, base_height_dimension <chr>,
#> #   base_height_quality_code <chr>, cloud_type_code <chr>,
#> #   cloud_type_quality_code <chr>, connective_cloud_attribute <chr>,
#> #   vertical_datum_attribute <chr>, base_height_upper_range_attribute <chr>,
#> #   base_height_lower_range_attribute <chr>, coverage <chr>,
#> #   opaque_coverage <chr>, coverage_quality <chr>, lowest_cover <chr>,
#> #   lowest_cover_quality <chr>, low_cloud_genus <chr>,
#> #   low_cloud_genus_quality <chr>, lowest_cloud_base_height <chr>,
#> #   lowest_cloud_base_height_quality <chr>, mid_cloud_genus <chr>,
#> #   mid_cloud_genus_quality <chr>, high_cloud_genus <chr>,
#> #   high_cloud_genus_quality <chr>, altimeter_setting_rate <chr>,
#> #   altimeter_quality_code <chr>, station_pressure_rate <chr>,
#> #   station_pressure_quality_code <chr>, speed_rate <chr>, quality_code <chr>,
#> #   rem <chr>, eqd <chr>

Download a file first:

path <- file.path(tempdir(), "00702699999.csv")
x <- "https://www.ncei.noaa.gov/data/global-hourly/access/2017/00702699999.csv"
download.file(x, path)
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name 
#>      <int> <dttm>               <int>    <dbl>     <dbl>     <dbl> <chr>
#>  1  7.03e8 2017-02-10 14:04:00      4        0         0      7026 WXPO…
#>  2  7.03e8 2017-02-10 14:14:00      4        0         0      7026 WXPO…
#>  3  7.03e8 2017-02-10 14:19:00      4        0         0      7026 WXPO…
#>  4  7.03e8 2017-02-10 14:24:00      4        0         0      7026 WXPO…
#>  5  7.03e8 2017-02-10 14:29:00      4        0         0      7026 WXPO…
#>  6  7.03e8 2017-02-10 14:34:00      4        0         0      7026 WXPO…
#>  7  7.03e8 2017-02-10 14:39:00      4        0         0      7026 WXPO…
#>  8  7.03e8 2017-02-10 14:44:00      4        0         0      7026 WXPO…
#>  9  7.03e8 2017-02-10 14:49:00      4        0         0      7026 WXPO…
#> 10  7.03e8 2017-02-10 14:54:00      4        0         0      7026 WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type <chr>,
#> #   call_sign <int>, quality_control <chr>, wnd <chr>, cig <chr>, vis <chr>,
#> #   tmp <chr>, dew <chr>, slp <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, automated_atmospheric_condition_code <chr>,
#> #   quality_automated_atmospheric_condition_code <chr>, coverage_code <chr>,
#> #   coverage_quality_code <chr>, base_height_dimension <chr>,
#> #   base_height_quality_code <chr>, cloud_type_code <chr>,
#> #   cloud_type_quality_code <chr>, connective_cloud_attribute <chr>,
#> #   vertical_datum_attribute <chr>, base_height_upper_range_attribute <chr>,
#> #   base_height_lower_range_attribute <chr>, coverage <chr>,
#> #   opaque_coverage <chr>, coverage_quality <chr>, lowest_cover <chr>,
#> #   lowest_cover_quality <chr>, low_cloud_genus <chr>,
#> #   low_cloud_genus_quality <chr>, lowest_cloud_base_height <chr>,
#> #   lowest_cloud_base_height_quality <chr>, mid_cloud_genus <chr>,
#> #   mid_cloud_genus_quality <chr>, high_cloud_genus <chr>,
#> #   high_cloud_genus_quality <chr>, altimeter_setting_rate <chr>,
#> #   altimeter_quality_code <chr>, station_pressure_rate <chr>,
#> #   station_pressure_quality_code <chr>, speed_rate <chr>, quality_code <chr>,
#> #   rem <chr>, eqd <chr>

isd_parse_line: parse lines from an ASCII strings file

path <- system.file('extdata/024130-99999-2016.gz', package = "isdparser")
lns <- readLines(path, encoding = "latin1")
isd_parse_line(lns[1])
#> # A tibble: 1 x 38
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#>   <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>    <chr>    
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767  
#> # … with 30 more variables: type_code <chr>, elevation <chr>,
#> #   call_letter <chr>, quality <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>,
#> #   AW1_present_weather_observation_identifier <chr>,
#> #   AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>, REM_remarks <chr>,
#> #   REM_identifier <chr>, REM_length_quantity <chr>, REM_comment <chr>

Or return a list instead of a data frame:

head(
  isd_parse_line(lns[1], as_data_frame = FALSE)
)
#> $total_chars
#> [1] "0054"
#> 
#> $usaf_station
#> [1] "024130"
#> 
#> $wban_station
#> [1] "99999"
#> 
#> $date
#> [1] "20160101"
#> 
#> $time
#> [1] "0000"
#> 
#> $date_flag
#> [1] "4"

Optionally, leave the “Additional” and “Remarks” sections out of the parsed output.

isd_parse_line(lns[1], additional = FALSE)
#> # A tibble: 1 x 31
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#>   <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>    <chr>    
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767  
#> # … with 23 more variables: type_code <chr>, elevation <chr>,
#> #   call_letter <chr>, quality <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>
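Because isd_parse_line() works on one line at a time, you decide which lines to parse and how to apply the function. The following is a small sketch of one way to do that, using only the arguments shown above. The choice of the first five lines, the use of lapply(), and the rbind() step are illustrative assumptions, not a recommended workflow.

```r
# Runs only when isdparser is installed; otherwise it does nothing.
if (requireNamespace("isdparser", quietly = TRUE)) {
  path <- system.file("extdata/024130-99999-2016.gz", package = "isdparser")
  lns <- readLines(path, encoding = "latin1")

  # Choose which lines to parse: here, the first five records.
  wanted <- lns[1:5]

  # Choose how to apply the parser: here, a plain lapply().
  # additional = FALSE drops the "Additional" and "Remarks" sections,
  # so each result is expected to have the same set of columns.
  parsed <- lapply(wanted, isdparser::isd_parse_line, additional = FALSE)

  # Stack the one-row results into a single data frame.
  first_five <- do.call(rbind, parsed)
  print(dim(first_five))
}
```

With additional = TRUE the set of columns can differ between lines, so a simple rbind() may not be enough in that case.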

isd_parse: parse an ASCII strings file

Downloading a new file

path <- file.path(tempdir(), "007026-99999-2017.gz")
y <- "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2017/007026-99999-2017.gz"
download.file(y, path)
isd_parse(path)
#> # A tibble: 6,843 x 72
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>    <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>   
#>  1 0157        007026       99999        2017… 1404  4         +00000  
#>  2 0157        007026       99999        2017… 1414  4         +00000  
#>  3 0157        007026       99999        2017… 1419  4         +00000  
#>  4 0157        007026       99999        2017… 1424  4         +00000  
#>  5 0157        007026       99999        2017… 1429  4         +00000  
#>  6 0144        007026       99999        2017… 1434  4         +00000  
#>  7 0157        007026       99999        2017… 1439  4         +00000  
#>  8 0157        007026       99999        2017… 1444  4         +00000  
#>  9 0172        007026       99999        2017… 1449  4         +00000  
#> 10 0157        007026       99999        2017… 1454  4         +00000  
#> # … with 6,833 more rows, and 65 more variables: longitude <chr>,
#> #   type_code <chr>, elevation <chr>, call_letter <chr>, quality <chr>,
#> #   wind_direction <chr>, wind_direction_quality <chr>, wind_code <chr>,
#> #   wind_speed <chr>, wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, GF1_sky_condition <chr>, GF1_coverage <chr>,
#> #   GF1_opaque_coverage <chr>, GF1_coverage_quality <chr>,
#> #   GF1_lowest_cover <chr>, GF1_lowest_cover_quality <chr>,
#> #   GF1_low_cloud_genus <chr>, GF1_low_cloud_genus_quality <chr>,
#> #   GF1_lowest_cloud_base_height <chr>,
#> #   GF1_lowest_cloud_base_height_quality <chr>, GF1_mid_cloud_genus <chr>,
#> #   GF1_mid_cloud_genus_quality <chr>, GF1_high_cloud_genus <chr>,
#> #   GF1_high_cloud_genus_quality <chr>, MA1_atmospheric_pressure <chr>,
#> #   MA1_altimeter_setting_rate <chr>, MA1_altimeter_quality_code <chr>,
#> #   MA1_station_pressure_rate <chr>, MA1_station_pressure_quality_code <chr>,
#> #   REM_remarks <chr>, REM_identifier <chr>, REM_length_quantity <chr>,
#> #   REM_comment <chr>, OC1_wind_gust_observation_identifier <chr>,
#> #   OC1_speed_rate <chr>, OC1_quality_code <chr>,
#> #   GA1_sky_cover_layer_identifier <chr>, GA1_coverage_code <chr>,
#> #   GA1_coverage_quality_code <chr>, GA1_base_height_dimension <chr>,
#> #   GA1_base_height_quality_code <chr>, GA1_cloud_type_code <chr>,
#> #   GA1_cloud_type_quality_code <chr>, GE1_sky_condition <chr>,
#> #   GE1_connective_cloud_attribute <chr>, GE1_vertical_datum_attribute <chr>,
#> #   GE1_base_height_upper_range_attribute <chr>,
#> #   GE1_base_height_lower_range_attribute <chr>,
#> #   AW1_present_weather_observation_identifier <chr>,
#> #   AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>

Parallel

isd_parse(path, parallel = TRUE)
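As a rough idea of why a parallel option helps, each line can be parsed independently of the others, so the work splits cleanly across cores. The sketch below shows that pattern with the base parallel package. parse_one() is a hypothetical stand-in for a per-line parser, the demo lines are synthetic, and nothing here is claimed to match how isd_parse() implements parallel = TRUE.

```r
library(parallel)

# Hypothetical stand-in for a per-line parser (not an isdparser function):
# pulls two fixed-width fields out of a record.
parse_one <- function(line) {
  data.frame(
    usaf_station = substring(line, 5, 10),
    time         = substring(line, 24, 27),
    stringsAsFactors = FALSE
  )
}

# Four synthetic records that differ only in the time field.
demo_lines <- paste0("0054", "024130", "99999", "20160101",
                     c("0000", "0100", "0200", "0300"), "4")

# mclapply() forks worker processes, which Windows does not support,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parse the lines independently, then stack the one-row results.
parsed <- mclapply(demo_lines, parse_one, mc.cores = n_cores)
result <- do.call(rbind, parsed)
result
```

For a handful of lines the overhead of starting workers outweighs any gain; the approach only pays off on files with many records.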

Progress

Note: progress is not printed if parallel = TRUE.

isd_parse(path, progress = TRUE)
#>
#>   |========================================================================================| 100%
#> # A tibble: 2,601 × 42
#>    total_chars usaf_station wban_station       date  time date_flag latitude longitude type_code
#>          <dbl>        <chr>        <chr>     <date> <chr>     <chr>    <dbl>     <dbl>     <chr>
#> 1           54       024130        99999 2016-01-01  0000         4    60.75    12.767     FM-12
#> 2           54       024130        99999 2016-01-01  0100         4    60.75    12.767     FM-12
#> 3           54       024130        99999 2016-01-01  0200         4    60.75    12.767     FM-12
#> 4           54       024130        99999 2016-01-01  0300         4    60.75    12.767     FM-12
#> 5           54       024130        99999 2016-01-01  0400         4    60.75    12.767     FM-12
#> 6           39       024130        99999 2016-01-01  0500         4    60.75    12.767     FM-12
#> 7           54       024130        99999 2016-01-01  0600         4    60.75    12.767     FM-12
#> 8           39       024130        99999 2016-01-01  0700         4    60.75    12.767     FM-12
#> 9           54       024130        99999 2016-01-01  0800         4    60.75    12.767     FM-12
#> 10          54       024130        99999 2016-01-01  0900         4    60.75    12.767     FM-12
#> # ... with 2,591 more rows, and 33 more variables: elevation <dbl>, call_letter <chr>, quality <chr>,
#> #   wind_direction <dbl>, wind_direction_quality <chr>, wind_code <chr>, wind_speed <dbl>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>, ceiling_height_quality <chr>,
#> #   ceiling_height_determination <chr>, ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>, visibility_code_quality <chr>,
#> #   temperature <dbl>, temperature_quality <chr>, temperature_dewpoint <dbl>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <dbl>, air_pressure_quality <chr>,
#> #   AW1_present_weather_observation_identifier <chr>, AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>, N03_original_observation <chr>,
#> #   N03_original_value_text <chr>, N03_units_code <chr>, N03_parameter_code <chr>, REM_remarks <chr>,
#> #   REM_identifier <chr>, REM_length_quantity <chr>, REM_comment <chr>

Additional data

Optionally, leave the “Additional” and “Remarks” sections out of the parsed output.

isd_parse(path, additional = FALSE)
#> # A tibble: 6,843 x 31
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>    <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>   
#>  1 0157        007026       99999        2017… 1404  4         +00000  
#>  2 0157        007026       99999        2017… 1414  4         +00000  
#>  3 0157        007026       99999        2017… 1419  4         +00000  
#>  4 0157        007026       99999        2017… 1424  4         +00000  
#>  5 0157        007026       99999        2017… 1429  4         +00000  
#>  6 0144        007026       99999        2017… 1434  4         +00000  
#>  7 0157        007026       99999        2017… 1439  4         +00000  
#>  8 0157        007026       99999        2017… 1444  4         +00000  
#>  9 0172        007026       99999        2017… 1449  4         +00000  
#> 10 0157        007026       99999        2017… 1454  4         +00000  
#> # … with 6,833 more rows, and 24 more variables: longitude <chr>,
#> #   type_code <chr>, elevation <chr>, call_letter <chr>, quality <chr>,
#> #   wind_direction <chr>, wind_direction_quality <chr>, wind_code <chr>,
#> #   wind_speed <chr>, wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>
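Most of the output above is character data: latitude is "+60750" rather than 60.75, and date is "20160101" rather than a Date. isd_transform(), listed in the package API, is meant for converting such variables. The base R sketch below shows the kind of conversion involved for four columns. transform_sketch() is a hypothetical helper, and the divide-by-1000 scaling is inferred from the raw and numeric values shown in this README; it is not the package's code and covers far fewer variables.

```r
# One raw record, using the string values from the isd_parse_line() output.
raw <- data.frame(
  total_chars = "0054",
  date        = "20160101",
  latitude    = "+60750",
  longitude   = "+012767",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an isdparser function).
# ASSUMPTION: coordinates are stored as thousandths of a degree,
# inferred from "+60750" appearing as 60.75 elsewhere in this README.
transform_sketch <- function(x) {
  x$total_chars <- as.numeric(x$total_chars)
  x$date        <- as.Date(x$date, format = "%Y%m%d")
  x$latitude    <- as.numeric(x$latitude) / 1000
  x$longitude   <- as.numeric(x$longitude) / 1000
  x
}

clean <- transform_sketch(raw)
str(clean)
```

With the package installed, the corresponding step would presumably be a call to isd_transform() on the result of isd_parse(); check the function's help page for the exact behaviour.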