Decoding UKB Column Names and Values

Overview

Raw UKB phenotype data contains encoded column names and values that need to be converted before analysis.

| Source | Column names | Column values |
|---|---|---|
| extract_pheno() | participant.p31 | Raw integer codes; needs decode_values() |
| extract_batch() | p31, p53_i0 | Usually already decoded; decode_values() typically not needed |

Both outputs need decode_names() to convert field ID column names to human-readable snake_case.

Call order matters: when using extract_pheno() output, always run decode_values() before decode_names(), because value decoding relies on the numeric field ID still being present in the column name.
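
To see why the order matters, here is a minimal base R sketch of recovering a field ID from a column name. It is not the package's implementation: the helper field_id_from_name() and its regular expression are hypothetical and exist only for illustration. Once a column has been renamed to a label such as sex, there is no ID left to look up.

```r
# Hypothetical helper, for illustration only: recover the numeric UKB field ID
# from a column name such as "participant.p31" or "p53_i0".
field_id_from_name <- function(x) {
  hits <- regmatches(x, regexec("(^|\\.)p([0-9]+)(_.*)?$", x))
  # hits[[i]] is character(0) when the name does not look like a field ID
  vapply(hits, function(h) if (length(h) > 0) h[3] else NA_character_, character(1))
}

raw_names     <- c("participant.p31", "p53_i0")
decoded_names <- c("sex", "date_of_attending_assessment_centre_i0")

ids_before <- field_id_from_name(raw_names)      # IDs are still recoverable
ids_after  <- field_id_from_name(decoded_names)  # IDs are gone after renaming
print(ids_before)
print(ids_after)
```

With the raw names the sketch finds the IDs 31 and 53; with the decoded names it finds none, which is the situation decode_values() would face if decode_names() had run first.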


Step 1: Decode Values

decode_values() converts raw integer codes to human-readable labels for categorical fields that have UKB encoding mappings. Continuous, date, text, and already-decoded fields are left unchanged.

df <- decode_values(df)
#> ✔ Decoded 3 categorical columns; 2 non-categorical columns unchanged.

It requires two metadata files from the UKB Showcase. Download them once with:

fetch_metadata(dest_dir = "data/metadata")

Then point decode_values() to the same directory (default matches fetch_metadata()):

df <- decode_values(df, metadata_dir = "data/metadata")

What gets decoded

| Column | Raw value | Decoded value |
|---|---|---|
| p31 | 0 / 1 | "Female" / "Male" |
| p54 | 11012 | "Leeds" |
| p20116_i0 | 0 / 1 / 2 | "Never" / "Previous" / "Current" |

Codes absent from the encoding table (including UKB missing codes -1, -3, -7) are returned as NA.
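
The lookup behaviour described above can be sketched in base R. This is an illustration under stated assumptions, not the code inside decode_values(): the encoding table below is a toy built by hand from the p20116_i0 row of the table, whereas real mappings come from the UKB metadata files. Counting unmapped codes before decoding is a simple way to see how many values will become NA.

```r
# Toy encoding table for illustration; real mappings come from the UKB metadata files.
encoding <- data.frame(
  code    = c(0L, 1L, 2L),
  meaning = c("Never", "Previous", "Current"),
  stringsAsFactors = FALSE
)

raw <- c(0L, 2L, -3L, 1L, -7L)

# match() returns NA for codes that are not in the table, so indexing with the
# result yields NA at those positions.
decoded <- encoding$meaning[match(raw, encoding$code)]
print(decoded)

# Count the non-missing raw values that have no label in the table.
n_unmapped <- sum(!is.na(raw) & is.na(decoded))
print(n_unmapped)
```

If missing codes such as -3 carry meaning for your analysis, keep a copy of the raw column before decoding.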


Step 2: Decode Names

decode_names() renames columns from field ID format to snake_case labels using the approved UKB field dictionary available to your project.

df <- decode_names(df)
#> ✔ Renamed 5 columns.

Name conversion examples

| Raw name | Decoded name |
|---|---|
| participant.eid | eid |
| participant.p31 | sex |
| participant.p21022 | age_at_recruitment |
| participant.p53_i0 | date_of_attending_assessment_centre_i0 |
| p31 | sex |
| p53_i0 | date_of_attending_assessment_centre_i0 |

Both extract_pheno() format (participant.p31) and extract_batch() format (p31) are handled automatically.
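
As a rough picture of what such a conversion involves, the base R sketch below maps both name formats to snake_case labels. It is an assumption-laden illustration, not the decode_names() source: the dictionary is a three-entry toy, and to_snake() and rename_one() are hypothetical helpers.

```r
# Toy field dictionary for illustration: field ID -> UKB field title.
dictionary <- c(
  "31"    = "Sex",
  "21022" = "Age at recruitment",
  "53"    = "Date of attending assessment centre"
)

# Hypothetical helper: lower-case a title and collapse non-alphanumerics to "_".
to_snake <- function(title) {
  x <- gsub("[^a-z0-9]+", "_", tolower(title))
  gsub("^_+|_+$", "", x)
}

# Hypothetical helper: rename one column, accepting both name formats.
rename_one <- function(col) {
  col <- sub("^participant\\.", "", col)  # drop the "participant." prefix
  hit <- regmatches(col, regexec("^p([0-9]+)(.*)$", col))[[1]]
  # Leave the name alone if it is not a field ID or is not in the dictionary.
  if (length(hit) == 0 || !(hit[2] %in% names(dictionary))) return(col)
  paste0(to_snake(dictionary[[hit[2]]]), hit[3])  # keep suffixes such as _i0
}

raw_names <- c("participant.eid", "participant.p31", "participant.p21022", "p53_i0")
new_names <- vapply(raw_names, rename_one, character(1), USE.NAMES = FALSE)
print(new_names)
```

The sketch keeps the instance suffix (_i0) so that repeated measurements of the same field stay distinguishable after renaming.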

Long names

Some UKB field titles are verbose. Names longer than max_nchar characters (default: 60) are flagged with a warning. Lower the threshold to flag more names:

df <- decode_names(df, max_nchar = 30)
#> ! 1 column name longer than 30 characters - consider renaming manually:
#> • date_of_attending_assessment_centre_i0

Rename manually to something concise:

names(df)[names(df) == "date_of_attending_assessment_centre_i0"] <- "date_baseline"
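
When several names need shortening, a named vector keeps the mapping in one place. The base R sketch below runs on a toy data frame with made-up values; the short_names mapping is hypothetical, and the length check uses nchar() directly rather than anything from the package.

```r
# Toy data frame for illustration (values are made up).
df <- data.frame(
  eid = 1:2,
  sex = c("Female", "Male"),
  date_of_attending_assessment_centre_i0 = as.Date(c("2008-03-01", "2009-07-15"))
)

# Find names longer than a chosen threshold (30 here, as in the example above).
long_names <- names(df)[nchar(names(df)) > 30]
print(long_names)

# Hypothetical mapping from verbose names to short ones.
short_names <- c(date_of_attending_assessment_centre_i0 = "date_baseline")

# Rename only the columns that appear in the mapping.
to_rename <- names(df) %in% names(short_names)
names(df)[to_rename] <- unname(short_names[names(df)[to_rename]])
print(names(df))
```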
