| Title: | Explorer of Indonesian Population Pyramids from Harmonized and Non-Harmonized Census Data |
|---|---|
| Description: | Provides harmonized and non-harmonized population pyramid datasets from the Indonesian population censuses (1971–2020), along with tools for visualization and an interactive 'shiny'-based explorer application. Data are processed from IPUMS International (1971–2010) and the Population Census 2020 (BPS Indonesia). |
| Authors: | Ari Purwanto Sarwo Prasojo [aut, cre] (ORCID: <https://orcid.org/0000-0002-4862-5523>), Puguh Prasetyoputra [aut] (ORCID: <https://orcid.org/0000-0001-5494-7003>), Nur Fitri Mustika Ayu [aut] |
| Maintainer: | Ari Purwanto Sarwo Prasojo <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.2 |
| Built: | 2026-05-18 08:20:16 UTC |
| Source: | https://github.com/aripurwantosp/censuspyrid |
Create a line plot of population age profiles (5-year age groups) for a given province and year, with optional logarithmic scale. The plot is faceted by sex.
ageprof(data, log_scale = FALSE, color = "Fresh and bright")
data |
A data frame of population data for a specific province and year,
containing at least the variables:
|
log_scale |
Logical; whether to use a logarithmic scale for the Y-axis.
Default is |
color |
Character; the name of a Canva color palette available in
|
The function produces an age-profile line chart where:
X-axis: Age (5-year groups).
Y-axis: Population counts (in thousands by default).
Separate lines are drawn for males and females.
Users can choose logarithmic scaling of the Y-axis.
A ggplot2 object representing the age-profile plot, faceted by sex.
pyr_single(), load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()
## Not run:
# Example: age profile for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia
ageprof(data_idn)

# Example with log scale
ageprof(data_idn, log_scale = TRUE)
## End(Not run)
This function builds an area plot showing the proportion of population distributed across three broad age groups (young, working-age, old) over census years. The plot can be displayed separately by sex or combined.
area_trends(data, sex = 1, color = "Fresh and bright")
data |
A data frame containing population trends data for a specific
region over years. Must include variables |
sex |
Integer indicating which sex to include in the plot:
Default is 1 (all sexes). |
color |
Character string specifying the palette name from
|
The function aggregates population into three age groups:
0–14 years (Young)
15–64 years (Working age)
65+ years (Old)
It then calculates the proportion of each age group within each sex and year. The result is plotted as a stacked area chart, optionally faceted by sex.
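The aggregation described above can be sketched in base R. This is a minimal illustration on hypothetical toy data, not the package's implementation: the counts are invented, `age5` is assumed here to hold the lower bound of each five-year group, and sex is omitted for brevity.

```r
# Toy data: hypothetical counts; age5 is assumed to be the lower bound
# of each five-year group (not necessarily the package's coding).
toy <- data.frame(
  year = rep(c(2010, 2020), each = 4),
  age5 = rep(c(0, 10, 15, 65), times = 2),
  pop  = c(100, 100, 600, 200, 90, 90, 600, 220)
)

# Assign each five-year group to one of the three broad age groups.
toy$broad <- cut(
  toy$age5,
  breaks = c(-Inf, 14, 64, Inf),
  labels = c("Young (0-14)", "Working age (15-64)", "Old (65+)")
)

# Sum population by year and broad group, then convert to within-year shares.
agg <- aggregate(pop ~ year + broad, data = toy, FUN = sum)
agg$prop <- agg$pop / ave(agg$pop, agg$year, FUN = sum)
agg[order(agg$year, agg$broad), ]
```

Within each year the shares sum to 1, which is what allows them to be drawn as a stacked area chart.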
A ggplot2 object showing the population area trends.
pyr_trends(), load_pop_data(), pop_data_by_reg(), get_code_label()
## Not run:
# Example: area trends for Indonesia
data_idn <- load_pop_data(harmonized = TRUE, smoothing = 1) |>
  pop_data_by_reg(0) # Indonesia
area_trends(data_idn, sex = 1) # All sexes
area_trends(data_idn, sex = 2) # Male
area_trends(data_idn, sex = 3) # Female
area_trends(data_idn, sex = 4) # Male+Female
## End(Not run)
Launches censuspyrID Explorer, a Shiny application for visualizing harmonized and non-harmonized population pyramids from Indonesia’s population censuses (1971–2020).
censuspyrID_explorer(host = NULL, ...)
host |
Character string passed to |
... |
Additional arguments passed to |
The application provides interactive tools to explore demographic structures across provinces and census years. See the Help menu within the application for a navigation guide.
The function launches the Shiny application. It does not return a value.
## Not run:
censuspyrID_explorer()
## End(Not run)
Prepares population data for tabular display (e.g., in reports or Shiny apps). The function reshapes the data by sex, adds total population, and computes the sex ratio, while also attaching province names and labels.
data_for_table(data, reg_code, harmonized = TRUE)
data |
A data frame containing population data for a specific province
and year. Must include columns: |
reg_code |
Integer or character. Province code used to retrieve the province name. |
harmonized |
Logical. If |
The function performs the following steps:
Adds the province name using reg_name().
Relabels sex and age5 using reference tables in ref_label.
Reshapes data into wide format with separate columns for Male and Female.
Adds a Male+Female total population column.
Computes the sex ratio (Male/Female * 100).
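The reshaping and ratio steps can be illustrated with a short base-R sketch. The data, the "Male"/"Female" labels, and the use of `reshape()` are assumptions for illustration only; the package may use different tools internally.

```r
# Toy long-format data; labels and counts are hypothetical.
long <- data.frame(
  age5 = rep(c("0-4", "5-9"), each = 2),
  sex  = rep(c("Male", "Female"), times = 2),
  pop  = c(105, 100, 98, 100)
)

# Reshape to wide format: one row per age group, one column per sex.
wide <- reshape(long, idvar = "age5", timevar = "sex", direction = "wide")
names(wide) <- sub("^pop\\.", "", names(wide))

# Add the total and the sex ratio (males per 100 females).
wide[["Male+Female"]] <- wide$Male + wide$Female
wide$sex_ratio <- wide$Male / wide$Female * 100
wide
```

A sex ratio above 100 means more males than females in that age group.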
A data frame in wide format with columns:
province_id — province identifier
province — province name
year — census year
age5 — five-year age group label
Male — male population
Female — female population
Male+Female — total population
sex_ratio — ratio of males to females (per 100 females)
load_pop_data(), pop_data_by_year(), get_code_label()
## Not run:
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia
tab <- data_for_table(data_idn, reg_code = 0, harmonized = TRUE)
head(tab)
## End(Not run)
This function returns reference tables for codes and labels used in the package. It can provide mappings for census years, sex, age groups, and province codes (harmonized or non-harmonized).
get_code_label(what = 4)
what |
Integer indicating which reference table to return:
|
The function retrieves data from the internal reference object
ref_label, which stores standardized coding schemes and
their associated labels.
A data frame (or tibble) containing codes and labels for the selected reference category.
# Get harmonized province codes and labels
get_code_label(4)

# Get sex codes and labels
get_code_label(2)
This function checks whether a given province code corresponds to a province that has been expanded (i.e., administratively split or modified).
is_expanded(reg_code)
reg_code |
Non-harmonized province code (character or numeric). |
The function looks up the internal dataset prov_coverage.
Expansion status is determined by the field expanded.
A logical value:
TRUE if the province is marked as expanded,
FALSE otherwise.
# Example: check expansion status of a province
get_code_label(5) # returns the list of non-harmonized province codes
is_expanded(1400) # returns TRUE/FALSE for Riau province
Load census population data with options for harmonization and smoothing. Returns population counts by year, province, sex, and five-year age group, with raw or smoothed estimates depending on the selected method.
load_pop_data(harmonized = TRUE, smoothing = 1)
harmonized |
Logical. If |
smoothing |
Integer. Smoothing method applied to population counts:
|
Data are retrieved from internal census datasets:
hpop5: harmonized census data
ypop5: non-harmonized census data
Smoothing methods are applied to the population counts:
1: none (raw, default)
2: Arriaga method
3: Karup–King–Newton (KKN) method
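One way to picture how the smoothing code relates to the stored count columns (ns, arriaga, kkn) is the sketch below. The helper `pick_pop()` and the toy values are hypothetical and only illustrate the mapping; they are not the package's internal code.

```r
# Toy stand-in for a census table with the three count columns described
# for hpop5/ypop5; all values are hypothetical.
toy <- data.frame(
  year    = c(2020, 2020),
  age5    = c(1, 2),
  ns      = c(1000, 900),
  arriaga = c(980, 920),
  kkn     = c(985, 915)
)

# Hypothetical helper: map the smoothing code (1, 2, 3) to a count column
# and expose it under the common name `pop`.
pick_pop <- function(data, smoothing = 1) {
  cols <- c("ns", "arriaga", "kkn")
  stopifnot(smoothing %in% 1:3)
  out <- data[, setdiff(names(data), cols), drop = FALSE]
  out$pop <- data[[cols[smoothing]]]
  out
}

pick_pop(toy, smoothing = 2) # Arriaga-smoothed counts as `pop`
```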
A tibble with columns:
year: census year
province_id: province identifier (harmonized or non-harmonized)
sex: sex code
age5: five-year age group code
pop: population count (raw or smoothed)
pop_data_by_year(), pop_data_by_reg(), pop5
## Not run:
# Load harmonized, raw (unsmoothed) population data
load_pop_data(harmonized = TRUE, smoothing = 1)

# Load non-harmonized, Arriaga-smoothed population data
load_pop_data(harmonized = FALSE, smoothing = 2)
## End(Not run)
Filter population data based on a specified province ID. This function is
intended for use with population datasets loaded via load_pop_data(),
but can work with any data frame that includes a province_id column.
pop_data_by_reg(data, reg)
data |
A data frame or tibble containing population data.
Must include a column named |
reg |
Integer or character. The province ID to filter by. |
A tibble (or data frame) containing only rows for the specified province.
load_pop_data(), pop_data_by_year()
# Load harmonized data
dat <- load_pop_data(harmonized = TRUE, smoothing = 1)

# Filter data for province ID 0 (Indonesia)
pop_data_by_reg(dat, reg = 0)
Filter population data for a specific census year. This function is intended
for use with population datasets loaded via load_pop_data(), but can work
with any data frame that contains a year column.
pop_data_by_year(data, yr)
data |
A data frame or tibble containing population data.
Must include a column named |
yr |
Integer or numeric. The census year to filter by. |
A tibble (or data frame) containing only rows from the specified year.
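Because the filter only needs a year column, its effect on an arbitrary data frame can be sketched in base R. The data frame below is hypothetical, and plain subsetting is used as a stand-in for whatever the package does internally.

```r
# Hypothetical data frame; only the `year` column is required for the
# year filter, and `province_id` for the analogous province filter.
toy <- data.frame(
  year        = c(2000, 2000, 2010, 2020),
  province_id = c(0, 11, 0, 0),
  pop         = c(10, 20, 30, 40)
)

# Base-R equivalent of keeping a single census year.
rows_2000 <- toy[toy$year == 2000, , drop = FALSE]

# Chaining a province filter on top narrows the result further.
rows_2000_idn <- rows_2000[rows_2000$province_id == 0, , drop = FALSE]
rows_2000_idn
```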
load_pop_data(), pop_data_by_reg()
# Load harmonized data first
dat <- load_pop_data(harmonized = TRUE, smoothing = 1)

# Filter for the 2000 census year
pop_data_by_year(dat, 2000)
Generate and print a formatted summary of population counts, percentages, sex ratio, and dependency ratios from a given dataset of population data for a specific province and year.
pop_summary(data)
data |
A data frame of population data for a specific province and year,
containing at least the variables:
|
The function calculates:
Total population
Male and female population counts and percentages
Age group distribution: 0–14, 15–64, and 65+ (counts and percentages)
Sex ratio (males per 100 females)
Dependency ratios (0–14, 65+, and total dependency ratio relative to 15–64)
Results are printed directly to the console in a formatted table.
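The indicators listed above follow standard demographic definitions and can be reproduced by hand. The sketch below uses hypothetical counts and assumes the dependency ratios are expressed per 100 persons aged 15–64, a common convention that the function's output may or may not follow.

```r
# Hypothetical counts by sex and broad age group.
toy <- data.frame(
  sex   = rep(c("Male", "Female"), each = 3),
  group = rep(c("0-14", "15-64", "65+"), times = 2),
  pop   = c(130, 340, 30, 120, 340, 40)
)

total  <- sum(toy$pop)
by_sex <- tapply(toy$pop, toy$sex, sum)
by_age <- tapply(toy$pop, toy$group, sum)

# Percentage distribution across the three age groups.
pct_by_age <- by_age / total * 100

# Sex ratio: males per 100 females.
sex_ratio <- by_sex[["Male"]] / by_sex[["Female"]] * 100

# Dependency ratios per 100 persons aged 15-64 (assumed scaling).
young_dr <- by_age[["0-14"]] / by_age[["15-64"]] * 100
old_dr   <- by_age[["65+"]] / by_age[["15-64"]] * 100
total_dr <- young_dr + old_dr

round(c(sex_ratio = sex_ratio, young = young_dr, old = old_dr, total = total_dr), 1)
```

The total dependency ratio is simply the sum of the young and old ratios, since both share the same denominator.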
This function does not return an object. It prints formatted summary statistics to the console.
load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()
## Not run:
# Example: population summary for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia
pop_summary(data_idn)
## End(Not run)
Population counts in 5-year age groups at the provincial level (subnational level 1), derived from a series of Indonesian population censuses. Data are available in two versions:
hpop5 — Harmonized province codes across census years.
ypop5 — Original (non-harmonized) province codes as reported in each census.
Both datasets are processed from census samples provided by IPUMS International (1971–2010) and the Population Census 2020. Data processing steps include prorating to allocate missing attributes and smoothing using multiple demographic methods (Arriaga and Karup–King–Newton).
Each dataset is a tibble (data frame) with the following variables:
year: Census year.
province_id_h: Harmonized province identifier (in hpop5).
province_id_y: Non-harmonized province identifier (in ypop5).
sex: Sex code.
age5: Age group in 5-year intervals.
ns: Unsmoothed population count.
arriaga: Population count smoothed with the Arriaga method.
kkn: Population count smoothed with the Karup–King–Newton method.
hpop5: 5,500 observations.
ypop5: 6,146 observations.
Ruggles, S., Cleveland, L., Lovaton, R., Sarkar, S., Sobek, M., Burk, D., Ehrlich, D., Heimann, Q., Lee, J., & Merrill, N. (2025). Integrated Public Use Microdata Series, International: Version 7.7 (dataset). Minneapolis, MN: IPUMS. doi:10.18128/D020.V7.7
Badan Pusat Statistik (BPS). (2020). Jumlah Penduduk Menurut Wilayah, Kelompok Umur, dan Jenis Kelamin, di INDONESIA – Sensus Penduduk 2020. Retrieved September 4, 2025, from https://sensus.bps.go.id/topik/tabular/sp2020/3
Siegel, J. S., Swanson, D. A., & Shryock, H. S. (Eds.). (2004). The methods and materials of demography (2nd ed.). Elsevier/Academic Press.
Aburto, J. M., Kashnitsky, I., Pascariu, M., & Riffe, T. (2022). Smoothing with DemoTools. Available at: https://timriffe.github.io/DemoTools/articles/smoothing_with_demotools.html#references-1
library(dplyr)

# Harmonized data
data(hpop5)
glimpse(hpop5)
head(hpop5)

# Non-harmonized data
data(ypop5)
glimpse(ypop5)
head(ypop5)
Create a population pyramid for a given dataset (specific province and year), either in absolute counts or in proportions, with customizable color palettes.
pyr_single(data, use_prop = FALSE, color = "Fresh and bright")
data |
A data frame containing population data for a specific province
and year. Must include variables |
use_prop |
Logical, default |
color |
Character string indicating the color palette name to use for
the pyramid. Available palettes come from
ggthemes::canva_palettes, e.g., |
A ggplot object representing the population pyramid.
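Population pyramids are commonly drawn by plotting one sex with negative values so the bars mirror each other around zero. The sketch below shows that data preparation step on hypothetical counts. The helper `pyramid_values()` is invented for illustration, and computing proportions relative to the total of both sexes is an assumption, not a documented behavior of pyr_single().

```r
# Hypothetical counts for three age groups.
toy <- data.frame(
  age5 = rep(c("0-4", "5-9", "10-14"), times = 2),
  sex  = rep(c("Male", "Female"), each = 3),
  pop  = c(120, 110, 100, 115, 105, 95)
)

# Hypothetical helper: optionally convert counts to shares of the total,
# then flip male values to negative so the two sexes mirror each other.
pyramid_values <- function(data, use_prop = FALSE) {
  value <- if (use_prop) data$pop / sum(data$pop) else data$pop
  data$value <- ifelse(data$sex == "Male", -value, value)
  data
}

res <- pyramid_values(toy, use_prop = TRUE)
res
```

The resulting `value` column can be mapped to bar length in any plotting system, with axis labels showing absolute values.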
ageprof(), pyr_trends(), load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()
## Not run:
# Example data for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia

# Absolute count pyramid
pyr_single(data_idn)

# Proportional pyramid with different palette
pyr_single(data_idn, use_prop = TRUE, color = "Professional and modern")
## End(Not run)
Create trend plots of population pyramids over multiple census years for a given region. Users can choose between a grid layout of pyramids or an overlay of age profiles across years.
pyr_trends(data, mode = 1, use_prop = FALSE, color = "Fresh and bright")
data |
A data frame of population data for a specific region across
census years, containing at least:
|
mode |
Integer; visualization mode: |
use_prop |
Logical; whether to show proportions instead of absolute
counts. Default is |
color |
Character; the name of a Canva color palette available in
|
Two visualization modes are available:
mode = 1: Grid of population pyramids (faceted by year).
mode = 2: Overlaid age profiles with separate lines by year.
Population counts can be displayed either as absolute numbers (default, in
thousands) or as proportions (use_prop = TRUE).
A ggplot2 object representing the population pyramid trend plot.
area_trends(), load_pop_data(), pop_data_by_reg(), get_code_label()
## Not run:
# Example: pyramid trends for Indonesia
data_idn <- load_pop_data(harmonized = TRUE, smoothing = 1) |>
  pop_data_by_reg(0) # Indonesia
pyr_trends(data_idn, mode = 1) # grid layout

# Overlay mode with proportions
pyr_trends(data_idn, mode = 2, use_prop = TRUE)
## End(Not run)
This function determines the range of census years available for a given province. Coverage depends on whether harmonized or non-harmonized codes are used, and in the case of non-harmonized data, whether the province has experienced administrative expansion (pemekaran).
year_range(reg_code = NULL, harmonized = TRUE, before_expand = TRUE)
reg_code |
Character or numeric. Province code. Required if
|
harmonized |
Logical. If |
before_expand |
Logical. Only relevant if |
For harmonized data (harmonized = TRUE), the full coverage
of 1971–2020 is returned.
For non-harmonized data (harmonized = FALSE), coverage is
determined based on the internal dataset prov_coverage.
If the province has expanded, coverage depends on
before_expand.
Census year labels are retrieved from
ref_label$census_label.
An integer vector of census years, with labels as names.
is_expanded(), get_code_label()
## Not run:
# Harmonized coverage (1971–2020)
year_range(harmonized = TRUE)

# Non-harmonized coverage for a province (before expansion)
get_code_label(5) # returns the list of non-harmonized province codes
year_range(reg_code = 1400, harmonized = FALSE, before_expand = TRUE)

# Non-harmonized coverage for a province (after expansion)
year_range(reg_code = 1400, harmonized = FALSE, before_expand = FALSE)
## End(Not run)