GEDCOM files are often shared with other people, but they can contain detailed personal information about individuals. This is a privacy concern, especially for individuals who are still alive.
The `tidyged.utils` package contains functionality to detect these individuals and either remove them or remove their details.
The function for removing living individuals is `remove_living()`. To illustrate, we create a tidyged object containing a number of individuals: some living, some dead, and some whose status is ambiguous:
```r
library(tidyged)
library(tidyged.utils)

people <- gedcom(subm("Me")) |>
  add_indi(qn = "Living person") |>
  add_indi_fact("birth", date = date_calendar(1996)) |>
  add_indi(qn = "Confirmed dead person") |>
  add_indi_fact("death") |>
  add_indi(qn = "Reeeaally old person") |>
  add_indi_fact("birth", date = date_calendar(1796)) |>
  add_indi(qn = "Implicit dead person 1") |>
  add_indi_fact("occupation", descriptor = "Driver", date = date_calendar(1930), age = "50y") |>
  add_indi(qn = "Implicit dead person 2")
#> Added Unknown Individual: @I1@
#> Added Unknown Individual: @I2@
#> Added Unknown Individual: @I3@
#> Added Unknown Individual: @I4@
#> Added Unknown Individual: @I5@

idp2_xref <- find_indi_name(people, "Implicit dead person 2")

people <- people |>
  add_famg(husband = idp2_xref) |>
  add_famg_event("relationship", date = date_calendar(1856), husband_age = "20y")
#> Added Family Group: @F1@

describe_records(people, people$record)
#> [1] "Submitter @U1@, Me"
#> [2] "Individual @I1@, Living person, born 1996"
#> [3] "Individual @I2@, Confirmed dead person"
#> [4] "Individual @I3@, Reeeaally old person, born 1796"
#> [5] "Individual @I4@, Implicit dead person 1"
#> [6] "Individual @I5@, Implicit dead person 2"
#> [7] "Family @F1@, headed by Implicit dead person 2, and no children"
```
By default, the function removes data for individuals known to be living and also for individuals whose status is ambiguous. Only two individuals are left untouched. Confirmed dead person has a death event. Reeeaally old person was born in 1796, and the function assumes a maximum age of 120, so this person is presumed dead.
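The presumption rule can be sketched in base R. This is a hypothetical illustration, not the tidyged.utils implementation. `presumed_dead()` is a name invented here, and it works on a simplified table of birth years and death flags rather than on a tidyged object. In this sketch, an individual counts as dead if they have a death event or if their birth year is more than `max_age` years before `current_year`. Everyone else is kept as possibly living, including anyone with no birth year at all.

```r
# Hypothetical sketch, not the tidyged.utils code: TRUE means "presumed dead".
# A missing birth year never counts as evidence of death on its own.
presumed_dead <- function(birth_year, has_death, max_age = 120,
                          current_year = as.integer(format(Sys.Date(), "%Y"))) {
  old_enough <- !is.na(birth_year) & (current_year - birth_year) > max_age
  has_death | old_enough
}

# Simplified stand-in for the five individuals created above
indi <- data.frame(
  name       = c("Living person", "Confirmed dead person", "Reeeaally old person",
                 "Implicit dead person 1", "Implicit dead person 2"),
  birth_year = c(1996, NA, 1796, NA, NA),
  has_death  = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Default threshold: "Confirmed dead person" "Reeeaally old person"
indi$name[presumed_dead(indi$birth_year, indi$has_death, current_year = 2022)]

# max_age = 300 makes a 1796 birth look recent enough: "Confirmed dead person"
indi$name[presumed_dead(indi$birth_year, indi$has_death, max_age = 300,
                        current_year = 2022)]
```

The calls fix `current_year` at 2022, the year shown in the change dates further down, so that the results stay stable. The sketch keeps the same two individuals as `remove_living(people)` and, with `max_age = 300`, only Confirmed dead person.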
```r
remove_living(people) |>
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> Individual @I4@, Implicit dead person 1 cleansed
#> Individual @I5@, Implicit dead person 2 cleansed
#> [1] "Submitter @U1@, Me"
#> [2] "Individual @I1@, Unnamed individual"
#> [3] "Individual @I2@, Confirmed dead person"
#> [4] "Individual @I3@, Reeeaally old person, born 1796"
#> [5] "Individual @I4@, Unnamed individual"
#> [6] "Individual @I5@, Unnamed individual"
#> [7] "Family @F1@, headed by @I5@, and no children"
```
For illustration purposes, we can increase the maximum age threshold, which will make the function treat the old person as still living:
```r
remove_living(people, max_age = 300) |>
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> Individual @I3@, Reeeaally old person cleansed
#> Individual @I4@, Implicit dead person 1 cleansed
#> Individual @I5@, Implicit dead person 2 cleansed
#> [1] "Submitter @U1@, Me"
#> [2] "Individual @I1@, Unnamed individual"
#> [3] "Individual @I2@, Confirmed dead person"
#> [4] "Individual @I3@, Unnamed individual"
#> [5] "Individual @I4@, Unnamed individual"
#> [6] "Individual @I5@, Unnamed individual"
#> [7] "Family @F1@, headed by @I5@, and no children"
```
The `guess` parameter makes the function try to estimate the age of individuals whose date of birth is not given:
```r
remove_living(people, guess = TRUE) |>
  describe_records(people$record)
#> Individual @I1@, Living person cleansed
#> [1] "Submitter @U1@, Me"
#> [2] "Individual @I1@, Unnamed individual"
#> [3] "Individual @I2@, Confirmed dead person"
#> [4] "Individual @I3@, Reeeaally old person, born 1796"
#> [5] "Individual @I4@, Implicit dead person 1"
#> [6] "Individual @I5@, Implicit dead person 2"
#> [7] "Family @F1@, headed by Implicit dead person 2, and no children"
```
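With `guess = TRUE`, the function concludes that every individual apart from the first is dead. It looks for evidence in both individual facts and family group events. Implicit dead person 1 was recorded as aged 50 in 1930, and Implicit dead person 2 was aged 20 at a relationship event in 1856.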
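One way to picture what `guess = TRUE` adds is to derive an approximate birth year from any dated event that also records an age. The helper below is hypothetical and not part of tidyged.utils. It reads only the whole-years part of a GEDCOM age string such as `"50y"`. The real function may weigh other evidence or handle ages differently.

```r
# Hypothetical helper, not from tidyged.utils: estimated birth year is the
# event year minus the age in whole years. Ages with no years component
# (e.g. "3m") give NA.
guess_birth_year <- function(event_year, age) {
  has_years <- grepl("[0-9]+y", age)
  years <- rep(NA_real_, length(age))
  # Skip any leading qualifier such as ">" and keep the digits before "y"
  years[has_years] <- as.numeric(sub("^[^0-9]*([0-9]+)y.*$", "\\1", age[has_years]))
  event_year - years
}

# Occupation in 1930 at age 50; relationship event in 1856 with husband aged 20
born <- guess_birth_year(c(1930, 1856), c("50y", "20y"))
born                 # 1880 1836
(2022 - born) > 120  # TRUE TRUE: both presumed dead under max_age = 120
```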
The remaining parameters determine the action to take when living individuals are found. By default, the records are preserved, but all detail is removed, leaving only a change date and an explanatory note:
```r
remove_living(people, guess = TRUE) |>
  dplyr::filter(record == "@I1@")
#> Individual @I1@, Living person cleansed
#> # A tibble: 4 × 4
#>   level record tag   value
#>   <dbl> <chr>  <chr> <chr>
#> 1     0 @I1@   INDI  ""
#> 2     1 @I1@   NOTE  "Information on this individual has been redacted"
#> 3     1 @I1@   CHAN  ""
#> 4     2 @I1@   DATE  "25 JUN 2022"
```
The text of this note can be changed with the `explan_note` parameter. Setting this parameter to an empty string omits the note entirely:
```r
remove_living(people, guess = TRUE, explan_note = "") |>
  dplyr::filter(record == "@I1@")
#> Individual @I1@, Living person cleansed
#> # A tibble: 3 × 4
#>   level record tag   value
#>   <dbl> <chr>  <chr> <chr>
#> 1     0 @I1@   INDI  ""
#> 2     1 @I1@   CHAN  ""
#> 3     2 @I1@   DATE  "25 JUN 2022"
```
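To make the cleansed structure concrete, here is a base R sketch that rewrites one record in a plain data frame. The data frame uses the same `level`/`record`/`tag`/`value` columns shown above. `redact_record()` is a hypothetical name, and this is not how tidyged.utils is implemented. The sketch assumes that a record's rows are contiguous. It writes the change date as day, upper-case month abbreviation and year, matching the output above.

```r
# Hypothetical sketch, not the tidyged.utils implementation: keep only the
# level-0 row of `xref`, add an optional explanatory NOTE, then a CHAN/DATE
# change-date structure. Assumes the record's rows are contiguous.
redact_record <- function(ged, xref,
                          explan_note = "Information on this individual has been redacted",
                          today = Sys.Date()) {
  # Date such as "25 JUN 2022", built without the locale-dependent %b format
  chan_date <- paste(as.integer(format(today, "%d")),
                     toupper(month.abb[as.integer(format(today, "%m"))]),
                     format(today, "%Y"))
  rows   <- ged$record == xref
  header <- ged[rows & ged$level == 0, ]
  extra  <- data.frame(level = c(1, 1, 2), record = xref,
                       tag = c("NOTE", "CHAN", "DATE"),
                       value = c(explan_note, "", chan_date),
                       stringsAsFactors = FALSE)
  # An empty note means no NOTE line at all
  if (explan_note == "") extra <- extra[-1, ]
  # Splice the redacted rows back in where the original record was
  idx    <- which(rows)
  before <- ged[seq_len(min(idx) - 1), ]
  after  <- ged[setdiff(seq_len(nrow(ged)), seq_len(max(idx))), ]
  out <- rbind(before, header, extra, after)
  rownames(out) <- NULL
  out
}

ged <- data.frame(
  level  = c(0, 1, 1, 2, 0, 1, 1),
  record = c("@I1@", "@I1@", "@I1@", "@I1@", "@I2@", "@I2@", "@I2@"),
  tag    = c("INDI", "NAME", "BIRT", "DATE", "INDI", "NAME", "DEAT"),
  value  = c("", "Living person", "", "1996", "", "Confirmed dead person", "Y"),
  stringsAsFactors = FALSE
)

redact_record(ged, "@I1@", today = as.Date("2022-06-25"))
redact_record(ged, "@I1@", explan_note = "", today = as.Date("2022-06-25"))
```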
Alternatively, the record can be removed completely by setting `remove_record = TRUE`:
```r
living_removed <- remove_living(people, guess = TRUE, remove_record = TRUE)
#> Individual @I1@, Living person removed
```
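Removing a record outright could be sketched as below. `remove_record_rows()` is hypothetical. This sketch also drops pointer lines in other records that refer to the removed individual, such as a family's `HUSB` line. That choice is an assumption made here to avoid dangling cross-references, and the sketch does not handle any subordinate lines beneath such a pointer.

```r
# Hypothetical sketch: drop every row of the record `xref`, plus (an
# assumption) any row elsewhere whose value is a pointer to it.
remove_record_rows <- function(ged, xref) {
  keep <- ged$record != xref & ged$value != xref
  out  <- ged[keep, ]
  rownames(out) <- NULL
  out
}

ged <- data.frame(
  level  = c(0, 1, 1, 0, 1),
  record = c("@I1@", "@I1@", "@I1@", "@F1@", "@F1@"),
  tag    = c("INDI", "NAME", "FAMS", "FAM", "HUSB"),
  value  = c("", "Living person", "@F1@", "", "@I1@"),
  stringsAsFactors = FALSE
)

remove_record_rows(ged, "@I1@")  # only the @F1@ FAM header row remains
```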
Supporting records could also hold sensitive information. The `remove_supp_records` parameter removes these as well, and it does so by default.
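Deciding which supporting records to drop needs a rule. One plausible rule, sketched below as an assumption rather than the package's documented behaviour, is to remove supporting records (notes, sources, repositories, multimedia) that are referenced by the removed individuals and by nothing else, so that shared sources survive. `orphaned_supp_records()` is a hypothetical helper.

```r
# Hypothetical sketch: return xrefs of supporting records referenced by the
# individuals in `removed` and by no other record. The set of "supporting"
# record types is an assumption.
orphaned_supp_records <- function(ged, removed,
                                  supp_tags = c("NOTE", "SOUR", "REPO", "OBJE")) {
  # Record type of each xref, taken from its level-0 row
  heads <- ged[ged$level == 0, ]
  type  <- setNames(heads$tag, heads$record)
  # Pointer values look like @XREF@
  is_ptr <- grepl("^@[^@]+@$", ged$value)
  from_removed <- unique(ged$value[is_ptr & ged$record %in% removed])
  elsewhere    <- unique(ged$value[is_ptr & !(ged$record %in% removed)])
  cand <- setdiff(from_removed, elsewhere)
  # Keep only candidates whose record type counts as supporting
  cand[type[cand] %in% supp_tags]
}

ged <- data.frame(
  level  = c(0, 1, 1, 0, 1, 0, 0),
  record = c("@I1@", "@I1@", "@I1@", "@I2@", "@I2@", "@N1@", "@S1@"),
  tag    = c("INDI", "NOTE", "SOUR", "INDI", "SOUR", "NOTE", "SOUR"),
  value  = c("", "@N1@", "@S1@", "", "@S1@", "Private note", ""),
  stringsAsFactors = FALSE
)

orphaned_supp_records(ged, "@I1@")  # "@N1@"; @S1@ is still cited by @I2@
```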