Introduction
This article introduces records, which are how genealogical data is stored and organised in a GEDCOM file. There are several types of records, which are the subject of subsequent articles, but this article focuses on those aspects which they all (or almost all) have in common.
New records are created using the set of *Record() functions, e.g.
library(gedcomS7)
indi <- IndividualRecord()
Restrictions
Records are considered read-only when the @locked property is set to TRUE. If you attempt to pull a locked record from a GEDCOM object for editing, you will be presented with a warning:
indi@locked <- TRUE
ged <- push_record(new_gedcom(), indi)
#> New Individual record added with xref @I1@
indi <- pull_record(ged, "@I1@")
#> Warning in pull_record(ged, "@I1@"): The record is locked. Ensure you have the
#> record owner's permission before editing it and pushing it back to the GEDCOM
#> object.
There are two other properties that place restrictions on records: @confidential and @private. The exact interpretation of confidential and private is largely up to the author of the file, but they provide two independent mechanisms for excluding certain records on export.
Identifiers
Cross references
All GEDCOM records are given unique identifiers known as xrefs (cross-references) to allow other records to link to them. These are alphanumeric strings surrounded by ‘@’ symbols.
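As a rough illustration of this shape, the base R sketch below checks whether strings look like xrefs. The helper looks_like_xref() is hypothetical (it is not part of gedcomS7), and the permitted character set (uppercase letters, digits and underscores) is an assumption drawn from the GEDCOM 7 specification rather than from this package's own validation.

```r
# Hypothetical helper, not part of gedcomS7: TRUE for strings that have
# the general shape of an xref, i.e. "@" + one or more characters + "@".
# Assumption: the inner characters are uppercase letters, digits or "_".
looks_like_xref <- function(x) {
  grepl("^@[A-Z0-9_]+@$", x)
}

looks_like_xref(c("@I1@", "@F12@", "I1", "@I 1@"))
#> [1]  TRUE  TRUE FALSE FALSE
```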
Although the gedcomS7 package imports xref identifiers unchanged, some systems create their own xref identifiers on import, so you cannot assume they will survive a transfer between systems. However, they should always be internally consistent.
For this reason, xref identifiers are not supposed to be exposed to the typical user. However, this rule can only really be applied to GEDCOM software that has a point-and-click user interface, rather than software that works interactively at the R console (and the S7 package does not allow you to hide property values). If a shiny app is created, then xrefs will be hidden from the user.
Summarising and controlling xrefs
The gedcomS7 package creates xrefs automatically when creating and pushing new records. When you create a new record, it is given an xref identifying it as a standalone record that has not yet been pushed to a GEDCOM object:
new_person <- IndividualRecord()
new_person@xref
#> [1] "@GEDCOMS7_ORPHAN@"
This is a special xref which indicates to the code that this is a new record and not an existing one. It is important you do not change it.
If you then push the record to a GEDCOM object, it will be assigned a proper xref:
ged <- push_record(new_gedcom(), new_person)
#> New Individual record added with xref @I1@
The @records@prefixes property of a GEDCOM object is a named vector containing, for each of the 7 record types, an alphanumeric string (up to 6 characters long) which precedes the number in the xref given to new records. This vector must contain exactly one element per record type, with the specific names shown below.
We’ll import a different GEDCOM file which has some records in it:
ged_max <- read_gedcom("https://gedcom.io/testfiles/gedcom70/maximal70.ged")
ged_max@records@prefixes
#> SUBM INDI FAM SOUR REPO OBJE SNOTE
#> "U" "I" "F" "S" "R" "M" "N"
The order of the record types in this vector also dictates the order in which records appear in the exported file.
The @records@XREFS property gives a list of record xrefs in the GEDCOM object, split by record type:
ged_max@records@XREFS
#> $SUBM
#> [1] "@U1@" "@U2@"
#>
#> $INDI
#> [1] "@I1@" "@I2@" "@I3@" "@I4@"
#>
#> $FAM
#> [1] "@F1@" "@F2@"
#>
#> $SOUR
#> [1] "@S1@" "@S2@"
#>
#> $REPO
#> [1] "@R1@" "@R2@"
#>
#> $OBJE
#> [1] "@O1@" "@O2@"
#>
#> $SNOTE
#> [1] "@N1@" "@N2@"
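The next xref of each type is built from its prefix. Note that the next Multimedia xref is @M1@ rather than @O3@, because the existing @O1@ and @O2@ xrefs do not use the "M" prefix: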
ged_max@records@XREFS_NEXT
#> SUBM INDI FAM SOUR REPO OBJE SNOTE
#> "@U3@" "@I5@" "@F3@" "@S3@" "@R3@" "@M1@" "@N3@"
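To make the relationship between prefixes, existing xrefs and the next xref concrete, here is a minimal base R sketch. The function next_xref() is hypothetical and is not how gedcomS7 necessarily computes @records@XREFS_NEXT; it simply assumes the next number is one more than the highest number already used with that prefix, which is consistent with the output above.

```r
# Hypothetical helper, not the package's implementation: derive the next
# xref for a prefix from a vector of existing xrefs.
next_xref <- function(existing, prefix) {
  # Only xrefs of the form @<prefix><number>@ count towards the numbering
  pattern <- paste0("^@", prefix, "([0-9]+)@$")
  used <- as.integer(sub(pattern, "\\1", existing[grepl(pattern, existing)]))
  n <- if (length(used) == 0) 1L else max(used) + 1L
  paste0("@", prefix, n, "@")
}

next_xref(c("@I1@", "@I2@", "@I3@", "@I4@"), prefix = "I")
#> [1] "@I5@"

# Existing Multimedia xrefs use "O", but the prefix is "M"
next_xref(c("@O1@", "@O2@"), prefix = "M")
#> [1] "@M1@"
```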
Other identifiers
As well as cross-reference identifiers, which are internally defined, there are also a number of other identifiers that can be supplied to a record:
- User-defined identifiers (@user_ids)
- Globally unique identifiers (@unique_ids)
- Identifiers given by an external authority (@ext_ids)
The @user_ids property must be a vector of user reference numbers. For example, it may be a record number within the submitter’s automated or manual system, or a page and position number on a pedigree chart. It can optionally be a named vector, where the names describe what each reference number is. It’s usually a good idea to provide these names.
The @unique_ids must take the form of universally unique identifiers (UUIDs). These can be generated with uuid::UUIDgenerate(), e.g.
uuid::UUIDgenerate(n = 1)
#> [1] "519e2bea-251a-43e5-93e7-4ca45b8da610"
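If you receive identifiers from elsewhere, it can be useful to check they have the expected layout before using them. The following base R sketch uses a hypothetical helper, looks_like_uuid(), which is not part of gedcomS7 or the uuid package; it only checks the usual 8-4-4-4-12 hexadecimal layout and makes no claim about how gedcomS7 itself validates @unique_ids.

```r
# Hypothetical helper: TRUE for strings laid out as 8-4-4-4-12 hex digits.
looks_like_uuid <- function(x) {
  pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  grepl(pattern, x, ignore.case = TRUE)
}

looks_like_uuid(c("519e2bea-251a-43e5-93e7-4ca45b8da610", "not-a-uuid"))
#> [1]  TRUE FALSE
```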
The @ext_ids property must take the form of a named vector where the names are the URIs defining the identifiers. For example, to include a reference to an individual’s Find a Grave page, you would supply c("https://www.findagrave.com/memorial" = "1075"), which would be interpreted as https://www.findagrave.com/memorial/1075.
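The interpretation described above can be sketched in base R as follows. This is only an illustration of how a name (the URI) and a value (the identifier) combine; it does not call any gedcomS7 function, and the variable names are made up for the example.

```r
# Named vector: the name is the defining URI, the value is the identifier.
ext_ids <- c("https://www.findagrave.com/memorial" = "1075")

# Illustration only: join each URI and identifier with "/".
urls <- paste(names(ext_ids), ext_ids, sep = "/")
urls
#> [1] "https://www.findagrave.com/memorial/1075"
```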
Referencing other records
One of the most important aspects of a record is the provenance of the data within it. This can be provided via linking it with evidence (sources) and multimedia. It should be noted that all of these linkages can not only be provided at the record level, but also at more granular levels; for example, you can provide source citations for each personal name for an individual.
Source citations
Linkages to Source records (known as source citations) are among the most important aspects of a GEDCOM file. They are accessed via the @citations property, which takes a list of SourceCitation() objects. You can also provide a single object, or even a character vector of Source record xrefs, and it will be converted into a list of SourceCitation() objects.
SourceCitation() |>
str()
#> <gedcomS7::SourceCitation>
#> @ sour_xref : chr "@VOID@"
#> @ where : chr(0)
#> @ date : chr(0)
#> @ source_text: list()
#> @ fact_type : chr(0)
#> @ fact_phrase: chr(0)
#> @ role : chr(0)
#> @ role_phrase: chr(0)
#> @ certainty : chr(0)
#> @ media_links: list()
#> @ note_xrefs : chr(0)
#> @ notes : list()
#> @ GEDCOM : chr "0 SOUR @VOID@"
Without providing any information, you can see that the default xref is “@VOID@”. This is a special xref value which indicates there is no record to link to. In this case, all information should be provided in the object itself, particularly the @where property. This is just a default value; if there is a Source record, you should put its xref here.
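The two situations might look like the sketch below. It assumes that SourceCitation() accepts the property names shown in the str() output above (sour_xref and where) as named arguments; the xref "@S1@" and the descriptive text are made up for illustration.

```r
library(gedcomS7)

# No Source record to link to: keep the default "@VOID@" xref and
# describe the source in @where instead.
void_cit <- SourceCitation(where = "Baptism register of St Mary's, page 14")

# A Source record exists: supply its xref ("@S1@" is a made-up example).
linked_cit <- SourceCitation(sour_xref = "@S1@", where = "Page 14")

void_cit@sour_xref
linked_cit@sour_xref
```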
Multimedia links
Links to Multimedia records are accessed via the @media_links property. Similar to source citations, this can take a character vector of Multimedia record xrefs, a MediaLink() object, or a list of them.
MediaLink() |>
str()
#> <gedcomS7::MediaLink>
#> @ media_xref: chr "@VOID@"
#> @ title : chr(0)
#> @ top : int(0)
#> @ left : int(0)
#> @ height : int(0)
#> @ width : int(0)
#> @ GEDCOM : chr "0 OBJE @VOID@"
Again, a @VOID@ xref is given by default, and if this is retained, a @title should be provided (any title given will override the title given in the Multimedia record if one is linked to). The remaining properties allow you to specify a cropped region of an image.
Notes
All records (apart from Note records) allow you to attach as many free-text notes as you wish. If a note applies in many places, it is best to create a Note record, which can be referenced everywhere it is needed with @note_xrefs; otherwise, use the @notes property.
This property can take notes in a number of ways. The simplest way is via a character vector. Another way is via a Note() object, which also allows you to define some other properties of the note such as its language and media type.
indi <- IndividualRecord()
indi@notes <- "This is a single note"
indi@notes <- c("This is a note", "This is another note")
indi@notes <- Note(text = "This is a single note using a Note object",
                   media_type = "text/plain")
Alternatively, you can supply a list which can contain any number of character or Note elements:
indi@notes <- list(
  Note(text = "This is one of a number of <b>Note</b> objects. This one is HTML.",
       media_type = "text/html"),
  Note(text = "Esta es una nota",
       language = "es",
       media_type = "text/plain"),
  "This one is a character note"
)
Remember that for any property that can take multiple elements, assigning a new value overwrites the existing ones. To keep them, append the new values instead:
indi@notes <- append(
  indi@notes,
  list(
    "This is an appended note",
    Note("This is another appended note")
  )
)
indi@notes
#> [[1]]
#> Note: This is one of a number of <b>Note</b> objects. This
#> one is HTML.
#>
#> Language: <Undefined>
#> Format: text/html
#> Translations: 0
#> Citations: 0
#>
#> [[2]]
#> Note: Esta es una nota
#>
#> Language: es
#> Format: text/plain
#> Translations: 0
#> Citations: 0
#>
#> [[3]]
#> Note: This one is a character note
#>
#> Language: <Undefined>
#> Format: <Undefined>
#> Translations: 0
#> Citations: 0
#>
#> [[4]]
#> Note: This is an appended note
#>
#> Language: <Undefined>
#> Format: <Undefined>
#> Translations: 0
#> Citations: 0
#>
#> [[5]]
#> Note: This is another appended note
#>
#> Language: <Undefined>
#> Format: <Undefined>
#> Translations: 0
#> Citations: 0
Creation/modification dates
You have the option of recording when a record is created or changed. When you push a record to a GEDCOM object, it will record creation/change dates depending on the values of @add_creation_dates and @update_change_dates (these are FALSE by default):
ged <- new_gedcom()
ged@update_change_dates <- TRUE
ged@add_creation_dates <- TRUE
new_record <- IndividualRecord()
ged <- push_record(ged, new_record)
#> New Individual record added with xref @I1@
# Extract record with creation/change dates added
new_record <- pull_record(ged, "@I1@")
new_record@created
#> Created: 26 JAN 2025
new_record@updated
#> Changed: 26 JAN 2025
You can add a time and/or notes to these dates, but that’s probably overkill.