The GEDCOM Validator

What is the Gigatrees GEDCOM Validator?

The Gigatrees GEDCOM Validator (VGedX) is a free genealogy application that can be used to validate your GEDCOM file against the GEDCOM standard. The application supports, and is compliant with, GEDCOM versions 5.5 Rev. 1 (January 2, 1996), 5.5 Rev. 2 (January 10, 1996), 5.5.1 and 5.6, as well as various file encodings and character sets (ANSI, UTF-8, UTF-16, ASCII, Extended ASCII and UNICODE). The application will generate a full and informative validation report in HTML format.

If you are using Gigatrees, then this validator is unnecessary. Gigatrees performs complete validation using its expanded GEDCOM dictionary, which allows for many of the non-critical errors that may be pointed out by the more compliant, but less forgiving, Gigatrees GEDCOM Validator. The Gigatrees GEDCOM Validator is only needed if you are attempting to import your GEDCOM file into an application other than Gigatrees.

Screenshots

The following sample screenshots are taken from the configuration interface:

  • Main Screen
  • File Menu
  • Main Screen (filled in)
  • Options Menu
  • Translation Options
  • Translation Options (for Dutch)
  • Validation Options
  • Validation Options (with help)
  • Run Menu
  • View Menu
  • Help Menu

The following sample screenshots are taken from a sample report:

  • Sample Report
  • Sample Report (in Dutch)

The Gigatrees GEDCOM Validator is useful for genealogists and family historians who export GEDCOM files from their genealogy applications in order to import those files into an application from another vendor. It is an unfortunate reality that many applications do not follow the GEDCOM standard when exporting GEDCOM files, which increases the likelihood that an exported file will not import correctly into the other vendor's application. It is also an unfortunate reality that many applications do not inform their users when they discard non-standard GEDCOM fields because of an import error. When this happens, many users are completely unaware that their data has been corrupted or lost.

The Gigatrees GEDCOM Validator generates a report showing the line number, type of error, field name hierarchy and the GEDCOM data for every issue found. It also includes GEDCOM header information with the file name, encoding and character set, GEDCOM version and reported character set, and the exporting application's vendor, name and version.

The Gigatrees GEDCOM Validator provides several configurable options for ignoring certain types of alerts and warnings (GEDCOM errors cannot be ignored). International users can also translate their report into a number of languages [1].

The Gigatrees GEDCOM Validator runs under Microsoft Windows x64 only. It will not run natively under 32-bit Windows (Win32), macOS, Linux, Android, iOS, or any other operating system. That said, some users have run it successfully under macOS using a virtual Windows machine.

The Gigatrees GEDCOM Validator includes two executables: a configuration interface (CI) and a command line interface (CLI). The configuration interface is used to load, modify and save the configuration file and to launch the CLI. For advanced uses, the CLI can be run independently from the command line or from a batch file. The configuration interface includes detailed descriptions of each option, which can be viewed by clicking on the option name.

There is an administrative Dashboard that includes some statistics associated with your builds, along with the latest news, revision history and other useful features. You will need to sign in to access your personalized statistics. To sign in, you will need your email and application id. You can find the application id at the top of your build log or by double-clicking on gigatrees-cli.exe, which can be found in your installation folder. Your application id acts as an access code to the dashboard, so please keep it private; it cannot be easily changed.


Upgrading

The Gigatrees GEDCOM Validator does not require the previous version to be uninstalled. You may copy over the previous installation; however, doing so may overwrite previously modified configuration files, so please back up your configuration files first.


Installing

To install the Gigatrees GEDCOM Validator, download the latest version and extract the downloaded file into a folder of your choice. The extracted files will be placed in a gigatrees-validator subfolder. The application is a standalone program: it does not require separate installation files, nor does it modify the system registry. It should be installed only on a Windows x64 operating system.


Uninstalling

To uninstall the Gigatrees GEDCOM Validator, simply delete the installation folder.


Running

Start the program by double-clicking on vgedx.exe. On the main screen you can enter or browse to the names of your GEDCOM file and output path. Additional items can be configured using the Options menu. Once the GEDCOM file and output path have been set, save your configuration from the File menu and then launch the application from the Run menu (hotkeys: F2 F5). When the application is launched, it creates a batch file in the same folder as your executable and then runs that batch file in a separate command window. The batch file is given the same base name as your configuration file. If you have not yet created or saved a configuration file, your configuration is saved as undefined.xml in the same folder as your executable. Once the application finishes, the build report and the validation report can be accessed using the View menu.

First time users can load the sample configuration file (sample.xml) before launching the application to see it in action. The sample configuration will save your validation report to the installation's web folder by default.


Validation Tests

GEDCOM Validation, in general, requires that a GEDCOM file meet the specification requirements of both the GEDCOM Grammar (line and file syntax) and the GEDCOM Dictionary (record format, data types, data formats and data values). Data consistency is not part of a GEDCOM Validation. The GEDCOM Validator is only a partial validator in that it does not, in all cases, validate data formats or data values. It also does not have 100% coverage of the validation tests listed below.

The GEDCOM Validator will validate that files meet the GEDCOM 5.5 grammar specification and whichever GEDCOM dictionary is appropriate. The following dictionaries are currently supported:

  • GEDCOM 5.5 Rev. 1 (January 2, 1996)
  • GEDCOM 5.5 Rev. 2 (January 10, 1996)
  • GEDCOM 5.5.1
  • GEDCOM 5.6
The following sections provide technical details on how Gigatrees performs GEDCOM Validation.

Validation Legend:

Symbols

() parentheses  = grouped components
[] brackets     = optional components
*  asterisk     = multiple occurrences of a component
-  dash         = range of values of a component
|  pipe         = component or

Characters

Character               ASCII value
=========               ===========
tab                     = 0x09
line feed               = 0x0A
carriage return         = 0x0D
space                   = 0x20
exclamation point (!)   = 0x21
cross hatch (#)         = 0x23
colon (:)               = 0x3A
at sign (@)             = 0x40 (called "ampersand" throughout this document)
underscore (_)          = 0x5F

Character Sets

Character set           ASCII range
=============           ===========
number digit (0-9)      = (0x30 - 0x39)
alpha char (a-zA-Z_)    = (0x41 - 0x5A) | (0x61 - 0x7A) | 0x5F
non-alpha char          = (0x21 - 0x2F) | (0x3A - 0x3F) | (0x5B - 0x5E) | (0x7B - 0x7E) | (0x80 - 0xFE) | 0x60

Character Groups

alphanum                = (alpha char | number digit)	
printable character     = alphanum | non-alpha char | space | cross hatch

Strings

double-at string (@@)   = ampersand + ampersand
number string           = number digit + [number digit]*
alphanum string         = alphanum + [alphanum]*
pointer id              = (alphanum | exclamation point) + [printable character]*
pointer string          = ampersand + pointer id + ampersand
embedded id string      = ampersand + [pointer id +] exclamation point + pointer id + ampersand
escape string           = ampersand + cross hatch + (printable character | double-at string)* + ampersand + [space] + (printable character)* 
value string            = printable character + [printable character]*
data string             = (value string | escape string) [+ (value string | escape string)]*
delimiter               = space
terminator              = carriage return | line feed | (carriage return + line feed) | (line feed + carriage return)
whitespace              = ([tab]* + [space]* + [terminator]*)* 
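
A few of the legend entries above can be expressed as regular expressions. The following Python sketch is only an illustration of the legend, not the validator's own implementation; the names PRINTABLE, POINTER_STRING and is_pointer_string are assumptions made for this example.

```python
import re

# printable character, per the legend: 0x20-0x3F, 0x41-0x7E and 0x80-0xFE.
# Note that 0x40 (@) is deliberately excluded.
PRINTABLE = r"[\x20-\x3F\x41-\x7E\x80-\xFE]"
ALPHANUM = r"[0-9A-Za-z_]"

NUMBER_STRING = re.compile(r"[0-9]+")
ALPHANUM_STRING = re.compile(rf"{ALPHANUM}+")
# pointer string = @ + (alphanum | !) + [printable character]* + @
# This pattern does not distinguish plain pointers from embedded id strings.
POINTER_STRING = re.compile(rf"@(?:{ALPHANUM}|!){PRINTABLE}*@")
# terminator = CR | LF | CR+LF | LF+CR (two-character forms are tried first)
TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")


def is_pointer_string(text: str) -> bool:
    """Return True if the whole text is a pointer string as defined above."""
    return POINTER_STRING.fullmatch(text) is not None
```

For example, is_pointer_string("@I1@") is true, while "@@" and "@I@1@" are rejected because a pointer id needs at least one character and may not contain @.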

Validation Tests

GEDCOM Validation testing includes two types of tests: GEDCOM Grammar tests and GEDCOM Dictionary tests.

Grammar Line Syntax

All of the supported GEDCOM Dictionaries use the same GEDCOM 5.5 Grammar, which defines a line as having the following syntax:

line = [whitespace +] level + [delimiter + record_id +] delimiter + tag + [delimiter + reference_id +] terminator

or

line = [whitespace +] level + [delimiter + record_id +] delimiter + tag + [delimiter + line_value +] terminator
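
The line syntax above can be illustrated with a small parser. This is a minimal Python sketch under simplifying assumptions: the GedcomLine type and parse_line function are hypothetical names, the two line forms are merged into one optional value field, and none of the length or character checks listed below are applied.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    record_id: Optional[str]
    tag: str
    value: Optional[str]  # either a reference_id or a line_value


LINE = re.compile(
    r"\s*"                           # optional leading whitespace
    r"(?P<level>[0-9]+)"             # level
    r"(?: (?P<record_id>@[^@]+@))?"  # optional delimiter + record_id
    r" (?P<tag>[0-9A-Za-z_]+)"       # delimiter + tag
    r"(?: (?P<value>.*))?"           # optional delimiter + reference_id or line_value
)


def parse_line(text: str) -> Optional[GedcomLine]:
    """Split one GEDCOM line into its parts, or return None if it does not fit."""
    match = LINE.fullmatch(text.rstrip("\r\n"))  # drop the terminator first
    if match is None:
        return None
    return GedcomLine(int(match["level"]), match["record_id"],
                      match["tag"], match["value"])
```

With this sketch, parse_line("0 @I1@ INDI") yields level 0, record_id "@I1@" and tag "INDI", while a line that does not start with a level number yields None.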

Grammar Tests

The following is a list of requirements of the GEDCOM 5.5 Grammar. Unsupported tests will be noted. String lengths are measured in characters, not bytes.

  1. The level is a number string.
  2. Level numbers should not contain leading zeroes.
  3. The minimum level number is 0.
  4. The maximum level number is 99.
  5. The maximum level number increment is 1.
  6. The level must be followed by a delimiter.

  7. A record_id can be a pointer string or an embedded id string.
  8. The length of a record_id is between 3 and 22 characters.
  9. The record_id must be followed by a delimiter.
  10. The record_id must be unique to the file.

    For example:
    0 @I1@ INDI
      1 @!O1@ OBJE (I1 is implied)
      1 @I1!O1@ OBJE (duplicates not allowed)

    0 @I1@ INDI (duplicates not allowed)


  11. The tag is an alphanum string.
  12. The length of the tag is between 1 and 31 characters.
  13. The first 15 characters of the tag must be unique.

  14. A reference_id is a pointer string.
  15. The length of a reference_id is between 3 and 22 characters.
  16. The reference_id must be preceded by a delimiter.
  17. The reference_id must be followed by a terminator.
  18. The presence of a reference_id implies that the record_id exists in the file unless a colon is present.
  19. If the reference_id contains an exclamation point, the record_id must exist in an embedded record contained within the same logical record.

    For example:
    0 @I1@ INDI
      1 @I1!O1@ OBJE
      1 OBJE @I1!O1@
      1 OBJE @!O1@ (I1 is implied)

    0 @I2@ INDI
      1 OBJE @I1!O1@ (not allowed)


  20. A line_value is a data string.
  21. The line_value must be preceded by a delimiter.
  22. The line_value must be followed by a terminator.
  23. If an at sign (@) is desired as part of the line_value, it must be included as a double-at string (e.g. name@@school.edu).

  24. The maximum length of a line is 255 characters.
  25. The maximum length of a logical record is 32 kilobytes. (Logical records are delineated by level numbers equal to 0 [zero]). [NOT SUPPORTED]
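
Several of the grammar requirements above (level numbering, record_id length and uniqueness, tag length) can be checked in a single pass over the file. The following Python sketch is illustrative only and is not the validator's own code: the check_grammar function and its input shape are assumptions, the message texts are borrowed from the status lists later in this document, and embedded ids with an implied parent are not resolved.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input shape: (level text, record_id or None, tag) per line.
RawLine = Tuple[str, Optional[str], str]


def check_grammar(lines: Iterable[RawLine]) -> List[str]:
    """Apply a subset of the grammar tests and return the findings."""
    findings: List[str] = []
    seen_ids = set()
    previous_level = -1  # forces the first line to be level 0
    for number, (level_text, record_id, tag) in enumerate(lines, start=1):
        if not (level_text.isascii() and level_text.isdigit()):
            findings.append(f"line {number}: level number expected")
            continue
        if len(level_text) > 1 and level_text.startswith("0"):
            findings.append(f"line {number}: level has leading zero")
        level = int(level_text)
        if level > 99:
            findings.append(f"line {number}: level number exceeds limit")
        if level > previous_level + 1:  # the maximum increment is 1
            findings.append(f"line {number}: level number gap")
        previous_level = level
        if record_id is not None:
            if not 3 <= len(record_id) <= 22:
                findings.append(f"line {number}: invalid ID length")
            if record_id in seen_ids:  # record_id must be unique to the file
                findings.append(f"line {number}: duplicate record found")
            seen_ids.add(record_id)
        if not 1 <= len(tag) <= 31:
            findings.append(f"line {number}: invalid tag length")
    return findings
```

A level that jumps from 1 to 3, a level written as "01", and a second "@I1@" record would each produce one finding in this sketch.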

Dictionary Tests

To validate the dictionary, Gigatrees compares the structure of the logical records to the dictionary template associated with its GEDCOM version. It also validates general dictionary constructs common to all supported GEDCOM versions.

  1. The GEDCOM version must be either "5.5", "5.5.1" or "5.6".
  2. Each line should match the dictionary template unless the line has a user defined tag beginning with an underscore.
  3. Each record_id should be referenced from within the same file.
  4. If the template expects a record_id, then the line must have a record_id of the same type.
  5. If the template expects no record_id, then the line must not have a record_id.
  6. If the template expects a reference_id, then the line must have a reference_id of the same type.
  7. If the template expects no reference_id, then the line must not have a reference_id.
  8. If the template defines a minimum number of record occurrences, then the record should not have fewer.
  9. If the template defines a maximum number of record occurrences, then the record should not have more.
  10. If the template defines a minimum line_value length, then the line_value should not be shorter.
  11. If the template defines a maximum line_value length, then the line_value should not be longer.
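
The occurrence tests above (minimum and maximum counts, plus user defined tags) can be sketched as a comparison between a record's child tags and a template. This Python example is an assumption-laden illustration: the check_occurrences function and the template format are hypothetical, and the template values shown in the usage note are made up for the example rather than taken from any GEDCOM dictionary.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical template format: child tag -> (minimum, maximum); None = unbounded.
Template = Dict[str, Tuple[int, Optional[int]]]


def check_occurrences(parent_tag: str, child_tags: List[str],
                      template: Template) -> List[str]:
    """Compare the child tags of one record against its template."""
    findings: List[str] = []
    counts = Counter(child_tags)
    for tag, (minimum, maximum) in template.items():
        count = counts[tag]
        if count < minimum:
            findings.append(f"{parent_tag}: too few occurrences of {tag}")
        if maximum is not None and count > maximum:
            findings.append(f"{parent_tag}: too many occurrences of {tag}")
    for tag in counts:
        if tag not in template:
            # Tags beginning with an underscore are user defined.
            kind = "user defined" if tag.startswith("_") else "undefined"
            findings.append(f"{parent_tag}: {kind} record found: {tag}")
    return findings
```

Given an illustrative template {"NAME": (1, None), "SEX": (0, 1)}, a record with two SEX lines, no NAME line and a _UID line would be reported as having too few NAME occurrences, too many SEX occurrences, and one user defined record.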

Validation Statuses

Gigatrees, somewhat arbitrarily, divides its validation statuses into three categories: Errors, Warnings, and Alerts. Errors are critical line failures that will more than likely prevent the line from being usable by importing applications. Warnings violate the letter of the specification but are unlikely to interfere with the line's usability by importing applications. Alerts are not violations and are provided for informational purposes only. All warnings and alerts can be optionally ignored. Additional options are available.

Errors

  • Unsupported GEDCOM version detected
  • Level number expected
  • Level number gap
  • Invalid ID length
  • ID missing
  • Invalid ID reference length
  • Tag Expected
  • Data contains non-printable characters
  • ID reference missing
  • Unexpected ID reference
  • Invalid ID reference type
  • ID reference substitution
  • Duplicate record found
  • Referenced record not found

Warnings

  • Level number exceeds limit
  • Level has leading zero
  • ID delimiter missing
  • Invalid ID length
  • Invalid ID character
  • Invalid ID reference length
  • Invalid ID reference character
  • Invalid tag length
  • Invalid tag character
  • Too few occurrences of tag
  • Too many occurrences of tag
  • Data contains tabs
  • Maximum line length exceeded
  • Data missing
  • Insufficient data
  • Maximum data length exceeded
  • Data not expected
  • Trailing spaces not expected
  • Trailing data not expected
  • Unpaired ampersand (@)
  • Undefined record found
  • Record not referenced

Alerts

  • User defined record found

Footnotes:
 [1] The following languages are supported: Afrikaans, Amharic, Arabic, Belarusian, Bulgarian, Bengali, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, French, Irish, Scottish Gaelic, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Icelandic, Italian, Japanese, Javanese, Kannada, Korean, Lithuanian, Latvian, Malay, Norwegian, Dutch, Punjabi, Polish, Portuguese, Romanian, Russian, Slovakian, Slovenian, Serbian, Swedish, Swahili, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese and Chinese.