GEDCOM Files: Mac – PC

Note: This is a very long article. If you’re not exchanging GEDCOM files with other people frequently, or you don’t have any problems when you do, you should probably skip it. It grew out of a problem I had exchanging files with a relative. We worked through much of what is described below until we were confident we understood the problems on both sides and how to work around them.

This is a serious problem in the genealogy software world: some companies make it difficult, or at least don’t make it easy, to migrate or share information between platforms or genealogy programs. The idea is to keep you locked into their software.

If you’re having problems moving GEDCOM files between Macs and PCs (or even from Mac to Mac or PC to PC), there are a few things you can do:

Check whether you are exporting in ASCII, ANSEL, or Unicode (UTF-8 or UTF-16). Keep in mind that ANSEL is a superset of ASCII: it contains all of ASCII’s characters.

If you are importing from a PC, have the other person generate GEDCOMs in both ANSEL and Unicode, and try importing each of them. Pay attention to any warning messages you receive.
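If you want to check what a file actually contains rather than trusting the export dialog: a GEDCOM 5.5 file declares its character set on a `1 CHAR` line in the header (ANSEL, ASCII, or UNICODE; GEDCOM 5.5.1 adds UTF-8), and Unicode files often start with a byte-order mark (BOM). Below is a minimal Python sketch, assuming the file is small enough to read into memory as raw bytes; `detect_bom` and `declared_charset` are my own hypothetical helper names, not part of any genealogy program.

```python
import codecs
from typing import Optional

# Byte-order marks (BOMs) a Unicode GEDCOM may start with.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(raw: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None


def declared_charset(raw: bytes) -> Optional[str]:
    """Return the value of the header's '1 CHAR' line (e.g. 'ANSEL', 'UNICODE')."""
    bom = detect_bom(raw)
    if bom in ("UTF-16LE", "UTF-16BE"):
        text = raw.decode("utf-16")  # the utf-16 codec reads and drops the BOM
    else:
        if bom == "UTF-8":
            raw = raw[len(codecs.BOM_UTF8):]
        # ASCII, ANSEL and UTF-8 share the ASCII range used by tags, so a
        # byte-for-byte latin-1 decode is enough to read the header lines.
        text = raw.decode("latin-1")
    for line in text.splitlines():
        parts = line.split(None, 2)
        if not parts:
            continue
        if parts[0] == "0" and parts[1:2] != ["HEAD"]:
            break  # reached the first record after the header
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2].strip()
    return None
```

Read each export with `open(path, "rb").read()` and pass the bytes to both functions. If the BOM says one thing and the `CHAR` line says another (say, a UTF-16 BOM with `CHAR ANSEL`), that mismatch alone could explain garbled accented characters on the other side.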

It may not be a character set problem at all. Your program or theirs may be generating too much information (too many custom tags, etc.), or one of the programs may be writing data that doesn’t fit the proper fields, so that going back and forth cuts it off. The originating program may have no problem storing data that falls outside the GEDCOM standard, but when it exports that data into a GEDCOM file, it either truncates it, places it outside the normal GEDCOM structure, or creates a tag or field that the importing program ignores.
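One quick way to see whether custom tags are involved: the GEDCOM 5.5 standard requires user-defined tags to begin with an underscore, so counting the tags in an export and listing the underscore ones shows what the other program may not understand. A minimal Python sketch, assuming the file has already been decoded into lines of text; the function names are hypothetical, and `_MARNM` in the comment is just an example of the underscore form.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_inventory(lines: Iterable[str]) -> Counter:
    """Count how often each tag appears, ignoring levels, xref IDs and values."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        # An xref ID such as @I1@ may sit between the level number and the tag.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        counts[tag] += 1
    return counts


def custom_tags(counts: Counter) -> Dict[str, int]:
    """GEDCOM 5.5 user-defined tags begin with an underscore (e.g. _MARNM)."""
    return {tag: n for tag, n in counts.items() if tag.startswith("_")}
```

Run it on each side’s export and compare the two `custom_tags` results: a tag that shows up in one program’s output but that the other program has never heard of is a good candidate for the data that goes missing.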

It can be tedious to find where the problems are happening. Many genealogy applications store data they couldn’t import properly somewhere else, perhaps in some kind of research notes, or in a database that isn’t based on a text format like GEDCOM.

That doesn’t always help, though, other than to tell you that yes, there is a problem, and here is the data. It doesn’t say “next time, export in this format instead”. It can help if you determine there are only a few minor errors: you can then work with the other person on how they organize their data. If only one or two tags are causing problems, you might have them put the data in a different tag, or move that information into a note.

Your best bet is for each of you to create a new file covering about three generations of one family, with no more than 10 individuals. Fill in as many tags as you can, then each print your own file (use a report format that shows as much data as possible for several individuals).

Then exchange these 10-person GEDCOM files, import them, and each print again. Put the printouts side by side and see how everything came through.
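The side-by-side comparison can also be done without paper. A rough Python sketch, assuming you have the file you sent plus a copy the other person exported back to you after importing it: it records which fields each individual has (as tag paths such as `BIRT.PLAC`), then lists what went missing. It matches people by their `NAME` value rather than their record ID, because an importing program may renumber IDs; if a program rewrites names, that person will show up as entirely missing. `field_paths` and `report_losses` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def field_paths(lines: List[str]) -> Dict[str, Set[str]]:
    """Map each individual's NAME to the tag paths recorded for them,
    e.g. {'John /Smith/': {'NAME', 'BIRT', 'BIRT.DATE', 'BIRT.PLAC'}}."""
    people: Dict[str, Set[str]] = {}
    paths: Optional[Set[str]] = None  # paths of the INDI record being read
    name: Optional[str] = None
    stack: List[str] = []  # stack[n] is the tag currently open at level n
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level = int(parts[0])
        has_xref = parts[1].startswith("@") and len(parts) > 2
        tag = parts[2] if has_xref else parts[1]
        value = " ".join(parts[3:] if has_xref else parts[2:])
        if level == 0:
            # A new top-level record: store the previous individual, if any.
            if paths is not None and name is not None:
                people[name] = paths
            paths = set() if tag == "INDI" else None
            name = None
            stack = [tag]
            continue
        if paths is None:
            continue  # inside HEAD, FAM, SOUR, etc.; not tracked here
        del stack[level:]
        stack.append(tag)
        paths.add(".".join(stack[1:]))
        if level == 1 and tag == "NAME" and name is None:
            name = value
    if paths is not None and name is not None:
        people[name] = paths  # file ended without a TRLR record
    return people


def report_losses(before: Dict[str, Set[str]],
                  after: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """List, per person, the fields present before the exchange but missing after."""
    losses: Dict[str, List[str]] = {}
    for name, paths in before.items():
        missing = paths - after.get(name, set())
        if missing:
            losses[name] = sorted(missing)
    return losses
```

This only tells you that a field disappeared, not whether its value changed, so the printouts are still worth a look for dates and places that came through reformatted.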

You can also take a standard blank GEDCOM file, edit it with a text editor, and repeat the same process as with the 10-person GEDCOM above. Be careful, though: some text editors want to save the file as RTF, or you could end up with a Unicode file by accident.
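If you’d rather not hand-edit a file at all, a short script can write the test file as strict plain ASCII, with no RTF wrapper and no byte-order mark. A minimal Python sketch of a two-person GEDCOM 5.5 skeleton (header with the required SOUR, GEDC, CHAR and SUBM entries, two individuals, one family); every name, ID and the `TESTFILE` source value are placeholders I made up, and you would extend the list to your three generations. It uses CR LF line endings, which the 5.5 standard permits.

```python
from pathlib import Path
from typing import Sequence

# A two-person GEDCOM 5.5 skeleton. All names, IDs and the SOUR value are
# placeholders; extend it to your three generations and ~10 people.
TEST_LINES = (
    "0 HEAD",
    "1 SOUR TESTFILE",
    "1 GEDC",
    "2 VERS 5.5",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR ASCII",
    "1 SUBM @U1@",
    "0 @U1@ SUBM",
    "1 NAME Test Submitter",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 SEX M",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "2 PLAC Boston, Massachusetts, USA",
    "1 FAMS @F1@",
    "0 @I2@ INDI",
    "1 NAME Mary /Jones/",
    "1 SEX F",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "1 WIFE @I2@",
    "0 TRLR",
)


def build_test_gedcom(lines: Sequence[str] = TEST_LINES) -> bytes:
    """Join the lines with CR LF and encode as strict ASCII (no BOM, no RTF)."""
    # encode("ascii") raises UnicodeEncodeError if an accented character slips
    # in, which is what you want for a file whose header says CHAR ASCII.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


def write_test_gedcom(path: str) -> None:
    # write_bytes writes exactly these bytes, with no newline translation.
    Path(path).write_bytes(build_test_gedcom())
```

Call `write_test_gedcom("test.ged")`, import the result on both machines, and compare as before. Once the ASCII version survives, you could add accented names and switch the `CHAR` line and the encoding to test Unicode as well.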

You can find blank GEDCOM files here:
Heiner Eichmann’s GEDCOM 5.5 Sample Page – this site has a lot of information about GEDCOM files, and about the different character sets and fields, and it’s a good starting point. GEDitCOM’s developer hosts a GEDCOM Torture Test Files page as well.

I would recommend trying Smultron (available on the Mac App Store) or TextEdit, the editor built into Mac OS X; in TextEdit, make sure the document is plain text (Format > Make Plain Text) before saving. Be wary of other text editors, especially fancier ones, as they tend to save everything as RTF, particularly on the Windows side. On Windows, I would suggest giving Metapad a try. Both Metapad and Smultron can easily save clean text files. Whatever you choose, you should be able to find an easy way to produce a clean text file.
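Whichever editor you use, a quick byte-level check can confirm the saved file really is plain text. A rough Python sketch, assuming you read the file with `open(path, "rb")`; it looks for the failure modes I would suspect first: an RTF wrapper, NUL bytes (a sign of UTF-16), a file that doesn’t open with `0 HEAD`, and mixed line endings. Classic Mac OS used CR, Windows uses CR LF, and Mac OS X uses LF; GEDCOM 5.5 accepts any of them, but a file that mixes them may confuse some importers. The function names are hypothetical.

```python
from typing import Dict, List

UTF8_BOM = b"\xef\xbb\xbf"


def line_endings(raw: bytes) -> Dict[str, int]:
    """Count CR LF (Windows), lone CR (classic Mac OS) and lone LF (Mac OS X/Unix)."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": raw.count(b"\r") - crlf,
        "LF": raw.count(b"\n") - crlf,
    }


def plain_text_problems(raw: bytes) -> List[str]:
    """Return human-readable warnings; an empty list means nothing obvious is wrong."""
    problems: List[str] = []
    body = raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw
    if body.lstrip().startswith(b"{\\rtf"):
        problems.append("file is RTF, not plain text; re-save it as plain text")
    if b"\x00" in raw:
        problems.append("contains NUL bytes; looks like UTF-16, fine only if "
                        "you meant it (CHAR UNICODE)")
    elif not body.startswith(b"0 HEAD"):
        problems.append("does not begin with '0 HEAD'")
    styles = [name for name, n in line_endings(raw).items() if n]
    if len(styles) > 1:
        problems.append("mixed line endings: " + ", ".join(styles))
    return problems
```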

Another approach: download GEDCOM files from half a dozen places on the internet (just random GEDCOM files), have each of you import them, print out the full details for a few individuals from each one, and compare notes. That is even more tedious, but it might help you pinpoint where problems occur, because at this point you’re (hopefully) testing GEDCOMs created by completely different applications than the ones you and the other person use.

Yes, that increases the chances of problems, since you’re adding more variables to the equation, but at the same time you may be able to learn a bit more.

Some applications also let you fall back to an older version of the GEDCOM standard when generating a file; if yours does, repeat the above process with that version.

If you are not going to be exchanging data very frequently, it’s not going to be worth it to go through all of the above.

I would definitely recommend against merging data from GEDCOM files generated by applications other than the one you use, for the simple reason that you should always be careful about what you import (i.e., keep an eye on your source references). Plus, if there are slight differences between what was originally generated and what was imported, you might not find them until much later, and the problem goes from relatively harmless to hours and hours of fixing (or simply reverting to an older GEDCOM, if you keep good, regular backups).

More information on the GEDCOM 5.5 standard: GEDCOM Standard Release 5.5 and Wikipedia’s GEDCOM page.

GEDCOM 6 XML, which is supposed to fix all of our problems, will become a standard format just in time for your gg-grandchildren to use (or rather, the automated software applications that go out and do the research for them 😉)

Note: A GEDCOM file with the extension .uged or .UGED is MacFamilyTree’s way of indicating that the file is in Unicode (UTF-8) format.

February 2012 Update: FamilySearch is proposing GEDCOM X, a new standard that takes into account linking, digital documents and sources, and online work.