I saw this on FamilySearch today: How Machine-Learning and OCR Are Changing Family History, https://www.familysearch.org/blog/en/optical-character-recognition-indexing/. I have included a brief portion of the article below.
I do want to caution people that OCR is not foolproof. We use it where I volunteer to scan marriage licenses into the system. The person doing it has to check each index entry afterward and correct or add missing information. It probably gets around 90% correct, but it misses some information and makes wild guesses on some of the information it does include. For the most part, these are typewritten applications, and he only looks at certain fields when doing the OCR.
Generally, FamilySearch has two people index each record. If both agree, the entry is usually accepted. If there are differences between the two indexers, a third person reviews the record. That doesn't mean that both indexers get it right. I remember seeing a record where someone had indexed the maiden name as Ruhr. Looking at the record, it was Unknown.
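For anyone curious how the two-indexer rule works in practice, here is a minimal sketch in Python. It is only my illustration of the process as I described it above, not FamilySearch's actual software; the names normalize and reconcile and the status labels are made up for the example.

```python
from typing import Optional, Tuple


def normalize(value: str) -> str:
    """Collapse extra whitespace and ignore case so trivial differences
    between two indexers do not count as a disagreement."""
    return " ".join(value.split()).casefold()


def reconcile(first: str, second: str,
              reviewer: Optional[str] = None) -> Tuple[str, str]:
    """Return (value, status) for one field keyed by two indexers.

    Hypothetical rule: accept when both indexers agree; otherwise use the
    third person's reading if one is available, or flag the field.
    """
    if normalize(first) == normalize(second):
        return first.strip(), "accepted"
    if reviewer is not None:
        return reviewer.strip(), "reviewed"
    return "", "needs_review"


if __name__ == "__main__":
    # Two indexers disagree, and a third person settles it.
    print(reconcile("Ruhr", "Unknown", reviewer="Unknown"))
    # Two indexers agree, so the value is accepted even if both misread it.
    print(reconcile("Ruhr", "Ruhr"))
```

The second call shows the weak spot: when both indexers make the same mistake, nothing sends the record to a third person, so the error goes straight into the index.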
October 26, 2020 – by David Nielsen
If this article caught your eye, you probably have an interest in indexing or in online historical records. Maybe you’ve made indexing a part of your weekly or monthly volunteer efforts. If so, keep up the amazing work! You’re making it possible for people around the world to discover their ancestors and learn more about their family histories.
Still, our indexing volunteers have a colossal task in front of them. The world has billions and billions of records waiting to be indexed. Although we have hundreds of thousands of people willing to help out, we’re still outnumbered, and it is clear that our volunteers will need help.
Enter optical character recognition—also called OCR, or computer-assisted indexing. Either name works—the more important thing is that the technology works. Thanks to OCR, we’re improving the quality of indexing, increasing the number of indexed records, and accelerating the speed at which historical records become available to the people who visit our website.
The result is more information for people to search and more documents to explore—in short, more opportunities to make that discovery about your family that connects you to your past.
FamilySearch and Computer-Assisted Indexing
So far, FamilySearch has employed optical character recognition to index a whopping 64 million historical records. The project in question involves a collection of Spanish-language records, namely christenings, marriages, burials, and other church documents. When the project is complete, nearly 900 million records will have been indexed and will be in need of review by an actual person.
FamilySearch International is the largest genealogy organization in the world. FamilySearch is a nonprofit, volunteer-driven organization sponsored by The Church of Jesus Christ of Latter-day Saints. Millions of people use FamilySearch records, resources, and services to learn more about their family history. To help in this great pursuit, FamilySearch and its predecessors have been actively gathering, preserving, and sharing genealogical records worldwide for over 100 years. Patrons may access FamilySearch services and resources free online at FamilySearch.org or through over 5,000 family history centers in 129 countries, including the main Family History Library in Salt Lake City, Utah.