While looking at the scanned copies for the copyright renewal entries for movies published in the USA, an idea occurred to me. The number of renewals are so few per year, it should be fairly quick to transcribe them all and add references to the corresponding IMDB title ID. This would give the (presumably) complete list of movies published 28 years earlier that did _not_ enter the public domain for the transcribed year. By fetching the list of USA movies published 28 years earlier and subtract the movies with renewals, we should be left with movies registered in IMDB that are now in the public domain. For the year 1955 (which is the one I have looked at the most), the total number of pages to transcribe is 21. For the 28 years from 1950 to 1978, it should be in the range 500-600 pages. It is just a few days of work, and spread among a small group of people it should be doable in a few weeks of spare time.
A typical copyright renewal entry look like this (the first one listed for 1955):
ADAM AND EVIL, a photoplay in seven reels by Metro-Goldwyn-Mayer Distribution Corp. (c) 17Aug27; L24293. Loew's Incorporated (PWH); 10Jun55; R151558.
The movie title as well as registration and renewal dates are easy enough to locate by a program (split on first comma and look for DDmmmYY). The rest of the text is not required to find the movie in IMDB, but is useful to confirm the correct movie is found. I am not quite sure what the L and R numbers mean, but suspect they are reference numbers into the archive of the US Copyright Office.
Tracking down the equivalent IMDB title ID is probably going to be a manual task, but given the year it is fairly easy to search for the movie title using for example http://www.imdb.com/find?q=adam+and+evil+1927&s=all. Using this search, I find that the equivalent IMDB title ID for the first renewal entry from 1955 is http://www.imdb.com/title/tt0017588/.
I suspect the best way to do this would be to make a specialised web service to make it easy for contributors to transcribe and track down IMDB title IDs. In the web service, once a entry is transcribed, the title and year could be extracted from the text, a search in IMDB conducted for the user to pick the equivalent IMDB title ID right away. By spreading out the work among volunteers, it would also be possible to make at least two persons transcribe the same entries to be able to discover any typos introduced. But I will need help to make this happen, as I lack the spare time to do all of this on my own. If you would like to help, please get in touch. Perhaps you can draft a web service for crowd sourcing the task?
Note, Project Gutenberg already have some transcribed copies of the US Copyright Office renewal protocols, but I have not been able to find any film renewals there, so I suspect they only have copies of renewal for written works. I have not been able to find any transcribed versions of movie renewals so far. Perhaps they exist somewhere?
I would love to figure out methods for finding all the public domain works in other countries too, but it is a lot harder. At least for Norway and Great Britain, such work involve tracking down the people involved in making the movie and figuring out when they died. It is hard enough to figure out who was part of making a movie, but I do not know how to automate such procedure without a registry of every person involved in making movies and their death year.
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.