I've continued to track down lists of movies that are legal to distribute on the Internet, and have identified more than 11,000 title IDs in The Internet Movie Database (IMDB) so far. Most of them (57%) are feature films from the USA published before 1923. I've also tracked down more than 24,000 movies I have not yet been able to map to an IMDB title ID, so the real number could be a lot higher. According to the front web page for Retro Film Vault, there are 44,000 public domain films, so I guess there are still some left to identify.
The complete data set is available from a public git repository, including the scripts used to create it. Most of the data is collected using web scraping, for example from the "product catalog" of companies selling copies of public domain movies, but any source I find believable is used. I've so far had to throw out three sources because I did not trust the public domain status of the movies listed.
Anyway, this is the summary of the 28 collected data sources so far:
 2352 entries (   66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
 2302 entries (  120 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
  195 entries (   63 unique) with and   200 without IMDB title ID in free-movies-cinemovies.json
   89 entries (   52 unique) with and    38 without IMDB title ID in free-movies-creative-commons.json
  344 entries (   28 unique) with and   655 without IMDB title ID in free-movies-fesfilm.json
  668 entries (  209 unique) with and  1064 without IMDB title ID in free-movies-filmchest-com.json
  830 entries (   21 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
   19 entries (   19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
 6822 entries ( 6669 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
  137 entries (    0 unique) with and     0 without IMDB title ID in free-movies-imdb-externlist.json
 1205 entries (   57 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
   84 entries (   20 unique) with and   167 without IMDB title ID in free-movies-infodigi-pd.json
  158 entries (  135 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
  113 entries (    4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
  182 entries (  100 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
  229 entries (   87 unique) with and     1 without IMDB title ID in free-movies-manual.json
   44 entries (    2 unique) with and    64 without IMDB title ID in free-movies-openflix.json
  291 entries (   33 unique) with and   474 without IMDB title ID in free-movies-profilms-pd.json
  211 entries (    7 unique) with and     0 without IMDB title ID in free-movies-publicdomainmovies-info.json
 1232 entries (   57 unique) with and  1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
   46 entries (   13 unique) with and    81 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries (   64 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
 1758 entries (  882 unique) with and  3786 without IMDB title ID in free-movies-retrofilmvault.json
   16 entries (    0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
   63 entries (   16 unique) with and   141 without IMDB title ID in free-movies-vodo.json
11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID
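For readers wondering how the "unique" and "only in one list" numbers relate, here is a minimal Python sketch of one way such a summary could be computed. It is not the code from the git repository: the function name summarize, the in-memory input (a dict from source name to IMDB title IDs) and the toy IDs are all assumptions made for illustration, and it counts distinct IDs per source rather than raw entries.

```python
from collections import Counter


def summarize(sources):
    """Summarize overlap between lists of IMDB title IDs.

    sources is a hypothetical in-memory structure: a dict mapping a
    source name to an iterable of IMDB title IDs found in that source.
    Returns (per_source, total, only_one), where per_source maps each
    name to (distinct IDs in the source, IDs found in no other source).
    """
    id_sets = {name: set(ids) for name, ids in sources.items()}

    # Count in how many sources each title ID shows up.
    seen_in = Counter()
    for ids in id_sets.values():
        seen_in.update(ids)

    per_source = {
        name: (len(ids), sum(1 for i in ids if seen_in[i] == 1))
        for name, ids in id_sets.items()
    }
    total = len(seen_in)
    only_one = sum(1 for n in seen_in.values() if n == 1)
    return per_source, total, only_one


# Toy data, not taken from the real lists.
example = {
    "a.json": ["tt0000001", "tt0000002"],
    "b.json": ["tt0000002", "tt0000003", "tt0000004"],
}
per_source, total, only_one = summarize(example)
```

With this definition, the "only in one list" total is simply the sum of the per-source unique counts, which is also how the numbers in the summary above add up (the unique columns sum to 8724).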
I keep finding more data sources. I found the cinemovies source just a few days ago, and as you can see from the summary, it extended my list with 63 movies. Check out the mklist-* scripts in the git repository if you are curious how the lists are created. Many of the titles are extracted using searches on IMDB, where I look for the title and year, and accept a search result only if it lists a single movie and the year matches. This allows me to automatically use many lists of movies without IMDB title ID references, at the cost of increasing the risk of wrongly identifying an IMDB title ID as public domain. So far my random manual checks have indicated that the method is solid, but I really wish all lists of public domain movies would include a unique movie identifier like the IMDB title ID. It would make the job of counting movies in the public domain a lot easier.
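The acceptance rule for search results described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code in the mklist-* scripts: the function name pick_title_id and the shape of the search result (a list of (title ID, year) tuples) are assumptions, and the actual IMDB search step is left out.

```python
def pick_title_id(results, wanted_year):
    """Return an IMDB title ID only when the match is unambiguous.

    results is assumed to be a list of (title_id, year) tuples coming
    from some title search; how the search is done is not shown here.
    The ID is accepted only if exactly one movie was listed and its
    year equals the year given in the source list. Otherwise None.
    """
    if len(results) != 1:
        # No hit, or several candidates: too risky to guess.
        return None
    title_id, year = results[0]
    return title_id if year == wanted_year else None


# Toy values, not real lookups.
accepted = pick_title_id([("tt0000001", 1915)], 1915)
wrong_year = pick_title_id([("tt0000001", 1916)], 1915)
ambiguous = pick_title_id([("tt0000001", 1915), ("tt0000002", 1915)], 1915)
```

Rejecting every ambiguous or year-mismatched result trades coverage for precision: such movies end up in the "without IMDB title ID" column instead of risking a wrong public domain claim.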
As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.