The issue I have been pondering more and more lately is what a pain in the butt it is to actually get your movie collection into something like ANT, XBMC, or BOXEE. Everyone has a different method for auto-parsing the file names to work out which IMDB page to go to.
I question the legality of a project like this, but MusicBrainz is doing exactly what I want, just for music. Create a public / social database that maps the hash of a file to its IMDB page. It would be dead simple to write, and once the project gets going it would theoretically be self-sufficient. What are your thoughts or ideas?
Hashing Files for Auto-Importing
Hashing large files does take some time, but I bet you can get away without hashing the entire file, and perhaps only the first few megabytes. We're not going for data integrity, just a quick and easy method to say this file relates to this row in the database.
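Here is a rough sketch of what that partial hash could look like in Python. The function name quick_hash, the 4 MiB cutoff, and the choice of SHA-1 are all my own assumptions, and mixing the file size into the hash is an extra idea that is not part of the proposal above; it just makes it less likely that two different files with the same opening bytes collide.

```python
import hashlib
import os

# Assumption: "the first few megabytes" is taken to mean 4 MiB here.
CHUNK_BYTES = 4 * 1024 * 1024


def quick_hash(path, chunk_bytes=CHUNK_BYTES):
    """Return a hex digest of the file size plus the first chunk_bytes of the file.

    This is an identifier for lookups, not an integrity check: bytes after
    the first chunk are never read.
    """
    digest = hashlib.sha1()
    # Assumption (not in the original idea): include the total size so that
    # files sharing an identical header still hash differently.
    digest.update(str(os.path.getsize(path)).encode("ascii"))
    with open(path, "rb") as handle:
        digest.update(handle.read(chunk_bytes))
    return digest.hexdigest()
```

A client would call quick_hash("/path/to/movie.avi") once per file and send only the resulting hex string to the database.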
A database would need to be hosted someplace. I wouldn't mind setting one up. I was thinking of making it as open as possible, maybe something like openVideoDB.info or something silly like that.
Write a quick and dirty API to access the hashes along with the IMDB links, and some mechanism to make sure no one trashes it. Probably backing up once a day would be good enough. Perhaps a voting system where people can flag bad matches. And then stop, keep it simple.
Yea,
It'll obviously need some way to extend to whatever service someone is using, and that's where it gets complicated. However, it might stay simple if you treat all the other sites as additions.
For example:
Table 1 has columns: hash, movie_index, votes
Table 2 has columns: movie_index, title, year, sites
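To make the two tables concrete, here is one way they might be declared, sketched with Python's built-in sqlite3 module. The table names hashes and movies, the column types, and the idea of storing sites as a plain text field are my assumptions; only the column names come from the description above.

```python
import sqlite3

# Hypothetical table names; the column names follow Table 1 and Table 2.
SCHEMA = """
CREATE TABLE IF NOT EXISTS movies (
    movie_index INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    year        INTEGER,
    sites       TEXT            -- assumption: links stored as one text field
);
CREATE TABLE IF NOT EXISTS hashes (
    hash        TEXT NOT NULL,
    movie_index INTEGER NOT NULL REFERENCES movies(movie_index),
    votes       INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (hash, movie_index)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keying the hashes table on the (hash, movie_index) pair lets the same hash carry competing matches, each with its own vote count.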
So the scenario would be as follows:
1.) I load some client app that hashes my movie and looks it up in the database.
2.) If it's found, we're done. We know what it is.
3.) If it's not found, I use whatever script ANT has to pull the movie information, but the key point is that we only care about the final link and the name of the site it was pulled from.
4.) If there's more than one result, the user must choose (as they do now in ANT).
5.) The user selects the movie, and the hash entry is added to Table 1 and the movie info is added to Table 2.
The votes column can be used to validate matches: say we need at least 5 people to agree that some file relates to a movie before we say it's accurate.
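The lookup-then-submit flow with the vote threshold could be sketched roughly like this. It uses plain in-memory dictionaries instead of a real database to keep it short, and the class name MatchStore, its method names, and the caller-supplied movie_index are all hypothetical; the only things taken from the steps above are the lookup, the submission after the user picks a movie, and the 5-vote threshold.

```python
MIN_VOTES = 5  # the "at least 5 people" threshold mentioned above


class MatchStore:
    """In-memory stand-in for Table 1 (hash votes) and Table 2 (movie info)."""

    def __init__(self):
        self.votes = {}   # (file_hash, movie_index) -> vote count
        self.movies = {}  # movie_index -> {"title", "year", "sites"}

    def lookup(self, file_hash):
        """Return the best-voted movie_index for a hash, or None if no
        candidate has reached MIN_VOTES yet."""
        candidates = [
            (count, index)
            for (known_hash, index), count in self.votes.items()
            if known_hash == file_hash
        ]
        if not candidates:
            return None
        count, index = max(candidates)
        return index if count >= MIN_VOTES else None

    def submit(self, file_hash, movie_index, title, year, site_url):
        """Record one user's choice: add the movie info and count a vote."""
        movie = self.movies.setdefault(
            movie_index, {"title": title, "year": year, "sites": set()}
        )
        movie["sites"].add(site_url)
        key = (file_hash, movie_index)
        self.votes[key] = self.votes.get(key, 0) + 1
```

With this shape, a client calls lookup first and only falls back to the scraping script and submit when lookup returns None, so every manual match also counts as a vote.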
Maybe I'll start a Google Code project to launch this and get started. It all seems so simple, but I know we'll hit snags along the way.