I decided to come back and work on this open source project more. I updated it to 1.1, with a big focus on performance.
The feed is random (based on the collections and identifiers enabled). By default I just enable a collection by "AV Geeks". It's usually NASA/Space stuff, or old educational videos. I use them cause their videos are optimized the best! Other collections have varying degrees of loading issues. Sometimes I can compensate in the player, but sometimes it's just slow to load. But AV Geeks is rock solid unless the server is down! I had the complete collection of Seinfeld in the main feed when I launched v1.0. (There is a ton of copyrighted content on the Internet Archive.) I wanted to see how long it would take to get taken down...3 days. lol. Anyway, to customize the feed, you can create Presets where you can flip on collections. You can add any identifier you find to a preset as well.
Search was custom implemented. Internet Archive does have a basic keyword search, but I wanted something more. So I used the Internet Archive CLI tool to download metadata for all the collections I liked. I used OpenAI to generate embeddings for the metadata (just the metadata, didn't get super fancy here, but it would be fun to maybe extract more metadata from the frames?). Anyway, I stored the embeddings in Pinecone. Boom. Natural language search with easy 'show me videos like this one'. Also added a year filter to only focus on older videos (otherwise you get tons of lame G4 vids). I want to include more metadata in Pinecone, I stupidly only included year at first. Now I'd have to reindex and just haven't bothered yet.
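Roughly, the search flow described above could look like the sketch below. This is not the app's actual code: the OpenAI and Pinecone calls are swapped out for hand-written toy vectors and an in-memory list so the example runs on its own, and every name in it (`VideoVector`, `search`, `more_like_this`, the identifiers) is made up for illustration. It only shows the shape of the idea: filter by year first, then rank by vector similarity.

```python
import math
from dataclasses import dataclass


@dataclass
class VideoVector:
    identifier: str         # hypothetical Internet Archive identifier
    year: int               # the only metadata stored next to the vector
    embedding: list[float]  # toy 3-d vector standing in for a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_embedding, max_year=None, top_k=3):
    """Drop anything newer than max_year, then rank the rest by similarity."""
    candidates = [v for v in index if max_year is None or v.year <= max_year]
    ranked = sorted(
        candidates,
        key=lambda v: cosine(query_embedding, v.embedding),
        reverse=True,
    )
    return [v.identifier for v in ranked[:top_k]]


def more_like_this(index, identifier, max_year=None, top_k=3):
    """'Show me videos like this one': reuse a stored vector as the query."""
    source = next(v for v in index if v.identifier == identifier)
    others = [v for v in index if v.identifier != identifier]
    return search(others, source.embedding, max_year, top_k)


# Toy index; in the real setup these vectors would come from an embedding model.
INDEX = [
    VideoVector("apollo-launch-film", 1969, [0.9, 0.1, 0.0]),
    VideoVector("classroom-safety-film", 1955, [0.1, 0.9, 0.0]),
    VideoVector("g4-game-review", 2005, [0.0, 0.2, 0.9]),
    VideoVector("moon-rover-footage", 1971, [0.8, 0.2, 0.1]),
]
```

Calling `search(INDEX, [1.0, 0.0, 0.0], max_year=1990, top_k=2)` ranks the two space films first, and the 2005 entry never shows up because the year filter removes it before ranking. It also shows why storing only `year` is limiting: any other filter would need extra fields on every stored record, which means reindexing.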
Another interesting thing I learned was that identifiers on the Internet Archive can have multiple individual video files. For example, before it was removed, Seinfeld had each episode in a different video file, all organized under one single identifier. SO, if I had a preset with only that one identifier, each time I swiped I would make it show a random video file from that identifier, cued up to a random point in the video. This made it so I could swipe thru all the episodes of Seinfeld...get bored, swipe to the next...watch a bit, get bored, swipe. SORT of like a clip show? But each video was the FULL episode, so if it was a part I knew and liked I could watch for a while. Could always swipe when I got bored tho.
This sparked something in me, cause I think it's a good interface. Short form video interface for long form legacy video content. I dunno. I like that. Really good for old sitcoms, sketch comedy shows, etc. Similar to how shows like Parks & Rec just post up hour long clip comps on YouTube. This is like that, but you are more in control. Like I said, there is a lot of copyrighted content on the Internet Archive, but it regularly gets taken down, as expected. And therein lies the problem. I like the interface enough tho, so I made a version of this app for the Plex Media Server. So I loaded up Parks and Rec, and did exactly what I described. But, eh...I think the Venn diagram of people that like this type of TikTok interface and people who use Plex Media Server is... me? lol.
I briefly looked into how to do this for real, but UGH. The FAST licensing model is the most likely candidate, and my friend at NBC says that would be the easiest to get, but after investigating, it's purely about linear programming. Specific ad placement at specific times. Not set up for a vertical swipe feed. Would have to negotiate a totally new thing. FAVF=Free Ad-supported Vertical Feed? Not my thing. Any tech media legal people out there interested in negotiating a new licensing format?
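For the curious, the swipe behavior described above (random video file under an identifier, cued to a random point) can be sketched in a few lines. This is an illustration under assumptions, not the app's real code: `VideoFile`, `Cue`, `next_cue`, and the `tail_margin` idea (don't start in the last minute of a file) are all made up here, and file durations are assumed to already be known from the item's metadata.

```python
import random
from dataclasses import dataclass


@dataclass
class VideoFile:
    name: str
    duration_seconds: float


@dataclass
class Cue:
    identifier: str
    file_name: str
    start_seconds: float


def next_cue(preset, rng, tail_margin=60.0):
    """Pick what the next swipe shows.

    preset maps an identifier to the list of video files stored under it.
    tail_margin (an assumption) keeps the start point away from the very end.
    """
    identifier = rng.choice(sorted(preset))  # random identifier in the preset
    video = rng.choice(preset[identifier])   # random file under that identifier
    latest_start = max(0.0, video.duration_seconds - tail_margin)
    return Cue(identifier, video.name, rng.uniform(0.0, latest_start))


# Hypothetical one-identifier preset with one file per episode.
preset = {
    "some-sitcom-complete": [
        VideoFile("s01e01.mp4", 1320.0),
        VideoFile("s01e02.mp4", 1290.0),
        VideoFile("s01e03.mp4", 1305.0),
    ]
}

cue = next_cue(preset, random.Random())
print(f"{cue.identifier} / {cue.file_name} @ {cue.start_seconds:.0f}s")
```

With a single identifier in the preset, every swipe stays inside that one show but lands on a different episode and timestamp, which is what gives the clip-show feel while still leaving the full episode available to keep watching.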
So that's how I ended up back on this open source project, hanging with Claude again. Enjoy the v1.1 update.