The goal of the Kinetics dataset is to help the computer vision and machine learning communities advance models for video understanding. Given this large human action classification dataset, it may be possible to learn powerful video representations that transfer to different video tasks.
The Kinetics-700-2020 dataset will be used for this challenge. Kinetics-700-2020 is a large-scale, high-quality dataset of YouTube video URLs that cover a diverse range of human-focused actions. It is an approximate superset of Kinetics-400 (released in 2017), Kinetics-600 (released in 2018), and Kinetics-700 (released in 2019).
The dataset consists of approximately 650,000 video clips, and covers 700 human action classes with at least 700 video clips for each action class. Each clip lasts around 10 seconds and is labeled with a single class. All of the clips have been through multiple rounds of human annotation, and each is taken from a unique YouTube video. The actions cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging.
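The per-class minimum stated above can be checked after downloading the annotations. The following is a minimal sketch, assuming the clip labels have already been loaded into a plain list of class names; the helper name `classes_below_minimum` and the toy labels are hypothetical and not part of any official Kinetics tooling.

```python
from collections import Counter


def classes_below_minimum(labels, min_clips=700):
    """Return {class_name: clip_count} for classes with fewer than min_clips clips.

    `labels` is assumed to hold one class name per clip (hypothetical input format).
    """
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < min_clips}


# Toy example with a lowered threshold; real use would pass the full label list.
labels = ["shaking hands"] * 3 + ["hugging"] * 1
under_represented = classes_below_minimum(labels, min_clips=2)
print(under_represented)
```

An empty result with the default `min_clips=700` would indicate that every class present in the list meets the stated minimum. Classes missing from the list entirely are not detected by this check.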
More information about how to download the Kinetics dataset is available here.
1. Possible to use ImageNet checkpoints?
We allow finetuning from public ImageNet checkpoints for the supervised track -- but a link to the specific checkpoint should be provided with each submission.
2. Possible to use optical flow?
Optical flow can be used as long as the flow method is not trained on external datasets; synthetic datasets are the only exception.
3. Can we train on test data without labels (e.g. transductive)?
No.
4. Can we use semantic class label information?
Yes, for the supervised track.
5. Will there be special tracks for methods using fewer FLOPs / small models or just RGB vs RGB+Audio in the self-supervised track?
We will ask participants to provide the total number of model parameters and the modalities used and plan to create special mentions for those doing well in each setting, but not specific tracks.
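One way to arrive at the total number of model parameters is to sum the element counts of all weight tensors. The sketch below assumes the model's weights can be described as a mapping from parameter names to shapes; the function `count_parameters` and the layer names and shapes are hypothetical and only illustrate the arithmetic.

```python
from math import prod


def count_parameters(shapes):
    """Sum the number of elements over all parameter tensors.

    `shapes` maps a parameter name to its shape tuple (assumed input format).
    """
    return sum(prod(shape) for shape in shapes.values())


# Hypothetical layers: one 3D convolution and one 700-way classifier.
shapes = {
    "conv1.weight": (64, 3, 7, 7, 7),
    "conv1.bias": (64,),
    "fc.weight": (700, 512),
    "fc.bias": (700,),
}
total = count_parameters(shapes)
print(f"total parameters: {total}")
```

Most deep learning frameworks expose parameter shapes directly, so the same sum can usually be taken over the framework's own parameter list.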