|Description (include details on usage, files and paper references)||The Crepe Dataset provides 6 different types of structured activity videos in 1920x1080 resolution. Each activity is represented as a sequence of different action components.
Notable features of this dataset include:
- Structured activities as a sequence of component actions.
- Multiple activities running in parallel.
- Inclusion of distractors that are not relevant to defined activities.
- Every frame is annotated with bounding boxes, agent types, agent occlusion and action labels.
We provide the following human-labeled annotations:
- Bounding box of every person
- Person type (action performer or distractor)
- Occlusion by another person
- Action label
- Activity label
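As a rough illustration of how the annotations listed above might be held in memory, here is a minimal Python sketch. The description does not specify the annotation file format, so every name here (`PersonAnnotation`, its fields, `bbox_in_frame`, `performers`) is hypothetical, and so are the corner-based bounding box layout and the sample values. Only the 1920x1080 frame size and the five annotation types come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Frame resolution stated in the dataset description.
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080


@dataclass
class PersonAnnotation:
    """One person in one frame. Field names are hypothetical, not the dataset's schema."""
    bbox: Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max) in pixels
    is_performer: bool               # True = action performer, False = distractor
    occluded: bool                   # whether another person occludes this one
    action: Optional[str]            # action label; None is an assumed convention for distractors
    activity: Optional[str]          # activity label; None is an assumed convention for distractors


def bbox_in_frame(person: PersonAnnotation) -> bool:
    """Check that the box is non-empty and lies inside the 1920x1080 frame."""
    x0, y0, x1, y1 = person.bbox
    return 0 <= x0 < x1 <= FRAME_WIDTH and 0 <= y0 < y1 <= FRAME_HEIGHT


def performers(frame: List[PersonAnnotation]) -> List[PersonAnnotation]:
    """Drop distractors, keeping only people who perform a defined activity."""
    return [p for p in frame if p.is_performer]


# Made-up sample frame: one performer and one distractor (labels are placeholders).
example_frame = [
    PersonAnnotation((100, 200, 400, 900), True, False, "action_a", "activity_1"),
    PersonAnnotation((1500, 300, 1800, 1000), False, True, None, None),
]
```

Separating performers from distractors at load time is one way to evaluate activity recognition on the relevant people only, while keeping the distractors available for robustness tests.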