Video decoder for data_pipeline #200

am831 · 2023-12-02T01:01:14Z

What does this PR do? Please describe:
Implements video_decoder, which loads a video dataset as part of the data_pipeline using the FFmpeg libraries libavcodec, libavformat, libavutil, and libswscale.
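The description does not show how ffmpeg_decoder drives libavcodec. Decoders built on current FFmpeg versions generally follow the send/receive protocol: packets demuxed by libavformat (av_read_frame) go to avcodec_send_packet, frames are pulled with avcodec_receive_frame until it reports AVERROR(EAGAIN), and a final NULL packet flushes any frames the codec is still holding. The pure-Python simulation below models only that control flow. FakeCodec, decode_all, and the EAGAIN/EOF constants are hypothetical stand-ins, not fairseq2 or FFmpeg APIs, and this is a sketch rather than the PR's implementation.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional

# Hypothetical stand-ins for FFmpeg's return codes.
EAGAIN = "EAGAIN"  # like AVERROR(EAGAIN): the decoder needs more input
EOF = "EOF"        # like AVERROR_EOF: the decoder has been fully drained


class FakeCodec:
    """Hypothetical decoder that holds back `delay` packets before emitting
    frames, loosely mimicking a codec that buffers frames for reordering."""

    def __init__(self, delay: int = 2) -> None:
        self.delay = delay
        self.pending: Deque[str] = deque()
        self.draining = False

    def send_packet(self, packet: Optional[str]) -> None:
        # None plays the role of the NULL packet that starts draining.
        if packet is None:
            self.draining = True
        else:
            self.pending.append(packet)

    def receive_frame(self) -> str:
        if self.pending and (self.draining or len(self.pending) > self.delay):
            return f"frame({self.pending.popleft()})"
        return EOF if self.draining else EAGAIN


def decode_all(packets: Iterable[str], codec: FakeCodec) -> List[str]:
    """Send every packet, pull frames until EAGAIN, then flush and drain."""
    frames: List[str] = []

    def drain() -> str:
        # Pull frames until the codec asks for more input or reports EOF.
        while True:
            result = codec.receive_frame()
            if result in (EAGAIN, EOF):
                return result
            frames.append(result)

    for packet in packets:  # stands in for the av_read_frame() loop
        codec.send_packet(packet)
        drain()

    codec.send_packet(None)  # flush the frames the codec is still holding
    if drain() != EOF:
        raise RuntimeError("decoder was not fully drained")
    return frames


print(decode_all(["p0", "p1", "p2", "p3", "p4"], FakeCodec(delay=2)))
# ['frame(p0)', 'frame(p1)', 'frame(p2)', 'frame(p3)', 'frame(p4)']
```

In this model, skipping the flush call would silently drop the last `delay` frames, which is why a decoding loop of this shape typically ends with a draining step.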
The video_decoder class is exposed to Python through pybind11 and delegates decoding to the classes in the detail directory. The ffmpeg_decoder class does the heavy lifting and manages the resources acquired by libavformat. The stream class holds the metadata for each stream and manages the resources acquired by libavcodec. The transform class applies transformations to frame data and manages the resources acquired by libswscale.
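The transform class hands pixel-format conversion to libswscale (one of the commits is "convert video frames to rgb"). To illustrate the arithmetic such a conversion performs, the sketch below converts a tiny YUV420p frame to RGB in plain Python, assuming BT.601 limited-range coefficients. The PR does not state which input format or color matrix it uses, libswscale itself relies on optimized code paths rather than per-pixel floating point, and the function names here are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def clamp_u8(value: float) -> int:
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))


def yuv420p_to_rgb(
    y_plane: List[List[int]],
    u_plane: List[List[int]],
    v_plane: List[List[int]],
) -> List[List[Pixel]]:
    """Convert one YUV420p frame to RGB using BT.601 limited-range math.

    In YUV420p the U and V planes have half the width and height of Y,
    so each chroma sample is shared by a 2x2 block of luma samples.
    """
    height, width = len(y_plane), len(y_plane[0])
    rgb: List[List[Pixel]] = []
    for row in range(height):
        out_row: List[Pixel] = []
        for col in range(width):
            c = 1.164 * (y_plane[row][col] - 16)        # scaled luma
            d = u_plane[row // 2][col // 2] - 128       # centered Cb
            e = v_plane[row // 2][col // 2] - 128       # centered Cr
            out_row.append((
                clamp_u8(c + 1.596 * e),
                clamp_u8(c - 0.392 * d - 0.813 * e),
                clamp_u8(c + 2.017 * d),
            ))
        rgb.append(out_row)
    return rgb


# A 2x2 checkerboard of limited-range black (Y=16) and white (Y=235).
print(yuv420p_to_rgb([[16, 235], [235, 16]], [[128]], [[128]]))
# [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (0, 0, 0)]]
```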

You can use this example to test it manually:

from dataclasses import dataclass
from pathlib import Path

import torch

from fairseq2.data import FileMapper
from fairseq2.data.video import VideoDecoder
from fairseq2.data.text import StrSplitter, read_text
from fairseq2.typing import DataType, Device
from fairseq2.data.data_pipeline import DataPipeline

@dataclass
class DataContext:
    data_file: Path
    """The pathname of the test TSV data file."""

    video_field: str
    """The string field corresponding to the relative path of the video file."""

    video_root_dir: Path
    """The pathname of the directory under which video files are stored."""

    device: Device
    """The device on which to run inference."""

    dtype: DataType
    """The dtype with which to run inference."""


def build_data_pipeline(ctx: DataContext) -> DataPipeline:
    # TODO: This will be soon auto-tuned. Right now hand-tuned for devfair.
    n_parallel = 4

    # Open TSV, skip the header line, split into fields, and return three fields
    # only.
    split_tsv = StrSplitter(
        # We assume the tsv file has these 3 fields.
        names=["id", ctx.video_field, "raw_target_text"], indices=[0, 1, 2]
    )

    pipeline_builder = read_text(ctx.data_file, rtrim=True).skip(1).map(split_tsv)

    # Memory map video files and cache up to 10 files.
    map_file = FileMapper(root_dir=ctx.video_root_dir, cached_fd_count=10)

    pipeline_builder.map(map_file, selector=ctx.video_field, num_parallel_calls=n_parallel)

    # Decode the memory-mapped video files with FFmpeg.
    decode_vid = VideoDecoder()

    pipeline_builder.map(
        [decode_vid],
        selector=f"{ctx.video_field}.data",
        num_parallel_calls=n_parallel,
    )

    # Build and return the data pipeline.
    return pipeline_builder.and_return()


def run_pipeline(ctx: DataContext) -> None:
    """Iterate through the specified TSV file and print each decoded example."""
    # Build a simple pipeline that reads a single TSV file.
    pipeline = build_data_pipeline(ctx)

    # Print each example until the TSV file is exhausted (or CTRL-C).
    for example in pipeline:
        print(example)


if __name__ == "__main__":
    # fmt: off
    ctx = DataContext(
        data_file=Path("yourtsv"),
        video_field="mp4_file",
        video_root_dir=Path("path"),
        device=torch.device("cpu"),
        dtype=torch.float32
    )
    # fmt: on

    run_pipeline(ctx)
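The example expects data_file to be a TSV whose first line is a header (dropped by .skip(1)) and whose rows carry at least three tab-separated columns: an id, a video path relative to video_root_dir, and the target text. The sketch below writes such a manifest and mimics, in plain Python, roughly what each row looks like after read_text(...).skip(1).map(split_tsv). The file name, sample row, and helper functions are hypothetical, and the tab separator reflects the assumption that StrSplitter splits on tabs by default.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Column order mirrors the StrSplitter call in the example (indices 0, 1, 2).
FIELDS = ["id", "mp4_file", "raw_target_text"]


def write_manifest(path: Path, rows: List[Dict[str, str]]) -> None:
    """Write a header line followed by one tab-separated row per video."""
    lines = ["\t".join(FIELDS)]
    lines += ["\t".join(row[name] for name in FIELDS) for row in rows]
    path.write_text("\n".join(lines) + "\n")


def split_like_pipeline(path: Path) -> List[Dict[str, str]]:
    """Rough plain-Python analogue of read_text(rtrim=True).skip(1).map(split_tsv)."""
    examples = []
    for line in path.read_text().splitlines()[1:]:  # skip(1) drops the header
        fields = line.rstrip().split("\t")  # rtrim, then split on tabs
        examples.append({name: fields[i] for i, name in enumerate(FIELDS)})
    return examples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "videos.tsv"  # hypothetical file name
        write_manifest(manifest, [
            {"id": "0", "mp4_file": "clips/sample0.mp4", "raw_target_text": "hello"},
        ])
        print(split_like_pipeline(manifest))
```

With this layout, data_file would point at the manifest, video_field stays "mp4_file", and video_root_dir is the directory that relative paths such as clips/sample0.mp4 resolve against.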

Check list:

Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
Did you read the contributor guideline?
Did you make sure that your PR does only one thing instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)


zrthxn · 2024-04-11T20:34:12Z

I get this error from this line (fairseq2/src/fairseq2/data/video.py, line 34 in 94570e9):

from fairseq2n.bindings.data.video import VideoDecoder as VideoDecoder

Traceback (most recent call last):
  File "diffusersurgical/dataloader.py", line 5, in <module>
    from fairseq2.data.video import VideoDecoder
  File ".venv/lib/python3.10/site-packages/fairseq2/data/video.py", line 34, in <module>
    from fairseq2n.bindings.data.video import VideoDecoder as VideoDecoder
ModuleNotFoundError: No module named 'fairseq2n.bindings.data.video'; 'fairseq2n.bindings.data' is not a package
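The wording of this error comes from Python's import system: it is raised when fairseq2n.bindings.data is already loaded as a module object without a `__path__`, so Python cannot search for a `video` submodule beneath it. One plausible reading, not confirmed in this thread, is that the installed fairseq2n build registers `data` but not `data.video`, for example a fairseq2n build that does not include this PR's video bindings. The self-contained reproduction below shows only the mechanism, using a hypothetical package name `fakepkg`.

```python
import importlib
import sys
import types


def try_import(name: str) -> str:
    """Return "ok" if `name` imports, else the ModuleNotFoundError message."""
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


# Register "fakepkg.bindings.data" as a plain module object (no __path__),
# mimicking a native extension submodule rather than a directory package.
for name in ("fakepkg", "fakepkg.bindings", "fakepkg.bindings.data"):
    sys.modules[name] = types.ModuleType(name)

print(try_import("fakepkg.bindings.data.video"))
# No module named 'fakepkg.bindings.data.video'; 'fakepkg.bindings.data' is not a package

# Once something registers the submodule itself, the same import succeeds.
sys.modules["fakepkg.bindings.data.video"] = types.ModuleType("fakepkg.bindings.data.video")
print(try_import("fakepkg.bindings.data.video"))
# ok
```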

am831 and others added 29 commits November 2, 2023 12:41

set up video decoder

3e2183c

header file

4c8db4b

Merge branch 'main' of https://github.com/facebookresearch/fairseq2 into vid_decoding

bee4972

libavcodec

10986bb

video decoder progress with libavcodec/libavformat

08943ea

Merge branch 'facebookresearch:main' into vid_decoding

d493e00

fix library linking

f9902cf

video decoder debugging

37853e9

fix seg fault

f2a9dcf

video decoder progress

4a9645b

add open_streams, deocde_frame, utils.h

72e650b

decoder progress

4fd41de

Merge branch 'facebookresearch:main' into vid_decoding

03c4681

decoder progress

9cb21fb

decoder progress

bb2bafb

combine decoder functions. write data into tensor

9f952ac

convert video frames to rgb

5e61ae7

decode video frames

a5f9ce9

improve design

43b8049

Merge branch 'facebookresearch:main' into vid_decoding

687eb2d

file names

1cb04e2

Merge branch 'facebookresearch:main' into vid_decoding

03bc600

linker error

530edf9

linker error

05f03aa

everything works

da11a5e

clean up

8ad19e6

Merge branch 'facebookresearch:main' into vid_decoding

e2b1edd

clean up

1aefe54

remove unused library

96c5f6f

facebook-github-bot added the CLA Signed label Dec 2, 2023

am831 added 10 commits December 2, 2023 11:04

fix dtype

dddc3a1

class for libswscale resources

914d1d5

clang tidy

6e42f36

reformat

e790178

unit test

c1fbd06

unit test

483c7d1

probe format

d1f6d17

transform class

e06f7f4

more options in video_decoder_options

b20694e

more options for video_decoder_options

ebada55

am831 marked this pull request as ready for review December 5, 2023 19:56

am831 requested a review from cbalioglu as a code owner December 5, 2023 19:56

am831 added 5 commits December 5, 2023 17:03

clean up

13111ab

clean up

33cae96

Merge branch 'main' of https://github.com/facebookresearch/fairseq2 into vid_decoding

155f3b1

fix cmake

e07fd1f

clang tidy

94570e9
