thunder.distributed

ddp(model, *[, broadcast_from, ...])

Thunder's Distributed Data Parallel.

fsdp(model, *[, device, broadcast_from, ...])

Convert model into Fully Sharded Data Parallel.

FSDPType(value)

Specifies the sharding strategy to be used for FSDP in Thunder.

FSDPBucketingStrategy(value)

Specify how we group parameters into a bucket for collective communication in fsdp.

set_skip_data_parallel_grad_sync(value)

Set whether to skip data parallel grad sync.

reset_skip_data_parallel_grad_sync(token)

Reset whether to skip data parallel grad sync.

get_skip_data_parallel_grad_sync()

Get whether to skip data parallel grad sync.

skip_data_parallel_grad_sync()

A context manager to skip data parallel grad sync.