Shard-Only Mode

SwarmLLM supports shard-only operation — a node only needs individual shard files (~512 MB each) plus a small GGUF header (~6 MB), not the full model file.

How It Works

A model directory in shard-only mode:

~/.local/share/swarmllm/models/qwen2.5-coder-7b/
├── manifest.json        # Model metadata + shard layout
├── gguf_header.bin      # First ~6MB of GGUF (metadata + tensor index)
├── shard_000.bin        # 512MB shard
├── shard_001.bin
├── shard_002.bin
└── ...

SwarmLLM automatically extracts gguf_header.bin from shard_000.bin when first needed. The ShardReader constructs a virtual GGUF from header + shard files, so the model parser works exactly as if the full GGUF were present.
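One way to picture the virtual GGUF is as a lookup from a byte offset in the original file to a position inside one shard. The sketch below assumes that shards are contiguous, fixed-size slices of the original GGUF (which would explain why the header can be extracted from shard_000.bin). The names `locate` and `spans` and the exact 512 MiB size are illustrative assumptions, not the actual ShardReader API.

```python
SHARD_SIZE = 512 * 1024 * 1024  # assumed shard size in bytes (512 MiB)


def locate(offset: int, shard_size: int = SHARD_SIZE) -> tuple[str, int]:
    """Map a byte offset in the full GGUF to (shard file name, offset inside it)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, inner = divmod(offset, shard_size)
    return f"shard_{index:03d}.bin", inner


def spans(offset: int, length: int, shard_size: int = SHARD_SIZE) -> list[tuple[str, int, int]]:
    """Split a read of `length` bytes into per-shard (file, offset, length) pieces."""
    pieces = []
    while length > 0:
        name, inner = locate(offset, shard_size)
        # Never read past the end of the current shard.
        take = min(length, shard_size - inner)
        pieces.append((name, inner, take))
        offset += take
        length -= take
    return pieces
```

Under these assumptions, a tensor whose bytes straddle a shard boundary is read as two pieces, one from the tail of one shard and one from the head of the next.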

Why This Matters

A 7B model is ~4.5 GB as a full GGUF, but a single shard is only ~512 MB
Nodes only load the layers they're assigned — no wasted disk or VRAM
You can participate in inference for a 70B model on a machine with 8 GB VRAM by hosting just a few shards

Manual Shard Assignment (--shards)

For multi-node split inference, assign each node a subset of shards:

./swarmllm run --shards "0-3"    # This node handles shards 0, 1, 2, 3

The range is persisted to the database and restored on subsequent runs. To clear it, start the node without --shards.
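To make the range syntax concrete, here is a minimal sketch of how an inclusive "START-END" value such as "0-3" could be expanded into shard indices. The function name `parse_shard_range` is hypothetical, and only the "START-END" form shown above is handled; whether SwarmLLM accepts other forms (single indices, comma-separated lists) is not stated here.

```python
def parse_shard_range(spec: str) -> list[int]:
    """Expand an inclusive range such as "0-3" into [0, 1, 2, 3]."""
    start_text, sep, end_text = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    # int() raises ValueError on empty or non-numeric parts.
    start, end = int(start_text), int(end_text)
    if start < 0 or end < start:
        raise ValueError(f"invalid shard range {spec!r}")
    return list(range(start, end + 1))
```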

Behavior when --shards is set:

The node only advertises the specified shard indices
Auto-manage prioritizes downloading missing shards in the range (100x scoring bonus)
Smart pruning never removes shards in the configured range
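The download and pruning rules above can be sketched as two small functions. This is an illustration under assumptions: the base score is a hypothetical input (how SwarmLLM computes it is not described here), the 100x bonus is treated as a plain multiplier, and the names `download_priority` and `prunable` are invented for the example.

```python
RANGE_BONUS = 100.0  # the 100x scoring bonus, modeled here as a multiplier


def download_priority(base_score: float, shard: int, assigned: set[int]) -> float:
    """Boost the score of a missing shard if it falls in the configured range."""
    return base_score * RANGE_BONUS if shard in assigned else base_score


def prunable(local: set[int], assigned: set[int]) -> list[int]:
    """Return local shards that pruning may consider; assigned shards are excluded."""
    return sorted(local - assigned)
```

With `--shards "0-3"`, a node holding shards 0, 1, 4, and 6 would only ever consider 4 and 6 for pruning, and missing shards 2 and 3 would outrank any out-of-range shard with a comparable base score.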

Multi-Node Example

Run a 7B model across two machines:

# Machine A (shards 0-3, layers 0-13); bootstraps from Machine B's address and peer ID:
./swarmllm run --shards "0-3" --bootstrap "/ip4/MACHINE_B_IP/udp/8800/quic-v1/p2p/MACHINE_B_PEER_ID"

# Machine B (shards 4-7, layers 14-27); bootstraps from Machine A's address and peer ID:
./swarmllm run --shards "4-7" --bootstrap "/ip4/MACHINE_A_IP/udp/8800/quic-v1/p2p/MACHINE_A_PEER_ID"

The nodes discover each other, and the InferenceRouter automatically assembles a distributed pipeline that forwards hidden-state activations between them.
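As a rough illustration of what pipeline assembly involves, the sketch below orders nodes by the first layer they serve and checks that together they cover every layer exactly once. It is not the InferenceRouter's actual algorithm; the function name, the input shape (node name mapped to an inclusive layer range), and the error handling are all assumptions made for the example.

```python
def assemble_pipeline(stages: dict[str, tuple[int, int]], total_layers: int) -> list[str]:
    """Return node names in execution order, or raise if layer coverage has gaps or overlaps."""
    ordered = sorted(stages.items(), key=lambda item: item[1][0])
    expected = 0  # next layer index that must be served
    for name, (first, last) in ordered:
        if first != expected or last < first:
            raise ValueError(f"coverage breaks at {name}: expected layer {expected}, got {first}")
        expected = last + 1
    if expected != total_layers:
        raise ValueError(f"layers {expected}-{total_layers - 1} are not served by any node")
    return [name for name, _ in ordered]
```

For the two-machine example, `assemble_pipeline({"B": (14, 27), "A": (0, 13)}, 28)` yields the order A then B, so activations flow from Machine A's last layer into Machine B's first.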

Without --shards

If you don't specify --shards, the node auto-detects and advertises all local shards. This is the normal mode for most users — --shards is only needed when you want explicit control over which layers a node handles.
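Auto-detection can be thought of as a scan of the model directory for files that follow the shard_NNN.bin naming shown in the directory listing. The sketch below is a minimal version of that idea; the function name `local_shards` is hypothetical, and the real detection may also consult manifest.json or validate shard contents.

```python
import re
from pathlib import Path

# Matches the shard naming from the directory listing, e.g. shard_000.bin.
SHARD_NAME = re.compile(r"shard_(\d{3})\.bin")


def local_shards(model_dir: Path) -> list[int]:
    """Return the sorted indices of shard files present in model_dir."""
    found = []
    for entry in model_dir.iterdir():
        match = SHARD_NAME.fullmatch(entry.name)
        if match and entry.is_file():
            found.append(int(match.group(1)))
    return sorted(found)
```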