Multi-Threaded Search in CP-SAT
How parallel workers improve scheduling results
Not data parallelism
A common misconception: when you set num_workers = 8 in CP-SAT, the solver does NOT split your scheduling problem into 8 pieces. It does not assign employees 1-37 to worker 1 and employees 38-75 to worker 2. That would be data parallelism, and it does not work for constraint programming.
Instead, CP-SAT uses portfolio-based parallelism. Each worker runs a different search strategy on the full problem. All 8 workers see every employee, every shift, every constraint. What differs is how they explore the solution space. They share their best solutions and bounds in real time, so every discovery by one worker benefits all the others.
This distinction matters because it explains why adding more workers improves solution quality, not just speed. More workers means more diverse search strategies running simultaneously, which means a higher chance of finding better solutions.
Search strategies
CP-SAT runs multiple search strategies simultaneously, each one suited to different aspects of the problem:
Default search (branch-and-bound)
The classic approach. Pick a variable, try a value, propagate constraints, backtrack if it fails. CP-SAT uses domain reduction to prune impossible values before branching. This strategy is methodical and thorough but can be slow on large problems.
Large Neighborhood Search (LNS)
Start from the current best solution, destroy a random subset of assignments (e.g. unassign 20% of employee-day pairs), then repair the destroyed part while keeping the rest fixed. LNS is excellent at improving good solutions incrementally. It often finds the final best solution in scheduling problems.
Linear relaxation
Relax the Boolean variables to continuous values (allow 0.5 instead of just 0 or 1), then solve the resulting linear program. The LP solution provides a lower bound on the objective. This bound tells all other workers: "no solution can score below this value," which prunes their search trees.
Core-based search
Find minimal unsatisfiable cores: small subsets of soft constraints that cannot all be satisfied simultaneously. This strategy identifies which trade-offs are unavoidable, helping the solver focus on realistic improvements rather than chasing impossible targets.
Fixed search
Follow a fixed variable ordering, typically based on the problem structure. For scheduling, this might mean assigning shifts day by day or employee by employee. Predictable but fast, and sometimes finds good solutions that randomized strategies miss.
Random search with restarts
Make random branching decisions, but restart frequently. The randomness provides diversity, and restarts prevent the search from getting stuck in unproductive regions. Combined with learned clauses from previous restarts, this strategy can be surprisingly effective.
When one worker finds a better solution, all others update their bounds immediately. If the LNS worker improves the score from 95,000 to 88,000, the branch-and-bound worker can now prune any subtree that would score above 88,000.
How workers collaborate
The key mechanism is a shared solution pool. All workers read from and write to the same pool. The collaboration works like this:
Suppose worker 1 (branch-and-bound) finds an initial solution with score 120,000. This solution is immediately visible to all other workers. Worker 3 (LNS) takes this solution, destroys and repairs parts of it, and finds a score of 95,000. Now all workers know the target: beat 95,000.
Worker 2 (linear relaxation) computes a lower bound of 82,000. This means no solution can possibly score below 82,000. The gap between 95,000 (best found) and 82,000 (theoretical best) tells the solver how much room for improvement exists.
Worker 4 (LNS with a different destruction pattern) finds 88,000. Worker 5 (core-based) identifies that 3,000 points of penalty are unavoidable due to conflicting constraints. The lower bound tightens to 85,000. The gap narrows from 13,000 to 3,000.
This convergence happens without any explicit coordination between workers. Each worker simply does its own thing, but every improvement propagates instantly to all others. The effect is multiplicative: each worker makes all other workers more efficient by narrowing the search space.
Impact of num_workers
Here is what happens when you vary num_workers on a real scheduling problem (200 employees, 30 days, 50 shift types, 5-minute time limit):
| Workers | Solve time | Best score | Improvement vs 1 worker |
|---|---|---|---|
| 1 | 30+ minutes | 125,000 | baseline |
| 2 | 18 minutes | 110,000 | 12% better |
| 4 | 12 minutes | 98,000 | 22% better |
| 8 | 8 minutes | 92,000 | 26% better |
| 16 | 7 minutes | 90,000 | 28% better |
Two things stand out. First, the improvement in solution quality is substantial: 8 workers find a score 26% better than 1 worker on the same problem. This is not just about speed. The diversity of search strategies discovers solutions that a single strategy would never find within a reasonable time.
Second, the speed improvement is super-linear up to 4 workers. One worker needs 30+ minutes; 4 workers need 12 minutes. That is a 2.5x speedup from a 4x increase in compute, but the solution is also 22% better. The real value is the combination of faster convergence and higher solution quality.
Benchmarks
Measured across different problem sizes on a standard server (16 cores, 32 GB RAM), with a 10-minute time limit:
| Employees | 1 worker | 4 workers | 8 workers | 16 workers |
|---|---|---|---|---|
| 100 | 8 min / 95k | 3 min / 82k | 2 min / 78k | 1.8 min / 77k |
| 150 | 15 min / 115k | 6 min / 95k | 4 min / 88k | 3.5 min / 85k |
| 200 | 30+ min / 125k | 12 min / 98k | 8 min / 92k | 7 min / 90k |
| 300 | timeout | 25 min / 145k | 15 min / 120k | 12 min / 115k |
Format: solve time / best score (lower score = better schedule). "Timeout" means the solver did not find a satisfactory solution within the time limit with a single worker.
The 300-employee row is particularly telling. A single worker cannot solve this problem in a reasonable time. With 4 workers, it becomes tractable. With 8, it is practical. The gap between 8 and 16 workers is modest, suggesting that 8 workers capture most of the benefit for this problem size.
Diminishing returns
The relationship between worker count and performance follows a curve of diminishing returns:
- 1 to 4 workers: dramatic improvement. Each new worker adds a fundamentally different search strategy. The LP relaxation worker provides bounds that the branch-and-bound worker could not compute alone. The LNS worker improves solutions that the default search found. This is where you get the biggest gains.
- 4 to 8 workers: strong improvement. Additional workers run variations of the core strategies (LNS with different destruction sizes, random search with different restart policies). The search becomes more robust and less likely to get stuck.
- 8 to 16 workers: moderate improvement. The additional workers mostly run more variations of existing strategies. You gain 10-20% in solution quality and solve time, which matters for large problems but is not transformative.
- Beyond 16: marginal improvement. CP-SAT has a finite number of distinct strategies. Beyond 16 workers, you are mostly running duplicates with slightly different random seeds. The shared solution pool becomes the bottleneck as workers spend more time synchronizing than searching.
The diversity of search strategies matters more than raw parallelism. Eight workers running eight different strategies will outperform sixteen workers running eight strategies twice.
Memory considerations
Each worker maintains its own search tree, propagation data structures, and clause database. Memory usage scales roughly linearly with the number of workers.
| Model size | 1 worker | 4 workers | 8 workers | 16 workers |
|---|---|---|---|---|
| 100 employees | 200-400 MB | 500 MB - 1 GB | 1-2 GB | 2-3 GB |
| 200 employees | 400-800 MB | 1-2 GB | 2-4 GB | 4-6 GB |
| 300 employees | 600 MB - 1 GB | 1.5-3 GB | 2-4 GB | 4-8 GB |
For a dedicated scheduling server running a 300-employee model with 8 workers, plan for 4 GB of RAM minimum, 8 GB recommended. With 16 workers, 8 GB minimum, 16 GB recommended. The operating system and other processes need memory too.
Memory usage also depends on solve time. Longer runs accumulate more learned clauses, which consume memory. If you notice memory growing over time, you can set a clause database limit or reduce the time limit.
In practice, 16 GB of RAM is sufficient for any scheduling problem under 300 employees with 8 workers. This fits comfortably on a standard on-premise server.
Practical configuration
The recommended setup for a production scheduling server:
from ortools.sat.python import cp_model
solver = cp_model.CpSolver()
solver.parameters.num_workers = 8
solver.parameters.max_time_in_seconds = 300
solver.parameters.log_search_progress = True
Choosing num_workers
Set num_workers to the number of physical CPU cores on your server. Not logical cores (hyperthreading) but physical cores. CP-SAT workers are CPU-bound, and hyperthreading provides minimal benefit for constraint solving.
For a dedicated scheduling server, 8 cores is the sweet spot. It captures most of the parallelism benefit (the jump from 4 to 8 is still significant), while keeping hardware costs reasonable. Going to 16 cores gives 10-20% improvement at roughly double the cost.
Monitoring search progress
Enable log_search_progress = True to see which workers find solutions and when. The log output shows lines like:
#1 0.5s best:120000 next:[95000,120000] lns
#3 1.2s best:95000 next:[88000,95000] default
#5 2.8s best:88000 next:[85000,88000] core
This tells you which search strategies are productive. If LNS finds most improvements, the problem benefits from local search. If the default search dominates, the problem has good structure that branch-and-bound can exploit. This information helps you tune time limits and understand your problem.
Time limit
With 8 workers, typical time limits by problem size:
- 100 employees: 2-3 minutes is usually sufficient
- 150 employees: 5 minutes
- 200 employees: 10 minutes
- 300 employees: 15-20 minutes
CP-SAT's anytime behaviour means you always get the best solution found so far, even if you stop early. A shorter time limit gives a slightly worse score, not an empty result. This makes it safe to set aggressive time limits during testing and longer ones for production runs.
Want to benchmark CP-SAT on your scheduling data?
Send us your employee list and shift definitions. We will run multi-threaded benchmarks and show you the results.
Contact us