Configuration Reference
Note
It is possible to configure Dask inline with dot notation, with YAML, or via environment variables. See the conversion utility for converting the following dot notation to other forms.
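As a quick illustration of those three forms, the sketch below sets `distributed.scheduler.work-stealing` (documented later on this page) inline from Python; the equivalent YAML and environment-variable spellings appear in the comments. The file location and environment-variable naming follow Dask's usual conventions, but verify them against your installation.

```python
import dask

# Inline, using dot notation:
dask.config.set({"distributed.scheduler.work-stealing": False})

# Equivalent YAML, e.g. in ~/.config/dask/distributed.yaml:
#
#   distributed:
#     scheduler:
#       work-stealing: false
#
# Equivalent environment variable (nested keys join with "__",
# hyphens become underscores), set before the process starts:
#
#   export DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING="False"

print(dask.config.get("distributed.scheduler.work-stealing"))  # -> False
```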
Dask
- dask.temporary-directory (default: None): Temporary directory for local disk storage, e.g. /tmp, /scratch, or /local. This directory is used during Dask spill-to-disk operations. When the value is "null" (the default), Dask creates a directory under the location from which it was launched: `cwd/dask-worker-space` (see the example at the end of this section).
- dask.dataframe.shuffle-compression (default: None): Compression algorithm used for on-disk shuffling. Partd, the library used for compression, supports ZLib, BZ2, Snappy, and Blosc.
- dask.array.svg.size (default: 120): The size, in pixels, used when displaying a Dask array as an SVG image. This is used, for example, for nice rendering in a Jupyter notebook.
- dask.optimization.fuse.active (default: True): Turn task fusion on or off.
- dask.optimization.fuse.ave-width (default: 1): Upper limit for width, where width = num_nodes / height, a good measure of parallelizability.
- dask.optimization.fuse.max-width (default: None): Don't fuse if total width is greater than this. Set to null to dynamically adjust to 1.5 + ave_width * log(ave_width + 1).
- dask.optimization.fuse.max-height (default: inf): Don't fuse more than this many levels.
- dask.optimization.fuse.max-depth-new-edges (default: None): Don't fuse if new dependencies are added after this many levels. Set to null to dynamically adjust to ave_width * 1.5.
- dask.optimization.fuse.subgraphs (default: None): Set to True to fuse multiple tasks into SubgraphCallable objects. Set to None to let the default optimizer of individual Dask collections decide. If no collection-specific default exists, None defaults to False.
- dask.optimization.fuse.rename-keys (default: True): Set to True to rename the fused keys with `default_fused_keys_renamer`. Renaming fused keys can make the graph easier to understand, but it comes at the cost of additional processing. If False, the top-most key will be used. For advanced usage, a function to create the new name is also accepted.
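A minimal sketch of setting the options in this section from Python. Note that in current Dask releases these keys live at the top level of the configuration, without the leading ``dask.`` used in this listing; the path and values below are only illustrative.

```python
import dask

dask.config.set({
    # Spill-to-disk files go here instead of cwd/dask-worker-space
    # (the /scratch path is an example, not a recommendation).
    "temporary-directory": "/scratch/dask-tmp",
    # Allow wider task fusion than the default ave-width of 1
    # and keep readable names for the fused keys.
    "optimization.fuse.ave-width": 4,
    "optimization.fuse.rename-keys": True,
})
```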
Distributed
Client
- distributed.client.heartbeat (default: 5s): The time between heartbeats. The client sends a periodic heartbeat message to the scheduler; if the scheduler misses enough of these, it assumes that the client has gone (see the example below).
- distributed.client.scheduler-info-interval (default: 2s): Interval between scheduler-info updates.
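For example, both intervals above can be overridden for a single client by wrapping its construction in a `dask.config.set` context manager. A sketch, assuming a local cluster is acceptable and that the values are read when the `Client` is created:

```python
import dask
from distributed import Client

with dask.config.set({
    "distributed.client.heartbeat": "10s",
    "distributed.client.scheduler-info-interval": "5s",
}):
    client = Client()  # starts a local cluster; picks up the values at construction
```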
Comm
- distributed.comm.retry.count (default: 0): The number of times to retry a connection.
- distributed.comm.retry.delay.min (default: 1s): The first non-zero delay between retry attempts.
- distributed.comm.retry.delay.max (default: 20s): The maximum delay between retries.
- distributed.comm.compression (default: auto): The compression algorithm to use. This could be one of lz4, snappy, zstd, or blosc.
- distributed.comm.offload (default: 10MiB): The message size above which we choose to offload serialization to another thread. In some cases, you may also choose to disable this altogether with the value false. This is useful if you want to include serialization in profiling data, or if you have data types that are particularly sensitive to deserialization.
- distributed.comm.default-scheme (default: tcp): The default protocol to use, like tcp or tls.
- distributed.comm.socket-backlog (default: 2048): When shuffling data between workers, there can be O(cluster size) connection requests on a single worker socket; make sure the backlog is large enough not to lose any.
- distributed.comm.recent-messages-log-length (default: 0): Number of messages to keep for debugging.
- distributed.comm.zstd.level (default: 3): Compression level, between 1 and 22.
- distributed.comm.zstd.threads (default: 0): Number of threads to use. 0 for single-threaded, -1 to infer from the CPU count.
- distributed.comm.timeouts.connect (default: 10s): No Comment.
- distributed.comm.timeouts.tcp (default: 30s): No Comment.
- distributed.comm.require-encryption (default: None): Whether to require encryption on non-local comms.
- distributed.comm.tls.ciphers (default: None): No Comment.
- distributed.comm.tls.ca-file (default: None): Path to a CA file, in PEM format (see the example at the end of this section).
- distributed.comm.tls.scheduler.cert (default: None): Path to certificate file.
- distributed.comm.tls.scheduler.key (default: None): Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank.
- distributed.comm.tls.worker.key (default: None): Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank.
- distributed.comm.tls.worker.cert (default: None): Path to certificate file.
- distributed.comm.tls.client.key (default: None): Path to key file. Alternatively, the key can be appended to the cert file above, and this field left blank.
- distributed.comm.tls.client.cert (default: None): Path to certificate file.
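The sketch below bundles several of the comm options above into one call: a more forgiving connect timeout, a few retries, and TLS certificates for the scheduler, workers, and clients. All file paths are placeholders.

```python
import dask

dask.config.set({
    # Tolerate slower or flakier networks.
    "distributed.comm.timeouts.connect": "30s",
    "distributed.comm.retry.count": 3,
    # Require TLS on non-local traffic; every path below is a placeholder.
    "distributed.comm.require-encryption": True,
    "distributed.comm.tls.ca-file": "/etc/dask/ca.pem",
    "distributed.comm.tls.scheduler.cert": "/etc/dask/scheduler.pem",
    "distributed.comm.tls.scheduler.key": "/etc/dask/scheduler-key.pem",
    "distributed.comm.tls.worker.cert": "/etc/dask/worker.pem",
    "distributed.comm.tls.worker.key": "/etc/dask/worker-key.pem",
    "distributed.comm.tls.client.cert": "/etc/dask/client.pem",
    "distributed.comm.tls.client.key": "/etc/dask/client-key.pem",
})
```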
Dashboard
- distributed.dashboard.link (default: {scheme}://{host}:{port}/status): The template for dashboard links. This is used wherever we print out the link for the dashboard. It is filled in with relevant information like the scheme, host, and port number (see the example below).
- distributed.dashboard.export-tool (default: False): No Comment.
- distributed.dashboard.graph-max-items (default: 5000): Maximum number of tasks to try to plot in the "graph" view.
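A common customization of `distributed.dashboard.link` is to route printed links through a proxy rather than pointing at the scheduler host directly. The path below assumes a jupyter-server-proxy style setup and is only an example template:

```python
import dask

# Any template field such as {scheme}, {host}, or {port} may be used;
# this form drops the host entirely and relies on a proxy at /proxy/<port>/.
dask.config.set({"distributed.dashboard.link": "/proxy/{port}/status"})
```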
Deploy
- distributed.deploy.lost-worker-timeout (default: 15s): Interval after which to hard-close a lost worker job. Before that, we wait for a while to see if the worker will reappear.
- distributed.deploy.cluster-repr-interval (default: 500ms): Interval between calls to update cluster-repr for the widget.
Scheduler
- distributed.scheduler.allowed-failures (default: 3): The number of retries before a task is considered bad. When a worker dies while a task is running, that task is rerun elsewhere. If many workers die while running this same task then we call the task bad and raise a KilledWorker exception. This is the number of workers that are allowed to die before the task is marked as bad.
- distributed.scheduler.bandwidth (default: 100000000): The expected bandwidth between any pair of workers. This is used when making scheduling decisions. The scheduler will use this value as a baseline, but also learn it over time.
- distributed.scheduler.blocked-handlers (default: []): A list of handlers to exclude. The scheduler operates by receiving messages from various workers and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high-security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Scheduler.handlers` attribute (see the example at the end of this section).
- distributed.scheduler.default-data-size (default: 1kiB): The default size of a piece of data if we don't know anything about it. This is used by the scheduler in some scheduling decisions.
- distributed.scheduler.events-cleanup-delay (default: 1h): The amount of time to wait until workers or clients are removed from the event log after they have been removed from the scheduler.
- distributed.scheduler.idle-timeout (default: None): Shut down the scheduler after this duration if no activity has occurred. This can be helpful to reduce costs and stop zombie processes from roaming the earth.
- distributed.scheduler.transition-log-length (default: 100000): How many entries to keep in the transition log. Every time a task transitions states (like "waiting", "processing", "memory", "released") we record that transition in a log. To make sure that we don't run out of memory we clear out old entries after a certain length. This is that length.
- distributed.scheduler.work-stealing (default: True): Whether or not to balance work between workers dynamically. Sometimes one worker has more work than we expected. By default the scheduler will move these tasks around as necessary. Set this to false to disable this behavior.
- distributed.scheduler.work-stealing-interval (default: 100ms): How frequently to balance worker loads.
- distributed.scheduler.worker-ttl (default: None): Time to live for workers. If we don't receive a heartbeat faster than this then we assume that the worker has died.
- distributed.scheduler.pickle (default: True): Is the scheduler allowed to deserialize arbitrary bytestrings? The scheduler almost never deserializes user data. However, there are some cases where the user can submit functions to run directly on the scheduler. This can be convenient for debugging, but also introduces some security risk. By setting this to false we ensure that the user is unable to run arbitrary code on the scheduler.
- distributed.scheduler.preload (default: []): Run custom modules during the lifetime of the scheduler. You can run custom modules when the scheduler starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information, and the example at the end of this section.
- distributed.scheduler.preload-argv (default: []): Arguments to pass into the preload scripts described above. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information.
- distributed.scheduler.unknown-task-duration (default: 500ms): Default duration for all tasks with unknown durations. Over time the scheduler learns a duration for tasks. However, when it sees a new type of task for the first time it has to make a guess as to how long it will take. This value is that guess.
- distributed.scheduler.default-task-durations.rechunk-split (default: 1us): No Comment.
- distributed.scheduler.default-task-durations.shuffle-split (default: 1us): No Comment.
- distributed.scheduler.validate (default: False): Whether or not to run consistency checks during execution. This is typically only used for debugging.
- distributed.scheduler.dashboard.status.task-stream-length (default: 1000): The maximum number of tasks to include in the task stream plot.
- distributed.scheduler.dashboard.tasks.task-stream-length (default: 100000): The maximum number of tasks to include in the task stream plot.
- distributed.scheduler.dashboard.tls.ca-file (default: None): No Comment.
- distributed.scheduler.dashboard.tls.key (default: None): No Comment.
- distributed.scheduler.dashboard.tls.cert (default: None): No Comment.
- distributed.scheduler.dashboard.bokeh-application.allow_websocket_origin (default: ['*']): No Comment.
- distributed.scheduler.dashboard.bokeh-application.keep_alive_milliseconds (default: 500): No Comment.
- distributed.scheduler.dashboard.bokeh-application.check_unused_sessions_milliseconds (default: 500): No Comment.
- distributed.scheduler.locks.lease-validation-interval (default: 10s): The time to wait until an acquired semaphore is released if the Client goes out of scope.
- distributed.scheduler.locks.lease-timeout (default: 30s): The timeout after which a lease will be released if not refreshed.
- distributed.scheduler.http.routes (default: ['distributed.http.scheduler.prometheus', 'distributed.http.scheduler.info', 'distributed.http.scheduler.json', 'distributed.http.health', 'distributed.http.proxy', 'distributed.http.statics']): A list of modules like "prometheus" and "health" that can be included or excluded as desired. These modules will have a ``routes`` keyword that gets added to the main HTTP server. This list can be extended with user-defined modules.
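As referenced in the preload and blocked-handlers entries above, here is a minimal sketch. The module `myproject.scheduler_preload` is hypothetical, and the blocked handler name is illustrative; consult the `dask.distributed.Scheduler.handlers` attribute for the real names.

```python
# myproject/scheduler_preload.py: a hypothetical preload module
def dask_setup(scheduler):
    # Runs once, when the scheduler starts up.
    print("scheduler started at", scheduler.address)

def dask_teardown(scheduler):
    # Runs once, when the scheduler shuts down.
    print("scheduler closing")
```

Registering the module and blocking a handler could then look like:

```python
import dask

dask.config.set({
    "distributed.scheduler.preload": ["myproject.scheduler_preload"],
    # "run_function" is only an example of a handler name to block.
    "distributed.scheduler.blocked-handlers": ["run_function"],
})
```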
Worker
- distributed.worker.blocked-handlers (default: []): A list of handlers to exclude. The worker operates by receiving messages from the scheduler, other workers, and clients and then performing operations based on those messages. Each message has an operation like "close-worker" or "task-finished". In some high-security situations administrators may choose to block certain handlers from running. Those handlers can be listed here. For a list of handlers see the `dask.distributed.Worker.handlers` attribute.
- distributed.worker.multiprocessing-method (default: spawn): How we create new workers, one of "spawn", "forkserver", or "fork". This is passed to the ``multiprocessing.get_context`` function.
- distributed.worker.use-file-locking (default: True): Whether or not to use lock files when creating workers. Workers create a local directory in which to place temporary files. When many workers are created on the same machine at once, they can conflict with each other by trying to create this directory at the same time. To avoid this, Dask usually uses a file-based lock. However, on some systems file-based locks don't work. This is particularly common on HPC NFS systems, where users may want to set this to false.
- distributed.worker.connections.outgoing (default: 50): No Comment.
- distributed.worker.connections.incoming (default: 10): No Comment.
- distributed.worker.preload (default: []): Run custom modules during the lifetime of the worker. You can run custom modules when the worker starts up and closes down. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information.
- distributed.worker.preload-argv (default: []): Arguments to pass into the preload scripts described above. See https://docs.dask.org/en/latest/setup/custom-startup.html for more information.
- distributed.worker.daemon (default: True): Whether or not to run our process as a daemon process.
- distributed.worker.validate (default: False): Whether or not to run consistency checks during execution. This is typically only used for debugging.
- distributed.worker.lifetime.duration (default: None): The time after creation to close the worker, like "1 hour".
- distributed.worker.lifetime.stagger (default: 0 seconds): Random amount by which to stagger lifetimes. If you create many workers at the same time, you may want to avoid having them kill themselves all at the same time. To avoid this you might want to set a stagger time, so that they close themselves with some random variation, like "5 minutes". That way some workers can die, new ones can be brought up, and data can be transferred over smoothly (see the example at the end of this section).
- distributed.worker.lifetime.restart (default: False): Do we try to resurrect the worker after the lifetime deadline?
- distributed.worker.profile.interval (default: 10ms): The time between polling the worker threads, typically short, like 10ms.
- distributed.worker.profile.cycle (default: 1000ms): The time between bundling together this data and sending it to the scheduler. This controls the granularity at which people can query the profile information on the time axis.
- distributed.worker.profile.low-level (default: False): Whether or not to use the libunwind and stacktrace libraries to gather profiling information at a lower level (beneath Python). To get this to work you will need to install the experimental stacktrace library with ``conda install -c numba stacktrace``. See https://github.com/numba/stacktrace.
- distributed.worker.memory.target (default: 0.6): Target fraction below which to try to keep memory (see the example at the end of this section).
- distributed.worker.memory.spill (default: 0.7): When the process memory (as observed by the operating system) gets above this amount we spill data to disk.
- distributed.worker.memory.pause (default: 0.8): When the process memory (as observed by the operating system) gets above this amount we no longer start new tasks on this worker.
- distributed.worker.memory.terminate (default: 0.95): When the process memory reaches this level the nanny process will kill the worker (if a nanny is present).
- distributed.worker.http.routes (default: ['distributed.http.worker.prometheus', 'distributed.http.health', 'distributed.http.statics']): A list of modules like "prometheus" and "health" that can be included or excluded as desired. These modules will have a ``routes`` keyword that gets added to the main HTTP server. This list can be extended with user-defined modules.
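Tying together the lifetime and memory entries above, the sketch below configures workers that retire themselves after roughly an hour and spill, pause, and terminate at the listed fractions of their memory limit. The arithmetic in the comments assumes a 16 GiB memory limit purely for illustration.

```python
import dask

dask.config.set({
    # Close each worker after ~1 hour, staggered by up to 5 minutes,
    # and (when a nanny is present) start a replacement afterwards.
    "distributed.worker.lifetime.duration": "1 hour",
    "distributed.worker.lifetime.stagger": "5 minutes",
    "distributed.worker.lifetime.restart": True,
    # Memory thresholds are fractions of the worker memory limit.
    # With a 16 GiB limit: spill above ~11.2 GiB (0.7), stop starting
    # new tasks above ~12.8 GiB (0.8), terminate above ~15.2 GiB (0.95).
    "distributed.worker.memory.target": 0.6,
    "distributed.worker.memory.spill": 0.7,
    "distributed.worker.memory.pause": 0.8,
    "distributed.worker.memory.terminate": 0.95,
})
```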
Admin
- distributed.admin.tick.interval (default: 20ms): The time between ticks.
- distributed.admin.tick.limit (default: 3s): The time allowed before triggering a warning.
- distributed.admin.max-error-length (default: 10000): Maximum length of a traceback as text. Some Python tracebacks can be very long (particularly in stack overflow errors). If the traceback is larger than this size (in bytes) then we truncate it.
- distributed.admin.log-length (default: 10000): Default length of logs to keep in memory. The scheduler and workers keep the last 10000 or so log entries in memory.
- distributed.admin.log-format (default: %(name)s - %(levelname)s - %(message)s): The log format to emit (see the environment-variable example below). For available fields, see https://docs.python.org/3/library/logging.html#logrecord-attributes.
- distributed.admin.pdb-on-err (default: False): Enter the Python debugger on a scheduling error.
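Like every option on this page, the admin settings can also be supplied through environment variables before the scheduler or workers start: nested keys are joined with double underscores and hyphens map to underscores. A sketch, using Python only to demonstrate the naming; normally you would export these in your shell or job script.

```python
import os
import dask

# Must be set before the Dask process reads its configuration.
os.environ["DASK_DISTRIBUTED__ADMIN__TICK__LIMIT"] = "5s"
os.environ["DASK_DISTRIBUTED__ADMIN__LOG_FORMAT"] = "%(asctime)s %(levelname)s %(message)s"

dask.config.refresh()  # re-collect configuration, including environment variables
print(dask.config.get("distributed.admin.tick.limit"))  # -> "5s"
```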