Logo Citus Docs
9.1

Get Started

  • What is Citus?
    • How Far Can Citus Scale?
  • When to Use Citus
    • Multi-Tenant Database
    • Real-Time Analytics
    • Considerations for Use
    • When Citus is Inappropriate
  • Quick Tutorials
    • Multi-tenant Applications
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries
    • Real-time Analytics
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries

Install

  • Single-Machine Cluster
    • Docker (Mac or Linux)
    • Ubuntu or Debian
    • Fedora, CentOS, or Red Hat
  • Multi-Machine Cluster
    • Ubuntu or Debian
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • Fedora, CentOS, or Red Hat
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
  • Managed Deployment

Use-Case Guides

  • Multi-tenant Applications
    • Let’s Make an App – Ad Analytics
    • Scaling the Relational Data Model
    • Preparing Tables and Ingesting Data
      • Try it Yourself
    • Integrating Applications
    • Sharing Data Between Tenants
    • Online Changes to the Schema
    • When Data Differs Across Tenants
    • Scaling Hardware Resources
    • Dealing with Big Tenants
    • Where to Go From Here
  • Real-Time Dashboards
    • Data Model
    • Rollups
    • Expiring Old Data
    • Approximate Distinct Counts
    • Unstructured Data with JSONB
  • Timeseries Data
    • Scaling Timeseries Data on Citus
    • Automating Partition Creation

Architecture

  • Concepts
    • Nodes
      • Coordinator and Workers
    • Distributed Data
      • Table Types
        • Type 1: Distributed Tables
        • Type 2: Reference Tables
        • Type 3: Local Tables
      • Shards
        • Shard Placements
      • Co-Location
      • Parallelism
  • Citus MX
    • Data Access
    • Scaling Out a Raw Events Table
    • MX Limitations
      • Supported only via coordinator
      • Other query limitations

Develop

  • Determining Application Type
    • At a Glance
    • Examples and Characteristics
  • Choosing Distribution Column
    • Multi-Tenant Apps
      • Best Practices
    • Real-Time Apps
      • Best Practices
    • Timeseries Data
      • Best Practices
    • Table Co-Location
      • Data co-location in Citus for hash-distributed tables
      • A practical example of co-location
      • Using Regular PostgreSQL Tables
      • Distributing tables by ID
      • Distributing tables by tenant
      • Co-location means better feature support
      • Query Performance
  • Migrating an Existing App
    • Identify Distribution Strategy
      • Pick distribution key
      • Identify types of tables
    • Prepare Source Tables for Migration
      • Add distribution keys
      • Backfill newly created columns
    • Prepare Application for Citus
      • Set up Development Citus Cluster
        • Include distribution column in keys
      • Add distribution key to queries
        • Ruby on Rails
        • Django
        • ASP.NET
        • Java Hibernate
        • Other (SQL Principles)
      • Enable Secure Connections
      • Check for cross-node traffic
    • Migrate Production Data
      • Small Database Migration
      • Big Database Migration
        • Duplicate schema
        • Enable logical replication
        • Open access for network connection
        • Begin Replication
        • Switch over to Citus and stop all connections to old database
  • SQL Reference
    • Creating and Modifying Distributed Tables (DDL)
      • Creating And Distributing Tables
        • Reference Tables
        • Distributing Coordinator Data
      • Co-Locating Tables
        • Upgrading from Citus 5.x
      • Dropping Tables
      • Modifying Tables
        • Adding/Modifying Columns
        • Adding/Removing Constraints
        • Using NOT VALID Constraints
        • Adding/Removing Indices
        • Manual Modification
    • Ingesting, Modifying Data (DML)
      • Inserting Data
        • “From Select” Clause (Distributed Rollups)
        • COPY Command (Bulk load)
    • Caching Aggregations with Rollups
      • Updates and Deletion
      • Maximizing Write Performance
    • Querying Distributed Tables (SQL)
      • Aggregate Functions
        • Count (Distinct) Aggregates
        • Estimating Top N Items
      • Limit Pushdown
      • Views on Distributed Tables
      • Joins
        • Co-located joins
        • Reference table joins
        • Repartition joins
    • Query Processing
      • Distributed Query Planner
      • Distributed Query Executor
        • Adaptive Executor
        • Task Tracker Executor
        • Subquery/CTE Push-Pull Execution
      • PostgreSQL planner and executor
    • Manual Query Propagation
      • Running on all Workers
      • Running on all Shards
      • Running on all Placements
      • Limitations
    • SQL Support and Workarounds
      • Workarounds
        • JOIN a local and a distributed table
        • Temp Tables: the Workaround of Last Resort
  • Citus API
    • Citus Utility Functions
      • Table and Shard DDL
        • create_distributed_table
        • create_reference_table
        • upgrade_to_reference_table
        • mark_tables_colocated
        • create_distributed_function
        • master_create_distributed_table
        • master_create_worker_shards
        • master_create_empty_shard
      • Table and Shard DML
        • master_append_table_to_shard
        • master_apply_delete_command
        • master_modify_multiple_shards
      • Metadata / Configuration Information
        • master_add_node
        • master_update_node
        • master_set_node_property
        • master_add_inactive_node
        • master_activate_node
        • master_disable_node
        • master_add_secondary_node
        • master_remove_node
        • master_get_active_worker_nodes
        • master_get_table_metadata
        • get_shard_id_for_distribution_column
        • column_to_column_name
        • citus_relation_size
        • citus_table_size
        • citus_total_relation_size
        • citus_stat_statements_reset
      • Cluster Management And Repair Functions
        • master_copy_shard_placement
        • master_move_shard_placement
        • rebalance_table_shards
        • get_rebalance_progress
        • master_drain_node
        • replicate_table_shards
        • isolate_tenant_to_new_shard
        • citus_create_restore_point
    • Citus Tables and Views
      • Coordinator Metadata
        • Partition table
        • Shard table
        • Shard placement table
        • Worker node table
        • Distributed object table
        • Co-location group table
        • Query statistics table
        • Distributed Query Activity
      • Tables on all Nodes
        • Connection Credentials Table
        • Connection Pooling Credentials
        • Shards and Indices on MX Workers
    • Configuration Reference
      • General configuration
        • citus.max_worker_nodes_tracked (integer)
        • citus.use_secondary_nodes (enum)
        • citus.cluster_name (text)
        • citus.enable_version_checks (boolean)
        • citus.log_distributed_deadlock_detection (boolean)
        • citus.distributed_deadlock_detection_factor (floating point)
        • citus.node_conninfo (text)
        • citus.override_table_visibility (boolean)
      • Query Statistics
        • citus.stat_statements_purge_interval (integer)
        • citus.stat_statements_max (integer)
      • Data Loading
        • citus.multi_shard_commit_protocol (enum)
        • citus.shard_replication_factor (integer)
        • citus.shard_count (integer)
        • citus.shard_max_size (integer)
      • Planner Configuration
        • citus.limit_clause_row_fetch_count (integer)
        • citus.count_distinct_error_rate (floating point)
        • citus.task_assignment_policy (enum)
      • Intermediate Data Transfer
        • citus.binary_worker_copy_format (boolean)
        • citus.binary_master_copy_format (boolean)
        • citus.max_intermediate_result_size (integer)
      • DDL
        • citus.enable_ddl_propagation (boolean)
      • Executor Configuration
        • General
        • Adaptive executor configuration
        • Task tracker executor configuration
        • Real-time executor configuration (deprecated)
        • Explain output
    • Append Distribution
      • Creating and Distributing Tables
      • Expiring Data
      • Deleting Data
      • Dropping Tables
      • Data Loading
        • Bulk load using \copy
        • Incremental loads by appending to existing shards
        • Increasing data loading performance
      • Scaling Data Ingestion
        • Coordinator Node Bulk Ingestion (100k/s-200k/s)
        • Worker Node Bulk Ingestion (100k/s-1M/s)
        • Pre-processing Data in Citus
  • External Integrations
    • Ingesting Data from Kafka
      • Caveats
    • Ingesting Data from Spark
    • Business Intelligence with Tableau

Administer

  • Cluster Management
    • Choosing Cluster Size
      • Shard Count
        • Multi-Tenant SaaS Use-Case
        • Real-Time Analytics Use-Case
    • Initial Hardware Size
      • Multi-Tenant SaaS Use-Case
      • Real-Time Analytics Use-Case
    • Scaling the cluster
      • Add a worker
      • Rebalance Shards without Downtime
        • How it Works
      • Adding a coordinator
    • Dealing With Node Failures
      • Worker Node Failures
      • Coordinator Node Failures
    • Tenant Isolation
    • Viewing Query Statistics
      • Statistics Expiration
    • Resource Conservation
      • Limiting Long-Running Queries
    • Security
      • Connection Management
      • Increasing Worker Security
      • Row-Level Security
    • PostgreSQL extensions
    • Creating a New Database
  • Table Management
    • Determining Table and Relation Size
    • Vacuuming Distributed Tables
    • Analyzing Distributed Tables
  • Upgrading Citus
    • Upgrading Citus Versions
      • Patch Version Upgrade
      • Major and Minor Version Upgrades
        • Step 1. Update Citus Package
        • Step 2. Apply Update in DB
    • Upgrading PostgreSQL version from 11 to 12
      • For Every Node

Troubleshoot

  • Query Performance Tuning
    • Table Distribution and Shards
    • PostgreSQL tuning
    • Scaling Out Performance
    • Distributed Query Performance Tuning
      • General
      • Subquery/CTE Network Overhead
      • Advanced
        • Task Assignment Policy
        • Intermediate Data Transfer Format
        • Real Time Executor
        • Task Tracker Executor
    • Scaling Out Data Ingestion
      • Real-time Insert and Updates
        • Insert Throughput
        • Update Throughput
        • Insert and Update: Throughput Checklist
        • Insert and Update: Latency
      • Staging Data Temporarily
      • Bulk Copy (250K - 2M/s)
      • Citus MX (50k/s-500k/s)
  • Useful Diagnostic Queries
    • Finding which shard contains data for a specific tenant
    • Finding the distribution column for a table
    • Detecting locks
    • Querying the size of your shards
    • Querying the size of all distributed tables
    • Determining Replication Factor per Table
    • Identifying unused indices
    • Monitoring client connection count
    • Index hit rate
  • Common Error Messages
    • Relation foo is not distributed
      • Resolution
    • Could not receive query results
      • Resolution
    • Canceling the transaction since it was involved in a distributed deadlock
      • Resolution
    • Could not connect to server: Cannot assign requested address
      • Resolution
    • SSL error: certificate verify failed
      • Resolution
    • Could not connect to any active placements
      • Resolution
    • Remaining connection slots are reserved for non-replication superuser connections
      • Resolution
    • PgBouncer cannot connect to server
      • Resolution
    • Unsupported clause type
      • Resolution
    • Cannot open new connections after the first modification command within a transaction
      • Resolution
    • Cannot create uniqueness constraint
      • Resolution
    • Function create_distributed_table does not exist
      • Resolution
    • STABLE functions used in UPDATE queries cannot be called with column references
      • Resolution

FAQ

  • Frequently Asked Questions
    • Can I create primary keys on distributed tables?
    • How do I add nodes to an existing Citus cluster?
    • How does Citus handle failure of a worker node?
    • How does Citus handle failover of the coordinator node?
    • How do I ingest the results of a query into a distributed table?
    • Can I join distributed and non-distributed tables together in the same query?
    • Are there any PostgreSQL features not supported by Citus?
    • How do I choose the shard count when I hash-partition my data?
    • How do I change the shard count for a hash partitioned table?
    • How does citus support count(distinct) queries?
    • In which situations are uniqueness constraints supported on distributed tables?
    • How do I create database roles, functions, extensions etc in a Citus cluster?
    • What if a worker node’s address changes?
    • Which shard contains data for a particular tenant?
    • I forgot the distribution column of a table, how do I find it?
    • Can I distribute a table by multiple keys?
    • Why does pg_relation_size report zero bytes for a distributed table?
    • Why am I seeing an error about max_intermediate_result_size?
    • Can I run Citus on Microsoft Azure?
    • Can I run Citus on Amazon RDS?
    • What is the state of Citus on AWS?
    • Can I create a new DB in a Citus Cloud instance?
    • Can I shard by schema on Citus for multi-tenant applications?
    • How does cstore_fdw work with Citus?
    • What happened to pg_shard?

Citus Cloud

  • Overview
    • Provisioning
      • Configuring Your Plan
      • Supported Regions
    • Connecting
      • Connecting Directly to a Worker
  • Manage
    • Scaling
      • Scaling Up (increasing node size)
      • Scaling Out (adding new nodes)
        • Rebalancing
      • Scaling Connections (pgBouncer)
    • Monitoring
      • Cloud Platform Status
      • Resources Usage
        • Amazon EBS Volume Metrics
        • CPU and Network
        • PostgreSQL Write-Ahead Log
      • Formation Events Feed
      • StatsD external reporting
        • Example: Datadog with statsd
      • VividCortex External Monitoring
    • Security
      • Connecting with SSL
        • Two-Factor Authentication
      • Users and Permissions
        • Granting Privileges in Bulk
        • Granting Access to Other Schemas
      • Encryption at Rest
      • Network Perimeter Controls
        • VPC Peering
        • IP Whitelisting
    • Backup, Availability, and Replication
      • High-Availability (HA) Replication
      • Disaster Recovery (DR)
      • Comparison of HA and DR
        • Disaster Recovery takes a little extra work but gives greater reliability
        • Trade-offs between latency and reliability for backups
        • How High Availability and Disaster Recovery is used for crash recovery
      • Point-in-Time Recovery
    • Formation API
      • Authentication
      • Create Formation
      • List Formations
      • Formation Details
      • Destroy Formation
    • Upgrades
    • Logging
      • What Is Logged
      • Recent Logs
      • External Log Destinations
        • Verified Provider Settings
  • Additional Features
    • Extensions
    • Forking
      • How to Fork a Formation
      • When is it Useful
      • How it Works Internally
    • Followers
    • Custom PostgreSQL Configuration
  • Support and Billing
    • Support
    • Billing
      • Pricing
      • Payments

Articles

  • Related Articles
    • Efficient Rollup Tables with HyperLogLog in Postgres
      • Rollup tables without HLL—using GitHub events data as an example
      • Without HLL, rollup tables have a few limitations
      • HLL to the rescue
      • HLL and rollup tables in action, together
      • What kinds of queries can HLL answer?
      • A rollup table with HLL is worth a thousand rollup tables without HLL
      • Want to learn more about HLL in Postgres?
    • Distributed Distinct Count with HyperLogLog on Postgres
      • What does HLL do behind the curtains?
        • Hash all elements
        • Observe the data for rare patterns
        • Stochastic Averaging
        • More?
      • HLL in distributed systems
      • Hands on with HLL
        • Setup
        • Examples
        • Conclusion
    • Postgres Parallel Indexing in Citus
    • Real-time Event Aggregation at Scale Using Postgres with Citus
    • How Distributed Outer Joins on PostgreSQL with Citus Work
      • Distributed Outer Joins with Citus
    • Designing your SaaS Database for Scale with Postgres
    • Building a Scalable Postgres Metrics Backend using the Citus Extension
      • Time-Series Metrics
      • Events
    • Sharding a Multi-Tenant App with Postgres
      • Tenancy
      • Multi-tenancy and co-location, a perfect pair
      • In conclusion
    • Sharding Postgres with Semi-Structured Data and Its Performance Implications
      • One large table, without joins
      • Enter Citus
      • The query workload
      • Every distribution has its thorns
    • Scalable Real-time Product Search using PostgreSQL with Citus
Citus Docs
  • Docs »
  • SQL Reference

SQL Reference¶

  • Creating and Modifying Distributed Tables (DDL)
    • Creating And Distributing Tables
    • Co-Locating Tables
    • Dropping Tables
    • Modifying Tables
  • Ingesting, Modifying Data (DML)
    • Inserting Data
  • Caching Aggregations with Rollups
    • Updates and Deletion
    • Maximizing Write Performance
  • Querying Distributed Tables (SQL)
    • Aggregate Functions
    • Limit Pushdown
    • Views on Distributed Tables
    • Joins
  • Query Processing
    • Distributed Query Planner
    • Distributed Query Executor
    • PostgreSQL planner and executor
  • Manual Query Propagation
    • Running on all Workers
    • Running on all Shards
    • Running on all Placements
    • Limitations
  • SQL Support and Workarounds
    • Workarounds

This site uses cookies for analytics, personalized content and ads. By continuing to browse this site, you agree to this use. Learn more.

Next Previous

© Copyright 2020, Citus Data, a Microsoft Company