Prerequisites & VPS Selection
ClickHouse is a high-performance, open-source columnar database management system designed for online analytical processing (OLAP). Originally developed at Yandex, it excels at running analytical queries over massive datasets at exceptional speed.
Before starting, ensure you have:
- Ubuntu 22.04 LTS (recommended) or 24.04 LTS
- Root or sudo access to your server
- At least 50GB of available disk space (SSD recommended)
- Basic familiarity with Linux command line
Initial Server Setup
Connect to your VPS via SSH, update the system, install a few prerequisites, and set the server timezone:
apt update && apt upgrade -y
apt install -y curl wget gnupg2 apt-transport-https ca-certificates software-properties-common
timedatectl set-timezone America/New_York
Install ClickHouse
Add the official ClickHouse repository and install:
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
apt update
apt install -y clickhouse-server clickhouse-client
💡 Tip: During installation, you'll be prompted to set a password for the default user. Choose a strong password and save it securely.
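ClickHouse stores user passwords as SHA-256 hashes in its configuration (the same value `echo -n … | sha256sum` produces, as used later in this guide). If you prefer to precompute the hash, a minimal Python sketch:

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Return the hex SHA-256 digest ClickHouse expects in password_sha256_hex."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Replace with your real password before copying the hash into users.d
print(sha256_hex("YOUR_STRONG_PASSWORD"))
```

Keep the plaintext password out of shell history and config files; only the hash goes into the XML shown below.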
systemctl enable clickhouse-server
systemctl start clickhouse-server
systemctl status clickhouse-server
Basic Configuration
Configure ClickHouse for network access and security. Configuration files are in /etc/clickhouse-server/.
Configure Network Access
By default, ClickHouse only listens on localhost. To allow remote connections:
nano /etc/clickhouse-server/config.d/network.xml
<clickhouse>
<listen_host>0.0.0.0</listen_host>
</clickhouse>
Use :: instead of 0.0.0.0 if you also want IPv6; listing both entries at once can keep the server from binding, since :: already covers IPv4-mapped addresses on most systems.
Set Default User Password
For better security, use a SHA-256 hashed password. Generate the hash:
echo -n 'YOUR_STRONG_PASSWORD' | sha256sum
nano /etc/clickhouse-server/users.d/default-password.xml
<clickhouse>
<users>
<default>
<password_sha256_hex>your_generated_hash</password_sha256_hex>
</default>
</users>
</clickhouse>
Configure Data Directory (Optional)
mkdir -p /data/clickhouse
chown clickhouse:clickhouse /data/clickhouse
Then point ClickHouse at the new location in a config override (e.g. /etc/clickhouse-server/config.d/data-path.xml):
<clickhouse>
<path>/data/clickhouse/</path>
</clickhouse>
systemctl restart clickhouse-server
Testing the Installation
Connect to ClickHouse and verify the installation:
clickhouse-client --password
SELECT version();
SELECT now();
SHOW DATABASES;
On connecting, you should see a greeting similar to:
ClickHouse client version X.X.X.X (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version X.X.X.
your-server :)
EXIT;
Firewall Configuration
Secure your ClickHouse installation with UFW:
apt install -y ufw
# Set default policies
ufw default deny incoming
ufw default allow outgoing
# Allow SSH first!
ufw allow 22/tcp
# Allow ClickHouse HTTP interface from specific IPs
ufw allow from YOUR_IP_ADDRESS to any port 8123 proto tcp
# Allow ClickHouse native protocol from specific IPs
ufw allow from YOUR_IP_ADDRESS to any port 9000 proto tcp
# Enable firewall
ufw enable
⚠️ Important: Replace YOUR_IP_ADDRESS with your actual IP. Allowing public access to ClickHouse ports is a security risk!
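If you maintain rules for several trusted IPs, it is worth validating each address before it reaches the firewall. A small sketch of the per-IP allow pattern above (the helper name and IP are illustrative):

```python
import ipaddress

# Ports from the rules above: HTTP interface (8123) and native protocol (9000)
CLICKHOUSE_PORTS = (8123, 9000)

def ufw_rules(trusted_ips):
    """Render one `ufw allow` command per trusted IP and ClickHouse port,
    rejecting malformed addresses before they are applied."""
    rules = []
    for ip in trusted_ips:
        ipaddress.ip_address(ip)  # raises ValueError on bad input
        for port in CLICKHOUSE_PORTS:
            rules.append(f"ufw allow from {ip} to any port {port} proto tcp")
    return rules

for rule in ufw_rules(["203.0.113.10"]):
    print(rule)
```

Piping the output into a shell (or a configuration-management tool) keeps the allow-list reproducible as trusted hosts change.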
Creating Your First Database
Create a database and table using the powerful MergeTree engine:
CREATE DATABASE analytics;
USE analytics;
CREATE TABLE events (
event_date Date,
event_time DateTime,
user_id UInt32,
event_type String,
url String,
response_time UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_time, user_id);
INSERT INTO events VALUES
('2024-01-15', '2024-01-15 10:30:00', 12345, 'page_view', '/home', 150),
('2024-01-15', '2024-01-15 10:31:00', 12345, 'click', '/products', 80),
('2024-01-15', '2024-01-15 10:32:00', 67890, 'page_view', '/about', 120);
SELECT * FROM events;
-- Analytical query
SELECT
event_type,
count() AS event_count,
avg(response_time) AS avg_response
FROM events
GROUP BY event_type
ORDER BY event_count DESC;
Performance Optimization
Tune ClickHouse for optimal performance based on your VPS resources.
Memory Configuration (8GB RAM example)
nano /etc/clickhouse-server/config.d/memory.xml
<clickhouse>
<!-- total server cap: 6 GiB, roughly 75% of an 8 GB VPS -->
<max_server_memory_usage>6442450944</max_server_memory_usage>
</clickhouse>
The per-query cap, max_memory_usage, is a profile setting and belongs in a users.d profile rather than in the server config, for example alongside the query limits below.
Query Complexity Limits
nano /etc/clickhouse-server/users.d/query-limits.xml
<clickhouse>
<profiles>
<default>
<!-- per-query memory cap: 4 GiB -->
<max_memory_usage>4294967296</max_memory_usage>
<max_execution_time>60</max_execution_time>
<max_rows_to_read>1000000000</max_rows_to_read>
<max_bytes_to_read>10000000000</max_bytes_to_read>
</default>
</profiles>
</clickhouse>
Enable Query Logging
Query logging is enabled by default; adjust it with a config override (e.g. /etc/clickhouse-server/config.d/query-log.xml):
<clickhouse>
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
</clickhouse>
SSD/NVMe Optimization
Fast storage tolerates more data parts, so the insert-throttling thresholds can be raised (e.g. in /etc/clickhouse-server/config.d/merge-tree.xml):
<clickhouse>
<merge_tree>
<parts_to_delay_insert>150</parts_to_delay_insert>
<parts_to_throw_insert>300</parts_to_throw_insert>
<max_parts_in_total>100000</max_parts_in_total>
</merge_tree>
</clickhouse>
systemctl restart clickhouse-server
Security Hardening
Create application-specific users instead of using the default user:
Create Application User
Add the user in a users.d override (e.g. /etc/clickhouse-server/users.d/app-user.xml):
<clickhouse>
<users>
<app_user>
<password_sha256_hex>your_hashed_password</password_sha256_hex>
<networks>
<ip>APP_SERVER_IP/32</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
<allow_databases>
<database>analytics</database>
</allow_databases>
</app_user>
</users>
</clickhouse>
Create Read-Only User (for BI Tools)
<clickhouse>
<users>
<readonly_user>
<password_sha256_hex>your_hashed_password</password_sha256_hex>
<networks>
<ip>BI_TOOL_IP/32</ip>
</networks>
<profile>readonly</profile>
<quota>default</quota>
</readonly_user>
</users>
<profiles>
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
</clickhouse>
Enable HTTPS
mkdir -p /etc/clickhouse-server/certs
openssl req -newkey rsa:2048 -nodes -keyout /etc/clickhouse-server/certs/server.key -x509 -days 365 -out /etc/clickhouse-server/certs/server.crt
chown -R clickhouse:clickhouse /etc/clickhouse-server/certs
Then enable the HTTPS endpoint in a config override (e.g. /etc/clickhouse-server/config.d/https.xml) and restart the server afterwards:
<clickhouse>
<https_port>8443</https_port>
<openSSL>
<server>
<certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
<verificationMode>none</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
</server>
</openSSL>
</clickhouse>
Backup and Recovery
Set up automated backups to protect your data:
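The retention rule used here (keep 7 days, enforced below with `find -mtime`) can be sketched in Python as a sanity check, assuming timestamped directory names like analytics_YYYYMMDD_HHMMSS:

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 7

def expired_backups(names, now, retention_days=RETENTION_DAYS):
    """Given backup directory names like 'analytics_20240115_020000',
    return those older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for name in names:
        stamp = name.removeprefix("analytics_")
        taken_at = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        if taken_at < cutoff:
            expired.append(name)
    return expired
```

Parsing the timestamp out of the name, rather than relying on filesystem mtimes, also survives backups being copied to another host.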
mkdir -p /backup/clickhouse
chown clickhouse:clickhouse /backup/clickhouse
nano /usr/local/bin/clickhouse-backup.sh
#!/bin/bash
BACKUP_DIR="/backup/clickhouse"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
# Create backup (the destination path must be permitted via <backups><allowed_path> in the server config)
clickhouse-client --query="BACKUP DATABASE analytics TO File('/backup/clickhouse/analytics_${DATE}')" --password=YOUR_PASSWORD
# Clean old backups
find ${BACKUP_DIR} -name "analytics_*" -type d -mtime +${RETENTION_DAYS} -exec rm -rf {} \;
echo "Backup completed: analytics_${DATE}"
chmod +x /usr/local/bin/clickhouse-backup.sh
# Add to crontab
crontab -e
# Add this line for daily 2 AM backups:
0 2 * * * /usr/local/bin/clickhouse-backup.sh >> /var/log/clickhouse-backup.log 2>&1
Using clickhouse-backup Tool
wget https://github.com/Altinity/clickhouse-backup/releases/latest/download/clickhouse-backup-linux-amd64.tar.gz
tar -xzvf clickhouse-backup-linux-amd64.tar.gz
mv clickhouse-backup /usr/local/bin/
chmod +x /usr/local/bin/clickhouse-backup
# Create backup
clickhouse-backup create
# List backups
clickhouse-backup list
# Restore a backup
clickhouse-backup restore BACKUP_NAME
Monitoring and Maintenance
Monitor ClickHouse performance and maintain optimal operation:
-- Server memory metrics
SELECT metric, formatReadableSize(value) AS value
FROM system.asynchronous_metrics
WHERE metric LIKE '%Memory%';
-- Disk used by active table parts
SELECT formatReadableSize(sum(bytes_on_disk)) AS disk
FROM system.parts
WHERE active;
-- Ten most recent queries
SELECT
query_start_time,
query_duration_ms,
read_rows,
formatReadableSize(read_bytes) AS read_size,
query
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY query_start_time DESC
LIMIT 10;
-- Queries slower than one second
SELECT
query_duration_ms,
read_rows,
formatReadableSize(read_bytes) AS data_read,
query
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 10;
Maintenance Tasks
OPTIMIZE TABLE analytics.events FINAL;
SELECT
database,
table,
formatReadableSize(sum(bytes)) AS size,
sum(rows) AS rows
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes) DESC;
Log Rotation
nano /etc/logrotate.d/clickhouse-server
/var/log/clickhouse-server/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 clickhouse clickhouse
sharedscripts
postrotate
systemctl reload clickhouse-server > /dev/null 2>&1 || true
endscript
}
Connecting Applications
Connect your applications to ClickHouse using various clients:
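The simplest client is the HTTP interface on port 8123 (opened in the firewall section). A dependency-free Python sketch that builds a query URL; the host and function name are illustrative, and credentials can alternatively be sent via the X-ClickHouse-User / X-ClickHouse-Key headers:

```python
import urllib.parse
import urllib.request

def clickhouse_url(host, query, port=8123, user="default", password=""):
    """Build a URL for ClickHouse's HTTP interface with the query
    and credentials passed as URL parameters."""
    params = urllib.parse.urlencode(
        {"query": query, "user": user, "password": password}
    )
    return f"http://{host}:{port}/?{params}"

url = clickhouse_url("your-server-ip", "SELECT version()")
# Uncomment on a machine allowed through the firewall:
# print(urllib.request.urlopen(url).read().decode())
```

For production workloads, prefer an official driver (e.g. clickhouse-connect for Python) over raw HTTP calls, and use HTTPS on port 8443 if you enabled it above.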
Troubleshooting
If the server fails to start or refuses connections, check systemctl status clickhouse-server and the logs in /var/log/clickhouse-server/ (clickhouse-server.err.log first); most startup failures come from malformed XML overrides or ports already in use.
Best Practices
1. Partitioning: Always partition large tables by date or logical dimension
2. Ordering Key: Choose ORDER BY columns based on query patterns
3. Data Types: Use specific types (UInt32 vs UInt64 when possible)
4. Materialized Views: Create for frequently aggregated data
5. Regular Optimization: Schedule OPTIMIZE TABLE during low-traffic
6. Monitoring: Regularly review query logs and system metrics
7. Backups: Maintain automated, tested backup procedures
8. Security: Strong passwords, limited network access, role-specific users
9. Resources: Monitor memory and disk usage, adjust limits
10. Updates: Keep ClickHouse updated for performance and security
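For point 1, it helps to know exactly which partition a given row lands in. The toYYYYMM() key used for the events table is simple enough to mirror in Python as a sanity check:

```python
from datetime import date

def to_yyyymm(d: date) -> int:
    """Python equivalent of ClickHouse's toYYYYMM() partition key:
    year * 100 + month, e.g. 2024-01-15 -> 202401."""
    return d.year * 100 + d.month

print(to_yyyymm(date(2024, 1, 15)))  # 202401
```

Monthly partitions keep the part count manageable; partitioning by day is only worthwhile at much higher ingest volumes, since too many partitions slows merges and queries.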
