Analyzing Hits Dataset with Databend
This usecase shows how to analyze the Hits dataset with Databend.
Step 1. Deploy Databend
Make sure you have installed Databend, if not please see:
Step 2. Load hits Datasets
2.1 Create a Databend User
Connect to Databend server with MySQL client:
mysql -h127.0.0.1 -uroot -P3307
Create a user:
CREATE USER user1 IDENTIFIED BY 'abc123';
Grant privileges for the user:
GRANT ALL ON *.* TO user1;
See also How To Create User.
2.2 Create hits Table
2.3 Load Data Into hits Table
hits_1m.tsv.gz
wget --no-check-certificate https://repo.databend.rs/hits/hits_1m.tsv.gz
## If you want to load full version of hits dataset, please download from clickhouse's dataset:
## wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
gzip -d hits_1m.csv.gz
Load CSV files into Databend
curl -H "insert_sql:insert into hits file_format = (type = 'TSV')" -F "upload=@./hits_1m.tsv" -XPUT http://user1:abc123@127.0.0.1:8000/v1/streaming_load
Step 3. Queries
Execute Queries:
mysql -h127.0.0.1 -P3307 -uroot
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
Example Queries:
Number | Query |
---|---|
Q1 | SELECT COUNT(*) FROM hits; |
Q2 | SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0; |
Q3 | SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits; |
Q4 | SELECT AVG(UserID) FROM hits; |
Q5 | SELECT COUNT(DISTINCT UserID) FROM hits; |
Q6 | SELECT COUNT(DISTINCT SearchPhrase) FROM hits; |
Q7 | SELECT MIN(EventDate), MAX(EventDate) FROM hits; |
Q8 | SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC; |
Q9 | SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10; |
Q10 | SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10; |