Write a SQL solution to output big countries’ name, population and area.
Problem
There is a table World
+-----------------+------------+------------+--------------+---------------+
| name | continent | area | population | gdp |
+-----------------+------------+------------+--------------+---------------+
| Afghanistan | Asia | 652230 | 25500100 | 20343000 |
| Albania | Europe | 28748 | 2831741 | 12960000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000 |
| Andorra | Europe | 468 | 78115 | 3712000 |
| Angola | Africa | 1246700 | 20609294 | 100990000 |
+-----------------+------------+------------+--------------+---------------+
A country is big if it has an area of bigger than 3 million square km or a population of more than 25 million.
Write a SQL solution to output big countries' name, population and area.
For example, according to the above table, we should output:
+--------------+-------------+--------------+
| name | population | area |
+--------------+-------------+--------------+
| Afghanistan | 25500100 | 652230 |
| Algeria | 37100000 | 2381741 |
+--------------+-------------+--------------+
Solution
This is quite an easy one. We need to select the rows which meet either of the two requirements. The keyword OR
or UNION
coud be used to solve the problem.
Solution 1: OR
SELECT name, population, area
FROM World
WHERE area > 3000000 OR population > 25000000
3
Solution 2: UNION
SELECT name, population, area
FROM World
WHERE area > 3000000
UNION
SELECT name, population, area
FROM World
WHERE population > 25000000
In this case, UNION
is faster than OR
. The difference is caused by the use of index, which we will discuss below in details.
Related Knowledge
Index
First, lets look at the definition for index :
“An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.”
Usually, the use of index can boost the performance of query:
“When performing a table scan, the query optimizer reads all the rows in the table, and extracts the rows that meet the criteria of the query. When the query optimizer uses an index, it searches the index key columns, finds the storage location of the rows needed and extracts the matching rows from that location.Generally, searching the index is much faster than searching the table because unlike a table, an index frequently contains very few columns per row and the rows are in sorted order.”
Clustered Index vs. Nonclustered Index
We can classify indexes from different angles:
- By storage type: Clustered Index, Nonclustered Index
- By uniqueness: Unique Index, Normal Index
- By number of keys: Single-column Index, Multi-column Index
Here we only focus on comparing Clustered Index and Nonclustered Index:
Clustered Index
- Sort and store the data rows in the table or view based on their key values(columns included in the index definition).
- There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
- When a table has a clustered index, the table is called a clustered table. If it doesn’t, its data rows are stored in an unordered structure called a heap.
Nonclustered Index
- Have a structure separate from the data rows. A nonclustered index composed of a key value and a pointer(row locator) to the data row that contains the key value.
- The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
- There can be only multiple nonclustered index per table.
Conclusion
Using UNION
is faster when it comes to cases like this, when we need to scan two different columns.
Given that MySQL usually uses one index per table in a given query:
When using OR
, it only uses the 1st index rather than 2nd index, it would still have to do a table-scan to find rows that fit the 2nd index.
When using UNION
, each sub-query can use the index of its search, then the sub-queries are combined.
The performance statistics can be found in here:
Scenario 4: Selecting Clustered index columns for different fields
CPU Reads Duration Row Counts
OR 0 319 366 1228
UNION 0 50 193 1228
Reference
1. Union and OR and the Explanation
2. Whether to use UNION or OR in SQL Server Queries
3. Clustered and Nonclustered Indexes Described
4. MySQL: index and key (Chinese)
近期评论