url Table Function
url function creates a table from the URL with given format and structure.
url function may be used in SELECT and INSERT queries on data in URL tables.
Syntax
Parameters
| Parameter | Description |
|---|---|
URL | Single quoted HTTP or HTTPS server address, which can accept GET or POST requests (for SELECT or INSERT queries correspondingly). Type: String. |
format | Format of the data. Type: String. |
structure | Table structure in 'UserID UInt64, Name String' format. Determines column names and types. Type: String. |
headers | Headers in 'headers('key1'='value1', 'key2'='value2')' format. You can set headers for HTTP call. |
Returned value
A table with the specified format and structure and with data from the defined URL.
Examples
Getting the first 3 lines of a table that contains columns of String and UInt32 type from HTTP-server which answers in CSV format.
Inserting data from a URL into a table:
Globs in URL
Patterns in { } are used to generate a set of shards or to specify failover addresses. Supported pattern types and examples see in the description of the remote function.
Character | inside patterns is used to specify failover addresses. They are iterated in the same order as listed in the pattern. The number of generated addresses is limited by glob_expansion_max_elements setting.
Virtual Columns
_path— Path to theURL. Type:LowCardinality(String)._file— Resource name of theURL. Type:LowCardinality(String)._size— Size of the resource in bytes. Type:Nullable(UInt64). If the size is unknown, the value isNULL._time— Last modified time of the file. Type:Nullable(DateTime). If the time is unknown, the value isNULL._headers- HTTP response headers. Type:Map(LowCardinality(String), LowCardinality(String)).
use_hive_partitioning setting
When setting use_hive_partitioning is set to 1, ClickHouse will detect Hive-style partitioning in the path (/name=value/) and will allow to use partition columns as virtual columns in the query. These virtual columns will have the same names as in the partitioned path.
Example
Use virtual column, created with Hive-style partitioning
Resolving relative URLs
The url_base setting allows passing a relative URL to the url function. When url_base is set and the function argument is a relative reference, it is resolved against the base URL per RFC 3986.
The resolution rules are:
- Path-relative (e.g.
data.csv): merged with the base URL path — everything after the last/of the base path is replaced. The trailing slash matters:https://example.com/dir/+data.csvgiveshttps://example.com/dir/data.csv, buthttps://example.com/dir+data.csvgiveshttps://example.com/data.csv. Dot segments (./and../) are normalized. - Host-relative (e.g.
/test/data.csv): resolved using the scheme and host of the base URL. - Scheme-relative (e.g.
//other.com/test/data.csv): resolved using the scheme of the base URL. - Query-only (e.g.
?x=1): appended to the full base path, replacing any existing query or fragment. - Fragment-only (e.g.
#frag): appended to the base URL, preserving the query, replacing any existing fragment. - Empty: returns the base URL without fragment.
- Absolute URL: passed through unchanged;
url_baseis ignored.
Example
Storage Settings
- engine_url_skip_empty_files - allows to skip empty files while reading. Disabled by default.
- enable_url_encoding - allows to enable/disable decoding/encoding path in uri. Enabled by default.
- url_base - base URL for resolving relative URLs passed to the
urlfunction.
Permissions
url function requires CREATE TEMPORARY TABLE permission. As such - it'll not work for users with readonly = 1 setting. At least readonly = 2 is required.