Cro::HTTP::Client
The Cro::HTTP::Client class provides a flexible asynchronous HTTP and
HTTPS client, scaling from simple to more complex use cases. It can be
consumed in two ways:
By making calls on the type object (Cro::HTTP::Client.get($url)). This is good for one-off requests, but does not provide connection re-use when making multiple requests to the same server (such as by using HTTP/1.1 persistent connections or HTTP/2.0 multiplexing).
By making an instance of Cro::HTTP::Client. By default, this enables re-use of a pool of connections (HTTP/1.1) or multiplexing (HTTP/2.0). It may also be configured with a default base URL, default authorization data to pass along, and even middleware to insert into the request/response processing pipeline. An instance of Cro::HTTP::Client may be used concurrently.
In general, if you are going to make a one-off request, just use the type object. If you are going to make many requests to the same server or set of servers, make an instance.
By default, a HTTPS request will use ALPN to negotiate whether to do HTTP/2 or HTTP/1.1, and a HTTP request will always use HTTP/1.1.
Making basic requests
The get, post, put, delete, patch and head methods may be called on either
the type object or an instance of Cro::HTTP::Client. They will all return a
Promise, which will be kept if the request is successful or broken if an
error occurs.
my $resp = await Cro::HTTP::Client.get('https://www.raku.org/');
The response will be provided as a Cro::HTTP::Response object. It will be
produced as soon as the response headers are available; the body may not yet
have been received.
If making an instance of Cro::HTTP::Client, a base URI may be specified.
The URI passed to request methods will be appended to the base URI using
the relative URI resolution algorithm.
my $client = Cro::HTTP::Client.new(base-uri => "http://foo.com/some/path/");
await $client.get('added'); # http://foo.com/some/path/added
await $client.get('/rooted'); # http://foo.com/rooted
await $client.get('http://bar.com/'); # http://bar.com
Error handling
By default, error responses (4xx and 5xx status codes) will result in an
exception that does the X::Cro::HTTP::Error role. Such exceptions have a
response property containing the Cro::HTTP::Response object.
my $resp = await Cro::HTTP::Client.delete($product-url);
CATCH {
when X::Cro::HTTP::Error {
if .response.status == 404 {
say "Product not found!";
}
else {
say "Unexpected error: $_";
}
}
}
The actual exception type will be either X::Cro::HTTP::Error::Client for
4xx errors or X::Cro::HTTP::Error::Server for 5xx errors (which is useful
when setting up retries that should distinguish server errors from client
errors).
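The split between the two exception types can drive a simple retry policy. The following is a minimal sketch rather than anything provided by Cro: the fetch-with-retry helper and its attempts and delay parameters are hypothetical names chosen for illustration. It retries only when a 5xx response arrives and lets 4xx errors reach the caller straight away.

```raku
use Cro::HTTP::Client;

# Hypothetical helper (not part of Cro): retry a GET only on 5xx responses.
sub fetch-with-retry(Str $url, Int :$attempts = 3, Real :$delay = 1) {
    for 1..$attempts -> $attempt {
        # A successful response leaves the sub immediately.
        return await Cro::HTTP::Client.get($url);
        CATCH {
            when X::Cro::HTTP::Error::Server {
                # Out of attempts: hand the server error to the caller.
                .rethrow if $attempt == $attempts;
                # Otherwise pause, then let the loop try again.
                sleep $delay;
            }
            # X::Cro::HTTP::Error::Client (4xx) is not matched here,
            # so it propagates to the caller unchanged.
        }
    }
}
```

A call such as fetch-with-retry($url, attempts => 5) would then make up to five attempts before giving up.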
The exception also has a request property, which provides access to the
Cro::HTTP::Request that was sent.
my $resp = await Cro::HTTP::Client.get($url);
CATCH {
when X::Cro::HTTP::Error {
say "Problem fetching " ~ .request.target;
}
}
This method simply delegates to .response.request, since each response
object has the request that was sent attached to it. In the event of a
redirect, the request object will be that of the redirected request, not the
originally sent request.
Setting the user agent
By default, Cro::HTTP::Client sends a User-agent header with the value Cro.
This can be overridden at the request level:
my $resp = await Cro::HTTP::Client.get: 'example.com',
user-agent => 'MyCrawler v42';
Or set at construction time when making an instance of the client, in which case it will be used for all requests (unless overridden in a specific request):
my $client = Cro::HTTP::Client.new:
user-agent => 'MyCrawler v42';
To suppress sending a User-agent header, pass either False, Nil, or the
empty string.
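As an illustrative sketch of suppressing the header, the first statement below disables it for every request made through an instance, and the second does so for a single request. The variable and helper names are assumptions made for the example only.

```raku
use Cro::HTTP::Client;

# Requests made through this instance carry no User-agent header.
my $quiet-client = Cro::HTTP::Client.new: user-agent => False;

# Hypothetical helper: suppress the header for one request only.
sub get-without-user-agent(Str $url) {
    await Cro::HTTP::Client.get: $url, user-agent => '';
}
```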
Adding extra request headers
One or more headers can be set for a request by passing an array to the
headers named argument. It may contain either Pair objects, instances
of Cro::HTTP::Header, or a mix of the two.
my $resp = await Cro::HTTP::Client.get: 'example.com',
headers => [
referer => 'http://anotherexample.com',
Cro::HTTP::Header.new(
name => 'X-MyCustomHeader',
value => 'pancake'
)
];
If the headers should be added to all requests, they can be set by default at construction time:
my $client = Cro::HTTP::Client.new:
headers => [
X-MyCustomHeader => 'strudel'
];
Adding query string parameters
Query string parameters can be added to the request by passing the query
named argument. It can be either a list of Pairs or a Hash. Both keys and
values will be encoded.
my $resp = await Cro::HTTP::Client.get: 'we.love.pand.as/pandas/search',
query => { location => 'Chengdu', maxweight => 90 };
Setting the request body
To give the request a body, pass the body named argument. The content-type
named argument should typically be passed too, to indicate the type of the
body. For example, a request with a JSON body can be sent as:
my %panda = name => 'Bao Bao', eats => 'bamboo';
my $resp = await Cro::HTTP::Client.post: 'we.love.pand.as/pandas',
content-type => 'application/json',
body => %panda;
If writing a client for a JSON API, it may become tedious to set the content type on every request. In this case, it can be set when constructing an instance of the client, and used by default (note that it will only be used if a body is set):
# Configure with JSON content type.
my $client = Cro::HTTP::Client.new: content-type => 'application/json';
# And later get it added by default.
my %panda = name => 'Bao Bao', eats => 'bamboo';
my $resp = await $client.post: 'we.love.pand.as/pandas', body => %panda;
The Cro::HTTP::Client class uses a Cro::BodySerializer in order to
serialize request bodies for sending. Besides JSON, there are body serializers
for encoding and sending a Str:
my $resp = await Cro::HTTP::Client.post: 'we.love.pand.as/facts',
content-type => 'text/plain; charset=UTF-8',
body => "99% of a Panda's diet consists of bamboo";
A Blob:
my $resp = await Cro::HTTP::Client.put: 'we.love.pand.as/images/baobao.jpg',
content-type => 'image/jpeg',
body => slurp('baobao.jpg', :bin);
Form data formatted according to application/x-www-form-urlencoded (this is
the default in a web browser):
my $resp = await Cro::HTTP::Client.post: 'we.love.pand.as/pandas',
content-type => 'application/x-www-form-urlencoded',
# Can use a Hash; an Array of Pair allows multiple values per name
body => [
name => 'Bao Bao',
eats => 'bamboo'
];
Or form data formatted according to multipart/form-data (this is used in web
browsers for forms that contain file uploads):
my $resp = await Cro::HTTP::Client.post: 'we.love.pand.as/pandas',
content-type => 'multipart/form-data',
body => [
# Simple pairs for simple form values
name => 'Bao Bao',
eats => 'bamboo',
# For file uploads, make a part object
Cro::HTTP::Body::MultiPartFormData::Part.new(
headers => [Cro::HTTP::Header.new(
name => 'Content-type',
value => 'image/jpeg'
)],
name => 'photo',
filename => 'baobao.jpg',
body-blob => slurp('baobao.jpg', :bin)
)
];
To replace the set of body serializers that a client will use, pass an array
of them when constructing an instance of Cro::HTTP::Client using the
body-serializers named argument:
use Cro::HTTP::BodySerializers;
my $client = Cro::HTTP::Client.new:
body-serializers => [
Cro::HTTP::BodySerializer::JSON,
My::BodySerializer::XML
];
To instead retain the existing set of body serializers and add some new ones
(which will have higher precedence), use add-body-serializers:
my $client = Cro::HTTP::Client.new:
add-body-serializers => [ My::BodySerializer::XML ];
It is also possible to have the body come from a stream of bytes by passing a
Supply to body-byte-stream.
my $resp = await Cro::HTTP::Client.post: 'example.com/incoming',
content-type => 'application/octet-stream',
body-byte-stream => $supply;
The body and body-byte-stream arguments cannot be used together; trying to
do so will result in a X::Cro::HTTP::Client::BodyAlreadySet exception.
Getting the response body
The response body is always provided asynchronously, either by a Promise (if
requesting the entire body) or a Supply (when the body is to be delivered as
it arrives).
The body method returns a Promise that will be kept when the body has
been received and parsed.
my $resp = await Cro::HTTP::Client.get($some-json-api-url);
my $json = await $resp.body;
The body method will offer the response to each available body parser, and
returns a Promise that will be kept when the first applicable body parser has
completely parsed the body. The default body parsers available are:
JSON, which will be used when the Content-type header is either application/json or uses the +json suffix. JSON::Fast will be used to perform the parsing.
String fallback, which is used when the Content-type is text/*. A Str will be returned.
Blob fallback, which is used in all other cases and returns a Blob with the body.
A Cro::HTTP::Client instance can be configured either with a replacement set
of body parsers by passing the body-parsers argument:
use Cro::HTTP::BodyParsers;
my $client = Cro::HTTP::Client.new:
body-parsers => [
Cro::HTTP::BodyParser::JSON,
My::BodyParser::XML
];
Or to prepend extra body parsers to the default set, use add-body-parsers:
my $client = Cro::HTTP::Client.new:
add-body-parsers => [ My::BodyParser::XML ];
To get the response body as a Supply that will emit the bytes as they
arrive over the network, use the body-byte-stream method:
react {
whenever $resp.body-byte-stream -> $chunk {
say "Got chunk: $chunk.gist()";
}
}
To get the entire response body as a Blob, use the body-blob method:
my Blob $body = await $resp.body-blob();
To get the entire response body as a Str, use the body-text method:
my Str $body = await $resp.body-text();
This method will look at the Content-type header to see if a charset is
specified, and decode the body using that. Otherwise, it will see if the body
starts with a BOM and rely on that. If there is no BOM either, a heuristic
will be used: if the body can be decoded as utf-8 then it will be deemed to be
utf-8, and failing that it will be decoded as latin-1 (which can never fail as
all bytes are valid).
Cookies
By default, cookies in the response are ignored. However, constructing a
Cro::HTTP::Client with the :cookie-jar option (that is, passing True)
will create an instance of Cro::HTTP::Client::CookieJar. This will be used
to store all cookies set in responses. Relevant cookies will automatically be
included in follow-up requests.
my $client = Cro::HTTP::Client.new(:cookie-jar);
Cookie relevance is determined by considering host, path, and the Secure
extension. Cookies that have passed their expiration date or maximum age will
automatically be removed from the cookie jar.
It is also possible to pass in an instance of Cro::HTTP::Client::CookieJar,
which makes it possible to share one cookie jar amongst several instances of
the client (or to pass in a subclass that adds extra features).
my $jar = Cro::HTTP::Client::CookieJar.new;
my $client = Cro::HTTP::Client.new(cookie-jar => $jar);
my $json-client = Cro::HTTP::Client.new:
cookie-jar => $jar,
content-type => 'application/json';
To include a particular set of cookies with a request, pass them in a hash
using the cookies named argument when making a request:
my $resp = await $client.get: 'http://somesite.com/',
cookies => {
session => $fake-session-id
};
Cookies passed in this way will override any cookies from a cookie jar.
To get the cookies set by a response, use the cookies method on the
Cro::HTTP::Response object, which returns a List of Cro::HTTP::Cookie
objects.
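Reading the cookies back might look like the sketch below. The print-response-cookies helper is a hypothetical name, and the example assumes that Cro::HTTP::Cookie exposes name and value accessors.

```raku
use Cro::HTTP::Client;

# Hypothetical helper: fetch a URL and print each cookie the response sets.
sub print-response-cookies(Str $url) {
    my $resp = await Cro::HTTP::Client.get($url);
    for $resp.cookies -> $cookie {
        # Assumes the cookie object provides .name and .value.
        say "{$cookie.name} = {$cookie.value}";
    }
}
```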
Following redirects
By default, Cro::HTTP::Client will follow HTTP redirect responses, with a
limit of 5 redirects being enforced in order to avoid circular redirects. If
there are more than 5 redirections, X::Cro::HTTP::Client::TooManyRedirects
will be thrown.
This behavior can be configured when constructing a new Cro::HTTP::Client or
on a per-request basis, with the per-request setting overriding the behavior
configured at construction time. In either case, it is done using the follow
named argument.
:follow # follow redirects (up to 5 times per request)
:!follow # never follow redirects
:follow(2) # follow redirects (up to 2 times per request)
:follow(10) # follow redirects (up to 10 times per request)
The 301, 307 and 308 redirects are treated identically at this time; no
caching of permanent redirects takes place. They retain the original request
method. 302 and 303 instead cause a GET request to be issued, regardless of
the original request method.
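To show the follow argument in context, the sketch below sets a limit at construction time and then overrides it for one request. The redirect-target helper is a hypothetical name, and reading the Location header through the response's header method is an assumption about how an unfollowed redirect would be inspected.

```raku
use Cro::HTTP::Client;

# Follow at most two redirects for requests made through this instance.
my $client = Cro::HTTP::Client.new: :follow(2);

# Hypothetical helper: ask for the redirect itself instead of following it.
sub redirect-target(Str $url) {
    # The per-request :!follow overrides the instance-level :follow(2).
    my $resp = await $client.get: $url, :!follow;
    # The 3xx response is returned as-is, so its Location header can be read.
    $resp.header('Location');
}
```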
The .request property of the response object will refer to the final request
that was issued after a redirect (or sequence of redirects). Therefore, to get
the URI that a request was eventually redirected to, use the .uri property
of that request object.
my $resp = await Cro::HTTP::Client.get($some-url);
say "Eventually requested $resp.request.uri()";
Authentication
Both basic authentication and bearer authentication are supported directly by
Cro::HTTP::Client. These can be configured when instantiating the client, or
per request (which will override that configured on the instance).
For basic authentication, pass the auth option with a hash containing a
username and a password.
auth => {
username => $user,
password => $password
}
For bearer authentication, pass the auth option with a hash containing a
bearer:
auth => { bearer => $jwt }
Failing to pass precisely either username and password or bearer will
result in an X::Cro::Client::InvalidAuth exception.
In both cases, the authentication information will be sent immediately with
the request. In order to only have it sent if the server responds to the
initial request with a 401 response, set the if-asked option to True.
auth => {
username => $user,
password => $password,
if-asked => True
}
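The auth fragments above slot into a constructor or request call as sketched here. The token value, variable names, and the get-with-basic-auth helper are placeholders invented for the example.

```raku
use Cro::HTTP::Client;

# Placeholder token; a real one would come from configuration.
my $token = 'example-token';

# Bearer authentication applied to every request made through this instance.
my $api = Cro::HTTP::Client.new:
    auth => { bearer => $token };

# Hypothetical helper: override the instance setting with basic authentication
# for one request, sending credentials only if the server replies with a 401.
sub get-with-basic-auth(Str $url, Str $user, Str $password) {
    await $api.get: $url,
        auth => {
            username => $user,
            password => $password,
            if-asked => True
        };
}
```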
TLS configuration
The ca argument, passed either at construction time or to a request method,
is used to provide TLS configuration. Its primary use is for providing a
custom CA certificate:
my $client = Cro::HTTP::Client.new:
ca => { ca-file => 't/certs-and-keys/ca-crt.pem' };
However, the hash may contain any arguments that the connect method of
the TLS module accepts. Of note, one can disable certificate checking by
passing the insecure option:
my $resp = await Cro::HTTP::Client.get: 'https://badly-configur.ed/',
ca => { :insecure };
As the name suggests, this is not a secure configuration; transmissions are encrypted, but there's no checking that the server is who it claims to be.
Proxying
By default, Cro::HTTP::Client will honor the HTTP_PROXY, HTTPS_PROXY
and NO_PROXY environment variables. It is also possible to pass the
http-proxy and/or https-proxy named arguments when constructing
Cro::HTTP::Client; these will be used for all requests made with that
instance (and take precedence over any proxy found via the environment,
and furthermore cause NO_PROXY to be disregarded).
It is not possible to override the proxy at a per-request level. There is no
mechanism to ignore the HTTP_PROXY or HTTPS_PROXY environment variables;
however, one could delete them from %*ENV if needed.
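Both approaches are sketched below. The proxy address is a placeholder, and passing the proxy as a URL string is an assumption about the accepted value; whether deleting the variables takes effect may also depend on when the client reads the environment.

```raku
use Cro::HTTP::Client;

# Route all requests made through this instance via an explicit proxy.
# (Assumes the proxy is given as a URL string; the host is a placeholder.)
my $proxied = Cro::HTTP::Client.new:
    http-proxy  => 'http://proxy.example.com:8080',
    https-proxy => 'http://proxy.example.com:8080';

# Alternatively, remove any proxy settings inherited from the environment
# so that clients constructed afterwards do not pick them up.
%*ENV<HTTP_PROXY HTTPS_PROXY>:delete;
```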
Persistent connections
An instance of Cro::HTTP::Client will use persistent connections by default.
When many requests are being made to the same server, this can enable better
throughput by not requiring a new connection to be established each time. To
not use persistent connections, pass :!persistent to the constructor. When
using the type object (for example, Cro::HTTP::Client.get($url)), no
persistent connection cache will be used.
Timeouts
By default, Cro::HTTP::Client enforces:
A 60s timeout on establishing a connection to the target server
A 60s timeout on receiving the response headers once a connection has been established and the request sent
No timeout on receiving the entire response body
No overall bounding timeout for the entire HTTP request/response
These can be configured by passing the timeout setting, either at an instance
or per-request level. One may pass:
A Real value, which will be interpreted as the total number of seconds for the entire HTTP request/response (including the body being downloaded). This will not increase the default connection and headers timeouts, however they will be clipped to the total time budget if it is smaller.
A hash with the keys connection, headers, body, and total (any not provided will have the default values of 60, 60, Inf, and Inf respectively).
An object that does the Cro::Policy::Timeout role, should one wish to implement a more complex scheme.
In the case persistent connections are being used:
For HTTP/1.1, the connection will be closed in the event of any timeout
For HTTP/2.0, in the event of a body timeout, only the individual stream will be reset, and the connection left intact
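The first two forms of the timeout setting could be used as in this sketch. The numbers are arbitrary, and the variable names and the get-slowly helper are illustrative assumptions.

```raku
use Cro::HTTP::Client;

# A Real: at most 10 seconds for the whole request/response, body included.
my $quick = Cro::HTTP::Client.new: timeout => 10;

# A hash: tune individual phases; total is omitted, so it keeps its default.
my $patient = Cro::HTTP::Client.new:
    timeout => { connection => 5, headers => 20, body => 120 };

# Hypothetical helper: a per-request setting in place of the instance one.
sub get-slowly(Str $url) {
    await $quick.get: $url, timeout => 300;
}
```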
HTTP version
The :http option can be passed, either at construction or per request, to
control which versions of HTTP should be used. It can be passed a single item
or list. Valid options are 1.1 (which will implicitly handle HTTP/1.0 too)
and 2.
:http<1.1> # HTTP/1.1 only
:http<2> # HTTP/2 only
:http<1.1 2> # HTTP/1.1 and HTTP/2 (HTTPS only; selected by ALPN)
The default is :http<1.1> for a HTTP request, and :http<1.1 2> for a HTTPS
request. It is not legal to use :http<1.1 2> with a HTTP connection, as ALPN
is the only supported mechanism for deciding which protocol to use.
Push promises
HTTP/2.0 provides push promises, which allow the server to push extra resources
to the client as part of the response. By default, Cro::HTTP::Client will
instruct the remote server to not send push promises. To opt in to this
feature, either:
If making an instance of Cro::HTTP::Client, pass :push-promises to the constructor to enable them for all requests made with the client instance
Otherwise, pass :push-promises when making a request (for example, to the get method). However, when using HTTP/2.0, it's usually wise to make an instance and re-use the connection for many requests.
Push promises are obtained by calling the push-promises method of the
Cro::HTTP::Response object that the request produces. This returns a Supply
that emits an instance of Cro::HTTP::PushPromise for each push promise the
server sends. Each of those in turn has a response property that returns a
Promise that will be kept with a Cro::HTTP::Response object when the push
promise is fulfilled.
Making a request and obtaining all push promises can therefore be achieved as follows:
react {
my $client = Cro::HTTP::Client.new(:push-promises);
my $response = await $client.get($url);
whenever $response.push-promises -> $prom {
whenever $prom.response -> $resp {
say "Push promise for $prom.target() had status $resp.status()";
}
}
}
Custom HTTP methods
The get, post, put, delete, patch and head methods are convenience
forms of the more general request method, which takes the HTTP request
method as a first argument. The request method can be used to make requests
with other HTTP methods. For example, making a request with the LINK method
can be achieved using:
my $resp = await Cro::HTTP::Client.request('LINK', $url);
This may also be useful if the request method to use is held in a variable.
Tracing
To debug problems with the Cro HTTP client, or to understand in more detail
exactly what is being sent and received, set CRO_TRACE=1 in the environment.
(Note that this turns on tracing for all Cro components, not just the client.)
Long binary blobs (such as dumps of the TCP packets arriving) will be
truncated in the debug output. To raise the limit, put something like
CRO_TRACE_MAX_BINARY_DUMP=8192 in the environment (you may wish to pick a
higher value).