What CloudFlare Logs

Over the last few weeks, we’ve had a number of requests for information
about what data CloudFlare logs when someone visits a site on our
network. While we have provided a Privacy Policy that outlines how we
keep information private, I wanted to take the time to clarify our
customer log retention policies.

What CloudFlare Logs

When you visit a site on CloudFlare’s network, we record information
about that visit. If you run a web server you’ll be familiar with these
logs, as they’re similar to an Apache access log. We log data for two
reasons: 1) to identify and mitigate security threats and attacks
hitting our customers; and 2) to identify performance bottlenecks and
errors on our system.

It’s somewhat hard to fathom the scale of the log data that we generate.
Every minute of every day we generate more than 20GB (compressed) of log
data. That translates, at our current volume, to more than 10 Petabytes
of storage needed to store a year’s worth of logs, and, due to our
continued growth, that volume has been doubling every 4 months or so.
Today, even if we wanted to, we don’t have the ability to retain all
the logs we generate. This means that, for most customers, we discard
access logs within 4 hours of them being recorded.
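
As a quick check on those numbers, here is a minimal sketch of the arithmetic. It assumes decimal units (1 Petabyte = 1,000,000 GB) and a 365-day year; the function names are made up for illustration and are not part of any CloudFlare tooling.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def yearly_petabytes(gb_per_minute, gb_per_pb=1_000_000):
    """Storage needed for one year of logs at a constant rate (decimal units)."""
    return gb_per_minute * MINUTES_PER_YEAR / gb_per_pb


def projected_rate(gb_per_minute, months, doubling_months=4):
    """Rate after `months` if volume keeps doubling every `doubling_months`."""
    return gb_per_minute * 2 ** (months / doubling_months)


print(yearly_petabytes(20))    # 10.512
print(projected_rate(20, 12))  # 160.0
```

At 20GB per minute the yearly total comes to roughly 10.5 Petabytes, and three doublings in a year would put the rate at about 160GB per minute, assuming the growth trend holds.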

For our Enterprise customers, we offer an optional feature that allows
them to export their raw log files in Apache format. This requires us to
store log files for a longer period of time in order to allow them to be
downloaded. By default, we store logs for these customers for 3 days.

Crunching Data

Since CloudFlare does not keep the raw logs beyond these short retention
windows, it is impossible for us to answer questions like “tell me all
the visitors who have been to a particular website on CloudFlare’s
network.”

However, CloudFlare does generate aggregate data, so we can provide
analytics back to customers. We use the aggregated data to populate
things like the CloudFlare Analytics page, which includes the number of
hits, page views, bandwidth consumed and unique visitors. As logs are
received, we run a stream processing engine that extracts this summary
data. This data is correlated in each of our edge data centers and then
sent to one of our core facilities so it can be reported through our UI.
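
To make the idea of extracting summary data concrete, here is a rough sketch of what such a step could look like. This is not our actual engine: the regular expression assumes lines in Apache common log format, the `SiteSummary` and `summarize` names are hypothetical, and unique visitors are approximated by distinct client IP addresses.

```python
import re
from dataclasses import dataclass, field

# Assumed input: Apache common log format, e.g.
# 203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET / HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class SiteSummary:
    hits: int = 0
    bytes_sent: int = 0
    visitors: set = field(default_factory=set)  # distinct client IPs


def summarize(lines):
    """Fold a stream of log lines into totals; raw lines are not kept."""
    summary = SiteSummary()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not parse
        summary.hits += 1
        size = match.group("size")
        if size != "-":  # "-" means no response body was sent
            summary.bytes_sent += int(size)
        summary.visitors.add(match.group("ip"))
    return summary
```

Because `summarize` accepts any iterable, it can consume lines as they arrive and only the running totals need to outlive the raw log.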

This same data summary engine also looks for attack patterns, which are
then used to provide security protection for our customers’ websites.
Using this engine, we can identify an attack on one site, usually in
less than 1 minute, and then push updated security rules that protect
every site using CloudFlare from that same attack.

Access logs for most customers are stored briefly at the edge of our
network and then deleted within 4 hours. If there is an error, those
logs are transmitted back to one of our core facilities so that we can
diagnose the error. Error logs sent to core are currently kept for 1
week and then discarded.
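
The retention windows described in this post can be summarized as a small lookup table. The sketch below is only an illustration: the category names and the `is_expired` helper are hypothetical, and the durations are the ones stated here (4 hours for most access logs, 3 days by default for Enterprise log export, 1 week for error logs sent to core).

```python
from datetime import datetime, timedelta

# Hypothetical category names; durations are taken from this post.
RETENTION = {
    "access": timedelta(hours=4),
    "enterprise_access": timedelta(days=3),
    "error": timedelta(weeks=1),
}


def is_expired(kind, recorded_at, now):
    """Return True once a log of this kind has outlived its retention window."""
    return now - recorded_at >= RETENTION[kind]


recorded = datetime(2012, 12, 1, 12, 0)
print(is_expired("access", recorded, recorded + timedelta(hours=5)))  # True
print(is_expired("error", recorded, recorded + timedelta(days=2)))    # False
```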

The Future

Going forward, we want to give customers who would like more insight
into the visitors to their sites the option to get it. As we do, we
will provide details on how any feature we add changes our log
retention policy, and we will continue to be guided by the principle
that our customers should be able to understand and control what data is
being stored about visitors to their sites.

Via Cloudflare.com
