Online privacy and tracking Series - Cookies
When reproducing, using, or extracting this text, always reference the source and the author: Ignacio Rondini
What are cookies?
At this point in history, if you’re reading these lines, you have almost certainly heard about cookies. If you live in Europe or use sites based in Europe, you have certainly had to choose which level of cookies you want to accept. So what are cookies?
A cookie is information written as a key-value pair that’s sent by the server and stored by the client’s browser on the user’s device.
Let us imagine a user who connects to www.my-great-webshop.test. The server that hosts the web page returns the page together with some extra information in the HTTP response headers. Among these headers, there can be an instruction to set a cookie, which the browser will execute. The instruction has the form “Set-Cookie: configuration=dark”, where “configuration” is the cookie’s key and “dark” is the cookie’s value.
The browser stores this information and sends it back, in the Cookie header, with every HTTP request it makes to that server. This means that if the user then wants to check some products and goes to www.my-great-webshop.test/products, all the cookies that have been set will be sent with the request for the new page. If you are a more visual person and you like sequence diagrams, here you can see the same explanation:
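The exchange described above can be sketched with Python’s standard http.cookies module. This is a minimal illustration, assuming the server answers with the header value “configuration=dark; Path=/”; the Path=/ attribute and the cookie_header helper are additions made for the example, not something the webshop necessarily sends.

```python
from http.cookies import SimpleCookie

# Value of the Set-Cookie header returned by www.my-great-webshop.test
# (assumed for illustration; Path=/ makes the cookie apply to every page).
set_cookie_value = "configuration=dark; Path=/"

# The "browser" parses the instruction and stores the key-value pair.
jar = SimpleCookie()
jar.load(set_cookie_value)


def cookie_header(jar: SimpleCookie) -> str:
    """Build the Cookie request header sent back on later requests."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Later request for www.my-great-webshop.test/products:
request_headers = {"Cookie": cookie_header(jar)}
print(jar["configuration"].value)  # dark
print(request_headers)             # {'Cookie': 'configuration=dark'}
```

Note that only the key-value pair travels back to the server; attributes such as Path stay in the browser and only decide when the cookie is sent.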
In other words, a cookie is a piece of static data, stored by the browser following the server’s instruction. One important remark here is that cookies are generally domain-bound: the cookies set by our webshop www.my-great-webshop.test won’t be sent to another domain. If you run a digital service, you usually don’t want your data to be sent to and used by other sites.
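The domain binding can be illustrated with a toy cookie store. This sketch assumes the simplest case, a cookie set without a Domain attribute, which is only sent back to the exact host that set it; real browsers follow RFC 6265, where a Domain attribute can also extend a cookie to subdomains. The class name and www.another-site.test are hypothetical.

```python
from collections import defaultdict


class HostOnlyCookieJar:
    """Toy browser cookie store (hypothetical): cookies are kept per host and
    only attached to requests for that exact host. Domain attributes and
    subdomain matching from RFC 6265 are deliberately left out."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, str]] = defaultdict(dict)

    def set_cookie(self, host: str, name: str, value: str) -> None:
        # Host names are case-insensitive, so normalise them.
        self._store[host.lower()][name] = value

    def cookies_for(self, host: str) -> dict[str, str]:
        # Only cookies set by this same host are attached to the request.
        return dict(self._store.get(host.lower(), {}))


jar = HostOnlyCookieJar()
jar.set_cookie("www.my-great-webshop.test", "configuration", "dark")
print(jar.cookies_for("www.my-great-webshop.test"))  # {'configuration': 'dark'}
print(jar.cookies_for("www.another-site.test"))      # {}
```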
With all that said, we have to emphasise that not every cookie is bad per se: a cookie is just information that is stored and exchanged. In our example, some preferences could be stored as cookies, such as whether the user prefers a dark or a light theme, or a token that keeps the user logged in to a web application. Cookies are also often used to handle sessions, so that the server knows that it is me performing operations with my account, for instance, and not someone else.
Types of cookies
There are a few ways to classify cookies. Depending on their purpose, there are four main types of cookies:
- Strictly necessary cookies. These cookies are needed to browse a website and use its features.
- Preference cookies, or “functionality cookies”. These cookies allow a website to remember choices you have made in the past, like the language you prefer.
- Statistics cookies, or “performance cookies”. These cookies gather data on how you use the website. They can also be provided by third parties, but the data is only used on behalf of the website owner.
- Marketing cookies. These cookies track your online activity for advertisers.
Depending on their duration, they can also be divided into:
- Session cookies, which are deleted when the browser is closed, and
- Persistent cookies, which are kept until they expire. They survive closing the browser.
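On the wire, the difference between the two comes down to the Set-Cookie attributes: a cookie with neither Max-Age nor Expires is a session cookie, otherwise it persists until it expires. The sketch below classifies Set-Cookie values with Python’s http.cookies module; the cookie names cart and language are hypothetical examples.

```python
from http.cookies import SimpleCookie


def cookie_lifetime(set_cookie_value: str) -> dict[str, str]:
    """Classify each cookie in a Set-Cookie value as 'session' or 'persistent'.

    A cookie with neither Max-Age nor Expires is treated as a session cookie;
    otherwise the browser keeps it until it expires.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return {
        name: "persistent" if (morsel["max-age"] or morsel["expires"]) else "session"
        for name, morsel in jar.items()
    }


print(cookie_lifetime("cart=abc123; Path=/"))                    # {'cart': 'session'}
print(cookie_lifetime("language=en; Max-Age=31536000; Path=/"))  # {'language': 'persistent'}
```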
Depending on their origin:
- First-party cookies, which are set by the website itself.
- Third-party cookies, which are set by another domain, typically an analytics or advertising provider.
How are they used to track us?
The mechanism is as follows. Whenever we perform an action on a site, like visiting a page for the first time or interacting with an item, our browser sends a request to the server. When the server receives the first request for the page, it generates a cookie holding an identifier. The browser stores this value and sends it with each subsequent request, so the server can log each of the user’s actions under that identifier. If the cookie is persistent, the server can even link several visits to the same identifier. This can still be done anonymously if the server has no further information about the user, or does not link the identifier to her. This kind of tracking gives a better understanding of how the website and its features are used, makes it possible to compute visit statistics, and so on.
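The following sketch shows that mechanism from the server’s side. It is a hypothetical minimal server, not any real analytics product: the cookie name visitor_id, the one-year Max-Age, and the in-memory visits log are all assumptions for the example.

```python
import secrets
from collections import defaultdict
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical server-side log: visitor identifier -> pages visited.
visits: dict[str, list[str]] = defaultdict(list)


def handle_request(cookie_header: str, path: str) -> tuple[str, Optional[str]]:
    """Log the request and return (visitor_id, Set-Cookie value or None)."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning browser: reuse the identifier it sent back.
        visitor_id = jar["visitor_id"].value
        set_cookie = None
    else:
        # First visit: generate a random, "anonymous" identifier. Max-Age makes
        # the cookie persistent, so later visits can be linked to it.
        visitor_id = secrets.token_hex(8)
        set_cookie = f"visitor_id={visitor_id}; Max-Age=31536000; Path=/"
    visits[visitor_id].append(path)
    return visitor_id, set_cookie


# First visit: no cookie yet, so the server assigns one.
vid, set_cookie = handle_request("", "/")
# The browser sends the identifier back with each later request.
handle_request(f"visitor_id={vid}", "/products")
handle_request(f"visitor_id={vid}", "/products/42")
print(visits[vid])  # ['/', '/products', '/products/42']
```

Nothing in the log names the user, yet every action is tied to one identifier, which is exactly what makes the profile described below possible.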
There is also other information that the server can extract from our requests and link to the cookie identifier we have been assigned. These are the breadcrumbs we almost always leave when surfing the web:
- The IP address of the user. With it, it is possible to obtain:
  - the approximate location of the user, at the city or municipality level, and
  - the user’s internet service provider (ISP).
- The User-Agent header, which the browser sends to the server to identify itself. From it, the server can identify the browser and the operating system, and thus whether the device is a mobile or a desktop one.
- The list of languages your browser uses, with the priorities you’ve set. This is sent in the Accept-Language header.
- The time at which the user makes each request.
- Any other metadata that the server can extract from the request, such as UTM parameters. UTM parameters will get their own blog post in the future. If you have ever shared a link with someone, you may have seen that after the classic URL there’s a part that looks like this: utm_source=blabla&utm_campaign=xxxxx. This is extra data that the server can link to the “anonymous” user.
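To make these breadcrumbs concrete, here is a sketch of what a server could read from a single request, using only Python’s standard library. The request dictionary, its User-Agent string, and the helper names are hypothetical, and checking for “Mobile” in the User-Agent is a crude heuristic rather than a reliable device detector.

```python
from urllib.parse import parse_qs, urlsplit


def parse_accept_language(value: str) -> list[tuple[str, float]]:
    """Return (language, priority) pairs from Accept-Language, highest first."""
    languages = []
    for part in value.split(","):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        quality = 1.0  # default priority when no q= parameter is given
        params = params.strip()
        if params.startswith("q="):
            quality = float(params[2:])
        languages.append((tag.strip(), quality))
    return sorted(languages, key=lambda item: item[1], reverse=True)


def utm_parameters(url: str) -> dict[str, str]:
    """Extract the utm_* query parameters from a URL."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


# Hypothetical request, as seen by the server.
request = {
    "url": "https://www.my-great-webshop.test/products?utm_source=blabla&utm_campaign=xxxxx",
    "headers": {
        "User-Agent": "Mozilla/5.0 (Android 14; Mobile; rv:128.0) Gecko/128.0 Firefox/128.0",
        "Accept-Language": "fr-FR,fr;q=0.9,en;q=0.8",
    },
}

profile = {
    "user_agent": request["headers"]["User-Agent"],
    "is_mobile": "Mobile" in request["headers"]["User-Agent"],  # crude heuristic
    "languages": parse_accept_language(request["headers"]["Accept-Language"]),
    "utm": utm_parameters(request["url"]),
}
print(profile["languages"])  # [('fr-FR', 1.0), ('fr', 0.9), ('en', 0.8)]
print(profile["utm"])        # {'utm_source': 'blabla', 'utm_campaign': 'xxxxx'}
```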
Even if the server doesn’t have any personal data, it can still build a profile of the user from the information collected across all the interactions during the lifetime of the cookie. If the user has provided more information, for instance when creating an account, the server can link the behaviour to other characteristics, like a demographic group, age, personal data, and so on. The question we can ask at this stage is: what happens to this collection of user and behavioural data? Will it somehow be sold to advertisers? Will it be used to profile us and serve us tailored content or advertising? Will the server be secure enough to never suffer a data leak or a hack?
The thing is, once our data has been generated, it will certainly be stored and combined in order to be exploited. And we can’t be sure where it will end up, who will be using it, and for what purpose.
Who tracks you?
Although all this can be done by the server that actually renders the page, in practice it is commonly done by a third party, like Google Analytics or another analytics provider. And this is where things get more interesting: if a third party can keep track of your actions on website A, it may also be able to track you on website B if B uses the same service. This is where cross-site cookies come in. We will talk about them in a future blog post. For now, be aware that you can be tracked not only within a single website, but across different sites, making it possible to reconstruct the history of your visits.
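A hypothetical tracker shows why this works. Because the browser attaches the tracker’s own cookie whichever site embeds it, one identifier collects visits from unrelated sites. The sketch assumes the embedding page’s URL reaches the tracker somehow (for example, reported by the tracker’s script); the endpoint, site names, and in-memory history are illustrative only.

```python
import secrets
from collections import defaultdict
from typing import Optional

# Hypothetical third-party tracker: tracker id -> pages on which it was loaded.
history: dict[str, list[str]] = defaultdict(list)


def tracker_endpoint(tracker_cookie: Optional[str], page_url: str) -> str:
    """Handle a request for the tracker's script or pixel.

    tracker_cookie is the tracker's own (third-party) cookie, which the browser
    sends no matter which site embeds the tracker; page_url is the URL of the
    embedding page, assumed here to be reported by the tracker's script.
    """
    tracker_id = tracker_cookie or secrets.token_hex(8)
    history[tracker_id].append(page_url)
    return tracker_id  # would be sent back to the browser via Set-Cookie


# The same browser visits two unrelated sites that embed the same tracker.
tid = tracker_endpoint(None, "https://site-a.test/shoes")
tracker_endpoint(tid, "https://site-b.test/news/elections")
print(history[tid])  # ['https://site-a.test/shoes', 'https://site-b.test/news/elections']
```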
GDPR and data protection
I am not a legal expert, so do not take this as official advice.
Depending on where you live, there may or may not be laws about data protection. One of the most widespread is the European GDPR, under which a website must obtain users’ consent before using cookies, except for strictly necessary cookies, i.e. the cookies that are essential for the website to work properly. The website should also explain to users what its cookies do and why. This is why you see cookie banners on so many sites.
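For website builders, the rule above can be reduced to a simple gate before any cookie is set. The following is an illustrative sketch only, not legal advice and not a compliance tool: the categories mirror the four purposes listed earlier, and the cookie names are hypothetical.

```python
from enum import Enum


class Category(Enum):
    NECESSARY = "strictly necessary"
    PREFERENCES = "preferences"
    STATISTICS = "statistics"
    MARKETING = "marketing"


def allowed_cookies(cookies: dict[str, Category], consent: set[Category]) -> list[str]:
    """Return the names of the cookies the site may set.

    Strictly necessary cookies never need consent; every other category is
    only allowed if the user opted in to it.
    """
    return [
        name
        for name, category in cookies.items()
        if category is Category.NECESSARY or category in consent
    ]


# Hypothetical cookies a site would like to set.
wanted = {
    "session_id": Category.NECESSARY,
    "theme": Category.PREFERENCES,
    "_analytics_id": Category.STATISTICS,
    "_ad_id": Category.MARKETING,
}
print(allowed_cookies(wanted, consent=set()))                   # ['session_id']
print(allowed_cookies(wanted, consent={Category.PREFERENCES}))  # ['session_id', 'theme']
```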
What about this site?
This site does not use cookies. We do use an analytics provider, SimpleAnalytics, to keep track of the number of users visiting the site. All data is anonymous and we do not collect any personal data, which is why you don’t see a cookie banner.
Limiting the reach of cookies
So far, we have seen that cookies can be used both to provide a better user experience and to track users. The line between the two may not be that clear. Is collecting anonymised data tracking? Could anonymised data be cross-referenced with other data sources to de-anonymise the user? Can this data be sold? Is it a problem if it can’t be linked to my identity? I won’t try to answer these questions, but what I will say is that, in any case, it is a datafication of our online activities, with an underlying possibility for this data to be exploited. Furthermore, as said before, we can’t be sure where this data will end up, whether because it was sold, or because the company went bankrupt or got hacked.
So how to limit the reach of cookies?
Cleaning
Adopt a cleaning routine. Most browsers let you define how cookies are handled. Some browsers allow you to delete every cookie each time the browser is closed, de facto converting persistent cookies into session cookies without you having to remember to erase them. The next best alternative is to do it manually: every day or every week, erase your cookies. The option is usually in the browser’s privacy settings. Yes, you may lose some site preferences or have to log in again, but it is a great way to limit the reach of cookies.
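The effect of the “delete cookies when the browser is closed” setting can be modelled in a few lines. This is a hypothetical model of browser behaviour, not any browser’s actual code: a stored cookie with no expiry is a session cookie, and the clear_on_exit flag stands in for the privacy setting.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredCookie:
    name: str
    value: str
    expires_at: Optional[float]  # Unix timestamp; None means session cookie


def on_browser_close(jar: list[StoredCookie], clear_on_exit: bool) -> list[StoredCookie]:
    """Return the cookies that survive closing the browser."""
    if clear_on_exit:
        # Persistent cookies are treated like session cookies: nothing survives.
        return []
    now = time.time()
    # Default behaviour: session cookies are dropped, unexpired persistent ones stay.
    return [c for c in jar if c.expires_at is not None and c.expires_at > now]


jar = [
    StoredCookie("cart", "abc123", None),
    StoredCookie("visitor_id", "9f2c", time.time() + 365 * 24 * 3600),
]
print([c.name for c in on_browser_close(jar, clear_on_exit=False)])  # ['visitor_id']
print([c.name for c in on_browser_close(jar, clear_on_exit=True)])   # []
```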
Tracking blocker plugins
Third-party cookies are the most problematic ones. Look in your browser for the option to block third-party cookies; this will prevent many cross-site tracking techniques. If your favorite browser doesn’t have this option, you can either switch to a more privacy-friendly browser or use a tracking blocker plugin.
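Conceptually, blocking third-party cookies means withholding cookies from requests that go to a different site than the page you are visiting. The sketch below is a simplification under a stated assumption: it treats the last two labels of the host name as the “site”, whereas real browsers rely on the Public Suffix List (so that, for example, names under co.uk are handled correctly). tracker.example and the cookie values are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Naive "site": the last two labels of the host name.

    Real browsers use the Public Suffix List instead; this shortcut is only
    for illustration.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(top_level_url: str, request_url: str) -> bool:
    """True when a request goes to a different site than the page being visited."""
    return site_of(top_level_url) != site_of(request_url)


def cookies_to_send(top_level_url: str, request_url: str,
                    cookies: dict[str, str], block_third_party: bool) -> dict[str, str]:
    # With blocking enabled, cookies are withheld from cross-site requests,
    # so a tracker embedded on many sites cannot recognise the browser.
    if block_third_party and is_third_party(top_level_url, request_url):
        return {}
    return cookies


page = "https://www.my-great-webshop.test/products"
print(is_third_party(page, "https://cdn.my-great-webshop.test/app.js"))  # False
print(is_third_party(page, "https://tracker.example/pixel.gif"))         # True
print(cookies_to_send(page, "https://tracker.example/pixel.gif", {"tracker_id": "9f2c"}, True))  # {}
```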
What to remember?
Cookies are not bad per se; they are just information that is stored on the user’s device and exchanged with the server. However, they can be used to track us, especially third-party cookies set by analytics or advertising providers. And once data is created and gathered, it is difficult to know where, how, and for what purpose it will be used.
Do you build websites?
Stop tracking unnecessary data and, please, stop using nice “cookie cartoons” to make extracting personal data look fun and cute :).
Reference
For more official information about GDPR and cookies, you can visit the site https://gdpr.eu/cookies/