Welcome!

Cloud Security Authors: Pat Romanski, Elizabeth White, Liz McMillan, Yeshim Deniz, Ed Featherston

Related Topics: Cloud Security, Java IoT, Linux Containers, Containers Expo Blog, Agile Computing, @CloudExpo

Cloud Security: Blog Feed Post

Facebook Exploit Is Not Unique

Facebook isn't unique in the ability to use it to attack a third party, it's just more effective

This week's "bad news" with respect to information security centers on Facebook and the exploitation of HTTP caches to affect a DDoS attack. Reported as a 'vulnerability', this exploit takes advantage of the way the application protocol is designed to work. In fact, the same author who reports the Facebook 'vulnerability' has also shown you can use Google to do the same thing. Just about any site that enables you to submit content containing links and then retrieves those links for you (for caching purposes) could be used in this way. It's not unique to Facebook or Google, for that matter, they just have the perfect environment to make such an exploit highly effective.

The exploit works by using a site (in this case Facebook) to load content and takes advantage of the general principle of amplification to effectively DDoS a third-party site. This is a flood-based like attack, meaning it's attempting to overwhelm a server by flooding it with requests that voraciously consume server-side resources and slow everyone down - to the point of forcing it to appear "down" to legitimate users.

The requests brokered by Facebook are themselves 110% legitimate requests. The requests for an image (or PDF or large video file) are well-formed, and nothing about the requests on an individual basis could be detected as being an attack. This is, in part, why the exploit works: because the individual requests are wholly legitimate requests.

How it Works
The trigger for the "attack" is the caching service. Caches are generally excellent at, well, caching static objects with well-defined URIs. A cache doesn't have a problem finding /myimage.png. It's either there, or it's not and the cache has to go to origin to retrieve it. Where things get more difficult is when requests for content are dynamic; that is, they send parameters that the origin server interprets to determine which image to send, e.g. /myimage?id=30. This is much like an old developer trick to force the reload of dynamic content when browser or server caches indicate a match on the URL. By tacking on a random query parameter, you can "trick" the browser and the server into believing it's a brand new object, and it will go to origin to retrieve it - even though the query parameter is never used. That's where the exploit comes in.

HTTP servers accept as part of the definition of a URI any number of variable query parameters. Those parameters can be ignored or used at the discretion of the application. But when the HTTP server is looking to see if that content has been served already, it does look at those parameters. The reference for a given object is its URL, and thus tacking on a query parameter forces (or tricks if you prefer) the HTTP server to believe the object has never been served before and thus can't be retrieved from a cache.

Caches act on the same principles as an HTTP server because when you get down to brass tacks, a cache is a very specialized HTTP server, focused on mirroring content so it's closer to the user.

<img src=http://target.com/file?r=1>
<img src=http://target.com/file?r=2>
<img src=http://target.com/file?r=3>
...
<img src=http://target.com/file?r=1000>

Many, many, many, many (repeat as necessary) web applications are built using such models. Whether to retrieve text-based content or images is irrelevant to the cache. The cache looks at the request and, if it can't match it somehow, it's going to go to origin.

Which is what's possible with Facebook Notes and Google. By taking advantage of (exploiting) this design principle, if a note crafted with multiple image objects retrieved via a dynamic query is viewed by enough users at the same time, the origin can become overwhelmed or its network oversubscribed.

This is what makes it an exploit, not a vulnerability. There's nothing wrong with the behavior of these caches - they are working exactly as they were designed to act with respect to HTTP. The problem is that when the protocol and caching behavior was defined, such abusive behavior was not considered.

In other words, this is a protocol exploit not specific to Facebook (or Google). In fact, similar exploits have been used to launch attacks in the past. For example, consider some noise raised around WordPress in March 2014 that indicated it was being used to attack other sites by bypassing the cache and forcing a full reload from the origin server:

If you notice, all queries had a random value (like “?4137049=643182″) that bypassed their cache and force a full page reload every single time. It was killing their server pretty quickly.

 

But the most interesting part is that all the requests were coming from valid and legitimate WordPress sites. Yes, other WordPress sites were sending that random requests at a very large scale and bringing the site down.

The WordPress exploit was taking advantage of the way "pingbacks" work. Attackers were using sites to add pingbacks to amplify an attack on a third party site (also, ironically, a WordPress site).

It's not just Facebook, or Google - it's inherent in the way caching is designed to work.

Not Just HTTP
This isn't just an issue with HTTP. We can see similar behavior in a DNS exploit that renders DNS caching ineffective as protection against certain attack types. In the DNS case, querying a cache with a random host name results in a query to the authoritative (origin) DNS service. If you send enough random host names at the cache, eventually the DNS service is going to feel the impact and possibly choke.

In general, these types of exploits are based on protocol and well-defined system behavior. A cache is, by design, required to either return a matching object if found or go to the origin server if it is not. In both the HTTP and DNS case, the caching services are acting properly and as one would expect.

The problem is that this proper behavior can be exploited to affect a DDoS attack - against third-parties in the case of Facebook/Google and against the domain owner in the case of DNS.

These are not vulnerabilities, they are protocol exploits. This same "vulnerability" is probably present in most architectures that include caching. The difference is that Facebook's ginormous base of users allows for what is expected behavior to quickly turn into what looks like an attack.

Mitigating
The general consensus right now is the best way to mitigate this potential "attack" is to identify and either rate limit or disallow requests coming from Facebook's crawlers by IP address. In essence, the suggestion is to blacklist Facebook (and perhaps Google) to keep it from potentially overwhelming your site.

The author noted in his post regarding this exploit that:

Facebook crawler shows itself as facebookexternalhit. Right now it seems there is no other choice than to block it in order to avoid this nuisance.

The post was later updated to note that blocking by agent may not be enough, hence the consensus on IP-based blacklisting.

The problem is that attackers could simply find another site with a large user base (there are quite a few of them out there with the users to support a successful attack) and find the right mix of queries to bypass the cache (cause caches are a pretty standard part of a web-scale infrastructure) and voila! Instant attack.

Blocking Facebook isn't going to stop other potential attacks and it might seriously impede revenue generating strategies that rely on Facebook as a channel. Rate limiting based on inbound query volume for specific content will help mitigate the impact (and ensure legitimate requests continue to be served) but this requires some service to intermediate and monitor inbound requests and, upon seeing behavior indicative of a potential attack, the ability to intercede or apply the appropriate rate limiting policy. Such a policy could go further and blacklist IP addresses showing sudden increases in requests or simply blocking requests for the specified URI in question - returning instead some other content.

Another option would be to use a caching solution capable of managing dynamic content. For example, F5 Dynamic Caching includes the ability to designate parameters as either indicative of new content or not. That is, the caching service can be configured to ignore some (or all) parameters and serve content out of cache instead of hammering on the origin server.

Let's say the URI for an image was: /directory/images/dog.gif?ver=1;sz=728X90 where valid query parameters are "ver" (version) and "sz" (size). A policy can be configured to recognize "ver" as indicative of different content while all other query parameters indicate the same content and can be served out of cache. With this kind of policy an attacker could send any combination of the following and the same image would be served from cache, even though "sz" is different and there are random additional query parameters.

/directory/images/dog.gif?ver=1;sz=728X90; id=1234
/directory/images/dog.gif?ver=1;sz=728X900; id=123456
/directory/images/dog.gif?ver=1;sz=728X90; cid=1234 

By placing an application fluent cache service in front of your origin servers, when Facebook (or Google) comes knocking, you're able to handle the load.

Action Items
There have been no reports of an attack stemming from this exploitable condition in Facebook Notes or Google, so blacklisting crawlers from either Facebook or Google seems premature. Given that this condition is based on protocol behavior and system design and not a vulnerability unique to Facebook (or Google), though, it would be a good idea to have a plan in place to address, should such an attack actually occur - from there or some other site.

You should review your own architecture and evaluate its ability to withstand a sudden influx of dynamic requests for content like this, and put into place an operational plan for dealing with it should such an event occur.

For more information on protecting against all types of DDoS attacks, check out a new infographic we’ve put together here.

Read the original blog entry...

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.

@ThingsExpo Stories
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
DXWorldEXPO LLC, the producer of the world's most influential technology conferences and trade shows has announced the 22nd International CloudEXPO | DXWorldEXPO "Early Bird Registration" is now open. Register for Full Conference "Gold Pass" ▸ Here (Expo Hall ▸ Here)
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant tha...
"MobiDev is a software development company and we do complex, custom software development for everybody from entrepreneurs to large enterprises," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
I think DevOps is now a rambunctious teenager - it's starting to get a mind of its own, wanting to get its own things but it still needs some adult supervision," explained Thomas Hooker, VP of marketing at CollabNet, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Recently, WebRTC has a lot of eyes from market. The use cases of WebRTC are expanding - video chat, online education, online health care etc. Not only for human-to-human communication, but also IoT use cases such as machine to human use cases can be seen recently. One of the typical use-case is remote camera monitoring. With WebRTC, people can have interoperability and flexibility for deploying monitoring service. However, the benefit of WebRTC for IoT is not only its convenience and interopera...
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
It is of utmost importance for the future success of WebRTC to ensure that interoperability is operational between web browsers and any WebRTC-compliant client. To be guaranteed as operational and effective, interoperability must be tested extensively by establishing WebRTC data and media connections between different web browsers running on different devices and operating systems. In his session at WebRTC Summit at @ThingsExpo, Dr. Alex Gouaillard, CEO and Founder of CoSMo Software, presented ...
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, introduced two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a multip...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, discussed the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
IoT is rapidly becoming mainstream as more and more investments are made into the platforms and technology. As this movement continues to expand and gain momentum it creates a massive wall of noise that can be difficult to sift through. Unfortunately, this inevitably makes IoT less approachable for people to get started with and can hamper efforts to integrate this key technology into your own portfolio. There are so many connected products already in place today with many hundreds more on the h...
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
Detecting internal user threats in the Big Data eco-system is challenging and cumbersome. Many organizations monitor internal usage of the Big Data eco-system using a set of alerts. This is not a scalable process given the increase in the number of alerts with the accelerating growth in data volume and user base. Organizations are increasingly leveraging machine learning to monitor only those data elements that are sensitive and critical, autonomously establish monitoring policies, and to detect...
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settl...
In his session at @ThingsExpo, Dr. Robert Cohen, an economist and senior fellow at the Economic Strategy Institute, presented the findings of a series of six detailed case studies of how large corporations are implementing IoT. The session explored how IoT has improved their economic performance, had major impacts on business models and resulted in impressive ROIs. The companies covered span manufacturing and services firms. He also explored servicification, how manufacturing firms shift from se...
DevOpsSummit New York 2018, colocated with CloudEXPO | DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City. Digital Transformation (DX) is a major focus with the introduction of DXWorldEXPO within the program. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of bus...
The Jevons Paradox suggests that when technological advances increase efficiency of a resource, it results in an overall increase in consumption. Writing on the increased use of coal as a result of technological improvements, 19th-century economist William Stanley Jevons found that these improvements led to the development of new ways to utilize coal. In his session at 19th Cloud Expo, Mark Thiele, Chief Strategy Officer for Apcera, compared the Jevons Paradox to modern-day enterprise IT, examin...
IoT solutions exploit operational data generated by Internet-connected smart “things” for the purpose of gaining operational insight and producing “better outcomes” (for example, create new business models, eliminate unscheduled maintenance, etc.). The explosive proliferation of IoT solutions will result in an exponential growth in the volume of IoT data, precipitating significant Information Governance issues: who owns the IoT data, what are the rights/duties of IoT solutions adopters towards t...
Amazon started as an online bookseller 20 years ago. Since then, it has evolved into a technology juggernaut that has disrupted multiple markets and industries and touches many aspects of our lives. It is a relentless technology and business model innovator driving disruption throughout numerous ecosystems. Amazon’s AWS revenues alone are approaching $16B a year making it one of the largest IT companies in the world. With dominant offerings in Cloud, IoT, eCommerce, Big Data, AI, Digital Assista...