Advanced Data Exfiltration

This paper was presented at several security conferences during 2011, and is now being made fully available (along with a PDF version for download).


Penetration testing and red-team exercises have been running for years using the same methodology and techniques. Nevertheless, modern attacks do not conform to what the industry has been preparing for, and do not utilize the same tools and techniques employed by such tests. This paper discusses the different ways that attacks should be emulated, and focuses mainly on data exfiltration.

The ability to “break into” an organization is fairly well documented and accepted, but the realization that attacks do not stop at the first compromised system has not reached all business owners. This paper therefore describes multiple ways to exfiltrate data that should be used as part of penetration testing, both to better simulate an actual attack and to stress the organization’s security measures and its ability to detect data leaving it.

Infiltration

Modern attack trees employ multiple realms of information that are not necessarily extensively tested and protected by organizations. Some of these paths involve not only technical factors, but also human, business, and physical ones.

From a technical perspective, even though information security as a practice has been around for as long as computer systems have been in use, organizations tend to focus on issues that are well documented and addressed by a plethora of products. Attackers do exactly the opposite: they look for infiltration paths that involve some human interaction (as humans are generally less secure than computers) and focus on elements that receive less scrutiny from protection and control mechanisms. One example of such an attack path is the use of file formats that are common in organizations and known to contain multiple vulnerabilities. Such formats (like ZIP and RAR archives, PDF, and Flash) and applications (vulnerable versions of Internet Explorer, Firefox, and other commonly used tools) are bound to be found in organizations, even when the installed versions are problematic from a security standpoint, because many internal applications still require old versions of such tools (notably Internet Explorer).

The human element comes into play when introducing the malicious code into the organization. A common and still highly effective attack avenue is the targeted phishing email (spear-phishing), which gives the target an appealing context that makes it seem more “legitimate” to open and access any content embedded in the email or referenced by it (such as external links to sites carrying the malicious code).

Another human element that can easily be exploited to get malicious content into an organization is physical devices that are dropped near, or handed to, key personnel. This paper is being presented at a conference where multiple vendors hand out giveaways in the form of electronic media (CDs or USB memory devices). A classic attack would use such an opportunity to deliver crafted malicious code to key personnel on such media (or in follow-up emails).

The last element of the infiltration attack surface is the physical one. Gaining physical access to the offices of the target (or of one of its partners) is easier than perceived, and gives the attacker multiple ways of getting attack code onto the network. Whether by casually tapping into open network ports and sniffing traffic remotely (by plugging in a wireless access point) or simply by plugging an infected USB drive into various PCs, such attacks can be carried out by people who are not security professionals (we often use paid actors for such engagements, gaining a familiarity bonus from their local accent and conversational context).

Data targeting and acquisition

Before launching the actual attack on the organization, the attacker will perform some basic intelligence gathering in order to select the target through which the attack will be conducted, as well as the data being sought.

Such intelligence gathering is done through open source intelligence as well as onsite reconnaissance. From the organization’s perspective, the ability to map out the information about it that is available through public channels is crucial. It provides a baseline for a threat model that reflects the attacker’s intelligence data, and allows the organization to address any policy or procedure issues around exposing data. Social media channels are among the more popular means of intelligence gathering, and can easily be used to build an organizational map and identify the right targets to go through.

Finally, in terms of data targeting, the attacker will usually focus on intellectual property, financial information, and personal information, all of which can easily be sold on the open market or to competitors.

Once the target has been selected and the data identified, the actual payload needs to be created and deployed using the attack tree chosen for infiltration. Such a payload (often called an “APT”, for Advanced Persistent Threat) is usually no more sophisticated than a modern banker Trojan that has been retooled for a single targeted attack. Such Trojans are available for purchase on criminal markets for $500 to $2,500, depending on the plugins and capabilities provided. They already have a fairly low detection rate by security software, but to ensure a successful attack they are often further obfuscated and packed.

Command and Control

One of the main differences between targeted attacks and more common malware is that targeted attacks must account for the fact that connectivity between the payload and the attacker cannot be assured for long periods of time (and is sometimes nonexistent). As such, the command and control (C&C) scheme for such an attack needs to take this into consideration, and the payload should be equipped to operate fairly independently.

Nevertheless, some form of control communication is needed. It usually follows a hierarchical control structure, in which multiple payloads are deployed at different locations inside the organization and communicate with each other to form a “grid” that lets the more restricted locations reach the attacker outside the organization through the other layers.

Exfiltration

So far we have reviewed how an attacker infiltrates an organization, targets the data it is after, and finds a way to control the deployed payload. Getting the actual data out is still a challenge, though, as more often than not the data is located so deep inside the network that traditional means of communication (DNS/HTTP tunneling, for example) are not applicable.

However, the way organizations build their infrastructure these days opens up other means of getting the data out. Following are a few concepts that should be used to test exfiltration capabilities as part of penetration testing; they have proved useful in multiple major corporations as well as government and defense organizations.

First, the obvious: use “permitted” protocols to carry the information out, usually DNS and HTTP traffic. The data itself may be sensitive and filtered by DLP mechanisms, so it should be encrypted in a way that prevents a filtering or detection device from parsing it. After encryption, the data can be sent out through services such as Dropbox, Facebook, Twitter, blog comments and posts, wikis, and so on. These are often not filtered by corporate control mechanisms, and are easy to set up if needed (for example, a WordPress blog to which the payload posts data encrypted with the attacker’s public key).

The next exfiltration method that usually works is simply printing documents. Obviously, the original data should not be printed, as it would be easily detected and shredded. Instead, the encrypted information can be sent to shared printers (which are easy to map on the network) and made to look like print errors (i.e. with any header or footer added by the encryption tool removed). Printouts like these are more likely to end up in the paper bin than in the shredder, and can later be retrieved through old-school dumpster diving. Once retrieved, the documents just need to be OCR’d and decrypted to reveal the stolen sensitive data. This method works best when the target’s paper disposal process has been properly mapped out.

An alternative to printing the encrypted data uses the same exfiltration vector, the shared printers: when a shared printer turns out to be a multi-function device with fax capabilities, the payload can use it to fax out the encrypted documents.

In this scenario the payload still needs to maintain “operational awareness”, as some DLP products inspect the fax queue for information that is not supposed to leave the organization; sending encrypted text is therefore the better choice.

Exfiltration through VoIP

This is the main concept presented in this paper. As VoIP networks are usually reachable from the local data network (typically to accommodate soft-phones and to keep the network topology simple), crafted payloads can use this channel to exfiltrate data. The method proposed here is to first sniff the network and observe recurring call patterns and user identifications (to be used later when initiating the SIP call). Once a pattern has been mapped out, the payload encodes the data to be exfiltrated from its binary format into audio.

A proposed encoding maps the half-byte (nibble) values of the data stream to a scale of 16 distinct audio tones within the human audible frequency range (20 Hz to 20,000 Hz). These tones are referred to here as “octaves”, although that range spans only about ten octaves in the musical sense. Each byte is split into its low and high half-bytes, and each value (0 to 15) selects one of the 16 tones.

For each byte do:
    hb_low = byte & 0xF
    hb_high = byte >> 4
    voice_msg += octave[hb_low]
    voice_msg += octave[hb_high]

Sample 1: pseudo-code for voice encoding of data to 16-octave representation

Each tone is then played for a short period of time (for example, half a second) on the output voice channel. The resulting audio is played back over an open SIP call that can be placed to almost any number outside the organization (for example, a Google Voice account’s voicemail), and is later decoded back into the original binary data.
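To make Sample 1 concrete, the following minimal Python sketch implements the same nibble-to-tone mapping with the half-second tone duration. The tone table, the 8 kHz sample rate, and the names `TONES_HZ`, `tone`, and `encode` are assumptions for illustration; the paper does not publish the frequencies its proof-of-concept uses. Here the 16 tones run from 650 Hz to 2,900 Hz in 150 Hz steps, which keeps them inside the roughly 300 to 3,400 Hz band that a narrowband telephone channel passes, and no tone is an exact 2x or 3x multiple of another. The sketch only produces raw samples; it does not write audio files or place calls.

```python
import math

# Hypothetical tone table: the paper does not publish its 16 frequencies.
# 650 Hz to 2,900 Hz in 150 Hz steps: inside the ~300-3,400 Hz telephone
# band, and no tone is an exact 2x or 3x multiple of another.
TONES_HZ = [650 + 150 * i for i in range(16)]
SAMPLE_RATE = 8000   # samples per second, typical for narrowband telephony
TONE_SECONDS = 0.5   # duration of each tone, as in the paper's example


def tone(freq_hz: float, seconds: float = TONE_SECONDS, rate: int = SAMPLE_RATE) -> list:
    """Return a sine tone as float samples in the range [-1.0, 1.0]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def encode(data: bytes) -> list:
    """Sample 1 in Python: two tones per byte, low half-byte first."""
    samples = []
    for byte in data:
        samples += tone(TONES_HZ[byte & 0xF])  # low half-byte
        samples += tone(TONES_HZ[byte >> 4])   # high half-byte
    return samples
```

Under these assumptions every byte becomes two half-second tones, i.e. one second (8,000 samples) of audio.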

For decoding, the recorded audio is split into “time slices”, one per generated tone, and an approximate frequency analysis identifies the dominant (strongest) frequency in each slice; that frequency is then matched to the closest of the tones used to generate the original sound. Because audio gets distorted and downsampled as it passes through older telephony systems, and cannot be guaranteed the same quality as on pure VoIP circuits, the tones must be spaced far enough apart in frequency to remain distinguishable.

For each sample_pair do:
    max_f = getMaxFreq(sample0)
    hb_low = getNibbleFromFreq(max_f)
    max_f = getMaxFreq(sample1)
    hb_high = getNibbleFromFreq(max_f)
    byte = hb_low | (hb_high << 4)
    out_stream += byte

Sample 2: pseudo-code for decoding the 16-octave voice representation back to data (the first tone of each pair carries the low half-byte, matching the order used in Sample 1)
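The decoding side can be sketched the same way. Rather than searching a full spectrum for the dominant frequency, this version uses the Goertzel algorithm, a swapped-in technique that may differ from the paper’s proof-of-concept, to measure the power at each of the 16 known tone frequencies and keep the strongest. It assumes the same hypothetical tone table and timing as a 16-tone, half-second, 8 kHz encoder (650 Hz to 2,900 Hz in 150 Hz steps), reads the first tone of each pair as the low half-byte (the order in which Sample 1 writes them), and assumes the slices are already aligned to tone boundaries; a real recording would also need synchronization and tolerance for codec distortion.

```python
import math

# Hypothetical parameters (assumptions, not values published with the paper).
TONES_HZ = [650 + 150 * i for i in range(16)]
SAMPLE_RATE = 8000
SLICE = 4000  # samples per tone: 0.5 s at 8 kHz


def goertzel_power(samples, freq_hz, rate=SAMPLE_RATE):
    """Power of `samples` at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def nibble_from_slice(samples):
    """Index (0-15) of the table tone with the most power in this slice."""
    powers = [goertzel_power(samples, f) for f in TONES_HZ]
    return powers.index(max(powers))


def decode(samples):
    """Sample 2 in Python: pairs of slices -> bytes, low half-byte first."""
    out = bytearray()
    for start in range(0, len(samples) - 2 * SLICE + 1, 2 * SLICE):
        low = nibble_from_slice(samples[start:start + SLICE])
        high = nibble_from_slice(samples[start + SLICE:start + 2 * SLICE])
        out.append(low | (high << 4))
    return bytes(out)
```

Measuring only 16 frequencies is cheaper than a full FFT when the candidate tones are known in advance, and because the decision compares relative power, it does not depend on the overall call volume.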

This method can obviously be optimized in several ways: first, by using more tones (as long as they are far enough apart in frequency and non-harmonic) so that each tone represents more data, and second, by shortening the time each tone is played, which packs the same data into less time.
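As a rough worked example of what each optimization buys, assume M equally likely tones, T seconds per tone, and no framing or error-correction overhead (none of which the text specifies). Each tone then carries log2(M) bits, so the raw rate R is:

```latex
R = \frac{\log_2 M}{T}\ \text{bit/s}
\qquad
M = 16,\; T = 0.5\,\text{s}:\quad R = \frac{4}{0.5} = 8\ \text{bit/s} = 1\ \text{byte/s}
```

At that rate a 1 KiB file takes 1,024 s, roughly 17 minutes, of call time. Doubling the alphabet to M = 32 adds only one bit per tone (R = 10 bit/s at T = 0.5 s), whereas shortening the tones to T = 0.1 s with M = 16 gives R = 40 bit/s. The trade-off is that a shorter slice gives the decoder coarser frequency resolution (roughly 1/T Hz), so the tones must be spaced further apart.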

The proof-of-concept released along with this paper is intentionally designed to serve as an example (although it can easily be adapted to carry significant amounts of data, and has been used on several occasions in penetration testing engagements to exfiltrate highly sensitive data).

In terms of protection against such exfiltration techniques, the recommended strategy is to extend the same monitoring and scrutiny applied to traditional electronic communication channels (web, email) to VoIP channels. Although voice communication monitoring has traditionally been associated with government-type organizations, the move to VoIP enables small companies and businesses to extend the same security controls to the voice channel. DLP systems, for example, could be used to transcribe phone conversations (voice to text), apply the same filtering to the transcripts, and flag calls that contain non-verbal communication as suspicious.
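The “non-verbal communication” alert can be illustrated with a very simple heuristic: measure how much of each short slice’s energy sits at a single frequency. This is a hypothetical sketch, not how any particular DLP product works. It assumes 8 kHz mono samples given as floats, 50 ms slices, a scan of the 300 to 3,400 Hz telephone band, and thresholds (more than 30% of a slice’s energy at one frequency, in more than half of the slices) that are illustrative guesses which would need tuning on real traffic. Each scan frequency is probed with the Goertzel recurrence.

```python
import math
import random

SAMPLE_RATE = 8000              # assumption: 8 kHz mono call recording
SLICE = 400                     # 50 ms slices, about 20 Hz frequency resolution
SCAN_HZ = range(300, 3401, 10)  # telephone band, scanned in half-resolution steps


def goertzel_power(samples, freq_hz, rate=SAMPLE_RATE):
    """|X(f)|^2 of the slice at freq_hz, via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def peak_energy_fraction(samples):
    """Share of the slice's energy found at its single strongest frequency.

    By Parseval's theorem a pure tone at a scanned frequency scores about 0.5
    (the other half sits at the mirrored negative frequency). Noise scores far
    lower, and speech, whose energy is spread over several harmonics that
    change from slice to slice, usually does too (a heuristic assumption).
    """
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return 0.0
    peak = max(goertzel_power(samples, f) for f in SCAN_HZ)
    return peak / (len(samples) * energy)


def looks_like_tone_data(samples, slice_threshold=0.3, call_fraction=0.5):
    """Flag audio in which most slices are dominated by one steady tone.

    Both thresholds are illustrative guesses, not tuned values.
    """
    slices = [samples[i:i + SLICE] for i in range(0, len(samples) - SLICE + 1, SLICE)]
    if not slices:
        return False
    tonal = sum(1 for s in slices if peak_energy_fraction(s) > slice_threshold)
    return tonal / len(slices) > call_fraction


if __name__ == "__main__":
    rng = random.Random(0)
    steady = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(2000)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    print("tone flagged:", looks_like_tone_data(steady))
    print("noise flagged:", looks_like_tone_data(noise))
```

Legitimate tonal audio such as fax or modem tones, dial tones, or some hold music may trip the same check, so its output is better suited to flagging calls for review than to blocking them outright.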

Future Directions

The concepts presented here in relation to advanced data exfiltration techniques are not merely theoretical. We have been observing progress in the way advanced threats address this issue, adding more capabilities and techniques to their data exfiltration arsenal beyond simply staging data in archives and pushing it out through FTP connections. The proliferation of VoIP networks configured mainly for convenience, with little regard for security, has allowed us to observe a few cases where similar methods were used to transmit data outside the targeted organization. Additionally, VoIP networks can act as simple bridges between networks of different classifications that may not have a direct data connection in the “classic” sense of a TCP/IP network infrastructure.

The other techniques mentioned in this paper (namely covert channels in legitimate services such as blogs, social networks, wikis, and DNS) are already in widespread use, and corporate security teams should already be addressing them.

Mitigation Strategies

When addressing data exfiltration, the first thing to realize is that infiltration can almost be taken for granted. With so many attack surfaces spanning different facets of the organization (beyond the purely technical), security managers need to treat the detection and mitigation of data exfiltration as an integral part of their strategic approach to security.

Identifying data in motion is the basic element that enables detection of exfiltration paths, and many tools already allow organizations to address this. The missing pieces usually lie in areas that traditional products and approaches neglect, such as encrypted communications and “new” channels such as VoIP. Addressing these channels is not an unsolved problem, and in our experience simple solutions can provide insight into the data they carry.

For encrypted channels, a policy (both legal and technical) of terminating encryption at the organizational perimeter, before traffic continues to the outside, should be applied. This approach, combined with existing DLP products that identify and detect misplaced data, provides the required insight into such communications. Any unknown data type carried over legitimate channels should be flagged and blocked until proven “innocent” (for example, custom encryption used inside a clear-text channel that cannot be correctly identified and validated as legitimate).
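One way to make “unknown data type” concrete is to ask whether a payload looks close to maximally random for the alphabet it uses. The sketch below is an illustration under stated assumptions, not a description of any DLP product: ciphertext sent through a clear-text channel is usually wrapped in hex (at most 4 bits of entropy per character), base64 (at most 6), or left as raw bytes (at most 8), while natural-language text sits well below the corresponding ceiling. The minimum length (256 bytes), the 10% slack, and all names are hypothetical.

```python
import base64
import hashlib
import math
import string
from collections import Counter

# Alphabets that ciphertext is commonly wrapped in when it has to travel
# inside a text channel (blog post, wiki edit, DNS label, ...).
HEX_CHARS = set(b"0123456789abcdefABCDEF")
BASE64_CHARS = set((string.ascii_letters + string.digits + "+/=").encode())


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_encrypted(payload: bytes, min_len: int = 256, slack: float = 0.1) -> bool:
    """Heuristic: is the payload near-maximally random for its alphabet?

    min_len and slack are illustrative assumptions, not tuned values.
    """
    if len(payload) < min_len:
        return False  # too short for a meaningful estimate
    symbols = set(payload)
    if symbols <= HEX_CHARS:
        ceiling = 4.0
    elif symbols <= BASE64_CHARS:
        ceiling = 6.0
    else:
        ceiling = 8.0
    return shannon_entropy(payload) >= ceiling * (1.0 - slack)


if __name__ == "__main__":
    # Deterministic stand-in for ciphertext: 3,072 bytes of SHA-256 output.
    blob = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(96))
    for label, data in [("raw", blob), ("base64", base64.b64encode(blob)),
                        ("hex", blob.hex().encode()), ("text", b"hello world " * 50)]:
        print(label, round(shannon_entropy(data), 2), looks_encrypted(data))
```

Compressed archives, images, and legitimately encrypted traffic also score high, and data deliberately encoded to resemble ordinary text would slip past, so a check like this is one input for flagging a payload for review rather than proof of exfiltration.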

For VoIP channels, the same approach applied to the more traditional web and email channels should be used. Full interception and monitoring of these channels is possible even when it is not done in real time: for example, by recording all conversations, processing them with speech recognition software, and feeding the results back to the DLP. This approach yields the same results as a DLP deployed on the email and web channels. Additionally, the investment in time, human resources, and materials is negligible compared with the added security in detecting and mitigating such threats, and it complements the layered security approach that should have covered these aspects in the first place.

Conclusion

This paper covered the more advanced infiltration techniques used in targeted attacks on organizations (which the security industry should have covered to some extent, although organizations are still struggling with this aspect), and raised awareness of the more problematic issue of detecting data in transit as it is exfiltrated from the organization. Several exfiltration methods have been discussed, the most evasive being the use of voice channels on VoIP infrastructure.

We believe that current practices do a disservice to the layered security approach preached by the security industry, leaving gaping holes in the monitoring and mitigation of exfiltration paths. While there may be privacy concerns, data in these channels would be processed and inspected much like data in existing channels, and should adhere to the same standards of privacy protection and abuse prevention as traditional data-leakage solutions do.

More Stories By Iftach Ian Amit

With more than 10 years of experience in the information security industry, Ian (Iftach) Amit brings a mixture of software development, OS, network and Web security expertise as Managing Partner of the top-tier security consulting and research firm Security & Innovation. Prior to Security & Innovation, Ian was the Director of Security Research at Aladdin and Finjan, leading their security research while positioning them as leaders in the Web security market. Amit has also held leadership roles as founder and CTO of a security startup in the IDS/IPS arena, developing new techniques for attack interception, and a director at Datavantage, responsible for software development and information security, as well as designing and building a financial datacenter. Prior to Datavantage, he managed the Internet application and UNIX worldwide. Amit holds a Bachelor's degree in Computer Science and Business Administration from the Interdisciplinary Center at Herzlya.
