tag:blogger.com,1999:blog-78734157337845370842024-03-19T13:01:18.316-07:00Info LossGuy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.comBlogger47125tag:blogger.com,1999:blog-7873415733784537084.post-49953691227588816542017-07-14T11:02:00.000-07:002017-07-14T11:02:31.425-07:00Searching for a needle in a pcap haystack with pysharkFaced with a bit of a challenge recently: I had a large (multi-megabyte) packet capture file from Wireshark and needed to extract information from the start of each SSL/TLS session in the capture. I could have used a Wireshark display filter to find SSL/TLS packets, but then manually sifting the client hello packets out of the capture and manually copying the needed data would have taken more time than I could spare for this task.<br />
<br />
Fortunately, we can use the <a href="https://github.com/KimiNewt/pyshark" target="_blank">pyshark</a> Python module to access packets in a pcap file in a loop and programmatically search for data in the packets of interest. I'm using <a href="https://www.macports.org/" target="_blank">MacPorts</a> on macOS, but pyshark doesn't seem to be available there, so I used "sudo /opt/local/bin/pip install pyshark" to install the module. I already have <a href="https://www.wireshark.org/download.html" target="_blank">Wireshark</a> installed, and it conveniently has a link /usr/local/bin/tshark to run the text-mode Wireshark tool needed by pyshark to extract data from pcap files.<br />
<br />
<a href="https://thepacketgeek.com/series/intro-to-pyshark/" target="_blank">thePacketGeek</a> wrote a helpful series of articles on using pyshark, but didn't get as deep into the details of SSL/TLS packets as I needed. So, the first step was to determine how to access the data of interest in SSL/TLS client hello packets. I extracted a single representative client hello packet from the large capture file into a test pcap file using Wireshark's "Export Specified Packets" option in the File menu, and used the interactive Python interpreter to see what was available:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">$ /opt/local/bin/python2.7<br />Python 2.7.13 (default, Apr 25 2017, 11:00:18) <br />[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin<br />Type "help", "copyright", "credits" or "license" for more information.<br />>>> import pyshark<br />>>> cap = pyshark.FileCapture('client-hello.pcapng')<br />>>> dir(cap[0])<br />['__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_packet_string', 'captured_length', 'eth', 'frame_info', 'get_multiple_layers', 'highest_layer', 'interface_captured', 'ip', 'layers', 'length', 'number', 'pretty_print', 'sniff_time', 'sniff_timestamp', 'ssl', 'tcp', 'transport_layer']</span><br />
<br />
"ssl" looks interesting:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">>>> dir(cap[0].ssl)<br />['', 'DATA_LAYER', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getstate__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_all_fields', '_field_prefix', '_get_all_field_lines', '_get_all_fields_with_alternates', '_get_field_or_layer_repr', '_get_field_repr', '_layer_name', '_sanitize_field_name', 'field_names', 'get', 'get_field', 'get_field_by_showname', 'get_field_value', 'handshake', 'handshake_cipher_suites_length', 'handshake_ciphersuite', 'handshake_ciphersuites', 'handshake_comp_method', 'handshake_comp_methods', 'handshake_comp_methods_length', 'handshake_extension_len', 'handshake_extension_type', 'handshake_extensions_ec_point_format', 'handshake_extensions_ec_point_formats_length', 'handshake_extensions_elliptic_curve', 'handshake_extensions_elliptic_curves', 'handshake_extensions_elliptic_curves_length', 'handshake_extensions_length', 'handshake_extensions_reneg_info_len', 'handshake_extensions_server_name', 'handshake_extensions_server_name_len', 'handshake_extensions_server_name_list_len', 'handshake_extensions_server_name_type', 'handshake_extensions_status_request_exts_len', 'handshake_extensions_status_request_responder_ids_len', 'handshake_extensions_status_request_type', 'handshake_length', 'handshake_random', 'handshake_random_time', 'handshake_session_id_length', 'handshake_sig_hash_alg', 'handshake_sig_hash_alg_len', 'handshake_sig_hash_algs', 'handshake_sig_hash_hash', 'handshake_sig_hash_sig', 
'handshake_type', 'handshake_version', 'layer_name', 'pretty_print', 'raw_mode', 'record', 'record_content_type', 'record_length', 'record_version']</span></span><br />
<br />
pyshark pulled out a large number of named elements from this packet. I'm interested in the client hello's extension where the server name indication lives, so "handshake_extensions_server_name" looks useful.<br />
<br />
<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">>>> cap[0].ssl.handshake_extensions_server_name<br />'www.bing.com'</span></span><br />
<br />
It worked!<br />
<br />
Now we can use this in a Python script. Since not all packets in the capture are a TLS client hello with the Server Name Indication (SNI) extension, I wrapped the code in a try block to silently skip any packets that don't have the data I'm looking for, and call it from a loop over the filenames on the command line:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">import pyshark</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">import sys</span></span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">def process(fn):<br />    cap = pyshark.FileCapture(input_file=fn, keep_packets=False)<br />    for pkt in cap:<br />        try:<br />            print pkt.ssl.handshake_extensions_server_name<br />        except AttributeError:<br />            pass</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;"><br /></span></span>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">for i in range(1, len(sys.argv)):<br />    process(sys.argv[i])</span></span><br />
<br />
(My actual program is a little more complex, but this is the fundamental task.)<br />
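For instance, once the server names are extracted, tallying how often each one appears across a capture takes only the standard library. A sketch of that step (the <span style="font-family: "courier new" , "courier" , monospace;">names</span> list below is hypothetical stand-in data, not output from the capture above):

```python
from collections import Counter

# Hypothetical stand-in for the SNI values pulled from
# pkt.ssl.handshake_extensions_server_name across a real capture
names = ['www.bing.com', 'www.example.com', 'www.bing.com']

# Tally how often each server name appears, most frequent first
counts = Counter(names)
print(counts.most_common())
```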
<br />
This takes about 8 minutes to run through the hundreds of thousands of packets in a 125MB pcapng file, but it saved the hours that writing an equivalent C++ program would have required.<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-43362808812632375882013-05-15T10:49:00.002-07:002013-05-15T10:49:53.970-07:00Searching Logs: A Work In ProgressA while back, I read a blog post at the SANS Internet Storm Center (ISC) handler's diary, "<a href="https://isc.sans.edu/diary/Guest+Diary%3A+Dylan+Johnson+-+There%27s+value+in+them+there+logs%21/15289" target="_blank">There's Value In Them There Logs</a>" that piqued my interest. I'm well aware that logs are essential for error discovery and diagnosis as well as incident forensic analysis. The systems I build consistently provide valuable data in their logs to aid such analysis. However, I've long wanted an open-source centralized log tool that could merge and manage all my log data across all my systems.<br />
<br />
In the ISC diary, there is a good diagram of a set of tools that can cooperate to build a useful log indexing and analysis system (rather than copy and describe all the components here, please see the original blog). I initially was a bit lost in the numerous pieces involved, but after a couple of days of trial and investigation, it started to make sense.<br />
<br />
At the moment, I've pulled together <a href="http://logstash.net/" target="_blank">Logstash</a> to read and parse logs, <a href="http://www.elasticsearch.org/" target="_blank">ElasticSearch</a> to store the log contents & indexes, and <a href="http://kibana.org/" target="_blank">Kibana</a> to visualize & search the log data. Logstash and ElasticSearch need a working Java Runtime (JRE); Kibana needs Ruby.<br />
<br />
I initially followed the Logstash tutorials to get the Logstash component working. With all its flexibility, it can be a challenge to understand what Logstash is capable of, but the tutorial helps get the software working, and by working through the steps I was able to figure out what Logstash was doing and why.<br />
<br />
The <a href="http://logstash.net/docs/1.1.12/tutorials/getting-started-simple" target="_blank">standalone tutorial</a> led me down the path of running Logstash in agent and web server modes. It wasn't clear to me immediately, but Logstash uses either an embedded ElasticSearch component or a companion ElasticSearch server to manage the log index and storage. I used logs from my mail server and other systems to feed it.<br /><br />After my initial standalone trial, I tried out the <a href="http://logstash.net/docs/1.1.12/tutorials/getting-started-centralized" target="_blank">centralized tutorial</a> that uses <a href="http://redis.io/" target="_blank">Redis</a> as a broker between Logstash instances and ElasticSearch. It was interesting to see how this functionality worked, but the centralized approach ended up complicating the architecture and diverting my attention from my goal: visualization and search.<br />
<br />
Aside from my diversion into the centralized tutorial, something else was bothering me: the mail server logs I used were not being deeply parsed -- the log messages were being indexed and stored, but no semantics were being applied to the data. I wanted to be able to query on sendmail queue IDs, mail senders and recipients, rejected messages, and other useful data.<br />
<br />
Logstash incorporates the very useful <a href="https://code.google.com/p/semicomplete/wiki/Grok" target="_blank">grok</a> functionality to extract content and semantics from data using tagged regular expressions. Surprisingly, I didn't find built-in recipes to work with sendmail log data, so I rolled my own in this standalone logstash configuration:<br />
<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace;">input { stdin { type => "mail"}}<br />filter {<br /> grok {<br /> type => "mail"<br /> pattern => [<br /> <span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): timeout waiting for input from %{IPORHOST:timeoutHost} .*",<br /> "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): Milter (\((?<milter>\S+)\)| add|): (?<milterMsg>.*)",<br /> "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): <(?<unknownUser>\S+)>\.\.\. User unknown",<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: STARTTLS=(?<starttls>\S+), ((relay=%{IPORHOST:relay}( \[%{IPORHOST:relayip}\]( \(may be forged\))?)?|version=(?<version>\S+)|verify=%{DATA:verify}|cipher=(?<cipher>[^,]+)|bits=(?<bits>\S+))(, |$))*",<br /> "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: (?<qid>NOQUEUE): connect from (%{IPORHOST:host})?( ?\[%{IPORHOST:ip}\])?( ?\(may be forged\)?)?",<br /> <span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(sendmail|sm-mta[^,\[]+))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): 
((to=(?<to>[^,]+)|from=(?<from>[^,]+)|ctladdr=(?<ctladdr>[^,]+)|delay=(?<delay>(\d+\+)?\d+:\d+:\d+)|xdelay=(?<xdelay>\d+:\d+:\d+)|mailer=(?<mailer>[^,]+)|pri=(?<pri>[^,]+)|dsn=(?<dsn>[^,]+)|size=(?<size>\d+)|class=(?<class>\d+)|nrcpts=(?<nrcpts>\d+)|msgid=(?<msgid>[^,]+)|proto=(?<proto>[^,]+)|daemon=(?<daemon>[^,]+)|bodytype=(?<bodytype>\S+)|relay=(%{IPORHOST:relay})?( ?\[%{IPORHOST:relayip}\])?( ?\(may be forged\)?)?|reject=(?<reject>.*)|stat=(?<stat>[^,]+)|ruleset=(?<ruleset>[^,]+)|arg1=(?<arg1>[^,]+))(, |$))*",<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(dovecot))(?:\[%{POSINT:pid}\])?: imap-login: Login: user=<%{DATA:user}>, method=%{DATA:method}, rip=%{IPORHOST:rip}, lip=%{IPORHOST:lip}, mpid=%{INT:mpid}(, TLS)?, session=<%{DATA:session}>",<br /><span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(dovecot))(?:\[%{POSINT:pid}\])?: imap\(%{DATA:user}\): (?<status>Disconnected: Logged out|Disconnected for inactivity) in=%{INT:in} out=%{INT:out}",<br /> <span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(opendkim))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): (?<milterMsg>.*)",<br /> <span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(milter-greylist))(?:\[%{POSINT:pid}\])?: (?<qid>\S+): (?<milterMsg>.*)",<br /> <span class="Apple-tab-span" style="white-space: pre;"> </span>"%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} (?<program>(MailScanner))(?:\[%{POSINT:pid}\])?: (?<mailScannerMsg>.*)"<br /> ]<br /> }<br />}<br />output {<br /> stdout { debug => true debug_format => "json"}<br /> elasticsearch { host => "127.0.0.1" }<br 
/>}</span></blockquote>
<div>
Along with sendmail message parsing, I added matches for a few dovecot imap server, opendkim milter, and greylist milter messages. Note that I set <span style="font-family: 'Courier New', Courier, monospace;">type => "mail"</span> for the input and the filter sections; as a result, ElasticSearch has the type "mail" set on the data received from this input and filter. Also, Logstash sets the index name to "logstash-YYYY.MM.DD" (where YYYY is four-digit year, MM is month, and DD is day of month) for ElasticSearch -- this can be useful to know when it comes time to query and visualize the data.</div>
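For readers unfamiliar with grok, its tagged patterns boil down to regular expressions with named captures. A minimal Python sketch of the same idea, using a simplified pattern and an invented sendmail log line (not the full grok expressions above):

```python
import re

# A grok-style tagged pattern reduced to a Python regex with named groups.
# Both the pattern and the sample line are simplified illustrations.
pattern = re.compile(
    r'(?P<timestamp>\w{3} +\d+ [\d:]+) (?P<logsource>\S+) '
    r'(?P<program>sendmail|sm-mta)\[(?P<pid>\d+)\]: '
    r'(?P<qid>\S+): from=(?P<sender>[^,]+), size=(?P<size>\d+)'
)

line = 'May 15 10:49:53 mail sm-mta[1234]: r4FHnro1012345: from=<alice@example.com>, size=2048'
m = pattern.match(line)
if m:
    # Named groups give each field semantics, much as grok's %{...} tags do
    print(m.group('qid'), m.group('sender'), m.group('size'))
```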
<br />
With this configuration, I've been able to parse my mail logs using:<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: Courier New, Courier, monospace;">java -jar logstash-1.1.11-flatjar.jar agent -f logstash-maillog-elasticsearch.conf < maillog.0</span></blockquote>
(Note that the ElasticSearch server was running in the background, and receiving requests from Logstash at 127.0.0.1:9200)<br />
<br />
After this, ElasticSearch was full of tagged data. Now, I'd like to see what I have in there. I tried HTTP access via Logstash's port 9292 but was a little underwhelmed by the spartan interface.<br />
<br />
I installed <a href="http://kibana.org/intro.html" target="_blank">Kibana</a> using the simple instructions and started it up. With my browser pointed at its TCP port 5601, I adjusted its time selector at the top left of the page and had immediate access to all the data.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMjbl7W01onq4GfvUBRuk8v3yl1PLXHoAHH7mKjnjd7iuD5-Sh5foEZz1YlkgSQlmEOFGsbrVjgnTSb6GOWw2IPQ8F1ZcTs8ziajIdXopSGsbzX5_NuY6NRo8ivXIRgN2IsRmKLAKQ-LXT/s1600/Screen+Shot+2013-05-15+at+12.40.38+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="72" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMjbl7W01onq4GfvUBRuk8v3yl1PLXHoAHH7mKjnjd7iuD5-Sh5foEZz1YlkgSQlmEOFGsbrVjgnTSb6GOWw2IPQ8F1ZcTs8ziajIdXopSGsbzX5_NuY6NRo8ivXIRgN2IsRmKLAKQ-LXT/s320/Screen+Shot+2013-05-15+at+12.40.38+PM.png" width="320" /></a></div>
<br />
Now I can click down into interesting stuff. Importantly, it is <i>fast</i>! It looks like I may need to tweak the regexes in my Logstash filters, but now I can quickly research any issues and spot trends that bear investigation.<br />
<br />
A concern I have is the security of these tools. There is no authentication or authorization for access to the numerous TCP ports opened by each of these pieces. I'm not sure whether there is a way to secure these tools, or whether they need to be run in an isolated environment. So far, I'm isolating them in a private VM.<br />
<br />
Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-60931740655921819902013-04-24T13:47:00.004-07:002013-04-24T13:47:58.668-07:00Security by Labels vs. ContentGenerally, authorization security (determining whether a subject has access to data) is based on labels. For example, <b>file pathnames</b> determine what directory a file resides under, and accordingly, what discretionary access controls are assigned to the file. Firewalls determine what packets are authorized based on <b>IP addresses and port numbers</b> from packet headers. Document management systems often require users to apply <b>tags</b> to newly-scanned documents so the documents can be protected and routed appropriately.<br />
<br />
These labels we assign to data (filenames, port numbers, tags, etc.) need to be representative of the information contents. We often depend on users to use appropriate and correct labels so we can implement hard and fast controls on the data.<br />
<br />
Unfortunately, labels are often indeterminate or not representative of the content. For example, an HTTPS stream to a service like GotoMyPC actually provides remote access to a PC's screen, and hence complete access to any data and applications on that PC, yet the contents of that HTTPS stream can't be controlled short of blocking all access to the GotoMyPC web site.<br />
<br />
Content-aware data loss prevention systems use a variety of approaches to authorize data (in use, at rest, or in motion) based on the actual content of the data. For those who understand and accept this approach, it provides a deeper understanding of information and enables more intelligent authorization decisions. DLP also provides a backstop when other access controls fail, such as when users forget to correctly tag a document.<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-7931697072386464422012-04-18T07:18:00.003-07:002012-04-18T07:18:58.218-07:00Perfect Security?Many years ago, I was privileged to hear Marcus Ranum speak at a conference for our regional NSFNet member network. At the time, I was of the mindset that it was possible to have perfect security for the computer systems and networks I managed, and I was not willing to compromise security for any purpose.<div>
<br /></div>
<div>
For example, when my employer at the time wanted to build a way to accept credit cards via the web, I proposed an isolated database server behind multiple firewalls -- mind you, this was long before PCI-DSS! Instead of taking the perfect solution, they probably just accepted credit card numbers via email...</div>
<div>
<br /></div>
<div>
Anyway, I understood Marcus to say that business needs had priority, and in particular, sometimes the business (and its software and systems) has to be built in advance of the security. This did not mean that we needed to ignore or discard security, but to be cognizant of the business needs -- if there's no business, there's no need for security.</div>
<div>
<br /></div>
<div>
So, we need to manage risks and prepare to respond to problems rather than wait to enable business operations until known risks are eliminated.</div>
<div>
<br /></div>Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-67544998977839463042012-03-23T07:28:00.002-07:002012-03-23T07:28:23.256-07:00Verizon Data Breach Report 2012The <a href="http://www.verizonbusiness.com/resources/reports/rp_data-breach-investigations-report-2012_en_xg.pdf">Verizon Data Breach Report 2012 (pdf)</a> has been released. The information security industry owes Verizon gratitude for the amount of data Verizon has been able to assemble and analyze, and for making the results publicly available.<br />
<br />
Unsurprisingly, the total number of records breached in 2011 was quite large. The majority of the breaches were motivated by "hacktivism" rather than illicit financial gains, but Verizon points out that serious criminals are still actively stealing data.<br />
<br />
Regardless of the motivations by attackers, 2011 was a terrible year for the number of breaches and the amount of data lost.<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-51332936992040366602012-03-14T10:32:00.000-07:002012-03-14T10:32:05.423-07:00RSA Conference 2012 Post-mortemThis year, my schedule at the RSA Conference 2012 was much different than previous conferences. As a speaker, I spent quite a bit of time preparing and rehearsing my presentation, as well as talking with other presenters. Of course, audiences get a lot out of the presentations and meeting the presenters afterwards, but it's a step up to be able to meet and talk with presenters informally about the industry, security issues and solutions for customers, and the direction of technologies.<br />
<br />
Looking back at the past year and the significant number of huge data loss events, I thought I saw that people were looking to step up their game against breaches. I liked what I heard from industry leaders - concepts with the potential to improve data security: 1) better communication and interaction between software development and operations, such as Josh Corman and Gene Kim's <a href="http://www.crn.com/news/applications-os/232601783/rsa-researchers-tout-benefits-of-security-in-devops.htm">Rugged DevOps</a> talk, 2) improving security functionality for the cloud - Chris Hoff and Rich Mogul's Grilling Cloudicorns talk, and 3) improving mobile device security.<br />
<br />
I'm looking forward to digging into these ideas further in the coming year.<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-24535821111139489892012-02-02T09:43:00.000-08:002012-02-02T09:43:23.206-08:00RSA Conference 2012 - Data Breaches and Web Servers: The Giant Sucking SoundI'm scheduled to present "Data Breaches and Web Servers: The Giant Sucking Sound" at <a href="http://www.rsaconference.com/events/2012/usa/index.htm">RSA Conference 2012</a> - session DAS-204 on Wednesday, February 29.<br />
From the abstract:<br />
<blockquote class="tr_bq">
<span style="background-color: #f7f7f7; color: #222222; font-family: Arial, 'MS Sans Serif', Helvetica; font-size: 12px; line-height: 14px; text-align: left;">An analysis of recent data breach events shows a large number of events occur via web servers. Barracuda, Epsilon, Citigroup, eHarmony, Sony and the State of Texas are just a few of the names in the news as a result of web data exposures. Web servers in the cloud only complicate the situation. This presentation will examine technologies and practices you can apply to help keep your name off this list.</span></blockquote>
Since I submitted the abstract several months ago, there have been several additional major breaches of web servers including <a href="http://www.databreaches.net/?p=22426">Stratfor</a>, <a href="http://www.databreaches.net/?p=22881">Zappos</a> and <a href="http://www.databreaches.net/?p=22553">Care2</a>, so the giant sucking continues.<br />
<br />
Hope to meet you at the RSA Conference!<br />
<br />
Guy<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-87677589628046168902012-01-10T08:18:00.000-08:002012-01-10T08:18:08.279-08:00Cloud Security - Hardware Support for Isolation?Recently, Joanna Rutkowska wrote that IaaS cloud services are <a href="http://theinvisiblethings.blogspot.com/2011/12/trusted-execution-in-untrusted-cloud.html">insecure without hardware support</a> for separation between tenants, citing <a href="http://www.schneier.com/blog/archives/2011/12/security_proble_2.html">Bruce Schneier's recent article on cloud insecurity</a>.<br />
<br />
However, successful IaaS cloud services like Amazon Web Services are enabled by commodity hardware (using the Intel x86 instruction set), free operating systems (Linux), and the ubiquitous TCP/IP and SSL protocols. Today's IaaS providers already have massive investments in commodity hardware, and hardware support for tenant isolation would seem to be a ways off, both in hardware development and adoption by providers.<br />
<br />
I am more concerned about SaaS services and isolation of clients. How can I be sure my Office365, SalesForce, or other SaaS service can successfully and permanently isolate and protect my company's data? I don't see how hardware support for isolation can be extended into the realm of SaaS services unless it is in terms of per-customer encryption of data.<br />
<br />
Your thoughts?<br />
<br />
<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com1tag:blogger.com,1999:blog-7873415733784537084.post-14394861959248460492011-12-21T11:32:00.000-08:002011-12-21T11:32:03.484-08:00Web Server Security Checklist<br />
Here is a quick checklist of items I have found to be important in securing and monitoring the security of outward-facing web servers.<br />
<br />
<span style="font-size: large;">Architecture</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.windowsecurity.com/img/upl/Rys21071062661568.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="155" src="http://www.windowsecurity.com/img/upl/Rys21071062661568.gif" width="400" /></a></div>
Typical components surrounding a web server include an external firewall to protect the web server itself from attacks and an internal firewall to protect the internal corporate systems in case the web server is breached.<br />
<br />
<span style="font-size: large;">Hardening</span><br />
<br />
Operating systems and web server software packages often come with additional components that may not be necessary. Rather than leaving unused but potentially vulnerable software available on a web server, it is wise to disable and/or remove any unused software.<br />
<br />
Ensure directories and files have appropriate access permissions - does the web server process really need read access to the entire filesystem?<br />
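As a quick sanity check on permissions, a short script can walk the document root looking for files that any user on the system could modify. A sketch in Python (the <span style="font-size: large;"></span>/var/www path is illustrative; point it at your actual web root):

```python
import os
import stat

def world_writable(root):
    """Yield files under root that any user can write to."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                continue  # vanished or unreadable; skip it
            if mode & stat.S_IWOTH:
                yield path

# Illustrative path; use the real document root on your server
for path in world_writable('/var/www'):
    print(path)
```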
<br />
Remove default system accounts. Ensure accounts with access to the server have appropriate passwords.<br />
<br />
If you have a host-based firewall on the web server, limit access to administrative functions (SSH or remote terminal services). Limit outbound network connections from the server to only necessary sites and/or protocols.<br />
<br />
<span style="font-size: large;">Patching & Updating</span><br />
<br />
It is amazing how many web servers I have found that are running operating systems, web server software, or web applications that are long outdated and likely to have substantial vulnerabilities. It's important to stay abreast of known vulnerabilities and vendor patches, and have a working plan to evaluate, apply, and test patches for all the software on the web server as well as the other servers and network devices associated with the web server.<br />
<br />
I subscribe to the <a href="http://www.sans.org/newsletters/risk/">SANS @Risk Consensus Security Alert</a> mail list to stay informed of vulnerabilities and patches in major operating systems and applications.<br />
<br />
<span style="font-size: large;">Web application firewall</span><br />
<br />
Even the best-run and maintained systems can have latent vulnerabilities hiding in the software and/or configuration. <a href="https://www.owasp.org/index.php/Web_Application_Firewall">Web application firewalls</a> can help protect against attacks such as SQL injection which are otherwise all too commonly successful.<br />
<br />
I have made use of the <a href="http://www.modsecurity.org/">mod_security</a> Apache plug-in module and rules to protect web servers. Many commercial web application firewall devices are available, and even cloud-based web application firewall services are available.<br />
<br />
<span style="font-size: large;">Penetration testing</span><br />
<br />
It's not a bad idea to have a third-party check your web site using penetration testing techniques to check for potential network, operating system, web server, and web application vulnerabilities and mis-configurations.<br />
<br />
I have not used it, but I understand the <a href="http://www.backtrack-linux.org/">BackTrack</a> bootable Linux CD provides a nice collection of tools to perform penetration testing. Otherwise, there seem to be quite a few consultants willing to perform penetration testing.<br />
<br />
<span style="font-size: large;">Log monitoring</span><br />
<br />
Of course, a busy web site can generate a large amount of log data every day. Tools like <a href="http://awstats.sourceforge.net/">awstats</a> can be useful to build an understanding of typical usage loads, top pages, and user demographics.<br />
<br />
I have also found looking at failed requests (4xx responses) to be interesting because one can see what approaches attackers are using against web sites, and it can help make sure that defenses are working properly.<br />
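Pulling the 4xx counts out of an access log takes only a few lines of scripting. A Python sketch over invented common-log-format lines (a real run would read the log file instead of this sample list):

```python
import re
from collections import Counter

# Invented access-log lines in common log format
lines = [
    '10.0.0.1 - - [21/Dec/2011:11:32:00 -0800] "GET / HTTP/1.1" 200 1234',
    '10.0.0.2 - - [21/Dec/2011:11:32:01 -0800] "GET /admin.php HTTP/1.1" 404 512',
    '10.0.0.2 - - [21/Dec/2011:11:32:02 -0800] "GET /login HTTP/1.1" 401 0',
]

# The status code is the three digits following the quoted request string
status_re = re.compile(r'" (\d{3}) ')
failures = Counter(
    m.group(1)
    for m in (status_re.search(line) for line in lines)
    if m and m.group(1).startswith('4')
)
print(failures.most_common())
```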
<br />
<span style="font-size: large;">Auditing</span><br />
<br />
If a system seems to be running OK, why bother looking for trouble?<br />
<br />
<ul>
<li>What if you have outdated administrative accounts, some of which probably have poor passwords?</li>
<li>What if a piece of software was installed at some point that unexpectedly opened access permissions in the filesystem?</li>
<li>What if, during a hasty period of diagnosing and resolving a significant issue, permissions were changed in the filesystem or in the web server configuration and never were restored?</li>
<li>Or, what if an attacker has gained access to the server and is siphoning data into a hidden directory for later download?</li>
</ul>
<br />
Make time to audit your web server regularly and look for unexpected changes in files, permissions, or access, check logs, and verify installed software and patches.<br />
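One low-tech way to spot unexpected file changes between audits is to record a digest of every file and compare snapshots later. A Python sketch of the idea (the scope and storage of snapshots is up to you; this is not a substitute for a dedicated file-integrity tool like Tripwire):

```python
import hashlib
import os

def snapshot(root):
    """Map each file under root to its SHA-256 digest for later comparison."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                digests[path] = hashlib.sha256(f.read()).hexdigest()
    return digests

# At audit time, diff a fresh snapshot against the saved one, e.g.:
# changed = {p for p in new if old.get(p) != new[p]}
```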
<br />
<span style="font-size: large;">Data loss monitoring and prevention</span><br />
<div>
<br /></div>
<div>
Data loss monitoring and prevention systems should have a place in high-stakes web services. These systems can monitor the type and quantity of data that is coming out of a web server or the database, and raise alerts or block results that violate rules. These systems can be put in place either in front of the web server or the database server to monitor requests and responses.</div>
<div>
<br /></div>
Guy Helmer<br />
<br />
<span style="font-size: large;">Data Loss Prevention: Technology or Strategy?</span> (2011-11-14)<br />
<br />
As often happens in the computer industry, nomenclature is unwieldy and flexible as technologists, sales & marketing, and the rest of the world clash.<br />
<br />
My case in point is the phrase "data loss prevention" or DLP. In other articles, I have talked about DLP as a technology -- in that it is used to analyze the content of a document or message, determine whether the content references a concept confidential or protected in nature, and uses rules or reporting to handle the content. As the concept of DLP was developed in the last decade, the industry struggled to find an appropriate phrase that defined it: phrases including content monitoring & filtering, content analysis, deep packet inspection, and others were used, but the industry and analysts settled on data loss prevention.<br />
<br />
Many companies are marketing "data loss prevention" in relation to their technologies, but not in the context of analysis of document content. Instead, their approaches include building a wall around all corporate data (such as on a mobile device, or in a cloud-based document-sharing service), or providing some regular expression matching for message content. This is well and good, but I would suggest these technologies fall under the larger strategy of information protection rather than being specifically about "data loss prevention".<br />
<br />
This goes to the heart of the matter: when we build true data loss prevention systems, the intent is to protect confidential information rather than just bits and bytes of raw data. Under the fundamentals of information theory, data is just bits and bytes, but information is found where there is entropy, or value, in the data. This is what distinguishes data loss prevention technology from other data protection technologies, and perhaps the better phrase for the technology would be "information loss protection."<br />
<br />
Practically, though, we are probably stuck with the labels that have been adopted. So, I suppose we can accept a variety of technologies under the strategy of data loss prevention, including the technology of data loss prevention itself. Unfortunately, this will continue to be confusing to those inside and outside of the industry and troublesome for sales and marketing.<br />
<br />Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-60894967718436526832011-08-12T09:25:00.000-07:002011-08-12T09:25:16.586-07:00Five Stages of Cloud AcceptanceDenial: We'll never put anything in the cloud because of security/reliability/performance/etc.<br />
<br />
Anger: You already put WHAT in the cloud? How are we going to do backups/switch providers/manage identity/etc.???<br />
<br />
Bargaining: OK, we'll move X into the cloud if/when the cloud becomes secure/reliable/etc.<br />
<br />
Depression: The CFO/CEO/etc. wants us to start using cloud to save money/reduce costs/expand functionality. We can't use the cloud. What about my job running the data center? What about our bandwidth? What about PCI DSS/HIPAA/GLBA?<br />
<br />
Acceptance: It works, I can do more to enable my company's business, and reduce capital expenditures. Let's put everything into the cloud!<br />
<br />
Seriously, I have had some of these reactions myself. I hear some of these reactions from people when we talk about using cloud services and realize there truly is a road to acceptance for many people.<br />
<br />
Denial came first, of course, before the anger and the rest -- and for many organizations the stages play out in exactly that order.<br />
<br />
Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-50839863026213773972011-08-12T07:55:00.000-07:002011-08-12T07:55:24.394-07:00Changing Face of "Spam" EmailAs a network engineer involved in bringing up some of the first Internet connections in the upper midwest in the late 1980s and early 1990s, I also managed email systems in the 1990s as spam email started becoming a nuisance. In the past decade, spam has been more than a nuisance - email systems must have effective spam filters to keep email usable for end users.<br />
<br />
There is an interesting trend I see now - I am getting a fair bit of relevant business-related marketing email in my inbox. The amount of "online pharmacy" spam is way down, but I still get a fair amount of complete junk, including a lot of Cyrillic and Mandarin spam that is completely unintelligible to me. Fortunately, my company's spam filter, including up-to-date SpamAssassin rule lists and a good blacklist, are doing a good job discarding and classifying the useless spam, while allowing through the reasonable marketing queries (I think).<br />
<br />
A few years back, the sales team at my employer emailed potential customers asking if they could setup meetings to introduce the company's software - not an unusual email message, especially nowadays. One particular recipient hit the roof and replied with a rant worthy of a response to the <a href="http://en.wikipedia.org/wiki/Laurence_Canter_and_Martha_Siegel">first massive Usenet spam from the green card lawyers</a> back in the day.<br />
<br />
Are people's attitudes changing about spam? Is there an increasing acceptance of reasonable marketing-type contact via email?Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-63443751782359761652011-08-11T12:32:00.000-07:002011-08-11T12:32:12.995-07:00Security Technology MusingsEach security technology that comes along has its set of "use cases" -- that is, it improves confidentiality, integrity, or availability for certain uses. Trying to apply that security technology outside of its useful situations results in either a false sense of security or complete failure.<br />
<br />
For example, full disk encryption is a useful security technology intended to keep the entire contents of a disk drive relatively safe from an attacker who might steal the physical disk drive (or the system in which it is installed, such as a laptop). However, when the computer is in operation, full disk encryption has nothing to do with whether files can be accessed -- that is the function of the access control technology built into the operating system.<br />
<br />
When we began building Data Loss Prevention (DLP) some years ago, my idea was that content analysis (looking at the textual content of a document) was a powerful way to determine whether a document should be shared outside of an organization. However, the documents that would be visible to the DLP system for analysis would depend on a number of factors: logical placement of the DLP functionality in an organization's computing system, whether the DLP system would be able to see documents as plaintext, and how an adversary might try to circumvent the system.<br />
<br />
As we have further developed DLP technology and the industry has settled on standard implementations (data-in-motion, data-at-rest, data-at-use), customers have become comfortable with the functionality and capability of DLP systems. We're finding that DLP is a very useful tool -- helping significantly reduce exposure of confidential information, and improving standing in risk & compliance audits -- for our customers. It's become one part of the security management arsenal.<br />
<br />
Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-64366911703105079672011-08-05T10:58:00.000-07:002011-08-05T11:45:14.964-07:00Are Anti-Virus and a Firewall Enough?I thought after all the commotion from the many significant data breaches of the past several months that data security would be top-of-mind at nearly every company. Perhaps people outside the information security industry have become tired of the breach news, or perhaps the lesson didn't sink in. Maybe more likely is the idea that "we haven't been hit yet, so we don't need more security yet."<br />
<br />
Computer viruses were such a big problem in the late 80's and 90's (and still today) that companies became accustomed to buying anti-virus software.<br />
<br />
The Internet was such a wild and wooly place that companies didn't dare connect their LANs to the 'net without a firewall of some sort to keep the outside world from instantly <a href="http://en.wikipedia.org/wiki/Pwn">pwning</a> everything.<br />
<br />
People in the information security industry know these two main tools, anti-virus and firewalls, have significant limitations. <a href="http://www.businessweek.com/news/2011-08-04/hacker-armageddon-forces-symantec-mcafee-to-seek-fixes.html">Anti-virus tools</a> have limited effectiveness in the era of morphing malware. Firewalls often are configured to allow HTTP/HTTPS (web traffic) and SMTP (email traffic) without any limits, and everyone always has browsers and email clients running. The result is that attackers have a fairly easy time exploiting problems with browsers, email programs, and the users themselves.<br />
<br />
Today, organizations need deeper defenses to handle the problems. <a href="http://en.wikipedia.org/wiki/Intrusion_detection_system">Intrusion Detection Systems (IDS/IPS)</a>, <a href="http://en.wikipedia.org/wiki/Data_loss_prevention_software">Data Loss Prevention (DLP)</a>, <a href="http://www.patchmanagement.org/pmessentials.asp">patch management</a>, web filter, and <a href="http://en.wikipedia.org/wiki/Security_information_and_event_management">Security Information & Event Management (SIEM)</a> are the important systems to have in place in addition to firewalls and anti-virus.<br />
<br />
Web servers need to have a <a href="https://www.owasp.org/index.php/Web_Application_Firewall">Web Application Firewall (WAF)</a> in front of them to protect against attacks on the applications running on the web servers. If you have a good hosting provider for your web server, you may already have a WAF protecting your web server.<br />
<br />
If you don't have these systems in place, you can prioritize based on an <a href="http://en.wikipedia.org/wiki/Information_security#Risk_management">analysis of your organization's risks</a>.Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-17568306042666649962011-07-21T14:02:00.000-07:002011-07-25T08:47:26.644-07:00Web Servers as an Attack VectorFor a long time in computer security, we have been focused on protecting workstations, and rightly so. Viruses, worms, <a href="http://antivirus.about.com/od/whatisavirus/g/rat.htm">remote access Trojans</a>, and other malware has targeted the end-user workstation, and unfortunately, the attacks continue to be quite successful. A number of recent high-profile data leaks have occurred using workstations as the initial point of attack.<br />
<br />
However, a point of attack in several other high-profile data leaks have involved attacks on web servers. <a href="http://www.infoworld.com/d/security/citigroup-breach-exposed-data-210000-customers-664">Citigroup</a>, <a href="http://infoloss.blogspot.com/2011/04/barracuda-data-breach.html">Barracuda</a>, and now <a href="http://www.informationweek.com/news/security/attacks/231002268?cid=RSSfeed_IWK_Government">Pacific Northwest National Laboratory</a> (PNNL) were attacked through web servers. This makes me a bit nervous -- I do like to make sure a public-facing web server is hardened and running software that is fully-patched, but there are several techniques attackers can use to find and take advantage of any holes in the server.<br />
<br />
One of the problems that I saw disclosed today, <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2688">CVE 2011-2688</a>, involves a <a href="https://www.owasp.org/index.php/SQL_Injection">SQL injection</a> attack against the mod_authnz_external module, an Apache authentication module. It is worrisome that a well-known attack is successful on this security-critical component that may be in use on many web servers. Many other attacks, including parameter tampering,<br />
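The standard defense against SQL injection is to keep user input out of the query text entirely by using parameterized statements. A minimal sketch using Python's sqlite3 module (the table, columns, and credentials are invented for illustration):<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def check_login(name, password):
    # Placeholders make the input plain data -- it can never become SQL syntax.
    row = conn.execute(
        "SELECT 1 FROM users WHERE name = ? AND password = ?",
        (name, password),
    ).fetchone()
    return row is not None

# A classic injection string fails here; with naive string concatenation
# ("... password = '" + password + "'") it would bypass the check.
print(check_login("alice", "' OR '1'='1"))  # False
print(check_login("alice", "s3cret"))       # True
```

The same placeholder discipline applies whatever the database driver; building SQL by string concatenation is what makes modules like the one above vulnerable.<br />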
<br />
Web servers and the web applications running under them are proving to be all too vulnerable. With high-value data accessible in a web server, such as customer accounts at an online banking website, any exploitable vulnerability in the web server or web application can result in significant loss. As the events at PNNL illustrated, even a web server that may not be high-value can still be an entry point for an attacker into more valuable networks and systems.<br />
<br />
It seems that web servers need backstops. We need to be able to filter and/or monitor requests coming into a web server, and to filter and/or monitor data returned by a web server. And, we need to be able to do this in the cloud with web servers that automatically scale. Something to think about.Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com0tag:blogger.com,1999:blog-7873415733784537084.post-42660402262292050802011-07-06T13:06:00.000-07:002011-07-25T08:48:47.762-07:00Cloud Computing and the Insider ThreatSomething that hasn't been top-of-mind for me, but remains a threat nonetheless, is that the scope of the "insider threat" changes when the cloud is used for computing and storage.<br />
<br />
One of the significant data loss vectors is the "<a href="http://securitywatch.eweek.com/privacy/insider_threat_driving_many_data_loss_events.html">insider threat</a>" where a trusted insider -- either unintentionally or maliciously -- leaks protected information in violation of policy or regulations. In traditional datacenters, the trusted insiders are usually the organization's employees and contractors -- the organization should be able to physically and logically account for every individual that has access to the organization's computers and data. The insider threat is one vector that data loss prevention (DLP) is often deployed to help mitigate.<br />
<br />
The situation changes in cloud computing, though. An organization that makes use of cloud computing services, whether SaaS, PaaS, or IaaS, is now using computers and storage that can be accessed by more individuals than just the organization's employees and contractors -- the cloud provider actually owns the servers, networks, and storage and <a href="https://blog.cloudsecurityalliance.org/2011/04/01/privileged-administrators-and-the-cloud-who-will-watch-the-watchmen/">employs personnel and contractors that have administrative access</a> to those components. Now the "insider threat" has suddenly expanded to include a whole new group of people beyond just the original organization's employees.<br />
<br />
One mitigation technique used to protect data stored in the cloud from any insider is to encrypt the data. Depending on the operating system used, it may be possible to setup volume encryption or folder encryption on which sensitive data can be securely stored. Unfortunately, <a href="http://searchdatabackup.techtarget.com/definition/encryption-key-management">encryption key management</a> is not easy -- it seems the best (or only) solution to this problem in the cloud is using a key management server to authenticate and authorize encryption keys, and then configure and monitor the key management server carefully.<br />
<br />
Another problem with insiders in the cloud is watching for confidential data in motion. DLP would be a solution to this problem in an organization's datacenter, but the situation is more complex in a cloud environment because of a lack of availability of DLP systems in cloud provider networks and the difficulty of separating individual cloud customer's traffic for DLP analysis. This is a problem we're looking into at <a href="http://palisadesystems.com/">Palisade Systems</a>.Guy Helmerhttp://www.blogger.com/profile/16156971004352230112noreply@blogger.com1tag:blogger.com,1999:blog-7873415733784537084.post-5513295055195255792011-06-27T10:30:00.000-07:002011-06-27T10:30:46.124-07:00Fully-Functional Data Loss PreventionSince Data Loss Prevention (DLP) became a known technology in the computer security arena a few years ago, a number of vendors of existing non-DLP security products added basic DLP-like features to enable detection of some common private or confidential information. However, a complete DLP implementation involves more than just regular expressions to match patterns in text in, say, email messages.<br />
<br />
Certainly, email is a significant vector by which data loss occurs. More generally, the DLP industry terms data traversing the network as <b>Data in Motion</b>. However, there are many more protocols than just email, not the least of which include web-based email services, such as Google Mail, and social media services, such as Facebook, that could also be data loss vectors. A complete DLP implementation will likely be able to work with a number of common network protocols to manage Data in Motion.<br />
<br />
DLP also manages data in two other important situations, <b>Data in Use</b> and <b>Data at Rest</b>. Data in Use DLP can manage data used on a workstation, such as monitoring data being copied to a USB flash drive. Data at Rest DLP can inventory and manage the private and confidential data stored on workstation and server hard drives.<br />
<br />
The ways in which most DLP systems are able to discover protected information extend far beyond basic regular expressions. Common approaches include pre-packaged sets of terms, database fingerprints, file fingerprints, special code to match data like credit card numbers, and more. I previously wrote an article on <a href="http://infoloss.blogspot.com/2011/05/classes-of-protected-information-and.html">Classes of Protected Information and DLP</a> that goes into much more detail on this topic.<br />
<br />
In addition to managing protected data in the scenarios of Data in Motion, Use, and Rest, and using multiple approaches to finding protected data, DLP systems also offer sophisticated configuration, reporting, alerting, and case management services. There may be situations where certain groups of users are allowed to work with certain kinds of confidential information while others are not -- a DLP system might be configured to monitor such information use by the privileged users and block use by other users. The depth of reporting and alerting capabilities offered by a DLP system can make a DLP installation more useful by providing everything from summaries to detailed violation records as needed for management and compliance reports. Finally, DLP case management tools can enable rolling up multiple incidents into a consolidated case that can be managed as necessary to resolution.<br />
<br />
In summary, a DLP system is a significant addition to an organization's data security arsenal.<br />
<br />
Guy Helmer<br />
<br />
<span style="font-size: large;">It's 10:00pm - Do You Know Where Your Data Is?</span> (2011-06-07)<br />
<br />
Data can be stored in so many places and be so vulnerable to loss or exposure. The obvious risk and probability of loss for protected data stored on devices like laptops often motivates security staff to make improvements in this area. Many people have an "a-ha moment" when they see how <a href="http://palisadesystems.com/">Data Loss Prevention (DLP)</a> discovery agents can find and report confidential or protected data stored in unexpected places.<br />
<br />
It's good practice to inventory where and how confidential / protected data is stored, create policy that defines where and how such data <b>should</b> be stored, then move towards the goal defined by the policy and monitor progress. (Helpful side benefits of this process include improving your backup and archive coverage of protected data, reducing duplication of data, and assisting your business continuity planning.)<br />
<br />
The initial inventory of protected data can be overwhelming -- data can be dispersed over all the personal workstations and laptops in the entire company and in the oddest nooks and crannies of servers. But it's good to know where your organization stands with regard to protected data, and what your biggest points of risk might be. If you found confidential financial data being stored on laptops that don't have disk encryption, maybe that's your prime starting point. If you found multiple copies of confidential data stored on a server, maybe it's just a matter of consolidating the data and keeping employees better informed about what location to use on the server for that data.<br />
<br />
When it comes to writing your protected data storage policies, keep flexibility in mind. Mobility is a big factor in employee computing use cases today, so if important data on laptops is common, then maybe a disk encryption solution for laptops is needed rather than disrupting employees' work by forbidding them to keep data on laptops.<br />
<br />
When your protected data storage policy is defined, then it's time to move toward it. Education will be important so employees understand why and how this process is happening. Some time & effort will be required to implement the changes, and perhaps some new software will be required for encryption.<br />
<br />
As progress is made, DLP discovery software can be used to measure and monitor the progress, and to watch for significant deviations from the policy that need to be addressed.<br />
<br />
Guy Helmer<br />
<br />
<span style="font-size: large;">Cloud Computing and Protecting Confidential Information</span> (2011-06-03)<br />
<br />
A couple of months ago, I talked about the <a href="http://infoloss.blogspot.com/2011/03/cloud-computing-and-data-loss.html">implementation of DLP in cloud computing</a> environments. Since then, I have seen a few examples of how security-oriented firms, such as <a href="http://www.tripwire.com/it-industry/cloud-and-managed-service-providers/">Tripwire</a>, <a href="http://www.enstratus.com/page/1/public-cloud-management.jsp">enStratus</a>, and others, are working with cloud vendors to provide internal compliance and validation.<br />
<br />
Meanwhile, we have seen several large-scale data breaches, including <a href="http://www.washingtonpost.com/blogs/faster-forward/post/lulzsec-releases-sony-data/2011/06/02/AGGcLWHH_blog.html">numerous attacks on Sony</a>, that involve attacks through web servers.<br />
<br />
A significant use case for cloud computing is to provide scalable web services, so we have an interesting and significant security intersection between deployments of web servers (often with vulnerabilities) in the cloud, and the need for <a href="http://projects.webappsec.org/w/page/13246985/Web-Application-Firewall-Evaluation-Criteria">web application firewall</a> (WAF), <a href="http://palisadesystems.com/">data loss prevention</a> (DLP), and <a href="http://en.wikipedia.org/wiki/Intrusion_detection_system">intrusion detection/prevention</a> (IDS/IPS) to protect the web servers and the information to which they provide access.<br />
<br />
There are some difficult problems with protecting outward-facing cloud-based web servers, though. It might not be feasible to scale WAF, DLP, and IDS/IPS systems alongside the web servers. It may be challenging to be able to monitor and/or intercept web traffic -- especially SSL web traffic -- to protect against attacks and data loss.<br />
<br />
A solution to this problem might be to incorporate WAF, DLP, and IDS/IPS technology into the web servers themselves, so that as the web servers are scaled, the protection automatically scales with them.<br />
<br />
Guy Helmer<br />
<br />
<span style="font-size: large;">Insidious Insiders: Bank of America</span> (2011-05-25)<br />
<br />
When I talk or write about inappropriate confidential information disclosure, I often point out that data loss prevention (DLP) systems most commonly help reduce the everyday mistakes by well-intentioned employees just trying to do their jobs. A DLP system also helps discover a malicious insider gathering or passing confidential information to outsiders. Regardless of intent, a good DLP system can help administrators notice a trend of confidential leaks and help build a case file for action with regard to a problematic insider.<br />
<br />
A story I saw today describes a <a href="http://www.bankinfosecurity.com/articles.php?art_id=3673&rf=2011-05-25-eb">problem at Bank of America</a>, under investigation for a while, in which an apparently malicious employee who had access to "personally identifiable information such as names, addresses, Social Security numbers, phone numbers, bank account numbers, driver's license numbers, birth dates, e-mail addresses, family names, PINs and account balances" allegedly passed this information to criminals. The estimated direct financial loss is $10 million. Indirect losses, including employee time spent investigating the problem, the cost of credit report monitoring for affected customers, revisiting policies and controls, and a diminished brand, may be significant as well.<br />
<br />
A DLP system is one of the best practices that a business can put into place to help track and prevent data breach events. If you have a DLP system in place, make sure it is correctly configured, installed in the correct locations in your network, servers, and clients, and make sure it is monitored. (It is highly likely that Bank of America has a DLP system in place, but I do not have any knowledge in regards to whether information from a DLP system helped with the investigation of this case.)<br />
<br />
<br />
Other best practices for protection of information include:<br />
<ul><li>Limiting the amount and scope of information available to employees to that necessary to do their jobs. Often, employees are given increasing access to information over their tenure, and it's a good idea to review access to make sure potential for problems is limited.</li>
<li>Logging information access and reviewing the logs for unusual patterns. A <a href="http://en.wikipedia.org/wiki/Security_event_manager">Security Event Manager (SEM, also known as SIEM)</a> can help with this by making it possible to centrally manage and review information from servers.</li>
<li>Limiting network access for workstations and servers. Servers should generally not be using protocols like Internet Relay Chat or accessing random web sites. A network protocol manager or firewall can be configured to prevent unexpected network use, and unexpected use of web sites or network protocols from servers might be indicative of an intrusion that should be investigated.</li>
</ul>
With good practices and vigilance, you can reduce the risk posed by malicious intent.<br />
<br />
Guy Helmer<br />
<br />
<span style="font-size: large;">Classes of Protected Information and DLP</span> (2011-05-20)<br />
<br />
Data Loss Prevention (DLP) systems have to deal with a variety of formats of data and identify protected data in those formats. In general, protected information falls into these formats:<br />
<ul><li>Unstructured text - as found in text documents - including various types of information:</li>
<ul><li>Corporate proprietary information or trade secrets</li>
<li>Personal health records</li>
<li>Personal financial records</li>
<li>Personal identifying information</li>
</ul><li>Structured data - as found in spreadsheets, tables, database output, and CSV files</li>
</ul>To deal with these different formats of protected information, a variety of approaches are used in a DLP system.<br />
<br />
For corporate proprietary information, document fingerprinting is the predominant approach to identifying parts or complete copies of proprietary documents. This requires the administrator to register proprietary documents with the DLP system, and then the DLP system can match fragments or wholesale copies of the proprietary documents.<br />
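Commercial fingerprinting implementations differ, but the core idea can be sketched with hashed word-level shingles: register a document's shingle hashes, then measure how many of a candidate document's shingles match. (The shingle size k=8 here is an arbitrary choice for illustration.)<br />

```python
import hashlib

def fingerprints(text, k=8):
    """Hash every k-word window ("shingle") of a document's text.
    A document shorter than k words is hashed as a single shingle."""
    words = text.lower().split()
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(len(words) - k + 1, 1))
    }

def overlap(registered, candidate, k=8):
    """Fraction of the candidate's shingles matching the registered document."""
    reg, cand = fingerprints(registered, k), fingerprints(candidate, k)
    return len(reg & cand) / len(cand) if cand else 0.0
```

Because every window is hashed, a fragment copied verbatim from the registered document still scores high even when it is pasted into otherwise new text.<br />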
<br />
Another approach that can be used for proprietary documents is to embed tags in the documents, such as "Company Confidential", and then add a simple rule to the DLP system to watch for that tag. However, this depends on corporate users applying the correct tags to the documents, and is easy for a malicious insider to circumvent, for example, by simply removing the tag before transmitting the document to an unauthorized recipient.<br />
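A tag-based rule is about as simple as DLP checks get -- a sketch (the tag strings are invented examples), including the weakness just described:<br />

```python
def tagged_confidential(text, tags=("company confidential", "internal use only")):
    """Flag a document carrying a confidentiality tag. Trivially defeated by
    deleting the tag before sending, which is why tagging can only supplement
    stronger methods like fingerprinting."""
    lower = text.lower()
    return any(tag in lower for tag in tags)
```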
<br />
For data like personal health information (PHI) or personal financial information (PFI), several approaches (or a combination of approaches) are typically used. A combination of search terms can determine whether data refers to a particular individual or group of individuals, and whether it contains significant information about those individuals. For example, an email message from a bank containing a customer's account number, name, and account balance might be considered information protected under the <a href="http://en.wikipedia.org/wiki/Gramm%E2%80%93Leach%E2%80%93Bliley_Act">Gramm-Leach-Bliley Act (GLBA)</a>.<br />
<br />
Another approach to PHI and PFI is to use information from a corporate database, such as account numbers and customer names, in the DLP system to search for matches. If an account number and associated customer name turns up in an email message, the message might be considered to contain information protected under GLBA.<br />
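A sketch of this database-driven matching, requiring both fields of the same record to appear before flagging a message (the record data is invented for the example):<br />

```python
def db_record_matches(text, records):
    """records: iterable of (account_number, customer_name) pairs pulled from
    the corporate database. A hit requires both fields of one record to appear,
    which keeps a bare name or a bare number from triggering an alert alone."""
    lower = text.lower()
    return [
        (acct, name)
        for acct, name in records
        if acct in text and name.lower() in lower
    ]

records = [("4417-0000-1234", "Pat Example")]  # invented sample data
msg = "Pat Example's balance on account 4417-0000-1234 is $1,203.45"
print(db_record_matches(msg, records))  # -> [('4417-0000-1234', 'Pat Example')]
```

In a real deployment the record list would be a hashed extract of the database so the DLP system never holds the raw customer data itself.<br />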
<br />
A third approach, specific to personal financial information, is to look for credit card information. Credit card numbers use a <a href="http://www.merriampark.com/anatomycc.htm">standard format and are assigned in specific ways</a>, so it is possible to look at a sixteen-digit number and determine with a high degree of accuracy whether that number is probably a VISA or MasterCard credit card number.<br />
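Credit card numbers carry a Luhn check digit, and the major brands use known prefixes (VISA numbers begin with 4, MasterCard with 51-55), so a sixteen-digit candidate can be screened like this sketch:<br />

```python
def luhn_valid(number: str) -> bool:
    """Luhn check-digit test used by credit card numbers (16 digits assumed
    here to match the VISA/MasterCard case in the text)."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) == 16 and total % 10 == 0

def probable_card_brand(number: str):
    digits = "".join(d for d in number if d.isdigit())
    if not luhn_valid(digits):
        return None
    if digits.startswith("4"):
        return "VISA"
    if digits[:2] in {"51", "52", "53", "54", "55"}:
        return "MasterCard"
    return None

print(probable_card_brand("4111 1111 1111 1111"))  # -> VISA (well-known test number)
```

The check digit is what lets a DLP system say "probably a card number" rather than flagging every sixteen-digit string it sees.<br />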
<br />
For personal identifying information, an approach is to look for national identification numbers, state driver's license numbers, or account numbers. In the United States, the Social Security Number (SSN) is often used (and abused) for purposes of identification and authentication for financial and health purposes, and as such has gained status as a protected piece of information. Unfortunately, the format of the SSN was developed without the concept of check digits or embedded validators, so it is easy for a DLP system to mistake a number in the form 123-45-6789 for an SSN.<br />
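Since the SSN format has no check digit, about the best a matcher can do is pair the 123-45-6789 pattern with the ranges the Social Security Administration never assigns (000, 666, and 900-999 areas, 00 group, 0000 serial). A sketch:<br />

```python
import re

SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def possible_ssns(text):
    """Find candidate SSNs. With no check digit available, we can only exclude
    never-assigned ranges; anything that remains is merely a plausible match."""
    hits = []
    for area, group, serial in SSN_PATTERN.findall(text):
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        hits.append(f"{area}-{group}-{serial}")
    return hits
```

The residual false-positive rate is exactly why SSN rules are usually combined with nearby context terms ("SSN", "Social Security") in production DLP policies.<br />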
<br />
As for structured data, DLP systems can identify protected contents in a couple of ways. One is to write rules for the DLP system that match the format of data typically used in a company, such as forms that are often used for things like customer orders. Another approach is to use information from a corporate database, such as account numbers and customer names, in the DLP system to search for matches.<br />
<br />
These approaches cover the majority of the ways I have seen protected information stored and transmitted, and the ways in which DLP systems can help identify and protect the data.<br />
<br />
Guy Helmer<br />
<br />
<span style="font-size: large;">Bouncing Through the Cloud</span> (2011-05-17)<br />
<br />
A <a href="http://www.bloomberg.com/news/2011-05-13/sony-network-said-to-have-been-invaded-by-hackers-using-amazon-com-server.html">Bloomberg report</a> over the weekend referenced an unnamed source as saying that <a href="http://aws.amazon.com/">Amazon cloud resources</a> were used in the <a href="http://arstechnica.com/gaming/news/2011/04/sonys-black-eye-is-a-pr-problem-not-a-legal-one.ars">breach of the Sony Playstation Network</a>. Specifically, Amazon's cloud infrastructure was not compromised, but instead was used as a "relay" for the attacker to hide his/her origin.<br />
<br />
An <a href="http://www.reuters.com/article/2011/05/06/us-sony-cloud-idUSTRE7455C020110506">article on Reuters</a> makes an (IMO) unsubstantiated claim that the attack on Sony spells doom for cloud computing. My response is that, whether or not cloud computing had anything to do with this, Sony simply had vulnerable software and apparently had insufficient controls and management in place to detect and respond to security issues. Poor security and controls are mostly unrelated to cloud technologies -- yes, there is a <a href="http://blogs.gartner.com/neil_macdonald/2009/02/17/hypervisor-attacks-and-hurricanes-are-inevitable-but-breaches-dont-have-to-be/">possibility of attacks on the hypervisor</a> in shared infrastructure, among other things -- but none of the recent significant breaches has involved vulnerabilities in cloud computing.<br />
<br />
What I see as a more significant exposure in cloud computing is the extent to which confidential data is being stored in the <a href="http://searchcloudcomputing.techtarget.com/definition/public-cloud">public</a> or <a href="http://searchcloudcomputing.techtarget.com/definition/hybrid-cloud">hybrid cloud</a> and served to end users over the Internet without sufficient monitoring and controls in place. The glaring security deficiency in cloud computing right now is the absence of the visibility and security functionality we take for granted in private data centers, including network traffic analysis, intrusion detection systems (IDS), data loss prevention (DLP) systems, and audit and logging systems.<br />
<br />
We're working at <a href="http://palisadesystems.com/">Palisade Systems</a> to improve the security controls available in cloud computing. Palisade offers virtual DLP appliances for VMware cloud environments, with more cloud security products on the way.<br />
<br />
<b>Virtualization and Data Loss</b><br />
<br />
Well, it had to happen to me eventually. A physical server running VMware ESXi crashed, and I lost the set of virtual servers I had moved to it.<br />
<br />
It seemed to result from a power hiccup. Nearly everything important in the server room is on a UPS, except for this system.<br />
<br />
This failure mode was new to me: VMware ESXi would not finish booting; it complained about an invalid file (sorry, the exact filename escapes me) and stopped. (It looked an awful lot like a Windows boot failure I've seen in the past, where a corrupted registry hive file prevented Windows from booting!) I had to perform a VMware ESXi recovery installation, which produced the ominous warning that one of my filesystems had an invalid partition table.<br />
<br />
This particular VMware server has two VMFS filesystems on it (two separate hard drives to improve I/O performance for the VMs), and the second of the two filesystems was toast.<br />
<br />
I hadn't considered the virtual machines on this VMware server irreplaceable, but they were valuable. It took a couple of days of work to rebuild one of the lost VMs. Another of the lost VMs caused a troublesome cascading failure: it provided an infrequently used web proxy whose loss caused unexpected software update failures elsewhere, and that took some time to diagnose as well.<br />
<br />
In summary: I wish I had enough disk space everywhere to keep backups of all the virtual machines, and I wish I had a good way to use apcupsd (or an equivalent) to shut down ESXi servers cleanly on power failures.<br />
<br />
<b>Data Loss Prevention and Mobility</b><br />
<br />
At Palisade we are often asked how to protect data from loss when employees and partners all have access to corporate private or privileged data through handy little gadgets like iPhones.<br />
<br />
The problem we are finding is that gadget vendors have not provided hooks into the devices, so we cannot do DLP on the gadgets directly. In fact, software on iOS devices is deliberately isolated to prevent any application from accessing information that belongs to another application, such as email messages or stored PDFs.<br />
<br />
Enter some pretty cool software from <a href="http://www.whispersys.com/index.html">Whisper Systems</a> for Android. WhisperCore looks very intriguing:<br />
<blockquote>WhisperCore integrates with the underlying Android OS to protect everything you keep on your phone. This initial beta features full disk encryption and basic platform management tools for Nexus S phones. WhisperCore presents a simple and unobtrusive interface to users, while providing powerful security and management APIs for developers.</blockquote>I'll be looking into this more deeply :-) Maybe this will encourage Apple to provide hooks for similar software in iOS.<br />
<br />
<b>The Bigger They Are...</b><br />
<br />
Rumblings started a week ago as the Sony PlayStation Network went offline, and stayed offline. I wasn't initially very concerned about this, but now that more information is available, I have become much more concerned.<br />
<br />
From the ominous note at <a href="http://us.playstation.com/news/consumeralerts/#us">http://us.playstation.com/news/consumeralerts/#us</a>:<br />
<blockquote>Although we are still investigating the details of this incident, we believe that an unauthorized person has obtained the following information that you provided: <b>name</b>, address (city, state, zip), country, email address, <b>birthdate</b>, PlayStation Network/Qriocity password and login, and handle/PSN online ID. It is also possible that your profile data, including <b>purchase history and billing address (city, state, zip)</b>, and your PlayStation Network/Qriocity <b>password security answers</b> may have been obtained. If you have authorized a sub-account for your dependent, the same data with respect to your dependent may have been obtained. While there is no evidence at this time that credit card data was taken, <b>we cannot rule out the possibility</b>. [emphasis supplied]</blockquote> If you will recall, I was concerned about the identity theft / social engineering dangers from the <a href="http://infoloss.blogspot.com/2011/04/epsilon-data-breach.html">Epsilon data breach</a>. This breach is much more serious because of the scope of information lost: everything necessary for successful identity theft, plus the potential for online identity takeover and even the possibility of credit card disclosure. Reports have placed the record count at between 70 and 80 million!<br />
<br />
The quantity of confidential information involved here is stunning, and for an attacker to be able to obtain this volume of information in a matter of a couple of days seems extreme. It would seem prudent for a company with a database of this size and scope to be using database access monitoring and data loss prevention systems. It will be interesting to find out whether they actually had essential business intelligence, monitoring, and policy enforcement systems in place.