Individuals frequently reveal personal information in the course of doing business in order to gain benefits such as home delivery of products, customized services, and the ability to buy things on credit. In so doing, they may also become vulnerable to other uses of their personal information that they find undesirable. The Internet and computerized databases make automated collection and processing of information particularly easy and convenient. As a result, individuals may take advantage of new services, such as personalized electronic newspapers and shopping from home, but they may also become more vulnerable to misuses of personal information.
Just as technology can be used to automate data collection and processing, it can also be used to automate individual control over personal information. In particular, technology can automate the process of providing notice and choice about information practices, minimize the need to collect and transfer personal information, and reduce unwanted uses of personal information such as unsolicited email.
Technologies to support these applications are in varying stages of development, deployment, and adoption. This paper presents an overview of these technologies in order to inform discussion about which tools and techniques are most worth pursuing.
Notice and choice are among the most important principles of fair information practice. Responsible data collectors provide individuals with clear advance notice about the types of data they collect and how that data will be treated. They also provide individuals with the means to choose what data they provide for specific purposes. (Of course, individuals who choose not to provide essential data in some situations might be denied services as a consequence.) Traditional means of providing notice and choice generally require individuals to divert their attention away from the task at hand in order to read or listen to lengthy explanations and answer questions. When such disruptions occur frequently, individuals are unlikely to pay close attention to them. On the Internet, individuals typically wander from site to site without such interruptions. However, if most Internet content and service providers provided notice and choice through traditional means, interruptions would be a common occurrence. Fortunately, a number of alternative mechanisms may facilitate the provision of notice and choice over telecommunications networks while preserving the seamless browsing experience.
One way to simplify notice and choice is to provide standard notices with consistent choice options. Currently, some organizations are experimenting with privacy rating systems that classify each Web site into one of several categories based on the site's information practices. For example, one category might be used for sites that do not reveal information collected from visitors, while another category might be used for sites that may trade or sell information they collect from visitors. Sites rated under such systems display icons on their pages that notify individuals of their information practices.
This solution provides individuals with a means of quickly determining a site's information practices. However, the number of information practice categories must remain small if the category icons are to remain easily distinguishable. But with only a limited number of categories, it may not be possible to encode all details about information practices that individuals might find important. For example, individuals might want to visit sites that may reveal personal information to third parties only if that information is limited to names and contact information and does not include transactional data. In addition, because these systems rely on visual icons, individuals must consciously remember to look for these icons at every site they visit and take additional actions to confirm that the icon has not been forged.
Some of the problems inherent in icon-based systems can be overcome by a machine-readable label system. The Platform for Internet Content Selection (PICS), developed by the World Wide Web Consortium (W3C), is one such system [8]. PICS was originally developed as a user-empowerment approach to protecting children from Internet content that their parents consider objectionable. It is an infrastructure for associating descriptions, called labels, with documents and Web sites on the Internet. PICS can accommodate any labeling vocabulary: currently several vocabularies are in use that indicate either age-appropriateness or the presence of potentially objectionable content such as offensive language or nudity. A label is not normally visible when a document is displayed to a user; instead, when a PICS-compliant browser is used, the browser reads the PICS label and determines whether the associated document meets the user's criteria for display. If a document fails to meet the user's criteria, it is blocked, unless the user chooses to override the block. As of December 1996, Microsoft® Internet Explorer 3.0 is PICS compliant, as are a number of stand-alone filtering products. This user-empowerment approach has played an important role in public discussion, both in the U.S. and around the world, of how best to protect children from objectionable content without introducing government censorship.
The PICS technology also offers promise in the privacy realm for user empowerment through automated notice and choice [9]. Labeling vocabularies might be developed to describe the information practices of organizations that collect data over the Internet. For example, a vocabulary might encode the categories used in existing icon-based systems. Other vocabularies might also employ multiple dimensions, for example, one dimension for practices pertaining to each type of information a site collects (demographic information, contact information, transactional data, etc.). Individuals might choose to have their browsers automatically block sites that do not have information practices consistent with their personal privacy preferences.
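To make this concrete, the following is a minimal sketch of how a browser-side check against a PICS-style privacy label might work. The vocabulary dimensions and rating scale shown here are hypothetical examples invented for illustration; they are not part of PICS or any deployed rating system.

```python
# Minimal sketch of a PICS-style browser check, assuming a hypothetical privacy
# vocabulary with one dimension per type of data a site collects.
# Rating scale (illustrative): 0 = not collected, 1 = collected but not shared,
# 2 = shared with third parties.

SITE_LABEL = {"contact": 1, "demographic": 2, "transactional": 0}

# The user's preferences: the highest rating acceptable for each dimension.
USER_PREFERENCES = {"contact": 1, "demographic": 1, "transactional": 1}

def label_acceptable(label, preferences):
    """Return True if every dimension of the site's label is within the user's
    acceptable range; a missing dimension is treated as unacceptable."""
    for dimension, limit in preferences.items():
        if dimension not in label or label[dimension] > limit:
            return False
    return True

if __name__ == "__main__":
    if label_acceptable(SITE_LABEL, USER_PREFERENCES):
        print("Display the document.")
    else:
        print("Block the document (the user may choose to override).")
```

In this sketch the site above would be blocked, because its demographic-data practices exceed what the user is willing to accept.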
The PICS infrastructure allows sites to describe their own information practices, and it also allows independent monitoring organizations to compose and distribute labels describing a site's practices. Unlike objectionable content, however, a site's information practices are not immediately visible to a casual observer. Thus, the most effective notice about information practices is likely to come from the Web sites themselves.
In order to provide the most flexibility for both individuals and Internet content providers, it would be useful if browsers could negotiate information practices with content providers automatically, rather than just blocking access to those sites with undesirable practices. For example, upon discovering that a Web site does not have practices consistent with an individual's preferences, the browser might contact the site and ask how the individual might be accommodated. The server could respond in several ways: by agreeing to honor the individual's preferences; by offering a restricted portion of the site in which those preferences will be honored; or by explaining why the preferences cannot be honored, perhaps along with an incentive for the individual to access the site anyway. The PICS infrastructure cannot currently support such a negotiation, but it could be expanded to include a negotiation protocol. Web negotiation protocols are currently under development by W3C and other organizations; once a protocol is developed, it will take some time to incorporate it into Web browsers and servers.
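No such negotiation protocol exists yet, but the exchange described above might look something like the following sketch. The message names, practice labels, and server responses are invented for illustration only.

```python
# Hypothetical sketch of a browser/server negotiation over information
# practices. Nothing here corresponds to an existing protocol.

def server_negotiate(requested_practices, site_practices):
    """An imaginary site's response when a browser asks whether the user's
    required practices can be accommodated."""
    if requested_practices.issubset(site_practices):
        return {"result": "agree", "detail": "Your preferences will be honored."}
    if "no-third-party-sharing" in requested_practices:
        return {"result": "restricted",
                "detail": "Preferences honored within the /no-share/ area only."}
    return {"result": "decline",
            "detail": "We share data with partners; here is a discount offer."}

# The browser states the practices the user requires; the server replies with
# agreement, a restricted offer, or an explanation and incentive.
browser_request = {"no-third-party-sharing", "no-contact-by-email"}
site_supports = {"no-contact-by-email"}
print(server_negotiate(browser_request, site_supports))
```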
Another possible extension of the PICS infrastructure might be used to specify the conditions under which an individual would allow the automatic transfer of certain types of information. Such information might include contact information needed for business transactions, or demographic and personal preference information used by Web sites to customize the services they provide. Automated transfer of this information would be more convenient for users than typing the information each time they visit a site, and users could set up their browsers to ensure transfers only to Web sites that have certain information practices.
The user empowerment tools described above depend on cooperation between individuals and information gathering organizations. When there are mutually acceptable terms for transfer of individual information and conditions on its use, these tools allow the negotiation and information transfer to happen in the background, without consuming the individual's valuable time and attention. The opportunity to automate the notice and choice process is a major advantage of the Internet over other media for commercial interaction. As in the physical world, however, these tools do not guarantee that mutually acceptable terms will always be found: depending on market conditions, individuals may or may not find privacy-friendly choices available.
While the approaches outlined here facilitate the seamless exchange of information about data collectors' information practices and individuals' privacy preferences, they do not ensure that data collectors will report their information practices accurately. Independent labeling services can label bad actors once they have been identified, but it may be difficult to detect sites that violate their reported practices. An audit may help a site to convince consumers of its good information practices and to distinguish it from other sites that may dishonestly report their practices. However, traditional audits are likely to be prohibitively expensive for most Web site operators. It may be possible to use technology to automate the information practice audit process to some extent. For example, systems might be developed to systematically reveal decoy data to Web sites and monitor the propagation of that data. Further work is needed to develop techniques for automating the information practice auditing process.
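One way such an automated audit might work is sketched below, under the assumption that the auditor controls a mail domain (the domain audit.example here is hypothetical) and can mint a unique decoy address for each site being tested. If unsolicited mail later arrives at one of these addresses, the auditor knows which site originally received it.

```python
# Rough sketch of the decoy-data auditing idea described above.

import secrets

decoy_registry = {}  # decoy address -> site it was revealed to

def mint_decoy(site):
    """Create a unique, traceable address to reveal only to one site."""
    address = f"decoy-{secrets.token_hex(4)}@audit.example"
    decoy_registry[address] = site
    return address

def trace_leak(recipient_address):
    """If unsolicited mail arrives at a decoy address, report which site the
    address was originally revealed to."""
    return decoy_registry.get(recipient_address, "unknown origin")

addr = mint_decoy("shop.example")
print(addr, "was revealed only to", trace_leak(addr))
```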
Another approach to safeguarding personal information is to minimize the need for collecting such information or minimize the number of times the information must be accessed. This can be done through the use of trusted intermediaries or technologies designed for this purpose.
Several trusted intermediary systems currently in use on the Internet are designed to prevent the release of personal information. These anonymizing systems generally remove all personally-identifiable information (such as name and email address) from communications before forwarding them to the intended recipients. For example, anonymizing proxy servers allow individuals to surf the Web without revealing their network location [3], and anonymous remailers allow individuals to send electronic mail without revealing their email addresses to their correspondents [1].
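The core operation of an anonymizing proxy can be illustrated with a small sketch: identifying fields are stripped from a Web request before it is forwarded, so the destination site sees only the proxy. The header names below are standard HTTP fields, but real anonymizers do considerably more (cookie handling, URL rewriting, and so on), so this is a simplification.

```python
# Simplified sketch of what an anonymizing proxy does to a request before
# forwarding it on the user's behalf.

IDENTIFYING_HEADERS = {"from", "referer", "cookie", "user-agent", "x-forwarded-for"}

def anonymize_request(headers):
    """Return a copy of the request headers with identifying fields removed."""
    return {name: value for name, value in headers.items()
            if name.lower() not in IDENTIFYING_HEADERS}

original = {
    "Host": "news.example",
    "User-Agent": "ExampleBrowser/3.0",
    "Referer": "http://private.example/reading-list.html",
    "Cookie": "visitor_id=12345",
}
print(anonymize_request(original))  # only the Host header survives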
One step removed from anonymous interactions are interactions under a pseudonym. In such interactions individuals do not reveal their true identity, but reveal pseudonyms instead. Each individual may reveal the same pseudonym each time he or she visits a particular Web site, but may reveal different pseudonyms to other sites. This allows a site to accumulate a profile of each individual's preferences over time so that it may tailor content and advertisements to that individual's interests, while preventing information revealed to different sites from being combined into a comprehensive profile.
Pseudonyms also allow a site to maintain information about the state of an individual's interactions with that site, such as the contents of an individual's virtual shopping basket. Many Web sites currently use an alternative mechanism called "cookies" to maintain such information [6,7]. Cookies are pieces of information stored on a user's computer at the request of a particular Web site. The next time the user visits that site, the site can retrieve any cookies that it previously stored. In practice, however, multiple Web sites sometimes share access to cookies. A user who reveals personal information to one Web site may unwittingly reveal that information to other sites. By contrast, pseudonyms allow users to decide when to allow their information to be shared among Web sites, preventing unwanted information leakage. From a privacy perspective, interaction under a pseudonym offers users more control over the release of information than cookies do, but retains the benefits that come from allowing sites to maintain information about an individual's interaction with them.
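One simple way per-site pseudonyms could be generated is sketched below: a keyed hash of the site's name under a secret kept on the user's own machine, so each site always sees the same pseudonym but two sites cannot link their pseudonyms without the secret. This particular construction is an assumption made for illustration, not a description of any deployed pseudonym system.

```python
# Minimal sketch of locally generated, per-site pseudonyms.

import hashlib
import hmac

USER_SECRET = b"a secret kept only on the user's own machine"

def pseudonym_for(site):
    """Derive a stable pseudonym for one site from the local secret."""
    digest = hmac.new(USER_SECRET, site.encode(), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]

print(pseudonym_for("books.example"))  # stable across visits to this site
print(pseudonym_for("news.example"))   # different, unlinkable pseudonym
```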
Anonymizing intermediaries and pseudonyms are insufficient for some types of transactions. For example, imagine an individual who wants to purchase software over the Internet. The individual might have used a pseudonym in her relationship with the vendor, allowing the vendor to keep a profile of her preferences and maintain information about the state of her virtual shopping cart. She might have also used an anonymizing server whenever she visited the vendor's Web site so as not to reveal her network location. But these systems cannot help her transfer funds to the vendor from her bank account without revealing personal information to the vendor.
Fortunately, trusted intermediaries can also enable monetary transactions with minimal requirements for personal information. For example, some Internet credit card systems currently in use allow individuals to make a credit card purchase over the Internet without transferring their card numbers directly to vendors. Instead, an individual sends a vendor a special-purpose code that identifies the transaction. The vendor forwards the code to the card issuer with a request for payment. The issuer then contacts the buyer and asks that the transaction be authorized. Upon receiving authorization, the issuer bills the buyer's credit card and pays the vendor, without revealing the buyer's credit card number to the vendor. Thus the danger of an individual's credit card number being misappropriated is substantially reduced. However, as with traditional credit cards, the card issuer has a complete record of the individual's credit card transactions and must be trusted to safeguard this information.
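The flow just described can be simulated in a few lines, with all parties in one process. The class names, message formats, and the choice of having the issuer mint the transaction code are invented for this sketch; the point is only that the card number never reaches the vendor.

```python
# Sketch of a credit card transaction mediated by the card issuer.

import secrets

class Buyer:
    def authorize(self, code, amount, vendor):
        # In a real system the issuer would contact the buyer out of band.
        print(f"Authorize {amount} to {vendor} for transaction {code}? yes")
        return True

class Issuer:
    def __init__(self):
        self.codes = {}  # transaction code -> transaction record

    def new_transaction_code(self, buyer):
        """Give the buyer a special-purpose code to hand to the vendor."""
        code = secrets.token_hex(8)
        self.codes[code] = {"buyer": buyer, "authorized": False}
        return code

    def request_payment(self, code, amount, vendor):
        """The vendor forwards the code; the issuer asks the buyer to authorize."""
        record = self.codes[code]
        if record["buyer"].authorize(code, amount, vendor):
            record["authorized"] = True
            return f"paid {vendor} {amount} (card number never disclosed to vendor)"
        return "payment refused"

issuer = Issuer()
buyer = Buyer()
code = issuer.new_transaction_code(buyer)  # the buyer sends only this code to the vendor
print(issuer.request_payment(code, "29.95", "software.example"))
```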
In general, the more information that is consolidated in the databases of trusted intermediaries, the less need there is to transfer information in the course of completing a transaction. This approach allows attention to be focused on the information practices of a small number of intermediaries rather than on all parties that might engage in transactions. However, the potential for damage can be quite large if the trusted database is compromised or the intermediary proves to be untrustworthy. This is true whether transactions take place over the Internet or through traditional means.
An alternative to consolidating information in the databases of trusted intermediaries is to keep information in the hands of individuals as much as possible. This can be done by designing transaction systems that transfer only the information that each party absolutely needs to know. For example, in an electronic payment transaction the bank need only know that the individual is authorized to withdraw money from a particular account, the identification number of that account, and the sum of money to be withdrawn; the vendor need only know that it has received a valid payment. The bank need not know what the individual is doing with the withdrawn money, and the vendor need not know the individual's name or bank account number (in contrast, these pieces of information must be transferred, for example, when individuals purchase goods with checks). Thus, only the purchaser has access to the list of purchases that he or she has made. Of course, if the bank does not have access to information about how individuals spend their money, the individuals must maintain their own records. Electronic cash systems can offer the privacy of cash payments with the convenience of electronic payments. However, some of these systems have many of the same vulnerabilities as traditional cash, including risk of theft or loss.
The underlying technology behind some electronic cash systems is a special type of digital signature called a blind signature [2]. Introduced by David Chaum, blind signatures allow a document to be signed without revealing its contents. The effect is analogous to placing a document and a sheet of carbon paper inside an envelope: if somebody signs the outside of the envelope, they also sign the document inside, and the signature remains attached to the document even when it is removed from the envelope. Blind signatures can be used in electronic payment systems to allow banks to sign and distribute digital notes without keeping a record of which notes an individual has been given. An individual who wishes to withdraw money from a bank account prepares a digital note with a secret serial number, blinds it, and submits it to the bank. The bank withdraws the money from the individual's account and signs the blinded note; because of the blinding, the bank never learns the note's serial number. When the individual gives the digital note to a vendor in exchange for a purchase, the vendor may take the note to the bank and ask for it to be deposited. The bank can verify its signature on the note to determine whether it is legitimate, and it can record the serial number to make sure notes with the same serial number are not spent multiple times. But as with most cash transactions, the bank cannot determine which individual gave the note to the vendor.
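The following toy example shows the arithmetic behind a Chaum-style RSA blind signature. The key is tiny and there is no padding, so this is insecure by design; it only demonstrates how the bank can sign a note without ever seeing its serial number.

```python
# Toy illustration of an RSA blind signature with insecure parameters.

import math
import secrets

# The bank's toy RSA key (public: n, e; secret: d).
p, q = 1009, 1013
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

# 1. The individual picks a secret serial number and a random blinding factor r.
serial = 123456 % n
while True:
    r = secrets.randbelow(n - 2) + 2
    if math.gcd(r, n) == 1:
        break

# 2. The note is blinded before it is sent to the bank.
blinded = (serial * pow(r, e, n)) % n

# 3. The bank signs the blinded note (and debits the account); it never sees `serial`.
blinded_signature = pow(blinded, d, n)

# 4. The individual unblinds the result, yielding a valid signature on `serial`.
signature = (blinded_signature * pow(r, -1, n)) % n

# 5. Anyone holding the bank's public key can verify the note.
assert pow(signature, e, n) == serial
print("note", serial, "carries a valid bank signature:", signature)
```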
Electronic payment systems may be designed as software-only systems that can be used to make payments over computer networks, smart card systems that can be used to purchase goods from vendors who have smart card hardware, or hybrid systems. Regardless of whether the system is implemented in hardware or software, it may be used to store information so that the information is always under the control of the individual to whom it belongs. Thus transaction records may be stored on a chip in the smart card itself or on an individual's computer. The individual may view these records to keep track of personal finances, but the card issuer does not have access to these records.
Blind signatures and smart card technologies can be used for other types of transactions as well. For example, blind signatures can be used in electronic voting systems to ensure that each registered voter votes only once while also ensuring that nobody can find out how any individual voted [4]. Smart cards can be used to store credentials (academic degrees, financial credit, authorization to enter a restricted area of a building, etc.) and to produce convincing evidence that an individual holds a requested credential, without supplying the credential checker with the personal information ordinarily needed to verify credentials through traditional means [2].
So far, the approaches discussed here have been aimed at preventing the unwanted release of personal information. Another area in which technology can play a role is in reducing the ability of people and organizations to use personal information to invade an individual's privacy. Today, many individuals become aware of the extent to which their personal information is bought and sold when they start receiving unwanted solicitations over the telephone or in the postal mail. Already, some of these solicitations have begun arriving via electronic mail. Because sending large volumes of electronic mail is so inexpensive, electronic junk mail is likely to become a significant problem in the future if preventative steps are not taken. Currently some Internet service providers filter email sent from addresses known to send mass junk email. However, due to the ease with which junk emailers may forge return addresses, this is not likely to be a long-term solution to the ever-increasing junk email problem.
The junk email problem might be addressed through technologies that sort incoming email according to sender, placing email from unknown senders in a low-priority mailbox. Robert Hall, a researcher at AT&T Labs, has developed a system of channelized electronic mail in which individuals set up many email channels and assign a different channel to each of their correspondents [5]. Correspondents who have not been assigned a channel can only communicate with the individual on a low-priority public channel. The system may be augmented so that the individual may choose to ignore messages received on the public channel unless they are accompanied by an electronic payment. If a message is from someone the individual wishes to correspond with, such as a long lost friend, the individual might refund the payment; however, if the message is a solicitation, he or she would likely keep the payment. This would make it more costly to send unsolicited email, and it would compensate people for spending time reading this mail.
Hall's implementation of channels requires that individuals ask each of their correspondents to contact them at a different email address, a requirement that may prove inconvenient. Alternatively, an individual's email software might sort correspondence into channels based on the name of the sender. Digital signatures might be used in such a system to authenticate senders, thus allowing each person to maintain a single email address. While no email system that performs all of these operations seamlessly currently exists, such an email system could be built using currently available technology.
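A sender-based channel sorter of the kind suggested above might look like the following sketch. Signature verification is stubbed out here; in a real system the sender's identity would be authenticated with a digital signature rather than trusted from the message header, and the channel names and addresses are invented for illustration.

```python
# Sketch of sorting incoming mail into per-correspondent channels by sender.

KNOWN_CORRESPONDENTS = {
    "alice@research.example": "colleagues",
    "mom@family.example": "family",
}

def verify_signature(message):
    # Placeholder: assume the message carries a valid signature over its body.
    return message.get("signed", False)

def route(message):
    """Return the channel a message belongs in."""
    sender = message["from"]
    if sender in KNOWN_CORRESPONDENTS and verify_signature(message):
        return KNOWN_CORRESPONDENTS[sender]   # high-priority channel
    return "public (low priority)"            # unknown or unauthenticated sender

print(route({"from": "alice@research.example", "signed": True}))
print(route({"from": "bulkmail@ads.example", "signed": False}))
```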
Unsolicited email could also be read by software programmed to identify unwanted messages. This might be done by looking for patterns that are indicative of junk mail or looking for messages that are similar to messages in a database of known junk mail. People might subscribe to junk mail filtering services that maintain databases of junk mail submitted by subscribers and periodically send updates to each subscriber's computer with instructions on how to identify and delete newly discovered junk mail. Current technology cannot be used to filter junk mail with perfect accuracy, but our ability to filter accurately should improve over time.
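A very small sketch of this kind of filter appears below: a message is flagged if it contains known junk phrases or closely resembles a message already reported as junk. The phrases, sample messages, and similarity threshold are arbitrary examples.

```python
# Pattern- and similarity-based junk mail filtering sketch.

import difflib

JUNK_PHRASES = ["make money fast", "act now", "free trial"]
REPORTED_JUNK = ["Make money fast!!! Work from home and earn thousands."]

def looks_like_junk(body, threshold=0.8):
    """Flag a message if it matches known junk phrases or known junk messages."""
    lowered = body.lower()
    if any(phrase in lowered for phrase in JUNK_PHRASES):
        return True
    for known in REPORTED_JUNK:
        ratio = difflib.SequenceMatcher(None, lowered, known.lower()).ratio()
        if ratio >= threshold:
            return True
    return False

print(looks_like_junk("MAKE MONEY FAST by joining our program today"))  # True
print(looks_like_junk("Lunch on Tuesday?"))                             # False
```

A subscription service of the kind described above would simply keep JUNK_PHRASES and REPORTED_JUNK up to date on each subscriber's computer.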
It is important to recognize that the technologies presented here address only part of the problem. Even the most privacy-friendly information policies may be thwarted if data collectors do not protect their communications and databases. Security precautions should be taken to prevent communications from being intercepted and databases from being compromised. Organizations should develop procedures to protect passwords and prevent employees from accessing data for unauthorized purposes. Data and communications security is an important component of all privacy protection schemes, whether the data in question was collected over the Internet or by traditional means.
A variety of technologies can be used to safeguard personal privacy on the Internet while allowing individuals to reap the benefits of customized services and convenient payment mechanisms. These technologies can be used to build applications that minimize the need to reveal personal information and empower individuals to control the personal information they reveal and understand how it will be used. The technologies needed to implement these applications are fairly well understood. However, a strong industry-wide commitment will be necessary to drive deployment and adoption. To be effective, these applications will need user-friendly interfaces. Some may also require widespread adoption by both consumers and Web sites before they can be successful.
Lorrie Faith Cranor is a researcher in the Public Policy Research Department at AT&T Labs-Research. She received her doctorate in Engineering & Policy from Washington University in 1996. Her graduate research focused on electronic voting system design and the development of a new voting paradigm made practical through the use of computers. Prior to joining AT&T, Cranor was a lecturer in the Engineering & Policy and Computer Science departments at Washington University.