
Not Just Semantics: Learning from Definitions in Technology Policy Proposals

Assembly Fellow Marissa Gerchick explains the state of affairs that motivated term tabs, her Assembly Fellowship project that aggregates important terms from proposed and enacted federal tech legislation and makes them searchable.

(Photo: Yasmin Dwiputri & Data Hazards Project / Better Images of AI / AI across industries / CC-BY 4.0)

In recent years, lawmakers in Congress have outlined various approaches for remedying lax data privacy and security practices, grappling with the impacts of AI systems, addressing the monopoly power of large tech companies, and more. Many of these proposals include a section at the beginning titled “Definitions,” outlining key terms used throughout the proposal:

Figure 1: Definitions sections often appear at the beginning of a bill after a section outlining the bill’s title.

At first glance, the definitions sections contained in many legislative proposals may seem to be a perfunctory, obligatory part of legislation – but these definitions are actually crucial. Definitions determine what a proposal does, who it applies to, and how it might be enforced. In the context of technology-related legislation, industry lobbyists may seek to water down legislative proposals by narrowing definitions to exclude their products from regulatory requirements. On the other hand, experts and advocates may seek to broaden definitions to strengthen regulation and ensure the regulation will remain applicable as technology changes over time.

At a time when many policymakers, advocates, tech companies, and members of the public are swept up in broad debates about whether and how to regulate emerging technologies, these definitions and their histories also reflect some of the biggest questions in technology policy today. What do we mean when we talk about “artificial intelligence”? What defines a “social media platform”? What kinds of harms are we worried about, and whom do we want to protect from those harms? Crafting definitions requires reckoning, in some way, with these broad, challenging questions.

While working in Congress for the Senate Judiciary Committee’s Antitrust Subcommittee through the TechCongress program, I found myself wrestling with these questions, and I looked to legislative proposals and existing law for ideas and possible answers. In practice, analyzing definitions was a time-consuming and imprecise process: I copied and pasted definitions from bills I found on Congress.gov into a spreadsheet, then manually compared and contrasted the approaches taken by different proposals. I soon realized that my copy, paste, and inspect routine was a shared experience. Other staffers were also grappling with the challenge of putting pen to paper and crafting definitions but, like me, lacked readily accessible tools to enable or improve that process.

Over time, as a Rebooting Social Media (RSM) Assembly Fellow at the Berkman Klein Center, I came to realize that a tool that did something seemingly simple – aggregating tech-related definitions across legislative proposals – could enable debate, discussions, and potentially progress on something much harder: articulating clear, workable, and robust definitions. With that realization, I built term tabs, a tool for querying definitions in tech-related legislation introduced in the United States Congress and in enacted U.S. federal laws. While certainly not comprehensive, the tool is designed to make it easier to search and compare definitions, to present information in a manner that is interpretable to various audiences (not just staffers or policymakers!), and to serve as a resource that can be further built upon by others.

I have spent the last several months exploring the nuances of bill structures and legislative APIs (shoutout to the Congress API!). Along the way, I’ve come to believe that the definitions in tech bills introduced in Congress to date also tell important stories about how lawmakers and their staff have been exploring these difficult questions for years, shaped by the work of civil rights advocates, impacted communities, researchers, and others. For example, a 2021 proposal defined “reliability” of an artificial intelligence (AI) system and “representativeness” in the context of training data for AI systems – one of several attempts members of Congress have made in recent years at defining “artificial intelligence” and overlapping or related terms.

In areas including and beyond AI, analyzing definitions can also illustrate debates, differing approaches, and changes over time. For instance, three privacy-focused bills introduced in the 117th Congress used different distance thresholds for defining what qualifies as “precise geolocation” information:

The term “precise geolocation information” means information that reveals the past or present physical location of an individual, or device that identifies or is linked or reasonably linkable to 1 or more individuals, with sufficient precision to identify street level location information or an individual’s location within a range of 1,000 feet or less…
Excerpt from the American Data Privacy and Protection Act (H.R. 8152, text as introduced June 21, 2022. Note that this threshold was changed to 1,850 feet in a subsequent version of the bill)

The term “precise geolocation information” means historical or real-time location information, or inferences drawn from other information, capable of identifying the location of an individual or a consumer device of an individual with specificity sufficient to identify street level location information or an individual’s or device’s location within a range of 1,640 feet or less.
Excerpt from the PROTECT Kids Act (H.R. 1781, introduced March 10, 2021)

The term “precise geolocation” means any data that is derived from a device and that is used or intended to be used to locate an individual within a geographic area that is equal to or less than the area of a circle with a radius of one thousand, eight hundred and fifty (1,850) feet.
Excerpt from the Data Protection Act of 2021 (S. 2134, introduced June 17, 2021)
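The three excerpts above can be lined up side by side with a few lines of code. The following is only a minimal sketch: the distances are taken directly from the quoted text, while the dictionary layout and the function names (to_meters, compare_thresholds) are hypothetical and not part of term tabs. It also treats every figure as a simple distance, even though the bills phrase it differently ("within a range of" versus a circle's radius).

```python
# Distances in feet, copied from the three excerpts quoted above.
# Assumption: each figure is treated as a plain distance, although the
# bills word it differently ("range" vs. "radius of a circle").
FEET_TO_METERS = 0.3048  # exact, by the international definition of the foot

thresholds_feet = {
    "American Data Privacy and Protection Act (H.R. 8152, as introduced)": 1000,
    "PROTECT Kids Act (H.R. 1781)": 1640,
    "Data Protection Act of 2021 (S. 2134)": 1850,
}


def to_meters(feet):
    """Convert a distance in feet to meters."""
    return feet * FEET_TO_METERS


def compare_thresholds(thresholds):
    """Return (bill, feet, meters) rows sorted from narrowest to widest."""
    rows = [(bill, feet, round(to_meters(feet), 1)) for bill, feet in thresholds.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for bill, feet, meters in compare_thresholds(thresholds_feet):
        print(f"{feet:>5,} ft  {meters:>6.1f} m  {bill}")
```

Converting units surfaces a detail that is easy to miss in the bill text: 1,640 feet is almost exactly 500 meters, while the other two thresholds are round numbers only in feet.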

While there is perhaps no perfect or preeminent distance threshold to quantify “precise geolocation information,” a staffer working on new legislation or a policy analyst weighing in on policy proposals may want to know what thresholds have been used in the past. Similarly, proposals that affect social media platforms often contend with the question of which apps, services, and systems should be considered a “covered platform” subject to the bill’s requirements, and they often qualify platforms based on user numbers. Figure 2 highlights five different legislative proposals from the 117th Congress related to social media whose definitions specify that the bills apply only to platforms with more than a certain number of users (among other requirements). While some bills might include platforms with a few hundred thousand users, others apply only to platforms with hundreds of millions.

Figure 2: Definitions of “covered platform” or “social media platform” often hinge on a platform’s number of users and may encompass a wide range of thresholds.

Legislative data isn’t just useful for finding discrepancies in thresholds. We can also borrow graph theory concepts to identify cyclic dependencies in bill text, explore the lineages of definitions across the aisle and across houses of Congress, or put legislative definitions in conversation with other definitions – including from international standards, research works, and other government publications.
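To make the graph theory idea concrete, here is one way cyclic dependencies between defined terms could be detected. This is an illustrative sketch, not how term tabs works: the function names are hypothetical, the toy definitions are invented for the example and do not come from any bill, and a "dependency" is assumed to mean simply that one term's definition text mentions another defined term.

```python
import re


def build_dependency_graph(definitions):
    """Map each term to the set of other defined terms its definition mentions."""
    graph = {}
    for term, text in definitions.items():
        lowered = text.lower()
        graph[term] = {
            other
            for other in definitions
            if other != term
            and re.search(r"\b" + re.escape(other.lower()) + r"\b", lowered)
        }
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of terms, or None."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {term: UNVISITED for term in graph}
    path = []

    def visit(term):
        state[term] = IN_PROGRESS
        path.append(term)
        for nxt in sorted(graph[term]):
            if state[nxt] == IN_PROGRESS:
                # Back edge: nxt is already on the current path, so we looped.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[term] = DONE
        return None

    for term in sorted(graph):
        if state[term] == UNVISITED:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Invented toy definitions (NOT quoted from any bill) that refer to each other.
toy_definitions = {
    "covered platform": "a website or app operated by a covered entity",
    "covered entity": "a person that operates a covered platform",
    "user": "an individual who accesses a covered platform",
}

if __name__ == "__main__":
    print(find_cycle(build_dependency_graph(toy_definitions)))
```

Plain text matching is a deliberate simplification. Real bill text also defines terms by cross-reference to other statutes, which this sketch does not attempt to resolve.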

As debates about how to govern emerging and existing technologies continue in the U.S. and around the world, we can look to legislative definitions to provide valuable and practical insights for policy conversations and hopefully move us toward answering those broad and challenging questions.