
The Evolution of Flashpoint’s Data Pipeline, an Engineering Story

November 12, 2021

By Greg Busanus, Senior Software Engineer

Beginnings: Individual pipelines, successes and trade-offs

The Flashpoint data pipeline was in its infancy when I joined four years ago. Our engineering teams developed and published datasets independently. For a spell, this worked well: we could develop rapidly without stepping on the toes of other teams, and if anything went wrong, each team could respond independently.

The trade-off, however, was duplicated effort.

Every new dataset required the engineering team to build out a completely new pipeline, while each individual team had to dedicate one person to learning that particular, entirely separate, deployment process.

That independence also meant that lessons learned and improvements were passed from one pipeline to another only on an ad hoc basis, so consistent deployments required cross-team coordination. Finally, there was no clear path for cross-dataset analysis or enrichment.

Selecting tools to aid our data evolution

The prospect of cross-dataset analysis was the initial impetus to build a fully streaming ETL pipeline through which all of our data would travel. This would allow us to perform transformations on individual datasets or across several of them, and to load data into any current or future data store with minimal additional effort.

We also wanted the ability to load or reload historical data in batches on demand and to dynamically scale our streaming jobs based on the amount of data flowing into them. To that end, we elected to use Apache Beam running on Google Cloud’s Dataflow Runner, which, in addition to satisfying our requirements, was tightly integrated with Google Cloud services like PubSub, BigTable, and Cloud Storage.

Streaming v1.0

We chose to make our updates in phases to avoid any negative operational impact. To keep things straightforward, we built a simple Beam job that read from a single PubSub subscription containing the data from one dataset and wrote that data to an Elasticsearch index. After thorough testing, we created new subscriptions on our production topics and a Dataflow job reading from each, decommissioning the old jobs as we went.
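
To make that concrete, here is a rough sketch of what that first-generation job looked like, assuming the Beam Python SDK and the official Elasticsearch client. The subscription path, hosts, and index name are placeholders rather than our actual configuration, and the real job included error handling that is omitted here.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class WriteToElasticsearch(beam.DoFn):
    """Indexes each decoded message into a single, fixed Elasticsearch index."""

    def __init__(self, hosts, index):
        self._hosts = hosts
        self._index = index

    def setup(self):
        # The client is created in setup() so each worker gets its own connection.
        from elasticsearch import Elasticsearch
        self._client = Elasticsearch(self._hosts)

    def process(self, doc):
        # Older client versions take body=doc instead of document=doc.
        self._client.index(index=self._index, document=doc)


def run():
    options = PipelineOptions(streaming=True)  # runner and project flags supplied at launch
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/dataset-a")
            | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToIndex" >> beam.ParDo(
                WriteToElasticsearch(hosts=["http://elasticsearch:9200"],
                                     index="dataset-a"))
        )


if __name__ == "__main__":
    run()
```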

With our one-VM-running-Python-per-dataset pipelines effectively replaced by one-Dataflow-job-per-dataset pipelines, we were in a somewhat awkward position: While our infrastructure was centralized in a single team, the pipelines themselves were separate. 

In practice, individual teams were now slower to iterate when deploying new datasets, and we had yet to see any of the promised benefits of the transition. That being said, our new pipeline was faster, and we had a much clearer picture of the overall amount of data we were processing. Furthermore, centralizing the data meant we could track all of our different datasets more reliably. It wasn’t ideal, but it was serviceable, especially with only a few datasets to maintain.

 

Data enrichments provide immediate value

Our new data pipeline gave us the opportunity to begin transforming our data. When we collect data, we gather individual messages as well as information about the users who post them and the sites they are posted on.

We store this data separately, but when it comes to displaying that information, we want to ensure that individual posts are properly correlated with their respective sites, users, and any other pertinent data that helps classify our collections. To facilitate this, each message contains URIs that link it to its associated data. Before writing to Elasticsearch, we wanted to join incoming messages with the most up-to-date version of that additional data.

Denormalization

To that end, we implemented a denormalization stage in our Elasticsearch writer pipeline. This transformation would read the URIs, the intended behavior associated with each URI, and a path within the message to which that behavior should be applied. For example, we might say to “embed” the data stored at the URI into the “site” field of the document we were processing. To start with, we stored this metadata in Google Cloud Storage, since it was the most cost-efficient option we found. The transformation would iterate over all URIs in a document, embedding and replacing fields with the separately stored information as specified.
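
In simplified form, the stage looked something like the sketch below. The field names ("denormalization_uris", "behavior", "path") and the GCS layout are illustrative assumptions rather than our actual schema, and the real transformation also handled nested paths and replacement behaviors.

```python
import json

import apache_beam as beam
from apache_beam.io.filesystems import FileSystems


class Denormalize(beam.DoFn):
    """Embeds separately stored metadata (sites, users, ...) into each message."""

    def _lookup(self, uri):
        # e.g. "gs://metadata-bucket/sites/<site-id>.json" (hypothetical layout)
        with FileSystems.open(uri) as f:
            return json.loads(f.read())

    def process(self, doc):
        for ref in doc.get("denormalization_uris", []):
            metadata = self._lookup(ref["uri"])
            if ref["behavior"] == "embed":
                # e.g. path == "site" -> doc["site"] = {...site metadata...}
                doc[ref["path"]] = metadata
        yield doc
```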

Unfortunately, we soon encountered an issue. 

The metadata was frequently collected around the same time as the messages we were processing. This meant that if writing the metadata to GCS took longer than denormalizing and writing to Elasticsearch, we ran a substantial risk of embedding outdated information, or of failing to embed anything at all.

Trial and error: Implementing an effective strategy 

Initially, we tried adding retries to the denormalization stage, but this delayed the entire pipeline and could have produced backlogs that would either become insurmountable or strain our Elasticsearch cluster. We also tried a dead-lettering strategy, but that required manual intervention to trigger reprocessing, which led to unacceptably long ingestion delays. Furthermore, given the ephemeral nature of the data we were collecting, we couldn’t guarantee that any one piece of metadata would ever actually be collected.

With these issues in mind, we decided it was preferable to make incomplete documents available rather than withhold them because their metadata was missing. Eventually, we settled on a temporary solution: we would replay messages after about an hour, which gave us confidence that all available metadata could be written to GCS during that window.
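
A rough sketch of that interim replay mechanism is below, with the caveat that the marker field, the topic name, and the way the delay was applied are assumptions for illustration: documents whose lookups failed still flowed onward, while a copy was republished so they could be re-denormalized roughly an hour later.

```python
import json

import apache_beam as beam


class TagMissingMetadata(beam.DoFn):
    """Routes documents with unresolved URIs to a side output for replay."""

    def process(self, doc):
        if doc.get("_unresolved_uris"):  # hypothetical marker set by the denormalizer
            yield beam.pvalue.TaggedOutput("replay", doc)
        yield doc  # incomplete documents still reach Elasticsearch


def attach_replay_branch(denormalized):
    results = denormalized | "TagMissing" >> beam.ParDo(
        TagMissingMetadata()).with_outputs("replay", main="main")

    # Republish incomplete documents; a delayed consumer feeds them back into
    # the pipeline about an hour later for another denormalization pass.
    (
        results.replay
        | "EncodeForReplay" >> beam.Map(lambda doc: json.dumps(doc).encode("utf-8"))
        | "Republish" >> beam.io.WriteToPubSub(
            topic="projects/example-project/topics/replay")
    )
    return results.main
```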

With that fix in place, we had the time to refactor our denormalizer logic, switching from GCS to BigTable for our storage. BigTable has proven to be much more efficient at storing the necessary metadata in a timely manner.
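
The lookup itself became a straightforward point read. The sketch below assumes the google-cloud-bigtable client, a table keyed by URI, and a single column family holding the serialized metadata; the actual row layout is simplified here.

```python
from google.cloud import bigtable


class MetadataStore:
    """Reads denormalization metadata from a BigTable row keyed by URI."""

    def __init__(self, project, instance_id, table_id):
        client = bigtable.Client(project=project)
        self._table = client.instance(instance_id).table(table_id)

    def lookup(self, uri):
        row = self._table.read_row(uri.encode("utf-8"))
        if row is None:
            return None
        # Assumes one column family "m" with a "json" qualifier holding the payload.
        return row.cells["m"][b"json"][0].value.decode("utf-8")
```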

Scaling challenges

Though our pipelines now shared a codebase, including the denormalization logic, and were consistent across all of our datasets, we were still deploying each one individually. When we originally deployed this architecture, we had maybe four or five pipelines loading data into Elasticsearch. Not ideal, but manageable for our small but growing Data Engineering team.

As we began to push for more varied datasets, that number rapidly expanded—we added multiple datasets over just a few weeks and our data engineering team spent most of their time configuring new pipelines and their surrounding infrastructure.

Data engineers were manually setting up new infrastructure and pipeline components for each new dataset. This was manageable when we were adding one dataset per quarter, but at multiple datasets per week, it quickly became clear that we couldn’t keep this up. The rapid increase in big data pipelines began to affect the performance and reliability of Elasticsearch, leading our SRE team to spend more of its time on scaling up the cluster, rebalancing shards, and reindexing. Additionally, all these new pipelines began to significantly increase our cloud costs.

It was time to move on to the next stage: creating a single, unified pipeline across datasets.

Unified streaming: Finding the sweet spot

There were two major factors we had to address in order to support a unified data pipeline: 1) multiple inputs and 2) multiple outputs. 

Solving for multiple inputs

First, we had individual PubSub topics per dataset coming into our data pipeline; this was necessary with individual pipelines because we had to keep that data independent. If we wanted to rapidly transition from multiple pipelines to one, the easiest solution was to modify our job to support an arbitrary number of PubSub topics as potential inputs, which we would then combine and treat as a single input. 

This was easy enough using Beam’s built-in Flatten transformation. But once all of our data was coming through a unified pipeline, we had to face the prospect of separating it back out again into the individual locations where it was stored. Fortunately, each dataset had a well-defined JSON Schema associated with its messages.
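
The input side of that change can be pictured with a short sketch like the one below, where the list of subscriptions is assumed to come in as a pipeline option; each subscription is read separately and Beam’s Flatten merges the results into a single PCollection.

```python
import apache_beam as beam


def read_all_datasets(pipeline, subscriptions):
    """Reads every PubSub subscription and merges them into one PCollection."""
    per_dataset = [
        pipeline | f"Read {sub}" >> beam.io.ReadFromPubSub(subscription=sub)
        for sub in subscriptions
    ]
    return per_dataset | "FlattenDatasets" >> beam.Flatten()
```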

Solving for multiple outputs

Rather than statically inserting all incoming data into an index we specified manually, we modified our Elasticsearch writer Beam job to dynamically determine the index to which each message belonged and write it appropriately. (This had the added benefit of enforcing a common naming convention for our indices and for our other data stores.)
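
The routing logic amounted to deriving the index name from each message rather than from job configuration. A sketch is below; the "dataset" field, the "-v1" suffix, and the batch sizes stand in for our actual naming convention and tuning.

```python
import apache_beam as beam


class WriteToDatasetIndex(beam.DoFn):
    """Bulk-indexes each batch of documents into an index derived per message."""

    def __init__(self, hosts):
        self._hosts = hosts

    def setup(self):
        # One client per worker, created when the worker starts.
        from elasticsearch import Elasticsearch
        self._client = Elasticsearch(self._hosts)

    def process(self, batch):
        from elasticsearch import helpers
        actions = [
            {
                "_index": f"{doc['dataset']}-v1",  # e.g. "chats" -> "chats-v1"
                "_source": doc,
            }
            for doc in batch
        ]
        helpers.bulk(self._client, actions)


def write_to_elasticsearch(docs, hosts):
    return (
        docs
        | "Batch" >> beam.BatchElements(min_batch_size=100, max_batch_size=500)
        | "BulkIndex" >> beam.ParDo(WriteToDatasetIndex(hosts))
    )
```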

With these changes made and deployed, we were able to reduce the number of running Dataflow jobs from 50 or so to about 4. Each job was able to scale dynamically with the throughput of data from the various PubSub subscriptions from which it was reading, meaning that we were now using the minimum number of necessary worker machines at any given time instead of at least one worker per job per dataset. 

This autoscaling was extremely helpful from a cost standpoint, but it did have a downside when it came to communicating with Elasticsearch. We discovered that there were few options for throttling the throughput of our Elasticsearch writer, meaning that a sudden flood of data could scale out our pipeline and overwhelm our Elasticsearch cluster.

After some tuning, we found a sweet spot with a reasonable maximum number of workers for our Elasticsearch writer job, ensuring that we would not overwhelm the cluster but also that our data would still be able to get into our system in a timely manner (within 15 minutes for 95% of streaming data).
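
Expressed as pipeline options, the cap looks something like the sketch below; the project, region, and worker numbers here are placeholders, not our tuned values.

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Illustrative values only; the real cap was chosen so the Elasticsearch
# cluster stays healthy while still meeting our ingestion-latency target.
options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="example-project",
    region="us-east1",
    autoscaling_algorithm="THROUGHPUT_BASED",
    max_num_workers=10,
)
```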

 

We finally had our initial unified pipeline—cost-effective, easy to maintain, and complete with a design that would allow us to operate over multiple datasets at once with a single codebase, while also differentiating between those datasets for certain operations when necessary. The major transition was out of the way, and the cost-related fires were put out. This gave us the time to automate the process of adding new datasets to further reduce the workload on our data engineering team. 

Free of many of the tedious and time-sensitive tasks that were previously required of us, the data engineering team could look towards the future yet again.

Enrichments

We began by focusing on the “Transform” portion of our ETL pipeline. An opportunity soon arose when we reached out to our fellow engineers as well as Flashpoint’s product managers. Together, we discovered that a very specific type of query our customers were running in the platform was putting considerable strain on our Elasticsearch cluster.

These queries searched for lists of numbers, but because of how we were storing the data and handling the searches, they were being run over the full text of our documents. We quickly determined that it would be relatively easy to extract the results our customers wanted from the raw text, allowing those queries to run much more efficiently. In the interest of separating our loads and transforms, we decided to create a new Dataflow job for these enrichments. Having separate enrichment and writing jobs also allowed us to add new data stores in parallel with enrichment development.

 

We began by creating a variety of enrichments: for each, we would extract candidate values via regex, verify the results using enrichment-specific logic, and then add the enrichment to the base document.
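
Each enrichment followed the same shape, sketched below with made-up field names and a deliberately trivial validation step; the real enrichment-specific logic is more involved.

```python
import re

import apache_beam as beam

CANDIDATE_RE = re.compile(r"\b\d{13,19}\b")  # example: long numeric strings


def passes_validation(value):
    # Enrichment-specific logic lives here (checksums, ranges, allow-lists, ...).
    return len(set(value)) > 1  # trivial placeholder check


class ExtractNumberEnrichment(beam.DoFn):
    """Extracts candidate values via regex, validates them, and attaches results."""

    def process(self, doc):
        text = doc.get("body") or ""
        matches = {m for m in CANDIDATE_RE.findall(text) if passes_validation(m)}
        if matches:
            doc.setdefault("enrichments", {})["numbers"] = sorted(matches)
        yield doc
```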

As we developed our new enrichment pipeline, we came to realize that our denormalization stage was really just a complicated enrichment—and that we would benefit from certain enrichments running before denormalization and others running after—so we migrated that logic to our new enrichment job. We also developed utility enrichments to ensure that datasets had certain common fields for our UI’s usage—namely the date that we first observed a document and the date by which we want to sort any given document in the UI. (Presumably, API users would want to use that field by default as well). 

These common fields allowed us to add BigQuery as an additional data store and gave us the ability to perform analyses across multiple datasets. Finally, we had a number of fields in our documents that were remnants of the collection process and wholly irrelevant to any end users (not to mention taking up space in our data stores). We created an end stage in our pipeline to remove those fields before the documents were sent to our user-facing data stores. 
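
That end stage is essentially a single Map; the field names below are placeholders for the collection-process remnants we strip.

```python
import apache_beam as beam

INTERNAL_FIELDS = {"_crawl_job_id", "_scrape_attempt", "_raw_headers"}  # hypothetical names


def strip_internal_fields(doc):
    return {key: value for key, value in doc.items() if key not in INTERNAL_FIELDS}


# ... | "StripInternalFields" >> beam.Map(strip_internal_fields) | <user-facing writers>
```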

Where do we go from here?

We now have a scalable, performant, and cost-efficient ETL pipeline writing to multiple data stores with the ability to easily add transformations as new use cases come in. So where do we go from here?

What if we have an enrichment that is highly valuable to us but slow to run? What if we want the data from it but we don’t want to delay the message on its way to the data store, or potentially cause a roadblock in the pipeline? 

The answer we’ve come to as a team is that we need to create a separate pathway for our “slow” data—one that is enriched as we get to it and doesn’t delay the base message from getting to our data stores and therefore our customers. 
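
We have not built this yet, but one way to sketch the idea, with a hypothetical trigger and topic name, is to branch the pipeline: every document continues straight to the data stores, while the ones needing an expensive enrichment are also published to a separate topic consumed by a slower job that updates the stored document afterwards.

```python
import json

import apache_beam as beam


def needs_slow_enrichment(doc):
    return doc.get("language") not in (None, "en")  # hypothetical trigger


def add_slow_path(enriched):
    # Fast path: documents reach the data stores immediately, unchanged.
    fast = enriched

    # Slow path: a copy goes to a dedicated topic for asynchronous enrichment.
    (
        enriched
        | "FilterSlow" >> beam.Filter(needs_slow_enrichment)
        | "EncodeSlow" >> beam.Map(lambda doc: json.dumps(doc).encode("utf-8"))
        | "ToSlowTopic" >> beam.io.WriteToPubSub(
            topic="projects/example-project/topics/slow-enrichment")
    )
    return fast
```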

Join Flashpoint’s Engineering Team!

Flashpoint is hiring for a variety of exciting engineering positions. Apply today and learn why 99% of Flashpoint employees love working here, helping us become a Great Place to Work-Certified™ Company for 2021.
