Data Ethics and Best Practices: Highlights from Strata Data Conference

Writer Amber W.
March 16, 2018

Amber is the Chief Data Scientist at Terbium Labs. If she’s not working on her latest model, she can probably be found gardening with her kids or getting weird looks for knitting in a sports bar.

Members of the Terbium Labs technical team were in San Jose last week attending Strata Data Conference, a business-focused data science conference.

Formerly known as Strata + Hadoop World, Strata was founded in 2012 by combining O’Reilly’s and Cloudera’s big data conferences. Currently the largest data conference series in the world, Strata covers a full range of big data tools and technologies while keeping a casual and informal approach.

Terbium CTO Clare Gollnick presented a thoughtful talk on the limits of inference, in which she shared a framework for avoiding common data science and machine learning pitfalls. She argued that the best way to run a successful machine learning project is to recognize early which questions can be answered with data and which cannot, and she provided examples of how to do so.

Alongside the myriad technical topics at Strata, two threads seemed particularly important in light of recent breaches and upcoming regulations: data ethics and regulatory compliance.

Natalie Evans Harris of BrightHive gave a keynote talk and held a brainstorming session on defining responsible data practices. Both highlighted work over the past year that has culminated in two ethics documents: community principles on ethical data practices and a manifesto for data practices. (Practitioners can sign these documents, and interested volunteers can sign up to help the project.)

We at Terbium consider these efforts to be particularly important. There is no industry standard for use of sensitive user data, and we heard at least one attendee (to remain anonymous) arguing that all user data is company property. Ethics discussions and agreements like these can guide data practitioners to see why that view might be problematic.

We also saw several presentations about upcoming data regulations, specifically the EU General Data Protection Regulation (GDPR), which will affect all companies handling EU-citizen data. Many companies are preparing for when these regulations go into effect in May and are retooling their technologies to provide more robust encryption, make sensitive data easier to identify, and ensure that sensitive data can be anonymized or fully deleted on request.

We did notice, however, a lack of discussion about monitoring for data breaches, which is a requirement of GDPR. That gap may have inspired us to start thinking about talk topics for next year.
