![[D] if your company is ingesting work emails and chats for AI/ML pipelines, is there concern around sensitive business info getting out?](https://jfbmhhfxbbrxcmwilqxt.supabase.co/storage/v1/object/public/resource-images/MachineLearning_AI_for_business_20250328_183405_processed_image.jpg)
[D] If your company is ingesting work emails and chats for AI/ML pipelines, is there concern about sensitive business info getting out?
Edit: to be more specific, the concern is sensitive raw data/metadata being dumped into system logs and accidentally viewed by an insider.
Hi folks
First, full disclosure: I’m the CEO of DataFog (www.datafog.ai). This is NOT a sales pitch. I'm interested in hearing what the community thinks about the overall issue, which I believe will ultimately be solved via an ML-based implementation.
My contention is:
- Generative AI has catalyzed the widespread practice of ingesting email and work chat content to power AI training and inference.
- This introduces a risk from content about confidential corporate affairs* that passes most privacy filters.
The result is that raw data alluding to sensitive business events flows in freely, where an internal user (someone in MLOps, for example) can easily access it by accident and without authorization.
My second contention is that current security tools may not offer adequate coverage for what will be an evolving, ongoing need that run-of-the-mill PII redactors can’t account for.
Take this statement, which might easily be found in the inbox of the C-Suite at either of these two companies, in a file named “CiscoAcqPR_Draft.docx” or the like:
Cisco offered $157 in cash for each share of Splunk, representing a 31% premium to the company's last closing price.
I myself have run various merger docs and legal filings through some standard PII tools, and all of them failed to redact mentions of deal terms. ~~A model training on phrases like “$157 in cash per share” could have negative downstream inferential consequences or~~ That becomes a problem if the text is viewed accidentally by someone internal without the right access privileges.
How’re you all thinking about this problem? Custom recognizers are a common option, like what you see with Microsoft Presidio, but I’ve heard from some that maintaining those can be a PITA. At big companies this has been solved through internal tooling.
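To make the custom-recognizer idea concrete, here is a minimal sketch using only Python's standard `re` module. It is not Presidio's API, and the pattern names (`PER_SHARE_PRICE`, `DEAL_PREMIUM`), the regexes, and the helper functions are all hypothetical, written just to catch the Cisco/Splunk sentence above. It also hints at the maintenance burden: every new phrasing of a deal term needs another pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    label: str   # name of the pattern that matched
    start: int   # start offset in the original text
    end: int     # end offset (exclusive)
    text: str    # the matched substring


# Hypothetical, hand-written patterns for deal terms (assumptions, not a
# standard list). Real documents would need many more variants.
DEAL_TERM_PATTERNS = {
    # e.g. "$157 in cash for each share", "$157 per share"
    "PER_SHARE_PRICE": re.compile(
        r"\$\s?\d[\d,]*(?:\.\d+)?\s+(?:in\s+cash\s+)?(?:per|for\s+each|a)\s+share",
        re.IGNORECASE,
    ),
    # e.g. "31% premium"
    "DEAL_PREMIUM": re.compile(r"\d+(?:\.\d+)?\s?%\s+premium", re.IGNORECASE),
}


def find_deal_terms(text: str) -> list[Finding]:
    """Return all pattern matches, ordered by position in the text."""
    findings = []
    for label, pattern in DEAL_TERM_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(label, match.start(), match.end(), match.group()))
    return sorted(findings, key=lambda f: f.start)


def redact(text: str) -> str:
    """Replace each match with a <LABEL> placeholder.

    Works right to left so earlier offsets stay valid. Overlapping matches
    from different patterns are not handled in this sketch.
    """
    redacted = text
    for f in sorted(find_deal_terms(text), key=lambda f: f.start, reverse=True):
        redacted = redacted[:f.start] + f"<{f.label}>" + redacted[f.end:]
    return redacted


if __name__ == "__main__":
    sample = (
        "Cisco offered $157 in cash for each share of Splunk, representing "
        "a 31% premium to the company's last closing price."
    )
    print(redact(sample))
```

Running something like `redact()` before text reaches logs or a training corpus is one place this could sit in a pipeline, though a regex approach will miss deal terms phrased in ways nobody anticipated.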
*More than Personally Identifiable Information (PII), HIPAA-covered data, or customer transaction data. It’s about the emails the CEO sent to the Board of Directors in the midst of a corporate crisis, the email thread among the C-Suite about an upcoming earnings call, or the market-moving announcement in the works about a merger with a competitor. In other words, non-PII content that still needs to be redacted.