buzzline-06-world

Civic Transparency: Behavior and Detection

Real-world platform integrity teams analyze behavioral signals to detect coordination while preserving user privacy. This notebook simulates an approach using synthetic civic discourse data structured as if it came from a proposed transparency API.

What if platforms provided transparency data that protected user privacy?

Instead of raw posts and user data, imagine platforms shared aggregated behavioral signals:

What we get:

What we DON’T get:

Privacy Considerations:

Key Signals for Detection

  1. Burst Score (0-1): How “bursty” is the activity?
  2. Synchrony Index (0-1): How synchronized are the posts?
  3. Duplication Clusters: How many groups of near-identical content?
  4. Account Age Distribution: Are these established users or new accounts?
  5. Automation Indicators: Are posts coming from humans or automated tools?
    • Groups accounts into buckets by account type, using an enumerated type
    • For example: `"enum": ["person", "org", "media", "public_official", "unverified", "declared_automation"]`
    • Manual: Web/mobile app posting by humans
    • Scheduled: Legitimate tools (Buffer, Hootsuite) used by real people
    • API: Direct API access (can be legitimate developers or bots)
    • High API use with other signals may be suspicious
    • API use alone is not diagnostic (e.g., journalists, marketers use APIs)
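To make the first two signals concrete, here is a minimal sketch of how a burst score and a synchrony index could be computed from post timestamps. The exact formulas are not specified by the proposed transparency API, so these definitions (busiest-window fraction for burstiness, near-neighbor fraction for synchrony) and the `window`/`tolerance` parameters are illustrative assumptions:

```python
from collections import Counter

def burst_score(timestamps, window=60):
    """Fraction of all posts falling in the single busiest fixed window (0-1).
    Assumed definition: values near 1 mean activity is concentrated in one burst.
    `timestamps` are seconds; `window` is an illustrative 60-second bucket."""
    if not timestamps:
        return 0.0
    buckets = Counter(int(t) // window for t in timestamps)
    return max(buckets.values()) / len(timestamps)

def synchrony_index(timestamps, tolerance=5):
    """Fraction of posts landing within `tolerance` seconds of another post (0-1).
    Assumed definition: high values suggest synchronized posting."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return 0.0
    close = sum(
        1 for i, t in enumerate(ts)
        if (i > 0 and t - ts[i - 1] <= tolerance)
        or (i < len(ts) - 1 and ts[i + 1] - t <= tolerance)
    )
    return close / len(ts)

# Synthetic example: a 30-second burst vs. posts spread over an hour.
bursty = [0, 2, 4, 6, 8, 10, 12, 30]
organic = [0, 900, 1800, 3600]
print(burst_score(bursty), burst_score(organic))        # 1.0 0.25
print(synchrony_index(bursty), synchrony_index(organic)) # 0.875 0.0
```

Both scores are normalized to 0-1 so they can be compared across topics of very different volume, matching the ranges given for the signals above.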

What Is Normal? Baseline Patterns in Organic Discourse

Typical ranges for legitimate civic discussions:

Deviations from these patterns warrant investigation, not automatic interpretation. Context matters: a genuine grassroots mobilization, for example, may itself show coordination indicators.
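The "investigate, don't auto-interpret" principle can be expressed as a screening rule that only flags a topic for human review when several signals jointly exceed baseline. The threshold values below are hypothetical placeholders (the document does not publish numeric ranges), and the two-signal minimum encodes the point above that no single metric, such as API use alone, is diagnostic:

```python
# Hypothetical baselines for illustration only; real values would come
# from measuring organic civic discussions on the platform in question.
BASELINES = {
    "burst_score": 0.6,        # assumed upper bound for organic bursts
    "synchrony_index": 0.5,    # assumed
    "duplication_clusters": 3, # assumed
}

def exceeded(signals):
    """Names of signals whose observed value exceeds its baseline."""
    return [name for name, limit in BASELINES.items()
            if signals.get(name, 0) > limit]

def needs_review(signals, min_flags=2):
    """Flag for *human review* (not a verdict) only when at least
    `min_flags` signals co-occur above baseline."""
    return len(exceeded(signals)) >= min_flags

print(needs_review({"burst_score": 0.9, "synchrony_index": 0.7}))  # True
print(needs_review({"burst_score": 0.9}))                          # False
```

Returning a review flag rather than a verdict keeps the pipeline consistent with the privacy-preserving framing: the output is a prompt for investigation, not an accusation.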

Why These Signals?

These metrics derive from prior research on social-media behavior and bot detection; see Cha et al. (2010), Ferrara et al. (2016), and Badawy et al. (2019).