Driving Scalable Experimentation at Atlassian

Unlocking experimentation at scale with Statsig—driving rapid innovation and smarter, data-informed product decisions.

TypeScript · Node.js · React

🎯 Background & Challenges

Atlassian lacked a scalable, intuitive experimentation platform. While most products had migrated to the cloud, the legacy of on-prem infrastructure left significant gaps in our ability to run controlled experiments. The tooling was unintuitive, difficult to configure, and poorly supported, resulting in low feature-flag adoption and little experimentation across teams. Fragmented UI tooling further limited experimentation velocity.

Compared to industry leaders like Facebook, Atlassian’s experimentation culture and tooling were lagging. Teams lacked confidence, and iteration was often driven by instinct instead of data.

This project embraced a cloud-native mindset to introduce standardized, developer-friendly tools—enabling fast, safe, and scalable experimentation.

🎯 Project Goals

Standardize Rollout

Ensure consistent usage of libraries across platforms, products, and languages.

Increase Experimentation

Enable more robust A/B testing and rapid iteration across teams.

Support Data-Driven Decisions

Provide clear insights to inform product development through integrated analytics.

Drive Library Adoption

Roll out reusable libraries and improve visibility across engineering teams.

Support Adopted Libraries

Provide guidance, documentation, and maintenance as adoption scales.

Enable Scalable Tooling

Develop internal tools that empower teams to adopt best practices with minimal friction.

🚚 The Move to Statsig

To meet our goals around experimentation, we adopted Statsig—giving us a fast, reliable way to launch experiments with confidence.


⚙️ How It Works

Running an experiment with Statsig involves the following steps.


  • Define a Hypothesis

    Clearly define what you want to learn. Formulate a testable hypothesis with a measurable outcome that helps you decide what success or failure looks like.


    🔍 Example Scenario
    You’re working on a SaaS app and want to test whether changing the primary CTA button from “Start Free Trial” to “Get Started Now” improves sign-up conversion.



  • Create Experiment

    Set up your experiment using Statsig. Define control and variant groups, target the appropriate user segments, and identify the key metrics you'll analyze.

    Statsig handles random assignment and exposure logging once your groups are defined.


    🎛️

    Control Group

    The baseline experience — this is what users currently see. All impact is measured against this group.

    🧪

    Variant(s)

    The new version(s) being tested. You compare these against the control to determine if your change is effective.

    🎯

    Targeting

    Rules that define who sees the experiment — by geography, platform, feature flag, or user segment.

    📊

    Metrics

    Quantitative measurements that indicate success. Common examples include conversion rate, retention, or revenue.

    📊

    Exposure Events

    Events that track when a user is exposed to a feature variant, allowing experiments to accurately measure impact and attribute outcomes.



  • Instrument in Code

    Use Statsig's SDK to integrate the experiment. Call getExperiment() or getConfig() to control logic in your product based on user assignment.
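
    For illustration, a server-side sketch using the statsig-node SDK might look like the snippet below. The experiment name, parameter key, and environment variable are hypothetical, and exact import style and method names vary across SDK versions.

    ```typescript
    import Statsig from 'statsig-node';

    // Initialize once at service startup (hypothetical env var name).
    await Statsig.initialize(process.env.STATSIG_SERVER_SECRET ?? '');

    // Fetch the user's assigned variant and read a parameter from it.
    // "cta_button_experiment" and "cta_text" are illustrative names.
    const user = { userID: 'user-123', email: 'user@example.com' };
    const experiment = await Statsig.getExperiment(user, 'cta_button_experiment');
    const ctaText = experiment.get('cta_text', 'Start Free Trial'); // control copy as default

    console.log(`Rendering CTA: ${ctaText}`);
    ```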



  • Track Metrics

    Ensure all exposure events and relevant metrics are properly logged. This provides the data needed to evaluate your experiment's impact.
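
    Exposure events are logged automatically when getExperiment() or checkGate() is called; additional metric events can be logged explicitly. A sketch with the statsig-node SDK, using an illustrative event name and metadata:

    ```typescript
    import Statsig from 'statsig-node';

    // Assumes Statsig.initialize() already ran at service startup.
    const user = { userID: 'user-123' };

    // Log the conversion event the experiment's metrics are built on.
    // The event name, value, and metadata are illustrative.
    Statsig.logEvent(user, 'sign_up_completed', 1, {
      plan: 'free_trial',
      source: 'cta_button',
    });

    // Flush queued events before the process exits.
    Statsig.shutdown();
    ```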



  • Monitor Results

    Use Statsig's dashboard to watch experiment performance. Compare metrics between variants, look for significant differences, and assess impact.



  • Make a Decision

    Interpret the results. Decide whether to ship the change, roll it back, or iterate. Back your decision with data from the experiment.



  • Share Learnings

    Document the experiment's outcomes, insights, and decisions. Share findings with your team to support collective learning and future initiatives.

⚙️ Implementation

Statsig Node.js Wrapper

While Statsig provides robust SDKs for multiple platforms, we chose to develop our own internal wrappers to ensure consistent and scalable integration across our services.

By building these wrappers, we were able to embed native support for our internal Traits and Attributes Platform (TAP), allowing both sidecar-based and API-based trait resolution to integrate seamlessly into the experiment workflow. It also enabled us to enforce standardized evaluation logic across teams, reducing duplication and potential misconfigurations.

Furthermore, the wrappers offered a clean developer interface with utility functions such as buildStatsigUser, checkGate, and getExperiment, streamlining adoption and ensuring a unified developer experience regardless of the service or team implementing Statsig.
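
In practice, product services consume the wrapper rather than the raw SDK. The snippet below is an illustrative sketch of that developer experience; the package name, gate and experiment names, and exact signatures are hypothetical, not the real internal API.

```typescript
// Hypothetical usage of the internal wrapper; the package name and the
// gate/experiment names are illustrative.
import {
  initializeStatsig,
  shutdownStatsig,
  buildStatsigUser,
  checkGate,
  getExperiment,
} from '@atlassian/statsig-wrapper';

await initializeStatsig(); // wrapper-provided lifecycle method

// Resolves TAP traits (sidecar first, API fallback) into a StatsigUser.
const user = await buildStatsigUser('account-123');

// Standardized gate and experiment evaluation, identical across services.
if (await checkGate(user, 'new_onboarding_flow')) {
  const experiment = await getExperiment(user, 'onboarding_copy_test');
  const headline = experiment.get('headline', 'Welcome!');
  console.log(headline);
}

await shutdownStatsig(); // flush events on shutdown
```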

Traits

In Statsig, the StatsigUser object includes traits for targeting:

  • userID: A unique identifier
  • customIDs: e.g. companyID for org-level rollouts
  • email: The user’s email
  • privateAttributes: For secure evaluation
  • country: User region
  • organization: Org or team context

The wrappers support both sidecar-based and API-based TAP trait retrieval methods with graceful fallback logic. For developer ergonomics, I introduced helper methods like buildStatsigUser, checkGate, and getExperiment.
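
A simplified sketch of what buildStatsigUser might assemble, assuming sidecar-first trait resolution with an API fallback. The TAP trait shape and client calls are illustrative, and note that Statsig's SDKs spell the identifier field userID.

```typescript
import type { StatsigUser } from 'statsig-node';

// Illustrative trait shape and TAP clients; the real schema and resolution
// logic are internal to the wrapper.
interface TapTraits {
  email?: string;
  country?: string;
  companyId?: string;
  organization?: string;
}

async function fetchTraitsFromSidecar(accountId: string): Promise<TapTraits> {
  return { email: `${accountId}@example.com`, country: 'AU', companyId: 'acme' };
}

async function fetchTraitsFromTapApi(accountId: string): Promise<TapTraits> {
  return { email: `${accountId}@example.com`, country: 'AU', companyId: 'acme' };
}

export async function buildStatsigUser(accountId: string): Promise<StatsigUser> {
  let traits: TapTraits;
  try {
    traits = await fetchTraitsFromSidecar(accountId); // fast local path
  } catch {
    traits = await fetchTraitsFromTapApi(accountId);  // graceful fallback
  }

  return {
    userID: accountId,
    email: traits.email,
    country: traits.country,
    customIDs: traits.companyId ? { companyID: traits.companyId } : undefined,
    custom: { organization: traits.organization },
    // Traits used for evaluation but never included in exposure logs.
    privateAttributes: { companyId: traits.companyId },
  };
}
```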

Bootstrapping

Bootstrapping lets a client initialize Statsig with pre-evaluated flag values instead of waiting on a network call. It helps:

  • Avoid UI flickers
  • Improve cold-start performance
  • Ensure consistent flag values across SSR and client
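
A minimal sketch of the pattern with Statsig's SDKs: the server pre-evaluates the user's values and the client initializes from that payload instead of making its own request. Method availability varies by SDK version, so treat this as an illustration.

```typescript
// ---- server (Node.js) ----
import Statsig from 'statsig-node';

const user = { userID: 'user-123' };
// Pre-evaluate every gate/experiment for this user and serialize it into the page.
const bootstrapValues = Statsig.getClientInitializeResponse(user);

// ---- client (browser, statsig-js) ----
// bootstrapValues and user would arrive serialized in the rendered HTML.
import statsig from 'statsig-js';

await statsig.initialize('client-sdk-key', user, {
  // Values are available synchronously, so no flicker and no cold-start request.
  initializeValues: bootstrapValues ?? undefined,
});
```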

Trait Merging

Trait merging combines traits from multiple sources—auth, runtime, and environment—into a single user object. This enables:

  • High-fidelity targeting in experiments and feature gates
  • Consistent identity resolution across services
  • Cleaner structure, less boilerplate, and fewer runtime errors

The wrappers also resolve merge conflicts deterministically and provide lifecycle methods like initializeStatsig() and shutdownStatsig().
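
The merge itself can be a simple deterministic precedence order. The sketch below assumes runtime traits override auth traits, which override environment defaults; the source names and precedence are illustrative.

```typescript
type Traits = Record<string, string | number | boolean | undefined>;

// Strip undefined values so an unset trait never masks a lower-precedence one.
const defined = (t: Traits): Traits =>
  Object.fromEntries(Object.entries(t).filter(([, v]) => v !== undefined));

// Deterministic merge: later sources win (environment < auth < runtime),
// so the same inputs always produce the same user object.
function mergeTraits(environment: Traits, auth: Traits, runtime: Traits): Traits {
  return { ...defined(environment), ...defined(auth), ...defined(runtime) };
}

// Example: the runtime country wins over the environment default.
const traits = mergeTraits(
  { country: 'US', tier: 'free' },   // environment defaults
  { email: 'user@example.com' },     // from the auth token
  { country: 'AU' },                 // resolved at request time
);
// traits => { country: 'AU', tier: 'free', email: 'user@example.com' }
```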

📈 Outcomes

🧭

Data-Driven Culture

Empowered product teams to make release decisions based on experiment outcomes rather than intuition

📉

Reduced Experiment Setup Errors

Cut experiment configuration errors by 70%

👥

Developer Adoption

100+ teams actively using the platform for experimentation

🔄

Experiment Velocity

3x increase in the number of concurrent experiments run

⏱️

Time to Insight

Shorter time to insight, enabling teams to iterate quickly and make more informed decisions

📊

Simplified Experiment Setup

Reduced experiment setup time by 80%
