How Segment Works
Segment is a Customer Data Platform (CDP) that acts as a routing layer between data sources and downstream tools. Instead of installing separate SDKs for your analytics platform, email service, advertising pixels, and data warehouse, you install Segment once and configure it to forward data to each destination.
The architecture has three core components:
Sources -- Where data originates. A source is a website (analytics.js), a mobile app (iOS/Android SDK), a server (Node, Python, Go, etc.), or a cloud app (Stripe, Zendesk, Salesforce via Cloud Sources). Each source has a unique write key.
Destinations -- Where data goes. A destination is any tool that receives data from Segment: Google Analytics, Amplitude, Mixpanel, HubSpot, a data warehouse, etc. Segment maintains pre-built integrations for 400+ destinations. Each destination can be configured with mapping rules, filters, and transformations.
Protocols -- The schema enforcement layer. Protocols define a Tracking Plan: the expected events, properties, and data types. When an event arrives that violates the plan, Segment can flag it, block it, or allow it through with a warning.
Data flows through Segment in a predictable pipeline: Source SDK collects an event -> event is sent to Segment's API (api.segment.io/v1/...) -> if a Tracking Plan is connected, Segment validates the event against it -> Segment transforms and routes the event to each enabled destination -> each destination receives the data in its expected format.
Segment translates your single API call into the format each destination expects. When you call analytics.track('Order Completed', { revenue: 49.99, currency: 'USD' }) (Order Completed is an event from Segment's e-commerce spec), Segment can send a purchase event to GA4, a track call to Mixpanel (keyed by distinct_id), a revenue event to Amplitude, and a row to an order_completed table in your BigQuery warehouse on the next sync, all from that single call.
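To make the fan-out concrete, here is a sketch that passes one normalized track message through one mapper per destination. The output shapes are simplified stand-ins loosely modeled on each tool's conventions (GA4's purchase event with value and currency, Mixpanel's distinct_id, Amplitude's revenue field). They are not the exact payloads Segment sends, and `mappers` and `fanOut` are hypothetical names.

```javascript
// Illustrative sketch: one Segment-style track message fanned out to
// simplified, hypothetical destination payloads. Real formats are richer and
// are controlled by each destination's field mappings in Segment.

// A normalized track message, roughly as Segment receives it
const message = {
  type: 'track',
  event: 'Order Completed',
  userId: 'USER_123',
  properties: { revenue: 49.99, currency: 'USD' },
};

// Hypothetical per-destination mappers (simplified shapes)
const mappers = {
  ga4: (m) => ({
    // Map the e-commerce spec event to GA4's recommended purchase event
    name: m.event === 'Order Completed' ? 'purchase' : m.event,
    params: { value: m.properties.revenue, currency: m.properties.currency },
  }),
  mixpanel: (m) => ({
    event: m.event,
    properties: { ...m.properties, distinct_id: m.userId },
  }),
  amplitude: (m) => ({
    event_type: m.event,
    user_id: m.userId,
    revenue: m.properties.revenue,
  }),
};

// Apply every enabled destination's mapper to the same message
function fanOut(msg, enabled) {
  const out = {};
  for (const name of enabled) {
    out[name] = mappers[name](msg);
  }
  return out;
}

const payloads = fanOut(message, ['ga4', 'mixpanel', 'amplitude']);
console.log(JSON.stringify(payloads, null, 2));
```

In this model, adding a destination means adding one mapper; the instrumentation call never changes, which is the benefit of routing through a single API.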
For client-side destinations (like Google Analytics or Facebook Pixel), Segment can operate in two modes:
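Device mode: Segment loads the destination's native SDK in the browser and calls its methods directly, so data for that destination goes straight from the browser to the destination instead of through Segment's servers (analytics.js still sends its own copy of each event to Segment). This is required for destinations that need browser-level access (cookie setting, pixel firing).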
Cloud mode: Events are sent to Segment's servers first, then forwarded server-side to the destination's API. This reduces the number of scripts loaded in the browser and gives you server-side filtering and transformation.
Installing analytics.js
Add the Segment snippet to the <head> of every page. Replace YOUR_WRITE_KEY with the write key from your JavaScript source in the Segment app.
<script>
!function(){var i="analytics",analytics=window[i]=window[i]||[];if(!analytics.initialize)if(analytics.invoked)window.console&&console.error&&console.error("Segment snippet included twice.");else{analytics.invoked=!0;analytics.methods=["trackSubmit","trackClick","trackLink","trackForm","pageview","identify","reset","group","track","ready","alias","debug","page","screen","once","off","on","addSourceMiddleware","addIntegrationMiddleware","setAnonymousId","addDestinationMiddleware"];analytics.factory=function(e){return function(){var t=Array.prototype.slice.call(arguments);t.unshift(e);analytics.push(t);return analytics}};for(var e=0;e<analytics.methods.length;e++){var key=analytics.methods[e];analytics[key]=analytics.factory(key)}analytics.load=function(key,e){var t=document.createElement("script");t.type="text/javascript";t.async=!0;t.src="https://cdn.segment.com/analytics.js/v1/" + key + "/analytics.min.js";var n=document.getElementsByTagName("script")[0];n.parentNode.insertBefore(t,n);analytics._loadOptions=e};analytics._writeKey="YOUR_WRITE_KEY";analytics.SNIPPET_VERSION="5.2.0";
analytics.load("YOUR_WRITE_KEY");
analytics.page();
}}();
</script>
The snippet stubs out all Segment methods, queues calls made before the library loads, then asynchronously loads the full analytics.js bundle. The analytics.page() call at the end fires an automatic page view.
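The stub-and-queue idea can be sketched in a few lines. This is a simplified illustration, not Segment's code: as in the snippet, the stub is an array that doubles as the call queue, while `createStub` and `replayQueue` are hypothetical names and the "real library" is a plain object.

```javascript
// Sketch of the snippet's stub-and-queue pattern.
// Each stubbed method records [methodName, ...args] on the array itself.
function createStub(methods) {
  const stub = []; // like window.analytics in the snippet: an array that also holds methods
  for (const method of methods) {
    stub[method] = (...args) => {
      stub.push([method, ...args]);
      return stub; // allow chaining, as the snippet's factory does
    };
  }
  return stub;
}

// Simulates the full library arriving: drain queued calls in call order
function replayQueue(stub, impl) {
  while (stub.length > 0) {
    const [method, ...args] = stub.shift();
    impl[method](...args);
  }
}

// Usage: calls made before "load" are queued, then replayed
const calls = [];
const stub = createStub(['page', 'track']);
stub.page();
stub.track('Signup Clicked', { plan: 'annual' });

const realImpl = {
  page: (...args) => calls.push(['page', ...args]),
  track: (...args) => calls.push(['track', ...args]),
};
replayQueue(stub, realImpl);
console.log(calls);
```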
Server-Side (Node.js)
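const { Analytics } = require('@segment/analytics-node');
const analytics = new Analytics({ writeKey: 'YOUR_WRITE_KEY' });
async function main() {
analytics.track({
userId: 'USER_123',
event: 'Subscription Renewed',
properties: {
plan: 'annual',
revenue: 599.00,
currency: 'USD'
}
});
// Flush queued events and shut down the client before the process exits
// (important for serverless/Lambda, where the runtime can freeze after the handler returns)
await analytics.closeAndFlush();
}
main().catch(console.error);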
Server-Side (Python)
import segment.analytics as analytics
analytics.write_key = 'YOUR_WRITE_KEY'
analytics.track('USER_123', 'Subscription Renewed', {
'plan': 'annual',
'revenue': 599.00,
'currency': 'USD'
})
analytics.flush()
Verifying Installation
Open Chrome DevTools Network tab and filter for api.segment.io. You should see POST requests to /v1/p (page calls), /v1/t (track calls), and /v1/i (identify calls). Each request contains the event payload as JSON.
In the Segment app, the Debugger tab for your source shows every event in real time with full payload details, destination delivery status, and any schema violations.
The Segment Spec: Core API Calls
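Segment defines six core API methods: identify, track, page, screen, group, and alias. The source SDKs implement them with consistent semantics; page records web page views and screen records mobile screen views.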
identify
Associates a user with traits (persistent attributes):
analytics.identify('USER_123', {
name: 'Jane Martinez',
email: 'jane@example.com',
plan: 'premium',
company: {
id: 'COMPANY_456',
name: 'Acme Corp',
employee_count: 500
},
createdAt: '2026-01-15T00:00:00Z'
});
The first argument is the user ID. The second is a traits object. Client-side SDKs cache traits on the device (analytics.js keeps them in localStorage), and if Unify is enabled Segment also merges them into the user's profile. Traits are forwarded to each destination in its expected format (e.g., $name for Mixpanel, user_properties for Amplitude).
Call identify when a user signs up, logs in, or updates their profile.
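Trait translation can be sketched the same way. The lookup table below is a hypothetical simplification covering only a few Mixpanel reserved profile properties ($name, $email, $created), and the returned objects only approximate the shape of a Mixpanel profile update and an Amplitude identify payload; Segment's destination integrations do the real mapping.

```javascript
// Hypothetical lookup: Segment trait names -> Mixpanel reserved profile properties
const MIXPANEL_RESERVED = { name: '$name', email: '$email', createdAt: '$created' };

function toMixpanelProfileUpdate(userId, traits) {
  const set = {};
  for (const [key, value] of Object.entries(traits)) {
    // Reserved traits get Mixpanel's $-prefixed names; others pass through unchanged
    set[MIXPANEL_RESERVED[key] ?? key] = value;
  }
  return { $distinct_id: userId, $set: set };
}

function toAmplitudeIdentify(userId, traits) {
  // Amplitude keeps user attributes under user_properties
  return { user_id: userId, user_properties: { ...traits } };
}

// Usage with traits from the identify example above
const traits = { name: 'Jane Martinez', email: 'jane@example.com', plan: 'premium' };
const mixpanelUpdate = toMixpanelProfileUpdate('USER_123', traits);
const amplitudeIdentify = toAmplitudeIdentify('USER_123', traits);
console.log(mixpanelUpdate, amplitudeIdentify);
```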
track
Records an action the user performed:
analytics.track('Item Purchased', {
item_id: 'SKU_456',
item_name: 'Pro Widget',
price: 29.99,
currency: 'USD',
category: 'Widgets',
quantity: 1
});
The first argument is the event name. The second is a properties object describing the event. Segment forwards this to each destination, mapping property names to the destination's expected schema.
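Each call ultimately becomes a JSON message. This sketch assembles one using field names from the Segment Spec's common fields (type, event, properties, userId, anonymousId, messageId, timestamp, context). `buildTrackMessage` is a hypothetical helper, and `context` is left empty where a real SDK would fill in library, page, user agent, and similar details.

```javascript
const { randomUUID } = require('node:crypto');

// Sketch of the JSON envelope a track call roughly turns into
function buildTrackMessage(event, properties, ids) {
  // Segment requires at least one identifier on every message
  if (!ids.userId && !ids.anonymousId) {
    throw new Error('track requires a userId or an anonymousId');
  }
  return {
    type: 'track',
    event,
    properties,
    userId: ids.userId ?? null,
    anonymousId: ids.anonymousId ?? null,
    messageId: randomUUID(), // unique per message; this is what deduplication keys on
    timestamp: new Date().toISOString(),
    context: {}, // SDKs populate this automatically
  };
}

const example = buildTrackMessage(
  'Item Purchased',
  { item_id: 'SKU_456', price: 29.99, currency: 'USD' },
  { anonymousId: 'anon-1' }
);
console.log(example);
```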
page
Records a page view:
analytics.page('Docs', 'Segment Overview', {
url: window.location.href,
referrer: document.referrer,
title: document.title
});
Arguments: category (optional), name (optional), properties. Most implementations call analytics.page() with no arguments and let Segment auto-detect URL, title, and referrer.
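The auto-detection amounts to filling default properties from the browser and letting explicit properties win. In this sketch, `loc` and `doc` stand in for window.location and document so it runs outside a browser (an assumption), and `pageProperties` is a hypothetical name; analytics.js has extra rules, such as preferring a canonical URL when the page declares one.

```javascript
// Sketch of page() defaulting: browser-derived values, overridden by explicit ones
function pageProperties(loc, doc, overrides = {}) {
  return {
    path: loc.pathname,
    url: loc.href,
    search: loc.search,
    referrer: doc.referrer,
    title: doc.title,
    ...overrides, // explicitly passed properties take precedence
  };
}

// Usage with a URL object standing in for window.location
const props = pageProperties(
  new URL('https://example.com/docs/overview?ref=nav'),
  { referrer: 'https://www.google.com/', title: 'Segment Overview' }
);
console.log(props);
```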
group
Associates a user with a group (company, account, organization):
analytics.group('COMPANY_456', {
name: 'Acme Corp',
plan: 'enterprise',
employee_count: 500,
industry: 'Technology'
});
Group calls are forwarded to destinations that support account-level analytics (Amplitude Groups, Mixpanel Group Analytics, Salesforce Accounts, etc.).
alias
Creates a permanent mapping between two user IDs. Used primarily for destinations that require explicit ID aliasing (like Mixpanel's original ID management):
analytics.alias('NEW_USER_ID');
Most modern destinations handle identity resolution through identify calls, making alias less commonly needed.
screen
The mobile equivalent of page. Records a screen view in iOS or Android apps:
// iOS (Analytics-Swift)
analytics.screen(title: "Dashboard", properties: ["section": "overview"])
Identity and User Tracking
Segment manages identity through two IDs:
Anonymous ID: A UUID generated by the SDK on first load and stored in a cookie (ajs_anonymous_id) or localStorage. Every event includes this ID until the user is identified.
User ID: Set by your code via identify('USER_123'). Once set, the user ID is included in all subsequent calls alongside the anonymous ID.
When identify is called for the first time on a device, Segment sends both the anonymous ID and user ID to each destination. Destinations that support identity resolution (Amplitude, Mixpanel, etc.) use this to merge the anonymous and identified profiles.
// Before login: events have anonymous_id only
analytics.track('Page Viewed');
// On login: link anonymous_id to user_id
analytics.identify('USER_123', { name: 'Jane Martinez' });
// After identify: events include both anonymous_id and user_id
analytics.track('Dashboard Viewed');
// On logout: reset to a new anonymous_id
analytics.reset();
analytics.reset() clears the user ID and generates a new anonymous ID. Call this on logout to prevent the next user's events from being attributed to the previous user.
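The ID lifecycle above can be modeled in memory. This sketch assumes persistence (cookies, localStorage) is out of scope; `IdentityStore` is a hypothetical class, and Node's `crypto.randomUUID` stands in for the SDK's ID generator.

```javascript
const { randomUUID } = require('node:crypto');

// In-memory model of the anonymous ID / user ID lifecycle
class IdentityStore {
  constructor() {
    this.anonymousId = randomUUID(); // generated on first load
    this.userId = null;
  }

  identify(userId) {
    // The anonymous ID is kept, so both IDs travel together and can be linked
    this.userId = userId;
  }

  reset() {
    // Logout: forget the user and start a fresh anonymous identity
    this.userId = null;
    this.anonymousId = randomUUID();
  }

  // IDs stamped onto each outgoing event (a fresh object per call)
  ids() {
    return { anonymousId: this.anonymousId, userId: this.userId };
  }
}

const store = new IdentityStore();
const before = store.ids();
store.identify('USER_123');
const afterLogin = store.ids();
store.reset();
const afterLogout = store.ids();
console.log(before, afterLogin, afterLogout);
```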
Protocols: Schema Enforcement
Protocols let you define a Tracking Plan that specifies:
- Which events are expected (event names)
- Which properties each event should have (property names, types, required/optional)
- Which traits identify calls should include
When an event arrives that does not match the Tracking Plan:
- Violations are logged in the Segment UI with details about what failed
- Blocking (optional) drops the non-conforming event before it reaches destinations
- You can configure violations to trigger alerts via email or webhook
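Define your Tracking Plan in the Segment app under Protocols > Tracking Plans, or manage it programmatically. The example below uses the legacy Config API, which Segment has since superseded with its Public API: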
# Create a tracking plan via the legacy Config API
curl -X POST "https://platform.segmentapis.com/v1beta/workspaces/YOUR_WORKSPACE/tracking-plans" \
-H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tracking_plan": {
"display_name": "Web App Tracking Plan",
"rules": {
"events": [
{
"name": "Item Purchased",
"description": "User completes a purchase",
"version": 1,
"rules": {
"type": "object",
"properties": {
"properties": {
"type": "object",
"required": ["item_id", "price", "currency"],
"properties": {
"item_id": { "type": "string" },
"price": { "type": "number" },
"currency": { "type": "string" }
}
}
}
}
}
]
}
}
}'
Protocols use JSON Schema to validate event properties. This catches instrumentation errors before bad data reaches your analytics tools and warehouse.
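Conceptually, validation compares each event with its rule and then applies the configured behavior. This sketch assumes a simplified plan format (a required list plus a primitive type per property) instead of full JSON Schema; `violationsFor` and `applyPlan` are hypothetical helpers, and the 'flag' and 'block' modes mirror the behaviors described above.

```javascript
// Simplified plan mirroring the Item Purchased rule above (not Segment's format)
const trackingPlan = {
  'Item Purchased': {
    required: ['item_id', 'price', 'currency'],
    types: { item_id: 'string', price: 'number', currency: 'string' },
  },
};

// Returns human-readable violations; an empty array means the event conforms
function violationsFor(event, properties, plan) {
  const rule = plan[event];
  if (!rule) return [`Unplanned event: ${event}`];
  const problems = [];
  for (const name of rule.required) {
    if (!(name in properties)) problems.push(`Missing required property: ${name}`);
  }
  for (const [name, expected] of Object.entries(rule.types)) {
    if (name in properties && typeof properties[name] !== expected) {
      problems.push(`${name} should be ${expected}, got ${typeof properties[name]}`);
    }
  }
  return problems;
}

// 'flag' forwards the event with violations logged; 'block' drops non-conforming events
function applyPlan(event, properties, plan, mode) {
  const violations = violationsFor(event, properties, plan);
  const forwarded = violations.length === 0 || mode === 'flag';
  return { forwarded, violations };
}

// Usage: a price sent as a string is a type violation
console.log(applyPlan('Item Purchased', { item_id: 'SKU_456', price: '29.99', currency: 'USD' }, trackingPlan, 'block'));
```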
Segment Functions
Segment Functions let you write custom JavaScript that runs in Segment's infrastructure. Two common types are:
Source Functions: Receive data from webhooks or custom sources and translate it into Segment events:
// Source Function: ingest Stripe webhook events
async function onRequest(request, settings) {
const body = request.json();
if (body.type === 'charge.succeeded') {
Segment.track({
userId: body.data.object.customer,
event: 'Payment Succeeded',
properties: {
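// Stripe amounts are in the smallest currency unit (cents for USD);
// zero-decimal currencies such as JPY should not be divided by 100
amount: body.data.object.amount / 100,
// Stripe sends lowercase ISO codes ('usd'); uppercase to match 'USD' used elsewhere
currency: body.data.object.currency.toUpperCase(),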
charge_id: body.data.object.id
}
});
}
}
Destination Functions: Transform events before sending them to a custom API endpoint:
// Destination Function: send events to an internal API
async function onTrack(event, settings) {
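const response = await fetch(settings.apiEndpoint, {
method: 'POST',
headers: {
'Authorization': `Bearer ${settings.apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
user: event.userId,
action: event.event,
data: event.properties,
timestamp: event.timestamp
})
});
// Fail loudly on non-2xx responses instead of silently dropping the event
if (!response.ok) {
throw new Error(`Internal API returned HTTP ${response.status}`);
}
}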
Functions run on AWS Lambda under the hood. They support async/await, fetch, and standard Node.js APIs.
Personas (Profiles API)
Segment Personas (now called Unify) builds unified user profiles by merging identity and trait data from all sources. The Profiles API lets you query these profiles:
# Get a user's traits by user_id (the access token is the Basic auth username; the password is empty)
curl "https://profiles.segment.com/v1/spaces/YOUR_SPACE_ID/collections/users/profiles/user_id:USER_123/traits" \
-u "YOUR_ACCESS_TOKEN:"
# Get a user's event history
curl "https://profiles.segment.com/v1/spaces/YOUR_SPACE_ID/collections/users/profiles/user_id:USER_123/events?limit=20" \
-u "YOUR_ACCESS_TOKEN:"
Personas also supports computed traits (e.g., "total revenue in the last 30 days") and audiences (e.g., "users who purchased in the last 7 days but haven't logged in since"). These computed values are synced to destinations in real time, enabling personalization and targeting without building custom data pipelines.
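To show what the two examples compute, here is a sketch that evaluates them over one user's event history. Assumptions: events look like { event, timestamp (ms), properties }, purchases are 'Item Purchased' with a numeric price, and logins are a hypothetical 'Signed In' event. In Segment these are configured in the app, not written as code.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Computed trait: total purchase revenue within the last `days` days
function revenueLastNDays(events, now, days) {
  const cutoff = now - days * DAY_MS;
  return events
    .filter((e) => e.event === 'Item Purchased' && e.timestamp >= cutoff)
    .reduce((sum, e) => sum + e.properties.price, 0);
}

// Audience: purchased in the last 7 days but has not logged in since that purchase
function inRecentBuyerNoLoginAudience(events, now) {
  const cutoff = now - 7 * DAY_MS;
  const recent = events.filter((e) => e.event === 'Item Purchased' && e.timestamp >= cutoff);
  if (recent.length === 0) return false;
  const lastPurchase = Math.max(...recent.map((e) => e.timestamp));
  return !events.some((e) => e.event === 'Signed In' && e.timestamp > lastPurchase);
}
```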
Warehouse Destinations
Segment can route all events to a data warehouse (BigQuery, Snowflake, Redshift, Postgres, Databricks). The warehouse destination:
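- Receives events from Segment's pipeline
- Creates a table for each track event name (e.g., item_purchased), plus tables per call type such as tracks, pages, and identifies
- Maintains a users table with the latest traits for each user; the identifies table keeps one row per identify call
- Loads data in batches on a configurable sync schedule whose minimum interval depends on your plan; Selective Sync controls which sources, tables, and properties are synced, not how often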
Each event table has columns for every property ever sent with that event, plus standard columns: id, user_id, anonymous_id, timestamp, received_at, context_* fields.
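The naming convention can be approximated as snake_case conversion plus flattening nested objects with underscores, which is how a field such as context.page.url becomes the context_page_url column. This is a sketch: Segment's real rules cover more cases (reserved words, identifier length limits, arrays), and `toSnakeCase` and `flattenToColumns` are hypothetical names.

```javascript
// Approximate warehouse identifier: snake_case, punctuation and spaces -> "_"
function toSnakeCase(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // camelCase boundary -> underscore
    .replace(/[^A-Za-z0-9]+/g, '_') // spaces and punctuation -> underscore
    .replace(/^_+|_+$/g, '') // trim leading/trailing underscores
    .toLowerCase();
}

// Flatten nested objects into column names joined with "_"
function flattenToColumns(obj, prefix = '') {
  const columns = {};
  for (const [key, value] of Object.entries(obj)) {
    const column = prefix ? `${prefix}_${toSnakeCase(key)}` : toSnakeCase(key);
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(columns, flattenToColumns(value, column));
    } else {
      columns[column] = value; // arrays are kept as single values in this sketch
    }
  }
  return columns;
}

console.log(toSnakeCase('Item Purchased'));
console.log(flattenToColumns({ userId: 'USER_123', context: { page: { url: 'https://example.com' } } }));
```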
-- Query Segment data in your warehouse (Postgres/Redshift interval syntax; adjust for your warehouse's SQL dialect)
SELECT
user_id,
item_name,
price,
timestamp
FROM production.item_purchased
WHERE timestamp > CURRENT_DATE - INTERVAL '7 days'
ORDER BY timestamp DESC;
-- Join with the latest user traits (users has one row per user;
-- joining identifies instead would repeat each purchase once per identify call)
SELECT
p.user_id,
u.name AS user_name,
u.plan,
COUNT(*) AS purchase_count,
SUM(p.price) AS total_revenue
FROM production.item_purchased p
JOIN production.users u ON p.user_id = u.id
WHERE p.timestamp > CURRENT_DATE - INTERVAL '30 days'
GROUP BY p.user_id, u.name, u.plan
ORDER BY total_revenue DESC;
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| Events not reaching a destination | Destination is in device mode but the SDK bundle doesn't include it, or destination is disabled | Check the destination's connection mode (device vs cloud) in the Segment app; verify the destination is enabled and the write key is correct |
| Duplicate events in warehouse | Segment retries on delivery failure, causing duplicates | Deduplicate using the id field (Segment's message ID) with DISTINCT or window functions in SQL |
| anonymous_id changes on every page | Cookies are blocked or cleared by a privacy tool, or the site spans multiple subdomains and the cookie is scoped to only one of them | Set cookie.domain in the analytics.js load options to your top-level domain; verify no privacy extension is clearing ajs_anonymous_id |
| identify traits not appearing in destination | Destination only accepts traits on identify calls, but you're sending them on track | Send user traits via identify, not as track properties; some destinations require explicit trait mapping |
| Protocols showing violations for valid events | Tracking Plan is outdated or property types don't match (e.g., sending a number where string is expected) | Update the Tracking Plan to match the current implementation; check property types in the event payload |
| Device-mode destinations slowing page load | Each device-mode destination loads its own third-party SDK in the browser, adding script weight and network requests | Switch destinations to cloud mode where possible; remove device-mode destinations you no longer use |
| Server-side events missing context | Server libraries don't automatically collect browser context (IP, user agent, locale) | Manually pass context.ip, context.userAgent, and context.locale in server-side calls if needed by downstream destinations |
| Warehouse sync delayed | Sync schedule set to a long interval, or sync job failed | Check the warehouse destination status in Segment; reduce sync interval; check warehouse permissions and connection settings |
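For the duplicate-events row, the SQL fix typically keeps one row per id, for example with ROW_NUMBER() OVER (PARTITION BY id ORDER BY received_at) = 1. The same rule applied to exported rows in application code is sketched below; the row shape (an `id` field holding the message ID, rows already ordered by received_at) is an assumption.

```javascript
// Keep the first row seen for each Segment message ID
function dedupeById(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    if (seen.has(row.id)) return false; // a retry delivered this message again
    seen.add(row.id);
    return true;
  });
}

const rows = [
  { id: 'msg-a', event: 'Item Purchased' },
  { id: 'msg-b', event: 'Item Purchased' },
  { id: 'msg-a', event: 'Item Purchased' },
];
console.log(dedupeById(rows));
```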
Platform-Specific Considerations
Data residency: Segment offers US and EU data residency. EU workspace data is processed and stored in EU infrastructure. The API endpoint changes to events.eu1.segmentapis.com for EU workspaces. This must be configured at workspace creation.
Rate limits: The Tracking API accepts up to 500 requests per second per source. The Batch API (/v1/batch) accepts up to 500KB per request and up to 32KB per individual event. Server-side libraries batch events automatically and flush on an interval.
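The batching behavior can be sketched as greedy packing under the two size limits. Assumptions: sizes are measured as UTF-8 bytes of each event's JSON, KB is treated as 1024 bytes, and the batch envelope overhead is ignored, so a real client should leave headroom. `makeBatches` is a hypothetical name, not part of any Segment library.

```javascript
const MAX_EVENT_BYTES = 32 * 1024; // per-event limit quoted above
const MAX_BATCH_BYTES = 500 * 1024; // per-request limit quoted above

// Greedily pack events into batches that stay under the batch limit
function makeBatches(events) {
  const batches = [];
  const oversized = [];
  let current = [];
  let currentBytes = 0;
  for (const event of events) {
    const size = Buffer.byteLength(JSON.stringify(event), 'utf8');
    if (size > MAX_EVENT_BYTES) {
      oversized.push(event); // would be rejected; trim it before sending
      continue;
    }
    if (current.length > 0 && currentBytes + size > MAX_BATCH_BYTES) {
      batches.push(current); // current batch is full; start a new one
      current = [];
      currentBytes = 0;
    }
    current.push(event);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return { batches, oversized };
}
```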
analytics.js bundle size: The base analytics.js library is approximately 35KB gzipped. Each device-mode destination also loads its own SDK into the page. A site with 5 device-mode destinations might load 100-200KB of additional JavaScript. Switching those destinations to cloud mode removes this overhead.
Replay: Segment stores raw events for the duration of your contract. You can replay historical events to a new destination, which means you can add a new analytics tool and backfill it with months of data without re-instrumenting anything.