Untangling GraphQL Auth: Stopping Field-Level Data Leaks in TypeScript APIs

Alprina Security Team

August 12, 2024

Hook: The Feature Flag That Bypassed Your ACL

Your team just shipped a set of premium analytics widgets behind a feature flag. The GraphQL schema already had per-field @requiresRole("analyst") directives, so nobody worried about access control. A week later, a support engineer notices junior users pulling the premium metrics through a single query. The culprit? A lone resolver that checks the feature flag before the role, and a caching layer that reuses authorized results across sessions. No audit log fired, the CDN happily cached the response, and your billing logic never noticed the leak.

This is not a contrived bug. It is the outcome of how we compose GraphQL resolvers: tiny functions, composed through middleware, often written by different people months apart. If you layer those resolvers wrong, field-level rules silently fall away. This article walks through the core failure modes, contrasts short-term patches with durable fixes, and gives you a battery of automated tests to prevent regressions. You will see real TypeScript snippets, not marketing slides, plus the trade-offs you make when optimizing for latency versus consistency.

The Problem Deep Dive

GraphQL pushes authorization checks down to resolver functions. That flexibility is also the trap. Most Node backends wire auth via middleware that injects a viewer object into the context, then decorate resolvers with helper wrappers. The failure modes stem from two patterns:

Resolver composition order. Developers frequently wrap resolvers with withFeatureFlag before withRole, so the flag check runs first and is easy to mistake for access control. If the flag wrapper short-circuits to its own data path when the flag is on, or the role wrapper is dropped during a refactor, the resolver returns data even when viewer.roles lacks the required role.
Shared cache layers. A DataLoader instance shared across requests, or a CDN edge cache, reuses results keyed on load keys or on query text and variables. If authorization is not part of the cache key, a privileged response can leak to unprivileged consumers.
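To make the shared-cache failure concrete, here is a minimal sketch of a memoizing loader in the style of DataLoader. It is a simplified stand-in, not DataLoader's API, and fetchReport, the Viewer shape, and the forecast value are hypothetical. The memo is keyed on accountId alone, so an instance shared across requests hands every caller whatever the first caller was allowed to see.

```typescript
// Hypothetical shapes for illustration.
interface Viewer { id: string; roles: string[] }
interface Report { accountId: string; forecast?: number }

// Simulated data source: only analysts receive the forecast.
function fetchReport(accountId: string, viewer: Viewer): Report {
  return viewer.roles.includes("analyst")
    ? { accountId, forecast: 42000 }
    : { accountId };
}

// DataLoader-style memo keyed on accountId only; the key ignores the viewer.
function createLoader() {
  const memo = new Map<string, Report>();
  return (accountId: string, viewer: Viewer): Report => {
    const cached = memo.get(accountId);
    if (cached) return cached;
    const report = fetchReport(accountId, viewer);
    memo.set(accountId, report);
    return report;
  };
}

const analyst: Viewer = { id: "u1", roles: ["analyst"] };
const basic: Viewer = { id: "u2", roles: ["basic"] };

// Anti-pattern: one module-level loader shared by all requests.
const sharedLoader = createLoader();
sharedLoader("acct-1", analyst);              // analyst request fills the memo
const leaked = sharedLoader("acct-1", basic); // basic request gets the analyst's copy

// Fix: a fresh loader per request, so the memo never outlives one viewer.
const requestLoader = createLoader();
const safe = requestLoader("acct-1", basic);
```

Building loaders inside the per-request context factory, which is how DataLoader's own documentation recommends using it, scopes each memo to a single viewer.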

Here is a simplified anti-pattern:

// Anti-pattern: authenticates the viewer but never checks viewer.roles.
const premiumReport = withFeatureFlag("premium-analytics",
  withRateLimit(5, 60,
    async (_parent, args, ctx) => {
      if (!ctx.viewer) throw new AuthenticationError("login first");
      return resolvePremiumReport(args.accountId);
    }
  )
);

The developer assumed a directive elsewhere enforced the role check. In reality, nothing stops a basic user from requesting the field. Worse, if accountId is optional and defaults to the viewer's tenant, even a query with no arguments exposes sensitive forecasts.
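A self-contained reproduction of the anti-pattern, with the rate limiter omitted. withFeatureFlag, the context shape, and resolvePremiumReport are hypothetical stand-ins for your own helpers; the point is only that a flag check plus an authentication check still admits a basic user.

```typescript
// Stand-ins for the article's helpers; names and behavior are assumptions.
class AuthenticationError extends Error {}

interface Viewer { id: string; roles: string[]; accountId: string }
interface Ctx { viewer?: Viewer; featureFlags: Record<string, boolean> }
type Resolver = (parent: unknown, args: { accountId?: string }, ctx: Ctx) => unknown;

// Hypothetical flag wrapper: rejects only when the flag is off.
const withFeatureFlag = (flag: string, inner: Resolver): Resolver =>
  (parent, args, ctx) => {
    if (!ctx.featureFlags[flag]) throw new Error(`${flag} is disabled`);
    return inner(parent, args, ctx);
  };

const resolvePremiumReport = (accountId?: string) => ({ accountId, forecast: 42000 });

// The anti-pattern (rate limiter omitted): authentication, no role check.
const premiumReport = withFeatureFlag("premium-analytics", (_parent, args, ctx) => {
  if (!ctx.viewer) throw new AuthenticationError("login first");
  return resolvePremiumReport(args.accountId);
});

// A basic user with the flag on receives the premium payload.
const result = premiumReport(null, {}, {
  viewer: { id: "u2", roles: ["basic"], accountId: "acct-1" },
  featureFlags: { "premium-analytics": true },
}) as { forecast: number };
```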

Developers also misunderstand schema-level directives. A directive implemented as a schema transform wraps each resolver once, at schema build time, and that wrapper then runs per request. Any middleware that reassigns field.resolve after the transform, such as a feature-flag layer added later, silently discards the wrapper. Another recurring mistake is guarding data writes with info.operation.operation === "mutation" checks: the operation type says nothing about which fields actually execute, so writes reachable from query resolvers, batched requests, or persisted queries can follow paths that never hit the check.
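The sketch below models the directive problem with a plain object standing in for a field definition (it is not graphql-js's GraphQLField). applyRoleDirective, applyFlagUnsafe, and applyFlagSafe are hypothetical. The unsafe middleware rebuilds resolve from the raw business logic after the directive ran, so the role wrapper disappears; the safe ordering folds the flag into the resolver first and lets the directive wrap it.

```typescript
// Simplified stand-in for a schema field; not graphql-js's GraphQLField.
interface Ctx { roles: string[]; flags: Record<string, boolean> }
interface FieldDef { resolve: (ctx: Ctx) => string }

const businessLogic = (_ctx: Ctx): string => "premium data";
const makeField = (): FieldDef => ({ resolve: businessLogic });

// Build-time "directive" transform: wraps whatever resolve exists right now.
function applyRoleDirective(field: FieldDef, role: string): void {
  const inner = field.resolve;
  field.resolve = (ctx) => {
    if (!ctx.roles.includes(role)) throw new Error(`${role} role required`);
    return inner(ctx);
  };
}

// Unsafe middleware: rebuilds resolve from the raw business logic,
// silently discarding any wrapper installed earlier.
function applyFlagUnsafe(field: FieldDef, flag: string): void {
  field.resolve = (ctx) => {
    if (!ctx.flags[flag]) throw new Error(`${flag} disabled`);
    return businessLogic(ctx);
  };
}

// Safe middleware: wraps the current resolve instead of replacing it.
function applyFlagSafe(field: FieldDef, flag: string): void {
  const inner = field.resolve;
  field.resolve = (ctx) => {
    if (!ctx.flags[flag]) throw new Error(`${flag} disabled`);
    return inner(ctx);
  };
}

// Unsafe: directive first, then flag middleware replaces resolve.
const unsafeField = makeField();
applyRoleDirective(unsafeField, "analyst");
applyFlagUnsafe(unsafeField, "premium");

// Safe: flag folded into the resolver first, directive wraps it outermost.
const safeField = makeField();
applyFlagSafe(safeField, "premium");
applyRoleDirective(safeField, "analyst");

const basicCtx: Ctx = { roles: ["basic"], flags: { premium: true } };
const isDenied = (field: FieldDef): boolean => {
  try {
    field.resolve(basicCtx);
    return false;
  } catch {
    return true;
  }
};
```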

Latency optimizations amplify these issues. Response caching with apollo-server-plugin-response-cache, or edge caching with custom Varnish rules, often keys solely on a hash of the query. Without viewer.id or viewer.roles in the key, cached responses bleed between sessions. Developers under pressure to meet SLOs rarely revisit these keys, especially when initial load tests show acceptable behavior.

Technical Solutions

Quick Patch: Assert Role at the Top of the Resolver

A stopgap is to insert explicit guard clauses before any business logic:

const premiumReport = compose(
  withTracing,
  async (parent, args, ctx, info) => {
    if (!ctx.viewer) throw new AuthenticationError("login required");
    if (!ctx.viewer.roles.includes("analyst")) {
      throw new ForbiddenError("analyst role required");
    }
    return resolvePremiumReport(args.accountId ?? ctx.viewer.accountId);
  }
);

This works but is brittle. Every engineer must remember to replicate the check. Miss one, and you are back where you started.

Durable Fix: Centralize Authorization Wrappers

Use higher-order functions that ensure auth wrappers execute after feature flags but before business logic. A small helper makes intent obvious:

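import type { GraphQLResolveInfo } from "graphql";
// Apollo Server 2/3 error classes; Apollo Server 4 throws GraphQLError
// with an extensions.code instead.
import { AuthenticationError, ForbiddenError } from "apollo-server-errors";

// GraphQLContext, withFeatureFlag, requireTenantAccess, and
// resolvePremiumReport are app-specific and defined elsewhere.
type Resolver<TParent, TArgs, TResult> = (
  parent: TParent,
  args: TArgs,
  ctx: GraphQLContext,
  info: GraphQLResolveInfo
) => Promise<TResult>;

// A guard throws to deny access; returning normally means allowed.
type Guard = (ctx: GraphQLContext) => void;

const requireRole = (role: string): Guard => (ctx) => {
  if (!ctx.viewer) throw new AuthenticationError("login required");
  if (!ctx.viewer.roles.includes(role)) {
    throw new ForbiddenError(`${role} role required`);
  }
};

// The wrapper is async, so a failing guard surfaces as a rejected
// promise instead of a synchronous throw.
const withGuards = <TParent, TArgs, TResult>(
  guards: Guard[],
  resolver: Resolver<TParent, TArgs, TResult>
): Resolver<TParent, TArgs, TResult> =>
  async (parent, args, ctx, info) => {
    for (const guard of guards) guard(ctx); // all guards run before business logic
    return resolver(parent, args, ctx, info);
  };

export const premiumReport = withFeatureFlag("premium-analytics",
  withGuards([
    requireRole("analyst"),
    requireTenantAccess((ctx) => ctx.viewer?.accountId),
  ], async (_parent: unknown, args: { accountId?: string }, ctx) => {
    // requireRole has already rejected requests without a viewer.
    return resolvePremiumReport(args.accountId ?? ctx.viewer!.accountId);
  })
);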

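Because withGuards sits inside withFeatureFlag and wraps the business logic directly, turning the flag on can only let a request reach the guards, never skip them. requireTenantAccess is the natural place to extend toward row-level security, such as confirming that a requested accountId belongs to the viewer's tenant; a check like that needs the resolver's args, so the Guard type would have to accept them as well.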
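One way to give requireTenantAccess row-level teeth, sketched under assumptions: the Guard type is widened to receive the resolver's args (so withGuards would call guard(ctx, args)), and ForbiddenError is a local stand-in for Apollo's class. The hypothetical helper compares an explicit accountId argument against the tenant the viewer belongs to.

```typescript
// Local stand-ins for the article's context type and Apollo's ForbiddenError.
class ForbiddenError extends Error {}

interface Viewer { id: string; roles: string[]; accountId: string }
interface GraphQLContext { viewer?: Viewer }

// Args-aware guard type: tenant checks need the requested accountId.
type Guard<TArgs> = (ctx: GraphQLContext, args: TArgs) => void;

// Hypothetical requireTenantAccess: getTenant returns the account the viewer
// may read; an explicit accountId argument must match it.
const requireTenantAccess =
  <TArgs extends { accountId?: string }>(
    getTenant: (ctx: GraphQLContext) => string | undefined
  ): Guard<TArgs> =>
  (ctx, args) => {
    const allowedTenant = getTenant(ctx);
    if (!allowedTenant) throw new ForbiddenError("no tenant on context");
    if (args.accountId !== undefined && args.accountId !== allowedTenant) {
      throw new ForbiddenError("cross-tenant access denied");
    }
  };

const tenantGuard = requireTenantAccess<{ accountId?: string }>(
  (ctx) => ctx.viewer?.accountId
);
const ctx: GraphQLContext = {
  viewer: { id: "u1", roles: ["analyst"], accountId: "acct-1" },
};

// Convenience wrapper for checking outcomes without try/catch at call sites.
function allowed(args: { accountId?: string }): boolean {
  try {
    tenantGuard(ctx, args);
    return true;
  } catch {
    return false;
  }
}
```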

Cache Safety: Key on Authorization Claims

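When caching, put the auth context into the key. For Apollo Server's response cache plugin:

import responseCachePlugin from "@apollo/server-plugin-response-cache";

const responseCache = responseCachePlugin<GraphQLContext>({
  // In Apollo Server 4, sessionId returns a Promise<string | null>.
  sessionId: async (requestContext) => {
    const viewer = requestContext.contextValue.viewer;
    if (!viewer) return null;
    // Copy before sorting: Array.prototype.sort mutates in place.
    return `${viewer.id}:${[...viewer.roles].sort().join("|")}`;
  },
});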

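Apollo's plugin applies sessionId only to responses whose cache scope is PRIVATE; PUBLIC responses are shared by every logged-in user, so mark premium fields with @cacheControl(scope: PRIVATE). At the CDN, vary on an X-Viewer-Id header that the edge sets from the verified session, and strip any client-supplied value, since a forged header would let a caller pick someone else's cache entry. Expect a cost: cache hit rates drop, but you avoid leaking privileged data. Measure the latency impact, and if responses depend only on role and tenant rather than on the individual user, key on those coarser claims to recover hit rate.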
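Choosing how fine-grained the key should be is the real design decision. This hypothetical sessionKey helper builds either a per-user key or a coarser tenant-plus-roles key; the coarse form is only safe when resolver output depends on nothing but role and tenant. Roles are copied before sorting because Array.prototype.sort mutates in place.

```typescript
// Hypothetical viewer claims available on the request context.
interface Viewer {
  id: string;
  accountId: string;
  roles: string[];
}

type Granularity = "user" | "role-tenant";

// Session component of a cache key. "user" isolates every viewer;
// "role-tenant" shares entries among viewers with the same tenant and roles.
function sessionKey(viewer: Viewer | undefined, granularity: Granularity): string | null {
  if (!viewer) return null; // anonymous: caller decides whether to cache at all
  const roles = [...viewer.roles].sort().join("|"); // copy: sort() mutates
  return granularity === "user"
    ? `${viewer.id}:${roles}`
    : `${viewer.accountId}:${roles}`;
}

const alice: Viewer = { id: "u1", accountId: "t1", roles: ["analyst", "admin"] };
const bob: Viewer = { id: "u3", accountId: "t1", roles: ["admin", "analyst"] };
```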

Schema-Level Guards with Directives

If your team prefers schema directives, ensure feature flags execute inside the guard. A directive implementation might look like:

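import { defaultFieldResolver, GraphQLField } from "graphql";
// Legacy graphql-tools API, removed in v8 (newer versions use mapSchema
// with getDirective).
import { SchemaDirectiveVisitor } from "graphql-tools";
import { AuthenticationError, ForbiddenError } from "apollo-server-errors";

class RequiresRoleDirective extends SchemaDirectiveVisitor {
  visitFieldDefinition(field: GraphQLField<any, any>) {
    const resolver = field.resolve ?? defaultFieldResolver;
    // Capture the argument now: inside the `function` resolver below,
    // `this` is not the directive visitor.
    const role: string = this.args.role;
    field.resolve = async function (parent, args, ctx, info) {
      if (!ctx.viewer) throw new AuthenticationError("login required");
      if (!ctx.viewer.roles.includes(role)) {
        throw new ForbiddenError(`${role} role required`);
      }
      return resolver(parent, args, ctx, info);
    };
  }
}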

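The trick is to make the feature flag part of the resolver body (or apply it before the directive transform runs), not external middleware that replaces field.resolve afterward. That way the directive's wrapper stays outermost and always executes first.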

Integration with Alprina (Optional)

If you run Alprina scanning, add rules that flag resolvers invoking feature gates without preceding auth guards. Feed your resolver directory to the policy engine and fail CI when a resolver calls resolvePremiumReport without withGuards.

Testing & Verification

Add contract-style tests in Jest or Vitest to enforce resolver ordering:

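// With Jest, drop this import: describe/it/expect are globals.
import { describe, it, expect } from "vitest";
import type { GraphQLResolveInfo } from "graphql";
import { ForbiddenError } from "apollo-server-errors";
import { premiumReport } from "./premiumReport"; // illustrative path

// premiumReport is the wrapped resolver function itself, so call it directly.
// Assumes resolvePremiumReport is mocked, and that withFeatureFlag reads
// ctx.featureFlags[flagName] and throws ForbiddenError when the flag is off.
describe("premiumReport", () => {
  // async: any synchronous throw becomes a rejection that .rejects can see.
  const run = async (roles: string[], feature = true) =>
    premiumReport(
      null,
      {},
      {
        viewer: { id: "u1", accountId: "acct-1", roles },
        featureFlags: { "premium-analytics": feature },
      },
      {} as GraphQLResolveInfo
    );

  it("rejects viewers without analyst role", async () => {
    await expect(run(["basic"])).rejects.toThrow(ForbiddenError);
  });

  it("allows analysts when feature is on", async () => {
    await expect(run(["analyst"])).resolves.toMatchObject({});
  });

  it("disables feature when flag off", async () => {
    await expect(run(["analyst"], false)).rejects.toThrow(ForbiddenError);
  });
});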

Complement unit tests with integration tests that hit the GraphQL endpoint using supertest and a mocked Redis cache. Seed responses for an analyst user, then reissue the same query as a basic user. The test should fail if the cache ignores auth context.
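The shape of that integration test can be sketched without HTTP or Redis. In this hypothetical harness, makeHandler stands in for the endpoint plus cache, and leaksAcrossRoles performs the seed-then-replay check; in a real suite you would swap in supertest and your actual cache client.

```typescript
// In-memory stand-ins for the HTTP layer and Redis; all names hypothetical.
interface Viewer { id: string; roles: string[] }
type KeyFn = (query: string, viewer: Viewer) => string;

function makeHandler(keyFn: KeyFn) {
  const cache = new Map<string, string>();
  return (query: string, viewer: Viewer): string => {
    const key = keyFn(query, viewer);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    // Simulated resolver: only analysts see the forecast.
    const body = viewer.roles.includes("analyst")
      ? JSON.stringify({ forecast: 42000 })
      : JSON.stringify({ forecast: null });
    cache.set(key, body);
    return body;
  };
}

// Seed the cache as an analyst, replay the same query as a basic user.
function leaksAcrossRoles(keyFn: KeyFn): boolean {
  const handle = makeHandler(keyFn);
  const query = "{ premiumReport { forecast } }";
  handle(query, { id: "u1", roles: ["analyst"] });
  const replay = JSON.parse(handle(query, { id: "u2", roles: ["basic"] }));
  return replay.forecast !== null;
}

// Query-only key (the bug) versus a key that includes viewer claims.
const queryOnly: KeyFn = (q) => q;
const withClaims: KeyFn = (q, v) => `${q}|${v.id}|${[...v.roles].sort().join(",")}`;
```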

In CI, run Alprina or custom AST linters that detect resolvers missing guard helpers. You can also fail the schema build when a sensitive field lacks its @requiresRole directive, using a small check that walks the schema's fields and their directives.
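Alongside AST linting, a behavioral sweep catches resolvers that forgot their guards regardless of how they are written. This is a hypothetical helper, not an Alprina feature: it calls every protected resolver as a basic viewer with all listed flags on and reports the ones that do not throw. It assumes synchronous resolvers for brevity; with async resolvers, await each call and count a rejection as denied.

```typescript
class ForbiddenError extends Error {}

// Hypothetical context and resolver shapes for the sweep.
interface Ctx {
  viewer: { id: string; roles: string[] };
  featureFlags: Record<string, boolean>;
}
type Resolver = (parent: unknown, args: Record<string, unknown>, ctx: Ctx) => unknown;

// Calls each protected resolver as an unprivileged viewer with every listed
// flag on; returns the names of resolvers that did NOT throw.
function findUnguarded(resolvers: Record<string, Resolver>, flags: string[]): string[] {
  const featureFlags: Record<string, boolean> = {};
  for (const flag of flags) featureFlags[flag] = true;
  const ctx: Ctx = { viewer: { id: "probe", roles: ["basic"] }, featureFlags };

  return Object.keys(resolvers).filter((name) => {
    try {
      resolvers[name](null, {}, ctx);
      return true; // returned data to a basic viewer: unguarded
    } catch {
      return false; // denied, as expected
    }
  });
}

// Example resolvers: one guarded, one that forgot its guard.
const guarded: Resolver = (_parent, _args, ctx) => {
  if (!ctx.viewer.roles.includes("analyst")) {
    throw new ForbiddenError("analyst role required");
  }
  return { forecast: 42000 };
};
const forgotten: Resolver = () => ({ forecast: 42000 });

const unguarded = findUnguarded(
  { premiumReport: guarded, legacyReport: forgotten },
  ["premium-analytics"]
);
```

A resolver that throws for an unrelated reason (missing fixtures, a bad argument) also counts as denied here, so feed the sweep realistic arguments.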

Common Questions & Edge Cases

What about hot-path queries where guard checks dominate latency? Cache the viewer.roles array on the context and reuse it. Authorization checks are cheap compared to leaking revenue data. If latency still hurts, consider precomputing per-role aggregates instead of bypassing auth.
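A minimal sketch of caching roles on the context, assuming a hypothetical buildViewer that runs once per request on already-verified claims. Storing roles in a Set (a design choice, not something the article requires) makes each membership check constant time, and every guard in the request reuses the same object.

```typescript
// Hypothetical viewer shape: roles stored as a Set for constant-time lookups.
interface Viewer {
  id: string;
  roles: ReadonlySet<string>;
}

// Builds the viewer once per request from claims that were already verified
// (e.g. a decoded session token); every guard then reuses this object.
function buildViewer(claims: { sub: string; roles: string[] }): Viewer {
  return { id: claims.sub, roles: new Set(claims.roles) };
}

// Role check used by guards; anonymous viewers never pass.
function hasRole(viewer: Viewer | undefined, role: string): boolean {
  return viewer !== undefined && viewer.roles.has(role);
}

const viewer = buildViewer({ sub: "u1", roles: ["analyst", "basic"] });
```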

Can I rely on database row-level security instead? Yes, but enforce both. Postgres RLS protects against misconfigured resolvers, yet you still need to ensure resolvers do not request cross-tenant data to begin with.

How do I secure federated GraphQL? Apply guards in each subgraph. Federation headers do not guarantee auth propagation. Treat every subgraph as potentially exposed.

What if I have dozens of roles? Use permission bitmasks or policy engines (e.g., Cedar, Oso) to keep code readable. The guard pattern still applies; the guard simply calls the policy engine instead of checking an array.
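A permission-bitmask sketch; the Perm bits, the rolePerms table, and the can and requirePerms helpers are hypothetical, and none of this is the Cedar or Oso API. Each role maps to a set of bits, a viewer's permissions are the bitwise OR across their roles, and the guard passes only when every required bit is present.

```typescript
// Hypothetical permission bits.
const Perm = {
  ReadReports: 1 << 0,
  ReadPremium: 1 << 1,
  ExportData: 1 << 2,
} as const;

// Hypothetical role-to-permission table.
const rolePerms: Record<string, number> = {
  basic: Perm.ReadReports,
  analyst: Perm.ReadReports | Perm.ReadPremium,
  admin: Perm.ReadReports | Perm.ReadPremium | Perm.ExportData,
};

// A viewer's permissions are the bitwise OR of every role they hold;
// unknown roles contribute nothing.
function permissionsFor(roles: string[]): number {
  return roles.reduce((acc, role) => acc | (rolePerms[role] ?? 0), 0);
}

// True only when every bit in `required` is set.
function can(roles: string[], required: number): boolean {
  return (permissionsFor(roles) & required) === required;
}

class ForbiddenError extends Error {}

// Guard in the same style as requireRole, backed by the bitmask check.
const requirePerms = (required: number) =>
  (ctx: { viewer?: { roles: string[] } }): void => {
    if (!ctx.viewer || !can(ctx.viewer.roles, required)) {
      throw new ForbiddenError("missing permission");
    }
  };
```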

Does caching per user kill performance? Measure it. Per-user keys lower hit rates, but in many SaaS apps the cache still pays for itself. If you cannot afford per-user caches, at least vary on coarse-grained claims like role or tenant.

Conclusion

GraphQL's flexibility is both its strength and its liability. When you bolt feature flags or performance hacks onto resolvers, you silently change the security boundary. Guard helpers, auth-aware cache keys, and automated tests keep those boundaries intact. The next time you ship a premium field, make the guard explicit, bake it into your caching strategy, and let CI fail if anyone tries to bypass it.

Alprina Blog