Executive Summary
A leading US healthcare provider and hospital network operating across 14 states engaged MigryX to retire a deeply entrenched SAS Enterprise Guide environment that had accumulated nearly two decades of clinical, operational, and financial analytics. Spanning 1,900 SAS scripts and over 2.6 million lines of code, and tightly coupled to an on-premises Teradata data warehouse, the platform imposed escalating licensing costs, growing HIPAA compliance risk, and performance ceilings incompatible with the organization's push toward real-time population health analytics and value-based care reporting.
Over a focused nine-month engagement, MigryX deployed its parser-driven migration platform to systematically translate every SAS PROC, DATA step, and macro into idiomatic BigQuery SQL, Dataform pipelines, and PySpark jobs on Google Cloud Dataproc. The result: a fully cloud-native analytics estate, a 7X improvement in end-to-end query performance, and projected savings of $4.0 million over two years, all without a single disruption to live clinical reporting.
Client Overview
The client is a not-for-profit integrated health system with 28 hospitals, over 300 outpatient facilities, and more than 45,000 clinical and administrative staff. Its analytics function supports a wide range of mission-critical use cases: clinical quality metrics, CMS star ratings, cost-of-care modeling, population health cohort analysis, staffing optimization, and payer contract performance reporting.
For nearly 18 years, SAS had been the organization's analytical backbone. SAS Enterprise Guide served as the primary authoring environment for statisticians, clinical data analysts, and financial modelers. The platform was tightly integrated with a Teradata EDW that ingested data from dozens of source systems including Epic, Cerner, Meditech, and various claims clearinghouses. While the combination had served the organization well during a period of steady-state reporting, the acceleration of value-based care initiatives, the growth of real-time bedside analytics, and the desire to leverage modern ML tooling made the legacy stack increasingly untenable.
Business Challenge
The analytics leadership team had explored a gradual, manual rewrite of SAS code on several prior occasions. Each attempt stalled within weeks as the true scale and interdependency of the code became apparent. The core challenges driving the MigryX engagement were:
- Licensing cost trajectory: Annual SAS and Teradata licensing, maintenance, and hardware refresh costs had grown to over $3.2 million per year, with no path to reduction under the existing architecture. Cloud-native BigQuery offered a consumption-based model that analysts estimated could reduce annual compute and storage spend by 60% or more.
- HIPAA compliance surface area: Clinical data was replicated across multiple SAS libraries stored on shared network drives, with inconsistent encryption-at-rest policies and no granular audit logging at the column level. A move to BigQuery's native IAM, column-level security, and VPC Service Controls would dramatically shrink the compliance surface and simplify SOC 2 and HITRUST audit cycles.
- Teradata coupling and migration dependency: The SAS estate was not merely scripts calling a database. Hundreds of scripts used SAS/ACCESS to Teradata with DBMS-specific pass-through SQL, Teradata-dialect macros, and implicit session-level formatting assumptions. Any BigQuery migration required simultaneously resolving Teradata-specific SQL constructs embedded within SAS source files.
- Clinical analytics dependency chains: Key reports consumed by CMS quality submissions, Joint Commission readiness reviews, and payer contracting teams depended on macro chains that could span eight to twelve nested macro calls, with shared global macro variables injected at runtime. Manual refactoring of these chains carried an unacceptable risk of output divergence in regulated reporting contexts.
- Analyst productivity and time-to-insight: Complex cohort queries that required PROC SQL joins across multiple Teradata-resident tables routinely took 45 to 90 minutes to complete. Analytics teams had built elaborate workarounds involving intermediate SAS datasets and batched export jobs, compounding maintenance overhead and introducing latency into clinical decision support workflows.
- Talent and succession risk: The organization's SAS-certified analyst population was aging, with a significant share of senior SAS developers approaching retirement. Recruiting junior talent into a proprietary, vendor-locked environment had become increasingly difficult, while Python and SQL fluency were abundant in the market.
The MigryX Approach
MigryX began the engagement with a comprehensive discovery phase, ingesting the full SAS code repository into its static analysis engine. The platform constructed a dependency graph spanning all 1,900 scripts, identifying 847 unique macros, 214 distinct PROC types, 63 SAS/ACCESS Teradata pass-through blocks, and 1,109 DATA step transformations. This inventory, completed in under 48 hours, replaced weeks of manual cataloging and gave both the MigryX team and the client's migration governance committee a precise view of migration complexity before a single line of target code was written.
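MigryX's analysis engine is proprietary and operates on a full AST, but the flavor of such a static inventory can be conveyed in a few lines of Python. The regexes and sample script below are simplified, illustrative assumptions, not the production parser:

```python
import re
from collections import Counter

# Toy static scan: count PROC types and %MACRO definitions across SAS sources.
PROC_RE = re.compile(r"^\s*proc\s+(\w+)", re.IGNORECASE | re.MULTILINE)
MACRO_RE = re.compile(r"^\s*%macro\s+(\w+)", re.IGNORECASE | re.MULTILINE)

def inventory(sources):
    """Return (PROC-type counts, macro names) for a list of SAS source strings."""
    procs = Counter()
    macros = set()
    for src in sources:
        procs.update(m.lower() for m in PROC_RE.findall(src))
        macros.update(m.lower() for m in MACRO_RE.findall(src))
    return procs, macros

sample = """
%macro risk_adjust(cohort);
proc sql;
  select * from tdlib.claims;
quit;
%mend;
proc means data=work.cohort mean std;
run;
"""
procs, macros = inventory([sample])
print(sorted(procs.items()))  # [('means', 1), ('sql', 1)]
print(sorted(macros))         # ['risk_adjust']
```

A real discovery pass additionally has to resolve `%include` files and libname references to build the cross-script dependency graph, which is why an AST rather than regular expressions is required at scale.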
AST-Driven PROC Translation
The MigryX Abstract Syntax Tree (AST) parser decomposed each SAS source file into a structured parse tree, isolating PROC boundaries, DATA step logic, macro definitions, and call sites. For procedural SQL workloads, PROC SQL blocks were translated directly to BigQuery Standard SQL, with automatic resolution of Teradata pass-through fragments into their BigQuery equivalents. PROC MEANS, PROC FREQ, PROC SORT, PROC TRANSPOSE, and PROC TABULATE were each mapped to native BigQuery SQL aggregation patterns, window functions, and UNNEST expressions, preserving statistical semantics without resorting to emulation layers.
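As an illustration of the PROC-to-SQL mapping (not MigryX's actual generator), the sketch below shows how a simple PROC MEANS specification, with CLASS variables and requested statistics, lines up with a BigQuery GROUP BY query. Table and column names are invented:

```python
# Illustrative mapping: CLASS variables become GROUP BY columns, and each
# requested PROC MEANS statistic becomes the corresponding SQL aggregate.
STAT_FN = {"mean": "AVG", "min": "MIN", "max": "MAX", "n": "COUNT"}

def proc_means_to_sql(table, analysis_var, class_vars, stats):
    aggs = ", ".join(
        f"{STAT_FN[s]}({analysis_var}) AS {s}_{analysis_var}" for s in stats
    )
    group = ", ".join(class_vars)
    return (
        f"SELECT {group}, {aggs}\n"
        f"FROM `{table}`\n"
        f"GROUP BY {group}"
    )

sql = proc_means_to_sql(
    "ehr.encounters", "length_of_stay", ["facility_id"], ["mean", "max"]
)
print(sql)
```

The real translation surface is far wider (weights, output datasets, missing-value semantics), but the structural correspondence between a PROC specification and a SQL aggregation is the core of the approach.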
Macro-heavy scripts presented the most complex translation challenge. The MigryX macro resolver expanded and inlined macro calls within the AST prior to translation, enabling the output generator to produce clean, readable BigQuery SQL and Dataform SQLX files rather than macro-laden intermediaries. Each Dataform model was annotated with dependency declarations derived directly from the original SAS library reference graph, ensuring that Dataform's DAG-based orchestration matched the logical execution order of the legacy SAS job chains.
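A toy version of macro inlining conveys the idea. The sketch below does naive textual expansion of parameterized %macro calls until a fixed point, which is far simpler than SAS's real macro semantics (no %IF logic, no global symbol table); all names are hypothetical:

```python
import re

# Expand %name(arg, ...) call sites against a table of macro bodies that
# reference parameters as &param, repeating until no call sites remain.
def inline_macros(text, macros, max_depth=10):
    call = re.compile(r"%(\w+)\(([^()]*)\)")
    for _ in range(max_depth):
        def expand(m):
            name, raw_args = m.group(1), m.group(2)
            params, body = macros[name]
            args = [a.strip() for a in raw_args.split(",")] if raw_args else []
            for p, a in zip(params, args):
                body = body.replace(f"&{p}", a)
            return body
        new = call.sub(expand, text)
        if new == text:          # fixed point reached: fully inlined
            return new
        text = new
    raise RecursionError("macro expansion did not converge")

macros = {"filter_cohort": (["age"], "WHERE patient_age >= &age")}
print(inline_macros("SELECT * FROM cohort %filter_cohort(65)", macros))
# SELECT * FROM cohort WHERE patient_age >= 65
```

Because expansion happens before translation, the downstream SQL generator only ever sees fully resolved logic, which is what makes the emitted Dataform SQLX readable.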
DATA Step to PySpark on Dataproc
SAS DATA steps that performed row-by-row iterative logic, retain-based accumulation, or complex conditional branching were identified by the classifier as unsuitable for direct SQL translation. For these 312 DATA steps, MigryX generated equivalent PySpark DataFrame transformations deployed as managed jobs on Google Cloud Dataproc. The generated PySpark code preserved the original variable naming conventions and transformation logic, and each job was unit-tested against a sample of the original SAS output data to validate fidelity before promotion to the production pipeline.
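To make "retain-based accumulation" concrete, the plain-Python sketch below computes a running per-patient charge total, the kind of state a RETAIN statement carries across rows within a BY group. In generated PySpark, this pattern is typically expressed as a sum over a Window partitioned by patient and ordered by service date; all names here are illustrative:

```python
from itertools import groupby
from operator import itemgetter

# Running total per patient, ordered by service date. In PySpark the same
# logic would be a sum over Window.partitionBy("patient_id")
# .orderBy("svc_date").rowsBetween(Window.unboundedPreceding, Window.currentRow).
def running_totals(rows):
    rows = sorted(rows, key=itemgetter("patient_id", "svc_date"))
    out = []
    for _, grp in groupby(rows, key=itemgetter("patient_id")):
        total = 0.0            # the value RETAIN would carry across rows
        for r in grp:
            total += r["charge"]
            out.append({**r, "cum_charge": total})
    return out

rows = [
    {"patient_id": "P1", "svc_date": "2024-01-02", "charge": 100.0},
    {"patient_id": "P1", "svc_date": "2024-01-05", "charge": 50.0},
    {"patient_id": "P2", "svc_date": "2024-01-03", "charge": 75.0},
]
print([r["cum_charge"] for r in running_totals(rows)])  # [100.0, 150.0, 75.0]
```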
Orchestration and Pipeline Governance
Legacy SAS batch schedules, managed through a combination of SAS Management Console and Windows Task Scheduler, were replaced with Cloud Composer (managed Apache Airflow) DAGs. MigryX generated DAG skeletons directly from the dependency graph, with each Dataform model execution and each Dataproc job submission represented as a native Airflow operator. This gave the analytics engineering team immediate visibility into pipeline lineage, retry behavior, and SLA tracking through the Airflow UI, replacing opaque SAS batch logs with structured, searchable execution history.
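The core of DAG generation is a topological ordering of the dependency graph. The sketch below uses Python's standard-library `graphlib` on a made-up four-node graph to show how upstream relationships translate into a valid execution order; in a real Composer DAG, each node would become an Airflow operator wired with `upstream >> downstream`:

```python
from graphlib import TopologicalSorter

# Hypothetical script-level dependency graph: node -> set of upstream nodes.
deps = {
    "stg_claims":     set(),
    "stg_encounters": set(),
    "cohort_build":   {"stg_claims", "stg_encounters"},
    "cms_measures":   {"cohort_build"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # staging nodes first, then cohort_build, then cms_measures
```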
Migration Architecture
| Component | Legacy (Before) | Modern (After) |
|---|---|---|
| Analytics authoring | SAS Enterprise Guide 7.x | Dataform SQLX + dbt-compatible models |
| Data warehouse | Teradata 16.x on-premises EDW | Google BigQuery (multi-region US) |
| Procedural transforms | SAS DATA steps, PROC steps | PySpark on Google Cloud Dataproc |
| Orchestration | SAS Management Console + Windows Task Scheduler | Cloud Composer (Apache Airflow 2.x) |
| Clinical ML scoring | SAS Enterprise Miner batch scoring | Vertex AI Model Registry + BigQuery ML |
| Access control | SAS metadata server roles + Teradata grants | Google IAM + BigQuery column-level security |
| Audit & lineage | SAS log files on shared NFS | BigQuery audit logs + Dataplex lineage |
| Storage cost model | Fixed capacity hardware + SAN refresh cycles | BigQuery on-demand + long-term storage pricing |
Key Migration Highlights
- 1,900 SAS scripts translated across the clinical quality, population health, financial modeling, and payer analytics domains, with 94% converted automatically by MigryX's AST parser and requiring no manual rewrite.
- 63 Teradata pass-through blocks automatically resolved to BigQuery Standard SQL, eliminating a dependency on Teradata dialect that would have blocked any manual migration effort.
- 847 SAS macros fully expanded and inlined by the MigryX macro resolver prior to translation, producing readable, maintainable Dataform SQLX models without macro scaffolding.
- 312 DATA steps converted to validated PySpark jobs on Dataproc, each regression-tested against a 90-day sample of historical SAS output to confirm numeric and categorical output parity.
- Zero disruption to live CMS reporting: All 14 CMS quality measure scripts were migrated in a shadow-run mode, with BigQuery output validated against SAS output in parallel for 60 days before cutover.
- HIPAA compliance uplift: Column-level BigQuery security policies applied to all 38 tables containing protected health information (PHI), replacing SAS library-level access controls with row and column policy enforcement.
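As an illustration of the row-level policy model referenced above, the SQL below sketches a BigQuery row access policy. The table, policy, predicate, and group names are hypothetical, and column-level restrictions are applied separately through policy tags rather than SQL DDL:

```sql
-- Hypothetical names; restricts rows in a PHI table to the assigned analysts.
CREATE ROW ACCESS POLICY phi_care_team_only
ON `clinical.patient_encounters`
GRANT TO ("group:care-team-analysts@example.org")
FILTER USING (assigned_analyst = SESSION_USER());
```

Queries from users outside the grantee list, or against rows failing the filter predicate, simply return no matching rows, with enforcement handled by BigQuery rather than application code.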
Security & Compliance
For a healthcare organization operating under HIPAA, HITECH, and state-level health data privacy regulations, compliance was not a secondary concern but a primary migration driver. The MigryX platform and the Google Cloud target architecture were designed together to address these requirements comprehensively.
All PHI-bearing BigQuery tables were provisioned within a dedicated Google Cloud project governed by VPC Service Controls, ensuring that data could not be accessed or exfiltrated from outside the defined security perimeter. BigQuery's native column-level security policies replaced the coarse SAS metadata server role model, allowing the client's compliance team to enforce least-privilege access at the individual field level for the first time.
Customer-managed encryption keys (CMEK) via Cloud KMS were applied to all BigQuery datasets containing clinical data, satisfying the organization's internal encryption policy and the HITRUST CSF requirement for customer-controlled key management. BigQuery audit logs were streamed to Cloud Logging and retained for seven years in a locked archive bucket, providing the immutable access history required for HIPAA audit obligations and Joint Commission readiness reviews.
De-identification pipelines for research analytics use cases were rebuilt using Cloud Healthcare API's DLP-based de-identification capabilities, replacing a custom SAS macro de-identification approach that had not undergone a formal security review since 2019. The new architecture was reviewed and approved by the client's Privacy Officer and external HIPAA counsel prior to go-live.
Results & Business Impact
The migration delivered measurable, quantified outcomes across financial, operational, and clinical dimensions within the first six months of production operation.
Population health cohort queries that previously required 60-to-90-minute SAS batch jobs now complete in under nine minutes in BigQuery, enabling clinical decision support dashboards to refresh intraday rather than overnight. The analytics engineering team reports that the bulk of its routine maintenance burden, chiefly Teradata queue management and SAS job debugging, has been eliminated, freeing staff to focus on new model development.
On the financial side, the elimination of SAS and Teradata licenses, combined with the decommission of three aging on-premises data warehouse nodes, is projected to deliver cumulative savings of $4.0 million over 24 months. The transition to BigQuery's on-demand pricing model also introduced a direct feedback loop between query cost and query design, motivating the analytics team to write more efficient SQL and adopt partition pruning and clustering practices that further reduced ongoing compute spend.
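The partition pruning and clustering practices mentioned above follow a standard BigQuery DDL pattern; a minimal sketch with hypothetical table and column names:

```sql
-- Partition by service date and cluster on common filter columns so that
-- date-bounded queries scan only the relevant partitions and blocks.
CREATE TABLE `analytics.encounter_facts`
PARTITION BY DATE(service_ts)
CLUSTER BY facility_id, payer_id
AS
SELECT encounter_id, facility_id, payer_id, service_ts, total_charge
FROM `staging.encounters`;
```

Queries that filter on `service_ts` then scan only the matching date partitions, which under on-demand pricing translates directly into fewer bytes billed.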
"We had been told our SAS estate was too complex and too critical to migrate without a multi-year rewrite. MigryX delivered a production-ready BigQuery environment in nine months without disrupting CMS reporting. Our analysts are now working in Dataform and Python, and the compliance improvements alone justified the investment."
— VP of Analytics Engineering, Leading US Healthcare Provider (anonymized)
Ready to Modernize Your SAS Estate?
See how MigryX can accelerate your migration to BigQuery — without disrupting clinical or operational reporting.
Explore BigQuery Migration →