Who the Organization Is
Verra is a nonprofit organization that operates the world’s leading carbon crediting program, the Verified Carbon Standard (VCS), alongside standards programs for sustainable development and plastic. With more than 18 years of experience, Verra has set the benchmark for quality and integrity in environmental and social markets.
Verra’s methodologies are built on rigorous science. They raise the bar on transparency and credibility, all while embedding stronger safeguards and benefit-sharing.
Today, Verra's standards programs enable companies, countries, and communities to turn goals into action, and its digitalization efforts play a vital role in that work. From digital MRV tools to establishing jurisdictional deforestation data, from transparent documentation to streamlined project registration processes, Verra is powering the transformation we need. To date, Verra has registered more than 3,400 projects in 125+ countries and has issued more than 1.3 billion carbon credits.
While Verra is working to digitize its project workflow, most historical project information still exists in PDF format. These documents contain complex templates, forms, and agreements, and many cover technical details such as emission reduction calculations, as well as structured and unstructured data. Digitizing them enables greater interoperability, automation, transparency, and scalability in managing the many projects Verra certifies.
To illustrate this potential, an AI-driven initiative demonstrated that data from these PDFs can be extracted accurately and reliably, confirming the value of modernizing legacy documents.
The Problem

Verra faced three major challenges.
First, Verra was dealing with a massive and diverse document backlog. The organization had roughly 8,000 historical PDFs that needed to be digitized, spanning a large number of unique document types. There was no fixed layout, no consistent placement of fields, and no reliable structure to target.
Second, Verra’s extraction approach was not scalable. The engineering team previously relied on extensive regex logic to pull out fields such as greenhouse gas figures, project identifiers, and monitoring periods. Each field needed its own pattern. A single regex could take 2-4 days to design and refine because every document had edge cases. Scaling this approach to hundreds or thousands of fields would have taken months of development time.
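To make the brittleness concrete, here is a minimal sketch of what one such per-field pattern might look like. The pattern, the function name, and both sample strings are hypothetical illustrations, not Verra's actual regex logic or document text.

```python
import re

# Hypothetical pattern for ONE field: a monitoring period written as
# "Monitoring period: 01-Jan-2020 to 31-Dec-2020".
MONITORING_PERIOD = re.compile(
    r"Monitoring\s+period\s*:\s*"
    r"(?P<start>\d{2}-[A-Za-z]{3}-\d{4})\s+to\s+(?P<end>\d{2}-[A-Za-z]{3}-\d{4})"
)


def extract_monitoring_period(text):
    """Return (start, end) if the text matches the expected layout, else None."""
    match = MONITORING_PERIOD.search(text)
    if match is None:
        return None
    return match.group("start"), match.group("end")


# Invented sample strings: the same information in two different layouts.
expected_layout = "Monitoring period: 01-Jan-2020 to 31-Dec-2020"
variant_layout = "Monitoring Period (inclusive) 2020/01/01 - 2020/12/31"

print(extract_monitoring_period(expected_layout))  # matches the assumed layout
print(extract_monitoring_period(variant_layout))   # different casing and date format: no match
```

Each new layout variant would need either another pattern or another round of refinement to the existing one, which is where the per-field cost described above comes from.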
Third, Verra needed a solution that their internal developers could maintain independently. They wanted to onboard new document types, update schemas, and integrate with internal systems without relying on an external vendor or recurring custom engineering work.
The Solution

Through the AWS BOX program, Verra partnered with Upstage and the systems integrator Pariveda to create an automated document extraction pipeline to progressively replace the entire regex-based workflow.
The new system uses Upstage’s Information Extract technology, an agentic solution which combines OCR with a language model. Instead of relying on templates or static rules, the system interprets documents the way an expert would. It understands page layout, reads tables accurately, follows the flow of multi-column structures, and recognizes the semantic relationships between fields.
A key part of the solution is its schema-driven design. Verra’s developers now provide a simple JSON schema that describes the fields they want. The AI interprets the schema, locates the correct information, validates the extraction, and produces structured output. No regex. No hand-coded rules. No brittle code that breaks when formatting changes.
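As a rough illustration of the schema-driven idea, the sketch below defines a small JSON schema and checks a structured result against it. The field names, the sample record, and the `validate_extraction` helper are all assumptions made for this example; the sketch does not call or reproduce the Upstage API, and the real schemas Verra uses are not shown in this case study.

```python
import json

# Hypothetical schema: field names and descriptions are illustrative only.
PROJECT_SCHEMA = {
    "type": "object",
    "properties": {
        "project_id": {"type": "string", "description": "Project identifier"},
        "monitoring_period_start": {"type": "string", "description": "Start date"},
        "monitoring_period_end": {"type": "string", "description": "End date"},
        "ghg_emission_reductions_tco2e": {
            "type": "number",
            "description": "Emission reductions for the period, in tCO2e",
        },
    },
    "required": ["project_id", "ghg_emission_reductions_tco2e"],
}

# Maps the JSON schema type names used above to Python types.
_PYTHON_TYPES = {"string": str, "number": (int, float)}


def validate_extraction(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
        elif isinstance(value, bool) or not isinstance(value, _PYTHON_TYPES[spec["type"]]):
            problems.append(f"wrong type for field: {name}")
    return problems


# Invented example of structured output an extraction step might return.
sample_record = {
    "project_id": "0000",
    "monitoring_period_start": "2020-01-01",
    "monitoring_period_end": "2020-12-31",
    "ghg_emission_reductions_tco2e": 12345.6,
}

print(json.dumps(PROJECT_SCHEMA, indent=2))
print(validate_extraction(sample_record, PROJECT_SCHEMA))
```

The point of the design is that onboarding a new document type means editing a declarative schema like this one rather than writing and tuning new patterns.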
To support long-term autonomy, the project was delivered with:
- Infrastructure as code for predictable deployment
- DynamoDB for structured storage
- Cognito for secure authentication
- Amplify for front-end hosting
- CloudWatch for monitoring and auditing
- API endpoints that make downstream integration straightforward
The BOX-funded engagement moved quickly. After the matchmaking event in April, the project received funding approval in June, began development in July, and completed delivery and knowledge transfer by late August. Verra was able to onboard a new document type entirely on its own shortly afterward, confirming that the system was truly self-extensible.
The Impact

The results were immediate and meaningful. In the initial phase, the system extracted data from more than 7,000 pages across roughly 50 documents. The MVP focused mainly on two document types: project description templates and monitoring reports. Accuracy was about 90-100% for critical fields and 80-90% for secondary fields.
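The case study does not say how these accuracy figures were measured. One common way to compute a per-tier figure, assuming exact-match comparison against hand-labeled reference values, is sketched below. The function, the field tiers, and all sample values are invented for illustration and are not Verra's evaluation data.

```python
def field_accuracy(extracted, reference, fields):
    """Share of (document, field) pairs where the extracted value equals the reference."""
    total = 0
    correct = 0
    for doc_id, truth in reference.items():
        for name in fields:
            total += 1
            if extracted.get(doc_id, {}).get(name) == truth.get(name):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical tiers of fields.
CRITICAL_FIELDS = ["project_id", "ghg_tco2e"]
SECONDARY_FIELDS = ["project_title"]

# Invented reference labels and extraction results for two documents.
reference = {
    "doc-a": {"project_id": "0001", "ghg_tco2e": 100.0, "project_title": "Alpha"},
    "doc-b": {"project_id": "0002", "ghg_tco2e": 250.0, "project_title": "Beta"},
}
extracted = {
    "doc-a": {"project_id": "0001", "ghg_tco2e": 100.0, "project_title": "Alpha"},
    "doc-b": {"project_id": "0002", "ghg_tco2e": 250.0, "project_title": "Beta Project"},
}

critical = field_accuracy(extracted, reference, CRITICAL_FIELDS)
secondary = field_accuracy(extracted, reference, SECONDARY_FIELDS)
print(f"critical: {critical:.0%}, secondary: {secondary:.0%}")
```

Exact match is a strict criterion; a real evaluation might normalize dates, units, or whitespace before comparing.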
More importantly, Verra’s developers can now process up to a thousand fields in days or weeks. The regex approach would require months of engineering effort and constant maintenance. The new workflow significantly reduces that burden.
Verra’s CTO described the Upstage contribution as a combination of robust architecture and strategic guidance that is unlocking insights trapped in legacy PDFs. Data can be accessed more quickly, which will help enable shorter review times. Digital documents can be indexed and analyzed for trends, compliance, and performance metrics, and will help enable AI-driven risk assessment and predictive modeling for carbon projects.
Before and After Upstage

The following comparison highlights the shift in Verra’s document processing capabilities following the implementation of the Upstage AI solution.
Before
- Dozens of regex patterns for each document type
- 2-4 days of work per field
- Constant breakage when document formats changed
- Slow, fragile, and difficult to maintain
- Potentially thousands of fields across an 8,000-document backlog
After
- Simple schema describing the information needed for the document type
- AI that analyzes layout and content automatically
- Clean, validated structured output ready for internal systems
- New document types onboarded in hours or days
- A backlog that is now fully manageable





