Basel and Microsoft prove AI boosts productivity and learning. The Human Enhancement Quotient addresses what their metrics miss: measuring human intelligence itself.
Opening Framework

Two major studies published in October 2025 prove AI collaboration boosts productivity and learning. What they also reveal: we lack frameworks to measure whether humans become more intelligent through that collaboration. This is the measurement gap the Human Enhancement Quotient addresses.
We are measuring the wrong things. Academic researchers track papers published and journal rankings. Educational institutions measure test scores and completion rates. Organizations count tasks completed and time saved.
None of these metrics answer the question that matters most: Are humans becoming more capable through AI collaboration, or just more productive?
This is not a semantic distinction. Productivity measures output. Intelligence measures transformation. A researcher who publishes 36% more papers may be writing faster without thinking deeper. A student who completes assignments more quickly may be outsourcing cognition rather than developing it.
The difference between acceleration and advancement is the difference between borrowing capability and building it. Until we can measure that difference, we cannot govern it, improve it, or understand whether AI collaboration enhances human intelligence or merely automates human tasks.
The Evidence Arrives: Basel and Microsoft
Basel’s Contribution: Productivity Without Cognitive Tracking
University of Basel (October 2, 2025)
Can GenAI Improve Academic Performance? (IZA Discussion Paper No. 17526)
Filimonovic, Rutzer, and Wunsch delivered rigorous quantitative evidence using author-level panel data across thousands of researchers. Their difference-in-differences approach with propensity score matching provides methodological rigor the field needs. The findings are substantial: GenAI adoption correlates with productivity increases of 15% in 2023, rising to 36% by 2024, with modest quality improvements measured through journal impact factors.
The equity findings are particularly valuable. Early-career researchers, those in technically complex subfields, and authors from non-English-speaking countries showed the strongest benefits. This suggests AI tools may lower structural barriers in academic publishing.
What the study proves: Productivity gains are real, measurable, and significant.
What the study cannot measure: Whether those researchers are developing stronger analytical capabilities, whether their reasoning quality is improving, or whether the productivity gains reflect permanent skill enhancement versus temporary scaffolding.
As the authors note in their conclusion: “longer-term equilibrium effects on research quality and innovation remain unexplored.”
This is not a limitation of Basel’s research. It is evidence of the measurement category that does not yet exist.
Microsoft’s Contribution: Learning Outcomes Without Cognitive Development Metrics
Microsoft Research (October 7, 2025)
Learning Outcomes with GenAI in the Classroom (Microsoft Technical Report MSR-TR-2025-42)
Walker and Vorvoreanu’s comprehensive review across dozens of educational studies provides essential guidance for educators. Their synthesis documents measurable improvements in writing efficiency and learning engagement while identifying critical risks: overconfidence in shallow skill mastery, reduced retention, and declining critical thinking when AI replaces rather than supplements human-guided reflection.
The report’s four evidence-based guidelines are immediately actionable: ensure student readiness, teach explicit AI literacy, use AI as supplement not replacement, and design interventions fostering genuine engagement.
What the study proves: Learning outcomes depend critically on structure and oversight. Without pedagogical guardrails, productivity often comes at the expense of comprehension.
What the study cannot measure: Which specific cognitive processes are enhanced or degraded under different collaboration structures. Whether students are developing transferable analytical capabilities or becoming dependent on AI scaffolding. How to quantify the cognitive transformation itself.
As the report acknowledges: “isolating AI’s specific contribution to cognitive development” remains methodologically complex.
Again, this is not a research flaw. It is proof that our measurement tools lag behind our deployment reality.
Why Intelligence Measurement Matters Now
Together, these studies establish that AI collaboration produces measurable effects on human performance. What they also reveal is how much we still cannot see.
Basel tracks velocity and destination: papers published, journals reached. Microsoft tracks outcomes: scores earned, assignments completed. Neither can track the cognitive journey itself. Neither can answer whether the collaboration is building human capability or borrowing machine capability.
Organizations are deploying AI collaboration tools across research, education, and professional work without frameworks to measure cognitive transformation. Universities integrate AI into curricula without metrics for reasoning development. Employers hire for “AI-augmented roles” without assessing collaborative intelligence capacity.
The gap is not just academic. It is operational, ethical, and urgent.
“We measure what machines help us produce. We still need to measure what humans become through that collaboration.”
— Basil Puglisi, MPA
Enter Collaborative Intelligence Measurement
The Human Enhancement Quotient quantifies what Basel and Microsoft cannot: cognitive transformation in human-AI collaboration environments.
HEQ does not replace productivity metrics or learning assessments. It measures a different dimension entirely: how human intelligence changes through sustained AI partnership.
Let me demonstrate with a concrete scenario.
A graduate student uses ChatGPT to write a literature review.
Basel measures: Papers published, citation patterns, journal placement.
Microsoft measures: Assignment completion time, grade received, engagement indicators.
HEQ measures four cognitive dimensions:
Cognitive Amplification Score (CAS)
After three months of AI-assisted research, does the student integrate complex theoretical frameworks faster? Can they identify connections between disparate sources more efficiently? This measures cognitive acceleration, not output speed. Does the processing itself improve?
Evidence-Analytical Index (EAI)
Does the student critically evaluate AI-generated citations before using them? Do they verify claims independently? Do they maintain transparent documentation distinguishing AI contributions from independent analysis? This tracks reasoning quality and intellectual integrity in augmented environments.
Collaborative Intelligence Quotient (CIQ)
When working with peers on joint projects, does the student effectively synthesize AI outputs with human discussion? Can they explain AI contributions to committee members in ways that strengthen arguments rather than obscure thinking? This measures integration capability across human and machine perspectives.
Adaptive Growth Rate (AGR)
Six months later, working on a new topic without AI assistance, is the student demonstrably more capable at literature synthesis than before using AI? Did the collaboration build permanent analytical skill or provide temporary scaffolding? This tracks whether enhancement persists when the tool is removed.
Productivity measures what we produce. Intelligence measures what we become. The difference is everything.
These dimensions complement Basel and Microsoft’s findings while measuring what they cannot. If a researcher publishes 36% more papers (Basel’s metric) but shows declining source evaluation rigor (HEQ’s EAI), we understand the true cost of that productivity. If a student completes assignments faster (Microsoft’s metric) but demonstrates reduced independent capability afterward (HEQ’s AGR), we see the difference between acceleration and advancement.
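To make that pairing concrete, here is a minimal sketch in Python of how an HEQ profile could sit alongside an outcome metric. The 0-100 scale, the equal-weight composite, and the classification thresholds are illustrative assumptions for this article, not the published HEQ scoring rules.

```python
from dataclasses import dataclass

@dataclass
class HEQProfile:
    """One person's HEQ snapshot; a 0-100 scale per dimension is assumed here."""
    cas: float  # Cognitive Amplification Score
    eai: float  # Evidence-Analytical Index
    ciq: float  # Collaborative Intelligence Quotient
    agr: float  # Adaptive Growth Rate

    def composite(self) -> float:
        # Equal weighting is an illustrative assumption, not a published formula.
        return (self.cas + self.eai + self.ciq + self.agr) / 4

def interpret(baseline: HEQProfile, current: HEQProfile, productivity_gain: float) -> str:
    """Pair an outcome metric (e.g., a Basel-style productivity gain) with HEQ deltas."""
    eai_delta = current.eai - baseline.eai
    agr_delta = current.agr - baseline.agr
    if productivity_gain > 0 and (eai_delta < 0 or agr_delta < 0):
        return "acceleration: output rose while reasoning rigor or transfer declined"
    if productivity_gain > 0 and eai_delta >= 0 and agr_delta > 0:
        return "advancement: output rose alongside durable capability gains"
    return "indeterminate: longitudinal data needed before concluding"

# Illustration: 36% more papers published, but weaker source evaluation afterwards.
before = HEQProfile(cas=62, eai=70, ciq=58, agr=55)
after = HEQProfile(cas=71, eai=63, ciq=60, agr=54)
print(interpret(before, after, productivity_gain=0.36))  # -> acceleration
```

The point of the sketch is the pairing itself: an outcome delta only becomes interpretable once it sits next to the process deltas.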
Applying this framework retrospectively to Basel’s equity findings, we could test whether non-English-speaking researchers’ productivity gains correlate with improved analytical capability or simply faster translation assistance, distinguishing genuine cognitive enhancement from tool-mediated efficiency.
What Makes Collaborative Intelligence Measurable
The question is not whether AI helps humans produce more. Basel and Microsoft prove it does. The question is whether AI collaboration makes humans more intelligent in measurable, persistent ways.
HEQ treats collaboration as a cognitive environment that can be quantified across four dimensions. These metrics are tested across multiple AI platforms (ChatGPT, Claude, Gemini) with protocols that adapt to privacy constraints and memory limitations.
Privacy and platform diversity remain methodological challenges. HEQ acknowledges this transparently. Long-chat protocols measure deep collaboration where conversation history permits. Compact protocols run standardized assessments where privacy isolation requires it. The framework prioritizes measurement validity over platform convenience.
This is not theoretical modeling. It is operational measurement for real-world deployment.
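As a rough illustration of that protocol choice, the decision rule can be written in a few lines. The capability flags and the rule itself are simplifying assumptions; actual platform behavior varies by product and policy.

```python
from dataclasses import dataclass

@dataclass
class PlatformCapabilities:
    """Assumed capability flags; real platform policies vary and change over time."""
    retains_conversation_history: bool
    allows_long_sessions: bool

def choose_protocol(caps: PlatformCapabilities) -> str:
    """Select between the long-chat and compact assessment protocols described above."""
    if caps.retains_conversation_history and caps.allows_long_sessions:
        # Deep collaboration can be observed across an extended conversation history.
        return "long-chat protocol"
    # Privacy isolation or short context forces a standardized, self-contained assessment.
    return "compact protocol"

print(choose_protocol(PlatformCapabilities(True, True)))   # long-chat protocol
print(choose_protocol(PlatformCapabilities(False, True)))  # compact protocol
```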
The Three-Layer Intelligence Framework
What comes next is integration. Basel, Microsoft, and HEQ measure different aspects of the same phenomenon: human capability in AI-augmented environments.
These layers together form a complete intelligence measurement system (a minimal sketch of how they combine follows the list):
Outcome Intelligence
- Papers published, citations earned, journal rankings (Basel approach)
- Test scores, completion rates, engagement metrics (Microsoft approach)
- Validates that collaboration produces measurable effects
Process Intelligence
- Cognitive amplification, reasoning quality, collaborative capacity (HEQ approach)
- Tracks how humans change through the collaboration itself
- Distinguishes enhancement from automation
Governance Intelligence
- Equity measures, skill transfer, accessibility (integrated approach)
- Ensures enhancement benefits are distributed fairly
- Validates training effectiveness and identifies intervention needs
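One way to see how the layers combine is a single record per person that carries all three at once. The field names and values below are illustrative placeholders, not a defined schema.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class IntelligenceRecord:
    """One researcher or student viewed through all three layers at once."""
    outcome: Dict[str, float] = field(default_factory=dict)     # Basel/Microsoft-style outputs
    process: Dict[str, float] = field(default_factory=dict)     # HEQ-style cognitive measures
    governance: Dict[str, float] = field(default_factory=dict)  # equity and transfer indicators

record = IntelligenceRecord(
    outcome={"papers_per_year_change": 0.36, "assignment_completion_change": 0.20},
    process={"cas": 71, "eai": 63, "ciq": 60, "agr": 54},
    governance={"composite_gain_vs_cohort_median": -2.5, "skill_transfer_rate": 0.4},
)
```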
This three-layer framework lets us answer questions none of the current approaches addresses alone:
Do productivity gains come with cognitive development or at its expense? Which collaboration structures build permanent capability versus temporary scaffolding? How do we train for genuine enhancement rather than skilled tool use? When does AI collaboration amplify human intelligence and when does it simply automate human tasks?
Why “Generative AI” Obscures This Work
A brief note on terminology, because language shapes measurement.
When corporations and media call these systems “Generative AI,” they describe a commercial product, not a cognitive reality. Large language models perform statistical sequence prediction. They reflect and recombine human meaning at scale, weighted by probability, optimized for coherence.
Emily Bender and colleagues warned in On the Dangers of Stochastic Parrots that these systems produce fluent text without grounded understanding. The risk is not that machines begin to think, but that humans forget they do not.
If precision matters, the better term is Reflective AI: systems that mirror human input at scale. “Generative” implies autonomy. Autonomy sells investment. But it obscures the measurement question that actually matters.
The question is not what machines can generate. The question is what humans become when working with machines that reflect human meaning back at scale. That is an intelligence question. That is what HEQ measures.
Collaborative Intelligence Governance
Both Basel and Microsoft emphasize governance as essential. Basel’s authors call for equitable access policies supporting linguistically marginalized researchers. Microsoft’s review stresses pedagogical guardrails and explicit AI literacy instruction.
These governance recommendations rest on measurement. You cannot govern what you cannot measure. You cannot improve what you do not track.
Traditional governance asks: Are we using AI responsibly?
Intelligence governance asks: Are humans becoming more capable through AI use?
That second question requires measurement frameworks that track cognitive transformation. Without them, governance becomes guesswork. Organizations implement AI literacy training without metrics for reasoning development. Institutions adopt collaboration tools without frameworks for measuring genuine enhancement versus skilled automation.
HEQ moves from research contribution to governance necessity when we recognize that collaborative intelligence is the governance challenge.
The framework provides:
Capability Assessment: Quantify individual readiness for AI-augmented roles rather than assuming uniform benefit from training.
Training Validation: Measure whether AI collaboration programs build permanent capability or temporary productivity through pre/post cognitive assessment.
Equity Monitoring: Track whether enhancement benefits distribute fairly or concentrate among already-advantaged populations (see the sketch after this list).
Intervention Design: Identify which cognitive processes require protection or development under specific collaboration structures.
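Here is a minimal sketch of how training validation and equity monitoring could run together: compare pre/post composite scores and aggregate the gains by subgroup. The subgroup labels, scores, and the use of a simple mean are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

def gains_by_group(results: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Average post-minus-pre composite HEQ gain per subgroup (equity monitoring)."""
    grouped: Dict[str, List[float]] = {}
    for group, pre, post in results:
        grouped.setdefault(group, []).append(post - pre)
    return {group: mean(deltas) for group, deltas in grouped.items()}

# Illustrative cohort: (subgroup, pre-training composite, post-training composite)
cohort = [
    ("early-career", 58.0, 66.0),
    ("early-career", 61.0, 64.0),
    ("senior", 70.0, 71.0),
    ("non-native-English", 55.0, 65.0),
]
print(gains_by_group(cohort))
# Uneven gains across groups flag where intervention design should focus next.
```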
This is not oversight of AI tools. This is governance of intelligence itself in collaborative environments.
Immediate Implementation Steps
For universities: Pilot HEQ assessment alongside existing outcome metrics in one department for one semester. Compare productivity gains with cognitive development measures.
For employers: Include collaborative intelligence capacity in job descriptions requiring AI tool use. Assess candidates on reasoning quality and adaptive growth, not just tool proficiency.
For training providers: Measure pre/post HEQ scores to demonstrate actual capability enhancement versus productivity gains. Use cognitive metrics to validate training effectiveness and justify continued investment.
What the Research Community Needs Next
As someone who builds measurement frameworks rather than writing commentary, I see these studies as allies in defining the essential work ahead.
For the Basel team: Your equity findings suggest early-career and non-English-speaking researchers benefit most from AI tools. The natural follow-up is whether that benefit reflects permanent capability enhancement or temporary productivity scaffolding. Longitudinal cognitive measurement using frameworks like HEQ could distinguish these and validate your impressive productivity findings with transformation data.
For the Microsoft researchers: Your emphasis on structure and oversight is exactly right. The follow-up question is which specific cognitive processes are protected or degraded under different scaffolding approaches. Process measurement frameworks could guide your intervention design recommendations with quantitative cognitive data.
For the broader research community: We now have evidence that AI collaboration affects human performance. The question becomes whether we can measure those effects at the level that matters: cognitive transformation itself.
This is not about replacing outcome metrics. It is about adding the intelligence layer that explains why those outcomes move as they do.
Closing Framework
The future of intelligence will not be machine or human. It will be measured by how well we understand what happens when they collaborate, and whether that collaboration builds capability or merely borrows it.
Basel and Microsoft mapped the outcomes. They proved collaboration produces measurable effects on productivity and learning. They also proved we lack frameworks to measure the cognitive transformation beneath those effects.
That is the measurement frontier. That is where HEQ operates. And that is what collaborative intelligence governance requires.
We can count papers published and test scores earned. Now we need to measure whether humans become more intelligent through the collaboration itself, with the same precision we expect from every other science.
The work ahead is not about building smarter machines. It is about learning to measure how intelligence evolves when humans and systems learn together.
Not productivity. Not outcomes. Intelligence itself.
References
- Filimonovic, D., Rutzer, C., & Wunsch, C. (2025, October 2). Can GenAI Improve Academic Performance? Evidence from the Social and Behavioral Sciences. University of Basel / IZA Discussion Paper No. 17526. arXiv:2510.02408
- Walker, K., & Vorvoreanu, M. (2025, October 7). Learning outcomes with GenAI in the classroom: A review of empirical evidence. Microsoft Technical Report MSR-TR-2025-42.
- Puglisi, B. (2025, September 28). The Human Enhancement Quotient: Measuring Cognitive Amplification Through AI Collaboration. https://basilpuglisi.com/the-human-enhancement-quotient-heq-measuring-cognitive-amplification-through-ai-collaboration-draft/
- Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? ACM FAccT 2021