{
"cells": [
{
"cell_type": "markdown",
"id": "5dacbe5f-0aa3-4b43-bfb1-cc58f02a7ded",
"metadata": {},
"source": [
"# Cardinality matching"
]
},
{
"cell_type": "markdown",
"id": "14c518ee-3bc0-493a-9eb3-e2a289642823",
"metadata": {},
"source": [
"Cardinality matching is the process of finding the size of the largest subset $\\hat{P}$ of a pool of patients $P$ that lies within some \"distance\" of a given target population $T$:\n",
"\n",
"\\begin{equation}\n",
"\\begin{aligned}\n",
"& \\underset{\\hat{P}}{\\text{maximize}}\n",
"& & |\\hat{P}| \\\\\n",
"& \\text{subject to}\n",
"& & |\\mu_{\\hat{P}k} - \\mu_{Tk}| \\leq \\delta \\textrm{ for all }k\n",
"\\end{aligned}\n",
"\\end{equation}\n",
"\n",
"where $k$ indexes the covariates of $P$ and $T$, $\\mu_{\\hat{P}k}$ and $\\mu_{Tk}$ denote the means of covariate $k$ in the matched subset and in the target, and $\\delta$ is the largest allowed difference between those means. In cardinality matching, at least as implemented here, the first step searches only for the size of the largest subset that satisfies these constraints.\n",
"In a second step, we then optimize the balance (distance) among all subsets of that size."
]
},
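{
"cell_type": "markdown",
"id": "cardinality-matching-ip-sketch",
"metadata": {},
"source": [
"A sketch of how the problem above can be written as an integer program, using notation that is assumed here rather than taken from the text: let $x_i \\in \\{0, 1\\}$ indicate whether patient $i \\in P$ is selected into $\\hat{P}$, and let $z_{ik}$ be the value of covariate $k$ for patient $i$. Then $|\\hat{P}| = \\sum_i x_i$ and $\\mu_{\\hat{P}k} = \\sum_i x_i z_{ik} / \\sum_i x_i$. For any nonempty subset, multiplying the balance constraint by $\\sum_i x_i$ makes it linear in $x$:\n",
"\n",
"\\begin{equation}\n",
"\\begin{aligned}\n",
"& \\underset{x \\in \\{0,1\\}^{|P|}}{\\text{maximize}}\n",
"& & \\sum_{i \\in P} x_i \\\\\n",
"& \\text{subject to}\n",
"& & -\\delta \\sum_{i \\in P} x_i \\leq \\sum_{i \\in P} x_i \\, (z_{ik} - \\mu_{Tk}) \\leq \\delta \\sum_{i \\in P} x_i \\textrm{ for all }k\n",
"\\end{aligned}\n",
"\\end{equation}\n",
"\n",
"This is one common linearization and not necessarily the exact model built by `ConstraintSatisfactionMatcher`. Under the same assumptions, the second step would fix $\\sum_i x_i$ to the size found in the first step and replace the objective with a measure of the remaining imbalance."
]
},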
{
"cell_type": "code",
"execution_count": 1,
"id": "47d6b3c6-2bc8-4091-8108-ae6f8f2167a9",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"logging.basicConfig(level='INFO')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b83085cf-5a4d-4db3-85ff-317d6e376735",
"metadata": {},
"outputs": [],
"source": [
"from pybalance.utils.balance_calculators import *\n",
"from pybalance.utils import MatchingData\n",
"from pybalance.sim import generate_toy_dataset, load_paper_dataset\n",
"from pybalance.lp import ConstraintSatisfactionMatcher\n",
"from pybalance.visualization import *"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2e8252d9-3a36-44c4-9bbd-c01349be440e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"<b>Headers Numeric:</b> ['age', 'height', 'weight']<br>\n",
"<b>Headers Categoric:</b> ['gender', 'haircolor', 'country', 'binary_0', 'binary_1', 'binary_2', 'binary_3']<br>\n",
"<b>Populations</b> ['pool', 'target']<br>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>age</th><th>height</th><th>weight</th><th>gender</th><th>haircolor</th><th>country</th><th>population</th><th>binary_0</th><th>binary_1</th><th>binary_2</th><th>binary_3</th><th>patient_id</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>62.731988</td><td>130.816972</td><td>76.100401</td><td>0.0</td><td>0</td><td>4</td><td>pool</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n",
"<tr><th>1</th><td>26.403338</td><td>130.784188</td><td>80.134423</td><td>1.0</td><td>1</td><td>2</td><td>pool</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>\n",
"<tr><th>2</th><td>58.155044</td><td>175.704961</td><td>90.806745</td><td>0.0</td><td>1</td><td>4</td><td>pool</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td></tr>\n",
"<tr><th>3</th><td>68.334248</td><td>167.485984</td><td>90.081777</td><td>0.0</td><td>0</td><td>4</td><td>pool</td><td>0</td><td>0</td><td>0</td><td>1</td><td>3</td></tr>\n",
"<tr><th>4</th><td>54.114518</td><td>130.782073</td><td>53.612174</td><td>1.0</td><td>1</td><td>1</td><td>pool</td><td>0</td><td>1</td><td>0</td><td>1</td><td>4</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>995</th><td>21.474205</td><td>168.602546</td><td>70.342128</td><td>0.0</td><td>2</td><td>5</td><td>target</td><td>0</td><td>0</td><td>0</td><td>1</td><td>15995</td></tr>\n",
"<tr><th>996</th><td>40.643320</td><td>188.188724</td><td>61.611744</td><td>0.0</td><td>2</td><td>4</td><td>target</td><td>1</td><td>0</td><td>0</td><td>1</td><td>15996</td></tr>\n",
"<tr><th>997</th><td>29.472765</td><td>161.408162</td><td>57.214095</td><td>0.0</td><td>0</td><td>1</td><td>target</td><td>0</td><td>1</td><td>1</td><td>1</td><td>15997</td></tr>\n",
"<tr><th>998</th><td>41.291949</td><td>150.968833</td><td>91.270798</td><td>0.0</td><td>0</td><td>3</td><td>target</td><td>0</td><td>0</td><td>0</td><td>0</td><td>15998</td></tr>\n",
"<tr><th>999</th><td>67.530294</td><td>155.124741</td><td>56.196505</td><td>1.0</td><td>0</td><td>1</td><td>target</td><td>1</td><td>0</td><td>0</td><td>0</td><td>15999</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>16000 rows × 12 columns</p>
\n", "