Benchmark Data

DexHoldem releases two complementary data surfaces: physical robot demonstrations for policy learning, and agent-state problems for evaluating structured tabletop perception, routing, and action selection.

Figure: DexHoldem policy benchmark success and failure examples.

1,470 Demos

Fourteen physical manipulation primitives with 105 accepted teleoperated demonstrations per primitive.

100/5 Split

Every primitive uses 100 training trajectories and 5 held-out validation trajectories.

378 GB

The Hugging Face TexasPokerRobot release reports a 378 GB hosted file footprint under CC-BY-4.0.

36 Agent Problems

The Skills benchmark contains p1-p36 with structured labels, route targets, and action targets.

Physical Demonstrations

TexasPokerRobot is the policy-learning data release.

Each trajectory records synchronized multi-view RGB-D observations and robot joint measurements on the ShadowHand-UR10e tabletop setup. The benchmark recorder stores instruction IDs, robot proprioception, and 30-dimensional joint-position action targets for card and chip manipulation.
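The per-step record described above can be sketched as a small data structure. This is an illustrative shape only: the field names (`instruction_id`, `rgbd_views`, and so on) are assumptions for the example, not the release's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical sketch of one recorded step in a TexasPokerRobot trajectory.
# Field names are illustrative assumptions, not the release's real schema.
@dataclass
class TrajectoryStep:
    instruction_id: str                    # which manipulation primitive this demo executes
    rgbd_views: Dict[str, Optional[object]]  # synchronized multi-view RGB-D frames
    joint_positions: List[float]           # robot proprioception (current joint state)
    action_target: List[float]             # 30-dim joint-position target: 6 arm + 24 hand

step = TrajectoryStep(
    instruction_id="deal_card",            # invented primitive name for illustration
    rgbd_views={"top_down": None, "third_person": None, "wrist": None},
    joint_positions=[0.0] * 30,
    action_target=[0.0] * 30,
)
```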

Physical demonstration data used by the policy benchmark.
Item | Benchmark Data
Platform | Shadow Dexterous Hand mounted on a Universal Robots UR10e arm.
Objects | Standard poker cards and poker chips with denominations 5, 10, 50, and 100.
Sensors | Top-down, third-person, and wrist-mounted Intel RealSense RGB-D cameras plus robot joint-position proprioception.
Primitive split | 105 accepted trajectories per primitive, organized as 100 training and 5 validation demonstrations.
Action format | 30-dimensional joint-position targets: 6 arm dimensions and 24 ShadowHand dimensions.
Hosted release | Winniechen2002/TexasPokerRobot
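Given the documented action format, splitting a 30-dimensional target into its arm and hand parts is a one-liner. A minimal sketch, assuming the arm dimensions come first (the ordering is an assumption for illustration):

```python
# Split a 30-dim action target into 6 UR10e arm dims and 24 ShadowHand dims.
# The arm-first ordering is an assumption, not documented by the release.
ARM_DIMS, HAND_DIMS = 6, 24

def split_action(action):
    assert len(action) == ARM_DIMS + HAND_DIMS, "expected a 30-dim target"
    return action[:ARM_DIMS], action[ARM_DIMS:]

arm, hand = split_action(list(range(30)))
```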

Agent Benchmark Data

The DexHoldemSkills bench provides fixed tabletop states for agent evaluation.

The agent benchmark data lives under bench/problems. Each problem provides an agent-view state, predecessor context when relevant, the ground-truth structured state, the expected routing decision, and the expected high-level action.
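A single problem entry might look like the following. The key names and values are invented to mirror the fields listed above; they are not the bench's exact schema.

```python
# Hypothetical shape of one bench/problems entry. Keys are illustrative
# assumptions chosen to mirror the documented fields, not the real schema.
problem = {
    "id": "p1",
    "agent_view": {"image": "p1_agent_view.png"},   # agent-view state
    "predecessor": None,                            # predecessor context when relevant
    "ground_truth_state": {"loop_stage": "initial", "is_my_turn": True},
    "expected_route": "choose_poker_action",        # expected routing decision
    "expected_action": "check",                     # expected high-level action
}
```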

Agent-state problem inventory in the DexHoldemSkills bench.
Problem Type | Count | Evaluation Focus
initial_turn_gate | 1 | Turn ownership, blind assignment, and first route selection.
opponent_wait | 4 | Stable states where the opponent owns the turn and the agent should wait.
robot_action_progress | 4 | Robot motion, loop stage, and scene stability while a primitive is in progress.
held_card_read | 2 | Held-card recognition plus cached sequence continuation.
cached_sequence_gate | 2 | Continuation of a pre-translated multi-atom action sequence.
recovery_safety | 5 | Retryable recovery versus human-help or unsafe continuation.
poker_table_decision | 9 | Table layout, cards, bets, chips, and legal decision routing.
fold_win_judge | 1 | Non-showdown win state and collect-winnings route.
showdown_outcome | 6 | Visible/cached cards and terminal win-or-lose judgment.
collect_winnings_sequence | 2 | Collect-winnings progress and robot-state routing.

Structured State

Labels include loop_stage, blind, scene_stable, is_my_turn, community cards, chip inventories, and bet dictionaries.
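The labels above can be pictured as one structured-state dictionary. The label names follow the list above; the concrete values and nested layout are invented for the example.

```python
# Illustrative structured-state label using the documented field names.
# Values and nesting are assumptions made up for this example.
state = {
    "loop_stage": "decision",
    "blind": "small",
    "scene_stable": True,
    "is_my_turn": True,
    "community_cards": ["Ah", "Kd", "7c"],
    "chips": {"robot": {"5": 4, "10": 3, "50": 2, "100": 1}},   # chip inventories
    "bets": {"robot": 10, "opponent": 10},                      # bet dictionaries
}
```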

Routing Targets

Each problem records the expected route, such as wait, choose poker action, recover retryable, request human help, or collect winnings.
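Since the route space is small and closed, an evaluator can compare predictions against it with an enum. A minimal sketch, assuming these member strings; the bench's literal route identifiers may differ.

```python
from enum import Enum

# Sketch of the routing-target space named above. Member strings are
# illustrative assumptions, not necessarily the bench's literal labels.
class Route(Enum):
    WAIT = "wait"
    CHOOSE_POKER_ACTION = "choose_poker_action"
    RECOVER_RETRYABLE = "recover_retryable"
    REQUEST_HUMAN_HELP = "request_human_help"
    COLLECT_WINNINGS = "collect_winnings"

def route_correct(predicted: str, expected: str) -> bool:
    # Exact-match scoring over the closed route vocabulary.
    return Route(predicted) is Route(expected)
```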

Audit Files

The bench includes ground_truth.json, problem_types.json, problem_clusters.json, and the core 36-problem list.
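Loading the audit files named above is straightforward JSON parsing. A minimal sketch, assuming only the file names listed and a flat bench/ directory:

```python
import json
from pathlib import Path

# Minimal sketch: load the bench audit files named above from a bench/
# directory. Assumes a flat layout; the release's structure may differ.
def load_bench(bench_dir="bench"):
    root = Path(bench_dir)
    ground_truth = json.loads((root / "ground_truth.json").read_text())
    problem_types = json.loads((root / "problem_types.json").read_text())
    clusters = json.loads((root / "problem_clusters.json").read_text())
    return ground_truth, problem_types, clusters
```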