Remember what you did?:
Learning Behavioral Memories for Partially Observable Object Manipulation

Anonymous Authors
Under Review

Overview

CAMP teaser figure

From 0% to 70% success: a compressed memory of past actions resolves partial observability. Many manipulation tasks are partially observable, so memoryless policies act on an aliased signal and fail on tasks like Push-T-Multi-Goals (left). CAMP (right) pretrains a memory module ℰθ to reconstruct the past action trajectory, yielding a compressed code z that conditions a diffusion action head and makes the partially observed problem Markovian.

Abstract

Long-horizon, contact-rich manipulation is inherently partially observable, because a single visual observation rarely captures a robot's full action context, including prior attempts, interactions, and progress. Consequently, standard visuomotor policies and vision-language-action models tend to struggle on such tasks due to a lack of memory. To address this, we introduce Compressed Action Memory Policy (CAMP), based on the insight that a robot's own action history is a highly informative, self-supervised signal from which the policy can learn a robust, compact history representation. We train a memory module to maintain a compressed representation of past actions, forcing it to encode a latent behavioral memory of all the robot's past interactions, which is then used to contextualize future actions. This allows CAMP to implicitly track generalized task progress and learn from failed attempts without additional supervision or external oversight. We evaluate CAMP on four real-robot setups and two novel simulation benchmarks, Memory-T-Bench and Memory-Manip-Bench. With substantial gains over state-of-the-art baselines, CAMP is, to our knowledge, the first policy to demonstrate substantial success on contact-rich, partially observable manipulation tasks purely through learned memory.
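As a rough illustration of compressing an action history into a code from which that history can be reconstructed, the sketch below fits a linear autoencoder (PCA) on synthetic action windows. It is not the CAMP memory module: the linear model, the synthetic data, the three goal directions, and every name (`fit_linear_memory`, `encode`, `decode`, `code_dim`) are assumptions made purely for illustration.

```python
import numpy as np

def fit_linear_memory(histories, code_dim):
    """Fit a PCA encoder/decoder on action histories of shape (N, T, A)."""
    flat = histories.reshape(len(histories), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered histories.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:code_dim]

def encode(history, mean, basis):
    """Compress one (T, A) history into a code z of length code_dim."""
    return basis @ (history.reshape(-1) - mean)

def decode(z, mean, basis, shape):
    """Reconstruct a (T, A) history from its code z."""
    return (basis.T @ z + mean).reshape(shape)

# Synthetic data (assumption): each history pushes toward one of three
# hypothetical goal directions for T steps, with small action noise.
rng = np.random.default_rng(0)
T, A, N = 16, 2, 300
goal_dirs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
labels = rng.integers(0, 3, size=N)
histories = goal_dirs[labels][:, None, :] + 0.05 * rng.normal(size=(N, T, A))

def mean_recon_error(code_dim):
    """Average per-element squared reconstruction error over all histories."""
    mean, basis = fit_linear_memory(histories, code_dim)
    errs = [np.mean((decode(encode(h, mean, basis), mean, basis, h.shape) - h) ** 2)
            for h in histories]
    return float(np.mean(errs))

z = encode(histories[0], *fit_linear_memory(histories, 4))
err_small, err_large = mean_recon_error(1), mean_recon_error(4)
```

In this toy, a 4-dimensional code reconstructs the 32-dimensional history far better than a 1-dimensional one, because the code has to retain which direction the robot was pushing in. A learned, nonlinear memory module would presumably play the same role on real action trajectories.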

Memory-T-Bench

Memory-T-Bench comprises four PushT-derived tasks that share the same contact-rich dynamics but cannot be solved from a single frame. Three are track-task-progress tasks -- the policy must remember what it has already accomplished -- and one is a learn-from-failure task, where it must remember what did not work.

Multi-Goals
Track Progress

Push the T-block into three goal regions in any order -- without ever pushing it into the same region twice.

Success: all three goals reached, each exactly once.

Swap-Direct
Track Progress

Two T-blocks must exchange positions, but a mid-episode frame no longer reveals which block started where.

Success: each block ends in the other's starting position.

Swap-Shuffle
Track Progress

The same swap as Swap-Direct, but the starting assignment is re-randomized every episode.

Success: the two blocks exchange positions despite the shuffled start.

Find-Track
Learn from Failure

Three tracks lead to the goal, but two are high-friction and block the T. Reach the goal through the open track -- without retrying a track that already failed.

Success: the T reaches the goal via the low-friction track, with no repeated failed attempt.

CAMP outperforms Diffusion Policy by over 32.5% across these four tasks. Because a single observation hides what the robot has already tried or achieved, a memoryless policy stalls; by compressing its own action history into a compact behavioral memory, CAMP recovers this hidden state -- effectively closing the partial-observability gap.
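The aliasing argument can be made concrete with a toy three-track problem loosely modeled on Find-Track. The sketch assumes that a failed attempt leaves the observation unchanged, and it uses the raw list of past attempts in place of a learned memory code. The environment, `run_episode`, and both policies are hypothetical and are not part of the benchmark or of CAMP.

```python
def run_episode(policy, open_track, max_steps=6):
    """Roll out a policy in a toy 3-track problem (hypothetical environment).

    A failed attempt leaves the observation unchanged, so the observation
    alone never reveals which tracks were already tried.
    Returns the list of attempted tracks and whether the goal was reached.
    """
    observation = "at_start"  # identical before every attempt
    attempts = []
    for _ in range(max_steps):
        track = policy(observation, attempts)
        attempts.append(track)
        if track == open_track:
            return attempts, True
    return attempts, False

def memoryless_policy(observation, _history):
    """Ignores history: the same observation always yields the same track."""
    return 0

def memory_policy(observation, history):
    """Uses its own past actions: pick the first track not yet attempted."""
    tried = set(history)
    return next(t for t in range(3) if t not in tried)

for open_track in range(3):
    _, ok_memoryless = run_episode(memoryless_policy, open_track)
    tries, ok_memory = run_episode(memory_policy, open_track)
    print(open_track, ok_memoryless, ok_memory, tries)
```

The deterministic memoryless policy only succeeds when the open track happens to be the one it always picks. A randomized memoryless policy would eventually find the open track, but only by sometimes repeating a failed one, which the Find-Track success criterion disallows.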

Memory-Manip-Bench

Memory-Manip-Bench extends the study to seven partially observable 3D manipulation tasks that span a range of contact conditions. Six are learn-from-failure -- the policy must remember which options already failed -- and one (Swap-Block) requires tracking task progress.

Swap-Block
Track Progress

Swap two blocks via a buffer spot while recalling their initial positions.

Success: each block ends in the other's initial position.

Button-Lightbulb
Learn from Failure

Probe buttons to infer a hidden one-to-one button--bulb mapping, then light the bulb with the red base.

Success: the lightbulb is activated using the inferred mapping.

Uncover-Blocks
Learn from Failure

Lift each look-alike cover once, moving on the moment nothing is found beneath it.

Success: the hidden target is uncovered without re-checking a cover.

Stack-Lego
Learn from Failure

Attempt a stack to discover which of the two blocks has a stackable bottom.

Success: the stackable block is identified and stacked.

Probe-Insert
Learn from Failure

Probe three holes until finding the one that accepts the peg at full depth.

Success: the peg is fully inserted into the accepting hole.

Find-Soda
Learn from Failure

Open closed drawers one at a time, skipping any already found empty, until the soda is located.

Success: the soda is found without re-opening an empty drawer.

Open-Door
Learn from Failure

Only one of the three doors is unlocked; try the doors not yet opened, skipping any already found locked, until one opens.

Success: the unlocked door is opened without re-trying a locked door.

CAMP reaches the highest average success of 64.3% -- over 23% above the strongest baseline -- and wins six of the seven tasks.
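Several of the learn-from-failure criteria above (for example Find-Soda and Open-Door) share one shape: the working option is eventually tried, and no failed option is tried twice. A minimal checker for that shape is sketched below, assuming an episode is logged as a list of attempted option indices. The log format and the name `succeeded_without_repeats` are hypothetical and are not taken from the benchmark code.

```python
def succeeded_without_repeats(attempts, target):
    """Return True if `target` was tried and no failed option was tried twice first.

    attempts: option indices in the order they were tried (e.g. drawers, doors).
    target:   index of the option that actually works (soda drawer, unlocked door).
    """
    seen = set()
    for option in attempts:
        if option == target:
            return True    # reached the working option
        if option in seen:
            return False   # re-tried an option already known to fail
        seen.add(option)
    return False           # episode ended without trying the target

print(succeeded_without_repeats([2, 0, 1], target=1))  # True
print(succeeded_without_repeats([2, 2, 1], target=1))  # False
```

The second call fails even though the target is eventually reached, which is the behavior a policy without memory of its own attempts is prone to.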

Real-Robot Experiments

We compare CAMP (Ours) against ACT, DP, π0.5, and MemoryVLA across four real-robot manipulation tasks.

Swap Can
Track Progress

Swap two cans via a buffer spot while recalling their initial positions.

Success: each can ends in the other's initial position.

Wipe Plate
Track Progress

Wipe a random plate with a random brush and return it, then wipe the other plate with the other brush -- a choice the current frame doesn't reveal.

Success: both plates are wiped, each with a different brush.

Push-T-Multi-Goals
Track Progress

Push the T-block into three goal regions in any order, without repeating one.

Success: all three goals reached, each exactly once.

Probe Insert
Learn from Failure

Probe three holes until finding the one that accepts the peg at full depth.

Success: the peg is fully inserted into the accepting hole.

On the real robot, every memoryless baseline collapses to 0/10 on Push-T-Multi-Goals, Swap-Can, and Wipe-Plate, while CAMP reaches 7/10 on each. It is the only method that consistently completes these partially observable tasks.

For more details and insights, please refer to the paper.

Limitations and Conclusion

We introduced CAMP, a memory-augmented visuomotor policy for long-horizon, contact-rich manipulation under partial observability that turns the robot's own action history into a scalable, self-supervised learning signal. CAMP demonstrates consistent gains across Memory-T-Bench, Memory-Manip-Bench, and multiple challenging real-robot tasks.

Limitations remain. Tasks that require extremely long-horizon memory are still challenging for our method. Additionally, extending CAMP to dynamic and dexterous manipulation is a natural direction that we leave for future work.