Remember what you did?:
Learning Behavioral Memories for Partially Observable Object Manipulation

Anonymous Authors

Under Review

Overview

From 0% to 70% success: a compressed memory of past actions resolves partial observability. Many manipulation tasks are partially observable, so memoryless policies act on an aliased signal and fail on tasks like Push-T-Multi-Goals (left). CAMP (right) pretrains a memory module ℰ_θ to reconstruct the past action trajectory, yielding a compressed code z that conditions a diffusion action head and turns the partially observed problem Markovian.

Abstract

Long-horizon, contact-rich manipulation is inherently partially observable, because a single visual observation rarely captures a robot's full action context, including prior attempts, interactions, or progress. Consequently, standard visuomotor policies and vision-language-action models tend to struggle on such tasks for lack of memory. To address this, we introduce Compressed Action Memory Policy (CAMP), based on the insight that a robot's own action history serves as a highly informative, self-supervised signal from which the policy can learn a robust, compact history representation. In our approach, we train a memory module to maintain a compressed representation of past actions, forcing it to encode a latent behavioral memory of all the robot's past interactions that can then be used to better contextualize future actions. This allows our approach to implicitly track generalized task progress and learn from failed attempts without any additional supervision or external oversight. We evaluate CAMP across four real-robot setups and two novel simulation benchmarks: Memory-T-Bench and Memory-Manip-Bench. CAMP achieves substantial gains over state-of-the-art baselines and is, to our knowledge, the first policy to demonstrate substantial success on contact-rich, partially observable manipulation tasks purely through learned memory.
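To make the idea of a compressed action memory concrete, here is a minimal sketch of a reconstruction objective over a past action trajectory. It is an illustration under stated assumptions, not CAMP's architecture: a tiny linear encoder and decoder trained by plain gradient descent stand in for the memory module, the diffusion action head is omitted, and all names, sizes, and the learning rate are hypothetical. For brevity it fits a single made-up history.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def reconstruction_loss(x, W_enc, W_dec):
    """Mean squared error between x and its reconstruction from the code z."""
    z = matvec(W_enc, x)      # compressed code, shorter than x
    recon = matvec(W_dec, z)  # reconstructed action history
    return sum((r - t) ** 2 for r, t in zip(recon, x)) / len(x)

def train_step(x, W_enc, W_dec, lr=0.05):
    """One in-place gradient-descent step on the reconstruction loss."""
    n = len(x)
    z = matvec(W_enc, x)
    err = [r - t for r, t in zip(matvec(W_dec, z), x)]
    # Backpropagate the error through the decoder to the code
    # (computed before the decoder weights are updated).
    grad_z = [sum(err[i] * W_dec[i][k] for i in range(n)) for k in range(len(z))]
    for i in range(n):
        for k in range(len(z)):
            W_dec[i][k] -= lr * (2.0 / n) * err[i] * z[k]
    for k in range(len(z)):
        for j in range(n):
            W_enc[k][j] -= lr * (2.0 / n) * grad_z[k] * x[j]

def demo(steps=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical history: 4 past 2-D actions, flattened to 8 numbers.
    history = [rng.uniform(-1, 1) for _ in range(8)]
    code_dim = 3  # assumed size of the compressed code z
    W_enc = [[rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(code_dim)]
    W_dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(8)]
    before = reconstruction_loss(history, W_enc, W_dec)
    for _ in range(steps):
        train_step(history, W_enc, W_dec)
    after = reconstruction_loss(history, W_enc, W_dec)
    return before, after, matvec(W_enc, history)
```

Calling `demo()` returns the loss before and after training together with the 3-dimensional code. In a policy of the kind described here, a code like this would be passed to the action head alongside the current observation.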

Memory-T-Bench

Memory-T-Bench comprises four PushT-derived tasks that share the same contact-rich dynamics but cannot be solved from a single frame. Three are track-task-progress tasks -- the policy must remember what it has already accomplished -- and one is a learn-from-failure task, where it must remember what did not work.

Multi-Goals

Track Progress

Push the T-block into three goal regions in any order -- without ever pushing it into the same region twice.

Success: all three goals reached, each exactly once.
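The success criterion above can be written as a small check, which also shows why the task needs memory: which goals remain depends on the visit history, not on the current frame. This is a sketch only. The goal names and the `visits` log are hypothetical, and the benchmark's actual success detection is not described on this page.

```python
GOALS = ("g1", "g2", "g3")  # hypothetical names for the three goal regions

def multi_goals_success(visits, goals=GOALS):
    """True iff every goal region was entered exactly once.

    `visits` is the ordered list of goal regions the T-block was pushed into.
    """
    return sorted(visits) == sorted(goals)

def remaining_goals(visits, goals=GOALS):
    """Goals not yet visited: the hidden state a memory has to track."""
    return [g for g in goals if g not in visits]
```

For example, `multi_goals_success(["g2", "g1", "g3"])` holds, while a history that revisits `g1` fails the check even though all three goals were reached.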

Swap-Direct

Track Progress

Two T-blocks must exchange positions, but a mid-episode frame no longer reveals which block started where.

Success: each block ends in the other's starting position.

Swap-Shuffle

Track Progress

The same swap as Swap-Direct, but the starting assignment is re-randomized every episode.

Success: the two blocks exchange positions despite the shuffled start.

Find-Track

Learn from Failure

Three tracks lead to the goal, but two are high-friction and block the T. Reach the goal through the open track -- without retrying a track that already failed.

Success: the T reaches the goal via the low-friction track, with no repeated failed attempt.

CAMP outperforms Diffusion Policy by over 32.5% across these four tasks. Because a single observation hides what the robot has already tried or achieved, a memoryless policy stalls; by compressing its own action history into a compact behavioral memory, CAMP recovers this hidden state -- effectively closing the partial-observability gap.

Memory-Manip-Bench

Memory-Manip-Bench extends the study to seven partially observable 3D manipulation tasks spanning a range of contact richness. Six are learn-from-failure -- the policy must remember which options already failed -- and one (Swap-Block) requires tracking task progress.

Swap-Block

Track Progress

Swap two blocks via a buffer spot while recalling their initial positions.

Success: each block ends in the other's initial position.

Button-Lightbulb

Learn from Failure

Probe buttons to infer a hidden one-to-one button--bulb mapping, then light the bulb with the red base.

Success: the lightbulb is activated using the inferred mapping.

Uncover-Blocks

Learn from Failure

Lift each look-alike cover once, moving on the moment nothing is found beneath it.

Success: the hidden target is uncovered without re-checking a cover.

Stack-Lego

Learn from Failure

Attempt a stack to discover which of the two blocks has a stackable bottom.

Success: the stackable block is identified and stacked.

Probe-Insert

Learn from Failure

Probe three holes until finding the one that accepts the peg at full depth.

Success: the peg is fully inserted into the accepting hole.

Find-Soda

Learn from Failure

Open closed drawers one at a time, skipping any already found empty, until the soda is located.

Success: the soda is found without re-opening an empty drawer.

Open-Door

Learn from Failure

Only one of the three doors is unlocked; try the doors not yet opened, skipping any already found locked, until one opens.

Success: the unlocked door is opened without re-trying a locked door.
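The learn-from-failure pattern shared by tasks such as Open-Door can be illustrated with a toy episode loop. This is a hand-written stand-in, not CAMP itself: an explicit set of failed options plays the role of what a learned memory would have to encode implicitly, the memoryless agent is assumed to be deterministic, and the door names and attempt budget are hypothetical.

```python
def run_episode(options, unlocked, use_memory, max_attempts=5):
    """Try options in order until one works or the attempt budget runs out.

    With use_memory=False the agent forgets past failures, so it sees the
    same situation every time and repeats its first choice.
    Returns (list of attempts, success flag).
    """
    failed = set()
    attempts = []
    for _ in range(max_attempts):
        known_failed = failed if use_memory else set()
        choice = next((o for o in options if o not in known_failed), None)
        if choice is None:  # every option is known to have failed
            break
        attempts.append(choice)
        if choice == unlocked:
            return attempts, True
        failed.add(choice)
    return attempts, False
```

With doors `["door_1", "door_2", "door_3"]` and `door_3` unlocked, the memory variant tries each door once and succeeds, while the memoryless variant retries `door_1` until the budget is exhausted.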

CAMP reaches the highest average success of 64.3% -- over 23% above the strongest baseline -- and wins six of the seven tasks.

Real-Robot Experiments

We compare CAMP (Ours) against ACT, DP, π_0.5, and MemoryVLA across four real-robot manipulation tasks.

Swap Can

Track Progress

Swap two cans via a buffer spot while recalling their initial positions.

Success: each can ends in the other's initial position.

[Comparison panels: CAMP (Ours), ACT, π_0.5, MemoryVLA]

Wipe Plate

Track Progress

Wipe a random plate with a random brush and return it, then wipe the other plate with the other brush -- a choice the current frame doesn't reveal.

Success: both plates are wiped, each with a different brush.

[Comparison panels: CAMP (Ours), ACT, π_0.5, MemoryVLA]

Push-T-Multi-Goals

Track Progress

Push the T-block into three goal regions in any order, without repeating one.

Success: all three goals reached, each exactly once.

[Comparison panels: CAMP (Ours), ACT, π_0.5, MemoryVLA]

Probe Insert

Learn from Failure

Probe three holes until finding the one that accepts the peg at full depth.

Success: the peg is fully inserted into the accepting hole.

[Comparison panels: CAMP (Ours), ACT, π_0.5, MemoryVLA]

On the real robot, every memoryless baseline collapses to 0/10 on Push-T-Multi-Goals, Swap Can, and Wipe Plate, while CAMP reaches 7/10 on each. CAMP is the only method that consistently completes these partially observable tasks.

For more details and insights, please refer to the paper.

Limitations and Conclusion

We introduced CAMP, a memory-augmented visuomotor policy for long-horizon, contact-rich manipulation under partial observability that turns the robot's own action history into a scalable, self-supervised learning signal. CAMP demonstrates consistent gains across Memory-T-Bench, Memory-Manip-Bench, and multiple challenging real-robot tasks.

Limitations remain. Tasks requiring extremely long-horizon memory are still challenging for our method. Additionally, extending CAMP to dynamic and dexterous manipulation is a natural direction that we leave for future work.
