Tasks

We carefully curate 1000 tasks to benchmark agents' behaviors in StarDojo. The tasks are divided into five categories (Farming, Crafting, Exploration, Combat, and Social), which together cover most production and daily-life activities in the early and middle stages of the game. Each task is assigned one of three difficulty levels, easy, medium, or hard, with heuristic maximum step counts of 30, 50, and 150 respectively, based on the task's complexity and the time it takes to complete.
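The step limits can be written as a small lookup table. This is a minimal sketch only: the names `MAX_STEPS` and `step_budget` are illustrative assumptions and are not taken from StarDojo's code.

```python
# Heuristic step limits per difficulty level, as stated in the text.
# MAX_STEPS and step_budget are hypothetical names used for illustration.
MAX_STEPS = {"easy": 30, "medium": 50, "hard": 150}


def step_budget(difficulty: str) -> int:
    """Return the heuristic maximum number of steps for a difficulty level."""
    try:
        return MAX_STEPS[difficulty.lower()]
    except KeyError:
        raise ValueError(f"unknown difficulty: {difficulty!r}") from None


print(step_budget("medium"))  # 50
```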

🌽 Farming

Farming tasks can be broadly categorized into two types: cultivation (growing crops) and husbandry (raising animals). Easy tasks involve routine agricultural work such as clearing and tilling tiles, fertilizing, sowing seeds, watering plants, and harvesting mature crops and animal products, as well as animal care, including feeding, grazing, and interaction. These simple, discrete operations test whether agents possess the most fundamental production capabilities. Medium tasks require agents to independently procure farming resources such as seeds, fertilizer, water, and hay. These resources can be obtained through foraging, crafting, or purchasing. Hard tasks typically span multiple days and require agents to carry out routine farming activities every day. These activities form a production chain in which each step depends on the ones before it. For example, if a task involves growing a plant from seed to harvest, the agent must plant the seed in tilled dirt, water it daily over several days, and finally reap the crop. Any missed step, such as failing to water the plant on a given day, could delay or even prevent maturation. Thus, hard farming tasks demand that agents autonomously allocate time and resources efficiently, which makes them a significant test of multi-step reasoning and long-term planning.

🛠️ Crafting

Crafting tasks encompass both fabrication and cooking. Most crafting activities simply require adequate raw materials, though some also need equipment such as a furnace or a cookout kit. Easy tasks typically provide all necessary materials and tools directly in the agent's inventory, so the agent only has to execute the crafting procedure correctly. Occasionally, agents may need to gather readily available resources such as a few pieces of wood or stone. Medium tasks demand greater autonomy: agents must identify, locate, and acquire the appropriate materials and equipment through careful planning. Hard tasks involve procuring diverse materials through demanding methods (like deep mining for ores), followed by complex, multi-stage processes such as crafting intermediary components. These tasks test an agent's comprehensive crafting capabilities.

🔍 Exploration

Exploration tasks can be categorized into three types: map navigation and pathfinding, resource gathering in wilderness areas, and completing challenging in-game quests. Easy tasks may involve traveling to an accessible location or collecting specified items within a small nearby area. Medium tasks are more complex: some require venturing into more distant and hazardous environments (like the second floor of the mines), while others demand searching expansive zones (such as an entire forest) for randomly spawning resources. Hard tasks challenge agents to locate extremely rare resources with highly randomized spawn locations (like amethysts in the mines). Built-in game quests that chain multiple sub-tasks of varying types, and that are challenging enough to occasionally defeat even experienced human players, also qualify as hard exploration tasks.

⚔️ Combat

Agents are tasked with eliminating a specified number of monsters in the mines. As difficulty escalates, targets become both more formidable and more numerous. Basic adversaries like slimes, bugs, and grubs present minimal threat: their predictable movement patterns, low health pools, and weak attacks make them easy to dispatch or evade. Advanced creatures, however, have dangerous specialties: flies attack with erratic, lightning-fast strikes; duggies ambush from underground; rock crabs retreat into an impregnable armored stance. Defeating these requires dynamic positioning, well-timed strikes, and adaptive combat strategies.

🤝 Social

Social tasks encompass two primary objectives: cultivating relationships with NPCs, and conducting transactions with specialized NPCs (such as the carpenter or the blacksmith). Easy tasks require only basic interactions like conversations or gift-giving, and agents are teleported directly to the designated transaction location (e.g., the counter of Pierre's General Store). Medium tasks require agents to select and navigate to the appropriate venue on their own. Hard tasks challenge agents to build high friendship levels with NPCs. This requires strategic planning to increase rapport through daily greetings, thoughtful gift-giving, and fulfilling requests. Each NPC has unique behavioral patterns and preferences, so agents must develop customized engagement strategies: an action that delights one NPC (e.g., a favored gift) may offend another (e.g., a disliked item). This nuanced system tests agents' adaptive social intelligence.

🗒️ Task Format

Each category of tasks is encoded in a YAML file. This dictionary-like file format is both highly readable and convenient for processing by Python programs. The structure of a task is as follows:

sow_5_dirt_with_cauliflower_seeds: The key serves as both the name and the description of the task, and is used as the prompt input to the LLM.

id: A unique identifier for the task within its category.

object: The target object of the task (such as an item, location, character, or quest) that the player tries to acquire, reach, interact with, or complete in a specific quantity.

quantity: The required number of the target object. For non-quantifiable objects, such as a location or an NPC, quantity is simply set to 1.

tool: The tool required to complete the task.

save: The initial game save for the task. It comes with common environmental settings preconfigured, which reduces the number of simulator API calls needed during setup.

init_commands: A list of commands that invoke the simulator APIs. After loading the task, StarDojo automatically executes these commands one by one to fine-tune the initial conditions. Combined with the save file, this lets StarDojo standardize each task efficiently.

evaluator: The type of evaluator used to assess the task.

difficulty: The difficulty level of the task.
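Put together, one task entry might look like the sketch below, shown as the Python dictionary that a YAML loader would produce. The field names come from the list above; every value is a placeholder chosen for illustration (including the save name, the evaluator name, and the assumption that difficulty is stored as a lowercase string), and the `check_task` helper is hypothetical rather than part of StarDojo.

```python
# Sketch of one task entry as the dictionary a YAML loader would produce.
# Field names follow the task format; all values are placeholders and are
# not copied from the real task suite.
REQUIRED_FIELDS = {
    "id", "object", "quantity", "tool", "save",
    "init_commands", "evaluator", "difficulty",
}
DIFFICULTIES = {"easy", "medium", "hard"}  # assumed spelling of the levels

example_tasks = {
    "sow_5_dirt_with_cauliflower_seeds": {
        "id": 1,
        "object": "cauliflower seeds",
        "quantity": 5,
        "tool": "hoe",
        "save": "placeholder_save",
        "init_commands": [],  # simulator commands would be listed here
        "evaluator": "placeholder_evaluator",
        "difficulty": "easy",
    }
}


def check_task(name: str, task: dict) -> list:
    """Return a list of problems found in one task entry (empty if none)."""
    problems = []
    missing = REQUIRED_FIELDS - set(task)
    if missing:
        problems.append(f"{name}: missing fields {sorted(missing)}")
    if task.get("difficulty") not in DIFFICULTIES:
        problems.append(f"{name}: unknown difficulty {task.get('difficulty')!r}")
    quantity = task.get("quantity")
    if not isinstance(quantity, int) or quantity < 1:
        problems.append(f"{name}: quantity must be a positive integer")
    if not isinstance(task.get("init_commands"), list):
        problems.append(f"{name}: init_commands must be a list")
    return problems


for task_name, task_fields in example_tasks.items():
    print(task_name, check_task(task_name, task_fields))
```

A check of this kind is optional; it simply catches a missing field or a mistyped difficulty before a task is run.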

📝 Customization

According to the task format introduced above, you can freely modify existing tasks and add new ones. The task files are located in the env/tasks/task_suite/ folder. You can edit the task data in the YAML files or place newly designed tasks into the corresponding YAML files based on their categories.
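When adding a task, the main constraints implied by the format are that the key is new and that the id is unique within its category. The sketch below illustrates this on an in-memory mapping; `add_task` and the task name `water_5_crops` are hypothetical, and the entries are abbreviated to two fields for brevity.

```python
def add_task(tasks: dict, name: str, fields: dict) -> None:
    """Add a task to an in-memory category mapping, rejecting duplicates.

    Hypothetical helper for illustration; it is not part of StarDojo.
    """
    if name in tasks:
        raise ValueError(f"task name already exists: {name}")
    used_ids = {task["id"] for task in tasks.values()}
    if fields["id"] in used_ids:
        raise ValueError(f"task id already used in this category: {fields['id']}")
    tasks[name] = fields


# Abbreviated entries; a real task would carry all fields of the task format.
farming_tasks = {
    "sow_5_dirt_with_cauliflower_seeds": {"id": 1, "difficulty": "easy"},
}
add_task(farming_tasks, "water_5_crops", {"id": 2, "difficulty": "easy"})
print(sorted(farming_tasks))
```

In practice the mapping would be read from and written back to the category's YAML file in env/tasks/task_suite/, for example with PyYAML's `yaml.safe_load` and `yaml.safe_dump`, assuming that library is installed.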