Dataset Guidelines for Training Models

Creating a high-quality dataset is fundamental for training robust and effective models. This guideline outlines the essential principles and steps for curating a comprehensive and beneficial dataset for model training, with a focus on assets such as images.

Dataset Creation

1. Dataset Size and Scope

  • Initial Size: Begin with a minimum of 30 images so the model has enough examples to recognize patterns.
  • Expansion: Gradually increase the dataset size to over 50 images to enhance learning capability.
  • Balance: Ensure the dataset is large enough to capture necessary patterns yet remains manageable for analysis and processing.
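
As a rough illustration of the size guidance above, the following sketch counts the images in a folder and compares the count with the 30-image minimum and the 50-image target. The function names, the status labels, and the list of file extensions are assumptions made for this example, not part of any particular training tool.

```python
from pathlib import Path

# Thresholds taken from the guideline; the extension list is an assumption.
MIN_IMAGES = 30
TARGET_IMAGES = 50
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def count_images(folder):
    """Count files directly inside `folder` whose extension looks like an image."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def size_status(image_count):
    """Classify a dataset size against the 30-image minimum and 50-image target."""
    if image_count < MIN_IMAGES:
        return "too small"
    if image_count <= TARGET_IMAGES:
        return "usable, consider expanding"
    return "meets target"
```

For example, size_status(count_images("my_dataset")) would report whether a hypothetical "my_dataset" folder has reached the suggested size.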

2. Consistency vs. Variety

  • Consistency: Maintain uniformity in key aspects like subject matter and aesthetics to provide a stable learning environment.
  • Variety: Introduce diversity in the elements you want the model to generalize, so that it does not memorize specific details.
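
One way to check this balance, assuming each image has been given a short list of descriptive tags by hand, is to measure how often each tag appears. Tags present in nearly every image are the consistent traits, and the rest are the varied ones. The tag names, the 0.9 threshold, and the function names below are hypothetical choices for this sketch.

```python
from collections import Counter


def tag_coverage(image_tags):
    """Return {tag: fraction of images that carry the tag}."""
    total = len(image_tags)
    if total == 0:
        return {}
    # set(tags) ensures a tag repeated on one image is counted once.
    counts = Counter(tag for tags in image_tags.values() for tag in set(tags))
    return {tag: count / total for tag, count in counts.items()}


def split_consistent_and_varied(image_tags, consistent_threshold=0.9):
    """Split tags into those present in (almost) every image and the rest."""
    coverage = tag_coverage(image_tags)
    consistent = sorted(t for t, f in coverage.items() if f >= consistent_threshold)
    varied = sorted(t for t, f in coverage.items() if f < consistent_threshold)
    return consistent, varied


# Hypothetical hand-written tags for a small style dataset.
example_tags = {
    "a.png": ["watercolor", "forest", "daylight"],
    "b.png": ["watercolor", "city", "night"],
    "c.png": ["watercolor", "forest", "night"],
}
consistent, varied = split_consistent_and_varied(example_tags)
# consistent == ["watercolor"]; varied == ["city", "daylight", "forest", "night"]
```

If a trait you want the model to learn ends up in the varied list, or a trait you want it to generalize ends up in the consistent list, the dataset may need adjusting.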

3. Image Quality

  • Format: Utilize PNG format for optimal asset quality.
  • Resolution: Employ high-resolution images, preferably in square format, for consistent model input.
  • Editing: Crop and resize images thoughtfully to ensure focus on the desired concept, which should occupy at least 45% of the image.
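
The cropping rule above can be checked with simple arithmetic. The sketch below computes the largest centered square crop for an image and estimates how much of that crop the subject fills, assuming you know the subject's bounding box as (left, top, right, bottom) pixel coordinates. A bounding box overestimates an irregular subject, so treat the result as an approximation. The function names are illustrative.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square crop."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def subject_fraction(subject_box, crop_box):
    """Fraction of the crop area covered by the subject's bounding box."""
    left = max(subject_box[0], crop_box[0])
    top = max(subject_box[1], crop_box[1])
    right = min(subject_box[2], crop_box[2])
    bottom = min(subject_box[3], crop_box[3])
    overlap = max(0, right - left) * max(0, bottom - top)
    crop_area = (crop_box[2] - crop_box[0]) * (crop_box[3] - crop_box[1])
    return overlap / crop_area if crop_area else 0.0


# A 1600x1000 image with a subject box chosen for illustration.
crop = center_square_box(1600, 1000)                      # (300, 0, 1300, 1000)
fraction = subject_fraction((500, 100, 1200, 900), crop)  # 0.56
meets_guideline = fraction >= 0.45                        # True
```

The crop box uses the same (left, top, right, bottom) ordering that many image libraries accept for cropping, so it can be passed on to whichever tool performs the actual crop and PNG export.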

4. Testing Different Datasets

  • Conduct experiments with various datasets to identify the best configuration that aligns with your goals and requirements.
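
A simple way to set up such experiments is to build several dataset variants from the same pool of images. The sketch below shuffles the pool once with a fixed seed and takes prefixes of different lengths, so each smaller variant is contained in the larger ones and differences in results can be attributed to size rather than to a different selection. The function name, the sizes, and the file names are assumptions for illustration.

```python
import random


def nested_subsets(items, sizes, seed=0):
    """Return {size: subset} where every smaller subset is a prefix of the larger ones."""
    shuffled = sorted(items)                   # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    # Sizes larger than the pool are skipped rather than padded.
    return {size: shuffled[:size] for size in sorted(sizes) if size <= len(shuffled)}


# Hypothetical pool of 60 image files.
pool = [f"img_{i:03d}.png" for i in range(60)]
variants = nested_subsets(pool, sizes=[30, 50, 80])
# variants has keys 30 and 50; 80 is skipped because the pool holds only 60 images.
```

Each variant can then be used for a separate training run, with the outputs compared side by side.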

5. Avoid Overfitting

  • Limit redundancy of similar images to prevent the model from over-specializing in narrow features.
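
Near-duplicate images can be flagged before training with a perceptual hash. The sketch below uses a basic average hash: each image is represented as a small grid of grayscale values (producing that grid, for example by downscaling to 8x8, is assumed to happen elsewhere), each pixel becomes a 1 if it is brighter than the grid's mean, and two images are flagged when their hashes differ in only a few bits. The function names and the max_distance default are assumptions, and all grids are assumed to have the same dimensions.

```python
from itertools import combinations


def average_hash(gray_grid):
    """Bit tuple: 1 where a pixel is brighter than the grid's mean, else 0."""
    pixels = [value for row in gray_grid for value in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if value > mean else 0 for value in pixels)


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def near_duplicate_pairs(grids, max_distance=2):
    """Return name pairs whose average hashes differ in at most `max_distance` bits."""
    hashes = {name: average_hash(grid) for name, grid in grids.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming_distance(hashes[a], hashes[b]) <= max_distance
    ]


# Tiny 2x2 grids for illustration; real grids would be larger.
example_grids = {
    "a": [[10, 200], [10, 200]],
    "b": [[12, 198], [11, 205]],   # almost identical to "a"
    "c": [[200, 10], [200, 10]],   # mirrored brightness pattern
}
pairs = near_duplicate_pairs(example_grids)
# pairs == [("a", "b")]
```

Flagged pairs are candidates for review rather than automatic deletion, since two similar images may still differ in a detail worth keeping.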

Specific Model Considerations

For Subject or Object Models

1. Diverse Imagery
  • Include images showing a range of poses, lighting conditions, and framings (such as close-up and full-body shots).
  • Capture subjects from different angles and with varied expressions to enhance generalization.
2. Contextual Variety
  • Show subjects or objects in a range of contexts and settings so the model learns how they appear in different situations.

For Style Models

1. Consistent Styling
  • Keep the targeted style attributes, such as lighting and color schemes, uniform across images to focus the model on the style.
2. Focused Imagery
  • Only incorporate images that align with the predetermined style criteria.
3. Image Cleaning
  • Exclude unwanted elements including stray objects, text, or signatures to maintain image integrity.

General Tips

  • Keep the essential traits consistent across your dataset to stabilize learning.
  • Initially restrict variety in the dataset so the model can solidify its foundational understanding.
  • Present non-essential elements with enough diversity that the model adapts to different contexts.