Dataset Guidelines for Training Models¶
Creating a high-quality dataset is fundamental to training robust, effective models. This guideline outlines the essential principles and steps for curating such a dataset, with a focus on image assets.
Dataset Creation¶
1. Dataset Size and Scope¶
- Initial Size: Begin with a minimum of 30 images to allow for pattern recognition.
- Expansion: Gradually increase the dataset size to over 50 images to enhance learning capability.
- Balance: Ensure the dataset is large enough to capture the necessary patterns yet small enough to remain manageable for analysis and processing.
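The size thresholds above can be checked with a short script. The following is a minimal sketch, not part of any specific training tool: the function names `count_images` and `assess_dataset_size`, the returned labels, and the list of file extensions are assumptions made for illustration. Only the numbers 30 and 50 come from this guideline.

```python
from pathlib import Path

# Thresholds taken from the guideline above.
MIN_IMAGES = 30
TARGET_IMAGES = 50

# Assumed set of image extensions; adjust to match your own assets.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def count_images(folder):
    """Count files directly inside `folder` whose extension looks like an image."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def assess_dataset_size(image_count):
    """Classify a dataset size against the guideline thresholds."""
    if image_count < MIN_IMAGES:
        return "too small"
    if image_count <= TARGET_IMAGES:
        return "minimum met"
    # "Over 50 images" in the guideline is read here as strictly more than 50.
    return "target met"
```

A typical call would be `assess_dataset_size(count_images("my_dataset"))`, where `my_dataset` is a placeholder folder name.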
2. Consistency vs. Variety¶
- Consistency: Maintain uniformity in key aspects like subject matter and aesthetics to provide a stable learning environment.
- Variety: Vary the elements you want the model to generalize over, so that it does not memorize specific details.
3. Image Quality¶
- Format: Utilize PNG format for optimal asset quality.
- Resolution: Employ high-resolution images, preferably in square format, for consistent model input.
- Editing: Crop and resize images thoughtfully to ensure focus on the desired concept, which should occupy at least 45% of the image.
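The format, shape, and coverage checks above can be approximated with the standard library alone. The sketch below rests on two assumptions: that the subject is described by a bounding box you supply yourself, and that "occupy at least 45% of the image" is measured as bounding-box area over image area. The guideline does not define how to measure coverage, and all function names here are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SUBJECT_COVERAGE = 0.45  # from the guideline above


def png_dimensions(header):
    """Return (width, height) read from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16..24.
    return struct.unpack(">II", header[16:24])


def is_square(width, height):
    """True when the image has a 1:1 aspect ratio."""
    return width == height


def subject_coverage(subject_box, image_size):
    """Fraction of the image area covered by the subject's bounding box.

    `subject_box` is (left, top, right, bottom) in pixels.
    `image_size` is (width, height) in pixels.
    """
    left, top, right, bottom = subject_box
    width, height = image_size
    box_area = max(0, right - left) * max(0, bottom - top)
    return box_area / (width * height)
```

To check a file on disk, read its first 24 bytes in binary mode and pass them to `png_dimensions`, then compare `subject_coverage(...)` against `MIN_SUBJECT_COVERAGE`.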
4. Testing Different Datasets¶
- Conduct experiments with various datasets to identify the best configuration that aligns with your goals and requirements.
5. Avoid Overfitting¶
- Limit redundancy of similar images to prevent the model from over-specializing in narrow features.
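As a first pass at spotting redundancy, byte-identical copies can be found by hashing file contents. This is only a partial, illustrative check: it does not detect images that are merely similar (for example, resized or slightly cropped versions), which would need a perceptual comparison that is not shown here. The function name `find_exact_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes.
    Returns a list of groups, each a sorted list of names sharing one hash.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only hashes shared by more than one file.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each returned group is a candidate for trimming to a single image.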
Specific Model Considerations¶
For Subject or Object Models¶
1. Diverse Imagery¶
- Include images showing a range of poses, lighting conditions, and body shots (for example, close-up and full-body).
- Capture subjects from different angles and with varied expressions to enhance generalization.
2. Contextual Variety¶
- Show subjects or objects in a variety of contexts and settings so the model learns how they appear in different surroundings.
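If each image is labeled by hand with attributes such as angle, lighting, or setting, a simple tally can reveal where variety is lacking. The sketch below assumes such labels exist as a list of dictionaries. The attribute names, the `dominant_values` function, and the 50% cutoff are illustrative assumptions, not values prescribed by this guideline.

```python
from collections import Counter


def dominant_values(records, attribute, max_share=0.5):
    """Return values of `attribute` that exceed `max_share` of all records.

    `records` is a list of dicts, one per image, e.g. {"angle": "front"}.
    The result maps each over-represented value to its share of the dataset.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {
        value: count / total
        for value, count in counts.items()
        if count / total > max_share
    }
```

For instance, if three of four images are tagged `"angle": "front"`, the function reports `front` with a share of 0.75, suggesting that more side or rear views would improve variety.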
For Style Models¶
1. Consistent Styling¶
- Keep the target style attributes, such as lighting and color scheme, uniform across all images so the model focuses on them.
2. Focused Imagery¶
- Only incorporate images that align with the predetermined style criteria.
3. Image Cleaning¶
- Exclude unwanted elements including stray objects, text, or signatures to maintain image integrity.
General Tips¶
- Keep the essential elements consistent across your dataset to stabilize learning.
- Initially restrict output variety to aid in solidifying foundational understanding.
- Present non-essential elements with sufficient diversity to promote adaptable learning across various contexts.