Filters#

This page shows the specifics of each filter.

How to exclude filters#

To exclude a filter, prefix the filter name with "not" (e.g. "not empty", "not extension: jpg", etc.).

Note

If you want to exclude all filters you can set the rule's filter_mode to none.

Example:

rules:
  # using filter_mode
  - locations: ~/Desktop
    filter_mode: "none" # <- excludes all
    filters:
      - empty
      - name:
          endswith: "2022"
    actions:
      - echo: "{name}"

  # Exclude a single filter
  - locations: ~/Desktop
    filters:
      - not extension: jpg # <- matches all non-jpgs
      - name:
          startswith: "Invoice"
      - not empty # <- matches files with content
    actions:
      - echo: "{name}"
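Conceptually, filter_mode controls how the individual filter results of a rule are combined. A minimal sketch of the three modes (all, any, none) as plain Python — an illustration of the idea, not organize's actual code:

```python
def rule_applies(filter_results, filter_mode="all"):
    """Combine per-filter boolean results according to filter_mode."""
    if filter_mode == "all":
        return all(filter_results)      # every filter must match (default)
    if filter_mode == "any":
        return any(filter_results)      # at least one filter must match
    if filter_mode == "none":
        return not any(filter_results)  # no filter may match
    raise ValueError(f"unknown filter_mode: {filter_mode!r}")
```

So filter_mode: "none" in the example above applies the rule only to files that are neither empty nor named "...2022".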

created#

Matches files / folders by created date

Attributes:
  • years (int) – specify number of years
  • months (int) – specify number of months
  • weeks (float) – specify number of weeks
  • days (float) – specify number of days
  • hours (float) – specify number of hours
  • minutes (float) – specify number of minutes
  • seconds (float) – specify number of seconds
  • mode (str) – either 'older' or 'newer'. 'older' matches files / folders created before the given time, 'newer' matches files / folders created within the given time. (default = 'older')

Returns:
  • {created} (datetime): The datetime the file / folder was created.
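The older/newer semantics can be sketched in plain Python (a simplified illustration using only days; the real filter also supports years, months and the other attributes above):

```python
from datetime import datetime, timedelta

def created_matches(created: datetime, days: float, mode: str = "older") -> bool:
    # The cutoff is "now minus the given timespan".
    cutoff = datetime.now() - timedelta(days=days)
    if mode == "older":
        return created <= cutoff  # created before the given timespan
    return created > cutoff       # "newer": created within the given timespan
```

So created: {days: 10} matches files created at least 10 days ago, while adding mode: newer flips the comparison.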

Source code in organize/filters/created.py
class Created(TimeFilter):
    """Matches files / folders by created date

    Attributes:
        years (int): specify number of years
        months (int): specify number of months
        weeks (float): specify number of weeks
        days (float): specify number of days
        hours (float): specify number of hours
        minutes (float): specify number of minutes
        seconds (float): specify number of seconds
        mode (str):
            either 'older' or 'newer'. 'older' matches files / folders created before
            the given time, 'newer' matches files / folders created within the given
            time. (default = 'older')

    Returns:
        `{created}` (datetime): The datetime the file / folder was created.
    """

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="created",
        files=True,
        dirs=True,
    )

    def get_datetime(self, path: Path) -> datetime:
        return read_created(path)

Examples:

Show all files on your desktop created at least 10 days ago

rules:
  - name: Show all files on your desktop created at least 10 days ago
    locations: "~/Desktop"
    filters:
      - created:
          days: 10
    actions:
      - echo: "Was created at least 10 days ago"

Show all files on your desktop which were created within the last 5 hours

rules:
  - name: Show all files on your desktop which were created within the last 5 hours
    locations: "~/Desktop"
    filters:
      - created:
          hours: 5
          mode: newer
    actions:
      - echo: "Was created within the last 5 hours"

Sort pdfs by year of creation

rules:
  - name: Sort pdfs by year of creation
    locations: "~/Documents"
    filters:
      - extension: pdf
      - created
    actions:
      - move: "~/Documents/PDF/{created.year}/"

Formatting the creation date

rules:
  - name: Display the creation date
    locations: "~/Documents"
    filters:
      - extension: pdf
      - created
    actions:
      - echo: "ISO Format:   {created.strftime('%Y-%m-%d')}"
      - echo: "As timestamp: {created.timestamp() | int}"

date_added#

Matches files by the time the file was added to a folder.

date_added is only available on macOS!

Attributes:
  • years (int) – specify number of years
  • months (int) – specify number of months
  • weeks (float) – specify number of weeks
  • days (float) – specify number of days
  • hours (float) – specify number of hours
  • minutes (float) – specify number of minutes
  • seconds (float) – specify number of seconds
  • mode (str) – either 'older' or 'newer'. 'older' matches files / folders added before the given time, 'newer' matches files / folders added within the given time. (default = 'older')

Returns:
  • {date_added}: The datetime the files / folders were added.

Source code in organize/filters/date_added.py
class DateAdded(TimeFilter):

    """Matches files by the time the file was added to a folder.

    **`date_added` is only available on macOS!**

    Attributes:
        years (int): specify number of years
        months (int): specify number of months
        weeks (float): specify number of weeks
        days (float): specify number of days
        hours (float): specify number of hours
        minutes (float): specify number of minutes
        seconds (float): specify number of seconds
        mode (str):
            either 'older' or 'newer'. 'older' matches files / folders added before
            the given time, 'newer' matches files / folders added within the given
            time. (default = 'older')

    Returns:
        `{date_added}`: The datetime the files / folders were added.
    """

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="date_added",
        files=True,
        dirs=True,
    )

    def __post_init__(self):
        if sys.platform != "darwin":
            raise EnvironmentError("date_added is only available on macOS")
        return super().__post_init__()

    def get_datetime(self, path: Path) -> datetime:
        return read_date_added(path)

Works the same way as created and lastmodified.

Examples:

rules:
  - name: Show the date the file was added to the folder
    locations: "~/Desktop"
    filters:
      - date_added
    actions:
      - echo: "Date added: {date_added.strftime('%Y-%m-%d')}"

date_lastused#

Matches files by the time the file was last used.

date_lastused is only available on macOS!

Attributes:
  • years (int) – specify number of years
  • months (int) – specify number of months
  • weeks (float) – specify number of weeks
  • days (float) – specify number of days
  • hours (float) – specify number of hours
  • minutes (float) – specify number of minutes
  • seconds (float) – specify number of seconds
  • mode (str) – either 'older' or 'newer'. 'older' matches files / folders last used before the given time, 'newer' matches files / folders last used within the given time. (default = 'older')

Returns:
  • {date_lastused}: The datetime the file / folder was last used.

Source code in organize/filters/date_lastused.py
class DateLastUsed(TimeFilter):

    """Matches files by the time the file was last used.

    **`date_lastused` is only available on macOS!**

    Attributes:
        years (int): specify number of years
        months (int): specify number of months
        weeks (float): specify number of weeks
        days (float): specify number of days
        hours (float): specify number of hours
        minutes (float): specify number of minutes
        seconds (float): specify number of seconds
        mode (str):
            either 'older' or 'newer'. 'older' matches files / folders last used before
            the given time, 'newer' matches files / folders last used within the given
            time. (default = 'older')

    Returns:
        {date_lastused}: The datetime the files / folders were last used.
    """

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="date_lastused",
        files=True,
        dirs=True,
    )

    def __post_init__(self):
        if sys.platform != "darwin":
            raise EnvironmentError("date_lastused is only available on macOS")
        return super().__post_init__()

    def get_datetime(self, path: Path) -> datetime:
        return read_date_lastused(path)

Works the same way as created and lastmodified.

Examples:

rules:
  - name: Show the date the file was added to the folder
    locations: "~/Desktop"
    filters:
      - date_lastused
    actions:
      - echo: "Date last used: {date_lastused.strftime('%Y-%m-%d')}"

duplicate#

A fast duplicate file finder.

This filter compares files byte by byte and finds identical files with potentially different filenames.

Attributes:
  • detect_original_by (str) – Detection method to distinguish between original and duplicate. Possible values are:

    • "first_seen": Whatever file is visited first is the original. This depends on the order of your location entries.
    • "name": The first entry sorted by name is the original.
    • "created": The first entry sorted by creation date is the original.
    • "lastmodified": The first file sorted by date of last modification is the original.

You can reverse the sorting method by prefixing a -.

So with detect_original_by: "-created" the file with the older creation date is the original and the younger file is the duplicate. This works on all methods, for example "-first_seen", "-name", "-created", "-lastmodified".
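The size → first-chunk → full-hash cascade this filter uses can be sketched as a standalone function (a simplified illustration with detect_original_by: first_seen hard-coded, not the actual organize implementation):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def full_hash(path: Path, algo: str = "sha1") -> str:
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicates(paths):
    """Return {original: [duplicates]} for the given files.

    Stage 1: group by file size (different size -> different content).
    Stage 2: within a size group, hash only the first 1024 bytes.
    Stage 3: only files that still collide get fully hashed.
    The "original" is simply the first file seen (first_seen).
    """
    by_size = defaultdict(list)
    for p in map(Path, paths):
        by_size[p.stat().st_size].append(p)

    by_chunk = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # unique size: cannot have a duplicate
        for p in group:
            with open(p, "rb") as f:
                chunk_hash = hashlib.sha1(f.read(1024)).hexdigest()
            by_chunk[(size, chunk_hash)].append(p)

    duplicates = defaultdict(list)
    for group in by_chunk.values():
        if len(group) < 2:
            continue  # unique first chunk: cannot have a duplicate
        first_seen = {}  # full hash -> original
        for p in group:
            digest = full_hash(p)
            if digest in first_seen:
                duplicates[first_seen[digest]].append(p)
            else:
                first_seen[digest] = p
    return dict(duplicates)
```

Because most files already differ in size or in their first kilobyte, the expensive full hash is only computed for genuine candidates.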

Returns:
  • {duplicate.original}: The path to the original

Source code in organize/filters/duplicate.py
@dataclass(config=ConfigDict(extra="forbid"))
class Duplicate:
    """A fast duplicate file finder.

    This filter compares files byte by byte and finds identical files with potentially
    different filenames.

    Attributes:
        detect_original_by (str):
            Detection method to distinguish between original and duplicate.
            Possible values are:

            - `"first_seen"`: Whatever file is visited first is the original. This
              depends on the order of your location entries.
            - `"name"`: The first entry sorted by name is the original.
            - `"created"`: The first entry sorted by creation date is the original.
            - `"lastmodified"`: The first file sorted by date of last modification is
               the original.

    You can reverse the sorting method by prefixing a `-`.

    So with `detect_original_by: "-created"` the file with the older creation date is
    the original and the younger file is the duplicate. This works on all methods, for
    example `"-first_seen"`, `"-name"`, `"-created"`, `"-lastmodified"`.

    **Returns:**

    `{duplicate.original}` - The path to the original
    """

    detect_original_by: DetectionMethod = "first_seen"
    hash_algorithm: str = "sha1"

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="duplicate", files=True, dirs=False
    )

    def __post_init__(self):
        # reverse original detection order if starting with "-"
        self._detect_original_by = self.detect_original_by
        self._detect_original_reverse = False
        if self.detect_original_by.startswith("-"):
            self._detect_original_by = self.detect_original_by[1:]
            self._detect_original_reverse = True

        self._files_for_size = defaultdict(list)
        self._files_for_chunk = defaultdict(list)
        self._file_for_hash = dict()

        # we keep track of the files we already computed the hashes for so we only do
        # that once.
        self._seen_files = set()
        self._first_chunk_known = set()
        self._hash_known = set()

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"
        # skip symlinks
        if res.path.is_symlink():
            return False

        # the exact same path has already been handled. This happens if multiple
        # locations emit this file in a single rule or if we follow symlinks.
        # We skip these.
        if res.path in self._seen_files:
            return False

        self._seen_files.add(res.path)

        # check for files with equal size
        file_size = read_file_size(path=res.path)
        same_size = self._files_for_size[file_size]
        same_size.append(res.path)
        if len(same_size) == 1:
            # the file is unique in size and cannot be a duplicate
            return False

        # for all other files with the same file size:
        # make sure we know their hash of their first 1024 byte chunk
        for f in same_size[:-1]:
            if f not in self._first_chunk_known:
                chunk_hash = hash_first_chunk(f, algo=self.hash_algorithm)
                self._first_chunk_known.add(f)
                self._files_for_chunk[chunk_hash].append(f)

        # check first chunk hash collisions with the current file
        chunk_hash = hash_first_chunk(res.path, algo=self.hash_algorithm)
        same_first_chunk = self._files_for_chunk[chunk_hash]
        same_first_chunk.append(res.path)
        self._first_chunk_known.add(res.path)
        if len(same_first_chunk) == 1:
            # the file has a unique small hash and cannot be a duplicate
            return False

        # Ensure we know the full hashes of all files with the same first chunk as
        # the investigated file
        for f in same_first_chunk[:-1]:
            if f not in self._hash_known:
                hash_ = hash(f, algo=self.hash_algorithm)
                self._hash_known.add(f)
                self._file_for_hash[hash_] = f

        # check full hash collisions with the current file
        hash_ = hash(res.path, algo=self.hash_algorithm)
        self._hash_known.add(res.path)
        known = self._file_for_hash.get(hash_)
        if known:
            original, duplicate = detect_original(
                known=known,
                new=res.path,
                method=self._detect_original_by,
                reverse=self._detect_original_reverse,
            )
            if known != original:
                self._file_for_hash[hash_] = original

            res.path = duplicate
            res.vars[self.filter_config.name] = {"original": original}
            return True

        return False

Examples:

Show all duplicate files in your desktop and download folder (and their subfolders)

rules:
  - name: Show all duplicate files in your desktop and download folder (and their subfolders)
    locations:
      - ~/Desktop
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate
    actions:
      - echo: "{path} is a duplicate of {duplicate.original}"

Check for duplicated files between Desktop and a Zip file, select original by creation date

rules:
  - name: "Check for duplicated files between Desktop and a Zip file, select original by creation date"
    locations:
      - ~/Desktop
      - zip://~/Desktop/backup.zip
    filters:
      - duplicate:
          detect_original_by: "created"
    actions:
      - echo: "Duplicate found!"

empty#

Finds empty dirs and files

Source code in organize/filters/empty.py
@dataclass(config=ConfigDict(extra="forbid"))
class Empty:

    """Finds empty dirs and files"""

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="empty",
        files=True,
        dirs=True,
    )

    def pipeline(self, res: Resource, output: Output) -> bool:
        return res.is_empty()

Examples:

Recursively delete empty folders

rules:
  - targets: dirs
    locations:
      - path: ~/Desktop
        max_depth: null
    filters:
      - empty
    actions:
      - delete

exif#

Filter by image EXIF data

The exif filter can be used as a filter as well as a way to get exif information into your actions.

By default this library uses the exifread library. If your image format is not supported you can install exiftool (exiftool.org) and set the environment variable:

ORGANIZE_EXIFTOOL_PATH="exiftool"

organize will then use exiftool to extract the EXIF data.

Exif fields which contain "datetime", "date" or "offsettime" in their field name will have their value converted to datetime.datetime, datetime.date and datetime.timedelta respectively:

- datetime.datetime – exif.image.datetime, exif.exif.datetimeoriginal, ...
- datetime.date – exif.gps.date, ...
- datetime.timedelta – exif.exif.offsettimeoriginal, exif.exif.offsettimedigitized, ...
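EXIF stores these values as strings ("YYYY:MM:DD HH:MM:SS" for timestamps, "YYYY:MM:DD" for dates, "+HH:MM" for offsets). The conversion boils down to something like this sketch (the helper names are illustrative, not organize's internal API):

```python
from datetime import datetime, timedelta

def parse_exif_datetime(value: str) -> datetime:
    # e.g. exif.image.datetime: "2023:06:18 14:03:22"
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")

def parse_exif_date(value: str):
    # e.g. exif.gps.date: "2023:06:18"
    return datetime.strptime(value, "%Y:%m:%d").date()

def parse_exif_offset(value: str) -> timedelta:
    # e.g. exif.exif.offsettimeoriginal: "+02:00"
    sign = -1 if value.startswith("-") else 1
    hours, minutes = value.lstrip("+-").split(":")
    return sign * timedelta(hours=int(hours), minutes=int(minutes))
```

This is why placeholders like {exif.image.datetime.year} work directly in actions.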

Attributes:
  • lowercase_keys (bool) – Whether to lowercase all EXIF keys (Default: true)

Returns:
  • {exif}: a dict of all the collected EXIF information available in the file. Typically it consists of the following tags (if present in the file):
    - {exif.image} – information related to the main image
    - {exif.exif} – EXIF information
    - {exif.gps} – GPS information
    - {exif.interoperability} – Interoperability information

Source code in organize/filters/exif.py
class Exif(BaseModel):
    """Filter by image EXIF data

    The `exif` filter can be used as a filter as well as a way to get exif information
    into your actions.

    By default this library uses the `exifread` library. If your image format is not
    supported you can install `exiftool` (exiftool.org) and set the environment variable:

    ```
    ORGANIZE_EXIFTOOL_PATH="exiftool"
    ```

    organize will then use `exiftool` to extract the EXIF data.

    Exif fields which contain "datetime", "date" or "offsettime" in their fieldname
    will have their value converted to 'datetime.datetime', 'datetime.date' and
    'datetime.timedelta' respectively.
    - `datetime.datetime` : exif.image.datetime, exif.exif.datetimeoriginal, ...
    - `datetime.date` : exif.gps.date, ...
    - `datetime.timedelta` : exif.exif.offsettimeoriginal, exif.exif.offsettimedigitized, ...

    Attributes:
        lowercase_keys (bool): Whether to lowercase all EXIF keys (Default: true)

    :returns:
        ``{exif}`` -- a dict of all the collected exif information available in the
        file. Typically it consists of the following tags (if present in the file):

        - ``{exif.image}`` -- information related to the main image
        - ``{exif.exif}`` -- Exif information
        - ``{exif.gps}`` -- GPS information
        - ``{exif.interoperability}`` -- Interoperability information
    """

    filter_tags: Dict
    lowercase_keys: bool = True

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="exif",
        files=True,
        dirs=False,
    )

    def __init__(
        self,
        *args,
        filter_tags: Optional[Dict] = None,
        lowercase_keys: bool = True,
        **kwargs,
    ):
        # exif filter is used differently from other filters. The **kwargs are not
        # filter parameters but all belong into the filter_tags dictionary to filter
        # for specific exif tags.
        params = filter_tags or dict()
        params.update(kwargs)
        # *args are tags filtered without a value, like ["gps", "image.model"].
        for arg in args:
            params[arg] = None
        super().__init__(filter_tags=params, lowercase_keys=lowercase_keys)

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"

        # gather the exif data in a dict
        if exiftool_available():
            data = exiftool_read(path=res.path)
        else:
            data = exifread_read(path=res.path)

        # lowercase keys if wanted
        if self.lowercase_keys:
            data = lowercase_keys_recursive(data)

        # convert strings to datetime objects where possible
        parsed = convert_recursive(data)

        res.vars[self.filter_config.name] = parsed
        return matches_tags(self.filter_tags, data)

Examples:

Show available EXIF data of your pictures

rules:
  - name: "Show available EXIF data of your pictures"
    locations:
      - path: ~/Pictures
        max_depth: null
    filters:
      - exif
    actions:
      - echo: "{exif}"

Copy all images which contain GPS information while keeping subfolder structure:

rules:
  - name: "GPS demo"
    locations:
      - path: ~/Pictures
        max_depth: null
    filters:
      - exif: gps.gpsdate
    actions:
      - copy: ~/Pictures/with_gps/{relative_path}/

Filter by camera manufacturer

rules:
  - name: "Filter by camera manufacturer"
    locations:
      - path: ~/Pictures
        max_depth: null
    filters:
      - exif:
          image.model: Nikon D3200
    actions:
      - move: "~/Pictures/My old Nikon/"

Sort images by camera model. This will create folders for each camera model (for example "Nikon D3200", "iPhone 6s", "iPhone 5s", "DMC-GX80") and move the pictures accordingly:

rules:
  - name: "camera sort"
    locations:
      - path: ~/Pictures
        max_depth: null
    filters:
      - extension: jpg
      - exif: image.model
    actions:
      - move: "~/Pictures/{exif.image.model}/"

extension#

Filter by file extension

Attributes:
  • *extensions (list(str) or str) – The file extensions to match (does not need to start with a dot).

Returns:
  • {extension}: the original file extension (without the dot)

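Matching is case-insensitive and tolerant of a leading dot; conceptually, both the configured extensions and the file's suffix are normalized before comparing (a plausible sketch of the idea, not the exact source below):

```python
def normalize_extension(ext: str) -> str:
    # ".JPG", "JPG" and "jpg" should all match the same files
    return ext.lower().lstrip(".")

def extension_matches(filename: str, extensions) -> bool:
    wanted = {normalize_extension(e) for e in extensions}
    suffix = filename.rsplit(".", 1)[-1] if "." in filename else ""
    return normalize_extension(suffix) in wanted
```

This is why the examples below can mix ".jpg" and "jpeg" freely.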
Source code in organize/filters/extension.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class Extension:
    """Filter by file extension

    Attributes:
        *extensions (list(str) or str):
            The file extensions to match (does not need to start with a dot).

    **Returns:**

    - `{extension}`: the original file extension (without the dot)
    """

    extensions: Set[str] = Field(default_factory=set)

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="extension",
        files=True,
        dirs=False,
    )

    @field_validator("extensions", mode="before")
    def normalize_extensions(cls, v):
        as_list = convert_to_list(v)
        return set(map(normalize_extension, flatten(list(as_list))))

    def suffix_match(self, path: Path) -> Tuple[str, bool]:
        suffix = path.suffix.lstrip(".")
        if not self.extensions:
            return (suffix, True)
        if not suffix:
            return (suffix, False)
        return (suffix, normalize_extension(suffix) in self.extensions)

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"
        if res.is_dir():
            raise ValueError("Dirs not supported")

        suffix, match = self.suffix_match(path=res.path)
        res.vars[self.filter_config.name] = suffix
        return match

Examples:

Match a single file extension

rules:
  - name: "Match a single file extension"
    locations: "~/Desktop"
    filters:
      - extension: png
    actions:
      - echo: "Found PNG file: {path}"

Match multiple file extensions

rules:
  - name: "Match multiple file extensions"
    locations: "~/Desktop"
    filters:
      - extension:
          - .jpg
          - jpeg
    actions:
      - echo: "Found JPG file: {path}"

Make all file extensions lowercase

rules:
  - name: "Make all file extensions lowercase"
    locations: "~/Desktop"
    filters:
      - extension
    actions:
      - rename: "{path.stem}.{extension.lower()}"

Using extension lists (YAML aliases)

img_ext: &img
  - png
  - jpg
  - tiff

audio_ext: &audio
  - mp3
  - wav
  - ogg

rules:
  - name: "Using extension lists"
    locations: "~/Desktop"
    filters:
      - extension:
          - *img
          - *audio
    actions:
      - echo: "Found media file: {path}"

filecontent#

Matches file content with the given regular expression.

Supports .md, .txt, .log, .pdf and .docx files.

For PDF content extraction poppler should be installed for the pdftotext command. If this is not available filecontent will fall back to the pdfminer library.

Attributes:
  • expr (str) – The regular expression to be matched.

Any named groups ((?P<groupname>.*)) in your regular expression will be returned like this:

Returns:

  • {filecontent.groupname}: The text matched with the named group (?P<groupname>)
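The named-group mechanics are plain Python re behavior; every named group of the match becomes a placeholder (the invoice text here is made up for illustration):

```python
import re

content = "Invoice 2024-001\nCustomer ACME\nTotal: 100 EUR"
expr = re.compile(r"Invoice.*Customer (?P<customer>\w+)", re.MULTILINE | re.DOTALL)
match = expr.search(content)
assert match is not None
groups = match.groupdict()  # {"customer": "ACME"}
```

Each key of groupdict() ends up as {filecontent.<groupname>} in your actions.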

You can test the filter on any file by running:

python -m organize.filters.filecontent "/path/to/file.pdf"
Source code in organize/filters/filecontent.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class FileContent:
    """Matches file content with the given regular expression.

    Supports .md, .txt, .log, .pdf and .docx files.

    For PDF content extraction poppler should be installed for the `pdftotext` command.
    If this is not available `filecontent` will fall back to the `pdfminer` library.

    Attributes:
        expr (str): The regular expression to be matched.

    Any named groups (`(?P<groupname>.*)`) in your regular expression will
    be returned like this:

    **Returns:**

    - `{filecontent.groupname}`: The text matched with the named group
      `(?P<groupname>)`

    You can test the filter on any file by running:

    ```sh
    python -m organize.filters.filecontent "/path/to/file.pdf"
    ```
    """

    expr: str = r"(?P<all>.*)"

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="filecontent",
        files=True,
        dirs=False,
    )

    def __post_init__(self):
        self._expr = re.compile(self.expr, re.MULTILINE | re.DOTALL)

    def matches(self, path: Path) -> Union[re.Match, None]:
        try:
            content = textract(path)
            match = self._expr.search(content)
            return match
        except Exception:
            return None

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"
        match = self.matches(path=res.path)

        if match:
            res.deep_merge(self.filter_config.name, match.groupdict())
        return bool(match)

Examples:

Show the content of all your PDF files

rules:
  - name: "Show the content of all your PDF files"
    locations: ~/Documents
    filters:
      - extension: pdf
      - filecontent
    actions:
      - echo: "{filecontent}"

Match an invoice with a regular expression and sort by customer

rules:
  - name: "Match an invoice with a regular expression and sort by customer"
    locations: "~/Desktop"
    filters:
      - filecontent: 'Invoice.*Customer (?P<customer>\w+)'
    actions:
      - move: "~/Documents/Invoices/{filecontent.customer}/"

Example: filter filenames for a valid date code.

The filename should start with <year>-<month>-<day>.

The regex:

  1. captures the year in a named placeholder variable
  2. allows only years starting with "20", followed by two digits
  3. months must have 0 or 1 as their first digit, followed by a digit
  4. days must have 0, 1, 2 or 3 as their first digit, followed by a digit

Note: the filter is not a perfect date validation, but close enough.

rules:
  - locations: ~/Desktop
    filters:
      - regex: '(?P<year>20\d{2})-[01]\d-[0123]\d.*'
    actions:
      - echo: "Year: {regex.year}"

Note

If you have trouble getting the filecontent filter to work, have a look at the installation hints.

hash#

Calculates the hash of a file.

Attributes:
  • algorithm (str) – Any hashing algorithm available to python's hashlib. md5 by default.

Algorithms guaranteed to be available are shake_256, sha3_256, sha1, sha3_224, sha384, sha512, blake2b, blake2s, sha256, sha224, shake_128, sha3_512, sha3_384 and md5.

Depending on your python installation and installed libs there may be additional hash algorithms to choose from.

To list the available algorithms on your installation run this in a python interpreter:

>>> import hashlib
>>> hashlib.algorithms_available
{'shake_256', 'whirlpool', 'mdc2', 'blake2s', 'sha224', 'shake_128', 'sha3_512',
'sha3_224', 'sha384', 'md5', 'sha1', 'sha512_256', 'blake2b', 'sha256',
'sha512_224', 'ripemd160', 'sha3_384', 'md4', 'sm3', 'sha3_256', 'md5-sha1',
'sha512'}

Returns:

  • {hash}: The hash of the file.
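The same value can be computed manually with standard hashlib (a plain illustration of what {hash} contains; reading in blocks keeps memory usage constant for large files):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, algorithm: str = "md5", blocksize: int = 65536) -> str:
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()
```

hashlib.new() accepts any algorithm name from hashlib.algorithms_available, which is why the filter is not limited to the guaranteed set above.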
Source code in organize/filters/hash.py
@dataclass(config=ConfigDict(extra="forbid"))
class Hash:

    """Calculates the hash of a file.

    Attributes:
        algorithm (str): Any hashing algorithm available to python's `hashlib`.
            `md5` by default.

    Algorithms guaranteed to be available are
    `shake_256`, `sha3_256`, `sha1`, `sha3_224`, `sha384`, `sha512`, `blake2b`,
    `blake2s`, `sha256`, `sha224`, `shake_128`, `sha3_512`, `sha3_384` and `md5`.

    Depending on your python installation and installed libs there may be additional
    hash algorithms to choose from.

    To list the available algorithms on your installation run this in a python
    interpreter:

    ```py
    >>> import hashlib
    >>> hashlib.algorithms_available
    {'shake_256', 'whirlpool', 'mdc2', 'blake2s', 'sha224', 'shake_128', 'sha3_512',
    'sha3_224', 'sha384', 'md5', 'sha1', 'sha512_256', 'blake2b', 'sha256',
    'sha512_224', 'ripemd160', 'sha3_384', 'md4', 'sm3', 'sha3_256', 'md5-sha1',
    'sha512'}
    ```

    **Returns:**

    - `{hash}`:  The hash of the file.
    """

    algorithm: str = "md5"

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="hash",
        files=True,
        dirs=False,
    )

    def __post_init__(self):
        self._algorithm = Template.from_string(self.algorithm)

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None
        algo = render(self._algorithm, res.dict()).lower()
        result = hash(path=res.path, algo=algo)
        res.vars[self.filter_config.name] = result
        return True

Examples:

Show the hashes of your files:

rules:
  - name: "Show the hashes and size of your files"
    locations: "~/Desktop"
    filters:
      - hash
      - size
    actions:
      - echo: "{hash} {size.decimal}"

lastmodified#

Matches files by last modified date

Attributes:
  • years (int) – specify number of years
  • months (int) – specify number of months
  • weeks (float) – specify number of weeks
  • days (float) – specify number of days
  • hours (float) – specify number of hours
  • minutes (float) – specify number of minutes
  • seconds (float) – specify number of seconds
  • mode (str) – either 'older' or 'newer'. 'older' matches files / folders last modified before the given time, 'newer' matches files / folders last modified within the given time. (default = 'older')

Returns:
  • {lastmodified}: The datetime the file / folder was last modified.

Source code in organize/filters/lastmodified.py
class LastModified(TimeFilter):

    """Matches files by last modified date

    Attributes:
        years (int): specify number of years
        months (int): specify number of months
        weeks (float): specify number of weeks
        days (float): specify number of days
        hours (float): specify number of hours
        minutes (float): specify number of minutes
        seconds (float): specify number of seconds
        mode (str):
            either 'older' or 'newer'. 'older' matches files / folders last modified before
            the given time, 'newer' matches files / folders last modified within the given
            time. (default = 'older')

    Returns:
        {lastmodified}: The datetime the file / folder was last modified.
    """

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="lastmodified",
        files=True,
        dirs=True,
    )

    def get_datetime(self, path: Path) -> datetime:
        return read_lastmodified(path)

Examples:

rules:
  - name: "Show all files on your desktop last modified at least 10 days ago"
    locations: "~/Desktop"
    filters:
      - lastmodified:
          days: 10
    actions:
      - echo: "Was modified at least 10 days ago"

Show all files on your desktop which were modified within the last 5 hours:

rules:
  - locations: "~/Desktop"
    filters:
      - lastmodified:
          hours: 5
          mode: newer
    actions:
      - echo: "Was modified within the last 5 hours"

Sort PDFs by year of last modification:

rules:
  - name: "Sort pdfs by year of last modification"
    locations: "~/Documents"
    filters:
      - extension: pdf
      - lastmodified
    actions:
      - move: "~/Documents/PDF/{lastmodified.year}/"

Formatting the last modified date:

rules:
  - name: Formatting the lastmodified date
    locations: "~/Documents"
    filters:
      - extension: pdf
      - lastmodified
    actions:
      - echo: "ISO Format:   {lastmodified.strftime('%Y-%m-%d')}"
      - echo: "As timestamp: {lastmodified.timestamp() | int}"
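For reference, a last-modified datetime like the one the filter exposes can be read with the standard library alone (a sketch; organize's actual `read_lastmodified` may differ, e.g. in timezone handling):

```python
from datetime import datetime, timezone
from pathlib import Path

def last_modified(path: Path) -> datetime:
    # st_mtime is a POSIX timestamp; convert it to an aware UTC datetime
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
```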

macos_tags#

Filter by macOS tags

Attributes:
  • tags (list(str) or str) –

    The tags to filter by

Source code in organize/filters/macos_tags.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class MacOSTags:
    """Filter by macOS tags

    Attributes:
        tags (list(str) or str):
            The tags to filter by
    """

    tags: List[str] = Field(default_factory=list)

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="macos_tags",
        files=True,
        dirs=True,
    )

    def __post_init__(self):
        if sys.platform != "darwin":
            raise EnvironmentError("The macos_tags filter is only available on macOS")

    @field_validator("tags", mode="before")
    def ensure_list(cls, v):
        if isinstance(v, str):
            return [v]
        return v

    def pipeline(self, res: Resource, output: Output) -> bool:
        file_tags = list_tags(res.path)
        res.vars[self.filter_config.name] = file_tags
        return matches_tags(filter_tags=self.tags, file_tags=file_tags)

Examples:

rules:
  - name: "Only files with a red macOS tag"
    locations: "~/Downloads"
    filters:
      - macos_tags: "* (red)"
    actions:
      - echo: "File with red tag"

rules:
  - name: "All files tagged 'Invoice' (any color)"
    locations: "~/Downloads"
    filters:
      - macos_tags: "Invoice (*)"
    actions:
      - echo: "Invoice found"

rules:
  - name: "All files with a tag 'Invoice' (any color) or with a green tag"
    locations: "~/Downloads"
    filters:
      - macos_tags:
          - "Invoice (*)"
          - "* (green)"
    actions:
      - echo: "Match found!"

rules:
  - name: "Listing file tags"
    locations: "~/Downloads"
    filters:
      - macos_tags
    actions:
      - echo: "{macos_tags}"
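The tag patterns above combine a tag name and a color in the form "Name (color)", where either part may be the wildcard `*`. As an illustration only (not organize's actual implementation), the wildcard semantics can be modeled with `fnmatch`:

```python
from fnmatch import fnmatch

def matches_tag_pattern(pattern: str, tag: str) -> bool:
    # Tags are compared as "Name (color)" strings; '*' may stand in
    # for either part, e.g. "* (red)" or "Invoice (*)".
    return fnmatch(tag, pattern)
```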

mimetype#

Filter by MIME type associated with the file extension.

Supports a single string or list of MIME type strings as argument. The types don't need to be fully specified, for example "audio" matches everything from "audio/midi" to "audio/quicktime".

You can see a list of known MIME types on your system by running this one-liner:

python3 -m organize.filters.mimetype

Attributes:
  • *mimetypes (list(str) or str) –

    The MIME types to filter for.

Returns:

  • {mimetype}: The MIME type of the file.
Source code in organize/filters/mimetype.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class MimeType:

    """Filter by MIME type associated with the file extension.

    Supports a single string or list of MIME type strings as argument.
    The types don't need to be fully specified, for example "audio" matches everything
    from "audio/midi" to "audio/quicktime".

    You can see a list of known MIME types on your system by running this oneliner:

    ```sh
    python3 -m organize.filters.mimetype
    ```

    Attributes:
        *mimetypes (list(str) or str): The MIME types to filter for.

    **Returns:**

    - `{mimetype}`: The MIME type of the file.
    """

    mimetypes: FlatList[str] = Field(default_factory=list)

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="mimetype",
        files=True,
        dirs=False,
    )

    def matches(self, mimetype) -> bool:
        if mimetype is None:
            return False
        if not self.mimetypes:
            return True
        return any(mimetype.startswith(x) for x in self.mimetypes)

    def pipeline(self, res: Resource, output: Output) -> bool:
        mimetype = guess_mimetype(res.path)
        res.vars[self.filter_config.name] = mimetype
        return self.matches(mimetype)

Examples:

Show MIME types

rules:
  - name: "Show MIME types"
    locations: "~/Downloads"
    filters:
      - mimetype
    actions:
      - echo: "{mimetype}"

Filter by 'image' mimetype

rules:
  - name: "Filter by 'image' mimetype"
    locations: "~/Downloads"
    filters:
      - mimetype: image
    actions:
      - echo: "This file is an image: {mimetype}"

Filter by specific MIME type

rules:
  - name: Filter by specific MIME type
    locations: "~/Desktop"
    filters:
      - mimetype: application/pdf
    actions:
      - echo: "Found a PDF file"

Filter by multiple specific MIME types

rules:
  - name: Filter by multiple specific MIME types
    locations: "~/Music"
    filters:
      - mimetype:
          - application/pdf
          - audio/midi
    actions:
      - echo: "Found Midi or PDF."
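The extension-based guess behind `{mimetype}` is available in the standard library, and the prefix matching shown in the examples is straightforward; a sketch (helper names are illustrative):

```python
import mimetypes

def guess_mimetype(filename: str):
    # Guess from the extension only; returns None for unknown types
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype

def matches(mimetype, wanted) -> bool:
    # Prefix matching: "audio" matches "audio/midi", "audio/mpeg", ...
    if mimetype is None:
        return False
    if not wanted:
        return True
    return any(mimetype.startswith(w) for w in wanted)
```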

name#

Match files and folders by name

Attributes:
  • match (str) –

    A matching string in simplematch-syntax

  • startswith (str) –

    The filename must begin with the given string

  • contains (str) –

    The filename must contain the given string

  • endswith (str) –

    The filename (without extension) must end with the given string

  • case_sensitive (bool) –

    By default, the matching is case sensitive. Change this to False to use case insensitive matching.

Source code in organize/filters/name.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class Name:
    """Match files and folders by name

    Attributes:
        match (str):
            A matching string in [simplematch-syntax](https://github.com/tfeldmann/simplematch)

        startswith (str):
            The filename must begin with the given string

        contains (str):
            The filename must contain the given string

        endswith (str):
            The filename (without extension) must end with the given string

        case_sensitive (bool):
            By default, the matching is case sensitive. Change this to False to use
            case insensitive matching.
    """

    match: str = "*"
    startswith: Union[str, List[str]] = ""
    contains: Union[str, List[str]] = ""
    endswith: Union[str, List[str]] = ""
    case_sensitive: bool = True

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="name",
        files=True,
        dirs=True,
    )

    def __post_init__(self, *args, **kwargs):
        self._matcher = simplematch.Matcher(
            self.match,
            case_sensitive=self.case_sensitive,
        )
        self.startswith = self.create_list(self.startswith, self.case_sensitive)
        self.contains = self.create_list(self.contains, self.case_sensitive)
        self.endswith = self.create_list(self.endswith, self.case_sensitive)

    def matches(self, name: str) -> bool:
        if not self.case_sensitive:
            name = name.lower()

        is_match = (
            self._matcher.test(name)
            and any(x in name for x in self.contains)
            and any(name.startswith(x) for x in self.startswith)
            and any(name.endswith(x) for x in self.endswith)
        )
        return is_match

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"
        if res.is_dir():
            name = res.path.stem
        else:
            name, ext = res.path.stem, res.path.suffix
            if not name:
                name = ext
        result = self.matches(normalize_unicode(name))
        m = self._matcher.match(normalize_unicode(name))
        if not m:
            m = name

        res.vars[self.filter_config.name] = m
        return result

    @staticmethod
    def create_list(x: Union[int, str, List[Any]], case_sensitive: bool) -> List[str]:
        if isinstance(x, (int, float)):
            x = str(x)
        if isinstance(x, str):
            x = [x]
        x = [str(x) for x in x]
        if not case_sensitive:
            x = [x.lower() for x in x]
        return x

Examples:

Match all files starting with 'Invoice':

rules:
  - locations: "~/Desktop"
    filters:
      - name:
          startswith: Invoice
    actions:
      - echo: "This is an invoice"

Match all files starting with 'A' and containing the string 'hole' (case insensitive):

rules:
  - locations: "~/Desktop"
    filters:
      - name:
          startswith: A
          contains: hole
          case_sensitive: false
    actions:
      - echo: "Found a match."

Match all files starting with 'A' or 'B', containing '5' or '6', and ending with '_end':

rules:
  - locations: "~/Desktop"
    filters:
      - name:
          startswith:
            - "A"
            - "B"
          contains:
            - "5"
            - "6"
          endswith: _end
          case_sensitive: false
    actions:
      - echo: "Found a match."
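The constraints in the last example combine as follows: entries inside one list are OR-ed, while the startswith / contains / endswith constraints themselves are AND-ed. A standalone sketch of that logic (function name and defaults are illustrative; the empty-string default matches anything):

```python
def name_matches(name: str,
                 startswith=("",), contains=("",), endswith=("",),
                 case_sensitive: bool = True) -> bool:
    # Lower-case everything for case-insensitive matching
    if not case_sensitive:
        name = name.lower()
        startswith = [s.lower() for s in startswith]
        contains = [c.lower() for c in contains]
        endswith = [e.lower() for e in endswith]
    # any() within each constraint (OR), all constraints must hold (AND)
    return (any(name.startswith(s) for s in startswith)
            and any(c in name for c in contains)
            and any(name.endswith(e) for e in endswith))
```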

python#

Use python code to filter files.

Attributes:
  • code (str) –

    The python code to execute. The code must contain a return statement.

Returns:

  • If your code returns False or None the file is filtered out, otherwise the file is passed on to the next filters.
  • {python} contains the returned value. If you return a dictionary (for example return {"some_key": some_value, "nested": {"k": 2}}) it will be accessible via dot syntax in your actions: {python.some_key}, {python.nested.k}.
  • Variables of previous filters are available, but you have to use the normal python dictionary syntax x = regex["my_group"].
Source code in organize/filters/python.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class Python:

    """Use python code to filter files.

    Attributes:
        code (str):
            The python code to execute. The code must contain a `return` statement.


    **Returns:**

    - If your code returns `False` or `None` the file is filtered out,
      otherwise the file is passed on to the next filters.
    - `{python}` contains the returned value. If you return a dictionary (for
      example `return {"some_key": some_value, "nested": {"k": 2}}`) it will be
      accessible via dot syntax in your actions: `{python.some_key}`, `{python.nested.k}`.
    - Variables of previous filters are available, but you have to use the normal python
      dictionary syntax `x = regex["my_group"]`.
    """

    code: str

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="python",
        files=True,
        dirs=True,
    )

    @field_validator("code", mode="after")
    @classmethod
    def must_have_return_statement(cls, value):
        if "return" not in value:
            raise ValueError("No return statement found in your code!")
        return value

    def __post_init__(self):
        self.code = textwrap.dedent(self.code)

    def __usercode__(self, print, **kwargs) -> Optional[Dict]:
        raise NotImplementedError()

    def pipeline(self, res: Resource, output: Output) -> bool:
        def _output_msg(*values, sep: str = " ", end: str = ""):
            """
            the print function for the user code needs to print via the current output
            """
            output.msg(
                res=res,
                msg=f"{sep.join(str(x) for x in values)}{end}",
                sender="python",
            )

        # codegen the user function with arguments as available in the resource
        kwargs = ", ".join(res.dict().keys())
        func = f"def __userfunc(print, {kwargs}):\n"
        func += textwrap.indent(self.code, "    ")
        func += "\n\nself.__usercode__ = __userfunc"

        exec(func, globals().copy(), locals().copy())
        result = self.__usercode__(print=_output_msg, **res.dict())

        if isinstance(result, dict):
            res.deep_merge(key=self.filter_config.name, data=result)
        else:
            res.vars[self.filter_config.name] = result
        return result not in (False, None)

Examples:

rules:
  - name: A file name reverser.
    locations: ~/Documents
    filters:
      - extension
      - python: |
          return {"reversed_name": path.stem[::-1]}
    actions:
      - rename: "{python.reversed_name}.{extension}"

A filter for odd student numbers. Assuming the folder ~/Students contains the files student-01.jpg, student-01.txt, student-02.txt and student-03.txt, this rule will print "Odd student numbers: student-01.txt" and "Odd student numbers: student-03.txt"

rules:
  - name: "Filter odd student numbers"
    locations: ~/Students/
    filters:
      - python: |
          return int(path.stem.split('-')[1]) % 2 == 1
    actions:
      - echo: "Odd student numbers: {path.name}"
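The parity check in this rule boils down to parsing the number out of the file's stem; as a standalone illustration (the function name is hypothetical):

```python
from pathlib import Path

def is_odd_student(path: Path) -> bool:
    # "student-01.txt" -> stem "student-01" -> number 1 -> odd
    return int(path.stem.split("-")[1]) % 2 == 1
```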

Advanced use case. You can access data from previous filters in your python code. This can be used to match files with a regular expression, capture names in named groups, and then rename the files using the output of your python script.

rules:
  - name: "Access placeholders in python filter"
    locations: files
    filters:
      - extension: txt
      - regex: (?P<firstname>\w+)-(?P<lastname>\w+)\..*
      - python: |
          emails = {
              "Betts": "dbetts@mail.de",
              "Cornish": "acornish@google.com",
              "Bean": "dbean@aol.com",
              "Frey": "l-frey@frey.org",
          }
          if regex.lastname in emails: # get emails from wherever
              return {"mail": emails[regex.lastname]}
    actions:
      - rename: "{python.mail}.txt"

Result:

  • Devonte-Betts.txt becomes dbetts@mail.de.txt
  • Alaina-Cornish.txt becomes acornish@google.com.txt
  • Dimitri-Bean.txt becomes dbean@aol.com.txt
  • Lowri-Frey.txt becomes l-frey@frey.org.txt
  • Someunknown-User.txt remains unchanged because the email is not found

regex#

Matches filenames with the given regular expression

Attributes:
  • expr (str) –

    The regular expression to be matched.

Returns:

Any named groups in your regular expression will be returned like this:

  • {regex.groupname}: The text matched with the named group (?P<groupname>.*)
Source code in organize/filters/regex.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class Regex:

    """Matches filenames with the given regular expression

    Attributes:
        expr (str): The regular expression to be matched.

    **Returns:**

    Any named groups in your regular expression will be returned like this:

    - `{regex.groupname}`: The text matched with the named
      group `(?P<groupname>.*)`

    """

    expr: str

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="regex",
        files=True,
        dirs=True,
    )

    def __post_init__(self):
        self._expr = re.compile(self.expr, flags=re.UNICODE)

    def matches(self, path: str):
        return self._expr.search(path)

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None, "Does not support standalone mode"
        match = self.matches(normalize_unicode(res.path.name))
        if match:
            res.deep_merge(key=self.filter_config.name, data=match.groupdict())
            return True
        return False

Examples:

Match an invoice with a regular expression:

rules:
  - locations: "~/Desktop"
    filters:
      - regex: '^RG(\d{12})-sig\.pdf$'
    actions:
      - move: "~/Documents/Invoices/1und1/"

Match and extract data from filenames with regex named groups: this is just like the previous example, but we rename the invoice using the invoice number extracted via the named group the_number.

rules:
  - locations: ~/Desktop
    filters:
      - regex: '^RG(?P<the_number>\d{12})-sig\.pdf$'
    actions:
      - move: ~/Documents/Invoices/1und1/{regex.the_number}.pdf
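The same extraction can be tried outside of organize with Python's `re` module, which is useful for debugging an expression before putting it in a rule (function name illustrative):

```python
import re

expr = re.compile(r"^RG(?P<the_number>\d{12})-sig\.pdf$")

def extract(filename: str):
    # Named groups become the {regex.<groupname>} placeholders
    match = expr.search(filename)
    return match.groupdict() if match else None
```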

size#

Matches files and folders by size

Attributes:
  • *conditions (list(str) or str) –

    The size constraints.

Accepts file size conditions, e.g.: ">= 500 MB", "< 20k", ">0", "= 10 KiB".

It is possible to define both lower and upper conditions like this: ">20k, < 1 TB", ">= 20 Mb, <25 Mb". The filter will match if all given conditions are satisfied.

  • Accepts all units from KB to YB.
  • If no unit is given, kilobytes are assumed.
  • If a binary prefix is given (KiB, GiB), the size is calculated using base 1024.

Returns:

  • {size.bytes}: (int) Size in bytes
  • {size.traditional}: (str) Size with unit (powers of 1024, JEDEC prefixes)
  • {size.binary}: (str) Size with unit (powers of 1024, IEC prefixes)
  • {size.decimal}: (str) Size with unit (powers of 1000, SI prefixes)
Source code in organize/filters/size.py
@dataclass(config=ConfigDict(coerce_numbers_to_str=True, extra="forbid"))
class Size:
    """Matches files and folders by size

    Attributes:
        *conditions (list(str) or str):
            The size constraints.

    Accepts file size conditions, e.g.: `">= 500 MB"`, `"< 20k"`, `">0"`,
    `"= 10 KiB"`.

    It is possible to define both lower and upper conditions like this:
    `">20k, < 1 TB"`, `">= 20 Mb, <25 Mb"`. The filter will match if all given
    conditions are satisfied.

    - Accepts all units from KB to YB.
    - If no unit is given, kilobytes are assumed.
    - If a binary prefix is given (KiB, GiB), the size is calculated using base 1024.

    **Returns:**

    - `{size.bytes}`: (int) Size in bytes
    - `{size.traditional}`: (str) Size with unit (powers of 1024, JEDEC prefixes)
    - `{size.binary}`: (str) Size with unit (powers of 1024, IEC prefixes)
    - `{size.decimal}`: (str) Size with unit (powers of 1000, SI prefixes)
    """

    conditions: FlatList[str] = Field(default_factory=list)

    filter_config: ClassVar[FilterConfig] = FilterConfig(
        name="size", files=True, dirs=True
    )

    def __post_init__(self):
        self._constraints = set()
        for x in self.conditions:
            for constraint in create_constraints(x):
                self._constraints.add(constraint)

    def matches(self, filesize: int) -> bool:
        if not self._constraints:
            return True
        return all(op(filesize, c_size) for op, c_size in self._constraints)

    def pipeline(self, res: Resource, output: Output) -> bool:
        assert res.path is not None
        bytes = read_resource_size(res=res)
        res.vars[self.filter_config.name] = {
            "bytes": bytes,
            "traditional": traditional(bytes),
            "binary": binary(bytes),
            "decimal": decimal(bytes),
        }
        return self.matches(bytes)

Examples:

Trash big downloads:

rules:
  - locations: "~/Downloads"
    targets: files
    filters:
      - size: "> 0.5 GB"
    actions:
      - trash

Move all JPEGs bigger than 1 MB and smaller than 10 MB. Search all subfolders and keep the original relative path:

rules:
  - locations:
      - path: "~/Pictures"
        max_depth: null
    filters:
      - extension:
          - jpg
          - jpeg
      - size: ">1mb, <10mb"
    actions:
      - move: "~/Pictures/sorted/{relative_path}/"
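The difference between the binary and decimal placeholders comes down to the base used when scaling: 1024 with IEC prefixes versus 1000 with SI prefixes. A simplified sketch (organize's actual output strings may be formatted slightly differently):

```python
def _fmt(size_bytes: int, base: int, units) -> str:
    # Divide by the base until the value fits under one unit step
    size = float(size_bytes)
    for unit in units:
        if size < base:
            return f"{size:.1f} {unit}"
        size /= base
    return f"{size:.1f} {units[-1]}"

def binary(size_bytes: int) -> str:
    # powers of 1024, IEC prefixes -> {size.binary}
    return _fmt(size_bytes, 1024, ["B", "KiB", "MiB", "GiB", "TiB"])

def decimal(size_bytes: int) -> str:
    # powers of 1000, SI prefixes -> {size.decimal}
    return _fmt(size_bytes, 1000, ["B", "kB", "MB", "GB", "TB"])
```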